vLLM - curl for completions


Something I need to have handy when running or testing vLLM.

Start your model with vLLM using the following command:

vllm serve Qwen/Qwen2.5-1.5B-Instruct
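
If the server starts cleanly, it listens on http://localhost:8000 by default. As a quick sanity check, you can list the models it is serving through the OpenAI-compatible /v1/models endpoint:

curl http://localhost:8000/v1/models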

Then run the following command to generate up to 100 tokens from the model. Setting temperature to 0 makes decoding greedy, so the output is deterministic:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 100,
        "temperature": 0
    }'
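
The same server also exposes a chat endpoint at /v1/chat/completions. The same request in chat form, using a messages array instead of a raw prompt (same model and sampling settings), would look like this:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "San Francisco is a"}],
        "max_tokens": 100,
        "temperature": 0
    }'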