vLLM: curl for completions
Something I need to have handy when running or testing vLLM.
Start up your model with vLLM using the following command:

```shell
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
Then run the following command to request a completion of up to 100 tokens from the model:
```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 100,
    "temperature": 0
  }'
```
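The same request can be sent from Python with just the standard library. This is a minimal sketch assuming the server started above is still listening on `localhost:8000`; the actual network call is left commented out so the script runs even without a server:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 100,
    "temperature": 0,
}

def complete(payload, url="http://localhost:8000/v1/completions"):
    """POST the payload to the OpenAI-compatible completions endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running, the generated text is at choices[0].text:
# resp = complete(payload)
# print(resp["choices"][0]["text"])
```

Because the endpoint follows the OpenAI API shape, the `openai` Python client also works by pointing its `base_url` at the server.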