vLLM: curl for completions
Something I need to have handy when running or testing vLLM.
Start up your model with vLLM using the following command:

```shell
vllm serve Qwen/Qwen2.5-1.5B-Instruct
```
Then run the following command to request a completion of up to 100 tokens from the model:
```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 100,
    "temperature": 0
  }'
```
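The same request can be sent from Python with just the standard library. This is a minimal sketch assuming the server started above is still listening on `localhost:8000`; the actual network call is left commented out so the script runs even without a server:

```python
import json
import urllib.request

# Same payload as the curl example above.
payload = {
    "model": "Qwen/Qwen2.5-1.5B-Instruct",
    "prompt": "San Francisco is a",
    "max_tokens": 100,
    "temperature": 0,
}

def complete(payload, url="http://localhost:8000/v1/completions"):
    """POST the payload to the OpenAI-compatible completions endpoint."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the server running, the generated text is at choices[0].text:
# resp = complete(payload)
# print(resp["choices"][0]["text"])
```

Because the endpoint follows the OpenAI API shape, the `openai` Python client also works by pointing its `base_url` at the server.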