vLLM - using a Mistral model and resolving some of its issues
I bumped into a Mistral assertion error while trying to run the vLLM chat example here. Since it was not obvious what the actual cause was, I turned on debug logging with VLLM_LOGGING_LEVEL=DEBUG.
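For reference, the same thing can be done from inside a Python script. A sketch, assuming the variable is read when vllm is first imported, so it has to be set before the import:

```python
import os

# Set the log level before importing vllm, since vLLM configures its
# logging at import time. "DEBUG" is the level name vLLM expects.
os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

# import vllm  # subsequent vLLM output will now include debug logs
```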
When I re-ran the code, I saw "Forbidden for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/resolve/main/params.json". And you guessed it: I had to go to the Mistral Hugging Face model card and accept the license agreement.
(The full traceback is included at the end of this post.)
Once I had accepted the license agreement, I re-ran the code and it worked.
Then, almost immediately, I ran into this error:
ValueError: The model's max seq len (128000) is larger than the maximum number of tokens that can be stored in KV cache (32768). Try increasing `VLLM_CPU_KVCACHE_SPACE` or decreasing `max_model_len` when initializing the engine.
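As a back-of-the-envelope check of where that 32768 comes from: with Mistral-7B's attention shapes and vLLM's default CPU KV cache budget, the token capacity works out to exactly that number. A sketch; the model shapes (32 layers, 8 KV heads, head dim 128, bf16) and the 4 GiB default for VLLM_CPU_KVCACHE_SPACE are my assumptions, not taken from the vLLM source:

```python
# Assumed Mistral-7B-Instruct-v0.3 attention shapes (grouped-query attention)
num_layers = 32
num_kv_heads = 8
head_dim = 128
bytes_per_elem = 2  # bf16

# K and V cached per token, summed across all layers
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem

# Assumed default CPU KV cache budget (VLLM_CPU_KVCACHE_SPACE, in GiB)
kvcache_space_gib = 4
max_tokens = (kvcache_space_gib * 1024**3) // kv_bytes_per_token
print(max_tokens)  # 32768 -- matches the limit in the error message
```

Under these assumptions the model's 128000-token window simply cannot fit, which is why the error suggests either growing the cache or shrinking `max_model_len`.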
To resolve this, I ran the following from the command line (the error message also suggests increasing VLLM_CPU_KVCACHE_SPACE as an alternative):
vllm serve mistralai/Mistral-7B-Instruct-v0.3 --max-model-len 32768
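For the offline example in chat_with_tools.py, the same cap can be passed straight to the `LLM` constructor instead of the CLI flag. A sketch; the constructor call is commented out here because it downloads and loads the full model:

```python
# Offline-API equivalent of --max-model-len 32768: cap the context so it
# fits in the KV cache instead of using the model's full 128k window.
engine_kwargs = dict(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    max_model_len=32768,
)

# from vllm import LLM
# llm = LLM(**engine_kwargs)
```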
Detailed logs
INFO 04-20 18:50:48 [__init__.py:239] Automatically detected platform cpu.
Traceback (most recent call last):
File "/home/jerwoawsnz/vllm_source/examples/offline_inference/chat_with_tools.py", line 53, in <module>
llm = LLM(model=model_name,
^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/utils.py", line 1149, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/entrypoints/llm.py", line 248, in __init__
self.llm_engine = LLMEngine.from_engine_args(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/llm_engine.py", line 515, in from_engine_args
vllm_config = engine_args.create_engine_config(usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1167, in create_engine_config
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1055, in create_model_config
return ModelConfig(
^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/config.py", line 447, in __init__
hf_config = get_config(self.hf_config_path or self.model,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 349, in get_config
config = load_params_config(model, revision, token=HF_TOKEN, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 655, in load_params_config
assert isinstance(config_dict, dict)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError

