vllm - using a mistral model and resolving some of its issues

I bumped into a Mistral assertion error while trying to run the vLLM chat example (examples/offline_inference/chat_with_tools.py). Since the actual cause was quite confusing, I turned on debug logging with VLLM_LOGGING_LEVEL=DEBUG.
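If you prefer to do this from Python, here is a minimal sketch (exporting the variable in the shell before launching the script works just as well):

    import os

    # vLLM reads VLLM_LOGGING_LEVEL when it is first imported,
    # so set it before any vllm import.
    os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

    from vllm import LLM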

I then re-ran the code, and in the debug output I saw "Forbidden for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/resolve/main/params.json". And you guessed it: I had to go to the Mistral Hugging Face model card and accept the license agreement.

The full traceback for this failure is reproduced in the detailed logs at the end of this post.

Once I accepted the license agreement, I re-ran the code and it got past this error.
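One related note: accepting the license only helps if the download is authenticated against your Hugging Face account. If your environment is not already logged in, one option is huggingface_hub (the token below is a placeholder for your own access token):

    from huggingface_hub import login

    # Caches the token locally so gated repos such as
    # mistralai/Mistral-7B-Instruct-v0.3 can be downloaded.
    login(token="hf_...")  # replace with your own token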

Then I quickly ran into this error:

ValueError: The model's max seq len (128000) is larger than the maximum number of tokens that can be stored in KV cache (32768). Try increasing `VLLM_CPU_KVCACHE_SPACE` or decreasing `max_model_len` when initializing the engine.

The model's config declares a maximum sequence length of 128000 tokens, which does not fit in the CPU backend's KV cache. To resolve this, I capped the context length from the command line:

vllm serve mistralai/Mistral-7B-Instruct-v0.3 --max-model-len 32768
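The error message's other suggestion, VLLM_CPU_KVCACHE_SPACE, sets the CPU KV cache size in GiB (the default is 4). If you have RAM to spare, enlarging the cache instead of shrinking the context is an option; a sketch, with 40 GiB as an illustrative value:

    import os

    # Give the CPU KV cache 40 GiB instead of the default 4 GiB.
    # As with the logging variable, set this before importing vllm.
    os.environ["VLLM_CPU_KVCACHE_SPACE"] = "40"

    from vllm import LLM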


How do you configure this from the LLM class itself?

You can pass a lower max_model_len when constructing the LLM, as shown below (the value of 1000 is for illustration only). If you use max_model_len, it gets converted automatically to max_seq_len.



    
    from vllm import LLM

    model_name = "mistralai/Mistral-7B-Instruct-v0.3"
    llm = LLM(model=model_name,
              tokenizer_mode="mistral",
              config_format="mistral", trust_remote_code=True,
              load_format="mistral", max_model_len=1000, dtype="float32")
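After that, a quick generation call confirms the engine comes up (the prompt is arbitrary; keep max_tokens well under the 1000-token limit set above):

    from vllm import SamplingParams

    params = SamplingParams(temperature=0.7, max_tokens=100)
    outputs = llm.generate(["Explain what a KV cache is in one sentence."], params)
    print(outputs[0].outputs[0].text)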

Detailed logs

INFO 04-20 18:50:48 [__init__.py:239] Automatically detected platform cpu.

Traceback (most recent call last):
  File "/home/jerwoawsnz/vllm_source/examples/offline_inference/chat_with_tools.py", line 53, in <module>
    llm = LLM(model=model_name,
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/utils.py", line 1149, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/llm_engine.py", line 515, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1167, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1055, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/config.py", line 447, in __init__
    hf_config = get_config(self.hf_config_path or self.model,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 349, in get_config
    config = load_params_config(model, revision, token=HF_TOKEN, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 655, in load_params_config
    assert isinstance(config_dict, dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError





