vllm - using a mistral model and resolving some of its issues

I bumped into a Mistral assertion error while trying to run the vLLM chat example (examples/offline_inference/chat_with_tools.py). Since the actual cause was quite confusing, I turned on debug logging with VLLM_LOGGING_LEVEL=DEBUG.
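If you prefer to do this from Python, here is a minimal sketch (exporting the variable in the shell before launching the script works just as well):

    import os

    # vLLM reads VLLM_LOGGING_LEVEL when it is first imported,
    # so set it before any vllm import.
    os.environ["VLLM_LOGGING_LEVEL"] = "DEBUG"

    from vllm import LLM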

I then re-ran the code, and in the debug output I saw "Forbidden for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/resolve/main/params.json". And you guessed it: I had to go to the Mistral Hugging Face model card and accept the license agreement.

The full traceback for this failure is reproduced in the detailed logs at the end of this post.

Once I accepted the license agreement, I re-ran the code and it got past this error.
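One related note: accepting the license only helps if the download is authenticated against your Hugging Face account. If your environment is not already logged in, one option is huggingface_hub (the token below is a placeholder for your own access token):

    from huggingface_hub import login

    # Caches the token locally so gated repos such as
    # mistralai/Mistral-7B-Instruct-v0.3 can be downloaded.
    login(token="hf_...")  # replace with your own token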

Then I quickly ran into this error:

ValueError: The model's max seq len (128000) is larger than the maximum number of tokens that can be stored in KV cache (32768). Try increasing `VLLM_CPU_KVCACHE_SPACE` or decreasing `max_model_len` when initializing the engine.

The model's config declares a maximum sequence length of 128000 tokens, which does not fit in the CPU backend's KV cache. To resolve this, I capped the context length from the command line:

vllm serve mistralai/Mistral-7B-Instruct-v0.3 --max-model-len 32768
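The error message's other suggestion, VLLM_CPU_KVCACHE_SPACE, sets the CPU KV cache size in GiB (the default is 4). If you have RAM to spare, enlarging the cache instead of shrinking the context is an option; a sketch, with 40 GiB as an illustrative value:

    import os

    # Give the CPU KV cache 40 GiB instead of the default 4 GiB.
    # As with the logging variable, set this before importing vllm.
    os.environ["VLLM_CPU_KVCACHE_SPACE"] = "40"

    from vllm import LLM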


How do you configure this from the LLM class itself?

You can pass a lower max_model_len when constructing the LLM, as shown below (the value of 1000 is for illustration only). If you use max_model_len, it gets converted automatically to max_seq_len.



    
    from vllm import LLM

    model_name = "mistralai/Mistral-7B-Instruct-v0.3"
    llm = LLM(model=model_name,
              tokenizer_mode="mistral",
              config_format="mistral", trust_remote_code=True,
              load_format="mistral", max_model_len=1000, dtype="float32")
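After that, a quick generation call confirms the engine comes up (the prompt is arbitrary; keep max_tokens well under the 1000-token limit set above):

    from vllm import SamplingParams

    params = SamplingParams(temperature=0.7, max_tokens=100)
    outputs = llm.generate(["Explain what a KV cache is in one sentence."], params)
    print(outputs[0].outputs[0].text)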

Detailed logs

INFO 04-20 18:50:48 [__init__.py:239] Automatically detected platform cpu.

Traceback (most recent call last):
  File "/home/jerwoawsnz/vllm_source/examples/offline_inference/chat_with_tools.py", line 53, in <module>
    llm = LLM(model=model_name,
          ^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/utils.py", line 1149, in inner
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/entrypoints/llm.py", line 248, in __init__
    self.llm_engine = LLMEngine.from_engine_args(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/llm_engine.py", line 515, in from_engine_args
    vllm_config = engine_args.create_engine_config(usage_context)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1167, in create_engine_config
    model_config = self.create_model_config()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1055, in create_model_config
    return ModelConfig(
           ^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/config.py", line 447, in __init__
    hf_config = get_config(self.hf_config_path or self.model,
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 349, in get_config
    config = load_params_config(model, revision, token=HF_TOKEN, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/jerwoawsnz/myenv/lib/python3.12/site-packages/vllm-0.8.5.dev103+g87aaadef7.cpu-py3.12-linux-x86_64.egg/vllm/transformers_utils/config.py", line 655, in load_params_config
    assert isinstance(config_dict, dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AssertionError





