vLLM error when running the sample: "reshape_and_cache_cpu_impl" not implemented for 'Half'

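I hit the error above when running one of the basic vLLM samples.

The message means the CPU kernel reshape_and_cache_cpu_impl has no implementation for 'Half' (float16), so the model has to be loaded in a dtype that kernel supports. To resolve it, pass the dtype argument to LLM and set it to "float32", as shown below: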


from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)


def main():
    # Create an LLM.
    llm = LLM(model="facebook/opt-125m", dtype="float32")
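    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    # Print each prompt together with its generated text.
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()
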
Example output after the fix





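If the same script has to run on both CPU-only and GPU machines, one option is to pick the dtype at runtime instead of hard-coding it. This is only a minimal sketch and not part of the original sample: choose_dtype is a hypothetical helper name, and it assumes the GPU setup works with vLLM's "auto" dtype setting.

```python
def choose_dtype(cuda_available: bool) -> str:
    """Return a dtype string to pass as LLM(dtype=...).

    Hypothetical helper, not part of vLLM: it falls back to float32 on
    CPU-only machines, where the Half kernel was missing in my case.
    """
    # "auto" lets vLLM take the dtype from the model config.
    return "auto" if cuda_available else "float32"


if __name__ == "__main__":
    # Simulate a CPU-only machine like the one that produced the error.
    print(choose_dtype(False))
```

In a real script the flag could come from torch.cuda.is_available(), and the result would be passed as LLM(model="facebook/opt-125m", dtype=choose_dtype(...)).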