vllm error in running sample: "reshape_and_cache_cpu_impl" not implemented for 'Half'

April 20, 2025

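I hit the error above while running one of the basic vLLM examples. The "_cpu_impl" in the message shows that it comes from vLLM's CPU backend, whose reshape_and_cache kernel has no implementation for 'Half' (float16) tensors. The model was therefore being run in float16 on a backend that cannot handle it.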

To resolve it, add the dtype argument when creating the LLM and set it to "float32", as shown below:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]
# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)


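def main():
    # Create an LLM. dtype="float32" avoids the float16 ('Half') path
    # that the CPU reshape_and_cache kernel does not implement.
    llm = LLM(model="facebook/opt-125m", dtype="float32")
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    # Each output holds the original prompt and its generated completion(s).
    for output in outputs:
        prompt = output.prompt
        generated_text = output.outputs[0].text
        print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")


if __name__ == "__main__":
    main()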

[Screenshot: example output after the fix]
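Hard-coding "float32" has a cost: each weight takes 4 bytes instead of 2, so the model weights need twice the memory, which matters if the same script also runs on a GPU. A minimal sketch, assuming you want one script for both cases: the hypothetical helper pick_dtype() below returns "float32" when PyTorch sees no CUDA device and "auto" (vLLM's default, which derives the dtype from the model config) otherwise. The hypothetical weight_megabytes() shows the rough arithmetic for the weights alone, ignoring the KV cache and activations, and treats opt-125m as exactly 125 million parameters for illustration.

```python
import importlib.util

# Bytes per element for the dtype names discussed in this post.
BYTES_PER_ELEMENT = {"float16": 2, "bfloat16": 2, "float32": 4}


def weight_megabytes(num_params: int, dtype: str) -> float:
    """Hypothetical helper: rough size of the model weights only (no KV cache)."""
    return num_params * BYTES_PER_ELEMENT[dtype] / 1e6


def pick_dtype() -> str:
    """Hypothetical helper: choose a value for vllm.LLM(dtype=...).

    Assumption: on a CUDA machine the model's default ("auto") works, while on
    a CPU-only machine "float32" avoids the missing 'Half' CPU kernel.
    """
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            return "auto"
    return "float32"


if __name__ == "__main__":
    # Treating opt-125m as 125 million parameters for a rough estimate.
    print(weight_megabytes(125_000_000, "float16"))  # 250.0
    print(weight_megabytes(125_000_000, "float32"))  # 500.0
    print(pick_dtype())
```

In the sample above, this would replace the fixed value with llm = LLM(model="facebook/opt-125m", dtype=pick_dtype()).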

mitzen
