The specified repository contains sharded GGUF. Ollama does not support this yet

I got this error while trying to run:

ollama run hf.co/unsloth/Kimi-K2-Instruct-GGUF:Q4_K_M "why is the sky blue"

Unfortunately, the only other option is to use llama.cpp to run it. Here is how you can do that. Please note that the notebook may crash with its disk storage full, because Kimi-K2 takes up quite a lot of space: approximately 373 GB.
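Before starting, it is worth a quick sanity check of how much free disk space the runtime actually has, since a default Colab disk is far smaller than 373 GB (this assumes the usual Colab working directory /content):

!df -h /content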

First, install the build dependencies:

!apt-get -qq install build-essential cmake

!git clone https://github.com/ggerganov/llama.cpp
%cd llama.cpp
!cmake -B build
!cmake --build build --config Release
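
As a side note, if your runtime has an NVIDIA GPU you can optionally build with CUDA support instead, so inference can be offloaded to the GPU. The flag below assumes a recent llama.cpp checkout where the CMake option is named GGML_CUDA:

!cmake -B build -DGGML_CUDA=ON
!cmake --build build --config Release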

After the build succeeds, the llama.cpp binaries will be placed under build/bin. You can verify this by printing the server help:


!/content/llama.cpp/build/bin/llama-server -h

And finally, to run it, use the following command:


!/content/llama.cpp/build/bin/llama-server -hf unsloth/Kimi-K2-Instruct-GGUF:Q4_K_M --host 0.0.0.0 --port 8000

And you will see it download the Kimi-K2 model shards and then start serving on port 8000.
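
Once the server is up, you can test it from a separate terminal (or by launching the server in the background), since the `!llama-server` cell will block while it runs. This is a minimal sketch assuming the OpenAI-compatible chat endpoint that llama-server exposes on the port we chose above:

!curl http://localhost:8000/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "why is the sky blue"}]}'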










