The specified repository contains sharded GGUF. Ollama does not support this yet
Getting this error while trying to run:
ollama run hf.co/unsloth/Kimi-K2-Instruct-GGUF:Q4_K_M "why is the sky blue"
Unfortunately, the only other option is to use llama.cpp to run it. Here is how you can do that. Please note the notebook may crash with a disk-storage-full error, because Kimi-K2 does take up quite a lot of space: approximately 373 GB at the Q4_K_M quantization.
First, install the build dependencies and compile llama.cpp:
!apt-get -qq install build-essential cmake
!git clone https://github.com/ggerganov/llama.cpp
%cd llama.cpp
!cmake -B build
!cmake --build build --config Release
After the build succeeds, the llama-server binary will be placed under build/bin. You can verify with:
!/content/llama.cpp/build/bin/llama-server -h
And finally, to run it, use the following command:
!/content/llama.cpp/build/bin/llama-server -hf unsloth/Kimi-K2-Instruct-GGUF:Q4_K_M --host 0.0.0.0 --port 8000
You will then see llama-server download the Kimi-K2 model shards before it starts serving.
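Once the server is up, you can talk to it over its OpenAI-compatible HTTP API. Below is a minimal sketch of a client, assuming the server is reachable at localhost on port 8000 as configured above; the build_payload and ask helpers are hypothetical names introduced here for illustration.

```python
import json
from urllib.request import Request, urlopen

# Assumes the llama-server started above, listening on port 8000.
SERVER_URL = "http://localhost:8000/v1/chat/completions"

def build_payload(prompt: str) -> dict:
    """Build a chat-completion request body for llama-server's
    OpenAI-compatible endpoint."""
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
        "max_tokens": 256,
    }

def ask(prompt: str) -> str:
    """Send the prompt to the local llama-server and return the reply text."""
    req = Request(
        SERVER_URL,
        data=json.dumps(build_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]

# Example (requires the server above to be running):
# print(ask("why is the sky blue"))
```

Because the endpoint follows the OpenAI schema, any OpenAI-compatible client library should also work by pointing its base URL at the server.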