Running llama.cpp in Google Colab
There are two options for doing this in Google Colab.
Option 1 (Easiest)
With this option, we simply install the llama-cpp-python package and then load and run the model.
First, install the required package:

!pip install llama-cpp-python

Next, we download the model.
We can see the model being downloaded:
Then we run the following to test the model.
Option 2
llama.cpp is a powerful inference engine, and if you want to get it running in Google Colab, you've come to the right place. Create a notebook, then build and install llama.cpp; the build can take up to 5 minutes.
In a cell, run the following.
The build produces a set of binaries, including llama-server.
This runs the server in the background. Next, we issue a curl request to ask the model a question:
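The launch cell isn't reproduced either; one way to keep the server alive in Colab is to start it with nohup. The GGUF path below is a placeholder for whichever model file you downloaded, and port 8000 matches the curl call below:

```shell
# Start llama-server in the background; nohup keeps it running after
# the cell's shell exits. The model path is a placeholder.
!nohup ./llama.cpp/build/bin/llama-server \
    -m /content/qwen3-0.6b-q8_0.gguf \
    --host 0.0.0.0 --port 8000 > server.log 2>&1 &
```

Check `server.log` if the curl request below fails; the server prints its startup and request logs there.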
!curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is the capital of France?"}]}'
And we get some decent outputs:
Reference notebooks
Option 1 - https://github.com/mitzenjeremywoo/google-colab-notebooks/blob/main/llama_cpp_python_gwen_google_colab.ipynb
Option 2 - https://github.com/mitzenjeremywoo/google-colab-notebooks/blob/main/llama_cpp_running_gwen_3_600k.ipynb