GKE: running Ollama with qwen3:0.6b example
In this post, we are going to deploy Qwen3 to a GKE Autopilot cluster using the Ollama Helm chart. This is much simpler than writing your own Dockerfile and deployment YAML.
To get started, run the following commands:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update ollama-helm
helm upgrade --install ollama ollama-helm/ollama \
--namespace ollama \
--create-namespace \
--values values.yaml
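
Once the release is installed, you can watch the pod come up with a standard kubectl check (the ollama namespace matches the one created above):

# Watch the Ollama pod until it reaches the Running state
kubectl get pods -n ollama -w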
The values file looks something like this. We are using qwen3:0.6b, but you can choose other models too.
ollama:
  gpu:
    enabled: false
  models:
    pull:
      - qwen3:0.6b

service:
  type: "ClusterIP"

persistentVolume:
  enabled: true
  size: 10Gi
  storageClass: "standard-rwx"
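
The pull key takes a list, so you can also download more than one model at startup. A minimal sketch, where llama3.2:1b is just another example of a small tag from the Ollama library:

ollama:
  models:
    pull:
      # Each entry in this list is pulled when the pod starts
      - qwen3:0.6b
      - llama3.2:1b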
This takes a bit longer, around 5-10 minutes, to provision the storage and get the pod running.
It will automatically download your chosen model, which in a way is much better than baking it into your Dockerfile.
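
If you want to follow the download, you can stream the pod logs. This assumes the chart's default Deployment name of ollama, which comes from the release name used above:

# Stream the container logs to watch the model pull progress
kubectl logs -n ollama deploy/ollama -f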
You can use other storage classes, such as hyperdisk-ml, if your cluster's machine types support them.
After that, you will notice the pod is running.
You can quickly run the following command to get a throwaway pod with curl available:
kubectl run curl-test -n ollama --image=curlimages/curl --rm -it -- sh
From inside that pod, you can hit the Ollama endpoint and get a response.
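
For example, from inside the curl-test pod you can call the standard Ollama HTTP API. This assumes the chart's default service name ollama on the default port 11434; adjust both if you changed them in your values:

# List the models the server has pulled
curl http://ollama:11434/api/tags

# Ask qwen3:0.6b a question via the generate endpoint
curl http://ollama:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "Why is the sky blue?", "stream": false}'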