gke running ollama with qwen3:0.6b example

In this post, we are going to deploy qwen3:0.6b with Ollama to a GKE Autopilot cluster using the community Helm chart. This is much simpler than writing your own Dockerfile and deployment YAML.

To get started, run the following commands:


helm repo add ollama-helm https://otwld.github.io/ollama-helm/

helm repo update ollama-helm

helm upgrade --install ollama ollama-helm/ollama \
  --namespace ollama \
  --create-namespace \
  --values values.yaml
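
Once the release is installed, you can watch the rollout until the pod is ready. This is a minimal check, assuming the chart creates a deployment named ollama (the default for this release name):

kubectl rollout status deployment/ollama -n ollama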

The values file looks something like this. We are using qwen3:0.6b, but you can choose another model too.


ollama:
  gpu:
    enabled: false
   
  service:
    type: "ClusterIP"

  models:
    pull:
      - qwen3:0.6b

persistentVolume:
  enabled: true
  size: 10Gi
  storageClass: "standard-rwx"


This takes a bit longer, around 5 to 10 minutes, to provision the storage and get the pod running. It will automatically download your chosen model, which is much nicer than baking it into your Dockerfile.

You can use other storage classes, such as hyperdisk-ml, if your node type supports them.

After that, you will notice the pod is running.
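
To confirm, list the pods in the namespace (the pod name below is illustrative; yours will differ):

kubectl get pods -n ollama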



You can quickly run the following command to get a throwaway pod with curl available:

kubectl run curl-test -n ollama --image=curlimages/curl --rm -it -- sh

You can see that if we hit the endpoint, we get a response.
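
From inside the curl pod, you can call Ollama's generate API. This is a sketch assuming the chart's default ClusterIP service name (ollama) and port (11434); adjust if your release uses different values:

curl http://ollama.ollama.svc.cluster.local:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "Why is the sky blue?", "stream": false}'

With "stream": false, the server returns a single JSON object containing the full response instead of streaming tokens line by line.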







