GKE: running Ollama with qwen3:0.6b example
In this post, we are going to deploy Qwen3 to a GKE Autopilot cluster using the Ollama Helm chart. This is much simpler than writing your own Dockerfile and deployment YAML.
To get started, run the following commands:
helm repo add ollama-helm https://otwld.github.io/ollama-helm/
helm repo update ollama-helm
helm upgrade --install ollama ollama-helm/ollama \
--namespace ollama \
--create-namespace \
--values values.yaml
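
Once the release is installed, you can watch the pod come up with a standard kubectl check (the ollama namespace matches the one created above):

# Watch the Ollama pod until it reaches the Running state
kubectl get pods -n ollama -w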
The values file looks something like this. We are using qwen3:0.6b, but you can choose other models too.
ollama:
  gpu:
    enabled: false
  models:
    pull:
      - qwen3:0.6b

service:
  type: "ClusterIP"

persistentVolume:
  enabled: true
  size: 10Gi
  storageClass: "standard-rwx"
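
The pull key takes a list, so you can also download more than one model at startup. A minimal sketch, where llama3.2:1b is just another example of a small tag from the Ollama library:

ollama:
  models:
    pull:
      # Each entry in this list is pulled when the pod starts
      - qwen3:0.6b
      - llama3.2:1b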
This takes a bit longer, around 5-10 minutes, to provision the storage and get the pod running.
It will automatically download your chosen model, which in a way is much better than baking it into your Dockerfile.
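
If you want to follow the download, you can stream the pod logs. This assumes the chart's default Deployment name of ollama, which comes from the release name used above:

# Stream the container logs to watch the model pull progress
kubectl logs -n ollama deploy/ollama -f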
You can use other storage classes, such as hyperdisk-ml, if your cluster's machine types support them.
After that, you will notice the pod is running.
You can quickly run the following command to get a throwaway pod with curl available:
kubectl run curl-test -n ollama --image=curlimages/curl --rm -it -- sh
From inside that pod, you can hit the Ollama endpoint and get a response.
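
For example, from inside the curl-test pod you can call the standard Ollama HTTP API. This assumes the chart's default service name ollama on the default port 11434; adjust both if you changed them in your values:

# List the models the server has pulled
curl http://ollama:11434/api/tags

# Ask qwen3:0.6b a question via the generate endpoint
curl http://ollama:11434/api/generate \
  -d '{"model": "qwen3:0.6b", "prompt": "Why is the sky blue?", "stream": false}'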