Serving a Triton model on GKE
The Google example deploys the training and inference job onto a GKE workload using GPUs, but it can easily be converted to run on normal CPUs.
First we need to create:
1. a GKE Autopilot cluster
2. a Cloud Storage bucket called PROJECT_ID-gke-bucket (it has to match the $GSBUCKET variable below; a CLI sketch follows the environment variables)
Next clone the repository
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/ai-ml/gke-online-serving-single-gpu
Configure the following environment variables:
export PROJECT_ID=$(gcloud config get project)
export REGION=$(gcloud config get compute/region)
export K8S_SA_NAME=gpu-k8s-sa
export GSBUCKET=$PROJECT_ID-gke-bucket
export MODEL_NAME=mnist
export CLUSTER_NAME=online-serving-cluster
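If you are doing steps 1 and 2 from the CLI rather than the console, something like the following should work once the variables above are set. This is a minimal sketch; add whatever flags your environment needs.

# create an Autopilot cluster (Workload Identity is enabled by default on Autopilot)
gcloud container clusters create-auto $CLUSTER_NAME \
--project=$PROJECT_ID \
--region=$REGION

# create the bucket that will hold the Triton model repository
gcloud storage buckets create gs://$GSBUCKET \
--location=$REGION \
--uniform-bucket-level-access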
Create a service account in GCP IAM called "gke-ai-sa" and grant it two roles, namely "Storage Insights Collector Service" and "Storage Object Admin".
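If you prefer the gcloud CLI to the console for this step, the equivalent is roughly the following; the role IDs correspond to the two roles named above.

# create the IAM service account the workload will impersonate
gcloud iam service-accounts create gke-ai-sa --project=$PROJECT_ID

# grant it the two storage roles on the project
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role=roles/storage.insightsCollectorService
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role=roles/storage.objectAdmin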
Create the following resources in Kubernetes:
kubectl create namespace gke-ai-namespace
kubectl create serviceaccount gpu-k8s-sa --namespace=gke-ai-namespace
Add an IAM policy binding so the Kubernetes service account can impersonate the IAM service account via Workload Identity:
gcloud iam service-accounts add-iam-policy-binding gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[gke-ai-namespace/gpu-k8s-sa]"
Then annotate the Kubernetes service account so it is tied to the IAM service account:
kubectl annotate serviceaccount gpu-k8s-sa \
--namespace gke-ai-namespace \
iam.gke.io/gcp-service-account=gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com
Deploying the Triton Inference Server. Start by copying the model repository to the bucket:
gcloud storage cp src/triton-model-repository gs://$GSBUCKET --recursive
Deploy this YAML with the command below. As you can see, the bucket is mounted into the pod at /data, and the triton-model-repository we copied above ends up there, so the manifest matches the configuration we set up earlier. You can also see that the model is in Keras format and, as you might have noticed, it has a rather big memory requirement to run: 16G.
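For orientation, the relevant parts of such a manifest look roughly like the excerpt below. This is an illustrative sketch only; the deployment name, labels, image tag and CPU request are assumptions, so check src/gke-config/deployment-triton.yaml for the real values.

# illustrative sketch only -- see src/gke-config/deployment-triton.yaml for the real manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deployment              # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-server               # assumed label
  template:
    metadata:
      labels:
        app: triton-server
      annotations:
        gke-gcsfuse/volumes: "true"    # enables the Cloud Storage FUSE sidecar
    spec:
      serviceAccountName: gpu-k8s-sa
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:23.06-py3    # assumed tag
        command: ["tritonserver", "--model-repository=/data/triton-model-repository"]
        resources:
          requests:
            cpu: "2"                   # assumed; we only run on CPU
            memory: "16Gi"             # the big memory requirement mentioned above
        volumeMounts:
        - name: model-bucket
          mountPath: /data             # the bucket shows up here inside the pod
      volumes:
      - name: model-bucket
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: ${GSBUCKET}    # substituted by envsubst before apply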
envsubst < src/gke-config/deployment-triton.yaml | kubectl --namespace=gke-ai-namespace apply -f -
Once it is deployed, check that the deployment and its pod come up; a quick check is sketched below.
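# list the deployment and its pods; the pod should reach the Running/Ready state
kubectl get deployments --namespace=gke-ai-namespace
kubectl get pods --namespace=gke-ai-namespace

# if the pod is stuck, its logs usually say why (substitute your pod name)
kubectl logs --namespace=gke-ai-namespace <pod-name>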
Next deploy the Triton service endpoint by running the following command:
kubectl apply --namespace=gke-ai-namespace -f src/gke-config/service-triton.yaml
Then try to hit the endpoint with curl -v to confirm the server is ready.
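If the service is not directly reachable from your machine, one option is to port-forward it locally and call Triton's standard HTTP readiness endpoint. The service name triton-server and port 8000 below are assumptions; check src/gke-config/service-triton.yaml for the actual name and ports.

# forward Triton's HTTP port to localhost (service name/port are assumptions)
kubectl port-forward --namespace=gke-ai-namespace svc/triton-server 8000:8000 &

# Triton's readiness endpoint returns HTTP 200 once the server is ready
curl -v localhost:8000/v2/health/ready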
To get predictions from the model, you can set up your client by running the following commands:
python -m venv ./mnist_client
source ./mnist_client/bin/activate
pip install -r src/client/triton-requirements.txt
Getting the outputs from the model is then a matter of sending an inference request to the Triton server (see the sketch below).
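Assuming the dependencies in src/client/triton-requirements.txt include the tritonclient package, a minimal hand-rolled request looks roughly like this. The tensor names input_1 and output_0, the input shape and the localhost:8000 address (from the port-forward above) are assumptions; inspect the model's config.pbtxt in the bucket for the real names and shapes.

# minimal Triton HTTP client sketch (tensor names and shapes are assumptions)
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# a dummy 28x28 grayscale image standing in for a real MNIST digit
image = np.random.rand(1, 28, 28, 1).astype(np.float32)

infer_input = httpclient.InferInput("input_1", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
infer_output = httpclient.InferRequestedOutput("output_0")

result = client.infer(model_name="mnist", inputs=[infer_input], outputs=[infer_output])
scores = result.as_numpy("output_0")
print("predicted digit:", int(np.argmax(scores)))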