Deploying TF Serving on GKE

 

The Google example deploys a training and inference job as a GKE workload, but it requires a GPU. We can easily convert it to run on a normal CPU instead.

First we need to create:

1. a GKE Autopilot cluster

2. a Cloud Storage bucket named PROJECT_ID-gke-bucket (this name must match the GSBUCKET variable defined below)
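
Both can be created from the console or from the command line. A minimal gcloud sketch (the cluster name, region, and bucket name here are assumptions; pick values that match the environment variables configured below):

gcloud container clusters create-auto online-serving-cluster \
    --region=us-central1

gcloud storage buckets create gs://PROJECT_ID-gke-bucket \
    --location=us-central1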

Next, clone the repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/ai-ml/gke-online-serving-single-gpu

Configure the following environment variables:


export PROJECT_ID=$(gcloud config get project)
export REGION=$(gcloud config get compute/region)
export K8S_SA_NAME=gpu-k8s-sa
export GSBUCKET=$PROJECT_ID-gke-bucket
export MODEL_NAME=mnist
export CLUSTER_NAME=online-serving-cluster


Create a service account in GCP IAM named "gke-ai-sa" and grant it two roles: Storage Insights Collector Service and Storage Object Admin.
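
If you prefer to script this step, a rough gcloud equivalent is shown below (the role IDs are my mapping of the role names above; verify them in the IAM console):

gcloud iam service-accounts create gke-ai-sa --project=$PROJECT_ID

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member "serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role roles/storage.insightsCollectorService

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member "serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role roles/storage.objectAdmin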

Create the following resources in Kubernetes:

kubectl create namespace gke-ai-namespace

kubectl create serviceaccount gpu-k8s-sa --namespace=gke-ai-namespace


Add an IAM policy binding so the Kubernetes service account can use the IAM service account via Workload Identity:


gcloud iam service-accounts add-iam-policy-binding gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$PROJECT_ID.svc.id.goog[gke-ai-namespace/gpu-k8s-sa]"


Tie the Kubernetes service account to the IAM service account by annotating it:

kubectl annotate serviceaccount gpu-k8s-sa \
    --namespace gke-ai-namespace \
    iam.gke.io/gcp-service-account=gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com


Now deploy TF Serving. Start by copying the model to the storage bucket:

gcloud storage cp src/tfserve-model-repository gs://$GSBUCKET --recursive

The model in the repository is a Keras export.

Next, deploy the YAML below. The bucket is mounted into the pod at /data, which is where the tfserve-model-repository we just copied ends up, and the manifest uses the environment variables we configured above. Note the rather big memory requirement: 16Gi. Since we are running on CPU only, the nvidia.com/gpu resource request from the original GPU example is dropped.


apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfserve-deployment
  labels:
    app: tfserve-server
spec:
  selector:
    matchLabels:
      app: tfserve
  replicas: 1
  template:
    metadata:
      labels:
        app: tfserve
      annotations:
        gke-gcsfuse/volumes: "true"
    spec:
      containers:
      - name: tfserve-server
        image: tensorflow/serving:2.13.1
        command: [ "tensorflow_model_server", "--model_name=$MODEL_NAME",
"--model_base_path=/data/tfserve-model-repository/$MODEL_NAME",
"--rest_api_port=8000", "--monitoring_config_file=
/data/tfserve-model-repository/monitoring_config.txt" ]
        ports:
        - name: http
          containerPort: 8000
        - name: grpc
          containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: "16Gi"
            ephemeral-storage: "1Gi"
          requests:
            cpu: "4"
            memory: "16Gi"
            ephemeral-storage: "1Gi"
        volumeMounts:
        - name: gcs-fuse-csi-vol
          mountPath: /data
          readOnly: false
      serviceAccountName: $K8S_SA_NAME
      volumes:
      - name: gcs-fuse-csi-vol
        csi:
          driver: gcsfuse.csi.storage.gke.io
          readOnly: false
          volumeAttributes:
            bucketName: $GSBUCKET
            mountOptions: "implicit-dirs"
       


envsubst < src/gke-config/deployment-tfserve.yaml | kubectl --namespace=gke-ai-namespace apply -f -

Once it is deployed, check that the pod starts up and reaches the Running state.
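
For example, with standard kubectl commands (the deployment name comes from the manifest above):

kubectl get pods --namespace=gke-ai-namespace

kubectl rollout status deployment/tfserve-deployment --namespace=gke-ai-namespace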


Next, deploy the TF Serving service endpoint by running the following command:

kubectl apply --namespace=gke-ai-namespace -f src/gke-config/service-tfserve.yaml



Get the service and its external IP by running this command:

kubectl get services --namespace=gke-ai-namespace

Then try to hit the endpoint:

curl -v EXTERNAL_IP:8000/v1/models/mnist
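
If you want to inspect the model's serving signature (input and output tensor names), TF Serving also exposes a metadata endpoint:

curl EXTERNAL_IP:8000/v1/models/mnist/metadata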

To get predictions from the model, set up a Python client by running the following commands:


python -m venv ./mnist_client
source ./mnist_client/bin/activate

Install the required packages:

pip install -r src/client/tfserve-requirements.txt

Then finally run:

cd src/client
python tfserve_mnist_client.py -i EXTERNAL_IP -m mnist -p ./images/0.png
python tfserve_mnist_client.py -i EXTERNAL_IP -m mnist -p ./images/1.png


References:

https://www.tensorflow.org/tfx/serving/docker

https://www.tensorflow.org/tfx/serving/serving_basic





