Deploying TF Serving on GKE

 

The Google example deploys a training and inference job as a GKE workload, but it requires a GPU. We can easily convert it to run on a normal CPU instead.

First we need to create:

1. a GKE Autopilot cluster

2. a Cloud Storage bucket named PROJECT_ID-gke-bucket (this name must match the GSBUCKET variable defined below)
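
Both can be created from the console or from the command line. A minimal gcloud sketch (the cluster name, region, and bucket name here are assumptions; pick values that match the environment variables configured below):

gcloud container clusters create-auto online-serving-cluster \
    --region=us-central1

gcloud storage buckets create gs://PROJECT_ID-gke-bucket \
    --location=us-central1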

Next, clone the repository:

git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/ai-ml/gke-online-serving-single-gpu

Configure the following environment variables:


export PROJECT_ID=$(gcloud config get project)
export REGION=$(gcloud config get compute/region)
export K8S_SA_NAME=gpu-k8s-sa
export GSBUCKET=$PROJECT_ID-gke-bucket
export MODEL_NAME=mnist
export CLUSTER_NAME=online-serving-cluster


Create a service account in GCP IAM named "gke-ai-sa" and grant it two roles: Storage Insights Collector Service and Storage Object Admin.
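
If you prefer to script this step, a rough gcloud equivalent is shown below (the role IDs are my mapping of the role names above; verify them in the IAM console):

gcloud iam service-accounts create gke-ai-sa --project=$PROJECT_ID

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member "serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role roles/storage.insightsCollectorService

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member "serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
    --role roles/storage.objectAdmin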

Create the following resources in Kubernetes:

kubectl create namespace gke-ai-namespace

kubectl create serviceaccount gpu-k8s-sa --namespace=gke-ai-namespace


Add an IAM policy binding so the Kubernetes service account can use the IAM service account via Workload Identity:


gcloud iam service-accounts add-iam-policy-binding gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:$PROJECT_ID.svc.id.goog[gke-ai-namespace/gpu-k8s-sa]"


Tie the Kubernetes service account to the IAM service account by annotating it:

kubectl annotate serviceaccount gpu-k8s-sa \
    --namespace gke-ai-namespace \
    iam.gke.io/gcp-service-account=gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com


Now deploy TF Serving. Start by copying the model to the storage bucket:

gcloud storage cp src/tfserve-model-repository gs://$GSBUCKET --recursive

The model in the repository is a Keras export.

Next, deploy the YAML below. The bucket is mounted into the pod at /data, which is where the tfserve-model-repository we just copied ends up, and the manifest uses the environment variables we configured above. Note the rather big memory requirement: 16Gi. Since we are running on CPU only, the nvidia.com/gpu resource request from the original GPU example is dropped.


apiVersion: apps/v1
kind: Deployment
metadata:
  name: tfserve-deployment
  labels:
    app: tfserve-server
spec:
  selector:
    matchLabels:
      app: tfserve
  replicas: 1
  template:
    metadata:
      labels:
        app: tfserve
      annotations:
        gke-gcsfuse/volumes: "true"
    spec:
      containers:
      - name: tfserve-server
        image: tensorflow/serving:2.13.1
        command: [ "tensorflow_model_server", "--model_name=$MODEL_NAME",
"--model_base_path=/data/tfserve-model-repository/$MODEL_NAME",
"--rest_api_port=8000", "--monitoring_config_file=
/data/tfserve-model-repository/monitoring_config.txt" ]
        ports:
        - name: http
          containerPort: 8000
        - name: grpc
          containerPort: 8500
        resources:
          limits:
            cpu: "4"
            memory: "16Gi"
            ephemeral-storage: "1Gi"
          requests:
            cpu: "4"
            memory: "16Gi"
            ephemeral-storage: "1Gi"
        volumeMounts:
        - name: gcs-fuse-csi-vol
          mountPath: /data
          readOnly: false
      serviceAccountName: $K8S_SA_NAME
      volumes:
      - name: gcs-fuse-csi-vol
        csi:
          driver: gcsfuse.csi.storage.gke.io
          readOnly: false
          volumeAttributes:
            bucketName: $GSBUCKET
            mountOptions: "implicit-dirs"
       


envsubst < src/gke-config/deployment-tfserve.yaml | kubectl --namespace=gke-ai-namespace apply -f -

Once it is deployed, check that the pod starts up and reaches the Running state.
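
For example, with standard kubectl commands (the deployment name comes from the manifest above):

kubectl get pods --namespace=gke-ai-namespace

kubectl rollout status deployment/tfserve-deployment --namespace=gke-ai-namespace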


Next, deploy the TF Serving service endpoint by running the following command:

kubectl apply --namespace=gke-ai-namespace -f src/gke-config/service-tfserve.yaml



Get the service and its external IP by running this command:

kubectl get services --namespace=gke-ai-namespace

Then try to hit the endpoint:

curl -v EXTERNAL_IP:8000/v1/models/mnist
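
If you want to inspect the model's serving signature (input and output tensor names), TF Serving also exposes a metadata endpoint:

curl EXTERNAL_IP:8000/v1/models/mnist/metadata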

To get predictions from the model, set up a Python client by running the following commands:


python -m venv ./mnist_client
source ./mnist_client/bin/activate

Install the required packages:

pip install -r src/client/tfserve-requirements.txt

Then finally run:

cd src/client
python tfserve_mnist_client.py -i EXTERNAL_IP -m mnist -p ./images/0.png
python tfserve_mnist_client.py -i EXTERNAL_IP -m mnist -p ./images/1.png


References:

https://www.tensorflow.org/tfx/serving/docker

https://www.tensorflow.org/tfx/serving/serving_basic





