Serving a Triton model on GKE
The Google example deploys the training and inference job onto a GKE workload using GPUs, but it can easily be converted to run on normal CPUs.
First we need to create:
1. a GKE Autopilot cluster
2. a Cloud Storage bucket called PROJECT_ID-gke-bucket (it has to match the $GSBUCKET variable below; a CLI sketch follows the environment variables)
Next clone the repository
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
cd kubernetes-engine-samples/ai-ml/gke-online-serving-single-gpu
Configure the following environment variables:
export PROJECT_ID=$(gcloud config get project)
export REGION=$(gcloud config get compute/region)
export K8S_SA_NAME=gpu-k8s-sa
export GSBUCKET=$PROJECT_ID-gke-bucket
export MODEL_NAME=mnist
export CLUSTER_NAME=online-serving-cluster
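If you are doing steps 1 and 2 from the CLI rather than the console, something like the following should work once the variables above are set. This is a minimal sketch; add whatever flags your environment needs.

# create an Autopilot cluster (Workload Identity is enabled by default on Autopilot)
gcloud container clusters create-auto $CLUSTER_NAME \
--project=$PROJECT_ID \
--region=$REGION

# create the bucket that will hold the Triton model repository
gcloud storage buckets create gs://$GSBUCKET \
--location=$REGION \
--uniform-bucket-level-access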
Create a service account in GCP IAM called "gke-ai-sa" and grant it two roles, namely "Storage Insights Collector Service" and "Storage Object Admin".
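If you prefer the gcloud CLI to the console for this step, the equivalent is roughly the following; the role IDs correspond to the two roles named above.

# create the IAM service account the workload will impersonate
gcloud iam service-accounts create gke-ai-sa --project=$PROJECT_ID

# grant it the two storage roles on the project
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role=roles/storage.insightsCollectorService
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="serviceAccount:gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com" \
--role=roles/storage.objectAdmin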
Create the following resources in Kubernetes:
kubectl create namespace gke-ai-namespace
kubectl create serviceaccount gpu-k8s-sa --namespace=gke-ai-namespace
Add an IAM policy binding so the Kubernetes service account can impersonate the IAM service account via Workload Identity:
gcloud iam service-accounts add-iam-policy-binding gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com \
--role roles/iam.workloadIdentityUser \
--member "serviceAccount:${PROJECT_ID}.svc.id.goog[gke-ai-namespace/gpu-k8s-sa]"
Then annotate the Kubernetes service account so it is tied to the IAM service account:
kubectl annotate serviceaccount gpu-k8s-sa \
--namespace gke-ai-namespace \
iam.gke.io/gcp-service-account=gke-ai-sa@${PROJECT_ID}.iam.gserviceaccount.com
Deploying the Triton Inference Server. Start by copying the model repository to the bucket:
gcloud storage cp src/triton-model-repository gs://$GSBUCKET --recursive
Deploy this YAML with the command below. As you can see, the bucket is mounted into the pod at /data, and the triton-model-repository we copied above ends up there, so the manifest matches the configuration we set up earlier. You can also see that the model is in Keras format and, as you might have noticed, it has a rather big memory requirement to run: 16G.
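For orientation, the relevant parts of such a manifest look roughly like the excerpt below. This is an illustrative sketch only; the deployment name, labels, image tag and CPU request are assumptions, so check src/gke-config/deployment-triton.yaml for the real values.

# illustrative sketch only -- see src/gke-config/deployment-triton.yaml for the real manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: triton-deployment              # assumed name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: triton-server               # assumed label
  template:
    metadata:
      labels:
        app: triton-server
      annotations:
        gke-gcsfuse/volumes: "true"    # enables the Cloud Storage FUSE sidecar
    spec:
      serviceAccountName: gpu-k8s-sa
      containers:
      - name: triton
        image: nvcr.io/nvidia/tritonserver:23.06-py3    # assumed tag
        command: ["tritonserver", "--model-repository=/data/triton-model-repository"]
        resources:
          requests:
            cpu: "2"                   # assumed; we only run on CPU
            memory: "16Gi"             # the big memory requirement mentioned above
        volumeMounts:
        - name: model-bucket
          mountPath: /data             # the bucket shows up here inside the pod
      volumes:
      - name: model-bucket
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: ${GSBUCKET}    # substituted by envsubst before apply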
envsubst < src/gke-config/deployment-triton.yaml | kubectl --namespace=gke-ai-namespace apply -f -
Once it is deployed, check that the deployment and its pod come up; a quick check is sketched below.
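# list the deployment and its pods; the pod should reach the Running/Ready state
kubectl get deployments --namespace=gke-ai-namespace
kubectl get pods --namespace=gke-ai-namespace

# if the pod is stuck, its logs usually say why (substitute your pod name)
kubectl logs --namespace=gke-ai-namespace <pod-name>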
Next deploy the Triton service endpoint by running the following command:
kubectl apply --namespace=gke-ai-namespace -f src/gke-config/service-triton.yaml
Then try to hit the endpoint with curl -v to confirm the server is ready.
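If the service is not directly reachable from your machine, one option is to port-forward it locally and call Triton's standard HTTP readiness endpoint. The service name triton-server and port 8000 below are assumptions; check src/gke-config/service-triton.yaml for the actual name and ports.

# forward Triton's HTTP port to localhost (service name/port are assumptions)
kubectl port-forward --namespace=gke-ai-namespace svc/triton-server 8000:8000 &

# Triton's readiness endpoint returns HTTP 200 once the server is ready
curl -v localhost:8000/v2/health/ready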
To get predictions from the model, you can set up your client by running the following commands:
python -m venv ./mnist_client
source ./mnist_client/bin/activate
pip install -r src/client/triton-requirements.txt
Getting the outputs from the model is then a matter of sending an inference request to the Triton server (see the sketch below).
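Assuming the dependencies in src/client/triton-requirements.txt include the tritonclient package, a minimal hand-rolled request looks roughly like this. The tensor names input_1 and output_0, the input shape and the localhost:8000 address (from the port-forward above) are assumptions; inspect the model's config.pbtxt in the bucket for the real names and shapes.

# minimal Triton HTTP client sketch (tensor names and shapes are assumptions)
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# a dummy 28x28 grayscale image standing in for a real MNIST digit
image = np.random.rand(1, 28, 28, 1).astype(np.float32)

infer_input = httpclient.InferInput("input_1", list(image.shape), "FP32")
infer_input.set_data_from_numpy(image)
infer_output = httpclient.InferRequestedOutput("output_0")

result = client.infer(model_name="mnist", inputs=[infer_input], outputs=[infer_output])
scores = result.as_numpy("output_0")
print("predicted digit:", int(np.argmax(scores)))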