GKE - serving a model with monitoring

This is a great tutorial for serving an LLM on GKE with monitoring:

https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-multihost-gpu

Configuring autoscaling for LLM inference workloads:

https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/autoscaling
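
To get a feel for what the autoscaler does with a metric, here is a rough sketch of the replica calculation used by the Kubernetes Horizontal Pod Autoscaler: desired = ceil(current replicas * current metric / target metric), with no change when the ratio is within a tolerance (0.1 by default). The function name and the per-replica queue-size numbers are assumptions made up for illustration, not something taken from the guide above.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """Approximate the HPA scaling rule for a single metric.

    current_metric and target_metric are per-replica averages, e.g. a
    hypothetical queue-size metric exported by the inference server.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # The HPA skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Round up, and never go below one replica in this sketch.
    return max(1, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # 2 replicas, average queue size 30, target 10 per replica.
    print(desired_replicas(2, 30, 10))
```

The real autoscaler also applies min/max replica bounds and stabilization windows, which this sketch leaves out.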
