GKE: serving a model with monitoring
This is a great tutorial for serving an LLM on GKE with monitoring:
https://cloud.google.com/kubernetes-engine/docs/tutorials/serve-multihost-gpu
Configuring autoscaling for LLM inference workloads:
https://cloud.google.com/kubernetes-engine/docs/how-to/machine-learning/inference/autoscaling
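As a rough illustration of what autoscaling does under the hood, here is a minimal sketch of the scaling rule used by the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current replicas * current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The function name and the sample metric values (read here as a per-replica request queue size) are hypothetical and chosen only for illustration. The real autoscaler also applies stabilization windows and min/max bounds that this sketch leaves out.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the HPA rule: ceil(current_replicas * current / target)."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    # If the metric is within the tolerance of the target, do not scale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Never go below one replica in this simplified sketch.
    return max(1, math.ceil(current_replicas * ratio))


if __name__ == "__main__":
    # Hypothetical numbers: 2 replicas, observed queue size 30, target 10.
    print(desired_replicas(2, current_metric=30, target_metric=10))
```

With these sample numbers the observed metric is three times the target, so the rule asks for three times as many replicas.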