Azure Foundry: scaling model deployments to handle peak requests

June 07, 2026

Azure Foundry has three deployment types you should know about. For peak traffic, Microsoft's recommended pattern is Global Provisioned (PTU) as your baseline, with spillover to Global Standard to absorb bursts.
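The baseline-plus-spillover idea comes down to one routing decision. Each request goes to the provisioned deployment first. If that deployment is out of capacity (HTTP 429), the request is retried on the Global Standard deployment. The Python sketch below shows only that decision. `send_to_ptu`, `send_to_standard` and the `RateLimited` exception are hypothetical placeholders for whatever client or gateway call you actually use, and they are not part of any Azure SDK. Depending on your setup, the platform or your gateway may handle this decision for you, so it does not have to live in application code.

```python
from typing import Callable


class RateLimited(Exception):
    """Hypothetical error raised by a sender when a deployment answers HTTP 429."""


def send_with_spillover(
    prompt: str,
    send_to_ptu: Callable[[str], str],
    send_to_standard: Callable[[str], str],
) -> str:
    """Try the provisioned (PTU) deployment first; spill over to Global Standard on 429."""
    try:
        # Baseline traffic is served by the reserved provisioned capacity.
        return send_to_ptu(prompt)
    except RateLimited:
        # Provisioned capacity is exhausted right now, so absorb the burst on Standard.
        return send_to_standard(prompt)


# Hypothetical stand-ins for real deployment calls, used only to exercise the routing.
def ptu_full(prompt: str) -> str:
    raise RateLimited("provisioned capacity exhausted")


def ptu_ok(prompt: str) -> str:
    return f"ptu:{prompt}"


def standard(prompt: str) -> str:
    return f"standard:{prompt}"


print(send_with_spillover("hello", ptu_ok, standard))    # ptu:hello
print(send_with_spillover("hello", ptu_full, standard))  # standard:hello
```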

Global Provisioned deployments use Azure's global infrastructure to dynamically route traffic to available datacenters. They provide reserved model processing capacity with guaranteed throughput, combining global routing with lower and more consistent latency than Standard deployments.
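The reserved capacity is fixed, so it helps to estimate how much of your traffic it would actually absorb. The sketch below makes two assumptions. First, you already have per-minute token demand figures. Second, you have converted your PTU reservation into an equivalent tokens-per-minute number. That conversion depends on the model and is not shown here. Everything up to the reservation is served by provisioned capacity, and the excess in each minute is what would spill over to Standard. The demand numbers are made up for illustration. Treating each minute as an independent bucket is also a simplification of how throttling really behaves.

```python
def split_traffic(demand_tpm: list[int], reserved_tpm: int) -> tuple[int, int]:
    """Split per-minute token demand into tokens served by reserved capacity and tokens that spill over."""
    served = sum(min(d, reserved_tpm) for d in demand_tpm)
    spilled = sum(max(0, d - reserved_tpm) for d in demand_tpm)
    return served, spilled


# Hypothetical per-minute token demand during a peak period.
demand = [40_000, 55_000, 120_000, 60_000, 200_000]
served, spilled = split_traffic(demand, reserved_tpm=80_000)
print(served, spilled)  # 315000 160000
```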

More details here.

However, this also means data can be processed in any Azure region worldwide, which may violate data sovereignty requirements.

Typically, to keep things simple, we can configure the following SKUs:

1. Global
2. Standard
3. Regional
4. Developer

More details on deployment types can be found here.

Here is a diagram that might provide a better understanding.

Depending on your workload requirements, the following table offers some guidance:

If your workload is...	Recommended deployment
Prototyping or trying a new model	Instant Models (Preview)
Variable or bursty traffic	Standard / Global Standard
Consistently high traffic	Provisioned
Large batch jobs that are not time-sensitive	Global Batch / Data Zone Batch
Fine-tuned model testing and evaluation	Developer
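The table can also be encoded as a small lookup, for example in an internal tooling script. This is a hypothetical sketch, and the workload keys and the `require_residency` flag are made up for illustration. The flag simply drops any option whose name starts with "Global", which reflects the data sovereignty caveat above. Whether a remaining option such as Data Zone Batch satisfies your own residency rules still has to be checked against the documentation.

```python
# Hypothetical encoding of the workload table: workload -> recommended deployment types.
RECOMMENDATIONS = {
    "prototyping": ["Instant Models (Preview)"],
    "bursty": ["Standard", "Global Standard"],
    "consistently_high": ["Provisioned"],
    "batch_not_time_sensitive": ["Global Batch", "Data Zone Batch"],
    "fine_tuned_testing": ["Developer"],
}


def recommend(workload: str, require_residency: bool = False) -> list[str]:
    """Return the recommended deployment types, optionally dropping ones named "Global ..."."""
    options = RECOMMENDATIONS[workload]
    if require_residency:
        # Global deployments may process data outside your geography, so filter them out by name.
        options = [o for o in options if not o.startswith("Global")]
    return options


print(recommend("bursty"))                                            # ['Standard', 'Global Standard']
print(recommend("batch_not_time_sensitive", require_residency=True))  # ['Data Zone Batch']
```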

Limits

Quota is expressed as TPM (tokens per minute) and RPM (requests per minute) limits, which are defined per region, per subscription, and per model. The highest level at which quota is restricted is the Azure subscription, not the tenant. Spreading deployments across regions via APIM load balancing therefore effectively multiplies your available quota.
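The sketch below shows why spreading helps and why adding more deployments in the same place does not. It models quota as pools keyed by (subscription, region, model). Deployments that land in the same pool share one limit, so only distinct pools add up. The subscription, region and model names and the TPM figures are hypothetical placeholders, not real Azure limits. The model also ignores other dimensions, such as deployment type, that may further partition quota.

```python
# Hypothetical quota pools: (subscription, region, model) -> TPM limit.
QUOTA = {
    ("sub-a", "region-1", "model-x"): 100_000,
    ("sub-a", "region-2", "model-x"): 100_000,
    ("sub-b", "region-1", "model-x"): 100_000,
}


def effective_tpm(deployments: list[tuple[str, str, str]], model: str) -> int:
    """Sum the TPM of each distinct quota pool that the given deployments of `model` draw from."""
    # A set removes duplicates: two deployments in one pool still share one limit.
    pools = {(sub, region) for sub, region, m in deployments if m == model}
    return sum(QUOTA[(sub, region, model)] for sub, region in pools)


same_pool = [("sub-a", "region-1", "model-x"), ("sub-a", "region-1", "model-x")]
spread = [
    ("sub-a", "region-1", "model-x"),
    ("sub-a", "region-2", "model-x"),
    ("sub-b", "region-1", "model-x"),
]
print(effective_tpm(same_pool, "model-x"))  # 100000
print(effective_tpm(spread, "model-x"))     # 300000
```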

When we use an API gateway or APIM to route requests to Azure Foundry, it normally directs traffic to a project tied to a single subscription.
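Once you have backends in more than one subscription or region, the gateway needs a way to move traffic off a backend that is throttling. APIM has its own backend load-balancing configuration for this. The Python sketch below only illustrates the idea and is not how APIM implements it. It rotates round-robin over hypothetical backend names. Any backend marked as throttled is skipped until its retry-after window has passed.

```python
import itertools
import time
from typing import Optional


class BackendPool:
    """Round-robin over backends, skipping any that are cooling down after a 429."""

    def __init__(self, backends: list[str]) -> None:
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        self._blocked_until: dict[str, float] = {}

    def pick(self, now: Optional[float] = None) -> Optional[str]:
        """Return the next available backend, or None if all are throttled."""
        now = time.monotonic() if now is None else now
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            backend = next(self._cycle)
            if self._blocked_until.get(backend, 0.0) <= now:
                return backend
        return None

    def mark_throttled(self, backend: str, retry_after_s: float, now: Optional[float] = None) -> None:
        """Record that `backend` returned 429 and should be skipped for `retry_after_s` seconds."""
        now = time.monotonic() if now is None else now
        self._blocked_until[backend] = now + retry_after_s


pool = BackendPool(["sub-a/region-1", "sub-b/region-2"])
print(pool.pick(now=0))  # sub-a/region-1
pool.mark_throttled("sub-a/region-1", retry_after_s=10, now=0)
print(pool.pick(now=1))  # sub-b/region-2
```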

If your workload is too large for a single subscription's quota, you may want to rethink how you structure your Azure Foundry projects accordingly.
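To estimate how many subscription/region quota pools you need, add some headroom to your expected peak TPM, divide by the TPM each pool provides, and round up. The sketch below assumes every pool has the same quota. It takes headroom as a whole-number percentage so the arithmetic stays in integers. The figures in the usage line are hypothetical.

```python
def backends_needed(peak_tpm: int, tpm_per_backend: int, headroom_pct: int = 20) -> int:
    """Quota pools needed to cover peak TPM plus a headroom percentage (integer ceiling division)."""
    if tpm_per_backend <= 0:
        raise ValueError("tpm_per_backend must be positive")
    target = peak_tpm * (100 + headroom_pct)
    capacity = tpm_per_backend * 100
    return -(-target // capacity)  # ceiling division without floats


print(backends_needed(900_000, 300_000))  # 4
```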

mitzen
