Azure Foundry: scaling a model deployment to handle peak requests
Azure Foundry offers several deployment types you should know. For peak traffic, the pattern recommended by Microsoft is Global Provisioned (PTU) as your baseline, with spillover to Global Standard to absorb bursts.

Global Provisioned deployments use Azure's global infrastructure to dynamically route traffic to available datacenters. They provide reserved model processing capacity with guaranteed throughput, combining global routing with lower and more consistent latency than Standard. More details here.

However, this also means data can be processed globally, which can violate data sovereignty requirements.

To keep things simple, there are typically four SKUs we can configure:

1. global
2. standard
3. regional
4. developer

More details on the deployment types can be found here. Here is a diagram that might provide a better understanding.

Depending on your workload requirements, the following offers some guidance:

If your workload is... | Recommended deployment
Prototyping or trying a new model | Ins...