Posts

azure foundry scale model to be able to handle peak request

Image
Azure Foundry has three deployment types you should know. For peak traffic, the pattern recommended by Microsoft is Global Provisioned (PTU) as your baseline, with spillover to Global Standard to absorb bursts Global Provisioned deployments use Azure's global infrastructure to dynamically route traffic to available datacenters, providing reserved model processing capacity with guaranteed throughput combining global routing with lower, more consistent latency than standard. More details here . However, this also means data can be processed global and violates data sovereignty. Typcially we have different skus (to keep things simple) that we can configure the followings :- 1. global 2. standard 3. regional  4. developer More details of deployment type can be find here . Here is a diagram that might provide a better understanding.  And depending on your workload requirement, this offers some guide If your workload is... Recommended deployment Prototyping or trying a new model Ins...

vscode using inline chat

Image
 We can use vscode copilot inline chat by click on the right button and select ' inline chat ' or press  And depending on your input, you will eventually get some response or even inline file updates.

terraform console - helps with tracking down what is the value of your terraform variable

Image
It is important to be able to tell what sorts of value we have in our variables which affects how we declare terraform variables, use of maps and list and most importantly help us to debug.  To do what, we will use terraform console It supports the followings:-   -var 'foo=bar'    Set a variable in the Terraform configuration. This flag can be set multiple times.   -var-file=foo     Set variables in the Terraform configuration from a file. If "terraform.tfvars" or any ".auto.tfvars" files are present, they will be automatically loaded. Example usage :-

mcp hosted on a server and client app

In this post, we are going to host our mcp tool on a remote server and let it called by a client.  First ensure we have initialize the directory and added the right packages here  uv add fastmcp uvicorn Then we will have our server.py where we will use uvicorn to host it and then expose /sse endpoint for client to call. server.py # server.py import os from fastmcp import FastMCP         # 1. Initialize FastMCP mcp = FastMCP ( " Remote Centralized Tooling " ) # 2. Define your tool(s) @ mcp . tool () def calculate_server_metrics ( cpu_load : float , memory_load : float ) -> str :     """ Performs complex analysis on system performance metrics. """     # Since this lives entirely on your remote server, you can update this logic     # at any time, and local clients will get the updated behavior instantly.     score = ( cpu_load * 0.7 ) + ( memory_load * 0.3 )     if score > 80 : ...

setting up and using azure mcp server with vscode

Image
We can easily setup and use Azure MCP server by  1. Installing the extension Azure MCP Server as shown here:- 2. Enabling Azure MCP Server in the agent tool integration. Please ensure you have started the server.  And as you can see I have enabled and had it running here:-  Then this will appear in the logs :- 3. Prompt it to do its magic  List my Azure Storage containers

azure data factory - using anaology to understand data factory components

Image
This diagram provides a good analogy of Azure Data Factory vs delivery to map out what needs to be built in order for a data pipeline to work And comparing this to: Image taken from youtube https://www.youtube.com/watch?v=EpDkxTHAhOs&list=PLGjZwEtPN7j8b9dPA0HrtJDptOB69B506 When you build a pipeline in Data Factory, you are essentially setting up a supply chain. Here is a clear breakdown mapping how a real-world delivery system translates directly into what you build in ADF: 1. Linked Services = The Delivery Addresses & Access Keys Before a delivery driver can pick up or drop off a package, they need the exact address and the key code to get through the security gate. The Analogy: The factory/warehouse address (Source) and the customer's home address (Destination), along with the security badges required to enter. In ADF: Linked Services store your connection strings and authentication details (like passwords or Managed Identities). They tell ADF exactly how to securely c...

android send notification from firebase to android devices

Image
  Login to https://console.firebase.google.com/ and then goto your project. And then under " Devops and Engagement ". Then create a new campaign.  And then create a campaign as shown here and select "Firebase notification messages". And then let's create a test notifcations. Click on "Send test message". You will be prompted to insert FCM registration token.  After you please your token registered on your mobile devices, then you will be able to receive notification messages on your emulator What it looks like on your emulator :-

android : android activity does not exist but it does

While trying to run my app on my emulator, i kept on getting issue "my activity does not exist" but it does. This is due to a ghost cache problem. After making the following updates adb uninstall com.appcoreopc.getmyhome Then I was able to deploy and get my app to work

android unable to kill current application

 If you run into issue trying to install your app but being told unable to kill your current app, then you can try to run  adb shell am force-stop com.appcoreopc.getmyhome adb kill-server; adb start-server  

python huggingface loaddataset package erroring out - partially initialize module dataset has no attribute utils.

Image
The error AttributeError: partially initialized module 'datasets' has no attribute 'utils' suggests a problem with the datasets library's installation or an internal conflict, possibly due to a circular import. And all it took for us to do is Restart runtime and then re-run the cell.

Model training papers of interest

 Interesting papers for model optimizations  Paper: Polar Express The paper introduces Polar Express, a GPU-friendly polynomial method for computing the matrix polar decomposition, optimizing convergence speed and error minimization.It adapts polynomials iteratively, outperforming classical methods in deep learning applications like Muon, GPT-2 training, and image classification, with robust finite-precision stability and potential for large-scale, aspect-ratio-optimized, spectrum-aware acceleration. https://arxiv.org/pdf/2505.16932 Paper: LowRA Paper : "LowRA: Accurate and Efficient LoRA Fine-Tuning of LLMs under 2 Bits" Stanford University — Zhou, Zhang, Kumbong, Olukotun arXiv: 2502.08141 (Feb 2025, accepted ICLR 2026) https://arxiv.org/abs/2502.08141 The problem it solves QLoRA (what you'd use in the training code above) quantizes the base model to 4-bit but keeps the LoRA adapters themselves in full precision (bf16). LowRA asks: what if we also aggressively quant...