Posts

Showing posts from 2025

github actions setup and authenticating terraform provider using managed identity

You can use the following YAML to set up the Terraform azurerm provider so that it authenticates using a managed identity. All you need is the following information:

- AZURE_CLIENT_ID - this would be your managed identity's client id
- AZURE_SUBSCRIPTION_ID
- AZURE_TENANT_ID

The permissions block is important; you need it for the login to work.

name: 'Build .Net app'
on: [push, workflow_dispatch]
permissions:
  id-token: write
  contents: read
jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Azure login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Azure CLI script
  ...
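The excerpt stops at the Azure CLI step, so here is a hedged sketch of a follow-on Terraform step (my addition, not from the original post): the azurerm provider can pick up the same identity through ARM_* environment variables, with ARM_USE_OIDC telling it to reuse the federated token from azure/login.

      # Hypothetical follow-on step: run terraform with the identity from azure/login.
      # The azurerm provider reads these ARM_* environment variables.
      - name: Terraform apply
        env:
          ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
          ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          ARM_USE_OIDC: "true"   # use the federated token instead of a client secret
        run: |
          terraform init
          terraform apply -auto-approve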

how to call azure REST API endpoint

The following are the steps you need to call an Azure REST API endpoint. We need to obtain a token and then hit the endpoint using bearer token authorization.

TOKEN=$(az account get-access-token --query accessToken --output tsv)

curl -X GET \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Sql/servers/$SERVER_NAME?api-version=2024-02-01" \
  -H "Authorization: Bearer $TOKEN"
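The command above assumes three shell variables are already set; a minimal sketch of populating them (the resource group and server names are placeholders, not from the original post):

SUBSCRIPTION_ID=$(az account show --query id --output tsv)  # current subscription
RESOURCE_GROUP=my-resource-group   # hypothetical resource group name
SERVER_NAME=my-sql-server          # hypothetical SQL server name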

azure devops terraform provider - needs the following permission(s) on the resource Users to perform this action: Add Users

While trying to set up my users using the Azure DevOps Terraform provider, I bumped into this error: "needs the following permission(s) on the resource Users to perform this action: Add Users". I then had to update the following permission to get it to work.

llm tool to help with quantization

To decrease the size and memory footprint of machine learning models, a technique called quantization is employed. This method, akin to lossy image compression, converts model weights into lower-precision formats such as 8-bit or 4-bit. While this significantly reduces resource demands, it's important to note that, like image compression, quantization can potentially lead to a reduction in the model's accuracy. Tools that you can use are:

https://github.com/ModelCloud/GPTQModel
https://github.com/casper-hansen/AutoAWQ
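To make the "lossy compression" analogy concrete, here is a toy numpy sketch of symmetric 8-bit quantization (my illustration only; GPTQModel and AutoAWQ use far more sophisticated, calibration-based schemes):

import numpy as np

# Pretend these are model weights.
w = np.random.randn(4, 4).astype(np.float32)

# One scale factor for the whole tensor, mapping the largest weight to 127.
scale = np.abs(w).max() / 127.0

# Quantize: round each weight to the nearest int8 step.
q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)

# Dequantize: this is what the model effectively computes with.
w_hat = q.astype(np.float32) * scale

# The rounding error is the "lossy" part that can reduce accuracy.
print("max absolute error:", np.abs(w - w_hat).max())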

k8s gateway api - setting up gateway and routes to different service in a cluster

To set up the Kubernetes Gateway API in a GKE cluster, you typically have to:

- create the Gateway
- set up the necessary HTTP routes
- deploy your service A and service B (see the sketch of a second route after the YAML below)

Set up the gateway by applying the following YAML:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-shared-gateway
spec:
  gatewayClassName: gke-l7-regional-external-managed
  listeners:
    - protocol: HTTP # or HTTPS for production
      port: 80 # or 443 for HTTPS
      name: http
      hostname: "*.example.com"

Then create your HTTP route A:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: a-route
  namespace: default
spec:
  parentRefs:
  - name: my-shared-gateway
  hostnames:
  - "a.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
  ...
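The excerpt truncates before the second route, but a matching route for service B would follow the same shape; a sketch under assumed names (b-route, service-b, and the port are my placeholders, not from the original post):

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: b-route              # hypothetical name
  namespace: default
spec:
  parentRefs:
  - name: my-shared-gateway  # attach to the same shared gateway
  hostnames:
  - "b.example.com"          # assumed hostname for service B
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: service-b        # assumed Service name
      port: 8080             # assumed Service port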

gke gateway :- error ensuring load balancer: generic::invalid_argument: Insert: Invalid value for field 'resource.target'

This error comes about when you do not have a proxy-only subnet created for your region when you're trying to deploy a load balancer, or when creating the Gateway resources of the Kubernetes Gateway API. These load balancers use a Google-managed proxy layer, and for internal communication between the proxy and your backend VMs, GCP needs a reserved IP range within your VPC. That's where the proxy-only subnet comes in. You can create it by going here: https://cloud.google.com/load-balancing/docs/proxy-only-subnets#proxy_only_subnet_create You can see that after I applied the fix, my gateway resources started to sync successfully, and an IP was then allocated for my gateway.
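For reference, the gcloud command to create such a subnet looks roughly like this (the subnet name, region, network, and range below are placeholders you would substitute for your own):

gcloud compute networks subnets create proxy-only-subnet \
  --purpose=REGIONAL_MANAGED_PROXY \
  --role=ACTIVE \
  --region=us-central1 \
  --network=default \
  --range=10.129.0.0/23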

gke - deploying multiple llm models with gradio interface

First create a GKE cluster that can support GPUs (required by the Hugging Face text-generation-inference image). Once you have provisioned one, create a secret:

kubectl create secret generic l4-demo \
  --from-literal=HUGGING_FACE_TOKEN=hf_token

Next choose the model of your choice for deployment. In this example, we are going to use Falcon. Please refer to this page here for other models.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm
  template:
    metadata:
      labels:
        app: llm
    spec:
      containers:
      - name: llm
        image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.1-4.ubuntu2204.py310
        resources:
  ...
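The excerpt cuts off at the resources block. For a GPU-backed TGI deployment it usually continues with a GPU request and a node selector; a sketch only (the limit and accelerator values are assumptions, not taken from the original post):

        resources:
          limits:
            nvidia.com/gpu: "1"   # assumed: one GPU per replica
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4   # assumed L4 GPU node pool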

rust converting from u32 to string and vice versa

Converting an int to a string and vice versa can be challenging, especially for a beginner like me. The following code demonstrates how to convert back and forth.

fn main() {
    // u32 to string
    let a = 32;
    let s = a.to_string();
    println!("{}", s);

    // string to u32
    let a = "32";
    let b: u32 = a.parse().unwrap();
    println!("{}", b);
}

Notice how vscode infers the type here: you can see a.to_string() returns a String, which is stored on the heap. Converting a &str to u32 can be achieved by calling parse().unwrap(). parse() returns a Result<T, E>, which holds either the result or an error; unwrap() then takes the value out (and panics if parsing failed).
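Not in the original post, but if you want to avoid the panic from unwrap(), a sketch of handling the Result explicitly:

fn main() {
    let a = "not-a-number";
    // parse() returns Result<u32, ParseIntError>; match on it instead of unwrapping
    match a.parse::<u32>() {
        Ok(n) => println!("parsed: {}", n),
        Err(e) => println!("could not parse: {}", e),
    }
}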

using rust default value

Rust's Default::default() is really useful for placing default values into a structure or passing them into a function. The example below shows how we use the .. operator to assign default values to the remaining fields when we instantiate the struct/type Point.

#[derive(Debug, Default)]
struct Point {
    x: i32,
    y: i32,
    a: i32,
    isOk: bool,
}

fn main() {
    let s = Point {
        isOk: true,
        ..Default::default()
    };
    println!("{:?}", s);
}

You can also use it to pass a default into a function; for example, the code below prints 0 as the total.

fn print_value(total: u32) {
    print!("{}", total);
}

fn main() {
    print_value(Default::default());
}
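One thing the post doesn't cover (my addition): when the derived all-zero defaults aren't what you want, you can implement Default by hand. A sketch with a hypothetical Config struct:

#[derive(Debug)]
struct Config {
    retries: u32,
    verbose: bool,
}

impl Default for Config {
    // Custom defaults instead of the derived zero values.
    fn default() -> Self {
        Config { retries: 3, verbose: false }
    }
}

fn main() {
    println!("{:?}", Config::default()); // Config { retries: 3, verbose: false }
}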

gcloud - local login account setup

After installing gcloud, we need to set up our user account for testing and development purposes. To do that, run the following commands:

gcloud init
gcloud auth application-default login

dockerfile and $@

The $@ symbol basically expands to the arguments passed from the caller/entry point to the executing script. Let's say for example we have a file called ./hello2.sh:

#!/bin/bash
echo "Hello, World! - Argument: $@"

If we run ./hello2.sh test1 test2 test3 we will get the following output:

Hello, World! - Argument: test1 test2 test3

From a Docker perspective, we can use this to make a highly customizable image. For example, let's say we have this entrypoint.sh with the following contents:

entrypoint.sh

#!/bin/bash
ldconfig 2> /dev/null || echo 'unable to refresh ld cache, not a big deal in most cases'
source /usr/src/.venv/bin/activate
exec text-generation-launcher $@

And in my Dockerfile:

Dockerfile

ENTRYPOINT ["/entrypoint.sh"]
CMD ["--model-id", "tiiuae/falcon-40b-instruct"]

When my container runs, essentially what I am executing is as follows:

/entrypoint.sh --model-id tiiuae/falcon-40b-instruct

which in turn calls

./text-generation-launcher --model-id tiiu...
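One detail worth adding (mine, not the author's): unquoted $@ word-splits arguments that contain spaces, so "$@" is usually the safer form in an exec line. A quick way to see the difference, run as ./args.sh "a b" c:

#!/bin/bash
echo 'unquoted $@:'
for arg in $@; do echo "  [$arg]"; done    # "a b" splits into two words
echo 'quoted "$@":'
for arg in "$@"; do echo "  [$arg]"; done  # "a b" stays one argument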

gke - testing out Mixtral-8x7B-Instruct-v0.1 deployment not successful on a non-gpu pool

To deploy this, it requires a GPU. I was trying to deploy without one and wasn't successful, mainly because the huggingface-text-generation-inference image does not have a CPU-compatible variant. This is the image that is required:

image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu124.2-3.ubuntu2204.py311

First you need to create a secret and ensure you have accepted the Mistral agreement. If not, head over to Hugging Face and accept the license agreement. Then create your secret:

kubectl create secret generic l4-demo \
  --from-literal=HUGGING_FACE_TOKEN=hf_your-key-here

Then deploy gradio and your model to the k8s cluster. You can find additional detail here. The gradio deployment looks like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: gradio
  labels:
    app: gradio
spec:
  strategy:
    type: Recreate
  replicas: 1
  selector:
    m...
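Not in the original excerpt, but when a TGI pod won't come up on a non-GPU pool, standard kubectl commands make the failure obvious (the label and deployment names below are assumptions, not from the post):

kubectl get pods -l app=llm          # assumed label on the model deployment
kubectl describe pod <pod-name>      # look for FailedScheduling on nvidia.com/gpu
kubectl logs deploy/llm              # TGI logs show CUDA / device errors at startup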