Posts

dotnet ??= and array? - getting basic understanding

Working with ??= is really good for cutting down boilerplate code, as in the examples below.

// using ??=
List<string>? arrayList3 = null;
var result5 = arrayList3 ??= new List<string>();

// replacing a tedious if/else statement :)
if (arrayList3 == null)
{
    var result6 = new List<string> { "test" };
}

It also lets us quickly check for null and apply a default value:

string? dummyString = null;
arrayList.Add(dummyString ??= "default");

Working with array?[] seemed confusing to me. I wasn't really sure whether it checks if the array itself is null or if the value at that index is null. It checks whether the array is null. And it will of course still throw an exception if I try to access an invalid index. Additional details are shown here:

// working with array
var arrayList = new List<string>();
arrayList.Add("test");
//var result = arrayList?[2]; // exception
var result2 = a...
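To make those semantics concrete, here is a minimal sketch (the variable names are mine, not from the original example): ?[] only short-circuits when the collection itself is null; a non-null collection with a bad index still throws.

List<string>? maybeNull = null;
var a = maybeNull?[0];   // a is null: the list is null, so the indexer is never evaluated

var items = new List<string> { "test" };
var b = items?[0];       // "test": the list is not null and the index is valid
var c = items?[2];       // throws ArgumentOutOfRangeException: ?[] does not guard the index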

hotchoc graphql with database services

To set up a GraphQL backend that looks up database records, you can use HotChocolate's ServiceAttribute, as shown in the code snippet here:

using System.Collections.Generic;
using HotChocolate;
using HotChocolate.Types;
using HotChocolate.Types.Relay;

namespace Accounts.Types
{
    [QueryType]
    public class Query
    {
        public IEnumerable<User> GetUsers([Service] UserRepository repository) =>
            repository.GetUsers();

        [NodeResolver]
        public User GetUser(int id, [Service] UserRepository repository) =>
            repository.GetUser(id);
    }
}

You can add database customization code later; for a demo example, a UserRepository would look like this.

public class UserRepository
{ ...
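The excerpt above is cut off, so here is a minimal in-memory sketch of what such a repository might look like (the User shape and seed data are assumptions for illustration; a real implementation would query the database):

using System.Collections.Generic;
using System.Linq;

public record User(int Id, string Name);

public class UserRepository
{
    // hypothetical in-memory data standing in for real database rows
    private readonly List<User> _users = new()
    {
        new User(1, "Ada"),
        new User(2, "Grace")
    };

    public IEnumerable<User> GetUsers() => _users;

    public User GetUser(int id) => _users.Single(u => u.Id == id);
}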

dotnet - uninstalling templates

To uninstall a template, first you need to figure out which templates you already have installed. To do that, you can use

dotnet new -u

This lists the templates currently installed (on newer SDKs, dotnet new uninstall with no arguments does the same). Once you have done that, you can use the following command to uninstall the template. On my laptop my templates were still targeting dotnet 6; there's a newer version which supports dotnet 8.

dotnet new uninstall HotChocolate.Templates

graphql spec

Here is the location of the GraphQL spec: https://spec.graphql.org/

github action directory basic detalis

GitHub Actions tends to place code two levels deep: for example, if you called your repository github-action-dotnet-build, it will be checked out into github-action-dotnet-build/github-action-dotnet-build. What if you publish an artifact which you use in another stage - where does the code get downloaded to? It will be placed in the same directory. For example, if you publish an artifact called myartifact, then when you download it, it will be automatically extracted into the default folder (GITHUB_WORKSPACE). In this example I download my artifact, and you can see that it is extracted automatically into GITHUB_WORKSPACE if you do not specify a path. Notice that it is unzipped for you too.
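A rough sketch of that download step (the artifact name here is illustrative): with no path given, actions/download-artifact@v4 extracts the files straight into GITHUB_WORKSPACE.

- name: download artifact
  uses: actions/download-artifact@v4
  with:
    name: myartifact
    # no "path" set, so the files land in $GITHUB_WORKSPACE

- name: show where the files landed
  run: ls -la "$GITHUB_WORKSPACE"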

github actions stage dependency using needs

In GitHub Actions we often need to specify dependencies between stages - to ensure stage B runs only after stage A has completed. To express this relationship we can use "needs", as shown below:

artifact-work:
    runs-on: ubuntu-latest
    needs: build-and-publish
    steps:
    - name: download artifact
      uses: actions/download-artifact@v4
      with:
        name: published-artifact-net
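For completeness, the build-and-publish job that artifact-work depends on would have to upload an artifact under the same name; a minimal sketch (the build steps and output path are assumed) might look like:

build-and-publish:
    runs-on: ubuntu-latest
    steps:
    - name: checkout
      uses: actions/checkout@v4
    # ... build steps would go here ...
    - name: upload artifact
      uses: actions/upload-artifact@v4
      with:
        name: published-artifact-net
        path: ./publish   # assumed output folder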

azurerm terraform authentication using oidc

It is confusing trying to understand which authentication method to use in Terraform, especially with concepts like managed identity and OIDC in the mix. What is the difference, and how do you use them?

OIDC authentication using an Azure DevOps task

A bit of confusion here, I think - this setup ( https://registry.terraform.io/providers/hashicorp/azurerm/latest/docs/guides/service_principal_oidc ) uses whatever identity your service connection is configured with; that can be a managed identity or an app registration. It does NOT mean that if you have a k8s workload identity set up you can use that managed identity directly (bypassing the service connection).

1. Set up the service connection and federate it.

Prereqs

1. Do you need to grant project collection admin to this service account? No, you don't, but you do need to give it BASIC license access.
2. Do you need to federate it to the workload identity? Yes, you do - if you use an Azure DevOps service con...
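For reference, the provider block for this flow looks roughly like the sketch below (the variables are placeholders; exactly how the OIDC token reaches the provider depends on the Azure DevOps task you use):

provider "azurerm" {
  features {}

  use_oidc        = true
  client_id       = var.client_id       # the identity behind the service connection
  tenant_id       = var.tenant_id
  subscription_id = var.subscription_id
}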

Eagle and its uses in LLM (VLLM)

The primary goal of EAGLE is to reduce the computational cost and latency of generating text from LLMs. It achieves this by introducing methods that allow for faster decoding during inference, making it particularly useful for applications requiring real-time or large-scale language processing. If you're looking for a faster, more performant technique for text generation, this can help. In the context of vLLM, this approach can provide faster performance for models served with vLLM. https://docs.vllm.ai/en/latest/getting_started/examples/eagle.html As you can see there, it is configured under speculative_config.
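A minimal sketch of that configuration, assuming a recent vLLM release (the model names and token count mirror the linked example and may differ between versions):

from vllm import LLM, SamplingParams

# EAGLE is enabled through speculative_config: a small draft model proposes
# tokens that the target model then verifies in a single forward pass.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    speculative_config={
        "method": "eagle",
        "model": "yuhuili/EAGLE-LLaMA3-Instruct-8B",
        "num_speculative_tokens": 2,
    },
)

outputs = llm.generate(["The future of AI is"], SamplingParams(temperature=0.8))
print(outputs[0].outputs[0].text)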

vllm - using mistral model and resolving some of its issues

I bumped into a Mistral assertion error while trying to run the vLLM chat example here. Since it was quite confusing what the actual cause was, I turned on

VLLM_LOGGING_LEVEL=debug

and re-ran the code. Then I saw "Forbidden for url: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/resolve/main/params.json". And you guessed it, I had to go to the Mistral Hugging Face model card and accept the license agreement. The log looks like this below:

Once I had accepted the license agreement, I re-ran it and it was working. Then I quickly ran into this error:

ValueError: The model's max seq len (128000) is larger than the maximum number of tokens that can be stored in KV cache (32768). Try increasing `VLLM_CPU_KVCACHE_SPACE` or decreasing `max_model_len` when initializing the engine.

To resolve this, I tried running this from the command line:

vllm serve mistralai/Mistral-7B-Instruct-v0.3 --max-model-len 32768

How to configure this from the LLM class itself?...
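The excerpt cuts off there, but the equivalent knob on the LLM class is the max_model_len argument; a sketch mirroring the CLI flag above:

from vllm import LLM

# cap the context length so it fits in the available KV cache,
# the same effect as the --max-model-len CLI flag
llm = LLM(
    model="mistralai/Mistral-7B-Instruct-v0.3",
    max_model_len=32768,
)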

vllm - curl for completion

Something I need to keep handy when running or testing vLLM. Start up your model using the following command:

vllm serve Qwen/Qwen2.5-1.5B-Instruct

Then run the following to generate up to 100 tokens of output from the model:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "prompt": "San Francisco is a",
        "max_tokens": 100,
        "temperature": 0
    }'
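If you want the chat-style endpoint instead, the same server also exposes /v1/chat/completions; a sketch using the same model name:

curl http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen/Qwen2.5-1.5B-Instruct",
        "messages": [{"role": "user", "content": "Say hello"}],
        "max_tokens": 100
    }'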

vllm error in running sample: "reshape_and_cache_cpu_impl" not implemented for 'Half'

When running one of the basic examples, I hit the error above. To resolve it, you just need to add dtype and set it to "float32", as shown below:

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "Hello, my name is",
    "The president of the United States is",
    "The capital of France is",
    "The future of AI is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

def main():
    # Create an LLM. float32 avoids the half-precision CPU kernel that raised the error.
    llm = LLM(model="facebook/opt-125m", dtype="float32")
    # Generate texts from the prompts.
    outputs = llm.generate(prompts, sampling_params)
    for output in outputs:
        print(f"Prompt: {output.prompt!r}, Generated: {output.outputs[0].text!r}")

if __name__ == "__main__":
    main()

Example output after the fix

operator torchvision::nms does not exist when setting up vllm

I was trying to run a basic app in vLLM but ran into the error above. I had a run-in with this some time ago but had forgotten about it. My VM does not have a GPU, and I had just installed the default (non-CPU-only) build of vLLM. If you face a similar issue, follow the CPU installation guide here, and AVOID reusing a previously installed version of vllm. After I completed the setup, I was able to run

vllm serve Qwen/Qwen2.5-1.5B-Instruct

github action publishing nuget packages

To publish NuGet packages for your dotnet application using GitHub Actions, you can use the following YAML; the directory structure in GitHub is pretty straightforward. You will notice that I have configured permissions for packages. This is to ensure we are able to use the default secrets.GITHUB_TOKEN to publish the NuGet packages, so we're not using a PAT token here.

name: .NET

on:
  push:
    branches: [ "main" ]
  pull_request:
    branches: [ "main" ]

permissions:
  packages: write

jobs:
  build-and-publish:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        dotnet-version: [ '8.0' ] # Add all your supported versions
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    - name: Set up .NET ${{ matrix.dotnet-version }}
      uses: actions/setup-do...
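The excerpt ends mid-step, but the actual publish step typically looks something like this sketch (the package glob and OWNER are placeholders):

    - name: Push package to GitHub Packages
      run: dotnet nuget push ./**/*.nupkg --source "https://nuget.pkg.github.com/OWNER/index.json" --api-key ${{ secrets.GITHUB_TOKEN }} --skip-duplicate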

github actions setup and authenticating terraform provider using managed identity

You can use the following YAML to set up Terraform's azurerm provider to authenticate using a managed identity. All you need is the following information:

- AZURE_CLIENT_ID - this would be your managed identity's client id
- AZURE_SUBSCRIPTION_ID
- AZURE_TENANT_ID

The permissions block is important; you need it for this to work.

name: 'Build .Net app'

on: [ push, workflow_dispatch ]

permissions:
  id-token: write
  contents: read

jobs:
  build-and-deploy:
    runs-on: ubuntu-latest
    steps:
      - name: Azure login
        uses: azure/login@v2
        with:
          client-id: ${{ secrets.AZURE_CLIENT_ID }}
          tenant-id: ${{ secrets.AZURE_TENANT_ID }}
          subscription-id: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
      - name: Azure CLI script
  ...
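The excerpt stops there; for the Terraform steps themselves, the azurerm provider would typically pick up the same identity through ARM_* environment variables, roughly like the sketch below (this assumes the provider's OIDC support; variable names may differ between provider versions):

      - name: Terraform init and apply
        env:
          ARM_CLIENT_ID: ${{ secrets.AZURE_CLIENT_ID }}
          ARM_TENANT_ID: ${{ secrets.AZURE_TENANT_ID }}
          ARM_SUBSCRIPTION_ID: ${{ secrets.AZURE_SUBSCRIPTION_ID }}
          ARM_USE_OIDC: "true"   # assumed: lets azurerm exchange the workflow's OIDC token
        run: |
          terraform init
          terraform apply -auto-approve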

how to call azure REST API endpoint

The following are the steps you need to take to call an Azure REST API endpoint. We need to obtain a token and then hit the endpoint using bearer token authorization.

TOKEN=$(az account get-access-token --query accessToken --output tsv)

curl -X GET \
  "https://management.azure.com/subscriptions/$SUBSCRIPTION_ID/resourceGroups/$RESOURCE_GROUP/providers/Microsoft.Sql/servers/$SERVER_NAME?api-version=2024-02-01" \
  -H "Authorization: Bearer $TOKEN"

azure devops terraform provider - needs the following permission(s) on the resource Users to perform this action: Add Users

While trying to set up my users using the Azure DevOps Terraform provider, I bumped into this error: "needs the following permission(s) on the resource Users to perform this action: Add Users". I then had to update the following permission to get it to work.

llm tool to help with quantization

To decrease the size and memory footprint of machine learning models, a technique called quantization is employed. Akin to lossy image compression, it converts model weights into lower-precision formats such as 8-bit or 4-bit. For example, a 7B-parameter model stored as 16-bit weights takes roughly 14 GB, while a 4-bit quantization of the same model needs only about 3.5 GB plus a small overhead for scaling factors. While this significantly reduces resource demands, it's important to note that, like image compression, quantization can lead to a reduction in the model's accuracy. Tools you can use:

https://github.com/ModelCloud/GPTQModel
https://github.com/casper-hansen/AutoAWQ

k8s gateway api - setting up gateway and routes to different service in a cluster

To set up the k8s Gateway API in a GKE cluster, you typically have to:

- create the gateway
- set up the necessary HTTP routes
- deploy your service A and service B

Set up the gateway by applying the following YAML:

apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: my-shared-gateway
spec:
  gatewayClassName: gke-l7-regional-external-managed
  listeners:
    - protocol: HTTP # Or HTTPS for production
      port: 80 # Or 443 for HTTPS
      name: http
      hostname: "*.example.com"

Create your HTTP route A:

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: a-route
  namespace: default
spec:
  parentRefs:
  - name: my-shared-gateway
  hostnames:
  - "a.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
  ...
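The route for service B follows the same shape; a sketch that completes the picture (the Service name and port are assumed):

apiVersion: gateway.networking.k8s.io/v1
kind: HTTPRoute
metadata:
  name: b-route
  namespace: default
spec:
  parentRefs:
  - name: my-shared-gateway
  hostnames:
  - "b.example.com"
  rules:
  - matches:
    - path:
        type: PathPrefix
        value: /
    backendRefs:
    - name: service-b   # assumed Service name
      port: 80          # assumed Service port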

gke gateway :- error ensuring load balancer: generic::invalid_argument: Insert: Invalid value for field 'resource.target'

This error comes about when you do not have a proxy-only subnet created for your region and you try to deploy a load balancer or create the Gateway API's gateway resources. These load balancers use a Google-managed proxy layer, and for internal communication between the proxies and your backend VMs, GCP needs a reserved IP range within your VPC. That's where the proxy-only subnet comes in. You can create it by following: https://cloud.google.com/load-balancing/docs/proxy-only-subnets#proxy_only_subnet_create After I applied the fix, my gateway resources started to sync successfully, and an IP was then allocated for my gateway.
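As a sketch of that step (the subnet name, network, region and range are placeholders), the subnet just needs the managed-proxy purpose:

gcloud compute networks subnets create proxy-only-subnet \
    --purpose=REGIONAL_MANAGED_PROXY \
    --role=ACTIVE \
    --region=us-central1 \
    --network=my-vpc \
    --range=10.129.0.0/23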

gke - deploying multiple llm models with gradio interface

First create a GKE cluster that can provide GPUs (required by the Hugging Face Text Generation Inference server). Once you have provisioned one, create a secret:

kubectl create secret generic l4-demo \
  --from-literal=HUGGING_FACE_TOKEN=hf_token

Next, choose the model of your choice for deployment. In this example we are going to use Falcon; please refer to this page here for other models.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: llm
spec:
  replicas: 1
  selector:
    matchLabels:
      app: llm
  template:
    metadata:
      labels:
        app: llm
    spec:
      containers:
      - name: llm
        image: us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.1-4.ubuntu2204.py310
        resources: ...
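The excerpt stops at the container resources. For the Gradio interface (or anything else in the cluster) to reach the model server, the deployment would typically be exposed with a Service along these lines (a sketch; the target port is assumed to match whatever port the TGI container is configured to listen on):

apiVersion: v1
kind: Service
metadata:
  name: llm-service
spec:
  selector:
    app: llm
  ports:
  - protocol: TCP
    port: 80
    targetPort: 8080   # assumed TGI listen port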