Posts

vllm: Failed to infer device type

Trying to run vllm without a GPU will typically land you in this issue. To resolve it, I had to configure both of the settings below; setting just one did not work for me:

1. Set the environment variable CUDA_VISIBLE_DEVICES="".
2. On the command line, switch the device to cpu and remove --tensor-parallel-size. For example:

    python3 -m vllm.entrypoints.openai.api_server --port 8080 --model deepseek-ai/DeepSeek-R1 --device cpu --trust-remote-code --max-model-len 4096

References
https://docs.vllm.ai/en/latest/getting_started/troubleshooting.html#failed-to-infer-device-type
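As a quick sanity check once the server is up (assuming port 8080 from the example above), you can hit the OpenAI-compatible models endpoint with curl:

    curl http://localhost:8080/v1/models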

postgres - working with vector

Here is a simple tutorial on how to work with the postgres vector type. We will create a table whose embedding column uses a vector dimension of 2 (a tiny dimension, just for this demo):

    CREATE TABLE doctest (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(2)
    );

To insert values into the table:

    INSERT INTO doctest (content, embedding) VALUES ('test', '[1.23,1.245]');
    INSERT INTO doctest (content, embedding) VALUES ('test7', '[7, 10]');

To select, note that <=> is pgvector's cosine distance operator, so 1 - (embedding <=> ...) gives cosine similarity:

    SELECT content, 1 - (embedding <=> '[7, 9]') AS similarity
    FROM doctest
    ORDER BY similarity DESC
    LIMIT 1;

This should return test7 as the closest match, since [7, 10] points in almost the same direction as [7, 9]. If you make another search:

    SELECT content, 1 - (embedding <=> '[1.23, 1.3]') AS similarity
    FROM doctest
    ORDER BY similarity DESC
    LIMIT 1;

This time test should come back, as [1.23, 1.245] is nearly parallel to [1.23, 1.3].
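On a table this small a sequential scan is fine, but on larger tables you would normally add an approximate index. A minimal sketch, assuming a standard pgvector install (the lists value is just a placeholder to tune):

    CREATE INDEX ON doctest USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100);

vector_cosine_ops matches the <=> cosine distance operator used in the queries above.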

pip command not found - setting up new VM

This can happen for many reasons: PATH not set correctly, and so on. In my case, I was setting up a new VM and simply needed to install pip. To resolve this, I had to run:

    sudo apt install python3-pip
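To confirm the install worked (pip3 is the binary the Debian/Ubuntu package installs), check the version:

    pip3 --version
    python3 -m pip --version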

azure devops sdk - how to use managed identity to connect to your azure devops

In the Azure DevOps examples, we have seen quite a lot of code using a PAT token to authenticate and access Azure DevOps. Here we will change that and use a managed identity instead. The following code shows how you can achieve this:

    public async Task SetupConnectionUsingAzureIdentity(AzRepositoryConfiguration configuration)
    {
        logger.LogInformation("Starting SetupConnectionUsingAzureIdentity, getting access token");
        var azureCredential = new DefaultAzureCredential();
        var token = await azureCredential.GetTokenAsync(
            new TokenRequestContext(new string[] { ScaffolderAppConstants.AzureDevopsApplicationIdScope }));
        VssCredentials creds = new VssAadCredential(
            new VssAadToken(ScaffolderAppConstants.BearAuthorizationText, token.Token));
        logger.LogInformation($"Authenticating and setting up connection to Azure Devops: { confi...
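Since the snippet above is cut off, here is a minimal self-contained sketch of the same idea. The AzDoConnectionFactory class, ConnectAsync helper, and organizationUrl parameter are my own names; the scope GUID is the well-known Azure DevOps resource application ID (presumably what ScaffolderAppConstants.AzureDevopsApplicationIdScope wraps), and "Bearer" is the token type:

    using System;
    using System.Threading;
    using System.Threading.Tasks;
    using Azure.Core;
    using Azure.Identity;
    using Microsoft.VisualStudio.Services.Common;
    using Microsoft.VisualStudio.Services.WebApi;

    public static class AzDoConnectionFactory
    {
        public static async Task<VssConnection> ConnectAsync(Uri organizationUrl)
        {
            // DefaultAzureCredential falls back to a managed identity when running in Azure.
            var credential = new DefaultAzureCredential();

            // 499b84ac-1321-427f-aa17-267ca6975798 is the Azure DevOps resource application ID.
            var token = await credential.GetTokenAsync(
                new TokenRequestContext(new[] { "499b84ac-1321-427f-aa17-267ca6975798/.default" }),
                CancellationToken.None);

            // Wrap the AAD access token in VSS credentials using the "Bearer" token type.
            var creds = new VssAadCredential(new VssAadToken("Bearer", token.Token));

            // e.g. organizationUrl = new Uri("https://dev.azure.com/your-org")
            var connection = new VssConnection(organizationUrl, creds);
            await connection.ConnectAsync();
            return connection;
        }
    }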

postgres creating vector column getting error pq: type "vector" does not exist

Bumped into this error while trying to create a vector column.

Resolution: enable the vector extension by running the following in your SQL studio or other client:

    CREATE EXTENSION vector;

    CREATE TABLE documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536) -- OpenAI's ada-002 embedding size
    );

And you will see that it executes successfully.
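If you want to double-check that the extension actually got installed, a standard catalog query does it:

    SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';

And if the script may be rerun, CREATE EXTENSION IF NOT EXISTS vector; avoids an error on the second run.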

gcp cloud sql creating postgres database

We can easily create a postgres database using GCP. Go to Cloud SQL, then select PostgreSQL. Then, provide a name for your instance, a password, and a region. For my dev purposes, I would use the Sandbox edition as opposed to Production. Ensure you select 1 vCPU for your machine configuration and HDD for storage. Once you have created it, you can connect to your database using Cloud SQL Studio in your browser, so you probably don't need to install another PG client. Ensure you have whitelisted the IPs that will connect to your database: go to Connections -> Add Network.
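The same setup can be scripted. A rough gcloud equivalent (the instance name, Postgres version, region, and password are placeholders, and db-custom-1-3840 is one way to get a 1 vCPU machine):

    gcloud sql instances create my-postgres-dev \
        --database-version=POSTGRES_15 \
        --tier=db-custom-1-3840 \
        --region=us-central1 \
        --storage-type=HDD \
        --root-password=CHANGE_ME

    # Whitelist an IP, mirroring Connections -> Add Network in the console
    gcloud sql instances patch my-postgres-dev \
        --authorized-networks=203.0.113.10/32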

Deploying TF serving on gke

The google example deploys training and inference jobs into a GKE workload but requires a GPU. We can easily convert that to run on a normal CPU.

First we need to create (see the sketch at the end of this post):
1. a GKE Autopilot cluster
2. a Cloud Storage bucket named $PROJECT_ID-gke-bucket (the GSBUCKET value below)

Next, clone the repository:

    git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
    cd kubernetes-engine-samples/ai-ml/gke-online-serving-single-gpu

Configure the following environment variables:

    export PROJECT_ID=$(gcloud config get project)
    export REGION=$(gcloud config get compute/region)
    export K8S_SA_NAME=gpu-k8s-sa
    export GSBUCKET=$PROJECT_ID-gke-bucket
    export MODEL_NAME=mnist
    export CLUSTER_NAME=online-serving-cluster

Create a service account in GCP IAM called "gke-ai-sa" and grant it two roles, namely "Storage Insights Collector Service" and "Storage Object Admin".

Then create the following resources in k8s:

    kubectl create...
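For steps 1 and 2 above, a sketch of the gcloud commands, assuming the environment variables just defined (the exact flags in the Google tutorial may differ):

    # Autopilot cluster and the bucket the model is served from
    gcloud container clusters create-auto $CLUSTER_NAME --region=$REGION
    gcloud storage buckets create gs://$GSBUCKET --location=$REGION

    # IAM service account plus the two storage roles mentioned above
    gcloud iam service-accounts create gke-ai-sa
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/storage.objectAdmin"
    gcloud projects add-iam-policy-binding $PROJECT_ID \
        --member="serviceAccount:gke-ai-sa@$PROJECT_ID.iam.gserviceaccount.com" \
        --role="roles/storage.insightsCollectorService"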