Posts

terraform provider alias

Terraform provider aliases let us use different provider configurations when provisioning resources. In Azure terms, that means creating resources under different subscriptions. Let's illustrate this quickly. Here we configure two providers with different subscriptions:

```hcl
# Default Provider (Subscription A)
provider "azurerm" {
  features {}
  subscription_id = "00000000-0000-0000-0000-000000000000"
}

# Secondary Provider (Subscription B)
provider "azurerm" {
  alias           = "sub_b"
  features {}
  subscription_id = "11111111-1111-1111-1111-111111111111"
}
```

And to use these providers, we can create a resource group in each subscription:

```hcl
resource "azurerm_resource_group" "rg_in_sub_a" {
  name     = "primary-resources"
  location = "East US"
  # No provider specified, so it uses the default
}

resource "azurerm_resource_group" "rg_in_sub_b" {
  provi...
```
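The excerpt cuts off above; as a sketch, referencing an aliased provider from a resource looks like this (the name "secondary-resources" is an assumed placeholder, not from the original):

```hcl
# Sketch: pin this resource to the aliased provider (Subscription B)
resource "azurerm_resource_group" "rg_in_sub_b" {
  provider = azurerm.sub_b
  name     = "secondary-resources"  # placeholder name
  location = "East US"
}
```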

terraform moved block - a better way to let terraform know you have moved your resources

Sometimes we refactor our Terraform code and move resources around. Let's say we have the following code:

```hcl
# main.tf (Root level)
resource "azurerm_storage_account" "old_storage" {
  name                     = "mystorage2026"
  resource_group_name      = "my-rg"
  location                 = "East US"
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```

And then we refactor it into the following structure.

The Module Code (modules/storage/main.tf) - notice that inside the module, I've given the resource a new local name: modular_storage.

```hcl
resource "azurerm_storage_account" "modular_storage" {
  name                = var.storage_name
  resource_group_name = ...
```
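The excerpt is truncated before the `moved` block itself. A minimal sketch of what it would look like for this refactor (assuming the module instance is named `storage`, which the original does not show):

```hcl
# Sketch: map the old root-level address to the new module address so
# Terraform treats this as a move instead of a destroy-and-recreate.
moved {
  from = azurerm_storage_account.old_storage
  to   = module.storage.azurerm_storage_account.modular_storage
}
```

After adding this, `terraform plan` should report the resource as moved rather than proposing to replace it.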

gcp running serverless dataproc with a hello world python script

Pyspark Pi calculation

To get things started, we need a PySpark script - this is a simple pi.py - and then upload it to your Google Cloud Storage bucket.

```python
import sys
from random import random
from operator import add

from pyspark.sql import SparkSession

if __name__ == "__main__":
    """
        Usage: pi [partitions]
    """
    spark = SparkSession \
        .builder \
        .appName("PythonPi") \
        .getOrCreate()

    partitions = int(sys.argv[1]) if len(sys.argv) > 1 else 2
    n = 100000 * partitions

    def f(_: int) -> float:
        x = random() * 2 - 1
        y = random() * 2 - 1
        return 1 if x ** 2 + y ** 2 <= 1 else 0

    count = spark.sparkContext.parallelize(range(1, n...
```
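The Monte Carlo idea in pi.py can be sketched without Spark at all. This standalone version uses the same sample-a-point, check-the-unit-circle logic (seeded here for reproducibility, which the original does not do):

```python
import random

def estimate_pi(n: int, seed: int = 42) -> float:
    # Same Monte Carlo idea as pi.py: sample points in the 2x2 square
    # centered on the origin and count how many fall inside the unit circle.
    rng = random.Random(seed)
    inside = 0
    for _ in range(n):
        x = rng.random() * 2 - 1
        y = rng.random() * 2 - 1
        if x * x + y * y <= 1:
            inside += 1
    # Area ratio circle/square = pi/4, so scale the hit rate by 4.
    return 4.0 * inside / n

print(estimate_pi(100_000))
```

Spark's version simply distributes those `n` samples across partitions with `parallelize` and sums the hits with `reduce(add)`.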

ADK using Vertex AI RAG

To use RAG with the ADK engine, we need to set up our RAG Engine corpus and tell our agent to use it as a point of reference. As you can see here, RAG_CORPUS contains the details of this endpoint.

Then we use a tool called VertexAiRagRetrieval:

```python
from google.adk.tools.retrieval.vertex_ai_rag_retrieval import (
    VertexAiRagRetrieval,
)
```

Then we hook it up to our agent to make this knowledge accessible:

```python
# Initialize tools list
tools = []

# Only add RAG retrieval tool if RAG_CORPUS is configured
rag_corpus = os.environ.get("RAG_CORPUS")
if rag_corpus:
    ask_vertex_retrieval = VertexAiRagRetrieval(
        name="retrieve_rag_documentation",
        description=(
            "Use this tool to retrieve documentation and reference materials for the question from the RAG corpus, "
        ),
        rag_resources...
```
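The conditional-registration pattern above (only wire up the tool when the corpus is configured) can be sketched in plain Python; the dict here is a stand-in for the real VertexAiRagRetrieval object, so this runs without the ADK installed:

```python
import os

def build_tools(env=None):
    """Register the RAG retrieval tool only when RAG_CORPUS is configured."""
    env = os.environ if env is None else env
    tools = []
    rag_corpus = env.get("RAG_CORPUS")
    if rag_corpus:
        # Stand-in for VertexAiRagRetrieval(...); carries the same key fields.
        tools.append({
            "name": "retrieve_rag_documentation",
            "rag_corpus": rag_corpus,
        })
    return tools
```

This keeps the agent usable in environments where no corpus is configured, instead of failing at startup.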

ADK app deployment to Vertex AI and consuming it

This is mostly for reference, since we constantly need to deploy, and then consume, the agentic apps we push to Vertex AI. To deploy, we can use the following code:

```python
import logging
import os

import vertexai
from dotenv import set_key
from vertexai import agent_engines
from vertexai.preview.reasoning_engines import AdkApp

from rag.agent import root_agent

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

GOOGLE_CLOUD_PROJECT = os.getenv("GOOGLE_CLOUD_PROJECT")
GOOGLE_CLOUD_LOCATION = os.getenv("GOOGLE_CLOUD_LOCATION")
STAGING_BUCKET = os.getenv("STAGING_BUCKET")

# Define the path to the .env file relative to this script
ENV_FILE_PATH = os.path.abspath(
    os.path.join(os.path.dirname(__file__), "..", ".env")
)

vertexai.init(
    project=GOOGLE_CLOUD_PROJECT,
    location=GOOGLE_CLOUD_LOCATION,
    staging_bucket=S...
```
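One small detail in the deploy script worth isolating: ENV_FILE_PATH resolves a `.env` file one directory above the script itself, so `set_key` can write values back to the project root. The path logic on its own (the script path below is a placeholder):

```python
import os

def env_file_path(script_file: str) -> str:
    # Same resolution as ENV_FILE_PATH in the deploy script:
    # one directory above the script, file named ".env".
    return os.path.abspath(
        os.path.join(os.path.dirname(script_file), "..", ".env")
    )

print(env_file_path("/opt/app/deploy/deploy.py"))
```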

VertexAiSessionService creating client getting error - ValueError: Project/location and API key are mutually exclusive in the client initializer.

Bumped into this error while trying to test an ADK service deployed to Vertex AI. It seems the Vertex client can't decide whether it should authenticate with GOOGLE_API_KEY or with project/location credentials:

```
ValueError: Project/location and API key are mutually exclusive in the client initializer.
```

To resolve this, just set GOOGLE_API_KEY="" (empty), then re-run your script.
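A minimal way to apply the fix from a shell before re-running (the script name at the end is a placeholder):

```shell
# Clear the API key so the Vertex client falls back to project/location auth
export GOOGLE_API_KEY=""
echo "GOOGLE_API_KEY is now: '${GOOGLE_API_KEY}'"
# then re-run your script, e.g.: python test_agent.py
```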

Postgres database and table storage and index optimizations

Optimizing database storage is always a key focus area when running databases. Let's get started with some basics.

Database storage used

Run the following query to see how much space each database is currently using:

```sql
SELECT datname AS database_name,
       pg_size_pretty(pg_database_size(datname)) AS total_size
FROM pg_database
ORDER BY pg_database_size(datname) DESC;
```

Table space used

Let's break that down further and see how much space is used by data versus indexes:

```sql
SELECT relname AS table_name,
       pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
       pg_size_pretty(pg_relation_size(relid)) AS data_size,
       pg_size_pretty(pg_total_relation_size(relid) - pg_relation_size(relid)) AS index_size
FROM pg_catalog.pg_statio_user_tables
ORDER BY pg_total_relation_size(relid) DESC;
```

From the output we can see how much actual data is stored (data_size) and how much is taken up by indexes (index_size). It is important to keep...
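Building on the sizing queries above, a natural next step is spotting indexes that consume space but are never used. A sketch using the standard `pg_stat_user_indexes` statistics view (note these counters only reflect activity since stats were last reset, so verify before dropping anything):

```sql
-- Indexes that have never been scanned: candidates for review
-- (and possibly removal) to reclaim space.
SELECT relname                                       AS table_name,
       indexrelname                                  AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid))  AS index_size,
       idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```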