Gemma 3 RAG sample implementation

RAG can be a better alternative to fine-tuning: you keep using the existing model, yet you can feed it new information at runtime or during a Q&A session. This saves the time of redeploying and re-testing the model to make sure everything still works correctly.

Anyway, here is a hello-world implementation of Gemma 3 with RAG.

Go into your Google Colab notebook and run the following code.


# --- Step 1: Install dependencies ---
!apt-get update
!apt-get install -y curl wget gnupg
!curl -fsSL https://ollama.com/install.sh | sh

Next, install all the required Python dependencies:

# --- Step 2: Install Python dependencies ---
!pip install langchain langchain_community faiss-cpu sentence-transformers



Run Ollama in the background:

!nohup ollama serve &
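The server takes a moment to come up, so it helps to wait briefly before pulling the model. A minimal sketch (the five-second pause is an arbitrary choice that works on a typical Colab instance):

# Give the Ollama server a few seconds to start before pulling models
import time
time.sleep(5)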

Then pull down the Gemma 3 model:

!ollama pull gemma3
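You can verify the model was downloaded by listing what Ollama has available locally:

!ollama list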

Then set up the embeddings and the FAISS vector store:


# --- Step 3: Setup LangChain with FAISS ---
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_community.llms import Ollama

# Example docs (replace with your knowledge base later)
docs = [
    Document(page_content="VIP customers are entitled to a 60-day refund window."),
    Document(page_content="Standard customers get only a 30-day refund."),
    Document(page_content="Refunds are processed back to the original payment method."),
]

# Create embeddings + FAISS index
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)
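If you want to reuse the index across sessions rather than rebuilding it each time, the FAISS vector store can be saved to disk and reloaded. A minimal sketch (the "faiss_index" folder name is arbitrary, and recent LangChain versions require the allow_dangerous_deserialization flag when loading a pickled index):

# Persist the FAISS index and load it back later
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local("faiss_index", embeddings, allow_dangerous_deserialization=True)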


Next, prepare your prompt:


# --- Step 4: Retrieval + Augmentation ---
query = "What is the refund policy for VIP customers?"
retrieved_docs = vectorstore.similarity_search(query, k=2)

context = "\n".join([doc.page_content for doc in retrieved_docs])
prompt = f"""
You are gemma3. Answer truthfully using the context below.

Context:
{context}

Question: {query}

Answer:
"""

print("==== Prompt Sent to Gemma 3 ====")
print(prompt)

And finally, get your response:


# --- Step 5: Call Gemma 3 via Ollama ---
llm = Ollama(model="gemma3")
response = llm.invoke(prompt)  # invoke() is the current LangChain call style
print("==== Gemma 3 Response ====")
print(response)
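If you would rather not assemble the prompt by hand, LangChain can wire the retriever and the LLM together for you. Here is a minimal sketch using the RetrievalQA chain (the exact invocation may vary slightly between LangChain versions):

# Optional: let LangChain handle retrieval + prompt assembly in one chain
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # "stuff" puts the retrieved docs straight into the prompt
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),
)
result = qa.invoke({"query": "What is the refund policy for VIP customers?"})
print(result["result"])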

You should get a response in Colab grounded in the retrieved context, along the lines of VIP customers being entitled to a 60-day refund window.