Gemma 3 RAG sample implementation
RAG can be a better alternative to fine-tuning: you keep using the existing model while feeding it new information at runtime, such as during a Q&A session. This saves the time you would otherwise spend redeploying and re-testing to make sure everything still works.
Anyway, here is a hello-world implementation of Gemma 3 with RAG.
Open Google Colab and run the following code.
# --- Step 1: Install system dependencies ---
!apt-get update
!apt-get install -y curl wget gnupg
Next, install all the required Python dependencies
# --- Step 2: Install Python libraries ---
!pip install langchain langchain_community faiss-cpu sentence-transformers
Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
Run the Ollama server in the background
!nohup ollama serve &
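The server can take a moment to come up in Colab, so the next step may fail with a connection error if you run it immediately. Here is a minimal wait-loop sketch, assuming Ollama's default port 11434:
# Optional: wait until the Ollama server responds before pulling a model
import time
import urllib.request

for _ in range(30):
    try:
        urllib.request.urlopen("http://localhost:11434")  # responds once the server is up
        print("Ollama server is running")
        break
    except OSError:
        time.sleep(1)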
Then pull down the Gemma 3 model
!ollama pull gemma3
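You can confirm the download finished by listing the local models, which should now include gemma3:
!ollama list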
Then set up LangChain with FAISS using the following code
# --- Step 3: Setup LangChain with FAISS ---
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_community.llms import Ollama
# Example docs (replace with your knowledge base later)
docs = [
Document(page_content="VIP customers are entitled to a 60-day refund window."),
Document(page_content="Standard customers get only a 30-day refund."),
Document(page_content="Refunds are processed back to the original payment method."),
]
# Create embeddings + FAISS index
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)
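Optionally, you can persist the index so you don't re-embed the documents every session. A minimal sketch using FAISS's save_local/load_local helpers; note that recent langchain_community versions require allow_dangerous_deserialization=True on load because the stored docstore is unpickled:
# Optional: save the index to disk and reload it later
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings,
    allow_dangerous_deserialization=True,  # required by newer versions; loading unpickles the docstore
)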
Next, prepare your prompt
# --- Step 4: Retrieval + Augmentation ---
query = "What is the refund policy for VIP customers?"
retrieved_docs = vectorstore.similarity_search(query, k=2)
context = "\n".join([doc.page_content for doc in retrieved_docs])
prompt = f"""
You are gemma3. Answer truthfully using the context below.
Context:
{context}
Question: {query}
Answer:
"""
print("==== Prompt Sent to Gemma 3 ====")
print(prompt)
And finally, get your response
# --- Step 5: Call Gemma3 via Ollama ---
llm = Ollama(model="gemma3")
response = llm.invoke(prompt)  # invoke() is the current call style; llm(prompt) is deprecated
print("==== Gemma 3 Response ====")
print(response)
You should see Gemma 3's answer, grounded in the retrieved refund-policy context, printed in your Colab output.
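As a follow-up, the manual retrieve-then-prompt flow in Steps 4 and 5 can be collapsed into LangChain's built-in RetrievalQA chain. A minimal sketch reusing the vectorstore and llm defined above:
# Optional: Steps 4-5 as a single RetrievalQA chain
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),  # same top-2 retrieval as above
)
result = qa.invoke({"query": "What is the refund policy for VIP customers?"})
print(result["result"])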