Gemma 3 RAG sample implementation
RAG can be a better alternative to fine-tuning: you keep using the existing model while feeding it new information at runtime, such as during a Q&A session. This saves the time you would otherwise spend redeploying and re-testing to make sure everything still works.
Anyway, here is a hello-world implementation of Gemma 3 with RAG.
Open Google Colab and run the following code.
# --- Step 1: Install system dependencies ---
!apt-get update
!apt-get install -y curl wget gnupg
Next, install all the required Python dependencies
# --- Step 2: Install Python libraries ---
!pip install langchain langchain_community faiss-cpu sentence-transformers
Install Ollama
!curl -fsSL https://ollama.com/install.sh | sh
Run the Ollama server in the background
!nohup ollama serve &
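The server can take a moment to come up in Colab, so the next step may fail with a connection error if you run it immediately. Here is a minimal wait-loop sketch, assuming Ollama's default port 11434:
# Optional: wait until the Ollama server responds before pulling a model
import time
import urllib.request

for _ in range(30):
    try:
        urllib.request.urlopen("http://localhost:11434")  # responds once the server is up
        print("Ollama server is running")
        break
    except OSError:
        time.sleep(1)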
Then pull down the Gemma 3 model
!ollama pull gemma3
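You can confirm the download finished by listing the local models, which should now include gemma3:
!ollama list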
Then set up LangChain with FAISS using the following code
# --- Step 3: Setup LangChain with FAISS ---
from langchain_community.vectorstores.faiss import FAISS
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_core.documents import Document
from langchain_community.llms import Ollama
# Example docs (replace with your knowledge base later)
docs = [
Document(page_content="VIP customers are entitled to a 60-day refund window."),
Document(page_content="Standard customers get only a 30-day refund."),
Document(page_content="Refunds are processed back to the original payment method."),
]
# Create embeddings + FAISS index
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)
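Optionally, you can persist the index so you don't re-embed the documents every session. A minimal sketch using FAISS's save_local/load_local helpers; note that recent langchain_community versions require allow_dangerous_deserialization=True on load because the stored docstore is unpickled:
# Optional: save the index to disk and reload it later
vectorstore.save_local("faiss_index")
vectorstore = FAISS.load_local(
    "faiss_index", embeddings,
    allow_dangerous_deserialization=True,  # required by newer versions; loading unpickles the docstore
)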
Next, prepare your prompt
# --- Step 4: Retrieval + Augmentation ---
query = "What is the refund policy for VIP customers?"
retrieved_docs = vectorstore.similarity_search(query, k=2)
context = "\n".join([doc.page_content for doc in retrieved_docs])
prompt = f"""
You are gemma3. Answer truthfully using the context below.
Context:
{context}
Question: {query}
Answer:
"""
print("==== Prompt Sent to Gemma 3 ====")
print(prompt)
And finally, get your response
# --- Step 5: Call Gemma3 via Ollama ---
llm = Ollama(model="gemma3")
response = llm.invoke(prompt)  # invoke() is the current call style; llm(prompt) is deprecated
print("==== Gemma 3 Response ====")
print(response)
You should see Gemma 3's answer, grounded in the retrieved refund-policy context, printed in your Colab output.
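As a follow-up, the manual retrieve-then-prompt flow in Steps 4 and 5 can be collapsed into LangChain's built-in RetrievalQA chain. A minimal sketch reusing the vectorstore and llm defined above:
# Optional: Steps 4-5 as a single RetrievalQA chain
from langchain.chains import RetrievalQA

qa = RetrievalQA.from_chain_type(
    llm=llm,
    retriever=vectorstore.as_retriever(search_kwargs={"k": 2}),  # same top-2 retrieval as above
)
result = qa.invoke({"query": "What is the refund policy for VIP customers?"})
print(result["result"])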