📚 This post is part of my Learning Notes – RAG Series.
Today’s session introduced me to Retrieval-Augmented Generation (RAG) and why it’s becoming essential in AI. The focus was on understanding the limitations of plain large language models (LLMs) and how RAG helps overcome them.
What is RAG?
When I first heard about RAG, I thought it was just another complicated AI term. But the idea is actually pretty simple once I broke it down.
RAG is basically a combination of two steps:
- Retrieval → first, the system looks up relevant information from external sources (like documents or databases)
- Generation → then, a language model uses that information to generate an answer
What made it click for me is this:
Instead of answering from memory, the model looks things up first, then answers.
That “augmented” part just means the model is no longer relying only on what it learned during training—it’s being supported with fresh, relevant context.
Because of this, the answers become:
- more factual
- more relevant
- and sometimes more up-to-date
Basic RAG workflow: retrieval + generation working together to produce grounded answers.
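The two-step loop above can be sketched in a few lines of Python. This is a toy illustration, not a production pipeline: the retriever here ranks documents by simple word overlap (real systems use embeddings and vector search), and the final LLM call is left as a placeholder, so the function returns the assembled prompt. The documents and queries are made up for the example.

```python
import re

# Toy "knowledge base" standing in for external documents or a database.
DOCUMENTS = [
    "Lions are large cats native to Africa and parts of India.",
    "A river bank is the land alongside a river.",
    "Our company refund policy allows returns within 30 days.",
]

def tokenize(text):
    """Lowercase and split into word tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, top_k=1):
    """Step 1 (Retrieval): rank documents by word overlap with the query."""
    query_words = tokenize(query)
    scored = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return scored[:top_k]

def rag_answer(query):
    """Step 2 (Generation): inject retrieved context into the prompt."""
    context = retrieve(query, DOCUMENTS)
    # A real system would now send this prompt to an LLM; here we just
    # return it to show how the retrieved text grounds the answer.
    return f"Context: {' '.join(context)}\n\nQuestion: {query}\nAnswer:"

print(rag_answer("Tell me about lions"))
```

Even with this crude retriever, the prompt handed to the model now contains the relevant lion facts, so the model answers from supplied context rather than from memory alone.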
Key Concepts:
Language Models (LLMs) generate text by predicting the next word based on learned patterns and context.
They use probabilities and context to produce coherent and meaningful responses.
Limitations of LLMs:
- Hallucinations (making up answers when unsure).
- Outdated knowledge (training data has a cutoff).
- No access to private or domain-specific documents.
RAG: Combines retrieval (fetching relevant info) with generation (LLM output).
- RAG helps improve factuality, relevance, and freshness by grounding responses in retrieved information
To understand why RAG is needed, it helps to first understand how LLMs actually work internally.
How Do LLMs Learn? (Weights & Parameters)
I realized language models are like giant equations with many adjustable values called weights. During training, the model tweaks these weights to decide how strongly one word or feature should influence the next.
Think of it like the formula y = mx + c.
Changing m or c changes the line.
Similarly, changing weights changes the model’s predictions.
By adjusting millions (or even billions) of these weights, the model learns patterns from huge datasets — grammar, facts, and relationships between words.
👉 In simple terms: weights are the “knobs” the model turns during training to get better at predicting the next word.
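The y = mx + c analogy can be made concrete. The toy script below "trains" the two weights m and c by gradient descent on made-up data drawn from the line y = 2x + 1. It is the same knob-turning idea that, at vastly larger scale, tunes the billions of weights in an LLM.

```python
# Data points generated from the "true" line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

m, c = 0.0, 0.0   # start with arbitrary weight values
lr = 0.02         # learning rate: how big each nudge is

for _ in range(2000):
    # Gradients of the mean squared error with respect to m and c.
    grad_m = sum(2 * (m * x + c - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_c = sum(2 * (m * x + c - y) for x, y in zip(xs, ys)) / len(xs)
    # Turn the knobs slightly in the direction that reduces the error.
    m -= lr * grad_m
    c -= lr * grad_c

print(round(m, 2), round(c, 2))  # converges close to m = 2, c = 1
```

Training an LLM is this same loop with billions of knobs and a loss based on next-word prediction instead of a straight line.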
SLM vs. LLM (General Note)
While this session focused mainly on Large Language Models (LLMs) and RAG, it’s useful to know the distinction:
LLMs (Large Language Models) → Very big models trained on massive datasets. They’re powerful, but resource-heavy.
SLMs (Small Language Models) → More compact models designed for efficiency. They can run faster, use less memory, and are easier to deploy on devices with limited resources.
In practice:
- LLMs are great for complex reasoning and broad knowledge.
- SLMs are often used for lightweight tasks, edge devices, or situations where speed and efficiency matter.
This is useful context to keep in mind as I continue learning about RAG and AI systems.
Why Do We Need RAG?
Plain language models are powerful, but they have some important limitations.
Since they are trained on fixed data, they don’t truly “know” current or external information at the time of answering. This leads to a few common issues:
- Hallucinations → Sometimes the model confidently generates answers that are not correct, especially when it is unsure.
- Static knowledge → The model cannot update itself after training, so its knowledge may become outdated over time.
- No access to private data → It cannot read company documents, personal files, or domain-specific databases unless they are provided during runtime.
Because of these limitations, answers can sometimes be incomplete, outdated, or inaccurate.
How RAG helps
RAG addresses this by changing how the model answers questions.
Instead of relying only on what it learned during training, it first retrieves relevant information from external sources and then uses that context to generate a response.
This helps ground the output in real, up-to-date information and makes the answers more reliable and context-aware.
Key idea
RAG doesn’t replace the language model—it supports it with relevant information at the right time.
Key Examples from the Session
Dogs, Cats, and Lion →
- Without RAG: If a model has not seen enough relevant information about lions in its training data, it may generate incorrect or fabricated answers (hallucinations).
- With RAG: Retrieval brings in factual information about lions from external sources, helping the model generate a more accurate and grounded response.
COVID vs. Current Events →
- Without RAG: The model may know about COVID (from training data) but struggle with recent events due to outdated knowledge.
- With RAG: Retrieval pulls in recent articles or documents, allowing the model to respond with up-to-date context.
River Bank → context confusion between “river bank” and “financial bank.”
- Without RAG: The model may confuse “river bank” (geography) with “bank” (finance) depending on context.
- With RAG: Retrieval provides relevant domain context, helping the model choose the correct meaning.
Company Docs → an LLM alone can’t answer from private files, but RAG can.
- Without RAG: The model cannot access private or internal company documents.
- With RAG: Retrieval fetches relevant internal documents, enabling accurate answers based on company data.
Hello Predictions →
- Without RAG: With “Hello,” low temperature may produce “World,” while high temperature may produce “How are you?” or other creative outputs — but answers may drift.
- With RAG: Even at high temperature, retrieval keeps outputs grounded in factual context.
Temperature Settings in LLMs
- Low Temperature (~0) → More deterministic and consistent responses.
- High Temperature (~1 or above) → More creative and varied responses.
- Takeaway: Use low temperature for consistency and high temperature for creativity. Note that temperature controls randomness, not correctness.
- My Note: Retrieval can help guide responses with relevant context, even when temperature increases variability.
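Under the hood, temperature rescales the model’s raw scores (logits) before they are turned into probabilities via softmax. The sketch below uses made-up logits for three imagined continuations of “Hello” to show the effect: a low temperature concentrates almost all probability on the top token, while a high temperature spreads it out.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [l / temperature for l in logits]
    mx = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - mx) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for next tokens after "Hello":
# ["World", "there", "How are you?"]
logits = [3.0, 2.0, 1.0]

low = softmax_with_temperature(logits, 0.1)   # nearly deterministic
high = softmax_with_temperature(logits, 2.0)  # much flatter
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At temperature 0.1 the top token takes essentially all the probability mass (so the model almost always says “World”); at 2.0 the alternatives become genuinely likely, which is where creative-but-drifting outputs come from.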
Real-World Applications of RAG
- Customer Support → Answers from FAQs and manuals.
- Healthcare → Grounded responses from medical databases.
- Education → Fact-checked explanations for learners.
- Enterprise Search→ Unlocking insights from private organizational data.
Key Takeaways (Quick Reference)
- RAG = Retrieval + Generation.
- Helps reduce hallucinations, outdated knowledge issues, and lack of private context.
- Temperature controls creativity vs. consistency (randomness, not correctness).
- Real-world uses: support, healthcare, education, enterprise search.
- Core idea: ground AI in facts before generating answers.
My Conclusion
Today’s session gave me a strong foundation in understanding the limitations of AI and how RAG helps overcome them.
Instead of relying only on memory, RAG allows AI to look up relevant information before answering—just like how we perform better when we can refer to notes.
This is just the beginning of my learning journey with RAG — I’ll continue documenting as I go.
Next up: Session 2, where I’ll continue exploring and documenting my journey.
