Ranjan Dailata

Why Chunking Is the Biggest Mistake in RAG Systems

Retrieval-Augmented Generation (RAG) has become the default architecture for building AI-powered document intelligence systems. Most implementations follow the same pattern:

  1. Split documents into chunks
  2. Convert chunks into embeddings
  3. Store them in a vector database
  4. Retrieve the most similar chunks
  5. Send them to an LLM to generate answers
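The five steps above can be sketched end-to-end in a few lines. This is a minimal toy, assuming a bag-of-words count as a stand-in for a real embedding model and a plain list in place of a vector database:

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    # A real pipeline would call an embedding model here (step 2).
    cleaned = text.lower().replace(".", " ").replace(":", " ").replace("?", " ")
    return Counter(cleaned.split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm = lambda v: math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm(a) * norm(b)) if a and b else 0.0

def split_chunks(document, size=80):
    # Step 1: split the document into fixed-size chunks.
    return [document[i:i + size] for i in range(0, len(document), size)]

def build_index(document):
    # Steps 2-3: embed each chunk and store the (chunk, vector) pairs.
    return [(chunk, embed(chunk)) for chunk in split_chunks(document)]

def retrieve(index, query, k=1):
    # Step 4: return the k chunks most similar to the query.
    q = embed(query)
    ranked = sorted(index, key=lambda pair: cosine(q, pair[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

doc = ("Diagnosis: Major Depressive Disorder. "
       "Treatment: 12 CBT sessions. "
       "Medications: Sertraline 50mg daily.")
index = build_index(doc)
context = retrieve(index, "How many CBT sessions were completed?")
# Step 5: `context` would now be sent to an LLM with the user question.
print(context)
```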

This pipeline works reasonably well for simple text. However, when applied to structured documents like clinical records, chunking can introduce serious problems.

Healthcare documents are rich with context and hierarchy. Breaking them into arbitrary chunks often leads to context loss, retrieval errors, and fragmented reasoning.

In this article, you will see why chunking fails, using a realistic clinical document example, and how structure-aware indexing and summarization can produce far better results.

Note: This post focuses on the healthcare domain, using a patient clinical document as the example.


The Clinical Document Example

Consider the following clinical summary sample:

Patient Name: Jordan M.
DOB: 06/21/1990
Date of Summary: 08/01/2025

Diagnosis: F33.1 Major Depressive Disorder, recurrent, moderate
Symptoms: Persistent low mood, disrupted sleep, concentration issues

Treatment Summary:
- 12 CBT sessions, weekly
- Focused on core beliefs, behavioral activation
- PHQ-9 improved from 17 to 6

Medications: Sertraline 50mg daily, no side effects reported

Follow-Up Plan:
- Referral to psychiatrist for medication continuation
- Recommended ongoing biweekly therapy

At first glance, this document appears small, but clinical records in real systems often span hundreds of pages across multiple visits.

Even in this simple example, the document contains clear semantic sections:

Patient Info
Diagnosis
Symptoms
Treatment Summary
Medications
Follow-Up Plan

These sections provide the structure necessary for proper interpretation.


What Happens When We Chunk This Document

A traditional RAG system might split the text into chunks like this:

Chunk A
Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Chunk B
Treatment Summary:
12 CBT sessions
PHQ-9 improved from 17 to 6
Chunk C
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist
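Fragmentation like this falls out of any fixed-size splitter. Below is a hedged sketch (real systems often split by tokens with overlap, but the failure mode is the same):

```python
def split_fixed(text, chunk_size=120):
    # Naive fixed-size chunking: cut every chunk_size characters,
    # with no regard for section boundaries.
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

clinical_note = """Patient Name: Jordan M.
DOB: 06/21/1990
Diagnosis: Major Depressive Disorder
Symptoms: Persistent low mood
Treatment Summary: 12 CBT sessions
PHQ-9 improved from 17 to 6
Medications: Sertraline 50mg daily
Follow-Up Plan: referral to psychiatrist"""

chunks = split_fixed(clinical_note)
for label, chunk in zip("ABC", chunks):
    print(f"--- Chunk {label} ---\n{chunk}\n")
```

The diagnosis and the medication land in different chunks, and a chunk boundary can even fall mid-word, which is exactly the fragmentation shown in Chunks A, B, and C above.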

Now consider the kinds of questions users actually ask of this record.

1. Cross-Section Reasoning Questions

These require information from multiple chunks, which chunk-based retrieval often fails to assemble.

Example Questions

• What treatment improved the patient’s PHQ-9 score?
• What medication is being used to treat the patient's depression?
• What treatment approach was used along with medication?
• What interventions helped reduce the patient’s depression score?

Why Chunking Fails

The system may retrieve:

Chunk B
PHQ-9 improved from 17 to 6

But it does not contain medication information, so the answer becomes incomplete.


2. Contextual Medical Questions

These questions require understanding relationships between sections.

Example Questions

• What condition is the patient being treated for with Sertraline?
• Why was the patient referred to a psychiatrist?
• What symptoms led to the treatment plan?

Why Chunking Fails

Chunk C contains medication, but diagnosis is in Chunk A, so the model may not connect them.


3. Treatment Outcome Questions

These require linking treatment with outcomes.

Example Questions

• Did the therapy sessions improve the patient’s condition?
• What evidence shows the patient improved during treatment?
• How effective was the treatment plan?

Why Chunking Fails

The improvement metric:

PHQ-9 improved from 17 to 6

appears in Chunk B, but the context about depression diagnosis is in Chunk A.


4. Follow-Up Care Questions

These require understanding treatment history and next steps.

Example Questions

• Why does the patient need psychiatric follow-up?
• What follow-up care is recommended after treatment?
• What ongoing care is suggested for this patient?

Why Chunking Fails

Chunk C contains the follow-up plan but not the context of the diagnosis or therapy outcome.


5. Comprehensive Clinical Summary Questions

These require multiple chunks simultaneously.

Example Questions

• Summarize the patient’s diagnosis, treatment, and follow-up plan.
• What treatments has the patient received for depression?
• What is the overall care plan for this patient?

Why Chunking Fails

Chunk-based retrieval may only return one chunk, causing a partial summary.

Example incomplete retrieval:

Chunk B
Treatment Summary
12 CBT sessions
PHQ-9 improved from 17 to 6

But the system misses medication and follow-up care.


6. Ambiguous Retrieval Questions

These expose semantic similarity issues in vector search.

Example Questions

• What therapy is the patient receiving?
• What treatment is the patient undergoing?
• How is the patient being treated?

Vector search may retrieve:

Chunk B
Treatment Summary

But it misses medication in Chunk C, which is also part of the treatment plan.

Vector similarity measures semantic proximity, not clinical context.

The result: incorrect or incomplete answers.
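The gap is easy to demonstrate. Using word overlap as a crude stand-in for embedding similarity (an assumption for illustration only), a treatment question ranks the therapy chunk first, and the medication chunk never surfaces:

```python
def overlap_score(query, chunk):
    # Crude proxy for vector similarity: count shared lowercase words.
    q = set(query.lower().rstrip("?").split())
    c = set(chunk.lower().split())
    return len(q & c)

chunks = {
    "B": "Treatment Summary 12 CBT sessions PHQ-9 improved from 17 to 6",
    "C": "Medications Sertraline 50mg daily Follow-Up Plan referral to psychiatrist",
}
query = "What treatment is the patient undergoing?"
ranked = sorted(chunks, key=lambda key: overlap_score(query, chunks[key]), reverse=True)
print(ranked)  # Chunk B wins on lexical similarity; Chunk C scores zero
```

The medication is clinically part of the treatment, but nothing in the similarity score knows that.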


Why Chunking Breaks Clinical Documents

Healthcare documents illustrate several fundamental problems with chunking.


1. Clinical Context Gets Fragmented

Clinical notes often rely on relationships between sections.

Example:

Diagnosis - Explains why treatment was prescribed
Treatment - Explains how symptoms improved
Follow-Up - Explains ongoing care

When chunked, these relationships disappear.


2. Important Meaning Spans Sections

Consider the treatment outcome:

PHQ-9 improved from 17 to 6

This metric only makes sense if the model also understands:

Diagnosis: Major Depressive Disorder
Treatment: CBT sessions
Medication: Sertraline

Chunking separates these connected ideas.


3. Clinical Reasoning Requires Structure

Doctors interpret records by navigating sections:

Diagnosis
Symptoms
Treatment
Medication
Follow-Up

Chunking ignores this hierarchy entirely.


A Better Approach: Structure-Aware Document Retrieval

Instead of splitting documents arbitrarily, the document's structure can be preserved by building a tree-based hierarchical representation.

Example hierarchical representation:

Clinical Summary
 ├ Patient Information
 │   ├ Name
 │   ├ DOB
 │
 ├ Diagnosis
 │
 ├ Symptoms
 │
 ├ Treatment Summary
 │
 ├ Medications
 │
 └ Follow-Up Plan

Each section becomes a retrieval node.

This structure preserves the clinical context.
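One way to represent this tree in code is a simple node type where each section becomes a retrieval node. This is a sketch; the `SectionNode` name is illustrative, not a specific library's API, and a real parser would detect the section headers automatically:

```python
from dataclasses import dataclass, field

@dataclass
class SectionNode:
    title: str
    text: str = ""
    children: list = field(default_factory=list)

# Hand-built tree mirroring the hierarchy shown above.
root = SectionNode("Clinical Summary", children=[
    SectionNode("Patient Information", children=[
        SectionNode("Name", "Jordan M."),
        SectionNode("DOB", "06/21/1990"),
    ]),
    SectionNode("Diagnosis", "F33.1 Major Depressive Disorder, recurrent, moderate"),
    SectionNode("Symptoms", "Persistent low mood, disrupted sleep, concentration issues"),
    SectionNode("Treatment Summary", "12 CBT sessions, weekly; PHQ-9 improved from 17 to 6"),
    SectionNode("Medications", "Sertraline 50mg daily, no side effects reported"),
    SectionNode("Follow-Up Plan", "Referral to psychiatrist; ongoing biweekly therapy"),
])

def iter_nodes(node):
    # Depth-first walk over every retrieval node in the tree.
    yield node
    for child in node.children:
        yield from iter_nodes(child)

print([n.title for n in iter_nodes(root)])
```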


Adding Summarization for Better Retrieval

To improve retrieval efficiency, each section can be summarized.

Example summaries:

Patient Information
Summary: Patient demographics including name and DOB.

Diagnosis
Summary: Major Depressive Disorder (recurrent, moderate).

Treatment Summary
Summary: 12 CBT sessions with significant improvement in PHQ-9 score.

Medications
Summary: Sertraline 50mg daily with no reported side effects.

Follow-Up Plan
Summary: Referral to psychiatrist and continued biweekly therapy.

These summaries act as compressed semantic representations of the document.
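Generating these summaries can be sketched as a map over sections. Here a first-words extractive stub stands in for the LLM call a production system would make; the `summarize` function is an illustrative placeholder:

```python
def summarize(section_text, max_words=12):
    # Stand-in extractive summarizer: keep the first max_words words.
    # A production system would call an LLM here instead.
    return " ".join(section_text.split()[:max_words])

sections = {
    "Diagnosis": "F33.1 Major Depressive Disorder, recurrent, moderate",
    "Treatment Summary": "12 CBT sessions, weekly, focused on core beliefs and "
                         "behavioral activation. PHQ-9 improved from 17 to 6.",
    "Medications": "Sertraline 50mg daily, no side effects reported",
    "Follow-Up Plan": "Referral to psychiatrist; ongoing biweekly therapy",
}
summaries = {title: summarize(text) for title, text in sections.items()}
for title, s in summaries.items():
    print(f"{title}: {s}")
```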


How Retrieval Works with Summaries

User query:

"What medication is the patient currently taking?"

The system compares the query to section summaries:

Diagnosis - Mental health condition
Treatment - Therapy sessions
Medications - Drug prescription
Follow-Up - Future care

The correct section (Medications) is retrieved immediately.
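This routing step can be sketched with word overlap as a cheap stand-in for embedding similarity; the summary texts below are condensed for illustration:

```python
def score(query, summary):
    # Word overlap as a crude proxy for embedding similarity.
    q = set(query.lower().rstrip("?").split())
    s = set(summary.lower().split())
    return len(q & s)

summaries = {
    "Diagnosis": "major depressive disorder mental health condition",
    "Treatment Summary": "cbt therapy sessions phq-9 improvement",
    "Medications": "sertraline 50mg daily medication prescription",
    "Follow-Up Plan": "psychiatrist referral future care",
}
query = "What medication is the patient currently taking?"
# Route the query to the section whose summary scores highest.
best = max(summaries, key=lambda title: score(query, summaries[title]))
print(best)  # Medications
```

Once the section is chosen, its full raw text (not the summary) is passed to the LLM as context.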


Example Final Context

Retrieved section:

Medications:
Sertraline 50mg daily, no side effects reported

Generated response:

The patient is currently prescribed Sertraline 50mg daily, with no reported side effects.


High-level Architecture for Clinical RAG

A structure-aware system might follow this pipeline:

[Figure: High-level architecture for clinical RAG]

This preserves meaning while reducing noise.


Why This Matters in Healthcare AI

Clinical AI systems must prioritize:

• Accuracy
• Traceability
• Context awareness

Chunk-based retrieval often struggles to meet these requirements.

Structure-aware approaches provide:

Higher precision

Relevant sections are retrieved instead of unrelated chunks.

Better explainability

The system can show exact sections used in reasoning.

Improved clinical safety

Maintaining document hierarchy reduces the risk of misinterpretation.


The Future of RAG in Healthcare

As AI becomes more integrated into healthcare systems, document understanding will play a critical role.

The next generation of RAG architectures will likely include:

• Hierarchical document indexing
• Section-level summarization
• Reasoning-based retrieval
• Agentic document exploration

These approaches allow AI systems to navigate clinical documents more like human experts.


Conclusion

Chunking assumes documents are bags of paragraphs. But documents are structured knowledge systems. Even when a document appears unstructured, its structure can often be inferred, and once that structure exists, retrieval becomes far more accurate.

For structured documents like clinical records, chunking often causes more problems than it solves.

If you need AI systems to truly understand documents, preserving their structure and allowing models to reason over meaningful sections is crucial.

Moving beyond chunking is a critical step toward building safer, more reliable document intelligence systems.

In upcoming posts, you will walk through a realistic example of handling unstructured data and its retrieval.


Attribution

Clinical document sample was referenced from https://www.supanote.ai/templates/clinical-summary-template

The contents of this post were formatted with ChatGPT to produce polished content for the target audience.

Top comments (5)

Olebeng

The clinical document example makes the failure mode concrete in a way that generic RAG critiques rarely do. The PHQ-9 metric only means something in context of the diagnosis and the treatment. Chunking strips that context and the retrieval system has no way to recover it.

There is an interesting tension between this post and Ayan Arshad's chunking experiments published on Dev.to yesterday. His conclusion for code was that smaller chunks win, function-level AST extraction at roughly 120 tokens outperformed larger windows. Your conclusion for clinical documents is that larger semantic units win, the section, not the paragraph or the sentence.

These are not contradictory. They are both right for their domains, and together they point at something more precise than "chunking is a mistake." The optimal granularity is determined by two variables simultaneously: the structure of the data AND the nature of the question being asked.

For code, the question is usually "what does this function do", a bounded, function-scoped query. The semantic unit is the function. For clinical records, the question is often "what explains this treatment decision", a relational query that spans sections. The semantic unit is the relationship between sections, not any individual section.

The section-level summarization approach you describe solves the navigation problem elegantly. The risk worth naming is that it introduces a precision-recall tradeoff at the leaf level. A summary of the Medications section that says "Sertraline 50mg daily" is perfect for the query "what medication is the patient taking." But for the query "was there any adverse reaction noted in the medication review," the summary may not preserve that granularity, and the raw chunk would have been more precise.

The architecture that handles both is two-stage: section summaries for coarse navigation and relevance filtering, then raw chunk retrieval within the matched section for precise extraction. The summary routes the query to the right section. The chunk answers it. This avoids the over-generalisation risk in summary-only retrieval while preserving the cross-section context that chunking alone loses.

The compliance domain has exactly the same cross-section reasoning problem as clinical records. A question like "what legal basis justifies this data processing" requires connecting the stated purpose (one section), the legal basis (another section), and the retention policy (a third section). The same hierarchical structure-aware approach applies directly.

Ranjan Dailata • Edited

Great explanation, Olebeng. It's inspiring to see your in-depth analysis on this topic. There is ongoing research by the PageIndex company; they take a completely different, vectorless approach. Yes, we need to go case by case. You may also like my other blog post - dev.to/ranjancse/a-vectorless-rag-...

Olebeng

I am currently grappling with this issue with my current build and conversations like these help put concepts into perspective. I will definitely give your other blog a read as well.

Athreya aka Maneshwar

Nice explanation, hadn't thought about this, thanks for the blog :)

Ranjan Dailata

I am glad you liked it :)