The Scenario

You are building a RAG system for a 50-page employee handbook. You embed the entire document as a single vector.

User asks: "What is the holiday allowance?"
Vector Search: No results found.

The Problem

Embedding Dilution. When you squash 50 pages into a single vector (an array of 1536 numbers), specific details like "25 days holiday" get averaged out by the noise of the other 49 pages. The vector represents the "average topic" of the document, not its specific facts, so a query about one narrow fact is no longer close enough to the document vector to match.
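The dilution effect can be illustrated with a toy model. The sketch below is not how a real embedding model works: it assumes each page covers one distinct topic (a one-hot vector) and that the whole-document vector is simply the mean of the page vectors. The page index 7 for the holiday policy is a hypothetical choice. Under those assumptions, the query matches its own page perfectly but scores far lower against the single document vector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

NUM_PAGES = 50

# Toy assumption: page i is about exactly one topic, modeled as a one-hot vector.
pages = [[1.0 if i == j else 0.0 for j in range(NUM_PAGES)]
         for i in range(NUM_PAGES)]

# Toy assumption: the single document vector is the mean of all page vectors.
doc_vector = [sum(column) / NUM_PAGES for column in zip(*pages)]

# Hypothetical: the holiday policy lives on page index 7, and the query
# embeds to the same direction as that page.
query = pages[7]

sim_doc = cosine(query, doc_vector)   # query vs. whole-document vector
sim_page = cosine(query, pages[7])    # query vs. the relevant page alone

print(f"similarity to whole document: {sim_doc:.3f}")
print(f"similarity to relevant page:  {sim_page:.3f}")
```

In this toy setup the similarity to the whole document shrinks as the page count grows, which is the intuition behind splitting the handbook before embedding it.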

The Goal

Implement Chunking:

Split the text into smaller segments (e.g., 100 characters).
Add Overlap (e.g., 20 characters) to preserve context between chunks.
Return the list of chunks.

Requirements:

chunk_size: 100 chars
overlap: 20 chars
Ensure no data is lost at boundaries.
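One possible way to meet these requirements is a fixed-size sliding window, sketched below. The function name chunk_text and the sample handbook string are illustrative assumptions, not part of the exercise. The window advances by chunk_size minus overlap each step, so every chunk after the first begins with the last 20 characters of the previous one, and the final partial chunk is kept rather than dropped.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into character chunks that overlap by `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text, so we do not
        # emit a trailing chunk that contains only already-covered overlap.
        if start + chunk_size >= len(text):
            break
    return chunks


# Hypothetical sample text standing in for the handbook.
handbook = (
    "Employees are entitled to 25 days holiday per year. "
    "Requests must be submitted two weeks in advance. "
) * 3

chunks = chunk_text(handbook, chunk_size=100, overlap=20)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))

# Boundary check: dropping the overlapping prefix of every chunk after the
# first should reproduce the original text exactly.
rebuilt = chunks[0] + "".join(c[20:] for c in chunks[1:])
print(rebuilt == handbook)
```

Character-based splitting can cut a word or sentence in half; the overlap softens this but does not remove it, so splitting on sentence or paragraph boundaries may be worth considering for real documents.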

The Context Loss

The Scenario

The Problem

The Goal