The Context Loss
Difficulty: HARD
ID: rag-chunking-strategy
The Scenario
You are building a RAG system for a 50-page employee handbook. You embed the entire document as a single vector.
User asks: "What is the holiday allowance?"
Vector Search: No results found.
The Problem
Embedding Dilution. When you squash 50 pages into a single vector (array of 1536 numbers), specific details like "25 days holiday" get averaged out by the noise of the other 49 pages. The vector represents the "average topic" of the document, not specific facts.
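The dilution effect can be illustrated with a toy sketch. It assumes the document vector behaves like the mean of its page vectors, which is a simplification: real embedding models do not literally average pages, and the 4-dimensional vectors below are hypothetical values chosen for illustration, not output from any model.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy "embeddings"; axes loosely mean
# [holiday, payroll, security, conduct].
holiday_page = [1.0, 0.0, 0.0, 0.0]
other_pages = (
    [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]] * 16
    + [[0.0, 1.0, 0.0, 0.0]]
)  # 49 pages about other topics
pages = [holiday_page] + other_pages  # 50 pages in total

# Assumption: the whole-document vector is the mean of the page vectors.
whole_doc = [sum(dim) / len(pages) for dim in zip(*pages)]

query = [1.0, 0.0, 0.0, 0.0]  # stands in for "What is the holiday allowance?"

whole_doc_score = cosine(query, whole_doc)
page_score = cosine(query, holiday_page)

print(f"query vs whole document: {whole_doc_score:.3f}")
print(f"query vs holiday page:   {page_score:.3f}")
```

Under these assumptions, the query scores far higher against the single holiday page than against the averaged document vector, which is the motivation for embedding smaller pieces separately.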
The Goal
Implement Chunking:
- Split the text into smaller segments (e.g., 100 characters).
- Add Overlap (e.g., 20 characters) to preserve context between chunks.
- Return the list of chunks.
Requirements:
- chunk_size: 100 chars
- overlap: 20 chars
- Ensure no data is lost at boundaries.
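One possible approach to the requirements above is a sliding window over the text, sketched below. The function name `chunk_text` and its signature are assumptions, since the challenge's starter code is not shown; this is a minimal character-based sketch, not the reference solution.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Each chunk after the first starts `overlap` characters before the
    previous chunk ended, so text near a boundary appears in both chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the window has reached the end of the text, so no
        # trailing chunk made purely of already-seen overlap is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    sample = "Employees receive 25 days of holiday per year. " * 6
    for i, chunk in enumerate(chunk_text(sample)):
        print(i, len(chunk), repr(chunk[:30]))
```

Because the window advances by `chunk_size - overlap` characters, every character of the input lands in at least one chunk, and dropping the first `overlap` characters of each later chunk reconstructs the original text. Splitting on raw character counts can cut words in half; splitting on sentence or paragraph boundaries is a common refinement but is outside what the challenge asks for.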