The Context Loss

Difficulty: HARD
ID: rag-chunking-strategy

The Scenario

You are building a RAG system for a 50-page employee handbook. You embed the entire document as a single vector.

User asks: "What is the holiday allowance?"
Vector Search: No results found.

The Problem

Embedding Dilution. When you squash 50 pages into a single vector (an array of, say, 1536 numbers), specific details like "25 days holiday" get averaged out by the noise of the other 49 pages. The vector represents the "average topic" of the document, not specific facts, so a narrowly worded query has little to match against.
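The dilution effect can be illustrated with a toy model. The sketch below assumes each page has its own embedding and that the whole-document vector behaves like the mean of the page vectors. Real embedding models do not literally average pages, so this is a simplification for intuition only. The one-hot "page embeddings", the page index 7, and the helper name cosine are all invented for the illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

NUM_PAGES = 50

# Toy assumption: every page covers a distinct topic, modeled as a one-hot vector.
pages = [[1.0 if i == p else 0.0 for i in range(NUM_PAGES)]
         for p in range(NUM_PAGES)]

# Toy assumption: the query is about the topic of page 7 (e.g. holiday allowance).
query = pages[7]

# Whole-document embedding, modeled here as the mean of all page vectors.
doc_vector = [sum(column) / NUM_PAGES for column in zip(*pages)]

whole_doc_score = cosine(query, doc_vector)               # diluted match
best_page_score = max(cosine(query, p) for p in pages)    # per-page match

print(f"single vector: {whole_doc_score:.3f}")
print(f"best page:     {best_page_score:.3f}")
```

In this toy setup the single-vector score works out to 1/sqrt(50), roughly 0.14, while the matching page scores 1.0. A retriever with a similarity threshold would plausibly discard the former and keep the latter.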

The Goal

Implement Chunking:

  1. Split the text into smaller segments (e.g., 100 characters).
  2. Add Overlap (e.g., 20 characters) to preserve context between chunks.
  3. Return the list of chunks.

Requirements:

  • chunk_size: 100 chars
  • overlap: 20 chars
  • Ensure no data is lost at boundaries.
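One possible way to meet these requirements is a fixed-size sliding window that advances by chunk_size minus overlap characters each step. This is a minimal sketch, not the challenge's reference solution: the function name chunk_text and its signature are assumptions, since the expected interface of solution.py is not shown here.

```python
from typing import List

def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into character chunks that share `overlap` characters.

    Hypothetical helper: name and defaults are assumptions for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window moves each iteration
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text, so the
        # tail is not emitted again as a redundant, fully overlapped chunk.
        if start + chunk_size >= len(text):
            break
    return chunks

# Usage: a 250-character sample yields windows starting at 0, 80 and 160.
sample = "".join(chr(ord("a") + i % 26) for i in range(250))
for piece in chunk_text(sample):
    print(len(piece), repr(piece[:20]))
```

Because each window starts exactly overlap characters before the previous one ends, the original text can be rebuilt by keeping the first chunk and dropping the first overlap characters of every later chunk, which is a quick way to check that nothing is lost at the boundaries. Splitting on raw character counts can still cut words in half. Sentence-aware or token-aware splitting is a common refinement but goes beyond what this task asks for.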