The Memory Layer
Difficulty: HARD
ID: ai-memory-layer
The Scenario
Your agent runs on a high-end model (Gemini 3.0) with a 1M+ token context window. Context overflow is no longer the problem—cost and reasoning quality are.
Every request costs $0.05 because you're sending 50,000 tokens of "filler" chat history. The model is starting to hallucinate because of noise in the history.
The Goal
Implement a Tiered Memory System that prioritizes Information Density:
- Extract Facts: Scan old messages for key entities (e.g., User's Name, Hobbies) and move them to a 'Fact Store'.
- KV Cache Alignment: Structure your prompt so the "System" and "Fact Store" blocks effectively use the provider's prompt caching.
- Recent Window: Keep only the last 10 messages as raw text.
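A minimal sketch of the extraction tier, assuming history messages are dicts with "role" and "content" keys. The extract_facts helper and its regex patterns are illustrative stand-ins, not part of the mission spec; a production system would likely use an LLM call or an NER pass instead:

```python
import re

def extract_facts(messages, fact_store):
    """Scan raw messages for key entities and persist them in fact_store.

    The two patterns below are hypothetical examples chosen to catch
    phrases like "my name is Alice" and "I love sailing"; real
    extraction would need a richer model.
    """
    patterns = {
        "name": re.compile(r"\bmy name is (\w+)", re.IGNORECASE),
        "hobby": re.compile(r"\bi (?:love|enjoy) (\w+)", re.IGNORECASE),
    }
    for msg in messages:
        text = msg.get("content", "")
        for key, pattern in patterns.items():
            match = pattern.search(text)
            if match:
                fact_store[key] = match.group(1)
    return fact_store
```

Because facts live in fact_store rather than in raw history, "Alice" and "sailing" survive even after the messages that introduced them fall out of the recent window.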
Requirements:
- Implement build_smart_context(system_prompt, history, user_input, fact_store).
- Store facts in the fact_store dict (passed as a parameter).
- Build context in order: System Prompt → Facts → Recent History → User Input.
- Ensure "Alice" and "sailing" are retrievable 100+ messages later.
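A minimal sketch of the assembler under the same assumptions, reusing the illustrative extract_facts helper above; RECENT_WINDOW and the block labels are arbitrary choices, not part of the spec:

```python
def build_smart_context(system_prompt, history, user_input, fact_store):
    """Assemble a token-lean prompt in the order:
    System Prompt -> Facts -> Recent History -> User Input.
    """
    RECENT_WINDOW = 10  # only these messages are kept as raw text

    # Tier 1: distill everything older than the window into the fact store.
    extract_facts(history[:-RECENT_WINDOW], fact_store)

    # Tier 2: render facts as a compact block; sorting keys keeps the
    # block byte-stable across requests, which helps prompt caching.
    fact_block = ""
    if fact_store:
        lines = "\n".join(f"- {k}: {v}" for k, v in sorted(fact_store.items()))
        fact_block = f"KNOWN FACTS:\n{lines}"

    # Tier 3: keep the last N messages verbatim.
    recent = "\n".join(
        f"{m.get('role', 'user')}: {m.get('content', '')}"
        for m in history[-RECENT_WINDOW:]
    )

    # Static, slow-changing blocks go first so the provider's KV/prefix
    # cache can reuse them; only the short tail changes per request.
    parts = [system_prompt, fact_block, recent, f"user: {user_input}"]
    return "\n\n".join(p for p in parts if p)
```

Putting the stable prefix first is what makes provider-side prompt caching pay off: the system prompt and fact block change rarely, so each request recomputes attention only over the fresh tail of the conversation.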