The Wallet Burner
Difficulty: HARD
ID: ai-semantic-cache
The Scenario
Your RAG application is becoming popular, but your model bill is exploding. Users ask the same questions repeatedly: "How do I reset my password?"
Current State: Every query goes straight to GPT-5.2 ($0.03/req). You are burning money on duplicate compute for identical questions.
The Goal
Implement a Tiered Cache Strategy:
- Exact Match (Tier 1): Check if query exists exactly in cache. (Free, 0ms)
- Semantic Match (Tier 2): If no exact match, embed the query and find nearest neighbor.
- Reranker Validation (Tier 3): If similarity > 0.88, use a Cross-Encoder to confirm intent matches.
- Hygiene (TTL): Data older than 30 days is considered 'stale' and must be ignored.
Helpers Provided:
- mock_embedder.embed(text): Returns a vector.
- mock_reranker.verify(q1, q2): Returns True if the intents are identical.
- cosine_similarity(v1, v2): Returns a value in [0.0, 1.0].
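A minimal sketch of how the tiers could fit together, assuming the helpers above and a simple in-memory dict as the store. The `SemanticCache` class, `SIMILARITY_THRESHOLD`, and `TTL_SECONDS` names are illustrative, and the inline `cosine_similarity` is only a stand-in for the provided helper:

```python
import math
import time

# Thresholds taken from the brief above; all names here are illustrative.
SIMILARITY_THRESHOLD = 0.88
TTL_SECONDS = 30 * 24 * 60 * 60  # 30 days


def cosine_similarity(v1, v2):
    # Stand-in for the provided helper of the same name.
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, embedder, reranker):
        self.embedder = embedder    # e.g. mock_embedder
        self.reranker = reranker    # e.g. mock_reranker
        self.store = {}             # query text -> {"vec", "answer", "ts"}

    def _fresh(self):
        # Hygiene (TTL): anything older than 30 days is stale and ignored.
        now = time.time()
        return {q: e for q, e in self.store.items() if now - e["ts"] <= TTL_SECONDS}

    def get(self, query):
        entries = self._fresh()

        # Tier 1: exact match -- free, no embedding call needed.
        if query in entries:
            return entries[query]["answer"]

        # Tier 2: embed the query and find its nearest cached neighbor.
        query_vec = self.embedder.embed(query)
        best_q, best_sim = None, 0.0
        for cached_q, entry in entries.items():
            sim = cosine_similarity(query_vec, entry["vec"])
            if sim > best_sim:
                best_q, best_sim = cached_q, sim

        # Tier 3: candidates above 0.88 must survive cross-encoder validation.
        if best_q is not None and best_sim > SIMILARITY_THRESHOLD:
            if self.reranker.verify(query, best_q):
                return entries[best_q]["answer"]

        return None  # cache miss: caller falls through to the LLM

    def put(self, query, answer):
        self.store[query] = {
            "vec": self.embedder.embed(query),
            "answer": answer,
            "ts": time.time(),
        }
```

Checking for an exact hit before embedding keeps Tier 1 genuinely free, and filtering by TTL before any tier means a stale entry can never shadow a fresh one.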