The Reasoning Buffer
Difficulty: MEDIUM
ID: ai-reasoning-heartbeat
The Scenario
You are integrating GPT-5.2 "Deep Think" (or o3-pro) into a customer dashboard. These models function in two phases:
- Thinking Phase: 30-120 seconds of silence while the model plans (no tokens generated).
- Generation Phase: Rapid token streaming.
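For concreteness, this silent-then-bursty behavior can be simulated with a mock async generator. This is a minimal sketch: the name `mock_llm_stream`, the token contents, and the timings are illustrative, not a real API.

```python
import asyncio
from typing import AsyncIterator

async def mock_llm_stream(thinking_seconds: float = 45.0) -> AsyncIterator[str]:
    """Simulate the two-phase behavior: a long silent planning phase,
    then a burst of rapidly streamed tokens. (Illustrative mock only.)"""
    await asyncio.sleep(thinking_seconds)  # Thinking Phase: zero tokens emitted
    for token in ["The ", "answer ", "is ", "42."]:
        yield token                        # Generation Phase: fast streaming
        await asyncio.sleep(0.02)
```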
The Crash:
Your API Gateway (AWS API Gateway / Vercel) has a hard 29-second timeout.
Because the Thinking Phase emits zero tokens for ~45 seconds, the gateway assumes the backend is dead and kills the connection with 504 Gateway Timeout.
The Goal
Implement a Heartbeat Generator that wraps the LLM stream:
- Race Condition: Concurrently await the next LLM token and a heartbeat timer; whichever completes first wins (see the sketch after this list).
- Heartbeat: If no token arrives within `heartbeat_interval` (e.g., 5s), yield a "processing" comment to keep the HTTP connection alive.
  - Example: `<!-- internal: thinking -->`, or an SSE comment (a line starting with `:`, which SSE clients ignore).
- Passthrough: Once the LLM starts generating, pass the tokens through immediately.
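A minimal asyncio sketch of this wrapper, assuming the upstream is an async iterator of token strings. The SSE-style heartbeat payload is one possible choice, not a required format:

```python
import asyncio
from typing import AsyncIterator

async def stream_with_heartbeat(
    llm_stream: AsyncIterator[str],
    heartbeat_interval: float = 5.0,
    heartbeat: str = ": thinking\n\n",  # SSE comment line; clients ignore it
) -> AsyncIterator[str]:
    """Yield tokens from llm_stream, inserting a heartbeat whenever the
    upstream is silent for longer than heartbeat_interval seconds."""
    iterator = llm_stream.__aiter__()
    # Pre-request the next token as a Task so we can race it against a timer.
    pending = asyncio.ensure_future(iterator.__anext__())
    while True:
        done, _ = await asyncio.wait({pending}, timeout=heartbeat_interval)
        if not done:
            # Upstream still silent: emit a keep-alive and keep waiting on
            # the SAME pending request (nothing is lost or buffered).
            yield heartbeat
            continue
        try:
            token = pending.result()
        except StopAsyncIteration:
            return  # upstream exhausted
        yield token  # Passthrough: forward real tokens immediately
        pending = asyncio.ensure_future(iterator.__anext__())
```

Using `asyncio.wait` with a timeout (rather than `asyncio.wait_for`) avoids cancelling the in-flight `__anext__` call on each heartbeat, so a token that arrives mid-timeout is never dropped.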
Requirements:
- Implement `stream_with_heartbeat(mock_llm_stream)`.
- If the stream is silent for > 5s, yield a heartbeat.
- Do not buffer the actual content; stream it as soon as it arrives.
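A small driver wiring the two sketches above together. With the illustrative 12-second thinking phase and the default 5-second interval, a client would see two heartbeats before the first real token:

```python
async def main() -> None:
    # Short thinking phase so the demo finishes quickly.
    async for chunk in stream_with_heartbeat(mock_llm_stream(thinking_seconds=12.0)):
        print(repr(chunk))  # heartbeats at ~5s and ~10s, then tokens

asyncio.run(main())
```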