The Reasoning Buffer

Difficulty: MEDIUM · ID: ai-reasoning-heartbeat

The Scenario

You are integrating GPT-5.2 "Deep Think" (or o3-pro) into a customer dashboard. These models function in two phases:

  1. Thinking Phase: 30-120 seconds of silence while the model plans (no tokens generated).
  2. Generation Phase: Rapid token streaming.

The Crash: Your API Gateway (AWS API Gateway / Vercel) has a hard 29-second timeout. Because the Thinking Phase emits zero tokens for ~45 seconds, the gateway assumes the backend is dead and kills the connection with a 504 Gateway Timeout.
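
For local testing, the two-phase behavior can be simulated with a mock stream. Below is a minimal sketch, assuming tokens arrive as plain strings; the thinking_seconds parameter is an illustrative addition so tests do not have to wait the full 45 seconds:

```python
import asyncio
from typing import AsyncIterator

async def mock_llm_stream(thinking_seconds: float = 45.0) -> AsyncIterator[str]:
    """Simulates both phases: a long silent pause, then rapid token streaming."""
    # Thinking Phase: total silence -- this is what trips the 29s gateway timeout.
    await asyncio.sleep(thinking_seconds)
    # Generation Phase: tokens stream out quickly.
    for token in ["The", " answer", " is", " 42", "."]:
        await asyncio.sleep(0.05)
        yield token
```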

The Goal

Implement a Heartbeat Generator that wraps the LLM stream:

  1. Race: Wait for the next token from the LLM, racing the read against a heartbeat timer.
  2. Heartbeat: If no token arrives within heartbeat_interval (e.g., 5s), yield a "processing" comment to keep the HTTP connection alive.
    • Example: <!-- internal: thinking --> or an SSE comment line (see the framing sketch after this list).
  3. Passthrough: Once the LLM starts generating, pass the tokens through immediately.
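
On the wire, the safest keep-alive is an SSE comment: per the SSE spec, any line beginning with a colon is ignored by EventSource clients, so it resets idle timers on gateways without ever reaching application code. A sketch of both framings (the constant names are illustrative):

```python
# SSE comment frame: EventSource clients silently discard lines starting with ":".
SSE_COMMENT_HEARTBEAT = ": internal: thinking\n\n"

# Alternative: a normal data frame carrying an HTML comment, which the
# frontend must then filter out itself.
HTML_COMMENT_HEARTBEAT = "data: <!-- internal: thinking -->\n\n"
```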

Requirements:

  • Implement stream_with_heartbeat(mock_llm_stream); a sketch is given after this list.
  • If the stream is silent for > 5s, yield a heartbeat.
  • Do not buffer the actual content; stream it as soon as it arrives.
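
Below is a minimal sketch of the wrapper, assuming an asyncio backend that serves the yielded strings as SSE frames. It uses asyncio.wait rather than asyncio.wait_for because asyncio.wait leaves the in-flight read running on timeout instead of cancelling it, so no token can be dropped:

```python
import asyncio
from typing import AsyncIterator

async def stream_with_heartbeat(
    llm_stream: AsyncIterator[str],
    heartbeat_interval: float = 5.0,
) -> AsyncIterator[str]:
    """Pass tokens through as they arrive; emit SSE comments during silences."""
    pending = asyncio.ensure_future(llm_stream.__anext__())
    while True:
        # Race the in-flight read against the heartbeat timer.
        done, _ = await asyncio.wait({pending}, timeout=heartbeat_interval)
        if not done:
            # Silence exceeded the interval: keep the connection alive.
            yield ": internal: thinking\n\n"
            continue
        try:
            token = pending.result()
        except StopAsyncIteration:
            break  # Upstream finished cleanly.
        yield token  # Passthrough: forward immediately, no buffering.
        pending = asyncio.ensure_future(llm_stream.__anext__())
```

A quick way to exercise it, using the mock stream from the scenario above (heartbeats should appear roughly every 5 seconds until tokens start):

```python
async def main() -> None:
    async for chunk in stream_with_heartbeat(mock_llm_stream(thinking_seconds=12)):
        print(repr(chunk))

asyncio.run(main())
```

One caveat this sketch ignores: if the consumer disconnects mid-stream, the pending read task is left running; a production version would cancel it in a finally block.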