The Reasoning Buffer
Difficulty: MEDIUM
ID: ai-reasoning-heartbeat
The Scenario
You are integrating GPT-5.2 "Deep Think" (or o3-pro) into a customer dashboard. These models function in two phases:
- Thinking Phase: 30-120 seconds of silence while the model plans (no tokens generated).
- Generation Phase: Rapid token streaming.
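For concreteness, this silent-then-bursty behavior can be simulated with a mock async generator. This is a minimal sketch: the name `mock_llm_stream`, the token contents, and the timings are illustrative, not a real API.

```python
import asyncio
from typing import AsyncIterator

async def mock_llm_stream(thinking_seconds: float = 45.0) -> AsyncIterator[str]:
    """Simulate the two-phase behavior: a long silent planning phase,
    then a burst of rapidly streamed tokens. (Illustrative mock only.)"""
    await asyncio.sleep(thinking_seconds)  # Thinking Phase: zero tokens emitted
    for token in ["The ", "answer ", "is ", "42."]:
        yield token                        # Generation Phase: fast streaming
        await asyncio.sleep(0.02)
```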
The Crash:
Your API Gateway (AWS API Gateway / Vercel) has a hard 29-second timeout.
Because the Thinking Phase emits zero tokens for ~45 seconds, the gateway assumes the backend is dead and kills the connection with 504 Gateway Timeout.
The Goal
Implement a Heartbeat Generator that wraps the LLM stream:
- Race Condition: Concurrently await the next LLM token and a heartbeat timer; whichever completes first wins (see the sketch after this list).
- Heartbeat: If no token arrives within `heartbeat_interval` (e.g., 5s), yield a "processing" comment to keep the HTTP connection alive.
  - Example: `<!-- internal: thinking -->`, or an SSE comment (a line starting with `:`, which SSE clients ignore).
- Passthrough: Once the LLM starts generating, pass the tokens through immediately.
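A minimal asyncio sketch of this wrapper, assuming the upstream is an async iterator of token strings. The SSE-style heartbeat payload is one possible choice, not a required format:

```python
import asyncio
from typing import AsyncIterator

async def stream_with_heartbeat(
    llm_stream: AsyncIterator[str],
    heartbeat_interval: float = 5.0,
    heartbeat: str = ": thinking\n\n",  # SSE comment line; clients ignore it
) -> AsyncIterator[str]:
    """Yield tokens from llm_stream, inserting a heartbeat whenever the
    upstream is silent for longer than heartbeat_interval seconds."""
    iterator = llm_stream.__aiter__()
    # Pre-request the next token as a Task so we can race it against a timer.
    pending = asyncio.ensure_future(iterator.__anext__())
    while True:
        done, _ = await asyncio.wait({pending}, timeout=heartbeat_interval)
        if not done:
            # Upstream still silent: emit a keep-alive and keep waiting on
            # the SAME pending request (nothing is lost or buffered).
            yield heartbeat
            continue
        try:
            token = pending.result()
        except StopAsyncIteration:
            return  # upstream exhausted
        yield token  # Passthrough: forward real tokens immediately
        pending = asyncio.ensure_future(iterator.__anext__())
```

Using `asyncio.wait` with a timeout (rather than `asyncio.wait_for`) avoids cancelling the in-flight `__anext__` call on each heartbeat, so a token that arrives mid-timeout is never dropped.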
Requirements:
- Implement `stream_with_heartbeat(mock_llm_stream)`.
- If the stream is silent for > 5s, yield a heartbeat.
- Do not buffer the actual content; stream it as soon as it arrives.
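A small driver wiring the two sketches above together. With the illustrative 12-second thinking phase and the default 5-second interval, a client would see two heartbeats before the first real token:

```python
async def main() -> None:
    # Short thinking phase so the demo finishes quickly.
    async for chunk in stream_with_heartbeat(mock_llm_stream(thinking_seconds=12.0)):
        print(repr(chunk))  # heartbeats at ~5s and ~10s, then tokens

asyncio.run(main())
```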