The Hanging Stream

Difficulty: HARD
ID: ai-stream-timeout

The Scenario

Your reverse proxy has a 10-second timeout. If no bytes arrive within that window, it kills the connection with a 504 Gateway Timeout.

For complex prompts, GPT-4 can take 30 seconds to "think" before generating the first token.

Result: Your users see timeout errors, even though the LLM is working.

The Problem

Your code makes a synchronous, blocking call:

import openai

def get_completion(prompt):
    response = openai.chat.completions.create(...)  # model and messages built from prompt
    return response.choices[0].message.content  # Blocks for ~30s until the full reply exists

The HTTP request hangs. The gateway times out. Users leave.
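
To make the failure concrete, here is a minimal sketch of serving get_completion behind a route (Flask is an assumption here; the mission does not name a framework). No bytes leave the process until the full completion returns, so a proxy with a 10-second read timeout gives up long before the roughly 30-second response is ready.

from flask import Flask

app = Flask(__name__)

@app.route("/complete")
def complete():
    # Flask cannot send a single byte until get_completion() returns,
    # roughly 30 seconds later; the proxy's 10-second timer fires first.
    return get_completion("Summarize this contract in plain English")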

The Goal

Refactor to a generator that streams tokens as they arrive:

def stream_completion(prompt):
    # Same arguments as before (model, messages built from prompt), plus stream=True
    for chunk in openai.chat.completions.create(..., stream=True):
        yield chunk  # Send each chunk downstream immediately
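
Filled in with real arguments and wired into a Flask route, a complete version might look like the sketch below. This is an illustration under assumptions (Flask, the gpt-4 model name, a plain-text response), not the mission's reference solution: returning a generator-backed response means bytes leave the server as soon as the first chunk arrives.

import openai
from flask import Flask, Response

app = Flask(__name__)

def stream_completion(prompt):
    stream = openai.chat.completions.create(
        model="gpt-4",  # illustrative; use whatever model the mission expects
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        # Each chunk carries a small delta of the reply; content can be None
        # on bookkeeping chunks such as the initial role-only delta.
        text = chunk.choices[0].delta.content
        if text:
            yield text

@app.route("/complete")
def complete():
    # Returning a generator-backed Response makes Flask send a chunked reply,
    # so the proxy sees bytes within moments and its 10-second timer never fires.
    return Response(stream_completion("Explain Python generators"), mimetype="text/plain")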

Requirements:

  • Use yield to return chunks incrementally
  • The test will verify that the first chunk is yielded without waiting for the full response (see the sketch after this list)
  • Must handle streaming iterators
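
The hidden test likely drives your generator with a fake streaming iterator rather than a live API call. A sketch of that idea, where every name is hypothetical:

import time

def fake_stream():
    # Stand-in for the OpenAI iterator: the first chunk is available
    # immediately, the rest trickle in slowly.
    yield "Hello"
    time.sleep(2)
    yield " world"

def relay(chunks):
    # Hypothetical generator under test; it must forward each chunk on arrival.
    for chunk in chunks:
        yield chunk

def test_first_chunk_arrives_immediately():
    start = time.monotonic()
    first = next(relay(fake_stream()))
    assert first == "Hello"
    assert time.monotonic() - start < 0.1  # never waited for the full stream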

Note: This is how ChatGPT delivers responses word by word in real time.
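
For example, consuming the stream_completion sketch above (assuming it yields text deltas) prints the reply as it is generated:

for text in stream_completion("Write a haiku about timeouts"):
    print(text, end="", flush=True)  # tokens appear the moment they arrive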
