The Scenario

Your service just crashed. Within seconds, 10,000 clients detect the failure and start retrying. Because they all retry at the same moments, they overwhelm the service as soon as it tries to recover. This is a retry storm: a cascading failure caused by synchronized retries.

The Problem

Your current retry logic uses exponential backoff, but the delays are deterministic. When 10,000 clients all fail at the same time, they all calculate the same backoff schedule (1s, 2s, 4s...). Every retry round therefore arrives as a single synchronized wave, a thundering herd that prevents recovery.
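To make the synchronization concrete, here is a minimal sketch of deterministic backoff. The function name fixed_backoff and the 1-second base are assumptions made for illustration, not code from the existing service.

```python
def fixed_backoff(attempt: int, base: float = 1.0) -> float:
    """Deterministic exponential backoff: 1s, 2s, 4s, ... (no jitter)."""
    return base * (2 ** attempt)


# Every client that failed at the same moment computes the same schedule.
client_a = [fixed_backoff(n) for n in range(4)]
client_b = [fixed_backoff(n) for n in range(4)]

print(client_a)              # [1.0, 2.0, 4.0, 8.0]
print(client_a == client_b)  # True
```

Nothing in the calculation depends on the individual client, so nothing separates their retries in time.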

The Goal

Implement Full Jitter - randomized exponential backoff that spreads retries across time.

Instead of waiting exactly 4 seconds, each client waits a random time between 0 and 4 seconds. This breaks the synchronization and gives the recovering service breathing room.
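The effect can be checked with a small simulation. This is only a sketch: the constant names, the fixed seed, and the one-second bucketing are assumptions chosen for illustration.

```python
import random
from collections import Counter

NUM_CLIENTS = 10_000   # client count from the scenario
BACKOFF = 4.0          # the 4-second example delay

rng = random.Random(42)  # seeded only so the sketch is reproducible

# Each client picks its own delay uniformly in [0, BACKOFF].
delays = [rng.uniform(0, BACKOFF) for _ in range(NUM_CLIENTS)]

# Count how many retries land in each 1-second window
# (a delay of exactly 4.0 is folded into the last window).
buckets = Counter(min(int(d), 3) for d in delays)
for second in range(4):
    print(f"{second}s-{second + 1}s: {buckets[second]} retries")
```

Roughly a quarter of the clients should land in each one-second window, rather than all 10,000 arriving together at the 4-second mark.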

Requirements:

Use random.uniform(0, backoff) to add jitter
Exponential backoff: base delay doubles each retry (1s, 2s, 4s, 8s...)
Maximum retry attempts: 5
Cap maximum backoff at 30 seconds
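One possible way to meet these requirements is sketched below. The names full_jitter_delay and call_with_retries, the injectable sleep parameter, and the broad except Exception are assumptions for illustration. The sketch also reads "maximum retry attempts: 5" as one initial call followed by up to 5 retries; if the intent is 5 calls in total, the loop bound would need adjusting.

```python
import random
import time

MAX_ATTEMPTS = 5      # maximum retry attempts (after the initial call)
BASE_DELAY = 1.0      # seconds; the backoff ceiling doubles each retry
MAX_BACKOFF = 30.0    # cap on the backoff ceiling, in seconds


def full_jitter_delay(retry: int) -> float:
    """Return a random delay in [0, min(MAX_BACKOFF, BASE_DELAY * 2**retry)]."""
    backoff = min(MAX_BACKOFF, BASE_DELAY * (2 ** retry))
    return random.uniform(0, backoff)


def call_with_retries(operation, sleep=time.sleep):
    """Call operation(); on failure, retry up to MAX_ATTEMPTS times with full jitter."""
    for retry in range(MAX_ATTEMPTS + 1):
        try:
            return operation()
        except Exception:  # assumption: any exception counts as retryable
            if retry == MAX_ATTEMPTS:
                raise  # retries exhausted; surface the last error
            sleep(full_jitter_delay(retry))
```

With 5 retries the backoff ceilings are 1s, 2s, 4s, 8s, and 16s, so the 30-second cap is never reached here; it only takes effect if the retry limit is raised. Passing sleep as a parameter lets a test substitute a recorder for time.sleep and run without real delays.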
