In distributed systems, average response times may be excellent, yet a small percentage of requests can take significantly longer due to temporary server congestion, network delays, or other transient issues.
To minimize the impact of these slow requests, many large-scale systems employ a technique known as Request Hedging.
Request Hedging is a latency optimization technique in which an additional copy of a request is sent if the original request does not receive a response within a predefined time threshold.
The system uses the first successful response it receives and cancels any remaining duplicate requests.
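The behaviour described above can be sketched with Python's asyncio. This is a minimal illustration rather than a production client: `hedged_call`, `replica_a`, `replica_b`, and the `hedge_after` parameter are hypothetical names, the replicas are simulated with sleeps, and the sketch assumes the request is safe to send twice.

```python
import asyncio
from typing import Awaitable, Callable, Optional, TypeVar

T = TypeVar("T")


async def hedged_call(
    primary: Callable[[], Awaitable[T]],
    backup: Callable[[], Awaitable[T]],
    hedge_after: float,
) -> T:
    """Return the first successful result, hedging if the primary is slow."""
    pending = {asyncio.ensure_future(primary())}
    hedged = False
    last_error: Optional[BaseException] = None
    try:
        while pending:
            # Only the wait before hedging is bounded by the threshold.
            timeout = None if hedged else hedge_after
            done, pending = await asyncio.wait(
                pending, timeout=timeout, return_when=asyncio.FIRST_COMPLETED
            )
            for task in done:
                if task.exception() is None:
                    return task.result()
                last_error = task.exception()
            if not hedged:
                # Threshold elapsed (or the primary failed early, which this
                # sketch also treats as a reason to send the duplicate).
                pending.add(asyncio.ensure_future(backup()))
                hedged = True
        # Every request failed: surface the most recent error.
        assert last_error is not None
        raise last_error
    finally:
        # Cancel whichever duplicate is still in flight.
        for task in pending:
            task.cancel()


async def replica_a() -> str:
    await asyncio.sleep(0.5)  # simulated temporarily congested server
    return "replica-a"


async def replica_b() -> str:
    await asyncio.sleep(0.01)  # simulated healthy server
    return "replica-b"


if __name__ == "__main__":
    print(asyncio.run(hedged_call(replica_a, replica_b, hedge_after=0.05)))
```

In this simulation the primary is still sleeping when the 50 ms threshold elapses, so the duplicate is sent to the second replica, its response is used, and the slow original is cancelled.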
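Consider the following scenario: a client sends a request to one server and, when no response arrives within the time threshold, sends a copy of the request to a second server. Whichever server responds successfully first supplies the result, and the other request is cancelled.
Request Hedging minimizes the impact of unusually slow requests caused by temporary performance issues.
It reduces the likelihood of users experiencing long waiting times.
It provides a faster alternative when one server is temporarily overloaded or experiencing performance degradation.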
Request Hedging is commonly used in large-scale, latency-sensitive services.
Duplicate requests consume additional network and server resources, so the number of hedged requests must be carefully controlled.
Launching the duplicate request too early wastes resources, while launching it too late reduces its effectiveness in lowering latency.
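One way to reason about this trade-off is to derive the threshold from recently observed latencies. The sketch below is an assumption for illustration, not a rule stated in this article: it takes a high percentile (95 by default) as the threshold and reports what share of requests would then trigger a duplicate. The function names and the sample data are hypothetical.

```python
import math
from typing import Sequence


def hedge_threshold(latencies_ms: Sequence[float], percentile: float = 95.0) -> float:
    """Nearest-rank percentile of observed latencies, used as the hedge delay."""
    if not latencies_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_ms)
    rank = math.ceil(percentile * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def hedged_fraction(latencies_ms: Sequence[float], threshold_ms: float) -> float:
    """Share of requests slower than the threshold, i.e. those that get hedged."""
    slower = sum(1 for value in latencies_ms if value > threshold_ms)
    return slower / len(latencies_ms)


if __name__ == "__main__":
    # Made-up sample: mostly fast responses plus a few slow outliers.
    samples = [10.0] * 90 + [12.0] * 5 + [200.0] * 5
    threshold = hedge_threshold(samples)
    print(threshold, hedged_fraction(samples, threshold))
```

A lower percentile launches duplicates sooner and for more requests, which costs more resources; a higher percentile hedges fewer requests but leaves more slow responses untouched.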
Although Request Hedging and Retry both improve reliability, they operate differently.
Retry is triggered after a request has failed or timed out.
Request Hedging sends a duplicate request before the original request is considered failed, with the goal of reducing response latency rather than recovering from failure.
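The difference can be made concrete with two small asyncio helpers. Both are simplified sketches with hypothetical names (`with_retry`, `with_hedge`): the retry helper abandons an attempt before starting the next one, while the hedge helper leaves the original in flight and races a duplicate against it. For brevity the hedge helper returns whichever request finishes first and does not wait for the other if that one fails.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_retry(call: Callable[[], Awaitable[T]], timeout: float, attempts: int = 2) -> T:
    """Retry: an attempt must fail or time out before the next one starts."""
    for attempt in range(attempts):
        try:
            # wait_for cancels the attempt when the timeout expires.
            return await asyncio.wait_for(call(), timeout)
        except (asyncio.TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise
    raise ValueError("attempts must be at least 1")


async def with_hedge(call: Callable[[], Awaitable[T]], hedge_after: float) -> T:
    """Hedge: the original keeps running while a duplicate races it."""
    first = asyncio.ensure_future(call())
    done, _ = await asyncio.wait({first}, timeout=hedge_after)
    if done:
        return first.result()
    # The original is slow but not failed: send a duplicate alongside it.
    second = asyncio.ensure_future(call())
    done, pending = await asyncio.wait(
        {first, second}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result()
```

If the original request would have completed shortly after the threshold, the hedge helper still returns its response, whereas the retry helper has already discarded it and must wait for a fresh attempt.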
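To implement Request Hedging effectively, choose the time threshold carefully, limit the number of duplicate requests, and cancel outstanding duplicates as soon as a successful response arrives.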
Request Hedging can slightly increase system load because duplicate requests are generated. However, when properly configured, the improvement in response latency often outweighs the additional overhead.
Request Hedging is not needed for every service. It is most beneficial for latency-sensitive services where fast and predictable response times are critical.

Request Hedging is an advanced optimization technique that reduces the impact of slow requests in distributed systems. By issuing backup requests before the original request is declared failed, organizations can significantly reduce tail latency, improve user experience, and enhance the reliability of high-performance services.