Request Retrial
In this pattern, we'll explore how API services can define a retry policy that covers two things: which requests are eligible to be retried, and the timing algorithm that determines how long to wait before retrying.
Implementation
Retry eligibility
GENERALLY RETRIABLE
Code | Name | Description |
408 | Request Timeout | The client didn't produce a request fast enough. |
421 | Misdirected Request | The request was sent to a server that couldn't handle it. |
425 | Too Early | The server doesn't want to try handling a request that might be replayed. |
429 | Too Many Requests | The client has sent too many requests in a given period of time. |
503 | Service Unavailable | The server cannot handle the request because it's overloaded. |
DEFINITELY NOT RETRIABLE
Code | Name | Description |
403 | Forbidden | The request was fine, but the server is refusing to handle it. |
405 | Method Not Allowed | The method specified is not allowed. |
412 | Precondition Failed | The server does not meet the conditions of the request. |
501 | Not Implemented | The server cannot recognize or handle the request. |
MAYBE RETRIABLE
Code | Name | Description |
500 | Internal Server Error | An unexpected failure occurred on the server. |
502 | Bad Gateway | The request was passed to a downstream server that sent an invalid response. |
504 | Gateway Timeout | The request was passed to a downstream server that never replied. |
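The three tables above can be folded into a small lookup. What follows is a minimal sketch rather than a definitive policy. The names `classifyStatus` and `shouldRetry` are hypothetical, and two choices are assumptions that the tables do not state: codes absent from the tables are treated as not retriable, and the "maybe" codes are retried only when the caller declares the request idempotent.

```typescript
// Hypothetical helper names; the code sets mirror the three tables above.
type Retriability = "retriable" | "not-retriable" | "maybe" | "unknown";

const GENERALLY_RETRIABLE = new Set<number>([408, 421, 425, 429, 503]);
const DEFINITELY_NOT_RETRIABLE = new Set<number>([403, 405, 412, 501]);
const MAYBE_RETRIABLE = new Set<number>([500, 502, 504]);

function classifyStatus(status: number): Retriability {
  if (GENERALLY_RETRIABLE.has(status)) return "retriable";
  if (DEFINITELY_NOT_RETRIABLE.has(status)) return "not-retriable";
  if (MAYBE_RETRIABLE.has(status)) return "maybe";
  return "unknown";
}

// Assumption: "maybe" codes are retried only for idempotent requests, because
// the server may already have done part of the work. Codes that appear in
// none of the tables are conservatively not retried.
function shouldRetry(status: number, isIdempotent: boolean): boolean {
  const kind = classifyStatus(status);
  if (kind === "retriable") return true;
  if (kind === "maybe") return isIdempotent;
  return false;
}
```

For example, `shouldRetry(429, false)` is true regardless of idempotency, while `shouldRetry(500, false)` is false under the assumption above.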
API definition
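// Assumes GetChatRoom rejects with an error that exposes the HTTP response
// as e.response. Header-key casing depends on the HTTP client in use.
async function getChatRoomWithRetries(
    id: string, maxDelayMs = 32000, maxRetries = 10): Promise<ChatRoom> {
  let retryCount = 0;
  let delayMs = 1000;
  while (true) {
    try {
      // Await inside the try block so that a rejection reaches the catch.
      return await GetChatRoom({ id });
    } catch (e: any) {
      // Give up once the retry budget is spent.
      if (retryCount++ >= maxRetries) throw e;
      const retryAfter = Number(e?.response?.headers?.['Retry-After']);
      // Prefer the server's Retry-After hint (in seconds); otherwise use
      // exponential back-off plus up to one second of random jitter.
      const actualDelayMs = Number.isFinite(retryAfter)
          ? retryAfter * 1000
          : delayMs + Math.random() * 1000;
      await new Promise((resolve) => setTimeout(resolve, actualDelayMs));
      // Double the base delay for the next attempt, up to the cap.
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    }
  }
}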
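The delay computation inside the retry loop can also be isolated into a pure function, which makes the resulting schedule easy to inspect. This is a sketch under the same defaults as the function above (1-second base, doubling, 32-second cap, up to 1 second of jitter). The name `backoffDelayMs` and its parameters are hypothetical, and the `random` parameter is only there so the jitter can be controlled.

```typescript
// Hypothetical helper: delay before retry number `attempt` (0-based).
// `random` is injectable so the schedule can be inspected deterministically.
function backoffDelayMs(
  attempt: number,
  baseMs = 1000,
  maxDelayMs = 32000,
  jitterMs = 1000,
  random: () => number = Math.random,
): number {
  // Exponential part: base * 2^attempt, capped at maxDelayMs.
  const exponential = Math.min(baseMs * 2 ** attempt, maxDelayMs);
  // Jitter is added on top of the capped value, as in the loop above.
  return exponential + random() * jitterMs;
}

// Schedule with jitter disabled, to show only the exponential part.
const schedule = Array.from(
  { length: 7 },
  (_, attempt) => backoffDelayMs(attempt, 1000, 32000, 0),
);
// schedule is [1000, 2000, 4000, 8000, 16000, 32000, 32000]
```

With the default jitter enabled, each value grows by a random amount below one second, so clients that failed at the same moment spread their retries out instead of retrying in lockstep.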
Exercises
- Why isn't there a simple rule for deciding which failed requests can safely be retried?
Because whether a retry is safe depends on the cause of the failure: some failures are client-side errors, whereas others are server-side errors.
- What is the underlying reason for relying on exponential back-off? What is the purpose for the random jitter between retries?
Exponential back-off is the sensible default when we know nothing else about the system.
Random jitter prevents a stampeding herd, where many clients retry at the same moment.
- When does it make sense to use the Retry-After header?
When the service controls, and therefore knows, when the next request will be allowed.
Summary
- Errors that are in some way transient or time related (e.g., HTTP 429 Too Many Requests) are likely to be retriable, whereas those that are related to some permanent state (e.g., HTTP 403 Forbidden) are mostly unsafe to retry.
- Whenever code automatically retries requests, it should rely on some form of exponential back-off with limits on both the number of retries and the delay between requests. Ideally it should also introduce some jitter to avoid the stampeding herd problem, where all requests are retried according to the same rules and therefore always arrive at the same time.
- If the API service knows something about when a request is likely to be successful if retried, it should indicate this using a Retry-After HTTP header.
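
On the client side, the Retry-After value from the last point can arrive in two forms under the HTTP specification: a number of seconds or an HTTP-date. The following is a minimal sketch of handling both. The name `parseRetryAfterMs` is hypothetical, and the `nowMs` parameter is an assumption added so the current time can be supplied explicitly.

```typescript
// Hypothetical helper: converts a Retry-After header value to a delay in ms.
// Returns undefined when the value matches neither form, so the caller can
// fall back to exponential back-off.
function parseRetryAfterMs(
  value: string,
  nowMs: number = Date.now(),
): number | undefined {
  const trimmed = value.trim();
  // Form 1: delay-seconds, a non-negative integer.
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP-date such as "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (Number.isNaN(dateMs)) return undefined;
  // A date in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

A caller would use the returned delay when it is defined and otherwise fall back to its own back-off schedule.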