Request Retrial

In this pattern, we'll explore how API services can define a retry policy that determines both which requests are eligible to be retried and how long a client should wait before retrying them.

Implementation

Retry eligibility

GENERALLY RETRIABLE

Code  Name                 Description
408   Request Timeout      The client didn't produce a request fast enough.
421   Misdirected Request  The request was sent to a server that couldn't handle it.
425   Too Early            The server doesn't want to try handling a request that might be replayed.
429   Too Many Requests    The client has sent too many requests in a given period of time.
503   Service Unavailable  The server cannot handle the request because it's overloaded.

DEFINITELY NOT RETRIABLE

Code  Name                 Description
403   Forbidden            The request was fine, but the server is refusing to handle it.
405   Method Not Allowed   The method specified is not allowed.
412   Precondition Failed  The server does not meet the conditions of the request.
501   Not Implemented      The server cannot recognize or handle the request.

MAYBE RETRIABLE

Code  Name                   Description
500   Internal Server Error  An unexpected failure occurred on the server.
502   Bad Gateway            The request was passed to a downstream server that sent an invalid response.
504   Gateway Timeout        The request was passed to a downstream server that never replied.
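
The three groups above can be expressed as a simple lookup. This is only a minimal sketch: the names `Retriability`, `RETRIABILITY`, and `classifyStatus` are hypothetical and not part of any API, and status codes that appear in none of the tables are reported as 'unknown' rather than guessed at.

```typescript
// Hypothetical names for illustration; the codes come from the tables above.
type Retriability = 'retriable' | 'not-retriable' | 'maybe' | 'unknown';

const RETRIABILITY: { [status: number]: Retriability | undefined } = {
  // Generally retriable
  408: 'retriable', 421: 'retriable', 425: 'retriable',
  429: 'retriable', 503: 'retriable',
  // Definitely not retriable
  403: 'not-retriable', 405: 'not-retriable',
  412: 'not-retriable', 501: 'not-retriable',
  // Maybe retriable (depends on the request and the service)
  500: 'maybe', 502: 'maybe', 504: 'maybe',
};

function classifyStatus(status: number): Retriability {
  const found = RETRIABILITY[status];
  // Codes not listed in any table are left unclassified.
  return found === undefined ? 'unknown' : found;
}
```

A client could call `classifyStatus(429)` before deciding whether to schedule a retry at all, and treat the 'maybe' group conservatively unless it knows the request is safe to repeat.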

API definition

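// Retries GetChatRoom using exponential back-off with jitter,
// honoring a Retry-After header when the error response carries one.
async function getChatRoomWithRetries(
  id: string, maxDelayMs = 32000, maxRetries = 10): Promise<ChatRoom> {
  let retryCount = 0;
  let delayMs = 1000;
  while (true) {
    try {
      // Await here so that a rejection is caught below instead of escaping.
      return await GetChatRoom({ id });
    } catch (e: any) {
      // Give up once maxRetries retries have been attempted.
      if (retryCount++ >= maxRetries) throw e;
      // Prefer the server's Retry-After value (in seconds) when present;
      // a missing or non-numeric value falls back to back-off plus jitter.
      const retryAfter = Number(e?.response?.headers?.['Retry-After']);
      const actualDelayMs = Number.isFinite(retryAfter)
        ? retryAfter * 1000
        : delayMs + Math.random() * 1000;
      await new Promise((resolve) => setTimeout(resolve, actualDelayMs));
      // Double the base delay for the next attempt, capped at maxDelayMs.
      delayMs = Math.min(delayMs * 2, maxDelayMs);
    }
  }
}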
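
To make the timing easier to inspect, the delay calculation from the loop above can be pulled out into a pure function. This is a sketch under assumptions: `backoffDelayMs` and its parameters are hypothetical names, and the `random` argument exists only so the jitter can be replaced in tests.

```typescript
// Hypothetical helper: the wait before retry number `attempt` (0-based),
// following the same doubling, cap, and jitter as the loop above.
function backoffDelayMs(
  attempt: number,
  baseDelayMs = 1000,
  maxDelayMs = 32000,
  maxJitterMs = 1000,
  random: () => number = Math.random,
): number {
  // The base delay doubles with each attempt and is capped at maxDelayMs.
  const cappedMs = Math.min(baseDelayMs * 2 ** attempt, maxDelayMs);
  // Jitter is added on top of the capped delay to spread retries out.
  return cappedMs + random() * maxJitterMs;
}
```

With the jitter disabled (`random` returning 0) and the default parameters, attempts 0 through 6 wait 1000, 2000, 4000, 8000, 16000, 32000, and 32000 ms: the delay doubles until it reaches the cap and then stays there.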

Exercises

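  1. Why isn't there a simple rule for deciding which failed requests can safely be retried?
Failures have different causes: some are client-side errors that will fail again no matter how often they are retried, whereas others are server-side errors that may or may not be transient.
  2. What is the underlying reason for relying on exponential back-off? What is the purpose of the random jitter between retries?
Exponential back-off is a reasonable default when we know nothing else about the system, such as when it is likely to recover.
Jitter prevents a stampeding herd, where many clients retry on the same schedule and arrive at the same time.
  3. When does it make sense to use the Retry-After header?
When the service is in control of when the next request is allowed and can therefore tell the client how long to wait.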

Summary

  • Errors that are in some way transient or time related (e.g., HTTP 429 Too Many Requests) are likely to be retriable, whereas those that are related to some permanent state (e.g., HTTP 403 Forbidden) are mostly unsafe to retry.

  • Whenever code automatically retries requests, it should rely on some form of exponential back-off with limits on the number of retries and the delay between requests. Ideally it should also introduce some jitter to avoid the stampeding herd problem where all requests are retried according to the same rules and therefore always arrive at the same time.

  • If the API service knows something about when a request is likely to be successful if retried, it should indicate this using a Retry-After HTTP header.
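
On the client side, a Retry-After value may be either a number of seconds or an HTTP-date, so a client reading the header might normalize both forms to a delay in milliseconds. The following is a minimal sketch; `retryAfterToMs` is a hypothetical helper, and `nowMs` is a parameter only so the date form can be tested deterministically.

```typescript
// Hypothetical helper: converts a Retry-After header value to milliseconds.
// Returns undefined when the value is missing or cannot be parsed.
function retryAfterToMs(
  value: string | undefined,
  nowMs: number = Date.now(),
): number | undefined {
  if (value === undefined) return undefined;
  const trimmed = value.trim();
  // Form 1: a non-negative integer number of seconds, e.g. "120".
  if (/^\d+$/.test(trimmed)) return Number(trimmed) * 1000;
  // Form 2: an HTTP-date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT".
  const dateMs = Date.parse(trimmed);
  if (isNaN(dateMs)) return undefined;
  // A date in the past means the client may retry immediately.
  return Math.max(0, dateMs - nowMs);
}
```

When this returns undefined, a caller would fall back to its own exponential back-off with jitter rather than retrying immediately.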