Overload Control for Scaling WeChat Microservices || Tushar Tyagi

Overload Control for Scaling WeChat Microservices

I read about this paper on The Morning Paper, and the summary was interesting enough that I wanted to read the paper myself. The WeChat engineering team runs around 3000 microservices on close to 20000 machines, and the system keeps working under surges of overload. When a large chunk of 1.3 billion people use your application at the same time, the surge it produces is going to be phenomenal.

Overload Control

The paper first describes overload control, the process by which a microservice-based system copes with the overload it is experiencing. It is required so that the services stay available 24x7.

For a simple stand-alone service, overload control is delegated to the operating system. For a multi-tier architecture, there’s a gateway service which rejects requests under load instead of sending them to the application layer.

For microservices, the overload control process is challenging, because:

Since a system can have hundreds to thousands of microservices, monitoring each one for overload control is resource intensive.
If a microservice dies under heavy load, all the services that depend on it suffer increased latency as well.
Overload control itself needs to keep working across different environments and under varying load.

Service Types

The paper describes the system as essentially a directed acyclic graph (DAG) containing two types of services:

Basic: These do not depend on any other services.
Leap: These depend on other services.

In addition, Leap services come in two kinds. Entry Leap services have no incoming edges, only outgoing ones. User requests start at these services and move downstream to other services, e.g. Login.

Shared Leap services have a non-zero indegree and are used by multiple other services as intermediaries.

So the user requests start from Entry Leap, move through Shared Leap and end at Basic.
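The classification above can be sketched in a few lines of Python. This is only an illustration: the call graph, the service names other than Login, and the function name `classify_services` are made up for the example and do not come from the paper.

```python
from collections import defaultdict

def classify_services(calls):
    """Classify services in a call graph.

    calls maps a service name to the list of services it invokes.
    """
    indegree = defaultdict(int)
    services = set(calls)
    for callees in calls.values():
        for callee in callees:
            services.add(callee)
            indegree[callee] += 1

    kinds = {}
    for name in services:
        if not calls.get(name):
            kinds[name] = "basic"        # invokes no other service
        elif indegree[name] == 0:
            kinds[name] = "entry leap"   # nothing invokes it, user requests start here
        else:
            kinds[name] = "shared leap"  # invoked by others and invokes others
    return kinds

# Hypothetical call graph, for illustration only.
calls = {
    "Login": ["Session"],
    "Chat": ["Session", "Storage"],
    "Session": ["AccountDB"],
}
for name, kind in sorted(classify_services(calls).items()):
    print(name, "->", kind)
```

In this made-up graph, Login and Chat are entry leap services, Session is a shared leap service, and AccountDB and Storage are basic services.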

Overload Scenarios

The paper discusses three overload scenarios encountered in the wild:

An upstream service sends a request to a downstream service. The downstream service is overloaded by requests, crashes, and affects the upstream services as well.
An upstream service sends more than one request to a single downstream service.
An upstream service sends more than one request to more than one downstream service.

Of these, the last two are encountered only in a microservice architecture, and this scenario is called Subsequent Overload. The first one is also present in N-tier and monolithic architectures.
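A small calculation shows why subsequent overload hurts more than simple overload. The sketch below assumes that an overloaded downstream service admits each call independently with the same probability and that the upstream request succeeds only if every call is admitted. That model and the function name are my own simplification, not something this summary takes from the paper.

```python
def upstream_success_rate(admit_prob, calls_per_request):
    """Chance that an upstream request succeeds when it needs
    calls_per_request downstream calls, each admitted independently
    with probability admit_prob (assumed model)."""
    return admit_prob ** calls_per_request

# One call to a service that sheds half its load: half the requests succeed.
print(upstream_success_rate(0.5, 1))  # 0.5
# Two calls to the same service: only a quarter succeed.
print(upstream_success_rate(0.5, 2))  # 0.25
```

The work done for the first call is wasted whenever the second call is rejected, which is why random shedding at each service in isolation performs badly in the last two scenarios.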

Overload Detection

Overload is detected from the queuing time of a request, which is the time the request waits on the server before processing starts. This is different from the turnaround time, which also includes the time taken by downstream services. That makes turnaround time a poor indicator of the load on the server itself, and it also varies throughout the day with the load.
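As a rough sketch of this idea, the monitor below averages queuing time over a fixed number of requests and flags overload when the average crosses a threshold. The class name, the threshold of 20 ms, and the window size are illustrative assumptions for this example, not values confirmed by this summary.

```python
from dataclasses import dataclass, field

@dataclass
class QueuingTimeMonitor:
    threshold_ms: float = 20.0  # assumed threshold, for illustration
    window_size: int = 5        # assumed number of requests per window
    _samples: list = field(default_factory=list)

    def record(self, enqueued_at_ms, started_at_ms):
        """Record one request.

        Returns None while the window is still filling, otherwise
        True (overloaded) or False for the window that just closed.
        """
        # Queuing time only: waiting before processing starts,
        # not the time spent processing or calling other services.
        self._samples.append(started_at_ms - enqueued_at_ms)
        if len(self._samples) < self.window_size:
            return None
        average = sum(self._samples) / len(self._samples)
        self._samples.clear()
        return average > self.threshold_ms

monitor = QueuingTimeMonitor(threshold_ms=20, window_size=2)
print(monitor.record(0, 10))     # None, window not full yet
print(monitor.record(100, 150))  # True, average wait is 30 ms
```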

Admission Control

Admission control is the mechanism by which a service request is admitted for processing by a service. DAGOR, the overload control system described in the paper, uses a combination of business-oriented and user-oriented admission strategies:

Business Oriented

WeChat maintains a hash table of services and their priorities, and a copy is available on each server that hosts the entry services. Services which are absolutely essential, like Login and Payment, have the highest priority, followed by services like Chat and Moments. These priorities are based on business logic and on user analytics (how users complain when a service is down, etc.). Any service missing from the hash table is considered to have the lowest priority.

A service which sends requests to other services copies the priority into the request header so that the other services can serve the requests based on the given priority.
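A minimal sketch of the lookup and the header propagation might look like this. The numeric values, the `LOWEST_PRIORITY` constant, the header key, and the function names are all hypothetical; only the ordering of the four services follows the text. Smaller numbers mean higher priority here.

```python
# Hypothetical priority table: smaller value = higher priority.
BUSINESS_PRIORITY = {
    "Login": 1,
    "Payment": 1,
    "Chat": 2,
    "Moments": 3,
}
LOWEST_PRIORITY = 100  # assumed value for services missing from the table

def business_priority(service):
    """Look up the priority at the entry service, defaulting to the lowest."""
    return BUSINESS_PRIORITY.get(service, LOWEST_PRIORITY)

def outgoing_request(incoming_headers, payload):
    """Build a downstream request that carries the same priority
    as the request being served."""
    headers = {"business_priority": incoming_headers["business_priority"]}
    return {"headers": headers, "payload": payload}

entry_headers = {"business_priority": business_priority("Chat")}
print(outgoing_request(entry_headers, "store message"))
print(business_priority("SomeUnlistedService"))  # falls back to LOWEST_PRIORITY
```

Because the priority travels with the request, every service along the call path makes its admission decision on the same value that was assigned at the entry point.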

User Oriented

Each user is assigned a priority by a hash function, and the servers change the hash function every hour so that the same users are not always the ones with low priority. The hash function takes the user ID and returns a pseudo-random priority from 1 to 128. A one-hour window is long enough to give each end user consistent behavior.
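One way to approximate an hourly changing hash function is to salt a fixed hash with the current hour. The sketch below does that with SHA-256. The salting scheme and the function name are assumptions made for this example and are not WeChat's actual hash function.

```python
import hashlib

USER_PRIORITY_LEVELS = 128

def user_priority(user_id, epoch_seconds):
    """Map a user ID to a priority in 1..128 that stays fixed within
    an hour and is reshuffled when the hour changes (assumed scheme)."""
    hour = int(epoch_seconds // 3600)
    digest = hashlib.sha256(f"{hour}:{user_id}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % USER_PRIORITY_LEVELS + 1

now = 1_700_000_000
# Same user, same hour: the priority does not change between requests.
print(user_priority("user-42", now) == user_priority("user-42", now + 60))
# One hour later the user is hashed again and usually lands elsewhere.
print(user_priority("user-42", now), user_priority("user-42", now + 3600))
```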

User priority is introduced to avoid the “looping” of the admission level that can happen when only business priority is used. The paper explains:

To elaborate, let us consider the following scenario where load shedding in an overloaded service is solely based on the business-oriented admission control. Suppose the current admission level of business priority is τ but the service is still overloaded. Then the admission level is adjusted to τ − 1, i.e., all requests with business priority value greater than or equal to τ are discarded by the service. However, system soon detects that the service is underloaded with such admission level. As a consequence, the admission level is set back to τ, and then the service quickly becomes overloaded again. The above scenario continues to repeat. As a result, the related requests with business priority equal to τ are in fact partially discarded by the service in the above scenario.

The compound priority is arranged first by business priority and then by user priority.
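Ordering by business priority first and user priority second is a lexicographic comparison, which Python tuples provide directly. The following sketch assumes that smaller values mean higher priority and that a request is admitted when its compound priority is at or above the admission level. The `Priority` type and `is_admitted` are names invented for the example.

```python
from typing import NamedTuple

class Priority(NamedTuple):
    business: int  # compared first
    user: int      # tie-breaker within the same business priority

def is_admitted(request, admission_level):
    """Admit requests whose compound priority is at least as high
    as the admission level (smaller values are higher priority)."""
    return request <= admission_level

level = Priority(business=2, user=64)
print(is_admitted(Priority(1, 128), level))  # True: higher business priority
print(is_admitted(Priority(2, 64), level))   # True: exactly at the level
print(is_admitted(Priority(2, 65), level))   # False: same business, lower user priority
print(is_admitted(Priority(3, 1), level))    # False: lower business priority
```

With 128 user levels inside each business level, the threshold can move in small steps rather than dropping or admitting an entire business priority at once.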

Session Oriented

WeChat rejected the idea of session-oriented priority because users generally learn how to game sessions by logging out and logging in again.

Adaptive Admission Control

Each server calculates the admission threshold dynamically based on the load and keeps changing this throughout the day. This threshold is a combination of the business and user priorities.
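To make the idea concrete, here is a deliberately simple adjustment rule: tighten the threshold quickly when the server is overloaded and relax it slowly otherwise. The paper's actual update rule is not reproduced in this summary, so the step sizes, the flattened index, and all function names below are assumptions.

```python
USER_LEVELS = 128

def to_index(business, user):
    """Flatten a (business, user) pair into one integer; 1 is the highest priority."""
    return (business - 1) * USER_LEVELS + user

def from_index(index):
    """Inverse of to_index."""
    business, user = divmod(index - 1, USER_LEVELS)
    return business + 1, user + 1

def adjust_level(level, overloaded, max_business, tighten_step=8, relax_step=1):
    """Return the next admission level (assumed rule, not the paper's)."""
    index = to_index(*level)
    if overloaded:
        index -= tighten_step  # admit fewer requests
    else:
        index += relax_step    # admit slightly more requests
    # Keep the level inside the valid range of compound priorities.
    index = max(1, min(index, max_business * USER_LEVELS))
    return from_index(index)

print(adjust_level((2, 4), overloaded=True, max_business=3))    # (1, 124)
print(adjust_level((2, 4), overloaded=False, max_business=3))   # (2, 5)
print(adjust_level((3, 128), overloaded=False, max_business=3)) # (3, 128)
```

The first call shows the threshold crossing from one business priority into the previous one when tightening, which is what lets a single number range express both priorities.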

Workflow

A user request reaches an entry service, which assigns business and user priorities to it.
The entry service makes requests to downstream services, and each of these requests carries a copy of the priorities in its header.
When a downstream service gets a request, it compares the request's priority with its current admission level and either accepts or rejects the request.
In either case, the downstream service sends its current admission level back to the upstream service, along with either the result or the rejection.
The upstream service records the new admission level for that downstream service and throttles subsequent requests to it based on that value.
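The steps above can be sketched with two small classes. This is a toy, in-process model: the class names, the response format, and the returned strings are invented for the illustration, and priorities are `(business, user)` tuples where smaller means higher priority.

```python
class DownstreamService:
    def __init__(self, name, admission_level):
        self.name = name
        self.admission_level = admission_level  # (business, user)

    def handle(self, priority):
        admitted = priority <= self.admission_level
        # The current admission level is piggybacked on every response,
        # whether the request was served or rejected.
        return {"ok": admitted, "admission_level": self.admission_level}

class UpstreamService:
    def __init__(self):
        self.known_levels = {}  # downstream name -> last reported level

    def call(self, downstream, priority):
        known = self.known_levels.get(downstream.name)
        if known is not None and priority > known:
            return "shed locally"  # do not waste a network call
        response = downstream.handle(priority)
        self.known_levels[downstream.name] = response["admission_level"]
        return "served" if response["ok"] else "rejected downstream"

storage = DownstreamService("Storage", admission_level=(2, 64))
upstream = UpstreamService()
print(upstream.call(storage, (3, 1)))  # rejected downstream
print(upstream.call(storage, (3, 1)))  # shed locally
print(upstream.call(storage, (1, 5)))  # served
```

After the first rejection the upstream service knows the downstream level and drops similar requests itself. Requests that are still admitted keep refreshing the stored level, so the upstream view follows the downstream service as its threshold changes.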

The End

This is just a summary of the paper, but I think it covers most of it. The paper is interesting and worth a read for anyone trying to understand how a really big system of microservices works.