High-Availability Architecture Practice at Tongcheng Travel Based on RocketMQ

RocketMQ developer · 2021-10-14 06:39:18


Background

Why we chose RocketMQ

A few years ago, when we decided to introduce MQ, there were many mature solutions on the market, such as RabbitMQ, ActiveMQ, NSQ, and Kafka. Considering stability, maintenance cost, our company's technology stack, and other factors, we chose RocketMQ:

  • Pure Java development with no external dependencies; easy to use, and we can handle problems ourselves;

  • Tested by Alibaba's Double 11, so performance and stability are assured;

  • A practical feature set — producer side: synchronous, asynchronous, one-way, and delayed sending; consumer side: message offset reset, retry queue, dead-letter queue;

  • An active community that communicates and resolves issues promptly.


How we use RocketMQ today:

  • Mainly for peak shaving, decoupling, and asynchronous processing;

  • Widely used in core businesses such as train tickets, flight tickets, and hotels, carrying the huge traffic from the WeChat entry point;

  • Widely used in core flows such as payment, ordering, ticket issuance, and data synchronization;

  • Over 100 billion messages flowing through every day.

The diagram below shows the MQ access architecture.

Because of the company's technology stack, we provide a Java client SDK; other languages converge on an HTTP proxy, which hides language details and saves maintenance cost. Back-end storage nodes are isolated per major business line, so that business lines do not affect each other.

MQ dual-center transformation

We previously experienced single-data-center network failures that had a big impact on the business. To keep the business highly available, the same-city dual-center transformation was put on the agenda.

Why dual centers

  • Service stays available when a single data center fails;

  • Data reliability: if all data sits in one data center, there is a risk of data loss once that data center breaks down;

  • Horizontal capacity: the capacity of a single data center is limited; multiple data centers can share the traffic.

Dual-center schemes

Before building dual centers, we researched the common same-city dual-center schemes, which mainly come in two kinds: cold (hot) standby and active-active. (At the time, the community's DLedger version had not yet appeared; the DLedger version can now serve as an alternative dual-center solution.)

1) Same-city cold (hot) standby

Two independent MQ clusters: user traffic is written to the primary cluster and the data is replicated to the standby cluster in real time; the community has a mature RocketMQ Replicator scheme for this. Metadata — topics, consumer groups, consumption progress, and so on — needs to be synchronized periodically.

2) Same-city active-active

Two independent MQ clusters: user traffic is written to the MQ cluster in its own data center, and the two clusters do not replicate data to each other.

Normally, a business writes to the MQ cluster in its own data center. If one data center goes down, all user traffic can be switched to the other data center, and messages are then produced to the other data center as well.

For the active-active scheme, the MQ cluster domain name problem needs to be solved:

1) If the two clusters share one domain name, the domain name can dynamically resolve to each data center. This approach requires production and consumption to happen in the same data center: if production is in idc1 and consumption is in idc2, the producer and consumer each connect to their own cluster, and the data cannot be consumed.

2) If each cluster has its own domain name, the business side has to change a lot. Our previous externally-facing cluster was deployed in a single center with a large number of applications already connected, so this scheme would be hard to roll out.

To minimize business-side changes, the domain name had to stay the same. In the end we adopted a single global MQ cluster spanning both data centers: it works whether a business is deployed in one center or both, and only a client upgrade is needed, with no code changes.

Dual-center requirements

  • Locality principle: a producer in data center A stores its messages on brokers in data center A; a consumer in data center A consumes messages from brokers in data center A.
  • Single-data-center failure: production continues normally and no messages are lost.
  • Broker master failure: a new master is elected automatically.

Locality principle

In short, this comes down to two questions:

  • How does a node (client node or server node) determine which IDC it is in?
  • How does a client node determine which IDC a server node is in?

How does a node determine which IDC it is in?

1) IP lookup: when a node starts, it can obtain its own IP and query the data center through an internal company component.

2) Environment awareness: this needs cooperation from the operations team — when a node is provisioned, some of its own metadata, such as the data center information, is written into a local configuration file, which is read directly at startup.

We took the second option because it has no component dependency. In the configuration file, the value of logicIdcUK is the data center flag.
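As a minimal sketch of the second option, a node can read the logicIdcUK flag from a local properties file at startup. The key name comes from the article; the file path and the use of Java properties format are hypothetical assumptions.

```java
import java.io.FileReader;
import java.io.IOException;
import java.util.Properties;

public class IdcResolver {
    // Reads the data center flag from a local config file at startup.
    // The key "logicIdcUK" is from the article; the file format is an assumed example.
    public static String resolveIdc(String configPath) {
        Properties props = new Properties();
        try (FileReader reader = new FileReader(configPath)) {
            props.load(reader);
        } catch (IOException e) {
            throw new RuntimeException("cannot read " + configPath, e);
        }
        String idc = props.getProperty("logicIdcUK");
        if (idc == null || idc.isEmpty()) {
            throw new IllegalStateException("logicIdcUK not set in " + configPath);
        }
        return idc;
    }
}
```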

How does a client node determine which IDC a server node is in?

A client node can obtain a server node's IP as well as its broker name, so there are three options:

  • IP lookup: query the IP's data center information through an internal company component;

  • Add the data center to the broker name: in the broker configuration file, encode the data center information in the broker name;

  • Add a data center identifier at the protocol layer: when a server node registers with the metadata system, it registers its data center information as well.

The third option is somewhat more complex to implement than the first two because it changes the protocol layer. We adopted a combination of the second and third options.
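A minimal sketch of the second option: derive a broker's data center from its name. The article only says the data center information is added to the broker name; the "idcX-" prefix convention here is a hypothetical example.

```java
public class BrokerIdcParser {
    // Extracts an assumed "idcX-" prefix from a broker name,
    // e.g. "idc1-broker-a" -> "idc1". Returns null when no prefix is present.
    public static String parseIdc(String brokerName) {
        int sep = brokerName.indexOf('-');
        return sep > 0 ? brokerName.substring(0, sep) : null;
    }
}
```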


Produce nearby

Given the analysis above, the idea behind producing nearby is clear: by default, produce to the local data center first.

If the server nodes in the local data center are unavailable, production can fall back to the other data center; businesses can configure this according to their actual needs.
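The local-first-with-fallback selection can be sketched as follows. Broker names and the "idcX-" prefix are illustrative assumptions; the real selector would plug into the client SDK's queue selection.

```java
import java.util.List;
import java.util.stream.Collectors;

public class NearbyQueuePicker {
    // Prefer brokers in the local data center; fall back to all brokers
    // when the local data center has none available.
    public static List<String> pickQueues(List<String> brokerNames, String localIdc) {
        List<String> local = brokerNames.stream()
                .filter(name -> name.startsWith(localIdc + "-"))
                .collect(Collectors.toList());
        // Cross-data-center fallback when nothing is available locally.
        return local.isEmpty() ? brokerNames : local;
    }
}
```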

Consume nearby

Consume in the local data center first, but by default all messages must still be consumed.

The queue allocation algorithm works per data center:

  • The messages of each data center are evenly distributed to the consumers in that data center;

  • If a data center has no consumers, its messages are evenly shared among the consumers in the other data centers.

The pseudocode is as follows:

```java
Map<String, Set<MessageQueue>> mqs  = classifyMQByIdc(mqAll);   // queues grouped by IDC
Map<String, Set<String>>       cids = classifyCidByIdc(cidAll); // client ids grouped by IDC
Set<MessageQueue> result = new HashSet<>();
for (Map.Entry<String, Set<MessageQueue>> element : mqs.entrySet()) {
    result.addAll(allocateMQAveragely(element, cids, cid)); // cid is the current client
}
```

Consumption deployments are mainly single-side or double-side.

With single-side deployment, consumers pull all messages from both data centers by default.

With double-side deployment, consumers consume only the messages of their own data center, so attention must be paid to the actual production volume and consumer count in each data center, to avoid leaving one data center with too few consumers.
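The allocation rules above can be sketched as a runnable example. Queues and client ids are plain strings here, and the names and data shapes are illustrative, not the real SDK interfaces.

```java
import java.util.*;

public class IdcAwareAllocator {
    // Queues of each IDC are split evenly among that IDC's consumers;
    // queues of an IDC with no consumers are split among all consumers.
    public static Set<String> allocate(Map<String, List<String>> queuesByIdc,
                                       Map<String, List<String>> cidsByIdc,
                                       String currentCid) {
        List<String> allCids = new ArrayList<>();
        cidsByIdc.values().forEach(allCids::addAll);
        Collections.sort(allCids); // deterministic order across all consumers

        Set<String> result = new HashSet<>();
        for (Map.Entry<String, List<String>> e : queuesByIdc.entrySet()) {
            List<String> cids = cidsByIdc.getOrDefault(e.getKey(), Collections.emptyList());
            if (cids.isEmpty()) {
                cids = allCids; // no consumers in this IDC: share among everyone
            } else {
                cids = new ArrayList<>(cids);
                Collections.sort(cids);
            }
            result.addAll(allocateAveragely(e.getValue(), cids, currentCid));
        }
        return result;
    }

    // Round-robin average allocation: queue i goes to consumer (i % cids.size()).
    static List<String> allocateAveragely(List<String> queues, List<String> cids, String cid) {
        List<String> mine = new ArrayList<>();
        int idx = cids.indexOf(cid);
        if (idx < 0) return mine; // this consumer gets nothing from this IDC
        for (int i = idx; i < queues.size(); i += cids.size()) {
            mine.add(queues.get(i));
        }
        return mine;
    }
}
```

Every consumer runs the same deterministic computation, so the cluster converges on a non-overlapping assignment without coordination.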

Single-data-center failure

  • Broker group configuration

One master and two slaves: the master and one slave are in one data center, the other slave is in the other data center; a send is reported successful once a slave has finished synchronizing the message.

  • On single-data-center failure

Messages are produced across data centers; messages not yet consumed continue to be consumed in the other data center.
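As a rough illustration, a broker group laid out as above can be expressed with RocketMQ's standard broker.conf properties. All values here are hypothetical examples; listen ports, store paths, and other settings are omitted.

```
# master in data center 1
brokerClusterName = GlobalCluster
brokerName    = idc1-broker-a
brokerId      = 0            # 0 marks the master
brokerRole    = SYNC_MASTER  # wait for slave replication before acking the send
flushDiskType = ASYNC_FLUSH

# slave in data center 1: same brokerName, brokerId = 1, brokerRole = SLAVE
# slave in data center 2: same brokerName, brokerId = 2, brokerRole = SLAVE
```

With SYNC_MASTER, a produce request is acknowledged only after a slave has replicated the message, matching the "send succeeds once a slave has synchronized" behavior described above.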


Failover to a new master

When the master node of a broker group fails, a new master must be elected from the slaves to keep the whole cluster available. To do that, there must first be an arbitration system for broker master failures, namely the nameserver (hereafter ns) metadata system (similar to the sentinel in Redis).

The ns nodes of the metadata system sit in three data centers (the third is a third-party cloud data center; deploying ns nodes on the cloud is acceptable because the metadata volume is small and the latency tolerable). The ns nodes of the three data centers elect a leader through the Raft protocol; broker nodes synchronize their metadata to the leader, and the leader synchronizes it to the followers.

When a client node fetches metadata, it can read from either the leader or the followers.

Failover flow

  • When the nameserver leader detects that a broker master node is abnormal, it asks the followers to confirm; if half of the followers also consider the broker node abnormal, the leader tells the broker group to elect a new master from its remaining nodes, choosing the one with the most advanced synchronization progress;

  • The newly elected broker master performs the switchover and registers itself with the metadata system;

  • Producers can no longer send messages to the old broker master.

The flow chart is as follows
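The arbitration logic — majority confirmation followed by electing the most up-to-date slave — can be sketched as follows. The data shapes and method names are illustrative; the real arbitration lives inside the company's nameserver metadata system.

```java
import java.util.Map;

public class FailoverArbiter {
    // The master is declared failed only when at least half of the
    // followers confirm the leader's observation.
    public static boolean confirmedFailed(int followerTotal, int followersAgreeing) {
        return followersAgreeing * 2 >= followerTotal;
    }

    // Picks the slave with the most advanced replication offset as the new master.
    public static String electNewMaster(Map<String, Long> slaveSyncOffsets) {
        return slaveSyncOffsets.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElseThrow(() -> new IllegalStateException("no slave available"));
    }
}
```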

Data-center switchover drills

User requests are load-balanced across both centers. The drill below first switches all traffic to the second center, then returns to both centers, then switches everything to the first center, verifying that each center can carry the full user load.

First, all user traffic is switched to the second center.

Traffic then returns to both centers, and is switched to the first center.


To summarize the dual-center transformation:

  • One global cluster;

  • The locality principle;

  • One master and two slaves, with a write reported successful once written to more than half of the replicas;

  • Raft elections in the metadata system;

  • Automatic master election when a broker master fails.

MQ platform governance

Even a high-performance, highly available system will run into all kinds of problems and unnecessary maintenance cost if it is used casually or without standards, so the necessary governance measures are indispensable.


Governance goals:

  • Make the system more stable;

  • Alert in time;

  • Locate problems quickly and stop losses.

What we govern

Topic / consumer group governance

  • Usage by application

In the production MQ clusters, we disabled automatic creation of topics and consumer groups; before using one, the project identifier and owner of the topic or consumer group must be applied for and recorded. When something goes wrong, we can immediately find the owner of the topic or consumer group and learn what is going on. Since there are test, gray-release, production, and other environments, a single application can take effect on multiple clusters at once, avoiding the hassle of applying cluster by cluster.

  • Production rate

To prevent a business from carelessly sending large volumes of useless messages, topic production rates must be flow-controlled on the server side, so that one topic cannot crowd out the processing resources of other topics.
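Per-topic flow control could be implemented with a token bucket, for example. The article does not describe the actual implementation; this is a minimal sketch with illustrative names and thresholds.

```java
public class TopicRateLimiter {
    private final double permitsPerSecond; // allowed sends per second for this topic
    private final double maxBurst;         // bucket capacity
    private double available;
    private long lastRefillNanos;

    public TopicRateLimiter(double permitsPerSecond, double maxBurst) {
        this.permitsPerSecond = permitsPerSecond;
        this.maxBurst = maxBurst;
        this.available = maxBurst;
        this.lastRefillNanos = System.nanoTime();
    }

    // Returns true if the send is allowed, false if it should be rejected.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        available = Math.min(maxBurst,
                available + (now - lastRefillNanos) / 1e9 * permitsPerSecond);
        lastRefillNanos = now;
        if (available >= 1.0) {
            available -= 1.0;
            return true;
        }
        return false;
    }
}
```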

  • Message backlog

For consumer groups sensitive to message backlog, users can set a backlog-size threshold and an alert method; once the threshold is exceeded, the user is notified immediately. A backlog-time threshold can also be set: if messages stay unconsumed beyond a set period, the user is notified immediately.
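The backlog-size check reduces to comparing the consumer group's lag against the user's threshold. Names here are illustrative; the offsets would come from the MQ admin interface.

```java
public class BacklogMonitor {
    // Lag = broker's max offset minus the consumer group's committed offset.
    public static long lag(long brokerMaxOffset, long consumerCommittedOffset) {
        return Math.max(0, brokerMaxOffset - consumerCommittedOffset);
    }

    // Fires an alert when the lag exceeds the user-configured threshold.
    public static boolean shouldAlert(long brokerMaxOffset, long consumerCommittedOffset,
                                      long lagThreshold) {
        return lag(brokerMaxOffset, consumerCommittedOffset) > lagThreshold;
    }
}
```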

  • Consumer node offline

When a consumer node goes offline or stops responding for a period of time, the user needs to be notified.

Client governance

  • Send / consume latency detection

We monitor the time it takes to send and consume messages, detect applications with poor performance, and notify their owners to make changes and improve performance. We also monitor message body size: projects whose average message body exceeds 10 KB are pushed to enable compression or restructure their messages to keep the body within 10 KB.
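The compression suggestion can be sketched with standard GZIP on the message body. The 10 KB threshold comes from the article; the helper names and the threshold-check placement are illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class BodyCompressor {
    static final int THRESHOLD = 10 * 1024; // 10 KB, the limit mentioned above

    // Compresses the body only when it exceeds the threshold.
    public static byte[] maybeCompress(byte[] body) {
        if (body.length <= THRESHOLD) return body;
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
                gzip.write(body);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static byte[] decompress(byte[] compressed) {
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return gzip.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In practice the consumer side needs a flag (e.g. a message property) to know whether a body was compressed; that bookkeeping is omitted here.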

  • Message link tracing

Which IP a message was sent from and at what time, which IP consumed it and at what time, plus the server-side statistics on message reception and push, together form a simple message trace that strings a message's life cycle together. Users can query by msgId or a preset key to view a message and troubleshoot problems.

  • Low or risky version detection

As features iterate, SDK versions are upgraded too, and upgrades may introduce risks. SDK versions are reported periodically, and users on problematic or overly old versions are pushed to upgrade.

Server-side governance

  • Cluster health inspection

How do we judge whether a cluster is healthy? We periodically probe the number of nodes in the cluster and its write and consume TPS, and simulate user production and consumption of messages.

  • Cluster performance inspection

Performance ultimately shows up in the processing time of message production and consumption. The server records the processing time of every produce and consume request; if, within a statistics window, a certain proportion of messages take too long to process, the node's performance is considered poor. The main causes of performance problems are physical bottlenecks, such as high disk I/O utilization or high CPU load; these hardware metrics alert automatically through the Nighthawk monitoring system.
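The inspection rule above amounts to a slow-request ratio check over a statistics window. The thresholds here are illustrative examples, not the values used in production.

```java
import java.util.List;

public class PerfInspector {
    // Flags a node when the fraction of requests slower than slowMs
    // exceeds maxSlowRatio within the statistics window.
    public static boolean isSlowNode(List<Long> latenciesMs, long slowMs, double maxSlowRatio) {
        if (latenciesMs.isEmpty()) return false;
        long slow = latenciesMs.stream().filter(t -> t > slowMs).count();
        return (double) slow / latenciesMs.size() > maxSlowRatio;
    }
}
```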

  • Cluster high availability

High availability here mainly means automatically switching a slave node to master when the master broker cannot work because of a hardware or software failure. It suits scenarios that require message ordering or cluster integrity.

A sample of the admin console pages:

Topic and consumer group application

Real-time production and consumption statistics

Cluster monitoring

Pitfalls

The community has polished and hardened the MQ system over a long time, and we have still run into some problems in use; it pays to understand the source code in depth, so that when problems occur you do not panic and can stop losses quickly.

  • When old and new consumers coexisted, our queue allocation algorithm was incompatible, so we made it compatible;

  • With many topics and consumer groups, registration took too long and caused out-of-memory errors; we shortened registration time through compression, and the fix has been merged into the community;

  • Inconsistent topic-name lengths could cause message loss on restart; the fix has been merged into the community;

  • On CentOS 6.6, the broker process would appear to hang; upgrading the OS version fixed it.

MQ future outlook

Currently messages are retained for only a short time, which is inconvenient for troubleshooting and data analysis, so we plan to archive historical messages and make data predictions based on them:

  • Historical data archiving;

  • Decoupling the underlying storage, separating compute from storage;

  • More data prediction based on historical data;

  • Upgrading the servers to DLedger to guarantee strict message consistency.

To learn more about RocketMQ, you are welcome to join the community discussion group.

This article comes from the WeChat official account RocketMQ (ApacheRocketMQ).