Brother Hua is honored to take part in the Nuggets community event. Thank you for supporting both me and Nuggets. Leave a message in the comments section below for a chance at a random gift~
Lottery tool: lottery software, with entries entered in comment order
Prize draw: two readers will be selected at random, and each will receive one Nuggets badge
It has been nearly a month since my last post. All kinds of things have piled up recently. At the beginning of this month I was lucky enough to qualify for the official 【Apply for Nuggets free】 event, so today I am passing the benefits on to my friends. After all, your happiness is my motivation, haha.
After thinking it over for a long time, I finally decided to share some distributed-systems knowledge with you today, using simple examples to walk through the common solutions for distributed transactions. This is also a high-frequency interview question. After reading this article you will have a better understanding of the local message table, final consistency, and best effort notification, and of the scenarios each one fits.
Single system transaction
Let's not talk about distribution yet, just about transactions. What is a transaction? I think everyone can name a thing or two, for example the most commonly heard atomicity, consistency, isolation, and durability, namely the ACID properties. For a more official explanation, Wikipedia uses a transfer as its example: account A transfers 100 yuan to account B. This transaction consists of two operations, deducting 100 yuan from account A and adding 100 yuan to account B. A system that supports transactions guarantees that, in any case, either both operations complete or neither does. It can never happen that the deduction from account A succeeds but no money is added to account B.
Here, for readers at Nuggets level 1 and below, let's review ACID:
- Atomicity: the transaction executes as a whole. The operations it contains are either all performed or not performed at all. In the transfer scenario, the deduction from account A and the addition to account B must both succeed or both fail.
- Consistency: data must satisfy its integrity constraints, and the database moves from one consistent state to another consistent state. In the transfer scenario, it can never happen that account A is deducted 100 yuan while account B does not increase.
- Isolation: when multiple transactions execute concurrently, they do not interfere with each other.
- Durability: once the transaction is committed, its changes to the database are saved permanently, and subsequent operations do not affect the committed result.
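To make atomicity concrete, here is a minimal sketch of the transfer example using Python's built-in `sqlite3` module. The table layout, the `CHECK (balance >= 0)` constraint, and the helper names (`open_bank`, `transfer`, `balances`) are assumptions made for illustration, not something from the original article. The credit to B is deliberately executed first so that a failed debit from A shows the earlier update being rolled back as well.

```python
import sqlite3


def open_bank() -> sqlite3.Connection:
    """Create an in-memory database with accounts A and B, 100 yuan each."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE account ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 100)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst; both updates commit together or not at all."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit would make the balance negative, the CHECK constraint
            # raises IntegrityError and the credit above is rolled back too.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    """Return {account name: balance}."""
    return dict(conn.execute("SELECT name, balance FROM account ORDER BY name"))


conn = open_bank()
print(transfer(conn, "A", "B", 100), balances(conn))  # A has enough money
print(transfer(conn, "A", "B", 50), balances(conn))   # A is empty: nothing changes
```

The second call fails on the debit, and B's balance is left untouched, which is exactly the "both succeed or both fail" behavior described in the list.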
Next, let's use a common payment scenario to describe transactions in a single-machine system.
Friends, I believe everyone shops online often these days. Have you ever thought about how, after we complete a payment, the mall system goes on to update the order, deduct inventory, deduct coupons, send notifications, write logs, and so on? The figure above shows a common way of writing this in a single-machine system, completing the following operations in one transaction:
- The user initiates and completes the payment
- The payment callback is received
- Begin transaction
- Order status modification
- Coupon deduction
- Message notification
- Commit transaction
In this process we use a transaction. Once any step fails, the system rolls back the operations that have already completed. For example, if logging fails, the first two steps (order status modification and coupon deduction) are rolled back to their state before the change, which keeps the system's business data correct.
Although this way of writing is logically clear and simple to implement, its drawbacks are just as obvious. The code is tightly coupled: if a points system is added later, for example, it has to be added to the original logic. In addition, this pattern does not suit high-concurrency business scenarios. How do we solve this? That is the distributed transaction we are going to talk about today.
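The single-machine flow can be sketched with `sqlite3` as well. Everything below is hypothetical scaffolding for illustration: the three tables, the `setup` and `on_payment_callback` names, and the trick of passing `None` as the notification body to force a failure in the last step.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    """Create one database holding orders, coupons, and notifications."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT NOT NULL);
        CREATE TABLE coupons (id INTEGER PRIMARY KEY, used INTEGER NOT NULL);
        CREATE TABLE notifications (order_id INTEGER NOT NULL, body TEXT NOT NULL);
        INSERT INTO orders VALUES (1, 'UNPAID');
        INSERT INTO coupons VALUES (7, 0);
        """
    )
    return conn


def on_payment_callback(conn, order_id, coupon_id, notify_body) -> bool:
    """Run every step of the payment callback inside one local transaction."""
    try:
        with conn:  # begin ... commit, or roll back if any step raises
            conn.execute("UPDATE orders SET status = 'PAID' WHERE id = ?", (order_id,))
            conn.execute("UPDATE coupons SET used = 1 WHERE id = ?", (coupon_id,))
            # A NULL body violates NOT NULL and stands in for "a later step failed".
            conn.execute(
                "INSERT INTO notifications VALUES (?, ?)", (order_id, notify_body)
            )
        return True
    except sqlite3.Error:
        return False


def order_status(conn, order_id) -> str:
    return conn.execute(
        "SELECT status FROM orders WHERE id = ?", (order_id,)
    ).fetchone()[0]


conn = setup()
print(on_payment_callback(conn, 1, 7, None), order_status(conn, 1))
print(on_payment_callback(conn, 1, 7, "Payment received"), order_status(conn, 1))
```

When the last step fails, the order stays unpaid and the coupon stays unused, because the earlier updates were part of the same transaction.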
When it comes to distributed systems, the first impression is: wow, that sounds so high-end. Then I look at the garbage requirements I code up every day. So, as an important part of distributed systems, how should we understand distributed transactions? We all know that in a single-machine system each module operates on the local database in sequence to insert and modify data, and all operations are rolled back when an exception occurs. Distributed transactions are similar. First look at the picture below:
In a distributed system, each module of the original single-machine system is split out into an independent system and deployed to a different server. Each subsystem runs on its own and does not depend on the other subsystems.
When the user completes a payment, the overall flow is basically the same as in the single-machine system. The order system first receives the payment-success notification. While updating its own database, it also notifies all the other subsystems to start working. When each subsystem receives the message, it completes its own business processing and finally writes to its database. If every subsystem finishes successfully, the whole flow ends perfectly. But if some subsystem fails because of a defect or a network timeout, for example the message module goes down and the user is not notified in time, the business flow never forms a complete closed loop. What should we do then? This is where distributed transactions come in.
Next, let's analyze several common distributed transaction solutions.
2PC (two-phase commit)
2PC is short for the two-phase commit protocol. So what is a two-phase commit? (Cue the blank stare.) As the name suggests, it controls the commit of a transaction in two stages: the preparation stage and the commit stage. 2PC also introduces two important roles: the transaction coordinator and the participants. A simple example: in day-to-day work we have probably all met a requirement that needs two rows written to two databases (a main database and a sub-database), and we want both writes to complete in the same transaction.
Don't worry if that didn't sink in. Next, Brother Hua will break the process down step by step with diagrams.
In the preparation stage, the transaction coordinator sends a prepare request to all participants, asking whether they can perform the commit. Each participant receives the request and reports its preparation result back to the coordinator. The complete flow is shown in the figure below.
[Figure: 2PC flow when every participant prepares successfully, in two panels: preparation stage, commit stage]
When the coordinator has received feedback from all participants and every message is 【Prepare succeeded】, it notifies each participant to enter the commit stage. Each participant node then completes its operation, releases the resources it held during the transaction, and sends 【Commit succeeded】 feedback to the coordinator. Finally, the coordinator completes the transaction.
There is another case: if one participant reports during the preparation stage that preparation failed, the coordinator notifies all participants and requires each of them to roll back. After a participant rolls back successfully, it informs the coordinator with 【Rollback succeeded】.
[Figure: 2PC flow when a participant fails to prepare, in two panels: preparation stage, commit stage (rollback)]
After reading the pictures above, we know how 2PC works: once all participants have prepared successfully in the first stage (or any one has failed), the coordinator notifies every database (participant) to commit (or roll back) its transaction, and the roles cooperate to finish the whole distributed transaction.
Careful readers will have a question: what if a participant fails during the commit stage? Because the preparation stage has two outcomes, the commit stage also has to be discussed in two cases:
- If the second stage is committing the transaction: keep retrying until all participants have committed. If it still cannot succeed in the end, manual intervention is the only option.
- If the second stage is rolling back the transaction: again keep retrying until all participants complete the rollback. Otherwise the participants that prepared in the first stage stay blocked.
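The two stages and the retry rule can be sketched as a toy in-memory coordinator. This is a minimal sketch under stated assumptions, not a real protocol implementation: `Participant`, its `can_prepare` and `commit_failures` knobs, and the `max_attempts` limit are all hypothetical, and there is no network, timeout, or persistence here.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    can_prepare: bool = True   # hypothetical knob: vote in the preparation stage
    commit_failures: int = 0   # hypothetical knob: commit attempts that fail first
    state: str = "INIT"

    def prepare(self) -> bool:
        self.state = "PREPARED" if self.can_prepare else "ABORTED"
        return self.can_prepare

    def commit(self) -> bool:
        if self.commit_failures > 0:
            self.commit_failures -= 1
            return False
        self.state = "COMMITTED"
        return True

    def rollback(self) -> bool:
        self.state = "ROLLED_BACK"
        return True


def _retry(action, max_attempts: int) -> bool:
    """Call action() until it returns True or the attempts run out."""
    return any(action() for _ in range(max_attempts))


def two_phase_commit(participants, max_attempts: int = 3) -> str:
    # Stage 1: ask every participant whether it can commit.
    votes = [p.prepare() for p in participants]

    # Stage 2: commit if all voted yes, otherwise roll everyone back.
    second_stage = "commit" if all(votes) else "rollback"
    for p in participants:
        if not _retry(getattr(p, second_stage), max_attempts):
            return "NEEDS_MANUAL_INTERVENTION"
    return "COMMITTED" if second_stage == "commit" else "ROLLED_BACK"


main_db = Participant("main")
sub_db = Participant("sub", commit_failures=2)  # succeeds on the third attempt
print(two_phase_commit([main_db, sub_db]), main_db.state, sub_db.state)
```

If a participant still cannot commit after `max_attempts` tries, the sketch gives up and reports that a human has to step in, mirroring the first case in the list above.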
TCC
TCC divides the whole process into three stages: Try, Confirm, and Cancel.
- Try stage: check the resources of each service, and lock or reserve them
- Confirm stage: perform the actual operation in each service
- Cancel stage: if the business method of any service fails, roll back the business logic that has already succeeded
Take the transfer example again. A cross-bank transfer needs a distributed transaction that spans two banks. Transferring 100 yuan from bank A to bank B goes as follows:
- Try stage: freeze 100 yuan in the bank A account and pre-add 100 yuan to the bank B account;
- Confirm stage: perform the actual transfer, deducting the funds from the bank A account and adding the funds to the bank B account;
- Cancel stage: if either bank fails, roll back to compensate. For example, if the bank A account has already been deducted but adding the funds to the bank B account failed, the funds must be added back to the bank A account.
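The three stages of the cross-bank transfer can be sketched in memory as follows. The `Account` class, its `frozen` and `pending` fields, and the `fail_confirm` switch are assumptions for illustration. The sketch follows the article's description, in which Cancel also compensates a Confirm that has partly succeeded.

```python
from dataclasses import dataclass


@dataclass
class Account:
    balance: int
    frozen: int = 0             # funds reserved for an outgoing transfer
    pending: int = 0            # funds pre-added for an incoming transfer
    fail_confirm: bool = False  # hypothetical switch to simulate a Confirm failure

    # Try: reserve resources without changing the real balance.
    def try_debit(self, amount: int) -> bool:
        if self.balance - self.frozen < amount:
            return False
        self.frozen += amount
        return True

    def try_credit(self, amount: int) -> bool:
        self.pending += amount
        return True

    # Confirm: perform the real operation using the reserved resources.
    def confirm_debit(self, amount: int) -> None:
        self.frozen -= amount
        self.balance -= amount

    def confirm_credit(self, amount: int) -> None:
        if self.fail_confirm:
            raise RuntimeError("credit failed")
        self.pending -= amount
        self.balance += amount

    # Cancel: release reservations or compensate work already done.
    def cancel_debit(self, amount: int) -> None:
        self.frozen -= amount

    def cancel_credit(self, amount: int) -> None:
        self.pending -= amount

    def refund(self, amount: int) -> None:
        self.balance += amount


def tcc_transfer(a: Account, b: Account, amount: int) -> bool:
    # Try stage
    if not a.try_debit(amount):
        return False
    if not b.try_credit(amount):
        a.cancel_debit(amount)
        return False
    # Confirm stage
    a.confirm_debit(amount)
    try:
        b.confirm_credit(amount)
    except RuntimeError:
        # Cancel stage: A was already deducted, so add its funds back.
        b.cancel_credit(amount)
        a.refund(amount)
        return False
    return True


bank_a, bank_b = Account(100), Account(0, fail_confirm=True)
print(tcc_transfer(bank_a, bank_b, 100), bank_a, bank_b)
```

Every branch of compensation is written by hand here, which already hints at how much rollback code a real TCC implementation needs.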
TCC is highly intrusive to business code and tightly coupled to it. To tell the truth, few people use this scheme, but there are scenarios where it applies.
A more suitable scenario: very high consistency requirements. The common case is anything involving funds, where TCC can be used. You write a lot of business logic yourself to judge whether each link in a transaction executed normally, and perform the rollback if an exception occurs.
In general, though, transaction rollback in this scheme relies heavily on handwritten rollback and compensation code, which makes the compensation code huge. It is not recommended to adopt it lightly.
Local message table
The mechanism is to add a message table to each system's database. After a system operates on its business table (steps 1 and 5), it also inserts a business-related message record into the message table (steps 2 and 6). The whole chain together guarantees the integrity of the distributed transaction.
As shown in the figure, to make all the operations of system A and system B behave as one transaction, system A inserts its business data and then saves the data's unique identifier (such as its ID) to the message table, in the same local transaction. Note that the message is recorded with a status of to-be-confirmed. System A then notifies MQ. Next, system B fetches the message from MQ and starts its own business processing: first inserting the business table data, then inserting the message table data.
There is one caveat: when system B inserts the message table data, we need to keep a characteristic of MQ in mind, namely repeated consumption. So when system B inserts into the message table, it must make sure the operation is being performed for the first time, which can be guaranteed with the unique ID in system B's business table.
System B then calls back system A to say that its operation succeeded. When system A receives the message, it changes the status in its message table to completed, and the whole distributed transaction ends.
To make sure system B actually receives messages, system A can add a polling job that checks all to-be-confirmed messages once every 1s. If a message has gone without a response for longer than a specified time (such as 1 minute), system A can resend it or roll back, depending on the business.
The defect of this scheme is that it relies heavily on the database's message table, so the bottleneck is obvious under concurrency. The system also has to tolerate data inconsistency for a certain period of time.
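Here is a minimal sketch of the local message table flow, with two in-memory `sqlite3` databases standing in for system A and system B and a plain Python list standing in for MQ. All table, column, and function names are hypothetical. The timing rules (poll every 1s, resend after 1 minute) are left out, so `a_resend_pending` simply resends everything still marked `PENDING`.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Each system owns a business table and a local message table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE business (id TEXT PRIMARY KEY, payload TEXT NOT NULL);
        CREATE TABLE message (biz_id TEXT PRIMARY KEY, status TEXT NOT NULL);
        """
    )
    return conn


def a_create(a_db, mq: list, biz_id: str, payload: str) -> None:
    """System A: business row and message row in ONE local transaction, then notify MQ."""
    with a_db:
        a_db.execute("INSERT INTO business VALUES (?, ?)", (biz_id, payload))
        a_db.execute("INSERT INTO message VALUES (?, 'PENDING')", (biz_id,))
    mq.append((biz_id, payload))


def a_confirm(a_db, biz_id: str) -> None:
    """System A: callback from B, mark the message as completed."""
    with a_db:
        a_db.execute("UPDATE message SET status = 'DONE' WHERE biz_id = ?", (biz_id,))


def b_consume(b_db, a_db, msg) -> None:
    """System B: process a message idempotently, then call back system A."""
    biz_id, payload = msg
    try:
        with b_db:
            b_db.execute("INSERT INTO business VALUES (?, ?)", (biz_id, payload))
            b_db.execute("INSERT INTO message VALUES (?, 'DONE')", (biz_id,))
    except sqlite3.IntegrityError:
        pass  # duplicate delivery: the unique ID shows it was already handled
    a_confirm(a_db, biz_id)


def a_resend_pending(a_db, mq: list) -> int:
    """System A: polling job that re-queues every message still unconfirmed."""
    rows = a_db.execute(
        "SELECT m.biz_id, b.payload FROM message m "
        "JOIN business b ON b.id = m.biz_id WHERE m.status = 'PENDING'"
    ).fetchall()
    mq.extend(rows)
    return len(rows)


a_db, b_db, mq = make_db(), make_db(), []
a_create(a_db, mq, "order-1", "paid")
a_resend_pending(a_db, mq)        # no confirmation yet, so the message is sent twice
for msg in mq:
    b_consume(b_db, a_db, msg)    # the second copy is ignored by system B
print(b_db.execute("SELECT COUNT(*) FROM business").fetchone()[0])
```

The primary key on `business.id` is what makes the duplicate delivery harmless, which is the repeated-consumption caveat described above.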
Transactional messages (final consistency)
This pattern completes distributed transactions through message-oriented middleware such as RocketMQ.
First, system A sends a message in the prepared state to MQ. Messages of this type are not visible to subscribers, so they are not consumed. Once the send succeeds, system A goes on to execute its local transaction. If that executes normally, it sends a confirmation message to tell MQ that the local transaction is complete and the message can now be delivered to system B for consumption. RocketMQ polls the prepared messages, and if no confirmation has arrived within a certain period, it actively checks back with system A to confirm whether the local transaction succeeded.
After step 3 has executed, system B receives the message from MQ and starts its local transaction. Once that finishes, the message is consumed. If system B hits an exception while executing its local transaction, there are several options, such as having MQ redeliver, notifying system A to resend, or adding other middleware (such as zk).
Besides, also pay attention to idempotency when system B consumes messages. This is similar to the local message table, so I won't go into detail here.
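The prepared/confirm/check-back cycle can be illustrated with a toy in-memory broker. This is NOT the RocketMQ client API: `Broker`, `send_prepared`, `confirm`, `discard`, `check_back`, and the `fail_local` / `lose_confirm` switches are all hypothetical names invented for this sketch, and a Python `set` plays the role of system A's local database.

```python
class Broker:
    """Toy stand-in for an MQ that supports prepared (half) messages."""

    def __init__(self):
        self.half = {}      # msg_id -> body, invisible to subscribers
        self.visible = []   # (msg_id, body) pairs ready for delivery

    def send_prepared(self, msg_id, body):
        self.half[msg_id] = body

    def confirm(self, msg_id):
        self.visible.append((msg_id, self.half.pop(msg_id)))

    def discard(self, msg_id):
        self.half.pop(msg_id, None)

    def check_back(self, is_committed):
        """Ask the sender about every prepared message that is still unresolved."""
        for msg_id in list(self.half):
            if is_committed(msg_id):
                self.confirm(msg_id)
            else:
                self.discard(msg_id)


def system_a_send(broker, local_db: set, msg_id, body,
                  fail_local=False, lose_confirm=False):
    broker.send_prepared(msg_id, body)   # 1. prepared message, not yet consumable
    if fail_local:                       # 2. local transaction failed: withdraw it
        broker.discard(msg_id)
        return
    local_db.add(msg_id)                 # 2. local transaction succeeded
    if not lose_confirm:                 # 3. confirmation (may get lost in transit)
        broker.confirm(msg_id)


def system_b_consume(broker, processed: set) -> int:
    """Consume visible messages, skipping IDs that were already processed."""
    handled = 0
    while broker.visible:
        msg_id, _body = broker.visible.pop(0)
        if msg_id in processed:
            continue
        processed.add(msg_id)
        handled += 1
    return handled


broker, a_db, seen = Broker(), set(), set()
system_a_send(broker, a_db, "m1", "paid")
system_a_send(broker, a_db, "m2", "paid", lose_confirm=True)
broker.check_back(lambda msg_id: msg_id in a_db)  # recovers m2
print(system_b_consume(broker, seen))
```

The check-back is what rescues `m2`: its confirmation never arrived, but the broker learns from system A's local state that the transaction did commit.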
Best effort notification
The most intuitive way to put this scheme is: I have tried my best to notify the other systems, and if that still doesn't work, there is nothing more I can do, so only manual intervention is left. This scheme suits situations where the requirements on the distributed transaction are not strict, such as logging or an SMS notification of a successful purchase.
In this scheme the pressure on system A is relatively small. It only completes its local transaction and sends a message to MQ, and its part of the business is done. The rest is handed over to the 【best effort notification service】 to coordinate. If the best effort notification service does not receive feedback from system B, it retries up to a configured threshold (such as 20 times). Once the threshold is exceeded, it can notify a human to intervene or simply give up.
The process is as follows:
- After executing its local transaction, system A sends a message to MQ;
- The best effort notification service consumes the message from MQ, records it in a database or puts it in an in-memory queue, and then calls system B's interface;
- If system B executes successfully, the transaction ends normally. If system B fails, the best effort notification service calls system B again, repeating up to N times. If it still does not succeed in the end, it gives up or notifies a human directly.
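The retry loop of the best effort notification service can be sketched in a few lines. The function name, the `call_b` callable, the `on_give_up` hook, and the `delay_seconds` parameter are hypothetical. A real service would also persist the message and wait between attempts, whereas the default delay here is zero to keep the sketch quick to run.

```python
import time


def best_effort_notify(call_b, message, max_attempts=20,
                       delay_seconds=0.0, on_give_up=None) -> bool:
    """Call system B until it reports success or the attempt threshold is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            if call_b(message):
                return True          # system B succeeded: the transaction ends normally
        except Exception:
            pass                     # treat any error as one failed attempt
        if attempt < max_attempts and delay_seconds > 0:
            time.sleep(delay_seconds)
    if on_give_up is not None:
        on_give_up(message)          # e.g. alert a human to intervene
    return False


attempts = []


def flaky_system_b(message) -> bool:
    """Pretend system B only succeeds on the third call."""
    attempts.append(message)
    return len(attempts) >= 3


print(best_effort_notify(flaky_system_b, "SMS: purchase successful"), len(attempts))
```

Because system B may receive the same message several times, its interface should be idempotent, just as in the message-based schemes above.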
This article has explained several common schemes for handling distributed transactions. In practice, we should choose based on our own business.
For example, 2PC applies at the database level. TCC follows the idea of compensating transactions and implements the transaction at the business level, but it is more intrusive to the code, so choose it carefully. The last three, the local message table, transactional messages, and best effort notification, share the idea of guaranteeing final consistency, and they suit scenarios that are not time-sensitive and do not have strict distributed transaction requirements.
Finally, a question for you: which solutions do you use in your daily development? Share the pitfalls you have run into in the comment area. At the end we will randomly select two readers to receive today's Nuggets gift.