The main problem in distributed system is how to keep the consistency of node state , Whatever happens failure, As long as most of the nodes in the cluster can work properly , Then these nodes have the same state , bring into correspondence with , stay client It looks like a machine .

The essence of consistency is replicated state machines, That is, all nodes are from the same state set out , They all go through the same sequence of operations (log), And finally to the same state. To ensure that each node performs the same operation sequence is raft What the algorithm wants to achieve . stay raft One of the algorithms is Leader Role ,client Interact with it , also Leader Coordinate Follower, Protect all the Follower Have the same sequence of operations , Finally, submit these operations , Make the state machine reach a new consensus stat.

Whole raft The algorithm is divided into Leader The election , Log distribution , Log compression ( snapshot ), Cluster member changes . Among them Leader Election is the core part of the algorithm . The algorithm guarantees at most one at any time Leader, But maybe not Leader( For example, during the election process or when the majority of cluster members are not available ). stay Leader After establishment , You can do log distribution , The algorithm ensures that the logs will be safely distributed to the cluster, and the logs applied to the state machine are the same as themselves . Snapshot is to reduce the amount of logs , Remove the intermediate process . The purpose of cluster member change is to use the new cluster configuration safely without stopping service .

Raft In the case of non Byzantine error , Including network delay 、 Partition 、 Packet loss 、 Errors such as redundancy and out of order can ensure correctness , No error results will be returned , That's security assurance . In fact, it is to ensure that all member state machines are in the same order , Execute the same order . Here are some typical scenarios ,raft How to ensure safety .

1. Leader After the election , If Follower And Leader How to deal with log conflicts ?

Raft Regulations Follower The conflict log in will be Leader Log overlay in , bring Follower The logs in are always related to Leader Keep your logs consistent .Leader We have to find Follower The last part of the log where the two agree , Then delete all log entries after that point , Send your own log to Follower. All of these operations are log copying RPC The consistency check is completed : Leader For each Follower One was maintained nextIndex, This means that the next one needs to be sent to Follower The purpose of the log entry index. When one Leader When you first got power , He initializes everything nextIndex Value for your last log index Add 1.Leader Every time you send a log copy RPC There will be two values in the list :prevLogIndex and prevLogTerm Corresponding to the index value of the log immediately before the new log entry (index) And term number (term), namely prevLogIndex = newIndex - 1, If Follower Can't find in its log that contains Leader Sent by index and term The entry of , Then he will refuse to accept new log entries . So this is the time Follower 's logs and Leader atypism , Log copy RPC The consistency check will fail . In being Follower After refusing ,Leader It will decrease nextIndex Value and retry . Final nextIndex It's going to be in a position to make Leader and Follower The log of the agreed . When this happens , Log copy RPC You will succeed , At this time, I will put Follower Delete all conflicting log entries and add Leader Log . Once the log is copied RPC success , that Follower My log will be with Leader bring into correspondence with , And for the rest of his term, he continued to . Go through the above steps ,Raft The algorithm ensures that the logs are copied sequentially . That is to say , If there is an old log that has not been copied to FollowerA, Then the updated log cannot be copied . Sequential replication of logs , A lot of simplification Raft Algorithm . For example, check the new and old logs of each member , Just compare the last log .

 2.  If in a Follower When it goes down Leader Submitted a number of log entries , And then this Follower It may be elected as Leader And override these log entries , So there will be inconsistencies .

Raft Through to Leader To restrict the election of , To ensure that any newly elected Leader For a given term number , All of the log entries submitted for the previous term , The restriction rule is :candidate The sent voting request message must be accompanied by its last log entry Index And Term; The receiver needs to judge that Index And Term At least as new as the last log entry in the local log , Otherwise, we won't vote . because   Previous Leader The condition for submitting log entries is that the logs are copied to most members of the cluster ,Candidate Elected as Leader It is also necessary for the majority of members to vote . Then there must be an intersection between these two members , That is, a member has the log , And voted for the new Leader, It means new Leader The log of is at least not older than that of the member , So new Leader Also has the log . This proves , Follow up Leader There must be a front Leader Submitted logs .

3. Even if the above election rules are guaranteed , And there's no guarantee of consistency , That is to say, there will be Leader After submitting log entries for previous terms , The entry may also be later Leader Inconsistencies caused by overlay . As shown in the figure below :

  • (a) S1 yes Leader, And partially copied index-2;
  • (b)  S1 Downtime ,S5 obtain S3、S4、S5 The vote for the new Leader(S2 Will not choose S5, because S2 My journal is more S5 new ), And in index-2 Write to a new entry , Now it's term=3( notes : The reason is term=3, Because in term-2 In the election of ,S3、S4、S5 At least one vote , That is, at least one knows term-2, Although they didn't term-2 Log );
  • (c)   S5 Downtime ,S1 Restored and elected as Leader, And start copying logs ( That is to say, it will come from term-2 Of index-2 Copied to S3), here term-2,index-2 Has been replicated to most servers , But it hasn't been submitted yet ;
  • (d)  S1 Down again , also S5 The restoration was elected again as Leader( adopt S2、S3、S4 vote , because S2、S3、S4 Of term=4<5, And the log entry is (term=2,index=2) did not S5 New log entry for , So support the election ), Then cover Follower Medium index-2 For coming from term-3 Of index-2;( notes : And then there is ,term-2 Medium index-2 Has been replicated to three servers , It's covered );
  • (e)  However , If S1 His current term of office was terminated before the outage (term-4) All of the items are copied out , Then the entry is submitted ( that S5 Will not win the election , because S1、S2、S3 Log term=4 Than S5 Duxin ). At this point, all the previous entries will be well submitted .

If the above situation (c) in ,term=2,index=2 After most of the log entries have been copied , If elected at this time S1 Submitted the log entry , Then the subsequent generation of term=3,index=2 It will cover it , At this point it may be in the same index The location submits a different log one after the other , This violates state machine security , There's a disagreement . That is to say, when a new Leader When elected , Because the log progress of all members is different , It's likely that you need to continue copying the front term Log entries for , Even if you copy to most servers and submit , Or it could be covered , Because the front term The corresponding log entry is older , It's easy to make other servers without these entries elected Leader, These log entries are then overridden .

In order to eliminate the above scenario, it is stipulated that Leader You can copy the previous term's log , But will not voluntarily submit the previous term's log . But by submitting a log of the current tenure , And indirectly submit the previous term's log .

4. When the configuration changes , It needs to be guaranteed that there can only be one at any time Leader, It's not safe to switch from the old configuration to the new one , As shown in the figure below :

In the process of transformation ,server1,server2 Choose one from the old configuration Leader( Two of the three support ),server3,server4,server5 Choose a new configuration Leader( Three of the five support ), This is in the same term There are two of them Leader, To cause disagreement , To ensure safety , Configuration changes must use a two-phase approach . stay Raft in , The cluster first switches to a transitional configuration , We call it consensus ; Once the mutual agreement has been submitted , Then the system switches to the new configuration .

Transitional configuration guarantees that it won't be old And new Both configurations produce Leader :

Transitional configuration means that the cluster has old + new The configuration of all machines in .

Leader A transition configuration log is generated , Apply it locally first , Then copy to all members in the transition configuration . All members who receive configuration , The configuration is applied directly to the local .

Members in transition configuration , stay Leader The rules changed when voting and submitting logs , They all want to get old And new The majority of the members in the two configurations agree . such as :


new:2、3、4、5、6  ( Delete machine 1, Add machines 4、5、6)

old+new:1、2、3、4、5、6( The machine is old+new All machines )

So the members in the transition configuration are Leader Election and submission logs , Need to be satisfied to get old(1、2、3) Most of the three members , as well as new(2、3、4、5、6) Most of the five members . instead of old+new A majority of the six members of the . It's crucial to understand that .

1) from old -> old+new Before the configuration is submitted

It's not happening yet new To configure , So it's impossible to new Configuration Leader.

See if it's possible to old And old+new The next generation is Leader.old The next step is to produce Leader need old A majority vote in ,old+new The next step is to produce Leader Also needed old A majority vote in , But in old And old+new The next generation is Leader, So in old At least one of the members of the group voted for old Medium Leader, And old+new Medium Leader. According to the voting rules , In a term You can only vote once , So it's impossible to old And old+new In the same configuration term produce Leader.

And in the difference term It doesn't violate the rules , Because of the new term The next generation Leader, Send heartbeat to old Leader, used Leader It's going to be Follower.

2)old+new After submission

as long as old+new This configuration submission , It means that the configuration is copied to old Majority of members in , So in old It's impossible to generate another one in the configuration Leader Come out . Only in old+new And new Configuration . And in the old+new And new It can't be in the same term Next, we will elect Leader, This proof is related to old And old+new The configuration is the same .

If one member is added or deleted at a time, no transitional configuration is required

That is to say , Only one member can be added or deleted per configuration change . If it is satisfied, it can be obtained directly from old The configuration changes to new To configure .

Prove that the old configuration has N Members , The new configuration has N+1 Members , In the switching process, it will not be generated separately in the old configuration and the new configuration Leader.

1) If elected under the old configuration Leader, You need to N/2+1 Members vote .

2) Under the new configuration , From the number of members is N+1 Get rid of and vote for N/2+1 There are only... Members left N/2 Members .

3) Under the new configuration, we have to elect Leader need (N+1)/2+1=N/2+3/2 Members vote , But only N/2 Members can vote , Therefore, under the new configuration, it is impossible to elect Leader.

4) And vice versa .

Reference resources :

raft More articles on protocol security assurance

  1. Understanding distributed consistency :Raft agreement

    Understanding distributed consistency :Raft agreement What is distributed consistency Leader The election Log replication process term The election cycle timeout Elections and elections timeout Election split Log replication and heartbeat timeout In distributed systems , ...

  2. MIT-6.824 Raft agreement

    Abstract raft It's a ratio. paxos Easy to understand consistency algorithm , Compared with paxos Simple and many . The first part of this paper describes the details of the algorithm , The latter part tries to discuss the principle of this algorithm . Algorithm description raft One of the reasons the algorithm is simple is that it decomposes the problem into ...

  3. Raft Protocol learning notes

    Catalog Catalog 1 1.  Preface 1 2.  Noun 1 3.  What is distributed consistency ? 3 4. Raft The election 3 4.1.  What is? Leader The election ? 3 4.2.  Realization of elections 4 4.3. Term and Lease ...

  4. Raft agreement -- Introduction to Chinese papers

    This blog is famous for RAFT Chinese translation of consistency algorithm paper , The paper is called <In search of an Understandable Consensus Algorithm (Extended Vers ...

  5. understand Raft agreement

    Catalog 1.Paxos The problem with the algorithm 2.Raft Algorithm     2.1 Copy state machine     2.2. Raft Algorithm     2.2.1 Security issues     2.2.2 Leader The election     2.2. ...

  6. Raft Agreement understanding

    raft The key parts of the agreement are the leadership election and log replication Log copy Log matching principle : If two logs are in the same index position entry The term of office is the same , So these two logs are exactly the same from the beginning to the index position . Log matching principle can be interpreted as the following two ...

  7. Raft agreement

    Paxos The problem is Paxos The description of the algorithm is academic , A lot of details are missing , Can't be directly applied to Engineering . Most of the distributed algorithms in practical engineering applications are Paxos Variants , It is also a difficult problem to verify the correctness of these algorithms . for instance : ...

  8. raft agreement

    One .Raft Consistency algorithm Eureka:Peer To Peer, The status of each node is equal , Each node can receive write requests , After each node receives the request , Package the request , Asynchronization is delayed a little bit , Synchronize data to Eure ...

  9. HTTP A more robust way to ensure that the password is not obtained under the protocol

    When it comes to http How to ensure the password security of user login under the protocol :     Xiaobai's first thought may be , When a user enters a password on the login page to log in , The foreground page encrypts the password entered by the user , Then use the encrypted password as http The request parameters are sent through the network ...

Random recommendation

  1. nodejs Advanced (6)— Connect MySQL database

    1. Build a library and connect it Connect MySQL Database needs installation support npm install mysql We need to install it ahead of time mysql sever End Build a database mydb1 mysql> CREATE DATABA ...

  2. WampServer Database import sql file

    WampServer in MySQL How to import sql file :

  3. Collect your own js-2014-2-23

    (function(){})( window.EventUtil={ addHandler:function(element,type,handler){ // alert(1); if(elemen ...

  4. C++ Quote and java The difference of quotation

    stay c++ The reference in is actually an alias for a variable , and java Is a variable that stores the address and address of the actual object C++ The pointer is very similar

  5. DirectoryInfo class

    DirectoryInfo Classes and Directory The relationship between classes and FileInfo Classes and File The relationship between classes is very similar . So let's talk about that DirectoryInfo Common properties of class . DirectoryInfo Class ...

  6. Android Realization KSOAP2 visit WebService

    Android Realization KSOAP2 visit WebService development tool :Andorid Studio 1.3 Running environment :Android 4.4 KitKat Code implementation Write a tool class to use for the main interface , The function is to use ...

  7. unity, The color changes with height shader

    One , The color changes with the height of world space . Shader "Custom/heightGradual_worldSpace" {    Properties {        _Color (& ...

  8. How to put SEL In the NSArray in

    My technology blog is often maliciously crawled and reprinted by rogue websites . Please move to the original :, Enjoy neat typography . Effective links . Correct code indentation . Better reading experience ...

  9. QT Window fade effect , Window shake effect , The mouse moves the window

    // Window fade effect void MainWindow::closeWindowAnimation() // Close the window effect { QPropertyAnimation *animation = new QProp ...

  10. JVM-ClassLoader( turn )

    In the loading phase, the method area is mainly used : Method area is a runtime memory area that can be shared by all threads . It stores the structure information of each class , For example, runtime constant pools (Runtime Constant Pool). Field and method data . Constructors and ordinary methods ...