Original article: BASE: An Acid Alternative

Database ACID is familiar to everyone: atomicity, consistency, isolation, and durability. In the era of the single server these guarantees were easy to implement. Today, faced with enormous volumes of traffic and data, a single server can no longer cope, and in a clustered environment ACID can hardly meet our expectations: guaranteeing it greatly reduces efficiency and, worse, such strict requirements make the system hard to scale. That brings us to the CAP principle (Consistency, Availability, Partition tolerance) and the BASE principle (Basically Available, Soft state, Eventually consistent). Compare the terms: Availability and Basically Available, Consistency and Eventually consistent. In essence, BASE is a further interpretation of CAP.

This article, published in ACM Queue in 2008 by an eBay architect, is the classic explanation of the BASE principle, or eventual consistency. In it, Dan Pritchett reviews the fundamental differences between BASE and ACID, how to design a large website to meet growing scalability demands, and how the business has to adjust and compromise along the way. It also introduces some specific techniques for making those compromises.

Once the database has been partitioned, sacrificing some consistency for availability can significantly improve the scalability of the system.

By Dan Pritchett, eBay. Translated by Jametong.

Web applications have grown in popularity over the past decade. Whether you are building an application for end users or for application developers, your hope is most likely that it will be widely adopted, and with wide adoption comes transactional growth. If your application relies on persistence, data storage will probably become your bottleneck.

There are two strategies for scaling any application. The first, and by far the easiest, is vertical scaling: moving the application to a larger, more powerful computer. Its most obvious limitation is outgrowing the capacity of the largest machine available. Vertical scaling is also expensive, since adding transactional capacity usually requires buying the next larger machine. It often also creates vendor lock-in, further adding to costs.

Horizontal scaling offers more flexibility but is also considerably more complex. Horizontal data scaling can be performed along two vectors. Functional scaling involves grouping data by function and spreading the functional groups across databases. Splitting data within a functional area across multiple databases, known as sharding, adds a second dimension to horizontal scaling. Figure 1 illustrates the horizontal data scaling strategies.

Figure 1

As Figure 1 shows, both approaches to horizontal scaling can be applied at once. Users, products, and transactions can be stored in separate databases. In addition, each functional area can be split across multiple databases according to the transactional capacity it needs. As the figure shows, functional areas can be scaled independently of one another.
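The two scaling vectors can be combined in a small routing function. The following is a minimal sketch, not taken from the article: the database names, the shard counts, and the use of a CRC32 hash to choose a shard are all assumptions made for illustration.

```python
import zlib

# Hypothetical layout: every functional group owns its own list of shards,
# so each group can grow independently of the others.
SHARDS = {
    "users": ["users_db_0", "users_db_1", "users_db_2"],
    "products": ["products_db_0", "products_db_1"],
    "transactions": ["transactions_db_0", "transactions_db_1",
                     "transactions_db_2", "transactions_db_3"],
}


def pick_database(functional_group: str, key: str) -> str:
    """Return the name of the database that holds `key`."""
    # Functional scaling: choose the group of databases by function.
    shards = SHARDS[functional_group]
    # Sharding: choose one database inside the group from a stable hash.
    index = zlib.crc32(key.encode("utf-8")) % len(shards)
    return shards[index]
```

Because the hash is stable, the same key always lands on the same database, and adding shards to one functional group does not disturb the others.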

Functional Partitioning

Functional partitioning is important for achieving high degrees of scalability. Any good database architecture decomposes the schema into tables grouped by functionality. Users, products, transactions, and communication are examples of functional areas. The common practice is to use database concepts such as foreign keys to maintain consistency across these functional areas.

Relying on database constraints to ensure consistency across functional groups tightly couples the schema to the database deployment strategy. For constraints to be applied, the tables must reside on a single database server, which rules out horizontal scaling as the transaction rate grows. In many cases, the easiest scale-out opportunity is to move functional groups of data onto separate database servers.

Schemas that can scale to very high transaction volumes place functionally distinct data on different database servers. This requires moving data constraints out of the database and into the application. It also introduces several challenges, which are discussed later in this article.

CAP Theorem

Eric Brewer, a professor at the University of California, Berkeley, and cofounder and chief scientist of Inktomi, made the conjecture that Web services cannot ensure all three of the following properties at once (their initials form the acronym CAP):

  • Consistency. The client perceives that a set of operations has occurred all at once.
  • Availability. Every operation must terminate in an intended response.
  • Partition tolerance. Operations will complete even if individual components are unavailable.

Specifically, a Web application can support at most two of these properties with any database design. Obviously, any horizontal scaling strategy is based on data partitioning; therefore, designers are forced to choose between consistency and availability.

ACID Solutions

ACID database transactions greatly simplify the job of the application developer. As the acronym indicates, ACID transactions provide the following guarantees:

  • Atomicity. All of the operations in the transaction will complete, or none will.
  • Consistency. The database will be in a consistent state when the transaction begins and ends.
  • Isolation. The transaction will behave as if it is the only operation being performed on the database.
  • Durability. Upon completion of the transaction, the operation will not be reversed. (That is, once a transaction has committed, the system guarantees that the data will not be lost even if the system crashes. Translator's note.)

Database vendors long ago recognized the need for partitioning databases and introduced a technique known as 2PC (two-phase commit) to provide ACID guarantees across multiple database instances. The protocol is broken into two phases:

  • In the first phase, the transaction coordinator asks each database involved in the transaction to precommit the operation and indicate whether a commit is possible.
  • In the second phase, which begins only if every database has agreed that the commit can proceed, the transaction coordinator asks each database to commit the data.

If any database vetoes the commit, all databases are asked to roll back their portions of the transaction. What is the shortcoming? We do get consistency across partitions. If Brewer's conjecture is correct, we must be paying for it in availability. But how can that be?
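The two phases and the rollback rule can be sketched in a few lines. This is a toy in-memory model, not a real transaction manager: the `Participant` class and its method names are hypothetical, and real protocols must also handle timeouts, coordinator failure, and recovery.

```python
class Participant:
    """Stand-in for one database taking part in the transaction."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def precommit(self):
        # Phase 1 vote: report whether this database could commit.
        self.state = "prepared" if self.can_commit else "refused"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled_back"


def two_phase_commit(participants):
    # Phase 1: ask every participant to precommit and collect the votes.
    votes = [p.precommit() for p in participants]
    if all(votes):
        # Phase 2: everyone agreed, so ask each participant to commit.
        for p in participants:
            p.commit()
        return True
    # A single veto rolls back every participant's part of the transaction.
    for p in participants:
        p.rollback()
    return False
```

Note that the coordinator needs an answer from every participant before anything commits, which is why one unavailable database blocks the whole transaction.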

The availability of any system is the product of the availability of the components required for its operation. The last part of that statement is the most important: components that may be used by the system but are not required do not reduce its availability. A transaction that involves two databases in a two-phase commit has an availability equal to the product of the availability of each database. For example, if we assume each database has 99.9 percent availability, the availability of the transaction becomes 99.8 percent, or an additional 43 minutes of downtime per month.
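The arithmetic behind these figures can be checked directly. The sketch below assumes a 30-day month (43,200 minutes), which the article does not state; the function names are illustrative.

```python
MINUTES_PER_MONTH = 30 * 24 * 60  # assumption: a 30-day month


def combined_availability(*availabilities):
    """Availability of an operation that requires every listed component."""
    product = 1.0
    for a in availabilities:
        product *= a
    return product


def downtime_minutes(availability):
    """Expected minutes of downtime per month at the given availability."""
    return (1.0 - availability) * MINUTES_PER_MONTH


single = 0.999
both = combined_availability(single, single)
extra = downtime_minutes(both) - downtime_minutes(single)

print(f"{both:.4f}")   # 0.9980
print(f"{extra:.1f}")  # 43.2
```

One database at 99.9 percent is down about 43 minutes a month; requiring two of them roughly doubles that, which is where the additional 43 minutes comes from.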

For more on two-phase commit, see the book "Nine Algorithms That Changed the Future", which has an excellent explanation. (Translator's note.)

An ACID Alternative

If ACID provides the consistency choice for partitioned databases, how do you achieve availability instead? One answer is BASE (basically available, soft state, eventually consistent).
BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the consistency of the database will be in a state of flux. Although this sounds hard to cope with, in reality it is quite manageable, and it leads to levels of scalability that cannot be obtained with ACID.

The availability of BASE is achieved by supporting partial failures without total system failure. Here is a simple example: if users are partitioned across five database servers, BASE design encourages crafting operations so that the failure of one user database affects only the 20 percent of users on that host. There is no magic involved, but it does lead to higher perceived availability of the system.
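The 20 percent figure follows from spreading users evenly over five servers. A minimal sketch, assuming users are assigned by `user_id % 5` (the article does not say how users are partitioned):

```python
NUM_USER_SHARDS = 5  # five user database servers, as in the example above


def shard_for(user_id: int) -> int:
    # Assumed partitioning rule: users are spread by ID modulo shard count.
    return user_id % NUM_USER_SHARDS


def affected_fraction(user_ids, failed_shard):
    """Fraction of users whose data lives on the failed shard."""
    hit = sum(1 for u in user_ids if shard_for(u) == failed_shard)
    return hit / len(user_ids)


print(affected_fraction(range(1000), failed_shard=3))  # 0.2
```

The other 80 percent of users keep working, provided no operation on their behalf requires the failed shard.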

So, now that you have decomposed your data into functional groups and partitioned the busiest groups across multiple databases, how do you incorporate BASE into your application? BASE requires a more in-depth analysis of the operations within a logical transaction than is typically applied to ACID. What should you be looking for? The following sections provide some direction.

Consistency Patterns

Following Brewer's conjecture, if BASE chooses availability in a partitioned database, then opportunities to relax consistency have to be identified. This is often difficult, because both business stakeholders and developers tend to assert that consistency is paramount to the success of the application. Even temporary inconsistency cannot be hidden from the end user, so both engineering and product owners must be involved in deciding where consistency can be relaxed.

Figure 2 is a simple schema that illustrates the consistency considerations for BASE. The user table holds user information, including the total amount sold and the total amount bought. These are running totals. The transaction table holds each transaction, relating the seller, the buyer, and the amount of the transaction. These are gross oversimplifications of real tables, but they contain the elements needed to illustrate several aspects of consistency.

Figure 2

In general, consistency across functional groups is easier to relax than consistency within a functional group. The example schema has two functional groups: users and transactions. Each time an item is sold, a row is added to the transaction table and the counters for the buyer and the seller are updated. Using an ACID-style transaction, the SQL would be as shown in Figure 3.

Figure 3
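As a minimal sketch of what an ACID-style version of this sale might look like, the following uses Python's `sqlite3` module with an in-memory database. The table names (`users`, `transactions`) and column names (`amt_sold`, `amt_bought`, `xid`) are assumptions made for illustration, not a definitive rendering of the article's schema.

```python
import sqlite3


def make_db():
    """Create an in-memory database with an assumed two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (id INTEGER PRIMARY KEY,
                            amt_sold INTEGER DEFAULT 0,
                            amt_bought INTEGER DEFAULT 0);
        CREATE TABLE transactions (xid INTEGER PRIMARY KEY,
                                   seller_id INTEGER,
                                   buyer_id INTEGER,
                                   amount INTEGER);
        INSERT INTO users (id) VALUES (1), (2);
    """)
    return conn


def record_sale_acid(conn, xid, seller_id, buyer_id, amount):
    # `with conn:` commits if the block succeeds and rolls back if it
    # raises, so the insert and both counter updates succeed or fail
    # together.
    with conn:
        conn.execute("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                     (xid, seller_id, buyer_id, amount))
        conn.execute("UPDATE users SET amt_sold = amt_sold + ? WHERE id = ?",
                     (amount, seller_id))
        conn.execute("UPDATE users SET amt_bought = amt_bought + ? WHERE id = ?",
                     (amount, buyer_id))
```

This only works as a single transaction because both tables live in the same database; once they are moved to separate servers, the same guarantee requires 2PC.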

The total sold and total bought columns in the user table can be considered a cache of the transaction table. They exist for the efficiency of the system. Given this, the consistency constraint can be relaxed. Buyer and seller expectations can be set so that their running balances do not immediately reflect the result of a transaction. This is not uncommon; in fact, people regularly encounter this delay between a transaction and their running balance (for example, ATM withdrawals and cell phone calls).

How the SQL statements are modified to relax consistency depends on how the running balances are defined. If they are simply estimates, meaning that some transactions can be missed, the change is quite simple, as shown in Figure 4.

Figure 4
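The relaxed version can be sketched as two independent transactions. In the sketch below, two separate in-memory SQLite databases stand in for the transaction host and the user host; the schema, the function name, and the `fail_between` switch used to simulate a crash are all assumptions for illustration.

```python
import sqlite3

# Two separate databases stand in for the two hosts. Names are illustrative.
tx_db = sqlite3.connect(":memory:")
tx_db.execute("CREATE TABLE transactions (xid INTEGER PRIMARY KEY, "
              "seller_id INTEGER, buyer_id INTEGER, amount INTEGER)")
user_db = sqlite3.connect(":memory:")
user_db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY,
                        amt_sold INTEGER DEFAULT 0,
                        amt_bought INTEGER DEFAULT 0);
    INSERT INTO users (id) VALUES (1), (2);
""")


def record_sale_decoupled(xid, seller_id, buyer_id, amount, fail_between=False):
    # Transaction 1: record the sale on the transaction host.
    with tx_db:
        tx_db.execute("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                      (xid, seller_id, buyer_id, amount))
    if fail_between:
        # Simulated crash: the sale is committed, the totals never update.
        raise RuntimeError("failure between the two transactions")
    # Transaction 2: update the running totals on the user host.
    with user_db:
        user_db.execute("UPDATE users SET amt_sold = amt_sold + ? WHERE id = ?",
                        (amount, seller_id))
        user_db.execute("UPDATE users SET amt_bought = amt_bought + ? WHERE id = ?",
                        (amount, buyer_id))
```

Calling the function with `fail_between=True` leaves a committed sale whose amount never reaches the running totals, which is acceptable only if the totals are defined as estimates.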

We have now decoupled the updates to the user and transaction tables. Consistency between the two tables is no longer guaranteed. In fact, a failure between the first and second transactions will leave the user table permanently inconsistent, but if the contract stipulates that the running totals are estimates, this may be adequate.

What if estimates are not acceptable? How can you still decouple the updates to the user and transaction tables? Introducing a persistent message queue solves the problem. There are several options for implementing persistent messages. The most critical factor in implementing the queue, however, is ensuring that its backing persistence is on the same resource as the database. This is necessary so that the queue can be committed transactionally without involving 2PC. The SQL operations now look a bit different, as shown in Figure 5.

Figure 5

This example takes some liberties with syntax and greatly simplifies the logic in order to illustrate the concept. By queuing a persistent message within the same transaction as the insert, the information needed to update the user's running balances is captured. The transaction is contained within a single database instance and therefore does not affect system availability.
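One way to keep the queue on the same resource as the database is to store the messages in a table of that database. The sketch below takes that approach with `sqlite3`; the `message_queue` table, the JSON message format, and the field names are assumptions for illustration and are not the article's syntax.

```python
import json
import sqlite3

# The queue table lives in the same database as the transaction table, so
# one local transaction covers both. All names here are illustrative.
tx_db = sqlite3.connect(":memory:")
tx_db.executescript("""
    CREATE TABLE transactions (xid INTEGER PRIMARY KEY,
                               seller_id INTEGER,
                               buyer_id INTEGER,
                               amount INTEGER);
    CREATE TABLE message_queue (msg_id INTEGER PRIMARY KEY AUTOINCREMENT,
                                body TEXT NOT NULL);
""")


def record_sale_with_queue(xid, seller_id, buyer_id, amount):
    # The insert and both queued messages commit or roll back together,
    # without any coordination across database instances.
    with tx_db:
        tx_db.execute("INSERT INTO transactions VALUES (?, ?, ?, ?)",
                      (xid, seller_id, buyer_id, amount))
        for balance, user_id in (("seller", seller_id), ("buyer", buyer_id)):
            body = json.dumps({"trans_id": xid, "balance": balance,
                               "user_id": user_id, "amount": amount})
            tx_db.execute("INSERT INTO message_queue (body) VALUES (?)",
                          (body,))
```

If the insert fails, no message is queued; if it succeeds, both messages are durable and can be applied to the user table later.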

A separate message-processing component dequeues each message and applies the information to the user table. The example appears to solve all of the issues, but one problem remains. To avoid 2PC while queuing, the message is persisted on the transaction host. If the message is dequeued inside a transaction that involves the user host, we still have a 2PC situation.

One solution to the 2PC in the message-processing component is to do nothing. By decoupling the update into a separate back-end component, you preserve the availability of your customer-facing component. The lower availability of the message processor may be acceptable for the business requirements.

Suppose, however, that 2PC is simply never acceptable in your system. How can this problem be solved? First, you need to understand the concept of idempotence. An operation is considered idempotent if it can be applied one time or multiple times with the same result. Idempotent operations are useful because they permit partial failures: applying them repeatedly does not change the final state of the system.

From the standpoint of idempotence, the selected example is problematic. Update operations are rarely idempotent. The example increments balance columns in place, and applying that operation more than once obviously produces an incorrect balance. Even an update that simply sets a value is not idempotent with regard to the order of operations. If the system cannot guarantee that updates are applied in the order they are received, the final state of the system will be incorrect. More on this later.

In the case of balance updates, you need a way to track which updates have been applied successfully and which are still outstanding. One technique is to use a table that records the identifiers of the transactions that have been applied.

The table shown in Figure 6 tracks the transaction ID, which balance was updated, and the user ID to which the balance was applied. Our sample pseudocode is now as shown in Figure 7.

Figure 6

Figure 7

This example depends on being able to peek at a message in the queue and to remove it once it has been processed successfully. If necessary, this can be done with two independent transactions: one on the message queue and one on the user database. The queue operation is committed only after the database operation has committed successfully. The algorithm now supports partial failures and still provides transactional guarantees without resorting to 2PC.
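A minimal sketch of this peek, apply, then remove loop follows. It assumes an `updates_applied` table keyed by transaction ID, balance, and user ID, and uses an in-memory `deque` as a stand-in for the persistent queue; the message fields and function names are hypothetical.

```python
import sqlite3
from collections import deque

user_db = sqlite3.connect(":memory:")
user_db.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY,
                        amt_sold INTEGER DEFAULT 0,
                        amt_bought INTEGER DEFAULT 0);
    CREATE TABLE updates_applied (trans_id INTEGER, balance TEXT,
                                  user_id INTEGER,
                                  PRIMARY KEY (trans_id, balance, user_id));
    INSERT INTO users (id) VALUES (1), (2);
""")


def apply_message(msg):
    key = (msg["trans_id"], msg["balance"], msg["user_id"])
    # The column name comes from a fixed pair, never from user input.
    column = "amt_sold" if msg["balance"] == "seller" else "amt_bought"
    with user_db:
        seen = user_db.execute(
            "SELECT COUNT(*) FROM updates_applied "
            "WHERE trans_id = ? AND balance = ? AND user_id = ?",
            key).fetchone()[0]
        if seen == 0:
            # The update and its marker row commit in one local transaction.
            user_db.execute(
                f"UPDATE users SET {column} = {column} + ? WHERE id = ?",
                (msg["amount"], msg["user_id"]))
            user_db.execute("INSERT INTO updates_applied VALUES (?, ?, ?)",
                            key)


def drain(queue):
    while queue:
        msg = queue[0]       # peek without removing
        apply_message(msg)   # raises on failure, leaving msg in the queue
        queue.popleft()      # remove only after the database commit
```

If the processor crashes after the database commit but before the message is removed, the message is delivered again, finds its marker row, and changes nothing.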

If the only concern is ordering, there is a simpler technique for ensuring idempotent updates. Let's change our sample schema slightly to illustrate the challenge and the solution (see Figure 8).

Figure 8

Suppose two purchases occur within a short time window and our messaging system does not guarantee ordered operations. Depending on the order in which the messages are processed, last_purchase may end up with an incorrect value. Fortunately, this kind of update can be handled with a minor modification to the SQL, as shown in Figure 9.

Figure 9

By simply not allowing the last_purchase time to go backward, you make the update operations order independent. You can also use this approach to protect any update against out-of-order updates. As an alternative to time, you can also try a monotonically increasing transaction ID.
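The guard amounts to one extra condition in the `WHERE` clause. A minimal sketch with `sqlite3`, assuming `last_purchase` is stored as an ISO 8601 string (so string comparison matches time order) and starts out empty; the table layout and function name are illustrative.

```python
import sqlite3

db = sqlite3.connect(":memory:")
# Assumption: timestamps are ISO 8601 strings, which sort chronologically.
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, "
           "last_purchase TEXT DEFAULT '')")
db.execute("INSERT INTO users (id) VALUES (2)")
db.commit()


def record_purchase_time(buyer_id, trans_date):
    # The extra condition keeps last_purchase from ever moving backward,
    # so messages can be applied in any order, any number of times.
    with db:
        db.execute(
            "UPDATE users SET last_purchase = ? "
            "WHERE id = ? AND last_purchase < ?",
            (trans_date, buyer_id, trans_date))
```

Applying a later purchase first and an earlier one second leaves the later timestamp in place, as does replaying either message.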

Ordering of Message Queues

A short side note on ordered message delivery is relevant here. Messaging systems can offer a guarantee that messages are delivered in the order they were received. Supporting this can be expensive and is often unnecessary; in fact, at times it gives a false sense of security.

The examples provided here illustrate how message ordering can be relaxed while still, eventually, providing a consistent view of the database. The overhead of relaxing the ordering is nominal and in most cases is significantly less than the cost of enforcing ordering in the messaging system.

Further, a Web application is semantically an event-driven system regardless of the style of interaction. Client requests arrive at the system in arbitrary order. The processing time required for each request varies. Request scheduling across the components of the system is nondeterministic, which results in nondeterministic queuing of messages. Requiring that order be preserved gives a false sense of security. The simple reality is that nondeterministic inputs lead to nondeterministic outputs.

Soft State/Eventually Consistent

Up to this point, the focus has been on trading consistency for availability. The other side of the coin is understanding the influence that soft state and eventual consistency have on application design.
As software engineers, we tend to look at our systems as closed loops. We think about the predictability of their behavior in terms of predictable inputs producing predictable outputs. This is a necessity for creating correct software systems. The good news is that in many cases using BASE does not change the predictability of a system as a closed loop, but it does require looking at the behavior as a whole.

A simple example can help illustrate the point. Consider a system in which users can transfer assets to other users. The type of asset is irrelevant: it could be money or objects in a game. For this example, we assume that a message queue decouples the two operations of taking the asset from one user and giving it to the other.

Immediately, this system feels fraught and nondeterministic. There is a period of time during which the asset has left one user and has not yet arrived at the other. The size of this time window is determined by the design of the messaging system. Regardless, there is a lag between the begin and end states during which neither user appears to own the asset.

If we consider this from the user's perspective, however, the lag may be irrelevant or not even known. Neither the receiving user nor the sending user may know when the asset arrived. If the lag between sending and receiving is a few seconds, it will be invisible, or certainly tolerable, to users who are communicating directly about the asset transfer. In this situation the system behavior is consistent and acceptable to its users, even though the implementation relies on soft state and eventual consistency.
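The lag can be made visible in a toy model. The sketch below keeps holdings in a dictionary and uses an in-memory `deque` as a stand-in for the message queue; the user names, the asset, and the function names are all hypothetical.

```python
from collections import deque

holdings = {"alice": {"sword"}, "bob": set()}
in_flight = deque()  # stand-in for the message queue between the two steps


def send_asset(sender, receiver, asset):
    holdings[sender].remove(asset)        # step 1: take from the sender
    in_flight.append((receiver, asset))   # queue the delivery for later


def deliver_pending():
    while in_flight:
        receiver, asset = in_flight.popleft()
        holdings[receiver].add(asset)     # step 2: give to the receiver


def owners_of(asset):
    return [user for user, items in holdings.items() if asset in items]


send_asset("alice", "bob", "sword")
print(owners_of("sword"))  # [] while the message is in flight
deliver_pending()
print(owners_of("sword"))  # ['bob'] once the queue has drained
```

The asset is never duplicated or lost; it is only briefly unowned, and the system converges to the expected end state.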

Event-Driven Architecture

What if you do need to know when the state has become consistent? You may have algorithms that need to be applied to the state, but only once it has reached a consistent state relevant to an incoming request.

Continuing with the previous example, what if you need to notify the user that the asset has arrived? Creating an event within the transaction that commits the asset to the receiving user provides a mechanism for performing further processing once a known state has been reached. EDA (event-driven architecture) can provide dramatic improvements in scalability and architectural decoupling. Further discussion of applying EDA is beyond the scope of this article.
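A minimal in-process sketch of the idea: the step that delivers the asset also publishes an event, and a subscriber reacts to it. The event name, the handler registry, and the notification text are assumptions for illustration; a real system would persist the event in the same transaction as the delivery.

```python
from collections import defaultdict

subscribers = defaultdict(list)  # event name -> list of handlers
holdings = {"bob": set()}
notifications = []


def subscribe(event_name, handler):
    subscribers[event_name].append(handler)


def publish(event_name, payload):
    for handler in subscribers[event_name]:
        handler(payload)


def deliver_asset(receiver, asset):
    # Reaching the known state and announcing it happen in the same step.
    holdings[receiver].add(asset)
    publish("asset_delivered", {"receiver": receiver, "asset": asset})


def notify_receiver(event):
    notifications.append(f"{event['receiver']} received {event['asset']}")


subscribe("asset_delivered", notify_receiver)
```

The delivery code does not need to know who is listening, which is the decoupling that makes the approach attractive.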

Conclusion

Scaling systems to dramatic transaction rates requires a new way of thinking about managing resources. The traditional transactional models are problematic when loads need to be spread across a large number of components. Decoupling the operations and performing them in turn provides improved availability and scale at the cost of consistency. BASE provides a model for thinking about this decoupling.

About the author

Dan Pritchett is a Technical Fellow at eBay, where he has been a member of the architecture team for the past four years. In this role he works with the strategy, business, product, and technology teams across eBay Marketplaces, PayPal, and Skype. He has more than 20 years of experience at technology companies including Sun, HP, and Silicon Graphics, with technical experience ranging from network-level protocols and operating systems to systems design and software patterns. He holds a bachelor's degree in computer science from the University of Missouri, Rolla.
