Recently I have been reading a book by an expert from Taobao's technical team, 《Large Website Systems and Java Middleware Practice》. The opening chapter describes in detail how a website's architecture evolves from small to large. In covering the upgrade from a single-machine architecture to a cluster, it focuses on the session synchronization problem, a topic that is hard to avoid whenever distributed systems come up. Here I sort out the book's treatment of it as reading notes for future reference.

Where does the problem come from

Anyone who does web development will be very familiar with sessions. A session is identified by a session ID that the server assigns to the client. The browser sends this ID with every request to tell the server who it is, and the server keeps the data for each session in memory and uses the ID to work out which session a request belongs to. In a single-machine deployment, the web server and the session data live on the same machine, so the matching session data can always be found. But suppose two web servers (A and B) provide the service. If the first request lands on A and creates a session there, how can we make sure B can read that session data when the next request lands on B?
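To make the problem concrete, here is a minimal Python sketch. The `WebServer` class and its `login` and `current_user` methods are hypothetical names invented for illustration, not anything from the book. Each instance simply keeps sessions in its own dictionary, the way a stand-alone server keeps them in its own memory.

```python
import uuid


class WebServer:
    """Toy web server that keeps session data in its own process memory."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session ID -> session data, local to this server

    def login(self, user):
        # Create a session and hand the ID back to the client.
        session_id = uuid.uuid4().hex
        self.sessions[session_id] = {"user": user}
        return session_id

    def current_user(self, session_id):
        # Look the session up in this server's memory only.
        session = self.sessions.get(session_id)
        return session["user"] if session else None


server_a = WebServer("A")
server_b = WebServer("B")

sid = server_a.login("alice")             # first request lands on A
found_on_a = server_a.current_user(sid)   # A knows the session
found_on_b = server_b.current_user(sid)   # B has never seen this session ID
```

Here `found_on_a` is `"alice"` while `found_on_b` is `None`, which is exactly the situation the four schemes below try to avoid.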


There are four common solutions.

1、Session Sticky

This is the simplest approach. The core idea is to make all requests of the same session land on the same server, just as on a single machine. This can be achieved by having the load balancer identify the session and control forwarding accordingly. The advantage is that session handling stays simple and local caching is easy, but the disadvantages are obvious:

  • If that server goes down or restarts, all of its session data is lost, which gives up the high availability of a distributed cluster.

  • The load balancer takes on more work: it becomes stateful, consumes more resources, and can easily turn into a performance bottleneck.
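One possible way for a load balancer to keep a session on one server is to hash the session ID. The sketch below assumes that strategy. The function name `pick_server` and the server labels are made up for illustration, and real load balancers may instead use a routing table or a routing cookie.

```python
import hashlib


def pick_server(session_id, servers):
    """Map a session ID to one server, always the same one for the same ID."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an integer and wrap it to the list size.
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]


servers = ["A", "B"]
first_route = pick_server("session-123", servers)
second_route = pick_server("session-123", servers)  # same ID, same server
```

Both calls return the same server as long as the server list is unchanged. If a server is removed from the list, sessions that were mapped to it are routed elsewhere and their data is gone, which is the first drawback listed above.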

2、Session Replication

As the name suggests, this approach replicates sessions. The core idea is to add a session synchronization mechanism between the servers so that their data stays consistent.

It looks much simpler than the first approach and avoids its flaws, but it still has serious problems in some scenarios:

  • Synchronizing data between servers costs extra network traffic. As the number of machines and the amount of data grow, the pressure on network bandwidth rises and synchronization delays become unavoidable.

  • Every server stores all session data, so a large number of sessions takes up much of each server's memory.

Many application containers currently support this kind of synchronization, so it is a good solution when the cluster and the data volume are relatively small.
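The replication idea can be sketched roughly as follows. This is a toy in-process model, not how any particular container implements it: `ReplicatedServer`, `build_cluster`, and the direct method call standing in for a network message are all assumptions made for illustration.

```python
class ReplicatedServer:
    """Toy server that pushes every session write to all of its peers."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # full copy of all session data
        self.peers = []     # the other servers in the cluster

    def put(self, session_id, data):
        self.sessions[session_id] = dict(data)
        # In a real cluster this would be one network message per peer.
        for peer in self.peers:
            peer.receive(session_id, data)

    def receive(self, session_id, data):
        self.sessions[session_id] = dict(data)

    def get(self, session_id):
        return self.sessions.get(session_id)


def build_cluster(names):
    servers = [ReplicatedServer(name) for name in names]
    for server in servers:
        server.peers = [p for p in servers if p is not server]
    return servers


a, b, c = build_cluster(["A", "B", "C"])
a.put("sid-1", {"user": "alice"})        # write lands on A
user_on_b = b.get("sid-1")["user"]       # B can read it as well
copies = sum(len(s.sessions) for s in (a, b, c))
```

One session ends up stored three times (`copies` is 3), and each write costs one message per peer, which illustrates both drawbacks above: network traffic and memory use grow with the size of the cluster.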

3、Centralized Session Storage

The idea here is to store and manage all session data in one place. Every application server reads and writes sessions through a dedicated session server.

The advantage of this scheme is that session management becomes independent and has a single responsibility. How the session server stores the data (memory, database, files, NoSQL, and so on) is transparent to its callers, as long as it exposes a service to them. It adds no overhead to the application servers or the load balancer, and consistency comes without any data synchronization. It looks perfect, but it has a few small defects:

  • Reading and writing sessions requires a network call, which adds latency and instability compared with keeping sessions directly on the web server. Fortunately the session server and the web servers are usually deployed on the same local area network, which keeps this problem to a minimum.

  • A problem with the session server affects every web server. If the session server itself is deployed on multiple machines, data consistency problems appear again.

Each scheme has its own advantages and also brings new problems. None is perfect; the one that fits is the best. On the whole, this scheme is very attractive when both the number of application servers and the volume of session data are large.
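A rough sketch of the centralized scheme is shown below. `SessionStore` stands in for the session server and `AppServer` for a web server. Both names are hypothetical, and the in-memory dictionary is only a placeholder for whatever backend (database, files, NoSQL) is actually used behind the same interface.

```python
class SessionStore:
    """Stand-in for the session server; the backend is hidden behind save/load."""

    def __init__(self):
        self._data = {}

    def save(self, session_id, data):
        self._data[session_id] = dict(data)

    def load(self, session_id):
        return self._data.get(session_id)


class AppServer:
    """Web server that keeps no session state of its own."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # every server talks to the same store

    def login(self, session_id, user):
        self.store.save(session_id, {"user": user})

    def current_user(self, session_id):
        session = self.store.load(session_id)
        return session["user"] if session else None


store = SessionStore()
server_a = AppServer("A", store)
server_b = AppServer("B", store)

server_a.login("sid-1", "alice")              # session created through A
user_seen_by_b = server_b.current_user("sid-1")  # B reads it from the store
```

Because the web servers hold no session state, any of them can serve any request. The cost is that every `save` and `load` would be a network call in a real deployment, and the store becomes a shared dependency for all web servers.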

4、Cookie Based

This solution relies on cookies for transmission. The core idea is very simple: after processing a request, the server writes the complete session data into a cookie on the client. The client then sends this cookie with every subsequent request, and the server parses the cookie data to obtain the session information, as shown in the figure below:

The scheme is simple and clear, and it has none of the problems of the previous schemes, but its disadvantages are obvious:

  • First, passing critical data through a cookie is not safe, even with encryption.

  • If the client has cookies disabled, the service becomes unavailable.

  • Cookies are limited in size. If the data exceeds the limit, errors will occur.

  • Carrying a large amount of data in every HTTP request increases network load. Likewise, when the server sends a large amount of data in its responses, requests slow down, and under high concurrency the effect is severe.
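As a rough illustration of the cookie-based scheme, the sketch below packs session data into a signed cookie value and checks it on the way back. Everything here is an assumption made for the example: the names `encode_session` and `decode_session`, the placeholder `SECRET_KEY`, the choice of HMAC-SHA256 signing, and the 4096-byte limit, which is a commonly quoted per-cookie size and not a value from the book.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"change-me"   # hypothetical server-side secret
MAX_COOKIE_BYTES = 4096     # assumed per-cookie size limit


def _sign(payload):
    return hmac.new(SECRET_KEY, payload.encode("ascii"), hashlib.sha256).hexdigest()


def encode_session(data):
    """Serialize session data into a 'payload.signature' cookie value."""
    raw = json.dumps(data, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw).decode("ascii")
    cookie = payload + "." + _sign(payload)
    if len(cookie) > MAX_COOKIE_BYTES:
        raise ValueError("session data too large for a cookie")
    return cookie


def decode_session(cookie):
    """Return the session data, or None if the cookie is malformed or tampered with."""
    payload, separator, signature = cookie.rpartition(".")
    if not separator:
        return None
    expected = _sign(payload)
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("ascii")):
        return None
    return json.loads(base64.urlsafe_b64decode(payload.encode("ascii")))


cookie = encode_session({"user": "alice"})
restored = decode_session(cookie)
```

Note that signing only detects tampering. The payload is merely Base64-encoded, so the client can still read it, and a large session either fails the size check or inflates every request, which matches the drawbacks listed above.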


All four schemes above are feasible. As mentioned earlier, each has its own advantages and disadvantages and none is perfect, so in practice you have to make trade-offs based on your requirements. These are only the common solutions. I expect other problems to turn up when they are actually put into practice, and experienced readers may have some new tricks of their own. Discussion is welcome.

