Before formally introducing ZooKeeper, let's look at where it came from. It's an interesting story.
Here is an excerpt from Chapter 4, Section 1 of "From Paxos to ZooKeeper", a book I recommend reading:
ZooKeeper originated in a research group at Yahoo Research. At the time, the researchers found that many large systems inside Yahoo relied on a similar kind of system for distributed coordination, but those systems often had a single point of failure. So Yahoo's developers set out to build a general-purpose distributed coordination framework with no single point of failure, so that developers could focus on business logic.
There is also an interesting story behind the name "ZooKeeper". Early in the project, since many internal projects had already been named after animals (the well-known Pig project, for example), Yahoo's engineers wanted to give this project an animal name too. Raghu Ramakrishnan, the institute's chief scientist at the time, joked: "If this goes on, we'll turn into a zoo!" At that, everyone agreed to call it the zookeeper: with all the animal-named distributed components put together, Yahoo's distributed system looked like one big zoo, and ZooKeeper was exactly what would coordinate that distributed environment. That is how ZooKeeper got its name.
ZooKeeper is an open-source distributed coordination service. Its design goal is to encapsulate complex, error-prone distributed consistency services into an efficient and reliable set of primitives, and to expose them to users through a series of simple, easy-to-use interfaces.
Primitive: a term from operating systems and computer networking. It is a procedure made up of several instructions that performs a specific function. It is indivisible: a primitive must execute continuously and cannot be interrupted while it runs.
ZooKeeper provides a highly available, high-performance, stable solution for distributed data consistency. It is commonly used to implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.
In addition, ZooKeeper keeps its data in memory, so it performs very well. Performance is especially high in applications where reads outnumber writes, because a write causes state to be synchronized across all servers. (Reads outnumbering writes is the typical workload for a coordination service.)
- Sequential consistency: transaction requests issued by the same client are applied to ZooKeeper strictly in the order they were issued.
- Atomicity: the result of every transaction request is consistent across all machines in the cluster. In other words, either every machine in the cluster applies a given transaction successfully, or none of them does.
- Single system image: no matter which ZooKeeper server a client connects to, it sees the same server-side data model.
- Reliability: once a change request has been applied, its result is persisted until it is overwritten by the next change.
In the overview of ZooKeeper, we mentioned that it can be used to implement data publish/subscribe, load balancing, naming services, distributed coordination/notification, cluster management, Master election, distributed locks, and distributed queues.
Below are 3 typical application scenarios:
- Distributed lock: a distributed lock is acquired by creating a unique node. The lock is released when the holder finishes running the relevant code or crashes.
- Naming service: ZooKeeper's sequential nodes can be used to generate globally unique IDs.
- Data publish/subscribe: the Watcher mechanism makes data publish/subscribe easy to implement. When you publish data to a watched ZooKeeper node, other machines are notified of the change on that node through their watches, which lets them update their configuration dynamically.
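The publish/subscribe scenario can be sketched with a small in-memory model. This is only an illustration, not the ZooKeeper client API: `ToyConfigStore`, `on_change`, and the `/config/db_url` path are hypothetical names, and the sketch assumes the one-time trigger behaviour of standard ZooKeeper watches, where a subscriber has to re-register after each notification.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

Watcher = Callable[[str, bytes], None]


class ToyConfigStore:
    """In-memory stand-in for a ZooKeeper node store (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self._watchers: Dict[str, List[Watcher]] = defaultdict(list)

    def get(self, path: str, watch: Optional[Watcher] = None) -> bytes:
        """Read a node and optionally register a one-time watcher on it."""
        if watch is not None:
            self._watchers[path].append(watch)
        return self._data.get(path, b"")

    def set(self, path: str, value: bytes) -> None:
        """Publish new data, then notify the watchers registered on this path."""
        self._data[path] = value
        # Each watcher fires once; it is dropped before being called.
        fired, self._watchers[path] = self._watchers[path], []
        for callback in fired:
            callback(path, value)


store = ToyConfigStore()
seen: List[Tuple[str, bytes]] = []


def on_change(path: str, value: bytes) -> None:
    """Subscriber: record the new value and re-register for the next update."""
    seen.append((path, value))
    store.get(path, watch=on_change)


store.set("/config/db_url", b"db-1:3306")      # published before anyone subscribes
store.get("/config/db_url", watch=on_change)   # subscribe
store.set("/config/db_url", b"db-2:3306")      # subscriber is notified
store.set("/config/db_url", b"db-3:3306")      # notified again after re-registering
print(seen)
```

The subscriber only sees the two updates published after it registered, which is why a real subscriber normally reads the current value at the moment it sets the watch.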
In fact, these features largely build on ZooKeeper's ability to store data. Note, however, that ZooKeeper is not suited to storing large amounts of data.
- Kafka: ZooKeeper mainly provides Kafka with Broker and Topic registration, load balancing across multiple Partitions, and similar functions.
- HBase: ZooKeeper ensures that an HBase cluster has only one Master, and it stores and serves the state of each regionserver (whether it is online), among other functions.
- Hadoop: ZooKeeper provides high-availability support for the NameNode.
Heads up: get out your notebook, because what follows is very important!
ZooKeeper's data model is a hierarchical, multi-way tree. Each node can store data, which can be a number, a string, or a binary sequence. Each node can also have N child nodes, and the top level is the root node, denoted by "/". Each data node is called a znode, and it is the smallest unit of data in ZooKeeper. Every znode has a unique path that identifies it.
To stress one point: ZooKeeper is mainly for coordinating services, not for storing business data, so do not put large data on a znode. The upper limit ZooKeeper sets on the data of each node is 1 MB.
The figure below makes this more intuitive: ZooKeeper node paths look very much like Unix file system paths. Each is a sequence of names separated by slashes ("/"). Developers can write data to a node, and can also create child nodes under it. These operations are introduced later.
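To make the path-based tree concrete, here is a minimal in-memory sketch. It is a toy model rather than the ZooKeeper client: `ToyZnodeTree` and the `/app/...` paths are made up for illustration, and the sketch assumes that a node's parent must exist before the node can be created.

```python
from typing import Dict, List

MAX_DATA_BYTES = 1024 * 1024  # per-node data limit of 1 MB


class ToyZnodeTree:
    """Toy znode tree keyed by slash-separated paths (hypothetical, not the real API)."""

    def __init__(self) -> None:
        self._nodes: Dict[str, bytes] = {"/": b""}  # the root node always exists

    def create(self, path: str, data: bytes = b"") -> str:
        """Create a node under an existing parent and store its data."""
        if not path.startswith("/") or path == "/" or path.endswith("/"):
            raise ValueError(f"invalid path: {path!r}")
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._nodes:
            raise KeyError(f"parent does not exist: {parent}")
        if len(data) > MAX_DATA_BYTES:
            raise ValueError("data exceeds the 1 MB limit")
        self._nodes[path] = data
        return path

    def get(self, path: str) -> bytes:
        """Return the data stored on a node."""
        return self._nodes[path]

    def children(self, path: str) -> List[str]:
        """Return the names of the direct children of a node, sorted."""
        prefix = "/" if path == "/" else path + "/"
        names = []
        for candidate in self._nodes:
            if candidate != path and candidate.startswith(prefix):
                name = candidate[len(prefix):]
                if "/" not in name:  # skip grandchildren and deeper nodes
                    names.append(name)
        return sorted(names)


tree = ToyZnodeTree()
tree.create("/app")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
print(tree.children("/"))       # ['app']
print(tree.children("/app"))    # ['config', 'workers']
print(tree.get("/app/config"))  # b'timeout=30'
```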
Having covered ZooKeeper's tree-shaped data model, we know that each data node is called a znode and is the smallest unit of data in ZooKeeper. It is where the data you want to store lives, and it is a concept you will run into constantly when using ZooKeeper.
znodes are usually divided into 4 categories:
- Persistent (PERSISTENT) node: once created, it exists until you delete it, even if the ZooKeeper cluster goes down.
- Ephemeral (EPHEMERAL) node: its lifetime is bound to the client session. When the session ends, the node disappears. Ephemeral nodes can only be leaf nodes and cannot have child nodes.
- Persistent sequential (PERSISTENT_SEQUENTIAL) node: in addition to the characteristics of a persistent (PERSISTENT) node, the names of child nodes are sequential.
- Ephemeral sequential (EPHEMERAL_SEQUENTIAL) node: in addition to the characteristics of an ephemeral (EPHEMERAL) node, the names of child nodes are sequential.
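The difference between the four node types can be sketched with a toy model. Everything here is illustrative: `ToyZooKeeper`, the integer session ids, and the `/locks/lock-` prefix are hypothetical. The sketch assumes that a sequential node gets a zero-padded, 10-digit counter appended to the requested name, with one counter per parent node.

```python
from collections import defaultdict
from typing import Dict, Optional


class ToyZooKeeper:
    """Toy model of persistent, ephemeral, and sequential znodes (not the real API)."""

    def __init__(self) -> None:
        # path -> id of the owning session; None means the node is persistent
        self._owner: Dict[str, Optional[int]] = {"/": None}
        # parent path -> next sequence number to hand out
        self._counters: Dict[str, int] = defaultdict(int)

    def create(self, path: str, session: Optional[int] = None,
               sequential: bool = False) -> str:
        """Create a node; pass a session id to make it ephemeral."""
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._owner:
            raise KeyError(f"parent does not exist: {parent}")
        if self._owner[parent] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if sequential:
            path = f"{path}{self._counters[parent]:010d}"
            self._counters[parent] += 1
        if path in self._owner:
            raise KeyError(f"node already exists: {path}")
        self._owner[path] = session
        return path

    def close_session(self, session: int) -> None:
        """End a session and remove every ephemeral node it owns."""
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._owner[path]

    def exists(self, path: str) -> bool:
        return path in self._owner


zk = ToyZooKeeper()
zk.create("/locks")                                        # persistent
first = zk.create("/locks/lock-", session=1, sequential=True)
second = zk.create("/locks/lock-", session=2, sequential=True)
print(first, second)

zk.close_session(1)                                        # session 1 ends
print(zk.exists(first), zk.exists(second), zk.exists("/locks"))
```

Ephemeral sequential nodes are the combination that the distributed lock scenario typically relies on: the sequence number gives an ordering among waiters, and the ephemeral lifetime releases the lock when its holder's session ends.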
Every znode consists of 2 parts:
- stat: state information
- data: the actual content stored in the node
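As a rough sketch of this two-part layout, the toy classes below pair a data payload with a small state record. `ToyStat` and `ToyZnode` are hypothetical names, and the two fields shown are modelled on the data version and data length that ZooKeeper tracks in a node's state information. The real structure carries more fields than this. The version check is an assumption added to show why state is kept next to the data.

```python
from dataclasses import dataclass, field


@dataclass
class ToyStat:
    version: int = 0      # how many times the data has been changed
    data_length: int = 0  # size of the data in bytes


@dataclass
class ToyZnode:
    data: bytes = b""
    stat: ToyStat = field(default_factory=ToyStat)

    def __post_init__(self) -> None:
        self.stat.data_length = len(self.data)

    def set_data(self, data: bytes, expected_version: int = -1) -> ToyStat:
        """Replace the data; an expected_version of -1 skips the version check."""
        if expected_version not in (-1, self.stat.version):
            raise ValueError(
                f"version mismatch: expected {expected_version}, "
                f"node is at {self.stat.version}"
            )
        self.data = data
        self.stat.version += 1
        self.stat.data_length = len(data)
        return self.stat


node = ToyZnode(b"v1")
node.set_data(b"value-2", expected_version=0)    # succeeds, version becomes 1
try:
    node.set_data(b"stale", expected_version=0)  # rejected, node is now at version 1
    rejected = False
except ValueError:
    rejected = True
print(node.data, node.stat, rejected)
```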