An ensemble is a set of nodes (or servers, if you like) that together form your distributed computing ecosystem. When you want high availability from ZooKeeper, you run multiple ZooKeeper servers as an ensemble.
In production, you run ZooKeeper in replicated mode. A replicated group of servers in the same application is called a quorum, and in replicated mode, all servers in the quorum have copies of the same configuration file.
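To make "copies of the same configuration file" concrete, here is a minimal sketch that renders one zoo.cfg text for every server in a replicated group. The host names zoo1 to zoo3, the data directory, and the timing values are illustrative assumptions, and render_zoo_cfg is a hypothetical helper, not part of ZooKeeper.

```python
def render_zoo_cfg(hosts, data_dir="/var/lib/zookeeper", client_port=2181):
    """Build the text of a zoo.cfg that every server in the group would share.

    Host names, data_dir, and the timing values are example assumptions.
    """
    lines = [
        "tickTime=2000",              # basic time unit, in milliseconds
        f"dataDir={data_dir}",        # snapshots and the myid file live here
        f"clientPort={client_port}",  # port that clients connect to
        "initLimit=5",                # ticks a follower may take to connect to the leader
        "syncLimit=2",                # ticks a follower may lag behind the leader
    ]
    # One server.N entry per member. N must match the number stored in that
    # host's myid file; 2888 is the peer port, 3888 the leader-election port.
    for server_id, host in enumerate(hosts, start=1):
        lines.append(f"server.{server_id}={host}:2888:3888")
    return "\n".join(lines) + "\n"


cfg = render_zoo_cfg(["zoo1", "zoo2", "zoo3"])
print(cfg)
```

The same text would be written to each server; only the myid file in each server's data directory differs.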
Writing your own protocols for coordinating a cluster in a distributed system tends to end in failure and frustration for developers. Distributed architectures are prone to deadlocks, inconsistency, and race conditions, which make it difficult to keep the cluster fast, reliable, and scalable. To address these problems, Apache ZooKeeper can be used as a coordination service, so that you can write efficient distributed applications without reinventing the wheel for coordination.
Apache ZooKeeper is a service used by a cluster (a group of nodes) to coordinate among themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application that provides services for writing distributed applications.
The common services provided by ZooKeeper are as follows :
The ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble during development. It is useful for debugging and for experimenting with different options. To perform ZooKeeper CLI operations, first start the ZooKeeper server ("bin/zkServer.sh start") and then start the ZooKeeper client ("bin/zkCli.sh").
Once the client starts, you can create znodes, watch a znode for changes, set data, create children of a znode, delete a znode, and so on.
If you would like to run one server, that's fine from ZooKeeper's perspective; you just won't have a highly reliable or available system. A three-node ZooKeeper ensemble will support one failure without loss of service, which is probably fine for most users and arguably the most common deployment topology. However, to be safe, use five nodes in your ensemble. A five-node ensemble allows you to take one server out for maintenance or rolling upgrade and still be able to withstand a second unexpected failure without interruption of service.
In short, three, five, or seven are the most typical numbers of nodes in a ZooKeeper ensemble. The more members an ensemble has, the more host failures it can tolerate. Keep in mind that the size of your ZooKeeper ensemble has little to do with the number of nodes in your distributed system. The nodes in your distributed system are clients of the ZooKeeper ensemble, and each ZooKeeper server can handle a large number of clients. For example, HBase (a distributed database on Hadoop) relies on ZooKeeper for leader election and lease management of region servers. You can run a large 50-node HBase cluster with a relatively small (say, five-node) ZooKeeper ensemble.
For a reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble is up, the service will be available. Because ZooKeeper requires a majority, it is best to use an odd number of machines. An even number of peers is supported, but it is normally not used because an even-sized ensemble requires, proportionally, more peers to form a quorum than an odd-sized ensemble. Consider a cluster of 4 nodes. ZooKeeper remains up only if at least 3 nodes are up (more than 4/2), so you can handle the failure of 1 node. With 3 nodes in the cluster, you need at least 2 nodes up for ZooKeeper to function (more than 3/2), so a 3-node cluster can also handle the failure of 1 node. The 4th node therefore adds no fault tolerance.
As one more example, an ensemble of 4 nodes requires 3 to form a quorum, and an ensemble of 5 also requires 3. An ensemble of 5 therefore allows 2 nodes to fail and is more fault tolerant than an ensemble of 4, which allows only 1 peer to be down.
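The majority arithmetic in these examples can be checked with a few lines of Python. This is only a sketch of the rule "more than half of the servers must be up"; the function names quorum_size and tolerated_failures are made up for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is strictly more than half the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can be down while a quorum is still available."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare odd and even ensemble sizes side by side.
for n in range(1, 8):
    print(f"{n} servers: quorum={quorum_size(n)}, "
          f"tolerated failures={tolerated_failures(n)}")
```

For sizes 3 and 4 the tolerated failures are both 1, and for 5 and 6 both 2, which is why the extra server in an even-sized ensemble buys no additional fault tolerance.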
Similarly, ZooKeeper elects a leader based on the votes of more than half of the nodes in the cluster, and it keeps functioning if and only if more than half of the nodes are up.
Leader election is the process of choosing the leader: a server that has been chosen by an ensemble of servers and that continues to have support from that ensemble. The purpose of the leader is to order client requests that change the ZooKeeper state: create, setData, and delete. The leader transforms each request into a transaction and proposes it to the followers, so that the ensemble accepts and applies transactions in the order issued by the leader.
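The ordering role of the leader can be illustrated with a toy, in-memory model. This is not ZooKeeper's actual protocol: it leaves out quorum acknowledgements, epochs, persistence, and networking, and the Leader and Follower classes are hypothetical. It only shows the idea that the leader stamps each state-changing request with an increasing transaction ID (ZooKeeper calls this a zxid) and every follower applies transactions in that order.

```python
class Follower:
    """Toy follower: applies transactions strictly in leader order."""

    def __init__(self):
        self.state = {}     # path -> data, standing in for the znode tree
        self.last_zxid = 0  # ID of the last transaction applied

    def apply(self, txn):
        zxid, op, path, data = txn
        # Refuse anything that skips or repeats a transaction ID.
        assert zxid == self.last_zxid + 1, "transaction applied out of order"
        if op in ("create", "setData"):
            self.state[path] = data
        elif op == "delete":
            self.state.pop(path, None)
        self.last_zxid = zxid


class Leader:
    """Toy leader: turns each write request into a numbered transaction."""

    def __init__(self, followers):
        self.followers = followers
        self.next_zxid = 1

    def submit(self, op, path, data=None):
        txn = (self.next_zxid, op, path, data)
        self.next_zxid += 1
        # Simplification: send to every follower directly, with no voting.
        for follower in self.followers:
            follower.apply(txn)
        return txn[0]


followers = [Follower(), Follower()]
leader = Leader(followers)
leader.submit("create", "/app", b"v1")
leader.submit("setData", "/app", b"v2")
leader.submit("create", "/tmp", b"x")
leader.submit("delete", "/tmp")
print(followers[0].state, followers[0].last_zxid)
```

Because every follower sees the same transactions in the same order, all of them end up with identical state.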
The process of leader election is as follows −