Apache ZooKeeper Interview Questions and Answers Part-2 | Hadoop Interview Questions

I What is ZooKeeper ensemble?

An ensemble is a set of ZooKeeper servers that together provide one replicated ZooKeeper service. When you want high availability, you run multiple ZooKeeper servers as an ensemble instead of a single standalone server.

I What is ZooKeeper quorum?

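In production, you run ZooKeeper in replicated mode. The ZooKeeper documentation loosely calls a replicated group of servers in the same application a quorum, and in replicated mode all servers in that group have copies of the same configuration file. More precisely, a quorum is the majority of those servers that must be up and in agreement for the service to make progress.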

I What is the difference between ZooKeeper ensemble and ZooKeeper quorum?

Ensemble refers to the full set of peer servers in a ZooKeeper cluster.
Quorum refers to the minimum number of nodes that must agree on a transaction before it is considered committed.
For example, a 3-node ensemble needs a quorum of 2 running servers to commit a transaction, and a 5-node ensemble needs a quorum of 3.
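
The quorum sizes quoted above follow from a strict-majority rule. The short Python sketch below shows the arithmetic; the function name quorum_size is made up for illustration and is not part of any ZooKeeper API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Return the smallest strict majority of an ensemble.

    Illustrative helper only, not a ZooKeeper API.
    """
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    # More than half of the servers: floor(n / 2) + 1.
    return ensemble_size // 2 + 1


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n}-node ensemble -> quorum of {quorum_size(n)}")
```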

I What problems can be addressed by using Zookeeper?

Writing your own protocols for coordinating a cluster in a distributed system often ends in failure and frustration for developers. Distributed architectures are prone to deadlocks, inconsistency and race conditions, which makes it difficult to keep the cluster fast, reliable and scalable. Apache ZooKeeper addresses these problems as a coordination service, so that you can write efficient distributed applications without reinventing the coordination logic from scratch.

I What Is Apache Zookeeper Meant For?

Apache ZooKeeper is a service used by a cluster (group of nodes) to coordinate between themselves and maintain shared data with robust synchronization techniques. ZooKeeper is itself a distributed application providing services for writing a distributed application.

The common services provided by ZooKeeper are as follows:

Naming service: Identifying the nodes in a cluster by name. It is similar to DNS, but for nodes.
Configuration management: Latest and up-to-date configuration information of the system for a joining node.
Cluster management: Joining / leaving of a node in a cluster and node status in real time.
Highly reliable data registry: Availability of data even when one or a few nodes are down.
Leader election: Electing a node as leader for coordination purposes.
Locking and synchronization service: Locking the data while modifying it.

B Where Is ZooKeeper Used?

Apache Hadoop relies on ZooKeeper for automatic failover of the HDFS NameNode.
Apache HBase, a distributed database built on Hadoop, uses ZooKeeper for master election, lease management of region servers, and other communication between region servers.
Apache Storm, being a real-time stateless processing framework, manages its state in the ZooKeeper service.
Apache Kafka has traditionally used ZooKeeper for controller election and cluster metadata, which underpins the choice of leader for topic partitions; newer Kafka releases replace it with KRaft.
Apache YARN relies on it for automatic failover of the ResourceManager (master node).
Apache Solr uses ZooKeeper for leader election and centralized configuration.

A Explain The CLI In ZooKeeper?

The ZooKeeper Command Line Interface (CLI) is used to interact with the ZooKeeper ensemble during development. It is useful for debugging and for experimenting with different options. To perform ZooKeeper CLI operations, first start your ZooKeeper server ("bin/zkServer.sh start") and then the ZooKeeper client ("bin/zkCli.sh").

Once the client starts, you can create znodes, watch a znode for changes, set data, create children of a znode, delete a znode, and so on.

A How many ZooKeeper servers should I run?

If you would like to run one server, that's fine from ZooKeeper's perspective; you just won't have a highly reliable or available system. A three-node ZooKeeper ensemble will support one failure without loss of service, which is probably fine for most users and arguably the most common deployment topology. However, to be safe, use five nodes in your ensemble. A five-node ensemble allows you to take one server out for maintenance or rolling upgrade and still be able to withstand a second unexpected failure without interruption of service.

In short, three, five, or seven are the most typical ensemble sizes; the more members an ensemble has, the more host failures it can tolerate. Keep in mind that the size of your ZooKeeper ensemble has little to do with the number of nodes in your distributed system. The nodes in your distributed system are clients of the ZooKeeper ensemble, and each ZooKeeper server can handle a large number of clients. For example, HBase (a distributed database on Hadoop) relies upon ZooKeeper for leader election and lease management of region servers. You can have a large 50-node HBase cluster running with a relatively small (say, five-node) ZooKeeper ensemble.

A Why should ZooKeeper run on an odd number of nodes?

For a reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble is up, the service will be available. Because ZooKeeper requires a majority, it is best to use an odd number of machines. An even number of peers is supported, but it is normally not used because an even-sized ensemble requires, proportionally, more peers to form a quorum than an odd-sized ensemble does.

Consider a case where you have 4 nodes in your cluster. ZooKeeper will remain up if at least 3 nodes are up (3 > 4/2), so effectively you can handle the failure of 1 node. If you had 3 nodes in your cluster, you would need at least 2 nodes up for ZooKeeper to function (2 > 3/2). Hence even in a 3-node cluster you can handle the failure of 1 node, so a 4th node gives no additional fault tolerance.

Let's take one more example. An ensemble with 4 nodes requires 3 to form a quorum, and an ensemble with 5 also requires 3 nodes to form a quorum. Thus an ensemble of 5 allows 2 nodes to fail and is more fault tolerant than an ensemble of 4, which allows only 1 peer to be down.

Similarly, ZooKeeper elects a leader based on the votes of more than half of the nodes in the cluster, and it keeps functioning if and only if more than half of the nodes are up.
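
The comparison of even and odd ensemble sizes can be checked with a few lines of Python. This is only a sketch of the majority arithmetic described above; tolerated_failures is a hypothetical helper name, not something ZooKeeper provides.

```python
def tolerated_failures(ensemble_size: int) -> int:
    """Number of servers that can be down while a majority is still up.

    Illustrative helper only, not a ZooKeeper API.
    """
    quorum = ensemble_size // 2 + 1  # strict majority
    return ensemble_size - quorum


if __name__ == "__main__":
    for n in range(3, 8):
        print(f"ensemble of {n}: quorum {n // 2 + 1}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Running it shows that each even size tolerates exactly as many failures as the odd size just below it, which is why the extra server is usually not worth adding.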

I Explain The Zookeeper Workflow?

Once a ZooKeeper ensemble starts, it will wait for the clients to connect.
Clients will connect to one of the nodes in the ZooKeeper ensemble. It may be a leader or a follower node.
Once a client is connected, the node assigns a session ID to that client and sends an acknowledgement to the client. If the client does not get an acknowledgement, it simply tries to connect to another node in the ZooKeeper ensemble.
Once connected to a node, the client sends heartbeats to the node at regular intervals to make sure that the connection is not lost.
If a client wants to read a particular znode, it sends a read request with the znode path to the node, and the node returns the requested znode from its own database. For this reason, reads are fast in a ZooKeeper ensemble.
If a client wants to store data in the ZooKeeper ensemble, it sends the znode path and the data to the server. The connected server forwards the request to the leader, and the leader then reissues the write request to all the followers. The write succeeds, and a successful return code is sent to the client, only if a majority of the nodes respond successfully. Otherwise, the write request fails.
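
The write step above can be pictured with a toy in-memory model. The Python sketch below is not how ZooKeeper is implemented: the Server class and write function are invented for illustration, there is no networking or separate leader role, and a server that is "up" is simply assumed to acknowledge every proposal.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Toy stand-in for one ensemble member (hypothetical, not a real API)."""
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)


def write(ensemble: list, path: str, value: str) -> bool:
    """Apply a write only if a strict majority of servers acknowledges it."""
    quorum = len(ensemble) // 2 + 1
    # Assumption: every server that is up acknowledges the proposal.
    ackers = [s for s in ensemble if s.up]
    if len(ackers) < quorum:
        return False  # no majority, so the write fails
    for server in ackers:
        server.data[path] = value  # commit on the acknowledging servers
    return True


if __name__ == "__main__":
    ensemble = [Server("zk1"), Server("zk2"), Server("zk3", up=False)]
    print(write(ensemble, "/config/mode", "active"))  # 2 of 3 up
    ensemble[1].up = False
    print(write(ensemble, "/config/mode", "standby"))  # 1 of 3 up
```

With two of three servers up the write is committed; once a second server goes down the same call is rejected and the stored value stays unchanged.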

I What is Leader Election and how does it happen?

Leader election is the process of choosing the leader: a server that has been chosen by an ensemble of servers and that continues to have support from that ensemble. The purpose of the leader is to order client requests that change the ZooKeeper state: create, setData, and delete. The leader transforms each request into a transaction and proposes it to the followers, so that the ensemble accepts and applies transactions in the order issued by the leader.

Applications can use the same idea to elect a leader among their own nodes with sequential, ephemeral znodes. The process is as follows:

All the nodes create a sequential, ephemeral znode with the same path, /app/leader_election/guid_. The ZooKeeper ensemble appends a 10-digit sequence number to the path, so the znodes created will be /app/leader_election/guid_0000000001, /app/leader_election/guid_0000000002, etc.
At any given time, the node that created the znode with the smallest number is the leader and all the other nodes are followers. Each follower watches the znode with the next-smaller number than its own.
If the leader goes down, its corresponding znode gets deleted. The next-in-line follower is notified of the leader's removal through its watcher.
The next-in-line follower then checks whether any remaining znode has a smaller number than its own. If none does, it assumes the role of the leader. Otherwise, it treats the node that created the znode with the smallest number as the leader.
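
The steps above can be sketched as a small in-memory simulation in Python. It does not talk to a real ZooKeeper server: the Election class and its methods are hypothetical, ephemeral-znode deletion is modelled by an explicit leave call, and the sequence counter starts at 1 only to match the example paths in the text.

```python
class Election:
    """In-memory model of the sequential-znode election (illustrative only)."""

    def __init__(self, prefix: str = "/app/leader_election/guid_"):
        self.prefix = prefix
        self.next_seq = 1   # assumption: start at 1 to mirror the text
        self.znodes = {}    # sequence number -> owning node name

    def join(self, owner: str) -> str:
        """Create a 'sequential ephemeral znode' and return its path."""
        seq = self.next_seq
        self.next_seq += 1
        self.znodes[seq] = owner
        return f"{self.prefix}{seq:010d}"

    def leave(self, owner: str) -> None:
        """Model the znode disappearing when its owner goes down."""
        self.znodes = {s: o for s, o in self.znodes.items() if o != owner}

    def leader(self):
        """The owner of the znode with the smallest sequence number."""
        return self.znodes[min(self.znodes)] if self.znodes else None

    def watched_by(self, owner: str):
        """Owner of the znode that `owner` watches (next-smaller number)."""
        my_seq = next(s for s, o in self.znodes.items() if o == owner)
        lower = [s for s in self.znodes if s < my_seq]
        return self.znodes[max(lower)] if lower else None


if __name__ == "__main__":
    election = Election()
    for name in ("node-a", "node-b", "node-c"):
        print(name, "created", election.join(name))
    print("leader:", election.leader())
    election.leave("node-a")  # the leader goes down
    print("new leader:", election.leader())
```

Watching only the next-smaller znode, rather than the leader's znode, means a leader failure notifies a single follower instead of every node at once.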

B	Basic Level Interview Questions
I	Intermediate Level Interview Questions
A	Advanced Level Interview Questions

CoreJavaGuru

Vivek HJ
