Apache ZooKeeper Interview Questions and Answers Part-1

B Basic Level Interview Questions
I Intermediate Level Interview Questions
A Advanced Level Interview Questions

Apache ZooKeeper is a distributed, open-source coordination service for distributed applications. It is also called the 'King of Coordination'. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.

Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.

Below are some of the important ZooKeeper interview questions:

B What is ZooKeeper?

Apache ZooKeeper is a highly available service for maintaining small amounts of coordination data, notifying clients of changes in that data, and monitoring clients for failures. It is a distributed coordination service for managing a large set of hosts. Coordinating and managing a service in a distributed environment is a complicated process. ZooKeeper addresses this with a simple architecture and API, which lets developers focus on core application logic without worrying about the distributed nature of the application.

B What is Distributed Computing?

A distributed application is software that runs on multiple computers within a network, hosted on servers or in the cloud. Its components interact in order to achieve a specific goal or task. Unlike traditional applications, which run on a single system, distributed applications run on multiple systems simultaneously for a single task or job.

B What Are The Benefits Of Distributed Applications?

  • Reliability: Failure of a single system or a few systems does not cause the whole system to fail.
  • Scalability: Performance can be increased as and when needed by adding more machines, with only a minor change in the configuration of the application and no downtime.
  • Transparency: Hides the complexity of the system and shows itself as a single entity / application.

I What Are The Challenges Of Distributed Applications?

  • Race condition: Two or more machines try to perform a task that must be done by only one machine at any given time. For example, a shared resource should be modified by only one machine at a time.
  • Deadlock: Two or more operations wait indefinitely for each other to complete.
  • Inconsistency: A partial failure leaves data in an inconsistent state.
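
The race condition described above can be illustrated inside a single process. The following is a minimal sketch, assuming threads stand in for machines and a local lock stands in for a coordination service; `SharedCounter` and `run_workers` are hypothetical names introduced for illustration. A local lock only works within one process, which is why separate machines need an external service for the same guarantee.

```python
import threading


class SharedCounter:
    """A shared resource that must be modified by one worker at a time."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # The lock serializes the read-modify-write so no update is lost.
        with self._lock:
            current = self.value
            self.value = current + 1


def run_workers(workers=4, increments=1000):
    """Start several workers that all update the same counter."""
    counter = SharedCounter()

    def work():
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the lock in place, `run_workers(4, 1000)` returns exactly 4000 because every increment is applied.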

I What does ZooKeeper do?

ZooKeeper is itself a distributed application that provides services for developing distributed applications. It coordinates a group of nodes within the cluster and maintains shared data with effective synchronization techniques. Some of the services provided by ZooKeeper are:

  • ZooKeeper exposes a simple interface for a naming service, which identifies the nodes in a cluster by name, similar to DNS.
  • ZooKeeper provides an easy way to implement distributed mutexes, which allow serialized access to a shared resource in your distributed system.
  • You can use ZooKeeper to centrally store and manage the configuration of your distributed system. This means that any new nodes joining will pick up the up-to-date centralized configuration from ZooKeeper as soon as they join the system. This also allows you to centrally change the state of your distributed system by changing the centralized configuration through one of the ZooKeeper clients.
  • ZooKeeper provides off-the-shelf support for leader election which will deal with the problem of nodes going down.

B What is the model of a ZooKeeper cluster?

Leader and follower: one server in the ensemble acts as the leader and the others act as followers.

B What is the ZooKeeper daemon name?

QuorumPeerMain

I Why Apache Zookeeper?

In the past, each application was a single program running on a single computer with a single CPU. Today, things have changed. In the Big Data world, applications are made up of many independent programs running on an ever-changing set of computers. These are known as distributed applications. A distributed application runs on multiple systems in a network simultaneously, with its components coordinating among themselves to complete a particular task quickly and efficiently.

Building distributed systems is hard. However, much of the software people use daily depends on such systems, and it doesn't look like we will stop relying on distributed computer systems any time soon. Coordinating the actions of the independent programs in a distributed system is far more difficult than writing a single program to run on a single computer. It is easy for developers to get mired in coordination logic and lack the time to write their application logic properly, or, conversely, to spend little time on the coordination logic and write a quick-and-dirty master coordinator that is fragile and becomes an unreliable single point of failure.

ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata. ZooKeeper is an application library with two principal implementations of the APIs—Java and C—and a service component implemented in Java that runs on an ensemble of dedicated servers.

When designing an application with ZooKeeper, one ideally separates application data from control or coordination data.

ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.

A What are watches?

Accessing a znode every time a client needs to know its content would be very expensive. Instead, ZooKeeper has an event system referred to as watches. A watch can be set on a znode to trigger an event whenever the znode is removed or altered, or when new children are created below it. Clients register with ZooKeeper to receive notifications of changes to znodes by setting a watch.
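
The watch mechanism can be sketched with a small in-memory model. This is a toy under stated assumptions, not the ZooKeeper client API: `ToyZnodeStore` and its methods are hypothetical names, and the store lives in one process. It models a standard ZooKeeper watch as a one-time trigger, so a client that wants further notifications has to set the watch again.

```python
class ToyZnodeStore:
    """In-memory stand-in for znode data with one-shot watches (hypothetical)."""

    def __init__(self):
        self._data = {}
        self._watches = {}  # path -> callbacks waiting for the next change

    def create(self, path, data):
        self._data[path] = data

    def get_data(self, path, watch=None):
        # Reading with a watch registers interest in the next change only.
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data[path]

    def set_data(self, path, data):
        self._data[path] = data
        self._fire(path, "changed")

    def delete(self, path):
        del self._data[path]
        self._fire(path, "deleted")

    def _fire(self, path, event):
        # pop() removes the callbacks, so each watch fires at most once.
        for callback in self._watches.pop(path, []):
            callback(event, path)


events = []
store = ToyZnodeStore()
store.create("/config", b"v1")
store.get_data("/config", watch=lambda event, path: events.append((event, path)))
store.set_data("/config", b"v2")  # the watch fires here
store.set_data("/config", b"v3")  # no watch is registered any more
```

After these calls, `events` holds a single entry for the first change. The client learns that something changed and then reads the new value, instead of polling the znode repeatedly.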

B What are znodes?

Znodes are ZooKeeper data nodes. ZooKeeper has a file system-like data model composed of znodes.
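
The file system-like data model can be sketched as a tree keyed by slash-separated paths. This is a minimal in-memory illustration, assuming hypothetical names (`ZnodeTree`, `create`, `get_children`), and not the ZooKeeper client API. It shows two properties of the model: a znode can hold data and have children at the same time, and a child can only be created under an existing parent.

```python
class ZnodeTree:
    """Toy hierarchical namespace keyed by slash-separated paths (hypothetical)."""

    def __init__(self):
        self._data = {"/": b""}  # the root znode always exists

    def create(self, path, data=b""):
        if path in self._data:
            raise KeyError(f"{path} already exists")
        # "/app/config" -> parent "/app"; "/app" -> parent "/"
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self._data:
            raise KeyError(f"parent {parent} does not exist")
        self._data[path] = data

    def get_data(self, path):
        return self._data[path]

    def get_children(self, path):
        prefix = path if path.endswith("/") else path + "/"
        names = []
        for other in self._data:
            if other.startswith(prefix):
                name = other[len(prefix):]
                # Keep direct children only, not deeper descendants.
                if name and "/" not in name:
                    names.append(name)
        return sorted(names)


tree = ZnodeTree()
tree.create("/app", b"application root")
tree.create("/app/config", b"timeout=30")
tree.create("/app/workers")
```

Here `/app` carries its own data and also has the children `config` and `workers`, which is the main difference from a file system that separates files from directories.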

I Explain The Types Of Znodes

Types of znodes: ephemeral, persistent and sequential.

  • Ephemeral znodes - Ephemeral znodes exist only as long as the client that created them is alive. When that client gets disconnected from the ZooKeeper ensemble, its ephemeral znodes are deleted automatically. For this reason, ephemeral znodes are not allowed to have children. Ephemeral znodes play an important role in leader election.
  • Persistent znodes - A persistent znode stays alive even after the client that created it is disconnected. By default, all znodes are persistent unless otherwise specified.
  • Sequential znodes - Sequential znodes can be either persistent or ephemeral. When a new znode is created as a sequential znode, ZooKeeper sets the path of the znode by attaching a 10-digit sequence number to the original name.

    For example, suppose a client creates a sequential znode named cznode. On the ZooKeeper server, the znode will be named like this: cznode0000000001. If the client creates another sequential znode, it bears the next number in the sequence, so the next sequential znode will be called <znode-name>0000000002. Sequential znodes play an important role in locking and synchronization.
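
The znode types above can be combined in a small in-memory sketch. It is a toy model under stated assumptions, not the ZooKeeper client API: `ToyEnsemble`, `close_session`, and `current_leader` are hypothetical names, and the counter starts at 1 only to match the cznode example (in a real ensemble the counter is maintained per parent znode). The election rule shown, where the candidate with the lowest sequence number leads, is one common pattern built on ephemeral sequential znodes.

```python
class ToyEnsemble:
    """In-memory stand-in for a ZooKeeper ensemble (hypothetical, not the real API)."""

    def __init__(self):
        self._next_seq = 1  # assumption: start at 1 to match the example above
        self._owners = {}   # path -> session id if ephemeral, None if persistent

    def create(self, path, session=None, ephemeral=False, sequential=False):
        if ephemeral and session is None:
            raise ValueError("an ephemeral znode needs an owning session")
        if sequential:
            # Append a zero-padded 10-digit sequence number to the name.
            path = f"{path}{self._next_seq:010d}"
            self._next_seq += 1
        self._owners[path] = session if ephemeral else None
        return path

    def close_session(self, session):
        # Ephemeral znodes disappear with their session; persistent ones stay.
        for path, owner in list(self._owners.items()):
            if owner is not None and owner == session:
                del self._owners[path]

    def paths(self, prefix):
        return sorted(p for p in self._owners if p.startswith(prefix))


def current_leader(ensemble, prefix):
    """Return the candidate znode with the lowest sequence number, if any."""
    candidates = ensemble.paths(prefix)
    return candidates[0] if candidates else None


zk = ToyEnsemble()
zk.create("/election")  # persistent parent
first = zk.create("/election/cznode", session="client-a", ephemeral=True, sequential=True)
second = zk.create("/election/cznode", session="client-b", ephemeral=True, sequential=True)
```

While both sessions are alive, `current_leader(zk, "/election/")` returns the znode created by client-a. Calling `zk.close_session("client-a")` removes that znode, and leadership passes to client-b without any extra bookkeeping.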

B What are some of the prime features of Apache ZooKeeper?

  • Reliable System: ZooKeeper is very reliable, as it keeps working even if a node fails.
  • Simple Architecture: The architecture of ZooKeeper is quite simple: a shared hierarchical namespace helps coordinate the processes.
  • Fast Processing: ZooKeeper is especially fast in "read-dominant" workloads (i.e. workloads in which reads are much more common than writes).
  • Scalable: The read performance of ZooKeeper can be improved by adding servers to the ensemble.
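
The reliability point can be made concrete with a little arithmetic. The sketch below assumes the usual majority rule, under which a ZooKeeper ensemble stays available as long as more than half of its servers are up; the function names are hypothetical and introduced only for illustration.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


# Compare a few ensemble sizes: (servers, quorum, tolerated failures).
sizes = [(n, quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)]
```

Under this rule, 3 servers tolerate 1 failure, 4 servers still tolerate only 1, and 5 servers tolerate 2, which is why ensembles are usually given an odd number of servers.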