| Written in | Java |
|---|---|
| Operating system | Cross-platform |
| Type | Distributed computing |
| License | Apache License 2.0 |
| Website | zookeeper.apache.org |
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is sometimes called the 'King of Coordination'. It exposes a simple set of primitives that distributed applications can build upon to implement higher-level services for synchronization, configuration maintenance, and groups and naming. It is designed to be easy to program to, and uses a data model styled after the familiar directory tree structure of file systems. It runs in Java and has bindings for both Java and C.
Coordination services are notoriously hard to get right. They are especially prone to errors such as race conditions and deadlock. The motivation behind ZooKeeper is to relieve distributed applications of the responsibility of implementing coordination services from scratch.
In the past, each application was a single program running on a single computer with a single CPU. Today, things have changed. In the Big Data world, applications are made up of many independent programs running on an ever-changing set of computers. These applications are known as distributed applications. A distributed application runs on multiple systems in a network simultaneously, and its parts coordinate among themselves to complete a particular task quickly and efficiently.
Building distributed systems is hard. Yet many of the software applications people use daily depend on such systems, and it doesn't look like we will stop relying on distributed computer systems any time soon. Coordinating the actions of the independent programs in a distributed system is far more difficult than writing a single program to run on a single computer. It is easy for developers to get mired in coordination logic and lack the time to write their application logic properly. The converse also happens: developers spend little time on the coordination logic and write a quick-and-dirty master coordinator that is fragile and becomes an unreliable single point of failure.
ZooKeeper was designed to be a robust service that enables application developers to focus mainly on their application logic rather than coordination. It exposes a simple API, inspired by the filesystem API, that allows developers to implement common coordination tasks, such as electing a master server, managing group membership, and managing metadata. ZooKeeper is an application library with two principal implementations of the APIs—Java and C—and a service component implemented in Java that runs on an ensemble of dedicated servers.
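To make the "filesystem-inspired" data model more concrete, here is a minimal sketch of a hierarchical namespace in Python. It is a single-process toy, not ZooKeeper's API: the class name `ZNodeTree` and its methods are hypothetical, and nothing here is replicated or fault tolerant. It only illustrates the idea of addressing small pieces of coordination data by slash-separated paths.

```python
class ZNodeTree:
    """Toy in-memory tree of path -> data (hypothetical, not ZooKeeper's API)."""

    def __init__(self):
        # The root node always exists, like "/" in a file system.
        self._nodes = {"/": b""}

    @staticmethod
    def _parent(path):
        head = path.rsplit("/", 1)[0]
        return head or "/"

    def create(self, path, data=b""):
        if path in self._nodes:
            raise KeyError(f"node already exists: {path}")
        if self._parent(path) not in self._nodes:
            raise KeyError(f"parent does not exist: {path}")
        self._nodes[path] = data

    def get(self, path):
        return self._nodes[path]

    def set(self, path, data):
        if path not in self._nodes:
            raise KeyError(f"no such node: {path}")
        self._nodes[path] = data

    def children(self, path):
        # Direct children only: names with no further "/" after the prefix.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self._nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZNodeTree()
tree.create("/app")
tree.create("/app/config", b"v1")
tree.create("/app/workers")
```

In this toy, `tree.children("/app")` lists the names `config` and `workers`, much as listing a directory would. A real coordination service adds replication, ordering guarantees, and change notifications on top of such a tree.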
When designing an application with ZooKeeper, one ideally separates application data from control or coordination data.
ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.
A distributed system is a collection of independent computers that appear to the users of the system as a single computer. A distributed system is a model in which components located on networked computers communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal.
A distributed application is software that runs on multiple computers within a network; it can be hosted on dedicated servers or in the cloud. Its parts interact in order to achieve a specific goal or task. Unlike traditional applications, which rely on a single system, distributed applications run on multiple systems simultaneously for a single task or job.
The word distributed means data being spread out over more than one computer in a network. Distributed applications are broken up into two separate programs: the client software and the server software. The client accesses the data from the server, while the server processes the data. Cloud computing can be used instead of servers or hardware to process a distributed application's data. With distributed applications, if a node that is running a particular application goes down, another node can resume the task.
Normally, huge, complex, and time-consuming tasks that would take hours to complete in an application running on a single system can be done in minutes by a distributed application that uses the computing capabilities of all the systems involved. The time can be reduced further by configuring the distributed application to run on more systems. In a distributed application architecture, the group of systems on which a distributed application runs is called a cluster, and each machine in a cluster is called a node.
A distributed application architecture has two parts: the server application and the client application. The server application is the distributed part; its instances expose a common interface so that clients can connect to any server in the cluster and get the work done.
Running an application on multiple computers within a network has its share of advantages and disadvantages. Let's look at them:
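The idea of dividing one large task among the nodes of a cluster can be sketched as follows. This is only an illustration under loud assumptions: threads in a single Python process stand in for separate machines, and the helper names `split`, `node_task`, and `run_on_cluster` are hypothetical. A real cluster would also have to deal with networking and node failures.

```python
from concurrent.futures import ThreadPoolExecutor


def split(items, parts):
    """Split items into `parts` contiguous chunks of nearly equal size."""
    size, extra = divmod(len(items), parts)
    chunks, start = [], 0
    for i in range(parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def node_task(chunk):
    """Work done by one 'node': here, a sum of squares over its chunk."""
    return sum(x * x for x in chunk)


def run_on_cluster(items, node_count):
    """Hand one chunk to each simulated node, then combine the partial results."""
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(node_task, split(items, node_count)))
    return sum(partials)


total = run_on_cluster(list(range(10)), node_count=3)
```

The result is the same as computing the sum on one machine; only the work is partitioned. Adding nodes shrinks each chunk, which is the sense in which more systems can reduce the total time.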
Despite these potential problems, many people feel that the advantages outweigh the disadvantages, and it is expected that distributed systems will become increasingly important in the coming years. In fact, it is likely that within a few years, most organizations will connect most of their computers into large distributed systems to provide better, cheaper, and more convenient service for the users.
ZooKeeper was developed at Yahoo! Research. Yahoo had been working on ZooKeeper for a while and pitching it to other groups. At the time the ZooKeeper group had been working with the Hadoop team and had started a variety of projects with the names of animals, Apache Pig being the most well known. As the group started talking about different possible names, one of the group members mentioned that they should avoid another animal name because it started to sound like a zoo. That is when it clicked: distributed systems are a zoo. They are chaotic and hard to manage, and ZooKeeper is meant to keep them under control.
When designing a distributed system, there is typically a need for designing and developing some coordination services:
Earlier distributed systems have implemented components such as distributed lock managers, or have used distributed databases for coordination. While it's possible to design and implement all of these services from scratch, it's extra work, and any problems, race conditions, or deadlocks are difficult to debug. Just as you don't write your own hashing function in your code, you shouldn't have to write your own name service or leader election service from scratch every time you need one. Moreover, you could hack together a very simple group membership service relatively easily, but it would require much more work to make it reliable, replicated, and scalable. This led to the development and open sourcing of Apache ZooKeeper, an out-of-the-box reliable, scalable, and high-performance coordination service for distributed systems.
ZooKeeper, in fact, borrows a number of concepts from these prior systems. It does not expose a lock interface or a general purpose interface for storing data, however. The design of ZooKeeper is specialized and very focused on coordination tasks. It is certainly possible to build distributed systems without using ZooKeeper. ZooKeeper, however, offers developers the possibility of focusing more on application logic rather than on arcane distributed systems concepts. Programming distributed systems without ZooKeeper is possible, but more difficult.
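To see why the simple version of such a service is easy and the robust version is not, consider this minimal sketch of a group membership service with a naive leader rule. Everything in it is an assumption made for illustration: the class `GroupMembership` is hypothetical, it lives in one process, and it is not how ZooKeeper implements membership or election.

```python
import itertools


class GroupMembership:
    """Toy group membership: the earliest member still present is the leader."""

    def __init__(self):
        self._counter = itertools.count()
        self._members = {}  # member name -> join sequence number

    def join(self, name):
        # Each new member receives the next sequence number.
        if name not in self._members:
            self._members[name] = next(self._counter)

    def leave(self, name):
        self._members.pop(name, None)

    def members(self):
        # Members ordered by the time they joined.
        return sorted(self._members, key=self._members.get)

    def leader(self):
        ordered = self.members()
        return ordered[0] if ordered else None


group = GroupMembership()
for name in ("server-a", "server-b", "server-c"):
    group.join(name)
first_leader = group.leader()
group.leave("server-a")
next_leader = group.leader()
```

The sketch fits in a few lines precisely because it ignores the hard parts: the object itself is a single point of failure, a crashed member never calls `leave`, and nothing is replicated. Those gaps are the "much more work" that a coordination service is meant to take on.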
When an application starts up, all of the different processes need to find the application configuration. Over time this configuration may change. We could shut everything down, redistribute configuration files, and restart, but that may incur extended periods of application downtime during reconfiguration. Also, as the load changes, we want to be able to add new machines and processes or remove existing ones.
The problems described above are functional problems: you can design solutions for them and test those solutions before deployment. But the truly difficult problems you will encounter when developing distributed applications have to do with faults, specifically crashes and communication faults. These failures can crop up at any point, and it may be impossible to enumerate all the different cases that need to be handled.
One of the differences between single-machine and distributed applications is how failures appear. When a single machine crashes, all the processes running on that machine fail. If there are multiple processes running on the machine and one process fails, the other processes can find out about the failure from the operating system. The operating system can also provide strong messaging guarantees between processes. All of this changes in a distributed environment: if a machine or process fails, other machines will keep running and may need to take over for the faulty processes. To handle faulty processes, the processes that are still running must be able to detect the failure, yet messages may be lost and there may even be clock drift.
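One way to avoid the shut-down-and-restart cycle is to keep the configuration in a shared place and notify processes when it changes. The sketch below assumes a hypothetical in-process `ConfigStore` with callback-based notifications; it is not ZooKeeper's watch mechanism, only an illustration of the pattern.

```python
class ConfigStore:
    """Toy shared configuration with change notifications (hypothetical)."""

    def __init__(self, initial):
        self._config = dict(initial)
        self._version = 0
        self._watchers = []

    def watch(self, callback):
        # Register a callback and deliver the current configuration at once,
        # so a process that starts late still sees the latest values.
        self._watchers.append(callback)
        callback(self._version, dict(self._config))

    def update(self, **changes):
        # Apply the change, bump the version, and notify every watcher.
        self._config.update(changes)
        self._version += 1
        for callback in list(self._watchers):
            callback(self._version, dict(self._config))


seen = []  # (version, pool_size) pairs observed by one simulated process
store = ConfigStore({"pool_size": 4})
store.watch(lambda version, config: seen.append((version, config["pool_size"])))
store.update(pool_size=8)
```

The simulated process first observes version 0 with a pool size of 4, then version 1 with a pool size of 8, without being restarted. Each callback receives a copy of the configuration, so a watcher cannot accidentally modify the shared state.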
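A common way for running processes to detect a failed peer is a heartbeat with a timeout. The following sketch assumes a hypothetical `HeartbeatDetector` class and takes the current time as an explicit argument so the behaviour is easy to follow; it is an illustration of the general technique, not ZooKeeper's implementation.

```python
class HeartbeatDetector:
    """Toy failure detector: a node is suspected if it has been silent too long."""

    def __init__(self, timeout):
        self.timeout = timeout
        self._last_seen = {}  # node name -> time of last heartbeat

    def heartbeat(self, node, now):
        # Record that a heartbeat message from `node` arrived at time `now`.
        self._last_seen[node] = now

    def suspected(self, now):
        # Nodes whose last heartbeat is older than the timeout.
        return sorted(
            node
            for node, last in self._last_seen.items()
            if now - last > self.timeout
        )


detector = HeartbeatDetector(timeout=5)
detector.heartbeat("node-a", now=0)
detector.heartbeat("node-b", now=0)
detector.heartbeat("node-a", now=4)   # node-b's next heartbeat never arrives
suspects = detector.suspected(now=6)
```

Here only `node-b` is suspected at time 6. Note that the detector cannot tell whether `node-b` crashed or its heartbeat message was merely lost or delayed, and it relies on a single clock; these are exactly the ambiguities that make fault handling in distributed applications difficult.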
Okay, so we cannot have an ideal fault-tolerant, distributed, real-world system that transparently takes care of all problems that might ever occur. We can strive for a slightly less ambitious goal, though.
Having pointed out that the perfect solution is impossible, we can repeat that ZooKeeper is not going to solve all the problems that the distributed application developer has to face. It does give the developer a nice framework to deal with these problems, though.
ZooKeeper is itself a distributed application that provides services for developing distributed applications. It coordinates a group of nodes within the cluster and maintains shared data with effective synchronization techniques. Some of the services provided by ZooKeeper are:
From this overview chapter we have understood the requirements of distributed applications at a high level. In the next chapters we will learn about ZooKeeper.