Apache ZooKeeper Installation and Configuration


Now let's set up and deploy ZooKeeper...

Apache ZooKeeper installation prerequisites

Supported Platforms

Before installing ZooKeeper, make sure your system is running one of the following operating systems:

  • Linux OS : supported as a development and production platform for both server and client.
  • Windows OS : supported as a development platform only for both server and client.
  • Mac OS : supported as a development platform only for both server and client.

Required Software

ZooKeeper is written in Java and runs on Java 1.6 or later (JDK 6 or later; FreeBSD support requires OpenJDK 7). For instructions on installing Java, see: Setting Java
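Running `java -version` on the command line is the quickest way to see which Java a host has. From inside the JVM, the same information is available through the `java.specification.version` system property. The sketch below (the class name `JavaVersionCheck` is hypothetical) parses both the legacy "1.x" scheme used up to Java 8 and the plain "9", "11" scheme used since Java 9, and compares the result against the 1.6 minimum stated above.

```java
// Hypothetical helper: checks whether a JVM meets ZooKeeper's stated minimum (Java 1.6).
final class JavaVersionCheck {
    private JavaVersionCheck() {}

    /** Returns the major version for "1.6"/"1.8" (pre-Java 9 scheme) or "9"/"11"/"17" (newer scheme). */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.trim().split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Up to Java 8 the specification version has the form "1.x"; the major version is x.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    /** True if the given specification version is Java 6 (1.6) or later. */
    static boolean meetsZooKeeperMinimum(String specVersion) {
        return majorVersion(specVersion) >= 6;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("java.specification.version = " + spec
                + (meetsZooKeeperMinimum(spec) ? " (meets the 1.6 minimum)" : " (older than 1.6)"));
    }
}
```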

Download ZooKeeper

To get the ZooKeeper server package, download a recent stable release from:
http://zookeeper.apache.org/releases.html

Install ZooKeeper

Make sure all the prerequisites are met before you follow the instructions below.

ZooKeeper cluster setup

For a reliable ZooKeeper service, you should deploy ZooKeeper in a cluster known as an ensemble. As long as a majority of the ensemble is up, the service will be available. Because ZooKeeper requires a majority, it is best to use an odd number of machines: a four-server ensemble, for example, tolerates only one failure, the same as a three-server ensemble. Three ZooKeeper servers is the minimum recommended size for an ensemble, and we also recommend that they run on separate machines.
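The majority rule is simple arithmetic, and a few lines of code make the case for odd-sized ensembles concrete. This is a minimal sketch (the class name `QuorumMath` is our own) that computes, for an ensemble of n servers, how many must be up to form a majority and how many can fail.

```java
// Sketch of the majority-quorum arithmetic for a ZooKeeper ensemble of n servers.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of servers that forms a strict majority of n. */
    static int majority(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("ensemble size must be >= 1");
        }
        return n / 2 + 1;
    }

    /** Number of servers that can fail while a majority is still up. */
    static int toleratedFailures(int n) {
        return n - majority(n);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 6; n++) {
            System.out.printf("ensemble=%d majority=%d tolerated failures=%d%n",
                    n, majority(n), toleratedFailures(n));
        }
    }
}
```

Running it shows that four servers tolerate one failure, just like three, and six tolerate two, just like five, so the extra even-numbered server adds hardware without adding fault tolerance.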

Here are the steps to install the ZooKeeper service on a server that will be part of an ensemble. These steps should be performed on every host in the ensemble:

  • Install the Java JDK. This is required because the ZooKeeper server runs on the JVM.
  • Download the ZooKeeper Server Package from:
    http://zookeeper.apache.org/releases.html

    At the time of writing, the latest version of ZooKeeper was 3.4.6 (zookeeper-3.4.6.tar.gz).

  • Extract the tar file to an appropriate location (for example, /opt) using the following commands, assuming the downloaded archive is in that directory:
    $ cd /opt/
    $ tar -zxf zookeeper-3.4.6.tar.gz
    
  • Create a directory for storing the state associated with the ZooKeeper server:
    $ mkdir /var/lib/zookeeper
    
  • Set up the configuration. Create or edit the zookeeper-3.4.6/conf/zoo.cfg file so that it looks something like this:
    tickTime=2000
    dataDir=/var/lib/zookeeper
    clientPort=2181
    initLimit=5
    syncLimit=2
    server.1=zkserver1.corejavaguru.com:2888:3888
    server.2=zkserver2.corejavaguru.com:2888:3888
    server.3=zkserver3.corejavaguru.com:2888:3888
    

    Note: Ports 2181, 2888, and 3888 should be open across all three machines. In this example configuration, port 2181 is used by ZooKeeper clients to connect to the ZooKeeper servers, port 2888 is used by peer ZooKeeper servers to communicate with each other, and port 3888 is used for leader election. You may choose any ports you like, but it is usually recommended that you use the same ports on all of the ZooKeeper servers.

    Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. You accomplish this with a series of lines of the form server.id=host:port:port, where host is the server's hostname, the first port is the one followers use to connect to the leader, and the second port is used for leader election. You attribute the server id to each machine by creating a file named myid, one for each server, which resides in that server's data directory, as specified by the configuration file parameter dataDir.

  • Create a /var/lib/zookeeper/myid file. The myid file consists of a single line containing only the text of that machine's id. So the myid file on zkserver1.corejavaguru.com would contain the text "1" and nothing else. The id must be unique within the ensemble and should have a value between 1 and 255. Similarly, the file would contain just the numeral 2 on zkserver2.corejavaguru.com, and the numeral 3 on zkserver3.corejavaguru.com.

  • If your configuration file is set up as described in the steps above, you are now ready to start the ZooKeeper server on each of these machines:
    $ cd zookeeper-3.4.6
    $ bin/zkServer.sh start
    

    After executing the start command, you should see output similar to the following:

    JMX enabled by default
    Using config: /Users/../zookeeper-3.4.6/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    
  • Now you can start a CLI client from one of the machines running a ZooKeeper server. To start it, execute the following command:
    $ bin/zkCli.sh
    

    Once the CLI has started, it connects to the ZooKeeper server and you should see a response like the following:

    Connecting to localhost:2181
    ................
    ................
    ................
    Welcome to ZooKeeper!
    ................
    ................
    WATCHER::
    WatchedEvent state:SyncConnected type: None path:null
    [zk: localhost:2181(CONNECTED) 0]
    
  • You can stop the ZooKeeper server with the following command:
    $ bin/zkServer.sh stop
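The per-host steps above (one shared zoo.cfg, one unique myid per host) lend themselves to scripting. Below is a minimal Java sketch under stated assumptions: ids are assigned 1..n in the order the hostnames are listed, every server uses ports 2181/2888/3888 and the other values from the example configuration, and the class name `EnsembleConfig` is hypothetical. It only renders the configuration text and writes a myid file; copying files to each host is left out.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that renders the shared zoo.cfg and writes a host's myid file.
// Assumes ids 1..n follow the order of the host list and the ports from the example above.
final class EnsembleConfig {
    private EnsembleConfig() {}

    /** Builds the zoo.cfg text shared by every server in the ensemble. */
    static String zooCfg(List<String> hosts, String dataDir) {
        // myid values must be between 1 and 255, so at most 255 servers can be numbered.
        if (hosts.isEmpty() || hosts.size() > 255) {
            throw new IllegalArgumentException("expected between 1 and 255 hosts");
        }
        StringBuilder sb = new StringBuilder();
        sb.append("tickTime=2000\n");
        sb.append("dataDir=").append(dataDir).append('\n');
        sb.append("clientPort=2181\n");
        sb.append("initLimit=5\n");
        sb.append("syncLimit=2\n");
        for (int i = 0; i < hosts.size(); i++) {
            // server.<id>=<host>:<peer port>:<leader-election port>
            sb.append("server.").append(i + 1).append('=')
              .append(hosts.get(i)).append(":2888:3888\n");
        }
        return sb.toString();
    }

    /** Writes <dataDir>/myid containing only the server id, e.g. "1". The directory must exist. */
    static Path writeMyId(Path dataDir, int id) throws IOException {
        if (id < 1 || id > 255) {
            throw new IllegalArgumentException("myid must be between 1 and 255");
        }
        Path myid = dataDir.resolve("myid");
        Files.write(myid, String.valueOf(id).getBytes(StandardCharsets.US_ASCII));
        return myid;
    }

    public static void main(String[] args) throws IOException {
        List<String> hosts = Arrays.asList(
                "zkserver1.corejavaguru.com",
                "zkserver2.corejavaguru.com",
                "zkserver3.corejavaguru.com");
        System.out.print(zooCfg(hosts, "/var/lib/zookeeper"));
        // Optional: "java EnsembleConfig <dataDir> <id>" also writes this host's myid file.
        if (args.length == 2) {
            Path written = writeMyId(Paths.get(args[0]), Integer.parseInt(args[1]));
            System.out.println("wrote " + written);
        }
    }
}
```

Generating the file from a single host list is one way to guarantee that every server sees the same server.x lines, which the configuration section below calls out as a requirement.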
    

ZooKeeper single node setup

If you want to set up ZooKeeper for development purposes, you will probably want a single-server instance. The steps for setting up a single-server instance are similar to the ones above, except that the configuration file is simpler, as shown below.

tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2

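A myid file is not required for a single-node setup, and you start and stop the server as usual. (initLimit and syncLimit only apply to followers in a replicated ensemble, so a standalone server does not use them.)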

ZooKeeper configuration

ZooKeeper's behavior is governed by the ZooKeeper configuration file. This file is designed so that the exact same file can be used by all the servers that make up a ZooKeeper ensemble, assuming the disk layouts are the same. If servers use different configuration files, care must be taken to ensure that the lists of servers in all of the different configuration files match.
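If your servers do end up with different configuration files, a quick way to catch mismatched server lists is to compare their server.x entries. The sketch below relies on the fact that zoo.cfg uses Java properties syntax, so `java.util.Properties` can read it. The class `ServerListCheck` is hypothetical, and comparing raw host:port strings is a simplification: it would, for example, treat a hostname and its IP address as different.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

// Hypothetical consistency check: do two zoo.cfg files declare the same server.x entries?
// zoo.cfg uses Java properties syntax, so java.util.Properties can parse it.
final class ServerListCheck {
    private ServerListCheck() {}

    /** Collects every server.x entry; a TreeMap keeps the keys in a stable order for printing. */
    static Map<String, String> serverEntries(String zooCfgText) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(zooCfgText));
        Map<String, String> servers = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith("server.")) {
                servers.put(key, props.getProperty(key).trim());
            }
        }
        return servers;
    }

    /** Compares raw host:port strings, so a hostname and its IP address count as different. */
    static boolean sameServerList(String cfgA, String cfgB) throws IOException {
        return serverEntries(cfgA).equals(serverEntries(cfgB));
    }

    public static void main(String[] args) throws IOException {
        String a = "dataDir=/var/lib/zookeeper\n"
                + "server.1=zkserver1.corejavaguru.com:2888:3888\n"
                + "server.2=zkserver2.corejavaguru.com:2888:3888\n";
        // A different dataDir (disk layout) is fine; only the server list must match.
        String b = "dataDir=/data/zookeeper\n"
                + "server.2=zkserver2.corejavaguru.com:2888:3888\n"
                + "server.1=zkserver1.corejavaguru.com:2888:3888\n";
        System.out.println("same server list: " + sameServerList(a, b));
    }
}
```

Here the two files list the servers in a different order and use different data directories, yet the check reports the same server list, because only the set of server.x entries is compared.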

Here are the minimum configuration keywords that must be defined in the configuration file:

  • clientPort: the port to listen for client connections; that is, the port that clients attempt to connect to.
  • dataDir: the location where ZooKeeper will store the in-memory database snapshots and, unless specified otherwise, the transaction log of updates to the database.
  • tickTime: the length of a single tick, which is the basic time unit used by ZooKeeper, measured in milliseconds. It is used to regulate heartbeats and timeouts. For example, the minimum session timeout will be two ticks.
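As a worked example of the tickTime rule: by default, ZooKeeper's minSessionTimeout is twice tickTime and its maxSessionTimeout is 20 times tickTime, and the server clamps the session timeout a client requests into that range. The sketch below (the class name `SessionTimeouts` is hypothetical) reproduces that arithmetic, assuming neither bound has been overridden in zoo.cfg.

```java
// Sketch: how tickTime bounds session timeouts, assuming minSessionTimeout and
// maxSessionTimeout are left at their defaults (2 and 20 ticks respectively).
final class SessionTimeouts {
    private SessionTimeouts() {}

    static int minSessionTimeoutMs(int tickTimeMs) {
        return 2 * tickTimeMs;
    }

    static int maxSessionTimeoutMs(int tickTimeMs) {
        return 20 * tickTimeMs;
    }

    /** The server clamps the timeout a client asks for into [min, max]. */
    static int negotiatedTimeoutMs(int requestedMs, int tickTimeMs) {
        int min = minSessionTimeoutMs(tickTimeMs);
        int max = maxSessionTimeoutMs(tickTimeMs);
        return Math.max(min, Math.min(requestedMs, max));
    }

    public static void main(String[] args) {
        int tickTime = 2000; // tickTime from the example zoo.cfg
        for (int requested : new int[] {1000, 30000, 60000}) {
            System.out.printf("requested %d ms -> negotiated %d ms%n",
                    requested, negotiatedTimeoutMs(requested, tickTime));
        }
    }
}
```

With tickTime=2000, sessions are limited to between 4,000 and 40,000 ms: a request for 1,000 ms is raised to 4,000 ms, 30,000 ms is kept as is, and 60,000 ms is lowered to 40,000 ms.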

Other important configuration parameters:

  • initLimit: Amount of time, in ticks (see tickTime), to allow followers to connect and sync to a leader. Increase this value as needed if the amount of data managed by ZooKeeper is large.
  • server.x=[hostname]:nnnnn[:nnnnn]: servers making up the ZooKeeper ensemble. When the server starts up, it determines which server it is by looking for the file myid in the data directory. That file contains the server number, in ASCII, and it should match x in server.x on the left-hand side of this setting.

    The list of ZooKeeper servers used by the clients must match the list of ZooKeeper servers that each ZooKeeper server has. There are two port numbers nnnnn: followers use the first to connect to the leader, and the second is used for leader election.

  • syncLimit: Amount of time, in ticks (see tickTime), to allow followers to sync with ZooKeeper. If followers fall too far behind a leader, they will be dropped.
  • dataLogDir: This option directs the machine to write the transaction log to the dataLogDir rather than the dataDir. This allows a dedicated log device to be used, and helps avoid competition between logging and snapshots.
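To tie the server.x format, the myid file, and the tick-based limits together, here is a hypothetical parser (the class name `ServerEntry` and its fields are our own). It handles only the plain `host:nnnnn[:nnnnn]` form described above, not IPv6 literals or other extensions, and it converts initLimit and syncLimit from ticks to milliseconds.

```java
// Hypothetical parser for a "server.x=host:nnnnn[:nnnnn]" entry, plus the tick-to-millisecond
// conversion that applies to initLimit and syncLimit. Handles only the plain form shown above.
final class ServerEntry {
    final int id;            // x in server.x; must equal the number in that host's myid file
    final String host;
    final int peerPort;      // followers use this port to connect to the leader
    final int electionPort;  // leader-election port, or -1 if the optional second port is absent

    private ServerEntry(int id, String host, int peerPort, int electionPort) {
        this.id = id;
        this.host = host;
        this.peerPort = peerPort;
        this.electionPort = electionPort;
    }

    /** Parses a key such as "server.2" and a value such as "zkserver2.corejavaguru.com:2888:3888". */
    static ServerEntry parse(String key, String value) {
        if (!key.startsWith("server.")) {
            throw new IllegalArgumentException("not a server entry: " + key);
        }
        int id = Integer.parseInt(key.substring("server.".length()));
        String[] parts = value.trim().split(":");
        if (parts.length < 2 || parts.length > 3) {
            throw new IllegalArgumentException("expected host:port[:port], got: " + value);
        }
        int electionPort = parts.length == 3 ? Integer.parseInt(parts[2]) : -1;
        return new ServerEntry(id, parts[0], Integer.parseInt(parts[1]), electionPort);
    }

    /** True if this entry describes the server whose myid file contains myIdText. */
    boolean matchesMyId(String myIdText) {
        return Integer.parseInt(myIdText.trim()) == id;
    }

    /** initLimit and syncLimit are given in ticks; multiply by tickTime to get milliseconds. */
    static long ticksToMs(int ticks, int tickTimeMs) {
        return (long) ticks * tickTimeMs;
    }

    public static void main(String[] args) {
        ServerEntry e = parse("server.2", "zkserver2.corejavaguru.com:2888:3888");
        System.out.println(e.host + " peer=" + e.peerPort + " election=" + e.electionPort
                + " matches myid \"2\": " + e.matchesMyId("2"));
        System.out.println("initLimit=5 -> " + ticksToMs(5, 2000) + " ms, syncLimit=2 -> "
                + ticksToMs(2, 2000) + " ms");
    }
}
```

With the example's tickTime=2000, initLimit=5 gives followers 10,000 ms to connect and sync to the leader, and syncLimit=2 lets a follower fall at most 4,000 ms behind before it is dropped.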