HBase is an open-source, distributed, non-relational, versioned, multi-dimensional, column-oriented NoSQL store. It is built on the same principles as Google BigTable, runs on top of Apache HDFS, and provides random, real-time read/write access to data. HBase provides all the features of Google BigTable.
HBase is a distributed database, meaning it is designed to run on a cluster of a few to possibly thousands of servers. As a result it is more complicated to install, and all the typical problems of distributed computing come into play: coordination and management of remote processes, locking, data distribution, network latency and the number of round trips between servers. Fortunately HBase makes use of several other mature technologies, such as Apache Hadoop and Apache ZooKeeper, to solve many of these issues.
HBase is a column-oriented data store, meaning it stores data by columns rather than by rows. If there is no data for a given column family in a row, HBase simply stores nothing at all; contrast this with a relational database, which must store null values explicitly. In addition, because a single row can literally have millions of columns, when retrieving data you should ask only for the specific column families you actually need.
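For example, here is a minimal sketch of such a narrow read using the HBase Java client API; the table name users, row key row-42 and the family/qualifier names are hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowRead {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Get get = new Get(Bytes.toBytes("row-42"));                     // hypothetical row key
      get.addFamily(Bytes.toBytes("profile"));                        // fetch only this column family
      Result result = table.get(get);
      byte[] name = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
      System.out.println(name == null ? "no value" : Bytes.toString(name));
    }
  }
}
```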
Like HDFS, the HBase architecture follows the traditional master-slave model, where a master makes the decisions and one or more slaves do the real work. In HBase, the master is called HMaster and the slaves are called HRegionServers.
A typical HBase cluster has one Master node, called HMaster, and multiple Region Servers, called HRegionServers. Each Region Server contains multiple Regions (HRegions). HBase also uses ZooKeeper as a distributed coordination service to maintain server state in the cluster.
Data in HBase is stored in Tables, and these Tables are stored in Regions. When a Table becomes too big, it is partitioned into multiple Regions, which are assigned to Region Servers across the cluster. Each Region Server hosts roughly the same number of Regions.
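As an illustration of how a Table is carved into Regions, the following sketch uses the HBase 1.x Java admin API to pre-split a new table into four Regions, which the Master then assigns across the Region Servers; the table name, column family and split keys are made up:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users")); // hypothetical table
      desc.addFamily(new HColumnDescriptor("profile"));                         // hypothetical column family
      // Three split keys give four Regions up front: the manual equivalent of
      // the automatic splitting that happens when a Table grows too big.
      byte[][] splitKeys = {Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")};
      admin.createTable(desc, splitKeys);
    }
  }
}
```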
Let's look at each component in more detail.
HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes. In a distributed cluster, the Master typically runs on the NameNode.
In particular, the HMaster is responsible for assigning Regions to Region Servers, re-assigning Regions when a Region Server fails or for load balancing, and handling schema changes and other metadata operations such as creating, altering and deleting tables.
HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. Regions are nothing but tables that are split up and spread across the region servers. In a distributed cluster, a RegionServer runs on a DataNode.
When a Region Server (RS) receives a write request, it directs the request to a specific Region. Each Region stores a set of rows. Row data can be separated into multiple column families (CFs). Data for a particular CF is stored in an HStore, which consists of a MemStore and a set of HFiles.
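As a rough sketch of what such a write looks like from the client side, a single Put can carry cells for several column families; on the Region Server, each cell ends up in the HStore for its own CF. The table name, row key, families and values below are hypothetical, and the table is assumed to have been created with both column families:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiFamilyWrite {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Put put = new Put(Bytes.toBytes("row-42"));                     // one row ...
      // ... but two column families: each cell goes into its own HStore on the Region Server
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      put.addColumn(Bytes.toBytes("activity"), Bytes.toBytes("lastLogin"), Bytes.toBytes("2016-01-01"));
      table.put(put);
    }
  }
}
```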
The HRegionServer performs tasks such as serving read and write requests for the Regions it hosts, flushing the MemStore to new HFiles, splitting Regions that have grown too large, and running compactions.
Each Region Server contains a Write-Ahead Log (called HLog) and multiple Regions. Each Region in turn is made up of a MemStore and multiple StoreFiles (HFile). The data lives in these StoreFiles in the form of Column Families (explained below). The MemStore holds in-memory modifications to the Store (data).
The mapping of Regions to Region Servers is kept in a system table called .META. When trying to read or write data, clients look up the required Region information in the .META table and then communicate directly with the appropriate Region Server.
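The sketch below, assuming a hypothetical users table and row key, makes that lookup explicit with the RegionLocator API; under the hood, ordinary Gets and Puts perform (and cache) the same lookup automatically:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("users"))) { // hypothetical table
      // Resolve the Region (and the Region Server hosting it) for a given row key.
      HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("row-42"));
      System.out.println("Region: " + location.getRegionInfo().getRegionNameAsString());
      System.out.println("Server: " + location.getServerName());
    }
  }
}
```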
The Write Ahead Log (WAL) records all changes to data in HBase, to file-based storage. Under normal operations, the WAL is not needed because data changes move from the MemStore to StoreFiles. However, if a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed. If writing to the WAL fails, the entire operation to modify the data fails. Usually, there is only one instance of a WAL per RegionServer. The RegionServer records Puts and Deletes to it, before recording them to the MemStore for the affected store. The WAL resides in HDFS in the /hbase/WALs/ directory (prior to HBase 0.94, they were stored in /hbase/.logs/), with subdirectories per region.
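Whether an individual Put or Delete goes through the WAL can be controlled per mutation through the Durability setting. A minimal sketch, with a hypothetical table, row key, family and value:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurability {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Put put = new Put(Bytes.toBytes("row-42"));
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      // SYNC_WAL: the edit is recorded in the WAL before the MemStore (the default behaviour).
      put.setDurability(Durability.SYNC_WAL);
      // Durability.SKIP_WAL would bypass the WAL entirely: faster, but the edit is lost
      // if the Region Server crashes before the MemStore is flushed.
      table.put(put);
    }
  }
}
```

SKIP_WAL trades durability for write throughput and is generally only appropriate for data that can be regenerated.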
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially. This can make the WAL a performance bottleneck. HBase 1.0 introduced support for MultiWAL. MultiWAL allows a RegionServer to write multiple WAL streams in parallel, by using multiple pipelines in the underlying HDFS instance, which increases total throughput during writes. This parallelization is done by partitioning incoming edits by their Region.
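Enabling MultiWAL is a server-side configuration change; as a sketch, the hbase.wal.provider property is set in hbase-site.xml on the Region Servers, which then need a restart:

```xml
<!-- hbase-site.xml on each Region Server -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
```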
The class which implements the WAL is called HLog. HLog stores all the edits to the HStore. It performs log-file rolling, so external callers are not aware that the underlying file is being rolled. There is one HLog per RegionServer. All edits for all Regions carried by a particular RegionServer are entered first into the HLog. When an HRegion is instantiated, the single HLog is passed as a parameter to its constructor.
Regions are the basic element of availability and distribution for tables, and consist of a Store per Column Family. Regions are nothing but tables that are split up and spread across the region servers. The hierarchy of objects is as follows:
Table            (HBase table)
    Region           (Regions for the table)
        Store            (Store per ColumnFamily for each Region for the table)
            MemStore         (MemStore for each Store for each Region for the table)
            StoreFile        (StoreFiles for each Store for each Region for the table)
                Block            (Blocks within a StoreFile within a Store for each Region for the table)
A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
HBase comes integrated with ZooKeeper: when you start HBase, a ZooKeeper instance is started as well. ZooKeeper helps keep track of all the region servers that make up the HBase cluster: how many there are, which are alive, and which regions they are serving. It handles this small amount of coordination state itself rather than pushing it onto Hadoop, which already keeps track of most of your metadata, and so reduces the overhead on Hadoop. The HMaster therefore gets the details of the region servers by contacting ZooKeeper.
Note: You can also start HBase without the inbuilt ZooKeeper. To point HBase at an existing ZooKeeper cluster, one that is not managed by HBase, set HBASE_MANAGES_ZK in conf/hbase-env.sh to false, and then set the ensemble locations and client port, if non-standard, in hbase-site.xml.
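For example, assuming an external ensemble on the hypothetical hosts zk1, zk2 and zk3, the hbase-site.xml entries would look roughly like this, together with export HBASE_MANAGES_ZK=false in conf/hbase-env.sh:

```xml
<!-- hbase-site.xml: point HBase at an externally managed ZooKeeper ensemble -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1,zk2,zk3</value>   <!-- hypothetical hostnames -->
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>          <!-- set only if your ensemble uses a non-standard port -->
</property>
```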
There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minor compactions do not drop deletes or expired cells; only major compactions do. Sometimes a minor compaction will pick up all the StoreFiles in the Store, in which case it actually promotes itself to a major compaction.
After a major compaction runs there will be a single StoreFile per Store, which usually helps performance. Caution: major compactions rewrite all of the Store's data, and on a loaded system this may not be tenable; on large systems, major compactions will usually have to be triggered manually.
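Where major compactions are triggered manually, this can be done from the hbase shell (major_compact 'users') or through the Admin API; a minimal sketch with a hypothetical table name:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TriggerMajorCompaction {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Asynchronously requests a major compaction of every Store in every Region
      // of the table; the Region Servers perform the actual rewrite of the StoreFiles.
      admin.majorCompact(TableName.valueOf("users"));   // hypothetical table
    }
  }
}
```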