HBase is an open-source, distributed, non-relational, versioned, multi-dimensional, column-oriented NoSQL store. It is built on the same principles as Google BigTable, runs on top of Apache HDFS, and provides random, real-time read/write access to data. HBase provides all the features of Google BigTable.
HBase is a distributed database, meaning it is designed to run on a cluster of a few to possibly thousands of servers. As a result it is more complicated to install, and all the typical problems of distributed computing come into play: coordination and management of remote processes, locking, data distribution, network latency and the number of round trips between servers. Fortunately HBase makes use of several other mature technologies, such as Apache Hadoop and Apache ZooKeeper, to solve many of these issues.
HBase is a column-oriented data store, meaning it stores data by columns rather than by rows. If there is no data for a given column family in a row, HBase simply stores nothing at all; contrast this with a relational database, which must store null values explicitly. In addition, because a single row can literally have millions of columns, when retrieving data you should ask only for the specific column families you actually need.
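For example, here is a minimal sketch of such a narrow read using the HBase Java client API; the table name users, row key row-42 and the family/qualifier names are hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class NarrowRead {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Get get = new Get(Bytes.toBytes("row-42"));                     // hypothetical row key
      get.addFamily(Bytes.toBytes("profile"));                        // fetch only this column family
      Result result = table.get(get);
      byte[] name = result.getValue(Bytes.toBytes("profile"), Bytes.toBytes("name"));
      System.out.println(name == null ? "no value" : Bytes.toString(name));
    }
  }
}
```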
Like HDFS, the HBase architecture follows the traditional master-slave model, where a master makes the decisions and one or more slaves do the real work. In HBase, the master is called HMaster and the slaves are called HRegionServers.
A typical HBase cluster has one Master node, called HMaster, and multiple Region Servers, called HRegionServers. Each Region Server contains multiple Regions (HRegions). HBase also uses ZooKeeper as a distributed coordination service to maintain server state in the cluster.
Data in HBase is stored in Tables, and these Tables are stored in Regions. When a Table becomes too big, it is partitioned into multiple Regions, which are assigned to Region Servers across the cluster. Each Region Server hosts roughly the same number of Regions.
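As an illustration of how a Table is carved into Regions, the following sketch uses the HBase 1.x Java admin API to pre-split a new table into four Regions, which the Master then assigns across the Region Servers; the table name, column family and split keys are made up:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("users")); // hypothetical table
      desc.addFamily(new HColumnDescriptor("profile"));                         // hypothetical column family
      // Three split keys give four Regions up front: the manual equivalent of
      // the automatic splitting that happens when a Table grows too big.
      byte[][] splitKeys = {Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")};
      admin.createTable(desc, splitKeys);
    }
  }
}
```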
Let's look at each component in more detail.
HMaster is the implementation of the Master Server. The Master server is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes. In a distributed cluster, the Master typically runs on the NameNode.
In particular, the HMaster is responsible for assigning Regions to Region Servers, re-assigning Regions when a Region Server fails or for load balancing, and handling schema changes and other metadata operations such as creating, altering and deleting tables.
HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. Regions are nothing but tables that are split up and spread across the region servers. In a distributed cluster, a RegionServer runs on a DataNode.
When a Region Server (RS) receives a write request, it directs the request to a specific Region. Each Region stores a set of rows. Row data can be separated into multiple column families (CFs). Data for a particular CF is stored in an HStore, which consists of a MemStore and a set of HFiles.
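As a rough sketch of what such a write looks like from the client side, a single Put can carry cells for several column families; on the Region Server, each cell ends up in the HStore for its own CF. The table name, row key, families and values below are hypothetical, and the table is assumed to have been created with both column families:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class MultiFamilyWrite {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Put put = new Put(Bytes.toBytes("row-42"));                     // one row ...
      // ... but two column families: each cell goes into its own HStore on the Region Server
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      put.addColumn(Bytes.toBytes("activity"), Bytes.toBytes("lastLogin"), Bytes.toBytes("2016-01-01"));
      table.put(put);
    }
  }
}
```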
The HRegionServer performs tasks such as serving read and write requests for the Regions it hosts, flushing the MemStore to new HFiles, splitting Regions that have grown too large, and running compactions.
Each Region Server contains a Write-Ahead Log (called HLog) and multiple Regions. Each Region in turn is made up of a MemStore and multiple StoreFiles (HFile). The data lives in these StoreFiles in the form of Column Families (explained below). The MemStore holds in-memory modifications to the Store (data).
The mapping of Regions to Region Servers is kept in a system table called .META. When trying to read or write data, clients look up the required Region information in the .META table and then communicate directly with the appropriate Region Server.
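The sketch below, assuming a hypothetical users table and row key, makes that lookup explicit with the RegionLocator API; under the hood, ordinary Gets and Puts perform (and cache) the same lookup automatically:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;
import org.apache.hadoop.hbase.util.Bytes;

public class WhereIsMyRow {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         RegionLocator locator = conn.getRegionLocator(TableName.valueOf("users"))) { // hypothetical table
      // Resolve the Region (and the Region Server hosting it) for a given row key.
      HRegionLocation location = locator.getRegionLocation(Bytes.toBytes("row-42"));
      System.out.println("Region: " + location.getRegionInfo().getRegionNameAsString());
      System.out.println("Server: " + location.getServerName());
    }
  }
}
```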
The Write Ahead Log (WAL) records all changes to data in HBase, to file-based storage. Under normal operations, the WAL is not needed because data changes move from the MemStore to StoreFiles. However, if a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed. If writing to the WAL fails, the entire operation to modify the data fails. Usually, there is only one instance of a WAL per RegionServer. The RegionServer records Puts and Deletes to it, before recording them to the MemStore for the affected store. The WAL resides in HDFS in the /hbase/WALs/ directory (prior to HBase 0.94, they were stored in /hbase/.logs/), with subdirectories per region.
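Whether an individual Put or Delete goes through the WAL can be controlled per mutation through the Durability setting. A minimal sketch, with a hypothetical table, row key, family and value:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalDurability {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Table table = conn.getTable(TableName.valueOf("users"))) {   // hypothetical table
      Put put = new Put(Bytes.toBytes("row-42"));
      put.addColumn(Bytes.toBytes("profile"), Bytes.toBytes("name"), Bytes.toBytes("Ada"));
      // SYNC_WAL: the edit is recorded in the WAL before the MemStore (the default behaviour).
      put.setDurability(Durability.SYNC_WAL);
      // Durability.SKIP_WAL would bypass the WAL entirely: faster, but the edit is lost
      // if the Region Server crashes before the MemStore is flushed.
      table.put(put);
    }
  }
}
```

SKIP_WAL trades durability for write throughput and is generally only appropriate for data that can be regenerated.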
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be written sequentially. This can make the WAL a performance bottleneck. HBase 1.0 introduced support for MultiWAL. MultiWAL allows a RegionServer to write multiple WAL streams in parallel, by using multiple pipelines in the underlying HDFS instance, which increases total throughput during writes. This parallelization is done by partitioning incoming edits by their Region.
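Enabling MultiWAL is a server-side configuration change; as a sketch, the hbase.wal.provider property is set in hbase-site.xml on the Region Servers, which then need a restart:

```xml
<!-- hbase-site.xml on each Region Server -->
<property>
  <name>hbase.wal.provider</name>
  <value>multiwal</value>
</property>
```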
The class which implements the WAL is called HLog. HLog stores all the edits to the HStore. It performs log-file rolling, so external callers are not aware that the underlying file is being rolled. There is one HLog per RegionServer. All edits for all Regions carried by a particular RegionServer are entered first into the HLog. When an HRegion is instantiated, the single HLog is passed as a parameter to its constructor.
Regions are the basic element of availability and distribution for tables, and consist of a Store per Column Family. Regions are nothing but tables that are split up and spread across the region servers. The hierarchy of objects is as follows:
Table            (HBase table)
    Region           (Regions for the table)
        Store            (Store per ColumnFamily for each Region for the table)
            MemStore         (MemStore for each Store for each Region for the table)
            StoreFile        (StoreFiles for each Store for each Region for the table)
                Block            (Blocks within a StoreFile within a Store for each Region for the table)
A Store hosts a MemStore and 0 or more StoreFiles (HFiles). A Store corresponds to a column family for a table for a given region.
HBase comes integrated with ZooKeeper: when you start HBase, a ZooKeeper instance is started as well. ZooKeeper helps keep track of all the region servers that make up the HBase cluster: how many there are, which are alive, and which regions they are serving. It handles this small amount of coordination state itself rather than pushing it onto Hadoop, which already keeps track of most of your metadata, and so reduces the overhead on Hadoop. The HMaster therefore gets the details of the region servers by contacting ZooKeeper.
Note: You can also start HBase without the inbuilt ZooKeeper. To point HBase at an existing ZooKeeper cluster, one that is not managed by HBase, set HBASE_MANAGES_ZK in conf/hbase-env.sh to false, and then set the ensemble locations and client port, if non-standard, in hbase-site.xml.
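For example, assuming an external ensemble on the hypothetical hosts zk1, zk2 and zk3, the hbase-site.xml entries would look roughly like this, together with export HBASE_MANAGES_ZK=false in conf/hbase-env.sh:

```xml
<!-- hbase-site.xml: point HBase at an externally managed ZooKeeper ensemble -->
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>zk1,zk2,zk3</value>   <!-- hypothetical hostnames -->
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>          <!-- set only if your ensemble uses a non-standard port -->
</property>
```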
There are two types of compactions: minor and major. Minor compactions will usually pick up a couple of the smaller adjacent StoreFiles and rewrite them as one. Minor compactions do not drop deletes or expired cells; only major compactions do. Sometimes a minor compaction will pick up all the StoreFiles in the Store, in which case it actually promotes itself to a major compaction.
After a major compaction runs there will be a single StoreFile per Store, which usually helps performance. Caution: major compactions rewrite all of the Store's data, and on a loaded system this may not be tenable; on large systems, major compactions will usually have to be triggered manually.
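Where major compactions are triggered manually, this can be done from the hbase shell (major_compact 'users') or through the Admin API; a minimal sketch with a hypothetical table name:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class TriggerMajorCompaction {
  public static void main(String[] args) throws Exception {
    try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
         Admin admin = conn.getAdmin()) {
      // Asynchronously requests a major compaction of every Store in every Region
      // of the table; the Region Servers perform the actual rewrite of the StoreFiles.
      admin.majorCompact(TableName.valueOf("users"));   // hypothetical table
    }
  }
}
```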