Apache HBase Data Model

Data Model

In HBase, data is stored in tables, which have rows and columns. HBase is referred to as a column family-oriented data store. Each row is indexed by a key, called the rowkey, that you can use for lookup. Each column family groups like data within rows. Think of a row as the join of all values in all column families. Records in HBase are stored in sorted order, according to rowkey.

Tables are divided into sequences of rows, by key range, called regions. These regions are then assigned to nodes in the cluster called RegionServers. Spreading regions across the cluster scales read and write capacity. This happens automatically and is how HBase achieves horizontal sharding. Column families are stored in separate files, which can be accessed separately.
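To make the key-range idea concrete, here is a minimal sketch of how a row key could be mapped to the region that covers it. It is a toy model, not HBase's actual lookup code: the class name RegionLookupSketch, the server names rs1 to rs3, and the use of String keys (HBase compares raw bytes) are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model: each region covers the key range [startKey, nextStartKey).
class RegionLookupSketch {
    // Region start key -> name of the (hypothetical) RegionServer hosting it.
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // The owning region is the one with the greatest start key <= rowKey.
    String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        if (e == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return e.getValue();
    }

    static RegionLookupSketch example() {
        RegionLookupSketch table = new RegionLookupSketch();
        table.addRegion("", "rs1");  // the first region starts at the empty key
        table.addRegion("g", "rs2");
        table.addRegion("p", "rs3");
        return table;
    }

    public static void main(String[] args) {
        RegionLookupSketch table = example();
        System.out.println(table.serverFor("apple")); // rs1
        System.out.println(table.serverFor("grape")); // rs2
        System.out.println(table.serverFor("zebra")); // rs3
    }
}
```

Because rows are sorted by key, a single floor lookup on the region start keys is enough to find the owner of any row.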

Data in HBase is stored in table cells. The entire cell, with the added structural information, is called a KeyValue. The row key, column family name, column name, timestamp, and value are stored for every cell for which you have set a value. The key consists of the row key, column family name, column name, and timestamp. Table cells are versioned, uninterpreted arrays of bytes.

HBase table cell structure

Logically, cells are stored in a table format, but physically, rows are stored as linear sets of cells containing all the key value information inside them. In the image below, the top left shows the logical layout of the data, while the lower right section shows the physical storage in files. Column families are stored in separate files.

HBase logical vs physical storage

The complete coordinates to a cell's value are: Table:Row:Family:Column:Timestamp ➔ Value. HBase tables are sparsely populated. If data doesn’t exist at a column, it’s not stored.
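The coordinate scheme and the sparseness can be pictured as nested sorted maps. The sketch below is only an in-memory analogy: SparseTableSketch and its methods are hypothetical names, and values are Strings here for readability where HBase stores byte arrays.

```java
import java.util.Map;
import java.util.TreeMap;

// In-memory analogy for Row:Family:Column:Timestamp -> Value within one table.
class SparseTableSketch {
    // row -> "family:qualifier" -> timestamp -> value.
    // A cell that was never written simply has no entry.
    private final Map<String, Map<String, Map<Long, String>>> rows = new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier, c -> new TreeMap<>())
            .put(timestamp, value);
    }

    // Value at the exact coordinates, or null if nothing is stored there.
    String get(String row, String family, String qualifier, long timestamp) {
        Map<String, Map<Long, String>> columns = rows.get(row);
        if (columns == null) return null;
        Map<Long, String> versions = columns.get(family + ":" + qualifier);
        return versions == null ? null : versions.get(timestamp);
    }

    // Number of cells that actually occupy storage.
    int storedCells() {
        int n = 0;
        for (Map<String, Map<Long, String>> columns : rows.values()) {
            for (Map<Long, String> versions : columns.values()) {
                n += versions.size();
            }
        }
        return n;
    }

    public static void main(String[] args) {
        SparseTableSketch t = new SparseTableSketch();
        t.put("row1", "info", "name", 1L, "Ann");
        t.put("row1", "info", "email", 1L, "ann@example.com");
        t.put("row2", "info", "name", 1L, "Bob"); // row2 has no email cell at all
        System.out.println(t.storedCells());                    // 3
        System.out.println(t.get("row2", "info", "email", 1L)); // null
    }
}
```

Two rows and two columns would make four cells in a dense table; here only the three cells that were written are stored.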

Time To Live (TTL)
ColumnFamilies can set a TTL length in seconds, and HBase will automatically delete rows once the expiration time is reached. This applies to all versions of a row, even the current one.
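As a rough illustration of the expiry rule, a cell can be treated as expired once its age exceeds the TTL. This is a sketch under assumptions: the helper name isExpired is hypothetical, and the exact boundary condition HBase applies is not stated in the text above.

```java
// Minimal sketch of a TTL check; not HBase's internal implementation.
class TtlSketch {
    // A cell is treated as expired once its age exceeds the TTL (given in seconds).
    static boolean isExpired(long cellTimestampMs, long nowMs, long ttlSeconds) {
        return nowMs - cellTimestampMs > ttlSeconds * 1000L;
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        System.out.println(isExpired(now - 5_000L, now, 10L));  // false: 5 s old, TTL 10 s
        System.out.println(isExpired(now - 15_000L, now, 10L)); // true: 15 s old, TTL 10 s
    }
}
```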

Column Metadata
There is no store of column metadata outside of the internal KeyValue instances for a ColumnFamily. Thus, while HBase can support not only a wide number of columns per row, but a heterogeneous set of columns between rows as well, it is your responsibility to keep track of the column names. The only way to get a complete set of columns that exist for a ColumnFamily is to process all the rows.
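The consequence is easy to see in a small model: to learn which columns exist, every row has to be visited. The sketch below uses plain Java maps in place of a real table scan; ColumnDiscoverySketch and the sample data are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

class ColumnDiscoverySketch {
    // rows: row key -> (column qualifier -> value), all within one column family.
    static SortedSet<String> allQualifiers(Map<String, Map<String, String>> rows) {
        SortedSet<String> seen = new TreeSet<>();
        for (Map<String, String> columns : rows.values()) {
            seen.addAll(columns.keySet()); // no catalog exists, so every row is inspected
        }
        return seen;
    }

    static Map<String, Map<String, String>> sampleRows() {
        Map<String, String> r1 = new LinkedHashMap<>();
        r1.put("name", "Ann");
        r1.put("email", "ann@example.com");
        Map<String, String> r2 = new LinkedHashMap<>();
        r2.put("name", "Bob");
        r2.put("phone", "555-0100"); // a column that row1 does not have
        Map<String, Map<String, String>> rows = new LinkedHashMap<>();
        rows.put("row1", r1);
        rows.put("row2", r2);
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(allQualifiers(sampleRows())); // prints [email, name, phone]
    }
}
```

On a large table this is a full scan, which is why keeping your own record of column names is usually the cheaper option.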


Versions

A {row, column, version} tuple exactly specifies a cell in HBase. It’s possible to have an unbounded number of cells where the row and column are the same but the cell address differs only in its version dimension.

While rows and column keys are expressed as bytes, the version is specified using a long integer. Typically this long contains time instances such as those returned by java.util.Date.getTime() or System.currentTimeMillis(), that is: the difference, measured in milliseconds, between the current time and midnight, January 1, 1970 UTC. The HBase version dimension is stored in decreasing order, so that when reading from a store file, the most recent values are found first.

Versioning is built in. A put is both an insert (create) and an update, and each one gets its own version. A delete writes a tombstone marker, which prevents the data from being returned in queries. Get requests return specific version(s) based on parameters. If you do not specify any parameters, the most recent version is returned.
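The version dimension of a single cell can be sketched as a map sorted by timestamp in decreasing order. This is only a model of the behavior described above: VersionedCellSketch and its method names are hypothetical, and values are Strings for readability.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Model of one {row, column} with many versions, newest first.
class VersionedCellSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());

    // A put with a new timestamp adds a version; the same timestamp overwrites it.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Default read: the most recent version, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.firstEntry().getValue();
    }

    // Read up to maxVersions values, newest first.
    List<String> newest(int maxVersions) {
        List<String> out = new ArrayList<>();
        for (String v : versions.values()) {
            if (out.size() == maxVersions) break;
            out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(100L, "a");
        cell.put(300L, "c");
        cell.put(200L, "b");
        System.out.println(cell.latest());  // c
        System.out.println(cell.newest(2)); // [c, b]
    }
}
```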


HBase Data Model Terminology

Table

An HBase table consists of multiple rows. Tables are declared up front at schema definition time.

Row

A row in HBase consists of a row key and one or more columns with values associated with them. Row keys are uninterpreted bytes. Rows are lexicographically sorted with the lowest order appearing first in a table. For this reason, the design of the row key is very important. The goal is to store data in such a way that related rows are near each other.
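Lexicographic byte ordering has a well-known consequence for numeric suffixes: "row10" sorts before "row2". The sketch below shows this with a hand-written unsigned byte comparison; RowKeyOrderSketch is a hypothetical helper, not part of the HBase API.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class RowKeyOrderSketch {
    // Compare two keys byte by byte, treating each byte as unsigned.
    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) return x - y;
        }
        return a.length - b.length; // a shorter key sorts before its extensions
    }

    static List<String> sortKeys(List<String> keys) {
        List<String> sorted = new ArrayList<>(keys);
        sorted.sort((p, q) -> compareBytes(
                p.getBytes(StandardCharsets.UTF_8),
                q.getBytes(StandardCharsets.UTF_8)));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(sortKeys(Arrays.asList("row2", "row10", "row1")));
        // prints [row1, row10, row2]
        System.out.println(sortKeys(Arrays.asList("row02", "row10", "row01")));
        // prints [row01, row02, row10]
    }
}
```

Zero-padding numeric parts of a key, as in the second call, is one common way to make the byte order match the numeric order.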

Column

A column in HBase consists of a column family and a column qualifier, which are delimited by a : (colon) character.

Column Family

Columns in Apache HBase are grouped into column families. Column families physically colocate a set of columns and their values, often for performance reasons. Each row in a table has the same column families, though a given row might not store anything in a given column family.

All column members of a column family have the same prefix. For example, the columns courses:history and courses:math are both members of the courses column family. The colon character (:) delimits the column family from the column qualifier. The column family prefix must be composed of printable characters.
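Splitting a full column name into its two parts can be sketched as below. The helper ColumnNameSketch.split is hypothetical; splitting on the first colon and returning an empty qualifier when no colon is present are choices made for this sketch.

```java
class ColumnNameSketch {
    // Split "family:qualifier" at the first colon.
    // Returns {family, qualifier}; the qualifier is "" when no colon is present.
    static String[] split(String column) {
        int i = column.indexOf(':');
        if (i < 0) {
            return new String[] { column, "" };
        }
        return new String[] { column.substring(0, i), column.substring(i + 1) };
    }

    public static void main(String[] args) {
        String[] parts = split("courses:history");
        System.out.println(parts[0]); // courses
        System.out.println(parts[1]); // history
    }
}
```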

Column families must be declared up front at schema definition time whereas columns do not need to be defined at schema time but can be conjured on the fly while the table is up and running. Physically, all column family members are stored together on the filesystem.

Column Qualifier

A column qualifier is added to a column family to provide the index for a given piece of data. Given a column family content, a column qualifier might be content:html, and another might be content:pdf. Though column families are fixed at table creation, column qualifiers are mutable and may differ greatly between rows.

Cell

A cell is a combination of row, column family, and column qualifier, and contains a value and a timestamp, which represents the value’s version. A {row, column, version} tuple exactly specifies a cell in HBase. Cell content is uninterpreted bytes.

Timestamp

A timestamp is written alongside each value, and is the identifier for a given version of a value. By default, the timestamp represents the time on the RegionServer when the data was written, but you can specify a different timestamp value when you put data into the cell.


HBase Data Model Operations

The four primary data model operations are Get, Put, Scan, and Delete.

  • Get : Get returns attributes for a specified row. Gets are executed via HTable.get.
  • Put : Put either adds new rows to a table (if the key is new) or updates existing rows (if the key already exists). Puts are executed via HTable.put, which uses the client-side write buffer.
  • Scan : Scan allows iteration over multiple rows for specified attributes.
  • Delete : Delete removes a row from a table. Deletes are executed via HTable.delete. HBase does not modify data in place, so deletes are handled by creating new markers called tombstone markers. These tombstones, along with the dead values, are cleaned up on major compactions.
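The way Get, Put, and Scan relate to the sorted row order can be sketched with a sorted map. This is an in-memory analogy, not the HBase client API: ScanSketch is a hypothetical class, and each row holds a single String value to keep the example short. The scan range includes the start row and excludes the stop row, which is how HBase scans behave by default.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class ScanSketch {
    // Rows kept in sorted key order, as in an HBase table.
    private final TreeMap<String, String> rows = new TreeMap<>();

    // Put: inserts the row if the key is new, otherwise updates it.
    void put(String rowKey, String value) {
        rows.put(rowKey, value);
    }

    // Get: lookup of a single row by key.
    String get(String rowKey) {
        return rows.get(rowKey);
    }

    // Scan: row keys in [startRow, stopRow), returned in sorted order.
    List<String> scan(String startRow, String stopRow) {
        return new ArrayList<>(rows.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        ScanSketch table = new ScanSketch();
        table.put("d", "4");
        table.put("b", "2");
        table.put("a", "1");
        table.put("c", "3");
        System.out.println(table.get("b"));       // 2
        System.out.println(table.scan("b", "d")); // [b, c]
    }
}
```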

HBase Sort Order
All HBase data model operations return data in sorted order: first by row, then by ColumnFamily, followed by column qualifier, and finally by timestamp (sorted in reverse, so the newest records are returned first).
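This ordering can be written down as a comparator. The sketch below is illustrative only: KeySortSketch and its Key class are hypothetical, and String comparison stands in for the byte comparison HBase performs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class KeySortSketch {
    static final class Key {
        final String row, family, qualifier;
        final long timestamp;

        Key(String row, String family, String qualifier, long timestamp) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
        }

        @Override
        public String toString() {
            return row + "/" + family + ":" + qualifier + "/" + timestamp;
        }
    }

    // Row, then family, then qualifier ascending; timestamp descending.
    static final Comparator<Key> ORDER = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c == 0) c = a.family.compareTo(b.family);
        if (c == 0) c = a.qualifier.compareTo(b.qualifier);
        if (c == 0) c = Long.compare(b.timestamp, a.timestamp); // reversed: newest first
        return c;
    };

    static List<Key> sorted(List<Key> keys) {
        List<Key> out = new ArrayList<>(keys);
        out.sort(ORDER);
        return out;
    }

    static List<Key> sample() {
        return Arrays.asList(
                new Key("r2", "cf", "a", 1L),
                new Key("r1", "cf", "b", 5L),
                new Key("r1", "cf", "a", 1L),
                new Key("r1", "cf", "a", 9L));
    }

    public static void main(String[] args) {
        for (Key k : sorted(sample())) {
            System.out.println(k);
        }
    }
}
```

With the sample keys, the two versions of r1/cf:a come first (timestamp 9 before timestamp 1), then r1/cf:b, then r2/cf:a.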


HBase Data Delete Operation

When a Delete command is issued through the HBase client, no data is actually deleted. Surprised? Here is what really happens during the deletion process.

Instead of removing data, HBase sets a tombstone marker, making the deleted cells effectively invisible. User Scans and Gets automatically filter deleted cells until they get removed.

The tombstone markers and deleted cells are only removed during major compactions (which compact all store files into a single one). Hence HBase removes deleted cells periodically, whenever a major compaction runs.

There are 3 different types of internal delete markers.

  • Delete: for a specific version of a column.
  • Delete column: for all versions of a column.
  • Delete family: for all columns of a particular ColumnFamily.

When deleting an entire row, HBase internally creates a tombstone for each ColumnFamily (i.e., not for each individual column). For example, suppose we want to delete a row. You can specify a version, or else the current time (currentTimeMillis) is used by default. The delete then applies to all cells whose version is less than or equal to this version.

HBase never modifies data in place, so a delete will not immediately delete (or mark as deleted) the entries in the storage file that correspond to the delete condition. Rather, a tombstone is written, which masks the deleted values. When HBase does a major compaction, the tombstones are processed to actually remove the dead values, together with the tombstones themselves. If the version you specified when deleting a row is larger than the version of any value in the row, then you can consider the complete row to be deleted.

Also, if you delete data and then put more data with a timestamp earlier than or equal to the tombstone timestamp, the new put is masked by the tombstone marker and subsequent gets will not return it. The put itself does not fail; it simply has no visible effect. Normal behavior resumes only after a major compaction has removed the tombstone.
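The masking behavior is easier to follow in a small simulation of one column. This is a simplified model, not HBase code: TombstoneSketch is a hypothetical class, it keeps a single "delete column" style marker, and it assumes the default behavior in which a tombstone hides every version with a timestamp less than or equal to its own.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of one column with versions and a delete marker.
class TombstoneSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>(Collections.reverseOrder());
    private Long tombstone = null; // timestamp of the delete marker, if any

    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Writes a marker; nothing is removed from storage yet.
    void delete(long timestamp) {
        tombstone = (tombstone == null) ? timestamp : Math.max(tombstone, timestamp);
    }

    // Newest version that is not masked by the tombstone, or null.
    String get() {
        for (Map.Entry<Long, String> e : versions.entrySet()) {
            if (tombstone == null || e.getKey() > tombstone) {
                return e.getValue();
            }
        }
        return null;
    }

    // Major compaction: drop the masked versions, then the marker itself.
    void majorCompact() {
        if (tombstone == null) return;
        final long limit = tombstone;
        versions.keySet().removeIf(ts -> ts <= limit);
        tombstone = null;
    }

    public static void main(String[] args) {
        TombstoneSketch column = new TombstoneSketch();
        column.put(100L, "v1");
        column.delete(200L);              // masks everything with timestamp <= 200
        column.put(150L, "v2");           // written later, but timestamp is earlier
        System.out.println(column.get()); // null: v2 is masked by the tombstone
        column.majorCompact();            // removes v1, v2, and the tombstone
        column.put(150L, "v3");
        System.out.println(column.get()); // v3: no tombstone is left to mask it
    }
}
```

In this model the put made while the tombstone existed is discarded by the compaction together with the other masked versions; only a put issued after the compaction becomes visible.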