Previously we were introduced to the HBase architecture and examined the most basic component of that architecture, the RegionServer. We saw that, through auto-sharding, HBase automatically splits a database table and distributes the load to another RegionServer when a configurable table size limit is reached. Now we look at exactly how HBase accomplishes this on the basis of regions.
Here is the HBase architecture model from the previous post.
RegionServers are one thing, but we also have to take a look at how individual regions work. In HBase, a table is both spread across a number of RegionServers and made up of individual regions. As a table is split, the splits become regions. Each region stores a contiguous range of key-value pairs, and every RegionServer manages a configurable number of regions. But what does an individual region look like? HBase is a column-family-oriented data store, so how do individual regions store key-value pairs on the basis of the column families they belong to? The second figure below begins to answer these questions and helps us digest more vital information about the architecture of HBase.
HBase is written and implemented in Java, like the vast majority of Hadoop technologies. Java is an object-oriented programming language and an ideal technology for distributed computing. So, as we continue to find out more about HBase, remember that each component in the architecture is ultimately a Java object.
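Because every piece of the architecture is exposed through Java objects, we can make the idea of "regions as key ranges" concrete with the HBase Java client. The sketch below creates a table that is pre-split into regions at chosen row-key boundaries; the table name, column family, and split keys are hypothetical, and in practice HBase also splits regions automatically as they grow.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.TableDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTableExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            // Hypothetical table "users" with a single column family "info"
            TableDescriptorBuilder table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("users"))
                    .setColumnFamily(ColumnFamilyDescriptorBuilder.of("info"));

            // Each split key marks a region boundary, so the table starts with
            // four regions covering the row-key ranges
            // (-inf, "g"), ["g", "n"), ["n", "t"), ["t", +inf)
            byte[][] splitKeys = {
                    Bytes.toBytes("g"),
                    Bytes.toBytes("n"),
                    Bytes.toBytes("t")
            };

            admin.createTable(table.build(), splitKeys);
        }
    }
}
```

Each of these regions is then assigned to a RegionServer, which serves reads and writes for that slice of the key space.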
First of all, the second figure gives us a pretty clear picture of what region objects actually look like, generally speaking. The figure also makes it clear that regions separate data into column families and store the data in HDFS using HFile objects. When clients store key-value pairs in the data tables, the keys are processed so that data is stored on the basis of the column family the pair belongs to. As shown in the figure, every column family store object has two more components, which we sketch in code just after this list:
· A read cache known as the BlockCache.
· A write cache known as the MemStore.
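Both caches operate per column family on the server side. The rough sketch below (using the HBase Java client; the family name and values are illustrative, not recommendations) shows the column-family-level settings that influence the read cache, such as enabling the BlockCache, choosing the HFile block size, and marking a family as in-memory so its blocks are favored in the cache.

```java
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptor;
import org.apache.hadoop.hbase.client.ColumnFamilyDescriptorBuilder;
import org.apache.hadoop.hbase.util.Bytes;

public class ColumnFamilyCacheSettings {
    public static ColumnFamilyDescriptor infoFamily() {
        // Illustrative settings that affect the read cache (BlockCache).
        // The MemStore itself is managed by the RegionServer; its flush size
        // is tuned per table or in hbase-site.xml rather than here.
        return ColumnFamilyDescriptorBuilder
                .newBuilder(Bytes.toBytes("info"))
                .setBlockCacheEnabled(true)  // cache HFile blocks read for this family
                .setBlocksize(64 * 1024)     // HFile block size; smaller blocks favor random reads
                .setInMemory(true)           // hint to keep this family's blocks cached aggressively
                .build();
    }
}
```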
The BlockCache helps with random read performance. Data is read in blocks from HDFS and stored in the BlockCache. Subsequent reads of that data, or of data stored in close proximity, are served from RAM instead of disk, which improves overall performance. The Write Ahead Log (WAL, for short) makes sure that our HBase writes are reliable. There is one WAL per RegionServer.
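As a rough illustration of the read path, consider two reads of the same row (assuming the hypothetical "users" table and "info" family from the earlier sketch). The first Get forces the RegionServer to load the relevant HFile block from HDFS into the BlockCache; the second is served from memory.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class BlockCacheReadExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table users = conn.getTable(TableName.valueOf("users"))) {

            Get get = new Get(Bytes.toBytes("alice"));
            get.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"));

            // First read: the RegionServer loads the HFile block from HDFS
            // and places it in the BlockCache.
            Result first = users.get(get);

            // Second read of the same (or a nearby) row: served from the
            // BlockCache in RAM rather than from disk.
            Result second = users.get(get);

            System.out.println(Bytes.toString(second.getValue(
                    Bytes.toBytes("info"), Bytes.toBytes("email"))));
        }
    }
}
```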
Always heed the Iron Law of Distributed Computing: a failure isn't the exception, it's the norm, especially when clustering hundreds or even thousands of servers. Google considered the Iron Law in designing BigTable, and HBase followed suit. Node failures are handled in HBase, and the WAL is a key part of that overall strategy.
When we write or modify data in HBase, the data is first persisted to the WAL, which is stored in HDFS, and then the data is written to the MemStore cache. At configurable intervals, key-value pairs stored in the MemStore are written to HFiles in HDFS, and afterwards the corresponding WAL entries are erased. If a failure occurs after the initial WAL write but before the MemStore contents are flushed to disk, the WAL can be replayed to avoid any data loss.
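To tie the write path to the client API, here is a minimal write sketch (again using the hypothetical "users" table). The durability setting on a mutation controls how the WAL is used for it; with SYNC_WAL, the usual default behavior, the edit is recorded in the WAL before it lands in the MemStore.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Durability;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WalWriteExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table users = conn.getTable(TableName.valueOf("users"))) {

            Put put = new Put(Bytes.toBytes("alice"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("email"),
                          Bytes.toBytes("alice@example.com"));

            // SYNC_WAL forces the edit into the WAL on HDFS before it is added
            // to the MemStore, so it can be replayed after a RegionServer crash.
            // SKIP_WAL would trade that safety for write speed.
            put.setDurability(Durability.SYNC_WAL);

            users.put(put);
        }
    }
}
```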
The figure above shows two HFile objects in each column family. HBase is designed to flush the column family data held in the MemStore to one HFile per flush. Then, at configurable intervals, smaller HFiles are combined into larger HFiles. This strategy sets up the critical compaction operation in HBase, which we cover in the next post.
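Flushes and compactions normally happen automatically in the background, but the Admin API also exposes them directly, which gives a feel for the mechanism just described. A brief sketch (table name hypothetical):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class FlushAndCompactExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {

            TableName users = TableName.valueOf("users");

            // Flush: write each column family's MemStore out to a new HFile in HDFS.
            admin.flush(users);

            // Major compaction: rewrite the accumulated HFiles of each column
            // family into a single larger HFile.
            admin.majorCompact(users);
        }
    }
}
```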
Elena Glibart
13-Apr-2017