
Big Data: HBase as Distributed, Persistent, Multidimensional Sorted Map

marcel ethan3235 09-May-2016

By now we are familiar with the main characteristics of HBase. As defined earlier, HBase is a big data tool in the Hadoop ecosystem, modeled on Google's Bigtable distributed data storage system, which Google defined as a “sparse, distributed, persistent multidimensional sorted map.”

Previously, we saw what “sparse” means and how HBase is designed for sparseness: it wastes no costly storage space on null values, and it lets us add data fields dynamically over time without redesigning the schema or disrupting operations.

Here we examine the remaining characteristics in the definition: how HBase behaves as a distributed, persistent, multidimensional sorted map.

HBase is distributed and persistent

Google's Bigtable is a distributed and persistent data store. Persistent simply means that the data we store in Bigtable (and in HBase, for that matter) remains after our program or session ends. That is straightforward enough, but it is worth spending a little more time on how the data is persisted. In the Bigtable paper, Google explains that Bigtable keeps its data on a distributed file system called the Google File System (GFS). Just as HBase is an open source implementation of Google's Bigtable, HDFS is an open source implementation of GFS. By default, HBase uses HDFS to persist its data to disk. Although HBase can run on other distributed data stores, the vast majority of HBase installations use HDFS. This makes sense given that HBase is the “Hadoop Database”; it is built right into the name.

HDFS is an important enabling technology not only for Hadoop but also for HBase. By storing its data in HDFS, HBase gains reliability, availability, scalability and high performance, all on cost-effective distributed servers.

HBase is a multidimensional sorted map

Let's start with the basics. A map (also called an associative array) is an abstract collection of key-value pairs in which each key is unique. This definition is crucial to understanding HBase because the HBase data model is described in many ways, most often (and incompletely) as a column-oriented store. At its core, HBase is a key-value data store in which each key is unique, meaning that it appears at most once in the store. The map is also sorted and multidimensional. Keys are stored in byte-lexicographical order, so they are compared byte by byte rather than as numbers or locale-aware strings. Every value can have multiple versions, which is what makes the data model multidimensional. By default, versions are identified by a timestamp.
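
To make the idea concrete, here is a minimal in-memory sketch in Python of a sorted, versioned map. It is only a toy model of the concept, not how HBase is implemented and not the HBase client API. The class name `SortedVersionedMap`, its methods, and the single `column` label are hypothetical names chosen for illustration.

```python
from collections import defaultdict


class SortedVersionedMap:
    """Toy model of a sorted, versioned map: row key -> column -> timestamp -> value.

    Hypothetical illustration only; real HBase storage works differently.
    """

    def __init__(self):
        # rows[row_key][column][timestamp] = value
        self._rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key: bytes, column: bytes, timestamp: int, value: bytes) -> None:
        # Writing with a new timestamp adds a version instead of overwriting.
        self._rows[row_key][column][timestamp] = value

    def get(self, row_key: bytes, column: bytes, timestamp=None):
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._rows.get(row_key, {}).get(column, {})
        candidates = [ts for ts in versions if timestamp is None or ts <= timestamp]
        if not candidates:
            return None
        return versions[max(candidates)]

    def row_keys(self):
        # Python compares bytes objects byte by byte, which gives
        # byte-lexicographical order.
        return sorted(self._rows)


table = SortedVersionedMap()
table.put(b"row2", b"name", 1, b"first")
table.put(b"row10", b"name", 1, b"old")
table.put(b"row10", b"name", 5, b"new")

# b"row10" sorts before b"row2" because the byte "1" is smaller than "2".
print(table.row_keys())
# Without a timestamp, the newest version is returned.
print(table.get(b"row10", b"name"))
# With a timestamp, the newest version at or before it is returned.
print(table.get(b"row10", b"name", timestamp=3))
```

Note how the ordering differs from numeric ordering: a key such as `row10` comes before `row2`. This is why row keys that embed numbers are often zero-padded (for example `row02`, `row10`) when numeric order matters.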

