HBase Interview Questions and Answers For Freshers Part-1
1. What is Apache HBase?
Apache HBase is a column-oriented database used to store sparse data sets. It runs on top of the Hadoop Distributed File System (HDFS) in a Hadoop cluster. Clients can access HBase data through a native Java API or through a Thrift or REST gateway, which makes it accessible from any language. Some of the key properties of HBase include:
• NoSQL: HBase is not a traditional relational database (RDBMS). HBase relaxes the ACID (Atomicity, Consistency, Isolation, Durability) properties of traditional RDBMS systems in order to achieve much greater scalability. Data stored in HBase also does not need to fit into a rigid schema as it does in an RDBMS, which makes HBase well suited to storing unstructured or semi-structured data.
• Wide-Column: HBase stores data in a table-like format that can hold billions of rows with millions of columns. Columns are grouped into “column families”, and each column family is stored separately on disk, so columns that are usually read together can be kept together.
• Distributed and Scalable: HBase groups rows into “regions”, which define how table data is split over multiple nodes in a cluster. If a region gets too large, it is automatically split to share the load across more servers.
• Consistent: HBase is architected to have “strongly-consistent” reads and writes, as opposed to some other NoSQL databases that are “eventually-consistent”. This means that once a write has been performed, all read requests for that data will return the same value.
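The wide-column and region ideas above can be sketched with plain Java collections. This is a toy in-memory model, not the HBase client API: the class name RegionSketch, the server names, and the split key "m" are all made up for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy in-memory model of sparse rows and regions; it is NOT the HBase client API.
class RegionSketch {
    // Sparse rows: row key -> ("family:qualifier" -> value).
    // A column that a row does not have is simply absent; nothing is stored for it.
    final TreeMap<String, Map<String, String>> rows = new TreeMap<>();

    // Region start key -> hypothetical server name.
    // The empty string is the start key of the first region.
    final TreeMap<String, String> regions = new TreeMap<>();

    void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // A row belongs to the region with the greatest start key <= its row key.
    String serverFor(String rowKey) {
        return regions.floorEntry(rowKey).getValue();
    }

    public static void main(String[] args) {
        RegionSketch table = new RegionSketch();
        table.regions.put("", "server-a");   // rows that sort before "m"
        table.regions.put("m", "server-b");  // rows from "m" onward

        // Each row stores only the columns it actually has.
        table.put("alice", "info:email", "alice@example.com");
        table.put("zed", "info:phone", "555-0100");

        System.out.println(table.serverFor("alice"));                          // server-a
        System.out.println(table.serverFor("zed"));                            // server-b
        System.out.println(table.rows.get("alice").containsKey("info:phone")); // false
    }
}
```

Because row keys are kept in sorted order, splitting a region in this model only means adding one more start key to the regions map.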
2. What are the key components of HBase?
The key components of HBase are ZooKeeper, RegionServer, Region, Catalog Tables and HBase Master.
3. What is S3?
S3 stands for Simple Storage Service, Amazon's object storage service. It is one of the file systems that HBase can use.
4. What is the use of get() method?
The get() method is used to read the data for a single row from a table.
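In the real Java client the read goes through Table.get(Get) and returns a Result. The sketch below does not use that API; it is a toy in-memory model (the class GetSketch and its methods are hypothetical) that only illustrates the idea of reading one row by key and optionally narrowing the read to a single column.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy model of a single-row read; it is NOT the HBase client API.
class GetSketch {
    // row key -> column family -> qualifier -> value
    private final Map<String, Map<String, Map<String, String>>> table = new TreeMap<>();

    void put(String row, String family, String qualifier, String value) {
        table.computeIfAbsent(row, r -> new TreeMap<>())
             .computeIfAbsent(family, f -> new TreeMap<>())
             .put(qualifier, value);
    }

    // Read a whole row: every family and column stored under that key.
    // An unknown row key yields an empty map.
    Map<String, Map<String, String>> get(String row) {
        return table.getOrDefault(row, Collections.emptyMap());
    }

    // Read one cell; returns null when the row or the column does not exist.
    String get(String row, String family, String qualifier) {
        return get(row).getOrDefault(family, Collections.emptyMap()).get(qualifier);
    }

    public static void main(String[] args) {
        GetSketch users = new GetSketch();
        users.put("u1", "info", "name", "Ada");

        System.out.println(users.get("u1", "info", "name"));  // Ada
        System.out.println(users.get("u1", "info", "email")); // null
        System.out.println(users.get("u1").keySet());         // [info]
    }
}
```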
5. What is the reason for using HBase?
HBase is used because it provides random read and write operations and can perform a large number of operations per second on large data sets.
6. In how many modes can HBase run?
HBase has two run modes: standalone and distributed.
7. What is the difference between Hive and HBase?
HBase supports record-level operations, whereas Hive does not support record-level operations.
8. What are column families?
A column family is a collection of columns, whereas a row is a collection of column families.
9. What is standalone mode in HBase?
It is the default mode of HBase. In standalone mode, HBase does not use HDFS. It uses the local filesystem instead, and it runs all HBase daemons and a local ZooKeeper in the same JVM process.
10. What are decorating filters?
Decorating filters wrap another filter to modify or extend its behavior, which gives additional control over the returned data. SkipFilter and WhileMatchFilter are examples.
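The decorating idea can be sketched without the HBase Filter API. In the toy model below, the names ScanFilter, WhileMatch and DecoratingFilterSketch are hypothetical. WhileMatch wraps any other filter and ends the scan at the first value that filter rejects, which is roughly the behavior HBase's WhileMatchFilter adds to the filter it wraps.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy filter contract; it is NOT the HBase Filter API.
interface ScanFilter {
    boolean accept(int value);                 // should this value be returned?
    default boolean done() { return false; }   // should the scan stop early?
}

// Decorator: keeps the wrapped filter's decision but stops the scan
// as soon as the wrapped filter rejects a value.
class WhileMatch implements ScanFilter {
    private final ScanFilter inner;
    private boolean stopped = false;

    WhileMatch(ScanFilter inner) { this.inner = inner; }

    @Override
    public boolean accept(int value) {
        if (!inner.accept(value)) {
            stopped = true;
        }
        return !stopped;
    }

    @Override
    public boolean done() { return stopped || inner.done(); }
}

class DecoratingFilterSketch {
    // Walks the values in order, honoring both accept() and done().
    static List<Integer> scan(List<Integer> values, ScanFilter filter) {
        List<Integer> out = new ArrayList<>();
        for (int v : values) {
            if (filter.done()) break;
            if (filter.accept(v)) out.add(v);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 9, 3);
        ScanFilter lessThanFive = v -> v < 5;

        System.out.println(scan(values, lessThanFive));                 // [1, 2, 3]
        System.out.println(scan(values, new WhileMatch(lessThanFive))); // [1, 2]
    }
}
```

The wrapped filter is unchanged in both scans; only the decorator decides that the scan should end once 9 is rejected.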