Generate HBase data files (StoreFiles) from the data source using a custom MapReduce job. The StoreFiles are written in HBase's internal format, which can be loaded efficiently.
The prepared files are then imported into a running cluster with a tool such as completebulkload. Each file is loaded into one specific region.
32.How does HBase provide high availability?
HBase provides high availability for reads through a feature called region replication. With this feature, each region of a table has multiple replicas that are opened on different RegionServers. The load balancer ensures that replicas of the same region are not co-hosted on the same RegionServer.
33.What is HMaster?
HMaster is the implementation of the Master server. It is responsible for monitoring all RegionServer instances in the cluster and is the interface for all metadata changes. In a distributed cluster, it typically runs on the NameNode.
34.What is HRegionServer in HBase?
HRegionServer is the RegionServer implementation. It is responsible for serving and managing regions. In a distributed cluster, a RegionServer runs on a DataNode.
35.What are the different Block Caches in HBase?
HBase provides two different BlockCache implementations: the default on-heap LruBlockCache and the BucketCache, which is (usually) off-heap.
36.How does WAL help when a RegionServer crashes?
The Write Ahead Log (WAL) records all changes to data in HBase to file-based storage. If a RegionServer crashes or becomes unavailable before the MemStore is flushed, the WAL ensures that the changes to the data can be replayed.
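The recovery idea can be illustrated with a small, self-contained Java sketch. It is a toy model under stated assumptions, not HBase code: the class WalSketch and its methods are hypothetical, an in-memory list stands in for the file-based WAL, and a map stands in for the MemStore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of write-ahead logging. All names here are hypothetical
// and are not part of the HBase API.
final class WalSketch {
    // Stands in for the durable, file-based WAL: survives a crash.
    private final List<String[]> wal = new ArrayList<>();
    // Stands in for the in-memory MemStore: lost on a crash.
    private Map<String, String> memStore = new TreeMap<>();

    // Record the edit in the log first, then apply it in memory.
    void put(String row, String value) {
        wal.add(new String[] {row, value});
        memStore.put(row, value);
    }

    // Simulate a crash that happens before the MemStore was flushed.
    void crash() {
        memStore = new TreeMap<>();
    }

    // Rebuild the in-memory state by replaying the logged edits in order.
    void replay() {
        for (String[] edit : wal) {
            memStore.put(edit[0], edit[1]);
        }
    }

    String get(String row) {
        return memStore.get(row);
    }

    public static void main(String[] args) {
        WalSketch server = new WalSketch();
        server.put("row1", "a");
        server.put("row1", "b");
        server.crash();
        System.out.println(server.get("row1")); // prints "null": the edits are gone from memory
        server.replay();
        System.out.println(server.get("row1")); // prints "b": recovered from the log
    }
}
```

Because every edit is logged before it is applied in memory, replaying the log in order reproduces the state that existed just before the crash.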
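37.Why is MultiWAL needed?
With a single WAL per RegionServer, the RegionServer must write to the WAL serially, because HDFS files must be sequential. This makes the WAL a performance bottleneck. MultiWAL lets a RegionServer write multiple WAL streams in parallel, which increases total write throughput.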
38.In HBase, what is log splitting?
All regions on a RegionServer share the same WAL file, and each edit in the WAL records which region it belongs to. When a region is opened, the edits in the WAL file that belong to that region need to be replayed. Therefore, edits in the WAL file must be grouped by region so that particular sets can be replayed to regenerate the data in a particular region. The process of grouping the WAL edits by region is called log splitting.
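The grouping step described above can be sketched in plain Java. This is only an illustration of grouping by region, not HBase's log-splitting implementation: the WalEdit class, its fields, and the split method are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only. WalEdit and split are hypothetical names,
// not part of the HBase API.
final class LogSplitSketch {
    // A WAL entry: the region it belongs to plus an opaque edit.
    static final class WalEdit {
        final String region;
        final String edit;

        WalEdit(String region, String edit) {
            this.region = region;
            this.edit = edit;
        }
    }

    // Group edits by region, keeping the original order within each region
    // so that each group can later be replayed in sequence.
    static Map<String, List<String>> split(List<WalEdit> wal) {
        Map<String, List<String>> byRegion = new LinkedHashMap<>();
        for (WalEdit e : wal) {
            byRegion.computeIfAbsent(e.region, r -> new ArrayList<>()).add(e.edit);
        }
        return byRegion;
    }

    public static void main(String[] args) {
        List<WalEdit> wal = Arrays.asList(
                new WalEdit("regionA", "put r1"),
                new WalEdit("regionB", "put r9"),
                new WalEdit("regionA", "delete r1"));
        System.out.println(split(wal));
        // prints {regionA=[put r1, delete r1], regionB=[put r9]}
    }
}
```

Each value in the resulting map is the ordered set of edits that would be replayed for that one region.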
39.How can you disable WAL? What is the benefit?
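The WAL can be disabled to improve write performance when it is a bottleneck. The trade-off is that edits which have not yet been flushed from the MemStore are lost if the RegionServer crashes.
This is done by calling the HBase client field Mutation.writeToWAL(false). Newer client versions express the same setting with Mutation.setDurability(Durability.SKIP_WAL).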
40.When do we do manual region splitting?
Manual region splitting is done when a table has an unexpected hotspot, for example because many clients are querying the same part of the table.