
Hadoop Interview Questions and Answers Part-5


41. What does “file could only be replicated to 0 nodes, instead of 1” mean?
The NameNode does not have any available DataNodes to write the block to. This typically happens when no DataNode is running, reachable, or has free space.

42. What is a Combiner?
The Combiner is a ‘mini-reduce’ process that operates only on data generated by a mapper. It receives as input all the data emitted by the Mapper instances on a given node. The output of the Combiner, rather than the raw output of the Mappers, is then sent to the Reducers.
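As a rough illustration, the sketch below simulates a word count in plain Python rather than with the Hadoop API. The function names (map_words, combine, reduce_counts) and the two input strings are made up for this example, and each string stands in for the input of one mapper. It only aims to show that a combiner pre-aggregates mapper output so that fewer records reach the reducer.

```python
from collections import Counter, defaultdict

def map_words(line):
    """Mapper: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def combine(pairs):
    """Combiner: sum counts locally, over one mapper's output only."""
    local = Counter()
    for word, count in pairs:
        local[word] += count
    return list(local.items())

def reduce_counts(pairs):
    """Reducer: sum counts for each word across all mappers."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

# Two hypothetical inputs, one per mapper.
splits = ["to be or not to be", "to do or not to do"]

mapped = [map_words(s) for s in splits]
combined = [combine(m) for m in mapped]

# Number of records that would be shuffled to the reducer in each case.
records_without_combiner = sum(len(m) for m in mapped)
records_with_combiner = sum(len(c) for c in combined)

# The final result is the same either way; only the shuffle volume changes.
result = reduce_counts(pair for c in combined for pair in c)
```

This works because summing is associative and commutative, so partial sums can be merged in any order. Hadoop does not guarantee that a combiner runs at all, so a job should produce the same result with or without it.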

Consider the following scenario in a MapReduce system:
– HDFS block size is 64 MB
– Input format is FileInputFormat
– We have 3 files of size 64 KB, 65 MB and 127 MB

43. How many input splits will the Hadoop framework make?

Counting one split per HDFS block, Hadoop will make 5 splits:

– 1 split for the 64 KB file
– 2 splits for the 65 MB file
– 2 splits for the 127 MB file
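The count above can be checked with a short calculation. The sketch below is plain Python, not Hadoop code, and its function names are hypothetical. It assumes the split size equals the block size and that all files are non-empty. It also models one detail of FileInputFormat that the per-block count ignores: the implementation allows the last split of a file to exceed the split size by up to 10% (a slop factor of 1.1 in the Hadoop source), so a 65 MB file may in practice be read as a single split.

```python
MB = 1024 * 1024
SPLIT_SLOP = 1.1  # the last split may be up to 10% larger than split_size

def count_by_block(file_size, block_size):
    """One split per HDFS block the file occupies (ceiling division)."""
    return -(-file_size // block_size)

def count_with_slop(file_size, split_size):
    """Carve off full-size splits while the remainder exceeds the slop limit,
    then put whatever is left into one final split."""
    splits = 0
    remaining = file_size
    while remaining / split_size > SPLIT_SLOP:
        remaining -= split_size
        splits += 1
    if remaining > 0:
        splits += 1
    return splits

block = 64 * MB
files = {"64 KB": 64 * 1024, "65 MB": 65 * MB, "127 MB": 127 * MB}

by_block = {name: count_by_block(size, block) for name, size in files.items()}
with_slop = {name: count_with_slop(size, block) for name, size in files.items()}

total_by_block = sum(by_block.values())    # 1 + 2 + 2
total_with_slop = sum(with_slop.values())  # 1 + 1 + 2
```

Under these assumptions the per-block count gives the 5 splits listed above, while the slop-aware count gives 4, because the 1 MB tail of the 65 MB file is folded into the first split.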

44. Suppose Hadoop spawned 100 tasks for a job and one of the tasks failed. What will Hadoop do?
It will restart the task on some other TaskTracker. Only once the task has failed on all of its allowed attempts (four by default, and configurable) will Hadoop fail the whole job.
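The retry behaviour can be sketched as a simple loop. This is a toy model in plain Python, not how the JobTracker is implemented: the tracker names, the two sample tasks and the function run_with_retries are all hypothetical, and the limit of four attempts is taken from the answer above.

```python
MAX_ATTEMPTS = 4  # assumed default, taken from the answer above

def run_with_retries(task, trackers, max_attempts=MAX_ATTEMPTS):
    """Try the task on successive trackers.

    Returns (succeeded, attempts_used).
    """
    for attempt in range(1, max_attempts + 1):
        # Pick a different tracker for each attempt, wrapping around if needed.
        tracker = trackers[(attempt - 1) % len(trackers)]
        try:
            task(tracker)
            return True, attempt
        except RuntimeError:
            continue  # this attempt failed; try the next tracker
    return False, max_attempts

def flaky_task(tracker):
    """Hypothetical task that fails on two bad trackers and succeeds elsewhere."""
    if tracker in {"tt1", "tt2"}:
        raise RuntimeError(f"task failed on {tracker}")

def doomed_task(tracker):
    """Hypothetical task that fails everywhere, e.g. because of a bug in the job."""
    raise RuntimeError(f"task failed on {tracker}")

trackers = ["tt1", "tt2", "tt3"]
flaky_ok, flaky_attempts = run_with_retries(flaky_task, trackers)
doomed_ok, doomed_attempts = run_with_retries(doomed_task, trackers)
```

In this model the flaky task succeeds on its third attempt and the job carries on, while the doomed task uses up all four attempts, at which point the job would be failed.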

45. What are the problems with small files and HDFS?
HDFS is not good at handling a large number of small files. Every file, directory and block in HDFS is represented as an object in the NameNode’s memory, and each object occupies approximately 150 bytes. So 10 million files, each using one block, would use about 3 gigabytes of memory. At a billion files, the memory requirement can no longer be met by a single NameNode.
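The arithmetic behind these figures can be written out as follows. This is a back-of-the-envelope sketch in Python that uses the approximate 150 bytes per object quoted above and ignores directories; the function name is made up for the example.

```python
BYTES_PER_OBJECT = 150  # approximate figure quoted above

def namenode_bytes(num_files, blocks_per_file=1):
    """Estimate NameNode memory: one object per file plus one per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT

ten_million = namenode_bytes(10_000_000)    # 20 million objects
one_billion = namenode_bytes(1_000_000_000)  # 2 billion objects

ten_million_gb = ten_million / 10**9   # 3.0 GB
one_billion_gb = one_billion / 10**9   # 300.0 GB
```

Ten million single-block files give 20 million objects, or about 3 GB. A billion such files would need roughly 300 GB of NameNode memory on this estimate.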

46. What is speculative execution in Hadoop?
If a node appears to be running a task slowly, the master node can launch a redundant instance of the same task on another node. The output of whichever instance finishes first is used, and the other instance is killed. This process is called speculative execution.

47. Can Hadoop handle streaming data?
Yes. Technologies such as Apache Kafka, Apache Flume, and Apache Spark make it possible to do large-scale streaming.

48. Why is checkpointing important in Hadoop?
As more and more files are added, the NameNode accumulates a large edit log, which can substantially delay NameNode startup because the NameNode has to reapply all the edits. Checkpointing is a process that takes an fsimage and an edit log and compacts them into a new fsimage. This way, instead of replaying a potentially unbounded edit log, the NameNode can load the final in-memory state directly from the fsimage. This is a far more efficient operation and reduces NameNode startup time.
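The idea can be sketched with a toy model in Python. Here the fsimage is reduced to a set of paths and the edit log to a list of (operation, path) pairs; the real on-disk formats are far richer, and the names apply_edit and checkpoint are invented for this illustration.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)

def checkpoint(fsimage, edit_log):
    """Replay the edit log on top of the fsimage once.

    Returns the compacted fsimage and an empty edit log.
    """
    namespace = set(fsimage)
    for edit in edit_log:
        apply_edit(namespace, edit)
    return frozenset(namespace), []

# Hypothetical starting state: an old snapshot plus edits made since then.
fsimage = frozenset({"/a", "/b"})
edit_log = [("create", "/c"), ("delete", "/a"), ("create", "/d")]

new_fsimage, new_edit_log = checkpoint(fsimage, edit_log)
```

After the checkpoint, a restart only needs to load new_fsimage and replay whatever edits have accumulated since, rather than the whole history.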

49. What is Twitter Bootstrap?
Bootstrap is a sleek, intuitive, and powerful mobile-first front-end framework for faster and easier web development. It uses HTML, CSS and JavaScript.

50. What is the Bootstrap Grid System?
Bootstrap includes a responsive, mobile-first fluid grid system that appropriately scales up to 12 columns as the device or viewport size increases. It includes predefined classes for easy layout options, as well as powerful mixins for generating more semantic layouts.
