Hadoop Interview Questions and Answers Part-1

1.What does the ‘jps’ command do?
It reports the status of the daemons that run a Hadoop cluster. jps lists the Java processes running on the machine, so its output shows whether the NameNode, DataNode, Secondary NameNode, JobTracker and TaskTracker are up.
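
As a rough sketch of how jps output can be used, the Python snippet below parses jps-style text and reports which of the daemons named above are missing. The sample text and the function names running_daemons and missing_daemons are illustrative assumptions; the only thing taken from jps itself is that each line has the form "<pid> <class name>".

```python
# Daemon names as they appear in jps output on a Hadoop 1.x node.
EXPECTED_DAEMONS = {
    "NameNode", "DataNode", "SecondaryNameNode", "JobTracker", "TaskTracker",
}


def running_daemons(jps_output: str) -> set:
    """Return the Hadoop daemon names found in jps-style output."""
    found = set()
    for line in jps_output.splitlines():
        parts = line.split()
        # Each jps line is "<pid> <class name>"; ignore anything else.
        if len(parts) >= 2 and parts[1] in EXPECTED_DAEMONS:
            found.add(parts[1])
    return found


def missing_daemons(jps_output: str) -> set:
    """Return the expected daemons that do not appear in the output."""
    return EXPECTED_DAEMONS - running_daemons(jps_output)


# Hypothetical sample, not captured from a real cluster.
SAMPLE = """\
4821 NameNode
4960 DataNode
5102 SecondaryNameNode
5377 Jps
"""

print(sorted(missing_daemons(SAMPLE)))  # ['JobTracker', 'TaskTracker']
```

In practice the text would come from running jps on the node rather than from a hard-coded string.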

2.How to restart Namenode?
Option 1. Run stop-all.sh and then run start-all.sh, OR

Option 2. Type sudo hdfs (press enter), su-hdfs (press enter), /etc/init.d/ha (press enter) and then /etc/init.d/hadoop-0.20-namenode start (press enter).

3.Which are the three modes in which Hadoop can be run?
The three modes in which Hadoop can be run are −

Standalone (local) mode
Pseudo-distributed mode
Fully distributed mode

What does /etc/init.d do?
/etc/init.d is the directory that holds the scripts used to start and stop daemons (services) and to check their status. It is specific to Linux and has nothing to do with Hadoop.

4.What if a Namenode has no data?
It cannot be part of the Hadoop cluster.

5.What happens to job tracker when Namenode is down?
When the NameNode is down, the cluster is effectively off, so the JobTracker cannot do any useful work. This is because the NameNode is the single point of failure in HDFS.

6.What is Big Data?
Big Data is a collection of data so large and complex that it becomes very difficult to capture, store, process, retrieve and analyze with on-hand database management tools or traditional data processing techniques.

7.What are the four characteristics of Big Data?
The four characteristics of Big Data are −
Volume − Facebook generating 500+ terabytes of data per day.

Velocity − Analyzing 2 million records each day to identify the reason for losses.

Variety − Images, audio, video, sensor data, log files, etc.

Veracity − Biases, noise and abnormality in data.

8.How is analysis of Big Data useful for organizations?
Effective analysis of Big Data gives organizations a business advantage, because they learn which areas to focus on and which areas are less important. It provides early key indicators that can prevent the company from a huge loss or help it seize a great opportunity. A precise analysis of Big Data also helps in decision making. For instance, people nowadays rely heavily on Facebook and Twitter before buying any product or service, all thanks to the Big Data explosion.

9.Why do we need Hadoop?
Every day a large amount of unstructured data gets dumped into our machines. The major challenge is not storing large data sets but retrieving and analyzing them, especially when the data sits on different machines at different locations. This is where the need for Hadoop arises. Hadoop can analyze data spread across different machines at different locations quickly and cost-effectively. It uses MapReduce, which divides a query into small parts and processes them in parallel. This is also known as parallel computing.
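
The idea of dividing a job into small parts and processing them in parallel can be sketched in plain Python. This is only an illustration of the map and reduce phases on one machine, not Hadoop's API: the function names, the sample lines and the use of a thread pool in place of cluster nodes are all assumptions made for the example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def map_split(lines):
    """Map phase: count the words in one split of the input."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def merge_counts(left, right):
    """Reduce phase: merge two partial word counts."""
    return left + right


def word_count(lines, num_splits=3):
    # Divide the input into splits; on a cluster each split
    # would be processed on a different machine.
    splits = [lines[i::num_splits] for i in range(num_splits)]
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        partials = list(pool.map(map_split, splits))
    # Combine the partial results into the final answer.
    return reduce(merge_counts, partials, Counter())


# Hypothetical input lines.
LINES = [
    "big data needs hadoop",
    "hadoop stores big data",
    "hadoop processes data in parallel",
]

result = word_count(LINES)
print(result["hadoop"], result["big"])  # 3 2
```

The result does not depend on how the input is split, which is what lets the work be spread over many machines.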

10.What is the basic difference between traditional RDBMS and Hadoop?
A traditional RDBMS is used in transactional systems to report on and archive data, whereas Hadoop is an approach to storing huge amounts of data in a distributed file system and processing it there. An RDBMS is useful when you want to seek a single record out of a large data set, whereas Hadoop is useful when you want to take in the whole data set in one shot and analyze it later.
