Hadoop Interview Questions and Answers Part-4
hadoop fs -setrep -w 2 apache_hadoop/sample.txt
32.What is rack awareness?
Rack awareness is the way in which the name-node decides how to place block replicas based on rack definitions. Hadoop tries to minimize network traffic between data-nodes within the same rack and will only contact remote racks if it has to. The name-node is able to control this because of rack awareness.
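Rack awareness is typically enabled by pointing the name-node at a topology script (via the net.topology.script.file.name property in core-site.xml). A minimal sketch of such a script is below; the IP ranges and rack names are assumptions for illustration, not values from this document. Hadoop invokes the script with one or more IPs/hostnames and expects one rack path per line −

```shell
#!/bin/bash
# Hypothetical topology script for rack awareness.
# The subnets and rack paths below are illustrative assumptions.
rack_of() {
  case "$1" in
    10.1.1.*) echo "/dc1/rack1" ;;
    10.1.2.*) echo "/dc1/rack2" ;;
    *)        echo "/default-rack" ;;  # fallback rack for unknown hosts
  esac
}

# Hadoop passes hosts as arguments; print one rack path per host.
for host in "$@"; do
  rack_of "$host"
done
```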
33.Which file holds the Hadoop core configuration?
The core configuration lives in core-site.xml (hadoop-site.xml in older releases).
Is there an HDFS command to see the available free space in HDFS?
hadoop dfsadmin -report
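The free-space figure can be pulled out of the report with standard text tools (hdfs dfs -df -h also prints capacity and remaining space directly). A small sketch follows; the report text below is a hypothetical sample, not output from a real cluster −

```shell
# Extract the "DFS Remaining" line from a dfsadmin -report dump.
# The report text here is an illustrative sample, not real cluster output.
report='Configured Capacity: 1000000 (976.56 KB)
DFS Remaining: 750000 (732.42 KB)
DFS Used: 250000 (244.14 KB)'

# Split each line on ": " and print the value of the DFS Remaining line.
free=$(printf '%s\n' "$report" | awk -F': ' '/^DFS Remaining/ {print $2}')
echo "$free"
```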
34.The requirement is to add a new data node to a running Hadoop cluster; how do I start services on just one data node?
You do not need to shutdown and/or restart the entire cluster in this case.
First, add the new node’s DNS name to the conf/slaves file on the master node.
Then log in to the new slave node and execute −
$ cd path/to/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker
Then, from the master, issue hadoop dfsadmin -refreshNodes and hadoop mradmin -refreshNodes so that the NameNode and JobTracker know of the additional node that has been added.
35.How do you gracefully stop a running job?
hadoop job -kill <job-id>
36.Does the name-node stay in safe mode till all under-replicated files are fully replicated?
No. During safe mode, replication of blocks is prohibited. The name-node waits until all, or a majority of, data-nodes report their blocks.
37.What happens if one Hadoop client renames a file or a directory containing this file while another client is still writing into it?
A file will appear in the name space as soon as it is created. If a writer is writing to a file and another client renames either the file itself or any of its path components, then the original writer will get an IOException either when it finishes writing to the current block or when it closes the file.
38.How to make a large cluster smaller by taking out some of the nodes?
Hadoop offers the decommission feature to retire a set of existing data-nodes. The nodes to be retired should be included into the exclude file, and the exclude file name should be specified as a configuration parameter dfs.hosts.exclude.
The decommission process can be terminated at any time by editing the configuration or the exclude files and repeating the -refreshNodes command.
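A minimal sketch of the configuration follows; the file path is an assumption for illustration, not a value from this document −

```xml
<!-- hdfs-site.xml: point dfs.hosts.exclude at a file listing nodes to retire. -->
<!-- The path below is a hypothetical example. -->
<property>
  <name>dfs.hosts.exclude</name>
  <value>/etc/hadoop/conf/dfs.exclude</value>
</property>
```

After adding the hostnames of the data-nodes to be retired to the exclude file (one per line), run hadoop dfsadmin -refreshNodes to start the decommission.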
39.Can we search for files using wildcards?
Yes. For example, to list all the files which begin with the letter a, you could use the ls command with the * wildcard −
hdfs dfs -ls a*
40.What happens when two clients try to write into the same HDFS file?
HDFS supports exclusive writes only.
When the first client contacts the name-node to open the file for writing, the name-node grants a lease to the client to create this file. When the second client tries to open the same file for writing, the name-node will see that the lease for the file is already granted to another client, and will reject the open request for the second client.