1. What is the role of the JDBC driver in a Sqoop setup?
To connect to different relational databases, Sqoop needs a way to talk to each of them. Almost every database vendor provides this access as a JDBC driver specific to its database, so Sqoop needs the JDBC driver of every database it interacts with.
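To illustrate "one driver per database", the hypothetical lookup below maps the subprotocol of a JDBC URL to the driver class that the vendor ships. The class names are those of the MySQL Connector/J 8.x, PostgreSQL, Oracle and Microsoft SQL Server drivers. The function name and host names are made up, and this is not how Sqoop resolves drivers internally.

```python
# Hypothetical lookup: JDBC subprotocol -> vendor-specific driver class.
# Each class lives in that vendor's own driver JAR, which must be available to Sqoop.
DRIVERS = {
    "mysql": "com.mysql.cj.jdbc.Driver",
    "postgresql": "org.postgresql.Driver",
    "oracle": "oracle.jdbc.OracleDriver",
    "sqlserver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
}


def driver_for(jdbc_url: str) -> str:
    """Return the driver class for a URL such as 'jdbc:postgresql://host/db'."""
    scheme, _, rest = jdbc_url.partition(":")
    if scheme != "jdbc" or not rest:
        raise ValueError(f"not a JDBC URL: {jdbc_url!r}")
    # The subprotocol is the part between 'jdbc:' and the next ':'.
    subprotocol = rest.split(":", 1)[0]
    try:
        return DRIVERS[subprotocol]
    except KeyError:
        raise ValueError(f"no driver known for database type {subprotocol!r}") from None


if __name__ == "__main__":
    print(driver_for("jdbc:postgresql://db.example.com/shop"))
```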
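2. Is the JDBC driver enough to connect Sqoop to a database?
No. Sqoop needs both the JDBC driver and a connector. The driver gives low-level access to the database, while the connector is the Sqoop component that uses the driver to work with that particular database. Sqoop ships with a generic JDBC connector as well as specialized connectors for some databases.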
3. When should you use --target-dir and when --warehouse-dir while importing data?
Use --target-dir to name the exact HDFS directory that a single import writes to. Use --warehouse-dir to name a parent directory shared by Sqoop imports; under it, Sqoop creates a subdirectory with the same name as the table.
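The rule can be modelled with a small hypothetical helper that returns where a single-table import lands. The table name and paths are made up. The "neither option" case follows the Sqoop user guide, which places the table directory under the user's HDFS home directory, and the two options are rejected together because Sqoop does not accept both in one import.

```python
import posixpath
from typing import Optional


def import_output_dir(table: str, home_dir: str,
                      target_dir: Optional[str] = None,
                      warehouse_dir: Optional[str] = None) -> str:
    """Model where a single-table `sqoop import` writes its files in HDFS."""
    if target_dir and warehouse_dir:
        # Sqoop does not accept both options in the same import.
        raise ValueError("--target-dir and --warehouse-dir cannot be combined")
    if target_dir:
        # --target-dir: the files go exactly into this directory.
        return target_dir
    # --warehouse-dir (or, by default, the user's HDFS home directory) is a parent;
    # Sqoop creates a subdirectory named after the table inside it.
    parent = warehouse_dir if warehouse_dir else home_dir
    return posixpath.join(parent, table)


if __name__ == "__main__":
    print(import_output_dir("orders", "/user/alice", target_dir="/data/orders_2024"))
    print(import_output_dir("orders", "/user/alice", warehouse_dir="/data/warehouse"))
```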
4. How can you import only a subset of rows from a table?
By adding the --where argument to the sqoop import command (for example --where "id > 400"), only the rows that satisfy the condition are imported.
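A minimal sketch that builds the argument list for such an import. The helper name, connection URL, table and condition are placeholders; shlex.join is used only to show how the condition must be quoted as a single shell argument.

```python
import shlex


def where_import_args(jdbc_url: str, table: str, condition: str) -> list:
    # Only rows matching `condition` are imported; Sqoop uses it as the
    # WHERE clause of the queries it runs against the source table.
    return ["sqoop", "import",
            "--connect", jdbc_url,
            "--table", table,
            "--where", condition]


if __name__ == "__main__":
    args = where_import_args("jdbc:mysql://db.example.com/shop", "orders", "id > 400")
    # shlex.join quotes the condition so the shell passes it as one argument.
    print(shlex.join(args))
```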
5. How can you import a subset of rows from a table without using the --where argument?
Run the filtering query in the database itself and save the result to a temporary table there. Then import that temporary table with a plain sqoop import command, without the --where argument.
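The two steps can be sketched with SQLite from Python's standard library standing in for the source database. The orders table, the amount > 100 filter, the staging table name orders_subset and the JDBC URL are all hypothetical. The filter still lives in SQL, but the sqoop command itself no longer needs --where.

```python
import sqlite3


def stage_subset(conn: sqlite3.Connection) -> int:
    """Step 1: run the filter inside the database and keep the result in a staging table."""
    conn.execute("DROP TABLE IF EXISTS orders_subset")
    conn.execute("CREATE TABLE orders_subset AS SELECT * FROM orders WHERE amount > 100")
    conn.commit()
    (count,) = conn.execute("SELECT COUNT(*) FROM orders_subset").fetchone()
    return count


def staging_import_args(jdbc_url: str) -> list:
    """Step 2: import the whole staging table; no --where argument is needed."""
    return ["sqoop", "import", "--connect", jdbc_url, "--table", "orders_subset"]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)",
                     [(1, 50.0), (2, 150.0), (3, 300.0)])
    print(stage_subset(conn))  # 2 rows staged
    print(" ".join(staging_import_args("jdbc:mysql://db.example.com/shop")))
```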
6. What is the advantage of using --password-file rather than the -P option to keep the password out of the sqoop import command?
Both keep the password off the command line, but -P prompts for it on the console, so someone has to type it in, which prevents automation. --password-file reads the password from a file, so it can be used inside a Sqoop script or a scheduled job.
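A sketch of preparing such a file, assuming a POSIX system. Two details come from the Sqoop user guide: the file should be readable only by its owner (400 permissions), and Sqoop uses the file's entire content as the password, so it must not end with a newline. The helper names, user name, password and paths are placeholders; in a real setup the file would sit in the user's home directory (on the local file system or in HDFS) and that path would be passed to --password-file.

```python
import os
import tempfile


def write_password_file(path: str, password: str) -> None:
    """Store `password` for use with sqoop's --password-file option."""
    # Create the file for the owner only, refusing to overwrite an existing file.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        # Sqoop uses the whole file content as the password, so no trailing newline.
        f.write(password)
    # Read-only for the owner, the permission the Sqoop user guide recommends.
    os.chmod(path, 0o400)


def password_file_import_args(jdbc_url: str, user: str, table: str,
                              password_file: str) -> list:
    # The password itself never appears on the command line.
    return ["sqoop", "import", "--connect", jdbc_url, "--username", user,
            "--password-file", password_file, "--table", table]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        pw_path = os.path.join(d, ".sqoop.password")
        write_password_file(pw_path, "s3cret")
        print(password_file_import_args("jdbc:mysql://db.example.com/shop",
                                        "etl", "orders", pw_path))
```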
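7. What is the default extension of the files produced by a sqoop import that uses the --compress parameter?
.gz, because Sqoop compresses the output with the gzip codec unless another codec is specified.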
8. What is the significance of the --compression-codec parameter?
It selects a compression codec other than the default gzip. For example, to get .bz2 output files instead of .gz, pass --compression-codec org.apache.hadoop.io.compress.BZip2Codec together with --compress.
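A sketch of how the two parameters combine. The codec classes are standard Hadoop codecs and the keys are the file extensions they produce; the helper name, URL and table are hypothetical, and whether a given codec works depends on what is installed on the cluster.

```python
# Standard Hadoop codec classes, keyed by the extension of the files they produce.
CODECS = {
    ".gz": "org.apache.hadoop.io.compress.GzipCodec",
    ".bz2": "org.apache.hadoop.io.compress.BZip2Codec",
    ".deflate": "org.apache.hadoop.io.compress.DefaultCodec",
}


def compressed_import_args(jdbc_url: str, table: str, extension: str = ".gz") -> list:
    args = ["sqoop", "import", "--connect", jdbc_url, "--table", table, "--compress"]
    if extension != ".gz":
        # gzip is the default; any other codec has to be named explicitly.
        args += ["--compression-codec", CODECS[extension]]
    return args


if __name__ == "__main__":
    print(" ".join(compressed_import_args("jdbc:mysql://db.example.com/shop", "orders", ".bz2")))
```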
9. What is a disadvantage of using the --direct parameter for faster data loads with Sqoop?
The native utilities that databases provide for faster loading do not work with binary data formats such as SequenceFile.
10. How can you control the number of mappers used by a sqoop command?
The --num-mappers parameter (short form -m) sets the number of map tasks a sqoop command runs. Start with a small number of map tasks and scale up gradually, because a high number of mappers from the start can overload the database and slow it down.
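One way to follow the "start small, scale up" advice is a doubling ramp-up, sketched below with hypothetical helpers. The doubling policy and the limit are assumptions, not Sqoop behaviour (Sqoop itself defaults to 4 mappers); each step would be a separate trial import, stopping once the database becomes the bottleneck.

```python
def mapper_schedule(start: int = 1, limit: int = 16):
    """Yield mapper counts that double from `start` up to `limit`."""
    if start < 1:
        raise ValueError("start must be at least 1")
    n = start
    while n <= limit:
        yield n
        n *= 2


def mapper_import_args(jdbc_url: str, table: str, mappers: int) -> list:
    # --num-mappers (short form -m) sets how many parallel map tasks run the import.
    return ["sqoop", "import", "--connect", jdbc_url, "--table", table,
            "--num-mappers", str(mappers)]


if __name__ == "__main__":
    for m in mapper_schedule(1, 8):
        # In practice: run the import, watch database load and run time,
        # and stop scaling up once the database starts to suffer.
        print(" ".join(mapper_import_args("jdbc:mysql://db.example.com/shop", "orders", m)))
```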