Datastage Interview Questions and Answers Part-5
41) Have you ever worked in a UNIX environment, and why is it useful in DataStage?
Yes, I have worked in a UNIX environment. This knowledge is useful in DataStage because you sometimes have to write UNIX shell scripts, for example batch scripts that invoke DataStage jobs for batch processing.
42) Differentiate between DataStage and DataStage TX.
DataStage is an ETL (Extract, Transform and Load) tool, while DataStage TX is an EAI (Enterprise Application Integration) tool.
43) What do transaction size and array size mean in DataStage?
Transaction size is the number of rows written before the records are committed to the table. Array size is the number of rows written to or read from the table in a single operation.
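A minimal sketch of how the two settings interact, using Python's built-in sqlite3 module as a stand-in database. The table name target, the function load_rows, and the chosen sizes are hypothetical; in DataStage these values are set as properties of the database stage, not in code. Each executemany call plays the role of one array write, and a commit is issued once at least TRANSACTION_SIZE rows are pending.

```python
import sqlite3

# Hypothetical settings for illustration; DataStage exposes these as stage properties.
ARRAY_SIZE = 3        # rows sent to the database in one write
TRANSACTION_SIZE = 6  # rows written before each commit


def load_rows(conn, rows, array_size=ARRAY_SIZE, transaction_size=TRANSACTION_SIZE):
    """Insert rows in batches of array_size and commit about every transaction_size rows.

    Returns the number of commits issued. If transaction_size is not a multiple
    of array_size, a commit happens after the first batch that reaches the limit.
    """
    cur = conn.cursor()
    commits = 0
    uncommitted = 0
    for start in range(0, len(rows), array_size):
        batch = rows[start:start + array_size]
        # One "array" write: several rows sent in a single call.
        cur.executemany("INSERT INTO target (id, name) VALUES (?, ?)", batch)
        uncommitted += len(batch)
        if uncommitted >= transaction_size:
            conn.commit()  # transaction boundary
            commits += 1
            uncommitted = 0
    if uncommitted:
        conn.commit()  # commit the remaining rows
        commits += 1
    return commits


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE target (id INTEGER, name TEXT)")
    rows = [(i, f"row{i}") for i in range(14)]
    print(load_rows(conn, rows))
```

With 14 rows, batches of 3 and a transaction size of 6, the rows go out in five writes (3, 3, 3, 3, 2) and are committed three times. A larger array size means fewer round trips; a larger transaction size means fewer commits but more work to roll back if the load fails.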
44) How many types of views are there in DataStage Director?
There are three types of views in DataStage Director: Status view, Schedule view and Log view.
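45) Why do we use a surrogate key?
In DataStage, we use a surrogate key, a system-generated sequential number, instead of the natural (business) unique key. A surrogate key is mostly used to retrieve data faster: because it is a compact integer, it is cheap to index, and lookups and joins use that index to retrieve rows.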
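A minimal sketch of surrogate key assignment, assuming records arrive as (natural_key, attribute) pairs. The class SurrogateKeyGenerator and the function assign_keys are hypothetical helpers for illustration, not the API of DataStage's Surrogate Key Generator stage. Each distinct natural key gets the next integer, and a repeated natural key reuses the key it already has.

```python
from typing import Dict, Hashable, Iterable, List, Tuple


class SurrogateKeyGenerator:
    """Assigns a sequential integer surrogate key to each distinct natural key.

    Hypothetical helper for illustration only.
    """

    def __init__(self, start: int = 1) -> None:
        self._next = start
        self._keys: Dict[Hashable, int] = {}

    def key_for(self, natural_key: Hashable) -> int:
        # Reuse the existing surrogate key if this natural key was seen before.
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next
            self._next += 1
        return self._keys[natural_key]


def assign_keys(records: Iterable[Tuple[str, str]],
                gen: SurrogateKeyGenerator) -> List[Tuple[int, str, str]]:
    """Prefix each (natural_key, attribute) record with its surrogate key."""
    return [(gen.key_for(nk), nk, attr) for nk, attr in records]


if __name__ == "__main__":
    gen = SurrogateKeyGenerator()
    rows = assign_keys([("CUST-A17", "Alice"), ("CUST-B02", "Bob"),
                        ("CUST-A17", "Alice Smith")], gen)
    for row in rows:
        print(row)
```

The warehouse then joins and looks up on the small integer key rather than on the longer source key, and the integer is what gets indexed.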
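46) What is the use of a node in DataStage? What happens if we increase the number of nodes?
A node is a logical processing unit, not a physical processor; the nodes available to parallel jobs are defined in the configuration file. Nodes increase the efficiency of jobs by running them in parallel, much like multiprocessing in an operating system, and the processes for different nodes may run on the same processor or on different processors. Increasing the number of nodes splits the data into more partitions that are processed at the same time, which can shorten run time as long as the hardware has enough CPU, memory and I/O capacity; beyond that point, the extra processes only add overhead.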
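A minimal sketch of what adding nodes does to the data, assuming simple hash partitioning on an integer key. The functions hash_partition, process_partition and run_job are hypothetical. Threads stand in for nodes here only to keep the example short; the DataStage parallel engine uses separate operating-system processes, and Python threads do not speed up CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def hash_partition(rows: List[int], num_nodes: int) -> List[List[int]]:
    """Split rows into num_nodes partitions by hashing the key."""
    partitions: List[List[int]] = [[] for _ in range(num_nodes)]
    for row in rows:
        partitions[hash(row) % num_nodes].append(row)
    return partitions


def process_partition(part: List[int]) -> int:
    # Stand-in for a stage's work on one partition: sum its values.
    return sum(part)


def run_job(rows: List[int], num_nodes: int) -> int:
    """Partition the rows, process each partition concurrently, then combine."""
    partitions = hash_partition(rows, num_nodes)
    with ThreadPoolExecutor(max_workers=num_nodes) as pool:
        results = list(pool.map(process_partition, partitions))
    return sum(results)


if __name__ == "__main__":
    data = list(range(100))
    for nodes in (1, 2, 4):
        sizes = [len(p) for p in hash_partition(data, nodes)]
        print(nodes, sizes, run_job(data, nodes))
```

The result is the same for any node count; only the number and size of the partitions change, which is why more nodes help only when there is spare hardware to run the extra partitions.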
47) How do we identify updated records in DataStage?
That is, how can we identify only the updated records when no row ID or date column is available?
48) I have a job sequence in DataStage that takes more than 4 hours but should complete in less than 1 hour. What could make it take so much longer than expected?
Check whether any stage reads or processes data sequentially when it could run in parallel.
49) How do I produce three different outputs from a single input source?
I have a source file with the data 10 10 10 20 20 20 30 30 40 40 50 60 70, and I want three outputs from it:
1) Each value once, with no duplicates: 10 20 30 40 50 60 70
2) Only the duplicated records, with no unique records: 10 10 10 20 20 20 30 30 40 40
3) Only the records that occur exactly once: 50 60 70
How can I achieve this using DataStage 8.5?
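Source file -> Copy stage -> link 1 -> Remove Duplicates stage -> output file 1 with 10, 20, 30, 40, 50, 60, 70
Copy stage -> link 2 -> Aggregator stage (computes the row count per value) -> Filter stage, with two outputs:
Filter 1 (count > 1) -> output file 2 with 10, 20, 30, 40
Filter 2 (count = 1) -> output file 3 with 50, 60, 70
The Aggregator emits one row per value, so output file 2 holds each duplicated value once. To keep every duplicate occurrence (10 10 10 20 20 20 ...), join the counts back to the source rows on the value before filtering.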
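The same three outputs can be sketched in Python to check the expected results. The function split_outputs is hypothetical: Counter plays the role of the Aggregator stage, and looking each original row up in the counts mirrors joining the counts back to the source rows.

```python
from collections import Counter
from typing import List, Tuple


def split_outputs(values: List[int]) -> Tuple[List[int], List[int], List[int]]:
    """Return (distinct values, every duplicated row, rows that occur once)."""
    counts = Counter(values)                            # like the Aggregator: row count per value
    distinct = sorted(counts)                           # like Remove Duplicates: one row per value
    duplicates = [v for v in values if counts[v] > 1]   # every occurrence of a repeated value
    singles = [v for v in values if counts[v] == 1]     # values that occur exactly once
    return distinct, duplicates, singles


if __name__ == "__main__":
    source = [10, 10, 10, 20, 20, 20, 30, 30, 40, 40, 50, 60, 70]
    out1, out2, out3 = split_outputs(source)
    print(out1)
    print(out2)
    print(out3)
```

For the source data in the question, this yields 10 20 30 40 50 60 70, then 10 10 10 20 20 20 30 30 40 40, then 50 60 70.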
50) I have three jobs, A, B and C, which depend on each other. I want to run jobs A and C daily and job B only on Sunday. How can I do it?
Create two job sequences: the first contains jobs A and C, and the second contains only job B. In Director, schedule the first sequence to run daily and the second to run only on Sunday.
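The scheduling rule itself can be expressed as a small function, for example to test the logic before setting up the Director schedules. The sequence names and the function sequences_to_run are hypothetical; in DataStage the schedule is configured in Director, not in code.

```python
from datetime import date
from typing import List

# Hypothetical sequence names for illustration.
DAILY_SEQUENCE = "seq_daily_A_C"
SUNDAY_SEQUENCE = "seq_sunday_B"


def sequences_to_run(run_date: date) -> List[str]:
    """Return the job sequences scheduled for run_date."""
    to_run = [DAILY_SEQUENCE]           # jobs A and C run every day
    if run_date.weekday() == 6:         # weekday(): Monday is 0, Sunday is 6
        to_run.append(SUNDAY_SEQUENCE)  # job B runs only on Sunday
    return to_run


if __name__ == "__main__":
    print(sequences_to_run(date(2024, 1, 7)))  # a Sunday
    print(sequences_to_run(date(2024, 1, 8)))  # a Monday
```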