DataWareHouse Interview Questions and Answers For Freshers Part-2
11.What is junk dimension?
• In scenarios where certain data may not be appropriate to store in the schema, this data (or attributes) can be stored in a junk dimension. The nature of data of junk dimension is usually Boolean or flag values.
• A single dimension is formed by lumping a number of small dimensions. This dimension is called a junk dimension. Junk dimension has unrelated attributes. The process of grouping random flags and text attributes in dimension by transmitting them to a distinguished sub dimension is related to junk dimension.
12.What are the different types of SCD’s used in data warehousing?
SCD (Slowly changing dimensions), are the dimensions in which the data changes slowly, rather than changing regularly on a time basis.
Three types of SCDs are used in data warehousing, which are defined as:
– SCD1: It is a record that is used to replace the original record even there is only one record existing in the database. The current data will be replaced and the new data will take its place.
– SCD2: It is the new record file that is added to the dimension table. This record exists in the database with the current data and previous data that is stored in the history.
– SCD3: This uses the original data that is modified to the new data. This consists of two records: one record that exist in the database and another record that will replace the old database record with the new information.
13.Which one is faster, Multidimensional OLAP or Relational OLAP?
Multidimensional OLAP is faster than Relational OLAP.
MOLAP: Multi-dimensional OLAP
Data is stored in a multidimensional cube. The storage is not in the relational database, but in proprietary formats (one example is PowerOLAP’s .olp file). MOLAP products can be compatible with Excel, which can make data interactions easy to learn.
ROLAP: Relational OLAP
ROLAP products access a relational database by using SQL (structured query language), which is the standard language that is used to define and manipulate data in an RDBMS. Subsequent processing may occur in the RDBMS or within a mid-tier server, which accepts requests from clients, translates them into SQL statements, and passes them on to the RDBMS.
14.What is Hybrid SCD?
Hybrid SCDs are a combination of both SCD 1 and SCD 2.
It may happen that in a table, some columns are important and we need to track changes for them i.e., capture the historical data for them whereas in some columns even if the data changes, we do not have to bother.
For such tables, we implement Hybrid SCDs, where in some columns are Type 1 and some are Type 2.
15.Why do we override the execute method is struts?
As part of Struts Framework, we can develop the Action Servlet, ActionForm servlets (ActionServlet means what class extends the Action class, and ActionForm means what class extends the Action Form class) and other servlet classes.
In case of ActionForm class, we can develop validate() method. This method will return the ActionErrors object. In this method we can write the validation code. If this method returns null or ActionErrors with size=0, the web container will call execute() as part of the Action class.
• If it returns size > 0, it will not call the execute() method. It will rather execute the jsp, servlet or html file as value for the input attribute as part of the attribute in struts-config.xml file.
16.What is VLDB?
A very large database, or VLDB, is a database that contains an extremely large number of tuples (database rows), or occupies an extremely large physical file system storage space. A one terabyte database would normally be considered to be a VLDB.
17.How do you load the time dimension?
Time dimensions are usually loaded by a program that loops through all possible dates appearing in the data. It is not unusual for 100 years to be represented in a time dimension, with one row per day.
18.What is conformed fact?
• Conformed dimensions are the dimensions which can be used across multiple Data Marts in combination with multiple facts tables accordingly.
• A conformed dimension is a dimension that has exactly the same meaning and content when being referred from different fact tables. A conformed dimension can refer to multiple tables in multiple data marts within the same organization.
19.What is the main difference between Inmon and Kimball philosophies of data warehousing?
Both differ in the concept of building the data warehouse.
• Kimball views data warehousing as a constituency of Data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence, a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level.
• Inmon explains in creating a data warehouse on a subject-by-subject area basis. Hence, the development of the data warehouse can start with data from the online store. Other subject areas can be added to the data warehouse as their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary.
• Hence, Kimball–First Data Marts–Combined way —Data warehouse
Inmon—First Data warehouse–Later—-Data marts