Tuesday, 13 August 2013

Understanding Drill down analysis in Star Schema

Problem Statement:  Assuming the users of the Auto Sales data warehouse is the Sales departments.
The Executives might want to do the analysis as follows:
1.       How many Sedan class cars were sold in the Northern Zone for the year 2006?
2.       Quarter wise Analysis of the above query.
3.       Quarter and State (in the Zone) wise analysis of the above query.
4.       Quantity of cars by make within the Sedan Class (e.g.: Swift, Maruti Dx etc.) by State (in the Zone) and by Quarter for the year 2006.

We see that in every query, the level of granularity increases.


Characteristics of Dimensional tables

1.       Primary Key of the table uniquely identifies each row.
2.       Large number of attributes.
3.       Attributes are generally textual in nature.
4.       Attributes in the dimensional table are not directly related to other attributes of the table.
5.       Dimensional table is not Normalized or Low level of normalization (for efficient query performance, query should pick the attribute and directly go to fact table)
6.       Compared to the fact table, it has fewer records.
7.       Facilitates drilling down and rolling up of data since the data is hierarchal. E.g.: Year, Month, Day etc.

 Characteristics of a Fact Table

1.       Concatenated Key:  A row in the fact table relates to a combination of rows from all dimensional tables.
2.       Data Grain: Level of detail for the measurement or metrics.
3.       Fully Additive Measures: The attributes, if can be summed up by simple addition are known as fully additive. When we run a query to aggregate the measures in a fact table, the output will be correct only, if the measures are fully additive in nature.
4.       Semi additive Measures:  These are attributes like percentage etc. which have been derived from the additive measures.
5.       Fewer attributes but more rows compared to dimensional tables.
6.       Sparse Data: for a particular combination of dimensional table, it might be the case that the fact table doesn’t have any data. E.g. : Form the month of February 2006, there might be no sale of Maruti Swift in Rajasthan, hence no entry would be present in the fact table This is an example of sparse data.
7.       Degenerate Dimensions: When we are selecting facts and dimensions from the operational systems, there may be some attributes, which are neither measures (facts) nor belong to dimension tables. E.g.: invoice number, Order number. They are useful for analysis and kept in the fact table.

The Fact less fact table

Assuming that a fact table is made for recording the attendance of the students. Dimensions of this model would be student, course and time.  If the student is present then it will be recorded as ‘1’ in the fact table.  The presence of any entry in the fact table would represent that that the student is present and the fact hence would not contain any specific measure i.e. it is fact less. Such types of situations arise when the fact is measuring /recording events.

Conclusion

The entity relationship model is not suitable for Decision support System. Dimensional modeling is apt for designing Decision support systems as they facilitate more analysis. It not only is easy for users to understand but also optimizes navigation and is most suitable for query processing. Hierarchies within different dimensions facilitate Drill down and roll up analysis of data, providing more flexibility for analysis.

2 comments:

  1. Your blog is really useful for me, and I gathered some information from this blog. I did SAP Course in Chennai, at FITA Academy. It's really useful for me to make a bright career in IT industry.

    ReplyDelete
  2. Thanks rebeka

    ReplyDelete