Friday, 17 July 2015

Difference between Kimball and Inmon approach

The Inmon approach is said to have the Hub and Spoke architecture. The primary feature of the same is the Enterprise Data Warehouse (EDW) which is maintained as a 3NF database. It is slightly different from the Operational database as it is:

1. Subject Oriented
2. Non-volatile - Data doesn't change . Soft delete is present by the use of timestamping.
3. Time variant - Maintains historical data
4. Integrated.

It does not have fact or dimensions, and has a set of functional data marts pulling data out of it.

OLTP  > EDW (3N) >Datamart (3N) >OLAP Cube

The Kimball approach is said to have the Bus Architecture. It contains set of stars i.e de-normalized groups of fact and dimension tables connected via conformed dimension (dimensions common to two facts)

OLTP  > Star Schema (Dimensional Model) >OLAP Cube

Sunday, 12 July 2015

Definition of a DWH

A DWH is a

1. Subject Oriented i.e. Has the power to mould the transactional data around a particular subject.If we want to answer questions around the sales of a particular product, the DWH would be around the Subject-sales.etc.
2. Integrated i.e. Integrates data from various sources and keeps in form of dimensional model.
3. Non-volatile i.e. -Data once entered in DWH does not change.
4. Time variant - DWH maintains historical data.

collection of data.

Why is a DWH required in an organization?

1.       Data is an asset and has to be retained as per organizational needs.DWH helps us in maintaining          large volumes of data.
2.       To reduce IT involvement in report creation and helps the business to custom build the reports.
3.       MDM is used to maintain the single version of the truth.

Categories of Enterprise Data

1.    Master data -   Business entities of the organization.Data revolves around these business entities. Customer,Employee etc. This is maintained as part of the MDM solution.

2.       Transactional Data – Generated as part of day to day business processes. Also known as the OLTP system Online transactional processing systems)

3.       Analytical Data – Typically related to Key Performance Indicators (KPIs).Data Warehousing systems are used to store and retrieve analytical data.

4.       Meta Data-  Data about data.Describes IT systems, Business rules etc.

5.       Reference Data- Static data elements, sometime transient like currency ,industry standards. A reference data hub is maintained for transaction processing.


6.       Un-Structured data- All the above mentioned data would be classified as structured data.Examples of unstructured data would be Emails,Social Media data i.e. Big Data. Hadoop is a platform used to manage un-structured data.