Sunday, 17 May 2015

Stories of Big Daddies of the industry

Larry Ellison (Oracle) - Never gradated from college, learning was primarily based on Self study. He just picked up a book and started programming.

Bill Gates (Microsoft) - Never completed college

Steven Jobs (Apple) - Epic Misfit!!

The reason I need to write the same in my blog, that every s/w engineer needs to remind himself that he is working in an industry where degrees and certifications do not matter.What matters is your intellect, competence and aptitude. It gives a fair platform to a private engineering college or a correspondence student to compete against the ones coming from the elites institutions.So if you are good enough you are bound to make it big.The question is -Do you have it in you?

Saturday, 2 May 2015

Difference between Delta Load and Incremental Load

Scenario:
We have completed the first load into the target table, and wish to implement the Delta load and the Incremental load.

  • Delta Data is the parent of Incremental Data.
  • What is Delta Data?
    • Assuming that in our scenario, the first load happened on 1-Jan-2015. The data present in the source on 2-Jan -2015 is the delta data. 
  • How to read delta data?
    • Add the date filter in the source query i.e. WHERE load_date>1-Jan-2015
  • What is incremental data?
    • The data that we have read , as delta data, needs to be compared with the target data for key columns. This is generally done through SCD(Slowly Changing Dimensions) algorithms. Incremental data is the changed or new data found while the delta and target data are compared via SCD algorithms.






Sunday, 5 April 2015

MSBI - SSIS,SSRS,SSAS - Interview experience-HCL

DWH and SSAS

1. How is Surrogate key different from Primary key? - It is independent of business data.
2. What is a conformed dimension? - Dimension common to more than one fact table
3. Whats is a degenerate dimension? - Dimension too small to be made into a table, hence included in fact. eg: Attendance, Boolean values.
4.Junk Dimension?- Grouping meaningful flags and texts in a single dimensional table.
5. Aggregate fact table use? fact table containing pre-calculated value,that feeds the report for faster results.
6. SCD? -
Slowly changing dimension 01 - update
Slowly changing dimension 02 - update and insert - History maintained
Slowly changing dimension 03 - First and Latest record only.
7. In a bottom up approach we always have a datamart at the bottom then the DWH and then OLAP at topmost.
8. What is a cube? - Data modelling technique to do multidimensional analysis.
9. What is a Hypercube?-To analyse more than 3 dimensions using a cube,the dimensions are merged representing single edge of the cube.Such cubes are called hyper cubes.
10.What is faster ROLAP or MOLAP? - MOLAP it is faster and is more complex as well.Supports Slice and Dice features which are not supported in ROLAP.
11.What is Slice and Dice feature of cube? - If each edge represents a single dimension of the cube, the page displayed is a slice or 2-dimensional plane of the cube.With each rotation the dimensions on the edges of the cube changes and so does the plane giving user the liberty to view and analyse the data from different angles.
12. Difference between relational model and dimensional model?

SSIS

1. What is the difference between Merge,Merge Join and Union all?
2. How can we trigger a SSRS report once a job in SSIS is completed? --sp_start_job
3. Name some native connections in SSIS? - ADO.NET,OLEDB,flatfiles,excel
4.Difference between Fuzzy lookup and lookup?
5. Container Types ={For each loop container,For loop container,Sequence container,task host}

SSRS
1. Difference between SSRS and Crystal reports.
2. How caching works in SSRS reports?
3. Deployment of SSRS in Native and SharePoint modes?
4. Is it safe to cache the report in report manager if we are required to save the credentials? - Yes as the credentials are starred.

Wednesday, 4 February 2015

Scrum Implementation : Live project experience

So we completed a 3 – Sprint DWH project for a FMCG US based company.
It was a roller coaster ride and we stretched every bit of our mental and physical skin to deliver.

Technical Streams: The project had the following technical streams

1.       Data Modeling – The data modelers started off with their part of analysis and prepared the logical data models of the Salesforce ERP source system.
a.       The analogous tables were physicalised in the Oracle relational dbse which was called as the first layer. The business key was being used as the primary key in this layer.
b.      The second layer was the staging layer where the dimensional model existed(ODS). Facts and dimensions were identified by the onsite teams and the same were incorporated in the logical data model and implemented in the physical data model in the 2nd layer of the DWH.Here the Surrogate key was the primary key called as the SID or the source identifier.
c.       The presentation layer (BIDW)was the third and the final layer which had the warehouse identifier (WID) as the primary key , it also had the SID.The tables were a mirror image of their siblings in the ODS table.

2.       ETL – Informatica 9.6 on Linux was being used for this project.

3.       Reporting – Cognos 10.1 clients were used.

DataWarehouse and Data Marts

Data Warehouse (Top – down approach)
  • -          Union of all the data mart
  • -          Data Received from staging area
  • -          Corporate enterprise wide data
  • -          Data Received from staging area

Data Marts (Bottom up approach)
  • -          Departmental view of data
  • -          Star Join (Facts and Dimension)
  • -          A single business process

Thursday, 21 August 2014

New Informatica Interview Questions

1.       Name some Informatica Application Services
·         Repository Service
·         Integration Service
·         Data Integration Service
·         Metadata repository service.
·         Analyst Service.
2.       What is a Service Manager?
·         The service manager is the main service of Informatica which is responsible for running of other application services. When we start the Informatica services on the node, we are actually starting the Service manager. Default port 6005.
3.       From where can we call SQL functions in Informatica.
·         Pre/Post session level SQL
·         Source Qualifier override.
4.       What is the minimum number of ports required to Install Informatica.
·         6000 to 6113.
5.       Name the environment variables required to be set at Informatica level.
·         TNS_ADMIN
·         ORACLE_HOME
·         ORACLE_BASE
·         LD_LIBRARY_PATH
·         NLS_LANG
6.       Can you implement High Availability Architecture without a grid?
·         Yes, by adding nodes as primary and backup nodes under the same domain.License should have HA property.
7.       How do you configure a WSDL connection at session level?
·         As an application connectionà WSDL consumer-àAuthentication type Basic.
8.       Should the client and server for Informatica , be of the same version?
·         Yes
9.       What is Target Load Plan?
·         Please study from the guide and understand that the load can be prioritized even if the pipes are originating from the same source.
10.   How to switch off/start Informatica services?
·         Informatica/9.5.1/tomcat/bin infaservice.sh startup/shutdown.
11.   Give an example of usage of shell scripts in pre-post session.
·         Used to FTP the flat file on a particular folder in the unix box from where informatica could read it.
·         Do a validation check, pre-post count check of the feed being processed.


Friday, 15 August 2014

Update Strategy with WSDL target

While importing the Web Service (consumer) as a target we need to specify that the WSDL will be for :

1. insert
2. delete- select delete multiple not used delete record
3. Update

and then use Update strategy with different targets for different kind of DML operations.