Tuesday, 31 December 2013

Headstrong- Company Review

Pros - Was once the destination of elite brains from top engineering and management colleges in India, some of whom have decided to stay back. Good development projects in the healthcare and capital markets domains. A culture of domain learning. Notice period of 2 months.

Cons - The company has been degrading rapidly since the Genpact acquisition. Most of the old staff have resigned. The company is losing its brand name and vision to Genpact, which has a very conservative approach. I would not suggest joining the company.

iGATE Global Solutions - Company Review

Pros - Excellent onsite opportunities at junior level.

Make sure you are clear on the account for which you are being hired. It is okay to ask about the account even if you are a trainee being interviewed for a project. Try to land in the Royal Bank of Canada account: it has the most long-term onsite opportunities, provided you hang around for 2 years. Learn to lick some arse for quick results. Do not expect a change of account; once you join a project in a specific account, you'll be there for eternity.

Friendly environment - iGATE has a very friendly environment. The iGATE rubber band rocks!!

You will have an opportunity to look into end to end implementation of systems here. Something which you might not get to see in big companies. Good technical learning.

Cons

Cheap corporate politics-exists everywhere.
Below industry standard compensation.

The key to thriving in iGATE is not only work, but being assertive. Managers are non-technical and depend on resources to get the job done. Go ahead and bully the managers; they are meek. If you are good at your job they will not mind pleasing you :)

Sunday, 22 December 2013

Lifecycle to Implement an Informatica Project


This can be divided into two parts:
1.       Implementing Informatica Infrastructure.
2.       Implementing the Informatica solution at the functional level.
a.       If it is a DWH solution, please refer to the Dimensional Modeling tab to see how dimension and fact tables are designed and implemented using Informatica.
b.      If it is a data integration solution, the methodology is covered in this section.
The following methodology is given by Informatica Velocity and is tailor-made for data integration scenarios:
1.       Manage
2.       Analyze
3.       Architect
4.       Design
5.       Build
6.       Test
7.       Deploy
8.       Operate 
1.       Manage Phase - This phase lasts through the entire lifecycle of the project. The following activities are part of this phase:
·         Project Estimation
·         Project Planning - Deciding the business scope of the project, i.e. a high-level solution to the problem the project addresses.
·         Establishing the roles in the project, i.e. what technical profile the technical leads and project leads should have, how many resources should be in the technical team, and how many years of experience team members should have and on which technologies.
·         Build Business cases
·         Project Closure

2.       Analyze Phase- Following are the activities under this phase
·         Define Business Requirements
·         Define Business Scope- The proposed solution should be frozen here through a business sign off. Road maps for incremental delivery. Identification of source system. Data flow Diagrams.
·         Define Functional Requirement- Creation and passing of the business requirement document.
·         Determine Technical readiness.
·         Perform Data Quality analysis of source data and report the analysis to business.

3.       Architect Phase - This is the phase where the actual action starts. It begins with the solution architect's analysis. The solution architect should perform the following activities:
·         Define Technical Requirements - What version of Informatica is to be used? What are the license requirements? What backend database is used for the metadata repository?
·         Develop a Logical/Physical View of the architecture - Through a box diagram or Microsoft Visio diagram, show how the client and server are related logically, which ports are opened, and whether they are outbound or inbound.
·         Configuration recommendations and estimation of the amount of data that will flow through Informatica mappings.
·         The solution architect develops a Technical Design Document (TDD) in this phase and documents the findings in it.
·         Define Development Architecture - The development team specifies the number of folders they require in the Informatica environment and the naming conventions for mappings/workflows. Define a configuration management procedure for the code.
·         Implement Technical Architecture - Install the Informatica client/server.

4.       Design Phase
·         Develop facts and dimension tables- Presentation layer
·         Create source to target data store matrix.
·         Design physical database design
·         Source and Target connectivity check.
·         Develop source and target relationships for impact analysis.

5.       Build Phase - Develop reusable mappings and error-handling strategies, i.e. error tables. Define a defect-tracking process, conduct peer reviews and run unit tests.
6.       Test Phase - Prepare test cases; conduct system tests, integration tests and user acceptance tests. Performance improvement and tuning come under this phase as well.
7.       Deploy Phase - Prepare a punch-list document. This document has links to all the code uploaded to a storage location. The test or prod resource uses it to download and deploy the code in that environment. A runbook has snapshots of the installation of the code; it is a reference document for system administration.
8.       Operate Phase - Develop an operations manual and monitor jobs. Maintain the repository and perform upgrades if required.

Saturday, 14 December 2013

Informatica Interview Experience : Accenture

Experience Level- 5 years - Team Lead/Technical lead.
Location- Bangalore
Date- Dec 2013


Please make sure to make a note of the number that the HR uses to communicate to you.

The HR scheduled a telephonic interview which lasted for 30 minutes.

Round -01- Technical Skills round

The person called from Accenture Hyderabad and asked questions around my experience.

1. Informatica Administrator Grid implementation
2. The kind of services required to be configured for IDQ - Model Repository Service, Content Management Service and Data Integration Service.
3. Implementation of SCD-2. Full explanation on the call
4. How to use MD5 function in the expression while using dynamic lookup.
5. How does a lookup outperform a joiner? - The dynamic lookup functionality cannot be implemented by a Joiner.
6. Difference between PC 8.x and PC 9.x
7. Informatica connectivity with Salesforce - a property of the license; Designer > Import Source > Salesforce object.
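Questions 3 and 4 go together in practice: a common pattern is to MD5-hash the tracked columns in an expression and compare against the hash stored on the target row (returned by the lookup), inserting a new version only on change. A minimal Python sketch of that change-detection idea, outside Informatica and purely illustrative (the table and column names are invented):

```python
import hashlib
from datetime import date

def row_hash(row, cols):
    """Concatenate tracked columns and hash them, like MD5() in an expression transformation."""
    return hashlib.md5("|".join(str(row[c]) for c in cols).encode()).hexdigest()

def scd2_merge(target, incoming, key, tracked):
    """Apply SCD type-2 logic: expire the current row and insert a new version on change."""
    today = date.today().isoformat()
    current = {r[key]: r for r in target if r["is_current"]}
    for row in incoming:
        new_hash = row_hash(row, tracked)
        old = current.get(row[key])
        if old is None:                      # brand-new key -> insert
            target.append({**row, "is_current": True, "eff_date": today, "md5": new_hash})
        elif old["md5"] != new_hash:         # changed -> expire old, insert new version
            old["is_current"] = False
            old["end_date"] = today
            target.append({**row, "is_current": True, "eff_date": today, "md5": new_hash})
        # unchanged rows are dropped, which is what the MD5 comparison buys you
    return target

dim = []
scd2_merge(dim, [{"cust_id": 1, "city": "Pune"}], "cust_id", ["city"])
scd2_merge(dim, [{"cust_id": 1, "city": "Delhi"}], "cust_id", ["city"])
# dim now holds two versions of cust_id 1; only the Delhi row has is_current=True
```

The dynamic lookup in Informatica plays the role of the `current` dictionary here: it hands back the existing row (and its stored hash) so the mapping can decide insert vs. update.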

He questioned me for 30 minutes and did not waste a single second. You can ask for feedback at the end of the interview.

Round 02- HR round

After my telephonic HR round I got a mail against my candidature ID saying I have cleared the technical round and shall soon be contacted for the HR round.

The HR called me up after 3 days for a quick HR discussion. He will capture basic details about your job; make sure to answer the questions correctly. Tell him the kind of role you expect from Accenture. Do some research about the kind of profile and package you should ask for. He told me that the HR team would contact me shortly.

Still waiting to hear from the HR :D


Informatica interview Experience - Headstrong Bangalore

Experience Level- 2 years
Location- Bangalore
Date- Jan 2011


The interview call was prompt. There were two technical rounds and one Managerial round.

Technical Round 01- Conceptual and Scenario based.

Difference between Group by and Having
Unix shell scripting questions asked, e.g. how to fetch the topmost and last row of a file.
Informatica Scenarios asked.
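For the file question, `head -1 file` prints the first row and `tail -1 file` the last. For GROUP BY vs HAVING: WHERE filters rows before aggregation, while HAVING filters the aggregated groups afterwards. A rough Python analogy of `SELECT dept, COUNT(*) FROM emp GROUP BY dept HAVING COUNT(*) > 1` (table and data invented):

```python
from collections import Counter

rows = [("IT", "asha"), ("IT", "ravi"), ("HR", "meena")]

# GROUP BY dept: aggregate one value per group
counts = Counter(dept for dept, _ in rows)

# HAVING COUNT(*) > 1: filter on the aggregate, after grouping
big_depts = {d: n for d, n in counts.items() if n > 1}
print(big_depts)  # {'IT': 2}
```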

Technical Round 02 - Theoretical Informatica concepts - easy.

Managerial Round - You need to show that you are eager to join the company. Ask questions about the client and project. Headstrong Bangalore has some very good i-bank clients like Morgan Stanley, Wells Fargo, Goldman Sachs (client location) and Scotia, and they are trying to grow their Healthcare vertical too.

Ask for 50% hike in package.

Tuesday, 10 December 2013

Informatica Interview Experience- J.P. Morgan

Experience Level- 2 years
Location- Bangalore
Date- Jan 2011

Round 1- Technical Telephonic interview.

1. Details of the project.
2. The person asked about the shell scripts that I used in the project.
3. I had to go on to the extent of explaining the algorithm of the shell scripts.
4. Why are they used? Their relevance to my project.
5. Functional aspect of the project.
6. I was asked if I would be comfortable with a junior designation, since at J.P. Morgan people with 5 years' experience are software engineers.

Round 2

The HR invited me for the 2nd round (F2F) after 10 days; by that time I had joined Headstrong.


Informatica Interview Experience- Fidelity Gurgaon

Location: Fidelity-Gurgaon
Event: Walk-In
Exp level 3-8 years.
Date: Sep 2012


Please read about Fidelity FLS (Gurgaon); it is not the same as Fidelity FMR (Bangalore). The HR would expect you to know the difference.
Good procedure. They will contact you only through the placement agency. However, there was an instance where candidates were invited for a different technology, and it was a bad experience for them.
Make sure that you are invited for your technology.

Round 1 - Written round. 40 questions spread across Informatica PowerCenter development and transformations, Unix shell scripting and SQL - mainly transformations. The level was good, but not very tough. Straightforward questions.

Round 2- Takes some time - an HR and a Senior manager discussion

To get selected, you can say that you live in Gurgaon or close by. Indicate that you don't have a problem traveling to Gurgaon. They provide a cab facility, so you can quote the same.
Be sharply dressed in formals, preferably wear a tie.
The HR will grill you; they are just testing your temperament.
They will want to know you on the personal front. Do not indicate that someone in your family is ill.

Psychological check - How will you assess the performance of 5 horses running on a race course?
They expect you to be analytical. Factors: the horses will tire after 1 lap; the horse at the back might overtake due to lower wind resistance in the 1st round; the climate and race-course conditions will be determining factors.

What is the toughest situation you have faced in your career till now?

Round 3 - If you survive the above round, a technical round will be conducted. Scenario-based questions; if you prepare from the scenarios available on the net, you can easily crack it. Only Informatica was asked. The job role will be more on the production-support side.

They will not make the offer there and then and will get back to you. 

Informatica Interview Experience- TCS

Location: TCS-Noida
Event: Walk-In
Exp level 3-8 years.
Date: Nov 2013

1. You need to register on the TCS career site through the portal.
2. Take a printout and be at the mentioned venue.
3. You will be given a form to be filled in for the HR.
4. Quote your expected salary as 30% above what you are currently getting. TCS is not a very good paymaster.


Interview panel

The venue was quite mismanaged. There were three evaluators in all:
two technical and one for soft skills and HR-based questions.
Some questions were as follows:

1. OLTP vs OLAP?
2. SCD 1, 2, 3
3. Difference between a surrogate key and a primary key (Answer: a surrogate key is independent of the data)
4. Scenario-based questions (rows-to-columns conversion)
5. Explanation of your projects (this is the main part, where you can guide the interview)
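For the rows-to-columns scenario, the core pivot idea can be prototyped outside Informatica (where it is usually done with an Expression plus Aggregator). A small Python sketch with invented data:

```python
# Pivot rows like (product, quarter, amount) into one record per product,
# similar to an Aggregator producing one output column per quarter.
rows = [("Alto", "Q1", 10), ("Alto", "Q2", 15), ("Swift", "Q1", 7)]

pivot = {}
for product, quarter, amount in rows:
    pivot.setdefault(product, {})[quarter] = amount

print(pivot)  # {'Alto': {'Q1': 10, 'Q2': 15}, 'Swift': {'Q1': 7}}
```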





Tuesday, 13 August 2013

Understanding Drill down analysis in Star Schema

Problem Statement: Assume the users of the Auto Sales data warehouse are the Sales department.
The Executives might want to do the analysis as follows:
1.       How many Sedan class cars were sold in the Northern Zone for the year 2006?
2.       Quarter wise Analysis of the above query.
3.       Quarter and State (in the Zone) wise analysis of the above query.
4.       Quantity of cars by make within the Sedan Class (e.g.: Swift, Maruti Dx etc.) by State (in the Zone) and by Quarter for the year 2006.

We see that with every query, the level of detail increases, i.e. the granularity becomes finer.
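Each query above refines the previous one by adding another dimension attribute to the grouping key. A toy Python illustration of that drill-down, with invented sample rows:

```python
from collections import Counter

# (class, zone, state, quarter, make, units sold); sample rows are invented
sales = [
    ("Sedan", "North", "Punjab", "Q1", "Swift Dzire", 120),
    ("Sedan", "North", "Delhi",  "Q1", "SX4",         80),
    ("Sedan", "North", "Punjab", "Q2", "Swift Dzire",  60),
]

def rollup(*keys):
    """Aggregate units by the chosen dimension attributes (coarse -> fine grain)."""
    names = ("cls", "zone", "state", "quarter", "make")
    totals = Counter()
    for row in sales:
        rec = dict(zip(names, row[:5]))
        totals[tuple(rec[k] for k in keys)] += row[5]
    return dict(totals)

print(rollup("zone"))                      # {('North',): 260}       <- query 1
print(rollup("zone", "quarter"))           # {('North', 'Q1'): 200, ('North', 'Q2'): 60}
print(rollup("zone", "quarter", "state"))  # query 3 drills down to state level
```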


Characteristics of Dimensional tables

1.       Primary Key of the table uniquely identifies each row.
2.       Large number of attributes.
3.       Attributes are generally textual in nature.
4.       Attributes in the dimensional table are not directly related to other attributes of the table.
5.       The dimensional table is not normalized, or has a low level of normalization (for efficient query performance, a query should pick the attribute and go directly to the fact table).
6.       Compared to the fact table, it has fewer records.
7.       Facilitates drilling down and rolling up of data since the data is hierarchal. E.g.: Year, Month, Day etc.

 Characteristics of a Fact Table

1.       Concatenated Key:  A row in the fact table relates to a combination of rows from all dimensional tables.
2.       Data Grain: Level of detail for the measurement or metrics.
3.       Fully Additive Measures: Attributes that can be summed up by simple addition are known as fully additive. When we run a query to aggregate the measures in a fact table, the output is correct only if the measures are fully additive in nature.
4.       Semi-additive and Non-additive Measures: Semi-additive measures can be summed along some dimensions but not all (e.g. an account balance can be summed across accounts but not across time); derived attributes such as percentages are non-additive and must be recomputed from their additive components.
5.       Fewer attributes but more rows compared to dimensional tables.
6.       Sparse Data: For a particular combination of dimension rows, the fact table might not have any data. E.g., for the month of February 2006 there might be no sale of Maruti Swift in Rajasthan, hence no entry would be present in the fact table. This is an example of sparse data.
7.       Degenerate Dimensions: When we are selecting facts and dimensions from the operational systems, there may be some attributes, which are neither measures (facts) nor belong to dimension tables. E.g.: invoice number, Order number. They are useful for analysis and kept in the fact table.
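The additivity distinction above matters when aggregating: a derived measure such as a percentage cannot simply be summed or averaged; it has to be re-derived from its additive components. A small Python check with invented numbers:

```python
# Two stores: (units_sold, units_returned). The return rate is derived, not additive.
stores = [(100, 10), (300, 60)]

# Correct: aggregate the additive measures first, then derive the rate
total_sold = sum(s for s, _ in stores)        # 400
total_ret  = sum(r for _, r in stores)        # 70
true_rate  = total_ret / total_sold           # 0.175

# Wrong: averaging per-store rates treats a ratio as if it were additive
naive_rate = sum(r / s for s, r in stores) / len(stores)  # (0.10 + 0.20) / 2

print(true_rate, naive_rate)  # true rate 0.175, naive rate ~0.15 (off by 2.5 points)
```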

The Factless Fact Table

Assume that a fact table is made for recording the attendance of students. The dimensions of this model would be student, course and time. If the student is present, this is recorded as '1' in the fact table. The presence of an entry in the fact table represents that the student was present, so the fact does not contain any specific measure, i.e. it is factless. Such situations arise when the fact is measuring/recording events.
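Counting rows is then the only "measure" needed. A minimal Python sketch of the attendance example, with invented keys and dates:

```python
# Factless fact table: each row is just (student_key, course_key, date_key).
# The presence of a row records the event; COUNT(*) is the only "measure".
attendance = [
    (1, "MATH", "2013-12-02"),
    (1, "MATH", "2013-12-03"),
    (2, "MATH", "2013-12-02"),
]

# How many students attended MATH on 2013-12-02?
count = sum(1 for _, course, day in attendance
            if course == "MATH" and day == "2013-12-02")
print(count)  # 2
```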

Conclusion

The entity relationship model is not suitable for Decision support System. Dimensional modeling is apt for designing Decision support systems as they facilitate more analysis. It not only is easy for users to understand but also optimizes navigation and is most suitable for query processing. Hierarchies within different dimensions facilitate Drill down and roll up analysis of data, providing more flexibility for analysis.

Principles of Dimensional Modeling

Overview

The requirements definition completely drives the data design for the data warehouse.

Data design consists of putting together the data structures. A group of data elements form a data structure. Logical data design includes determination of the various data elements that are needed and combination of the data elements into structures of data. Logical data design also includes establishing the relationships among the data structures.  An essential component of this document is the set of information package diagrams. Remember that these are information matrices showing the metrics, business dimensions, and the hierarchies within individual business dimensions. The information package diagrams form the basis for the logical data design for the data warehouse. The data design process results in a dimensional data model.


So far we have formed the fact table and the dimension tables. How should these tables be arranged in the dimensional model? What are the relationships and how should we mark the relationships in the model? The dimensional model should primarily facilitate queries and analyses. What would be the types of queries and analyses? These would be queries and analyses where the metrics inside the fact table are analyzed across one or more dimensions using the dimension table attributes. Before we decide how to arrange the fact and dimension tables in our dimensional model and mark the relationships, let us go over what the dimensional model needs to achieve and what its purposes are. Here are some of the criteria for combining the tables into a dimensional model.
1.       The model should provide the best data access.
2.       The whole model must be query-centric.
3.       It must be optimized for queries and analyses.
4.       The model must show that the dimension tables interact with the fact table.
5.       It should also be structured in such a way that every dimension can interact equally with the fact table.

The model should allow drilling down or rolling up along dimension hierarchies. With these requirements, we find that a dimensional model with the fact table in the middle and the dimension tables arranged around the fact table satisfies the conditions. In this arrangement, each of the dimension tables has a direct relationship with the fact table in the middle. This is necessary because every dimension table with its attributes must have an even chance of participating in a query to analyze the attributes in the fact table. Such an arrangement in the dimensional model looks like a star formation, with the fact table at the core of the star and the dimension tables along the spikes of the star. The dimensional model is therefore called a STAR schema.
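A query against such a star joins each needed dimension to the central fact table through its surrogate keys and then aggregates the metrics. A minimal Python sketch with hypothetical tables and keys:

```python
# Tiny star schema: the fact table holds only surrogate keys and metrics
product_dim = {1: {"model": "Alto", "line": "Hatchback"}}
time_dim    = {10: {"year": 2006, "quarter": "Q1"}}
fact_sales  = [
    {"product_key": 1, "time_key": 10, "units": 5},
    {"product_key": 1, "time_key": 10, "units": 3},
]

# "Star join": resolve each fact row through its dimensions, then aggregate
total = sum(f["units"]
            for f in fact_sales
            if product_dim[f["product_key"]]["line"] == "Hatchback"
            and time_dim[f["time_key"]]["year"] == 2006)
print(total)  # 8
```

Every dimension participates in exactly the same way - a lookup from the fact row's key - which is the "even chance" property described above.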

FACT and DIMENSION Tables in Star Schema representation




Case Study

Problem Statement:
Let us say that the goal is to analyze sales for an automobile company, e.g. Maruti Suzuki. We want to build a data warehouse that will allow the user to analyze automobile sales in a number of ways. The output information from the data warehouse shall facilitate the following types of analysis of car sales.
1.       How many Alto cars are sold in the state of Karnataka, during the marriage seasons?
2.       Which car maximized profits in the hatchback section? Were these profits less or more compared to the last quarter?
3.       In which locations shall more manufacturing units be set up to cater to the demand?

We shall try to identify the business dimensions, analyzing the problem statement.

1.       Product :  
a.       What kinds of cars is Maruti Suzuki manufacturing? Examples would be Maruti 800, Maruti Zen, Wagon R, SX4, Swift and Alto.
b.      What would be the product line of each of these cars? For instance hatchback, sedan, SUV, sports or luxury - where would each of the products fall?
c.       What are the colors (interior as well as exterior) in which each model is available?
d.      The first model year of each model?

2.       Dealer :
a.       Name of the Dealer?
b.      Location of the dealer? i.e. State and City
c.       Single-brand flag? Does the dealer sell Maruti Suzuki cars exclusively, or does he sell cars of Toyota, Ford as well? This can be answered yes or no, hence we use a flag.
d.      Date of first operation?

3.       Customer Demographics :
a.       Name of the Customer?
b.      Gender?
c.       Income range?
d.      Marital Status?
e.      Vehicles owned?

4.       Payment Method  :
a.       Finance Type – Bought on Loan or Paid fully?
b.      Term of loan in months.
c.       Agent facilitating the finances
d.      Interest rate of the loan
5.       Time: This is a common dimension. It can be interpreted as follows in the context of the problem.
a.       Date
b.      Month
c.       Quarter
d.      Year
e.      Day of month
f.        Season - {Winter, Summer, Autumn, Spring}
g.       Holiday Flag
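The time dimension is usually pre-populated once for the full date range. A hedged Python sketch generating the attributes listed above; the season mapping and holiday list are assumptions to be replaced by the business's own calendar:

```python
from datetime import date, timedelta

SEASONS = {1: "Winter", 2: "Winter", 3: "Spring", 4: "Spring", 5: "Summer",
           6: "Summer", 7: "Summer", 8: "Autumn", 9: "Autumn", 10: "Autumn",
           11: "Winter", 12: "Winter"}           # assumed month-to-season mapping
HOLIDAYS = {date(2006, 1, 26), date(2006, 8, 15)}  # illustrative holiday list only

def time_dim_row(d):
    """Build one row of the time dimension with the attributes listed above."""
    return {
        "date": d.isoformat(),
        "day_of_month": d.day,
        "month": d.month,
        "quarter": "Q%d" % ((d.month - 1) // 3 + 1),
        "year": d.year,
        "season": SEASONS[d.month],
        "holiday_flag": d in HOLIDAYS,
    }

start = date(2006, 1, 1)
rows = [time_dim_row(start + timedelta(days=i)) for i in range(365)]
print(rows[25])  # 26 Jan 2006: quarter Q1, holiday_flag True
```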

But using these business dimensions, what exactly are the users analyzing? What numbers are they analyzing? The numbers the users analyze are the measurements or metrics that measure the success of their departments. These are the facts that indicate to the users how their departments are doing in fulfilling their departmental objectives. In the case of the automaker, these metrics relate to the sales. These are the numbers that tell the users about their performance in sales. These are numbers about the sale of each individual automobile. The set of meaningful and useful metrics for analyzing automobile sales is as follows:

1.       Actual sale price - The price at which a particular car was sold to the customer.
2.       MSRP - The Manufacturer's Suggested Retail Price is the price at which the manufacturer recommends the retailer sell the product. The intention is to help standardize prices among locations. While some stores always sell at or below the suggested retail price, others do so only when items are on sale or closeout/clearance.
3.       Dealer add-ons
4.       Dealer credits
5.       Dealer invoice
6.       Amount of down payment by the customer
7.       Amount financed


Information package diagram


PRODUCT         | DEALER                  | CUSTOMER DEMOGRAPHICS | PAYMENT METHOD | TIME
Model Name      | Name of the dealer      | Customer Name         | Finance Type   | Year
Product Line    | Location of the dealer  | Marital Status        | Term (Months)  | Quarter
Launching Year  | Single brand flag       | Gender                | Agent Name     | Month
Interior colors | Date of first operation | Income range          | Interest Rate  | Date
Exterior colors |                         | Vehicles owned        |                | Holiday Flag
                |                         |                       |                | Season
FACTS: Actual Sale Price, MSRP Price, Dealer add-ons, Dealer credits, Dealer Invoice, Amount Financed, Amount of down payment, Quantity Sold




Information package: Automaker Sales

Dimensional Modeling: Basic Concepts

Difference between OLTP and Decision Support Systems

The comparison can be well understood by looking at the above figure which depicts a moving car.
The OLTP systems are operational systems which participate in the day-to-day activities of the business. If the business is the car, OLTP systems are the wheels of the car: they help it move by loading data about single entities into the databases - they "make" the wheels of the car turn. Examples would be
1.       Take an order
2.       Process  claim
3.       Make a shipment
4.       Generate an invoice
5.       Reserve an airline seat
The technological implementation would be a Java or .NET based web form at the front end, used as input to store data in a backend, which could be any database.
The decision support system, on the other hand, is more analytical in nature. With reference to our example, it would be "watching" the wheels of the car turn and predicting the behavior of the car at different speeds and on different surfaces. In business terminology, decision support systems would do the following:
1.       Show the top-selling product in a particular region
2.       Alert the manager when a particular store sells a particular product below a certain quantity.
Since the output data in decision support systems is used for analytical purposes, it follows a different data model compared to traditional OLTP systems. This is called Dimensional Modeling.
We do need different types of decision support systems to provide strategic information. The type of information needed for strategic decision making is different from that available from operational systems. We need a new type of system environment for the purpose of providing strategic information for analysis, discerning trends, and monitoring performance.

Let us examine the desirable features and processing requirements of this new type of system environment. Let us also consider the advantages of this type of system environment designed for strategic information.

A New Type of System Environment

The desired features of the new type of system environment are:
1.       Database designed for analytical tasks - (achieved through Dimensional modeling)
2.       Data from multiple applications- (achieved through ETL tools like Informatica, Abinitio)
3.       Easy to use and conducive to long interactive sessions by users
4.       Read-intensive data usage
5.       Direct interaction with the system by the users without IT assistance (achieved through reporting  tools like Cognos )
6.       Content updated periodically and stable (achieved through Dimensional modeling)
7.       Content to include current and historical data (achieved through Dimensional modeling)
8.       Ability for users to run queries and get results online (achieved through reporting  tools like Cognos )
9.       Ability for users to initiate reports (achieved through reporting  tools like Cognos )

This new system environment that users desperately need to obtain strategic information happens to be the new paradigm of data warehousing. Enterprises that are building data warehouses are actually building this new system environment. This new environment is kept separate from the system environment supporting the day-to-day operations. The data warehouse essentially holds the business intelligence for the enterprise to enable strategic decision making.

Defining Business Requirements: Key to a successful data warehouse


The new methodology for determining requirements for a data warehouse system is based on business dimensions. It flows out of the need of the users to base their analysis on business dimensions. The new concept incorporates the basic measurements and the business dimensions along which the users analyze these basic measurements. Using the new methodology, you come up with the measurements and the relevant dimensions that must be captured and kept in the data warehouse. You come up with what is known as an information package for the specific subject.
Our primary goal in the requirements definition phase is to compile information packages for all the subjects for the data warehouse. Once we have firmed up the information packages, we'll be able to proceed to the other phases.
Essentially, information packages enable you to:
·         Define the common subject areas
·         Design key business metrics
·         Decide how data must be presented
·         Determine how users will aggregate or roll up
·         Decide the data quantity for user analysis or query
·         Decide how data will be accessed

The requirements gathering phase involves extensive client interaction: interviews between the vendor team and the senior management, middle management, business analysts and the IT department of the client. Executives will give you a sense of direction and scope for your data warehouse; they are the ones closely involved in the focus area. The key departmental managers are the ones who report to the executives in the area of focus. Business analysts are the ones who prepare reports and analyses for the executives and managers. The operational system DBAs and IT applications staff will give you information about the data sources for the warehouse.

How to enable Version Control in Informatica 9.5



1. Log into the Admin Console
2. Click on the repository > Properties
3. Under repository Properties > Edit (Operating mode)


4. Choose the mode as Exclusive and enable Version Control as shown


Static-Deployment Groups


1. Used to migrate code between two different repositories only.
2. Cannot be used to migrate code between folders of the same repository
3. Login to Repository manager.
4. Click on the repository Service
5. Go to Tools > Deployment > Groups




6. Create a Deployment group and add permissions
7. Click the repository object to be added to the deployment group.
8. Go to Tools > Add to Deployment group ..
9. Repeat the process for all repository objects across folders.
10. Drag and Drop the static deployment group to the target repository.

Informatica Installation on Linux- (Distributed Environment)

Challenges

1. Setting up the DISPLAY variable
2. Setting up environment variables for the Oracle client installation
3. Setting up kernel parameters
4. Downloading Oracle-specific packages
5. Installing the Oracle client on the Linux box
6. Making soft links to the shared objects

Risks

1. Firewalls on the Linux box need to be opened to install Oracle-specific packages using yum install <packagename>
2. Check whether the RS needs shared objects re-assigned via soft links at the Linux level every time the machine is restarted