Chapter-5
1. Introduction
The Normalizer transformation is used to normalize data i.e. convert one incoming row into multiple output rows. But what if you need to do the exact opposite?
Denormalizing data in Informatica is exact opposite what Normalizer transformation does where data is read in multiple incoming rows, aggregated and converted into columns. In order to Denormalize the data, there is no pre-built transformation available in Informatica Cloud and it needs to be implemented using a mapping logic.
2. Problem Statement
Consider below Student data as the source table
Source Data:
ID | Name | Subject | Score |
1 | Ram | Maths | 50 |
1 | Ram | Physics | 65 |
1 | Ram | Chemistry | 70 |
2 | Neena | Maths | 88 |
2 | Neena | Physics | 90 |
2 | Neena | Chemistry | 65 |
3 | John | Maths | 100 |
3 | John | Physics | 45 |
3 | John | Chemistry | 52 |
You want to denormalize the data into below structure
Target Data:
ID | Name | Maths | Physics | Chemistry |
1 | Ram | 50 | 65 | 70 |
2 | Neena | 88 | 90 | 65 |
3 | John | 100 | 45 | 52 |
3. Solution
To accomplish this follow below steps
1 Select the object with above mentioned data as source and pass the data to an expression transformation.
2. Make sure the data is sorted else sort data using Sorter transformation before passing it to expression transformation.
3. In the expression create an output field for each subject in the source data and assign the value as Score when the record belongs to that particular subject, else assign value as null.
O_Maths | IIF(Subject = ‘Maths’, Score, NULL) |
O_Physics | IIF(Subject = ‘Physics’, Score, NULL) |
O_Chemistry | IIF(Subject = ‘Chemistry’, Score, NULL) |
The output from expression will be as below
ID | Name | O_Maths | O_Physics | O_Chemistry |
1 | Ram | 50 | ||
1 | Ram | 65 | ||
1 | Ram | 70 | ||
2 | Neena | 88 | ||
2 | Neena | 90 | ||
2 | Neena | 65 | ||
3 | John | 100 | ||
3 | John | 45 | ||
3 | John | 52 |
4. In the next step pass the data from expression transformation to an Aggregator transformation.
5. In the Group By tab of aggregator, select ID and Name fields as Group by fields.
6. In the Aggregate tab of aggregator, create below output fields.
Maths | MAX(O_Maths) |
Physics | MAX(O_Physics) |
Chemistry | MAX(O_Chemistry) |
7. Enable Sorted Input in Advanced tab for a better performance in aggregator.
8. Pass the data from aggregator transformation to a target transformation and configure the target.
4. Approach
- When we compare the source data and the target data, it is evident that some of the fields in target (Maths, Physics, and Chemistry) are not present in the source transformation. So it is clear that these fields’ needs to be created using an expression transformation and the values to be assigned as per the subject.
- The output from expression transformation gives same number of rows as received from source with newly created fields where Score for each subject is present in only one row with rest of the subject scores as NULL.
- The introduction of Aggregator transformation is used to combine these records of each student into one record. Hence they are grouped by ID and Name fields. Depending on source data, either Max() or Sum() functions can be used to aggregate the data.
5. How to Denormalize data in Oracle using a SQL query?
The same approach that we applied in Informatica can be applied to denormalize the data in Oracle using a SQL query.
Consider the same source data with table name as Students. The students data can be Denormalized using below SQL query.
SELECT
a.ID,
a.Name,
MAX(a.Maths) AS Maths,
MAX(a.Physics) AS Physics,
MAX(a.Chemistry) AS Chemistry
FROM(
SELECT
ID,
NAME,
CASE WHEN Subject = 'Maths' THEN Score ELSE null END AS Maths,
CASE WHEN Subject = 'Physics' THEN Score Else null END AS Physics,
CASE WHEN Subject = 'Chemistry' THEN Score Else null END AS Chemistry
FROM Students) a
GROUP BY ID, Name
6. Denormalizing data using PIVOT function
Alternatively you can use PIVOT function in Oracle to denormalize the data to convert the rows into the columns. The PIVOT functions takes data in separate rows, aggregates it and converts it into columns.
The SQL query using PIVOT function to denormalize the data in Oracle is as below
SELECT * FROM Students
PIVOT (MAX(Score) FOR Subject IN ('Maths' AS Maths, 'Physics' AS Physics,'Chemistry' AS Chemistry))
ORDER BY ID
7. Conclusion
There is another way of viewing the denormalizing data where all the column values are concatenated using a delimiter and stored in a single column. This can be achieved by comparing current record with previous record and the final record for each ID will have the concatenated values of all the records with the same ID. This final record can be later filtered out using Aggregator. The method we discussed here in this article is different as we are converting the row values into separate columns instead of loading the concatenated field values into a single column.
The same solution can be implemented in a little shorter way:
For Informatica mapping:
1. No need to introduce Expression transformation, because Aggregator transformation supports conditional aggregation.
2. Create following 3 ports at Aggregator transformation (select ID and Name fields as Group by fields):
Maths —-> MAX(Score, Subject=‘Maths’)
Physics —> MAX(Score, Subject=‘Physics’)
Chemistry —> MAX(Score, Subject=‘Chemistry’)
And map these ports to the Target columns.
Similarly, in SQL query:
SELECT
ID,
Name,
MAX(CASE WHEN Subject = ‘Maths’ THEN Score END) AS Maths,
MAX(CASE WHEN Subject = ‘Physics’ THEN Score END) AS Physics,
MAX(CASE WHEN Subject = ‘Chemistry’ THEN Score END) AS Chemistry
FROM Students
GROUP BY ID, Name
Hi Sachin,
Your solution is correct too. But think from the perspective of explaining some one new to concept.
Breaking down each step using expression transformation helps visualize data flow and understand better.
Once the concept is understood, it is up to you to optimize the code.
Thats exactly y ThinkETL is best
Best wishes !!
Thanks ThinkETL team for providing the detailed information.
Thank Abhijit. Glad it helped!!