1. Rank Transformation
Rank Transformation is an active and connected transformation in Informatica Cloud (IICS). It helps in identifying largest or smallest set of values in a group of data by filtering the data and ranking the values.
Let us understand how Rank transformation works with an example.
Problem Statement:
Consider we have EMPLOYEES table which holds the salary information of employees from multiple departments. The requirement is to find the employee details with highest salary from each department.
SQL Solution:
The above requirement can be fulfilled using DENSE_RANK Analytical function in SQL as shown below.
WITH CTE AS(
SELECT E.*,
DENSE_RANK() OVER (PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC) AS RANK_INDEX
FROM EMPLOYEES E
)
SELECT * FROM CTE WHERE RANK_INDEX =1
ORDER BY DEPARTMENT_ID, RANK_INDEX;
Let us understand how the same can be achieved using Rank transformation in Informatica Cloud.
2. Rank Transformation Properties
When you configure a Rank transformation, you define rank properties on the following tabs of the Properties panel:
- Rank tab: Configure the field to rank by, specify the rank order, and specify number of rows to rank by.
- Group By tab: Configure Group by Fields to define how to group data for ranking values.
3. Defining Rank Properties
The following properties should be configured in Rank tab of Rank transformation.
- Rank By: The field on which the data to be ranked (
SALARY
). - Rank Order: The order of ranking, Top or Bottom (
ORDER BY SALARY DESC
). - Number of Rows: The number of rows to be ranked from source data (
RANK_INDEX =1
).
The below image shows that the data is ranked on SALARY field to extract the top most record.
4. Defining Rank Group By Properties
The Group By tab in Rank transformation lets you define the fields on which the data to be grouped to apply the rank properties. Select one or more incoming fields as Group By Fields. When multiple fields are selected as Group By fields, the data grouping is done in the order in which the fields are selected.
The below image shows that the data is grouped based on the DEPARTMENT_ID field.
If no Group By field is configured,
the entire data is considered as one single set of data and ranking is performed.
5. RANKINDEX in Rank Transformation
The Rank transformation creates an output field named RANKINDEX to store the rank index value calculated for each row in a group. The RANKINDEX field appears on the Incoming Fields tab of the downstream transformation.
The below image shows the RANKINDEX field coming from the Rank transformation in the Incoming Fields of target transformation.
6. How Rank Transformation works?
Data Integration creates the following caches for the Rank transformation:
- Index Cache: Stores group values as configured in the Group By fields.
- Data Cache: Stores row data based on the Group By fields.
When you run a mapping that contains a Rank transformation, Data Integration creates an Index cache on Group By fields and data cache on Rank By field value. For every input row that out-ranks a cached row, Data Integration replaces the cached value with the input row value. If you configure the Rank transformation to rank across multiple groups, Data Integration ranks incrementally for each group that it finds.
For example, consider below employees data. Let us understand how data is processed through rank transformation to calculate max salary of a department.
EMPLOYEE_ID | SALARY | DEPARTMENT_ID |
101 | 12000 | 1001 |
102 | 14000 | 1001 |
103 | 10000 | 1001 |
104 | 13000 | 1002 |
105 | 11000 | 1002 |
When the initial row is processed through Rank Transformation, the Data Integration Service creates an Index cache on DEPARTMENT_ID 1001 and stores the MAX salary value of department as 12000 in data cache.
When the second row is processed through Rank Transformation, the Data Integration Service compares the existing cached value with current row value and overwrites the MAX salary value of department as 14000 in data cache.
When third row is processed through Rank Transformation, the Data Integration Service compares the existing cached value with current row value and as the current row value is less than cached value, the cached value in data cache 14000 is retained.
When the fourth row is processed through Rank Transformation, the Data Integration Service creates a new Index cache on DEPARTMENT_ID 1002 and the process continues.
When you run a mapping that contains a Rank transformation, Data Integration caches input data until it performs the rank calculations.
Subscribe to our Newsletter !!
Related Articles:
A complete guide to the Informatica Cloud (IICS) transformations and their functionalities.
Lookup transformation is used to look up a database table or a flat file based on a condition and retrieve value from the lookup.
Union Transformation in IICS is used to read data from multiple pipelines and merge data into a single pipeline.