Rank Transformation in Informatica Cloud (IICS)

Spread the love

1. Rank Transformation

Rank Transformation is an active and connected transformation in Informatica Cloud (IICS). It helps in identifying largest or smallest set of values in a group of data by filtering the data and ranking the values.

Let us understand how Rank transformation works with an example.

Problem Statement:

Consider we have EMPLOYEES table which holds the salary information of employees from multiple departments. The requirement is to find the employee details with highest salary from each department.

SQL Solution:

The above requirement can be fulfilled using DENSE_RANK Analytical function in SQL as shown below.

WITH CTE AS(
SELECT E.*,
  DENSE_RANK() OVER (PARTITION BY DEPARTMENT_ID ORDER BY SALARY DESC) AS RANK_INDEX
FROM EMPLOYEES E
)
SELECT * FROM CTE WHERE RANK_INDEX =1
ORDER BY DEPARTMENT_ID, RANK_INDEX;

Let us understand how the same can be achieved using Rank transformation in Informatica Cloud.

2. Rank Transformation Properties

When you configure a Rank transformation, you define rank properties on the following tabs of the Properties panel:

  • Rank tab: Configure the field to rank by, specify the rank order, and specify number of rows to rank by.
  • Group By tab: Configure Group by Fields to define how to group data for ranking values.

3. Defining Rank Properties

The following properties should be configured in Rank tab of Rank transformation.

  • Rank By: The field on which the data to be ranked (SALARY).
  • Rank Order: The order of ranking, Top or Bottom (ORDER BY SALARY DESC).
  • Number of Rows: The number of rows to be ranked from source data (RANK_INDEX =1).

The below image shows that the data is ranked on SALARY field to extract the top most record.

Defining Rank Properties
Defining Rank Properties

4. Defining Rank Group By Properties

The Group By tab in Rank transformation lets you define the fields on which the data to be grouped to apply the rank properties. Select one or more incoming fields as Group By Fields. When multiple fields are selected as Group By fields, the data grouping is done in the order in which the fields are selected.

The below image shows that the data is grouped based on the DEPARTMENT_ID field.

Defining Group By Fields in Rank Transformation
Defining Group By Fields in Rank Transformation

If no Group By field is configured,
the entire data is considered as one single set of data and ranking is performed.

5. RANKINDEX in Rank Transformation

The Rank transformation creates an output field named RANKINDEX to store the rank index value calculated for each row in a group. The RANKINDEX field appears on the Incoming Fields tab of the downstream transformation.

The below image shows the RANKINDEX field coming from the Rank transformation in the Incoming Fields of target transformation.

RANKINDEX field in Rank Transformation
RANKINDEX field in Rank Transformation

6. How Rank Transformation works?

Data Integration creates the following caches for the Rank transformation:

  • Index Cache: Stores group values as configured in the Group By fields.
  • Data Cache: Stores row data based on the Group By fields.

When you run a mapping that contains a Rank transformation, Data Integration creates an Index cache on Group By fields and data cache on Rank By field value. For every input row that out-ranks a cached row, Data Integration replaces the cached value with the input row value. If you configure the Rank transformation to rank across multiple groups, Data Integration ranks incrementally for each group that it finds.

For example, consider below employees data. Let us understand how data is processed through rank transformation to calculate max salary of a department.

EMPLOYEE_IDSALARYDEPARTMENT_ID
101120001001
102140001001
103100001001
104130001002
105110001002

When the initial row is processed through Rank Transformation, the Data Integration Service creates an Index cache on DEPARTMENT_ID 1001 and stores the MAX salary value of department as 12000 in data cache.

When the second row is processed through Rank Transformation, the Data Integration Service compares the existing cached value with current row value and overwrites the MAX salary value of department as 14000 in data cache.

When third row is processed through Rank Transformation, the Data Integration Service compares the existing cached value with current row value and as the current row value is less than cached value, the cached value in data cache 14000 is retained.

When the fourth row is processed through Rank Transformation, the Data Integration Service creates a new Index cache on DEPARTMENT_ID 1002 and the process continues.

When you run a mapping that contains a Rank transformation, Data Integration caches input data until it performs the rank calculations.

Subscribe to our Newsletter !!

Related Articles:

  • Informatica Cloud Data Integration (IICS) Transformations Guide

    A complete guide to the Informatica Cloud (IICS) transformations and their functionalities.

    READ MORE

  • Lookup Transformation in Informatica Cloud (IICS)

    Lookup transformation is used to look up a database table or a flat file based on a condition and retrieve value from the lookup.

    READ MORE

  • Union Transformation in Informatica Cloud (IICS)

    Union Transformation in IICS is used to read data from multiple pipelines and merge data into a single pipeline.

    READ MORE

Leave a Comment

Related Posts