1. Joiner Transformation
Joiner transformation is an active and connected transformation in Informatica Cloud(IICS). It is used to join data coming from two different data streams based on the condition and the join type defined in the transformation. The advantage of joiner transformation is it is able to join data coming from heterogeneous(different) systems.
If there are n sources, the number of Joiner transformations required to join the data is n-1.
2. Joiner Transformation Properties
When you configure a Joiner transformation, you define Joiner properties on the following tabs of the Properties panel:
- Incoming Fields tab: Configure the Master and Detail source fields and resolve field name conflicts if any.
- Join Condition tab: Configure the Join Type and Join Conditions to join the data from Master and Detail pipelines
- Advanced tab: Configure advanced properties like Sorted Input from the advanced tab.
3. Incoming Fields in Joiner transformation
Joiner transformation consists of two default input groups – Master group and Default group. When you link a transformation to the Joiner transformation, you have to connect it either Master or Detail group.
During execution, the Master source is cached into the memory for joining purpose. So it is recommended to select the source with less number of records as the Master source. The Joiner transformation then compares each row of the Master source from cache against the Detail source.
When there are fields with same name coming from Master and Detail group, the cloud mapping designer generates a Field Name Conflict error. You can either resolve the conflict by renaming the fields in the upstream transformation itself or you can create a separate field rule in Joiner transformation to Bulk Rename fields by adding a prefix or a suffix to all incoming fields.
4. Join Condition in Joiner transformation
The Join Condition in Joiner transformation represents the condition upon which the rows coming from two sources must be matched. You can create one or more conditions which define the relationship between both the sources.
The join condition in the Joiner transformation can be completely parameterized for which values should be passed during the mapping run or while configuring the Mapping task.
5. Join Type in Joiner transformation
Joiner transformation supports four different types of joins
- Normal Join
Includes only rows with matching join conditions. Discards rows that do not match the join conditions.
- Master Outer
Includes all rows from the Detail pipeline and the matching rows from the Master pipeline. It discards the unmatched rows from the Master pipeline.
- Detail Outer
Includes all rows from the Master pipeline and the matching rows from the Detail pipeline. It discards the unmatched rows from the Detail pipeline.
- Full Outer
Includes rows with matching join conditions and all incoming data from the Master pipeline and Detail pipeline.
The Joiner transformation does not match null values. To join rows with null values, you can replace null values with default values, and then join on the default values.
6. Sorted Input in Joiner transformation
Under Advanced tab of Joiner transformation, Sorted Input option can be enabled which increases the performance. By enabling Sorted Input, we are specifying the Joiner transformation that the data coming from Master and Detail pipeline is sorted. Since the rows are sorted, the Master rows are not compared against all the Detail source rows minimizing the disk input and output.
If Sorted Input option is enabled and the data passed to Joiner transformation is not sorted, mapping throws error.
7. How Sorted Input improves performance in Joiner Transformation?
If the data is not sorted, even though the join condition is met in the initial rows, the master row is compared against all the detail source rows because there could be other rows from detail that might satisfy the join condition.
If the data is sorted, when a master row is compared against the details rows, all the detail rows which satisfy the join condition would be in order. Once after the join condition is met and the same master row is compared against subsequent rows which do not satisfy the condition, the Integration Service identifies that there won’t be any matching rows further. Hence it skips joining with the subsequent rows and passes the so far joined data to the downstream transformation decreasing the disk I/O there by increasing the performance.