What is Slowly Changing Dimension (SCD) Type-2?
A Slowly Changing Dimension Type-2 is used to maintain complete history of a record in the target. When the value of chosen attribute(s) of the record changes, the record is made inactive and a new record is created with modified data values and is considered as active record.
The implementation of the SCD Type-2 mapping in Informatica Cloud involves
- Identifying the new record and inserting it.
- Identifying the existing record and making it INACTIVE if there is a change in the data. The record with updated data is inserted as a new record in the table and is considered ACTIVE.
- Identifying the existing record and dropping it in the mapping if there is no change in data.
Setting up Source and Target data objects for Demonstration
For the demonstration purpose consider a flat file employees.csv as a source which provides employees information.
EMP_ID,EMP_NAME,EMP_SALARY,DEPT_ID
100,Jennifer,6000,10
101,Micheal,8000,10
The data should be loaded into the target table Dim_Employees. The structure of the table is as follows.
CREATE TABLE Dim_Employees
(
Employee_Key Number(6,0),
Employee_Id Number(6,0),
Name Varchar2(20),
Salary Number(8,2),
Department_Id Number(4,0),
Start_Date Date,
End_Date Date,
Checksum Varchar2(50)
)
;
- Employee_Key is the surrogate key which increments the value plus one for each record inserted into the table.
- Every record has a CHECKSUM value which is unique sequence representing the data of the record. For any change in the data, the checksum value differs and helps in identifying if there is any change in the record data.
- For every record which is inserted, Start_Date field is loaded with system date value and the End_Date field is loaded with maximum date value which is ‘9999-12-31’ to represent it as an ACTIVE record.
- When the record data is modified, the End_Date field of the record is updated with system date value to make it INACTIVE.
Steps to Create SCD Type-2 Mapping
The below mapping illustrates the design of SCD Type-2 implementation in Informatica Cloud Data Integration.
Follow below steps to create a SCD Type-2 mapping in Informatica Cloud.
1. Select the Source Object
In the source transformation, select the employees.csv as the source object. Under Formatting Options, select the delimiter and other flat file related properties. Verify if the source fields are properly read under Fields section.
2. Create Output Fields and Checksum Value
Pass the data from source transformation to the expression transformation.
- All the fields read from flat file will be of type String. These fields should be converted to the respective data types defined as in the target dimension table.
- Calculate the Checksum value using a MD5 function by passing the required source fields in the expression transformation.
O_CHECKSUM = MD5(EMP_ID||EMP_NAME||EMP_SALARY||DEPT_ID)
3. Look up on Target Object based on Natural Key
Pass the data from expression to the Lookup transformation. Select only the active records from Dim_Employees table using a SQL query as shown below.
SELECT
Employee_Key AS Lkp_Employee_Key ,
Employee_Id AS Lkp_Employee_Id ,
Checksum AS Lkp_Checksum
FROM Dim_Employees
Where End_Date='9999-12-31'
Under Lookup Condition tab, select the condition based on employee id fields from source and lookup objects as shown below.
4. Flag data based on output from Lookup
Based on output from lookup transformation, the data needs to be flagged for either Insert or Insert-Update.
- If there is no matching record found for the employee_id read from source in the lookup object, flag the record for Insert.
- If there is a matching record found for the employee_id read from source in the lookup object and the data is changed, flag the record for Insert-Update.
- If there is a matching record found for the employee_id read from source in the lookup object and the data is unchanged, ignore it.
Pass the data from Lookup to an Expression transformation and create an output field Flag and assign the below field expression to flag the records.
IIF( ISNULL( Lkp_Employee_Key ), 'I',
IIF( O_Checksum != Lkp_Checksum, 'IU' )
)
5. Route data to different targets based on Flag
Create two output groups in the Router transformation to route the date to two different targets based on the flag value for Insert (Flag=’I’) and Insert_Update (Flag=’IU’) operations as shown below.
Route the data from Router transformation to three different expression transformations one from Insert output group and two from Insert_Update output group before mapping to target.
6. Configure Sequence Generator for generating Surrogate Keys
A Sequence Generator transformation could be used for generating surrogate keys for the target dimension table. Pass the data from sequence generator to the expression transformation linked to the Insert group of the Router transformation.
The following fields are created in the expression transformation linked from the sequence generator
- An output field O_EMPLOYEE_KEY of type bigint and assign the field value as NEXTVAL coming from sequence generator.
- An output field O_START_DATE of type date/time and assign the system variable SYSDATE as value.
- An output field O_END_DATE of type date/time and assign the date value as ‘9999-12-31’.
If the surrogate key value generation is handled at the database level, there is no need to use the Sequence generator transformation.
7. Configure Target for Insert Operations
Pass the data from expression configured in earlier step to a target transformation. Select the dimension table Dim_Employees as target object with operation defined as Insert.
Under Field Mapping section of the target transformation, map the output fields created in the mapping to the respective target fields.
8. Configure Target for Update Operation of existing record
In one of the expression transformation to which data is routed from Insert_Update group of the router transformation,
- Create an output field O_EMPLOYEE_KEY and assign the LKP_EMPLOYEE_KEY field as the value which is read from lookup.
- Create an output field O_END_DATE and assign SYSDATE as the value.
Pass the data from expression to a target transformation. Select the dimension table Dim_Employees as target object with operation defined as Update. Select the EMPLOYEE_KEY as the Update column. It is the column based on which record is identified in the target and updated.
Under Field Mapping section of the target transformation, map the output fields created in the mapping to the respective target fields. Note that we need to only update the End_Date field to make the record INACTIVE. Employee_Key is mapped to identify the record to update.
9. Configure Target for Insert Operation of existing record
The process is similar to what we discussed in steps 6 and 7. Pass the data from sequence generator to another expression transformation which is linked to the Insert_Update output group of the Router transformation.
Create the same output fields discussed in step 6 in the expression transformation and pass the data to the target.
The target configuration is exactly same as what we discussed under step 7. Validate, Save and trigger the mapping.
Verifying data in dimension table after the initial run
The data in the Dim_Employees table after the source data is processed is as below.
Since there is no data in the dimension table during the initial run, both the records are flagged for Insert and the data is inserted. The Start_Date value is loaded with system date value and End_Date is loaded with maximum date value as defined in the mapping.
Verifying data in dimension table after the second run with modified source data
Consider the salary of employee with id 100 is modified to 9000 from 6000. The other employee data remains unchanged.
EMP_ID,EMP_NAME,EMP_SALARY,DEPT_ID
100,Jennifer,9000,10
101,Micheal,8000,10
After the data is processed by Informatica, the data in the dimension table is as below.
- The initial record of employee with id 100 is made INACTIVE by updating the End_Date field with system date value. A new record of employee with id 100 is inserted with updated salary information with Employee_Key=3.
- Note: Next time when we lookup for data in Dim_Employees table, we look up only on records with key values 2 and 3 which are ACTIVE.
- The information of employee with id 101 remains unchanged as there is no change in data.
Conclusion
Please note the below best practices to be followed while developing the mapping.
- Follow proper naming convention while naming transformation and fields. Observe that all the output fields in the mapping demonstrated are prefixed with ‘O_’ and the lookup fields with ‘LKP_’
- If there are multiple fields in the dimension table for which change in data needs to be tracked, make sure to include a checksum field in the table as it makes easy to track the changes in the data.
- In the target transformation, limit the incoming fields by only selecting only the fields with prefix ‘O_’ using Field Rules. This eases the process of field mapping by allowing Automap feature as only required fields are allowed into the target.
Subscribe to our Newsletter !!
Related Articles:
SCD Type-2 Implementation using Dynamic Lookup transformation in Informatica Cloud Data Integration
A complete guide to implementing SCD Type-1 mapping in Informatica Cloud Data Integration
A complete guide to implementing SCD Type-2 mapping in Informatica Cloud Data Integration
There was a correction here
4. Flag data based on output from Lookup
IIF( ISNULL( Lkp_Employee_Key ), ‘I’, //bug
IIF( O_Checksum != Lkp_Checksum, ‘IU’ )
)
Below is the correct one
IIF( ISNULL( Lkp_Employee_id ), ‘I’,
IIF( O_Checksum != Lkp_Checksum, ‘IU’ )
)
Don’t think so. It doesn’t matter if use key or Id. We are already verifying the data in target based on Employee_Id in Lookup transformation.
Will the source read all the records every time when it runs? or only the new or changed records ?
ho wis the source going to identify the yesterday load and today’s?
You need apply the incremental data loading logic while reading data from your source.
Refer: https://thinketl.com/incremental-data-loading-in-informatica-cloud-iics/