Informatica Cloud Data Integration Elastic (CDI-E)

1. Introduction

Informatica Cloud Data Integration is a cloud platform that allows users to seamlessly connect, integrate, and transform data across various sources and targets in cloud and hybrid environments.

Informatica Cloud Data Integration Elastic, also known as Advanced Cloud Data Integration, extends the capabilities of Informatica Cloud Data Integration with more scalable and flexible features for complex data integration scenarios.

In this article, let us understand what Cloud Data Integration Elastic is, how it differs from Cloud Data Integration, and what its features and benefits are.

2. What is Informatica Cloud Data Integration Elastic?

Informatica Cloud Data Integration Elastic (CDI-E) enables you to process your data integration jobs using a serverless Spark engine running on a Kubernetes cluster. With Cloud Data Integration Elastic, users do not have to manage Spark themselves. The Elastic Secure Agent converts mappings into Spark code and runs them on advanced clusters on the cloud platform of your choice. This lets you run your data integration jobs in multiple clouds.

In CDI-Elastic, the Secure Agent resides in customer-managed infrastructure, and data is processed on automated infrastructure whose life cycle is managed by Informatica.

3. Difference between CDI and CDI-Elastic

Below are the differences between Cloud Data Integration and Cloud Data Integration Elastic.

CDI vs CDI-Elastic

3.1. Infrastructure Management

In Cloud Data Integration, you are responsible for managing the underlying compute resources required for data integration tasks. You need to ensure that the infrastructure can handle the expected data volumes and processing demands efficiently.

Cloud Data Integration Elastic takes a serverless approach to infrastructure management. You don’t need to worry about configuring or scaling compute resources manually. The platform dynamically allocates and manages resources based on workload demands.

3.2. Compute Engine

In Cloud Data Integration, the Secure Agent orchestrates and executes data integration jobs on the Data Integration Server engine. This engine is optimized for a wide range of data integration tasks, providing versatility and reliability.

In Cloud Data Integration Elastic, the Secure Agent takes advantage of an advanced Spark serverless engine. This engine is specifically designed to handle large-scale data processing tasks with efficiency and scalability.

3.3. Workload Data Volume

Cloud Data Integration is well-suited for low to medium workloads. It efficiently manages data integration tasks involving moderate data volumes and complexity.

Cloud Data Integration Elastic is well-suited to handle medium to high workloads. It excels in scenarios where large data volumes need to be processed, transformed, and integrated.

4. Execution Life cycle of Cloud Data Integration Elastic

Cloud Data Integration Elastic executes jobs using Apache Spark on Kubernetes. This is referred to as the IICS-based Spark serverless solution.

Apache Spark is an open-source, powerful data processing framework used for big data workloads. Spark features a distributed processing model, where tasks are divided and executed across multiple computers or nodes in a cluster, allowing for parallel computation and enabling scalability.
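The idea of dividing a task and executing the pieces in parallel can be illustrated with a small sketch. This is not Spark code and not how CDI-Elastic is implemented; it uses only the Python standard library, with worker threads standing in for cluster nodes. The function names and the choice of four partitions are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(rows, num_partitions):
    """Divide rows into roughly equal partitions, one per worker."""
    return [rows[i::num_partitions] for i in range(num_partitions)]


def process_partition(partition):
    """Per-partition task: here, simply sum the values in the partition."""
    return sum(partition)


def parallel_sum(rows, num_partitions=4):
    """Sum rows by processing each partition on its own worker."""
    partitions = split_into_partitions(rows, num_partitions)
    # Each worker thread stands in for a node in the cluster.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partial_results = list(pool.map(process_partition, partitions))
    # Combine the partial results into the final answer.
    return sum(partial_results)


print(parallel_sum(list(range(1, 101))))  # prints 5050
```

Adding more partitions (and workers) is the analogue of adding nodes: the per-partition work shrinks while the combine step stays the same.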

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. It also creates and manages clusters of computers to ensure your apps run reliably and efficiently.

Containerized applications are applications that run in isolated packages of code called containers. Containers are a way to package and distribute code, making it easy to move and run applications consistently across different servers, environments, and platforms.

Elastic Jobs running on Spark cluster

The execution life cycle of Cloud Data Integration Elastic involves the following stages:

4.1. Setting up IICS Cloud Environment

Set up your cloud environment so that the Secure Agent can connect to and access cloud resources and also create and deploy an elastic cluster.

4.2. Configuring the Kubernetes Cluster (Advanced Cluster)

The Kubernetes cluster, referred to as the “Advanced Cluster” within IICS, is configured by the Administrator. Navigate to Administrator > Advanced Clusters and set up an Advanced Configuration, which is a set of properties that define the resources you provision to create an advanced cluster.

Advanced Cluster creation is supported on AWS, Azure, and GCP.

4.3. Designing and Executing Mappings

The user creates an Elastic Mapping within Informatica Cloud Data Integration that encapsulates the business logic, and submits the job for execution. Alternatively, an existing mapping can be triggered in Advanced Mode. This creates an Elastic copy of the mapping (if supported), which can then be run on the Spark engine.

4.4. Transformation to Spark Code and Kubernetes Deployment

The Cloud Data Integration Elastic Secure Agent converts the mapping logic into deployable Spark code. Then, through the Cluster Creation Service, it creates and starts the Kubernetes cluster based on the Advanced Configuration and automatically pushes the Spark code to the cluster for processing.

Throughout data processing on the Advanced Cluster, the data always stays within the customer’s VPC.

4.5. Scaling up Nodes in the Cluster

The Elastic Secure Agent ensures optimal resource utilization by dynamically scaling the cluster in response to demand and consumption patterns. When additional jobs are submitted for execution, the agent intelligently identifies the requirement for more resources and takes action by introducing new nodes to the cluster.
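To make the scaling decision concrete, here is a minimal sketch of how a target node count could be derived from the current workload. It is a simplified model, not Informatica's actual algorithm: the parameter names (`pending_jobs`, `jobs_per_node`, `min_nodes`, `max_nodes`) are hypothetical, and real scaling also depends on the resources each job requests.

```python
import math


def desired_worker_nodes(pending_jobs, jobs_per_node, min_nodes, max_nodes):
    """Return a target worker-node count for the given load.

    Hypothetical model: each node can run `jobs_per_node` jobs at once,
    and the cluster must stay within [min_nodes, max_nodes].
    """
    if jobs_per_node <= 0:
        raise ValueError("jobs_per_node must be positive")
    if min_nodes > max_nodes:
        raise ValueError("min_nodes must not exceed max_nodes")
    needed = math.ceil(pending_jobs / jobs_per_node)
    # Clamp the demand-driven count to the configured bounds.
    return max(min_nodes, min(needed, max_nodes))


# More submitted jobs lead to a larger target, up to the maximum.
for jobs in (0, 5, 50):
    print(jobs, "->", desired_worker_nodes(jobs, jobs_per_node=2,
                                           min_nodes=1, max_nodes=10))
```

The upper bound is what keeps cost predictable: demand beyond it waits for capacity instead of adding nodes without limit.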

4.6. Monitoring Spark Job Logs and Cluster Health

The logs generated during the execution of elastic jobs on the Spark engine are securely stored in the cloud location configured within the Advanced Configuration. Because the nodes running Spark jobs are transient (short-lived), it is mandatory to configure a suitable log storage location based on the chosen cloud platform (e.g., Amazon S3 for AWS, ADLS Gen2 Storage for Microsoft Azure).
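A quick sanity check on a log location can catch a mismatch between platform and storage type before a job runs. The sketch below is a hypothetical helper, not part of IICS. The URI schemes it expects (`s3` for Amazon S3, `abfs`/`abfss` for ADLS Gen2) are assumptions about how such locations are commonly written, so confirm the exact format your Advanced Configuration requires.

```python
from urllib.parse import urlparse

# Assumed URI schemes per platform; confirm against your platform's documentation.
EXPECTED_SCHEMES = {
    "aws": {"s3"},               # Amazon S3
    "azure": {"abfs", "abfss"},  # ADLS Gen2 Storage
}


def check_log_location(platform, location):
    """Return a list of problems with a log location (empty if none found)."""
    schemes = EXPECTED_SCHEMES.get(platform.lower())
    if schemes is None:
        return [f"no scheme rule defined for platform {platform!r}"]
    problems = []
    parsed = urlparse(location)
    if parsed.scheme not in schemes:
        problems.append(
            f"expected scheme in {sorted(schemes)}, got {parsed.scheme!r}"
        )
    if not parsed.netloc:
        problems.append("missing bucket or container name")
    return problems


# Example locations are placeholders, not real storage accounts.
print(check_log_location("aws", "s3://example-bucket/cdie/logs"))
print(check_log_location("azure", "/tmp/logs"))
```

A local path such as `/tmp/logs` is flagged on both counts, which reflects the point above: logs written to a short-lived node disappear with the node.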

Additionally, the Secure Agent provides real-time reporting on the status of Spark jobs and cluster statistics, which users can monitor from the IICS Monitor service.

4.7. Scaling down Nodes and Cluster Deletion

Upon the completion of submitted jobs, the Elastic Secure Agent initiates the scaling down of nodes within the cluster. This process ensures that resources are optimally allocated and reduces unnecessary overhead. Ultimately, when all submitted jobs are successfully executed and resources are no longer needed, the cluster resources are deleted.

The agent restarts the cluster when another elastic job is submitted.
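The start, delete, and restart behaviour described in this stage can be modelled as a tiny state machine. This is a simplified simulation under stated assumptions, not the agent's real logic: the class and method names are hypothetical, and the cluster is deleted the moment the last job finishes, whereas a real deployment may wait for an idle period first.

```python
class ElasticClusterSim:
    """Toy model of a cluster that exists only while jobs are running."""

    def __init__(self):
        self.running = False
        self.active_jobs = 0
        self.events = []

    def submit_job(self):
        # A job arriving at a stopped cluster triggers a (re)start.
        if not self.running:
            self.running = True
            self.events.append("cluster started")
        self.active_jobs += 1

    def complete_job(self):
        if self.active_jobs == 0:
            raise RuntimeError("no active jobs to complete")
        self.active_jobs -= 1
        # When the last job finishes, the cluster resources are deleted.
        if self.active_jobs == 0:
            self.running = False
            self.events.append("cluster deleted")


sim = ElasticClusterSim()
sim.submit_job()
sim.submit_job()
sim.complete_job()
sim.complete_job()  # last job done: cluster is deleted
sim.submit_job()    # a new job restarts the cluster
print(sim.events)
```

The event list makes the cost model visible: the cluster exists only between a "cluster started" and the following "cluster deleted" entry.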

Execution Life cycle of Cloud Data Integration Elastic

This well-defined sequence of stages ensures the efficient, scalable, and seamless execution of jobs in Cloud Data Integration Elastic.

5. Why is Cloud Data Integration Elastic needed?

Below are some of the scenarios which can help you understand the need for Cloud Data Integration Elastic.

5.1. Handling Resource Intensive Jobs

Consider a scenario where you have resource-intensive jobs that process large volumes of data in Cloud Data Integration. If the overall processing time is too long with the existing configuration of the server on which the Secure Agent is installed, the server can be upgraded to improve job performance.

However, this increases server maintenance costs, and the resources sit under-utilized once the resource-intensive jobs are completed.

5.2. Parallel Processing of Jobs

Consider a scenario where multiple jobs are submitted for execution during a time window in Cloud Data Integration. In order to enable parallel processing of these jobs, a secure agent group with multiple agents could be configured. This enables the execution of tasks distributed across various agents in the secure agent group.

However, the challenges with this approach are:

  • The Secure Agent must be installed on multiple servers.
  • All servers must be up and running irrespective of the number of tasks running at the moment, as they cannot be dynamically scaled up and down.
  • The user must make sure that the various components used by the IICS tasks, such as parameter files, source files, scripts, and the libraries used by scripts, are available and maintained consistently across all the servers in the group.
  • A Data Integration job can run on only one server at a time and cannot use the compute resources of the other servers in the Secure Agent group, even when they are idle.

These challenges can be resolved using Cloud Data Integration Elastic.

6. Benefits of Cloud Data Integration Elastic

CDI-Elastic brings all the capabilities of CDI along with the advanced data processing capabilities of the Spark engine. Below are the benefits of Cloud Data Integration Elastic.

6.1. Auto Scaling

Cloud Data Integration Elastic offers automatic scaling of resources based on demand and consumption. Auto scaling maintains performance without manual intervention, providing cost savings and improved processing times.

6.2. Spark Serverless Compute Engine

Cloud Data Integration Elastic leverages the advanced Spark serverless compute engine to process large volumes of data with high concurrency. The Elastic Secure Agent allows you to take full advantage of Spark’s advanced processing capabilities without the need to manage the underlying infrastructure.

6.3. Multi-Cloud Support

Cloud Data Integration Elastic offers support for multiple cloud platforms, including AWS, Azure, and GCP. This multi-cloud compatibility provides you with the flexibility to choose the cloud environment that best suits your organization’s needs and strategies.

6.4. Controlled Cost

With Cloud Data Integration Elastic, you gain better control over costs. The auto-scaling feature ensures that you only use the necessary resources when needed, preventing overprovisioning and unnecessary expenses.

6.5. Simplified Monitoring

Cloud Data Integration Elastic provides a simplified and centralized monitoring mechanism. You can easily track the status of your data integration jobs, monitor cluster performance, and review Spark job logs, all from a unified interface.
