Indirect loading of AWS S3 files using Informatica Cloud (IICS)

1. Introduction

Indirect file loading is the process of loading data from multiple source files that share the same structure and properties into a single target, through a single mapping, in a single session run.

We covered the difference between Direct and Indirect loading, along with guidelines for indirect loading of flat files on Linux and Windows machines through IICS, in detail in the article below.

Article: Indirect File Loading in Informatica Cloud (IICS)

In this article, let us discuss how to perform Indirect loading of files from an AWS S3 bucket using Informatica Cloud (IICS).

2. Requirements for Indirect loading of AWS S3 files

The following requirements must be met to read multiple files from the same AWS S3 bucket and write them to a single target from Informatica Cloud (IICS):

  • All files must share the same file structure and properties.
  • Every file, along with its path, must reside in the AWS S3 bucket specified in the Source Connection.
  • A .manifest file must be created that lists every source file name with its absolute path or directory path.
  • The manifest file must be placed in the same S3 bucket as the source files.
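
The first requirement can be checked before a run. The snippet below is a minimal sketch, not part of IICS: it assumes the source files are CSV files with a header row and that their contents have already been read into memory. The function name `headers_match` and the sample keys are hypothetical.

```python
import csv
import io


def headers_match(files):
    """Return True when every CSV text in `files` has the same header row.

    `files` maps a (hypothetical) S3 key to the text content of that file.
    """
    headers = set()
    for text in files.values():
        reader = csv.reader(io.StringIO(text))
        # Use an empty header for empty files so they are flagged as different.
        headers.add(tuple(next(reader, [])))
    return len(headers) <= 1


# Hypothetical sample data standing in for files downloaded from the bucket.
sample = {
    "sales/file_1.csv": "id,name\n1,a\n",
    "sales/file_2.csv": "id,name\n2,b\n",
}
print(headers_match(sample))
```

A check like this only compares header rows. It does not verify delimiters, encodings, or data types, which are also part of the "same properties" requirement.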

Let us discuss in detail about the manifest file.

3. Manifest File

Instead of passing the actual source files as the source, we pass a .manifest file that holds the names of all source files along with their directory paths. Hence this load type is referred to as Indirect loading.

You must specify the manifest file name in the following format: <file_name>.manifest

The manifest file is a JSON file. Its contents are as shown below.

<file_name>.manifest


{
	"fileLocations": [{
		"URIs": [
			"<directory path>/file_1.csv",
			"<directory path>/file_2.csv"
		]
	}, {
		"URIPrefixes": [
			"<AWS S3 bucket Name>/"
		]
	}],
	"settings": {
		"stopOnFail": "true"
	}
}
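
A manifest with this layout can also be generated rather than typed by hand. The following is a minimal sketch in Python that only builds the JSON text. The function name `build_manifest`, the `sales/` paths and the `my-bucket/` prefix are hypothetical placeholders, and uploading the result to S3 as `<file_name>.manifest` is left out.

```python
import json


def build_manifest(uris, uri_prefixes):
    """Return manifest JSON text in the layout shown in this article."""
    manifest = {
        "fileLocations": [
            {"URIs": list(uris)},
            {"URIPrefixes": list(uri_prefixes)},
        ],
        # The article's sample uses the string "true", not a JSON boolean.
        "settings": {"stopOnFail": "true"},
    }
    return json.dumps(manifest, indent=2)


# Hypothetical file paths and bucket prefix, for illustration only.
text = build_manifest(
    ["sales/file_1.csv", "sales/file_2.csv"],
    ["my-bucket/"],
)
print(text)
```

Generating the file with a JSON library avoids hand-editing mistakes such as a missing comma or an unbalanced bracket.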

4. Wildcards in Manifest file

Instead of listing each file name separately, you can use a wildcard character in the manifest file to select a group of files from a single directory as source files.

To select files from different directories, specify each directory and file name pattern separately as shown below. You can also select files with different name patterns in the same directory in a similar way.

The format of the manifest file that accepts wildcard characters is as below:

<file_name>.manifest


{
	"fileLocations": [{
		"WildcardURIs": [
			"<directory path>/file_*.csv",
			"<directory path2>/*.txt"
		]
	}, {
		"URIPrefixes": [
			"<AWS S3 bucket Name>/"
		]
	}],
	"settings": {
		"stopOnFail": "true"
	}
}

Note:
The Amazon S3 Connector supports only the asterisk (*) wildcard character.
You cannot use wildcard characters in folder names.
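
To see which files a pattern would pick up, the two rules in the note can be approximated locally. This is a minimal sketch and not the connector's actual matching logic: it assumes `*` matches any run of characters within a file name and never crosses a `/`. The helper names `wildcard_to_regex` and `select_keys` and the sample keys are hypothetical.

```python
import re


def wildcard_to_regex(pattern):
    """Translate a manifest-style pattern into a compiled regular expression.

    Only '*' is treated as a wildcard, and only in the file name part.
    """
    directory, _, filename = pattern.rpartition("/")
    if "*" in directory:
        raise ValueError("wildcards are not allowed in folder names")
    # Escape the literal pieces and let each '*' match within one path segment.
    name_regex = "[^/]*".join(re.escape(part) for part in filename.split("*"))
    prefix = re.escape(directory + "/") if directory else ""
    return re.compile(prefix + name_regex)


def select_keys(pattern, keys):
    """Return the keys that match the pattern in full."""
    regex = wildcard_to_regex(pattern)
    return [key for key in keys if regex.fullmatch(key)]


# Hypothetical object keys in the bucket.
keys = ["in/file_1.csv", "in/file_2.csv", "in/notes.txt", "in/old/file_3.csv"]
print(select_keys("in/file_*.csv", keys))
```

Under these assumptions the pattern `in/file_*.csv` selects `in/file_1.csv` and `in/file_2.csv`, but not the file in the `in/old/` subfolder.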

5. Mapping Development

Follow the steps below to configure the Source transformation in the mapping to process multiple AWS S3 files using Informatica Cloud (IICS):

  1. Select the Amazon S3 v2 Connection that connects to the required AWS S3 bucket.
  2. Select the .manifest file placed in the S3 bucket folder as the Source Object.
  3. Select the Format of the source files from the drop-down list under the Source Object.
  4. Under Formatting Options, select the options that match the file format.
  5. Once everything is selected as expected, you should see the field names under the Fields tab of the Source transformation.

DO NOT select JSON as the file format just because the manifest file itself is in JSON format. Select the format and formatting options that match the actual source files listed in the manifest file.

The Data Preview tab displays the data of the first file specified under the URIs section of the .manifest file. If the URIs section is empty, the first file in the folder specified in URIPrefixes is displayed.
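
The preview rule above can be expressed as a small lookup. This is a minimal sketch of the described behaviour only. It assumes the caller supplies the folder listing in whatever order "first" means for the bucket, and the name `preview_file` and the sample data are hypothetical.

```python
def preview_file(manifest, keys_by_prefix):
    """Return the file a data preview would show, following the rule above.

    `manifest` is the parsed manifest JSON. `keys_by_prefix` maps each
    URIPrefixes entry to an ordered list of files found under it.
    """
    uris = []
    prefixes = []
    for location in manifest.get("fileLocations", []):
        uris.extend(location.get("URIs", []))
        prefixes.extend(location.get("URIPrefixes", []))
    if uris:
        return uris[0]
    # URIs is empty: fall back to the first file under the first prefix with files.
    for prefix in prefixes:
        files = keys_by_prefix.get(prefix, [])
        if files:
            return files[0]
    return None


# Hypothetical manifest with an empty URIs section.
manifest = {
    "fileLocations": [{"URIs": []}, {"URIPrefixes": ["sales/"]}],
    "settings": {"stopOnFail": "true"},
}
print(preview_file(manifest, {"sales/": ["sales/file_1.csv", "sales/file_2.csv"]}))
```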

23 thoughts on “Indirect loading of AWS S3 files using Informatica Cloud (IICS)”

  1. I have a quick question! Is there a way to automate building the manifest file, i.e., once a file is uploaded, the manifest file would automatically be updated?

      • How do I archive a file once it is processed? Also, I am unable to use a manifest file to load data from multiple S3 files from one bucket to another. Please help.

        • You can use AWS CLI commands to move S3 files once they are processed. Make sure all the S3 files are of the same structure and type. The manifest file format differs depending on whether or not you use wildcards. Make sure you use the correct format.

  2. When using manifest files for indirect file processing of AWS S3 files, how can we retrieve the currently processed file name dynamically in IICS? In PowerCenter, we used to have “AddCurrentlyProcessedFileName” for indirect file processing.

    Any pointers or references are highly appreciated!!

  3. Thank you for this article. I have a use case where XML files are dropped into the S3 bucket and I need to read them and convert it to relational structures

    1) I have created the manifest file as you have described above and used it in the Source transformation.
    2) I connected the Source transformation to a Hierarchy Parser transformation. In the Incoming Fields section, I get two options: data & FileName. I have mapped ‘data’ to Input in the Input Field Selection section. In the Input Settings, I have pointed to the Hierarchy Schema and have selected the ‘File’ option.

    With this, I was expecting the Hierarchy Parser transformation to process this data, but it is not working and gives me an error. Could you please help/guide on how to perform an indirect load of XML files on S3 with the Hierarchy Parser transformation?

  4. Hi,

    My use case is the same. I want to read multiple JSON files from S3 as the source through the indirect method in an IICS mapping. Can you share more details or a link for the same?

  5. Hi Team,
    Our requirement is somewhat similar.
    Can we read multiple fixed-width files from an S3 bucket using this approach?
    I heard it's not possible. Any suggestions?

  6. Hello,
    One question:
    How can we parameterize or select the S3 bucket file without any manual intervention when we deploy the code from DEV to UAT to PROD?
    Once the code is deployed, we need to manually select the .manifest file in UAT and PROD.
    Is there an easier way to perform this task? Please provide your valuable feedback.
    Thank you in advance.

  7. I followed the steps as mentioned above. However, on data preview it is reading the manifest file itself and not reading the actual data file. What am I missing here?

  8. Can you please help me with how we can write multiple txt files into one single file in an S3 bucket and use it in the Source transformation?
    Is there any other option instead of using a manifest?

    • Not that we know of using Informatica. But if you are also using databases like Snowflake or Synapse, you can create external stages/tables on top of these files in S3 and read them all at once.

  9. Thank you for your answer,
    but we are not using any databases. We just want to merge text files into one single file in the S3 bucket without using a manifest file.
    Is there any option?

    • I see no other option using Informatica. You should consider external options, such as building a script that merges all the files in S3.
      What is the problem with using a manifest file?

