Indirect loading of AWS S3 files using Informatica Cloud (IICS)

July 11, 2021 (updated November 26, 2023)


Contents

1. Introduction

2. Requirements for Indirect loading of AWS S3 files

3. Manifest File

4. Wildcards in Manifest file

5. Mapping Development

1. Introduction

The process of loading data from multiple source files that share the same file structure and properties into a single target, through a single mapping in a single session run, is called Indirect file loading.

We have discussed in detail the difference between Direct and Indirect loading, along with guidelines for indirect loading of flat files present on Linux and Windows machines through IICS, in the article below.

Article: Indirect File Loading in Informatica Cloud (IICS)

In this article, let us discuss how to perform Indirect loading of files from an AWS S3 bucket using Informatica Cloud (IICS).

2. Requirements for Indirect loading of AWS S3 files

The following requirements should be met to read multiple files from the same AWS S3 bucket and write them to a single target using Informatica Cloud (IICS):

All files must share the same file structure and properties.
Every file and its path must be local to the AWS S3 bucket mentioned in the Source Connection.
A .manifest file should be created that lists all source file names with their absolute or directory paths.
The manifest file should be available in the same S3 bucket where the source files are placed.
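Before building the manifest, it can help to confirm the first requirement locally. The sketch below is a minimal pre-check, assuming comma-delimited files with a header row that you have already read into memory as text; the function names are hypothetical, and it compares only column names, not data types or other formatting options.

```python
import csv
import io


def header_of(csv_text):
    """Return the first row (the header) of a CSV document, or [] if it is empty."""
    return next(csv.reader(io.StringIO(csv_text)), [])


def mismatched_files(files):
    """Return names of files whose header differs from the first file's header.

    files: dict mapping a (hypothetical) file name to its CSV text.
    An empty result means every file has the same column names in the same order.
    """
    names = list(files)
    if not names:
        return []
    expected = header_of(files[names[0]])
    return [name for name in names[1:] if header_of(files[name]) != expected]


if __name__ == "__main__":
    sample = {
        "file_1.csv": "id,name\n1,a\n",
        "file_2.csv": "id,name\n2,b\n",
        "file_3.csv": "id,full_name\n3,c\n",
    }
    print(mismatched_files(sample))  # ['file_3.csv']
```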

Let us look at the manifest file in detail.

3. Manifest File

Instead of passing the actual source files as the source, we pass a .manifest file, which lists all source file names along with their directory paths. Hence this load type is referred to as Indirect loading.

You must specify the manifest file name in the following format: <file_name>.manifest

The manifest file is a JSON file. Its contents are as below.

<file_name>.manifest

{
	"fileLocations": [{
		"URIs": [
			"<directory path>/file_1.csv",
			"<directory path>/file_2.csv"
		]
	}, {
		"URIPrefixes": [
			"<AWS S3 bucket Name>/"                       
		]
	}],
	"settings": {
		"stopOnFail": "true"
	}
}
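If the list of source files changes often, the manifest can be generated instead of written by hand. The sketch below builds manifest text in the same layout as the example above; the function name, paths and bucket name are hypothetical placeholders, and it simply mirrors the example, including the string value "true" for stopOnFail.

```python
import json


def build_manifest(uris, uri_prefixes=None):
    """Return manifest text in the same layout as the example above.

    uris:         explicit file paths for the "URIs" entry.
    uri_prefixes: optional values for the "URIPrefixes" entry.
    """
    file_locations = [{"URIs": list(uris)}]
    if uri_prefixes:
        file_locations.append({"URIPrefixes": list(uri_prefixes)})
    manifest = {
        "fileLocations": file_locations,
        # Mirrors the example, which stores the flag as the string "true".
        "settings": {"stopOnFail": "true"},
    }
    return json.dumps(manifest, indent=4)


if __name__ == "__main__":
    # Hypothetical paths and bucket name, for illustration only.
    print(build_manifest(
        ["sales/2021/file_1.csv", "sales/2021/file_2.csv"],
        uri_prefixes=["my-bucket/"],
    ))
```

Save the output as <file_name>.manifest and upload it to the same S3 bucket as the source files.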

4. Wildcards in Manifest file

Instead of passing each file name separately, you can use the wildcard character in the manifest file to select a group of files from a single directory as source files.

To select files from different directories, specify each directory and file name pattern separately, as shown below. You can specify files with different name patterns in the same directory in the same way.

The format of a manifest file that uses wildcard characters to read files is as below:

<file_name>.manifest

{
	"fileLocations": [{
		"WildcardURIs": [
			"<directory path>/file_*.csv",
			"<directory path2>/*.txt"
		]
	}, {
		"URIPrefixes": [
			"<AWS S3 bucket Name>/"                       
		]
	}],
	"settings": {
		"stopOnFail": "true"
	}
}

Note:
Amazon S3 Connector supports only the asterisk (*) wildcard character.
You cannot use wildcard characters to specify folder names.
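The two rules in the note can be checked before uploading a manifest. The sketch below validates a WildcardURIs entry and tests it against an object key. The matching semantics are an assumption, not documented connector behavior: "*" is taken to match any run of characters within one folder but not across "/", since folder names cannot contain wildcards. The whitespace check is an extra sanity check (a space inside the quotes becomes part of the pattern), not a documented rule. Function names are hypothetical.

```python
import re


def validate_wildcard_uri(pattern):
    """Raise ValueError if a WildcardURIs entry breaks the rules in the note above."""
    # Extra sanity check (not a documented rule): stray spaces become part of the pattern.
    if pattern != pattern.strip():
        raise ValueError(f"leading or trailing whitespace in {pattern!r}")
    folder, _, _file_part = pattern.rpartition("/")
    if "*" in folder:
        raise ValueError(f"wildcard used in a folder name: {pattern!r}")


def matches(pattern, key):
    """Test an object key against a WildcardURIs entry.

    Assumed semantics: '*' matches any run of characters except '/'; every
    other character, including '?' and '[', is treated literally.
    """
    validate_wildcard_uri(pattern)
    regex = "[^/]*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, key) is not None


if __name__ == "__main__":
    print(matches("data/file_*.csv", "data/file_01.csv"))  # True
    print(matches("data/*.txt", "data/archive/old.txt"))   # False
```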

5. Mapping Development

Follow the steps below when configuring the Source transformation in the mapping to process multiple AWS S3 files using Informatica Cloud (IICS):

Select the Amazon S3 v2 Connection that connects to the required AWS S3 bucket.
Select the .manifest file placed in the S3 bucket folder as the Source Object.
Select the format of the source file from the Format drop-down list under the source object.
Under Formatting Options, select the options according to the file format.
Once everything is selected as expected, you should see the field names under the Fields tab of the Source transformation.

DO NOT select JSON as the file format just because the manifest file is in JSON format. Select the format and formatting options that match the actual source files listed in the manifest file.

The Data Preview tab displays the data of the first file listed in the URIs section of the .manifest file. If the URIs section is empty, it displays the first file in the folder specified in URIPrefixes.
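The preview rule above can be written as a small function, which helps when checking which file you should expect to see. It assumes that "the first file in the folder" means the key that sorts first, which is the ascending order in which S3 lists object keys; Informatica does not state which ordering it uses, so treat this as an assumption. The function name and sample keys are hypothetical.

```python
def preview_file(uris, prefix_listing):
    """Return the file the Data Preview tab is expected to show.

    uris:           the manifest's URIs list (may be empty).
    prefix_listing: object keys found under the URIPrefixes folder.
    Assumption: "first file in the folder" is the key that sorts first, which
    matches the ascending order in which S3 lists object keys.
    """
    if uris:
        return uris[0]
    return min(prefix_listing) if prefix_listing else None


if __name__ == "__main__":
    print(preview_file([], ["in/file_2.csv", "in/file_10.csv"]))  # in/file_10.csv
```

Note that key order is character by character, so file_10 sorts before file_2.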



23 thoughts on “Indirect loading of AWS S3 files using Informatica Cloud (IICS)”

Chinna chowdary
July 11, 2021 at 11:28 am
Dear Team,
Great share 🙂
- ThinkETL
  July 13, 2021 at 8:55 pm
  Thank you!!👍
Akshay Reddy Chada
July 30, 2021 at 9:49 pm
I have a quick question! Is there a way we can automate building the manifest file? I.e., once a file is uploaded, the manifest file would be updated automatically.
- ThinkETL
  August 3, 2021 at 7:14 pm
  Use directory_path/* to read any file once it is uploaded, and archive each file once it is processed successfully.
  - sun
    August 22, 2021 at 10:51 pm
    How do I archive a file once it is processed? Also, I am unable to use a manifest file to load data from multiple S3 files from one bucket to another. Please help.
    - ThinkETL
      August 27, 2021 at 6:49 pm
      You can use AWS CLI commands to move S3 files once they are processed. Make sure all the S3 files have the same structure and type. The manifest file format differs with and without wildcards, so make sure you use the correct format.
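The reply above suggests moving processed files with AWS CLI commands. A minimal sketch, assuming a hypothetical bucket name and an archive/ prefix inside the same bucket, that builds one "aws s3 mv" command per processed key so the commands can be reviewed and run after the task completes successfully:

```python
import shlex


def archive_commands(bucket, keys, archive_prefix="archive/"):
    """Build one 'aws s3 mv' command per processed key.

    bucket, keys and archive_prefix are placeholders. Each object is moved to
    s3://<bucket>/<archive_prefix><key>, keeping its original path below the prefix.
    """
    commands = []
    for key in keys:
        source = f"s3://{bucket}/{key}"
        target = f"s3://{bucket}/{archive_prefix}{key}"
        # shlex.join quotes arguments that contain spaces or shell metacharacters.
        commands.append(shlex.join(["aws", "s3", "mv", source, target]))
    return commands


if __name__ == "__main__":
    for command in archive_commands("my-bucket", ["in/file_1.csv", "in/file_2.csv"]):
        print(command)
```

Generating the commands rather than running them keeps the archive step separate from the load, so nothing is moved unless the run succeeded.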
Kedar
September 22, 2021 at 7:11 pm
When using manifest files for indirect processing of AWS S3 files, how can we retrieve the name of the file currently being processed dynamically in IICS? In PowerCenter, we used to have “AddCurrentlyProcessedFileName” for indirect file processing.
Any pointers or references are highly appreciated!!
- ThinkETL
  October 4, 2021 at 10:27 pm
  I didn’t find any such option available with S3 files. Let me know if you find a workaround.
Chayakiran Subramaniam
October 1, 2021 at 2:11 pm
Thank you for this article. I have a use case where XML files are dropped into the S3 bucket, and I need to read them and convert them to relational structures.
1) I have created the manifest file as you have described above and used it in the Source transformation.
2) I connected the Source transformation to a Hierarchy Parser transformation. In the Incoming Fields section, I get two options: data & FileName. I have mapped ‘data’ to Input in the Input Field Selection section. In the Input Settings, I have pointed to the Hierarchy Schema and selected the ‘File’ option.
With this, I was expecting the Hierarchy Parser transformation to process this data, but it is not working and gives me an error. Could you please help/guide on how to perform an indirect load of XML files on S3 with the Hierarchy Parser transformation?
- ThinkETL
  October 4, 2021 at 10:32 pm
  The process to read XML or JSON files from an S3 bucket is different. Here is the Informatica article I found.
  I was able to process a JSON file using this method, and multiple JSON files using a manifest file.
  I will publish a detailed article on the same soon.
  - ThinkETL
    October 8, 2021 at 11:08 am
    Hi, refer to this article published on our site. Feel free to share your feedback!!
rp
October 5, 2021 at 6:45 pm
Hi,
My use case is the same. I want to read multiple JSON files from S3 as a source through indirect loading in an IICS mapping. Can you share more details or a link for the same?
- ThinkETL
  October 8, 2021 at 11:04 am
  Hi, refer to this article published on our site. Feel free to share your feedback!!
Shafi
October 22, 2021 at 8:13 pm
Hi Team,
Our requirement is somewhat similar.
Can we read multiple fixed-width files from an S3 bucket using this approach?
I heard it's not possible. Any suggestions?
- ThinkETL
  October 23, 2021 at 7:56 pm
  Yes.. it is not supported by Informatica yet!!
Jatin K
February 7, 2022 at 4:00 am
Hello,
One question:
How can we parameterize or select the S3 bucket file without any manual intervention when we deploy the code from DEV to UAT to PROD?
Once the code is deployed, we need to manually select the .manifest file in UAT and PROD.
Is there an easier way to perform this task? Please provide your valuable feedback.
Thank you in advance.
- ThinkETL
  February 8, 2022 at 6:48 pm
  Are you using the same folder structure for the file in all environments?
SR
November 24, 2022 at 12:39 am
I followed the steps as mentioned above. However, on data preview it is reading the manifest file itself and not reading the actual data file. What am I missing here?
- ThinkETL
  November 24, 2022 at 9:43 am
  Make sure you select the formatting options according to the actual file you want to read as source mentioned in the manifest file.
Ashna
November 24, 2022 at 12:36 pm
Can you please help me with how we can merge multiple txt files into one single file in an S3 bucket and use it in the Source transformation?
Is there any option other than using a manifest file?
- ThinkETL
  November 24, 2022 at 1:21 pm
  Not that we know of using Informatica. But if you are also using databases like Snowflake or Synapse, you can create external stages/tables on top of these files in S3 and read them all at once.
Ashna
November 24, 2022 at 2:00 pm
Thank you for your answer,
but we are not using any databases. We just want to merge text files into one single file in an S3 bucket without using a manifest file.
Is there any option?
- ThinkETL
  November 24, 2022 at 10:58 pm
  I see no other option using Informatica. You should consider external options like building a script that merges all the files in S3.
  What is the problem using Manifest file?
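The last reply suggests a script that merges the files outside Informatica. A minimal sketch of the merge step, assuming the files are comma-delimited, share one header row, and have already been downloaded as text (for example with "aws s3 cp"); downloading and uploading are not shown, and the function name is hypothetical:

```python
import csv
import io


def merge_csv_texts(texts, has_header=True):
    """Concatenate CSV documents that share one layout into a single document.

    When has_header is True, the first document's header is kept and the
    header row of every later document is dropped.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for index, text in enumerate(texts):
        rows = list(csv.reader(io.StringIO(text)))
        if has_header and index > 0:
            rows = rows[1:]  # drop the repeated header row
        writer.writerows(rows)
    return out.getvalue()


if __name__ == "__main__":
    print(merge_csv_texts(["id,name\n1,a\n", "id,name\n2,b\n"]), end="")
```

Parsing and rewriting rows with the csv module, instead of joining raw text, keeps quoted fields that contain commas intact.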