A Complete Reference for Informatica Power Center ETL Tool

Today we are living in the world of data science, where organizations have to handle huge amounts of data. To accomplish its goals, an organization must take the right decision at the right time. For this, data is maintained in the form of a data warehouse, and for extract, transform and load (ETL) the Informatica PowerCenter tool is widely used by organizations. So in this paper we have shared the complete Informatica PowerCenter logic, which will be useful not only for organizations but also for data scientists as a complete reference.

If you are using a stored procedure transformation, configure it as the normal type.

UDF -
To create complex functions using built-in functions. Once created, a UDF can be reused in multiple mappings.
There are two types of UDF: 1. public (can be used by all users in the repository, in transformations, w/fs, link conditions and cmd tasks). 2. private (can only be called from inside a public udf).

To create a UDF: go to the mapping designer => udf folder, right-click on it, click New, fill in the required entries and launch the editor. The expression editor opens, where you can define the UDF and save it. It can then be accessed through any expression editor and called as :udf.udf_name(param_list);
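For instance, a minimal sketch (the UDF name and logic here are hypothetical): a public UDF REMOVE_EXTRA_SPACES(s), built from the built-in LTRIM and RTRIM functions, could be called from any expression as
out_cust_name = :udf.REMOVE_EXTRA_SPACES(in_cust_name)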
In a workflow, tasks can be placed in serial, parallel, or any required arrangement. A start task is created by default whenever a workflow is created; by double-clicking on it we can edit its name and description.
To always show the full name of a task: tools => options => general => show full name of tasks.
Connections are created so that the i.s. can read/write data from/into a source or target. The connection types that can be created in the w/f manager are relational, ftp, queue and application.
To create a connection: Connections menu => select the connection type => a wizard will open => new => give the credentials to connect => close.
Components of the w/f manager: task developer, worklet designer, workflow designer.

Workflow variable -
It is used to share information between two tasks. For example, the 2nd session should run only when the 1st session has completed: double-click on the link and set an expression like $cmd_create_folder.Status = SUCCEEDED; the next session will then run only when this condition is satisfied.
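A slightly fuller sketch (task names are hypothetical), using the predefined task variables in a link condition:
$s_m_load_stage.Status = SUCCEEDED AND $s_m_load_stage.TgtSuccessRows > 0
Here the downstream session runs only if the staging session succeeded and actually loaded at least one row.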

Workflow parameter -
Parameters can be defined in the parameter file, whose path is given to the workflow.
Workflow => edit => properties => parameter filename (place the full path). It can be used to define the connections for the source and target, e.g. place $DBConnection_source in the file.
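A minimal parameter file sketch (folder, workflow and connection names are hypothetical):
[MyFolder.WF:wf_load_customers]
$DBConnection_source=Oracle_SRC_Conn
$DBConnection_target=Oracle_TGT_Conn
In the session, the source and target connections are then set to $DBConnection_source and $DBConnection_target instead of hard-coded connection names.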

Worklet -
A group of tasks linked with each other. Types: reusable and non-reusable.
Aborting the session - it has a timeout of 60 seconds. If the session cannot finish processing and committing data within the timeout period, the i.s. kills the DTM process and terminates the session. This leaves the memory block allocated, causes memory issues on the server and leads to poor performance.

Concurrent run of workflow -
Say, when processing transactions using a w/f, data belonging to multiple entities (e.g. regions) may be present in different flat-files and available for processing at the same time. The usual approach is to process them sequentially by running the same w/f for each flat-file. To reduce processing time, it is required to run a separate instance of the w/f for each flat file concurrently, using the concurrent execution feature of the w/f.
A w/f configured for concurrent execution can run multiple instances concurrently: w/fs => edit => configure concurrent execution (check the concurrent execution option).
In the concurrent execution configuration window, choose any one of the below: 1. allow concurrent runs with the same instance name (different run-ids). 2. allow concurrent runs only with unique instance names (different instance names).
Add an instance name and parameter file path for as many instances as you want to run in parallel.
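To start a specific configured instance from the command line, a hedged sketch (service, workflow and instance names are hypothetical; verify the flags against your PowerCenter version):
pmcmd startworkflow -sv int_service -d domain_name -u user_name -p password -f folder_name -rin region_us wf_process_transactions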

Commit types -
There are 3 types of commits -source based, target based and user defined.
Target based commit - 1. Commit interval: it is the interval at which the server commits data to a relational target during a session.
2. The commit point depends upon the buffer block size and the commit interval. The server commits data based on the number of target rows and the key constraints on the target table.
3. During a session, the server continues to fill the writer buffer after it reaches the commit interval. When a buffer block is full, the Informatica server issues a commit command. As a result, the commit point generally exceeds the commit interval.
4. The server commits data to each target based on pk-fk keys.
Session => edit => properties: A. commit type - target/source. B. commit interval - e.g. 10000.
5. The commit point depends upon: A. the commit interval B. the writer wait timeout C. the buffer blocks.
6. When you run a target based commit session, the i.s. may issue the commit before or after the configured commit interval. The i.s. uses the following process to issue commits: A. when the i.s. reaches the commit interval, it continues to fill the writer buffer block; after the block fills, it issues the commit command. B. if the writer buffer block fills before the commit interval, the i.s. starts to write to the target but waits to issue the commit; it issues the commit when one of the following conditions is true: I. the writer has been idle for the amount of time specified by the i.s. 'writer wait timeout' option. II. the i.s. reaches the commit interval and fills another writer buffer block.
Source based commit - I. Configured in the same place as above (commit type - source). II. The commit point is the commit interval. The server commits data based on the number of source rows from the active source in a single pipeline. A pipeline consists of a source qualifier and all the transformations (e.g. filter, router etc.) and targets that receive data from it. III. When the server runs a session, it identifies the active source for each pipeline in the mapping. The server generates a commit row from the active source at every commit interval, and when the target receives it, the commit is performed.
User defined commit - the i.s. commits data based on the transactions defined in the mapping properties; you can also configure the commit and rollback options in the session.
User defined exception - we can handle this exception using: I. error handling functions - error(), abort(). II. user defined error tables.

Error Handling Functions -
Error() - this function causes the i.s. to skip a row and issue an error message which you define. The error message is displayed either in the session log or written to the error log table, based on the error logging type configured in the session. You can use error() in an expression transformation to validate data. Generally, you use error() within an iif or decode function to set rules for skipping rows.
Eg. IIF (trans_date > sysdate, error('trans_date invalid'));
Abort() - stops the session and issues a specified error message to the session log file, or writes it to the error log tables based on the error logging type configured in the session.
Eg. IIF (ISNULL (creditcard_number), abort('empty credit card'));
If a session cannot recover from failure within the maximum allowed number of attempts, it goes to the suspended state.

Task => edit => Recovery Strategy
In the drop-down menu choose the recovery strategy; the available strategies differ per task type: 1. Restart task - available for all tasks. 2. Fail task and continue w/f - session and command tasks only. 3. Resume from task checkpoint - session task only.
When a workflow recovers, it recovers tasks and sessions on the basis of their recovery strategy.
When you configure the session recovery strategy to 'resume from task checkpoint', the i.s. stores the session state of operation in the shared location $PMStorageDir and also writes it to the recovery tables, to determine from where to begin loading data into the target tables in case of recovery.
Recovering a w/f from failure - I. Recover automatically - using the HA option. II. Recover manually - using the w/f manager or monitor.
Open the w/f manager or monitor, right-click and select: 1. Recover Task
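Recovery can also be triggered from the command line; a hedged sketch (service, folder and workflow names are hypothetical; verify the syntax for your version):
pmcmd recoverworkflow -sv int_service -d domain_name -u user_name -p password -f folder_name wf_load_customers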

Target Load Plan -
If there are multiple targets, it defines the order in which the targets are loaded.
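A hedged note from memory on where this is set: in the mapping designer, Mappings => Target Load Plan; the dialog lists the source qualifiers of the mapping and lets you move them up or down to set the load order.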

Constraint Based Loading -
If there is a constraint based dependency between tables, such as a pk-fk relationship, then the table having the pk will be loaded first and then the table having the fk will be loaded.
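A hedged note from memory on where this is enabled: session => edit => config object => 'constraint based load ordering'. It only orders targets that belong to the same target load order group (i.e. the same source qualifier pipeline).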

Push Down Optimization -
To convert the ETL logic into SQL and fire it against the database to improve performance. PDO is of three types: 1. source based 2. target based 3. full PDO. Source based means the SQL is fired against the source; target based means the SQL is fired against the target. A common use case for full pushdown optimization is when the source and target databases are in the same relational database management system. For example, configure full pushdown optimization to push all the transformation logic from a Teradata source to a Teradata target. When you run a session configured for full pushdown optimization, the Integration Service analyzes the mapping from the source to the target. When the Integration Service reaches a downstream transformation that it cannot push to the target database, it generates and executes SQL statements against the source or target based on the transformation logic that it can push to the database. The Pushdown Optimization Viewer can be used to see which transformations will go to the source/target.
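PDO is chosen on the session ('pushdown optimization' attribute: none / to source / to target / full). As an illustrative sketch (table and column names are hypothetical), a mapping with a filter (ACTIVE_FLAG = 'Y') and an expression applying UPPER(), run with full PDO where source and target sit in the same database, could be converted to roughly:
INSERT INTO TGT_CUST (CUST_ID, CUST_NAME) SELECT CUST_ID, UPPER(CUST_NAME) FROM SRC_CUST WHERE ACTIVE_FLAG = 'Y'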

Flat Files -
I. A flat file can be: 1. a delimited flat-file 2. a fixed-width flat file. II. Direct and indirect flat-files: you need to declare the following things in the session while working with a flat-file - 1. Source file name: the name of the file being loaded to the target. 2. Source file directory: the directory where the file resides. 3. Source file type: direct/indirect. III. In indirect loading, a list of file paths is given in a file, and the path of this list file is given in the session; it is used to handle, say, 100 files on the source side. Eg.
> cat cust_list.dat
$PMSourceFileDir/cust_us.dat
$PMSourceFileDir/cust_uk.dat
$PMSourceFileDir/cust_india.dat

IV. An expression transformation can be used to generate a flag, and a transaction control transformation to perform tc_commit_before, tc_commit_after and so on, to control when files are generated on the target side (see the expression sketch after this list).
V. Whenever a cobol file is used as a source, Informatica adds a normalizer transformation automatically; if it does not, add it.
VI. Remove duplicates - I. for relational tables: I) by SQL override II) select the 'distinct' option in the S.Q. II. for flat-files or other sources: I) a sorter with the distinct option II) an aggregator with group by on all the columns.
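A sketch of point IV (port and flag names are hypothetical): an expression transformation derives a flag, e.g. new_file_flag = IIF(region <> v_prev_region, 1, 0), and the transaction control transformation then evaluates a condition such as
IIF(new_file_flag = 1, TC_COMMIT_BEFORE, TC_CONTINUE_TRANSACTION)
so a new commit (and hence a new target file) starts whenever the region changes.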

II. Start a w/f from a specific task: pmcmd startworkflow -service Informatica_integration_service -d domain_name -u user_name -p password -f folder_name -startfrom task_name workflow_name;
III. Abort a task: pmcmd aborttask -service Informatica_integration_service -d domain_name -u user_name -p password -f folder_name -w workflow_name task_name;
3. pmcmd is a program command utility to communicate with Informatica servers. The pmcmd command is used to control Informatica repository events through unix, when the Informatica server is on a unix server.
11. Uses of a mapping variable - I. to pass a value from one session to another. II. incremental data load (see the sketch below). III. in a lookup override where you want to use a dynamic sql override.
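A common incremental-load sketch for point II (variable, table and port names are hypothetical): declare a mapping variable $$LAST_LOAD_TS, filter on it in the source qualifier, e.g.
UPDATE_TS > TO_DATE('$$LAST_LOAD_TS', 'MM/DD/YYYY HH24:MI:SS')
and advance it in an expression transformation with SETMAXVARIABLE($$LAST_LOAD_TS, UPDATE_TS). After a successful session run the i.s. saves the new value in the repository, so the next run picks up only the newer rows.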