DataStage® flows are the design-time assets that contain data integration logic.
You can create an empty DataStage flow and add connectors and stages to it, or you can import an existing DataStage flow from an ISX or ZIP file. A DataStage flow consists of the following objects:
- Data sources that read data
- Stages that transform the data
- Data targets that write data
- Links that connect the sources, stages, and targets
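To make this structure concrete, the following minimal Python sketch models a flow as a directed graph of sources, stages, targets, and links. It is a conceptual illustration only, not DataStage's internal representation, and all names in it are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str  # "source", "stage", or "target"

@dataclass
class Flow:
    nodes: list[Node] = field(default_factory=list)
    links: list[tuple[str, str]] = field(default_factory=list)  # (upstream, downstream)

    def add(self, node: Node) -> Node:
        self.nodes.append(node)
        return node

    def link(self, upstream: Node, downstream: Node) -> None:
        self.links.append((upstream.name, downstream.name))

# A minimal flow: a source that reads data, a stage that transforms it,
# and a target that writes the result, joined by two links.
flow = Flow()
src = flow.add(Node("orders_db", "source"))
xform = flow.add(Node("filter_open_orders", "stage"))
tgt = flow.add(Node("open_orders_file", "target"))
flow.link(src, xform)
flow.link(xform, tgt)
```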
DataStage flows and their associated objects are organized in projects. To start, open an existing project or create a new project.
Creating a DataStage flow
To create a DataStage flow, complete the following steps.
- Open an existing project or create a project.
- On the Assets tab, create a new DataStage asset.
- On the Create a DataStage flow page, use one of the following two methods to create the DataStage flow:
- Click the New tab, add the necessary details for the DataStage flow, then click Create. The new DataStage flow opens with no objects on the DataStage designer canvas.
- Click the Local file tab, upload an ISX or ZIP file from your local computer, and then click Create. When the import process is complete, close the import report page, and then open the imported DataStage flow from the Assets tab of the project.
- Drag connectors or stages from the palette onto the DataStage design canvas as nodes and arrange them as you like. To connect two nodes, hover your pointer over a node until an arrow appears on it, then click the arrow icon and drag it to the node that you want to connect to. This action creates a link between the nodes.
To connect to remote data, see Connecting to a data source in DataStage.
- Double-click a node to open its properties panel, where you can specify configurations and settings for the node.
- Click Run when you are done setting up the flow.
The flow is automatically saved, compiled, and run. You can view logs for both the compilation and job run.
Editing a DataStage flow
You can use the following actions to edit a DataStage flow.
- Drag a stage or connector from the palette and drop it on a link between two nodes that are already on the DataStage design canvas. The new node is automatically linked to the nodes on either side of it and the columns in the DataStage flow are automatically propagated. Click Run again to see the results.
- Manually detach and reattach links from nodes on the DataStage canvas by hovering your pointer over a link and clicking its end points.
- Click the Replace icon and select another flow to replace your flow. This action is also available for Build, Custom, and Wrapped stages, as well as subflows and Java libraries.
Previewing data
You can edit and preview data in your DataStage flow. On the canvas, right-click your connection and select Preview Data. Data preview is available for all connections and file connectors. For more information about file connectors, see File connectors in DataStage.
For example, you can preview time and microseconds time values with a time zone. Both the time and the microseconds time data types appear in the standard format: HH:mm:ss for time, and HH:mm:ss.SSSSSS for microseconds time. A value that includes a time zone offset is converted to UTC before it is displayed:

Input time: 00:00:01-10:00
Local time: 00:00:01
Offset: -10 (which means UTC is 10 hours ahead of local time)
UTC time: local time + offset = 00:00:01 + 10 hrs = 10:00:01

The time zone is converted and the value is displayed as 10:00:01 in the standard time format.
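You can reproduce the same arithmetic outside of DataStage. The following Python sketch only illustrates the conversion shown above; it is not the DataStage implementation, and the input string is the example value from this section.

```python
from datetime import datetime, timezone

# Example value from this section: a time with a -10:00 UTC offset.
local = datetime.strptime("00:00:01-10:00", "%H:%M:%S%z")

# Converting to UTC adds the 10-hour offset back: 00:00:01 + 10 hrs = 10:00:01.
utc = local.astimezone(timezone.utc)

print(utc.strftime("%H:%M:%S"))     # 10:00:01        (time, HH:mm:ss)
print(utc.strftime("%H:%M:%S.%f"))  # 10:00:01.000000 (microseconds time, HH:mm:ss.SSSSSS)
```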
Considerations
- Sensitive information and encrypted property values
- Specifying encrypted property values such as passwords directly in DataStage flows is not recommended. Instead, create a parameter set of type Encrypted with a named parameter and do not specify a default value for the parameter. In your flow, reference the parameter set and the named parameter for the property value, for example: #<parameter set>.<parameter name>#. Then, specify the encrypted value for that parameter in the job that runs your flow. An illustrative sketch of this reference syntax follows this list.
- Naming files in sources and targets to avoid data corruption
- In most cases, do not use the same file name in the source as in the target if the source and target point to the same database or storage system. This rule applies to files and database tables. If the names are the same, the data can be corrupted.
- Column metadata change propagation
- When you change a column's metadata, the changes are automatically propagated downstream. Changes made upstream do not apply to a column once you modify its metadata. If you delete a column, modifying the column in a later stage will not add the column back.
- Runtime column propagation
- When runtime column propagation (RCP) is set and your job encounters extra columns that are not defined in the metadata when it runs, the job adopts these extra columns and propagates them through the rest of the job. This behavior avoids errors that are caused by missing mappings.
- Adding parameters
- See Adding parameters.
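As referenced in the consideration about encrypted property values, the following Python sketch mimics how a #<parameter set>.<parameter name># token in a property value could be resolved at run time. The parameter set name (db_secrets), parameter name (password), and resolution code are all hypothetical illustrations of the pattern, not DataStage's own behavior.

```python
import re

# Hypothetical value that the job supplies at run time for the encrypted parameter.
runtime_parameters = {"db_secrets.password": "s3cr3t-value"}

def resolve(property_value: str) -> str:
    """Replace #<parameter set>.<parameter name># tokens with runtime values."""
    return re.sub(
        r"#([^#]+)#",
        lambda match: runtime_parameters[match.group(1)],
        property_value,
    )

# The flow stores only the reference, never the secret itself.
print(resolve("#db_secrets.password#"))  # s3cr3t-value
```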
Learn more
Examples
- Creating a DataStage flow
- Watch the following video for an example of how to create a simple DataStage flow. This video provides a visual method to learn the concepts and tasks in this documentation.
- Importing a DataStage flow into a project
- Watch the following video for an example of how to import a DataStage flow into a project. This video provides a visual method to learn the concepts and tasks in this documentation.