Compute resource options for DataStage in projects

Last updated: Jan 12, 2024

To run DataStage jobs, you need to select an environment template for the DataStage parallel (PX) engine. The environment template specifies the configuration, the number of virtual cores (vCPU), and the amount of memory that are used to run the job. You can select a different environment template for each job.

Environment types and locations

DataStage has one runtime, the DataStage PX engine. The DataStage PX engine can run on either an SMP or MPP environment type. Both environment types have S (Small), M (Medium), and L (Large) configurations.

You can use these types of environments with DataStage:

SMP environment (one conductor node)
MPP environment (one conductor node and one or more compute nodes)

Default environment templates

To help you start quickly, your project includes preloaded environment templates for the DataStage PX engine runtime. You can select one of these environment templates to run your job in IBM Cloud.

Compute usage is tracked in capacity unit hours (CUH), and different environments consume capacity units at different hourly rates.

Preset environment templates available in projects for DataStage
Name	Hardware configuration	Capacity units per hour (CUH)
Default DataStage PX S	1 conductor: 2 vCPU and 8 GB RAM	2
Default DataStage PX M	1 conductor: 4 vCPU and 16 GB RAM	4
Default DataStage PX L	1 conductor: 8 vCPU and 32 GB RAM	8

The number of capacity unit hours that are used for a DataStage job is based on the capacity unit rating of the environment and the number of seconds that the runtime was active.
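As a rough illustration of that calculation, the following Python sketch multiplies a template's hourly rate by the active time in hours. The rates are copied from the preset template table. The function name `estimate_cuh` is hypothetical, and the sketch assumes simple proportional metering with no rounding or minimum charge, which might differ from how usage is actually billed.

```python
# Capacity units per hour for the preset templates, copied from the table above.
CUH_RATES = {
    "Default DataStage PX S": 2,
    "Default DataStage PX M": 4,
    "Default DataStage PX L": 8,
}


def estimate_cuh(template_name: str, active_seconds: float) -> float:
    """Estimate CUH as hourly rate x active hours.

    Hypothetical helper: assumes proportional metering with no rounding
    or minimum charge.
    """
    if active_seconds < 0:
        raise ValueError("active_seconds must be non-negative")
    rate = CUH_RATES[template_name]
    return rate * active_seconds / 3600


# A 15-minute (900-second) run on the M template: 4 * 900 / 3600
print(estimate_cuh("Default DataStage PX M", 900))  # 1.0
```

Under these assumptions, a job that is active for the same length of time costs twice as many CUH on the M template as on the S template, so a larger template reduces consumption only if it shortens the run by more than half.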

The runtimes for DataStage stop automatically when processing is complete.

Compute usage in projects

You can monitor the total monthly CUH consumption for the DataStage service on the Resource usage page, on the Manage tab of your project.

Data quality rules run as DataStage flows and consume CUH. See Data quality rules.

Runtime scope

The environment template that you select applies only to the job that you select it for, and the compute resources in that environment are not shared with any other DataStage jobs. You can also run the same job on different environment configurations by changing the environment for that job.

To change the environment for a job, use one of these methods:

On the flow canvas, select the run settings icon and select the environment that you want to use.
Select a job, edit the job configuration, and on the run settings tab, change the environment.