Use DataStage® in the Extract, Load, and Transform (ELT) run mode to more efficiently run SQL queries in target databases.
- What is the ELT process?
- What is the ELT run mode in DataStage?
- Availability
- Supported connectors
- Supported stages
- Limitations
- Enabling ELT run mode
- Selecting a materialization policy
- Setting database connector permissions for ELT run mode
- Compiling with the dsjob command
- Example
What is the ELT process?
The primary process that DataStage uses is Extract, Transform, and Load (ETL), in which data is read into memory, processed, then written to a target. In DataStage, all jobs run in ETL mode by default.
The Extract, Load, and Transform (ELT) process transfers data from a source to a target database and then prepares the information in the database itself. In certain situations, it is advantageous to use the ELT process to transform the data in the target database. An example might be when you need to transform large data sets that are already in specific data sources or targets.
What is the ELT run mode in DataStage?
- No ELT mode
- When the analysis determines that the DataStage flow cannot be converted to SQL, ETL mode is used, and DataStage compiles the flow with the PX runtime engine.
- ELT mode
- When the analysis determines that the DataStage flow can be converted to SQL, ELT mode is used, and DataStage compiles the flow to SQL.
- Mixed ETL and ELT mode
- When the analysis determines that the DataStage flow can be only partially converted to SQL, both ETL and ELT modes are used as needed.
Availability
ELT run mode is available in both DataStage Enterprise and DataStage Enterprise Plus.
Supported connectors
- Amazon RDS for PostgreSQL
- Amazon Redshift
- Google BigQuery
- IBM Cloud® Databases for PostgreSQL
- IBM Db2®
- IBM Db2 for DataStage
- IBM Db2 on Cloud
- IBM® Db2 Warehouse
- Oracle
- PostgreSQL
- Snowflake
- Teradata
- Teradata database for DataStage
- watsonx.data
Connector | Mode | Limitations |
---|---|---|
Amazon Redshift | Mixed |
|
Google BigQuery | ELT |
|
IBM Db2 | ELT |
|
IBM Db2 optimized | ELT |
|
Oracle | ELT |
|
Supported stages
Stage | Mode | Limitations |
---|---|---|
Aggregator | Mixed |
|
Copy | ELT |
|
Filter | Mixed |
|
Funnel | ELT |
|
Join | ELT |
|
Lookup | Mixed |
|
Remove Duplicates | Mixed |
|
Sort | Mixed |
|
Transformer | Mixed |
|
Limitations
- Flows with runtime column propagation enabled are not supported.
- Implicit data truncation is not performed when the table action is CREATE or REPLACE.
Enabling ELT run mode
- Open a DataStage flow.
- On the toolbar, click the Settings icon .
- Click the Run tab.
- Click Extract, load, transform (ELT). Then, click Save.
Selecting a materialization policy
- Generate nested SQL
- Only output models (target tables) are materialized. Each output model is represented by a single query that includes all transformations from input nodes to outputs.
- Link as table
- All integration flow links are materialized as tables. These tables are temporary and are deleted once target tables are built.
- Link as view
- All integration flow links are materialized as views. These views are temporary and are deleted once target tables are built.
- Advanced
- Cardinality changers are materialized as tables. A cardinality changer is a subset of connected nodes, based around stages such as Lookup, Join, and Aggregator. These nodes are combined to conserve resources by building more efficiently and avoiding repetitive computations.
Setting database connector permissions for ELT run mode
- Create views from select statements
- Drop views
- Create tables from select statements
- Drop tables
- Alter tables for adding null constraints and primary keys
If a connector is configured with custom SQL statements, then more permissions are needed for the SQL operations that are used in the custom SQL statements.
The permissions must be set within the corresponding database by a database administrator, which is outside of the scope of DataStage and Cloud Pak for Data.
Compiling with the dsjob command
cpdctl dsjob compile --project <project name> --enable-elt-mode
Example
- A PostgreSQL data source
- A Sort stage
- A Filter stage
- A PostgreSQL data target
You enable ELT mode by opening the run settings and selecting it.
After you compile the job successfully, the Compile log has a single message of "full pushdown". This message indicates that the entire flow was compiled in ELT mode.
You run the job. During the job run, the PostgreSQL source data is converted by using SQL statements that apply the sorting and filtering. The result is persisted as a table that is defined in the PostgreSQL target connector.
The job run finishes and the target database now holds all the transformed data.