Managing metadata enrichment jobs
For metadata enrichment, several types of jobs are used to run the requested analysis and enrichment processes.
Jobs for metadata enrichment are created automatically. To change any job settings, you must update the metadata enrichment asset.
Creating a job
The jobs that are used for running metadata enrichment or advanced analysis are automatically created:
- A job for running the basic metadata enrichment or advanced profiling. This job is created when you create a metadata enrichment asset and has the type Metadata Enrichment. By default, the job name equals the metadata enrichment name. You can specify a different name when you set up the metadata enrichment or change the name later.
- Two jobs for deep analysis. These jobs are created when you start a deep primary key or relationship analysis and have the type Key Analysis for Metadata Enrichment Assets. For deep primary key analysis, the job name is <metadata-enrichment-job-name> (PK Detection). For deep relationship analysis, the job name is <metadata-enrichment-job-name> (Relationship Detection).
- A job for publishing metadata enrichment results. This job is created when you publish metadata enrichment results for the first time and has the type Publish Metadata Enrichment Assets. The job name is <metadata-enrichment-job-name> (Publishing).
These jobs are listed on the Jobs page of the project. For each job, you can view detailed information about job runs, job state changes, and job failures in the job run log.
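The naming convention for the automatically created jobs can be illustrated with a short sketch. The helper `derived_job_names` and the dictionary keys are hypothetical and exist only for illustration; only the suffixes come from the job names described in this topic. This is not a product API.

```python
# Hypothetical helper (not a product API): derives the names of the
# automatically created jobs from the metadata enrichment job name.
# Only the suffixes are taken from the documentation; the keys are made up.
JOB_SUFFIXES = {
    "primary_key_analysis": " (PK Detection)",
    "relationship_analysis": " (Relationship Detection)",
    "publishing": " (Publishing)",
}


def derived_job_names(enrichment_job_name: str) -> dict:
    """Return the expected job names for one metadata enrichment."""
    # The basic enrichment job uses the enrichment job name unchanged.
    names = {"enrichment": enrichment_job_name}
    for kind, suffix in JOB_SUFFIXES.items():
        names[kind] = enrichment_job_name + suffix
    return names


names = derived_job_names("Customer data enrichment")
for kind, job_name in names.items():
    print(f"{kind}: {job_name}")
```

A lookup like this might help when you search the Jobs page for the jobs that belong to a specific metadata enrichment.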
Deleting jobs or job runs
You cannot delete any of the metadata enrichment jobs from the Jobs page. They are deleted only when you delete the metadata enrichment asset from the project. However, you can delete individual job runs of the metadata enrichment jobs.
Running metadata enrichment jobs
At any time, you can start a metadata enrichment run manually. Depending on the run configuration, a metadata enrichment can also run automatically. See Run definition.
Running an enrichment manually
You can manually start a metadata enrichment run at any time for the entire set of assets or a subset of assets.
To run the enrichment for the entire set of assets:
- Open the metadata enrichment asset and select Enrich all assets from the overflow menu next to the asset name.
- Open the metadata enrichment asset. On the Assets tab, select all assets and select Enrich from the toolbar.
- Go to the project's Jobs page and run the enrichment job from there. See Jobs.
To run the enrichment for a subset of the assets:
- Open the metadata enrichment asset. On the Assets tab, select assets as required and select Enrich from the overflow menu next to the asset name.
- Open the metadata enrichment asset. On the Assets tab, select assets as required and select Enrich from the toolbar.
You have several enrichment options.
- You can run the enrichment as configured. With this option, you start a run of the regular metadata enrichment job.
- You can run an analysis to identify primary keys for the assets. See Identifying primary keys. With this option, you start a run of a PK Detection key analysis job.
- You can run an analysis to identify relationships between the assets, or to detect overlapping or redundant data. See Identifying relationships. With this option, you start a run of a Relationship Detection key analysis job.
- You can run advanced data profiling to get more accurate profiling results without any approximations. See Advanced data profiling. With this option, you start a run of the metadata enrichment job.
Running enrichments automatically
Depending on the run configuration, a metadata enrichment job run starts directly after you create the metadata enrichment, or as a scheduled single or recurring run.
Advanced analysis does not run automatically. You must start advanced profiling, key analysis, relationship analysis, or overlap analysis manually.
Rerunning metadata enrichment
When the configured enrichment is run again, whether automatically or manually on the entire scope, the data scope that you selected for reruns determines which assets are actually reenriched. If the data scope for reruns is limited to new, modified, or previously not enriched assets, the job run log shows such reruns as delta metadata enrichment job runs. A delta job run might complete without enriching any assets if none of the assets match the criteria of the limited data scope.
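The effect of a limited data scope on a rerun can be sketched as a simple filter. The `Asset` class, its fields, and the function `select_for_rerun` are assumptions made for illustration; the product's actual selection logic is internal and might differ.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    # All fields are hypothetical; they model the criteria named in the text.
    name: str
    is_new: bool = False
    modified: bool = False
    previously_enriched: bool = True


def select_for_rerun(assets, limited_scope: bool):
    """Return the assets that a rerun would reenrich under this sketch."""
    if not limited_scope:
        # Full scope: every asset is reenriched.
        return list(assets)
    # Limited scope: only new, modified, or previously not enriched assets.
    return [
        a for a in assets
        if a.is_new or a.modified or not a.previously_enriched
    ]


assets = [
    Asset("CUSTOMERS"),
    Asset("ORDERS", modified=True),
    Asset("RETURNS", is_new=True, previously_enriched=False),
]
delta = select_for_rerun(assets, limited_scope=True)
unchanged = select_for_rerun([Asset("CUSTOMERS")], limited_scope=True)
print([a.name for a in delta])
print(len(unchanged))
```

The second call shows the case that is described above: when no asset matches the criteria, the delta run has nothing to enrich.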
At any time, you can change the metadata enrichment configuration by updating the metadata enrichment asset before you run the enrichment manually or before the next scheduled run. Assets are then profiled and analyzed according to the current enrichment configuration.
In case of a rerun, assets might not be available for reenrichment because they were deleted from the data source or were removed from the enrichment scope. For such assets, the timestamp of the asset profile will still show the date and time of the previous run.
Pausing and resuming enrichment job runs
At any time, you can pause a job run for a metadata enrichment and resume it later. This does not apply to job runs of any of the advanced analysis jobs (key analysis, relationship analysis, or advanced profiling) or of a publishing job. You can manually pause and resume an enrichment run from the Job Details page, from the job run details in a project, or from the Jobs page that you can access from the main navigation menu.
When an enrichment is paused, processing is halted. If you manually pause the enrichment, assets where enrichment hadn't started or wasn't complete at the time of pausing get the status pending.
When the job run is resumed, assets where enrichment hadn't started or wasn't complete at the time of pausing are processed.
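The status changes on pause and resume can be modeled with a minimal sketch. The functions `pause` and `resume` and the status values `finished` and `not started` are hypothetical; only `pending` and `in progress` appear in this topic, and the real processing is internal to the service.

```python
# Hypothetical model of asset statuses during a pause and a resume.
# "finished" and "not started" are assumed status names for illustration.

def pause(statuses: dict) -> dict:
    """Assets that are not yet complete get the status 'pending'."""
    return {
        asset: (status if status == "finished" else "pending")
        for asset, status in statuses.items()
    }


def resume(statuses: dict) -> dict:
    """Pending assets are processed; completed assets are left as they are."""
    return {
        asset: ("finished" if status == "pending" else status)
        for asset, status in statuses.items()
    }


run = {"CUSTOMERS": "finished", "ORDERS": "in progress", "RETURNS": "not started"}
paused = pause(run)
resumed = resume(paused)
print(paused)
print(resumed)
```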
Canceling a job run
You can cancel an active job run from the Job Details page. The job run is immediately stopped. For a basic metadata enrichment job run, processing of assets with the status in progress might still be completed depending on the internal processing status. Thus, the information on the Run metrics page might differ from the actual results because these values reflect the enrichment status at the point when the run was canceled.