Reviewing metadata enrichment results

Review enrichment results after the enrichment job completes. Access the results by viewing the metadata enrichment asset.

Required permissions
To view enrichment results, you must have at least the Viewer role in the project.
To edit results, you must have at least the Editor role in the project. To change term assignments, you also must have at least view access to the categories that are used in the enrichment.

Metadata enrichment assets are listed in the Metadata enrichments section of the project's Assets page. To view a metadata enrichment asset, click its name or click View from the asset's overflow menu.

A side panel provides a summary of relevant information about the metadata enrichment.

The following indicators are used in the results tables and the details panels:

  • A purple dot for automatically assigned terms or data classes
  • A purple square for automatically assigned display names and for AI-generated descriptions that were automatically assigned
  • A blue dot for accepted display-name or description suggestions, or for edited display names or descriptions
    This indicator is removed when the asset is published to a catalog.
  • An AI icon for AI-suggested descriptions

Metadata enrichment is run on assets that are available in the project. Thus, the list of enriched assets might not correspond to the configured scope of included metadata import assets in these cases:

  • Metadata import was not yet complete when the enrichment started.
  • Metadata import failed for a set of assets or failed completely.

Reviewing results at the asset level

On the Assets tab, the following information is provided for each data asset in the scope of the metadata enrichment:

  • Asset name. For relational data, the table type is also shown.
  • Source information.
  • Display name. You can edit the names and accept suggested names inline.
  • Description. You can edit the descriptions and accept AI-suggested descriptions inline.
  • Assigned business terms and the number of suggested terms.
  • Assigned classifications.
  • Assigned primary keys and the number of suggested ones.
  • Overall data quality score achieved in the last enrichment.
  • Review status.
  • The status and the date and time of the last enrichment.
  • Publish status.

Until the enrichment has run at least once, only the Name and Context columns are populated.

By default, all information is shown on the tab. You can customize the view and show only the information that you need. Click the Customize columns icon and deselect all columns that you want to hide. You can also reorder the columns by clicking an entry and dragging it to a new position.

Check asset details and enrichment results:

You can go directly to the columns of an individual asset by clicking the asset name or by clicking View columns in the overflow menu (three vertical dots).

Detailed results for each asset are also available in its asset profile in the project. Column-level profile details can also be accessed from the Governance tab in the column details panel. For Watson Query and watsonx.data view assets, all users are denied access to the profiling results to prevent accidental exposure of value distributions.

If you want to remove a specific asset from the enrichment scope, select the asset and click Remove asset from the overflow menu.

Asset and enrichment details

Access asset and enrichment details by clicking the asset name or by clicking View asset details from the overflow menu. On the Details tab in the side panel, you find this information:

  • The source of the data asset: the connection and database for connected assets. For files uploaded from a local system, Project is shown in the Source column.
  • Asset details: the table type, the number of columns and rows in the asset, and its data format.
  • The asset owner. The asset owner is usually the user who added the asset to the project except for assets that were added from a catalog. In this case, the catalog asset owner is also the project asset owner.
  • The selected enrichment options.
  • The sampling method.
  • The date when the asset was last enriched and a link to the details of that enrichment job run.
  • The asset description.

Display name

If the enrichment options include the Expand metadata option, this section initially contains an alternative name for the data asset that was found through fuzzy matching. Fuzzy matching expands the source name based on a predefined glossary to provide a name that is easy to understand. The expanded name might already be assigned because the confidence was high enough or it is a suggestion that you can accept. At any time, you can edit the display name.

Description

This section can contain a description for the asset. If the enrichment options include the Expand metadata option, this section initially contains an AI-generated description. This description might already be assigned because the confidence was high enough or it is a suggestion that you can accept. At any time, you can edit the description.

Source

This section shows the connection and database for connected assets. For files uploaded from a local system, Project is shown in the Source column.

Asset details

Asset details include the number of columns and rows in the asset and the data format of the asset. For relational data, the table type is also listed.

Asset owner

The asset owner is usually the user who added the asset to the project except for assets that were added from a catalog. In this case, the catalog asset owner is also the project asset owner.

Enrichment details

Enrichment details include the list of selected enrichment options, the selected sampling method, the date when the asset was last enriched, and a link to the corresponding job run.

Governance information

Governance information for an asset includes assigned and suggested business terms and assigned classifications, which are listed in the Business terms and Classifications columns of the results. For assigned terms, a purple dot indicates that at least one of the terms was automatically assigned.

Access detailed governance information for an asset by clicking the asset name, by clicking the View more link in the Business terms or Classifications column, or by clicking View asset details from the overflow menu. On the Governance tab in the side panel, you can manage term and classification assignments.

Terms

Review assigned and suggested terms. For each assigned or suggested term, the confidence score is shown. You can click a term to see some of its properties: its description, its primary and secondary categories, a list of data stewards, hierarchical type relationships, and related classifications and data classes.

Accept suggestions as required. You can also search for any business terms that are not listed as suggestions and assign them manually. Remove any assigned terms that you think are inaccurate. Such negative feedback is considered in the next enrichment run. Terms that you remove in bulk are treated differently from those that you remove individually. If you remove a term from a single asset, that term is considered rejected. It is also listed in the side panel, and you can reassign it any time. For more information, see Term assignment.

Classifications

Review assigned classifications. Depending on the project settings, classifications that are related to a business term are also assigned when the business term is automatically assigned. You can assign additional classifications or remove the classifications that were assigned by the system and replace them with other ones. For more information about the project settings, see Default enrichment settings: Classification assignment.

Information about primary keys and relationships

Access key and relationship information for an asset by clicking the asset name, by clicking the View more link in the Primary key column, or by clicking View asset details from the overflow menu. On the Keys tab in the side panel, you can manage key assignments and relationships.

For primary keys that were identified through primary key analysis, the number and percentage of unique values, the number and percentage of null values, and the number of analyzed columns are shown. This information is not available for a key that was picked and assigned manually without prior primary key analysis, or for suggested primary keys that result from in-depth key relationship analysis.

The Relationships section provides these views of the assigned relationships:

  • Parent of tab: In relationships to the listed assets, the current asset provides the primary key.
  • Child of tab: In relationships to the listed assets, the current asset provides the foreign key.
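
The two views can be read as two queries over the same set of relationships. The following Python sketch illustrates that reading only. The `Relationship` type, the function names, and the table names are hypothetical and are not part of the product.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Relationship:
    """Hypothetical record of one assigned relationship."""
    parent_asset: str  # asset that provides the primary key
    child_asset: str   # asset that provides the foreign key


def parent_of(asset: str, relationships: List[Relationship]) -> List[str]:
    """Assets for which the given asset provides the primary key."""
    return [r.child_asset for r in relationships if r.parent_asset == asset]


def child_of(asset: str, relationships: List[Relationship]) -> List[str]:
    """Assets for which the given asset provides the foreign key."""
    return [r.parent_asset for r in relationships if r.child_asset == asset]


# Illustrative data only: ORDERS references CUSTOMERS, ORDER_ITEMS references ORDERS.
relationships = [
    Relationship(parent_asset="CUSTOMERS", child_asset="ORDERS"),
    Relationship(parent_asset="ORDERS", child_asset="ORDER_ITEMS"),
]
```

In this example, the ORDERS asset would list ORDER_ITEMS in the Parent of view and CUSTOMERS in the Child of view.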

If a relationship analysis was run but no relationships are assigned yet, you can click the plus icon to view and work with the analysis results.

At any time, you can edit the assigned relationships by clicking the pencil icon.

For more information, see Identifying primary keys and Identifying relationships.

Data quality score

A data quality score is displayed only if at least one data quality check was applied to the asset. Otherwise, a dash (—) is shown. The score shown for a data asset is the weighted average of the scores provided by the columns in the data asset. Data quality scores that are below the specified threshold are marked with a red dot. Data quality scores that are equal to or exceed the specified threshold are marked green.
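
As a rough illustration of how an asset score and its marker could be derived, here is a minimal Python sketch. The weights per column are an assumption, because this topic does not state how columns are weighted, and the function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def asset_quality_score(columns: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Weighted average over (score, weight) pairs, one pair per scored column.

    Returns None when no column contributes a score, which corresponds to
    the dash shown in the results. The weights are an assumption.
    """
    total_weight = sum(weight for _, weight in columns)
    if total_weight == 0:
        return None
    return sum(score * weight for score, weight in columns) / total_weight


def score_marker(score: Optional[float], threshold: float) -> str:
    """Red below the threshold, green when equal to or above it."""
    if score is None:
        return "dash"
    return "green" if score >= threshold else "red"
```

For example, two columns with scores 80 and 100 and assumed weights 1 and 3 give an asset score of 95, which is marked green for a threshold of 95 or lower.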

A delta value shows how the overall data quality score changed compared to the score from 90 days before the latest analysis:

  • A green arrow pointing to the upper right indicates that the data quality score went up.
  • A red arrow pointing to the lower right indicates that the data quality score went down.

To find assets with quality issues quickly, especially when the enrichment scope is large, you can filter the list by quality scores.

For details about data quality issues, select an asset and click View data quality details from the overflow menu, or click the quality score.

For more information, see Data quality analysis results and Data quality scores.

Review status

Initially, the review status of all assets in the metadata enrichment is Not reviewed. After you review the enrichment results for an asset in the asset profile, you can set the asset's review status to Reviewed. This way, everyone on the team knows what was already looked at and what still needs to be reviewed. If a later enrichment run updates the results of an asset with the status Reviewed, the asset's review status is set to Reanalyzed after review and an icon indicates that the enrichment results changed for an already reviewed asset. The review status does not change for updates to the asset that were found during metadata import.
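
The status changes described above can be summarized as a small state machine. The following Python sketch is illustrative only. The event names are hypothetical labels for the actions mentioned in this topic and do not correspond to any product API.

```python
from enum import Enum


class ReviewStatus(Enum):
    NOT_REVIEWED = "Not reviewed"
    REVIEWED = "Reviewed"
    REANALYZED_AFTER_REVIEW = "Reanalyzed after review"


def next_status(current: ReviewStatus, event: str) -> ReviewStatus:
    """Return the review status after an event (event names are hypothetical)."""
    if event == "mark_as_reviewed":
        return ReviewStatus.REVIEWED
    if event == "mark_as_not_reviewed":
        return ReviewStatus.NOT_REVIEWED
    if event == "enrichment_updated_results" and current is ReviewStatus.REVIEWED:
        return ReviewStatus.REANALYZED_AFTER_REVIEW
    # Any other event, such as an update found during metadata import,
    # leaves the review status unchanged.
    return current
```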

Note that, for assets that are marked as reviewed, term assignments are not updated on reruns of an enrichment. For more information, see How new analysis results update existing term assignments.

Filter the list of assets by review status to quickly find any assets that must be looked at.

You can reset the review status of an asset at any time. To change the review status, click Mark as reviewed or Mark as not reviewed from the asset's overflow menu. To change the review status of several assets at once, select the assets, click More, and select Mark as reviewed or Mark as not reviewed. The review status of an asset is independent of what review status its columns have. You can also use APIs instead of the user interface to set the review status of assets. The links to these APIs are listed in the Learn more section.

When you do a bulk change of the review status, you might see a success message before the changes are actually complete depending on the volume of the requested changes. You might need to refresh the view several times before you see all changes applied.

Enrichment status

The enrichment status column can have these values:

Not analyzed
This asset was added after the last enrichment run.
Finished
Enrichment for this asset is complete. This status is also shown if the enrichment happened outside the current enrichment asset, for example, if the asset was profiled manually before it was added to this enrichment.
Failed
An error occurred during the enrichment.
Canceled
The job run for the enrichment was canceled.

You can sort or filter the result list by enrichment status. For sorting, the primary sort order is by status. Ascending order is canceled, failed, and finished. Depending on the general sort order, assets with the status Not analyzed are displayed unordered at the top or the end of the list.
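
The sort behavior can be pictured with the following Python sketch. It assumes that each asset is a dictionary with a `status` key, which is a hypothetical shape. Because the topic does not say which sort direction places Not analyzed assets at the top, that choice is left as an explicit parameter.

```python
from typing import Dict, List

# Ascending order as described above: canceled, failed, finished.
STATUS_RANK = {"Canceled": 0, "Failed": 1, "Finished": 2}


def sort_by_enrichment_status(
    assets: List[Dict[str, str]],
    descending: bool = False,
    not_analyzed_first: bool = False,
) -> List[Dict[str, str]]:
    """Sort assets by status; Not analyzed assets stay unordered at one end."""
    ranked = [a for a in assets if a["status"] in STATUS_RANK]
    not_analyzed = [a for a in assets if a["status"] not in STATUS_RANK]
    ranked.sort(key=lambda a: STATUS_RANK[a["status"]], reverse=descending)
    return not_analyzed + ranked if not_analyzed_first else ranked + not_analyzed
```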

Publish status

This column shows whether an asset was published to a catalog. Publish details such as the target catalog or the name of the publish job are available in the asset information in the side panel.

However, only the details of the most recent publish request are shown.

Reviewing results at the column level

On the Columns tab, the following information is provided for each column in a data asset in the scope of the metadata enrichment:

  • Column name. A key icon next to the name indicates that the column is assigned as primary key.
  • Table to which the column belongs and the context of that asset.
  • Display name. You can edit the names and accept AI-suggested names inline.
  • Description. You can edit the descriptions and accept AI-suggested descriptions inline.
  • Assigned business terms and the number of suggested terms.
  • Assigned data class.
  • Assigned classifications.
  • Data quality score for this column.
  • Review status.

The Columns tab is empty until the enrichment has run at least once.

By default, all information is shown on the tab. You can customize the view and show only the information that you need. Click the Customize columns icon and deselect all columns that you want to hide. You can also reorder the columns by clicking an entry and dragging it to a new position.

If you want to check only the columns of a specific data asset, click the asset name on the Assets tab or click View columns from the overflow menu (three vertical dots).

Check column details and enrichment results:

Column details

Access column and enrichment details by clicking the column name or by clicking View column details from the overflow menu. On the Details tab in the side panel, you find this information:

  • Display name. If the enrichment options include the Expand metadata option, this section initially contains an alternative name for the column that was found through fuzzy matching. Fuzzy matching expands the source name based on a predefined glossary to provide a name that is easy to understand. The expanded name might already be assigned because the confidence was high enough or it is a suggestion that you can accept. At any time, you can edit the display name.

  • Description. This section can contain a description for the column. If the enrichment options include the Expand metadata option, this section initially contains an AI-generated description. This description might already be assigned because the confidence was high enough or it is a suggestion that you can accept. At any time, you can edit the description.

  • The context of the asset to which the column belongs in the Source section.

  • Statistics about the data for each column such as the number of distinct values, the percentage of unique values, minimum, maximum, or mean, and sometimes the standard deviation in that column. The number of distinct values indicates how many different values exist in the sampled data for the column. The percentage of unique values indicates the percentage of distinct values that appear only once in the column.

    Depending on a column’s data format, the statistics vary slightly. For example, statistics for a column of data type integer have minimum, maximum, and mean values and a standard deviation value while statistics for a column of data type string have minimum length, maximum length, and mean length values.

  • The frequency distribution of the values found and the number of missing values.

  • The data format of the columns in the sampled rows.

  • The asset owner.
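
To make the difference between distinct and unique values in the statistics concrete, here is a minimal Python sketch. It follows the wording above literally: unique values are the distinct values that occur exactly once, expressed as a percentage of all distinct values. Skipping `None` as a missing value is an assumption, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def distinct_and_unique(values: Iterable[Optional[str]]) -> Tuple[int, float]:
    """Return (number of distinct values, percentage of unique values).

    Assumption: None marks a missing value and is not counted.
    """
    counts = Counter(v for v in values if v is not None)
    distinct = len(counts)
    unique = sum(1 for occurrences in counts.values() if occurrences == 1)
    percent_unique = 100.0 * unique / distinct if distinct else 0.0
    return distinct, percent_unique
```

For the sampled values a, b, b, c, d, d, there are four distinct values (a, b, c, d), of which two (a and c) occur only once, so the percentage of unique values under this reading is 50.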

More detailed results for each column are available in the column profile. To view that profile:

  • Select View data profile from the column's overflow menu.
  • Click the Open column profile icon next to Statistics or Formats in the column details.
  • Click the View all link in the Statistics or Formats section. Whether this link is available depends on the number of results.

If the column is subject to a data protection rule, only a subset of this information is available: the description and the context.

For Watson Query and watsonx.data view assets, all users are denied access to the profiling results to prevent accidental exposure of value distributions.

Governance information

Governance information for a column includes assigned and suggested business terms, which are listed in the Business terms column of the results, assigned and suggested data classes, which are shown in the Data class column, and assigned classifications, which are shown in the Classifications column. An automatically assigned data class is identified by a purple dot next to the data class name. For assigned terms, the dot indicates that at least one of the terms was automatically assigned.

Access detailed governance information for a column by clicking the column name, by clicking the View more link in the Business terms, Data class, or Classifications column, or by clicking View column details from the overflow menu. On the Governance tab in the side panel, you can manage term, data class, and classification assignments.

The same information is provided when you click the View more link that appears below the business term, data class, or classification when you hover over a specific column.

Terms

Review assigned and suggested terms. For each assigned or suggested term, the confidence score is shown. You can click a term to see some of its properties: its description, its primary and secondary categories, a list of data stewards, its hierarchical type relationships, and related classifications and data classes.

Accept suggestions as required. You can also search for any business terms that are not listed as suggestions and assign them manually. Remove any assigned terms that you think are inaccurate. Such negative feedback is considered in the next enrichment run. Terms that you remove in bulk are treated differently from those that you remove individually. If you remove a term from a single column, that term is considered rejected. It is also listed in the side panel, and you can reassign it any time. For more information, see Term assignment.

Note that term assignments do not affect data class assignments. If a term that is associated with a data class is assigned to a column by an ML model or through name matching, the related data class is not automatically assigned as well.

Data class

Review the assigned data class and the suggested data classes. You can click a data class to see some of its properties: its description, its primary and secondary categories, the type of data matching, its parent and dependent data classes, and related classifications and data classes.

The confidence score for assigning or suggesting a data class must at least equal the set threshold. See Data class assignment settings. If a threshold is set on a data class directly, this threshold takes precedence when data classes are assigned. It is not considered for suggestions. In addition to the confidence score, the priority of a data class is taken into account. See Adding data matching to data classes.
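
The threshold rules can be sketched in Python as follows. This is a simplified illustration, not the product's implementation: it assumes separate project-level thresholds for assignment and for suggestion, uses hypothetical function and parameter names, and leaves out data class priority.

```python
from typing import Optional


def can_assign(
    confidence: float,
    project_assignment_threshold: float,
    class_threshold: Optional[float] = None,
) -> bool:
    """A threshold set directly on the data class takes precedence for assignment."""
    threshold = (
        class_threshold
        if class_threshold is not None
        else project_assignment_threshold
    )
    return confidence >= threshold


def can_suggest(confidence: float, project_suggestion_threshold: float) -> bool:
    """The threshold on the data class itself is not considered for suggestions."""
    return confidence >= project_suggestion_threshold
```

With assumed thresholds of 0.75 for assignment and 0.25 for suggestion, a confidence of 0.8 would qualify for assignment, unless the data class itself carries a threshold of, say, 0.9. In that case the data class could still be suggested.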

For details about data classes, see Data classes and Predefined data classes.

A dash (—) indicates that no data class was assigned during analysis.

Several data classes are more generic identifiers that are detected and assigned at column level only. These data classes are assigned when a more specific data class could not be identified at a value level. Generic identifiers include the following data classes: Code, Identifier, Indicator, Quantity, and Text.

When you assign a data class manually, either a suggested data class or an entirely different one, terms that are associated with that data class are assigned in the next enrichment run. Term assignments, however, do not entail automatic assignment of associated data classes.

Classifications

Review assigned classifications. Depending on the project settings, classifications that are related to a data class or a business term are also assigned when the data class or business term is automatically assigned. You can assign additional classifications or remove the classifications that were assigned by the system and replace them with other ones. For more information about the project settings, see Default enrichment settings: Classification assignment.

Data quality score

A data quality score is displayed only if at least one data quality check was applied to the column. Otherwise, a dash (—) is shown. Data quality scores are computed for each individual column in a data asset based on the results of the applied data quality checks. A setting in the data quality analysis results determines whether a column's quality score is considered for calculating the overall asset and dimension scores.

A delta value shows how the data quality score changed compared to the score from 90 days before the latest analysis:

  • A green arrow pointing to the upper right indicates that the data quality score went up.
  • A red arrow pointing to the lower right indicates that the data quality score went down.
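
The delta indicator amounts to a comparison of two scores. In the following minimal Python sketch, the function name and the return labels are hypothetical. Returning no arrow for an unchanged score is an assumption, because that case is not described here.

```python
from typing import Optional, Tuple


def delta_indicator(
    latest_score: float, score_90_days_before: float
) -> Tuple[float, Optional[str]]:
    """Return the change in score and the arrow that would accompany it."""
    delta = latest_score - score_90_days_before
    if delta > 0:
        return delta, "green arrow, upper right"
    if delta < 0:
        return delta, "red arrow, lower right"
    return delta, None  # assumption: no arrow when the score is unchanged
```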

To find columns with quality issues quickly, especially when the enrichment scope is large, you can filter the list by quality scores.

For details about data quality issues, select a column and click View data quality details from the overflow menu, or click the column's quality score.

For more information, see Data quality analysis results and Data quality scores.

Review status

Initially, the review status of all columns in the metadata enrichment is Not reviewed. After you review the enrichment results for a column, you can set its review status to Reviewed. This way, everyone on the team knows what was already looked at and what still needs to be reviewed. If a later enrichment run updates the results of a column with the status Reviewed, the column's review status is set to Reanalyzed after review and an icon indicates that the enrichment results changed for an already reviewed column. The review status does not change for updates that were found during metadata import.

Note that, for columns that are marked as reviewed, term assignments are not updated on reruns of an enrichment. For more information, see How new analysis results update existing term assignments.

You can reset the review status of a column at any time. To change the review status, click Mark as reviewed or Mark as not reviewed from the column's overflow menu. To change the review status of several columns at once, select the columns, click More, and select Mark as reviewed or Mark as not reviewed. A column's review status is independent of the review status that its containing asset has. You can also use APIs instead of the user interface to set the review status of columns. The links to these APIs are listed in the Learn more section.

When you do a bulk change of the review status, you might see a success message before the changes are actually complete depending on the volume of the requested changes. You might need to refresh the view several times before you see all changes applied.

If the built-in machine learning model for ML-based term assignment is used and is trained from project assets, columns that are marked as reviewed and have automatically assigned business terms serve as training data.

Filter the list of columns by review status to quickly find any columns that must be looked at.

Learn more
