Reviewing and updating enrichment results in an external program
You want to use a familiar spreadsheet environment to review and manage data class and term assignments for the data assets in the scope of a single metadata enrichment.
Requirements and restrictions
For managing data class and term assignments in a spreadsheet, the following requirements and restrictions exist.
Prerequisite configuration
The Review metadata Office add-in must be deployed in your organization and you must have a copy of the Microsoft Excel workbook template that is provided with the add-in.
A Microsoft admin can download the manifest.xml file and the Review metadata - IBM Knowledge Catalog.xlsx workbook template from the metadata-enrichment folder in the IBM Knowledge Catalog samples GitHub repository at: https://github.com/IBM/knowledge-catalog-samples
Instructions for tailoring the manifest.xml file are provided in the readme file that accompanies the manifest file and the Excel template.
The admin must deploy and publish the add-in as described in the Microsoft documentation Deploy and publish Office Add-ins.
You must activate the Review metadata Excel add-in. For information about how to do that, check the documentation that applies to your version of Excel.
Restrictions
Before you start working with the workbook and the add-in, review the information in Issues with the Microsoft Excel add-in.
What the workbook looks like
The workbook consists of 5 protected sheets:
Sheet | Content |
---|---|
Data assets | Columns: • Connection • Data path • Data asset • Column • Type • Description • Assigned / suggested data classes • Data class • Assigned / suggested business terms • Business term columns. By default, 3 columns are provided. You can add further columns. See Reviewing and updating assignments. |
Business terms | Columns: • Name • Abbr. A list of the abbreviations defined for the term. • Category path • Distinctive name. If multiple terms with the same name exist, the name and category path are listed here to help distinguish the terms. • Description • Secondary categories • Tags • Classifications • Effective start • Effective end |
Data classes | Columns: • Name • Category path • Distinctive name. If multiple data classes with the same name exist, the name and category path are listed here to help distinguish the data classes. • Description • Secondary categories • Tags • Classifications • Effective start • Effective end |
Categories | Columns: • Name • Path • Description • Tags • Classifications |
Knowledge Catalog | • Download information • Upload information |
Retrieving data from Cloud Pak for Data
To load the data into the workbook:
- Create a copy of the workbook template for each metadata enrichment that you want to work on. Give each copy a meaningful name, for example, one that includes the project name and the metadata enrichment name, so that you can easily identify where the data belongs.
- Open a workbook. If you already activated the add-in, the Excel Home ribbon contains the Review metadata button. If you don't see that button, activate the add-in now by following the instructions that apply to your version of Excel. To open the add-in task pane, click the Review metadata button.
- Log in with your Cloud Pak for Data credentials.
- Retrieve governance artifacts and data assets. You can download this information in 2 separate steps. However, you must download the governance artifacts before you download the data assets. Otherwise, the assignments can't be displayed.
  - Retrieve governance artifacts: Add information about all data classes and business terms that are defined in Cloud Pak for Data to the respective sheets in the workbook. Also, add information about the categories to which the data classes and terms belong.
  - Retrieve data assets: Select a project and a metadata enrichment, and download the data assets that are in the scope of the selected metadata enrichment. If you don't see a newly created project in the projects list, reload the add-in.

  Important: To avoid any potential data mismatches, always use a new workbook for data retrieval, even if you retrieve data from a metadata enrichment on which you worked previously.
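If you prepare many workbooks, the first step (one named copy of the template per metadata enrichment) can be scripted. The following is a minimal sketch only. The function name copy_template, the naming pattern, and the character replacement rule are assumptions for illustration and are not part of the add-in or of IBM Knowledge Catalog.

```python
import re
import shutil
from pathlib import Path


def copy_template(template: Path, project: str, enrichment: str, dest_dir: Path) -> Path:
    """Copy the workbook template to a file named after the project and enrichment.

    Hypothetical helper: the "<project>__<enrichment>.xlsx" pattern is an
    illustrative choice, not a naming rule required by the add-in.
    """

    def slug(text: str) -> str:
        # Replace runs of characters that are awkward in file names with "_".
        return re.sub(r"[^A-Za-z0-9._-]+", "_", text.strip()).strip("_")

    target = dest_dir / f"{slug(project)}__{slug(enrichment)}.xlsx"
    if target.exists():
        # A new workbook is recommended for every retrieval, so never reuse one.
        raise FileExistsError(f"{target} already exists; choose a new name")
    shutil.copyfile(template, target)
    return target
```

For example, calling copy_template with the downloaded Review metadata - IBM Knowledge Catalog.xlsx file, the project name Sales project, and the enrichment name Customer MDE #1 would create Sales_project__Customer_MDE_1.xlsx in the target folder. Refusing to overwrite an existing file mirrors the recommendation to always start from a new workbook.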
After you successfully retrieve the information, the Knowledge Catalog sheet is populated with this information:
- The Cloud Pak for Data hostname
- The names of the project and the metadata enrichment from which the data was loaded. The spreadsheet will always reflect the display names as of the initial retrieval. They are not updated when the name of the project or the metadata enrichment is changed in IBM Knowledge Catalog. However, this does not impact the updates on upload because these are done by using the resource IDs, which are immutable.
- The date and time when the governance artifacts and the data assets were downloaded
In addition, the upload option is enabled in the add-in task pane.
The Business terms, Data classes, and Categories sheets contain the information listed in What the workbook looks like.
The Data assets sheet contains an alphabetical list of the data assets followed by an alphabetical list of all columns. The columns of the Data assets sheet are populated as follows:
Sheet column | Editable | Data asset | Asset column |
---|---|---|---|
Connection | No | Connection name | Connection name |
Data path | No | Schema | Schema |
Data asset | No | Asset name | Asset name |
Column | No | | Column name |
Type | No | Set to Dataset | Set to Field |
Description | Yes | Any description that might be available for the data asset | Any description that might be available for the asset column |
Assigned / suggested data classes | No | Assigned and suggested data classes. An assigned data class is also listed in the Data class column. | Assigned and suggested data classes. An assigned data class is also listed in the Data class column. |
Data class | No for data assets. Yes for asset columns. | Assigned data class | Assigned data class |
Assigned / suggested business terms | No | Assigned and suggested terms. Assigned terms are also listed in separate Business term columns. | Assigned and suggested terms. Assigned terms are also listed in separate Business term columns. |
Business term. The number of columns can vary. The default is 3 columns. If the data asset or asset column has more terms assigned, columns are added as needed. You can add further columns as required. See Reviewing and updating assignments. | Yes | Assigned term | Assigned term |
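The rule for the number of Business term columns (a default of 3, extended when any row has more assigned terms) can be illustrated with a short sketch. The function names and the in-memory representation of rows are hypothetical; they only model the behavior described in the table and do not reflect how the add-in is implemented.

```python
# Default number of Business term columns, as stated in the table above.
DEFAULT_TERM_COLUMNS = 3


def term_columns_needed(assigned_terms_per_row: list[list[str]]) -> int:
    """Return how many Business term columns the sheet needs.

    Each inner list holds the terms assigned to one data asset or asset column.
    """
    most_terms = max((len(terms) for terms in assigned_terms_per_row), default=0)
    return max(DEFAULT_TERM_COLUMNS, most_terms)


def spread_terms(terms: list[str], width: int) -> list[str]:
    """Place each assigned term in its own cell and pad with empty cells.

    If width is smaller than the number of terms, the terms are returned as is.
    """
    return terms + [""] * (width - len(terms))


rows = [["Customer"], ["Email", "Contact", "PII", "Marketing"]]
width = term_columns_needed(rows)
cells = [spread_terms(terms, width) for terms in rows]
```

In this example, the second row has four assigned terms, so the sketch yields four Business term columns, and the first row is padded with three empty cells.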
Reviewing and updating assignments
To review and update the metadata:
- Check the Data class and Business term columns.
- Leave correct assignments unchanged. Replace or remove incorrect assignments. For business terms, you can add as many as required. Each term must be in a separate column. By default, the sheet contains 3 columns for business terms. You can add extra columns as follows:
  - Unprotect the Data assets sheet.
  - Select the last Business term column.
  - Right-click anywhere in that column and select Insert.
  - Optional: Add the column header Business term.
  - Protect the sheet again.

  You can now use this new column to assign business terms.
Uploading the reviewed results
When you have completed your review, upload the updated metadata to Cloud Pak for Data. You don't have to save the workbook before you start the upload.
The data that you upload overwrites the enrichment results in the project. All previously assigned data classes are unassigned and marked as suggestions. Then, data class and business term assignments are updated as specified in the spreadsheet. Descriptions in the spreadsheet overwrite the asset and column descriptions in the project. All columns and assets are marked as reviewed.
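The order of operations on upload can be sketched as a small model. The class and function names are hypothetical, and the sketch makes two assumptions that the description above does not spell out: the term list in the spreadsheet row replaces the previous term assignments, and an empty Data class cell leaves the column without an assigned data class. It is an illustration of the described effect, not the add-in's implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ColumnResult:
    """Hypothetical model of one asset column's enrichment result in the project."""
    description: str = ""
    assigned_data_class: Optional[str] = None
    suggested_data_classes: list = field(default_factory=list)
    assigned_terms: list = field(default_factory=list)
    reviewed: bool = False


@dataclass
class SheetRow:
    """Hypothetical model of the editable cells of one Data assets sheet row."""
    description: str = ""
    data_class: str = ""
    business_terms: list = field(default_factory=list)


def apply_upload(result: ColumnResult, row: SheetRow) -> ColumnResult:
    # 1. The previously assigned data class is unassigned and kept as a suggestion.
    if result.assigned_data_class is not None:
        if result.assigned_data_class not in result.suggested_data_classes:
            result.suggested_data_classes.append(result.assigned_data_class)
        result.assigned_data_class = None
    # 2. Assignments are set as specified in the spreadsheet
    #    (assumption: empty cells mean "no assignment").
    result.assigned_data_class = row.data_class or None
    result.assigned_terms = [term for term in row.business_terms if term]
    # 3. The spreadsheet description overwrites the project description.
    result.description = row.description
    # 4. The column is marked as reviewed.
    result.reviewed = True
    return result
```

Because step 1 runs before step 2, a data class that you replaced in the spreadsheet remains visible as a suggestion after the upload.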