Customizing and strengthening your matching algorithm (IBM Match 360)
IBM Match 360 with Watson includes tools that data engineers can use to tune and customize the matching algorithm. By tuning the algorithm, you control how IBM Match 360 matches your data to create master data entities.
- Required permissions: To configure a master data instance, you must be a member of the DataEngineer user group for the IBM Match 360 service.
There are four key parts of configuring and tuning your algorithm:
-
Selecting matching attributes. By choosing the data model attributes that are compared during the matching process, you tell IBM Match 360 with Watson which data points matter most to your algorithm. It's important to choose attributes that are strong differentiators. Unique identifiers such as driver's license numbers are excellent matching attributes. You must select matching attributes before you run matching for the first time.
-
Requesting and completing pair reviews. Request a pair review to generate intelligent tuning recommendations that optimize your matching algorithm's weights and matching thresholds. During a pair review, a data steward compares pairs of records to determine if they are a match, maybe a match, or not a match. The data steward's answers inform the resulting tuning recommendations.
-
Applying tuning recommendations. After a pair review task is completed, a data engineer can decide whether to apply the tuning recommendations.
-
Defining autolink and clerical review thresholds. If you accept tuning recommendations from pair reviews, the autolink and clerical thresholds are automatically determined, but you can always override the thresholds manually if necessary. Each record-to-record matching comparison that IBM Match 360 completes generates a matching score. This score can be taken as a percentage value from 0 to 100, with 0 being a definite non-match and 100 being a definite match. As part of configuring the matching algorithm, a data engineer can define two threshold values:
-
The autolink threshold defines the minimum matching score for the algorithm to make an automatic match decision between any two records.
- If the autolink threshold is low, you will have more overall matches, with likely more false positive matches.
- If the autolink threshold is high, you will have fewer overall matches and more singleton entities (made up of only a single member record), with likely more false negative non-matches.
-
The clerical review threshold defines the minimum matching score for a potential match. Scores below the clerical review threshold are considered non-matches. Scores that fall in the range between the clerical review threshold and the autolink threshold can be sent through the potential matches workflow to be remediated by a data steward user.
Important: If the clerical range is not enabled in the matching settings, then the potential matches workflow cannot generate any tasks. For information about the potential matches workflow, see [Configuring master data workflows](m360-config-workflow.html).
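The way the two thresholds divide matching scores into decisions can be sketched in a few lines of Python. This is only an illustration of the rules described above, not the product's implementation: the function name `classify_score` and its parameters are hypothetical, and the sketch assumes that a score equal to a threshold counts as meeting it, because both thresholds are described as minimum scores.

```python
from typing import Optional


def classify_score(
    score: float,
    autolink_threshold: float,
    clerical_threshold: Optional[float] = None,
) -> str:
    """Classify one record-to-record matching score.

    Hypothetical helper for illustration only. Pass clerical_threshold=None
    to model a disabled clerical range, in which case only match or
    non-match decisions are possible.
    """
    if clerical_threshold is not None and clerical_threshold > autolink_threshold:
        raise ValueError("clerical threshold must not exceed the autolink threshold")

    # Scores at or above the autolink threshold are matched automatically.
    if score >= autolink_threshold:
        return "match"

    # Scores between the two thresholds go to a data steward for review.
    if clerical_threshold is not None and score >= clerical_threshold:
        return "potential match"

    # Everything else is treated as a non-match.
    return "non-match"


if __name__ == "__main__":
    for s in (90, 70, 50):
        print(s, classify_score(s, autolink_threshold=80, clerical_threshold=60))
```

With the clerical range disabled (`clerical_threshold=None`), a score of 70 against an autolink threshold of 80 is classified as a non-match instead of being queued for review.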
For information about advanced algorithm tuning procedures that use the IBM Match 360 REST API, see Advanced matching algorithm tuning.
In this topic:
- Preparing to tune your matching algorithm
- Selecting matching attributes
- Requesting pair reviews and applying tuning recommendations
- Manually changing the autolink and clerical review thresholds
Preparing to tune your matching algorithm
If you have not yet run matching on your data, you must select your matching attributes before you run matching. You can change your selections later if needed.
You cannot change your autolink threshold sensitivity or request pair reviews until after you run matching at least one time. This restriction ensures that you have some basis of comparison for changing your threshold from the default sensitivity. For example, if you notice too many false positive matches in your data, you can increase the sensitivity. If there are too many singleton records, you can decrease the sensitivity.
Before modifying your matching algorithm settings, consider creating a new configuration snapshot to save your current settings. Having a snapshot will make it easier to revert to the previous configuration later if you're unhappy with the results of your changes. For information about creating snapshots, see Saving and loading master data configuration settings by using snapshots.
Selecting matching attributes
To select the attributes that IBM Match 360 uses in the matching algorithm:
-
Click the navigation menu and select Matching setup to open the matching setup page.
-
Go to the Match settings tab and select Attribute selection in the sidebar to select the attributes to use in matching data. The first time that you go to this tab, IBM Match 360 automatically generates some suggested attributes from your data model to use in matching.
-
Review the list of matching attributes and their component fields. These attributes and fields will be used as the basis of comparison to match records and create master data entities. To add or remove attributes from the list, click Edit attributes then select or clear attributes and their component fields as needed.
As you choose your matching attributes, use the Match strength indicator to see an estimate of how your changes affect the matching algorithm.
If you have added any custom attributes to the data model, they are not selected for consideration in matching by default. If you want to use a custom attribute type in matching, you must select it and then specify which of its fields to consider. If you do not specify any fields, then the matching algorithm cannot use the attribute.
For non-custom (predefined) attribute types, if you do not specify which fields to consider, the matching algorithm uses a default set of fields.
-
When you are satisfied with your matching attribute changes, click Save.
-
Regenerate your matched entities based on your updated settings. Click the run matching icon in the action bar.
The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.
Requesting pair reviews and applying tuning recommendations
Use pair reviews to tune your matching algorithm. Each organization has different levels of risk tolerance for false matches, and pair reviews can help determine the right match settings for you.
Data engineers can request pair reviews to be completed by a data steward, and then decide whether to accept the resulting tuning recommendations.
To request a pair review:
-
Click the navigation menu and select Matching setup to open the matching setup page.
-
Select Algorithm tuning in the sidebar to access the algorithm tuning tools.
-
Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
-
In the Pair review section, click Request pair review.
-
Choose the number of record pairs that should be reviewed as part of this task. Reviewing more pairs will result in better tuning recommendations. If too few pairs are reviewed, then IBM Match 360 will not be able to generate recommendations.
Note: The actual number of generated pairs might not match the number defined in this step. The number of generated record pairs depends on the available amount of data in the system and other factors.
Click Send request.
IBM Match 360 starts generating the record pairs and creating the pair review task. The Algorithm tuning section keeps you notified of the status of the review (Generating pairs or Review in progress), and also tracks the progress of the current review task.
For information about completing a pair review task as a data steward user, see Completing pair reviews.
To review and apply the tuning recommendations generated by a pair review:
-
Click the navigation menu and select Matching setup to open the matching setup page.
-
Select Algorithm tuning in the sidebar to access the algorithm tuning tools.
-
Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
-
In the Pair review section, review the progress of the latest pair review task. You can see the total number of pairs reviewed and the numbers of pairs that were determined to be matches, not matches, or uncertain matches.
-
In the Thresholds section, review the current matching algorithm settings, as well as estimates of the current false positive and false negative rates.
If too few pair reviews have been completed or if matching has not yet been run, the false positive and false negative rates cannot be displayed.
-
Expand the Threshold recommendation section.
-
Review the recommended updates to the matching algorithm settings. The recommendation represents the threshold with the lowest false positive and false negative rates, based on your reviewed pairs.
-
If you want to use the recommended settings, click Apply recommendation. Applying the recommendation will change the autolink sensitivity and the associated matching weights of each attribute.
-
Regenerate your matched entities based on your updated settings. Go to the Match results tab, then click the run matching icon in the action bar.
The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.
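To build intuition for what a threshold recommendation represents, the following Python sketch picks the threshold that produces the fewest combined false positives and false negatives over a set of reviewed pairs. It is an assumption-laden illustration, not the method that IBM Match 360 uses: the function `recommend_threshold`, the input format, and the sample scores are all hypothetical, pairs reviewed as "maybe a match" are assumed to be left out, and the sketch does not adjust attribute weights the way an applied recommendation does.

```python
from typing import List, Tuple


def recommend_threshold(reviewed_pairs: List[Tuple[float, bool]]) -> Tuple[float, int]:
    """Return (threshold, error_count) with the fewest FP + FN errors.

    Hypothetical helper for illustration only. Each reviewed pair is
    (matching_score, steward_said_match). Every observed score is tried
    as a candidate threshold; ties keep the lowest candidate.
    """
    if not reviewed_pairs:
        raise ValueError("at least one reviewed pair is required")

    best_threshold = None
    best_errors = None
    for threshold in sorted({score for score, _ in reviewed_pairs}):
        # False positive: linked at this threshold, but the steward said no match.
        false_positives = sum(
            1 for score, is_match in reviewed_pairs if score >= threshold and not is_match
        )
        # False negative: not linked at this threshold, but the steward said match.
        false_negatives = sum(
            1 for score, is_match in reviewed_pairs if score < threshold and is_match
        )
        errors = false_positives + false_negatives
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold, best_errors


if __name__ == "__main__":
    # Made-up review results: (score, steward decision).
    sample = [(95, True), (88, True), (72, False), (70, True), (55, False), (40, False)]
    print(recommend_threshold(sample))
```

Because every observed score is a candidate, reviewing more pairs gives the search more candidates and more evidence per candidate, which fits the guidance that reviewing more pairs leads to better recommendations.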
Manually changing the autolink and clerical review thresholds
If you don't use pair reviews to generate recommendations, finding the correct autolink and clerical review sensitivity for your needs might take some trial and error. Depending on the particular requirements of your organization, you might need to repeat the process of adjusting the sensitivity and re-matching your data more than once.
The total autolink threshold is calculated by multiplying the autolink sensitivity (0-100) by the maximum possible matching score, which is determined based on the selected match attributes and their maximum weights in the algorithm.
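As a worked example of this calculation, the following Python sketch treats the sensitivity as a percentage and takes the maximum possible matching score to be the sum of the maximum weights of the selected attributes. Both of those readings are assumptions made for illustration, and the attribute names and weights are invented; check the actual values for your algorithm before relying on the numbers.

```python
from typing import Dict


def total_autolink_threshold(sensitivity: float, max_weights: Dict[str, float]) -> float:
    """Estimate the total autolink threshold.

    Hypothetical helper for illustration only.
    sensitivity: autolink sensitivity from 0 to 100, read here as a percentage.
    max_weights: maximum weight per selected matching attribute (made-up values).
    """
    if not 0 <= sensitivity <= 100:
        raise ValueError("sensitivity must be between 0 and 100")

    # Assumption: the maximum possible score is the sum of the maximum weights.
    max_score = sum(max_weights.values())
    return sensitivity / 100 * max_score


if __name__ == "__main__":
    weights = {"legal_name": 10.0, "birth_date": 6.0, "address": 4.0}
    # 75% of a maximum score of 20.0
    print(total_autolink_threshold(75, weights))
```

Under these assumptions, adding or removing a matching attribute changes the maximum possible score, so the same sensitivity setting can correspond to a different total threshold after you change your attribute selection.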
To manually change the sensitivity of the matching algorithm's autolink and clerical review thresholds:
- Click the navigation menu and select Matching setup to open the matching setup page.
- Select Algorithm tuning in the sidebar to access the algorithm tuning tools.
- Ensure that the correct matching algorithm is selected. The default matching algorithm names are Person - Person entity and Organization - Organization entity.
- Review the current settings in the Thresholds section.
- Use the slider or type number values to update your autolink and clerical review thresholds, then click Apply threshold. You will be prompted to run matching to apply your algorithm changes.
- Optionally, you can disable the clerical review range by using the Clerical range toggle switch. If the clerical range is disabled, the algorithm can only make match or no-match decisions, and cannot queue any potential match tasks for data stewards to remediate.
- Regenerate your matched entities based on your updated settings. Go to the Match results tab, then click the run matching icon in the action bar.
The matching process takes a while to complete. It runs in the background so that you can continue working. You'll be notified when it's complete, and then you can review details of the results on the Match results tab.