Text Analytics rapidly and accurately captures key concepts from text data by using an extraction process. This process relies on linguistic resources to dictate how large amounts of unstructured, textual data are analyzed and interpreted.
You can use the Resource editor tab to view the linguistic resources that are used in the extraction process. These resources are stored in the form of templates and libraries, which are used to extract concepts, group them under types, discover patterns in the text data, and perform other processes. Text Analytics offers several preconfigured resource templates, and in some languages, you can also use the resources in text analysis packages.
On the Resource editor tab, you work with terms and types to identify the concepts to extract from a document. These technical terms are defined as follows.
- Concepts
- Concepts are important words and phrases that were identified and extracted from your text data. They are also referred to as extraction results. These concepts are grouped into types. You can use these concepts to explore your data and create your categories.
- Terms
- Terms are the specific words that make up a concept. Terms are single words such as airport or location and word phrases such as airport pick-up. They are used to identify concepts in the text. Terms can be plural or singular forms of words, parts of larger words, synonyms, or spelling variations.
- Types
- Types are semantic groupings for concepts. When concepts are extracted, they are assigned a type to help group similar concepts. For example, some of the default types are <Location>, <Organization>, <Person>, <Positive>, and <Negative>.
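The relationship between terms, concepts, and types can be illustrated with a minimal Python sketch. It is not how Text Analytics stores or applies its resources: the TERMS dictionary, the function names, and the sample terms are all hypothetical, and only the type names come from the defaults listed above. The sketch shows several terms (plural forms and synonyms) resolving to one concept, and each concept carrying a type.

```python
import re

# Hypothetical mini-library: each term maps to (concept, type).
# The entries are illustrative assumptions, not product resources.
TERMS = {
    "airport": ("airport", "<Location>"),
    "airports": ("airport", "<Location>"),   # plural form of the same concept
    "good": ("good", "<Positive>"),
    "great": ("good", "<Positive>"),         # synonym grouped under "good"
    "bad": ("bad", "<Negative>"),
    "awful": ("bad", "<Negative>"),          # synonym grouped under "bad"
}


def extract_concepts(text):
    """Return (concept, type) pairs for every known term found in the text."""
    # Try longer terms first so that a longer term is preferred over a prefix.
    alternatives = "|".join(
        re.escape(term) for term in sorted(TERMS, key=len, reverse=True)
    )
    matches = re.findall(rf"\b(?:{alternatives})\b", text.lower())
    return [TERMS[match] for match in matches]


def group_by_type(concepts):
    """Group extracted concepts under their types."""
    grouped = {}
    for concept, type_name in concepts:
        grouped.setdefault(type_name, set()).add(concept)
    return grouped


found = extract_concepts("The airports were great, but the shuttle was awful.")
print(found)
print(group_by_type(found))
```

In this sketch, adding a missing technical term is a matter of adding one more dictionary entry, which loosely mirrors adding a term to a library.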
You can use the Resource editor tab to customize and tune the linguistic resources. You can also use the controls to manage how terms are matched with text data and define rules for text links analysis (TLA).
Terms/synonyms pane
The Terms/synonyms pane shows all the libraries that are used as linguistic resources during the extraction process. If you want to customize how specific terms are grouped into concepts, you can edit the terms in the libraries. You can also add terms to the libraries. For instance, if your text data is specific to one field or discipline, you can add any technical terms that might be missing.
Custom libraries and templates
Because the preconfigured resources might not fit the context of your data perfectly, you can create and manage your own resources for a particular context or domain in the Resource editor tab.
You can save any changes that you make to a library or template as a project asset, which you can then reuse in other flows. If you manage your resources by using local files, you can also import custom libraries or templates.
Fuzzy grouping and inflection grouping
You can use fuzzy grouping and inflection grouping techniques when analyzing text data. The fuzzy grouping technique groups commonly misspelled or closely spelled words, and the inflection grouping technique groups inflected variants of words based on the root.
If two words with similar spelling are incorrectly grouped together when these features are enabled, you can exclude the words from these grouping techniques. Add the incorrectly matched pairs to the Exceptions section in the Advanced resources tab.
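To make the idea of fuzzy grouping with exceptions concrete, here is a rough Python sketch. The matching rule is an assumption chosen for illustration (drop vowels after the first letter and collapse repeated letters, then compare the results); it should not be read as the product's actual algorithm. The names fuzzy_key, same_group, and EXCEPTIONS are hypothetical.

```python
import re

VOWELS = set("aeiou")


def fuzzy_key(word):
    """Build a comparison key: keep the first letter, drop later vowels,
    and collapse runs of the same letter (an illustrative rule only)."""
    w = word.lower()
    if not w:
        return ""
    stripped = w[0] + "".join(ch for ch in w[1:] if ch not in VOWELS)
    return re.sub(r"(.)\1+", r"\1", stripped)


def same_group(a, b, exceptions):
    """Return True if the two words would be grouped together.

    `exceptions` is a set of frozenset pairs that must never be grouped,
    loosely analogous to entries in an exceptions list.
    """
    if frozenset({a.lower(), b.lower()}) in exceptions:
        return False
    return fuzzy_key(a) == fuzzy_key(b)


# Hypothetical exception pair: similar keys, but unrelated meanings.
EXCEPTIONS = {frozenset({"board", "bored"})}

print(same_group("modeling", "modelling", EXCEPTIONS))  # spelling variants
print(same_group("board", "bored", set()))              # grouped without an exception
print(same_group("board", "bored", EXCEPTIONS))         # excluded by the exception
```

The sketch shows why an exception mechanism is useful: any rule loose enough to merge spelling variants will occasionally merge unrelated words, and listing the pair explicitly overrides the rule for just that case.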