SPSS Modeler uses an extraction process that relies on linguistic resources. These resources determine how the text data is processed and how concepts, types, and sometimes patterns are extracted from it.
Linguistic resources are divided into the following types:
- Category sets
  - A category is a group of closely related ideas and patterns to which text data is assigned through a scoring process.
- Libraries
  - Libraries are the building blocks for both templates and text analysis packages (TAPs). Each library is made up of several dictionaries, which are used to define and manage terms, synonyms, and exclude lists. Libraries are also delivered individually, but they come prepackaged together in templates and TAPs.
- Templates
  - Templates consist of a set of libraries and some advanced linguistic and nonlinguistic resources. These resources form a specialized set that is adapted to a particular domain or context, such as product opinions.
- Text analysis packages (TAPs)
  - A TAP is a predefined template that is bundled with one or more category sets. Because the categories and the resources that were used to generate them are stored together, you can reuse a TAP to apply the same categories and resources to other flows.
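The containment relationships described in this list can be sketched as a small data model. The following Python example is only an illustration of how the resource types nest. It is not the SPSS Modeler API, and every class, field, and example name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dictionary:
    """Hypothetical stand-in for one dictionary in a library."""
    kind: str  # assumed labels: "terms", "synonyms", or "exclude"
    entries: List[str] = field(default_factory=list)


@dataclass
class Library:
    """A library is made up of several dictionaries."""
    name: str
    dictionaries: List[Dictionary] = field(default_factory=list)


@dataclass
class Template:
    """A template is a set of libraries plus advanced resources."""
    name: str
    libraries: List[Library] = field(default_factory=list)
    advanced_resources: List[str] = field(default_factory=list)


@dataclass
class CategorySet:
    """A named collection of categories."""
    name: str
    categories: List[str] = field(default_factory=list)


@dataclass
class TextAnalysisPackage:
    """A TAP bundles a template with one or more category sets."""
    name: str
    template: Template
    category_sets: List[CategorySet] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The "one or more category sets" rule comes from the description above.
        if not self.category_sets:
            raise ValueError("A TAP needs at least one category set")


def library_names(tap: TextAnalysisPackage) -> List[str]:
    """Return the names of the libraries that a TAP carries in its template."""
    return [lib.name for lib in tap.template.libraries]


# Hypothetical example data; none of these names are shipped resources.
opinions = Library(
    name="Opinions (example)",
    dictionaries=[
        Dictionary(kind="terms", entries=["battery life"]),
        Dictionary(kind="synonyms", entries=["battery = accumulator"]),
        Dictionary(kind="exclude", entries=["lorem"]),
    ],
)
core = Library(name="Core (example)")
template = Template(
    name="Product opinions (example)",
    libraries=[opinions, core],
    advanced_resources=["example pattern rules"],
)
tap = TextAnalysisPackage(
    name="Product feedback (example)",
    template=template,
    category_sets=[CategorySet(name="Satisfaction", categories=["Positive", "Negative"])],
)
```

In this sketch, reusing a TAP amounts to passing the same `tap` object to another consumer: the categories and the libraries that produced them travel together, which mirrors the reason that TAPs exist.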
Custom linguistic resources
SPSS Modeler has a default set of specialized linguistic resources. By using these resources, you benefit from research and fine-tuning for specific languages and specific applications. However, the default resources might not be optimized for your context or your data. You can edit them and save your changes to optimize the extraction process for your flow.
You can also create and import custom linguistic resources that are uniquely fine-tuned to your organization's data. You can use local files to share these linguistic resources between users and projects. You can add a template, library, or TAP as a project asset from a local file.