The Watson Natural Language Processing Syntax block encapsulates syntax analysis functionality.
Block names
syntax_izumo_<language>_stock
syntax_izumo_<language>_stock-dp
Supported languages
The Syntax analysis block is available for the following languages. For a list of the language codes and the corresponding language, see Language codes.
Language codes to use for model syntax_izumo_<language>_stock
: af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw
Language codes to use for model syntax_izumo_<language>_stock-dp
: af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh
Task | Supported language codes |
---|---|
Tokenization | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh |
Part-of-speech tagging | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh |
Lemmatization | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh |
Sentence detection | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh |
Paragraph detection | af, ar, bs, ca, cs, da, de, el, en, es, fi, fr, he, hi, hr, it, ja, ko, nb, nl, nn, pl, pt, ro, ru, sk, sr, sv, tr, zh_cn, zh_tw, zh |
Dependency parsing | af, ar, bs, cs, da, de, en, es, fi, fr, hi, hr, it, ja, nb, nl, nn, pt, ro, ru, sk, sr, sv |
Capabilities
Use this block to perform tasks such as sentence detection, tokenization, part-of-speech tagging, lemmatization, and dependency parsing in different languages. For most tasks, you will likely need only sentence detection, tokenization, and part-of-speech tagging. For these use cases, use the syntax_izumo_<language>_stock model. If you want to run dependency parsing, use the syntax_izumo_<language>_stock-dp model.
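The choice between the two block names can be sketched as a small helper. This is only an illustration: the function `syntax_block_name` and the two constant names are hypothetical and not part of the watson_nlp library. The language codes are copied from the lists and the task table in this topic.

```python
# Codes listed in this topic for syntax_izumo_<language>_stock.
STOCK_CODES = frozenset(
    "af ar bs ca cs da de el en es fi fr he hi hr it ja ko nb nl nn pl pt ro "
    "ru sk sr sv tr zh_cn zh_tw".split()
)

# Codes listed in the "Dependency parsing" row of the task table.
DEPENDENCY_PARSING_CODES = frozenset(
    "af ar bs cs da de en es fi fr hi hr it ja nb nl nn pt ro ru sk sr sv".split()
)


def syntax_block_name(language, dependency_parsing=False):
    """Hypothetical helper: build the Syntax block name for a language code."""
    if dependency_parsing:
        if language not in DEPENDENCY_PARSING_CODES:
            raise ValueError(f"Dependency parsing is not listed for '{language}'")
        return f"syntax_izumo_{language}_stock-dp"
    if language not in STOCK_CODES:
        raise ValueError(f"No stock Syntax block is listed for '{language}'")
    return f"syntax_izumo_{language}_stock"


print(syntax_block_name("en"))
print(syntax_block_name("de", dependency_parsing=True))
```

The returned string is what you would pass to `watson_nlp.load`. Validating the code first gives a clearer error than a failed model load.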
The analysis for Part-of-speech (POS) tagging and dependencies follows the Universal Parts of Speech tagset (Universal POS tags) and the Universal Dependencies v2 tagset (Universal Dependency Relations).
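The part-of-speech labels shown in this topic, such as POS_PRON and POS_VERB, look like Universal POS tags with a POS_ prefix. The following sketch maps such a label to a readable description. It assumes that every label follows that prefix pattern, which this topic does not state explicitly. The helper `describe_pos` is hypothetical and not part of the watson_nlp library.

```python
# The 17 Universal POS tags with short descriptions.
UPOS_DESCRIPTIONS = {
    "ADJ": "adjective",
    "ADP": "adposition",
    "ADV": "adverb",
    "AUX": "auxiliary",
    "CCONJ": "coordinating conjunction",
    "DET": "determiner",
    "INTJ": "interjection",
    "NOUN": "noun",
    "NUM": "numeral",
    "PART": "particle",
    "PRON": "pronoun",
    "PROPN": "proper noun",
    "PUNCT": "punctuation",
    "SCONJ": "subordinating conjunction",
    "SYM": "symbol",
    "VERB": "verb",
    "X": "other",
}


def describe_pos(label):
    """Hypothetical helper: describe a label such as 'POS_VERB'."""
    # Assumption: labels are a Universal POS tag with a 'POS_' prefix.
    tag = label[4:] if label.startswith("POS_") else label
    return UPOS_DESCRIPTIONS.get(tag, "unknown")


for label in ("POS_PRON", "POS_AUX", "POS_PART", "POS_VERB", "POS_PROPN"):
    print(label, "->", describe_pos(label))
```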
The following table shows what each task produces for the same example sentence, together with the corresponding parser attribute.
Capabilities | Examples | Parser attributes |
---|---|---|
Tokenization | "I don't like Mondays" --> "I" , "do", "n't", "like", "Mondays" | token |
Part-of-speech tagging | "I don't like Mondays" --> "I"\POS_PRON, "do"\POS_AUX, "n't"\POS_PART, "like"\POS_VERB, "Mondays"\POS_PROPN | part_of_speech |
Lemmatization | "I don't like Mondays" --> "I", "do", "not", "like", "Monday" | lemma |
Dependency parsing | "I don't like Mondays" --> "I"-SUBJECT->"like"<-OBJECT-"Mondays" | dependency |
Sentence detection | "I don't like Mondays" --> returns this sentence | sentence |
Paragraph detection (Currently paragraph detection is still experimental and returns similar results to sentence detection.) | "I don't like Mondays" --> returns this sentence as being a paragraph | sentence |
Dependencies on other blocks
None
Code sample
import watson_nlp
# Load Syntax for English
syntax_model = watson_nlp.load('syntax_izumo_en_stock')
# Detect tokens, lemma and part-of-speech
text = 'I don\'t like Mondays'
syntax_prediction = syntax_model.run(text, parsers=('token', 'lemma', 'part_of_speech'))
# Print the syntax result
print(syntax_prediction)
Output of the code sample:
{
  "text": "I don't like Mondays",
  "producer_id": {
    "name": "Izumo Text Processing",
    "version": "0.0.1"
  },
  "tokens": [
    {
      "span": {
        "begin": 0,
        "end": 1,
        "text": "I"
      },
      "lemma": "I",
      "part_of_speech": "POS_PRON"
    },
    {
      "span": {
        "begin": 2,
        "end": 4,
        "text": "do"
      },
      "lemma": "do",
      "part_of_speech": "POS_AUX"
    },
    {
      "span": {
        "begin": 4,
        "end": 7,
        "text": "n't"
      },
      "lemma": "not",
      "part_of_speech": "POS_PART"
    },
    {
      "span": {
        "begin": 8,
        "end": 12,
        "text": "like"
      },
      "lemma": "like",
      "part_of_speech": "POS_VERB"
    },
    {
      "span": {
        "begin": 13,
        "end": 20,
        "text": "Mondays"
      },
      "lemma": "Monday",
      "part_of_speech": "POS_PROPN"
    }
  ],
  "sentences": [
    {
      "span": {
        "begin": 0,
        "end": 20,
        "text": "I don't like Mondays"
      }
    }
  ],
  "paragraphs": [
    {
      "span": {
        "begin": 0,
        "end": 20,
        "text": "I don't like Mondays"
      }
    }
  ]
}
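The nested output above can be flattened into one row per token. The following sketch works on a plain Python dictionary that has the same shape as the printed result. How you obtain that dictionary from `syntax_prediction` is not covered in this topic and is left as an assumption. The helper `token_rows` is hypothetical. It also checks that each span's begin and end offsets index into the original text, with the end offset exclusive, as the sample values suggest.

```python
# Token data copied from the sample output (other keys omitted for brevity).
result = {
    "text": "I don't like Mondays",
    "tokens": [
        {"span": {"begin": 0, "end": 1, "text": "I"},
         "lemma": "I", "part_of_speech": "POS_PRON"},
        {"span": {"begin": 2, "end": 4, "text": "do"},
         "lemma": "do", "part_of_speech": "POS_AUX"},
        {"span": {"begin": 4, "end": 7, "text": "n't"},
         "lemma": "not", "part_of_speech": "POS_PART"},
        {"span": {"begin": 8, "end": 12, "text": "like"},
         "lemma": "like", "part_of_speech": "POS_VERB"},
        {"span": {"begin": 13, "end": 20, "text": "Mondays"},
         "lemma": "Monday", "part_of_speech": "POS_PROPN"},
    ],
}


def token_rows(result):
    """Hypothetical helper: return (text, lemma, part_of_speech) per token."""
    rows = []
    for token in result["tokens"]:
        span = token["span"]
        # Offsets are character positions in the full text; end is exclusive.
        covered = result["text"][span["begin"]:span["end"]]
        if covered != span["text"]:
            raise ValueError(f"Span offsets do not match text: {span}")
        # lemma and part_of_speech are present only if those parsers ran.
        rows.append((span["text"], token.get("lemma"), token.get("part_of_speech")))
    return rows


for row in token_rows(result):
    print(row)
```

Using `dict.get` for the lemma and the part of speech keeps the helper usable when only the token parser was requested.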