Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/09 14:04:52 UTC
[jira] [Created] (STANBOL-1251) Pos tag based Phrase extraction Engine
Rupert Westenthaler created STANBOL-1251:
--------------------------------------------
Summary: Pos tag based Phrase extraction Engine
Key: STANBOL-1251
URL: https://issues.apache.org/jira/browse/STANBOL-1251
Project: Stanbol
Issue Type: New Feature
Components: Enhancement Engines
Reporter: Rupert Westenthaler
Assignee: Rupert Westenthaler
Implement an Enhancement Engine that uses POS tags to extract noun and verb phrases.
In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see the documentation at [1] for details). This alignment allows engines to determine the lexical categories of the tokens in a text independently of the language.
The Pos-Chunker Engine will use the lexical categories of tokens to extract noun and verb phrases according to the following rules:
### Noun Phrases
* start: noun, pronoun, determiner, adjective
* continuation: noun, adposition, pronoun, determiner, adjective, punctuation
* end: noun, pronoun, determiner, adjective
* required: noun
### Verb Phrases
* start: verb, adverb
* continuation: verb, adverb, punctuation
* end: verb, adverb
* required: verb
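The rules above can be sketched as a small greedy scanner. This is only an illustration under stated assumptions, not the engine's implementation: the names `Cat`, `PhraseRule` and `chunk` are hypothetical, and the scanning strategy (extend a phrase while tokens match the continuation set, then trim trailing tokens that may not end a phrase) is one possible reading of the start/continuation/end/required lists.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Cat(Enum):
    """Hypothetical stand-in for the OLIA-aligned lexical categories."""
    NOUN = auto()
    PRONOUN = auto()
    DETERMINER = auto()
    ADJECTIVE = auto()
    ADPOSITION = auto()
    VERB = auto()
    ADVERB = auto()
    PUNCTUATION = auto()
    OTHER = auto()


@dataclass(frozen=True)
class PhraseRule:
    name: str
    start: frozenset         # categories that may open a phrase
    continuation: frozenset  # categories that may extend a phrase
    end: frozenset           # categories that may close a phrase
    required: frozenset      # at least one of these must occur


_NOMINAL = frozenset({Cat.NOUN, Cat.PRONOUN, Cat.DETERMINER, Cat.ADJECTIVE})
_VERBAL = frozenset({Cat.VERB, Cat.ADVERB})

NOUN_PHRASE = PhraseRule(
    "NP", _NOMINAL, _NOMINAL | {Cat.ADPOSITION, Cat.PUNCTUATION},
    _NOMINAL, frozenset({Cat.NOUN}))
VERB_PHRASE = PhraseRule(
    "VP", _VERBAL, _VERBAL | {Cat.PUNCTUATION},
    _VERBAL, frozenset({Cat.VERB}))


def chunk(cats, rule):
    """Return (start, end) token index pairs (end exclusive) for one rule."""
    spans = []
    i, n = 0, len(cats)
    while i < n:
        if cats[i] not in rule.start:
            i += 1
            continue
        # Greedily extend while tokens are allowed as continuation.
        j = i + 1
        while j < n and cats[j] in rule.continuation:
            j += 1
        # Trim trailing tokens that are not allowed to end a phrase.
        end = j
        while end > i and cats[end - 1] not in rule.end:
            end -= 1
        # Keep the candidate only if it contains a required category.
        if any(c in rule.required for c in cats[i:end]):
            spans.append((i, end))
        i = j
    return spans


# "The quick brown fox jumps over the lazy dog ."
cats = [Cat.DETERMINER, Cat.ADJECTIVE, Cat.ADJECTIVE, Cat.NOUN, Cat.VERB,
        Cat.ADPOSITION, Cat.DETERMINER, Cat.ADJECTIVE, Cat.NOUN,
        Cat.PUNCTUATION]
print(chunk(cats, NOUN_PHRASE))  # [(0, 4), (6, 9)]
print(chunk(cats, VERB_PHRASE))  # [(4, 5)]
```

In this sketch the trailing full stop is accepted as a continuation token but trimmed again because punctuation is not in the end set, and a candidate such as "the quick" (determiner plus adjective, no noun) would be discarded by the required check.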
The engine will allow configuring which languages are processed (e.g. to deactivate it for languages where other chunkers are available).
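As a rough illustration of such a language switch, the check below decides whether a text in a given language would be processed. The function name `language_enabled` and its `included`/`excluded` parameters are hypothetical and do not reflect the engine's actual configuration syntax.

```python
def language_enabled(lang, included=None, excluded=frozenset()):
    """Decide whether the chunker should process texts in `lang`.

    `included=None` means every language is processed unless excluded.
    Language codes in `included` and `excluded` are assumed to be lowercase.
    """
    lang = lang.lower()
    if lang in excluded:
        return False
    return included is None or lang in included


# Process everything except German, e.g. because another chunker covers it.
print(language_enabled("en", excluded={"de"}))  # True
print(language_enabled("de", excluded={"de"}))  # False
```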
The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.
The current plan is to also make this engine available in the 0.12 branch.
[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations