
Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/09 14:04:52 UTC

[jira] [Created] (STANBOL-1251) Pos tag based Phrase extraction Engine

Rupert Westenthaler created STANBOL-1251:
--------------------------------------------

             Summary: Pos tag based Phrase extraction Engine
                 Key: STANBOL-1251
                 URL: https://issues.apache.org/jira/browse/STANBOL-1251
             Project: Stanbol
          Issue Type: New Feature
          Components: Enhancement Engines
            Reporter: Rupert Westenthaler
            Assignee: Rupert Westenthaler


Implement an Enhancement Engine that uses POS tags to extract noun and verb phrases.

In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see the documentation at [1] for detailed information). This alignment allows engines to determine the lexical categories of tokens in the text in a language-independent way.

The Pos-Chunker Engine will use the lexical categories of tokens to extract noun and verb phrases according to the following rules:

### Noun Phrases

* start: noun, pronoun, determiner, adjective
* continuation: noun, adposition, pronoun, determiner, adjective, punctuation
* end: noun, pronoun, determiner, adjective
* required: noun

### Verb Phrases

* start: verb, adverb
* continuation: verb, adverb, punctuation
* end: verb, adverb
* required: verb
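
The rules above can be read as a small state machine over the lexical categories of consecutive tokens. The following is a minimal sketch of that reading in Python, not the engine's implementation. The names `PhraseRule` and `extract_phrases`, the plain-string category labels, and the greedy longest-match strategy with trimming of trailing tokens are all assumptions made for illustration; the issue does not specify them, and Stanbol's actual API is not used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseRule:
    """Hypothetical container for one rule set (not a Stanbol class)."""
    name: str
    start: frozenset         # categories that may open a phrase
    continuation: frozenset  # categories that may extend an open phrase
    end: frozenset           # categories that may close a phrase
    required: frozenset      # at least one token must have one of these


NOUN_PHRASE = PhraseRule(
    name="NP",
    start=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    continuation=frozenset({"noun", "adposition", "pronoun", "determiner",
                            "adjective", "punctuation"}),
    end=frozenset({"noun", "pronoun", "determiner", "adjective"}),
    required=frozenset({"noun"}),
)

VERB_PHRASE = PhraseRule(
    name="VP",
    start=frozenset({"verb", "adverb"}),
    continuation=frozenset({"verb", "adverb", "punctuation"}),
    end=frozenset({"verb", "adverb"}),
    required=frozenset({"verb"}),
)


def extract_phrases(tokens, rule):
    """Return (start, end) token index pairs, end exclusive.

    tokens is a list of (text, category) pairs. Assumption: matching is
    greedy, and a single token may be a phrase if it satisfies all rules.
    """
    phrases = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i][1] not in rule.start:
            i += 1
            continue
        # Extend the candidate while tokens are valid continuations.
        j = i + 1
        while j < n and tokens[j][1] in rule.continuation:
            j += 1
        # Trim trailing tokens that are not allowed to end a phrase.
        end = j
        while end > i and tokens[end - 1][1] not in rule.end:
            end -= 1
        # Keep the candidate only if a required category is present.
        if end > i and any(tokens[k][1] in rule.required
                           for k in range(i, end)):
            phrases.append((i, end))
        i = j  # resume scanning after the examined run
    return phrases


sentence = [
    ("The", "determiner"), ("quick", "adjective"), ("fox", "noun"),
    ("in", "adposition"), ("the", "determiner"), ("forest", "noun"),
    ("quickly", "adverb"), ("ran", "verb"), (".", "punctuation"),
]
noun_spans = extract_phrases(sentence, NOUN_PHRASE)
verb_spans = extract_phrases(sentence, VERB_PHRASE)
```

In this sketch the sentence-final punctuation is accepted as a continuation of the verb phrase but trimmed again because it is not a valid end category. A candidate such as a determiner followed only by an adposition is discarded because it contains no noun.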

The engine will allow the processed languages to be configured (e.g. to deactivate it for languages where other chunkers are available).

The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.

The current plan is to make this engine available in the 0.12 branch as well.

[1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations


