Posted to commits@stanbol.apache.org by "Rupert Westenthaler (JIRA)" <ji...@apache.org> on 2014/01/22 10:57:19 UTC

[jira] [Resolved] (STANBOL-1251) Pos tag based Phrase extraction Engine

     [ https://issues.apache.org/jira/browse/STANBOL-1251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Rupert Westenthaler resolved STANBOL-1251.
------------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.12.0

A first working version of the Engine is available in trunk (1.0.0-SNAPSHOT) and the 0.12 branch. Further improvements (see TODO comments in the engine) should be done in their own issues.

> Pos tag based Phrase extraction Engine
> --------------------------------------
>
>                 Key: STANBOL-1251
>                 URL: https://issues.apache.org/jira/browse/STANBOL-1251
>             Project: Stanbol
>          Issue Type: New Feature
>          Components: Enhancement Engines
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>             Fix For: 0.12.0
>
>
> Implement an Enhancement Engine that uses POS tags to extract Noun and Verb Phrases.
> In Stanbol, POS annotations can be aligned to concepts of the OLIA ontology (see the documentation at [1] for details). This alignment allows engines to determine the lexical categories of tokens in the text in a language-independent way.
> The Pos-Chunker Engine will use those lexical categories to extract Noun and Verb phrases using the following rules:
> ### Noun Phrases
> * start: noun, pronoun, determiners, adjectives
> * continuation: nouns, adpositions, adjectives, punctuation
> * end: noun, pronoun, determiners, adjectives
> * required: noun
> ### Verb Phrases
> * start: verb, adverb
> * continuation: verb, adverb, punctuation
> * end: verb, adverb
> * required: verb
> The engine will allow the processed languages to be configured (e.g. to deactivate it for languages where other chunkers are available).
> The EnhancementEngine ordering will be ServiceProperties.ORDERING_NLP_CHUNK.
> The current plan is to make this engine available in the 0.12 branch as well.
> [1] http://stanbol.staging.apache.org/docs/trunk/components/enhancer/nlp/nlpannotations
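
The start/continuation/end/required rules in the description can be read as a simple greedy scan over the lexical categories of the tokens. The following is a minimal sketch of that reading, not the engine's actual code (Stanbol itself is Java; Python is used here only for brevity). The names `PhraseRule` and `extract_phrases`, the category strings, and the choices to trim trailing tokens that may not end a phrase and to accept single-token phrases are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseRule:
    """Hypothetical container for one rule set from the issue description."""
    name: str
    start: frozenset         # categories that may open a phrase
    continuation: frozenset  # categories that may extend a phrase
    end: frozenset           # categories that may close a phrase
    required: frozenset      # at least one token must have one of these


NOUN_PHRASE = PhraseRule(
    name="NounPhrase",
    start=frozenset({"Noun", "Pronoun", "Determiner", "Adjective"}),
    continuation=frozenset({"Noun", "Adposition", "Adjective", "Punctuation"}),
    end=frozenset({"Noun", "Pronoun", "Determiner", "Adjective"}),
    required=frozenset({"Noun"}),
)

VERB_PHRASE = PhraseRule(
    name="VerbPhrase",
    start=frozenset({"Verb", "Adverb"}),
    continuation=frozenset({"Verb", "Adverb", "Punctuation"}),
    end=frozenset({"Verb", "Adverb"}),
    required=frozenset({"Verb"}),
)


def extract_phrases(tokens, rule):
    """Return (start, end) index spans, end exclusive, matching `rule`.

    `tokens` is a list of (text, category) pairs. Assumed strategy: open a
    candidate at a start category, extend greedily over continuation
    categories, trim trailing tokens that may not end a phrase, and keep
    the span only if it contains a required category.
    """
    spans = []
    i, n = 0, len(tokens)
    while i < n:
        if tokens[i][1] not in rule.start:
            i += 1
            continue
        j = i + 1
        while j < n and tokens[j][1] in rule.continuation:
            j += 1
        # Drop trailing tokens (e.g. punctuation) that cannot end a phrase.
        while j > i and tokens[j - 1][1] not in rule.end:
            j -= 1
        if j > i and any(tokens[k][1] in rule.required for k in range(i, j)):
            spans.append((i, j))
            i = j  # resume scanning after the emitted phrase
        else:
            i += 1
    return spans


def phrase_texts(tokens, rule):
    """Convenience helper: join the token texts of each extracted span."""
    return [" ".join(text for text, _ in tokens[s:e])
            for s, e in extract_phrases(tokens, rule)]


if __name__ == "__main__":
    sentence = [
        ("The", "Determiner"), ("quick", "Adjective"), ("fox", "Noun"),
        ("jumps", "Verb"), ("over", "Adposition"), ("the", "Determiner"),
        ("lazy", "Adjective"), ("dog", "Noun"), (".", "Punctuation"),
    ]
    print(phrase_texts(sentence, NOUN_PHRASE))
    print(phrase_texts(sentence, VERB_PHRASE))
```

Because the rules are expressed over lexical categories rather than language-specific POS tags, the same rule sets apply to any language whose tag set has been mapped to those categories, which is the point of the OLIA alignment mentioned in the description.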



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)