Posted to commits@stanbol.apache.org by "Fabian Christ (JIRA)" <ji...@apache.org> on 2012/12/12 20:48:19 UTC

[jira] [Updated] (STANBOL-795) OpenNLP Tokenizer Engine

     [ https://issues.apache.org/jira/browse/STANBOL-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fabian Christ updated STANBOL-795:
----------------------------------

    Component/s:     (was: Enhancer)
                 Engine - OpenNLP Tokenizer
    
> OpenNLP Tokenizer Engine
> ------------------------
>
>                 Key: STANBOL-795
>                 URL: https://issues.apache.org/jira/browse/STANBOL-795
>             Project: Stanbol
>          Issue Type: Sub-task
>          Components: Engine - OpenNLP Tokenizer
>            Reporter: Rupert Westenthaler
>            Assignee: Rupert Westenthaler
>
> Implement a separate OpenNLP Tokenizer Engine.
> While some engines, such as the OpenNLP POS engine or the CELI Lemmatizer engine, do support tokenizing (if tokens do not already exist in the Analyzed Text), it is important to implement an engine dedicated to this task.
> This engine also supports the language configuration (see the following example):
>     en;model=SIMPLE
>     de;model=mySpecificTokenizerModel_de.bin
>     !jp
>     !zh
>     *
> The 'model' parameter can be used to load a specific tokenizer model. "SIMPLE" forces the use of the OpenNLP SimpleTokenizer. If no model is configured, the default tokenizer model for the language ("{lang}-token.bin") is loaded, falling back to the SimpleTokenizer if no model for that language is available.
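
A minimal sketch (not part of the issue text) of the model selection described above, assuming OpenNLP's standard tokenizer API; the helper class, method name, and classpath-based model loading are illustrative assumptions:

    import java.io.InputStream;
    import opennlp.tools.tokenize.SimpleTokenizer;
    import opennlp.tools.tokenize.Tokenizer;
    import opennlp.tools.tokenize.TokenizerME;
    import opennlp.tools.tokenize.TokenizerModel;

    // Hypothetical helper illustrating the selection rules: 'SIMPLE' forces the
    // SimpleTokenizer, otherwise the configured (or default "{lang}-token.bin")
    // model is loaded, with the SimpleTokenizer as fallback.
    public class TokenizerSelection {

        public static Tokenizer getTokenizer(String lang, String modelParam) {
            if ("SIMPLE".equals(modelParam)) {
                // model=SIMPLE forces the use of the OpenNLP SimpleTokenizer
                return SimpleTokenizer.INSTANCE;
            }
            // use the configured model file, or the default "{lang}-token.bin"
            String modelName = (modelParam != null) ? modelParam : lang + "-token.bin";
            try (InputStream in =
                    TokenizerSelection.class.getResourceAsStream("/" + modelName)) {
                if (in != null) {
                    return new TokenizerME(new TokenizerModel(in));
                }
            } catch (Exception e) {
                // model could not be loaded; fall through to the fallback below
            }
            // no tokenizer model available for this language -> SimpleTokenizer
            return SimpleTokenizer.INSTANCE;
        }
    }

The returned Tokenizer can then be used via tokenize(String) or tokenizePos(String); how the engine actually wires this into the enhancement chain is left to the implementation.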

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira