Posted to issues@opennlp.apache.org by "Nicolas Hernandez (JIRA)" <ji...@apache.org> on 2012/06/27 16:53:44 UTC

[jira] [Updated] (OPENNLP-515) Request for multi-words expressions (MWE) support in serialization formats

     [ https://issues.apache.org/jira/browse/OPENNLP-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicolas Hernandez updated OPENNLP-515:
--------------------------------------

    Issue Type: Improvement  (was: New Feature)
    
> Request for multi-words expressions (MWE) support in serialization formats
> --------------------------------------------------------------------------
>
>                 Key: OPENNLP-515
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-515
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Chunker, Command Line Interface, Coref, Doccat, Name Finder, Parser, POS Tagger
>    Affects Versions: tools-1.5.3
>            Reporter: Nicolas Hernandez
>
> Multi-word expressions (MWEs) are expressions made up of whitespace-separated words, such as "traffic light", "in order to", "two thousand and one", "Jules Verne"...
> So far, when using the CLI to train a model (in particular a POS model), there has been no way to specify whether a token is a simple word or a multi-word expression.
> By convention, users concatenate the words of an MWE with the underscore character so that the MWE becomes a single token.
> Consequently, a model trained through the API on the same data can differ, since the API does not require this preprocessing.
> We need to let users set, through a CLI parameter, the character sequence used as the MWE separator.
> This concerns both trainers and labelers.
> A default MWE separator should be specified, to be used when serializing data that contains MWEs.
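
To make the request concrete, here is a minimal sketch of serializing and parsing tagged data with a configurable MWE separator. It is written in Python for brevity and is not OpenNLP code: the names join_mwe, split_mwe, serialize_tagged, parse_tagged and DEFAULT_MWE_SEPARATOR are hypothetical, and the word_TAG line layout with an underscore before the tag is assumed here to match the usual POS training data.

```python
from typing import List, Tuple

# Assumption: the default mirrors the underscore convention described in the issue.
DEFAULT_MWE_SEPARATOR = "_"

# A token is a list of words (one word for a simple token) plus its tag.
TaggedToken = Tuple[List[str], str]


def join_mwe(words: List[str], separator: str = DEFAULT_MWE_SEPARATOR) -> str:
    """Concatenate the words of an MWE into a single whitespace-free token."""
    return separator.join(words)


def split_mwe(token: str, separator: str = DEFAULT_MWE_SEPARATOR) -> List[str]:
    """Recover the individual words of a token joined by join_mwe."""
    return token.split(separator)


def serialize_tagged(tokens: List[TaggedToken],
                     separator: str = DEFAULT_MWE_SEPARATOR,
                     tag_delimiter: str = "_") -> str:
    """Write one sentence as space-separated word<tag_delimiter>TAG items."""
    return " ".join(
        f"{join_mwe(words, separator)}{tag_delimiter}{tag}"
        for words, tag in tokens
    )


def parse_tagged(line: str,
                 separator: str = DEFAULT_MWE_SEPARATOR,
                 tag_delimiter: str = "_") -> List[TaggedToken]:
    """Read a serialized sentence back into (words, tag) pairs."""
    result = []
    for item in line.split():
        # Split on the LAST delimiter so that a form containing the
        # delimiter (e.g. an underscore-joined MWE) keeps its tag intact.
        form, tag = item.rsplit(tag_delimiter, 1)
        result.append((split_mwe(form, separator), tag))
    return result


sentence = [(["in", "order", "to"], "IN"), (["win"], "VB")]
default_line = serialize_tagged(sentence)
plus_line = serialize_tagged(sentence, separator="+")
```

With the default separator the MWE and the tag share the underscore, so the reader has to rely on splitting at the last delimiter; choosing a distinct separator such as "+" removes that ambiguity, which is one reason to expose the separator as a parameter.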
