Posted to issues@opennlp.apache.org by "Nicolas Hernandez (JIRA)" <ji...@apache.org> on 2012/06/27 16:53:44 UTC
[jira] [Updated] (OPENNLP-515) Request for multi-words expressions (MWE) support in serialization formats
[ https://issues.apache.org/jira/browse/OPENNLP-515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Nicolas Hernandez updated OPENNLP-515:
--------------------------------------
Issue Type: Improvement (was: New Feature)
> Request for multi-words expressions (MWE) support in serialization formats
> --------------------------------------------------------------------------
>
> Key: OPENNLP-515
> URL: https://issues.apache.org/jira/browse/OPENNLP-515
> Project: OpenNLP
> Issue Type: Improvement
> Components: Chunker, Command Line Interface, Coref, Doccat, Name Finder, Parser, POS Tagger
> Affects Versions: tools-1.5.3
> Reporter: Nicolas Hernandez
>
> Multi-word expressions (MWEs) are expressions made of several whitespace-separated words, such as "traffic light", "in order to", "two thousand and one", or "Jules Verne".
> So far, when training a model with the CLI (in particular a POS model), there has been no way to specify whether an item is a simple word or a multi-word expression.
> By convention, users join the words of an MWE with the underscore character so that the MWE becomes a single token.
> Consequently, a model trained through the API on the same data can differ, since the API does not require this preprocessing.
> We need to let users set the MWE separator character sequence through a CLI parameter.
> This concerns both trainers and labelers.
> A default MWE separator should also be defined, to be used when serializing data that contains MWEs.
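
To illustrate the convention described above, here is a minimal sketch of joining the words of an MWE into a single token with a configurable separator, and of restoring the original expression afterwards. It is not OpenNLP code: the class name MweSeparatorSketch, the methods toToken and toExpression, and the DEFAULT_MWE_SEPARATOR constant are all hypothetical names chosen for this example, and the underscore default simply mirrors the user convention mentioned in the issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class MweSeparatorSketch {

    // Hypothetical default: the issue asks for a default separator but does
    // not name one; "_" mirrors the convention users already follow.
    public static final String DEFAULT_MWE_SEPARATOR = "_";

    // Joins the whitespace-separated words of an expression into one token.
    public static String toToken(String expression, String separator) {
        return String.join(separator, expression.trim().split("\\s+"));
    }

    // Restores the whitespace-separated expression from a joined token.
    // Pattern.quote keeps separators such as "+" from being read as regex.
    public static String toExpression(String token, String separator) {
        return String.join(" ", token.split(Pattern.quote(separator)));
    }

    public static void main(String[] args) {
        List<String> examples = Arrays.asList("traffic light", "in order to", "Jules Verne");
        for (String mwe : examples) {
            String token = toToken(mwe, DEFAULT_MWE_SEPARATOR);
            String restored = toExpression(token, DEFAULT_MWE_SEPARATOR);
            System.out.println(mwe + " -> " + token + " -> " + restored);
        }
    }
}
```

With a separator passed in as a parameter, the same joining and splitting logic could in principle be shared by a trainer and a labeler, which is the consistency the request is after. Note that the round trip is only lossless when the separator does not already occur inside a word.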