Posted to issues@opennlp.apache.org by "William Colen (JIRA)" <ji...@apache.org> on 2017/01/02 20:11:58 UTC

[jira] [Updated] (OPENNLP-743) The chunker training data format is incorrectly/insufficiently described.

     [ https://issues.apache.org/jira/browse/OPENNLP-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

William Colen updated OPENNLP-743:
----------------------------------
    Fix Version/s: 1.7.1

> The chunker training data format is incorrectly/insufficiently described.
> -------------------------------------------------------------------------
>
>                 Key: OPENNLP-743
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-743
>             Project: OpenNLP
>          Issue Type: Documentation
>          Components: Chunker
>    Affects Versions: 1.7.0
>            Reporter: Zuzana Neverilova
>            Priority: Minor
>              Labels: documentation, easyfix, newbie
>             Fix For: 1.7.1
>
>   Original Estimate: 10m
>  Remaining Estimate: 10m
>
> The chunker training data format is described as follows: "The train data consist of three columns separated by spaces. Each word has been put on a separate line and there is an empty line after each sentence." However, in the example, tokens and tags are separated by several spaces. First, the separators look like tabs (which are not allowed); second, runs of several spaces are not allowed either (apparently, the line String is split with split(" ")). Suggestion: emphasize that columns are separated by exactly one space and that tabs are not allowed.
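
The behaviour described in the report can be reproduced with plain Java. The sketch below is not OpenNLP code: the class name ChunkerLineSplitDemo, the method columnCount and the sample line "He PRP B-NP" are made up for illustration. It only assumes, as the reporter does, that each training line is split with String.split(" ").

```java
public class ChunkerLineSplitDemo {

    // Hypothetical helper: counts the columns obtained when a line is split
    // on a single space, which is the behaviour the reporter assumes.
    static int columnCount(String line) {
        return line.split(" ").length;
    }

    public static void main(String[] args) {
        // Exactly one space between columns: word, POS tag, chunk tag.
        System.out.println(columnCount("He PRP B-NP"));    // prints 3

        // Two spaces after the word: an empty string appears as an extra column.
        System.out.println(columnCount("He  PRP B-NP"));   // prints 4

        // Tabs are not matched by " ", so the whole line stays one column.
        System.out.println(columnCount("He\tPRP\tB-NP"));  // prints 1
    }
}
```

Under that assumption, only the first line yields the three columns the documentation describes.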



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)