Posted to issues@opennlp.apache.org by "Martin Wiesner (Jira)" <ji...@apache.org> on 2023/09/01 14:46:00 UTC

[jira] [Comment Edited] (OPENNLP-1190) CONLL02 format

    [ https://issues.apache.org/jira/browse/OPENNLP-1190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17761328#comment-17761328 ] 

Martin Wiesner edited comment on OPENNLP-1190 at 9/1/23 2:45 PM:
-----------------------------------------------------------------

In 2023, [https://www.lsi.upc.es/~nlp/tools/nerc/nerc.html] returns a 404, so the resource mentioned on the mailing list in 2014 is no longer available at that address.

However, the same page is still reachable under the .edu domain:

[https://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html]


was (Author: mawiesne):
In 2023, [https://www.lsi.upc.es/~nlp/tools/nerc/nerc.html] yields a 404 for which reason the resource mentioned on the mailing list in 2014 is no longer available this way.

> CONLL02 format
> --------------
>
>                 Key: OPENNLP-1190
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1190
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Formats
>    Affects Versions: tools-1.5.3
>            Reporter: Luca
>            Priority: Major
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> According to the documentation, the following should work:
>  bin/opennlp TokenNameFinderConverter conll02 -data esp.train -lang es -types per > es_corpus_train_persons.txt
> However, it currently fails with an error message because it expects 3 columns per line, whereas the dataset has only 2.
> This is a bug introduced at line 130 of opennlp.tools.formats.Conll02NameSampleStream.java, where a field count of 3 is enforced.
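
To illustrate the column mismatch described in the report, here is a minimal sketch of a line parser that tolerates both layouts. It is not the actual code of Conll02NameSampleStream; the class name Conll02LineSketch and the method parseLine are hypothetical. It assumes that the Spanish CoNLL-2002 files carry two whitespace-separated fields per line (word, entity tag), that the Dutch files carry three (word, part-of-speech tag, entity tag), and that the word comes first and the entity tag last in both cases.

```java
public class Conll02LineSketch {

    /**
     * Splits one non-empty data line into { word, entityTag }.
     * Assumption: the word is the first field and the entity tag is the
     * last one, whether the line has 2 fields or 3.
     */
    public static String[] parseLine(String line) {
        String[] fields = line.trim().split("\\s+");

        // Accept 2 fields (word, tag) as well as 3 (word, POS, tag),
        // instead of insisting on exactly 3.
        if (fields.length == 2 || fields.length == 3) {
            return new String[] { fields[0], fields[fields.length - 1] };
        }

        throw new IllegalArgumentException(
            "Expected 2 or 3 fields per line, got " + fields.length
                + ": '" + line + "'");
    }

    public static void main(String[] args) {
        // Two-column line, as assumed for the Spanish data.
        String[] es = parseLine("Melbourne B-LOC");
        System.out.println(es[0] + " -> " + es[1]);

        // Three-column line, as assumed for the Dutch data.
        String[] nl = parseLine("De Art O");
        System.out.println(nl[0] + " -> " + nl[1]);
    }
}
```

Blank lines, which separate sentences in this kind of data, are rejected by this sketch; a caller would need to handle them before invoking parseLine.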



--
This message was sent by Atlassian Jira
(v8.20.10#820010)