Posted to issues@opennlp.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/08/01 01:04:00 UTC
[jira] [Commented] (OPENNLP-1210) Outdated documentation on -lang argument?

    [ https://issues.apache.org/jira/browse/OPENNLP-1210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564583#comment-16564583 ] 

ASF GitHub Bot commented on OPENNLP-1210:
-----------------------------------------

kojisekig closed pull request #325: [OPENNLP-1210] Change `-lang en` in documentation to `-lang eng`
URL: https://github.com/apache/opennlp/pull/325
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/opennlp-docs/src/docbkx/corpora.xml b/opennlp-docs/src/docbkx/corpora.xml
index aeef36c6d..187c9c313 100644
--- a/opennlp-docs/src/docbkx/corpora.xml
+++ b/opennlp-docs/src/docbkx/corpora.xml
@@ -270,6 +270,8 @@ path: .\es_ner_person.bin]]>
 		<para>After one of the corpora is available the data must be
 		transformed as explained in the README file to the CONLL format.
 		The transformed data can be read by the OpenNLP CONLL03 converter.
+
+      Note that for CoNLL-2003 corpora, the -lang argument should either be "eng" or "deu", instead of "en" or "de".
 		</para>
 		</section>
 		<section id="tools.corpora.conll.2003.converting">
@@ -278,13 +280,13 @@ path: .\es_ner_person.bin]]>
 		To convert the information to the OpenNLP format:
 		<screen>
 			<![CDATA[
-$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.train > corpus_train.txt]]>
+$ opennlp TokenNameFinderConverter conll03 -lang eng -types per -data eng.train > corpus_train.txt]]>
 		</screen>
 		Optionally, you can convert the training test samples as well.
 		<screen>
 			<![CDATA[
-$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testa > corpus_testa.txt
-$ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testb > corpus_testb.txt]]>
+$ opennlp TokenNameFinderConverter conll03 -lang eng -types per -data eng.testa > corpus_testa.txt
+$ opennlp TokenNameFinderConverter conll03 -lang eng -types per -data eng.testb > corpus_testb.txt]]>
 		</screen>
 		</para>
 		</section>
@@ -295,7 +297,7 @@ $ opennlp TokenNameFinderConverter conll03 -lang en -types per -data eng.testb >
                 <screen>
                 <![CDATA[
 $ opennlp TokenNameFinderTrainer.conll03 -model en_ner_person.bin -iterations 500 \
-                                 -lang en -types per -data eng.train -encoding utf8]]>
+                                 -lang eng -types per -data eng.train -encoding utf8]]>
                 </screen>
             </para>
 		    <para>
@@ -346,7 +348,7 @@ path: .\en_ner_person.bin]]>
                 <screen>
                 <![CDATA[
 $ opennlp TokenNameFinderEvaluator.conll03 -model en_ner_person.bin \
-                                   -lang en -types per -data eng.testa -encoding utf8]]>
+                                   -lang eng -types per -data eng.testa -encoding utf8]]>
                 </screen>
             </para>
 		    <para>
@@ -745,4 +747,4 @@ Organization: precision:   85.11%;  recall:   79.38%; F1:   82.14%. [target: 130
 			</para>
 		</section>
 	</section>
-</chapter>
\ No newline at end of file
+</chapter>


 


> Outdated documentation on -lang argument?
> -----------------------------------------
>
>                 Key: OPENNLP-1210
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-1210
>             Project: OpenNLP
>          Issue Type: Bug
>            Reporter: Xiang Ji
>            Priority: Major
>
> I encountered an "Unsupported language: en" error when I was trying to run the `TokenNameFinderConverter` or the `TokenNameFinderTrainer`.
>  
> I'm not sure I understood the bug correctly, but after two hours of trying I found that, apparently, in some version after `1.5.3` OpenNLP changed the language codes from two characters to three, i.e. one should pass `eng` instead of `en`. However, the documentation was never updated to reflect this, and no meaningful error message was given (the program did not, for example, list the supported languages).
>  
>  
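Until the corrected documentation is published, the rule from the documentation note in the diff above (for CoNLL-2003 corpora, pass `eng` or `deu` rather than `en` or `de`) can be applied in a small wrapper. The following is a minimal sketch, not part of OpenNLP. `conll03_lang` is a hypothetical shell function that accepts either form and prints the three-letter code the `conll03` format expects. For any other value it fails with a message listing the accepted values, which is the kind of hint the reporter asked for. It assumes only the two CoNLL-2003 languages named in that note; other formats may expect different codes.

```shell
#!/bin/sh
# Hypothetical helper, not shipped with OpenNLP: normalize a language code for
# the conll03 format, whose -lang argument expects "eng" or "deu".
conll03_lang() {
  case "$1" in
    eng|en) echo "eng" ;;  # English CoNLL-2003 files (eng.train, eng.testa, eng.testb)
    deu|de) echo "deu" ;;  # German CoNLL-2003 files
    *)
      # Report the rejected value together with the accepted ones.
      echo "Unsupported language: $1 (conll03 accepts: eng, deu)" >&2
      return 1
      ;;
  esac
}

# Example: resolve the code first so that a bad value stops the script before
# opennlp runs, then call the converter as in the documentation.
# lang=$(conll03_lang en) || exit 1
# opennlp TokenNameFinderConverter conll03 -lang "$lang" -types per -data eng.train > corpus_train.txt
```

Accepting the two-letter form is only a convenience of this sketch. Passing `-lang en` directly to the `conll03` converter or trainer still fails as described in the issue.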



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)