Posted to users@opennlp.apache.org by Xiang Ji <xi...@student.uni-tuebingen.de> on 2018/07/30 16:33:53 UTC

OpenNLP Name Finder training: Unsupported language: en

Hi,

I’m trying to test training OpenNLP’s Name Finder on some data,
according to the guide in the documentation. However, I encountered the
error `Unsupported language: en`, which doesn’t seem to make any sense.

The command I ran is:
`opennlp TokenNameFinderTrainer.conll03 -model model.bin -lang en -types per,loc,org,misc -data train.txt -encoding UTF-8`

I downloaded OpenNLP 1.9.0 from
https://opennlp.apache.org/download.html. The `OPENNLP_HOME` environment
variable does seem to be properly set, and the `lang` folder in the base
folder contains an `en` folder.

Best regards,

Xiang Ji

Re: OpenNLP Name Finder training: Unsupported language: en

Posted by Xiang Ji <xi...@student.uni-tuebingen.de>.
Correction: the `eng` language code only applies to the CoNLL-2003
related commands (such as `TokenNameFinderTrainer.conll03`). I have
opened a JIRA issue and a PR to update the documentation.
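
A minimal sketch of the corrected invocation, assuming the same file names as in the original post (`model.bin`, `train.txt`); the only change is the `-lang` value. The `LANG_CODE` and `TRAIN_CMD` variables and the `command -v` guard are additions for illustration, not part of OpenNLP: when the `opennlp` launcher is not on the PATH, the command is only printed.

```shell
# Language code expected by the CoNLL-2003 format in OpenNLP 1.9.0, per this thread.
LANG_CODE="eng"

# Same command as in the original post, with only the -lang value changed.
TRAIN_CMD="opennlp TokenNameFinderTrainer.conll03 -model model.bin -lang ${LANG_CODE} -types per,loc,org,misc -data train.txt -encoding UTF-8"

# Run the trainer if the launcher is available; otherwise just show the command.
if command -v opennlp >/dev/null 2>&1; then
  $TRAIN_CMD
else
  echo "$TRAIN_CMD"
fi
```

With `-lang en`, the same command is what produced `Unsupported language: en` in the report above.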




Re: OpenNLP Name Finder training: Unsupported language: en

Posted by Xiang Ji <xi...@student.uni-tuebingen.de>.
Apparently the language code was changed from `en` to `eng` in a later
version... Why wasn't the documentation updated? (It still says "en".)
I was banging my head for 2 hours trying to figure it out!




Re: OpenNLP Name Finder training: Unsupported language: en

Posted by Xiang Ji <xi...@student.uni-tuebingen.de>.
OK, apparently using OpenNLP 1.5.3 works. Something seems to be broken
in the later versions. I'll try to file a bug report.




Re: OpenNLP Name Finder training: Unsupported language: en

Posted by Xiang Ji <xi...@student.uni-tuebingen.de>.
This seems to have something to do with the CoNLL-2003 format. If I run
the trainer directly, it works. However, my input data is in CoNLL-2003
format. Running `TokenNameFinderConverter` gives me the same error.
Even trying it on the official example
https://github.com/apache/opennlp/blob/master/opennlp-tools/src/test/resources/opennlp/tools/formats/conll2003-en.sample
doesn't work.

