Posted to dev@opennlp.apache.org by Richard Eckart de Castilho <re...@apache.org> on 2016/03/01 23:13:52 UTC
OpenNLP maxent model trained with wrong encoding
Hi all,
I noticed that the OpenNLP German POS Tagger maxent model available from Sourceforge has been trained using the wrong encoding setting. Apparently the input data was UTF-8, but it was read as ISO8859-1. The perceptron model is not affected. I only examined NER and POS models, not tokenizer or sentence splitter models.
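The mismatch described above is easy to reproduce outside OpenNLP. A minimal sketch in Python (the token "für" is an illustrative example, not taken from the actual training data):

```python
# UTF-8 bytes decoded with the wrong charset produce mojibake,
# which then gets baked into the trained model's features.
token = "für"                        # hypothetical German token
raw = token.encode("utf-8")          # bytes as written to disk: b'f\xc3\xbcr'
misread = raw.decode("iso-8859-1")   # read back with the wrong encoding
print(misread)                       # -> fÃ¼r
```

Every non-ASCII character in the training data gets mangled this way, so the model's features match the garbled forms rather than the real tokens.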
Best,
-- Richard
Re: OpenNLP maxent model trained with wrong encoding
Posted by Richard Eckart de Castilho <re...@apache.org>.
Hi again,
the Spanish and Dutch NER models are also affected; it was just a bit more difficult to figure out because the models internally lower-case the features.
Cheers,
-- Richard
> On 01.03.2016, at 23:13, Richard Eckart de Castilho <re...@apache.org> wrote:
>
> Hi all,
>
> I noticed that the OpenNLP German POS Tagger maxent model available from Sourceforge has been trained using the wrong encoding setting. Apparently the input data was UTF-8, but it was read as ISO8859-1. The perceptron model is not affected. I only examined NER and POS models, not tokenizer or sentence splitter models.
>
> Best,
>
> -- Richard