Posted to users@opennlp.apache.org by Stefan Schweter <st...@schweter.it> on 2017/09/29 22:03:11 UTC

Sentence Detector on TIGER corpus

Hi OpenNLP-users,

I have a question about the pretrained model for the German sentence
detector.

The documentation says:

"Usually Sentence Detection is done **before** the text is tokenized and
that's the way the pre-trained models on the web site are trained"

So how exactly was the provided German model trained? The TIGER
corpus IS tokenized, so was it detokenized before training?

Is there any documentation available so that I can reproduce the
training steps for the pretrained model?

Thanks + regards,

Stefan