Posted to users@opennlp.apache.org by Stefan Schweter <st...@schweter.it> on 2017/09/29 22:03:11 UTC
Sentence Detector on TIGER corpus
Hi OpenNLP-users,
I have one question about the pretrained model for the German sentence
detector.
The documentation says:
"Usually Sentence Detection is done **before** the text is tokenized and
that's the way the pre-trained models on the web site are trained"
So how exactly was the provided German model trained? The TIGER corpus
IS tokenized, so was it detokenized before training?
Is there any documentation available so that I can reproduce the
training steps for the pretrained model?
Thanks + regards,
Stefan