Posted to dev@opennlp.apache.org by Jeff Zemerick <jz...@apache.org> on 2022/03/29 14:17:23 UTC

OPENNLP-1185: Tokenizers should be able to output a new line token

Jörn opened a JIRA task [1] a few years ago asking that the tokenizers be
able to output new line tokens, and there is now a PR [2] for it.

The PR does not change the interfaces and just adds a keepNewLines boolean
to the tokenizers. It doesn't look like this change would affect any
existing applications using OpenNLP. I have built and tested the branch.
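
For anyone who has not opened the PR, here is a rough, self-contained
sketch of what a keepNewLines-style flag could look like on a whitespace
tokenizer. This is not the code from the PR: the class name
NewlineAwareTokenizerSketch, the setter, and the choice to emit "\n" as its
own token are all assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; NOT the OpenNLP implementation.
public class NewlineAwareTokenizerSketch {

    // Assumed to be off unless set, so callers that never touch it see no change.
    private boolean keepNewLines = false;

    public void setKeepNewLines(boolean keepNewLines) {
        this.keepNewLines = keepNewLines;
    }

    // Splits on whitespace; when keepNewLines is set, each '\n' becomes its own token.
    public String[] tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the token being built, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                if (keepNewLines && c == '\n') {
                    tokens.add("\n");
                }
            } else {
                current.append(c);
            }
        }
        // Flush the trailing token.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens.toArray(new String[0]);
    }
}
```

With the flag left unset, the sketch behaves like a plain whitespace split,
which is the property that matters for existing applications.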

I'd appreciate another set of eyes on this one so that we can approve it,
merge it, and close the task.

[1] https://issues.apache.org/jira/browse/OPENNLP-1185
[2] https://github.com/apache/opennlp/pull/337

Thanks,
Jeff