Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2013/04/02 22:41:15 UTC
[jira] [Created] (OPENNLP-568) Doccat command line tagger should assume whitespace tokenized input
Joern Kottmann created OPENNLP-568:
--------------------------------------
Summary: Doccat command line tagger should assume whitespace tokenized input
Key: OPENNLP-568
URL: https://issues.apache.org/jira/browse/OPENNLP-568
Project: OpenNLP
Issue Type: Bug
Components: Command Line Interface, Doccat
Affects Versions: tools-1.5.2-incubating
Reporter: Joern Kottmann
Assignee: Joern Kottmann
Fix For: tools-1.5.3
The DoccatTool should read the doccat default format from stdin. The default format is whitespace-tokenized, but the DoccatTool currently runs the Simple Tokenizer over the input text, so input that is already tokenized gets split a second time.
To fix this issue, use the Whitespace Tokenizer instead of the Simple Tokenizer.
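
To illustrate why the choice of tokenizer matters here, the following is a minimal, dependency-free sketch. It does not use the OpenNLP classes: whitespaceTokenize and characterClassTokenize are hypothetical helpers written for this example, and the second one only roughly approximates a tokenizer that starts a new token whenever the character class (letter, digit, other) changes. The sample sentence is also made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these helpers are NOT the OpenNLP implementations.
class TokenizerComparison {

    private static final int WHITESPACE = 0;
    private static final int LETTER = 1;
    private static final int DIGIT = 2;
    private static final int OTHER = 3;

    // Splits only on runs of whitespace, leaving pre-tokenized input untouched.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Assumed approximation of character-class tokenization:
    // a token ends whenever the class of the next character differs.
    static List<String> characterClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int previousClass = -1;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            int charClass = classify(c);
            if (charClass != previousClass && current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            if (charClass != WHITESPACE) {
                current.append(c);
            }
            previousClass = charClass;
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    private static int classify(char c) {
        if (Character.isWhitespace(c)) return WHITESPACE;
        if (Character.isLetter(c)) return LETTER;
        if (Character.isDigit(c)) return DIGIT;
        return OTHER;
    }

    public static void main(String[] args) {
        String line = "The U.S. isn't far";
        System.out.println(whitespaceTokenize(line));
        System.out.println(characterClassTokenize(line));
    }
}
```

With the whitespace split, tokens such as "U.S." and "isn't" stay intact, whereas the character-class split breaks them into several pieces. If the document categorizer was trained on whitespace-tokenized text, the features seen at tagging time would then presumably differ from those seen at training time.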
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira