Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2013/04/02 22:41:15 UTC

[jira] [Created] (OPENNLP-568) Doccat command line tagger should assume whitespace tokenized input

Joern Kottmann created OPENNLP-568:
--------------------------------------

             Summary: Doccat command line tagger should assume whitespace tokenized input
                 Key: OPENNLP-568
                 URL: https://issues.apache.org/jira/browse/OPENNLP-568
             Project: OpenNLP
          Issue Type: Bug
          Components: Command Line Interface, Doccat
    Affects Versions: tools-1.5.2-incubating
            Reporter: Joern Kottmann
            Assignee: Joern Kottmann
             Fix For: tools-1.5.3


The DoccatTool should read the doccat default format from stdin. The default format is whitespace-tokenized, but the DoccatTool currently runs the Simple Tokenizer over the input text, which splits tokens that are already separated a second time.

To fix this issue, use the Whitespace Tokenizer instead of the Simple Tokenizer.
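
The difference between the two strategies can be sketched in plain Java. This is only an illustration, not the DoccatTool code or the OpenNLP implementation: the class TokenizerComparison and both methods are hypothetical names, and charClassTokenize is a simplified stand-in that assumes the Simple Tokenizer splits wherever the character class (letter, digit, other) changes.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerComparison {

    // Splits only on runs of whitespace, so pre-tokenized input is kept as-is.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // 0 = whitespace, 1 = letter, 2 = digit, 3 = anything else.
    static int charClass(char c) {
        if (Character.isWhitespace(c)) return 0;
        if (Character.isLetter(c)) return 1;
        if (Character.isDigit(c)) return 2;
        return 3;
    }

    // Simplified stand-in (an assumption, not OpenNLP's code): starts a new
    // token whenever the character class changes, and drops whitespace.
    static List<String> charClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;
        int prev = 0; // treat the position before the text as whitespace
        for (int i = 0; i < text.length(); i++) {
            int cls = charClass(text.charAt(i));
            if (cls != prev) {
                if (prev != 0) {
                    tokens.add(text.substring(start, i));
                }
                start = i;
            }
            prev = cls;
        }
        if (prev != 0) {
            tokens.add(text.substring(start));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical input line that is already whitespace-tokenized.
        String line = "do n't miss the 3.5 % discount";
        System.out.println(whitespaceTokenize(line));
        System.out.println(charClassTokenize(line));
    }
}
```

On this sample line the whitespace split returns the 7 tokens exactly as written, while the character-class split returns 11, breaking "n't" and "3.5" into pieces. A model trained on whitespace-tokenized data would then see different features at classification time.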

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira