Posted to issues@opennlp.apache.org by "Joern Kottmann (JIRA)" <ji...@apache.org> on 2016/11/02 18:20:58 UTC

[jira] [Comment Edited] (OPENNLP-857) ParserTool should take use Tokenizer instance. It should not use java.util.StringTokenizer

    [ https://issues.apache.org/jira/browse/OPENNLP-857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15629913#comment-15629913 ] 

Joern Kottmann edited comment on OPENNLP-857 at 11/2/16 6:20 PM:
-----------------------------------------------------------------

Thanks, that is really nice work. We can apply it as it is. I removed one if statement and just initialized the tokenizer variable to the whitespace tokenizer.


was (Author: joern):
Thanks that is really nice work. We can apply that like it is. I remove one if statement and just initialize the tokenizer variable to the white space tokenizer.

> ParserTool should take use Tokenizer instance. It should not use java.util.StringTokenizer
> ------------------------------------------------------------------------------------------
>
>                 Key: OPENNLP-857
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-857
>             Project: OpenNLP
>          Issue Type: Improvement
>          Components: Parser
>    Affects Versions: 1.6.0
>            Reporter: Tristan Nixon
>            Assignee: Joern Kottmann
>             Fix For: 1.6.1
>
>         Attachments: ParserToolTokenize.patch
>
>
> It would be nice if the ParserTool would make use of a real tokenizer. In addition to being the "right" thing to do, it would obviate issues like OPENNLP-240 when using the parser tool.
> While I realize that java.util.StringTokenizer effectively does the same work as WhitespaceTokenizer, it seems odd to use the former when the latter exists.
> To this end, I'm attaching a patch that adds an additional method:
> public static Parse[] parseLine(String line, Parser parser, Tokenizer tokenizer, int numParses)
> I've left the existing method
> public static Parse[] parseLine(String line, Parser parser, int numParses)
> in for convenience and backwards compatibility. It simply calls the new method with WhitespaceTokenizer.INSTANCE.
> For good measure, I've added a new command-line argument -tk, which takes the name of a tokenizer model. If none is specified, it will fall back on the current behavior of using the whitespace tokenizer.
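
For readers without the patch at hand, the delegation and fallback described above might look roughly like the sketch below. It is not the attached patch and does not use the OpenNLP classes: SimpleTokenizer, SimpleWhitespaceTokenizer, ParseLineSketch, tokensForParsing and chooseTokenizer are hypothetical stand-ins so the example compiles on its own, and it stops at tokenization rather than running a parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a tokenizer abstraction (not the OpenNLP interface).
interface SimpleTokenizer {
    String[] tokenize(String text);
}

// Hypothetical whitespace tokenizer exposed through a shared INSTANCE,
// mirroring the idea of WhitespaceTokenizer.INSTANCE mentioned in the issue.
final class SimpleWhitespaceTokenizer implements SimpleTokenizer {
    static final SimpleWhitespaceTokenizer INSTANCE = new SimpleWhitespaceTokenizer();

    private SimpleWhitespaceTokenizer() {}

    @Override
    public String[] tokenize(String text) {
        String trimmed = text.trim();
        // Avoid returning a single empty token for blank input.
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }
}

final class ParseLineSketch {
    // New overload: the caller decides how the line is tokenized.
    static List<String> tokensForParsing(String line, SimpleTokenizer tokenizer) {
        List<String> tokens = new ArrayList<>();
        for (String token : tokenizer.tokenize(line)) {
            tokens.add(token);
        }
        return tokens;
    }

    // Existing overload kept for backwards compatibility:
    // it simply delegates with the whitespace tokenizer.
    static List<String> tokensForParsing(String line) {
        return tokensForParsing(line, SimpleWhitespaceTokenizer.INSTANCE);
    }

    // Fallback for an optional command-line tokenizer: when none was
    // configured (null), use the whitespace tokenizer.
    static SimpleTokenizer chooseTokenizer(SimpleTokenizer configured) {
        return configured != null ? configured : SimpleWhitespaceTokenizer.INSTANCE;
    }
}
```

Keeping the old signature as a thin wrapper means existing callers see the same whitespace behavior, while new callers can pass any tokenizer they like.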



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)