Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/07/26 23:08:09 UTC
[jira] [Commented] (OPENNLP-240) Full-Stop detection not working during full NLP parse
[ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071361#comment-13071361 ]
Jörn Kottmann commented on OPENNLP-240:
---------------------------------------
The parser does not tokenize its input; it expects all tokens to be separated by whitespace. In your case, "month" and the full stop are not separated by whitespace, which is why the parser treats "month." as a single token.
Since the tokenizer handles this sentence correctly, I suggest running the text through the tokenizer first and then passing the whitespace-separated tokens to the parser.
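A minimal sketch of that suggestion follows, under stated assumptions. The class `PreTokenize`, the method `toParserInput`, and the stand-in tokenizer are hypothetical names used only for illustration; in a real setup the tokens would come from the OpenNLP tokenizer (for example `TokenizerME.tokenize(String)`), and the joined string would be handed to `ParserTool.parseLine` as in the reporter's code.

```java
import java.util.function.Function;

public class PreTokenize {

    /**
     * Tokenizes the line and rebuilds it with exactly one space between
     * tokens, which is the form the parser expects.
     */
    static String toParserInput(String line, Function<String, String[]> tokenizer) {
        String[] tokens = tokenizer.apply(line);
        return String.join(" ", tokens);
    }

    public static void main(String[] args) {
        // Stand-in tokenizer (an assumption for this sketch): it only splits a
        // trailing full stop from the last word. With OpenNLP you would pass
        // something like tokenizer::tokenize from a loaded TokenizerME instead.
        Function<String, String[]> standIn =
                s -> s.replaceAll("\\.$", " .").trim().split("\\s+");

        String input = toParserInput("I intend to quit smoking this month.", standIn);
        System.out.println(input);

        // The result would then be passed on, e.g.:
        // ParserTool.parseLine(input, parser, 1)[0]
    }
}
```

Joining the tokens with single spaces keeps the parser call unchanged; only its input string differs.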
> Full-Stop detection not working during full NLP parse
> -----------------------------------------------------
>
> Key: OPENNLP-240
> URL: https://issues.apache.org/jira/browse/OPENNLP-240
> Project: OpenNLP
> Issue Type: Bug
> Components: Parser
> Affects Versions: tools-1.5.1-incubating
> Environment: Win 7, JDK1.6.0_23
> Example Instantiation:
>     public Parse parse(String line) {
>         if (parser == null)
>             parser = ParserFactory.create(new ParserModel(new FileInputStream(_NLPModelPath + "/en-parser-chunking.bin")));
>         return ParserTool.parseLine(line, parser, 1)[0];
>     } // :end parse
> Reporter: mark meiklejohn
> Fix For: tools-1.5.2-incubating
>
>
> There seems to be an issue with OpenNLP detecting the full stop at the end of the sentence:
> (TOP (S (NP (PRP I)) (VP (VBP intend) (S (VP (TO to) (VP (VB quit) (NP (NP (NN smoking)) (NP (DT this) (NN month.)))))))))
> It does work fine with the tokenizer on its own, though:
> [I, intend, to, quit, smoking, this, month, .]