Posted to issues@opennlp.apache.org by "Jörn Kottmann (JIRA)" <ji...@apache.org> on 2011/07/26 23:08:09 UTC

[jira] [Commented] (OPENNLP-240) Full-Stop detection not working during full NLP parse

    [ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071361#comment-13071361 ] 

Jörn Kottmann commented on OPENNLP-240:
---------------------------------------

The parser does not tokenize its input; it expects all tokens to be separated by whitespace. In your case, "month" and the full stop are not separated by whitespace, which is why the parser treats "month." as a single token.

Since the tokenizer handles this sentence correctly, I suggest running the input through the tokenizer first and then passing the whitespace-separated tokens to the parser.
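
A minimal sketch of that idea, using only the Java standard library: the token array is hard-coded to match the tokenizer output quoted in the issue, and stands in for what the OpenNLP tokenizer would return. The class and method names (TokenizedInput, toParserLine, whitespaceSplit) are hypothetical, and whitespaceSplit only imitates the whitespace-based splitting described above; it is not the parser's actual code. In real use, the joined line would presumably be handed to ParserTool.parseLine(line, parser, 1) as in the reporter's snippet.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: shows why joining tokenizer output with spaces
// makes the full stop visible to a consumer that splits on whitespace.
final class TokenizedInput {

    // Join tokens with single spaces so each token is whitespace-delimited.
    static String toParserLine(String[] tokens) {
        return String.join(" ", tokens);
    }

    // Imitates a whitespace split (an assumption about the parser's behavior,
    // based on the comment above; not OpenNLP code).
    static List<String> whitespaceSplit(String line) {
        return Arrays.asList(line.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String raw = "I intend to quit smoking this month.";
        // Stand-in for the tokenizer output quoted in the issue.
        String[] tokens = {"I", "intend", "to", "quit", "smoking", "this", "month", "."};

        // Raw sentence: "month." stays a single token.
        System.out.println(whitespaceSplit(raw));
        // prints [I, intend, to, quit, smoking, this, month.]

        // Tokenized and re-joined: "month" and "." are separate tokens.
        System.out.println(whitespaceSplit(toParserLine(tokens)));
        // prints [I, intend, to, quit, smoking, this, month, .]
    }
}
```

The joined string, "I intend to quit smoking this month .", is what would be passed on as the line to parse.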

> Full-Stop detection not working during full NLP parse
> -----------------------------------------------------
>
>                 Key: OPENNLP-240
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-240
>             Project: OpenNLP
>          Issue Type: Bug
>          Components: Parser
>    Affects Versions: tools-1.5.1-incubating
>         Environment: Win 7, JDK1.6.0_23
> Example Instantiation:
> 	public Parse parse(String line)  {
> 		if(parser == null)
> 			parser = ParserFactory.create(new ParserModel(new FileInputStream(_NLPModelPath+"/en-parser-chunking.bin")));
> 		return ParserTool.parseLine(line, parser, 1)[0];
> 	}// :end parse
>            Reporter: mark meiklejohn
>             Fix For: tools-1.5.2-incubating
>
>
> There seems to be an issue with OpenNLP detecting the full stop at the end of the sentence:
> (TOP (S (NP (PRP I)) (VP (VBP intend) (S (VP (TO to) (VP (VB quit) (NP (NP (NN smoking)) (NP (DT this) (NN month.)))))))))
> It does, however, work fine with the tokenizer on its own:
> [I, intend, to, quit, smoking, this, month, .]
