Posted to issues@opennlp.apache.org by "mark meiklejohn (JIRA)" <ji...@apache.org> on 2011/07/26 20:57:09 UTC

[jira] [Created] (OPENNLP-240) Full-Stop detection not working during full NLP parse

Full-Stop detection not working during full NLP parse
-----------------------------------------------------

                 Key: OPENNLP-240
                 URL: https://issues.apache.org/jira/browse/OPENNLP-240
             Project: OpenNLP
          Issue Type: Bug
          Components: Parser
    Affects Versions: tools-1.5.1-incubating
         Environment: Win 7, JDK1.6.0_23

Example Instantiation:

	// Lazily loads the chunking parser model, then returns the top parse of the line.
	public Parse parse(String line) throws IOException {
		if (parser == null) {
			parser = ParserFactory.create(new ParserModel(new FileInputStream(_NLPModelPath + "/en-parser-chunking.bin")));
		}
		return ParserTool.parseLine(line, parser, 1)[0];
	}// :end parse
            Reporter: mark meiklejohn
             Fix For: tools-1.5.2-incubating


There seems to be an issue with OpenNLP detecting the full stop at the end of a sentence. In the parser output it stays attached to the last word:


(TOP (S (NP (PRP I)) (VP (VBP intend) (S (VP (TO to) (VP (VB quit) (NP (NP (NN smoking)) (NP (DT this) (NN month.)))))))))

However, the tokenizer on its own handles it correctly:

[I, intend, to, quit, smoking, this, month, .]

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (OPENNLP-240) Full-Stop detection not working during full NLP parse

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13080805#comment-13080805 ] 

Jörn Kottmann commented on OPENNLP-240:
---------------------------------------

Can we close this issue?


[jira] [Commented] (OPENNLP-240) Full-Stop detection not working during full NLP parse

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13071361#comment-13071361 ] 

Jörn Kottmann commented on OPENNLP-240:
---------------------------------------

The parser does not tokenize its input; it expects all tokens to be separated by whitespace. In your case, "month" and the dot are not separated by whitespace, which is why the parser treats them as one token.

Since the tokenizer handles this correctly, I suggest running the text through the tokenizer first and then passing the result to the parser.
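
A minimal sketch of that suggestion, for illustration only. The class name PreTokenize and the helpers toParserInput and naiveTokenize are hypothetical and not part of OpenNLP; naiveTokenize is a crude stand-in so the example runs without the library or a model file. The point is only the shape of the fix: tokenize first, join the tokens with single spaces, and hand that string to the parser.

```java
import java.util.function.Function;

class PreTokenize {

    // Runs the given tokenizer over the line and joins the tokens with single
    // spaces, which is the whitespace-separated form the parser expects.
    public static String toParserInput(String line, Function<String, String[]> tokenizer) {
        return String.join(" ", tokenizer.apply(line));
    }

    // Stand-in tokenizer (hypothetical, NOT OpenNLP's): detaches a '.', '!' or '?'
    // that is followed by whitespace or end of input, then splits on whitespace.
    public static String[] naiveTokenize(String line) {
        return line.trim()
                   .replaceAll("([.!?])(\\s|$)", " $1$2")
                   .trim()
                   .split("\\s+");
    }

    public static void main(String[] args) {
        String input = toParserInput("I intend to quit smoking this month.",
                                     PreTokenize::naiveTokenize);
        System.out.println(input); // prints "I intend to quit smoking this month ."
    }
}
```

With OpenNLP on the classpath, the tokenizer argument would presumably be a loaded tokenizer's tokenize method (for example a TokenizerME built from a tokenizer model), and the returned string would be what gets passed to ParserTool.parseLine in place of the raw line.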


[jira] [Commented] (OPENNLP-240) Full-Stop detection not working during full NLP parse

Posted by "mark meiklejohn (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13081046#comment-13081046 ] 

mark meiklejohn commented on OPENNLP-240:
-----------------------------------------

Yes, I was not instantiating it correctly.

Mark






[jira] [Closed] (OPENNLP-240) Full-Stop detection not working during full NLP parse

Posted by "Jörn Kottmann (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/OPENNLP-240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jörn Kottmann closed OPENNLP-240.
---------------------------------

       Resolution: Not A Problem
    Fix Version/s:     (was: tools-1.5.2-incubating)
         Assignee: Jörn Kottmann
