Posted to issues@opennlp.apache.org by "Chris Brew (JIRA)" <ji...@apache.org> on 2011/07/20 20:52:57 UTC

[jira] [Commented] (OPENNLP-233) Parser produces "log probabilities" that are positive

    [ https://issues.apache.org/jira/browse/OPENNLP-233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13068556#comment-13068556 ] 

Chris Brew commented on OPENNLP-233:
------------------------------------

I found the problem. The class opennlp.tools.cmdline.parser.ParserTool (ParserTool.java) contains the line

 Parse p = new Parse(text, new Span(0, text.length()), AbstractBottomUpParser.INC_NODE,  1, 0);

which should be

 Parse p = new Parse(text, new Span(0, text.length()), AbstractBottomUpParser.INC_NODE, 0.0, 0);

because Parse objects always carry log probabilities, and the log of a starting probability of 1 is 0. The initial value should therefore be 0.0, not 1.

(So this bug is unlikely ever to matter in practice: every parse starts from the same initial value, so every score is off by the same constant and the order of parses is unchanged.)
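The effect of the start value can be illustrated with a minimal, self-contained Java sketch. This is not OpenNLP code: the class LogProbStart, its methods, and the decision probabilities are hypothetical. It assumes, as the comment implies, that a parse's score is built by adding the log of each decision's probability to the starting value. Starting at 0.0 (= log 1) gives the true log probability. Starting at 1, as ParserTool did, adds +1 to every score, so the ranking does not change.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class LogProbStart {

    // Builds a score by adding log(p) for each decision onto a starting value.
    // Starting at 0.0 (= log 1) gives the true log probability of the sequence.
    static double score(double start, double[] decisionProbs) {
        double logProb = start;
        for (double p : decisionProbs) {
            logProb += Math.log(p);
        }
        return logProb;
    }

    // Returns candidate indices ordered from highest to lowest score.
    static List<Integer> rank(double start, double[][] candidates) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < candidates.length; i++) {
            order.add(i);
        }
        order.sort(Comparator.comparingDouble(
                (Integer idx) -> score(start, candidates[idx])).reversed());
        return order;
    }

    public static void main(String[] args) {
        // Made-up decision probabilities for three hypothetical candidate parses.
        double[][] candidates = {
            {0.9, 0.8, 0.7},
            {0.6, 0.5, 0.4},
            {0.95, 0.2, 0.9}
        };
        for (int i = 0; i < candidates.length; i++) {
            double correct = score(0.0, candidates[i]); // start at log(1) = 0
            double buggy = score(1.0, candidates[i]);   // start at 1, as in the bug
            System.out.printf("candidate %d: correct=%.4f buggy=%.4f%n", i, correct, buggy);
        }
        // Same order either way, because every score shifts by the same +1.
        System.out.println("order (start 0.0): " + rank(0.0, candidates));
        System.out.println("order (start 1.0): " + rank(1.0, candidates));
    }
}
```

A uniform shift changes absolute values but not comparisons between them. Code that only picks the top-k parses is unaffected. Code that reads the score as a probability, for example by thresholding exp(score), is not.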



> Parser produces "log probabilities" that are positive
> -----------------------------------------------------
>
>                 Key: OPENNLP-233
>                 URL: https://issues.apache.org/jira/browse/OPENNLP-233
>             Project: OpenNLP
>          Issue Type: Task
>          Components: Command Line Interface, Parser
>         Environment: Mac OS 10.6.8, but also observed on Linux and Windows 7
>            Reporter: Chris Brew
>              Labels: math
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> Using the distributed version and the binary model from the SourceForge site, I see the following bad behaviour. This is bad because probabilities should always be <= 1, so log probabilities should be <= 0, which the top-ranked parses below clearly are not.
> Script started on Mon Jul 18 19:34:36 2011
> bash-3.2$ bin/opennlp Parser -k 2 models/en-parser-chunking.bin 
> Loading Parser model ... done (14.573s)
> The old are wise .
> 0 0.06948959676790605 (TOP (S (NP (DT The) (JJ old)) (VP (VBP are) (ADJP (JJ wise))) (. .)))
> 1 -1.3788870933108204 (TOP (S (NP (DT The) (JJ old)) (VP (VBP are) (ADVP (RB wise))) (. .)))
> The young are foolish .
> 0 0.2094212498812974 (TOP (S (NP (DT The) (JJ young)) (VP (VBP are) (ADJP (JJ foolish))) (. .)))
> 1 -2.2380713063683784 (TOP (S (NP (DT The) (NNP young)) (VP (VBP are) (ADJP (JJ foolish))) (. .)))
> ^D
> Average: 0.1 sent/s 
> Total: 4 sent
> Runtime: 57.565s
> bash-3.2$ exit
> Script done on Mon Jul 18 19:35:56 2011
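The invariant in the report (probability <= 1, so log probability <= 0) can be checked mechanically. Below is a small Java sketch that is not part of OpenNLP; the class LogProbCheck and its methods are hypothetical. It reads lines in the "rank score tree" layout that `bin/opennlp Parser -k 2` printed in the transcript, flags any score above 0, and converts each score back to a probability with Math.exp. It also prints a corrected score. This assumes the only error is the constant +1 start value diagnosed in the comment. The sample lines are copied from the transcript.

```java
public class LogProbCheck {

    // Small tolerance so floating-point noise around 0 is not flagged.
    static final double EPS = 1e-12;

    // Extracts the score from a line laid out as "<rank> <score> <tree>",
    // which is how the Parser -k output in the transcript is printed.
    static double scoreOf(String line) {
        String[] parts = line.trim().split("\\s+", 3);
        return Double.parseDouble(parts[1]);
    }

    // A genuine log probability can never exceed log(1) = 0.
    static boolean isValidLogProb(double logProb) {
        return logProb <= EPS;
    }

    public static void main(String[] args) {
        // Lines copied from the transcript in the issue report.
        String[] lines = {
            "0 0.06948959676790605 (TOP (S (NP (DT The) (JJ old)) (VP (VBP are) (ADJP (JJ wise))) (. .)))",
            "1 -1.3788870933108204 (TOP (S (NP (DT The) (JJ old)) (VP (VBP are) (ADVP (RB wise))) (. .)))",
            "0 0.2094212498812974 (TOP (S (NP (DT The) (JJ young)) (VP (VBP are) (ADJP (JJ foolish))) (. .)))",
            "1 -2.2380713063683784 (TOP (S (NP (DT The) (NNP young)) (VP (VBP are) (ADJP (JJ foolish))) (. .)))"
        };
        for (String line : lines) {
            double s = scoreOf(line);
            // Assumption: the only error is the +1 start value, so subtracting 1
            // recovers the intended log probability.
            double corrected = s - 1.0;
            System.out.printf("score=%.4f valid=%b exp(score)=%.4f corrected=%.4f%n",
                    s, isValidLogProb(s), Math.exp(s), corrected);
        }
    }
}
```

The two rank-0 lines fail the check, and their exp(score) exceeds 1. If the one-line fix is applied, every printed score should pass.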

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira