Posted to users@opennlp.apache.org by Greg Hanowski <gr...@tekknow.net> on 2014/09/13 18:26:50 UTC

Parser producing incorrect code tree

Hi,

When I run the parser against the following simple sentence:

"The gene for beta amyloid exists on chromosome 21."

it produces the following incorrect parse tree:

[0] S 461729928 -> 461729928 TOP The gene for beta amyloid exists on chromosome 21.
[0.0] NP 461695703 -> 461729928 S The gene for beta amyloid
[0.0.0] NP 461672430 -> 461695703 NP The gene
[0.0.0.0] DT 461665585 -> 461672430 NP The
[0.0.0.0.0] TK 461665585 -> 461665585 DT The
[0.0.0.1] NN 461875042 -> 461672430 NP gene
[0.0.0.1.0] TK 461875042 -> 461875042 NN gene
[0.0.1] PP 462151580 -> 461695703 NP for beta amyloid
[0.0.1.0] IN 462133783 -> 462151580 PP for
[0.0.1.0.0] TK 462133783 -> 462133783 IN for
[0.0.1.1] NP 462354192 -> 462151580 PP beta amyloid
[0.0.1.1.0] JJ 462343240 -> 462354192 NP beta
[0.0.1.1.0.0] TK 462343240 -> 462343240 JJ beta
[0.0.1.1.1] JJ 462607457 -> 462354192 NP amyloid
[0.0.1.1.1.0] TK 462607457 -> 462607457 JJ amyloid
[0.1] VP 463041430 -> 461729928 S exists on chromosome
[0.1.0] VBZ 463022264 -> 463041430 VP exists
[0.1.0.0] TK 463022264 -> 463022264 VBZ exists
[0.1.1] PP 463396001 -> 463041430 VP on chromosome
[0.1.1.0] IN 463380942 -> 463396001 PP on
[0.1.1.0.0] TK 463380942 -> 463380942 IN on
[0.1.1.1] NP 463547960 -> 463396001 PP chromosome
[0.1.1.1.0] NN 463547960 -> 463547960 NP chromosome
[0.1.1.1.0.0] TK 463547960 -> 463547960 NN chromosome
[0.2] . 464110619 -> 461729928 S 21.
[0.2.0] TK 464110619 -> 464110619 . 21.

It looks like it got the first noun phrase correct: "The gene for beta
amyloid".

But the verb phrase shows as "exists on chromosome". In my opinion it
should be "exists on".

And it doesn't know what to do with "21.": it labels it "." (punctuation)
directly under S. It is not smart enough to know that "chromosome 21" should
stay together as a noun phrase.
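
To double-check my reading of the dump, here is a minimal Python sketch that
pulls the label and covered text out of a few of the lines above. It assumes
each line has the shape "[path] LABEL id -> parent_id PARENT_LABEL text"; that
shape is only inferred from the output, and the names parse_dump and
spans_with_label are made up for illustration.

```python
import re

# Assumed line shape (inferred from the dump, not from any documentation):
#   [path] LABEL id -> parent_id PARENT_LABEL covered text
LINE = re.compile(r"^\[([\d.]+)\]\s+(\S+)\s+\d+\s+->\s+\d+\s+(\S+)\s+(.*)$")

# A few lines copied verbatim from the parser output above.
DUMP = """\
[0.0] NP 461695703 -> 461729928 S The gene for beta amyloid
[0.1] VP 463041430 -> 461729928 S exists on chromosome
[0.1.1.1] NP 463547960 -> 463396001 PP chromosome
[0.2] . 464110619 -> 461729928 S 21.
[0.2.0] TK 464110619 -> 464110619 . 21.
"""


def parse_dump(text):
    """Return (path, label, parent_label, covered_text) for each matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            rows.append(match.groups())
    return rows


def spans_with_label(rows, label):
    """Return the covered text of every node carrying the given label."""
    return [covered for _, node_label, _, covered in rows if node_label == label]


rows = parse_dump(DUMP)
print("VP spans:", spans_with_label(rows, "VP"))
print("'.' spans:", spans_with_label(rows, "."))
print("tokens:", spans_with_label(rows, "TK"))
```

On these lines the only VP span is "exists on chromosome", and "21." shows up
as a single token (TK) whose parent label is ".", so the number and the
sentence-final period were never separated.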

 

This was my very first attempt, so I'm wondering: if it can't handle a simple
sentence, how is it going to handle a complex one?

I noticed in some of the list comments that people mention Stanford NLP.
Is that one smarter? Any suggestions on the best package to use for
biology knowledge extraction?

Thank you,

Greg