
Posted to users@opennlp.apache.org by Andrew Knight <an...@yahoo.com> on 2013/08/07 20:55:00 UTC

Incomplete parse tree problems

Hi,

We are currently getting good results by using OpenNLP and Stanford NLP together to generate dependency parses. For performance reasons, we use OpenNLP to generate a parse tree and then pass that tree to the Stanford parser to produce the dependency parse. This is very easy to do, performs significantly better than using the Stanford parser alone, and the Stanford models do not need to be loaded.

The problem is that the percentage of sentences for which OpenNLP returns an incomplete parse seems a bit high. With the default beam size and advance percentage, about 5% of all sentences come back incomplete. With an increased beam size, we can get that down to about 1.5%. That may seem pretty good, but at that setting (OpenNLP beam size = 50, advance percentage = 80) the Stanford parser is faster and produces a complete parse tree for every sentence.
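To compare incomplete-parse rates across beam settings, a small counter over the parser's bracketed output is enough. The sketch below is a hypothetical helper, not part of OpenNLP. It assumes that an incomplete parse can be recognized by an "INC" constituent label in its bracketed form, e.g. "(INC (NP ...) (VP ...))"; check that assumption against your own parser output before relying on it.

```java
import java.util.List;

/**
 * Hypothetical helper (not part of OpenNLP): reports what share of a batch of
 * bracketed parse trees is incomplete. ASSUMPTION: incomplete parses contain
 * an "INC" constituent label, e.g. "(INC (NP ...) (VP ...))".
 */
public final class IncompleteParses {

    private IncompleteParses() {}

    /** Returns true if the bracketed tree contains an "(INC " node (assumed marker). */
    public static boolean isIncomplete(String bracketedTree) {
        // Require whitespace after "INC" so a label such as "(INCLUDE" is not counted.
        return bracketedTree.matches("(?s).*\\(INC\\s.*");
    }

    /** Percentage (0-100) of trees in the batch that are incomplete; 0 for an empty batch. */
    public static double incompleteRate(List<String> trees) {
        if (trees.isEmpty()) {
            return 0.0;
        }
        long incomplete = trees.stream().filter(IncompleteParses::isIncomplete).count();
        return 100.0 * incomplete / trees.size();
    }

    public static void main(String[] args) {
        // Toy batch: one of the four trees carries the assumed INC marker.
        List<String> batch = List.of(
            "(TOP (S (NP (DT The) (NN dog)) (VP (VBD barked))))",
            "(INC (NP (DT The) (NN dog)) (VP (VBD barked)))",
            "(TOP (S (NP (PRP It)) (VP (VBD rained))))",
            "(TOP (S (NP (PRP We)) (VP (VBD left))))");
        System.out.printf("%.1f%% incomplete%n", incompleteRate(batch)); // prints "25.0% incomplete"
    }
}
```

In a real pipeline the strings could come from rendering each OpenNLP `Parse` with `Parse.show(StringBuffer)`. Running the same sentence set once per beam size / advance percentage pair then gives figures directly comparable to the 5% and 1.5% above.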

I would prefer not to fall back on the Stanford parser to generate the parse tree, because that requires loading the Stanford models.

Are there other ways to reduce the number of incomplete parses from OpenNLP? Any advice would be appreciated.

Thanks for reading,
-Andy Knight
Cambridge Reading Project