Posted to users@opennlp.apache.org by John Stewart <ca...@gmail.com> on 2011/11/07 17:20:16 UTC

parser specifics?

Greetings,

I'm a new user of OpenNLP.  I'm evaluating the parser against others,
and I enjoy the ease of using OpenNLP with Clojure, so it would be great
to be able to settle on OpenNLP's parser.  Unfortunately I'm unable to
find details online of what the parser was trained on or how it works --
is it lexicalized?  Was it trained with the MaxEnt package?

For this reason I'm unable to guess at its coverage.  Any technical
details about it would be very welcome.

I should say that in informal tests, while it exhibits lower
coverage than, say, the Stanford parser, the coverage doesn't appear
*that* much lower.  So I'm optimistic :)

Thanks!

jds

Re: parser specifics?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 11/8/11 3:43 PM, John Stewart wrote:
> Thanks Jörn.  Was it trained on the whole Penn Treebank?  And do you
> happen to know if this means there are licensing restrictions on the
> use of the parser that, say, would need to be resolved via the LDC?
>
Using the models in a commercial application is a grey area.
The models do not infringe the copyright of the original corpus, because
the corpus cannot be reproduced from them. I therefore doubt that the
LDC can restrict their use.

We haven't spent time resolving these issues yet, which is also the
reason why the models are still distributed via our old SourceForge page.

I don't know whether the training file contains the entire Treebank; it
has around 60K sentences. I believe section 23 is used for testing.

Jörn

Re: parser specifics?

Posted by John Stewart <ca...@gmail.com>.
Thanks Jörn.  Was it trained on the whole Penn Treebank?  And do you
happen to know if this means there are licensing restrictions on the
use of the parser that, say, would need to be resolved via the LDC?

jds

On Mon, Nov 7, 2011 at 4:58 PM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 11/7/11 5:20 PM, John Stewart wrote:
>>
>> I'm a new user of OpenNLP.  I'm evaluating the parser against others,
>> and I enjoy the ease of using OpenNLP with Clojure, so it would be great
>> to be able to settle on OpenNLP's parser.  Unfortunately I'm unable to
>> find details online of what the parser was trained on or how it works --
>> is it lexicalized?  Was it trained with the MaxEnt package?
>>
>> For this reason I'm unable to guess at its coverage.  Any technical
>> details about it would be very welcome.
>>
>> I should say that in informal tests, while it exhibits lower
>> coverage than, say, the Stanford parser, the coverage doesn't appear
>> *that* much lower.  So I'm optimistic :)
>
> Parser documentation is still very sparse.
> The parser itself is based on a paper by Adwait Ratnaparkhi.
>
> You can find a link to it, and to the other papers
> OpenNLP is based on, in our wiki:
> https://cwiki.apache.org/OPENNLP/nlp-papers.html
>
> The models from the website are trained on the Penn Treebank.
>
> Let me know if you need more information.
>
> Hope this helps,
> Jörn
>

Re: parser specifics?

Posted by Jörn Kottmann <ko...@gmail.com>.
On 11/7/11 5:20 PM, John Stewart wrote:
> I'm a new user of OpenNLP.  I'm evaluating the parser against others,
> and I enjoy the ease of using OpenNLP with Clojure, so it would be great
> to be able to settle on OpenNLP's parser.  Unfortunately I'm unable to
> find details online of what the parser was trained on or how it works --
> is it lexicalized?  Was it trained with the MaxEnt package?
>
> For this reason I'm unable to guess at its coverage.  Any technical
> details about it would be very welcome.
>
> I should say that in informal tests, while it exhibits lower
> coverage than, say, the Stanford parser, the coverage doesn't appear
> *that* much lower.  So I'm optimistic :)

Parser documentation is still very sparse.
The parser itself is based on a paper by Adwait Ratnaparkhi.

You can find a link to it, and to the other papers
OpenNLP is based on, in our wiki:
https://cwiki.apache.org/OPENNLP/nlp-papers.html

The models from the website are trained on the Penn Treebank.

Let me know if you need more information.

Hope this helps,
Jörn