Posted to users@opennlp.apache.org by John Helmsen <jo...@serendipitynow.com> on 2013/03/13 20:25:46 UTC

Fwd: Re-training en-parser-chunking.bin

Gentlemen and Ladies,

Currently, my group is undertaking a project that involves
English-language understanding of sentence fragments.  While the Apache
OpenNLP parser with the pre-trained binary is very good, we anticipate the
need to retrain the parser eventually on our own data sets to handle special
terms and idiosyncrasies that may arise in our particular context.

The best way to retrain the parser is to mix our parsing solutions in with
the existing parser training set, so that we enhance the already good
performance of the parser in the direction of our particular input.

Unfortunately, it seems that the online documentation for
en-parser-chunking.bin does not include links to the training sets that
were used.  Do any of you good people know what these might be? Thanks!

John Helmsen
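
[Editor's sketch] The mixing step described in the message above could look
roughly like the following. This is a minimal illustration, not the OpenNLP
project's own tooling. It assumes both data sets are plain-text files holding
one Penn Treebank-style bracketed parse per line, which is the parser training
format described in the OpenNLP manual. The class name, the method names, the
file arguments, and the domainCopies oversampling knob are all hypothetical.
The thread never identifies the original training set, so the "base" file here
is only a placeholder for whatever treebank is available to you.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: merges a base treebank with in-domain parses into one
// training file. Nothing here calls OpenNLP itself.
class MergeParseTrainingData {

    // Returns true if the line looks like exactly one bracketed parse:
    // it starts with '(' and the parentheses close only at the last character.
    // Assumes literal brackets in the text are encoded as -LRB-/-RRB- tokens.
    static boolean isBalancedParse(String line) {
        String s = line.trim();
        if (s.isEmpty() || s.charAt(0) != '(') {
            return false;
        }
        int depth = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '(') {
                depth++;
            } else if (c == ')') {
                depth--;
                if (depth < 0) {
                    return false;              // closing bracket with no opener
                }
                if (depth == 0 && i != s.length() - 1) {
                    return false;              // more than one tree on the line
                }
            }
        }
        return depth == 0;
    }

    // Keeps well-formed parses from the base set, then appends the in-domain
    // parses domainCopies times (an assumed, simple way to weight them).
    static List<String> merge(List<String> base, List<String> domain, int domainCopies) {
        if (domainCopies < 1) {
            throw new IllegalArgumentException("domainCopies must be >= 1");
        }
        List<String> out = new ArrayList<>();
        for (String line : base) {
            if (isBalancedParse(line)) {
                out.add(line.trim());
            }
        }
        for (int k = 0; k < domainCopies; k++) {
            for (String line : domain) {
                if (isBalancedParse(line)) {
                    out.add(line.trim());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 3) {
            System.err.println("usage: MergeParseTrainingData <base> <domain> <out>");
            return;
        }
        List<String> base = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> domain = Files.readAllLines(Paths.get(args[1]), StandardCharsets.UTF_8);
        // Two copies of the in-domain data is an arbitrary illustrative choice.
        List<String> merged = merge(base, domain, 2);
        Files.write(Paths.get(args[2]), merged, StandardCharsets.UTF_8);
        System.out.println("wrote " + merged.size() + " parses");
    }
}
```

The merged file would then be the training input for the parser trainer. The
OpenNLP manual documents a command-line trainer for the parser; check the
manual for the exact options in your version. Repeating the in-domain parses
is only one possible way to bias the model toward your own input, and whether
it helps would need to be measured on held-out data.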

Re: Re-training en-parser-chunking.bin

Posted by John Helmsen <jo...@serendipitynow.com>.
Thanks, Michael, that is very helpful.  I'll discuss with my employers
whether we can create a subset of our training set for public distribution.


On Wed, Mar 13, 2013 at 4:34 PM, Michael Schmitz
<sc...@cs.washington.edu> wrote:

> I ran the OpenNLP stock models on Wikipedia text at one time.  You may be
> able to use this.
>
> https://gist.github.com/3291931
>
> If you create superior models without licensing restrictions, please share.
>
> Peace.  Michael

Re: Re-training en-parser-chunking.bin

Posted by Michael Schmitz <sc...@cs.washington.edu>.
I ran the OpenNLP stock models on Wikipedia text at one time.  You may be
able to use this.

https://gist.github.com/3291931

If you create superior models without licensing restrictions, please share.

Peace.  Michael

