
Posted to users@opennlp.apache.org by Sam Li <sa...@gmail.com> on 2012/08/14 07:10:16 UTC

how to train sentence detector without losing previous data?

It seems that the training tool for the Sentence Detector can only overwrite a given model.
How does one train and append to a previously created model?

-Sam

Re: how to train sentence detector without losing previous data?

Posted by Yuan Luo <yu...@gmail.com>.

Hi Jörn,
Are all the original training corpora from MUC? And would you mind
providing a list of the MUC corpora you used, or did you use all of them? I am
thinking of getting them from MUC if you guys didn't make any custom
changes to those corpora.

Best,
Yuan

On Mon, Aug 20, 2012 at 3:44 AM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 08/17/2012 08:15 AM, Sam Li wrote:
>>
>> Right now I'm using the English sentence model provided on sourceforge. I
>> would like to append additional data to it.
>> But this means I need the original source of the model, right? If so, how
>> do I get that?
>
>
> The original data is copyright protected; it is data from the MUC corpus, so
> we cannot distribute it with OpenNLP. But you can use other English resources for training.
> You need sentence-segmented data, such as CONLL2000.
>
> Jörn

Re: how to train sentence detector without losing previous data?

Posted by Sam Li <sa...@gmail.com>.

I see. I would love to see a feature that allows additive training without requiring the original corpus. That way, if users later want to add new special cases to the model, it would be easier. What do you think?

Is a feature like this in the works?

-Sam

On Aug 20, 2012, at 3:44 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 08/17/2012 08:15 AM, Sam Li wrote:
>> Right now I'm using the English sentence model provided on sourceforge. I would like to append additional data to it.
>> But this means I need the original source of the model, right? If so, how do I get that?
> 
> The original data is copyright protected; it is data from the MUC corpus, so we cannot distribute it
> with OpenNLP. But you can use other English resources for training.
> You need sentence-segmented data, such as CONLL2000.
> 
> Jörn

Re: how to train sentence detector without losing previous data?

Posted by Jörn Kottmann <ko...@gmail.com>.

On 08/17/2012 08:15 AM, Sam Li wrote:
> Right now I'm using the English sentence model provided on sourceforge. I would like to append additional data to it.
> But this means I need the original source of the model, right? If so, how do I get that?

The original data is copyright protected; it is data from the MUC corpus,
so we cannot distribute it with OpenNLP. But you can use other English resources for training.
You need sentence-segmented data, such as CONLL2000.

Jörn
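To make "sentence-segmented data" concrete: as far as I know, the OpenNLP sentence detector trainer expects one sentence per line, with an empty line marking a document boundary (please confirm against the OpenNLP manual for your version). A minimal sketch, assuming your other English corpus is already split into documents and sentences; `to_sentence_training_format` is a hypothetical helper, not part of OpenNLP.

```python
from typing import Iterable, List


def to_sentence_training_format(documents: Iterable[List[str]]) -> str:
    """Hypothetical helper: serialize pre-segmented documents as
    one sentence per line, with an empty line between documents."""
    blocks = []
    for doc in documents:
        lines = []
        for sentence in doc:
            # Collapse internal whitespace (including newlines) so that
            # each sentence occupies exactly one line.
            cleaned = " ".join(sentence.split())
            if cleaned:
                lines.append(cleaned)
        if lines:
            blocks.append("\n".join(lines))
    # Joining blocks with "\n\n" leaves one empty line between documents.
    return "\n\n".join(blocks) + ("\n" if blocks else "")


docs = [["Mr. Smith arrived.", "He sat\ndown."], ["Second doc."]]
print(to_sentence_training_format(docs), end="")
```

Note that abbreviations such as "Mr." stay inside their sentence; the trainer learns from exactly these in-sentence periods that not every period ends a sentence.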

Re: how to train sentence detector without losing previous data?

Posted by Sam Li <sa...@gmail.com>.

Right now I'm using the English sentence model provided on sourceforge. I would like to append additional data to it.
But this means I need the original source of the model, right? If so, how do I get that?

On Aug 14, 2012, at 3:31 PM, Jörn Kottmann <ko...@gmail.com> wrote:

> On 08/14/2012 07:10 AM, Sam Li wrote:
>> It seems that the training tool for the Sentence Detector can only overwrite a given model.
>> How does one train and append to a previously created model?
> 
> That is not possible with our current implementation. The maxent model training
> algorithm cannot update the model; the perceptron one could, but we do not
> have any support for that.
> 
> All you can do for now is keep a copy of your training data, append the new data to it,
> and train again.
> 
> Jörn

Re: how to train sentence detector without losing previous data?

Posted by Jörn Kottmann <ko...@gmail.com>.

On 08/14/2012 07:10 AM, Sam Li wrote:
> It seems that the training tool for the Sentence Detector can only overwrite a given model.
> How does one train and append to a previously created model?

That is not possible with our current implementation. The maxent model training
algorithm cannot update the model; the perceptron one could, but we do not
have any support for that.
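One way to see why a perceptron model could, in principle, be updated: its weights change one misclassified example at a time, so training can resume from an existing weight vector using only the new examples. The sketch below is a simplified binary perceptron, not OpenNLP's perceptron trainer, and the feature names (`prev=Mr`, `next_cap`) are made up for illustration. It also shows the catch: continuing on new data alone does not guarantee the old behaviour is kept, and here the score for the old `prev=Mr` case drops to a tie.

```python
from typing import Dict, Iterable, List, Tuple

# (active feature names, label): +1 = sentence end, -1 = not a sentence end
Example = Tuple[List[str], int]


class Perceptron:
    """Simplified binary perceptron over sparse string features (illustrative only)."""

    def __init__(self) -> None:
        self.weights: Dict[str, float] = {}

    def score(self, features: List[str]) -> float:
        return sum(self.weights.get(f, 0.0) for f in features)

    def predict(self, features: List[str]) -> int:
        # Ties (score 0) are resolved as "not a sentence end".
        return 1 if self.score(features) > 0 else -1

    def train(self, examples: Iterable[Example], epochs: int = 5) -> None:
        # Starts from the current weights: calling train() again with only new
        # examples continues from the previous model instead of resetting it.
        data = list(examples)
        for _ in range(epochs):
            for features, label in data:
                if self.predict(features) != label:
                    for f in features:
                        self.weights[f] = self.weights.get(f, 0.0) + label


model = Perceptron()
model.train([(["prev=arrived", "next_cap"], 1), (["prev=Mr", "next_cap"], -1)])
# Later: add a new special case without the original examples.
model.train([(["prev=etc", "next_cap"], 1)])
print(model.predict(["prev=etc", "next_cap"]), model.predict(["prev=Mr", "next_cap"]))
```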

All you can do for now is keep a copy of your training data, append the new data to it,
and train again.

Jörn
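The workaround above (keep the training data, append, retrain) is mostly file concatenation. A minimal sketch, assuming both files are already in the one-sentence-per-line format; `merge_training_text` and `append_training_file` are hypothetical helper names, not OpenNLP functions.

```python
from pathlib import Path


def merge_training_text(existing: str, additions: str) -> str:
    """Hypothetical helper: append new sentence-per-line data to an existing corpus."""
    parts = [p.strip("\n") for p in (existing, additions)]
    parts = [p for p in parts if p.strip()]  # skip empty or whitespace-only inputs
    # A blank line between the old and new data marks a document boundary.
    return "\n\n".join(parts) + ("\n" if parts else "")


def append_training_file(corpus: Path, new_data: Path) -> None:
    """Hypothetical helper: rewrite `corpus` so it also holds the sentences in `new_data`."""
    existing = corpus.read_text(encoding="utf-8") if corpus.exists() else ""
    merged = merge_training_text(existing, new_data.read_text(encoding="utf-8"))
    corpus.write_text(merged, encoding="utf-8")
```

After merging, the model is retrained from scratch on the combined file, for example with the command-line trainer as shown in the OpenNLP manual: `opennlp SentenceDetectorTrainer -model en-sent.bin -lang en -data en-sent.train -encoding UTF-8` (the file names here are placeholders; check the tool's usage output for your version). The blank line between old and new data is a design choice: it keeps the trainer from treating the new sentences as a continuation of the last old document.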