Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2011/05/17 10:25:27 UTC

Fixes that decrease existing model performance

Hi all,

I was wondering whether we can apply bug fixes that slightly decrease
the performance of existing models.

In this case I am speaking about OPENNLP-172, which fixes the handling
of lower case sequences in the token class feature. The feature currently
detects a lower case sequence only when it contains just the letters A to Z,
but other languages have more letters, such as the German umlauts.

This fix will decrease the recall of the existing Spanish person NER
model by 2%. Should we apply it anyway for the next release?

After retraining the recall goes up by 6%.
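
To illustrate the kind of difference at issue, here is a minimal sketch in Java. It is not the actual OpenNLP code or the OPENNLP-172 patch; the class name LowercaseCheck and both patterns are assumptions made only for this illustration. It contrasts an ASCII-only lower case check with one based on the Unicode lower case letter category.

```java
import java.util.regex.Pattern;

// Hypothetical illustration only; not taken from the OpenNLP sources.
public class LowercaseCheck {

    // ASCII-only check: accepts tokens made solely of the letters a to z.
    static final Pattern ASCII_LOWER = Pattern.compile("^[a-z]+$");

    // Unicode-aware check: \p{Ll} matches any lower case letter,
    // including umlauts and other accented characters.
    static final Pattern UNICODE_LOWER = Pattern.compile("^\\p{Ll}+$");

    static boolean isAsciiLower(String token) {
        return ASCII_LOWER.matcher(token).matches();
    }

    static boolean isUnicodeLower(String token) {
        return UNICODE_LOWER.matcher(token).matches();
    }

    public static void main(String[] args) {
        // "h\u00e4user" is the German word with an a-umlaut,
        // "ni\u00f1o" is the Spanish word with an n-tilde.
        String[] tokens = {"haus", "h\u00e4user", "ni\u00f1o", "Haus"};
        for (String token : tokens) {
            System.out.println(token
                + " ascii=" + isAsciiLower(token)
                + " unicode=" + isUnicodeLower(token));
        }
    }
}
```

With the ASCII-only pattern the tokens containing an umlaut or a tilde are not recognised as lower case, so they would end up in a different token class than plain "haus". A model trained on features from the old check has learned weights for that behaviour, which may be why an existing model loses some recall until it is retrained.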

Jörn

Re: Fixes that decrease existing model performance

Posted by Jason Baldridge <ja...@gmail.com>.

+1

On Tue, May 17, 2011 at 6:49 AM, Olivier Grisel <ol...@ensta.org> wrote:

> 2011/5/17 Jörn Kottmann <ko...@gmail.com>:
> >
> >>> After retraining the recall goes up by 6%.
> >>
> >> I am +1 for fixing bugs and providing retrained models for the next
> >> release.
> >
> > I guess it will also improve your French models after re-training.
>
> Good to know, thanks.
>
> --
> Olivier
> http://twitter.com/ogrisel - http://github.com/ogrisel
>



-- 
Jason Baldridge
Assistant Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: Fixes that decrease existing model performance

Posted by Olivier Grisel <ol...@ensta.org>.

2011/5/17 Jörn Kottmann <ko...@gmail.com>:
>
>>> After retraining the recall goes up by 6%.
>>
>> I am +1 for fixing bugs and providing retrained models for the next
>> release.
>
> I guess it will also improve your French models after re-training.

Good to know, thanks.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Re: Fixes that decrease existing model performance

Posted by Jörn Kottmann <ko...@gmail.com>.

On 5/17/11 12:29 PM, Olivier Grisel wrote:
> 2011/5/17 Jörn Kottmann <ko...@gmail.com>:
>> Hi all,
>>
>> I was wondering whether we can apply bug fixes that slightly decrease
>> the performance of existing models.
>>
>> In this case I am speaking about OPENNLP-172, which fixes the handling
>> of lower case sequences in the token class feature. The feature currently
>> detects a lower case sequence only when it contains just the letters A to Z,
>> but other languages have more letters, such as the German umlauts.
>>
>> This fix will decrease the recall of the existing Spanish person NER model
>> by 2%. Should we apply it anyway for the next release?
>>
>> After retraining the recall goes up by 6%.
> I am +1 for fixing bugs and providing retrained models for the next release.

I guess it will also improve your French models after re-training.

Jörn

Re: Fixes that decrease existing model performance

Posted by Olivier Grisel <ol...@ensta.org>.

2011/5/17 Jörn Kottmann <ko...@gmail.com>:
> Hi all,
>
> I was wondering whether we can apply bug fixes that slightly decrease
> the performance of existing models.
>
> In this case I am speaking about OPENNLP-172, which fixes the handling
> of lower case sequences in the token class feature. The feature currently
> detects a lower case sequence only when it contains just the letters A to Z,
> but other languages have more letters, such as the German umlauts.
>
> This fix will decrease the recall of the existing Spanish person NER model
> by 2%. Should we apply it anyway for the next release?
>
> After retraining the recall goes up by 6%.

I am +1 for fixing bugs and providing retrained models for the next release.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel