Posted to dev@opennlp.apache.org by Jörn Kottmann <ko...@gmail.com> on 2011/09/01 15:45:12 UTC

German NER performance on CONLL03

Hi All,

I did a little testing with the German CONLL03 data. For person names we
only get a recall of around 38% and a precision of 82% on the
development data.

I wonder what we are doing wrong here. These numbers are poor compared to
other systems which participated back then and got a similar precision
but much higher recall.

Is the lack of lemma and POS features causing this? Or could it
be something else?

These guys have a much better recall, and also use a maxent-based
system:
http://www.cnts.ua.ac.be/conll2003/pdf/18083kle.pdf

Any ideas what could be done to improve our name finder?

Jörn

Re: German NER performance on CONLL03

Posted by Jörn Kottmann <ko...@gmail.com>.
On 9/2/11 4:36 AM, Jason Baldridge wrote:
> Before starting to make significant changes just based on the numbers, it
> would be very useful to see the system output side by side with the gold
> annotations to see if there are clear patterns in the errors.

I will try that.

Thanks to William we can now do that very easily with the new misclassified
parameter on the evaluator.
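
A side-by-side comparison of this kind can be sketched in a few lines of Python. This is only an illustration and not the OpenNLP evaluator: the `(start, end, type)` span tuples, the helper name `diff_spans`, and the sample sentence are assumptions made up for the example.

```python
def diff_spans(tokens, gold, predicted):
    """Return (false_negatives, false_positives) as lists of (text, type).

    gold and predicted are collections of (start, end, type) token spans,
    with `end` exclusive. A prediction only counts as correct if it
    matches a gold span exactly.
    """
    gold, predicted = set(gold), set(predicted)

    def render(span):
        start, end, kind = span
        return (" ".join(tokens[start:end]), kind)

    false_negatives = [render(s) for s in sorted(gold - predicted)]
    false_positives = [render(s) for s in sorted(predicted - gold)]
    return false_negatives, false_positives


# Hypothetical example: the gold name is "Otto von Bismarck",
# but the system only found "Otto".
tokens = ["Otto", "von", "Bismarck", "sprach", "in", "Berlin", "."]
gold = [(0, 3, "person")]
predicted = [(0, 1, "person")]

missed, spurious = diff_spans(tokens, gold, predicted)
print("missed:  ", missed)
print("spurious:", spurious)
```

Listing the missed and spurious spans per sentence makes recurring patterns, such as names truncated at "von", easier to spot than aggregate numbers alone.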


There is also a bug in the evaluator: it does not reset the adaptive data
at the end of a document. I will fix that, but fixing it reduces
my recall by 3% ...
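
The intended behaviour can be sketched as follows. This is not the OpenNLP evaluator code: `AdaptiveData` and `run` are hypothetical names, the outcome labels are illustrative, and the sketch assumes the input marks document boundaries with a `-DOCSTART-` line, as the CoNLL-2003 files do.

```python
DOC_START = "-DOCSTART-"  # document boundary marker in the CoNLL-2003 files


class AdaptiveData:
    """Hypothetical stand-in for per-document state such as the previous map."""

    def __init__(self):
        self.previous = {}

    def update(self, tokens, outcomes):
        # Remember the most recent outcome seen for each token.
        for token, outcome in zip(tokens, outcomes):
            self.previous[token] = outcome

    def clear(self):
        self.previous.clear()


def run(sentences):
    """Return a snapshot of the adaptive state visible before each sentence.

    The state is cleared whenever a new document starts, so nothing
    learned in one document leaks into the next.
    """
    adaptive = AdaptiveData()
    visible = []
    for tokens, outcomes in sentences:
        if tokens == [DOC_START]:
            adaptive.clear()
            continue
        visible.append(dict(adaptive.previous))
        adaptive.update(tokens, outcomes)
    return visible


sentences = [
    ([DOC_START], ["other"]),
    (["Merkel", "spricht"], ["person-start", "other"]),
    (["Merkel", "lacht"], ["person-start", "other"]),
    ([DOC_START], ["other"]),
    (["Berlin", "feiert"], ["other", "other"]),
]
visible = run(sentences)
```

In a real evaluation the update would use the outcomes predicted by the model; the gold outcomes are used here only to keep the sketch short.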

Thanks,
Jörn

Re: German NER performance on CONLL03

Posted by Jörn Kottmann <ko...@gmail.com>.
One issue I observed is that the previous map could work much better.

In German person names the word "von" (from) can appear, but it often occurs
again in the article without being part of a name. That reduces the weight
of the previous map features.

I also experimented with combining the previous map features with the
context, e.g. the word before or the word after.

I think having a list of tokens which should not be tracked by the previous
map will usually help to improve its performance.
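
A minimal sketch of such a filtered previous map, assuming a simple token-to-last-outcome dictionary. The class name `FilteredPreviousMap`, the `pd=` and `pw=` feature prefixes, and the outcome labels are hypothetical and chosen for illustration only; this is not the existing OpenNLP feature generator.

```python
class FilteredPreviousMap:
    """Remembers the last outcome per token, except for ignored tokens."""

    def __init__(self, ignored_tokens=()):
        # Tokens such as "von" that should never be tracked.
        self.ignored = {t.lower() for t in ignored_tokens}
        self.previous = {}

    def update(self, tokens, outcomes):
        for token, outcome in zip(tokens, outcomes):
            if token.lower() not in self.ignored:
                self.previous[token] = outcome

    def create_features(self, tokens, index, with_context=False):
        token = tokens[index]
        if token.lower() in self.ignored:
            return []  # no previous-map feature for ignored tokens
        feature = "pd=" + self.previous.get(token, "null")
        features = [feature]
        if with_context and index > 0:
            # Optional combination with the word before.
            features.append(feature + ",pw=" + tokens[index - 1])
        return features

    def clear(self):
        self.previous.clear()


pm = FilteredPreviousMap(ignored_tokens=["von"])
pm.update(["Otto", "von", "Bismarck"],
          ["person-start", "person-cont", "person-cont"])

later = ["Haus", "von", "Otto"]
print(pm.create_features(later, 1))  # "von" is ignored
print(pm.create_features(later, 2, with_context=True))
```

Whether the ignore list should be a fixed stop word list or derived from the training data is left open here.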

Jörn

Re: German NER performance on CONLL03

Posted by Jason Baldridge <ja...@gmail.com>.
Before starting to make significant changes just based on the numbers, it
would be very useful to see the system output side by side with the gold
annotations to see if there are clear patterns in the errors.

-Jason

-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: German NER performance on CONLL03

Posted by Jörn Kottmann <ko...@gmail.com>.
On 9/1/11 4:50 PM, william.colen@gmail.com wrote:
> Maybe you need some language specific features. I just evaluated the
> Portuguese proper name finder with the default OpenNLP features and got the
> following:
>
>
> Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
>         TOTAL: precision:   86,68%;  recall:   87,21%; F1:   86,94%.
>          prop: precision:   86,68%;  recall:   87,21%; F1:   86,94%. [target: 26462; tp: 23077; fp: 3546]
>
> A friend of mine is working directly with Maxent and got better results
> because he is using specific features he developed for Portuguese. But it is
> really difficult to tune it.

I am still not sure how the feature generation should be modified. These
papers suggest that using prefix and suffix features helps, and we already
have such feature generators. When I use these, both recall and precision
go up a little: I now get 85% precision and 44% recall, but I would still
like to get a much higher recall, somewhere in the range of 70% or even 80%.
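
To illustrate what prefix and suffix features look like, here is a minimal sketch. The function name `affix_features`, the `pre=`/`suf=` feature names, and the default maximum length of 4 characters are assumptions for this example and are not taken from the existing feature generators.

```python
def affix_features(token, max_length=4):
    """Return prefix and suffix features for one token.

    For each length n from 1 up to max_length (or the token length,
    whichever is smaller) one prefix and one suffix feature is emitted.
    """
    features = []
    for n in range(1, min(max_length, len(token)) + 1):
        features.append("pre=" + token[:n])
        features.append("suf=" + token[-n:])
    return features


print(affix_features("Kottmann", max_length=2))
# ['pre=K', 'suf=n', 'pre=Ko', 'suf=nn']
```

Suffixes such as "-mann" or "-berg" are the kind of cue these features could pick up for German surnames that never occur in the training data.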

Some also use trigger words or other dictionaries; I am not sure whether
that helps much.
Maybe compound noun splitting helps, not sure.

Or should I try to use a topic model, like they do in more modern NERs?

Jörn

Re: German NER performance on CONLL03

Posted by "william.colen@gmail.com" <wi...@gmail.com>.
Hi Jörn,

On Thu, Sep 1, 2011 at 10:45 AM, Jörn Kottmann <ko...@gmail.com> wrote:

> Hi All,
>
> I did a little testing with the German CONLL03 data, we only
> get a recall of around 38% and a precision of 82% on the
> development data for person names.
>
> I wonder what we are doing wrong here, that the numbers are
> so bad compared to other systems which participated back then and
> get a similar precision but much higher recall.
>
> Is the lack of lemma and pos features causing this? Or could it
> be something else?
>
> These guys have a much better recall, and also use a maxent based
> system:
> http://www.cnts.ua.ac.be/conll2003/pdf/18083kle.pdf
>
> Any ideas what could be done to improve our name finder?
>
> Jörn
>

Maybe you need some language-specific features. I just evaluated the
Portuguese proper name finder with the default OpenNLP features and got the
following:


Evaluated 56994 samples with 26462 entities; found: 26623 entities; correct: 23077.
       TOTAL: precision:   86,68%;  recall:   87,21%; F1:   86,94%.
        prop: precision:   86,68%;  recall:   87,21%; F1:   86,94%. [target: 26462; tp: 23077; fp: 3546]
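
For reference, the percentages follow directly from the three counts in the output. A short sketch of the arithmetic, using the standard definitions of precision, recall and F1 (the function name `prf` is made up for this example):

```python
def prf(target, found, correct):
    """Compute precision, recall and F1 from entity counts.

    target:  number of entities in the gold data
    found:   number of entities the system predicted
    correct: number of predicted entities that match the gold data
    """
    precision = correct / found
    recall = correct / target
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = prf(target=26462, found=26623, correct=23077)
print(f"precision: {precision:.2%}  recall: {recall:.2%}  F1: {f1:.2%}")
# precision: 86.68%  recall: 87.21%  F1: 86.94%

false_positives = 26623 - 23077  # matches "fp: 3546" in the output
```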

A friend of mine is working directly with Maxent and got better results
because he is using specific features he developed for Portuguese. But it is
really difficult to tune it.

William