Posted to users@opennlp.apache.org by Nicolas Hernandez <ni...@gmail.com> on 2011/10/12 14:36:48 UTC

Name Finder and chunker training format

Hi Everyone,

Looking at the Name Finder and the chunker tools, I wonder why they
do not use the same training format?

For example, this

Mr. <START:person> Pierre Vinken <END> is chairman

may also be represented like this

Mr. NNP O
Pierre NNP B-person
Vinken NNP I-person
is VBZ O
chairman NN O
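
The correspondence between the two layouts above can be sketched in a few lines of Python. This is only an illustration, not OpenNLP code: the function name name_finder_to_bio is made up for this example, the input is assumed to be whitespace-tokenized, and an untyped <START> tag is assumed to map to the type "default". The name finder format carries no POS tags, so the sketch emits only the token and the B/I/O tag, without the middle column.

```python
import re

# Matches an opening tag such as "<START:person>" and captures the type.
# An untyped "<START>" is also accepted (handling it this way is an assumption).
START_TAG = re.compile(r"^<START(?::([^>]+))?>$")
END_TAG = "<END>"


def name_finder_to_bio(sentence):
    """Convert one annotated, whitespace-tokenized sentence into a list of
    (token, tag) pairs using B-/I-/O tags."""
    pairs = []
    current_type = None    # type of the currently open span, None outside spans
    first_in_span = False  # True until the first token of a span is emitted
    for token in sentence.split():
        match = START_TAG.match(token)
        if match:
            current_type = match.group(1) or "default"
            first_in_span = True
        elif token == END_TAG:
            current_type = None
        elif current_type is None:
            pairs.append((token, "O"))
        else:
            prefix = "B" if first_in_span else "I"
            pairs.append((token, f"{prefix}-{current_type}"))
            first_in_span = False
    return pairs


if __name__ == "__main__":
    example = "Mr. <START:person> Pierre Vinken <END> is chairman"
    for token, tag in name_finder_to_bio(example):
        print(token, tag)
```

Going the other way (from tagged tokens back to <START:...> spans) would need the same bookkeeping in reverse; the POS column would have to come from a separate tagger in either case.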

I have noted that the Name Finder API makes it possible to customize
the feature generation used for training, but both the Name
Finder and the chunker use the same implementation of the learning
algorithm, don't they?

/Nicolas

Re: Name Finder and chunker training format

Posted by Nicolas Hernandez <ni...@gmail.com>.

Ok thanks


Re: Name Finder and chunker training format

Posted by Jörn Kottmann <ko...@gmail.com>.

On 10/12/11 2:36 PM, Nicolas Hernandez wrote:
> Looking at the Name Finder and the chunker tools, I wonder why they
> do not use the same training format?
>
> For example, this
>
> Mr. <START:person> Pierre Vinken <END> is chairman
>
> may also be represented like this
>
> Mr. NNP O
> Pierre NNP B-person
> Vinken NNP I-person
> is VBZ O
> chairman NN O
>
> I have noted that the Name Finder API makes it possible to customize
> the feature generation used for training, but both the Name
> Finder and the chunker use the same implementation of the learning
> algorithm, don't they?

That has historical reasons: the name finder development was inspired by
the MUC shared tasks, and the chunker development was inspired by the
CoNLL-2000 shared task.

The implementations are actually different, and the biggest difference
is the way features are generated. The chunker can use POS tags, and the
name finder cannot.
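
To make the difference concrete, here is a toy Python sketch of window-based feature generation with and without POS tags. It is purely illustrative: context_features, the feature string layout, and the +/-1 window are assumptions made for this example and do not reproduce the actual OpenNLP feature generators.

```python
def context_features(tokens, i, pos_tags=None):
    """Return string features for position i from a +/-1 token window.

    Word features are always produced; tag features are added only when
    pos_tags is supplied (the chunker-style case described above).
    """
    features = []
    for offset in (-1, 0, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            features.append(f"w[{offset}]={tokens[j].lower()}")
            if pos_tags is not None:
                features.append(f"t[{offset}]={pos_tags[j]}")
    return features


tokens = ["Mr.", "Pierre", "Vinken", "is", "chairman"]
pos_tags = ["NNP", "NNP", "NNP", "VBZ", "NN"]

# Name-finder-style input: tokens only.
print(context_features(tokens, 1))
# Chunker-style input: tokens plus POS tags.
print(context_features(tokens, 1, pos_tags))
```

With tokens alone, the classifier sees only word identities around "Pierre"; with POS tags it also sees that the neighbouring tokens are tagged NNP.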

We have plans to use the feature generation framework which was created
for the name finder in the POS tagger and chunker as well.

Anyway, the reason we have different components for sequence tagging is
that it is easier to integrate them if there is one component per task.

Everything in OpenNLP uses maxent or perceptron, yes.

Jörn