Posted to users@opennlp.apache.org by Nicolas Hernandez <ni...@gmail.com> on 2011/10/12 14:36:48 UTC
Name Finder and chunker training format
Hi Everyone,
Looking at the Name Finder and the chunker tools, I wonder why they
do not use the same training format.
For example, this
Mr. <START:person> Pierre Vinken <END> is chairman
may also be represented like this
Mr. NNP O
Pierre NNP B-person
Vinken NNP I-person
is VBZ O
chairman NN O
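To make the equivalence concrete, here is a minimal sketch of converting one line of the name finder format into token-per-line B/I/O rows. The function name `name_finder_to_bio` and the `_` placeholder are made up for this illustration and are not part of OpenNLP; the name finder format carries no POS tags, so they have to be supplied separately.

```python
import re

# Matches an opening tag such as <START:person>; the entity type is captured.
START_TAG = re.compile(r"^<START:([^>]+)>$")
END_TAG = "<END>"


def name_finder_to_bio(line, pos_tags=None):
    """Convert one whitespace-tokenized name-finder-style line into
    (token, pos, bio_tag) rows.

    pos_tags, if given, must hold one tag per real token (hypothetical
    input; the name finder format itself has no POS column). When it is
    omitted, "_" is used as a placeholder.
    """
    rows = []
    entity_type = None      # type of the span we are currently inside
    first_in_span = False   # True until the first token of a span is emitted
    for item in line.split():
        match = START_TAG.match(item)
        if match:
            entity_type = match.group(1)
            first_in_span = True
            continue
        if item == END_TAG:
            entity_type = None
            continue
        if entity_type is None:
            tag = "O"
        else:
            tag = ("B-" if first_in_span else "I-") + entity_type
            first_in_span = False
        pos = pos_tags[len(rows)] if pos_tags else "_"
        rows.append((item, pos, tag))
    return rows


rows = name_finder_to_bio(
    "Mr. <START:person> Pierre Vinken <END> is chairman",
    pos_tags=["NNP", "NNP", "NNP", "VBZ", "NN"],
)
for token, pos, tag in rows:
    print(token, pos, tag)
```

Run on the sentence above with the POS tags passed in, this should print the same five rows as the token-per-line example.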
I have noted that the Name Finder API lets you customize the feature
generation used for training, but both the Name Finder and the chunker
use the same implementation of the learning algorithm, don't they?
/Nicolas
Re: Name Finder and chunker training format
Posted by Nicolas Hernandez <ni...@gmail.com>.
Ok thanks
On Wed, Oct 12, 2011 at 2:46 PM, Jörn Kottmann <ko...@gmail.com> wrote:
> On 10/12/11 2:36 PM, Nicolas Hernandez wrote:
>>
>> Looking at the Name Finder and the chunker tools, I wonder why they
>> do not use the same training format.
>>
>> For example, this
>>
>> Mr. <START:person> Pierre Vinken <END> is chairman
>>
>> may also be represented like this
>>
>> Mr. NNP O
>> Pierre NNP B-person
>> Vinken NNP I-person
>> is VBZ O
>> chairman NN O
>>
>> I have noted that the Name Finder API lets you customize the feature
>> generation used for training, but both the Name Finder and the chunker
>> use the same implementation of the learning algorithm, don't they?
>
> The reasons are historical: the name finder development was inspired by
> the MUC shared tasks, and the chunker development was inspired by the
> CoNLL 2000 shared task.
>
> The implementations are actually different, and the biggest difference is
> the way features are generated. The chunker can use POS tags, and the name
> finder cannot.
>
> We plan to use the feature generation framework, which was created for
> the name finder, in the POS tagger and chunker as well.
>
> Anyway, the reason we have different components for sequence tagging is
> that it is easier to integrate them if there is one component per task.
>
> Everything in OpenNLP uses maxent or perceptron, yes.
>
> Jörn
>
Re: Name Finder and chunker training format
Posted by Jörn Kottmann <ko...@gmail.com>.
On 10/12/11 2:36 PM, Nicolas Hernandez wrote:
> Looking at the Name Finder and the chunker tools, I wonder why they
> do not use the same training format.
>
> For example, this
>
> Mr. <START:person> Pierre Vinken <END> is chairman
>
> may also be represented like this
>
> Mr. NNP O
> Pierre NNP B-person
> Vinken NNP I-person
> is VBZ O
> chairman NN O
>
> I have noted that the Name Finder API lets you customize the feature
> generation used for training, but both the Name Finder and the chunker
> use the same implementation of the learning algorithm, don't they?
The reasons are historical: the name finder development was inspired by
the MUC shared tasks, and the chunker development was inspired by the
CoNLL 2000 shared task.
The implementations are actually different, and the biggest difference
is the way features are generated. The chunker can use POS tags, and the
name finder cannot.
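As a rough illustration of that difference (a sketch only, not OpenNLP's actual feature set), compare a feature function that sees just the words with one that also sees POS tags. The function names and feature strings below are hypothetical.

```python
def token_features(tokens, i):
    """Features built from the words alone, as available to a tagger
    that receives no POS input. Hypothetical feature names."""
    prev_tok = tokens[i - 1] if i > 0 else "<bos>"
    next_tok = tokens[i + 1] if i + 1 < len(tokens) else "<eos>"
    return [
        "w=" + tokens[i].lower(),
        "prev_w=" + prev_tok.lower(),
        "next_w=" + next_tok.lower(),
        "cap=" + str(tokens[i][:1].isupper()),
    ]


def token_and_pos_features(tokens, pos_tags, i):
    """The same word features plus POS-tag context, as available to a
    tagger whose input includes a POS column."""
    prev_pos = pos_tags[i - 1] if i > 0 else "<bos>"
    next_pos = pos_tags[i + 1] if i + 1 < len(pos_tags) else "<eos>"
    return token_features(tokens, i) + [
        "t=" + pos_tags[i],
        "prev_t=" + prev_pos,
        "next_t=" + next_pos,
    ]


tokens = ["Mr.", "Pierre", "Vinken", "is", "chairman"]
pos_tags = ["NNP", "NNP", "NNP", "VBZ", "NN"]
print(token_features(tokens, 1))
print(token_and_pos_features(tokens, pos_tags, 1))
```

The learning algorithm downstream can be identical in both cases; only the list of feature strings handed to it changes.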
We plan to use the feature generation framework, which was created for
the name finder, in the POS tagger and chunker as well.
Anyway, the reason we have different components for sequence tagging is
that it is easier to integrate them if there is one component per task.
Everything in OpenNLP uses maxent or perceptron, yes.
Jörn