Posted to users@opennlp.apache.org by Rodrigo Agerri <ro...@ehu.es> on 2012/12/19 10:56:42 UTC

[opennlp-users] NameFinder convert formats for Italian (evalita) training corpus

Hi,

I needed to train Italian models for NER, and I used the Evalita 07 and
Evalita 09 corpora:

http://www.evalita.it/

The Evalita NER 07 and 09 formats are based on CoNLL 2003, except that
instead of a MISC class they use a GPE (geopolitical entity) class.

To convert the dataset format to OpenNLP training data, I created two
classes, which I attach in case you think they are useful for the
project.
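
For readers who cannot see the attachment, the kind of conversion
described above can be sketched roughly as follows. This is an
illustrative sketch, not the attached classes. It assumes that each
input line holds whitespace-separated columns with the token first and
an IOB2 tag (B-PER, I-GPE, O, ...) last, and that a blank line ends a
sentence. The class name EvalitaToOpenNlp and the sample sentence are
made up for illustration. The output uses the OpenNLP name finder
training format: one sentence per line, with entities wrapped in
<START:type> ... <END>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of OpenNLP and not the attached code.
public class EvalitaToOpenNlp {

    /**
     * Converts CoNLL-style lines into OpenNLP name finder training lines.
     * Assumption: token is the first column, the IOB2 tag is the last one,
     * and a blank line marks a sentence boundary.
     */
    public static List<String> convert(List<String> lines) {
        List<String> sentences = new ArrayList<>();
        List<String> parts = new ArrayList<>();
        String openType = null; // type of the entity currently open, if any

        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                // Sentence boundary: close any open entity and emit the sentence.
                if (openType != null) {
                    parts.add("<END>");
                    openType = null;
                }
                if (!parts.isEmpty()) {
                    sentences.add(String.join(" ", parts));
                    parts.clear();
                }
                continue;
            }

            String[] fields = line.split("\\s+");
            String token = fields[0];
            String tag = fields[fields.length - 1];
            boolean begin = tag.startsWith("B-");
            boolean inside = tag.startsWith("I-");
            String type = (begin || inside) ? tag.substring(2) : null;

            // An I- tag continues the open entity only if the types match.
            boolean continues = inside && type.equals(openType);

            if (openType != null && !continues) {
                parts.add("<END>");
                openType = null;
            }
            if ((begin || inside) && !continues) {
                // Also tolerates an I- tag that appears without a preceding B-.
                parts.add("<START:" + type + ">");
                openType = type;
            }
            parts.add(token);
        }

        // Flush the last sentence if the input does not end with a blank line.
        if (openType != null) {
            parts.add("<END>");
        }
        if (!parts.isEmpty()) {
            sentences.add(String.join(" ", parts));
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Made-up sample in the assumed column layout (token ... tag).
        List<String> sample = Arrays.asList(
                "Mario B-PER", "Rossi I-PER", "vive O", "a O", "Roma B-GPE", "");
        for (String sentence : convert(sample)) {
            System.out.println(sentence);
        }
    }
}
```

The sketch keeps the tag names from the corpus (PER, GPE, ...) as the
OpenNLP entity types; mapping them to longer names such as "person" is a
separate choice.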

Cheers,

Rodrigo

Re: [opennlp-users] NameFinder convert formats for Italian (evalita) training corpus

Posted by Rodrigo Agerri <ro...@ehu.es>.
I forgot to attach the classes :)

R


Re: [opennlp-users] NameFinder convert formats for Italian (evalita) training corpus

Posted by Jörn Kottmann <ko...@gmail.com>.
Hello,

It would be nice if we could add support for this dataset to the
OpenNLP formats package.

Can you please open a Jira issue requesting support for Evalita 07/09
and attach your code to it?

All attachments are removed by our mailing list server, so it's not
possible to share it that way.

Thanks for sharing your code,
Jörn
