Posted to users@opennlp.apache.org by Juan Manuel Caicedo Carvajal <ju...@cavorite.com> on 2012/02/02 23:02:26 UTC

Spanish trained models for POS tagging

Hello everyone,

I trained POS tagging models for Spanish using the CoNLL data [1].

I created two versions using different model types (perceptron and
maxent), and I also created versions of the models using the universal
Part-of-Speech Tags [2].

I uploaded the files to my server. You can read more details here,
including the evaluation results:

http://cavorite.com/labs/nlp/opennlp-models-es/

And the files are here:

http://files.cavorite.com/projects/opennlp-models-es/ner/models/


Feel free to host them on the OpenNLP website and do not hesitate to
send me your questions or comments.

Cheers,

Juan Manuel Caicedo

[1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
[2] http://code.google.com/p/universal-pos-tags/
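
Editorial illustration: producing the universal-tag versions mentioned
above amounts to replacing each fine-grained tag with a coarse one. The
sketch below is a minimal example under stated assumptions. The
fine-grained tags and their coarse equivalents in EXAMPLE_MAP are
illustrative guesses, not the mapping published by the project in [2],
and to_universal is a hypothetical helper, not part of OpenNLP or of the
author's scripts.

```python
# Illustrative only: these fine-grained tags and their coarse equivalents are
# assumptions, NOT the actual mapping from the universal-pos-tags project.
EXAMPLE_MAP = {
    "DA": "DET",
    "NC": "NOUN",
    "VMI": "VERB",
    "AQ": "ADJ",
    "SP": "ADP",
    "Fp": ".",
}


def to_universal(tagged_sentence, mapping, unknown="X"):
    """Replace each fine-grained tag with its coarse tag.

    Tags missing from `mapping` fall back to `unknown` ("X" is the
    universal tag set's catch-all category).
    """
    return [(word, mapping.get(tag, unknown)) for word, tag in tagged_sentence]


# A hypothetical tagged sentence; "VSI" is deliberately absent from the map.
sentence = [("La", "DA"), ("casa", "NC"), ("es", "VSI"), ("grande", "AQ"), (".", "Fp")]
print(to_universal(sentence, EXAMPLE_MAP))
```

Falling back to "X" keeps unmapped tags visible rather than silently
dropping tokens, which makes gaps in a mapping easy to spot.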

Re: Spanish trained models for POS tagging

Posted by Jason Baldridge <ja...@gmail.com>.
Thanks much! I merged the pull request yesterday.

On Mon, Apr 9, 2012 at 6:40 PM, Juan Manuel Caicedo Carvajal <
juan@cavorite.com> wrote:

> Hello,
>
> I finally made the pull request that includes the models for the POS
> tagger for Spanish.
>
> I created the models using their original tags and also the universal
> POS tags. For each tag set I trained two models: one using maxent and
> the other using perceptron.
>
> The pull request contains the models and the scripts that I used to train
> them:
>
> https://github.com/utcompling/OpenNLP-Models/pull/1
>
> Cheers,
>
> Juan Manuel Caicedo



-- 
Jason Baldridge
Associate Professor, Department of Linguistics
The University of Texas at Austin
http://www.jasonbaldridge.com
http://twitter.com/jasonbaldridge

Re: Spanish trained models for POS tagging

Posted by Juan Manuel Caicedo Carvajal <ju...@cavorite.com>.
Hello,

I finally made the pull request that includes the models for the POS
tagger for Spanish.

I created the models using their original tags and also the universal
POS tags. For each tag set I trained two models: one using maxent and
the other using perceptron.

The pull request contains the models and the scripts that I used to train them:

https://github.com/utcompling/OpenNLP-Models/pull/1

Cheers,

Juan Manuel Caicedo

On Mon, Feb 13, 2012 at 12:47 PM, Jason Baldridge
<ja...@gmail.com> wrote:
>
>
> On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal
> <ju...@cavorite.com> wrote:
>>
>> (Sorry for the late reply)
>>
>> I just cloned the repository and I'll add the scripts I used to
>> convert the input files and to train the models. This afternoon I'll
>> put them together in a pull request.
>>
>
> Great!
>
>>
>> Should we keep a copy of the training data in GitHub? I think it could
>> be useful for retraining the models, and it would also help in case
>> the original files are no longer available (e.g. 404 errors).
>> Otherwise, would it be enough to include links to those files?
>>
> It depends on whether it is legal to do so. For example, the Norwegian data
> used to train the models there cannot be distributed. If it is fine to have
> it and the corpus isn't too massive, then it might make sense.
>
>
>>
>> I also have a script for generating a Maven repository for the models.
>> The GitHub project could also be used for hosting that repository.
>> What do you think?
>>
>
> +1 Sounds interesting, so if you want to set that up, it sounds good to me.
>
> -Jason

Re: Spanish trained models for POS tagging

Posted by Jason Baldridge <ja...@gmail.com>.
On Thu, Feb 9, 2012 at 7:36 AM, Juan Manuel Caicedo Carvajal <
juan@cavorite.com> wrote:

> (Sorry for the late reply)
>
> I just cloned the repository and I'll add the scripts I used to
> convert the input files and to train the models. This afternoon I'll
> put them together in a pull request.
>
>
Great!


> Should we keep a copy of the training data in GitHub? I think it could
> be useful for retraining the models, and it would also help in case
> the original files are no longer available (e.g. 404 errors).
> Otherwise, would it be enough to include links to those files?
>

It depends on whether it is legal to do so. For example, the Norwegian
data used to train the models there cannot be distributed. If it is fine to
have it and the corpus isn't too massive, then it might make sense.



> I also have a script for generating a Maven repository for the models.
> The GitHub project could also be used for hosting that repository.
> What do you think?
>
>
+1 Sounds interesting, so if you want to set that up, it sounds good to me.

-Jason


Re: Spanish trained models for POS tagging

Posted by Juan Manuel Caicedo Carvajal <ju...@cavorite.com>.
(Sorry for the late reply)

I just cloned the repository and I'll add the scripts I used to
convert the input files and to train the models. This afternoon I'll
put them together in a pull request.

Should we keep a copy of the training data in GitHub? I think it could
be useful for retraining the models, and it would also help in case
the original files are no longer available (e.g. 404 errors).
Otherwise, would it be enough to include links to those files?

I also have a script for generating a Maven repository for the models.
The GitHub project could also be used for hosting that repository.
What do you think?
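
Editorial illustration: the conversion step mentioned above can be
pictured with the sketch below. It is not the script from the pull
request. It assumes the input is CoNLL-style column text (one token per
line, whitespace-separated columns, word in the first column, POS tag in
the second, a blank line between sentences) and writes the
one-sentence-per-line word_tag format that the OpenNLP POS tagger
trainer reads. The function name, the column positions, and the sample
rows are all hypothetical.

```python
def conll_to_opennlp(lines, word_col=0, tag_col=1):
    """Yield one 'word_tag word_tag ...' string per sentence.

    `lines` is any iterable of CoNLL-style rows. The column indices are
    assumptions and may need adjusting for a given corpus file.
    """
    tokens = []
    for line in lines:
        fields = line.split()
        if not fields:
            # A blank line closes the current sentence.
            if tokens:
                yield " ".join(tokens)
                tokens = []
            continue
        tokens.append(f"{fields[word_col]}_{fields[tag_col]}")
    # Flush the last sentence if the input does not end with a blank line.
    if tokens:
        yield " ".join(tokens)


# Hypothetical sample rows: word, POS tag, and a third column that is ignored.
sample = ["El DA O", "abogado NC O", "", "Hola I O"]
for sentence in conll_to_opennlp(sample):
    print(sentence)
```

Words that themselves contain an underscore would be ambiguous in this
output, so a real conversion script would need to decide how to handle
them.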

On Thu, Feb 2, 2012 at 7:50 PM, Jason Baldridge
<ja...@gmail.com> wrote:
> That's great! Would you be interested in contributing code and/or data to
> the OpenNLP Models repo?
>
> https://github.com/utcompling/OpenNLP-Models

Re: Spanish trained models for POS tagging

Posted by Jason Baldridge <ja...@gmail.com>.
That's great! Would you be interested in contributing code and/or data to
the OpenNLP Models repo?

https://github.com/utcompling/OpenNLP-Models


On Thu, Feb 2, 2012 at 4:02 PM, Juan Manuel Caicedo Carvajal <
juan@cavorite.com> wrote:

> Hello everyone,
>
> I trained POS tagging models for Spanish using the CoNLL data [1].
>
> I created two versions using different model types (perceptron and
> maxent), and I also created versions of the models using the universal
> Part-of-Speech Tags [2].
>
> I uploaded the files to my server. You can read more details here,
> including the evaluation results:
>
> http://cavorite.com/labs/nlp/opennlp-models-es/
>
> And the files are here:
>
> http://files.cavorite.com/projects/opennlp-models-es/ner/models/
>
>
> Feel free to host them on the OpenNLP website and do not hesitate to
> send me your questions or comments.
>
> Cheers,
>
> Juan Manuel Caicedo
>
> [1] http://www.lsi.upc.edu/~nlp/tools/nerc/nerc.html
> [2] http://code.google.com/p/universal-pos-tags/
>


