Posted to users@opennlp.apache.org by Charles Jalin <ch...@gmail.com> on 2014/03/19 10:25:51 UTC

Models in spanish

Hello,

I am new to OpenNLP and need to use it in a Spanish project. Are there
models for Spanish?

Thanks for your attention.

Regards.

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

I have now also committed the second part of this issue. The parser trainer
tool now has an option to specify the Artifact Serializer class that should
be used for the Head Rules.

With this change it should be possible to train a Spanish parser model
without any custom code.

Jörn

On 03/22/2014 11:00 AM, Rodrigo Agerri wrote:
> Hi Jörn,
>
> This is great, thanks very much, in the email I meant that before thinking
> about me contributing something else I had hoped to help solve the casting issue, but
> it has been fixed at lightning speed, so thanks again.
>
> I will try now with the Spanish head rules class I added.
>
> Rodrigo
>
> On 2014/03/21 at 11:37, Joern Kottmann wrote:
>> On 03/19/2014 02:50 PM, Rodrigo Agerri wrote:
>>> About contributing, sure, no problem I will do that in the future, as soon as
>>> the parser language specific casting is corrected:)
>>>
>>> https://issues.apache.org/jira/browse/OPENNLP-665
>> The casting should work now. I was quite busy the last couple of
>> days with other stuff.
>>
>> We still need to improve the command line tool. It should have an
>> option to take
>> the impl class name of the head rule files, and there should be a
>> factory method
>> which can create the default Head Rule object based on the language.
>>
>> I hope I will finish that on the weekend.
>>
>> Jörn

Re: Models in spanish

Posted by Rodrigo Agerri <ag...@gmail.com>.

Hi Jörn, 

This is great, thanks very much. What I meant in my email was that, before
contributing anything else, I had hoped to help solve the casting issue. It
has been fixed at lightning speed, so thanks again.

I will try now with the Spanish head rules class I added. 

Rodrigo

On 2014/03/21 at 11:37, Joern Kottmann wrote:
> On 03/19/2014 02:50 PM, Rodrigo Agerri wrote:
> >About contributing, sure, no problem I will do that in the future, as soon as
> >the parser language specific casting is corrected:)
> >
> >https://issues.apache.org/jira/browse/OPENNLP-665
> 
> The casting should work now. I was quite busy the last couple of
> days with other stuff.
> 
> We still need to improve the command line tool. It should have an
> option to take
> the impl class name of the head rule files, and there should be a
> factory method
> which can create the default Head Rule object based on the language.
> 
> I hope I will finish that on the weekend.
> 
> Jörn

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

On 03/19/2014 02:50 PM, Rodrigo Agerri wrote:
> About contributing, sure, no problem I will do that in the future, as soon as
> the parser language specific casting is corrected:)  
>
> https://issues.apache.org/jira/browse/OPENNLP-665

The casting should work now. I was quite busy the last couple of days
with other stuff.

We still need to improve the command line tool. It should have an option
to take the impl class name of the head rule files, and there should be a
factory method which can create the default Head Rule object based on the
language.

I hope I will finish that on the weekend.

Jörn

Re: Models in spanish

Posted by Rodrigo Agerri <ro...@ehu.es>.

Hi Jörn, 

Yes, I am currently training new models with more features and other corpora
for English and Spanish. 

About contributing: sure, no problem. I will do that in the future, as soon as
the parser's language-specific casting is corrected :)

https://issues.apache.org/jira/browse/OPENNLP-665

Cheers, 

Rodrigo

On 2014/03/19 at 14:31, Joern Kottmann wrote:
> I had a short look at the paper. For English NER you might want in
> addition to publish OntoNotes models. There is format support for that
> in OpenNLP.
> 
> Maybe it could be interesting for you to contribute the work you did
> on the tokenization
> or coref component to OpenNLP.
> 
> Jörn
> 
> On 03/19/2014 01:59 PM, Rodrigo Agerri wrote:
> >Hi,
> >
> >We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora) with evaluations
> >etc and so on as part of the IXA pipeline tools.
> >
> >We also have tokenizer (tried opennlp models and were not adaptable enough)
> >based on JFlex specification. Coreference resolution (loosely based on Stanford
> >NLP approach) coming very soon (for May).
> >
> >More info here:
> >
> >http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> >
> >Thanks,
> >
> >Rodrigo
> >
> >On 2014/03/19 at 12:39, Charles Jalin wrote:
> >>For tokenizer, sentence, pos tagger y tokchunk.
> >>
> >>I amn't sure that i can obtain Spanish corpora.
> >>
> >>Thanks.
> >>
> >>
> >>2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> >>
> >>>On 03/19/2014 12:01 PM, Charles Jalin wrote:
> >>>
> >>>>How i do this?
> >>>>
> >>>>
> >>>>
> >>>Depends on the model. For which component?
> >>>
> >>>Anyway, the best way to improve the situation would be to
> >>>add support to OpenNLP to train it on the available Spanish corpora.
> >>>
> >>>Jörn
> >>>
>

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

Many thanks for your help.
It is a very good starting point.

Regards.


2014-03-19 17:53 GMT+01:00 Rodrigo Agerri <ro...@ehu.es>:

> Hi Charles,
>
> The pos tagger, ner (NameFind) and Parsing models are Apache OpenNLP 1.5.3
> models. The ixa pipe tools mentioned add various others functionalities
> but the
> one thing they all use is the OpenNLP model objects for each tool, namely,
> Parse, TokenNameFinderModel, and POSModel.
>
> Therefore, you can take the models provided in the ixa pipes tools and they
> will work with opennlp CLI or their API. For example, you can easily try
> the parser model
> from the CLI by passing it a txt with a tokenize sentence.
>
> bin/opennlp Parser es-ancora-parsing.bin < test.txt
>
> HTH
>
> Rodrigo
>
> On 2014/03/19 at 17:36, Charles Jalin wrote:
> > Very thanks for your work.It is very interesting.
> >
> > I am newbie in opennlp and, how can i change this models to opennlp
> format?
> >
> > Regards.
> >
> >
> >
> > 2014-03-19 14:41 GMT+01:00 Rodrigo Agerri <ag...@gmail.com>:
> >
> > > Hi,
> > >
> > > Yes, I need to add the info and releases to that page. In the meantime
> you
> > > can
> > > check up the github repos:
> > >
> > > https://github.com/ixa-ehu/ixa-pipe-tok
> > > https://github.com/ixa-ehu/ixa-pipe-pos
> > > https://github.com/ixa-ehu/ixa-pipe-nerc
> > > https://github.com/ixa-ehu/ixa-pipe-parse
> > >
> > > Cheers,
> > >
> > > Rodrigo
> > >
> > > On 2014/03/19 at 14:28, Richard Eckart de Castilho wrote:
> > > > Hello Rodrigo,
> > > >
> > > > do you have a link to a page from where the IXA tools or at least
> their
> > > models
> > > > can be obtained? The link mentioned in the paper doesn't seem to
> have any
> > > > substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes
> > > >
> > > > Cheers,
> > > >
> > > > -- Richard
> > > >
> > > > On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora)
> > > with evaluations
> > > > > etc and so on as part of the IXA pipeline tools.
> > > > >
> > > > > We also have tokenizer (tried opennlp models and were not adaptable
> > > enough)
> > > > > based on JFlex specification. Coreference resolution (loosely
> based on
> > > Stanford
> > > > > NLP approach) coming very soon (for May).
> > > > >
> > > > > More info here:
> > > > >
> > > > >
> > >
> http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Rodrigo
> > > > >
> > > > > On 2014/03/19 at 12:39, Charles Jalin wrote:
> > > > >> For tokenizer, sentence, pos tagger y tokchunk.
> > > > >>
> > > > >> I amn't sure that i can obtain Spanish corpora.
> > > > >>
> > > > >> Thanks.
> > > > >>
> > > > >>
> > > > >> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> > > > >>
> > > > >>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
> > > > >>>
> > > > >>>> How i do this?
> > > > >>>>
> > > > >>>>
> > > > >>>>
> > > > >>> Depends on the model. For which component?
> > > > >>>
> > > > >>> Anyway, the best way to improve the situation would be to
> > > > >>> add support to OpenNLP to train it on the available Spanish
> corpora.
> > > > >>>
> > > > >>> Jörn
> > >
>

Re: Models in spanish

Posted by Rodrigo Agerri <ro...@ehu.es>.

Hi Charles, 

The POS tagger, NER (name finder) and parsing models are Apache OpenNLP 1.5.3
models. The ixa pipes tools mentioned add various other functionalities, but
the one thing they all use is the OpenNLP model objects for each tool, namely
ParserModel, TokenNameFinderModel, and POSModel.

Therefore, you can take the models provided in the ixa pipes tools and they
will work with the OpenNLP CLI or its API. For example, you can easily try the
parser model from the CLI by passing it a text file with a tokenized sentence.

bin/opennlp Parser es-ancora-parsing.bin < test.txt
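
To prepare test.txt for the command above, the input should hold
whitespace-separated tokens, one sentence per line. The following is only a
minimal Python sketch under stated assumptions: the regex tokenizer and the
names tokenize_line and write_parser_input are hypothetical, not part of
OpenNLP or the ixa pipes tools, and the resulting tokenization may differ from
the one the parser model was trained on.

```python
import re

# Hypothetical stand-in tokenizer: separates word characters from punctuation.
# It is NOT OpenNLP's tokenizer; it only illustrates the expected input layout.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def tokenize_line(sentence):
    """Return the sentence as whitespace-separated tokens on a single line."""
    return " ".join(TOKEN_PATTERN.findall(sentence))


def write_parser_input(sentences, path):
    """Write one tokenized sentence per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as handle:
        for sentence in sentences:
            handle.write(tokenize_line(sentence) + "\n")
```

Calling write_parser_input(["El gato duerme en la casa."], "test.txt") would
produce a file that can be redirected into the parser command. For real use, a
tokenizer matching the training data (for example ixa-pipe-tok) is probably
the better choice.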

HTH 

Rodrigo

On 2014/03/19 at 17:36, Charles Jalin wrote:
> Very thanks for your work.It is very interesting.
> 
> I am newbie in opennlp and, how can i change this models to opennlp format?
> 
> Regards.
> 
> 
> 
> 2014-03-19 14:41 GMT+01:00 Rodrigo Agerri <ag...@gmail.com>:
> 
> > Hi,
> >
> > Yes, I need to add the info and releases to that page. In the meantime you
> > can
> > check up the github repos:
> >
> > https://github.com/ixa-ehu/ixa-pipe-tok
> > https://github.com/ixa-ehu/ixa-pipe-pos
> > https://github.com/ixa-ehu/ixa-pipe-nerc
> > https://github.com/ixa-ehu/ixa-pipe-parse
> >
> > Cheers,
> >
> > Rodrigo
> >
> > On 2014/03/19 at 14:28, Richard Eckart de Castilho wrote:
> > > Hello Rodrigo,
> > >
> > > do you have a link to a page from where the IXA tools or at least their
> > models
> > > can be obtained? The link mentioned in the paper doesn't seem to have any
> > > substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes
> > >
> > > Cheers,
> > >
> > > -- Richard
> > >
> > > On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com>
> > wrote:
> > >
> > > > Hi,
> > > >
> > > > We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora)
> > with evaluations
> > > > etc and so on as part of the IXA pipeline tools.
> > > >
> > > > We also have tokenizer (tried opennlp models and were not adaptable
> > enough)
> > > > based on JFlex specification. Coreference resolution (loosely based on
> > Stanford
> > > > NLP approach) coming very soon (for May).
> > > >
> > > > More info here:
> > > >
> > > >
> > http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> > > >
> > > > Thanks,
> > > >
> > > > Rodrigo
> > > >
> > > > On 2014/03/19 at 12:39, Charles Jalin wrote:
> > > >> For tokenizer, sentence, pos tagger y tokchunk.
> > > >>
> > > >> I amn't sure that i can obtain Spanish corpora.
> > > >>
> > > >> Thanks.
> > > >>
> > > >>
> > > >> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> > > >>
> > > >>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
> > > >>>
> > > >>>> How i do this?
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>> Depends on the model. For which component?
> > > >>>
> > > >>> Anyway, the best way to improve the situation would be to
> > > >>> add support to OpenNLP to train it on the available Spanish corpora.
> > > >>>
> > > >>> Jörn
> >

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

Sorry, I have now seen the bin files for POS and parsing. Thanks.

But there aren't models for the tokenizer, sentence detector and chunker.

Many thanks for your help.

Regards.



2014-03-19 17:36 GMT+01:00 Charles Jalin <ch...@gmail.com>:

> Very thanks for your work.It is very interesting.
>
> I am newbie in opennlp and, how can i change this models to opennlp format?
>
> Regards.
>
>
>
> 2014-03-19 14:41 GMT+01:00 Rodrigo Agerri <ag...@gmail.com>:
>
> Hi,
>>
>> Yes, I need to add the info and releases to that page. In the meantime
>> you can
>> check up the github repos:
>>
>> https://github.com/ixa-ehu/ixa-pipe-tok
>> https://github.com/ixa-ehu/ixa-pipe-pos
>> https://github.com/ixa-ehu/ixa-pipe-nerc
>> https://github.com/ixa-ehu/ixa-pipe-parse
>>
>> Cheers,
>>
>> Rodrigo
>>
>> On 2014/03/19 at 14:28, Richard Eckart de Castilho wrote:
>> > Hello Rodrigo,
>> >
>> > do you have a link to a page from where the IXA tools or at least their
>> models
>> > can be obtained? The link mentioned in the paper doesn't seem to have
>> any
>> > substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes
>> >
>> > Cheers,
>> >
>> > -- Richard
>> >
>> > On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com>
>> wrote:
>> >
>> > > Hi,
>> > >
>> > > We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora)
>> with evaluations
>> > > etc and so on as part of the IXA pipeline tools.
>> > >
>> > > We also have tokenizer (tried opennlp models and were not adaptable
>> enough)
>> > > based on JFlex specification. Coreference resolution (loosely based
>> on Stanford
>> > > NLP approach) coming very soon (for May).
>> > >
>> > > More info here:
>> > >
>> > >
>> http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
>> > >
>> > > Thanks,
>> > >
>> > > Rodrigo
>> > >
>> > > On 2014/03/19 at 12:39, Charles Jalin wrote:
>> > >> For tokenizer, sentence, pos tagger y tokchunk.
>> > >>
>> > >> I amn't sure that i can obtain Spanish corpora.
>> > >>
>> > >> Thanks.
>> > >>
>> > >>
>> > >> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
>> > >>
>> > >>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
>> > >>>
>> > >>>> How i do this?
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>> Depends on the model. For which component?
>> > >>>
>> > >>> Anyway, the best way to improve the situation would be to
>> > >>> add support to OpenNLP to train it on the available Spanish corpora.
>> > >>>
>> > >>> Jörn
>>
>
>

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

Many thanks for your work. It is very interesting.

I am new to OpenNLP. How can I convert these models to the OpenNLP format?

Regards.



2014-03-19 14:41 GMT+01:00 Rodrigo Agerri <ag...@gmail.com>:

> Hi,
>
> Yes, I need to add the info and releases to that page. In the meantime you
> can
> check up the github repos:
>
> https://github.com/ixa-ehu/ixa-pipe-tok
> https://github.com/ixa-ehu/ixa-pipe-pos
> https://github.com/ixa-ehu/ixa-pipe-nerc
> https://github.com/ixa-ehu/ixa-pipe-parse
>
> Cheers,
>
> Rodrigo
>
> On 2014/03/19 at 14:28, Richard Eckart de Castilho wrote:
> > Hello Rodrigo,
> >
> > do you have a link to a page from where the IXA tools or at least their
> models
> > can be obtained? The link mentioned in the paper doesn't seem to have any
> > substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes
> >
> > Cheers,
> >
> > -- Richard
> >
> > On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com>
> wrote:
> >
> > > Hi,
> > >
> > > We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora)
> with evaluations
> > > etc and so on as part of the IXA pipeline tools.
> > >
> > > We also have tokenizer (tried opennlp models and were not adaptable
> enough)
> > > based on JFlex specification. Coreference resolution (loosely based on
> Stanford
> > > NLP approach) coming very soon (for May).
> > >
> > > More info here:
> > >
> > >
> http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> > >
> > > Thanks,
> > >
> > > Rodrigo
> > >
> > > On 2014/03/19 at 12:39, Charles Jalin wrote:
> > >> For tokenizer, sentence, pos tagger y tokchunk.
> > >>
> > >> I amn't sure that i can obtain Spanish corpora.
> > >>
> > >> Thanks.
> > >>
> > >>
> > >> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> > >>
> > >>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
> > >>>
> > >>>> How i do this?
> > >>>>
> > >>>>
> > >>>>
> > >>> Depends on the model. For which component?
> > >>>
> > >>> Anyway, the best way to improve the situation would be to
> > >>> add support to OpenNLP to train it on the available Spanish corpora.
> > >>>
> > >>> Jörn
>

Re: Models in spanish

Posted by Rodrigo Agerri <ag...@gmail.com>.

Hi, 

Yes, I need to add the info and releases to that page. In the meantime you can
check out the GitHub repos:

https://github.com/ixa-ehu/ixa-pipe-tok
https://github.com/ixa-ehu/ixa-pipe-pos
https://github.com/ixa-ehu/ixa-pipe-nerc
https://github.com/ixa-ehu/ixa-pipe-parse

Cheers, 

Rodrigo

On 2014/03/19 at 14:28, Richard Eckart de Castilho wrote:
> Hello Rodrigo,
> 
> do you have a link to a page from where the IXA tools or at least their models
> can be obtained? The link mentioned in the paper doesn't seem to have any
> substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes
> 
> Cheers,
> 
> -- Richard
> 
> On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com> wrote:
> 
> > Hi, 
> > 
> > We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora) with evaluations
> > etc and so on as part of the IXA pipeline tools. 
> > 
> > We also have tokenizer (tried opennlp models and were not adaptable enough)
> > based on JFlex specification. Coreference resolution (loosely based on Stanford
> > NLP approach) coming very soon (for May). 
> > 
> > More info here: 
> > 
> > http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> > 
> > Thanks, 
> > 
> > Rodrigo
> > 
> > On 2014/03/19 at 12:39, Charles Jalin wrote:
> >> For tokenizer, sentence, pos tagger y tokchunk.
> >> 
> >> I amn't sure that i can obtain Spanish corpora.
> >> 
> >> Thanks.
> >> 
> >> 
> >> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> >> 
> >>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
> >>> 
> >>>> How i do this?
> >>>> 
> >>>> 
> >>>> 
> >>> Depends on the model. For which component?
> >>> 
> >>> Anyway, the best way to improve the situation would be to
> >>> add support to OpenNLP to train it on the available Spanish corpora.
> >>> 
> >>> Jörn

Re: Models in spanish

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hello Rodrigo,

do you have a link to a page from where the IXA tools or at least their models
can be obtained? The link mentioned in the paper doesn't seem to have any
substantial content yet: http://adimen.si.ehu.es/web/ixa-pipes

Cheers,

-- Richard

On 19.03.2014, at 13:59, Rodrigo Agerri <ag...@gmail.com> wrote:

> Hi, 
> 
> We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora) with evaluations
> etc and so on as part of the IXA pipeline tools. 
> 
> We also have tokenizer (tried opennlp models and were not adaptable enough)
> based on JFlex specification. Coreference resolution (loosely based on Stanford
> NLP approach) coming very soon (for May). 
> 
> More info here: 
> 
> http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
> 
> Thanks, 
> 
> Rodrigo
> 
> On 2014/03/19 at 12:39, Charles Jalin wrote:
>> For tokenizer, sentence, pos tagger y tokchunk.
>> 
>> I amn't sure that i can obtain Spanish corpora.
>> 
>> Thanks.
>> 
>> 
>> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
>> 
>>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
>>> 
>>>> How i do this?
>>>> 
>>>> 
>>>> 
>>> Depends on the model. For which component?
>>> 
>>> Anyway, the best way to improve the situation would be to
>>> add support to OpenNLP to train it on the available Spanish corpora.
>>> 
>>> Jörn

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

I had a short look at the paper. For English NER you might also want to
publish OntoNotes models. There is format support for that in OpenNLP.

Maybe it could be interesting for you to contribute the work you did on
the tokenization or coref component to OpenNLP.

Jörn

On 03/19/2014 01:59 PM, Rodrigo Agerri wrote:
> Hi,
>
> We have new models 1.5.3 for pos, ner (conll 2002), parser (Ancora) with evaluations
> etc and so on as part of the IXA pipeline tools.
>
> We also have tokenizer (tried opennlp models and were not adaptable enough)
> based on JFlex specification. Coreference resolution (loosely based on Stanford
> NLP approach) coming very soon (for May).
>
> More info here:
>
> http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1
>
> Thanks,
>
> Rodrigo
>
> On 2014/03/19 at 12:39, Charles Jalin wrote:
>> For tokenizer, sentence, pos tagger y tokchunk.
>>
>> I amn't sure that i can obtain Spanish corpora.
>>
>> Thanks.
>>
>>
>> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
>>
>>> On 03/19/2014 12:01 PM, Charles Jalin wrote:
>>>
>>>> How i do this?
>>>>
>>>>
>>>>
>>> Depends on the model. For which component?
>>>
>>> Anyway, the best way to improve the situation would be to
>>> add support to OpenNLP to train it on the available Spanish corpora.
>>>
>>> Jörn
>>>

Re: Models in spanish

Posted by Rodrigo Agerri <ag...@gmail.com>.

Hi, 

We have new OpenNLP 1.5.3 models for POS tagging, NER (CoNLL 2002) and parsing
(AnCora), with evaluations and so on, as part of the IXA pipeline tools.

We also have a tokenizer based on a JFlex specification (we tried the OpenNLP
models and they were not adaptable enough). Coreference resolution (loosely
based on the Stanford NLP approach) is coming very soon (in May).

More info here: 

http://www.rodrigoagerri.net/recent-papers/ixa-pipes.pdf?attredirects=0&d=1

Thanks, 

Rodrigo

On 2014/03/19 at 12:39, Charles Jalin wrote:
> For tokenizer, sentence, pos tagger y tokchunk.
> 
> I amn't sure that i can obtain Spanish corpora.
> 
> Thanks.
> 
> 
> 2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:
> 
> > On 03/19/2014 12:01 PM, Charles Jalin wrote:
> >
> >> How i do this?
> >>
> >>
> >>
> > Depends on the model. For which component?
> >
> > Anyway, the best way to improve the situation would be to
> > add support to OpenNLP to train it on the available Spanish corpora.
> >
> > Jörn
> >

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

For the tokenizer, sentence detector, POS tagger and chunker.

I am not sure that I can obtain Spanish corpora.

Thanks.


2014-03-19 12:08 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:

> On 03/19/2014 12:01 PM, Charles Jalin wrote:
>
>> How i do this?
>>
>>
>>
> Depends on the model. For which component?
>
> Anyway, the best way to improve the situation would be to
> add support to OpenNLP to train it on the available Spanish corpora.
>
> Jörn
>

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

On 03/19/2014 12:01 PM, Charles Jalin wrote:
> How i do this?
>
>

Depends on the model. For which component?

Anyway, the best way to improve the situation would be to
add support to OpenNLP to train it on the available Spanish corpora.

Jörn

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

How do I do this?



2014-03-19 11:52 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:

> On 03/19/2014 11:45 AM, Charles Jalin wrote:
>
>> Thanks to all.
>>
>> I found models in spanish for opennlp 1.4.3. Can they be used in opennlp
>> 1.5.3?
>>
>>
> Only when you manually package them in the 1.5.x format.
>
> Jörn
>

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

On 03/19/2014 11:45 AM, Charles Jalin wrote:
> Thanks to all.
>
> I found models in spanish for opennlp 1.4.3. Can they be used in opennlp
> 1.5.3?
>

Only when you manually package them in the 1.5.x format.
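
As a starting point for comparing the two layouts, it can help to look inside
an existing 1.5.x model file. The Python sketch below assumes, as far as I
know, that a 1.5.x model is a zip archive carrying a manifest.properties entry
next to the serialized model. The function name read_manifest is hypothetical,
and the sketch only inspects a package; it does not convert a 1.4.3 model.

```python
import zipfile


def read_manifest(model_path):
    """Return (entry names, manifest dict) for a zip-packaged model file."""
    with zipfile.ZipFile(model_path) as archive:
        names = archive.namelist()
        manifest = {}
        if "manifest.properties" in names:
            # Java .properties files are ISO-8859-1 encoded by default.
            text = archive.read("manifest.properties").decode("iso-8859-1")
            for line in text.splitlines():
                line = line.strip()
                # Skip blank lines, comments and anything without a key=value pair.
                if not line or line.startswith(("#", "!")) or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                manifest[key.strip()] = value.strip()
    return names, manifest
```

Running read_manifest on one of the published 1.5 models shows which entries
and properties a manually built package would need to reproduce.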

Jörn

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

Thanks to all.

I found Spanish models for OpenNLP 1.4.3. Can they be used with OpenNLP
1.5.3?

Regards.


2014-03-19 11:37 GMT+01:00 Jörn Kottmann <ko...@gmail.com>:

> On 03/19/2014 11:22 AM, Richard Eckart de Castilho wrote:
>
>> Of course you could probably always train your own models, at least
>> for the tokenizer, sentencedetector, and pos tagger. I believe the
>> AnCora corpus should serve well [1].
>>
>> Not sure about the chunker though and last time I looked, I believe
>> the parser was pretty much hard-coded to English.
>>
>
> The chunker can be trained for Spanish with out any modifications. All you
> need is
> a training corpus and a tool which can convert it into the OpenNLP format.
>
> The parser needs a head rules file for Spanish, we recently got a
> contribution for one
> and it should soon be possible to train it on Spanish too.
>
> HTH,
> Jörn
>

Re: Models in spanish

Posted by Jörn Kottmann <ko...@gmail.com>.

On 03/19/2014 11:22 AM, Richard Eckart de Castilho wrote:
> Of course you could probably always train your own models, at least
> for the tokenizer, sentencedetector, and pos tagger. I believe the
> AnCora corpus should serve well [1].
>
> Not sure about the chunker though and last time I looked, I believe
> the parser was pretty much hard-coded to English.

The chunker can be trained for Spanish without any modifications. All you
need is a training corpus and a tool which can convert it into the OpenNLP
format.
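
A converter of that kind can be quite small. The sketch below is a rough
Python illustration, assuming the chunker training format follows the
CoNLL-2000 layout (one token per line as "word POS chunk-tag", with an empty
line between sentences). The function name to_chunker_format and the tags in
the usage note are placeholders, not taken from any real Spanish corpus.

```python
def to_chunker_format(sentences):
    """Render sentences as 'token POS chunk' lines, blank line between sentences.

    `sentences` is a list of sentences; each sentence is a list of
    (token, pos_tag, chunk_tag) tuples already extracted from the corpus.
    """
    blocks = []
    for sentence in sentences:
        lines = [f"{token} {pos} {chunk}" for token, pos, chunk in sentence]
        blocks.append("\n".join(lines))
    # One trailing newline when there is content, nothing for empty input.
    return "\n\n".join(blocks) + ("\n" if blocks else "")
```

For example, a sentence given as [("El", "DA", "B-NP"), ("gato", "NC", "I-NP"),
("duerme", "VM", "B-VP")] becomes three lines, one per token. Reading the
source corpus into such tuples is the corpus-specific part and is left out.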

The parser needs a head rules file for Spanish. We recently got a
contribution for one, and it should soon be possible to train it on Spanish
too.

HTH,
Jörn

Re: Models in spanish

Posted by Richard Eckart de Castilho <re...@apache.org>.

Of course you could probably always train your own models, at least
for the tokenizer, sentencedetector, and pos tagger. I believe the
AnCora corpus should serve well [1].
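
For the POS tagger, for instance, the training data is (to my knowledge) plain
text with one sentence per line and each token written as word_TAG. A minimal
Python sketch of that last conversion step, assuming the corpus has already
been read into (token, tag) pairs; the name to_pos_line is hypothetical and
the tags in the usage note are placeholders rather than the AnCora tagset:

```python
def to_pos_line(tagged_tokens):
    """Join (token, tag) pairs into one 'word_TAG word_TAG ...' training line."""
    parts = []
    for token, tag in tagged_tokens:
        # Whitespace inside a token or tag would break the one-line layout.
        if not token or not tag or any(ch.isspace() for ch in token + tag):
            raise ValueError(f"invalid token/tag pair: {token!r}/{tag!r}")
        parts.append(f"{token}_{tag}")
    return " ".join(parts)
```

Calling to_pos_line([("El", "DA"), ("gato", "NC"), ("duerme", "VM")]) yields a
single line that can be appended to the training file, one call per sentence.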

Not sure about the chunker though and last time I looked, I believe
the parser was pretty much hard-coded to English. 

Anybody, please correct me if I am wrong.

Cheers,

-- Richard

[1] http://clic.ub.edu/corpus/ancora-descarregues

On 19.03.2014, at 10:51, Richard Eckart de Castilho <re...@apache.org> wrote:

> Hi,
> 
> there are no models for tokenizer, sentencedetector, chunker and parser
> that I would know of - but that is not an authoritative answer. 
> 
> There are third-party models for the POS tagger available here 
> - https://github.com/utcompling/OpenNLP-Models
> 
> If you are not restricted to OpenNLP, there is other software which
> supports spanish at various linguistic levels, e.g.
> 
> - mate-tools - https://code.google.com/p/mate-tools/
> - freeling - http://nlp.lsi.upc.edu/freeling/
> 
> Cheers,
> 
> -- Richard
> 
> On 19.03.2014, at 10:43, Charles Jalin <ch...@gmail.com> wrote:
> 
>> I refer to models for tokenizer, sentencedetector, chunker, pos tagger and
>> parser.
>> 
>> Thanks for your quick response.
>> 
>> Regards.
>> 
>> 
>> 2014-03-19 10:36 GMT+01:00 swapnil marathe <sp...@gmail.com>:
>> 
>>> Yes there are models for Spanish language.
>>> Opennlp models <http://opennlp.sourceforge.net/models-1.5/>
>>> 
>>> check under language column "es"
>>> you can get name finder models there
>>> 
>>> es Name Finder Person name finder model. Trained on conll02 shared task
>>> data. es-ner-person.bin<
>>> http://opennlp.sourceforge.net/models-1.5/es-ner-person.bin>
>>> es Name Finder Organization name finder model. Trained on conll02 shared
>>> task data. es-ner-organization.bin<
>>> http://opennlp.sourceforge.net/models-1.5/es-ner-organization.bin>
>>> es Name Finder Location name finder model. Trained on conll02 shared task
>>> data. es-ner-location.bin<
>>> http://opennlp.sourceforge.net/models-1.5/es-ner-location.bin>
>>> es Name Finder Misc name finder model. Trained on conll02 shared task data.
>>> es-ner-misc.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-misc.bin
>>> 
>>> On Wed, Mar 19, 2014 at 2:55 PM, Charles Jalin
>>> <ch...@gmail.com>wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I am newbie in OpenNLP. I need to use it in a spanish project. Is there
>>>> models in spanish?
>>>> 
>>>> Thanks for your attention.
>>>> 
>>>> Regards.

Re: Models in spanish

Posted by Richard Eckart de Castilho <re...@apache.org>.

Hi,

there are no models for tokenizer, sentencedetector, chunker and parser
that I would know of - but that is not an authoritative answer. 

There are third-party models for the POS tagger available here 
- https://github.com/utcompling/OpenNLP-Models

If you are not restricted to OpenNLP, there is other software which
supports Spanish at various linguistic levels, e.g.

- mate-tools - https://code.google.com/p/mate-tools/
- freeling - http://nlp.lsi.upc.edu/freeling/

Cheers,

-- Richard

On 19.03.2014, at 10:43, Charles Jalin <ch...@gmail.com> wrote:

> I refer to models for tokenizer, sentencedetector, chunker, pos tagger and
> parser.
> 
> Thanks for your quick response.
> 
> Regards.
> 
> 
> 2014-03-19 10:36 GMT+01:00 swapnil marathe <sp...@gmail.com>:
> 
>> Yes there are models for Spanish language.
>> Opennlp models <http://opennlp.sourceforge.net/models-1.5/>
>> 
>> check under language column "es"
>> you can get name finder models there
>> 
>> es Name Finder Person name finder model. Trained on conll02 shared task
>> data. es-ner-person.bin<
>> http://opennlp.sourceforge.net/models-1.5/es-ner-person.bin>
>> es Name Finder Organization name finder model. Trained on conll02 shared
>> task data. es-ner-organization.bin<
>> http://opennlp.sourceforge.net/models-1.5/es-ner-organization.bin>
>> es Name Finder Location name finder model. Trained on conll02 shared task
>> data. es-ner-location.bin<
>> http://opennlp.sourceforge.net/models-1.5/es-ner-location.bin>
>> es Name Finder Misc name finder model. Trained on conll02 shared task data.
>> es-ner-misc.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-misc.bin
>> 
>> On Wed, Mar 19, 2014 at 2:55 PM, Charles Jalin
>> <ch...@gmail.com>wrote:
>> 
>>> Hello,
>>> 
>>> I am newbie in OpenNLP. I need to use it in a spanish project. Is there
>>> models in spanish?
>>> 
>>> Thanks for your attention.
>>> 
>>> Regards.
>>> 
>>

Re: Models in spanish

Posted by Charles Jalin <ch...@gmail.com>.

I am referring to models for the tokenizer, sentence detector, chunker,
POS tagger and parser.

Thanks for your quick response.

Regards.


2014-03-19 10:36 GMT+01:00 swapnil marathe <sp...@gmail.com>:

> Yes there are models for Spanish language.
> Opennlp models <http://opennlp.sourceforge.net/models-1.5/>
>
> check under language column "es"
> you can get name finder models there
>
> es Name Finder Person name finder model. Trained on conll02 shared task
> data. es-ner-person.bin<
> http://opennlp.sourceforge.net/models-1.5/es-ner-person.bin>
> es Name Finder Organization name finder model. Trained on conll02 shared
> task data. es-ner-organization.bin<
> http://opennlp.sourceforge.net/models-1.5/es-ner-organization.bin>
> es Name Finder Location name finder model. Trained on conll02 shared task
> data. es-ner-location.bin<
> http://opennlp.sourceforge.net/models-1.5/es-ner-location.bin>
> es Name Finder Misc name finder model. Trained on conll02 shared task data.
> es-ner-misc.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-misc.bin>
>
> On Wed, Mar 19, 2014 at 2:55 PM, Charles Jalin
> <ch...@gmail.com>wrote:
>
> > Hello,
> >
> > I am newbie in OpenNLP. I need to use it in a spanish project. Is there
> > models in spanish?
> >
> > Thanks for your attention.
> >
> > Regards.
> >
>

Re: Models in spanish

Posted by swapnil marathe <sp...@gmail.com>.

Yes, there are models for the Spanish language:
OpenNLP models <http://opennlp.sourceforge.net/models-1.5/>

Check the rows with "es" in the language column; you can get the name finder
models there. All of them were trained on the conll02 shared task data:

- es, Name Finder: Person name finder model.
  es-ner-person.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-person.bin>
- es, Name Finder: Organization name finder model.
  es-ner-organization.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-organization.bin>
- es, Name Finder: Location name finder model.
  es-ner-location.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-location.bin>
- es, Name Finder: Misc name finder model.
  es-ner-misc.bin <http://opennlp.sourceforge.net/models-1.5/es-ner-misc.bin>

On Wed, Mar 19, 2014 at 2:55 PM, Charles Jalin <ch...@gmail.com> wrote:

> Hello,
>
> I am newbie in OpenNLP. I need to use it in a spanish project. Is there
> models in spanish?
>
> Thanks for your attention.
>
> Regards.
>