Posted to users@opennlp.apache.org by Sheng <sh...@gmail.com> on 2017/07/14 19:59:56 UTC

Training with features

Hi,

I am new to OpenNLP and am currently trying to learn how to train an NER
model. I have two questions:

1. If I use a custom set of features for training, do I have to feed that
set of features to NameFinderME when I load the trained model? I think not,
as the XML descriptor is part of the artifactMap, which is persisted, but I
may be wrong.

2. The documentation on your website gives an example XML descriptor file
for training an NER model, which includes a few "cluster"-based features.
These features need dictionary objects, loaded from resources, when they
are instantiated. For BrownCluster, the javadoc says one should download a
file from metaoptimize.com/projects/wordreprs/. Do I just need to load that
file into BrownCluster directly? That link is unreachable at the moment; is
it dead for good? And what about the other clusters? How can one create a
word2vec cluster, and what is clark.cluster?

This is a long question. I really appreciate your patience in reading and
responding to it!

Re: Training with features

Posted by Sheng <sh...@gmail.com>.
I am sorry - I somehow missed these replies. They look very helpful -
thanks a lot for your help!


Re: Training with features

Posted by Rodrigo Agerri <ro...@ehu.eus>.
There are many Brown clusters here:

http://www.derczynski.com/sheffield/brown-tuning/

The Brown BLLIP clusters are also available:

http://people.csail.mit.edu/maestro/papers/bllip-clusters.gz

If you unpack the models here, you can find clusters (Brown, Clark
and Word2vec) inside for several languages:

http://ixa2.si.ehu.es/ixa-pipes/models/nerc-models-1.5.4.tgz

The clusters are described here (see Table 4):

https://doi.org/10.1016/j.artint.2016.05.003

Furthermore, you can find clusters induced on Yelp data (reviews)
here. Just unpack the models:

http://ixa2.si.ehu.es/ixa-pipes/models/ote-models-1.5.0.tgz

HTH,

R
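
For readers wondering what such a Brown cluster file contains, here is a
minimal sketch in plain Java. It assumes the common three-column layout
(bit-string path, word, count, separated by tabs); the files linked above
may differ, so check a few lines first. The class name, the minCount
cutoff and the sample data are hypothetical, and this is not OpenNLP's own
loader, only an illustration of the word-to-cluster lookup that cluster
features rely on.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of OpenNLP.
public class BrownClusterSketch {

    // Parses lines of the assumed form "bitstring<TAB>word<TAB>count" and
    // returns a word -> bitstring map, skipping words whose count is below minCount.
    static Map<String, String> parse(BufferedReader reader, int minCount) throws IOException {
        Map<String, String> wordToPath = new HashMap<>();
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\t");
            if (cols.length != 3) {
                continue; // ignore blank or malformed lines
            }
            if (Integer.parseInt(cols[2].trim()) >= minCount) {
                wordToPath.put(cols[1], cols[0]);
            }
        }
        return wordToPath;
    }

    // Returns the first n bits of a path, or the whole path when it is shorter.
    // Shorter prefixes correspond to coarser clusters.
    static String prefix(String path, int n) {
        return path.length() <= n ? path : path.substring(0, n);
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data in the assumed format.
        String sample = "0010\tlondon\t120\n0011\tparis\t95\n1101\txyzzy\t1\n";
        Map<String, String> clusters = parse(new BufferedReader(new StringReader(sample)), 5);
        System.out.println(clusters.get("london"));
        System.out.println(prefix(clusters.get("paris"), 2));
        System.out.println(clusters.containsKey("xyzzy"));
    }
}
```

Words that share a path prefix fall into the same coarse cluster, which is
why a prefix of the bit string is typically what gets used as a feature
value.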




Re: Training with features

Posted by William Colen <wi...@gmail.com>.
Sheng,

Regarding 2, take a look at this link, it can help you:
https://github.com/ragerri/cluster-preprocessing

Regarding 1, you are right. If you trained with a custom feature generator,
it will be applied both in training and at runtime.

William
