Posted to users@opennlp.apache.org by Daniel Frank <da...@trendrr.com> on 2010/12/16 21:47:05 UTC

Serializing Maxent models in 1.5.0 / 3.0.0 vs 1.4.3 / 2.5.2

Hi all,

I'm new to this list, hope it is the appropriate place for help questions.
At Trendrr we had a lot of success with OpenNLP-tools 1.4.3 and Maxent 2.5.2
for sentiment classification, and were hoping to move to 1.5.0/3.0.0 as we
explored more features. However, I am running into something of a roadblock,
as it appears some things that affect serialization of models have changed.
Here is the problem:

In the past, to train a model, we called DocumentCategorizerME.train, which
returned a GISModel. We could then serialize this using GISModelWriter (from
maxent) or some such thing.

Having updated opennlp-tools and maxent, I now find that my earlier usage of
DocumentCategorizerME is deprecated, and I am instead urged to use a call
that returns a DoccatModel. Now I can no longer use GISModelWriter, as
DoccatModel is not a subclass of AbstractModel. So my first question is
this: what is the recommended method to serialize a DoccatModel? I've come
across GenericModelSerializer, but it appears not to perform a lot of the
legwork that GISModelWriter did.

Now, there is another way to train a GIS model, through the GIS class in
maxent. However, this is not suitable for us, as we need to specify our own
feature generators, and this does not appear to be possible in the GIS
class. I suppose then that if anybody could suggest a way to train a GIS
model in which I am able to specify my own feature generator(s), my problems
would be solved, so that is my second "question".

If you're reading this, thanks for bearing with me, and I appreciate any
input you have. Cheers,

Dan

Re: Serializing Maxent models in 1.5.0 / 3.0.0 vs 1.4.3 / 2.5.2

Posted by Jörn Kottmann <ko...@gmail.com>.

On 12/16/10 9:47 PM, Daniel Frank wrote:
> Hi all,
>
> I'm new to this list, hope it is the appropriate place for help questions.
> At Trendrr we had a lot of success with OpenNLP-tools 1.4.3 and Maxent 2.5.2
> for sentiment classification, and were hoping to move to 1.5.0/3.0.0 as we
> explored more features. However, I am running into something of a roadblock,
> as it appears some things that affect serialization of models have changed.
> Here is the problem:
>
> In the past, to train a model, we called DocumentCategorizerME.train, which
> returned a GISModel. We could then serialize this using GISModelWriter (from
> maxent) or some such thing.
>
> Having updated opennlp-tools and maxent, I now find that my earlier usage of
> DocumentCategorizerME is deprecated, and I am instead urged to use a call
> that returns a DoccatModel. Now I can no longer use GISModelWriter, as
> DoccatModel is not a subclass of AbstractModel. So my first question is
> this: what is the recommended method to serialize a DoccatModel? I've come
> across GenericModelSerializer, but it appears not to perform a lot of the
> legwork that GISModelWriter did.
>

We changed a lot of things in 1.5. All components except coref now use a
"model package": the old maxent model wrapped in a zip file together with
metadata and, where needed, resources, configuration, etc.
All of these new model packages extend the BaseModel class, which has a
serialize method. Just call serialize on your DoccatModel and pass in the
OutputStream the model should be written to.
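
To make the "model package" idea concrete, here is a minimal sketch of a
zip archive that holds a metadata entry next to the raw model bytes. It uses
only java.util.zip and is not OpenNLP's BaseModel: the class name, the method
names, and the entry names (metadata.properties, example.model) are all made
up for illustration, and the real package layout may differ.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Illustration only: a hypothetical stand-in, NOT OpenNLP's BaseModel.
class ModelPackageSketch {

    // Writes metadata and raw model bytes as two entries of one zip archive.
    // Entry names are assumptions chosen for this sketch.
    static void serialize(String metadata, byte[] modelBytes, OutputStream out)
            throws IOException {
        ZipOutputStream zip = new ZipOutputStream(out);
        zip.putNextEntry(new ZipEntry("metadata.properties"));
        zip.write(metadata.getBytes(StandardCharsets.UTF_8));
        zip.closeEntry();
        zip.putNextEntry(new ZipEntry("example.model"));
        zip.write(modelBytes);
        zip.closeEntry();
        zip.finish(); // completes the archive but leaves the caller's stream open
        zip.flush();
    }

    // Reads every entry of the archive back into memory, keyed by entry name.
    static Map<String, byte[]> read(InputStream in) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        ZipInputStream zip = new ZipInputStream(in);
        byte[] buffer = new byte[4096];
        ZipEntry entry;
        while ((entry = zip.getNextEntry()) != null) {
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            int n;
            while ((n = zip.read(buffer)) != -1) {
                content.write(buffer, 0, n);
            }
            entries.put(entry.getName(), content.toByteArray());
        }
        return entries;
    }

    // In-memory convenience wrappers around the two methods above.
    static byte[] toBytes(String metadata, byte[] modelBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            serialize(metadata, modelBytes, out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    static Map<String, byte[]> fromBytes(byte[] packed) {
        try {
            return read(new ByteArrayInputStream(packed));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the real API the equivalent step is just the serialize call on the
DoccatModel described above, typically with a FileOutputStream that you
close afterwards.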

In the end you would have to retrain your models with the new API; that's
why we left the old API in place.

> Now, there is another way to train a GIS model, through the GIS class in
> maxent. However, this is not suitable for us, as we need to specify our own
> feature generators, and this does not appear to be possible in the GIS
> class. I suppose then that if anybody could suggest a way to train a GIS
> model in which I am able to specify my own feature generator(s), my problems
> would be solved, so that is my second "question".
>

The GISModel expects a set of features, for which it then calculates the
probabilities of the outcomes. You either have a pre-trained model or, if
you train it yourself, you have to send events to the GISTrainer.

I assume you just implemented doccat's FeatureGenerator interface, right?
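
For readers who have not written one, a custom generator of that kind might
look like the sketch below. The FeatureGenerator interface is declared
locally as a stand-in so the example compiles without OpenNLP on the
classpath; its single method mirrors what I understand the doccat interface
to look like in 1.5, which is an assumption to check against the Javadoc.
BigramFeatureGenerator and its "w=" / "bg=" feature prefixes are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Locale;

// Local stand-in for the doccat FeatureGenerator interface (assumed shape).
interface FeatureGenerator {
    Collection<String> extractFeatures(String[] text);
}

// Hypothetical generator: lower-cased unigrams plus adjacent-token bigrams.
class BigramFeatureGenerator implements FeatureGenerator {
    @Override
    public Collection<String> extractFeatures(String[] text) {
        List<String> features = new ArrayList<>();
        for (int i = 0; i < text.length; i++) {
            String token = text[i].toLowerCase(Locale.ROOT);
            features.add("w=" + token);
            if (i + 1 < text.length) {
                // Bigrams keep short negations such as "not good" together.
                String next = text[i + 1].toLowerCase(Locale.ROOT);
                features.add("bg=" + token + "_" + next);
            }
        }
        return features;
    }
}
```

If I remember correctly, the new DocumentCategorizerME.train has an overload
that accepts such feature generators and returns a DoccatModel, but please
check the Javadoc of your version for the exact signature.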

If your feature generation is not a secret we would of course be interested
to hear how you are doing it and maybe improve ours.

Hope that helps,
Jörn