Posted to users@opennlp.apache.org by Jeffrey Zemerick <jz...@apache.org> on 2014/05/27 21:11:35 UTC

models in memory

Hi Users,

Is anyone aware of a way to load a TokenNameFinder model and use it without
storing the entire model in memory? My models take up about 6 GB of memory.
I see in the code that the model files are unzipped and put into a HashMap.
Is it possible to store the data structure off-heap somewhere?
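
For reference, the usual loading path looks roughly like the sketch below
(the model file name is a placeholder); the constructor deserializes the
entire zip package onto the heap:

    import java.io.File;
    import java.io.IOException;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.Span;

    public class LoadNameFinder {

        public static void main(String[] args) throws IOException {
            // Reads the whole model zip package into heap memory, so the
            // JVM heap (-Xmx) must be large enough to hold all of it.
            TokenNameFinderModel model =
                    new TokenNameFinderModel(new File("en-ner-person.bin"));
            NameFinderME nameFinder = new NameFinderME(model);

            String[] tokens = { "John", "Smith", "lives", "in", "Raleigh", "." };
            for (Span span : nameFinder.find(tokens)) {
                System.out.println(span);
            }
        }
    }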

Thanks,
Jeff

Re: models in memory

Posted by Jörn Kottmann <ko...@gmail.com>.
The model size depends on the number of features you have: each feature
is stored as a String object in memory, along with its weights, which
are stored as doubles.
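
As a rough back-of-envelope illustration (the per-String overhead and the
counts below are assumptions, not measurements from your model):

    public class ModelSizeEstimate {

        public static void main(String[] args) {
            // Hypothetical counts; assumes roughly 56 bytes of JVM overhead
            // per small String plus one 8-byte double weight per outcome.
            long features = 50000000L; // 50 million features
            int outcomes = 5;
            long bytes = features * (56 + 8L * outcomes);
            System.out.printf("~%.1f GB of heap%n", bytes / 1e9); // ~4.8 GB
        }
    }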

How much training data do you have? How many features and outcomes does 
the data have?

Jörn

Re: models in memory

Posted by William Colen <wi...@gmail.com>.
Usually you don't need a huge training data set to have an effective model.
You can measure the tradeoff between the training dataset size, the cutoff,
and the algorithm using the 10-fold cross-validation tool included in the
OpenNLP command line interface. You would need to run different experiments
varying these parameters. In your case, not only the F-measure but also the
model size is important.
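
For example, along these lines (train.txt and params.txt are placeholders,
and the exact flags can vary between OpenNLP versions):

    $ opennlp TokenNameFinderCrossValidator -lang en -data train.txt \
          -encoding UTF-8 -params params.txt

    # params.txt -- parameters to vary between experiments:
    Algorithm=PERCEPTRON
    Iterations=100
    Cutoff=5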

Re: models in memory

Posted by Jeffrey Zemerick <jz...@apache.org>.
I do not, William. I assumed it was due to the large training data set. I
will look into the things you mentioned. Thanks!

Re: models in memory

Posted by William Colen <wi...@gmail.com>.
Do you know why your model is so big?

You can reduce its size by using a higher cutoff, or by trying the
Perceptron trainer. You can also try using an entity dictionary, which
keeps the algorithm from storing the entities as features.
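
A minimal training sketch with those knobs turned (this uses the newer
OpenNLP training API; the file names are placeholders, and signatures may
differ in older releases):

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.charset.StandardCharsets;

    import opennlp.tools.namefind.NameFinderME;
    import opennlp.tools.namefind.NameSample;
    import opennlp.tools.namefind.NameSampleDataStream;
    import opennlp.tools.namefind.TokenNameFinderFactory;
    import opennlp.tools.namefind.TokenNameFinderModel;
    import opennlp.tools.util.MarkableFileInputStreamFactory;
    import opennlp.tools.util.ObjectStream;
    import opennlp.tools.util.PlainTextByLineStream;
    import opennlp.tools.util.TrainingParameters;

    public class TrainSmallerModel {

        public static void main(String[] args) throws IOException {
            // A higher cutoff drops rare features; perceptron models are
            // often smaller than the default maxent models.
            TrainingParameters params = new TrainingParameters();
            params.put(TrainingParameters.ALGORITHM_PARAM, "PERCEPTRON");
            params.put(TrainingParameters.CUTOFF_PARAM, "5");
            params.put(TrainingParameters.ITERATIONS_PARAM, "100");

            try (ObjectStream<NameSample> samples = new NameSampleDataStream(
                    new PlainTextByLineStream(
                            new MarkableFileInputStreamFactory(new File("train.txt")),
                            StandardCharsets.UTF_8))) {

                TokenNameFinderModel model = NameFinderME.train(
                        "en", "person", samples, params, new TokenNameFinderFactory());

                try (OutputStream out = new FileOutputStream("smaller-model.bin")) {
                    model.serialize(out);
                }
            }
        }
    }

Raising the cutoff discards features seen fewer than that many times in
the training data, which directly shrinks the feature-to-weight table.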

I am not aware of a way to avoid loading it into memory.

Regards,
William
