Posted to user@mahout.apache.org by Kasi Subrahmanyam <ka...@gmail.com> on 2014/05/26 06:48:06 UTC

Using existing model to train again

Hi team,
I have trained a Naive Bayes model on 1 million training records. Now I
have another 1 million records. Can I add this new training data to the
existing model and train it again to get a new model, instead of passing
all 2 million records at once?

Thanks,
Subbu

RE: Using existing model to train again

Posted by Andrew Palumbo <ap...@outlook.com>.
Hi Subbu,  

There is currently no way to update an already trained Naive Bayes model.  You'd have to retrain on the full 2 million records.  

If you expect to need this regularly, you could probably modify TrainNaiveBayesJob.java [1] to support it. However, your new data would have to be vectorized in exactly the same way as the original data for the update to be correct. That would limit you to raw term frequencies (no IDF weighting) and would rule out corpus-dependent options such as maxDFPercent.
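To illustrate why raw term frequencies allow an update while IDF does not, here is a toy Python sketch. It is not Mahout's TrainNaiveBayesJob; it assumes a multinomial Naive Bayes model whose parameters are derived entirely from per-label term counts, and the function names and sample data are hypothetical. Because those counts are plain sums, training on two batches separately and adding the tables gives the same result as training on all the data at once. IDF weights, by contrast, depend on the document count and document frequencies of the whole corpus, so vectors built from the first batch alone are not consistent with vectors built from the combined data.

```python
import math
from collections import Counter, defaultdict

def train_counts(docs):
    """Accumulate per-label term counts from (label, tokens) pairs.

    Assumption: a multinomial NB model on raw term frequencies is fully
    determined by these sums, so they are all an 'update' needs to keep.
    """
    counts = defaultdict(Counter)
    for label, tokens in docs:
        counts[label].update(tokens)
    return counts

def merge_counts(a, b):
    """Add two count tables together (the hypothetical incremental update)."""
    merged = defaultdict(Counter)
    for table in (a, b):
        for label, c in table.items():
            merged[label].update(c)
    return merged

def idf(docs):
    """Inverse document frequency log(N / df), computed over a whole corpus."""
    n = len(docs)
    df = Counter()
    for _, tokens in docs:
        df.update(set(tokens))  # count each term once per document
    return {term: math.log(n / d) for term, d in df.items()}

# Hypothetical toy batches standing in for the two sets of 1M records.
batch1 = [("sports", ["ball", "goal", "ball"]), ("tech", ["cpu", "ram"])]
batch2 = [("sports", ["goal", "team"]), ("tech", ["cpu", "gpu", "ball"])]

full = train_counts(batch1 + batch2)
incremental = merge_counts(train_counts(batch1), train_counts(batch2))
print("counts match:", full == incremental)

# The IDF of "ram" changes once the second batch is added,
# so TF-IDF vectors from batch1 alone would be stale.
print(idf(batch1)["ram"], idf(batch1 + batch2)["ram"])
```

Note that per-label document counts (used for class priors) are also plain sums, so they could be merged the same way; anything computed from global corpus statistics cannot.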

Andy

[1]https://github.com/apache/mahout/blob/master/mrlegacy/src/main/java/org/apache/mahout/classifier/naivebayes/training/TrainNaiveBayesJob.java



RE: Using existing model to train again

Posted by Andrew Palumbo <ap...@outlook.com>.
Hi Namit,

The current Naive Bayes implementation is based on MapReduce and is therefore dependent on Hadoop.  You can run the mahout trainnb and mahout testnb commands locally by setting the environment variable MAHOUT_LOCAL=true.  

This keeps everything on your local filesystem and prevents Mahout from attempting to run in cluster mode, but Hadoop itself is still required. 

Andy


> Date: Mon, 26 May 2014 10:22:18 +0530
> Subject: Re: Using existing model to train again
> From: namitmaheshwari7@gmail.com
> To: user@mahout.apache.org
> 
> Hi Subbu,
> 
> I have also been working with Naive Bayes. I wanted to know whether it is
> possible to run *Naive Bayes without Hadoop* in Mahout, or whether Hadoop
> is required.
> 
> Thanks
> Namit
> 

Re: Using existing model to train again

Posted by namit maheshwari <na...@gmail.com>.
Hi Subbu,

I have also been working with Naive Bayes. I wanted to know whether it is
possible to run *Naive Bayes without Hadoop* in Mahout, or whether Hadoop
is required.

Thanks
Namit
