Posted to user@mahout.apache.org by Frederic Dang Ngoc <fr...@gmail.com> on 2013/03/14 02:29:05 UTC

Re: How to classify an individual file after training

BS TLC <bstlc <at> ymail.com> writes:

> 
> Does anyone have a working piece of code for classifying individual documents
> after training the naive Bayes model?
> 
> In the past, the class org.apache.mahout.classifier.Classify did this job, but
> I haven't found a working equivalent in the current version.
> Thanks
> 
> > That's exactly what I was trying to do by running TestNewsGroups.java, as
> > I explained in my last post.
> > Here's the code again, with the stack trace. I'm doing something wrong
> > while loading the model (and I can't load the naive Bayes model; see the
> > code).
> > 
> > Thanks
> > 
> > https://gist.github.com/anonymous/4720473 
> 
> 

Hi,

I have just written a blog post describing how to train the model and use it
to classify new documents:

https://chimpler.wordpress.com/2013/03/13/using-the-mahout-naive-bayes-classifier-to-automatically-classify-twitter-messages/

To classify new documents, you'll need the following files from HDFS:
- labelindex
- the model directory, containing naiveBayesModel.bin
- dictionary.file-0 (in the vectors directory)
- df-count (in the vectors directory)
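To make the roles of dictionary.file-0 and df-count more concrete, here is a minimal Java sketch of turning raw text into a sparse tf-idf vector. It is an illustration under stated assumptions, not the code from the linked Classifier.java: the class TfIdfSketch and its method are hypothetical, the two files are replaced by in-memory maps (word to feature id, and feature id to document frequency), documentCount is assumed to be the total number of training documents, and the weighting sqrt(tf) * (log(N / (df + 1)) + 1) is one common Lucene-style choice that may differ in detail from what your Mahout version used when building the training vectors.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: builds a sparse tf-idf vector (featureId -> weight).
 * The dictionary map stands in for dictionary.file-0 and the
 * documentFrequency map stands in for df-count; reading the real
 * sequence files from HDFS is not shown.
 */
final class TfIdfSketch {

    static Map<Integer, Double> vectorize(String text,
                                          Map<String, Integer> dictionary,
                                          Map<Integer, Long> documentFrequency,
                                          long documentCount) {
        // Count occurrences of each known word; words that are not in the
        // dictionary were never seen during training and are ignored.
        Map<Integer, Integer> termCounts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\W+")) {
            Integer id = dictionary.get(token);
            if (id != null) {
                termCounts.merge(id, 1, Integer::sum);
            }
        }
        // Weight each feature. Assumed formula: sqrt(tf) * (log(N / (df + 1)) + 1).
        Map<Integer, Double> vector = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : termCounts.entrySet()) {
            long df = documentFrequency.getOrDefault(e.getKey(), 0L);
            double tf = Math.sqrt(e.getValue());
            double idf = Math.log((double) documentCount / (df + 1)) + 1.0;
            vector.put(e.getKey(), tf * idf);
        }
        return vector;
    }
}
```

Whatever weighting you pick, the point is that it must match the one used to vectorize the training set; otherwise the model will be scoring vectors on a different scale from the ones it was trained on.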

I use the following code to classify new documents with those files:
https://github.com/fredang/mahout-naive-bayes-example/blob/master/src/main/java/com/chimpler/example/bayes/Classifier.java
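Once a document has been turned into a vector, the model and labelindex are what pick the category. The following Java sketch is a hypothetical illustration of multinomial naive Bayes scoring with Laplace smoothing, not Mahout's own implementation (its standard and complementary classifiers may differ in details). The class NaiveBayesSketch is invented for illustration, the featureWeights matrix (summed training weight of each feature per label) stands in for naiveBayesModel.bin, labelIndex stands in for the labelindex file, and class priors are deliberately left out.

```java
import java.util.Map;

/**
 * Hypothetical sketch: score(label) = sum over features of
 * docWeight(f) * log((w[label][f] + alpha) / (totalWeight[label] + alpha * numFeatures)),
 * then return the name of the highest-scoring label. Class priors are omitted.
 */
final class NaiveBayesSketch {

    static String classify(Map<Integer, Double> document,
                           double[][] featureWeights,       // [labelId][featureId]
                           Map<Integer, String> labelIndex,  // labelId -> label name
                           double alpha) {                   // smoothing parameter
        int numFeatures = featureWeights[0].length;
        int bestLabel = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int label = 0; label < featureWeights.length; label++) {
            // Total weight of all features for this label (smoothing denominator).
            double labelTotal = 0.0;
            for (double w : featureWeights[label]) {
                labelTotal += w;
            }
            // Log-likelihood of the document's features under this label.
            double score = 0.0;
            for (Map.Entry<Integer, Double> e : document.entrySet()) {
                double p = (featureWeights[label][e.getKey()] + alpha)
                        / (labelTotal + alpha * numFeatures);
                score += e.getValue() * Math.log(p);
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        // Translate the winning label id back into its name.
        return labelIndex.get(bestLabel);
    }
}
```

Working in log space keeps the per-feature probabilities from underflowing when a document has many words, and the label id to name lookup at the end is exactly the job the labelindex file does.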

Hope this helps.

Frederic


Re: How to classify an individual file after training

Posted by Adam Baron <ad...@gmail.com>.
Frederic,

Adding the ability to classify new text against an existing Naïve Bayes model
would be a very helpful addition to Mahout. I found your blog post
informative, and I'm sure many other Mahout classification users have faced
challenges similar to ours.

Regards,
         Adam
