Posted to user@mahout.apache.org by Robin Anil <ro...@gmail.com> on 2010/08/18 19:55:15 UTC

Re: machine learning with Mahout

On Wed, Aug 18, 2010 at 8:31 AM, Srivathsan Srinivas <srivathsan.srinivas@gmail.com> wrote:

> Hi Robin,
>       I am in the process of learning to use Mahout for machine learning
> and want to compare 2 different classification algorithms on the same set of
> data (say TwentyNewsGroups). Currently, I am looking at comparing Naive
> Bayes and Random Forest. How do I go about running these two algorithms for
> the same data set?

It seems you are able to run the 20 Newsgroups example with Naive Bayes. I
believe the test framework for Random Forests (RF) was completed recently,
but I haven't tried it myself, so I will pass this question on to the Mahout
mailing list. There are folks there who will be able to help.
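
Whatever tools you end up running, the usual way to keep such a comparison
fair is to split the data once with a fixed seed, train both classifiers on
the same training portion, and score both on the same held-out portion. The
following is a minimal sketch of that bookkeeping, not Mahout code: the
function names, the 80/20 split and the seed are assumptions for
illustration, and the predicted labels would come from whichever Mahout
tool you run for each algorithm.

```python
import random


def split_lines(lines, test_fraction=0.2, seed=42):
    """Deterministically shuffle and split documents so that every
    classifier sees exactly the same train and test sets."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)  # same seed -> same order every run
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def accuracy(true_labels, predicted_labels):
    """Fraction of test documents whose predicted label matches the true one."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


def confusion_counts(true_labels, predicted_labels):
    """Count (true, predicted) label pairs, i.e. a sparse confusion matrix."""
    counts = {}
    for t, p in zip(true_labels, predicted_labels):
        counts[(t, p)] = counts.get((t, p), 0) + 1
    return counts
```

Comparing accuracy and the confusion counts of Naive Bayes and Random
Forests on the same test split tells you more than comparing numbers from
two differently prepared runs.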

> I am able to run the Bayes classifier on the data-set - from the example
> tutorial. I am kind of lost after that. Also, I am interested in
> writing/building my own model later. Any example code to do that would also
> be helpful.

Well, to run Naive Bayes on your own data, all you need to do is replace the
dataset in the example with your own. Be sure you understand the input
format and prepare your data the same way.
The command-line tool does everything; at the moment there are no hidden
knobs other than those exposed as command-line parameters. The model you
generate will depend on your data and on these parameters. If you would like
to know about anything specific, or about good practices for the type of
data you have, you can ask on the Mahout mailing list.
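
As a rough illustration of "prepare your data the same way", here is a
minimal sketch that turns a directory tree of the form
root/<label>/<document> into one document per line. It assumes the layout
the example's preparation step produces is: the label, a tab, then the
document's whitespace-separated tokens. Check this against the files the
example actually writes before relying on it. The simple lowercase regex
tokenizer is a stand-in, not the analyzer the example uses, and all function
names here are hypothetical.

```python
import os
import re

# Stand-in tokenizer (assumption): lowercase runs of ASCII letters and digits.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text):
    """Lowercase the text and return runs of letters/digits as tokens."""
    return TOKEN_RE.findall(text.lower())


def format_document(label, text):
    """Build one line: label, a tab, then space-separated tokens (assumed format)."""
    return label + "\t" + " ".join(tokenize(text))


def format_corpus(root_dir, out_path):
    """Walk root_dir/<label>/<file> and write one formatted line per document."""
    with open(out_path, "w", encoding="utf-8") as out:
        for label in sorted(os.listdir(root_dir)):
            label_dir = os.path.join(root_dir, label)
            if not os.path.isdir(label_dir):
                continue  # skip stray files at the top level
            for name in sorted(os.listdir(label_dir)):
                path = os.path.join(label_dir, name)
                with open(path, encoding="utf-8", errors="ignore") as f:
                    out.write(format_document(label, f.read()) + "\n")
```

Keeping one document per line with the label first means the same file can
be split and fed to both classifiers without any further conversion.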


> Your suggestions are highly appreciated.
>
> Thanks,
> Srinivas.
>