Posted to user@mahout.apache.org by "Deshpande, Vikas" <Vi...@netapp.com> on 2012/03/30 17:29:00 UTC

A Mahout question

Hi All,

I am trying to use Mahout for a personal classification project. My input is a single large file in which each line holds tab-separated strings (words), and each line is one input to the classifier. In the training data, the first field on each line is the class. In the test data, that field is absent.
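
For concreteness, the line format described above could be read along the following lines. This is only a sketch of the data layout, not Mahout code: the function name `parse_line`, the `has_label` flag, and the sample lines are all made up for illustration.

```python
def parse_line(line, has_label=True):
    """Split one tab-separated line into (label, tokens)."""
    fields = line.rstrip("\n").split("\t")
    if has_label:
        # Training data: the first field is the class, the rest are the words.
        return fields[0], fields[1:]
    # Test data: there is no class field, so every field is a word.
    return None, fields


# Hypothetical sample lines, not taken from the real data set.
train_example = parse_line("sports\tgoal\tmatch\tleague")
test_example = parse_line("goal\tmatch\tleague", has_label=False)
```

Any real pipeline would still have to convert these (label, tokens) pairs into whatever input the chosen Mahout classifier expects.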

How should I proceed? I went through the 20-newsgroups example and tried to follow the code, but debugging it from the command line is difficult, and for some reason my Eclipse (Helios) does not like working with map/reduce applications (that's a whole new discussion).

Could somebody guide/help me with this?

I am running a 2-node Hadoop cluster using the Cloudera CDH3u3 distribution (virtual machine) on VMware Player.

Thanks in advance,
Vikas