Posted to user@mahout.apache.org by Simanchala Prasad Panigrahi <si...@gmail.com> on 2012/09/11 19:59:07 UTC

Need help on understanding Mahout classification algorithm

Hi,

I am a beginner with Mahout. For the last 3 weeks I have been working on
understanding the *Mahout classification algorithms.*

Here are my problems:

   - I have not been able to follow how it works.
   - I don't know what the input file format should be for the
   "trainclassifier" algorithm.
   - Do I need to parse my input data and convert it into a vector format?
   - How do I store and use the output/analysis?

I am using Mahout version 0.5, and the algorithms I am using are:

   - trainlogistic
   - runlogistic
   - trainclassifier
   - testclassifier

Suppose my task is to check whether the multiplication of two numbers is
correct, so my input data is:

*a,b,result,flag*
*2,2,4,true*
*2,2,5,false*

So it is in CSV format.

I want to understand it with a basic operation first, so I have chosen this
example.

Can anyone please help me with this?

*I am really working very hard but not getting any results.*

I have searched the internet, but I cannot find a proper solution anywhere.

Thank you,
Simanchala Panigrahi

Re: Need help on understanding Mahout classification algorithm

Posted by Salman Mahmood <sa...@influestor.com>.
First, let me state that I have not worked with Mahout 0.5. I have
0.7, and I am assuming it has the same examples.
In Mahout, the input to the training algorithm is always in vector
format. You may provide the input as text files, but they always get
converted to vectors. If you are working with the code instead of the
command line, you will notice that every input is converted into a
vector before the training statement. If I remember correctly, the
line you are looking for is encoder.addToVector(), where encoder is
of type StaticWordValueEncoder.
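
To make the "text gets converted to vectors" step concrete, here is a
minimal, library-independent sketch in Python. It is not Mahout's API:
it only assumes a hashing-style encoder, where each feature is hashed to
a position in a fixed-size vector. The names add_to_vector, encode_row,
load_examples and NUM_FEATURES are hypothetical, and the two rows are
the ones from the question.

```python
import csv
import io
import zlib

NUM_FEATURES = 16  # hypothetical vector size, chosen only for illustration


def add_to_vector(name, value, weight, vector):
    """Hash "name:value" to an index and add the weight at that position."""
    index = zlib.crc32(f"{name}:{value}".encode("utf-8")) % len(vector)
    vector[index] += weight


def encode_row(row, feature_names):
    """Turn one parsed CSV row into a fixed-size numeric vector."""
    vector = [0.0] * NUM_FEATURES
    for name in feature_names:
        add_to_vector(name, row[name], 1.0, vector)
    return vector


# The sample data from the question, kept inline so the sketch is self-contained.
CSV_TEXT = "a,b,result,flag\n2,2,4,true\n2,2,5,false\n"


def load_examples(text):
    """Parse CSV text into (vector, label) pairs; label is 1 for "true"."""
    examples = []
    for row in csv.DictReader(io.StringIO(text)):
        label = 1 if row["flag"] == "true" else 0
        examples.append((encode_row(row, ["a", "b", "result"]), label))
    return examples


examples = load_examples(CSV_TEXT)
for vector, label in examples:
    print(label, vector)
```

Each row becomes a vector of the same length regardless of how many
distinct values appear in the data, which is the main reason to hash
features instead of building a dictionary first.
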
The output of a training algorithm in the examples is in confusion
matrix format, so you may need to know how to interpret a confusion
matrix. Note that the confusion matrix output is only for testing
purposes. If you want to work with the output, there is a
classificationresult class in the testing examples; check its properties.
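
As a rough guide to reading a confusion matrix, the sketch below builds
one by hand in Python. It does not use Mahout, and the actual/predicted
labels are made-up values for illustration only. Rows are the actual
class, columns are the predicted class, and the diagonal holds the
correctly classified examples.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix where cell [i][j] counts items of actual class
    labels[i] that were predicted as labels[j]."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of examples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical test-set labels, not output from any real classifier run.
labels = ["true", "false"]
actual = ["true", "true", "false", "false", "false"]
predicted = ["true", "false", "false", "false", "true"]

matrix = confusion_matrix(actual, predicted, labels)
for label, row in zip(labels, matrix):
    print(label, row)
print("accuracy:", accuracy(matrix))
```

Off-diagonal cells show which classes get confused with each other,
which is usually more informative than the accuracy figure alone.
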
Hope this helps.

