Posted to user@mahout.apache.org by "Chandra Mohan, Ananda Vel Murugan" <An...@honeywell.com> on 2012/08/16 12:28:40 UTC

Encoding and vectorizing

I am a beginner in Mahout without much background in math. I want to know what an encoder and a vectorizer are in Mahout.

As far as I know, a vector can be thought of as an array or tuple containing the values of the attributes of the object that the vector represents.

I have test cell data from mechanical component testing. I create a CSV file with various details gathered from the test cell database. I want to run logistic regression on this data and predict the component's life based on the test cell data. I want to understand what vectorization and encoding mean in this context.

Any help would be greatly appreciated.

Regards,
Anand.C

Re: Encoding and vectorizing

Posted by Ted Dunning <te...@gmail.com>.
The thing to look at is the encoder framework
in org.apache.mahout.vectorizer.encoders.

See for instance

https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/vectorizer/encoders/StaticWordValueEncoder.java

Chapter 14 of Mahout in Action describes the process in more detail.  There
are examples in the Mahout distribution as well.
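
To give a feel for what such an encoder does with categorical data, here is a minimal sketch of the underlying idea (feature hashing) in plain Java. It is not the Mahout API: the class name, the method names, the "column=value" key format, and the use of String.hashCode are all assumptions made for illustration, and Mahout's own encoders use their own hash function and options. Each category value is hashed to a slot in a fixed-size vector and a weight of 1.0 is added there, so no dictionary of categories has to be built in advance.

```java
import java.util.Arrays;

// Illustrative only: a hand-rolled hashed encoder, not Mahout's implementation.
class HashedCategoryEncoder {
    private final int cardinality; // fixed length of every encoded vector

    HashedCategoryEncoder(int cardinality) {
        this.cardinality = cardinality;
    }

    // Map a (column, value) pair to a slot index in [0, cardinality).
    int indexFor(String column, String value) {
        return Math.floorMod((column + "=" + value).hashCode(), cardinality);
    }

    // Add weight 1.0 at the hashed slot of each categorical field in the row.
    double[] encode(String[] columns, String[] values) {
        double[] vector = new double[cardinality];
        for (int i = 0; i < columns.length; i++) {
            vector[indexFor(columns[i], values[i])] += 1.0;
        }
        return vector;
    }

    public static void main(String[] args) {
        HashedCategoryEncoder encoder = new HashedCategoryEncoder(16);
        String[] columns = {"material", "supplier"}; // hypothetical column names
        String[] values = {"steel", "acme"};         // hypothetical category values
        System.out.println(Arrays.toString(encoder.encode(columns, values)));
    }
}
```

Two different values can land in the same slot. A larger vector makes such collisions rarer at the cost of more model weights to learn.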

RE: Encoding and vectorizing

Posted by "Chandra Mohan, Ananda Vel Murugan" <An...@honeywell.com>.
Hi,

Almost all the data in my CSV file is categorical. Can you elaborate on what you mean by fancier footwork? Should I convert the categories into numbers and store them in the vector? Thanks!

Re: Encoding and vectorizing

Posted by Ted Dunning <te...@gmail.com>.
If your data is dense and numerical, then you don't need anything but
trivial encoding.  Just copy the values from your CSV file into the vector,
converting to numbers as you go.  If some of your data are categorical or
textual, you will need fancier footwork.
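
A minimal sketch of the trivial encoding for the dense, numerical case, written in plain Java with a double[] standing in for a Mahout vector. The class name, the method name, and the sample row are hypothetical, and the sketch assumes every field in the row is numeric and that fields contain no quoted commas.

```java
import java.util.Arrays;

// Illustrative only: copies numeric CSV fields straight into a dense vector.
class DenseCsvEncoder {
    // Parse one CSV row into a vector with one slot per column.
    static double[] encodeRow(String csvLine) {
        String[] fields = csvLine.split(",");
        double[] vector = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
            // Throws NumberFormatException if a field is not numeric.
            vector[i] = Double.parseDouble(fields[i].trim());
        }
        return vector;
    }

    public static void main(String[] args) {
        // Hypothetical test cell row: temperature, load, run hours.
        double[] vector = encodeRow("71.5, 1200, 350");
        System.out.println(Arrays.toString(vector));
    }
}
```

A categorical field such as a part type would make Double.parseDouble fail here, which is where the fancier footwork comes in.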
