Posted to user@mahout.apache.org by selva <se...@gmail.com> on 2011/12/21 08:06:23 UTC

Map/Reduce for mahout SGD Classification

Hi everyone,
           I modified the Mahout SGD code for my input format and generated
a model for my input. But I have a large amount of data, so I need a
map/reduce version of the SGD algorithm. Could you please suggest a solution
for this?

When will map/reduce be released for Mahout SGD classification?
When will Mahout 0.6 be released?

Thanks,
Selva

Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by Lance Norskog <go...@gmail.com>.
Cool!

On Wed, Dec 28, 2011 at 2:39 PM, Ted Dunning <te...@gmail.com> wrote:
> In particular, see the
>
>
> https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig
>
> directory.
>
> On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <te...@gmail.com> wrote:
>
>> Yes.
>>
>> In the pig-vector thing I am working on, I have a nice way to specify
>> types and conversions.
>>
>> See https://github.com/tdunning/pig-vector
>>
>>
>> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <gs...@apache.org>wrote:
>>
>>> > When strings (or nominals) are converted to doubles, it seems to me
>>> > that the conversion adds additional irrelevant structure that I don't
>>> > want.  Depending on the order in which the strings are added, the
>>> > assigned doubles will vary.  Adjacent strings in the ordering will be
>>> > close together in the metric space/distance measure.  For example, if
>>> > "john" is 1, "bob" is 2, and "nancy" is 3, then john is closer to bob
>>> > than to nancy.  For nominals, that seems wrong.  Most users will
>>> > probably really want three binary attributes: one for john, one for
>>> > bob, and one for nancy.
>>> >
>>>
>>> We could perhaps use the SGD vector encoding stuff here?
>>>
>>
>>



-- 
Lance Norskog
goksron@gmail.com

Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by Ted Dunning <te...@gmail.com>.
In particular, see the


https://github.com/tdunning/pig-vector/tree/master/src/main/antlr3/org/apache/mahout/pig

directory.

On Wed, Dec 28, 2011 at 2:38 PM, Ted Dunning <te...@gmail.com> wrote:

> Yes.
>
> In the pig-vector thing I am working on, I have a nice way to specify
> types and conversions.
>
> See https://github.com/tdunning/pig-vector
>
>
> On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <gs...@apache.org>wrote:
>
>> > When strings (or nominals) are converted to doubles, it seems to me
>> > that the conversion adds additional irrelevant structure that I don't
>> > want.  Depending on the order in which the strings are added, the
>> > assigned doubles will vary.  Adjacent strings in the ordering will be
>> > close together in the metric space/distance measure.  For example, if
>> > "john" is 1, "bob" is 2, and "nancy" is 3, then john is closer to bob
>> > than to nancy.  For nominals, that seems wrong.  Most users will
>> > probably really want three binary attributes: one for john, one for
>> > bob, and one for nancy.
>> >
>>
>> We could perhaps use the SGD vector encoding stuff here?
>>
>
>

Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by Ted Dunning <te...@gmail.com>.
Yes.

In the pig-vector thing I am working on, I have a nice way to specify types
and conversions.

See https://github.com/tdunning/pig-vector

On Wed, Dec 28, 2011 at 1:55 PM, Grant Ingersoll <gs...@apache.org>wrote:

> > When strings (or nominals) are converted to doubles, it seems to me
> > that the conversion adds additional irrelevant structure that I don't
> > want.  Depending on the order in which the strings are added, the
> > assigned doubles will vary.  Adjacent strings in the ordering will be
> > close together in the metric space/distance measure.  For example, if
> > "john" is 1, "bob" is 2, and "nancy" is 3, then john is closer to bob
> > than to nancy.  For nominals, that seems wrong.  Most users will
> > probably really want three binary attributes: one for john, one for
> > bob, and one for nancy.
> >
>
> We could perhaps use the SGD vector encoding stuff here?
>

Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by Grant Ingersoll <gs...@apache.org>.
On Dec 23, 2011, at 6:21 PM, Donald A. Smith wrote:

> More on conversion from ARFF files:
> 
> Looking at the code in MapBackedARFFModel.java (below), each string in the document is assigned a separate double (converted from an integer value).  Nominals are treated similarly: each possible nominal/symbolic value is assigned an integer-valued double.
> 
> When strings (or nominals) are converted to doubles, it seems to me that the conversion adds additional irrelevant structure that I don't want.   Depending on the order in which the strings are added, the assigned doubles will vary.     Adjacent strings in the ordering will be close together in the metric space/distance measure.  For example, if "john" is 1, "bob" is 2, and "nancy" is 3, then john is 
> closer to bob than to nancy.    For nominals, that seems wrong.    Most users will probably really want three binary attributes: one for john, one for bob, and one for nancy.
> 

We could perhaps use the SGD vector encoding stuff here?  

> Am I correct that representing nominals and strings as doubles (in a single attribute) introduces distracting structure (distance relations)?  Maybe I'm missing something.
> 
> What I may want is to create a different attribute for each possible value of each component of the URL (counting from the left).   Attribute  component1_1 through component1_k  would be binary attributes representing the k possible values in the first component of the URL. Similarly for component2_1, ...  Weka has its own utility class for converting string attributes 
> to nominal attributes. That might give me what I want, for path based 
> data. I'd need to preprocess the data.

Or implement your own ARFFModel.

> 
> For URLs I have additional structure: ordering on the URL components.  But if I just wanted to represent a document as an unordered bag-of-words, then each possible string or nominal should become a separate binary attribute,   MapBackedARFFModel.java doesn't seem to do the right thing.

We can patch this if you have an alternate implementation.

> 
> Seems like a compressed binary format would be useful for representing such attributes, unless you also needed a count.
> 
>  Thanks, Don
> 
> --- On Wed, 12/21/11, Grant Ingersoll <gs...@apache.org> wrote:
> 
> 
>     From: Grant Ingersoll <gs...@apache.org>
>     Subject: Re: Will "mahout arff.vector" correctly convert string attributes?
>     To: user@mahout.apache.org
>     Date: Wednesday, December 21, 2011, 10:09 AM
> 
>     The javadocs on ARFFVectorIterable say:
>     * Attribute type handling:
>     * <ul>
>     * <li>Numeric -> As is</li>
>     * <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
>     * will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
>     * <li>Dates -> Convert to time as a long</li>
>     * <li>Strings -> Create a map of String -> long</li>
>     * </ul>
> 
>     The code for this is in MapBackedARFFModel which implements ARFFModel, so I suspect if it doesn't do exactly as you wish, it can be overridden.
> 
>     On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:
> 
>     > Weka's ARFF format allows string attributes.
>     >
>     >   @ATTRIBUTE userName string
>     >
>     > Will "mahout arff.vector" correctly handle conversion from such strings to vectors in such a way that the attribute will, effectively, be treated the same as a nominal attribute? That is, will the set of strings be converted into a set of nominal attributes (one for each possible string value)?
>     >
>     >   @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
>     >
>     > In general, will I lose any information by using arff.vector?
>     >
>     > For date attributes, will mahout insert derived attributes (hour of day, day of week)? I presume not and I presume I have to add them myself.
>     >
>     >  Thanks, Don
> 
>     --------------------------------------------
>     Grant Ingersoll
>     http://www.lucidimagination.com
> 
> 
> 
> 

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Re: Does mahout have nominal attributes?

Posted by Zach Richardson <za...@raveldata.com>.
1-of-n encoding.  That's it.

On Mon, Dec 26, 2011 at 4:36 PM, Ted Dunning <te...@gmail.com> wrote:

> Mahout uses 1-of-n encoding (aka Zach's bitmap) but stores these encodings
> all together in double vectors for consistency.
>
> In the hashed encoding, we do this, but all of the encoded variables live
> on top of each other in randomized and multiple locations in the encoded
> vector.  This sounds crazy, but works quite well.
>
> On Sun, Dec 25, 2011 at 9:18 PM, Zach Richardson <za...@raveldata.com>
> wrote:
>
> > In a way yes.
> >
> > Generally you want to convert nominal attributes to a "bitmap" (this has a
> > fancier name that is slipping my mind at the moment).  Where each "name" in
> > the nominal feature has a spot in the vector for being on or off.  In most
> > cases this should be set to one.  I am not aware of anything like that in
> > mahout for regular vector encoding.  You could reasonably easily write your
> > own.
> >
> > For instance if you have A, B, and C as the three possible values in your
> > nominal feature, you would encode
> >
> > A B C
> > 1 0 0 for A
> > 0 1 0 for B etc.
> >
> > However, if you are planning on using the SGD classifiers you can use the
> > Hash based encoding for Categorical / Nominal features through the
> > WordValueEncoder.
> >
> > Hope this helps.
> >
> > Zach
> >
> > On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith
> > <th...@yahoo.com>wrote:
> >
> > > I believe that vectorized attributes are stored as doubles in mahout.
> > > Are some attributes "nominal"? That is, for some attributes is the
> > > distance function such that any two unequal values are at distance 1?
> > >
> > > Looking at MapBackedARFFModel.java, I see that weka nominal attributes
> > > get converted to integer-valued doubles (1.0, 2.0, 3.0, ...).   Will the
> > > nominal with value 1.0 be closer to the nominal with value 2.0 than to
> > > the nominal with value 3.0?  Or is the distance between 1.0 and 3.0
> > > also 1?
> > >
> > >
> > >
> > >  Thanks, Don
> >
> >
> >
> >
> > --
> > Zach Richardson
> > Ravel, Co-founder
> > Austin, TX
> > zach@raveldata.com
> > 512.825.6031
> >
>



-- 
Zach Richardson
Ravel, Co-founder
Austin, TX
zach@raveldata.com
512.825.6031

Re: Does mahout have nominal attributes?

Posted by Ted Dunning <te...@gmail.com>.
Mahout uses 1-of-n encoding (aka Zach's bitmap) but stores these encodings
all together in double vectors for consistency.

In the hashed encoding, we do this, but all of the encoded variables live
on top of each other in randomized and multiple locations in the encoded
vector.  This sounds crazy, but works quite well.
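
As a rough illustration of the hashed encoding described above, here is a minimal plain-Java sketch. It is not Mahout's implementation: the class `HashedEncoderSketch`, its method names, and the use of CRC32 as the hash are all assumptions made for this example. Each (variable, value) pair is hashed to several "probe" locations in one shared vector, so different variables can land on top of each other.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical sketch of hashed feature encoding; NOT Mahout's implementation.
final class HashedEncoderSketch {

    /** Picks a slot in [0, size) for one probe of a (variable, value) pair. */
    static int slotFor(String variable, String value, int probe, int size) {
        CRC32 crc = new CRC32();
        crc.update((variable + ":" + value + "#" + probe).getBytes(StandardCharsets.UTF_8));
        // getValue() is a non-negative long, so the remainder is a valid index.
        return (int) (crc.getValue() % size);
    }

    /** Adds 1.0 at each of `probes` hashed locations; collisions simply accumulate. */
    static void addToVector(String variable, String value, int probes, double[] vector) {
        for (int probe = 0; probe < probes; probe++) {
            vector[slotFor(variable, value, probe, vector.length)] += 1.0;
        }
    }

    public static void main(String[] args) {
        double[] vector = new double[32];
        addToVector("userName", "bob", 2, vector);   // two probes for one nominal value
        addToVector("city", "austin", 2, vector);    // a second variable shares the same vector
        System.out.println(Arrays.toString(vector));
    }
}
```

The vector size is fixed up front and does not grow with the number of distinct values; using more than one probe per value is what makes occasional collisions tolerable.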

On Sun, Dec 25, 2011 at 9:18 PM, Zach Richardson <za...@raveldata.com> wrote:

> In a way yes.
>
> Generally you want to convert nominal attributes to a "bitmap" (this has a
> fancier name that is slipping my mind at the moment).  Where each "name" in
> the nominal feature has a spot in the vector for being on or off.  In most
> cases this should be set to one.  I am not aware of anything like that in
> mahout for regular vector encoding.  You could reasonably easily write your
> own.
>
> For instance if you have A, B, and C as the three possible values in your
> nominal feature, you would encode
>
> A B C
> 1 0 0 for A
> 0 1 0 for B etc.
>
> However, if you are planning on using the SGD classifiers you can use the
> Hash based encoding for Categorical / Nominal features through the
> WordValueEncoder.
>
> Hope this helps.
>
> Zach
>
> On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith
> <th...@yahoo.com>wrote:
>
> > I believe that vectorized attributes are stored as doubles in mahout.
> > Are some attributes "nominal"? That is, for some attributes is the
> > distance function such that any two unequal values are at distance 1?
> >
> > Looking at MapBackedARFFModel.java, I see that weka nominal attributes
> > get converted to integer-valued doubles (1.0, 2.0, 3.0, ...).   Will the
> > nominal with value 1.0 be closer to the nominal with value 2.0 than to
> > the nominal with value 3.0?  Or is the distance between 1.0 and 3.0 also
> > 1?
> >
> >
> >
> >  Thanks, Don
>
>
>
>
> --
> Zach Richardson
> Ravel, Co-founder
> Austin, TX
> zach@raveldata.com
> 512.825.6031
>

Re: Does mahout have nominal attributes?

Posted by Zach Richardson <za...@raveldata.com>.
In a way yes.

Generally you want to convert nominal attributes to a "bitmap" (this has a
fancier name that is slipping my mind at the moment).  Where each "name" in
the nominal feature has a spot in the vector for being on or off.  In most
cases this should be set to one.  I am not aware of anything like that in
mahout for regular vector encoding.  You could reasonably easily write your
own.

For instance if you have A, B, and C as the three possible values in your
nominal feature, you would encode

A B C
1 0 0 for A
0 1 0 for B etc.
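
The A/B/C table above could be written as a small helper along these lines. This is only a sketch: `OneOfNEncoder` is a hypothetical class made up for illustration, not something that ships with Mahout, and it assumes the full list of nominal values is known up front.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only; OneOfNEncoder is a hypothetical helper, not a Mahout class.
final class OneOfNEncoder {
    private final List<String> values;

    OneOfNEncoder(List<String> values) {
        this.values = values;
    }

    /** Returns a vector with 1.0 in the slot for the given value and 0.0 elsewhere. */
    double[] encode(String value) {
        int index = values.indexOf(value);
        if (index < 0) {
            throw new IllegalArgumentException("Unknown nominal value: " + value);
        }
        double[] vector = new double[values.size()];
        vector[index] = 1.0;
        return vector;
    }

    public static void main(String[] args) {
        OneOfNEncoder encoder = new OneOfNEncoder(Arrays.asList("A", "B", "C"));
        System.out.println(Arrays.toString(encoder.encode("A"))); // [1.0, 0.0, 0.0]
        System.out.println(Arrays.toString(encoder.encode("B"))); // [0.0, 1.0, 0.0]
    }
}
```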

However, if you are planning on using the SGD classifiers you can use the
Hash based encoding for Categorical / Nominal features through the
WordValueEncoder.

Hope this helps.

Zach

On Sun, Dec 25, 2011 at 10:18 PM, Donald A. Smith
<th...@yahoo.com>wrote:

> I believe that vectorized attributes are stored as doubles in mahout.  Are
> some attributes "nominal"? That is, for some attributes is the distance
> function such that any two unequal values are at distance 1?
>
> Looking at MapBackedARFFModel.java, I see that weka nominal attributes get
> converted to integer-valued doubles (1.0, 2.0, 3.0, ...).   Will the
> nominal with value 1.0 be closer to the nominal with value 2.0 than to
> the nominal with value 3.0?  Or is the distance between 1.0 and 3.0 also 1?
>
>
>
>  Thanks, Don




-- 
Zach Richardson
Ravel, Co-founder
Austin, TX
zach@raveldata.com
512.825.6031

Does mahout have nominal attributes?

Posted by "Donald A. Smith" <th...@yahoo.com>.
I believe that vectorized attributes are stored as doubles in mahout.  Are some 
attributes "nominal"? That is, for some attributes is the distance 
function such that any two unequal values are at distance 1?   

Looking at MapBackedARFFModel.java, I see that weka nominal attributes get
converted to integer-valued doubles (1.0, 2.0, 3.0, ...).   Will the 
nominal with value 1.0 be closer to the nominal with value 2.0 than to 
the nominal with value 3.0?  Or is the distance between 1.0 and 3.0 also 1?



 Thanks, Don
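
The distance question above can be checked with a few lines of arithmetic. The sketch below assumes a plain Euclidean distance (the thread does not say which distance measure is in use) and compares ordinal codes 1.0, 2.0, 3.0 against one binary attribute per value; `NominalDistanceSketch` is a hypothetical name used only here.

```java
// Hypothetical sketch: Euclidean distance under ordinal vs. one-attribute-per-value coding.
final class NominalDistanceSketch {

    static double euclidean(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Ordinal codes: a single attribute holding 1.0, 2.0 or 3.0.
        System.out.println(euclidean(new double[] {1.0}, new double[] {2.0})); // 1.0
        System.out.println(euclidean(new double[] {1.0}, new double[] {3.0})); // 2.0

        // One binary attribute per value: every pair of distinct values is equally far apart.
        double[] first = {1.0, 0.0, 0.0};
        double[] second = {0.0, 1.0, 0.0};
        double[] third = {0.0, 0.0, 1.0};
        System.out.println(euclidean(first, second) == euclidean(first, third)); // true
    }
}
```

So with a single integer-valued attribute, the value coded 1.0 is indeed closer to 2.0 than to 3.0, while separate binary attributes remove that ordering.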

Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by "Donald A. Smith" <th...@yahoo.com>.
Forgot to include the code from MapBackedARFFModel.java:

  protected double processString(String data) {
    data = QUOTE_PATTERN.matcher(data).replaceAll("");
    // map it to a long
    Long theLong = words.get(data);
    if (theLong == null) {
      theLong = wordCount++;
      words.put(data, theLong);
    }
    return theLong;
  }


--- On Fri, 12/23/11, Donald A. Smith <th...@yahoo.com> wrote:

From: Donald A. Smith <th...@yahoo.com>
Subject: Re: Will "mahout arff.vector"  correctly convert string attributes?
To: user@mahout.apache.org
Date: Friday, December 23, 2011, 3:21 PM

More on conversion from ARFF files:

Looking at the code in MapBackedARFFModel.java (below), each string in the document is assigned a separate double (converted from an integer value).  Nominals are treated similarly: each possible nominal/symbolic value is assigned an integer-valued double.

When strings (or nominals) are converted to doubles, it seems to me that the conversion adds additional irrelevant structure that I don't want.   Depending on the order in which the strings are added, the assigned doubles will vary.     Adjacent strings in the ordering will be close together in the metric space/distance measure.  For example, if "john" is 1, "bob" is 2, and "nancy" is 3, then john is 
closer to bob than to nancy.    For nominals, that seems wrong.    Most users will probably really want three binary attributes: one for john, one for bob, and one for nancy.

Am I correct that representing nominals and strings as doubles (in a single attribute) introduces distracting structure (distance relations)?  Maybe I'm missing something.

What I may want is to create a different attribute for each possible value of each component of the URL (counting from the left).   Attribute  component1_1 through component1_k  would be binary attributes representing the k possible values in the first component of the URL. Similarly for component2_1, ...  Weka has its own utility class for converting string attributes 
to nominal attributes. That might give me what I want, for path based 
data. I'd need to preprocess the data.

For URLs I have additional structure: ordering on the URL components.  But if I just wanted to represent a document as an unordered bag-of-words, then each possible string or nominal should become a separate binary attribute,   MapBackedARFFModel.java doesn't seem to do the right thing.

Seems like a compressed binary format would be useful for representing such attributes, unless you also needed a count.

 Thanks, Don

--- On Wed, 12/21/11, Grant Ingersoll <gs...@apache.org> wrote:


    From: Grant Ingersoll <gs...@apache.org>
    Subject: Re: Will "mahout arff.vector" correctly convert string attributes?
    To: user@mahout.apache.org
    Date: Wednesday, December 21, 2011, 10:09 AM

    The javadocs on ARFFVectorIterable say:
    * Attribute type handling:
    * <ul>
    * <li>Numeric -> As is</li>
    * <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
    * will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
    * <li>Dates -> Convert to time as a long</li>
    * <li>Strings -> Create a map of String -> long</li>
    * </ul>

    The code for this is in MapBackedARFFModel which implements ARFFModel, so I suspect if it doesn't do exactly as you wish, it can be overridden.

    On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:

    > Weka's ARFF format allows string attributes.
    >
    >   @ATTRIBUTE userName string
    >
    > Will "mahout arff.vector" correctly handle conversion from such strings to vectors in such a way that the attribute will, effectively, be treated the same as a nominal attribute? That is, will the set of strings be converted into a set of nominal attributes (one for each possible string value)?
    >
    >   @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
    >
    > In general, will I lose any information by using arff.vector?
    >
    > For date attributes, will mahout insert derived attributes (hour of day, day of week)? I presume not and I presume I have to add them myself.
    >
    >  Thanks, Don

    --------------------------------------------
    Grant Ingersoll
    http://www.lucidimagination.com





Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by "Donald A. Smith" <th...@yahoo.com>.
More on conversion from ARFF files:

Looking at the code in MapBackedARFFModel.java (below), each string in the document is assigned a separate double (converted from an integer value).  Nominals are treated similarly: each possible nominal/symbolic value is assigned an integer-valued double.

When strings (or nominals) are converted to doubles, it seems to me that the conversion adds additional irrelevant structure that I don't want.   Depending on the order in which the strings are added, the assigned doubles will vary.     Adjacent strings in the ordering will be close together in the metric space/distance measure.  For example, if "john" is 1, "bob" is 2, and "nancy" is 3, then john is 
closer to bob than to nancy.    For nominals, that seems wrong.    Most users will probably really want three binary attributes: one for john, one for bob, and one for nancy.

Am I correct that representing nominals and strings as doubles (in a single attribute) introduces distracting structure (distance relations)?  Maybe I'm missing something.

What I may want is to create a different attribute for each possible value of each component of the URL (counting from the left).   Attribute  component1_1 through component1_k  would be binary attributes representing the k possible values in the first component of the URL. Similarly for component2_1, ...  Weka has its own utility class for converting string attributes 
to nominal attributes. That might give me what I want, for path based 
data. I'd need to preprocess the data.

For URLs I have additional structure: ordering on the URL components.  But if I just wanted to represent a document as an unordered bag-of-words, then each possible string or nominal should become a separate binary attribute,   MapBackedARFFModel.java doesn't seem to do the right thing.

Seems like a compressed binary format would be useful for representing such attributes, unless you also needed a count.

 Thanks, Don

--- On Wed, 12/21/11, Grant Ingersoll <gs...@apache.org> wrote:


    From: Grant Ingersoll <gs...@apache.org>
    Subject: Re: Will "mahout arff.vector" correctly convert string attributes?
    To: user@mahout.apache.org
    Date: Wednesday, December 21, 2011, 10:09 AM

    The javadocs on ARFFVectorIterable say:
    * Attribute type handling:
    * <ul>
    * <li>Numeric -> As is</li>
    * <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
    * will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
    * <li>Dates -> Convert to time as a long</li>
    * <li>Strings -> Create a map of String -> long</li>
    * </ul>

    The code for this is in MapBackedARFFModel which implements ARFFModel, so I suspect if it doesn't do exactly as you wish, it can be overridden.

    On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:

    > Weka's ARFF format allows string attributes.
    >
    >   @ATTRIBUTE userName string
    >
    > Will "mahout arff.vector" correctly handle conversion from such strings to vectors in such a way that the attribute will, effectively, be treated the same as a nominal attribute? That is, will the set of strings be converted into a set of nominal attributes (one for each possible string value)?
    >
    >   @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
    >
    > In general, will I lose any information by using arff.vector?
    >
    > For date attributes, will mahout insert derived attributes (hour of day, day of week)? I presume not and I presume I have to add them myself.
    >
    >  Thanks, Don

    --------------------------------------------
    Grant Ingersoll
    http://www.lucidimagination.com





Re: Will "mahout arff.vector" correctly convert string attributes?

Posted by Grant Ingersoll <gs...@apache.org>.
The javadocs on ARFFVectorIterable say:
* Attribute type handling:
 * <ul>
 * <li>Numeric -> As is</li>
 * <li>Nominal -> ordinal(value) i.e. @attribute lumber {'\'(-inf-0.5]\'','\'(0.5-inf)\''}
 * will convert -inf-0.5 -> 0, and 0.5-inf -> 1</li>
 * <li>Dates -> Convert to time as a long</li>
 * <li>Strings -> Create a map of String -> long</li>
 * </ul>

The code for this is in MapBackedARFFModel which implements ARFFModel, so I suspect if it doesn't do exactly as you wish, it can be overridden.

On Dec 21, 2011, at 12:37 PM, Donald A. Smith wrote:

> Weka's ARFF format allows string attributes.
> 
>   @ATTRIBUTE userName string
> 
> Will "mahout arff.vector" correctly handle conversion from such strings to vectors in such a way that the attribute will, effectively, be treated the same as a nominal attribute? That is, will the set of strings be converted into a set of nominal attributes (one for each possible string value)?
> 
>   @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}
> 
> In general, will I lose any information by using arff.vector?
> 
> For date attributes, will mahout insert derived attributes (hour of day, day of week)? I presume not and I presume I have to add them myself.
> 
>  Thanks, Don

--------------------------------------------
Grant Ingersoll
http://www.lucidimagination.com




Will "mahout arff.vector" correctly convert string attributes?

Posted by "Donald A. Smith" <th...@yahoo.com>.
Weka's ARFF format allows string attributes.

  @ATTRIBUTE userName string

Will "mahout arff.vector" correctly handle conversion from such strings to vectors in such a way that the attribute will, effectively, be treated the same as a nominal attribute? That is, will the set of strings be converted into a set of nominal attributes (one for each possible string value)?

  @ATTRIBUTE userName {bob, fred, harry, jill, betsy, george, bill}

In general, will I lose any information by using arff.vector?

For date attributes, will mahout insert derived attributes (hour of day, day of week)? I presume not and I presume I have to add them myself.

 Thanks, Don
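
If derived date attributes such as hour of day and day of week have to be added by hand, a preprocessing step could look roughly like the sketch below. It assumes the date is available as epoch milliseconds and that UTC is the right time zone; both are assumptions for this example, and `DateFeaturesSketch` is a hypothetical helper, not part of Mahout.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

// Hypothetical preprocessing helper; assumes epoch milliseconds and UTC.
final class DateFeaturesSketch {

    /** Returns {hourOfDay (0-23), dayOfWeek (1 = Monday ... 7 = Sunday)}. */
    static double[] derive(long epochMillis) {
        ZonedDateTime time = Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC);
        return new double[] {time.getHour(), time.getDayOfWeek().getValue()};
    }

    public static void main(String[] args) {
        long millis = Instant.parse("2011-12-21T08:06:23Z").toEpochMilli();
        double[] features = derive(millis);
        // 2011-12-21 was a Wednesday, so this prints hour=8.0 dayOfWeek=3.0
        System.out.println("hour=" + features[0] + " dayOfWeek=" + features[1]);
    }
}
```

Note that hour of day and day of week are themselves cyclic or nominal, so the same concern about treating them as plain doubles applies to these derived attributes.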

Re: Map/Reduce for mahout SGD Classification

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Dec 20, 2011 at 11:06 PM, selva <se...@gmail.com> wrote:

>           I modified the Mahout SGD code for my input format and generated
> a model for my input. But I have a large amount of data, so I need a
> map/reduce version of the SGD algorithm. Could you please suggest a solution
> for this?
>

One useful solution is to use map-reduce to encode your data as vectors.

That will make training much faster since much of the CPU time is required
only for encoding.

Also, if you would like more help, you really need to tell us more about
the data, the size and the shape for instance.  Without specific questions,
you can't expect to get specific answers.

Re: Map/Reduce for mahout SGD Classification

Posted by Lance Norskog <go...@gmail.com>.
https://issues.apache.org/jira/browse/MAHOUT-918 is a project to do
this. It is not checked in.

What is the nature of your data? It may be that you can train from a
sampled subset of your full corpus.
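
One simple way to draw such a sample in a single pass, without knowing the corpus size in advance, is reservoir sampling. The sketch below is a generic illustration of that technique, not something taken from Mahout or MAHOUT-918; the class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical helper: uniform sample of up to k items from a stream (reservoir sampling).
final class ReservoirSampleSketch {

    static <T> List<T> sample(Iterable<T> items, int k, Random random) {
        List<T> reservoir = new ArrayList<>(k);
        long seen = 0;
        for (T item : items) {
            seen++;
            if (reservoir.size() < k) {
                // Fill the reservoir with the first k items.
                reservoir.add(item);
            } else {
                // Keep the new item with probability k / seen, replacing a random slot.
                long j = (long) (random.nextDouble() * seen); // uniform in [0, seen)
                if (j < k) {
                    reservoir.set((int) j, item);
                }
            }
        }
        return reservoir;
    }

    public static void main(String[] args) {
        List<Integer> corpus = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            corpus.add(i);
        }
        List<Integer> subset = sample(corpus, 10, new Random(42));
        System.out.println(subset.size()); // 10
    }
}
```

Whether a sample is good enough depends on the data; rare classes in particular may need stratified sampling rather than a uniform draw.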

On Tue, Dec 20, 2011 at 11:06 PM, selva <se...@gmail.com> wrote:
> Hi everyone,
>           I modified the Mahout SGD code for my input format and generated
> a model for my input. But I have a large amount of data, so I need a
> map/reduce version of the SGD algorithm. Could you please suggest a solution
> for this?
>
> When will map/reduce be released for Mahout SGD classification?
> When will Mahout 0.6 be released?
>
> Thanks,
> Selva



-- 
Lance Norskog
goksron@gmail.com

Re: Map/Reduce for mahout SGD Classification

Posted by Isabel Drost <is...@apache.org>.
On 21.12.2011 Ted Dunning wrote:
> On Tue, Dec 20, 2011 at 11:06 PM, selva <se...@gmail.com> wrote:
> > When will map/reduce be released for Mahout SGD classification?
> 
> Probably 0.6
> 
> > When will Mahout 0.6 be released?
> 
> Q1 of 2012

Valid for both: if you need the functionality sooner, any helping hand is
welcome, even if it just involves testing the patch.

Isabel

Re: Map/Reduce for mahout SGD Classification

Posted by Ted Dunning <te...@gmail.com>.
On Tue, Dec 20, 2011 at 11:06 PM, selva <se...@gmail.com> wrote:

> When will map/reduce be released for Mahout SGD classification?
>

Probably 0.6


> When will Mahout 0.6 be released?
>

Q1 of 2012