Posted to user@mahout.apache.org by ParvathyPillai <pa...@gmail.com> on 2012/05/22 12:31:04 UTC

Forecasting in Mahout

I am currently working on a project that deals with demand forecasting and
machine learning on Hadoop. I came across Mahout while researching this.
From the various tutorials and the 'Mahout in Action' book, I understand
that the classification algorithms in Mahout allow continuous predictor
variables but need the target variable to be categorical. Is it possible to
apply these classification algorithms to predict the value of a continuous
variable, such as demand? If so, how?

--
View this message in context: http://lucene.472066.n3.nabble.com/Forecasting-in-Mahout-tp3985365.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Forecasting in Mahout

Posted by Nick Pentreath <ni...@gmail.com>.
Depending on your data size, if you need a distributed algorithm, you might
consider Vowpal Wabbit (https://github.com/JohnLangford/vowpal_wabbit/wiki).
It supports a squared loss function, which makes it usable for regression.

Disclaimer: I have not used VW in distributed mode, but supposedly it can
handle just about any scale you want: http://hunch.net/?p=2094
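
To illustrate why a squared loss gives you regression, here is a minimal
sketch in plain Python of online gradient descent on squared error. It is
not Vowpal Wabbit's code or API; the function names, learning rate, epoch
count, and toy data are all assumptions made for the example.

```python
def sgd_squared_loss(rows, targets, learning_rate=0.05, epochs=500):
    """Fit weights w and bias b that minimise (w.x + b - y)^2 by online SGD."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, targets):
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = pred - y  # gradient of 0.5 * err^2 with respect to pred
            for i in range(n_features):
                w[i] -= learning_rate * err * x[i]
            b -= learning_rate * err
    return w, b


def predict(w, b, x):
    """Continuous prediction: the raw linear score, with no thresholding."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


# Hypothetical toy data generated from demand = 3 * x + 2 (no noise).
rows = [[0.0], [1.0], [2.0], [3.0], [4.0]]
targets = [2.0, 5.0, 8.0, 11.0, 14.0]
w, b = sgd_squared_loss(rows, targets)
```

The output is a real number rather than a class label, which is what
separates this from a classifier trained with a logistic or hinge loss.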


On Wed, May 23, 2012 at 8:50 AM, Ted Dunning <te...@gmail.com> wrote:

> Counts as machine learning to me!

Re: Forecasting in Mahout

Posted by Ted Dunning <te...@gmail.com>.
Logistic regression can, strictly speaking, be used for regression of
probabilities.  The Mahout implementation assumes that all of the inputs
are 0 or 1.

It is, however, still a regression method.

Logistic and linear regression are unified under the scheme of generalized
linear modeling.  There are other forms as well, such as probit or Poisson
regression.  Each form has a natural kind of input and implies a different
kind of error process.
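
A small sketch of that unification, in plain Python: every form shares the
same linear predictor and differs only in the inverse link applied to it.
The names below (INVERSE_LINKS, glm_mean) are made up for illustration and
are not part of Mahout or any other library. The error processes noted in
the comments are the conventional pairings for each form.

```python
import math


def linear_predictor(weights, bias, x):
    """eta = w.x + b, common to every generalized linear model."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


# Inverse link functions map eta to the expected value of the response.
INVERSE_LINKS = {
    # identity link: real-valued response, Gaussian errors
    "linear": lambda eta: eta,
    # logit link: probability of a 0/1 response, Bernoulli errors
    "logistic": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
    # probit link: also a probability, via the standard normal CDF
    "probit": lambda eta: 0.5 * (1.0 + math.erf(eta / math.sqrt(2.0))),
    # log link: non-negative expected count, Poisson errors
    "poisson": lambda eta: math.exp(eta),
}


def glm_mean(family, weights, bias, x):
    """Expected response under the chosen family for one input vector."""
    return INVERSE_LINKS[family](linear_predictor(weights, bias, x))
```

For a continuous target such as demand, the "linear" entry is the relevant
one; if demand is a count, the "poisson" entry may be a better fit.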

On Wed, May 23, 2012 at 7:01 AM, Philippe Adjiman <ad...@gmail.com> wrote:

> I think you are mixing up logistic regression (which is a classifier, as
> Ted mentioned) with (multivariable) linear regression.
> What you need is the latter.
> R, Matlab, or Octave (others exist too) are great options if the size of
> your data is tractable.

Re: Forecasting in Mahout

Posted by Philippe Adjiman <ad...@gmail.com>.
I think you are mixing up logistic regression (which is a classifier, as
Ted mentioned) with (multivariable) linear regression.
What you need is the latter.
R, Matlab, or Octave (others exist too) are great options if the size of
your data is tractable.
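
For data that fits in memory, multivariable linear regression can be
sketched in a few lines. The example below is plain Python that solves the
normal equations (X'X) beta = X'y by Gaussian elimination. The function
names and the toy data are assumptions for illustration; in practice you
would use the built-in fitting routines in R or Octave.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    # Augmented matrix, so row operations apply to both sides at once.
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear_regression(rows, targets):
    """Return [intercept, coef_1, ..., coef_k] minimising squared error."""
    design = [[1.0] + list(row) for row in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical toy data generated from y = 1 + 2*x1 + 3*x2 (no noise).
rows = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
targets = [1, 3, 4, 6, 8]
coefs = fit_linear_regression(rows, targets)
```

The normal equations are easy to read but can be numerically fragile with
nearly collinear predictors, so treat this as a teaching sketch only.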


On Wed, May 23, 2012 at 9:57 AM, Paritosh Ranjan <pr...@xebia.com> wrote:

> Please correct me if I am wrong.
>
> I see Logistic Regression and Locally Weighted Linear Regression on the
> algorithms page. Can't they be used for predicting the value of continuous
> variables?
>
> https://cwiki.apache.org/confluence/display/MAHOUT/Logistic+Regression
> https://cwiki.apache.org/confluence/display/MAHOUT/Locally+Weighted+Linear+Regression


-- 
Philippe Adjiman | Research Engineer @appsfire | twitter: padjiman |
linkedin: il.linkedin.com/in/philippeadjiman | blog:
http://philippeadjiman.com/blog

Re: Forecasting in Mahout

Posted by Paritosh Ranjan <pr...@xebia.com>.
Please correct me if I am wrong.

I see Logistic Regression and Locally Weighted Linear Regression on the
algorithms page. Can't they be used for predicting the value of
continuous variables?

https://cwiki.apache.org/confluence/display/MAHOUT/Logistic+Regression
https://cwiki.apache.org/confluence/display/MAHOUT/Locally+Weighted+Linear+Regression
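
For reference, here is a minimal sketch of the general idea behind locally
weighted linear regression with a single predictor, in plain Python. It
does not reflect what Mahout provides; the function name, the Gaussian
kernel, and the bandwidth default are assumptions for illustration.

```python
import math


def lwlr_predict(query, xs, ys, bandwidth=1.0):
    """Predict y at `query` from a line fitted with kernel-weighted points."""
    # Gaussian kernel: training points near the query get larger weights.
    ws = [math.exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw  # weighted mean of x
    my = sum(w * y for w, y in zip(ws, ys)) / sw  # weighted mean of y
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (query - mx)


# Hypothetical toy data lying exactly on y = 2*x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
estimate = lwlr_predict(2.5, xs, ys)
```

A separate weighted fit is made for every query point, so the prediction
is a continuous value, but the cost per prediction grows with the size of
the training set.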


On 23-05-2012 12:20, Ted Dunning wrote:
> Counts as machine learning to me!


Re: Forecasting in Mahout

Posted by Ted Dunning <te...@gmail.com>.
Counts as machine learning to me!

On Wed, May 23, 2012 at 12:57 AM, Jason Xin <Ja...@sas.com> wrote:

> A 'regular' regression may not qualify as machine learning, although
> machines can definitely learn a regular regression. If the data set is too
> large, R may crash. That is true of most R programs today.

RE: Forecasting in Mahout

Posted by Jason Xin <Ja...@sas.com>.
A 'regular' regression may not qualify as machine learning, although machines can definitely learn a regular regression. If the data set is too large, R may crash. That is true of most R programs today.

-----Original Message-----
From: Ted Dunning [mailto:ted.dunning@gmail.com] 
Sent: Tuesday, May 22, 2012 7:34 PM
To: user@mahout.apache.org
Cc: mahout-user@lucene.apache.org
Subject: Re: Forecasting in Mahout

That is regression, not classification.  There are no good regression algorithms in Mahout just now.

How large is your data?  Is R not an option?


Re: Forecasting in Mahout

Posted by Ted Dunning <te...@gmail.com>.
That is regression, not classification.  There are no good regression
algorithms in Mahout just now.

How large is your data?  Is R not an option?

On Tue, May 22, 2012 at 10:31 AM, ParvathyPillai
<pa...@gmail.com> wrote:

> I am currently working on a project that deals with demand forecasting and
> machine learning on Hadoop. I came across Mahout while researching this.
> From the various tutorials and the 'Mahout in Action' book, I understand
> that the classification algorithms in Mahout allow continuous predictor
> variables but need the target variable to be categorical. Is it possible to
> apply these classification algorithms to predict the value of a continuous
> variable, such as demand? If so, how?