Posted to user@mahout.apache.org by ihadanny <id...@gmail.com> on 2010/07/19 10:29:30 UTC

Newbie questions about Mahout 228: Logistic Regression LR (SGD)

Hey,
I've been trying out mahout-228: Sequential LR (using SGD).
There are a few things I haven't been able to figure out:

1. Is there a parallel version? Can it integrate with Hadoop and run each
pass in parallel?

2. Weighting - is there support for weighted samples? E.g. if I have 50
doughnuts with the same predictors and the same target color, must I feed 50
rows to OnlineLogisticRegression, or is there a way to feed one row with a
weight of 50?

3. Is it possible to define a stop condition instead of explicitly setting
the number of passes? E.g. stop when the Fisher test reaches a certain
value?

Thanks, and my apologies if these are really obvious.

Ido
-- 
View this message in context: http://lucene.472066.n3.nabble.com/Newbie-questions-about-Mahout-228-Logistic-Regression-LR-SGD-tp977968p977968.html
Sent from the Mahout User List mailing list archive at Nabble.com.

Re: Newbie questions about Mahout 228: Logistic Regression LR (SGD)

Posted by ihadanny <id...@gmail.com>.
Thanks! I'll try your suggestions for 2 and 3 and report back if successful.

Re: Newbie questions about Mahout 228: Logistic Regression LR (SGD)

Posted by Ted Dunning <te...@gmail.com>.
On Mon, Jul 19, 2010 at 1:29 AM, ihadanny <id...@gmail.com> wrote:

>
> I've been trying out mahout-228: Sequential LR (using SGD).
>

Thanks!


> Few things I haven't been able to figure out:
>
> 1. Is there a parallel version? Can it integrate with hadoop and do each
> pass in parallel?
>

Not really.  This is a difficulty with these fast sequential algorithms.

My current goal for the next few weeks is to tune up Mahout-228 to where it
can process a billion training examples in a few hours.  At that kind of
speed, who needs parallel?


> 2. Weighting - is there support for weighted samples? E.g. I have 50
> doughnuts with the same predictors and the same target color, must I feed
> 50 rows to OnlineLogisticRegression, isn't there a way to feed one line
> with a weight of 50?
>

Currently, weighting is not supported.  Easy to add, however, since it just
adjusts the learning rate.
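
To illustrate the point about the learning rate, here is a minimal sketch in
plain Python. It is not the Mahout API: the function name sgd_step, the
sample_weight parameter, and the doughnut feature values are all made up for
the example. It only shows that multiplying one example's gradient step by a
weight is the same as multiplying the learning rate for that example.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def sgd_step(weights, features, target, learning_rate, sample_weight=1.0):
    """One SGD update for binary logistic regression.

    sample_weight (a hypothetical parameter) scales the step, which is
    equivalent to scaling the learning rate for this single example.
    """
    prediction = sigmoid(sum(w * x for w, x in zip(weights, features)))
    step = learning_rate * sample_weight * (target - prediction)
    return [w + step * x for w, x in zip(weights, features)]


start = [0.0, 0.0]
doughnut = [1.0, 2.0]  # bias term plus one predictor (made-up values)

# One row fed once, versus the same row fed once with a weight of 50.
once = sgd_step(start, doughnut, 1, learning_rate=0.01)
weighted = sgd_step(start, doughnut, 1, learning_rate=0.01, sample_weight=50)

# The same update expressed purely as a larger learning rate.
rescaled = sgd_step(start, doughnut, 1, learning_rate=0.5)
```

Note that a single step with weight 50 is not identical to feeding the row 50
times, because the prediction changes after every real update. It is an
approximation that works best when the learning rate is small.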


>
> 3. Is it possible to define a stop condition instead of explicitly setting
> the number of passes. E.g. stop when the Fisher test reaches a certain
> value?
>

It is possible.  Since you control the training, you can do this easily. My
next few iterations will help address this as well by allowing on-the-fly
use of held-out data and on-line adaptation of hyper-parameters.
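
Because the caller drives the training loop, a stop condition can be written
around it. The sketch below is plain Python, not the Mahout API, and every
name in it (train_until_converged, tolerance, max_passes) is hypothetical. It
also swaps in a different criterion from the Fisher test mentioned in the
question: it stops when the log loss on held-out data no longer improves by
more than a tolerance. The data is synthetic and exists only for the example.

```python
import math
import random


def sigmoid(z):
    """Logistic function, clamped to avoid overflow in exp."""
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def log_loss(weights, rows):
    """Mean negative log-likelihood over (features, target) rows."""
    eps = 1e-12
    total = 0.0
    for features, target in rows:
        p = min(1 - eps, max(eps, predict(weights, features)))
        total -= target * math.log(p) + (1 - target) * math.log(1 - p)
    return total / len(rows)


def train_until_converged(train, held_out, learning_rate=0.1,
                          tolerance=1e-4, max_passes=100):
    """Run SGD passes until held-out loss improves by less than tolerance."""
    weights = [0.0] * len(train[0][0])
    previous = log_loss(weights, held_out)
    for passes in range(1, max_passes + 1):
        for features, target in train:
            error = target - predict(weights, features)
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, features)]
        current = log_loss(weights, held_out)
        if previous - current < tolerance:
            return weights, passes  # stop condition met
        previous = current
    return weights, max_passes  # fell back to the pass limit


# Synthetic data: a bias term plus one noisy predictor.
rng = random.Random(0)


def make_row():
    x = rng.uniform(-1, 1)
    target = 1 if x + rng.gauss(0, 0.3) > 0 else 0
    return [1.0, x], target


rows = [make_row() for _ in range(400)]
weights, passes = train_until_converged(rows[:300], rows[300:])
```

Any other statistic computed after each pass could replace log_loss in the
same loop. The max_passes limit is kept as a safety net in case the criterion
is never met.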


>
> Thanks, and my apologies if these are really obvious
>

They aren't!