Posted to user@mahout.apache.org by ihadanny <id...@gmail.com> on 2010/07/19 10:29:30 UTC
Newbie questions about Mahout 228: Logistic Regression LR (SGD)
Hey,
I've been trying out mahout-228: Sequential LR (using SGD).
A few things I haven't been able to figure out:
1. Is there a parallel version? Can it integrate with Hadoop and run each
pass in parallel?
2. Weighting - is there support for weighted samples? E.g. if I have 50
doughnuts with the same predictors and the same target color, must I feed 50
rows to OnlineLogisticRegression? Isn't there a way to feed one row with a
weight of 50?
3. Is it possible to define a stop condition instead of explicitly setting
the number of passes? E.g. stop when the Fisher test reaches a certain
value?
Thanks, and my apologies if these are really obvious.
Ido
--
View this message in context: http://lucene.472066.n3.nabble.com/Newbie-questions-about-Mahout-228-Logistic-Regression-LR-SGD-tp977968p977968.html
Sent from the Mahout User List mailing list archive at Nabble.com.
Re: Newbie questions about Mahout 228: Logistic Regression LR (SGD)
Posted by ihadanny <id...@gmail.com>.
Thanks! Will try your suggestions for 2 and 3 and will report back if successful.
Re: Newbie questions about Mahout 228: Logistic Regression LR (SGD)
Posted by Ted Dunning <te...@gmail.com>.
On Mon, Jul 19, 2010 at 1:29 AM, ihadanny <id...@gmail.com> wrote:
>
> I've been trying out mahout-228: Sequential LR (using SGD).
>
Thanks!
> Few things I haven't been able to figure out:
>
> 1. Is there a parallel version? Can it integrate with hadoop and do each
> pass in parallel?
>
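Not really. This is a difficulty with these fast sequential algorithms: each
update depends on the model produced by the previous one, so a pass does not
split naturally across machines. My current goal for the next few weeks is to
tune up Mahout-228 to the point where it can process a billion training
examples in a few hours. At that kind of speed, who needs parallel?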
> 2. Weighting - is there support for weighted samples? E.g. I have 50
> doughnuts with the same predictors and the same target color, must I feed 50
> rows to OnlineLogisticRegression, isn't there a way to feed one line with a
> weight of 50?
>
Currently, weighting is not supported. It would be easy to add, however, since
a weight just scales the learning rate for that example.
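To illustrate how a weight "just adjusts the learning rate", here is a minimal sketch. It assumes a plain-Python SGD logistic regression written only for this example: the class `WeightedSgdLogistic` and its `predict`/`train` methods are hypothetical and are not Mahout's `OnlineLogisticRegression` API. A weight of 50 multiplies the step size of that single update.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class WeightedSgdLogistic:
    """Hypothetical plain-Python SGD logistic regression with per-example
    weights. This is NOT Mahout's OnlineLogisticRegression API."""

    def __init__(self, num_features, learning_rate=0.1):
        self.beta = [0.0] * num_features
        self.learning_rate = learning_rate

    def predict(self, x):
        # Probability that the target is 1 for feature vector x.
        return sigmoid(sum(b * xi for b, xi in zip(self.beta, x)))

    def train(self, x, y, weight=1.0):
        # The log-likelihood gradient for one example is (y - p) * x;
        # the weight simply multiplies the learning rate for this update.
        p = self.predict(x)
        step = self.learning_rate * weight * (y - p)
        for i, xi in enumerate(x):
            self.beta[i] += step * xi
```

For the doughnut case this would be one call such as `model.train(x, y, weight=50)` instead of 50 calls. One weighted step only approximates 50 repeated rows, because repeated rows would recompute `p` between updates. With large weights the scaled step can overshoot, so capping it may be worthwhile.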
>
> 3. Is it possible to define a stop condition instead of explicitly setting
> the number of passes. E.g. stop when the Fisher test reaches a certain
> value?
>
It is possible. Since you control the training loop, you can do this easily.
My next few iterations will help address this as well by allowing on-the-fly
use of held-out data and on-line adaptation of hyper-parameters.
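Because the caller drives the passes, a stop condition is just a check between passes. The sketch below is plain Python, not Mahout code; `train_until_converged`, `held_out_log_likelihood`, `tolerance`, and `max_passes` are hypothetical names. It swaps Ido's Fisher-test example for a different criterion: stop when the average log-likelihood on held-out data improves by less than a tolerance, with a pass cap as a safety net.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(beta, x):
    return sum(b * xi for b, xi in zip(beta, x))


def held_out_log_likelihood(beta, data):
    # Average log-likelihood of labelled (x, y) pairs under coefficients beta.
    total = 0.0
    for x, y in data:
        p = min(max(sigmoid(dot(beta, x)), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(data)


def train_until_converged(train, held_out, num_features, learning_rate=0.1,
                          tolerance=1e-4, max_passes=100, seed=0):
    """Hypothetical helper: run SGD passes until the held-out log-likelihood
    improves by less than `tolerance`, or `max_passes` is reached.
    Returns (beta, passes_run, last_held_out_score)."""
    rng = random.Random(seed)
    beta = [0.0] * num_features
    best = held_out_log_likelihood(beta, held_out)
    order = list(train)
    for passes in range(1, max_passes + 1):
        rng.shuffle(order)  # visit training rows in a fresh random order each pass
        for x, y in order:
            step = learning_rate * (y - sigmoid(dot(beta, x)))
            for i, xi in enumerate(x):
                beta[i] += step * xi
        score = held_out_log_likelihood(beta, held_out)
        if score - best < tolerance:
            return beta, passes, score  # improvement has stalled: stop early
        best = score
    return beta, max_passes, best
```

Any other statistic computed on the held-out set, such as a Fisher test, could be checked at the same point in the loop.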
>
> Thanks, and my apologies if these are really obvious
>
They aren't!