Posted to user@mahout.apache.org by "Goldstein, Alex" <Al...@iqor.com> on 2012/10/29 18:18:27 UTC

SGD in Mahout

Hi, I hope someone can help me out.
At the company I work for we run SGD algorithms in STATA, and we have recently been testing Mahout and R because we need to run the model on a lot of data.
STATA has been the preference of the analytics group, and they are comfortable with its results.
An initial test in R gave similar results, but processing times were really slow in comparison.
Now we are trying out Mahout. Using trainlogistic with the input file and the correct target and predictor variables, the speed is great, but the results are way off from what we expected.
The coefficients of the function are not even close.

Can anyone point me in the right direction on how to write our own code to run the SGD algorithm in Mahout? I haven't found much documentation on this; even in the book Mahout in Action the documentation seems scarce.

In STATA the options for running it are very few. You simply run the logistic regression with the target variable and the predictor variables, and that's it.

I'm sure I'll need to write my own code for this, but I wanted some pointers from anyone who has worked with the SGD algorithm extensively.

Thanks

Alex

Re: SGD in Mahout

Posted by Lance Norskog <go...@gmail.com>.
This is a resource for other discussions about the SGD implementation in Mahout:

http://find.searchhub.org/?q=mahout+sgd

One important point is that this SGD implementation requires input data to be heavily randomized.
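
To illustrate the randomization point, here is a minimal sketch of shuffling the training rows before handing the file to the trainer. It is not part of Mahout. It assumes the input is a CSV file with a single header line, that it fits in memory, and that no field contains an embedded newline. The function names and file paths are hypothetical.

```python
import random


def shuffle_csv_rows(lines, seed=None):
    """Return the header line followed by the data lines in random order.

    `lines` is a list of strings without line endings; the first one is
    assumed to be the header and is kept in place.
    """
    if not lines:
        return []
    header, rows = lines[0], list(lines[1:])
    # A dedicated Random instance keeps the shuffle reproducible for a given seed.
    random.Random(seed).shuffle(rows)
    return [header] + rows


def shuffle_csv_file(src_path, dst_path, seed=None):
    """Write a copy of src_path to dst_path with the data rows shuffled."""
    with open(src_path) as src:
        lines = src.read().splitlines()
    shuffled = shuffle_csv_rows(lines, seed)
    with open(dst_path, "w") as dst:
        if shuffled:
            dst.write("\n".join(shuffled) + "\n")


if __name__ == "__main__":
    # Hypothetical file names; replace with the real training file.
    shuffle_csv_file("training.csv", "training_shuffled.csv", seed=42)
```

If the original file is sorted by the target variable or by date, a single-pass SGD learner sees long runs of similar examples, which is one plausible reason for coefficients that differ from a batch fit. Training on the shuffled copy removes that ordering effect.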
