Posted to dev@mahout.apache.org by "Suneel Marthi (JIRA)" <ji...@apache.org> on 2013/11/28 21:38:35 UTC

[jira] [Updated] (MAHOUT-1273) Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce

     [ https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suneel Marthi updated MAHOUT-1273:
----------------------------------

    Fix Version/s:     (was: 0.9)
                   1.0

Deferring to Release 1.0. [~kunyang@stanford.edu], feel free to bring this back to the 0.9 queue if you are around.

> Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce
> ----------------------------------------------------------------------------------------
>
>                 Key: MAHOUT-1273
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-1273
>             Project: Mahout
>          Issue Type: New Feature
>    Affects Versions: 0.9
>            Reporter: Kun Yang
>              Labels: documentation, features, patch, test
>             Fix For: 1.0
>
>         Attachments: Algorithm and Numeric Stability.pdf, Examples.pdf, Manual and Example.pdf, Manual and Example.pdf, Notes.pdf, PenalizedLinear.pdf, PenalizedLinearRegression.patch, java files.pdf
>
>   Original Estimate: 720h
>  Remaining Estimate: 720h
>
> Penalized linear regression methods such as Lasso and Elastic-net are widely used in machine learning, but there are no very efficient, scalable implementations of them on MapReduce.
> The published distributed algorithms for this problem are either iterative (a poor fit for MapReduce; see Stephen Boyd's paper) or approximate (a problem when exact solutions are needed; see parallelized stochastic gradient descent). Another disadvantage of these algorithms is that they cannot do cross validation in the training phase, so the user must specify the penalty parameter in advance.
> My approach can train the model with cross validation in a single pass. It is based on some simple observations.
> The core algorithm is a modified version of coordinate descent (see J. Friedman's paper). Friedman and his coauthors implemented a very efficient R package, "glmnet", which is the de facto standard for penalized regression.
> I implemented a primitive version of this algorithm at Alpine Data Labs.
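
The description above does not spell out the "simple observations", so the following is only one plausible reading, not the code in the attached patch. It assumes that a single pass accumulates per-fold sufficient statistics (sums of x*x^T, x*y and y*y), that the training statistics for a fold are obtained by adding up the other folds, and that lasso coordinate descent then runs on those small matrices without touching the data again. All function names, the round-robin fold assignment, and the omission of an intercept are assumptions made for illustration.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def accumulate(rows, n_folds):
    """One pass over (x, y) rows: per-fold sums of x*x^T, x*y, y*y and counts."""
    stats = None
    for i, (x, y) in enumerate(rows):
        p = len(x)
        if stats is None:
            stats = [{"n": 0, "xx": [[0.0] * p for _ in range(p)],
                      "xy": [0.0] * p, "yy": 0.0} for _ in range(n_folds)]
        s = stats[i % n_folds]  # round-robin fold assignment (an assumption)
        s["n"] += 1
        s["yy"] += y * y
        for j in range(p):
            s["xy"][j] += x[j] * y
            for k in range(p):
                s["xx"][j][k] += x[j] * x[k]
    return stats


def combine(stats, exclude=None):
    """Add up fold statistics, optionally leaving one fold out."""
    p = len(stats[0]["xy"])
    out = {"n": 0, "xx": [[0.0] * p for _ in range(p)], "xy": [0.0] * p, "yy": 0.0}
    for f, s in enumerate(stats):
        if f == exclude:
            continue
        out["n"] += s["n"]
        out["yy"] += s["yy"]
        for j in range(p):
            out["xy"][j] += s["xy"][j]
            for k in range(p):
                out["xx"][j][k] += s["xx"][j][k]
    return out


def lasso_cd(s, lam, max_iter=1000, tol=1e-10):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + lam*||b||_1 using only sums."""
    p, n = len(s["xy"]), s["n"]
    b = [0.0] * p
    for _ in range(max_iter):
        max_delta = 0.0
        for j in range(p):
            if s["xx"][j][j] == 0.0:
                continue  # constant-zero column: coefficient stays at 0
            rho = (s["xy"][j] - sum(s["xx"][j][k] * b[k]
                                    for k in range(p) if k != j)) / n
            new = soft_threshold(rho, lam) / (s["xx"][j][j] / n)
            max_delta = max(max_delta, abs(new - b[j]))
            b[j] = new
        if max_delta < tol:
            break
    return b


def sse(s, b):
    """Sum of squared errors on a fold, computed from its statistics alone."""
    p = len(b)
    quad = sum(b[j] * s["xx"][j][k] * b[k] for j in range(p) for k in range(p))
    return s["yy"] - 2.0 * sum(b[j] * s["xy"][j] for j in range(p)) + quad


def cross_validate(stats, lambdas):
    """Mean held-out squared error for each candidate penalty."""
    total_n = sum(s["n"] for s in stats)
    result = {}
    for lam in lambdas:
        err = 0.0
        for f, held_out in enumerate(stats):
            b = lasso_cd(combine(stats, exclude=f), lam)
            err += sse(held_out, b)
        result[lam] = err / total_n
    return result
```

In a MapReduce setting, each mapper could run something like `accumulate` on its split and a reducer could add the per-fold statistics together, since they are plain sums. The penalty would then be picked with `cross_validate(stats, lambdas)` and the final model fit with `lasso_cd(combine(stats), best_lambda)`.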


