Posted to dev@mahout.apache.org by Sebastian Schelter <ss...@apache.org> on 2013/08/01 00:05:19 UTC

Re: [jira] [Commented] (MAHOUT-1273) Single Pass Algorithm for Penalized Linear Regression with Cross Validation on MapReduce

What are you trying to do exactly?

2013/7/31 Kun Yang (JIRA) <ji...@apache.org>

>
>     [
> https://issues.apache.org/jira/browse/MAHOUT-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13725757#comment-13725757]
>
> Kun Yang commented on MAHOUT-1273:
> ----------------------------------
>
> "The file
> '/trunk/examples/src/main/java/org/apache/mahout/regression/penalizedlinear/LinearCrossValidation.java'
> could not be found in the repository"
>
> I always get this error message. Is the directory not correct?
>
> > Single Pass Algorithm for Penalized Linear Regression with Cross
> Validation on MapReduce
> >
> ----------------------------------------------------------------------------------------
> >
> >                 Key: MAHOUT-1273
> >                 URL: https://issues.apache.org/jira/browse/MAHOUT-1273
> >             Project: Mahout
> >          Issue Type: New Feature
> >    Affects Versions: 0.9
> >            Reporter: Kun Yang
> >              Labels: documentation, features, patch, test
> >             Fix For: 0.9
> >
> >         Attachments: Algorithm and Numeric Stability.pdf, java
> files.pdf, Manual and Example.pdf, PenalizedLinear.pdf,
> PenalizedLinearRegression.patch
> >
> >   Original Estimate: 720h
> >  Remaining Estimate: 720h
> >
> > Penalized linear regression methods such as Lasso and Elastic-net are
> widely used in machine learning, but there are no very efficient scalable
> implementations on MapReduce.
> > The published distributed algorithms for solving this problem are either
> iterative (which is not good for MapReduce, see Stephen Boyd's paper) or
> approximate (what if we need exact solutions, see Parallelized stochastic
> gradient descent); another disadvantage of these algorithms is that they
> cannot do cross validation in the training phase, so they require a
> user-specified penalty parameter in advance.
> > My ideas can train the model with cross validation in a single pass.
> They are based on some simple observations.
> > The core algorithm is a modified version of coordinate descent (see J.
> Friedman's paper). Its authors implemented a very efficient R package,
> "glmnet", which is the de facto standard for penalized regression.
> > I have implemented a primitive version of this algorithm at Alpine
> Data Labs.
>