Posted to issues@spark.apache.org by "Dong Wang (JIRA)" <ji...@apache.org> on 2014/05/05 05:50:14 UTC

[jira] [Updated] (SPARK-1682) Add gradient descent w/o sampling and RDA L1 updater

     [ https://issues.apache.org/jira/browse/SPARK-1682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dong Wang updated SPARK-1682:
-----------------------------

    Description: 
The GradientDescent optimizer samples the input before each gradient step. When the input data has already been shuffled beforehand, it is possible to scan the data once and take a gradient step for each instance, which could be more efficient.
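
A minimal sketch of the idea in plain Scala (illustrative only, not the actual GradientDescent code in this patch; the squared-loss gradient and all names are assumptions): scan the already-shuffled data once and take a step per instance, instead of sampling a mini-batch before each step.

    object PerInstanceSGD {
      // Gradient of squared loss for a linear model: (w.x - y) * x.
      def gradient(w: Array[Double], x: Array[Double], y: Double): Array[Double] = {
        val err = w.zip(x).map { case (wi, xi) => wi * xi }.sum - y
        x.map(_ * err)
      }

      // One sequential pass over already-shuffled data; no sampling step.
      def run(data: Seq[(Array[Double], Double)], stepSize: Double, dim: Int): Array[Double] = {
        val w = new Array[Double](dim)
        var t = 0
        for ((x, y) <- data) {
          t += 1
          val g = gradient(w, x, y)
          val eta = stepSize / math.sqrt(t) // decaying step size, as in MLlib's SGD
          var i = 0
          while (i < dim) { w(i) -= eta * g(i); i += 1 }
        }
        w
      }
    }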

Add an enhanced RDA L1 updater, which can produce even sparser solutions of quality comparable to standard L1 regularization. Reference:
Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization", Journal of Machine Learning Research 11 (2010), 2543-2596.
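
For reference, the closed-form l1-RDA update from the paper (my transcription; variable names are not from the patch): keep a running average g of the subgradients, and with beta_t = gamma*sqrt(t) set w_i = 0 wherever |g_i| <= lambda, and w_i = -(sqrt(t)/gamma) * (g_i - lambda*sign(g_i)) otherwise. The hard zeros are what give the sparsity. In Scala:

    // Closed-form l1-RDA weight update (Xiao 2010); names are mine, not MLlib's.
    def rdaL1Update(avgGradient: Array[Double], t: Int,
                    lambda: Double, gamma: Double): Array[Double] = {
      val scale = math.sqrt(t.toDouble) / gamma
      avgGradient.map { g =>
        if (math.abs(g) <= lambda) 0.0              // exact zeros => sparse solution
        else -scale * (g - lambda * math.signum(g)) // shrink the rest toward zero
      }
    }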

Small fix: add options to the BinaryClassification example to read and write a model file.
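
MLlib has no built-in model persistence at this point, so one simple way to back these options (a sketch only; the file format and helper names are assumptions, not the patch's implementation) is to store the intercept and weights as a single line of comma-separated text:

    import java.io.PrintWriter
    import scala.io.Source

    // Write the intercept followed by the weights as one CSV line.
    def writeModel(path: String, intercept: Double, weights: Array[Double]): Unit = {
      val pw = new PrintWriter(path)
      try pw.println((intercept +: weights).mkString(",")) finally pw.close()
    }

    // Read the line back into (intercept, weights).
    def readModel(path: String): (Double, Array[Double]) = {
      val src = Source.fromFile(path)
      try {
        val nums = src.getLines().next().split(",").map(_.toDouble)
        (nums.head, nums.tail)
      } finally src.close()
    }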


  was:
The LogisticRegressionWithSGD example does not expose the following capabilities that already exist inside MLlib:
  * reading svmlight data
  * regularizing with L1 and L2
  * adding an intercept
  * writing the model to a file
  * reading a model back and generating predictions

The GradientDescent optimizer samples the input before each gradient step. When the input data has already been shuffled beforehand, it is possible to scan the data once and take a gradient step for each instance, which could be more efficient.

        Summary: Add gradient descent w/o sampling and RDA L1 updater  (was: LogisticRegressionWithSGD should support svmlight data and gradient descent w/o sampling)

> Add gradient descent w/o sampling and RDA L1 updater
> ----------------------------------------------------
>
>                 Key: SPARK-1682
>                 URL: https://issues.apache.org/jira/browse/SPARK-1682
>             Project: Spark
>          Issue Type: Improvement
>          Components: MLlib
>    Affects Versions: 1.0.0
>            Reporter: Dong Wang
>             Fix For: 1.0.0
>
>
> The GradientDescent optimizer samples the input before each gradient step. When the input data has already been shuffled beforehand, it is possible to scan the data once and take a gradient step for each instance, which could be more efficient.
> Add an enhanced RDA L1 updater, which can produce even sparser solutions of quality comparable to standard L1 regularization. Reference:
> Lin Xiao, "Dual Averaging Methods for Regularized Stochastic Learning and Online Optimization", Journal of Machine Learning Research 11 (2010), 2543-2596.
> Small fix: add options to the BinaryClassification example to read and write a model file.



--
This message was sent by Atlassian JIRA
(v6.2#6252)