You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by dbtsai <gi...@git.apache.org> on 2014/08/09 01:31:49 UTC

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

GitHub user dbtsai opened a pull request:

    https://github.com/apache/spark/pull/1862

    [SPARK-2934][MLlib] Adding LogisticRegressionWithLBFGS Interface

    for training with LBFGS Optimizer which will converge faster than SGD.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/AlpineNow/spark dbtsai-lbfgs-lor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1862
    
----
commit 3cf50c207e79c5f67cd5d06ff3f85f3538c23081
Author: DB Tsai <db...@alpinenow.com>
Date:   2014-08-08T23:23:21Z

    LogisticRegressionWithLBFGS interface

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51674177
  
    QA results for PR 1862:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18236/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51668966
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18231/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51722306
  
    QA results for PR 1862:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18282/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51678112
  
    QA results for PR 1862:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18241/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51671204
  
    QA results for PR 1862:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18231/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023015
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    +    this.convergenceTol = tolerance
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(iters: Int): this.type = {
    --- End diff --
    
    argument names are part of the API. Shall we change it to `numIterations` to match others?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023075
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    +    this.convergenceTol = tolerance
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(iters: Int): this.type = {
    --- End diff --
    
    also the method name to `setNumIterations` (not exactly correct but consistent with others and shorter).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51672880
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18236/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51676717
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18240/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51677423
  
    QA results for PR 1862:<br>- This patch FAILED unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18240/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51671390
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18234/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023042
  
    --- Diff: mllib/src/test/scala/org/apache/spark/mllib/classification/LogisticRegressionSuite.scala ---
    @@ -76,22 +76,34 @@ class LogisticRegressionSuite extends FunSuite with LocalSparkContext with Match
     
         val testRDD = sc.parallelize(testData, 2)
         testRDD.cache()
    -    val lr = new LogisticRegressionWithSGD().setIntercept(true)
    -    lr.optimizer.setStepSize(10.0).setNumIterations(20)
     
    -    val model = lr.run(testRDD)
    +    val lrWithSGD = new LogisticRegressionWithSGD().setIntercept(true)
    +    lrWithSGD.optimizer.setStepSize(10.0).setNumIterations(20)
    +
    +    val lrWithLBFGS = new LogisticRegressionWithLBFGS().setIntercept(true)
    +
    +    val modelWithSGD = lrWithSGD.run(testRDD)
    +    val modelWithLBFGS = lrWithLBFGS.run(testRDD)
    --- End diff --
    
    For the unit tests. It might be better if we separate LBFGS tests from SGD tests since they are different units.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023077
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    +    this.convergenceTol = tolerance
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(iters: Int): this.type = {
    --- End diff --
    
    agreed! should we also change for the api in the optimizer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16022367
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,98 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  override val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  override protected def createModel(weights: Vector, intercept: Double) = {
    +    new LogisticRegressionModel(weights, intercept)
    +  }
    +}
    +
    +/**
    + * Top-level methods for calling Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +object LogisticRegressionWithLBFGS {
    --- End diff --
    
    @dbtsai Do you minding using setters only instead of static methods? It becomes really hard to maintain when we need to add more parameters.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51867619
  
    LGTM. Merged into both master and branch-1.1. (This is a much better algorithm than SGD and we have tested it since v1.0.) Thanks @dbtsai for adding it!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51674337
  
    QA results for PR 1862:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18238/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023299
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    +    this.convergenceTol = tolerance
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(iters: Int): this.type = {
    --- End diff --
    
    LBFGS.setMaxNumIterations


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51720881
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18282/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16024057
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,57 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to return new LBFGS object every time since users can reset the parameters anytime.
    +  override def optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(convergenceTol: Double): this.type = {
    +    this.convergenceTol = convergenceTol
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setNumIterations(numIterations: Int): this.type = setMaxNumIterations(numIterations)
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(maxNumIterations: Int): this.type = {
    --- End diff --
    
    Do we need both?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16024319
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,52 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to return new LBFGS object every time since users can reset the parameters anytime.
    +  override def optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(convergenceTol: Double): this.type = {
    +    this.convergenceTol = convergenceTol
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setNumIterations(maxNumIterations: Int): this.type = {
    --- End diff --
    
    `setNumIterations(numIterations: Int)` (to match others)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023020
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    --- End diff --
    
    `tolerance` -> `convergenceTol`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by asfgit <gi...@git.apache.org>.

Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/1862


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51720829
  
    Jenkins, retest this please.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51677448
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18241/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16022998
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    --- End diff --
    
    moving the this constructor to line 202?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by dbtsai <gi...@git.apache.org>.

Github user dbtsai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16022431
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,98 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  override val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  override protected def createModel(weights: Vector, intercept: Double) = {
    +    new LogisticRegressionModel(weights, intercept)
    +  }
    +}
    +
    +/**
    + * Top-level methods for calling Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +object LogisticRegressionWithLBFGS {
    --- End diff --
    
    I don't mind about this. However, it will cause inconsistent api compared with LogisticRegressionWithSGD


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16023088
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    +    this.convergenceTol = tolerance
    +    this
    +  }
    +
    +  /**
    +   * Set the maximal number of iterations for L-BFGS. Default 100.
    +   */
    +  def setMaxNumIterations(iters: Int): this.type = {
    --- End diff --
    
    which one?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by mengxr <gi...@git.apache.org>.

Github user mengxr commented on a diff in the pull request:

    https://github.com/apache/spark/pull/1862#discussion_r16022979
  
    --- Diff: mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala ---
    @@ -188,3 +188,54 @@ object LogisticRegressionWithSGD {
         train(input, numIterations, 1.0, 1.0)
       }
     }
    +
    +/**
    + * Train a classification model for Logistic Regression using Limited-memory BFGS.
    + * NOTE: Labels used in Logistic Regression should be {0, 1}
    + */
    +class LogisticRegressionWithLBFGS private (
    +    private var convergenceTol: Double,
    +    private var maxNumIterations: Int,
    +    private var regParam: Double)
    +  extends GeneralizedLinearAlgorithm[LogisticRegressionModel] with Serializable {
    +
    +  private val gradient = new LogisticGradient()
    +  private val updater = new SimpleUpdater()
    +  // Have to be lazy since users can change the parameters after the class is created.
    +  // PS, after the first train, the optimizer variable will be computed, so the parameters
    +  // can not be changed anymore.
    +  override lazy val optimizer = new LBFGS(gradient, updater)
    +    .setNumCorrections(10)
    +    .setConvergenceTol(convergenceTol)
    +    .setMaxNumIterations(maxNumIterations)
    +    .setRegParam(regParam)
    +
    +  override protected val validators = List(DataValidators.binaryLabelValidator)
    +
    +  /**
    +   * Construct a LogisticRegression object with default parameters
    +   */
    +  def this() = this(1E-4, 100, 0.0)
    +
    +  /**
    +   * Set the convergence tolerance of iterations for L-BFGS. Default 1E-4.
    +   * Smaller value will lead to higher accuracy with the cost of more iterations.
    +   */
    +  def setConvergenceTol(tolerance: Double): this.type = {
    --- End diff --
    
    Do we have a safe guard to prevent user reset this? Or we can override optimizer as a method and return the LBFGS with the latest set of parameters. The latter sounds better to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51672993
  
    QA results for PR 1862:<br>- This patch PASSES unit tests.<br>- This patch merges cleanly<br>- This patch adds no public classes<br><br>For more information see test ouptut:<br>https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18234/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org

[GitHub] spark pull request: [SPARK-2934][MLlib] Adding LogisticRegressionW...

Posted by SparkQA <gi...@git.apache.org>.

Github user SparkQA commented on the pull request:

    https://github.com/apache/spark/pull/1862#issuecomment-51673109
  
    QA tests have started for PR 1862. This patch merges cleanly. <br>View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/18238/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org