You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by hhbyyh <gi...@git.apache.org> on 2017/05/04 19:30:32 UTC

[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

GitHub user hhbyyh opened a pull request:

    https://github.com/apache/spark/pull/17862

    [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSVC

    ## What changes were proposed in this pull request?
    
    jira: https://issues.apache.org/jira/browse/SPARK-20602
    
    Currently LinearSVC in Spark only supports OWLQN as the optimizer ( check https://issues.apache.org/jira/browse/SPARK-14709). I made comparison between LBFGS and OWLQN on several public dataset and found LBFGS converges much faster for LinearSVC in most cases.
    The following table presents the number of training iterations and f1 score of both optimizers until convergence
    
    ![image](https://cloud.githubusercontent.com/assets/7981698/25721193/6d78005a-30c4-11e7-9431-9e8e4d17776f.png)
    
    data source: https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html
    training code: new LinearSVC().setMaxIter(10000).setTol(1e-6)
    LBFGS requires far less iterations in most cases and probably is a better default optimizer.
    
    ## How was this patch tested?
    
     strengthen existing unit tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/hhbyyh/spark svclbfgs

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/17862.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #17862
    
----
commit d46e5edfdf5d052c59677f58767bdbe0803dc368
Author: Yuhao Yang <yu...@intel.com>
Date:   2017-05-04T19:13:41Z

    add lbfgs as default optimizer of LinearSVC

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    +1 @WeichenXu123 IIRC softmax regression also include a non-derivable point, we can use LBFGS to solve it as well. We can support _squared hinge loss_ which is smooth function in the future, so users can switch to _squared hinge loss_ if they hit the condition with little probability.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78019 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78019/testReport)** for PR 17862 at commit [`2ca5a74`](https://github.com/apache/spark/commit/2ca5a7456f7dc5ea4473ddee2933bd6228b3476e).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Yes, Both LBFGS and OWLQN generate similar model with sklearn if without intercept.
    
    About replacing OWLQN with LBFGS, I noticed if using hinge loss, sometimes OWLQN uses fewer iterations than LBFGS as shown in the table in the PR description.  Shall we just discard OWLQN support? Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r122607429
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -96,7 +98,8 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
     #' @note spark.svmLinear since 2.2.0
     setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, regParam = 0.0, maxIter = 100, tol = 1E-6, standardization = TRUE,
    -                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2) {
    +                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2, solver = "l-bfgs",
    --- End diff --
    
    and then no need for as.charater


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    On many large dataset, LinearSVC cannot get the similar result with sklearn. e.g., SKLearn may get coefficients (5, 10, 15, 20), and spark LinearSVC will get (10, 20, 30, 40). It's different but in most cases it's proportional scaling. This may partly because the different handling of intercept scaling in sklearn and spark. But on larger dataset, I found LBFGS and OWLQN also presents the similar issue.
    
    I wonder if it's normal to get different (but proportional)  coefficients from different optimization method for the same dataset and parameter settings.
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76463/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115668587
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    @debasish83 Not sure if it's a better solution than squared hinge loss. But I would be interested to learn the performance (accuracy and speed). Can you please help try to evaluate it?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Please let me know if there's any unresolved comments. Thanks.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115050353
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    Thanks for providing the info @yanboliang.  I'm thinking we should provide both LBFGS and OWNQN as options for L1-SVM (Standard hinge). From the several dataset I tested, LBFGS converges as good as (or even better than) OWNQN and uses part of the time. Yet due to the potential failure, we may not use LBFGS as the default optimizer(solver).
    
    A better solution as I see is to use L2-SVM(squared hinge loss) with LBFGS (combine this with https://github.com/apache/spark/pull/17645), but the change may be too large to be included into 2.2 now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    ping @jkbradley Sorry I know this is like the last minute for 2.2, but the change may be important for user experience. If we're not comfortable making API change right now, how about we just change the default optimizer to LBFGS before 2.2 releases?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115657829
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -223,6 +229,25 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         assert(model1.coefficients ~== coefficientsSK relTol 4E-3)
       }
     
    +  test("linearSVC L-BFGS comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS)
    +      .setRegParam(0.00003)
    --- End diff --
    
    Indeed. I can use your help here since I cannot find the theory proof for this case. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82377 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)** for PR 17862 at commit [`1f8e984`](https://github.com/apache/spark/commit/1f8e98411401add53dddbb13e94f450c27b7c0fd).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)** for PR 17862 at commit [`bf4d955`](https://github.com/apache/spark/commit/bf4d955082ce94028965eb034efab74f6ff8d0b8).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #76508 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76508/testReport)** for PR 17862 at commit [`f7d5559`](https://github.com/apache/spark/commit/f7d555997d8c589585b5a34a0b017a29577cad82).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @yanboliang Without intercept, sklearn and Spark LinearSVC will get the same coefficients on several dataset I tested. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146168660
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit}
     /** Params for linear SVM Classifier. */
     private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
       with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
    -  with HasAggregationDepth with HasThreshold {
    +  with HasAggregationDepth with HasThreshold with HasSolver {
    +
    +  /**
    +   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
    +   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
    +   * the hinge loss (a.k.a. L2 loss).
    +   *
    +   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " +
    +    "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.",
    +    (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT)))
    --- End diff --
    
    I tend to support case-insensitive params in `LinearRegression`, or change the default behavior of ParamValidators.inArray. And we should improve the consistency in supporting case-insensitive String params anyway. 



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r145371704
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit}
     /** Params for linear SVM Classifier. */
     private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
       with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
    -  with HasAggregationDepth with HasThreshold {
    +  with HasAggregationDepth with HasThreshold with HasSolver {
    +
    +  /**
    +   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
    +   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
    +   * the hinge loss (a.k.a. L2 loss).
    +   *
    +   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " +
    +    "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.",
    +    (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT)))
    --- End diff --
    
    The `isValid` function you can use
    `ParamValidators.inArray[String](supportedLosses))`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82766/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by debasish83 <gi...@git.apache.org>.
Github user debasish83 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115741206
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    hinge loss is not differentiable...how are you smoothing it before you can use a quasi newton solver....since the papers smooth the max, a newton/quasi-newton solver should work well...if you are keeping the non-differentiable loss better will be to use a sub-gradient solver as suggested by the talk...I will evaluate the formulation...


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115004513
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    @MLnick Hinge Loss is a non-smooth and non-differentiable loss, the convex optimization problem is a non-smooth one, so we use use OWL-QN rather than L-BFGS as the underlying optimizer.
    @hhbyyh AFAIK in cases where L-BFGS does not encounter any non-smooth point, it often converges to the optimum faster, but there is change of failure on non-smooth functions.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115278918
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    I saw other libraries like ```sklearn.svm.linearSVC``` use squared hinge loss as the default loss function, it's differentiable and can be solved by LBFGS. What about merging #17645 firstly and then we can use LBFGS as the default solver as well? Otherwise, we would make behavior change if we make this switch after 2.2 release. cc @jkbradley 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    cc @WeichenXu123 What do you think about this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @hhbyyh Test result looks good!
    OWLQN takes longer time for each iteration, because each iteration's line search, it made more passes on dataset.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    +1 @jkbradley for test on large-scale datasets. @hhbyyh Do you have time to test it? If not, I can help. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82376/testReport)** for PR 17862 at commit [`bf4d955`](https://github.com/apache/spark/commit/bf4d955082ce94028965eb034efab74f6ff8d0b8).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82376/
    Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78731 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)** for PR 17862 at commit [`7be6bac`](https://github.com/apache/spark/commit/7be6bacd19e06de6ea8f040f8ad0c70feee82e12).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Sure, I can find some larger dataset to test with. 
    
    But I guess, as showed in the PR description, LBFGS will generally outperform OWLQS, but not in all the cases. I assume single large scale dataset is not sufficient for the testing, so if you plan to test on some data set, please leave comment here. Thank you. 
    



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78002/testReport)** for PR 17862 at commit [`3707580`](https://github.com/apache/spark/commit/3707580de4b3905f2f77d53c926df14cda32f942).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/79002/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r124576026
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -272,36 +272,16 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
         val coefficientsSK = Vectors.dense(7.24690165, 14.77029087, 21.99924004, 29.5575729)
         val interceptSK = 7.36947518
    -    assert(model1.intercept ~== interceptSK relTol 1E-3)
    -    assert(model1.coefficients ~== coefficientsSK relTol 4E-3)
    +    assert(model.intercept ~== interceptSK relTol 1E-3)
    +    assert(model.coefficients ~== coefficientsSK relTol 4E-3)
       }
     
    -  test("linearSVC L-BFGS hinge comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS)
    --- End diff --
    
    we have to set RegParam to 0.00003 to get the similar result with sklearn, while the corresponding one should be 0.00002. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146168734
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    --- End diff --
    
    Personally I never, but I cannot grantee it for all the Locales.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    +1 for adding test on large-scale datasets.
    Another thing I want to know is that: you can compare the final loss value on the result coefficients, between LIBLINEAR(scikit-learn), LBFGS, OWLQN, so that we can know which optimizer can get more optimized result (testing and comparing on different kind of dataset).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Thanks @WeichenXu123 for the comments. 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146167706
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit}
     /** Params for linear SVM Classifier. */
     private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
       with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
    -  with HasAggregationDepth with HasThreshold {
    +  with HasAggregationDepth with HasThreshold with HasSolver {
    +
    +  /**
    +   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
    +   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
    +   * the hinge loss (a.k.a. L2 loss).
    +   *
    +   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " +
    +    "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.",
    +    (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT)))
    --- End diff --
    
    Yeah, I thought about this, but `solver` param in `LinearRegression` also ignore the thing. I tend to keep them consistent, what do you think of it ?


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78024 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)** for PR 17862 at commit [`5f7f456`](https://github.com/apache/spark/commit/5f7f456335b02f1408f0d1577bdbbc3963312233).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78119/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #81367 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit [`cec628b`](https://github.com/apache/spark/commit/cec628ba094fcbaad2fad4a7c5957aa9f5d3d698).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78731/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #81367 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81367/testReport)** for PR 17862 at commit [`cec628b`](https://github.com/apache/spark/commit/cec628ba094fcbaad2fad4a7c5957aa9f5d3d698).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @hhbyyh Thanks for doing the extra work to use the new aggregator here. I do think it's better to separate those changes from this one, though. There is actually more that needs to be done for the conversion (need to use `RDDLossFunction` and also add a test suite for the aggregator). Would you mind submitting a PR for just the conversion changes?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Update: switch to HasSolver trait and use OWLQN as default optimizer 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r124458920
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -272,36 +272,16 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
         val coefficientsSK = Vectors.dense(7.24690165, 14.77029087, 21.99924004, 29.5575729)
         val interceptSK = 7.36947518
    -    assert(model1.intercept ~== interceptSK relTol 1E-3)
    -    assert(model1.coefficients ~== coefficientsSK relTol 4E-3)
    +    assert(model.intercept ~== interceptSK relTol 1E-3)
    +    assert(model.coefficients ~== coefficientsSK relTol 4E-3)
       }
     
    -  test("linearSVC L-BFGS hinge comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS)
    --- End diff --
    
    what happens to this test?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115429672
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    Can I ask why we set ```regParam = 2.0 / datasize / c``` to match solution of ```sklearn.svm.LinearSVC```? AFAIK, sklearn called liblinear to train linear SVM classification model, I can understand liblinear using ```C``` for penalty parameter of the error term which is different from ```regParam```, and it doesn't multiply ```1/n``` on the error term, but where is the ```2.0``` come from? Thanks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @debasish83 There're several approaches trying to smooth the hinge loss. https://en.wikipedia.org/wiki/Hinge_loss. For the one you're proposing, do you know if it's used in other SVM implementations? Please help provide some reference or evaluation data if possible. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #79002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)** for PR 17862 at commit [`aaf35ec`](https://github.com/apache/spark/commit/aaf35ec1a838eaef72d7a880ddbb55d79bfcaecf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82377 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82377/testReport)** for PR 17862 at commit [`1f8e984`](https://github.com/apache/spark/commit/1f8e98411401add53dddbb13e94f450c27b7c0fd).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r145369694
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    --- End diff --
    
    Why need `.toLowerCase(Locale.ROOT)` here ? 


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78119 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)** for PR 17862 at commit [`0297057`](https://github.com/apache/spark/commit/0297057c50ed673048f8d92acbad1d03a7f7fc88).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115303395
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    Sounds good to me. But we'll also need to update python and R API. I don't know the release date of 2.2. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82766 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit [`0bb5afe`](https://github.com/apache/spark/commit/0bb5afe54a9a53054d2076ac28b09234a7380bbf).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r114879308
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -145,6 +164,15 @@ class LinearSVC @Since("2.2.0") (
       def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
       setDefault(aggregationDepth -> 2)
     
    +  /**
    +   * Set optimizer for LinearSVC. Supported options: "lbfgs" and "owlqn".
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.2.0")
    +  def setOptimizer(value: String): this.type = set(optimizer, value.toLowerCase(Locale.ROOT))
    --- End diff --
    
    Maybe just call it solver, since `LinearRegression` also has something similar like this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76508/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r114945580
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    Could you refresh me on why OWLQN was used originally? AFAIK OWLQN is for L1 regularized loss functions, while L-BFGS only supports L2?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by debasish83 <gi...@git.apache.org>.
Github user debasish83 commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @hhbyyh can we smooth the hinge-loss using soft-max (variant of ReLU) and then use LBFGS ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by debasish83 <gi...@git.apache.org>.
Github user debasish83 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115659479
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    This slides also explain it...Please see slide 32...the max can be replaced by soft-max with the softness lambda can be tuned...log-sum-exp is a standard soft-max that can be used which is similar to ReLu functions and we can re-use it from MLP:
    ftp://ftp.cs.wisc.edu/math-prog/talks/informs99ssv.ps
    ftp://ftp.cs.wisc.edu/pub/dmi/tech-reports/99-03.pdf
    I can add the formulation if there is interest...it needs some tuning for soft-max parameter but the convergence will be good with LBFGS (OWLQN is not needed)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146169405
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit}
     /** Params for linear SVM Classifier. */
     private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
       with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
    -  with HasAggregationDepth with HasThreshold {
    +  with HasAggregationDepth with HasThreshold with HasSolver {
    +
    +  /**
    +   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
    +   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
    +   * the hinge loss (a.k.a. L2 loss).
    +   *
    +   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " +
    +    "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.",
    +    (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT)))
    --- End diff --
    
    Created a jira to address that issue: https://issues.apache.org/jira/browse/SPARK-22331


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #82766 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82766/testReport)** for PR 17862 at commit [`0bb5afe`](https://github.com/apache/spark/commit/0bb5afe54a9a53054d2076ac28b09234a7380bbf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115431940
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -223,6 +229,25 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         assert(model1.coefficients ~== coefficientsSK relTol 4E-3)
       }
     
    +  test("linearSVC L-BFGS comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.LBFGS)
    +      .setRegParam(0.00003)
    --- End diff --
    
    Why we switch to different ```regParam``` compared with the above test? Does it means the converged solution will depends on optimizer?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78119 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78119/testReport)** for PR 17862 at commit [`0297057`](https://github.com/apache/spark/commit/0297057c50ed673048f8d92acbad1d03a7f7fc88).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #81836 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)** for PR 17862 at commit [`0f5cad5`](https://github.com/apache/spark/commit/0f5cad5ca9770871fb2a07968f53332f03e74903).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78002/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146167919
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    --- End diff --
    
    ditto. IMO these characters are all in ASCII, I think they won't encounter locales issue. (But do you encounter such issue in some env ?)


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115432825
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    Let's wait for more other comments, if we reach a consensus, I'd like to help review and merge this to catch 2.2. cc @jkbradley @MLnick 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test FAILed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Tested with several larger data set with Hinge Loss function, to compare l-bfgs and owlqn solvers.
    Run until converged or exceed maxIter (2000).
    
    dataset | numRecords | numFeatures | l-bfgs iterations | owlqn iterations | l-bfgs final loss | owlqn final loss
    -------- | ---------------|---------------|---------------|---------------|---------------|---------------
    url_combined | 2396130 | 3231961 | 317 (952 sec) | 287 (1661 sec) | 9.71E-5| 1.64E-4
    kdda | 8407752 | 20216830 | 2000+ (29729 sec) | 288 13664 (sec) |  0.0068 | 0.0135
    webspam | 350000 | 254 | 344 (67 sec) | 1502 (714 sec) | 0.18273 | 0.18273
    SUSY | 5000000 | 18 | 152 (145 sec) | 1242 (3357 sec) |  0.499 | 0.499
    
    l-bfgs does not always take fewer iterations, but it converges to a smaller final loss.
    For each iteration, owlqn takes longer time ( 2 or 3 times) than l-bfgs. Logistic Regression also exhibits the similar behavior.
    



---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #81836 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81836/testReport)** for PR 17862 at commit [`0f5cad5`](https://github.com/apache/spark/commit/0f5cad5ca9770871fb2a07968f53332f03e74903).


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115007440
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -145,6 +164,15 @@ class LinearSVC @Since("2.2.0") (
       def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
       setDefault(aggregationDepth -> 2)
     
    +  /**
    +   * Set optimizer for LinearSVC. Supported options: "lbfgs" and "owlqn".
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.2.0")
    +  def setOptimizer(value: String): this.type = set(optimizer, value.toLowerCase(Locale.ROOT))
    --- End diff --
    
    The solver is ```l-bfgs``` if training with lbfgs(l2 reg) or owlqn(l1/elasticNet reg) for other algorithms such as ```LinearRegression``` and ```LogisticRegression```, so I concerned that it may confuse users if we distinguish them in ```LinearSVC```. Or we should clarify it in document. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #79002 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/79002/testReport)** for PR 17862 at commit [`aaf35ec`](https://github.com/apache/spark/commit/aaf35ec1a838eaef72d7a880ddbb55d79bfcaecf).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r114944894
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -145,6 +164,15 @@ class LinearSVC @Since("2.2.0") (
       def setAggregationDepth(value: Int): this.type = set(aggregationDepth, value)
       setDefault(aggregationDepth -> 2)
     
    +  /**
    +   * Set optimizer for LinearSVC. Supported options: "lbfgs" and "owlqn".
    +   *
    +   * @group setParam
    +   */
    +  @Since("2.2.0")
    +  def setOptimizer(value: String): this.type = set(optimizer, value.toLowerCase(Locale.ROOT))
    --- End diff --
    
    👍 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115656752
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    http://www.robots.ox.ac.uk/~az/lectures/ml/lect2.pdf 
    Please refer to page 36.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82377/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/76637/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merge the change from https://github.com/apache/spark/pull/17645 into a single change.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @hhbyyh Make sense, does it mean both LBFGS and OWLQN produce the same solution if fitting without intercept? If so, I'm prefer to change the solver to LBFGS rather than adding a new option. Since we treat LBFGS and OWLQN as the same solver in other algorithms like ```LinearRegression``` and ```LogisticRegression```.
    BTW, we can discuss whether we should follow the same way(compared with sklearn/R) to handle intercept in follow-up work. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r122607418
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -96,7 +98,8 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
     #' @note spark.svmLinear since 2.2.0
     setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, regParam = 0.0, maxIter = 100, tol = 1E-6, standardization = TRUE,
    -                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2) {
    +                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2, solver = "l-bfgs",
    +                   loss = "squared_hinge") {
    --- End diff --
    
    ditto for loss


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81367/
    Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146165706
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    --- End diff --
    
    To ensure consistency with param validation in all Locales.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by debasish83 <gi...@git.apache.org>.
Github user debasish83 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115745818
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -154,22 +159,23 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
     
       test("linearSVC with sample weights") {
         def modelEquals(m1: LinearSVCModel, m2: LinearSVCModel): Unit = {
    -      assert(m1.coefficients ~== m2.coefficients absTol 0.05)
    +      assert(m1.coefficients ~== m2.coefficients absTol 0.07)
           assert(m1.intercept ~== m2.intercept absTol 0.05)
         }
    -
    -    val estimator = new LinearSVC().setRegParam(0.01).setTol(0.01)
    -    val dataset = smallBinaryDataset
    -    MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals)
    -    MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    -    MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    -      dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    LinearSVC.supportedOptimizers.foreach { opt =>
    +      val estimator = new LinearSVC().setRegParam(0.02).setTol(0.01).setSolver(opt)
    +      val dataset = smallBinaryDataset
    +      MLTestingUtils.testArbitrarilyScaledWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals)
    +      MLTestingUtils.testOutliersWithSmallWeights[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, 2, modelEquals, outlierRatio = 3)
    +      MLTestingUtils.testOversamplingVsWeighting[LinearSVCModel, LinearSVC](
    +        dataset.as[LabeledPoint], estimator, modelEquals, 42L)
    +    }
       }
     
    -  test("linearSVC comparison with R e1071 and scikit-learn") {
    -    val trainer1 = new LinearSVC()
    +  test("linearSVC OWLQN comparison with R e1071 and scikit-learn") {
    +    val trainer1 = new LinearSVC().setSolver(LinearSVC.OWLQN)
           .setRegParam(0.00002) // set regParam = 2.0 / datasize / c
    --- End diff --
    
    @hhbyyh I saw some posts that hinge loss is not differentiable but squared hinge loss is for practical purposes...can you please point to a reference on squared hinge loss ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    @hhbyyh If different handling of intercept scaling is the major cause for result difference between sklearn and Spark, do you check whether fit model without intercept will produce same model? Theoretically, different optimizers should converged to the same result if the function is convex. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81836/
    Test PASSed.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Sure. That's reasonable. I'll move the hingeAggregator to a new PR.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78024/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146166006
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    +
    +  /* Set of loss function that LinearSVC supports */
    +  private[classification] val supportedLoss = Array(HINGE, SQUARED_HINGE)
    --- End diff --
    
    Sure. I can update it.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for L...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r115698645
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -205,15 +233,21 @@ class LinearSVC @Since("2.2.0") (
           val costFun = new LinearSVCCostFun(instances, $(fitIntercept),
             $(standardization), bcFeaturesStd, regParamL2, $(aggregationDepth))
     
    -      def regParamL1Fun = (index: Int) => 0D
    -      val optimizer = new BreezeOWLQN[Int, BDV[Double]]($(maxIter), 10, regParamL1Fun, $(tol))
    +      val optimizerAlgo = $(optimizer) match {
    --- End diff --
    
    Personally I would be in favor of making square hinge the default with L-BFGS. But we would need to make it clear to users that if they select the pure hinge loss then there is a risk involved in using L-BFGS rather than OWL-QN.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78002 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78002/testReport)** for PR 17862 at commit [`3707580`](https://github.com/apache/spark/commit/3707580de4b3905f2f77d53c926df14cda32f942).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #76463 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76463/testReport)** for PR 17862 at commit [`d46e5ed`](https://github.com/apache/spark/commit/d46e5edfdf5d052c59677f58767bdbe0803dc368).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r146165449
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -42,7 +44,26 @@ import org.apache.spark.sql.functions.{col, lit}
     /** Params for linear SVM Classifier. */
     private[classification] trait LinearSVCParams extends ClassifierParams with HasRegParam
       with HasMaxIter with HasFitIntercept with HasTol with HasStandardization with HasWeightCol
    -  with HasAggregationDepth with HasThreshold {
    +  with HasAggregationDepth with HasThreshold with HasSolver {
    +
    +  /**
    +   * Specifies the loss function. Currently "hinge" and "squared_hinge" are supported.
    +   * "hinge" is the standard SVM loss (a.k.a. L1 loss) while "squared_hinge" is the square of
    +   * the hinge loss (a.k.a. L2 loss).
    +   *
    +   * @see <a href="https://en.wikipedia.org/wiki/Hinge_loss">Hinge loss (Wikipedia)</a>
    +   *
    +   * @group param
    +   */
    +  @Since("2.3.0")
    +  final val loss: Param[String] = new Param(this, "loss", "Specifies the loss " +
    +    "function. hinge is the standard SVM loss while squared_hinge is the square of the hinge loss.",
    +    (s: String) => LinearSVC.supportedLoss.contains(s.toLowerCase(Locale.ROOT)))
    --- End diff --
    
    Correct me if I'm wrong, IMO we need toLowerCase here.


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78024 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78024/testReport)** for PR 17862 at commit [`5f7f456`](https://github.com/apache/spark/commit/5f7f456335b02f1408f0d1577bdbbc3963312233).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r145371903
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LinearSVC.scala ---
    @@ -282,8 +348,27 @@ class LinearSVC @Since("2.2.0") (
     @Since("2.2.0")
     object LinearSVC extends DefaultParamsReadable[LinearSVC] {
     
    +  /** String name for Limited-memory BFGS. */
    +  private[classification] val LBFGS: String = "l-bfgs".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Orthant-Wise Limited-memory Quasi-Newton. */
    +  private[classification] val OWLQN: String = "owlqn".toLowerCase(Locale.ROOT)
    +
    +  /* Set of optimizers that LinearSVC supports */
    +  private[classification] val supportedSolvers = Array(LBFGS, OWLQN)
    +
    +  /** String name for Hinge Loss. */
    +  private[classification] val HINGE: String = "hinge".toLowerCase(Locale.ROOT)
    +
    +  /** String name for Squared Hinge Loss. */
    +  private[classification] val SQUARED_HINGE: String = "squared_hinge".toLowerCase(Locale.ROOT)
    +
    +  /* Set of loss function that LinearSVC supports */
    +  private[classification] val supportedLoss = Array(HINGE, SQUARED_HINGE)
    --- End diff --
    
    supportedLoss ==> supportedLosses


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #76508 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76508/testReport)** for PR 17862 at commit [`f7d5559`](https://github.com/apache/spark/commit/f7d555997d8c589585b5a34a0b017a29577cad82).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78019 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78019/testReport)** for PR 17862 at commit [`2ca5a74`](https://github.com/apache/spark/commit/2ca5a7456f7dc5ea4473ddee2933bd6228b3476e).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #76463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76463/testReport)** for PR 17862 at commit [`d46e5ed`](https://github.com/apache/spark/commit/d46e5edfdf5d052c59677f58767bdbe0803dc368).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78019/
    Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #78731 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78731/testReport)** for PR 17862 at commit [`7be6bac`](https://github.com/apache/spark/commit/7be6bacd19e06de6ea8f040f8ad0c70feee82e12).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squa...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/17862#discussion_r122607391
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -96,7 +98,8 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
     #' @note spark.svmLinear since 2.2.0
     setMethod("spark.svmLinear", signature(data = "SparkDataFrame", formula = "formula"),
               function(data, formula, regParam = 0.0, maxIter = 100, tol = 1E-6, standardization = TRUE,
    -                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2) {
    +                   threshold = 0.0, weightCol = NULL, aggregationDepth = 2, solver = "l-bfgs",
    --- End diff --
    
    for parameter with limited choice,
    use `c("l-bfgs", "owlqn")` https://github.com/apache/spark/pull/18140/files#diff-2e1e5d49cfda9702645a955f584572eeR129
    and match.arg instead https://github.com/apache/spark/pull/18140/files#diff-2e1e5d49cfda9702645a955f584572eeR132


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS as optimizer for LinearSV...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    **[Test build #76637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/76637/testReport)** for PR 17862 at commit [`8a7c10f`](https://github.com/apache/spark/commit/8a7c10f5bc0d7234ed6e156c98f04bddb7a37204).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by yanboliang <gi...@git.apache.org>.
Github user yanboliang commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    I'm in favor of discarding OWLQN. Take LiR or LoR as examples, if you replace LBFGS with OWLQN for regression with L2 regularization, we can saw OWLQN may converge faster than LBFGS in a certain case, but in most of the cases, LBFGS is faster. But I'm still open to hear others' thought. Thanks.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by hhbyyh <gi...@git.apache.org>.
Github user hhbyyh commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Given the discussion above, I plan to replace OWLQN with LBFGS. I will send update soon.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #17862: [SPARK-20602] [ML]Adding LBFGS optimizer and Squared_hin...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/17862
  
    Catching up here...  To make sure I caught the decisions made in the discussion above, is it correct that this PR will:
    * Add support for squared hinge loss, and use that as the default (which I fully support)
    * Switch from OWLQN to LBFGS (which is fine with me if reasonably large-scale tests support that choice)
      * On this note, has anyone tested on large-scale datasets (ideally with millions of rows and columns)?
    
    If those are the decisions, can you please update the PR description and JIRA as such when you update the PR @hhbyyh ?  Thank you!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org