Posted to reviews@spark.apache.org by jkbradley <gi...@git.apache.org> on 2017/05/31 00:45:22 UTC

[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

GitHub user jkbradley opened a pull request:

    https://github.com/apache/spark/pull/18151

    [SPARK-20929][ML] LinearSVC should use its own threshold param

    ## What changes were proposed in this pull request?
    
    LinearSVC should use its own threshold param, rather than the shared one, since it applies to rawPrediction instead of probability.  This PR changes the param in the Scala, Python and R APIs.
    
    ## How was this patch tested?
    
    New unit test to make sure the threshold can be set to any Double value.
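
To illustrate why a [0, 1] range does not fit this param, here is a minimal sketch in plain Python, not Spark code. It assumes the raw prediction is the margin (coefficients dot features, plus intercept) and that class 1 is predicted when the margin is strictly greater than the threshold, which matches the expectations in the test quoted later in this thread. The function names `margin` and `predict` are hypothetical.

```python
from typing import Sequence


def margin(coefficients: Sequence[float], features: Sequence[float], intercept: float) -> float:
    """Raw score of a linear model: dot(coefficients, features) + intercept."""
    return sum(c * x for c, x in zip(coefficients, features)) + intercept


def predict(margin_value: float, threshold: float = 0.0) -> float:
    """Predict class 1.0 when the raw margin strictly exceeds the threshold (assumed rule)."""
    return 1.0 if margin_value > threshold else 0.0


# The margin is unbounded, so a threshold such as 2.5 (outside [0, 1]) is meaningful.
high = predict(margin([1.0], [3.0], 0.0), threshold=2.5)
low = predict(margin([1.0], [2.0], 0.0), threshold=2.5)
```

A probability threshold would be confined to [0, 1]; a margin threshold can be any real value, which is what the new unit test checks.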

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jkbradley/spark ml-2.2-linearsvc-cleanup

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18151.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18151
    
----
commit 5a612c3d9b417f559275006e45dae40e72653f6d
Author: Joseph K. Bradley <jo...@databricks.com>
Date:   2017-05-31T00:39:43Z

    LinearSVC should use its own threshold param, rather than the shared one, since it applies to rawPrediction instead of probability

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/78262/
    Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77604 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77604/testReport)** for PR 18151 at commit [`35c13cf`](https://github.com/apache/spark/commit/35c13cf873ca000e30e4f91787fc9441372010ba).


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18151#discussion_r119275319
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -127,6 +127,14 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         MLTestingUtils.checkCopyAndUids(lsvc, model)
       }
     
    +  test("LinearSVC threshold can be any real value") {
    --- End diff --
    
    It would be nice to see a test that checks behavior of threshold while we're here. e.g. positive infinity predicts only class 0, negative infinity predicts only class 1. 
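
The behavior suggested above can be sketched in plain Python (not Spark code). The sketch assumes the decision rule is "predict 1.0 when the margin is strictly greater than the threshold"; the `predict` helper is hypothetical.

```python
import math


def predict(margin: float, threshold: float) -> float:
    """Assumed decision rule: class 1.0 only if the margin strictly exceeds the threshold."""
    return 1.0 if margin > threshold else 0.0


margins = [-1e-7, 0.0, 1e-7]

# No finite margin exceeds +infinity, so every row is predicted as class 0.
all_negative = [predict(m, math.inf) for m in margins]

# Every finite margin exceeds -infinity, so every row is predicted as class 1.
all_positive = [predict(m, -math.inf) for m in margins]
```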


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77565/
    Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merging with master, branch-2.2
    Thanks for reviewing!


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18151#discussion_r119252826
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -62,7 +62,7 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
     #'                        of models will be always returned on the original scale, so it will be transparent for
     #'                        users. Note that with/without standardization, the models should be always converged
     #'                        to the same solution when no regularization is applied.
    -#' @param threshold The threshold in binary classification, in range [0, 1].
    +#' @param threshold The threshold in binary classification applied to rawPrediction.
    --- End diff --
    
    It might not be obvious to R users what `rawPrediction` is, though.


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18151#discussion_r119901231
  
    --- Diff: python/pyspark/ml/classification.py ---
    @@ -109,6 +109,10 @@ class LinearSVC(JavaEstimator, HasFeaturesCol, HasLabelCol, HasPredictionCol, Ha
         .. versionadded:: 2.2.0
         """
     
    +    threshold = Param(Params._dummy(), "threshold",
    +                      "threshold in binary classification prediction applied to rawPrediction",
    --- End diff --
    
    This is the only doc that Python users will see, right? Should we clarify here what rawPrediction is like we did for Scala and R?


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    One minor comment, otherwise LGTM. Thanks for catching this @jkbradley!


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    LGTM with that change.
    
    Seems odd we have to override `predict` and `raw2prediction` - seems this could be cleaned up perhaps via a better `transform` impl in `ClassificationModel`?



[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77565 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77565/testReport)** for PR 18151 at commit [`5a612c3`](https://github.com/apache/spark/commit/5a612c3d9b417f559275006e45dae40e72653f6d).


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18151


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    So...good thing you asked for the test b/c transform() wasn't going through the corrected code path.  Another bit of evidence that the Prediction APIs don't generalize that well...


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #78262 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78262/testReport)** for PR 18151 at commit [`5830186`](https://github.com/apache/spark/commit/58301861500fe1c6134d4dc8cf3e789d7b1cabc3).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77604/
    Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77655 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77655/testReport)** for PR 18151 at commit [`851e0b6`](https://github.com/apache/spark/commit/851e0b679d36c34befcb80a84b0594d4b70f1f69).


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #78262 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/78262/testReport)** for PR 18151 at commit [`5830186`](https://github.com/apache/spark/commit/58301861500fe1c6134d4dc8cf3e789d7b1cabc3).


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77574 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77574/testReport)** for PR 18151 at commit [`98ffd16`](https://github.com/apache/spark/commit/98ffd16ffd53ff86c64629274e8d19cbc9860e0a).


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by MLnick <gi...@git.apache.org>.
Github user MLnick commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Good catch. Agree with @sethah comment about adding a test case, otherwise LGTM


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77565 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77565/testReport)** for PR 18151 at commit [`5a612c3`](https://github.com/apache/spark/commit/5a612c3d9b417f559275006e45dae40e72653f6d).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77655/
    Test PASSed.


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18151#discussion_r119275003
  
    --- Diff: R/pkg/R/mllib_classification.R ---
    @@ -62,7 +62,7 @@ setClass("NaiveBayesModel", representation(jobj = "jobj"))
     #'                        of models will be always returned on the original scale, so it will be transparent for
     #'                        users. Note that with/without standardization, the models should be always converged
     #'                        to the same solution when no regularization is applied.
    -#' @param threshold The threshold in binary classification, in range [0, 1].
    +#' @param threshold The threshold in binary classification applied to rawPrediction.
    --- End diff --
    
    Good point.  Updated now.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77574 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77574/testReport)** for PR 18151 at commit [`98ffd16`](https://github.com/apache/spark/commit/98ffd16ffd53ff86c64629274e8d19cbc9860e0a).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/77574/
    Test PASSed.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77655 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77655/testReport)** for PR 18151 at commit [`851e0b6`](https://github.com/apache/spark/commit/851e0b679d36c34befcb80a84b0594d4b70f1f69).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    CC @mlnick @yanboliang 


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #3769 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3769/testReport)** for PR 18151 at commit [`98ffd16`](https://github.com/apache/spark/commit/98ffd16ffd53ff86c64629274e8d19cbc9860e0a).


[GitHub] spark issue #18151: [SPARK-20929][ML] LinearSVC should use its own threshold...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/18151
  
    **[Test build #77604 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/77604/testReport)** for PR 18151 at commit [`35c13cf`](https://github.com/apache/spark/commit/35c13cf873ca000e30e4f91787fc9441372010ba).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark pull request #18151: [SPARK-20929][ML] LinearSVC should use its own th...

Posted by sethah <gi...@git.apache.org>.
Github user sethah commented on a diff in the pull request:

    https://github.com/apache/spark/pull/18151#discussion_r119472374
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/LinearSVCSuite.scala ---
    @@ -127,6 +127,27 @@ class LinearSVCSuite extends SparkFunSuite with MLlibTestSparkContext with Defau
         MLTestingUtils.checkCopyAndUids(lsvc, model)
       }
     
    +  test("LinearSVC threshold acts on rawPrediction") {
    +    val lsvc =
    +      new LinearSVCModel(uid = "myLSVCM", coefficients = Vectors.dense(1.0), intercept = 0.0)
    +    val df = spark.createDataFrame(Seq(
    +      (1, Vectors.dense(1e-7)),
    +      (0, Vectors.dense(0.0)),
    +      (-1, Vectors.dense(-1e-7)))).toDF("id", "features")
    +
    +    def checkResults(threshold: Double, expected: Set[(Int, Double)]): Unit = {
    +      lsvc.setThreshold(threshold)
    +      val results = lsvc.transform(df).select("id", "prediction").collect()
    +        .map(r => (r.getInt(0), r.getDouble(1)))
    +        .toSet
    +      assert(results === expected, s"Failed for threshold = $threshold")
    +    }
    +
    +    checkResults(0.0, Set((1, 1.0), (0, 0.0), (-1, 0.0)))
    --- End diff --
    
    It might be a good idea to additionally test the other code path, by calling `lsvc.set(lsvc.rawPredictionCol, "")`. As it stands, we don't know whether `predict` works correctly (this is done in the logistic regression suite).
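
The two code paths mentioned here can be illustrated with a toy model in plain Python. This is only a sketch under stated assumptions, not Spark's implementation: `ToyLinearSVCModel`, its method names, and the `with_raw` flag are hypothetical, and the raw prediction is assumed to be the vector `[-margin, margin]`. The data and expected results are taken from the test in the diff above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass
class ToyLinearSVCModel:
    """Hypothetical stand-in for a linear SVC model with a raw-margin threshold."""
    coefficients: Sequence[float]
    intercept: float
    threshold: float = 0.0

    def margin(self, features: Sequence[float]) -> float:
        return sum(c * x for c, x in zip(self.coefficients, features)) + self.intercept

    def predict(self, features: Sequence[float]) -> float:
        # Path taken when no raw prediction column is produced.
        return 1.0 if self.margin(features) > self.threshold else 0.0

    def predict_raw(self, features: Sequence[float]) -> List[float]:
        # Assumed layout: one raw score per class, [-margin, margin].
        m = self.margin(features)
        return [-m, m]

    def raw_to_prediction(self, raw: Sequence[float]) -> float:
        # Path taken when the raw prediction column is produced first.
        return 1.0 if raw[1] > self.threshold else 0.0

    def transform(self, rows, with_raw: bool = True) -> Set[Tuple[int, float]]:
        if with_raw:
            return {(i, self.raw_to_prediction(self.predict_raw(f))) for i, f in rows}
        return {(i, self.predict(f)) for i, f in rows}


# Same rows as the Scala test: (id, features).
ROWS = [(1, [1e-7]), (0, [0.0]), (-1, [-1e-7])]
model = ToyLinearSVCModel(coefficients=[1.0], intercept=0.0, threshold=0.0)
via_raw = model.transform(ROWS, with_raw=True)
via_predict = model.transform(ROWS, with_raw=False)
```

Both paths must honor the same threshold; a test that exercises only one of them cannot detect the other ignoring it.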

