Posted to reviews@spark.apache.org by WeichenXu123 <gi...@git.apache.org> on 2017/08/28 06:29:20 UTC

[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

GitHub user WeichenXu123 opened a pull request:

    https://github.com/apache/spark/pull/19065

    [SPARK-21729][ML][TEST] Generic test for ProbabilisticClassifier to ensure consistent output columns

    ## What changes were proposed in this pull request?
    
    Add test for prediction using the model with all combinations of output columns turned on/off.
    Make sure the output column values match by comparing them against the case with all 3 output columns turned on.
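The combination loop described above can be sketched in plain Scala, without Spark. This is a minimal, hypothetical illustration (the object and method names are assumptions, not code from the PR): an empty string stands for a turned-off column, and each turned-off column falls back to the matching reference column from the all-columns-on run.

```scala
// Hypothetical, Spark-free sketch: enumerate all 2^3 on/off combinations of the
// three output columns, using "" to mean "turned off".
object OutputColumnCombinations {
  final case class Combo(rawPredictionCol: String, probabilityCol: String, predictionCol: String)

  val allCombos: Seq[Combo] =
    for {
      raw  <- Seq("", "rawPredictionSingle")
      prob <- Seq("", "probabilitySingle")
      pred <- Seq("", "predictionSingle")
    } yield Combo(raw, prob, pred)

  // A turned-off column is not produced, so the check falls back to the
  // reference column from the model with all three outputs turned on.
  def columnToCheck(single: String, reference: String): String =
    if (single.isEmpty) reference else single

  def columnsToCheck(c: Combo): Seq[String] = Seq(
    columnToCheck(c.rawPredictionCol, "rawPredictionAll"),
    columnToCheck(c.probabilityCol, "probabilityAll"),
    columnToCheck(c.predictionCol, "predictionAll"))

  def main(args: Array[String]): Unit =
    allCombos.foreach(c => println(s"$c -> ${columnsToCheck(c).mkString(", ")}"))
}
```

Enumerating the combinations up front keeps the comparison logic in one place: every combination is checked against the same reference output.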
    
    ## How was this patch tested?
    
    Test updated.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/WeichenXu123/spark generic_test_for_prob_classifier

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19065.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19065
    
----
commit a588e2eb349eab46344fb4b3817f0b575c353eaf
Author: WeichenXu <we...@outlook.com>
Date:   2017-08-25T09:08:48Z

    init pr

commit 3e8617d3098da7c8fad09c4d3af9f86369a23591
Author: WeichenXu <we...@outlook.com>
Date:   2017-08-28T06:25:33Z

    update tests

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81176 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81176/testReport)** for PR 19065 at commit [`3e8617d`](https://github.com/apache/spark/commit/3e8617d3098da7c8fad09c4d3af9f86369a23591).


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135653479
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -18,7 +18,10 @@
     package org.apache.spark.ml.classification
     
     import org.apache.spark.SparkFunSuite
    -import org.apache.spark.ml.linalg.{Vector, Vectors}
    +import org.apache.spark.ml.linalg.{DenseVector, Vector, Vectors}
    +import org.apache.spark.ml.param.ParamMap
    +import org.apache.spark.ml.util.TestingUtils._
    +import org.apache.spark.sql.{DataFrame, Dataset, Row}
    --- End diff --
    
    DataFrame is an unused import, could be removed.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Nice!  LGTM
    Thanks @WeichenXu123 and @smurching !
    Merging with master


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135782045
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -91,4 +94,54 @@ object ProbabilisticClassifierSuite {
         "thresholds" -> Array(0.4, 0.6)
       )
     
    +  def probabilisticClassifierGenericTest[
    +      FeaturesType,
    +      M <: ProbabilisticClassificationModel[FeaturesType, M]](
    +    model: M, testData: Dataset[_]): Unit = {
    +
    +    val allColModel = model.copy(ParamMap.empty)
    +      .setRawPredictionCol("rawPredictionAll")
    +      .setProbabilityCol("probabilityAll")
    +      .setPredictionCol("predictionAll")
    +    val allColResult = allColModel.transform(testData)
    +
    +    for (rawPredictionCol <- Seq("", "rawPredictionSingle")) {
    +      for (probabilityCol <- Seq("", "probabilitySingle")) {
    --- End diff --
    
    Yes.
    And when these 3 params are set to the empty string, they are regarded as turned off, according to the implementation in `ProbabilisticClassificationModel`.
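To illustrate why an empty column name acts as "off", here is a toy, Spark-free sketch loosely modeled on the behavior described above. It is not Spark's implementation, and all names are hypothetical. Each output is computed only when its column name is non-empty and, where possible, derived from another output that is turned on. Because the same value can then come from different code paths, comparing against the all-columns-on result is what catches inconsistencies.

```scala
// Toy, Spark-free sketch (hypothetical names): an empty output-column name means
// "do not compute that output"; otherwise each output is derived from another
// output that is turned on, if any.
object EmptyColumnMeansOff {
  final case class Outputs(
      raw: Option[Seq[Double]],
      probability: Option[Seq[Double]],
      prediction: Option[Double])

  // Hypothetical "model": the raw scores are just the (positive) features.
  def predictRaw(features: Seq[Double]): Seq[Double] = features

  // Normalize raw scores into probabilities.
  def raw2probability(raw: Seq[Double]): Seq[Double] = {
    val total = raw.sum
    raw.map(_ / total)
  }

  // Index of the largest entry (first one on ties), as a Double label.
  def argmax(v: Seq[Double]): Double =
    v.indices.reduce((best, i) => if (v(i) > v(best)) i else best).toDouble

  def transform(
      features: Seq[Double],
      rawPredictionCol: String,
      probabilityCol: String,
      predictionCol: String): Outputs = {
    val raw =
      if (rawPredictionCol.nonEmpty) Some(predictRaw(features)) else None
    val probability =
      if (probabilityCol.nonEmpty) {
        // Reuse the raw scores when they were computed; otherwise start from the features.
        Some(raw2probability(raw.getOrElse(predictRaw(features))))
      } else None
    val prediction =
      if (predictionCol.nonEmpty) {
        // Prefer raw scores, then probabilities, then the features themselves.
        Some(argmax(raw.orElse(probability).getOrElse(predictRaw(features))))
      } else None
    Outputs(raw, probability, prediction)
  }
}
```

For example, `EmptyColumnMeansOff.transform(Seq(1.0, 3.0, 2.0), "", "", "pred")` computes only the prediction (label `1.0`), leaving the other two outputs as `None`.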


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135656421
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -91,4 +94,54 @@ object ProbabilisticClassifierSuite {
         "thresholds" -> Array(0.4, 0.6)
       )
     
    +  def probabilisticClassifierGenericTest[
    +      FeaturesType,
    +      M <: ProbabilisticClassificationModel[FeaturesType, M]](
    +    model: M, testData: Dataset[_]): Unit = {
    +
    +    val allColModel = model.copy(ParamMap.empty)
    +      .setRawPredictionCol("rawPredictionAll")
    +      .setProbabilityCol("probabilityAll")
    +      .setPredictionCol("predictionAll")
    +    val allColResult = allColModel.transform(testData)
    +
    +    for (rawPredictionCol <- Seq("", "rawPredictionSingle")) {
    +      for (probabilityCol <- Seq("", "probabilitySingle")) {
    --- End diff --
    
    Just to confirm, does setting `probabilityCol`, `rawPredictionCol`, `predictionCol` to empty strings work here because expressions like `$(probabilityCol)` (used in [ProbabilisticClassifier.scala](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala#L115)) return the String value of probabilityCol?
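In Spark ML, `$(param)` on a `Params` instance is shorthand for `getOrDefault(param)`, so for a `Param[String]` such as `probabilityCol` it yields the column name itself, including an empty string. The toy sketch below mimics that pattern in plain Scala. `ToyParams`, `ToyModel`, and the simplified `Param` class are hypothetical stand-ins, not Spark's classes.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for Spark ML's Param/Params machinery.
object DollarParamSketch {
  final class Param[T](val name: String, val default: T)

  class ToyParams {
    private val values = mutable.Map.empty[String, Any]

    def set[T](param: Param[T], value: T): this.type = {
      values(param.name) = value
      this
    }

    def getOrDefault[T](param: Param[T]): T =
      values.getOrElse(param.name, param.default).asInstanceOf[T]

    // `$(param)` is just shorthand for the current (or default) value.
    protected final def $[T](param: Param[T]): T = getOrDefault(param)
  }

  class ToyModel extends ToyParams {
    val probabilityCol = new Param[String]("probabilityCol", "probability")

    // The column name is a plain String, so `.nonEmpty` works directly.
    def outputsProbability: Boolean = $(probabilityCol).nonEmpty
  }
}
```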


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19065


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Merged build finished. Test PASSed.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81217/testReport)** for PR 19065 at commit [`a6fef60`](https://github.com/apache/spark/commit/a6fef60bdadb56a68c60a16306ce524aa49fa731).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81288 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81288/testReport)** for PR 19065 at commit [`f13cd73`](https://github.com/apache/spark/commit/f13cd73926e80173228637da2015c7d6e7a0e848).


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135653421
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -18,7 +18,10 @@
     package org.apache.spark.ml.classification
     
     import org.apache.spark.SparkFunSuite
    -import org.apache.spark.ml.linalg.{Vector, Vectors}
    +import org.apache.spark.ml.linalg.{DenseVector, Vector, Vectors}
    --- End diff --
    
    It looks like DenseVector is an unused import and could be removed.


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135653044
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/DecisionTreeClassifierSuite.scala ---
    @@ -262,6 +262,9 @@ class DecisionTreeClassifierSuite
           assert(Vectors.dense(rawPred.toArray.map(_ / sum)) === probPred,
             "probability prediction mismatch")
         }
    +
    +    ProbabilisticClassifierSuite.probabilisticClassifierGenericTest[
    --- End diff --
    
    We should use a more descriptive name for this test. How about `ProbabilisticClassifierSuite.testPredictMethods`? @jkbradley may have other suggestions too.


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r136152805
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -91,4 +94,60 @@ object ProbabilisticClassifierSuite {
         "thresholds" -> Array(0.4, 0.6)
       )
     
    +  /**
    +   * Add test for prediction using the model with all combinations of
    --- End diff --
    
    Tiny nit: This could be reworded from the JIRA description.
    
    How about:
    
    Helper for testing that a ProbabilisticClassificationModel computes the same predictions across all combinations of output columns (rawPrediction/probability/prediction) turned on/off. Makes sure the output column values match by comparing vs. the case with all 3 output columns turned on.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81288/


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81176/


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135653663
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -91,4 +94,54 @@ object ProbabilisticClassifierSuite {
         "thresholds" -> Array(0.4, 0.6)
       )
     
    +  def probabilisticClassifierGenericTest[
    --- End diff --
    
    Could you add a comment explaining what this test does? Thanks!


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    @smurching Code updated, thanks!


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by WeichenXu123 <gi...@git.apache.org>.
Github user WeichenXu123 commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Jenkins, test this please.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81288 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81288/testReport)** for PR 19065 at commit [`f13cd73`](https://github.com/apache/spark/commit/f13cd73926e80173228637da2015c7d6e7a0e848).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81217/testReport)** for PR 19065 at commit [`a6fef60`](https://github.com/apache/spark/commit/a6fef60bdadb56a68c60a16306ce524aa49fa731).


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81217/


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    **[Test build #81176 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81176/testReport)** for PR 19065 at commit [`3e8617d`](https://github.com/apache/spark/commit/3e8617d3098da7c8fad09c4d3af9f86369a23591).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by jkbradley <gi...@git.apache.org>.
Github user jkbradley commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Taking a look now


[GitHub] spark pull request #19065: [SPARK-21729][ML][TEST] Generic test for Probabil...

Posted by smurching <gi...@git.apache.org>.
Github user smurching commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19065#discussion_r135655220
  
    --- Diff: mllib/src/test/scala/org/apache/spark/ml/classification/ProbabilisticClassifierSuite.scala ---
    @@ -91,4 +94,54 @@ object ProbabilisticClassifierSuite {
         "thresholds" -> Array(0.4, 0.6)
       )
     
    +  def probabilisticClassifierGenericTest[
    +      FeaturesType,
    +      M <: ProbabilisticClassificationModel[FeaturesType, M]](
    +    model: M, testData: Dataset[_]): Unit = {
    +
    +    val allColModel = model.copy(ParamMap.empty)
    +      .setRawPredictionCol("rawPredictionAll")
    +      .setProbabilityCol("probabilityAll")
    +      .setPredictionCol("predictionAll")
    +    val allColResult = allColModel.transform(testData)
    +
    +    for (rawPredictionCol <- Seq("", "rawPredictionSingle")) {
    +      for (probabilityCol <- Seq("", "probabilitySingle")) {
    +        for (predictionCol <- Seq("", "predictionSingle")) {
    +          val newModel = model.copy(ParamMap.empty)
    +            .setRawPredictionCol(rawPredictionCol)
    +            .setProbabilityCol(probabilityCol)
    +            .setPredictionCol(predictionCol)
    +
    +          val result = newModel.transform(allColResult)
    +
    +          import org.apache.spark.sql.functions._
    +
    +          val resultRawPredictionCol =
    +            if (rawPredictionCol.isEmpty) col("rawPredictionAll") else col(rawPredictionCol)
    +          val resultProbabilityCol =
    +            if (probabilityCol.isEmpty) col("probabilityAll") else col(probabilityCol)
    +          val resultPredictionCol =
    +            if (predictionCol.isEmpty) col("predictionAll") else col(predictionCol)
    +
    +          result.select(
    +            resultRawPredictionCol, col("rawPredictionAll"),
    +            resultProbabilityCol, col("probabilityAll"),
    +            resultPredictionCol, col("predictionAll")
    +          ).collect().foreach {
    +            case Row(
    +              rawPredictionSingle: Vector, rawPredictionAll: Vector,
    +              probabilitySingle: Vector, probabilityAll: Vector,
    +              predictionSingle: Double, predictionAll: Double
    +            ) => {
    +              assert(rawPredictionSingle.asInstanceOf[Vector] ~== rawPredictionAll relTol 1E-3)
    --- End diff --
    
    Are these `asInstanceOf[]` casts necessary given that `rawPredictionSingle`, `rawPredictionAll` are explicitly typed in the case statement above?
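On the cast question: a typed pattern such as `rawPredictionSingle: Vector` already gives the bound variable that static type, so a following `.asInstanceOf[Vector]` is redundant. Below is a minimal Spark-free sketch, assuming a `Seq[Any]` as a stand-in for a collected `Row` and `Array[Double]` as a stand-in for `Vector`. The tolerance check is a hypothetical simplification of `~== ... relTol`, not Spark's `TestingUtils`.

```scala
// Spark-free sketch: typed patterns bind variables with their static types,
// so no asInstanceOf casts are needed inside the case body.
object TypedPatternSketch {
  // Hypothetical relative-tolerance comparison, loosely in the spirit of `~== ... relTol`.
  def approxEqual(a: Array[Double], b: Array[Double], relTol: Double): Boolean =
    a.length == b.length && a.zip(b).forall { case (x, y) =>
      math.abs(x - y) <= relTol * math.max(math.abs(x), math.abs(y))
    }

  def check(row: Seq[Any]): Boolean = row match {
    case Seq(rawSingle: Array[Double], rawAll: Array[Double],
             predSingle: Double, predAll: Double) =>
      // rawSingle and rawAll are already statically Array[Double] here.
      approxEqual(rawSingle, rawAll, 1e-3) && predSingle == predAll
    case _ => false
  }
}
```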


[GitHub] spark issue #19065: [SPARK-21729][ML][TEST] Generic test for ProbabilisticCl...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19065
  
    Merged build finished. Test PASSed.
