Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2018/11/09 08:00:31 UTC

[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

GitHub user zhengruifeng opened a pull request:

    https://github.com/apache/spark/pull/22991

    [SPARK-25989][ML] OneVsRestModel handle empty outputCols incorrectly

    ## What changes were proposed in this pull request?
    ignore empty output columns
    
    ## How was this patch tested?
    added tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/zhengruifeng/spark ovrm_empty_outcol

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/22991.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #22991
    
----
commit 035362d9ab6d04ff04e3060edd941fdbd0c26222
Author: zhengruifeng <ru...@...>
Date:   2018-11-09T07:47:30Z

    lint

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99252/testReport)** for PR 22991 at commit [`db1fb47`](https://github.com/apache/spark/commit/db1fb47dfc85ad2a64f1f91fd2bcee95ef3afe04).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/22991


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #98645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98645/testReport)** for PR 22991 at commit [`035362d`](https://github.com/apache/spark/commit/035362d9ab6d04ff04e3060edd941fdbd0c26222).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #98645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98645/testReport)** for PR 22991 at commit [`035362d`](https://github.com/apache/spark/commit/035362d9ab6d04ff04e3060edd941fdbd0c26222).


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r235929179
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
             Vectors.dense(predArray)
           }
     
    -      // output the index of the classifier with highest confidence as prediction
    -      val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
    -
    -      // output confidence as raw prediction, label and label metadata as prediction
    -      aggregatedDataset
    -        .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
    -        .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
    -        .drop(accColName)
    +      if (getPredictionCol != "") {
    --- End diff --
    
    I guess I'm surprised these are both optional, in PredictionModel too. But yeah, consistency is good. However, shouldn't this if clause be outside the "getRawPredictionCol = """ block? See ClassificationModel.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5384/
    Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r236230624
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -209,6 +215,9 @@ final class OneVsRestModel private[ml] (
           newDataset.unpersist()
         }
     
    +    var outputColNames = Seq.empty[String]
    --- End diff --
    
    Maybe 'predictionColumns'? These aren't the only output columns. You could make this a mutable val too, but whatever.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4883/
    Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98645/
    Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99250/
    Test FAILed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf).


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99252/
    Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged to master


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99250/testReport)** for PR 22991 at commit [`747a88e`](https://github.com/apache/spark/commit/747a88e19c22c61b0f7f96eeb7398520626c9b14).


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99252/testReport)** for PR 22991 at commit [`db1fb47`](https://github.com/apache/spark/commit/db1fb47dfc85ad2a64f1f91fd2bcee95ef3afe04).


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99250/testReport)** for PR 22991 at commit [`747a88e`](https://github.com/apache/spark/commit/747a88e19c22c61b0f7f96eeb7398520626c9b14).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    **[Test build #99301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5339/
    Test PASSed.


---



[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22991#discussion_r236110139
  
    --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
    @@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
             Vectors.dense(predArray)
           }
     
    -      // output the index of the classifier with highest confidence as prediction
    -      val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
    -
    -      // output confidence as raw prediction, label and label metadata as prediction
    -      aggregatedDataset
    -        .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
    -        .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
    -        .drop(accColName)
    +      if (getPredictionCol != "") {
    --- End diff --
    
    I originally implemented this differently: ClassificationModel updates the output dataset, whereas I returned the output directly in each if clause.
    I have now updated the code to follow ClassificationModel, updating the output columns in each clause. `withColumns` is then used to return the output columns.
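
The approach described in this comment (collect the requested output columns clause by clause, then append them in one pass) could look roughly like the sketch below. It is an illustration only, not the code merged in the PR: the object `EmptyOutputColsSketch`, the helper `appendPredictions`, and its parameter names are hypothetical. It also folds over the public `withColumn` rather than the `withColumns` call mentioned above, so that it does not depend on which Spark versions expose that method.

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object EmptyOutputColsSketch {

  // Hypothetical helper: append only those output columns whose names are
  // non-empty, then drop the intermediate accumulator column.
  def appendPredictions(
      df: DataFrame,
      accColName: String,        // column holding the per-class confidence vector
      rawPredictionCol: String,  // "" means: do not output raw predictions
      predictionCol: String      // "" means: do not output predicted labels
  ): DataFrame = {
    // Index of the classifier with the highest confidence, as a Double label.
    val labelUDF = udf { (raw: Vector) => raw.argmax.toDouble }

    // Collect (name, expression) pairs for the columns that were requested.
    var predictionColumns = Seq.empty[(String, Column)]
    if (rawPredictionCol.nonEmpty) {
      predictionColumns :+= (rawPredictionCol -> col(accColName))
    }
    if (predictionCol.nonEmpty) {
      predictionColumns :+= (predictionCol -> labelUDF(col(accColName)))
    }

    // Add the requested columns one by one, then remove the accumulator.
    predictionColumns
      .foldLeft(df) { case (acc, (name, expr)) => acc.withColumn(name, expr) }
      .drop(accColName)
  }
}
```

With both names set to `""`, the fold has nothing to add and the helper returns the input minus the accumulator column. Because each clause is independent, the prediction column can still be produced when the raw prediction column is disabled.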


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    friendly ping @srowen @jkbradley @MLnick 


---



[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/22991
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99301/
    Test PASSed.


---
