Posted to reviews@spark.apache.org by zhengruifeng <gi...@git.apache.org> on 2018/11/09 08:00:31 UTC
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
GitHub user zhengruifeng opened a pull request:
https://github.com/apache/spark/pull/22991
[SPARK-25989][ML] OneVsRestModel handle empty outputCols incorrectly
## What changes were proposed in this pull request?
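Ignore empty output columns: when an output column name (e.g. `predictionCol` or `rawPredictionCol`) is set to an empty string, `OneVsRestModel` skips generating that column.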
## How was this patch tested?
Added tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/zhengruifeng/spark ovrm_empty_outcol
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/22991.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #22991
----
commit 035362d9ab6d04ff04e3060edd941fdbd0c26222
Author: zhengruifeng <ru...@...>
Date: 2018-11-09T07:47:30Z
lint
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99252 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99252/testReport)** for PR 22991 at commit [`db1fb47`](https://github.com/apache/spark/commit/db1fb47dfc85ad2a64f1f91fd2bcee95ef3afe04).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/22991
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #98645 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98645/testReport)** for PR 22991 at commit [`035362d`](https://github.com/apache/spark/commit/035362d9ab6d04ff04e3060edd941fdbd0c26222).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #98645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/98645/testReport)** for PR 22991 at commit [`035362d`](https://github.com/apache/spark/commit/035362d9ab6d04ff04e3060edd941fdbd0c26222).
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22991#discussion_r235929179
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
Vectors.dense(predArray)
}
- // output the index of the classifier with highest confidence as prediction
- val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
-
- // output confidence as raw prediction, label and label metadata as prediction
- aggregatedDataset
- .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
- .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
- .drop(accColName)
+ if (getPredictionCol != "") {
--- End diff --
I guess I'm surprised these are both optional, in PredictionModel too. But yeah, consistency is good. However, shouldn't this `if` clause be outside the `getRawPredictionCol` block? See ClassificationModel.
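The concern about nesting can be illustrated with a minimal pure-Scala sketch. It does not use Spark. A row is modelled as a `Map[String, Any]`, and the names `NestedVsIndependent`, `transformNested` and `transformIndependent` are hypothetical.

The nested version assumes the `predictionCol` check ends up inside the raw-prediction branch. The independent version checks each output column on its own, which is roughly the structure the comment attributes to ClassificationModel.

```scala
// Hypothetical row model: column name -> value. This is NOT Spark's API,
// just enough structure to show the control flow under discussion.
object NestedVsIndependent {
  type Row = Map[String, Any]

  // Index of the largest confidence (the first one wins on ties).
  private def argmax(xs: Array[Double]): Int = xs.indices.maxBy(i => xs(i))

  // Nested layout: the prediction is only produced inside the raw-prediction
  // branch, so setting rawPredictionCol to "" also drops predictionCol.
  def transformNested(row: Row, confidences: Array[Double],
                      rawPredictionCol: String, predictionCol: String): Row = {
    var out = row
    if (rawPredictionCol != "") {
      out = out + (rawPredictionCol -> confidences.toVector)
      if (predictionCol != "") {
        out = out + (predictionCol -> argmax(confidences).toDouble)
      }
    }
    out
  }

  // Independent layout: each output column is checked on its own, so an
  // empty name skips only that column.
  def transformIndependent(row: Row, confidences: Array[Double],
                           rawPredictionCol: String, predictionCol: String): Row = {
    var out = row
    if (rawPredictionCol != "") {
      out = out + (rawPredictionCol -> confidences.toVector)
    }
    if (predictionCol != "") {
      out = out + (predictionCol -> argmax(confidences).toDouble)
    }
    out
  }
}
```

With `rawPredictionCol = ""`, the nested version silently omits the prediction column, while the independent version still emits it.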
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5384/
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/22991#discussion_r236230624
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -209,6 +215,9 @@ final class OneVsRestModel private[ml] (
newDataset.unpersist()
}
+ var outputColNames = Seq.empty[String]
--- End diff --
Maybe `predictionColumns`? These aren't the only output columns. You could also make this a `val` holding a mutable collection, but whatever.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4883/
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/98645/
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99250/
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf).
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99252/
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:
https://github.com/apache/spark/pull/22991
Merged to master
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99250/testReport)** for PR 22991 at commit [`747a88e`](https://github.com/apache/spark/commit/747a88e19c22c61b0f7f96eeb7398520626c9b14).
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99252/testReport)** for PR 22991 at commit [`db1fb47`](https://github.com/apache/spark/commit/db1fb47dfc85ad2a64f1f91fd2bcee95ef3afe04).
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99250/testReport)** for PR 22991 at commit [`747a88e`](https://github.com/apache/spark/commit/747a88e19c22c61b0f7f96eeb7398520626c9b14).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/22991
**[Test build #99301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/99301/testReport)** for PR 22991 at commit [`74cc277`](https://github.com/apache/spark/commit/74cc277dc5668ad59efd19fbf47d4cfa824ba9bf).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/5339/
---
[GitHub] spark pull request #22991: [SPARK-25989][ML] OneVsRestModel handle empty out...
Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on a diff in the pull request:
https://github.com/apache/spark/pull/22991#discussion_r236110139
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -219,14 +225,20 @@ final class OneVsRestModel private[ml] (
Vectors.dense(predArray)
}
- // output the index of the classifier with highest confidence as prediction
- val labelUDF = udf { (rawPredictions: Vector) => rawPredictions.argmax.toDouble }
-
- // output confidence as raw prediction, label and label metadata as prediction
- aggregatedDataset
- .withColumn(getRawPredictionCol, rawPredictionUDF(col(accColName)))
- .withColumn(getPredictionCol, labelUDF(col(getRawPredictionCol)), labelMetadata)
- .drop(accColName)
+ if (getPredictionCol != "") {
--- End diff --
I had implemented this differently: ClassificationModel updates the output dataset step by step, whereas I returned the output directly from each `if` clause.
I have now updated the code to follow ClassificationModel: each clause updates the output columns, and `withColumns` is used to add all of them to the output at the end.
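The approach described here can be sketched in plain Scala: each enabled `if` clause collects an output column, and all collected columns are added in one step.

This is an illustrative model, not the merged Spark code. Several names are assumptions:
- `CollectThenAdd` and its `withColumns` helper are hypothetical; the helper only stands in for the `withColumns` call mentioned above.
- The accumulator column layout (per-class confidences keyed by class index) is assumed.

Following the naming suggestion in review, the sketch uses a `val` named `predictionColumns` holding a mutable `ArrayBuffer`, instead of a `var` holding an immutable `Seq`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical row model: column name -> value. Not Spark's API.
object CollectThenAdd {
  type Row = Map[String, Any]

  // Stand-in for a withColumns(names, cols) call: adds every collected
  // (name, value) pair to the row in a single step.
  def withColumns(row: Row, cols: Seq[(String, Any)]): Row = row ++ cols

  // `accColName` holds per-class confidences keyed by class index, standing
  // in for the temporary accumulator column; it is dropped at the end.
  def transform(row: Row, accColName: String, numClasses: Int,
                rawPredictionCol: String, predictionCol: String): Row = {
    val acc = row(accColName).asInstanceOf[Map[Int, Double]]
    // A val holding a mutable buffer, filled by each enabled clause.
    val predictionColumns = ArrayBuffer.empty[(String, Any)]

    if (rawPredictionCol != "") {
      // Dense vector of confidences; missing classes default to 0.0.
      val raw = Vector.tabulate(numClasses)(i => acc.getOrElse(i, 0.0))
      predictionColumns += (rawPredictionCol -> raw)
    }
    if (predictionCol != "") {
      // Class index with the highest confidence, as a Double label.
      predictionColumns += (predictionCol -> acc.maxBy(_._2)._1.toDouble)
    }

    withColumns(row, predictionColumns.toSeq) - accColName
  }
}
```

Collecting first and adding once keeps the two `if` clauses independent of each other. It also leaves a single place where the output is assembled and the accumulator is dropped.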
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test FAILed.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Merged build finished. Test PASSed.
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by zhengruifeng <gi...@git.apache.org>.
Github user zhengruifeng commented on the issue:
https://github.com/apache/spark/pull/22991
friendly ping @srowen @jkbradley @MLnick
---
[GitHub] spark issue #22991: [SPARK-25989][ML] OneVsRestModel handle empty outputCols...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/22991
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/99301/
---