You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@spark.apache.org by viirya <gi...@git.apache.org> on 2017/11/08 04:04:54 UTC
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
GitHub user viirya opened a pull request:
https://github.com/apache/spark/pull/19689
[SPARK-22462][SQL] Make rdd-based actions in Dataset trackable in SQL UI
## What changes were proposed in this pull request?
For the few Dataset actions such as `foreach`, currently no SQL metrics are visible in the SQL tab of SparkUI. It is because it binds wrongly to Dataset's `QueryExecution`. As the actions directly evaluate on the RDD which has individual `QueryExecution`, to show correct SQL metrics on UI, we should bind to RDD's `QueryExecution`.
## How was this patch tested?
Manually test. Screenshot is attached in the PR.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/viirya/spark-1 SPARK-22462
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19689.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19689
----
commit ac539cd0e761193d9a665d8ccb19a8fba5dd504b
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date: 2017-11-07T10:54:14Z
Make rdd-based actions trackable in UI.
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83688 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83688/testReport)** for PR 19689 at commit [`1b1699a`](https://github.com/apache/spark/commit/1b1699a5e414552686d67d758ee0369202d02234).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83584/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:
https://github.com/apache/spark/pull/19689
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83584/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83711 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83711/testReport)** for PR 19689 at commit [`20dc217`](https://github.com/apache/spark/commit/20dc21741a5219dd19bb44806e640b7ea4eed012).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19689
The screenshot for running `sql("select * from range(10)").foreach(a => Unit)` on spark-shell:
<img width="352" alt="screen shot 2017-11-08 at 12 07 37 pm" src="https://user-images.githubusercontent.com/68855/32531135-1e60d544-c47d-11e7-88d6-627ef77d0b80.png">
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83581/testReport)** for PR 19689 at commit [`ac539cd`](https://github.com/apache/spark/commit/ac539cd0e761193d9a665d8ccb19a8fba5dd504b).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/19689#discussion_r150258959
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2579,8 +2579,15 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
- def foreach(f: T => Unit): Unit = withNewExecutionId {
--- End diff --
Ok.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83584 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83584/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
* This patch **fails due to an unknown error code, -9**.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83711 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83711/testReport)** for PR 19689 at commit [`20dc217`](https://github.com/apache/spark/commit/20dc21741a5219dd19bb44806e640b7ea4eed012).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19689
retest this please.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83591/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19689
cc @juliuszsompolski
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19689
@juliuszsompolski No problem. Non-committer can still review. :)
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19689
LGTM
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19689#discussion_r150209023
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2579,8 +2579,15 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
- def foreach(f: T => Unit): Unit = withNewExecutionId {
--- End diff --
how about we add a `withNewRDDExecutionId`?
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83589/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by viirya <gi...@git.apache.org>.
Github user viirya commented on the issue:
https://github.com/apache/spark/pull/19689
cc @cloud-fan for review too.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83581/
Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83589 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83589/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19689: [SPARK-22462][SQL] Make rdd-based actions in Data...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:
https://github.com/apache/spark/pull/19689#discussion_r150269521
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala ---
@@ -2579,8 +2579,10 @@ class Dataset[T] private[sql](
* @group action
* @since 1.6.0
*/
- def foreach(f: T => Unit): Unit = withNewExecutionId {
- rdd.foreach(f)
+ def foreach(f: T => Unit): Unit = {
+ withNewRDDExecutionId {
--- End diff --
we can follow the previous style: `def foreach ... = withNewRDDExecutionId {`
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83711/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/19689
and also I believe anyone can leave the sign-off too if it looks good :).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83591 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83591/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83688 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83688/testReport)** for PR 19689 at commit [`1b1699a`](https://github.com/apache/spark/commit/1b1699a5e414552686d67d758ee0369202d02234).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83591 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83591/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by juliuszsompolski <gi...@git.apache.org>.
Github user juliuszsompolski commented on the issue:
https://github.com/apache/spark/pull/19689
Thanks for the fix @viirya! But I'm not a Spark committer to approve it :-).
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:
https://github.com/apache/spark/pull/19689
thanks, merging to master!
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19689
**[Test build #83589 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83589/testReport)** for PR 19689 at commit [`f0c6399`](https://github.com/apache/spark/commit/f0c639909d7b1638cdf2de5c3684d7de1c375436).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83688/
Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test FAILed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark issue #19689: [SPARK-22462][SQL] Make rdd-based actions in Dataset tra...
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19689
Merged build finished. Test PASSed.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org