Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2017/10/25 08:26:22 UTC
[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...
GitHub user jinxing64 opened a pull request:
https://github.com/apache/spark/pull/19573
[SPARK-22350][SQL] select grouping__id from subquery
## What changes were proposed in this pull request?
Currently, the SQL below fails:
```
SELECT cnt, k2, k3, grouping__id
FROM
(SELECT count(*) as cnt, k2, k3, grouping__id
FROM t1
GROUP BY k2, k3
GROUPING SETS(k2, k3)) t2
```
This use case is common in our warehouse and is currently supported by Hive.
Could we support it?
## How was this patch tested?
A test was added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/jinxing64/spark SPARK-22350
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19573.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19573
----
commit c0ecbeea29c39091a7c1105afaa3741e28c19286
Author: jinxing <ji...@126.com>
Date: 2017-10-24T13:18:56Z
select grouping__id from subquery
----
---
---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org
[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...
Posted by jinxing64 <gi...@git.apache.org>.
GitHub user jinxing64 reopened a pull request:
https://github.com/apache/spark/pull/19573
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83131/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test FAILed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86928/
Test FAILed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Build finished. Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83131/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #86928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86928/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83039/
Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83039/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1559/
Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83594/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Build finished. Test FAILed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Merged build finished. Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Merged build finished. Test PASSed.
---
[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 closed the pull request at:
https://github.com/apache/spark/pull/19573
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
Thanks a lot. I will leave it open (if that's OK). A friend of mine at another company is also affected by this issue, so maybe people can leave some ideas here.
Thanks again for commenting on this. It would be great if you could review the PR when you have time. I can keep working on it :)
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83594/
Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83039/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:
https://github.com/apache/spark/pull/19573
@jinxing64 You can keep it open, but it might take more time to review the fix.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83131/
Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
@DonnyZone
Thanks for taking a look.
I don't think it is quite the same.
After https://github.com/apache/spark/pull/18270, every `grouping__id` is transformed into `GroupingID`, so users can no longer select `grouping__id` from a subquery.
Also, after that PR, `grouping__id` is no longer deprecated. This PR removes `spark_grouping_id` and simplifies the logic.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Test PASSed.
Refer to this link for build results (access rights to CI server needed):
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4729/
Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:
https://github.com/apache/spark/pull/19573
@gatorsmile
Thanks for the reply.
It seems you prefer giving the alias explicitly. I will close this PR and follow your suggestion.
But in my warehouse there are lots of ETL jobs that select grouping__id from a subquery, so we cannot migrate seamlessly.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:
https://github.com/apache/spark/pull/19573
Build finished. Test PASSed.
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by DonnyZone <gi...@git.apache.org>.
Github user DonnyZone commented on the issue:
https://github.com/apache/spark/pull/19573
Is this similar to the issue below?
https://github.com/apache/spark/pull/19178
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #86928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86928/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
* This patch **fails PySpark unit tests**.
* This patch **does not merge cleanly**.
* This patch adds no public classes.
---
[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...
Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:
https://github.com/apache/spark/pull/19573#discussion_r147344912
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -1497,6 +1497,27 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
}
}
+ test("select grouping__id from subquery.") {
+ checkAnswer(
+ sql(
+ """
+ |SELECT cnt, k2, k3, grouping__id
+ |FROM
+ | (SELECT count(*) as cnt, k2, k3, grouping__id
--- End diff --
```
SELECT cnt, k2, k3, alias_grouping__id
FROM
(SELECT count(*) as cnt, k2, k3, grouping__id as alias_grouping__id
FROM (SELECT key, key%2 as k2 , key%3 as k3 FROM src) t1
GROUP BY k2, k3
GROUPING SETS(k2, k3)) t2
ORDER BY alias_grouping__id, k2, k3
```
This is the workaround.
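
For readers less familiar with the column being aliased, here is a rough plain-Python sketch (not Spark code) of what the inner query produces. It assumes the bitmask convention documented for Spark's `grouping_id()`: a bit is 1 when the column is aggregated away, with the first GROUP BY column as the highest bit. Other engines or versions may number the bits differently. The helper names and the six sample keys are hypothetical.

```python
from collections import Counter


def grouping_id(group_by_cols, grouping_set):
    """Return a bitmask over the GROUP BY columns.

    Assumed convention: a bit is 1 when the column is aggregated away
    (absent from the grouping set); the first column is the highest bit.
    """
    gid = 0
    for col in group_by_cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


def count_by_grouping_sets(rows, group_by_cols, grouping_sets, id_name):
    """Emulate SELECT count(*), cols, grouping id ... GROUPING SETS(...)."""
    out = []
    for gset in grouping_sets:
        gid = grouping_id(group_by_cols, gset)
        # Columns outside the current grouping set are reported as None (NULL).
        counts = Counter(
            tuple(row[c] if c in gset else None for c in group_by_cols)
            for row in rows
        )
        for key, cnt in counts.items():
            record = {"cnt": cnt}
            record.update(zip(group_by_cols, key))
            record[id_name] = gid
            out.append(record)
    return out


# Hypothetical stand-in for: SELECT key, key%2 AS k2, key%3 AS k3 FROM src
t1 = [{"key": k, "k2": k % 2, "k3": k % 3} for k in range(6)]

# The inner query exposes the id under an ordinary name, so the outer
# query can select it like any other column.
t2 = count_by_grouping_sets(
    t1, ["k2", "k3"], [("k2",), ("k3",)], "alias_grouping__id"
)
for row in t2:
    print(row["cnt"], row["k2"], row["k3"], row["alias_grouping__id"])
```

The point of the alias is visible in the last step: once the id is materialized under a regular column name in the subquery output, the outer SELECT no longer depends on how `grouping__id` itself is resolved.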
---
[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery
Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:
https://github.com/apache/spark/pull/19573
**[Test build #83594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83594/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---