Posted to reviews@spark.apache.org by jinxing64 <gi...@git.apache.org> on 2017/10/25 08:26:22 UTC

[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...

GitHub user jinxing64 opened a pull request:

    https://github.com/apache/spark/pull/19573

    [SPARK-22350][SQL] select grouping__id from subquery

    ## What changes were proposed in this pull request?
    
    Currently, the SQL below fails:
    ```
    SELECT cnt, k2, k3, grouping__id
    FROM
    (SELECT count(*) as cnt, k2, k3, grouping__id
    FROM t1
    GROUP BY k2, k3
    GROUPING SETS(k2, k3)) t2
    ```
    This use case is common in our warehouse and is now supported by Hive.
    Could we support it?
    
    ## How was this patch tested?
    Test added

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jinxing64/spark SPARK-22350

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19573.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19573
    
----
commit c0ecbeea29c39091a7c1105afaa3741e28c19286
Author: jinxing <ji...@126.com>
Date:   2017-10-24T13:18:56Z

    select grouping__id from subquery

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...

Posted by jinxing64 <gi...@git.apache.org>.
GitHub user jinxing64 reopened a pull request:

    https://github.com/apache/spark/pull/19573



---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83131 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83131/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86928/
    Test FAILed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Build finished. Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83131 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83131/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #86928 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86928/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83039/
    Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83039 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83039/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1559/
    Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83594 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83594/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Build finished. Test FAILed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Merged build finished. Test PASSed.


---



[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 closed the pull request at:

    https://github.com/apache/spark/pull/19573


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Thanks a lot. I will leave it open (if that's OK). Actually, a friend of mine at another company is also affected by this issue. Maybe people can leave some ideas here.
    Thanks again for commenting on this. It would be great if you could review the PR when you have time. I can keep working on it :)


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83594/
    Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83039 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83039/testReport)** for PR 19573 at commit [`c0ecbee`](https://github.com/apache/spark/commit/c0ecbeea29c39091a7c1105afaa3741e28c19286).


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    @jinxing64 You can keep it open, but it might take more time to review the fix. 


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83131/
    Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    @DonnyZone
    Thanks for taking a look.
    I don't think it is quite the same.
    After https://github.com/apache/spark/pull/18270, every `grouping__id` is transformed into `GroupingID`, so users can no longer select `grouping__id` from a subquery.
    Also, after that PR, `grouping__id` is no longer deprecated. This PR removes `spark_grouping_id` and simplifies the logic.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/4729/
    Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by jinxing64 <gi...@git.apache.org>.
Github user jinxing64 commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    @gatorsmile
    Thanks for the reply.
    It seems you prefer giving the alias explicitly. I will close this PR and follow your suggestion.
    But in my warehouse, there are lots of ETL jobs that select grouping__id from a subquery, so we cannot migrate seamlessly.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Build finished. Test PASSed.


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by DonnyZone <gi...@git.apache.org>.
Github user DonnyZone commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    Is it similar to the issue below?
    https://github.com/apache/spark/pull/19178


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #86928 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86928/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
     * This patch **fails PySpark unit tests**.
     * This patch **does not merge cleanly**.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19573: [SPARK-22350][SQL] select grouping__id from subqu...

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19573#discussion_r147344912
  
    --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
    @@ -1497,6 +1497,27 @@ class SQLQuerySuite extends QueryTest with SQLTestUtils with TestHiveSingleton {
         }
       }
     
    +  test("select grouping__id from subquery.") {
    +    checkAnswer(
    +      sql(
    +        """
    +          |SELECT cnt, k2, k3, grouping__id
    +          |FROM
    +          |  (SELECT count(*) as cnt, k2, k3, grouping__id
    --- End diff --
    
    ```
    SELECT cnt, k2, k3, alias_grouping__id
    FROM
      (SELECT count(*) as cnt, k2, k3, grouping__id as alias_grouping__id
      FROM (SELECT key, key%2 as k2 , key%3 as k3 FROM src) t1
      GROUP BY k2, k3
      GROUPING SETS(k2, k3)) t2
    ORDER BY alias_grouping__id, k2, k3
    ```
    This is the workaround.
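
As a rough illustration of the values `alias_grouping__id` would carry in the query above, here is a minimal Python sketch. It assumes `grouping__id` follows the formula Spark documents for `grouping_id()`: one bit per GROUP BY column, the first column being the most significant bit, set to 1 when that column is aggregated away in the current grouping set. The helper name `grouping_id` and the column lists are illustrative only, and Hive releases may number the bits differently.

```python
from typing import Sequence


def grouping_id(group_by_cols: Sequence[str], grouping_set: Sequence[str]) -> int:
    """Return the grouping ID for one grouping set (hypothetical helper).

    Each GROUP BY column contributes one bit, first column most significant.
    A bit is 1 when the column is NOT in the grouping set (aggregated away).
    """
    gid = 0
    for col in group_by_cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


# GROUP BY k2, k3 GROUPING SETS(k2, k3), as in the workaround query.
group_by = ["k2", "k3"]
ids = {gs: grouping_id(group_by, [gs]) for gs in ("k2", "k3")}
print(ids)
```

Under that assumption, rows grouped only by `k2` and rows grouped only by `k3` get different IDs, which is why the outer query can order or filter on the aliased column.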


---



[GitHub] spark issue #19573: [SPARK-22350][SQL] select grouping__id from subquery

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19573
  
    **[Test build #83594 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83594/testReport)** for PR 19573 at commit [`a593442`](https://github.com/apache/spark/commit/a593442bd026ec70b4c29f2d247300f7b6d829ec).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---
