
Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2022/08/25 10:14:54 UTC

[GitHub] [spark] cloud-fan opened a new pull request, #37655: [SPARK-40218][SQL] GROUPING SETS should preserve the grouping columns

cloud-fan opened a new pull request, #37655:
URL: https://github.com/apache/spark/pull/37655

### What changes were proposed in this pull request?

This PR fixes a bug caused by https://github.com/apache/spark/pull/32022 . Although `GROUP BY ... GROUPING SETS ...` is deprecated, queries that worked before should still work.

https://github.com/apache/spark/pull/32022 mistakenly stopped preserving the order of the user-specified group by columns. Usually this is not a problem, as `GROUP BY a, b` is no different from `GROUP BY b, a`. However, the `grouping_id(...)` function requires its arguments to be exactly the same as the group by columns. This PR fixes the problem by preserving the order of the user-specified group by columns.
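
To illustrate why column order matters for `grouping_id(...)`, here is a minimal Python sketch of the documented bit layout, `(grouping(c1) << (n-1)) + ... + grouping(cn)`. It is a toy model, not Spark's implementation: the helper name `grouping_id` and the columns `a` and `b` are used purely for illustration.

```python
def grouping_id(group_by_cols, grouping_set):
    """Toy model of the grouping_id bit layout (not Spark code).

    Each group by column contributes one bit, most significant first:
    0 if the column is part of the current grouping set, 1 if it is
    aggregated away.
    """
    gid = 0
    for col in group_by_cols:
        bit = 0 if col in grouping_set else 1
        gid = (gid << 1) | bit
    return gid


# Hypothetical query shape: GROUP BY a, b GROUPING SETS ((a), (b))
user_order = ["a", "b"]   # order written by the user
reordered = ["b", "a"]    # same columns, different internal order

for grouping_set in ({"a"}, {"b"}):
    print(
        sorted(grouping_set),
        grouping_id(user_order, grouping_set),
        grouping_id(reordered, grouping_set),
    )
```

In this model the same grouping set maps to a different id once the column order changes, so the column list is only interchangeable when nothing depends on position. That is presumably why `grouping_id(...)` is checked against the group by columns exactly.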

### Why are the changes needed?

bug fix

### Does this PR introduce _any_ user-facing change?

Yes. A query that worked before 3.2 now works again.

### How was this patch tested?

new test

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org
