Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 12:20:00 UTC

[GitHub] [arrow-datafusion] thinkharderdev opened a new pull request, #2716: Support for GROUPING SETS/CUBE/ROLLUP

thinkharderdev opened a new pull request, #2716:
URL: https://github.com/apache/arrow-datafusion/pull/2716

# Which issue does this PR close?

Closes #1327

TODO

- [ ] Implement CUBE expansion
- [ ] Implement ROLLUP expansion
- [ ] Add SQL tests for CUBE/ROLLUP queries
- [ ] Pass partitioning expressions directly to `AggregateExec`

Note that the SQL parser does not currently seem to handle `GROUP BY GROUPING SETS ...`, so we need to address that before we can test it explicitly.

# Rationale for this change

This PR adds support for GROUPING SETS (and the special cases CUBE and ROLLUP) in the physical planner and execution plan.
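
For reference, CUBE and ROLLUP are shorthand for particular lists of grouping sets: `ROLLUP(a, b)` stands for `((a, b), (a), ())` and `CUBE(a, b)` stands for every subset of `(a, b)`. The sketch below illustrates that expansion on plain values. The function names `expand_rollup` and `expand_cube` are hypothetical and are not the functions added in this PR, which operate on DataFusion expressions.

```rust
/// Hypothetical helper (not part of this PR): expand `ROLLUP(e1, ..., en)`
/// into its n + 1 grouping sets, i.e. every prefix of the list, from the
/// full list down to the empty set.
fn expand_rollup<T: Clone>(exprs: &[T]) -> Vec<Vec<T>> {
    (0..=exprs.len())
        .rev()
        .map(|n| exprs[..n].to_vec())
        .collect()
}

/// Hypothetical helper (not part of this PR): expand `CUBE(e1, ..., en)`
/// into its 2^n grouping sets, i.e. every subset of the list.
/// Each bit of `mask` decides whether the expression at that index is kept.
/// Assumes the list is shorter than the bit width of `usize`.
fn expand_cube<T: Clone>(exprs: &[T]) -> Vec<Vec<T>> {
    let n = exprs.len();
    (0..(1usize << n))
        .map(|mask| {
            exprs
                .iter()
                .enumerate()
                .filter(|&(i, _)| mask & (1usize << i) != 0)
                .map(|(_, e)| e.clone())
                .collect()
        })
        .collect()
}
```

Since the number of sets produced by `expand_cube` doubles with each expression, a real planner would probably want to cap the number of expressions it accepts.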

# What changes are included in this PR?

There are three primary changes:

1. `AggregateExec` now takes a `Vec<Vec<(Arc<dyn PhysicalExpr>,String)>>` to represent grouping sets. A normal `GROUP BY` is just the special case of a single grouping set. We expect the grouping sets to be "aligned": every set has the same length, with `NULL` in the position of any expression it does not group by. For example, for a SQL clause like `GROUP BY GROUPING SETS ((a),(b),(a,b))`, `AggregateExec` assumes that the planner will expand that to the grouping set `((a,NULL),(NULL,b),(a,b))`. We can't do this alignment in the execution plan because `PhysicalExpr` does not implement `PartialEq`, so there is no way to tell whether expressions in different sets are the same.
2. `DefaultPhysicalPlanner` now handles expanding and aligning grouping sets. This includes expanding CUBE/ROLLUP expressions and merging and aligning GROUPING SETS expressions.
3. The optimizers now handle grouping sets correctly.
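
To make the "aligned" form in item 1 concrete, here is a minimal sketch of the alignment step on plain values. It assumes the element type supports equality, which is the reason this has to happen in the planner. The name `align_grouping_sets` and the use of `Option` to stand in for `NULL` are illustrative assumptions, not the code in this PR.

```rust
/// Hypothetical helper (not part of this PR): align grouping sets so that
/// every set has one slot per distinct grouping expression.
///
/// Returns the distinct expressions in order of first appearance, plus one
/// row per input set where `None` stands in for a `NULL` placeholder.
fn align_grouping_sets<T: PartialEq + Clone>(
    sets: &[Vec<T>],
) -> (Vec<T>, Vec<Vec<Option<T>>>) {
    // Collect the distinct expressions across all sets. This is the step
    // that needs equality on the expression type.
    let mut columns: Vec<T> = Vec::new();
    for set in sets {
        for expr in set {
            if !columns.contains(expr) {
                columns.push(expr.clone());
            }
        }
    }

    // For each set, emit the expression where it is present and `None`
    // where it is absent.
    let aligned = sets
        .iter()
        .map(|set| {
            columns
                .iter()
                .map(|col| if set.contains(col) { Some(col.clone()) } else { None })
                .collect()
        })
        .collect();

    (columns, aligned)
}
```

With the input `[["a"], ["b"], ["a", "b"]]`, this yields the columns `["a", "b"]` and the rows `[Some("a"), None]`, `[None, Some("b")]`, `[Some("a"), Some("b")]`, matching the `((a,NULL),(NULL,b),(a,b))` example above.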

We also include serialization for grouping set expressions in `datafusion-proto`.

# Are there any user-facing changes?

SQL statements with CUBE/ROLLUP should now be supported. GROUPING SETS should also be supported, but the SQL parser does not seem to handle them correctly yet.

I don't think so.
