You are viewing a plain text version of this content. The canonical link for it is here.

Posted to reviews@spark.apache.org by GitBox <gi...@apache.org> on 2021/04/23 01:28:22 UTC

[GitHub] [spark] sigmod edited a comment on pull request #31913: [SPARK-34581][SQL] Don't optimize out grouping expressions from aggregate expressions without aggregate function

sigmod edited a comment on pull request #31913:
URL: https://github.com/apache/spark/pull/31913#issuecomment-825318373

Posted a message on the git commit page and copy it here.

Hi @peter-toth

@cloud-fan @maryannxue @rednaxelafx and I had an offline discussion about PR.

Here's a summary:

(1) This PR would bring complexities when we add more optimization around grouping columns down the road. E.g., if we want to reorder grouping columns in order to reuse partitioning properties or optimize away certain grouping columns, we would need to re-generate the ordinals in grouping refs.
(2) It seems safe to pull out non-attribute grouping expressions from an Aggregate and put them in a Project below the Aggregate. Then, grouping expressions in the Aggregate are only AttributeReferences. The rest of Optimizer rules can still match the Aggregate as before. We do need to pay attention to rules that collapse Aggregate and Project. In the planner, we can combine Aggregate and Project, so that the physical plan doesn't change.

Thus, would you mind reverting this PR and try the proposal in (2)? Thanks in advance!

Best,
Yingyi

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org