Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/29 03:15:13 UTC

[GitHub] [arrow-datafusion] jorgecarleitao opened a new issue #438: Removing the physical optimizer `AddMergeExec` breaks the sorts

jorgecarleitao opened a new issue #438:
URL: https://github.com/apache/arrow-datafusion/issues/438


   AFAIK physical optimizers should be entirely optional: a plan must have the same semantics with or without them.
   
   However, removing `AddMergeExec` currently breaks TPC-H query 1 (all sorts, actually) with
   
   ```
   Error: Internal("SortExec requires a single input partition")
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #438: Removing the physical optimizer `AddMergeExec` breaks the sorts

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #438:
URL: https://github.com/apache/arrow-datafusion/issues/438#issuecomment-850795291


   Yes, agreed 👍 I think hash aggregates might have the same problem.
   
   Back when I added the physical optimizers, I added this rule to keep things functioning the same way, since this logic was contained in the optimization pass.
   I believe there were also some comments back then that we should change it at some point.
   
   I think we have a couple of options:
   
   * Add a `MergeExec` at planning time: wrap the input plan of any node that requires a single partition with `MergeExec` (to me this seems like the best option).
   * Use `MergeExec` inside the `SortExec`. This is already done in `HashJoinExec` and `CrossJoinExec`. I am not sure that's really elegant, as it "hides" nodes from the plan tree, so maybe that should move to the planner too.
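
A minimal sketch of the first option, to make the idea concrete. It uses a toy plan tree rather than DataFusion's actual `ExecutionPlan` trait, so every type and function name here (`Plan`, `output_partitions`, `validate`, `plan_sort`) is hypothetical. The point is only that the planner itself inserts the merge, so the plan is valid before any optimizer runs:

```rust
// Toy model of a physical plan; NOT DataFusion's real API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A leaf that produces `partitions` output partitions.
    Scan { partitions: usize },
    /// Merges all input partitions into one.
    Merge { input: Box<Plan> },
    /// Sorts its input; only valid over a single partition.
    Sort { input: Box<Plan> },
}

impl Plan {
    /// Number of partitions this node outputs.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::Merge { .. } | Plan::Sort { .. } => 1,
        }
    }

    /// Checks that every node's partitioning requirement is met.
    fn validate(&self) -> Result<(), String> {
        match self {
            Plan::Scan { .. } => Ok(()),
            Plan::Merge { input } => input.validate(),
            Plan::Sort { input } => {
                if input.output_partitions() != 1 {
                    return Err("SortExec requires a single input partition".to_string());
                }
                input.validate()
            }
        }
    }
}

/// Planner helper (hypothetical): builds a sort and wraps the input
/// in a merge whenever the input has more than one partition.
fn plan_sort(input: Plan) -> Plan {
    let input = if input.output_partitions() > 1 {
        Plan::Merge { input: Box::new(input) }
    } else {
        input
    };
    Plan::Sort { input: Box::new(input) }
}

fn main() {
    let scan = Plan::Scan { partitions: 4 };

    // Built naively, the sort is invalid unless a later pass repairs it.
    let naive = Plan::Sort { input: Box::new(scan.clone()) };
    assert!(naive.validate().is_err());

    // Built by the planner helper, the plan is valid with no optimizer at all.
    let planned = plan_sort(scan);
    assert!(planned.validate().is_ok());
    println!("{:?}", planned);
}
```

With this shape the merge stays visible in the plan tree, which is the drawback of the second option that this approach avoids.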

