Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/24 14:42:39 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue #412: DefaultPhysicalPlanner Generates Invalid Physical Plans

tustvold opened a new issue #412:
URL: https://github.com/apache/arrow-datafusion/issues/412


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   #378 sought to loosen the requirement on `SortExec` so that it can sort within each of multiple partitions, preserving the partitioning of its input. However, this broke the physical planner, which relies on `SortExec` declaring that it only supports a single partition so that the `AddMergeExec` optimisation pass inserts a `MergeExec` in front of it.
   
   This isn't necessarily a problem as it currently isn't possible to create an `ExecutionPlan` without immediately optimising it, but it feels a little bit unusual.
   
   **Describe the solution you'd like**
   
   The `PhysicalPlanner` should be aware of the partitioning within the plan, and should insert whatever nodes (`MergeExec` or otherwise) are necessary for the plan to be correct.
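
As a rough illustration of what "aware of the partitioning" could mean, here is a minimal sketch using a toy plan type. The `Plan` enum, the `preserve_partitioning` flag, and `plan_sort` are all hypothetical names made up for this example; they are not DataFusion's actual `ExecutionPlan` API. The idea is only that the planner checks the input's partition count when it builds a sort and inserts the merge itself, rather than relying on a later pass.

```rust
// Toy model only: these types are hypothetical and are NOT DataFusion's API.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A leaf that produces `partitions` output partitions.
    Scan { partitions: usize },
    /// Merges all input partitions into a single partition.
    Merge { input: Box<Plan> },
    /// Sorts its input. A global sort is only correct over a single partition;
    /// with `preserve_partitioning` it sorts each partition independently.
    Sort { input: Box<Plan>, preserve_partitioning: bool },
}

impl Plan {
    /// Number of partitions this node outputs.
    fn output_partitions(&self) -> usize {
        match self {
            Plan::Scan { partitions } => *partitions,
            Plan::Merge { .. } => 1,
            Plan::Sort { input, .. } => input.output_partitions(),
        }
    }
}

/// Build a sort node, inserting a merge below it when a global sort
/// is requested over an input with more than one partition.
fn plan_sort(input: Plan, preserve_partitioning: bool) -> Plan {
    let input = if !preserve_partitioning && input.output_partitions() > 1 {
        Plan::Merge { input: Box::new(input) }
    } else {
        input
    };
    Plan::Sort { input: Box::new(input), preserve_partitioning }
}
```

With this shape, a plan returned by the planner would be valid before any optimiser rule runs, and a per-partition sort stays an explicit opt-in.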
   
   **Describe alternatives you've considered**
   
   The current head of #378 adds a constructor argument to `SortExec` to opt in to the looser partitioning behaviour.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #412: DefaultPhysicalPlanner Generates Invalid Physical Plans

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #412:
URL: https://github.com/apache/arrow-datafusion/issues/412#issuecomment-847217157


   Clearly separating the logic / transformations needed for the plan to produce the correct answers and the transformations that aim to make it go faster (e.g. optimizations) makes a lot of sense to me.
   
   One approach that seems reasonable to me would be to take the pass in https://github.com/apache/arrow-datafusion/blob/master/datafusion/src/physical_optimizer/merge_exec.rs and always call it directly from the `DefaultPhysicalPlanner`. Depending on where else it is used, we may even consider not exposing it as a `PhysicalOptimizerRule`.
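
A minimal sketch of that separation, using toy types: `Node`, `add_merges`, `OptimizerRule`, and `Planner` below are hypothetical stand-ins invented for illustration, not the real `ExecutionPlan`, `AddMergeExec`, or `PhysicalOptimizerRule` definitions. The correctness pass is a plain function the planner always calls; the performance-only rules remain a configurable list applied afterwards.

```rust
// Toy model only: hypothetical types, not DataFusion's real traits or structs.
#[derive(Debug, Clone, PartialEq)]
enum Node {
    Scan { partitions: usize },
    Merge(Box<Node>),
    /// A global sort that requires a single input partition.
    Sort(Box<Node>),
}

fn output_partitions(node: &Node) -> usize {
    match node {
        Node::Scan { partitions } => *partitions,
        Node::Merge(_) | Node::Sort(_) => 1,
    }
}

/// Correctness pass: ensure every `Sort` has a single-partition input.
fn add_merges(node: Node) -> Node {
    match node {
        scan @ Node::Scan { .. } => scan,
        Node::Merge(input) => Node::Merge(Box::new(add_merges(*input))),
        Node::Sort(input) => {
            let input = add_merges(*input);
            let input = if output_partitions(&input) > 1 {
                Node::Merge(Box::new(input))
            } else {
                input
            };
            Node::Sort(Box::new(input))
        }
    }
}

/// Optional rewrites that only affect performance, never correctness.
type OptimizerRule = fn(Node) -> Node;

struct Planner {
    rules: Vec<OptimizerRule>,
}

impl Planner {
    fn create_plan(&self, initial: Node) -> Node {
        // Always applied, regardless of which optimiser rules are configured.
        let correct = add_merges(initial);
        // Then apply the configured, optional rules in order.
        self.rules.iter().fold(correct, |plan, rule| rule(plan))
    }
}
```

In this sketch, a planner configured with an empty rule list still produces a valid plan, which is the property the issue is asking for.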

