Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/25 15:42:01 UTC

[GitHub] [arrow-datafusion] alamb commented on issue #424: Design how to respect output stream ordering

alamb commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-847981350


   My thoughts:
   
   I think it will be simpler, as @tustvold has suggested, to do most or all of the sort-based optimizations (e.g. optimizing away a Sort) at the `LogicalPlan` level rather than in the physical plan. That way:
   1. We can work with `Expr`s rather than `PhysicalExpr`s.
   2. The knowledge of sort order can also feed into potential cost model decisions (e.g. join ordering, algorithm selection).
   
   Encoding the requirements / assumptions of `LogicalPlan` nodes via `outputOrdering` or `requiredChildOrdering` seems like a good idea to me.
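
To make the logical-level idea concrete, here is a minimal sketch of optimizing away a redundant sort using an `output_ordering` method. Everything in it is an assumption for illustration: `Plan`, `SortKey`, and `remove_redundant_sorts` are hypothetical, simplified stand-ins and not DataFusion's actual `LogicalPlan`, `Expr`, or optimizer API. It models only output ordering, not `requiredChildOrdering`.

```rust
// Hypothetical, simplified types for illustration only.
// These are NOT DataFusion's real `LogicalPlan` or `Expr`.
#[derive(Debug, Clone, PartialEq)]
struct SortKey {
    column: String,
    ascending: bool,
}

#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// A table scan that may already produce sorted output.
    Scan { table: String, sorted_by: Vec<SortKey> },
    /// A row filter: it drops rows but keeps their relative order.
    Filter { input: Box<Plan> },
    /// An explicit sort of the input.
    Sort { keys: Vec<SortKey>, input: Box<Plan> },
}

impl Plan {
    /// The ordering this node guarantees on its output (empty = none known).
    fn output_ordering(&self) -> Vec<SortKey> {
        match self {
            Plan::Scan { sorted_by, .. } => sorted_by.clone(),
            Plan::Filter { input } => input.output_ordering(),
            Plan::Sort { keys, .. } => keys.clone(),
        }
    }
}

/// Remove every `Sort` whose input already provides the requested ordering.
fn remove_redundant_sorts(plan: Plan) -> Plan {
    match plan {
        Plan::Sort { keys, input } => {
            let input = remove_redundant_sorts(*input);
            // The sort is redundant if its keys are a prefix of the
            // ordering the input already guarantees.
            if input.output_ordering().starts_with(&keys) {
                input
            } else {
                Plan::Sort { keys, input: Box::new(input) }
            }
        }
        Plan::Filter { input } => Plan::Filter {
            input: Box::new(remove_redundant_sorts(*input)),
        },
        scan @ Plan::Scan { .. } => scan,
    }
}
```

With this sketch, a `Sort` on column `a` over a `Filter` over a `Scan` already sorted by `a` collapses to just the `Filter` over the `Scan`, while a `Sort` over an unsorted `Scan` is left in place.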
   
   In terms of physical plans, what about adding something like `ExecutionPlan::requires_output_sort()` that would tell the various physical optimizer passes when they have to preserve the output sort (and thus might preclude things like "repartition exec" from rewriting the plan)?
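
A rough sketch of how such a flag could be consulted by an optimizer pass follows. The trait `ExecNode`, the node types, and `may_repartition` are all hypothetical names invented for this example; only the method name `requires_output_sort` comes from the proposal above, and none of this reflects DataFusion's actual `ExecutionPlan` trait.

```rust
// Hypothetical stand-in for illustration only.
// This is NOT DataFusion's real `ExecutionPlan` trait.
trait ExecNode {
    fn name(&self) -> &str;

    /// True if optimizer passes must keep this node's output order intact.
    /// Defaults to false so that only order-sensitive nodes need to opt in.
    fn requires_output_sort(&self) -> bool {
        false
    }
}

/// A node that does not care about output order.
struct FilterNode;

/// A node whose consumers rely on its sorted output.
struct SortedMergeNode;

impl ExecNode for FilterNode {
    fn name(&self) -> &str {
        "Filter"
    }
}

impl ExecNode for SortedMergeNode {
    fn name(&self) -> &str {
        "SortedMerge"
    }

    fn requires_output_sort(&self) -> bool {
        true
    }
}

/// Check a repartitioning pass could make before rewriting:
/// repartitioning may interleave rows, so skip order-sensitive nodes.
fn may_repartition(node: &dyn ExecNode) -> bool {
    !node.requires_output_sort()
}

/// Describe the operators that would result from the pass, outermost first.
fn apply_repartition_pass(node: &dyn ExecNode) -> Vec<String> {
    if may_repartition(node) {
        vec!["Repartition".to_string(), node.name().to_string()]
    } else {
        vec![node.name().to_string()]
    }
}
```

Defaulting the method to `false` is one possible design choice: existing nodes keep their current behavior, and a pass only has to check a single flag before rewriting.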

