Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/25 10:39:16 UTC

[GitHub] [arrow-datafusion] tustvold opened a new issue #424: Operator Ordering

tustvold opened a new issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424


   As identified in #423 and #378 (and #412) there needs to be a mechanism for physical operators to express their behaviour with respect to sort order, so that optimisation passes can handle it correctly. 
   
   It is assumed that the initial physical plan created from the logical plan is valid, and that the requirement is for the optimisation phase not to alter the plan in a way that violates its implicit ordering requirements. I think it is therefore sufficient to encode some notion of sort sensitivity, as opposed to what the sort order actually is. Any optimisations related to the specific sort orders in use would take place at the `LogicalPlan` level; avoiding this at the physical layer sidesteps issues around equality for `PhysicalExpr`s, etc.
   
   The proposal would be to introduce a new member function to `ExecutionPlan` called `partition_order()` that returns a variant of a new enum `PartitionOrder`. This would have three variants:
   
   * `PartitionOrder::Preserving` - operations that preserve the ordering of their input partition(s) - e.g. `FilterExec`, `CoalesceBatchesExec`
   * `PartitionOrder::Sensitive` - operations which rely on the order of their input partition(s) - e.g. `GlobalLimitExec`, `LocalLimitExec` 
   * `PartitionOrder::Agnostic` - operations which do not rely on, nor preserve the order of their input partition(s) - e.g. `HashAggregateExec`, `MergeExec`, `RepartitionExec`
   
   Note that this formulation does not distinguish between one or many partitions, as that is a detail already encapsulated by `required_child_distribution` (although I do wonder if this should be a property of the plan and not the operators themselves). There is no mechanism to express an ordering requirement across partitions; I'm not sure that this would be useful.
   
   The default implementation of `partition_order()` would return `PartitionOrder::Sensitive`. Or to put it another way, unless explicitly told otherwise the optimiser cannot assume that an operator isn't sensitive to the ordering of its inputs.
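   Concretely, the proposal could be sketched as below. This is a simplified stand-in for the real `ExecutionPlan` trait; `PartitionOrder`, `partition_order()`, and `SomeOpaqueExec` are names used for illustration here, not existing DataFusion API:
   
   ```rust
   /// The three proposed variants, with the examples given above.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum PartitionOrder {
       /// Preserves the ordering of its input partition(s), e.g. `FilterExec`.
       Preserving,
       /// Relies on the order of its input partition(s), e.g. `GlobalLimitExec`.
       Sensitive,
       /// Neither relies on nor preserves input order, e.g. `HashAggregateExec`.
       Agnostic,
   }
   
   /// Simplified stand-in for DataFusion's `ExecutionPlan` trait.
   pub trait ExecutionPlan {
       /// Conservative default: without further information, the optimiser
       /// must assume the operator is sensitive to input ordering.
       fn partition_order(&self) -> PartitionOrder {
           PartitionOrder::Sensitive
       }
   }
   
   pub struct FilterExec;
   impl ExecutionPlan for FilterExec {
       fn partition_order(&self) -> PartitionOrder {
           PartitionOrder::Preserving
       }
   }
   
   /// An operator that does not override the method gets the safe default.
   pub struct SomeOpaqueExec;
   impl ExecutionPlan for SomeOpaqueExec {}
   ```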
   
   The `Repartition` pass would then be modified to only insert a `RepartitionExec` on branches of the graph where any `PartitionOrder::Sensitive` operation is separated from the insertion point by an intervening `PartitionOrder::Agnostic` operator. This would fix #423. `AddMergeExec` could additionally be modified to error if it finds itself needing to insert a `MergeExec` on an order-sensitive branch.
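   The pass's legality check could be a simple walk from the candidate insertion point toward the plan root. The helper below is a hypothetical sketch (the enum from the proposal is restated to keep the example self-contained):
   
   ```rust
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   pub enum PartitionOrder {
       Preserving,
       Sensitive,
       Agnostic,
   }
   
   /// Hypothetical legality check for the `Repartition` pass: given the
   /// `partition_order()` of each ancestor from the candidate insertion
   /// point up to the root, repartitioning is safe only if no
   /// order-sensitive operator is reached before an order-agnostic one.
   pub fn safe_to_repartition(ancestors: &[PartitionOrder]) -> bool {
       for order in ancestors {
           match order {
               // An ancestor depends on its input order: inserting a
               // RepartitionExec below it would violate that requirement.
               PartitionOrder::Sensitive => return false,
               // An agnostic ancestor absorbs any reordering; nothing
               // above it can observe the repartition.
               PartitionOrder::Agnostic => return true,
               // Order-preserving operators just pass the question upward.
               PartitionOrder::Preserving => {}
           }
       }
       // Reached the root without hitting a sensitive operator.
       true
   }
   ```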
   
   Eventually, since certain forms of `RepartitionExec` are order preserving (e.g. splitting a single partition into multiple), this could be codified and combined with a modified version of `AddMergeExec` that inserts an order-preserving merge. This would fit naturally into the proposed framework.
   
   I'm not sure how ordering is typically handled in query engines, so if there is a standard solution I'd be happy to go with that instead, but I thought I'd write up the simplest solution I can see to the issue in #423.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on issue #424: Design how to respect output stream ordering

andygrove commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-848018100


   I agree that doing this at the logical plan level is simpler. My concern is that later on, if/when we make Ballista viable, we may want to dynamically optimize the query while it is executing, based on statistics from completed stages, and I was hoping to avoid Spark's approach of going back to the logical plan for this, and just optimize the physical plan. However, we can always revisit this later.





[GitHub] [arrow-datafusion] alamb closed issue #424: Design how to respect output stream ordering

alamb closed issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424


   





[GitHub] [arrow-datafusion] andygrove commented on issue #424: Operator Ordering

andygrove commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-847857451


   We may want to consider following Spark's approach here, since we are already implementing other methods such as `required_child_distribution` based on Spark.
   
   For sorting, Spark has:
   
   ```scala
   /** Specifies sort order for each partition requirements on the input data for this operator. */
   def requiredChildOrdering: Seq[Seq[SortOrder]]
   ```






[GitHub] [arrow-datafusion] alamb commented on issue #424: Design how to respect output stream ordering

alamb commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-1031940845


   Proposed fix: https://github.com/apache/arrow-datafusion/pull/1776





[GitHub] [arrow-datafusion] alamb commented on issue #424: Design how to respect output stream ordering

alamb commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-847981350


   My thoughts:
   
   I think it will be simpler, as @tustvold has suggested, to do the majority (if not all) of the sort-based optimizations (e.g. optimizing away a `Sort`) at the `LogicalPlan` level, rather than in the physical plan. That way:
   1. We can work with `Exprs` rather than `PhysicalExpr`s. 
   2. The knowledge of sort order can also feed into potential cost model decisions too (e.g. join ordering, algorithm selection)
   
   Encoding the requirements / assumptions of `LogicalPlan` nodes via `outputOrdering` or `requiredChildOrdering` seems like a good idea to me.
   
   In terms of physical plans, what about adding something like `ExecutionPlan::requires_output_sort()` that would communicate to the various physical optimizer passes when they had to preserve the output sort (and thus might preclude things like `RepartitionExec` from rewriting the plan)?
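   A minimal sketch of that idea on a simplified stand-in trait; the `false` default and the example operators are assumptions, not something specified above:
   
   ```rust
   /// Simplified stand-in for DataFusion's `ExecutionPlan` trait, showing
   /// only the hypothetical flag discussed here.
   pub trait ExecutionPlan {
       /// When true, physical optimizer passes (e.g. the repartition pass)
       /// must not rewrite this node's input in a way that changes its order.
       fn requires_output_sort(&self) -> bool {
           false
       }
   }
   
   pub struct GlobalLimitExec;
   impl ExecutionPlan for GlobalLimitExec {
       fn requires_output_sort(&self) -> bool {
           true // a limit's result depends on the order of its input
       }
   }
   
   /// Operators with no ordering requirement just keep the default.
   pub struct HashAggregateExec;
   impl ExecutionPlan for HashAggregateExec {}
   ```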





[GitHub] [arrow-datafusion] andygrove edited a comment on issue #424: Operator Ordering

andygrove edited a comment on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-847857451


   We may want to consider following Spark's approach here, since we are already implementing other methods such as `required_child_distribution` based on Spark.
   
   For sorting, Spark has:
   
   ```scala
   /** Specifies how data is ordered in each partition. */
   def outputOrdering: Seq[SortOrder] = Nil
   
   /** Specifies sort order for each partition requirements on the input data for this operator. */
   def requiredChildOrdering: Seq[Seq[SortOrder]]
   ```
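   A hypothetical Rust translation of those two Spark methods onto a simplified `ExecutionPlan` trait might look like the following; `SortOrder` here is a placeholder type, not DataFusion's actual sort representation:
   
   ```rust
   /// Placeholder for a per-column sort specification (column + direction).
   #[derive(Debug, Clone, PartialEq, Eq)]
   pub struct SortOrder {
       pub column: String,
       pub descending: bool,
   }
   
   /// Simplified stand-in for DataFusion's `ExecutionPlan` trait, with
   /// hypothetical analogues of Spark's ordering methods.
   pub trait ExecutionPlan {
       /// How the data produced by this operator is ordered within each
       /// partition; empty means no guaranteed ordering (Spark's `Nil`).
       fn output_ordering(&self) -> Vec<SortOrder> {
           Vec::new()
       }
   
       /// The ordering this operator requires of each of its children,
       /// one entry per child; empty means no requirement.
       fn required_child_ordering(&self) -> Vec<Vec<SortOrder>> {
           Vec::new()
       }
   }
   
   pub struct SortExec {
       pub order: Vec<SortOrder>,
   }
   
   impl ExecutionPlan for SortExec {
       /// A sort guarantees exactly the ordering it was asked to produce.
       fn output_ordering(&self) -> Vec<SortOrder> {
           self.order.clone()
       }
   }
   ```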





[GitHub] [arrow-datafusion] alamb commented on issue #424: Design how to respect output stream ordering

alamb commented on issue #424:
URL: https://github.com/apache/arrow-datafusion/issues/424#issuecomment-848053786


   > if/when we make Ballista viable, we may want to dynamically optimize the query while it is executing,
   
   I do think dynamically changing the plan (or algorithms) based on actual execution experience is the current state of the art. I have often wondered if it is better done within the operators themselves (like maybe a join deciding to switch to sort-merge join when it fills up its hash tables, or sampling both inputs to decide which is smaller, etc.).
   
   I would have to think about the kinds of dynamic plan changes we might want to make.

