Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/14 19:07:43 UTC
[GitHub] [arrow-datafusion] alamb commented on pull request #6009: fix: remove partition aware union logic
alamb commented on PR #6009:
URL: https://github.com/apache/arrow-datafusion/pull/6009#issuecomment-1509099461
I spent a non-trivial amount of time thinking about what a "PartitionAware" Union even means.
It is entirely undocumented in
https://docs.rs/datafusion/22.0.0/datafusion/physical_plan/union/struct.UnionExec.html
Thanks to the comments from @mingmwang and @crepererum on https://github.com/apache/arrow-datafusion/issues/5970#issuecomment-1508165252 for helping clarify it.
Given that the operation seems so different, if we are going to keep the partition-aware union, I think we should give it a different structure name. Maybe we could call what is currently named "UnionExec with preserve partitioning" `Interleave`: that name would imply that the data from the different partitions is kept segregated in its own partitions but interleaved in the output partition streams.
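To make the distinction concrete, here is a toy sketch (not DataFusion code) contrasting a plain union, which concatenates the partition lists of its inputs, with the proposed `Interleave`, which keeps partition `i` of every input together in output partition `i`. The names `Partitions`, `union_concat`, and `interleave` are hypothetical and are not part of the DataFusion API; a "partition" is modeled as a list of batch labels, and the round-robin merge is a deterministic simplification of how real streams interleave batches as they become ready.

```rust
/// Toy model (hypothetical): a partition is a list of batch labels.
type Partition = Vec<&'static str>;
/// A plan's output: one Partition per output partition.
type Partitions = Vec<Partition>;

/// Plain union: concatenate the partition lists of all inputs.
/// N inputs with P partitions each yields N * P output partitions.
fn union_concat(inputs: &[Partitions]) -> Partitions {
    inputs.iter().flat_map(|p| p.iter().cloned()).collect()
}

/// Hypothetical "Interleave": every input must have the same number of
/// partitions P; output partition i combines partition i of every input,
/// so the output still has P partitions. Returns None on a count mismatch.
/// Real streams would interleave batches in arrival order; this sketch
/// uses a deterministic round-robin instead.
fn interleave(inputs: &[Partitions]) -> Option<Partitions> {
    let p = inputs.first()?.len();
    if inputs.iter().any(|inp| inp.len() != p) {
        return None;
    }
    let mut out = Vec::with_capacity(p);
    for i in 0..p {
        // Longest partition i across inputs bounds the round-robin loop.
        let longest = inputs.iter().map(|inp| inp[i].len()).max().unwrap_or(0);
        let mut merged = Vec::new();
        for b in 0..longest {
            for inp in inputs {
                if let Some(batch) = inp[i].get(b) {
                    merged.push(*batch);
                }
            }
        }
        out.push(merged);
    }
    Some(out)
}

fn main() {
    // Two inputs, each split into 2 partitions (made-up labels).
    let a: Partitions = vec![vec!["a0.0", "a0.1"], vec!["a1.0"]];
    let b: Partitions = vec![vec!["b0.0"], vec!["b1.0", "b1.1"]];
    let inputs = vec![a, b];

    let concat = union_concat(&inputs);
    println!("union: {} partitions {:?}", concat.len(), concat);

    let inter = interleave(&inputs).expect("partition counts must match");
    println!("interleave: {} partitions {:?}", inter.len(), inter);
}
```

With two inputs of two partitions each, the plain union produces four output partitions, while the interleaving variant produces two, each mixing batches from both inputs. The partition-count check matters because combining partition `i` across inputs is presumably only meaningful when the inputs share the same partitioning (for example, the same hash keys); otherwise it would mix unrelated data.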
> The physical plan should show or display the UnionExec is partition-aware or ordering-aware clearly.
I will try to make a PR that does this, so that the current state of affairs is easier to understand.