Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/04/14 19:07:43 UTC

[GitHub] [arrow-datafusion] alamb commented on pull request #6009: fix: remove partition aware union logic

alamb commented on PR #6009:
URL: https://github.com/apache/arrow-datafusion/pull/6009#issuecomment-1509099461

   I spent a non-trivial amount of time thinking about what a "PartitionAware" Union even means.
   
   It is entirely undocumented in 
    https://docs.rs/datafusion/22.0.0/datafusion/physical_plan/union/struct.UnionExec.html
   
   Thanks to @mingmwang and @crepererum for their comments on https://github.com/apache/arrow-datafusion/issues/5970#issuecomment-1508165252
   
   Given that the operation seems so different, if we are going to keep the partition-aware union, I think we should use a different structure name. Maybe we could call what is currently named "UnionExec with preserve partitioning" `Interleave`. That would imply the data from the different partitions is kept segregated in its own partitions but interleaved in the output partition streams.
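
To make the distinction concrete, here is a minimal sketch of how I understand the two behaviors. It is not DataFusion code: `Partition`, `union_partitions`, and `interleave_partitions` are hypothetical names, and a partition is modeled as a plain `Vec<i32>` rather than a stream of record batches. For simplicity the sketch appends rows in input order, whereas a real operator would presumably emit batches as they arrive.

```rust
/// Hypothetical stand-in for one partition's stream of rows.
type Partition = Vec<i32>;

/// Plain union: every input partition becomes its own output partition,
/// so the output has the sum of all the input partition counts.
fn union_partitions(inputs: &[Vec<Partition>]) -> Vec<Partition> {
    inputs.iter().flatten().cloned().collect()
}

/// "Interleave" (partition-aware union): output partition `i` combines
/// partition `i` of every input, so the partition count is preserved.
/// Returns `None` if there are no inputs or the partition counts differ.
fn interleave_partitions(inputs: &[Vec<Partition>]) -> Option<Vec<Partition>> {
    let n = inputs.first()?.len();
    if inputs.iter().any(|input| input.len() != n) {
        return None;
    }
    Some(
        (0..n)
            .map(|i| {
                inputs
                    .iter()
                    .flat_map(|input| input[i].iter().copied())
                    .collect()
            })
            .collect(),
    )
}
```

With two inputs of two partitions each, the plain union yields four output partitions, while the interleaved form yields two, each holding the rows from the matching partition of both inputs.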
   
   
   > The physical plan should show or display the UnionExec is partition-aware or ordering-aware clearly.
   
   I will try to make a PR that does this, so the current state of affairs is easier to understand.
   
   

