You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/20 14:47:36 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #369: Proposal: Rename MergeExec

alamb opened a new issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   The name `MergeExec` has long confused me. 
   
   The name `MergeExec` implies to me that its purpose is to "merge sorted streams" into another sorteds stream
   
   However,   `MergeExec` does something more like `UNION ALL` of streams into a single stream, combining the input streams in some arbitrary output order. Something that looks to me like an actual "merge" operation is described in  https://github.com/apache/arrow-datafusion/issues/362. 
   
   **Describe the solution you'd like**
   I propose renaming  `MergeExec` to `UnionToSingleStreamExec` to make it clearer
   
   It could also potentially combined with the existing `UnionExec` (which preserves partitioning) as another flag
   
   **Describe alternatives you've considered**
   As this would be a fairly invasive (and backwards incompatible change) leaving the current naming is also a possibility. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845261931


   I agree that the name is confusing. Union would be confusing too, as a UNION ALL doesn't actually combine the partitions into one and has 2 or more children instead of only 1.
   
   `CoalescePartitionsExec` sounds great to me :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao closed issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845261931


   I agree that the name is confusing. Union would be confusing too, as a UNION all doesn't actually combine the partitions.
   
   `CoalescePartitionsExec` sounds great to me :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845268587


   I like`CoalescePartitionsExec` 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] andygrove commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845251570


   I agree that we should change the name. If I had to pick a new name I would probably be thinking more along the lines of `CoalescePartitionsExec` since this is a specialized form of repartitioning. That said, I don't have a strong opinion on the new name.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] jorgecarleitao commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845189357


   agree 👍 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845261931


   I agree that the name is confusing. Union would be confusing too, as a UNION ALL doesn't actually combine the partitions into one and has 2 or multiple children instead of only 1.
   
   `CoalescePartitionsExec` sounds great to me :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
Dandandan commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845265344


   Another thing which makes some sense to me is making this a special case of `RepartitionExec` with a `SinglePartition` option (next to round robin / hash)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan edited a comment on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
Dandandan edited a comment on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845261931


   I agree that the name is confusing. Union would be confusing too, as a UNION ALL doesn't actually combine the partitions into one
   
   `CoalescePartitionsExec` sounds great to me :+1: 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb commented on issue #369: Proposal: Rename MergeExec

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #369:
URL: https://github.com/apache/arrow-datafusion/issues/369#issuecomment-845187879


   Do you have any thoughts on this @Dandandan  @jorgecarleitao @andygrove @returnString ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org