Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/19 15:54:03 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue #362: Add an Order Preserving merge operator

alamb opened a new issue #362:
URL: https://github.com/apache/arrow-datafusion/issues/362


   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   
   To unlock sort-based optimizations in DataFusion (and IOx), we often want to take several sorted input streams and merge them so that the output is still sorted in the same way.
   
   One major use case we have in IOx is data in several streams (e.g. parquet files, or in memory) that are already sorted, which we want to merge into a single stream with the same sort order.
   
   As another example, it would be really nice to have a repartitioned sort operation (so that we are able to sort on partitions of an input in parallel) but there is currently no way to combine several sorted streams together. 
   
   Also, to implement something like a parallel sort-merge-join, https://github.com/apache/arrow-datafusion/issues/141, again having the ability to repartition / merge / retain sort is likely important. 
   
   **Describe the solution you'd like**
   I would like a `SortPreservingMerge` operator that takes as input a list of `SortExprs` and some number of input streams. The input streams are guaranteed to be sorted according to the sort exprs.
   
   When the operator is run, it produces an output stream that is also sorted on `SortExprs`.
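   
   To make the requested behavior concrete, here is a minimal sketch of an order-preserving k-way merge using a min-heap. It is only an illustration under simplifying assumptions: the inputs are in-memory `Vec<i64>` values sorted ascending rather than streams of record batches, the sort key is the value itself rather than a list of `SortExprs`, and the name `merge_sorted` is hypothetical, not an existing DataFusion API.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge already-sorted inputs into a single sorted output (k-way merge).
/// Hypothetical helper for illustration only; not part of DataFusion.
/// Assumes every input is sorted ascending.
fn merge_sorted(inputs: Vec<Vec<i64>>) -> Vec<i64> {
    let total: usize = inputs.iter().map(|v| v.len()).sum();
    let mut output = Vec::with_capacity(total);

    // Min-heap of (value, input index, position within that input).
    // `Reverse` turns the std max-heap into a min-heap.
    let mut heap = BinaryHeap::new();
    for (idx, input) in inputs.iter().enumerate() {
        if let Some(&first) = input.first() {
            heap.push(Reverse((first, idx, 0usize)));
        }
    }

    // Repeatedly emit the smallest head element, then refill the heap
    // from the input that element came from.
    while let Some(Reverse((value, idx, pos))) = heap.pop() {
        output.push(value);
        if let Some(&next) = inputs[idx].get(pos + 1) {
            heap.push(Reverse((next, idx, pos + 1)));
        }
    }
    output
}
```

   For instance, calling `merge_sorted` on the inputs `[1, 4, 9]`, `[2, 3, 10]` and `[5]` yields `[1, 2, 3, 4, 5, 9, 10]`. A real operator would presumably pull batches lazily from each input stream instead of holding everything in memory, but the heap-of-heads idea would stay the same.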
   
   **Describe alternatives you've considered**
   TBD
   
   **Additional context**
   I think @tustvold plans to implement an operator similar to this in IOx. Depending on how that goes, we will consider putting it into DataFusion.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] alamb closed issue #362: Add an Order Preserving merge operator

Posted by GitBox <gi...@apache.org>.
alamb closed issue #362:
URL: https://github.com/apache/arrow-datafusion/issues/362