Posted to github@arrow.apache.org by "alamb (via GitHub)" <gi...@apache.org> on 2023/05/30 12:47:07 UTC

[GitHub] [arrow-datafusion] alamb opened a new issue, #6486: Sort Preserving Repartition

alamb opened a new issue, #6486:
URL: https://github.com/apache/arrow-datafusion/issues/6486

   ### Is your feature request related to a problem or challenge?
   
   When RepartitionExec handles multiple input partitions, it creates N channels for each input partition, where N is the output partition count, for a total of input_partition * output_partition channels. During processing, each output partition pulls from its channels in whatever order batches become available, which depends on processing time, and this disrupts the order of records. The problem appears whenever the input partition count is greater than 1: the order of records within the output partitions becomes unpredictable.
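   
   To make the channel layout concrete, here is a minimal sketch in plain Rust. It does not use any DataFusion API: `route`, `is_sorted`, and the modulo "hash" are hypothetical stand-ins chosen for illustration. Each sorted input partition is split into one queue per output partition. Every queue stays sorted on its own, but an output partition that drains its queues in arrival order can end up with unsorted data.
   
   ```rust
   /// Split each sorted input partition into one queue per output partition.
   /// Returns channels[input][output]; `v % n_out` stands in for a real hash.
   fn route(inputs: &[Vec<u64>], n_out: usize) -> Vec<Vec<Vec<u64>>> {
       inputs
           .iter()
           .map(|part| {
               let mut outs = vec![Vec::new(); n_out];
               for &v in part {
                   outs[(v as usize) % n_out].push(v);
               }
               outs
           })
           .collect()
   }
   
   /// True if the slice is in non-decreasing order.
   fn is_sorted(xs: &[u64]) -> bool {
       xs.windows(2).all(|w| w[0] <= w[1])
   }
   
   fn main() {
       let inputs = vec![vec![1, 2, 5, 8], vec![3, 4, 6, 7]];
       let channels = route(&inputs, 2);
       // 2 inputs * 2 outputs = 4 channels, each still sorted on its own.
       assert_eq!(channels.iter().map(|c| c.len()).sum::<usize>(), 4);
       assert!(channels.iter().flatten().all(|q| is_sorted(q)));
   
       // One possible arrival order at output 0: input 0's queue, then input 1's.
       let out0: Vec<u64> = channels.iter().flat_map(|c| c[0].clone()).collect();
       assert!(!is_sorted(&out0));
   }
   ```
   
   Concatenating the queues is only one possible interleaving. In a running plan the order would depend on which input produces batches first.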
   
   
   
   ### Describe the solution you'd like
   
   To address this issue, a more sophisticated algorithm is needed: one that combines the existing hash partitioner and round-robin partitioner functionality while preserving the original order of records within partitions, even when the input partition count is greater than 1.
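   
   As a rough illustration of one possible direction (an assumption for discussion, not necessarily what the linked PR implements), each output partition could merge its per-input queues with a k-way merge instead of taking batches in arrival order. The sketch below uses only the Rust standard library. `merge_sorted` is a hypothetical name, and the in-memory vectors stand in for the real channels.
   
   ```rust
   use std::cmp::Reverse;
   use std::collections::BinaryHeap;
   
   /// Merge already-sorted queues into one sorted output (k-way merge).
   fn merge_sorted(queues: &[Vec<u64>]) -> Vec<u64> {
       // Heap entries are (value, queue index, position in queue); `Reverse`
       // turns the standard max-heap into a min-heap.
       let mut heap = BinaryHeap::new();
       for (q, queue) in queues.iter().enumerate() {
           if let Some(&v) = queue.first() {
               heap.push(Reverse((v, q, 0usize)));
           }
       }
       let mut out = Vec::new();
       while let Some(Reverse((v, q, i))) = heap.pop() {
           out.push(v);
           // Refill the heap from the queue the smallest value came from.
           if let Some(&next) = queues[q].get(i + 1) {
               heap.push(Reverse((next, q, i + 1)));
           }
       }
       out
   }
   
   fn main() {
       // Two queues feeding one output partition, each sorted on its own.
       let queues = vec![vec![2, 8], vec![4, 6]];
       assert_eq!(merge_sorted(&queues), vec![2, 4, 6, 8]);
   }
   ```
   
   With streaming inputs, a merge like this would also have to wait until every queue that is still open has a value available before emitting, which trades some latency for ordering.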
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   The description above is taken from https://github.com/apache/arrow-datafusion/pull/6346 (@baharberna).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] alamb closed issue #6486: Sort Preserving Repartition

Posted by "alamb (via GitHub)" <gi...@apache.org>.
alamb closed issue #6486: Sort Preserving Repartition 
URL: https://github.com/apache/arrow-datafusion/issues/6486

