Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/04/04 12:00:11 UTC

[GitHub] [arrow] Dandandan commented on pull request #9882: ARROW-12190: [Rust][DataFusion] Implement parallel / partitioned hash join

Dandandan commented on pull request #9882:
URL: https://github.com/apache/arrow/pull/9882#issuecomment-813021037


   @jorgecarleitao Have you found a more performant implementation of, or ideas for, the `concat` kernel in your `arrow2` branch?
   In this case the extra parallelism gives better performance and better scalability, but the kernel adds significant overhead through CoalesceBatches. Looking at the implementation, I think there are quite a few opportunities to improve performance there. For the concat case in particular, too much work currently seems to be done in `MutableArrayData` and in creating intermediate `Vec`s.
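As a rough illustration of the kind of overhead meant here, the following is a minimal sketch in plain Rust using only the standard library. It is not the arrow `concat` kernel or `MutableArrayData`; the function names `concat_naive` and `concat_preallocated` are hypothetical, and a `&[i32]` slice stands in for the values buffer of a primitive array (no validity bitmap, no offsets). It contrasts building an intermediate `Vec` per input chunk with computing the total length once and copying each chunk straight into a single preallocated buffer.

```rust
/// Hypothetical baseline: copies every chunk into a temporary `Vec`
/// before appending it, so each value is copied twice and the output
/// may reallocate several times as it grows.
fn concat_naive(chunks: &[&[i32]]) -> Vec<i32> {
    let mut out = Vec::new();
    for chunk in chunks {
        let tmp: Vec<i32> = chunk.to_vec(); // intermediate allocation and copy
        out.extend(tmp);
    }
    out
}

/// Hypothetical alternative: sums the chunk lengths up front, allocates
/// the output once, and copies each chunk directly into it.
fn concat_preallocated(chunks: &[&[i32]]) -> Vec<i32> {
    let total: usize = chunks.iter().map(|c| c.len()).sum();
    let mut out = Vec::with_capacity(total); // single allocation
    for chunk in chunks {
        out.extend_from_slice(chunk); // one bulk copy per chunk
    }
    out
}
```

Both functions return the same values. The second avoids the per-chunk temporaries, which is the general direction suggested above; whether the same saving is available in the real kernel depends on details (null bitmaps, offsets, nested types) that this sketch leaves out.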


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org