Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/24 19:54:33 UTC

[GitHub] [arrow] jorgecarleitao opened a new pull request #7830: ARROW-9555: [Rust] [DataFusion] Added inner join

jorgecarleitao opened a new pull request #7830:
URL: https://github.com/apache/arrow/pull/7830


   This PR contains a physical plan to execute an inner join. I have not run any benchmarks; this is purely the implementation plus some tests.
   
   The gist of the implementation for a given partition is:
   
   ```python
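   for left_record in left_records:
       # hash the join keys of the left batch
       hash_left = build_hash_of_keys(left_record)
       for right_record in right_records:
           # the right batch is hashed again for every left batch
           hash_right = build_hash_of_keys(right_record)
           # aligned row indexes of matching (left, right) pairs
           left_indexes, right_indexes = inner_join(hash_left, hash_right)
           yield concat(left_record[left_indexes], right_record[right_indexes])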
   ```
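A runnable version of the gist above may help pin down what each step does. This is a minimal sketch, not the PR's Rust code: it assumes a RecordBatch can be modelled as a dict from column name to a Python list, and `build_hash_of_keys`, `inner_join`, `take`, `hash_join` and the `"l."`/`"r."` column prefixes are all hypothetical names chosen for illustration. Here `inner_join` returns two aligned index lists, one per side, so that matching rows can be gathered from each batch before the columns are placed side by side.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for an Arrow RecordBatch: column name -> list of values.
Batch = Dict[str, list]
HashTable = Dict[tuple, List[int]]


def build_hash_of_keys(batch: Batch, on: List[str]) -> HashTable:
    """Map each join-key tuple to the row indexes in `batch` that carry it."""
    table: HashTable = defaultdict(list)
    for row in range(len(batch[on[0]])):
        table[tuple(batch[col][row] for col in on)].append(row)
    return table


def inner_join(hash_left: HashTable, hash_right: HashTable) -> Tuple[List[int], List[int]]:
    """Return aligned (left_rows, right_rows) for every pair of rows with equal keys."""
    left_idx: List[int] = []
    right_idx: List[int] = []
    for key, left_rows in hash_left.items():
        right_rows = hash_right.get(key, [])  # .get does not insert into a defaultdict
        for lrow in left_rows:
            for rrow in right_rows:
                left_idx.append(lrow)
                right_idx.append(rrow)
    return left_idx, right_idx


def take(batch: Batch, rows: List[int], prefix: str) -> Batch:
    """Gather `rows` from every column, prefixing names to avoid left/right clashes."""
    return {prefix + name: [col[i] for i in rows] for name, col in batch.items()}


def hash_join(left_records: List[Batch], right_records: List[Batch], on: List[str]) -> Iterator[Batch]:
    """Nested loop over batch pairs, yielding one joined batch per (left, right) pair."""
    for left_record in left_records:
        hash_left = build_hash_of_keys(left_record, on)
        for right_record in right_records:
            hash_right = build_hash_of_keys(right_record, on)
            left_idx, right_idx = inner_join(hash_left, hash_right)
            yield {**take(left_record, left_idx, "l."), **take(right_record, right_idx, "r.")}
```

For example, joining `left = [{"id": [1, 2, 3], "a": ["x", "y", "z"]}]` with `right = [{"id": [2, 3, 3], "b": [10, 20, 30]}, {"id": [1], "b": [40]}]` on `["id"]` yields two batches, one per right batch: the first holds the three rows with `id` 2, 3, 3 and the second the single row with `id` 1.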
   
   I.e., this is inefficient: every right batch is hashed again for each left batch.
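To put a number on the inefficiency, the small sketch below counts how many times the gist's loop structure calls the key-hashing step on each side. `count_hash_builds` is a hypothetical helper that only mirrors the two nested loops; it does not hash any data.

```python
def count_hash_builds(num_left: int, num_right: int) -> dict:
    """Count hash-table builds performed by the nested loop in the gist."""
    calls = {"left": 0, "right": 0}
    for _ in range(num_left):
        calls["left"] += 1  # hash_left: once per left batch
        for _ in range(num_right):
            calls["right"] += 1  # hash_right: rebuilt for every left batch
    return calls
```

With 4 left batches and 5 right batches this gives 4 left builds but 20 right builds, whereas hashing each right batch only once would need 5.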
   
   The implementation is currently sequential, even though it could be trivially distributed, since each RecordBatch is evaluated independently (we still lock the mutex when reading a partition, as other physical plans do). Since we have not committed to a distributed computational model, IMO a sequential implementation is enough for now.
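Because no (left, right) batch pair depends on any other, the pairs can be handed to a worker pool and their results collected in the same order as the sequential loop. The sketch below illustrates this under the same assumptions as before (a batch modelled as a dict of column lists); `join_pair`, `parallel_join` and the key column `"id"` are hypothetical. Unlike the gist, `join_pair` hashes only the right batch and probes it with the left rows. In CPython, threads will not speed up this pure-Python work because of the GIL; the point is only that the pairs are independent.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Tuple

Batch = Dict[str, list]  # hypothetical stand-in for an Arrow RecordBatch


def join_pair(pair: Tuple[Batch, Batch]) -> Batch:
    """Inner-join one (left, right) batch pair on the hypothetical key column "id"."""
    left, right = pair
    index: Dict[object, List[int]] = {}
    for row, key in enumerate(right["id"]):  # build a hash table over the right batch
        index.setdefault(key, []).append(row)
    left_idx, right_idx = [], []
    for row, key in enumerate(left["id"]):  # probe it with each left row
        for match in index.get(key, []):
            left_idx.append(row)
            right_idx.append(match)
    out = {"l." + k: [v[i] for i in left_idx] for k, v in left.items()}
    out.update({"r." + k: [v[i] for i in right_idx] for k, v in right.items()})
    return out


def parallel_join(left_records: List[Batch], right_records: List[Batch], workers: int = 4) -> List[Batch]:
    # product() enumerates pairs in the same order as the nested loop, and
    # executor.map returns results in submission order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(join_pair, product(left_records, right_records)))
```

Swapping in `ProcessPoolExecutor` would give real CPU parallelism at the cost of pickling each batch pair.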
   
   This PR is built on top of #7687 and #7796 


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #7830: ARROW-9555: [Rust] [DataFusion] Added inner join

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7830:
URL: https://github.com/apache/arrow/pull/7830#issuecomment-663710935


   https://issues.apache.org/jira/browse/ARROW-9555





[GitHub] [arrow] jorgecarleitao commented on pull request #7830: ARROW-9555: [Rust] [DataFusion] Added inner join

Posted by GitBox <gi...@apache.org>.
jorgecarleitao commented on pull request #7830:
URL: https://github.com/apache/arrow/pull/7830#issuecomment-674512019


   I agree with you @andygrove that we need to revisit the partitioning before tackling this. Closing.





[GitHub] [arrow] jorgecarleitao closed pull request #7830: ARROW-9555: [Rust] [DataFusion] Added inner join

Posted by GitBox <gi...@apache.org>.
jorgecarleitao closed pull request #7830:
URL: https://github.com/apache/arrow/pull/7830


   

