Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/21 20:20:03 UTC

[GitHub] [arrow] Dandandan opened a new pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Dandandan opened a new pull request #8983:
URL: https://github.com/apache/arrow/pull/8983


   This PR currently adds a unit test with multiple batches, and that test fails.
   The row "3,7,9,NULL,NULL" appears twice in the output, but it should appear only once.
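
A hypothetical toy model (not DataFusion code) reproduces this symptom, assuming the join decides which left rows are unmatched separately for each right-side batch. `left_join_per_batch`, the `Row` alias and the integer keys are illustrative assumptions. A left row with no match in a batch is NULL-padded for that batch, so with two right batches it comes out twice.

```rust
/// Toy output row: (left_key, left_payload, right_key); `None` plays the role of NULL.
type Row = (i32, &'static str, Option<i32>);

/// Hypothetical naive left join that decides "unmatched" per right batch.
/// A left row without a match in a given batch is NULL-padded for *that*
/// batch, so it can be emitted once per batch instead of once overall.
fn left_join_per_batch(left: &[(i32, &'static str)], right_batches: &[Vec<i32>]) -> Vec<Row> {
    let mut out = Vec::new();
    for batch in right_batches {
        for &(lk, payload) in left {
            let mut matched = false;
            for &rk in batch {
                if lk == rk {
                    out.push((lk, payload, Some(rk)));
                    matched = true;
                }
            }
            if !matched {
                // Bug: repeated for every batch in which this row has no match.
                out.push((lk, payload, None));
            }
        }
    }
    out
}

fn main() {
    let left = [(1, "a"), (3, "b")];
    // Two right batches; key 3 never appears on the right side.
    let rows = left_join_per_batch(&left, &[vec![1], vec![2]]);
    let nulls_for_3 = rows.iter().filter(|r| r.0 == 3 && r.2.is_none()).count();
    println!("{:?}", rows);
    println!("NULL-padded row for key 3 emitted {} times", nulls_for_3);
}
```

Running `main` reports 2 NULL-padded rows for key 3; key 1 also picks up a spurious `(1, "a", None)` row from the second batch, even though it matched in the first.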


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-749183943


   https://issues.apache.org/jira/browse/ARROW-10971



[GitHub] [arrow] Dandandan commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Dandandan commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-778610619


   @alamb The status is that this is still unsolved. Closing for now.



[GitHub] [arrow] Dandandan commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Dandandan commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-749744053


   @jorgecarleitao 
   
   Do you have any ideas on how this could be solved?
   The _most efficient_ way I can think of (or have read about) would be to keep a boolean/bit vector per element or key on the left side, then at the end scan/filter for the entries that were never marked and produce those extra rows.
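
A minimal sketch of the bit-vector idea, assuming the left side is the one held in memory (the build side) and is indexed by a hash map from join key to row positions. `LeftJoinState`, `probe`, `finish` and the toy integer keys are hypothetical names, and a `Vec<bool>` stands in for a packed bitmap. The point is that `visited` outlives individual right batches, and unmatched rows are emitted only once, in `finish`, after the right stream is exhausted.

```rust
use std::collections::HashMap;

/// Toy output row: (left_key, left_payload, right_key); `None` plays the role of NULL.
type Row = (i32, &'static str, Option<i32>);

/// Hypothetical left-join state that survives across all right batches.
struct LeftJoinState {
    left: Vec<(i32, &'static str)>,
    index: HashMap<i32, Vec<usize>>, // join key -> positions of left rows with that key
    visited: Vec<bool>,              // one flag per left row (a bitmap in practice)
}

impl LeftJoinState {
    fn new(left: Vec<(i32, &'static str)>) -> Self {
        let mut index: HashMap<i32, Vec<usize>> = HashMap::new();
        for (i, &(k, _)) in left.iter().enumerate() {
            index.entry(k).or_default().push(i);
        }
        let visited = vec![false; left.len()];
        LeftJoinState { left, index, visited }
    }

    /// Probe one right batch: emit matched pairs and mark the matched left rows.
    fn probe(&mut self, batch: &[i32]) -> Vec<Row> {
        let mut out = Vec::new();
        for &rk in batch {
            if let Some(positions) = self.index.get(&rk) {
                for &i in positions {
                    let (lk, payload) = self.left[i];
                    out.push((lk, payload, Some(rk)));
                    self.visited[i] = true;
                }
            }
        }
        out
    }

    /// Call once, after the right stream is exhausted: NULL-pad every left row
    /// that was never marked, so each one appears exactly once.
    fn finish(self) -> Vec<Row> {
        let mut out = Vec::new();
        for (i, &(lk, payload)) in self.left.iter().enumerate() {
            if !self.visited[i] {
                out.push((lk, payload, None));
            }
        }
        out
    }
}

fn main() {
    let mut state = LeftJoinState::new(vec![(1, "a"), (3, "b")]);
    let mut rows = Vec::new();
    for batch in [vec![1], vec![2]].iter() {
        rows.extend(state.probe(batch));
    }
    rows.extend(state.finish());
    println!("{:?}", rows);
}
```

Marking per row handles duplicate left keys without extra work; a per-key variant would mark each key once and look up all of its rows at the end.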
   
   An easier intermediate solution might be to iterate over all the batches and update a single `HashSet<Vec<u8>>` (or something similar), instead of doing it _per batch_ as is currently done.
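
The intermediate option can be sketched the same way: one `HashSet<Vec<u8>>` of matched left keys is kept for the whole right stream instead of being rebuilt per batch. `encode_key` (little-endian bytes of a single `i32` key) and `left_join_with_key_set` are hypothetical, and matching uses a plain nested loop for brevity rather than a hash-table probe.

```rust
use std::collections::HashSet;

/// Toy output row: (left_key, left_payload, right_key); `None` plays the role of NULL.
type Row = (i32, &'static str, Option<i32>);

/// Hypothetical key encoding: join-key columns serialized to bytes
/// (here a single i32 key as little-endian bytes).
fn encode_key(k: i32) -> Vec<u8> {
    k.to_le_bytes().to_vec()
}

/// Left join that keeps ONE set of matched left keys for the whole right
/// stream, instead of rebuilding it per batch.
fn left_join_with_key_set(left: &[(i32, &'static str)], right_batches: &[Vec<i32>]) -> Vec<Row> {
    let mut matched_keys: HashSet<Vec<u8>> = HashSet::new();
    let mut out = Vec::new();
    for batch in right_batches {
        for &rk in batch {
            // Nested-loop matching for brevity; a real join would probe a hash table.
            for &(lk, payload) in left {
                if lk == rk {
                    out.push((lk, payload, Some(rk)));
                    matched_keys.insert(encode_key(lk));
                }
            }
        }
    }
    // Only after the last batch: NULL-pad left rows whose key never matched.
    for &(lk, payload) in left {
        if !matched_keys.contains(&encode_key(lk)) {
            out.push((lk, payload, None));
        }
    }
    out
}

fn main() {
    let left = [(1, "a"), (1, "c"), (3, "b")];
    let rows = left_join_with_key_set(&left, &[vec![2], vec![1]]);
    println!("{:?}", rows);
}
```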
   
   But I'm not sure how either option fits into the current design (with `SendableRecordBatchStream`, `poll_next`, etc.).
   
   So any hints would be very much appreciated :)



[GitHub] [arrow] Dandandan closed pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Dandandan closed pull request #8983:
URL: https://github.com/apache/arrow/pull/8983



[GitHub] [arrow] alamb commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

alamb commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-778610356


   @Dandandan 
   What is the status of this PR?
   
   As part of trying to clean up the backlog of Rust PRs in this repo, I am going through seemingly stale PRs and pinging the authors to see if there are any plans to continue the work or conversation.



[GitHub] [arrow] jorgecarleitao commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

jorgecarleitao commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-751299776


   Hi @Dandandan. My first impression is that we need an `Arc<Mutex<_>>` to share the visited keys and skip them after the first observation. I think we need to pass that state as mutable to `build_join_indexes` and update it there.
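
One way to read the `Arc<Mutex<_>>` suggestion, sketched under assumptions: several right-side batches are probed concurrently (here, one thread per batch), all marking keys in one shared set, and the unmatched left keys are computed once after every probe has finished. `mark_matches` is a hypothetical stand-in and does not reproduce the real `build_join_indexes` signature. If a single stream processes its batches sequentially, plain owned mutable state would likely be enough and the mutex could be dropped.

```rust
use std::collections::HashSet;
use std::sync::{Arc, Mutex};
use std::thread;

/// Shared "visited keys" state, wrapped in Arc<Mutex<_>> so every probe can update it.
type VisitedKeys = Arc<Mutex<HashSet<Vec<u8>>>>;

/// Hypothetical key encoding (single i32 key as little-endian bytes).
fn encode_key(k: i32) -> Vec<u8> {
    k.to_le_bytes().to_vec()
}

/// Hypothetical stand-in for the per-batch probe step (this is NOT the real
/// `build_join_indexes` signature): marks every left key matched by `batch`.
fn mark_matches(left_keys: &[i32], batch: &[i32], visited: &VisitedKeys) {
    let mut guard = visited.lock().unwrap();
    for &rk in batch {
        if left_keys.contains(&rk) {
            guard.insert(encode_key(rk));
        }
    }
}

/// Probe each right batch on its own thread, then compute the unmatched
/// left keys exactly once, after every probe has finished.
fn unmatched_left_keys(left_keys: &[i32], right_batches: Vec<Vec<i32>>) -> Vec<i32> {
    let left_keys: Arc<Vec<i32>> = Arc::new(left_keys.to_vec());
    let visited: VisitedKeys = Arc::new(Mutex::new(HashSet::new()));
    let handles: Vec<_> = right_batches
        .into_iter()
        .map(|batch| {
            let left_keys = Arc::clone(&left_keys);
            let visited = Arc::clone(&visited);
            thread::spawn(move || mark_matches(&left_keys, &batch, &visited))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let visited = visited.lock().unwrap();
    left_keys
        .iter()
        .copied()
        .filter(|k| !visited.contains(&encode_key(*k)))
        .collect()
}

fn main() {
    println!("{:?}", unmatched_left_keys(&[1, 2, 3], vec![vec![2], vec![1]]));
}
```

The sketch only returns the unmatched keys; producing the NULL-padded rows from them would follow the same end-of-stream step as in the single-stream case.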