Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/12/22 19:49:54 UTC

[GitHub] [arrow] Dandandan commented on pull request #8983: ARROW-10971: [Rust][DataFusion] WIP Left join with multiple batches

Dandandan commented on pull request #8983:
URL: https://github.com/apache/arrow/pull/8983#issuecomment-749744053


   @jorgecarleitao 
   
   Do you maybe have an idea in which ways this could be solved?
   The _most efficient_ way I can think of (or have read about) would be to keep a boolean / bit vector per element or key on the left, and at the end just scan / filter the ones that are not marked and produce those extra rows.
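
To make the bit vector idea concrete, here is a minimal sketch. It is not DataFusion code: `left_join` is a hypothetical function, rows are plain `(key, value)` tuples instead of Arrow arrays, and a `Vec<bool>` stands in for a real bitmap. The point is only that the `matched` flags outlive any single right-side batch.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified left join over (key, value) rows.
/// The right side arrives as several batches.
fn left_join(
    left: &[(u32, &str)],
    right_batches: &[Vec<(u32, &str)>],
) -> Vec<(u32, String, Option<String>)> {
    // Build side: key -> indices of the left rows with that key.
    let mut index: HashMap<u32, Vec<usize>> = HashMap::new();
    for (i, (k, _)) in left.iter().enumerate() {
        index.entry(*k).or_default().push(i);
    }

    // One flag per left row, shared across all right batches.
    let mut matched = vec![false; left.len()];
    let mut out = Vec::new();

    for batch in right_batches {
        for (k, rv) in batch {
            if let Some(rows) = index.get(k) {
                for &i in rows {
                    matched[i] = true;
                    out.push((left[i].0, left[i].1.to_string(), Some(rv.to_string())));
                }
            }
        }
    }

    // Final scan: emit the left rows that were never marked, with a null right side.
    for (i, (k, lv)) in left.iter().enumerate() {
        if !matched[i] {
            out.push((*k, lv.to_string(), None));
        }
    }
    out
}
```

In a streaming setting the final scan would have to run only once the right-side input is exhausted, which is the part I'm unsure how to fit in.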
   
   Maybe an easier intermediate solution would be to iterate over the batches and update a `HashSet<Vec<u8>>` or something similar, as is currently done _per batch_.
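
Roughly what I have in mind for the `HashSet` variant, as a sketch only: `MatchedKeys` and its methods are hypothetical names, and the keys are assumed to be already serialized to bytes. The set is kept alive across batches instead of being rebuilt for each one.

```rust
use std::collections::HashSet;

/// Hypothetical helper: remembers which join keys matched in any batch so far.
struct MatchedKeys {
    seen: HashSet<Vec<u8>>,
}

impl MatchedKeys {
    fn new() -> Self {
        Self { seen: HashSet::new() }
    }

    /// Call once per right-side batch with the serialized keys that matched.
    fn update<'a>(&mut self, keys: impl IntoIterator<Item = &'a [u8]>) {
        for k in keys {
            // Only allocate for keys that have not been seen before.
            if !self.seen.contains(k) {
                self.seen.insert(k.to_vec());
            }
        }
    }

    /// Call after the last batch: the left keys that never matched.
    fn unmatched<'a>(&self, left_keys: &'a [Vec<u8>]) -> Vec<&'a [u8]> {
        left_keys
            .iter()
            .filter(|k| !self.seen.contains(k.as_slice()))
            .map(|k| k.as_slice())
            .collect()
    }
}
```

Compared to the bit vector this hashes every matched key again and stores a copy of it, so it is simpler to bolt on but probably slower and heavier on memory.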
   
   But I'm not sure how both options fit in the current design (with `SendableRecordBatchStream`, poll next, etc).
   
   So if you have some hints, they would be very much appreciated :)
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org