Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/05/02 11:31:08 UTC

[GitHub] [arrow-datafusion] Dandandan opened a new issue #239: Left join implementation is incorrect for 0 or multiple batches on the right side

Dandandan opened a new issue #239:
URL: https://github.com/apache/arrow-datafusion/issues/239


   **Describe the bug**
   Currently the left join generates a null-padded row for every left row that has no match in the current right batch.
   
   However, this is wrong: a left row should be null-padded only if it has no match in any of the right batches.
   
   The current implementation generates extra (left, none) tuples for every batch when there is no match against a left key, and generates no left-side rows if the right side is empty.
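
To make the failure mode concrete, here is a minimal sketch of the per-batch behaviour described above. It is a toy nested-loop model over plain `i32` keys, not DataFusion's actual hash join code; the function name `naive_left_join` and the data layout are assumptions made only for illustration.

```rust
/// Toy model of the buggy behaviour (hypothetical helper, not DataFusion code):
/// the left join is evaluated independently against each right batch, so
/// whether a left row is "unmatched" is decided per batch.
fn naive_left_join(left: &[i32], right_batches: &[Vec<i32>]) -> Vec<(i32, Option<i32>)> {
    let mut out = Vec::new();
    for batch in right_batches {
        for &l in left {
            let mut matched = false;
            for &r in batch {
                if l == r {
                    out.push((l, Some(r)));
                    matched = true;
                }
            }
            if !matched {
                // Wrong: `l` may still match a row in another right batch.
                out.push((l, None));
            }
        }
    }
    // With zero right batches the outer loop never runs, so no left rows
    // are produced at all.
    out
}
```

With `left = [1, 2]` and right batches `[1]` and `[2]`, this returns four rows, including the spurious `(2, None)` and `(1, None)`. With no right batches it returns nothing, although a left join should still return both left rows.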
   
   To fix it, we need to mark the keys or indexes on the left side as visited while processing the right batches, then scan them once at the end to generate the rows for those without any match.
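
The proposed fix could look roughly like the sketch below. It assumes a toy nested-loop join over plain `i32` keys rather than DataFusion's hash join; `left_join_with_visited` and the `visited` vector are hypothetical names chosen for illustration.

```rust
/// Sketch of the proposed fix (hypothetical helper, not DataFusion code):
/// remember which left rows matched in any right batch, and emit the
/// unmatched ones once after all batches have been processed.
fn left_join_with_visited(left: &[i32], right_batches: &[Vec<i32>]) -> Vec<(i32, Option<i32>)> {
    // One flag per left row, indexed by the row's position on the left side.
    let mut visited = vec![false; left.len()];
    let mut out = Vec::new();

    for batch in right_batches {
        for (i, &l) in left.iter().enumerate() {
            for &r in batch {
                if l == r {
                    out.push((l, Some(r)));
                    visited[i] = true;
                }
            }
        }
    }

    // Final pass: each left row without a match in any batch is emitted
    // exactly once, even when there were no right batches at all.
    for (i, &l) in left.iter().enumerate() {
        if !visited[i] {
            out.push((l, None));
        }
    }
    out
}
```

With `left = [1, 2, 3]` and right batches `[1]` and `[2]`, only `3` is null-padded, and only once. With no right batches, every left row appears once with `None`.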
   
   **To Reproduce**
   
   Run a LEFT join with 0 or multiple batches on the right side.
   
   **Expected behavior**
   Each left-side row that has no match in any right batch is included exactly once, including when the right side is empty.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] Dandandan closed issue #239: Left join implementation is incorrect for 0 or multiple batches on the right side

Posted by GitBox <gi...@apache.org>.
Dandandan closed issue #239:
URL: https://github.com/apache/arrow-datafusion/issues/239


   

