Posted to jira@arrow.apache.org by "Daniël Heres (Jira)" <ji...@apache.org> on 2020/12/19 12:06:00 UTC

[jira] [Updated] (ARROW-10971) [Rust][DataFusion] Left Join implementation is wrong for multiple batches on right side

     [ https://issues.apache.org/jira/browse/ARROW-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniël Heres updated ARROW-10971:
---------------------------------
    Description: 
Currently the left join generates a null row for every left row that has no match in the current right batch.

However, this is wrong: a null row should be generated only when there is no match in _all_ of the right batches.

The current implementation generates an extra (left, none) tuple for every right batch in which the left row has no match.

To fix it, we need to mark the keys or indexes on the left side as visited and traverse the unvisited items once at the end of the hash join.

  was:
Currently the left join generates a null row for every left row that has no match in the current right batch.

However, this is wrong: a null row should be generated only when there is no match in _all_ of the right batches.

The current implementation generates an extra (left, none) tuple for every right batch in which the left row has no match.

To fix it, we need to mark the keys or indexes on the left side as visited and traverse them once at the end of the hash join.


> [Rust][DataFusion] Left Join implementation is wrong for multiple batches on right side
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-10971
>                 URL: https://issues.apache.org/jira/browse/ARROW-10971
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Daniël Heres
>            Priority: Major
>
> Currently the left join generates a null row for every left row that has no match in the current right batch.
> However, this is wrong: a null row should be generated only when there is no match in _all_ of the right batches.
> The current implementation generates an extra (left, none) tuple for every right batch in which the left row has no match.
> To fix it, we need to mark the keys or indexes on the left side as visited and traverse the unvisited items once at the end of the hash join.
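
The proposed fix can be illustrated with a minimal, self-contained sketch. This is not the DataFusion implementation: the function name `left_join`, the `Joined` alias, and the representation of rows as `(i32, String)` pairs are all hypothetical, chosen only to show the idea of marking left rows as visited while probing each right batch and emitting the (left, none) tuples once, after the last batch.

```rust
use std::collections::HashMap;

/// Hypothetical output row: the left value plus the matching right value, if any.
type Joined = (String, Option<String>);

/// Simplified left hash join over (key, value) rows.
/// The left side is fully materialized; the right side arrives in batches.
fn left_join(
    left: &[(i32, String)],
    right_batches: &[Vec<(i32, String)>],
) -> Vec<Joined> {
    // Build phase: map each join key to the indexes of the left rows holding it.
    let mut index: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, (key, _)) in left.iter().enumerate() {
        index.entry(*key).or_default().push(i);
    }

    // One flag per left row, shared across ALL right batches.
    let mut visited = vec![false; left.len()];
    let mut out = Vec::new();

    // Probe phase: emit only matched rows here, and remember which left rows matched.
    for batch in right_batches {
        for (key, right_val) in batch {
            if let Some(rows) = index.get(key) {
                for &i in rows {
                    visited[i] = true;
                    out.push((left[i].1.clone(), Some(right_val.clone())));
                }
            }
        }
    }

    // Final pass: emit (left, none) exactly once for each left row that never matched.
    for (i, (_, left_val)) in left.iter().enumerate() {
        if !visited[i] {
            out.push((left_val.clone(), None));
        }
    }
    out
}
```

With left keys 1, 2, 3 and two right batches containing key 1 and key 2 respectively, this sketch produces a single unmatched row, for key 3. Emitting nulls per batch instead would produce unmatched rows for keys 2 and 3 from the first batch and for keys 1 and 3 from the second.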



--
This message was sent by Atlassian Jira
(v8.3.4#803005)