Posted to jira@arrow.apache.org by "Daniël Heres (Jira)" <ji...@apache.org> on 2021/05/02 12:34:00 UTC

[jira] [Closed] (ARROW-10971) [Rust][DataFusion] Left Join implementation is wrong for multiple batches on right side

     [ https://issues.apache.org/jira/browse/ARROW-10971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniël Heres closed ARROW-10971.
--------------------------------
    Resolution: Duplicate

Moved to https://github.com/apache/arrow-datafusion/issues/239

> [Rust][DataFusion] Left Join implementation is wrong for multiple batches on right side
> ---------------------------------------------------------------------------------------
>
>                 Key: ARROW-10971
>                 URL: https://issues.apache.org/jira/browse/ARROW-10971
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Daniël Heres
>            Priority: Blocker
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Currently, the left join emits a null-padded row for every left row that has no match in the current right batch.
> This is wrong: a left row should be null-padded only when it has no match in _any_ of the right batches.
> As a result, the current implementation generates an extra (left, none) tuple per right batch for each left key that is unmatched in that batch.
> To fix it, we need to mark the keys or indexes on the left side as visited and scan them once at the end to generate the rows without any match.
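
The fix described above can be sketched roughly as follows. This is a minimal illustration in plain Rust, not DataFusion's actual join code: the function name `left_join_batches`, the `(i32, &str)` row shape, and the use of `Vec` in place of Arrow record batches are all assumptions made for the example. The point it shows is that the `visited` flags live across all right batches, and unmatched left rows are emitted only in one final scan.

```rust
use std::collections::HashMap;

/// Hypothetical helper: left-joins `left` against several right-side batches
/// on an integer key. Rows are simplified to (key, payload) pairs.
fn left_join_batches<'a>(
    left: &[(i32, &'a str)],
    right_batches: &[Vec<(i32, &'a str)>],
) -> Vec<(&'a str, Option<&'a str>)> {
    // Build a hash table from join key to the indexes of matching left rows.
    let mut index: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, (key, _)) in left.iter().enumerate() {
        index.entry(*key).or_default().push(i);
    }

    // One flag per left row, shared across ALL right batches.
    let mut visited = vec![false; left.len()];
    let mut out = Vec::new();

    for batch in right_batches {
        for (key, right_val) in batch {
            if let Some(rows) = index.get(key) {
                for &i in rows {
                    visited[i] = true;
                    out.push((left[i].1, Some(*right_val)));
                }
            }
        }
        // No (left, None) rows are emitted per batch: a miss in this batch
        // says nothing about whether a later batch will match.
    }

    // Final scan: emit each left row that matched no right batch, exactly once.
    for (i, (_, left_val)) in left.iter().enumerate() {
        if !visited[i] {
            out.push((*left_val, None));
        }
    }
    out
}
```

For example, with left rows `(1, "a")`, `(2, "b")`, `(3, "c")` and two right batches `[(1, "x")]` and `[(2, "y")]`, this sketch returns `("a", Some("x"))`, `("b", Some("y"))`, `("c", None)`. A per-batch null fill would instead also emit spurious `None` rows for "b" and "c" after the first batch and for "a" and "c" after the second.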



--
This message was sent by Atlassian Jira
(v8.3.4#803005)