Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2022/03/17 10:00:00 UTC

[jira] [Commented] (ARROW-15957) [C++] Add option to consolidate key columns in hash join

    [ https://issues.apache.org/jira/browse/ARROW-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508099#comment-17508099 ] 

Joris Van den Bossche commented on ARROW-15957:
-----------------------------------------------

I suppose this is essentially a duplicate of ARROW-15838, or maybe we can keep that one for the R side of things.

> [C++] Add option to consolidate key columns in hash join
> --------------------------------------------------------
>
>                 Key: ARROW-15957
>                 URL: https://issues.apache.org/jira/browse/ARROW-15957
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> Currently the hash join outputs key columns from both sides.  On an outer join this can help distinguish between a row that matched but had entirely null payloads on one side and a row that didn't match on one side.
> However, that distinction is sometimes not very important and many databases will simply coalesce the key columns into one.  For example, we might get an outer join result today that looks like:
> {noformat}
> L_KEY | R_KEY | L_PAY | R_PAY
>     0 |     0 |     x |     Y
>  NULL |     1 |  NULL |     Z
>     2 |  NULL |     A |  NULL
> {noformat}
> Ideally we could specify a "combine key columns" option to get a result that looks like:
> {noformat}
> KEY | L_PAY | R_PAY
>   0 |     x |     Y
>   1 |  NULL |     Z
>   2 |     A |  NULL
> {noformat}
> This can be done today with an extra project step, and a built-in option isn't likely to offer much performance benefit, but from a usability perspective it would be nice if users didn't have to add that project step themselves.
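
For illustration, the extra project step mentioned in the description amounts to coalescing the two key columns row by row. The following is only a minimal sketch in plain Python, not Arrow code: the row layout, the names `joined`, `coalesce` and `consolidate_keys`, and the use of None for NULL are all assumptions made for this example.

```python
# Rows shaped like the outer join result in the description:
# (L_KEY, R_KEY, L_PAY, R_PAY), with None standing in for NULL.
joined = [
    (0, 0, "x", "Y"),
    (None, 1, None, "Z"),
    (2, None, "A", None),
]


def coalesce(left, right):
    """Return the first non-null value, in the spirit of SQL COALESCE."""
    return left if left is not None else right


def consolidate_keys(rows):
    """Project (L_KEY, R_KEY, L_PAY, R_PAY) rows to (KEY, L_PAY, R_PAY)."""
    return [
        (coalesce(l_key, r_key), l_pay, r_pay)
        for l_key, r_key, l_pay, r_pay in rows
    ]


consolidated = consolidate_keys(joined)
for row in consolidated:
    print(row)
```

Note that after this projection a row that matched with an all-null payload on one side can no longer be told apart from a row that did not match on that side, which is the trade-off the description points out.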


