Posted to jira@arrow.apache.org by "Todd Farmer (Jira)" <ji...@apache.org> on 2022/07/12 14:05:03 UTC

[jira] [Assigned] (ARROW-15957) [C++] Add option to consolidate key columns in hash join

     [ https://issues.apache.org/jira/browse/ARROW-15957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Farmer reassigned ARROW-15957:
-----------------------------------

    Assignee:     (was: Tobias Zagorni)

This issue was last updated over 90 days ago, which may be an indication it is no longer being actively worked. To better reflect the current state, the issue is being unassigned. Please feel free to re-take assignment of the issue if it is being actively worked, or if you plan to start that work soon.

> [C++] Add option to consolidate key columns in hash join
> --------------------------------------------------------
>
>                 Key: ARROW-15957
>                 URL: https://issues.apache.org/jira/browse/ARROW-15957
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Weston Pace
>            Priority: Major
>
> Currently the hash join outputs key columns from both sides.  On an outer join this can help distinguish between a row that matched but had entirely null payloads on one side and a row that didn't match on one side.
> However, that distinction is sometimes not very important, and many databases simply coalesce the key columns into one.  For example, an outer join today might produce a result that looks like:
> {noformat}
> L_KEY | R_KEY | L_PAY | R_PAY
>     0       0       x       Y
>  NULL       1    NULL       Z
>     2    NULL       A    NULL
> {noformat}
> Ideally we could specify a "combine key columns" option to get a result that looks like:
> {noformat}
> KEY | L_PAY | R_PAY
>   0       x       Y
>   1    NULL       Z
>   2       A    NULL
> {noformat}
> This can be done today with an extra project step, and a built-in option isn't likely to offer much performance benefit, but from a usability perspective it would be nice if users didn't have to add that step themselves.
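
To make the coalescing semantics of that extra project step concrete, here is a minimal sketch in standard C++. It does not use the Arrow API: `JoinedRow`, `ConsolidatedRow`, `CoalesceKey`, and `ConsolidateKeys` are hypothetical names chosen for illustration, and rows are modeled with `std::optional` instead of Arrow arrays. The assumed rule is that the consolidated key is the left key when it is non-null and the right key otherwise.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// One row of the join output as described in the issue: keys from both sides.
// (Hypothetical row-oriented model; Arrow itself is columnar.)
struct JoinedRow {
  std::optional<int> l_key;
  std::optional<int> r_key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// The same row after the two key columns have been merged into one.
struct ConsolidatedRow {
  std::optional<int> key;
  std::optional<std::string> l_pay;
  std::optional<std::string> r_pay;
};

// Coalesce: take the left key if it is present, otherwise the right key.
std::optional<int> CoalesceKey(const std::optional<int>& l,
                               const std::optional<int>& r) {
  // Assumption: on a matched row both keys are present and equal.
  assert(!(l.has_value() && r.has_value()) || *l == *r);
  return l.has_value() ? l : r;
}

// The "extra project step": replace L_KEY and R_KEY with a single KEY column.
std::vector<ConsolidatedRow> ConsolidateKeys(const std::vector<JoinedRow>& rows) {
  std::vector<ConsolidatedRow> out;
  out.reserve(rows.size());
  for (const auto& row : rows) {
    out.push_back({CoalesceKey(row.l_key, row.r_key), row.l_pay, row.r_pay});
  }
  return out;
}
```

Applied to the three rows of the first table, this yields keys 0, 1, and 2 with the payload columns unchanged, which matches the second table. Note that after consolidation a NULL payload no longer tells you whether the row was unmatched or matched a row whose payload was NULL, which is the trade-off the description mentions.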



--
This message was sent by Atlassian Jira
(v8.20.10#820010)