Posted to jira@arrow.apache.org by "Hirokazu SUZUKI (Jira)" <ji...@apache.org> on 2022/10/21 05:07:00 UTC
[jira] [Commented] (ARROW-18091) [Ruby] Arrow::Table#join returns duplicated key columns
[ https://issues.apache.org/jira/browse/ARROW-18091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17621484#comment-17621484 ]
Hirokazu SUZUKI commented on ARROW-18091:
-----------------------------------------
I meant the behavior of dplyr's join( ..., keep=FALSE), where only one copy of the join key is kept in the output.
Sorry for the poor explanation.
> [Ruby] Arrow::Table#join returns duplicated key columns
> -------------------------------------------------------
>
> Key: ARROW-18091
> URL: https://issues.apache.org/jira/browse/ARROW-18091
> Project: Apache Arrow
> Issue Type: Bug
> Components: Ruby
> Reporter: Hirokazu SUZUKI
> Priority: Major
>
> `Arrow::Table#join` returns duplicated key columns. Duplicate column names are allowed in Arrow, but it would be preferable to return a single key column.
> Also, with `type: :full_outer`, the data of the key columns should be merged into one column.
> table1
> =>
> #<Arrow::Table:0x7f9706109380 ptr=0x55a91a4cac10>
>   KEY X
> 0 A   1
> 1 B   2
> 2 C   3
> table2
> =>
> #<Arrow::Table:0x7f970415d2c0 ptr=0x55a91a348ce0>
>   KEY X
> 0 A   4
> 1 B   5
> 2 D   6
>
> Should omit `:KEY` from the right table
> table1.join(table2, :KEY)
> =>
> #<Arrow::Table:0x7f96fd152548 ptr=0x55a91af21110>
>   KEY X KEY X
> 0 A   1 A   4
> 1 B   2 B   5
>
> Should merge the `:KEY` columns into one
> table1.join(table2, :KEY, type: :full_outer)
> =>
> #<Arrow::Table:0x7f96fd0e1550 ptr=0x55a91a1a6410>
>   KEY    X      KEY    X
> 0 A      1      A      4
> 1 B      2      B      5
> 2 C      3      (null) (null)
> 3 (null) (null) D      6
>
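To make the requested result concrete, here is a minimal pure-Ruby sketch of the semantics described above. It uses arrays of hashes instead of Arrow tables, and `join_keep_false` is a hypothetical helper written only for illustration; it is not part of red-arrow. The `_left`/`_right` suffixes for the colliding `X` column are also an assumption, since the report does not say how non-key name collisions should be handled.

```ruby
# Hypothetical helper (not a red-arrow API): joins two arrays of row hashes
# on `key` and keeps a single key column, similar to dplyr's keep = FALSE.
def join_keep_false(left, right, key, type: :inner, suffixes: ["_left", "_right"])
  left_cols  = left.flat_map(&:keys).uniq - [key]
  right_cols = right.flat_map(&:keys).uniq - [key]
  shared     = left_cols & right_cols

  # Assumption: non-key columns present on both sides get a suffix.
  rename = lambda do |col, suffix|
    shared.include?(col) ? :"#{col}#{suffix}" : col
  end

  # Builds one output row; a missing side is filled with nil.
  build_row = lambda do |key_value, lrow, rrow|
    out = { key => key_value }
    left_cols.each  { |c| out[rename.call(c, suffixes[0])] = lrow && lrow[c] }
    right_cols.each { |c| out[rename.call(c, suffixes[1])] = rrow && rrow[c] }
    out
  end

  right_by_key = right.group_by { |row| row[key] }
  matched_keys = {}
  rows = []

  left.each do |lrow|
    matches = right_by_key[lrow[key]]
    if matches
      matched_keys[lrow[key]] = true
      matches.each { |rrow| rows << build_row.call(lrow[key], lrow, rrow) }
    elsif type == :full_outer
      # Left-only row: the key value comes from the left side.
      rows << build_row.call(lrow[key], lrow, nil)
    end
  end

  if type == :full_outer
    # Right-only rows: the key value comes from the right side,
    # so the single key column is never null here.
    right.each do |rrow|
      next if matched_keys[rrow[key]]
      rows << build_row.call(rrow[key], nil, rrow)
    end
  end

  rows
end

table1 = [{ KEY: "A", X: 1 }, { KEY: "B", X: 2 }, { KEY: "C", X: 3 }]
table2 = [{ KEY: "A", X: 4 }, { KEY: "B", X: 5 }, { KEY: "D", X: 6 }]

inner = join_keep_false(table1, table2, :KEY)
outer = join_keep_false(table1, table2, :KEY, type: :full_outer)

inner.each { |row| p row }
outer.each { |row| p row }
```

With the sample data from the report, the inner join yields two rows with a single `:KEY` column, and the full outer join yields four rows whose `:KEY` values are A, B, C and D, with nil only in the `X` columns of the unmatched side.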
--
This message was sent by Atlassian Jira
(v8.20.10#820010)