Posted to jira@arrow.apache.org by "Carsten Haubold (Jira)" <ji...@apache.org> on 2022/06/30 12:40:00 UTC

[jira] [Commented] (ARROW-16695) [R][C++] Extension types are not supported in joins

    [ https://issues.apache.org/jira/browse/ARROW-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17561023#comment-17561023 ] 

Carsten Haubold commented on ARROW-16695:
-----------------------------------------

This issue also affects the Python side. Maybe add [Python] to the title?

> [R][C++] Extension types are not supported in joins
> ---------------------------------------------------
>
>                 Key: ARROW-16695
>                 URL: https://issues.apache.org/jira/browse/ARROW-16695
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: C++, R
>            Reporter: Dewey Dunnington
>            Priority: Major
>
> It looks like extension types are not supported in joins (even if the underlying type is supported)! Reported by [~jonkeane] while making a demo for Arrow + Query engine + geoarrow (R package), which uses extension types liberally:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> rb_non_ext <- record_batch(
>   a = 1:5, 
>   b = letters[1:5]
> )
> rb_ext_storage <- record_batch(
>   b = letters[1:5],
>   c = Array$create(list(as.raw(1:5)), type = binary())
> )
> rb_ext <- record_batch(
>   b = letters[1:5],
>   c = vctrs_extension_array(rb_ext_storage$c$as_vector())
> )
> rb_non_ext %>% 
>   left_join(rb_ext_storage) %>% 
>   collect()
> #> # A tibble: 5 × 3
> #>       a b                      c
> #>   <int> <chr>         <arrw_bnr>
> #> 1     1 a     01, 02, 03, 04, 05
> #> 2     2 b     01, 02, 03, 04, 05
> #> 3     3 c     01, 02, 03, 04, 05
> #> 4     4 d     01, 02, 03, 04, 05
> #> 5     5 e     01, 02, 03, 04, 05
> rb_non_ext %>% 
>   left_join(rb_ext) %>% 
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121  ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499  schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)