Posted to issues@arrow.apache.org by "Dewey Dunnington (Jira)" <ji...@apache.org> on 2022/05/31 16:30:00 UTC

[jira] [Created] (ARROW-16695) [R][C++] Extension types are not supported in joins

Dewey Dunnington created ARROW-16695:
----------------------------------------

             Summary: [R][C++] Extension types are not supported in joins
                 Key: ARROW-16695
                 URL: https://issues.apache.org/jira/browse/ARROW-16695
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++, R
            Reporter: Dewey Dunnington


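It looks like extension types are not supported in joins, even when the underlying storage type is supported. Reported by [~jonkeane] while making a demo for Arrow + Query engine + geoarrow (R package), which uses extension types liberally. In the example below, a join whose non-key column is a plain binary array succeeds, but the same join fails once that column is wrapped in an extension array: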

{code:R}
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)

rb_non_ext <- record_batch(
  a = 1:5, 
  b = letters[1:5]
)

rb_ext_storage <- record_batch(
  b = letters[1:5],
  c = Array$create(list(as.raw(1:5)), type = binary())
)

rb_ext <- record_batch(
  b = letters[1:5],
  c = vctrs_extension_array(rb_ext_storage$c$as_vector())
)

rb_non_ext %>% 
  left_join(rb_ext_storage) %>% 
  collect()
#> # A tibble: 5 × 3
#>       a b                      c
#>   <int> <chr>         <arrw_bnr>
#> 1     1 a     01, 02, 03, 04, 05
#> 2     2 b     01, 02, 03, 04, 05
#> 3     3 c     01, 02, 03, 04, 05
#> 4     4 d     01, 02, 03, 04, 05
#> 5     5 e     01, 02, 03, 04, 05

rb_non_ext %>% 
  left_join(rb_ext) %>% 
  collect()
#> Error in `collect()`:
#> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121  ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
#> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499  schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
{code}



