Posted to jira@arrow.apache.org by "Rok Mihevc (Jira)" <ji...@apache.org> on 2022/09/03 10:16:00 UTC
[jira] [Resolved] (ARROW-16695) [R][Python][C++] Extension types are not supported in joins
[ https://issues.apache.org/jira/browse/ARROW-16695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Rok Mihevc resolved ARROW-16695.
--------------------------------
Fix Version/s: 10.0.0
Resolution: Fixed
Issue resolved by pull request 13501
[https://github.com/apache/arrow/pull/13501]
> [R][Python][C++] Extension types are not supported in joins
> -----------------------------------------------------------
>
> Key: ARROW-16695
> URL: https://issues.apache.org/jira/browse/ARROW-16695
> Project: Apache Arrow
> Issue Type: Improvement
> Components: C++, Python, R
> Reporter: Dewey Dunnington
> Assignee: Rok Mihevc
> Priority: Major
> Labels: pull-request-available
> Fix For: 10.0.0
>
> Time Spent: 3h 10m
> Remaining Estimate: 0h
>
> It looks like extension types are not supported in joins (even if the underlying type is supported)! Reported by [~jonkeane] while making a demo for Arrow + Query engine + geoarrow (R package), which uses extension types liberally:
> {code:R}
> library(arrow, warn.conflicts = FALSE)
> library(dplyr, warn.conflicts = FALSE)
> rb_non_ext <- record_batch(
>   a = 1:5,
>   b = letters[1:5]
> )
> rb_ext_storage <- record_batch(
>   b = letters[1:5],
>   c = Array$create(list(as.raw(1:5)), type = binary())
> )
> rb_ext <- record_batch(
>   b = letters[1:5],
>   c = vctrs_extension_array(rb_ext_storage$c$as_vector())
> )
> rb_non_ext %>%
>   left_join(rb_ext_storage) %>%
>   collect()
> #> # A tibble: 5 × 3
> #> a b c
> #> <int> <chr> <arrw_bnr>
> #> 1 1 a 01, 02, 03, 04, 05
> #> 2 2 b 01, 02, 03, 04, 05
> #> 3 3 c 01, 02, 03, 04, 05
> #> 4 4 d 01, 02, 03, 04, 05
> #> 5 5 e 01, 02, 03, 04, 05
> rb_non_ext %>%
>   left_join(rb_ext) %>%
>   collect()
> #> Error in `collect()`:
> #> ! Invalid: Data type <arrow_binary[0]> is not supported in join non-key field
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:121 ValidateSchemas(join_type, left_schema, left_keys, left_output, right_schema, right_keys, right_output, left_field_name_suffix, right_field_name_suffix)
> #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/hash_join_node.cc:499 schema_mgr->Init( join_options.join_type, left_schema, join_options.left_keys, join_options.left_output, right_schema, join_options.right_keys, join_options.right_output, join_options.filter, join_options.output_suffix_for_left, join_options.output_suffix_for_right)
> {code}
--
This message was sent by Atlassian Jira
(v8.20.10#820010)