Posted to issues@arrow.apache.org by "Stephanie Hazlitt (Jira)" <ji...@apache.org> on 2022/02/18 19:38:00 UTC

[jira] [Created] (ARROW-15731) Enable joins when data contains a list column

Stephanie Hazlitt created ARROW-15731:
-----------------------------------------

             Summary:  Enable joins when data contains a list column
                 Key: ARROW-15731
                 URL: https://issues.apache.org/jira/browse/ARROW-15731
             Project: Apache Arrow
          Issue Type: Improvement
          Components: C++
            Reporter: Stephanie Hazlitt


Currently, an Arrow join fails when the data contains a list column, even when the list column is not a join key:



``` r
library(arrow)
#> 
#> Attaching package: 'arrow'
#> The following object is masked from 'package:utils':
#> 
#>     timestamp
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
                   jedi = c(FALSE, TRUE))

arrow_table(starwars) %>%
  left_join(jedi) %>%
  collect()
#> Error in `handle_csv_read_error()`:
#> ! Invalid: Data type list<item: string> is not supported in join non-key field
```

The ability to join would be a useful enhancement for workflows with tabular data where list columns can be common, and for geospatial workflows where geometry columns are stored as `list` or `fixed_size_list` (thanks [~paleolimbot] for mentioning that use case).

Related discussion here: https://issues.apache.org/jira/browse/ARROW-14519
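
Until this is supported, one possible workaround is to drop the list columns before the join and reattach them in memory afterwards. The following is only a sketch under a few assumptions: that `films`, `vehicles` and `starships` are the only list columns in `starwars`, that `name` uniquely identifies each row, and that the data is small enough to `collect()`. The `list_cols`, `joined` and `result` names are illustrative.

``` r
library(arrow)
library(dplyr)

jedi <- data.frame(name = c("C-3PO", "Luke Skywalker"),
                   jedi = c(FALSE, TRUE))

# Assumption: these are the only list columns in starwars
list_cols <- c("films", "vehicles", "starships")

# Drop the list columns so the Arrow join only sees supported types
joined <- arrow_table(starwars) %>%
  select(-films, -vehicles, -starships) %>%
  left_join(jedi, by = "name") %>%
  collect()

# Reattach the list columns in memory with dplyr
# (assumes `name` uniquely identifies each row)
result <- joined %>%
  left_join(select(starwars, name, all_of(list_cols)), by = "name")
```

This keeps the join itself in Arrow, but it does not help when the list columns are too large to hold in memory, which is why native support would still be valuable.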

 


