Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/04/13 20:12:00 UTC

[jira] [Updated] (ARROW-16085) [R] Support unifying schemas for InMemoryDatasets

     [ https://issues.apache.org/jira/browse/ARROW-16085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Jones updated ARROW-16085:
-------------------------------
    Fix Version/s: 9.0.0
                       (was: 8.0.0)

> [R] Support unifying schemas for InMemoryDatasets
> -------------------------------------------------
>
>                 Key: ARROW-16085
>                 URL: https://issues.apache.org/jira/browse/ARROW-16085
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>    Affects Versions: 7.0.0
>            Reporter: Will Jones
>            Priority: Major
>             Fix For: 9.0.0
>
>
>  
> Collecting a union of two InMemoryDatasets whose schemas differ fails:
> {code:R}
> library(arrow)
> library(dplyr)  # provides %>% and collect()
>
> sub_df1 <- Table$create(
>   x = Array$create(c(1, 2, 3)),
>   y = Array$create(c("a", "b", "c"))
> )
> sub_df2 <- Table$create(
>   x = Array$create(c(4, 5)),
>   z = Array$create(c("d", "e"))
> )
> ds1 <- InMemoryDataset$create(sub_df1)
> ds2 <- InMemoryDataset$create(sub_df2)
> ds <- c(ds1, ds2)
> actual <- ds %>% collect()
> {code}
> {code}
> Type error: yielded batch had schema x: double
> y: string which did not match InMemorySource's: x: double
> y: string
> z: string
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:541  child_.Next()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:152  value_.status()
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/util/iterator.h:180  maybe_element
> /Users/willjones/Documents/arrows/arrow-quick/cpp/src/arrow/dataset/scanner.cc:840  fragments_it.ToVector()
> {code}
> If this were fixed, we could implement a function that does for Tables what {{dplyr::bind_rows}} does for tibbles:
> {code:R}
> concat_tables <- function(..., schema = NULL) {
>   # list2() is from rlang and map() from purrr; both are namespaced here
>   tables <- rlang::list2(...)
>   dataset <- open_dataset(purrr::map(tables, InMemoryDataset$create), schema = schema)
>   dplyr::collect(dataset, as_data_frame = FALSE)
> }
> {code}
>  