Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2021/10/01 13:13:00 UTC

[jira] [Closed] (ARROW-14190) [R] Should unify_schemas() allow change of type?

     [ https://issues.apache.org/jira/browse/ARROW-14190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nicola Crane closed ARROW-14190.
--------------------------------
    Resolution: Not A Problem

> [R] Should unify_schemas() allow change of type?
> ------------------------------------------------
>
>                 Key: ARROW-14190
>                 URL: https://issues.apache.org/jira/browse/ARROW-14190
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: R
>            Reporter: Nicola Crane
>            Priority: Major
>
> Should {{unify_schemas()}} support schema evolution? When files whose schemas have different (but compatible) types, such as {{int32}} and {{float64}}, are read together with {{open_dataset()}}, the call succeeds, whereas passing the same schemas to {{unify_schemas()}} results in an error.
> See discussion here: https://github.com/apache/arrow-cookbook/pull/67#discussion_r714847220
> {code:r}
> library(dplyr)
> library(arrow)
> # Set up schemas
> schema1 <- schema(speed = int32(), dist = int32())
> schema2 <- schema(speed = float64(), dist = float64())
> # Try to combine schemas via `unify_schemas()` - results in an error
> unify_schemas(schema1, schema2)
> ## Error: Invalid: Unable to merge: Field speed has incompatible types: int32 vs double
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1609  fields_[i]->MergeWith(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1672  AddField(field)
> ## /home/nic2/arrow/cpp/src/arrow/type.cc:1743  builder.AddSchema(schema)
> # Create datasets with different schemas and read in via `open_dataset()`
> cars1 <- Table$create(slice(cars, 1:25), schema = schema1)
> cars2 <- Table$create(slice(cars, 26:50), schema = schema2)
> td <- tempfile()
> dir.create(td)
> write_parquet(cars1, file.path(td, "cars1.parquet"))
> write_parquet(cars2, file.path(td, "cars2.parquet"))
> new_dataset <- open_dataset(td)
> new_dataset$schema
> ## Schema
> ## speed: int32
> ## dist: int32
> ##
> ## See $metadata for additional Schema metadata
> {code}
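
For contrast with the error above, here is a minimal sketch of a case where {{unify_schemas()}} has no type conflict to resolve, followed by one possible workaround for the conflicting case: casting a table to the wider type up front with {{Table$cast()}}. The names s_speed, s_dist, tbl_int and tbl_dbl are made up for illustration, and the cast is only an assumption about how one might sidestep the error, not a behaviour of {{unify_schemas()}} itself.

```r
library(arrow)

# Two schemas with different field names, so no field has conflicting types
s_speed <- schema(speed = int32())
s_dist <- schema(dist = float64())

# unify_schemas() combines the fields of both schemas
unified <- unify_schemas(s_speed, s_dist)
unified$names

# Possible workaround when types do conflict (illustrative only):
# cast the int32 table to float64 so that both tables share one schema
tbl_int <- Table$create(data.frame(speed = 1:3), schema = schema(speed = int32()))
tbl_dbl <- tbl_int$cast(schema(speed = float64()))
tbl_dbl$schema
```

After the cast, the speed field in tbl_dbl has type double, matching schema2 from the report.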


