Posted to jira@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2021/10/01 10:12:00 UTC

[jira] [Created] (ARROW-14190) [R] Should unify_schemas() allow change of type?

Nicola Crane created ARROW-14190:
------------------------------------

             Summary: [R] Should unify_schemas() allow change of type?
                 Key: ARROW-14190
                 URL: https://issues.apache.org/jira/browse/ARROW-14190
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nicola Crane


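Should {{unify_schemas()}} be able to do schema evolution? If files whose schemas have different but compatible types (e.g. int32 and float64) are read together with {{open_dataset()}}, this works: by default the dataset takes the schema of the first file it inspects, as the output below shows. Combining the same schemas directly with {{unify_schemas()}}, however, results in an error.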

See discussion here: https://github.com/apache/arrow-cookbook/pull/67#discussion_r714847220
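To make the current behaviour concrete, here is a minimal sketch, assuming the arrow R package as used in this report. {{unify_schemas()}} already merges schemas whose shared fields have identical types and appends fields that appear in only one schema. Only a type mismatch on a shared field is rejected rather than promoted. The {{time}} field is hypothetical and is added only for illustration.

```r
library(arrow)

# Same schemas as in the reproducible example
schema1 <- schema(speed = int32(), dist = int32())
schema2 <- schema(speed = float64(), dist = float64())

# Hypothetical schema sharing `speed` (same type) and adding a new field `time`
schema3 <- schema(speed = int32(), time = float64())

# Shared fields with matching types are merged; new fields are appended
unified <- unify_schemas(schema1, schema3)
print(unified)

# At the time of this report, a type mismatch on a shared field is an error
result <- tryCatch(
  unify_schemas(schema1, schema2),
  error = function(e) conditionMessage(e)
)
print(result)
```

The question in this issue is whether the second call should instead promote {{speed}} and {{dist}} to a common type, as reading the files through {{open_dataset()}} effectively does.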


{code:r}
library(dplyr)
library(arrow)

# Set up schemas
schema1 <- schema(speed = int32(), dist = int32())
schema2 <- schema(speed = float64(), dist = float64())

# Try to combine the schemas via `unify_schemas()`: this results in an error
# (wrapped in try() so that the rest of the script still runs)
try(unify_schemas(schema1, schema2))
## Error: Invalid: Unable to merge: Field speed has incompatible types: int32 vs double
## /home/nic2/arrow/cpp/src/arrow/type.cc:1609  fields_[i]->MergeWith(field)
## /home/nic2/arrow/cpp/src/arrow/type.cc:1672  AddField(field)
## /home/nic2/arrow/cpp/src/arrow/type.cc:1743  builder.AddSchema(schema)

# Create two tables with different schemas, write each to a Parquet file,
# and read them back together via `open_dataset()`
cars1 <- Table$create(slice(cars, 1:25), schema = schema1)
cars2 <- Table$create(slice(cars, 26:50), schema = schema2)

td <- tempfile()
dir.create(td)

write_parquet(cars1, file.path(td, "cars1.parquet"))
write_parquet(cars2, file.path(td, "cars2.parquet"))

new_dataset <- open_dataset(td)

new_dataset$schema
# Schema
# speed: int32
# dist: int32
# 
# See $metadata for additional Schema metadata
{code}
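As a possible workaround in the meantime (a sketch, not an answer to the question raised here), the target schema can be passed explicitly to {{open_dataset()}} via its {{schema}} argument, so that every file is cast to it on read. This assumes that the dataset scanner casts the int32 columns in the first file to double, the same kind of implicit cast that already happens in the reverse direction in the example above. The name {{target_schema}} is illustrative only.

```r
library(dplyr)
library(arrow)

# Write the same two files as in the example above
cars1 <- Table$create(slice(cars, 1:25),
                      schema = schema(speed = int32(), dist = int32()))
cars2 <- Table$create(slice(cars, 26:50),
                      schema = schema(speed = float64(), dist = float64()))

td <- tempfile()
dir.create(td)
write_parquet(cars1, file.path(td, "cars1.parquet"))
write_parquet(cars2, file.path(td, "cars2.parquet"))

# Declare the wider type up front instead of relying on the first file's schema
target_schema <- schema(speed = float64(), dist = float64())
evolved <- open_dataset(td, schema = target_schema)

print(evolved$schema)

# Materialise all rows; int32 columns from cars1.parquet are read as double
result_df <- collect(evolved)
```

Unlike the default, this picks the wider type regardless of which file happens to be inspected first, at the cost of the caller having to know the target schema in advance.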



