Posted to issues@arrow.apache.org by "Nicola Crane (Jira)" <ji...@apache.org> on 2022/11/07 14:32:00 UTC

[jira] [Created] (ARROW-18266) [R] Make it more obvious how to read in a Parquet file with a different schema to the inferred one

Nicola Crane created ARROW-18266:
------------------------------------

             Summary: [R] Make it more obvious how to read in a Parquet file with a different schema to the inferred one
                 Key: ARROW-18266
                 URL: https://issues.apache.org/jira/browse/ARROW-18266
             Project: Apache Arrow
          Issue Type: Improvement
          Components: R
            Reporter: Nicola Crane


It's not very clear from our docs that, to read in a Parquet file and change its schema, we need to read it in as an Arrow Table and then call the {{cast()}} method on that Table, e.g.

{code:r}
library(arrow)

# Write out data with an explicit schema (y stored as int64)
data <- tibble::tibble(x = c(letters[1:5], NA), y = 1:6)
data_with_schema <- arrow_table(data, schema = schema(x = string(), y = int64()))
write_parquet(data_with_schema, "data_with_schema.parquet")

# Read in the data as an Arrow Table, then cast it to the desired schema
data_in <- read_parquet("data_with_schema.parquet", as_data_frame = FALSE)
# cast() returns a new Table, so assign the result
data_in <- data_in$cast(target_schema = schema(x = string(), y = int32()))
{code}

We should document this more clearly. Perhaps we could even update the code to do some of this automatically when a schema is passed in via the {{...}} argument of {{read_parquet}} _and_ the returned data doesn't match the desired schema?
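
As a rough sketch of what that automatic behaviour might look like, here is a small wrapper that reads the file as a Table and casts it only when the stored schema differs from the requested one. The helper name {{read_parquet_as()}} and its {{target_schema}} argument are hypothetical and not part of the arrow package; it just combines {{read_parquet()}}, {{Schema$Equals()}} and {{Table$cast()}}, and is not meant as the actual implementation.

{code:r}
library(arrow)

# Hypothetical helper (not an arrow function): read a Parquet file as a Table
# and cast it to `target_schema` if the stored schema is different.
read_parquet_as <- function(file, target_schema, ...) {
  tbl <- read_parquet(file, as_data_frame = FALSE, ...)
  # Compare schemas (metadata is ignored by default); cast only on mismatch
  if (!tbl$schema$Equals(target_schema)) {
    tbl <- tbl$cast(target_schema = target_schema)
  }
  tbl
}

# Usage: write y as int64, then read it back as int32
path <- tempfile(fileext = ".parquet")
data <- data.frame(x = c(letters[1:5], NA), y = 1:6)
write_parquet(
  arrow_table(data, schema = schema(x = string(), y = int64())),
  path
)
data_in <- read_parquet_as(path, schema(x = string(), y = int32()))
data_in$schema
{code}

Because {{cast()}} is safe by default, a cast that would overflow or truncate values should raise an error rather than silently change the data.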



--
This message was sent by Atlassian Jira
(v8.20.10#820010)