Posted to jira@arrow.apache.org by "Pal (Jira)" <ji...@apache.org> on 2021/12/02 07:18:00 UTC
[jira] [Closed] (ARROW-14939) [R] Problem with new variables in dataset schema
[ https://issues.apache.org/jira/browse/ARROW-14939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Pal closed ARROW-14939.
-----------------------
Resolution: Resolved
> [R] Problem with new variables in dataset schema
> ------------------------------------------------
>
> Key: ARROW-14939
> URL: https://issues.apache.org/jira/browse/ARROW-14939
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 6.0.1
> Environment: RStudio Version
> --------------------------------------------------
> 1.4.1717
> Session Information
> --------------------------------------------------
> R version 4.1.0 (2021-05-18)
> Platform: x86_64-apple-darwin17.0 (64-bit)
> Running under: macOS 12.0.1
> Matrix products: default
> LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
> locale:
> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
> other attached packages:
> [1] arrow_6.0.1
> loaded via a namespace (and not attached):
> [1] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.0 magrittr_2.0.1 assertthat_0.2.1 R6_2.5.1
> [7] tools_4.1.0 glue_1.5.0 bit64_4.0.5 vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
> System Information
> --------------------------------------------------
> sysname : Darwin
> release : 21.1.0
> version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
> nodename :
> machine : x86_64
> login : root
> user : os
> effective_user : os
> Platform Information
> --------------------------------------------------
> OS.type : unix
> file.sep : /
> dynlib.ext : .so
> GUI : RStudio
> endian : little
> pkgType : mac.binary
> path.sep : :
> r_arch :
> Reporter: Pal
> Priority: Critical
>
> Hi,
> I have a problem with the schema detected by arrow::open_dataset() when some files contain new columns.
> For example, say I have one Parquet file with two columns (a and b) and another with three columns (a, b and c). When I open the directory as a dataset, its schema only contains columns a and b. Am I missing something? In the past, when I added new columns to some Parquet files that did not exist in other files, the new columns were automatically added to the schema, which was great.
> The code below reproduces the issue:
>
> {code:java}
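> library(arrow)
>
> # Two data frames; the second has an extra column c
> df <- data.frame(a = 1, b = 2)
> df_2 <- data.frame(a = 2, b = 3, c = 4)
> # Write both to the same directory
> write_parquet(df, "C:/Data/test2/df1.parquet")
> write_parquet(df_2, "C:/Data/test2/df2.parquet")
> # Open the directory as a dataset and list the columns in its schema
> ds <- arrow::open_dataset(sources = "C:/Data/test2")
> ds_cols <- data.frame(variables = ds$schema$names)
> ds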
> {code}
>
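One possible explanation, offered as a sketch rather than a confirmed diagnosis of this ticket: open_dataset() takes a unify_schemas argument, and for a directory path it is documented to default to inspecting a single file for the schema instead of scanning all of them. If that file is the one without column c, the dataset schema will not include c. The example below rebuilds the two files from the report in a temporary directory (the tempdir() path and the ds_default / ds_unified names are assumptions for illustration, not part of the original report) and opens them both ways.

```r
library(arrow)

# Hypothetical scratch directory standing in for "C:/Data/test2"
data_dir <- file.path(tempdir(), "test2")
dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)

# Same two files as in the report: the second one has an extra column c
write_parquet(data.frame(a = 1, b = 2), file.path(data_dir, "df1.parquet"))
write_parquet(data.frame(a = 2, b = 3, c = 4), file.path(data_dir, "df2.parquet"))

# Default behaviour for a directory: the schema is inferred from one file only
ds_default <- open_dataset(data_dir)

# Ask arrow to scan every file and merge their schemas
ds_unified <- open_dataset(data_dir, unify_schemas = TRUE)

print(ds_default$schema$names)
print(ds_unified$schema$names)
```

With unify_schemas = TRUE the schema should list a, b and c; rows coming from the file without c are expected to hold missing values for that column. Scanning every file costs more on large datasets, so passing an explicit schema to open_dataset() is an alternative when the full set of columns is known in advance.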
--
This message was sent by Atlassian Jira
(v8.20.1#820001)