Posted to jira@arrow.apache.org by "Pal (Jira)" <ji...@apache.org> on 2021/12/01 11:46:00 UTC
[jira] [Created] (ARROW-14939) [R] Problem with new variables in dataset schema
Pal created ARROW-14939:
---------------------------
Summary: [R] Problem with new variables in dataset schema
Key: ARROW-14939
URL: https://issues.apache.org/jira/browse/ARROW-14939
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 6.0.1
Environment:
RStudio Version
--------------------------------------------------
1.4.1717
Session Information
--------------------------------------------------
R version 4.1.0 (2021-05-18)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS 12.0.1
Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] arrow_6.0.1
loaded via a namespace (and not attached):
[1] tidyselect_1.1.1 bit_4.0.4 compiler_4.1.0 magrittr_2.0.1 assertthat_0.2.1 R6_2.5.1
[7] tools_4.1.0 glue_1.5.0 bit64_4.0.5 vctrs_0.3.8 rlang_0.4.12 purrr_0.3.4
System Information
--------------------------------------------------
sysname : Darwin
release : 21.1.0
version : Darwin Kernel Version 21.1.0: Wed Oct 13 17:33:23 PDT 2021; root:xnu-8019.41.5~1/RELEASE_X86_64
nodename :
machine : x86_64
login : root
user : os
effective_user : os
Platform Information
--------------------------------------------------
OS.type : unix
file.sep : /
dynlib.ext : .so
GUI : RStudio
endian : little
pkgType : mac.binary
path.sep : :
r_arch :
Reporter: Pal
Hi,
I have a problem with schema detection in arrow::open_dataset().
For example, say I have one Parquet file with two columns (a and b) and another file with three columns (a, b and c). When I open the directory containing both files as a dataset, its schema contains only columns a and b. Am I missing something? In the past, I added new columns to some Parquet files that did not exist in the other files, and the new columns were automatically added to the schema, which was great.
Below is the code to reproduce the issue:
{code:java}
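library(arrow)

# Two data frames; the second has an extra column, c
df <- data.frame(a = 1, b = 2)
df_2 <- data.frame(a = 2, b = 3, c = 4)

# Write each one to its own Parquet file in the same directory
write_parquet(df, "C:/Data/test2/df1.parquet")
write_parquet(df_2, "C:/Data/test2/df2.parquet")

# Open the directory as a dataset and list the columns in its schema
ds <- arrow::open_dataset(sources = "C:/Data/test2")
ds_cols <- data.frame(variables = ds$schema$names)
ds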
{code}
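A possible workaround, sketched under assumptions rather than confirmed as the fix for this issue: the open_dataset() documentation describes a unify_schemas argument which, when TRUE, scans all files to build a combined schema instead of inspecting only the first one. The temporary directory below is a hypothetical stand-in for the path used above.

```r
library(arrow)

# Hypothetical scratch directory, used instead of the path in the report
dir <- tempfile("arrow_schema_demo")
dir.create(dir)

# Same two files as in the report: the second one has an extra column, c
write_parquet(data.frame(a = 1, b = 2), file.path(dir, "df1.parquet"))
write_parquet(data.frame(a = 2, b = 3, c = 4), file.path(dir, "df2.parquet"))

# Default behaviour: the schema is taken from a single file
ds_first <- open_dataset(dir)

# Assumption: unify_schemas = TRUE merges the schemas of all files
ds_all <- open_dataset(dir, unify_schemas = TRUE)

print(ds_first$schema$names)
print(ds_all$schema$names)
```

If this behaves as documented, ds_all should expose column c, with missing values for rows that come from files lacking it. Scanning every file may be slow on large datasets, so passing an explicit schema to open_dataset() could be an alternative.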
--
This message was sent by Atlassian Jira
(v8.20.1#820001)