Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/12/17 20:03:00 UTC
[jira] [Created] (ARROW-10953) CLONE - [R] as.data.frame.Table crashes R with schema and no record batches
Neal Richardson created ARROW-10953:
---------------------------------------
Summary: CLONE - [R] as.data.frame.Table crashes R with schema and no record batches
Key: ARROW-10953
URL: https://issues.apache.org/jira/browse/ARROW-10953
Project: Apache Arrow
Issue Type: Bug
Components: R
Affects Versions: 2.0.0
Environment: > sessionInfo()
R version 4.0.3 (2020-10-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 10 (buster)
Matrix products: default
BLAS: /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.3.5.so
locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bigrquery_1.3.2 bigrquerystorage_0.1.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.5 cellranger_1.1.0 pillar_1.4.6
[4] compiler_4.0.3 dbplyr_2.0.0 tools_4.0.3
[7] odbc_1.3.0 getPass_0.2-2 digest_0.6.27
[10] bit_4.0.4 gargle_0.5.0 jsonlite_1.7.1
[13] memoise_1.1.0 lifecycle_0.2.0 tibble_3.0.4
[16] pkgconfig_2.0.3 rlang_0.4.8 extraw_1.8.25
[19] DBI_1.1.0 rstudioapi_0.13 curl_4.3
[22] xml2_1.3.2 dplyr_1.0.2 httr_1.4.2
[25] askpass_1.1 fs_1.5.0 generics_0.1.0
[28] vctrs_0.3.5 hms_0.5.3 bit64_4.0.5
[31] tidyselect_1.1.0 glue_1.4.2 data.table_1.13.2
[34] R6_2.5.0 readxl_1.3.1 connect.cap_0.3.19
[37] purrr_0.3.4 blob_1.2.1 magrittr_2.0.1
[40] ellipsis_0.3.1 assertthat_0.2.1 keyring_1.1.0
[43] arrow_2.0.0.20201117 openssl_1.4.3 crayon_1.3.4
Reporter: Bruno Tremblay
Fix For: 3.0.0
The objective is to build a 0-row data.frame from the field definitions of an Arrow schema.
{code:java}
library(arrow)
# IPC stream containing only a schema
stream<-as.raw(c(255,255,255,255,16,1,0,0,16,0,0,0,0,0,10,0,12,0,6,0,5,0,8,0,10,0,0,0,0,1,3,0,12,0,0,0,8,0,8,0,0,0,4,0,8,0,0,0,4,0,0,0,4,0,0,0,160,0,0,0,92,0,0,0,48,0,0,0,4,0,0,0,128,255,255,255,0,0,1,5,20,0,0,0,12,0,0,0,4,0,0,0,0,0,0,0,176,255,255,255,7,0,0,0,82,69,80,79,78,83,69,0,168,255,255,255,0,0,1,5,20,0,0,0,12,0,0,0,4,0,0,0,0,0,0,0,216,255,255,255,6,0,0,0,68,69,84,65,73,76,0,0,208,255,255,255,0,0,1,5,24,0,0,0,16,0,0,0,4,0,0,0,0,0,0,0,4,0,4,0,4,0,0,0,8,0,0,0,68,65,84,65,84,89,80,69,0,0,0,0,16,0,20,0,8,0,6,0,7,0,12,0,0,0,16,0,16,0,0,0,0,0,1,7,36,0,0,0,20,0,0,0,4,0,0,0,0,0,0,0,8,0,12,0,4,0,8,0,8,0,0,0,38,0,0,0,9,0,0,0,8,0,0,0,77,65,67,84,65,95,73,68,0,0,0,0,0,0,0,0))
readr <- RecordBatchStreamReader$create(stream)
readr$read_table()
# Error in Table__from_RecordBatchStreamReader(self) :
# Invalid: Must pass at least one record batch or an explicit Schema
# Now trying to be too clever
tb <- Table$create(data.frame(), schema = readr$schema)
dtf <- as.data.frame(tb)
# This will crash your R session
{code}
Tested on the nightly build, same behavior. This is borderline between a bug and a feature request, but for arrow to be a drop-in replacement for some DBI methods, it needs to be able to build a 0-row data.frame with the correct class for each column.
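To illustrate the result being asked for, here is a minimal base R sketch that builds a 0-row data.frame with one correctly typed column per field. The helper `empty_df_from_types()`, its type-name mapping, and the column names `id` and `label` are assumptions made up for this example; none of them come from the arrow package or from the schema in the stream above.

```r
# Hypothetical helper, not part of the arrow package: build a 0-row
# data.frame from a named character vector of type names.
empty_df_from_types <- function(types) {
  # Assumed mapping from type names to zero-length R prototypes.
  prototypes <- list(
    string  = character(0),
    utf8    = character(0),
    int32   = integer(0),
    double  = numeric(0),
    float64 = numeric(0),
    bool    = logical(0)
  )

  # Fail loudly on a type name the mapping does not cover.
  unknown <- setdiff(unname(types), names(prototypes))
  if (length(unknown) > 0) {
    stop("No R prototype for type(s): ", paste(unknown, collapse = ", "))
  }

  # lapply() keeps the names of `types`, so they become the column names.
  cols <- lapply(types, function(t) prototypes[[t]])
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Illustrative field definitions (made up for this example).
empty <- empty_df_from_types(c(id = "int32", label = "string"))
str(empty)
```

A fix inside arrow would presumably derive the column classes from the schema's field types directly; the lookup table here only stands in for that step.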
Thank you and have a nice day.