Posted to jira@arrow.apache.org by "Jonathan Keane (Jira)" <ji...@apache.org> on 2021/02/10 14:11:00 UTC

[jira] [Updated] (ARROW-11582) [R] write_dataset fails unexpectedly

     [ https://issues.apache.org/jira/browse/ARROW-11582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Keane updated ARROW-11582:
-----------------------------------
    Summary: [R] write_dataset fails unexpectedly  (was: write_dataset fails unexpectedly)

> [R] write_dataset fails unexpectedly
> ------------------------------------
>
>                 Key: ARROW-11582
>                 URL: https://issues.apache.org/jira/browse/ARROW-11582
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 2.0.0
>         Environment: R 4.0.3, Ubuntu 20.04.
> sessionInfo():
> > sessionInfo()
> R version 4.0.3 (2020-10-10)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04 LTS
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> loaded via a namespace (and not attached):
>  [1] rstudioapi_0.13   magrittr_2.0.1    hms_0.5.3         tidyselect_1.1.0  bit_4.0.4         R6_2.5.0          rlang_0.4.9      
>  [8] dplyr_1.0.2       tools_4.0.3       R.oo_1.24.0       arrow_2.0.0       DBI_1.1.0         ellipsis_0.3.1    bit64_4.0.5      
> [15] assertthat_0.2.1  tibble_3.0.4      lifecycle_0.2.0   crayon_1.3.4      readr_1.4.0       purrr_0.3.4       arkdb_0.0.8      
> [22] duckdb_0.2.4      fs_1.5.0          vctrs_0.3.5       R.utils_2.10.1    curl_4.3          glue_1.4.2        compiler_4.0.3   
> [29] pillar_1.4.7      generics_0.1.0    R.methodsS3_1.8.1 pkgconfig_2.0.3  
>            Reporter: Carl Boettiger
>            Priority: Major
>
>  
> I'd like to use the R package interface to access data distributed in a tab-separated text file that is much larger than available RAM. I understand that in principle this is possible by using `open_dataset()` in text mode and then streaming the data out to Parquet via `write_dataset()`, but this strategy fails with an unexpected error even on small text files.
> Here's a minimal reproducible example:
>  
> fs::dir_create("import_dir")
> readr::write_tsv(mtcars, "import_dir/mtcars.tsv")
> ds <- arrow::open_dataset("import_dir", format="text", delim="\t")
> arrow::write_dataset(ds, "parquet_dir")
> The error occurs only on the last line (`write_dataset()`):
> Error in options$update(...) : attempt to apply non-function
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)