Posted to jira@arrow.apache.org by "Pal (Jira)" <ji...@apache.org> on 2021/09/24 08:25:00 UTC

[jira] [Updated] (ARROW-13694) [R] Arrow filter crashes (R aborted session)

     [ https://issues.apache.org/jira/browse/ARROW-13694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pal updated ARROW-13694:
------------------------
    Fix Version/s:     (was: 5.0.1)
                   6.0.0

> [R] Arrow filter crashes (R aborted session)
> --------------------------------------------
>
>                 Key: ARROW-13694
>                 URL: https://issues.apache.org/jira/browse/ARROW-13694
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 5.0.0
>         Environment: RStudio Version
> --------------------------------------------------
> 1.4.1103
> Session Information
> --------------------------------------------------
> R version 4.0.4 (2021-02-15)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 18363)
> Matrix products: default
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
> [5] LC_TIME=English_United States.1252    
> attached base packages:
> [1] grid      stats     graphics  grDevices utils     datasets  methods   base     
> other attached packages:
>  [1] readxl_1.3.1               RJDBC_0.2-8                rJava_1.0-4                tibbletime_0.1.6           arrow_4.0.0.1             
>  [6] rdbnomics_0.6.4            rstudioapi_0.13            scales_1.1.1               tidyquant_1.0.3            quantmod_0.4.18           
> [11] TTR_0.24.2                 PerformanceAnalytics_2.0.4 xts_0.12.1                 zoo_1.8-9                  skimr_2.1.3               
> [16] janitor_2.1.0              DBI_1.1.1                  R.utils_2.10.1             R.oo_1.24.0                R.methodsS3_1.8.1         
> [21] devtools_2.4.2             usethis_2.0.1              R.cache_0.15.0             rmarkdown_2.10             kableExtra_1.3.4          
> [26] knitr_1.33                 plotly_4.9.4.1             RColorBrewer_1.1-2         ggpubr_0.4.0               ggrepel_0.9.1             
> [31] ggExtra_0.9                haven_2.4.3                sas7bdat_0.5               data.table_1.14.0          lubridate_1.7.10          
> [36] forcats_0.5.1              stringr_1.4.0              dplyr_1.0.7                purrr_0.3.4                readr_2.0.1               
> [41] tidyr_1.1.3                tibble_3.1.3               ggplot2_3.3.5              tidyverse_1.3.1           
> loaded via a namespace (and not attached):
>  [1] colorspace_2.0-2  ggsignif_0.6.2    ellipsis_0.3.2    rio_0.5.27        rprojroot_2.0.2   snakecase_0.11.0  base64enc_0.1-3   fs_1.5.0         
>  [9] remotes_2.4.0     bit64_4.0.5       fansi_0.5.0       xml2_1.3.2        cachem_1.0.5      pkgload_1.2.1     jsonlite_1.7.2    broom_0.7.9      
> [17] dbplyr_2.1.1      shiny_1.6.0       compiler_4.0.4    httr_1.4.2        backports_1.2.1   assertthat_0.2.1  fastmap_1.1.0     lazyeval_0.2.2   
> [25] cli_3.0.1         later_1.2.0       htmltools_0.5.1.1 prettyunits_1.1.1 tools_4.0.4       gtable_0.3.0      glue_1.4.2        Rcpp_1.0.7       
> [33] carData_3.0-4     cellranger_1.1.0  vctrs_0.3.8       svglite_2.0.0     xfun_0.25         ps_1.6.0          openxlsx_4.2.4    testthat_3.0.4   
> [41] rvest_1.0.1       mime_0.11         miniUI_0.1.1.1    lifecycle_1.0.0   rstatix_0.7.0     hms_1.1.0         promises_1.2.0.1  curl_4.3.2       
> [49] memoise_2.0.0     stringi_1.7.3     desc_1.3.0        pkgbuild_1.2.0    zip_2.2.0         repr_1.1.3        rlang_0.4.11      pkgconfig_2.0.3  
> [57] systemfonts_1.0.2 lattice_0.20-41   evaluate_0.14     htmlwidgets_1.5.3 bit_4.0.4         tidyselect_1.1.1  processx_3.5.2    magrittr_2.0.1   
> [65] R6_2.5.1          generics_0.1.0    pillar_1.6.2      foreign_0.8-81    withr_2.4.2       abind_1.4-5       modelr_0.1.8      crayon_1.4.1     
> [73] car_3.0-11        Quandl_2.11.0     utf8_1.2.2        tzdb_0.1.2        callr_3.7.0       reprex_2.0.1      digest_0.6.27     webshot_0.5.2    
> [81] xtable_1.8-4      httpuv_1.6.1      munsell_0.5.0     viridisLite_0.4.0 quadprog_1.5-8    sessioninfo_1.1.1
> System Information
> --------------------------------------------------
> sysname        : Windows    
> release        : 10 x64     
> version        : build 18363
> machine        : x86-64      
> Platform Information
> --------------------------------------------------
> OS.type    : windows
> file.sep   : /
> dynlib.ext : .dll
> GUI        : RStudio
> endian     : little
> pkgType    : win.binary
> path.sep   : ;
> r_arch     : x64
> R Version
> --------------------------------------------------
> platform       : x86_64-w64-mingw32
> arch           : x86_64
> os             : mingw32
> system         : x86_64, mingw32
> status         : 
> major          : 4
> minor          : 0.4
> year           : 2021
> month          : 02
> day            : 15
> svn rev        : 80002
> language       : R
> version.string : R version 4.0.4 (2021-02-15)
> nickname       : Lost Library Book
>            Reporter: Pal
>            Priority: Critical
>             Fix For: 6.0.0
>
>
> Hi,
>  
> I encounter a fatal error with the new version of the Arrow R package (5.0.0) that I did not have with the older version (4.0.1). After running "open_dataset", I filter and collect the data into a data frame, and then RStudio crashes:
>  
> {code:java}
> ds <- arrow::open_dataset(sources = "XXXX", partitioning = c("XX", "YY", "ZZ"))
> df <- ds %>%
>   filter(year >= 2014 & year <= 2020 & type %in% c("XX", "YY") & sector == "ABC" & identifier %in% list_identifiers & type == "LE" & val == "M") %>%
>   select(period, obs_value) %>%
>   collect()
> {code}
>  
> If I run the code above without "filter", I do not have any problem. I guess there is something wrong in the filtering expression.
>  
> Unfortunately, I can share neither the exact code nor a reproducible example. The dataset is very large, and I have not been able to identify the precise source of the error. All I know is that RStudio crashes and that this code worked perfectly with the older version of the package.
> Also, please note that I disabled multithreading with:
> {code:java}
> options(arrow.use_threads = FALSE)
> {code}
>  
>  
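One possible way to narrow down which part of a combined filter triggers a crash is to apply each predicate on its own and log its name before evaluating it. The sketch below is only an illustration under stated assumptions: the toy data frame, the temporary dataset path, and the predicate names are hypothetical stand-ins, since the reporter's dataset and "list_identifiers" are not available. It assumes the arrow and dplyr packages are installed.

```r
library(arrow)
library(dplyr)

# Hypothetical stand-in data; the real dataset from the report is not available.
tmp <- tempfile()
toy <- data.frame(
  year = c(2013L, 2015L, 2019L, 2021L),
  sector = c("ABC", "ABC", "DEF", "ABC"),
  val = c("M", "M", "Q", "M"),
  obs_value = c(1.5, 2.5, 3.5, 4.5)
)
write_dataset(toy, tmp, partitioning = "year")
ds <- open_dataset(tmp)

# Each piece of the combined filter, kept as a separate quoted expression.
predicates <- list(
  year_range = quote(year >= 2014 & year <= 2020),
  sector = quote(sector == "ABC"),
  val = quote(val == "M")
)

# Log the predicate name before evaluating it: if the session aborts,
# the last name printed points at the suspect predicate.
rows <- vapply(names(predicates), function(nm) {
  message("Trying predicate: ", nm)
  ds %>% filter(!!predicates[[nm]]) %>% collect() %>% nrow()
}, integer(1))

print(rows)
```

Because an aborted session cannot be caught with tryCatch, the message printed before each step is what identifies the failing predicate. Running the script with Rscript from a terminal rather than inside RStudio may also preserve any error output the crash produces.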



--
This message was sent by Atlassian Jira
(v8.3.4#803005)