Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2020/09/03 18:34:00 UTC

[jira] [Commented] (ARROW-9903) [R] open_dataset freezes opening feather files

    [ https://issues.apache.org/jira/browse/ARROW-9903?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190348#comment-17190348 ] 

Neal Richardson commented on ARROW-9903:
----------------------------------------

{{open_dataset()}} doesn't generally open all of the files when it is called, and it doesn't report on which files it is opening. How do you know it's freezing on different files? 

Can you share a minimal reproducible example?

> [R] open_dataset freezes opening feather files
> ----------------------------------------------
>
>                 Key: ARROW-9903
>                 URL: https://issues.apache.org/jira/browse/ARROW-9903
>             Project: Apache Arrow
>          Issue Type: Bug
>         Environment: Rstudio
>            Reporter: Sean Clement
>            Priority: Critical
>
> Session info:
> {code:java}
> R version 4.0.2 (2020-06-22)
> Platform: x86_64-w64-mingw32/x64 (64-bit)
> Running under: Windows 10 x64 (build 19041)
>
> Matrix products: default
>
> locale:
> [1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
> [3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
> [5] LC_TIME=English_United States.1252    
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
>
> other attached packages:
>  [1] forcats_0.5.0   stringr_1.4.0   dplyr_1.0.1     purrr_0.3.4     readr_1.3.1     tidyr_1.1.1    
>  [7] tibble_3.0.3    ggplot2_3.3.2   tidyverse_1.3.0 arrow_1.0.1    
>
> loaded via a namespace (and not attached):
>  [1] Rcpp_1.0.5       cellranger_1.1.0 pillar_1.4.6     compiler_4.0.2   dbplyr_1.4.4     tools_4.0.2     
>  [7] bit_1.1-15.2     lubridate_1.7.9  jsonlite_1.7.0   lifecycle_0.2.0  gtable_0.3.0     pkgconfig_2.0.3 
> [13] rlang_0.4.7      reprex_0.3.0     cli_2.0.2        DBI_1.1.0        rstudioapi_0.11  haven_2.3.1     
> [19] withr_2.2.0      xml2_1.3.2       httr_1.4.2       fs_1.4.1         generics_0.0.2   vctrs_0.3.2     
> [25] hms_0.5.3        bit64_0.9-7      grid_4.0.2       tidyselect_1.1.0 glue_1.4.1       R6_2.4.1        
> [31] fansi_0.4.1      readxl_1.3.1     modelr_0.1.8     blob_1.2.1       magrittr_1.5     backports_1.1.7 
> [37] scales_1.1.1     ellipsis_0.3.1   rvest_0.3.5      assertthat_0.2.1 colorspace_1.4-1 stringi_1.4.6   
> [43] munsell_0.5.0    broom_0.7.0      crayon_1.3.4    
> {code}
> While cycling through and processing files using open_dataset(..., format = "feather") in R, the function hangs randomly and will not proceed to the next file. The freeze does not occur at the same file each time. Additionally, the same function occasionally freezes when it is called only once. 
> When open_dataset hangs, the only way to stop R is through Task Manager, as RStudio becomes totally unresponsive. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)