Posted to jira@arrow.apache.org by "Neal Richardson (Jira)" <ji...@apache.org> on 2021/09/01 18:49:00 UTC

[jira] [Assigned] (ARROW-13761) [R] arrow::filter() crashes (aborts R session)

     [ https://issues.apache.org/jira/browse/ARROW-13761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Neal Richardson reassigned ARROW-13761:
---------------------------------------

    Assignee: Weston Pace

> [R] arrow::filter() crashes (aborts R session)
> ----------------------------------------------
>
>                 Key: ARROW-13761
>                 URL: https://issues.apache.org/jira/browse/ARROW-13761
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 5.0.0
>            Reporter: Carl Boettiger
>            Assignee: Weston Pace
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Arrow crashes (aborts the R session) when evaluating a `filter()` query with `collect()`, e.g. when following arrow's dplyr vignette: https://cran.r-project.org/web/packages/arrow/vignettes/dataset.html
> ```r
> library(arrow)
> library(dplyr)
> ds <- open_dataset("nyc-taxi", partitioning = c("year", "month"))
> x <- ds %>%
>   filter(total_amount > 100, year == 2015)
> x %>% collect() # crashes R
> ```
> (Note: for simplicity, I downloaded only the years 2009 and 2010, using the R loop provided in the vignette.)
> I observe this behavior in an RStudio Server instance on an Ubuntu 20.04 Linux server with 128 cores and 256 GB RAM.  
> Here's my sessionInfo():
> ```r
>  sessionInfo()
> R version 4.1.0 (2021-05-18)
> Platform: x86_64-pc-linux-gnu (64-bit)
> Running under: Ubuntu 20.04.2 LTS
> Matrix products: default
> BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
> locale:
>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base     
> other attached packages:
> [1] dplyr_1.0.7 arrow_5.0.0
> loaded via a namespace (and not attached):
>  [1] fansi_0.5.0      crayon_1.4.1     utf8_1.2.2       assertthat_0.2.1
>  [5] R6_2.5.1         DBI_1.1.1        lifecycle_1.0.0  magrittr_2.0.1  
>  [9] pillar_1.6.2     rlang_0.4.11     vctrs_0.3.8      generics_0.1.0  
> [13] ellipsis_0.3.2   tools_4.1.0      bit64_4.0.5      glue_1.4.2      
> [17] purrr_0.3.4      bit_4.0.4        compiler_4.1.0   pkgconfig_2.0.3 
> [21] tidyselect_1.1.1 tibble_3.1.3   
> ```



--
This message was sent by Atlassian Jira
(v8.3.4#803005)