Posted to dev@arrow.apache.org by "Christian (Jira)" <ji...@apache.org> on 2020/01/08 19:07:00 UTC

[jira] [Created] (ARROW-7520) Arrow / R - too many batches causes a crash

Christian created ARROW-7520:
--------------------------------

             Summary: Arrow / R - too many batches causes a crash
                 Key: ARROW-7520
                 URL: https://issues.apache.org/jira/browse/ARROW-7520
             Project: Apache Arrow
          Issue Type: Bug
          Components: R
    Affects Versions: 0.15.1
         Environment: - Session info ---------------------------------------------------
 setting  value
 version  R version 3.6.1 (2019-07-05)
 os       Windows 10 x64
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  English_United States.1252
 ctype    English_United States.1252
 tz       America/New_York
 date     2020-01-08


- Packages -----------------------------------------------------------------------
 ! package      * version     date       lib source
   acepack        1.4.1       2016-10-29 [1] CRAN (R 3.6.1)
   arrow        * 0.15.1.1    2019-11-05 [1] CRAN (R 3.6.2)
   askpass        1.1         2019-01-13 [1] CRAN (R 3.6.1)
   assertthat     0.2.1       2019-03-21 [1] CRAN (R 3.6.1)
   backports      1.1.5       2019-10-02 [1] CRAN (R 3.6.1)
   base64enc      0.1-3       2015-07-28 [1] CRAN (R 3.6.0)
   bit            1.1-14      2018-05-29 [1] CRAN (R 3.6.0)
   bit64          0.9-7       2017-05-08 [1] CRAN (R 3.6.0)
   blob           1.2.0       2019-07-09 [1] CRAN (R 3.6.1)
   callr          3.3.1       2019-07-18 [1] CRAN (R 3.6.1)
   cellranger     1.1.0       2016-07-27 [1] CRAN (R 3.6.1)
   checkmate      1.9.4       2019-07-04 [1] CRAN (R 3.6.1)
   cli            1.1.0       2019-03-19 [1] CRAN (R 3.6.1)
   cluster        2.1.0       2019-06-19 [2] CRAN (R 3.6.1)
   codetools      0.2-16      2018-12-24 [2] CRAN (R 3.6.1)
   colorspace     1.4-1       2019-03-18 [1] CRAN (R 3.6.1)
   commonmark     1.7         2018-12-01 [1] CRAN (R 3.6.1)
   crayon         1.3.4       2017-09-16 [1] CRAN (R 3.6.1)
   credentials    1.1         2019-03-12 [1] CRAN (R 3.6.2)
   curl         * 4.2         2019-09-24 [1] CRAN (R 3.6.1)
   data.table     1.12.2      2019-04-07 [1] CRAN (R 3.6.1)
   DBI          * 1.0.0       2018-05-02 [1] CRAN (R 3.6.1)
   desc           1.2.0       2018-05-01 [1] CRAN (R 3.6.1)
   devtools     * 2.2.0       2019-09-07 [1] CRAN (R 3.6.1)
   digest         0.6.23      2019-11-23 [1] CRAN (R 3.6.1)
   dplyr        * 0.8.3       2019-07-04 [1] CRAN (R 3.6.1)
   DT             0.9         2019-09-17 [1] CRAN (R 3.6.1)
   ellipsis       0.3.0       2019-09-20 [1] CRAN (R 3.6.1)
   evaluate       0.14        2019-05-28 [1] CRAN (R 3.6.1)
   foreign        0.8-71      2018-07-20 [2] CRAN (R 3.6.1)
   Formula      * 1.2-3       2018-05-03 [1] CRAN (R 3.6.0)
   fs             1.3.1       2019-05-06 [1] CRAN (R 3.6.1)
   fst          * 0.9.0       2019-04-09 [1] CRAN (R 3.6.1)
   future       * 1.15.0-9000 2019-11-19 [1] Github (HenrikBengtsson/future@bc241c7)
   ggplot2      * 3.2.1       2019-08-10 [1] CRAN (R 3.6.1)
   globals        0.12.4      2018-10-11 [1] CRAN (R 3.6.0)
   glue         * 1.3.1       2019-03-12 [1] CRAN (R 3.6.1)
   gridExtra      2.3         2017-09-09 [1] CRAN (R 3.6.1)
   gt           * 0.1.0       2019-11-27 [1] Github (rstudio/gt@284bbe5)
   gtable         0.3.0       2019-03-25 [1] CRAN (R 3.6.1)
   Hmisc        * 4.3-0       2019-11-07 [1] CRAN (R 3.6.1)
   htmlTable      1.13.2      2019-09-22 [1] CRAN (R 3.6.1)
 D htmltools      0.3.6.9004  2019-09-20 [1] Github (rstudio/htmltools@c49b29c)
   htmlwidgets    1.3         2018-09-30 [1] CRAN (R 3.6.1)
   jsonlite     * 1.6         2018-12-07 [1] CRAN (R 3.6.1)
   knitr          1.25        2019-09-18 [1] CRAN (R 3.6.1)
   lattice      * 0.20-38     2018-11-04 [2] CRAN (R 3.6.1)
   latticeExtra   0.6-28      2016-02-09 [1] CRAN (R 3.6.1)
   lazyeval       0.2.2       2019-03-15 [1] CRAN (R 3.6.1)
   lifecycle      0.1.0       2019-08-01 [1] CRAN (R 3.6.1)
   listenv        0.7.0       2018-01-21 [1] CRAN (R 3.6.1)
   lubridate    * 1.7.4       2018-04-11 [1] CRAN (R 3.6.1)
   magrittr     * 1.5         2014-11-22 [1] CRAN (R 3.6.1)
   Matrix         1.2-17      2019-03-22 [2] CRAN (R 3.6.1)
   memoise        1.1.0       2017-04-21 [1] CRAN (R 3.6.1)
   munsell        0.5.0       2018-06-12 [1] CRAN (R 3.6.1)
   nnet           7.3-12      2016-02-02 [2] CRAN (R 3.6.1)
   openssl        1.4.1       2019-07-18 [1] CRAN (R 3.6.1)
   outliers     * 0.14        2011-01-24 [1] CRAN (R 3.6.0)
   pillar         1.4.2       2019-06-29 [1] CRAN (R 3.6.1)
   pkgbuild       1.0.5       2019-08-26 [1] CRAN (R 3.6.1)
   pkgconfig      2.0.2       2018-08-16 [1] CRAN (R 3.6.1)
   pkgload        1.0.2       2018-10-29 [1] CRAN (R 3.6.1)
   plyr         * 1.8.4       2016-06-08 [1] CRAN (R 3.6.1)
   prettyunits    1.0.2       2015-07-13 [1] CRAN (R 3.6.1)
   processx       3.4.1       2019-07-18 [1] CRAN (R 3.6.1)
   pryr         * 0.1.4       2018-02-18 [1] CRAN (R 3.6.1)
   ps             1.3.0       2018-12-21 [1] CRAN (R 3.6.1)
   purrr        * 0.3.2       2019-03-15 [1] CRAN (R 3.6.1)
   R6           * 2.4.1       2019-11-12 [1] CRAN (R 3.6.1)
   RColorBrewer   1.1-2       2014-12-07 [1] CRAN (R 3.6.0)
   Rcpp           1.0.3       2019-11-08 [1] CRAN (R 3.6.1)
   readxl       * 1.3.1       2019-03-13 [1] CRAN (R 3.6.1)
   remotes        2.1.0       2019-06-24 [1] CRAN (R 3.6.1)
   rlang        * 0.4.2       2019-11-23 [1] CRAN (R 3.6.1)
   rmarkdown    * 2.0.3       2019-12-19 [1] Github (rstudio/rmarkdown@26cc3b1)
   RODBC        * 1.3-16      2019-09-03 [1] CRAN (R 3.6.1)
   roxygen2     * 6.1.1       2018-11-07 [1] CRAN (R 3.6.1)
   rpart          4.1-15      2019-04-12 [2] CRAN (R 3.6.1)
   rprojroot      1.3-2       2018-01-03 [1] CRAN (R 3.6.1)
   RSQLite      * 2.1.2       2019-07-24 [1] CRAN (R 3.6.1)
   rstudioapi     0.10        2019-03-19 [1] CRAN (R 3.6.1)
   scales         1.0.0       2018-08-09 [1] CRAN (R 3.6.1)
   sessioninfo    1.1.1       2018-11-05 [1] CRAN (R 3.6.1)
   slide        * 0.0.0.9002  2019-11-27 [1] Github (DavisVaughan/slide@92e8e02)
   ssh            0.6         2019-04-09 [1] CRAN (R 3.6.2)
   stringi        1.4.3       2019-03-12 [1] CRAN (R 3.6.0)
   stringr      * 1.4.0       2019-02-10 [1] CRAN (R 3.6.1)
   survival     * 2.44-1.1    2019-04-01 [2] CRAN (R 3.6.1)
   testthat       2.2.1       2019-07-25 [1] CRAN (R 3.6.1)
   tibble         2.1.3       2019-06-06 [1] CRAN (R 3.6.1)
   tidyr        * 1.0.0       2019-09-11 [1] CRAN (R 3.6.1)
   tidyselect     0.2.5       2018-10-11 [1] CRAN (R 3.6.1)
   usethis      * 1.5.1       2019-07-04 [1] CRAN (R 3.6.1)
   varhandle    * 2.0.3       2018-07-04 [1] CRAN (R 3.6.0)
   vctrs          0.2.0.9007  2019-11-27 [1] Github (r-lib/vctrs@945809e)
   withr          2.1.2       2018-03-15 [1] CRAN (R 3.6.1)
   xfun           0.9         2019-08-21 [1] CRAN (R 3.6.1)
   xml2         * 1.2.2       2019-08-09 [1] CRAN (R 3.6.1)
   xts          * 0.11-2      2018-11-05 [1] CRAN (R 3.6.1)
   zoo          * 1.8-6       2019-05-28 [1] CRAN (R 3.6.1)

[1] C:/Users/cklar/Desktop/R packages
[2] C:/Program Files/R/R-3.6.1/library

 P -- Loaded and on-disk path mismatch.
 D -- DLL MD5 mismatch, broken installation.
            Reporter: Christian
             Fix For: 0.15.1


Hi,

When writing more than roughly 200-300 batches to an Arrow file, R crashes without even showing an error message. RStudio just aborts.

My feeling is that each batch may become its own stream and that R then runs into trouble with the number of open connections, but that's a total guess.

Any help would be appreciated.
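
The guess about connections could be checked from within R before digging into Arrow itself. The snippet below is only a sketch using base R: count_connections() is a hypothetical helper (not part of arrow), and the file() call is a stand-in for the real batch writing. R 3.6 allows at most 128 open connections, so a count that climbs with every batch would support the guess. Arrow's writer is implemented in C++ and may not use R connections at all, so a flat count would not rule out a resource problem inside Arrow.

```r
# Hypothetical helper, not part of arrow: count every connection R knows about.
# all = TRUE also lists stdin, stdout and stderr, so the baseline is at least 3.
count_connections <- function() nrow(showConnections(all = TRUE))

before <- count_connections()

# Stand-in for the real work: open one ordinary file connection.
# In the actual test this would be a few writer$write_batch() calls.
con <- file(tempfile(), open = "w")
during <- count_connections()
close(con)

after <- count_connections()
cat("before:", before, "during:", during, "after:", after, "\n")
```

If the count stays at its baseline while batches are written, the crash is more likely to originate below the R layer.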


Here is the function. Running it with 3000 batches crashes immediately.

Before that I ran it with 100 batches and then increased the count slowly, until it crashed again at a seemingly random point.

 

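library(arrow)
library(plyr)   # for dlply(); attached before dplyr so dplyr's verbs are not masked
library(dplyr)
library(glue)

write_arrow_custom <- function(df, targetarrow, nrbatches) {
  # Compute nrbatches + 1 cut points over the rows of df
  ct <- nrbatches
  idxs <- c(0:ct) / ct * nrow(df)
  idxs <- round(idxs, 0) %>% as.integer()
  idxs[length(idxs)] <- nrow(df)
  # One row per batch: first row (colfrom), last row (colto), batch number (R)
  df_nav <- idxs %>% as.data.frame() %>% rename(colfrom = 1) %>%
    mutate(colto = lead(colfrom)) %>% mutate(colfrom = colfrom + 1) %>%
    filter(!is.na(colto)) %>% mutate(R = row_number())
  # Sanity check: the batches cover every row exactly once
  stopifnot(df_nav %>% mutate(chk = colto - colfrom + 1) %>% '$'('chk') %>% sum() == nrow(df))
  # Take the schema from the first row, then write one record batch per range
  table_df <- Table$create(name = rownames(df[1, ]), df[1, ])
  writer <- RecordBatchFileWriter$create(targetarrow, table_df$schema)
  df_nav %>% dlply(c('R'), function(df_nav) {
    # The original called catl(), a logging helper that is not shown; message() stands in
    message(glue('{df_nav$colfrom[1]}:{df_nav$colto[1]} / {df_nav$R[1]}...'))
    tmp <- df[df_nav$colfrom[1]:df_nav$colto[1], ]
    writer$write_batch(record_batch(name = rownames(tmp), tmp))
    NULL
  }) -> batch_lst
  writer$close()
  rm(batch_lst)
  gc()
}

# Call that triggers the crash
write_arrow_custom(data.frame(A = c(1:100000), B = c(1:100000)), 'C:/Temp/test.arrow', 3000)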
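
The index arithmetic at the top of write_arrow_custom can be checked on its own, without arrow, to rule it out as the source of the crash. The following is a minimal base R sketch of the same splitting idea; batch_bounds() and its argument names are hypothetical and not taken from the report or from the arrow package.

```r
# Hypothetical helper: split rows 1..n_rows into n_batches contiguous,
# inclusive ranges, mirroring the cut-point logic of write_arrow_custom().
batch_bounds <- function(n_rows, n_batches) {
  # More batches than rows would produce empty ranges such as 1:0,
  # which R silently expands to c(1, 0), so reject that case up front.
  stopifnot(n_rows >= 1, n_batches >= 1, n_batches <= n_rows)
  cuts <- as.integer(round(seq(0, n_rows, length.out = n_batches + 1)))
  data.frame(
    from = cuts[-length(cuts)] + 1L,  # first row of each batch
    to   = cuts[-1]                   # last row of each batch
  )
}

small <- batch_bounds(10, 3)
print(small)
# from: 1 4 8
# to:   3 7 10

# Same sizes as in the report: 100000 rows in 3000 batches
bounds <- batch_bounds(100000, 3000)
```

For the sizes in the report this yields 3000 ranges that cover all 100000 rows, which suggests the crash comes from the writing step and not from the way the rows are split.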

 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)