Posted to jira@arrow.apache.org by "Claymore Marshall (Jira)" <ji...@apache.org> on 2021/02/10 01:39:00 UTC

[jira] [Updated] (ARROW-11579) R's arrow::read_feather hanging on repeat reads of large objects

     [ https://issues.apache.org/jira/browse/ARROW-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Claymore Marshall updated ARROW-11579:
--------------------------------------
    Description: 
On Windows 10, reading a large Feather file in R appears to hang on a repeated read.

This issue has been reproduced on 3 different Windows machines, all running Windows 10 and R 4.0.0 (or later).

read_feather does not hang when the file was written with version = 1, or with version 2 and no compression.

The issue does not occur in tests on Linux (Ubuntu 20.04 at least).
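
For reference, the two configurations reported above as not hanging could be written as in the sketch below. The small data frame, the temporary file paths, and the short read loop are illustrative assumptions rather than part of the original report; only the version and compression arguments come from the description.

```r
library(arrow)

# Small stand-in data frame (hypothetical; the report uses 7e6 rows).
m <- data.frame(x = rnorm(1e4), c = c("a", "n"))

# Feather version 1, which has no compression option.
v1_path <- tempfile(fileext = ".feather")
write_feather(m, v1_path, version = 1)

# Feather version 2 with compression explicitly disabled.
v2_path <- tempfile(fileext = ".feather")
write_feather(m, v2_path, version = 2, compression = "uncompressed")

# Repeated reads of both files.
for (j in 1:5) {
  a <- read_feather(v1_path)
  b <- read_feather(v2_path)
}
```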

 

Example:

 

library(arrow)

m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))

write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd

for (j in 1:50) {
  y <- read_feather("test.feather4")  # hangs after an unpredictable number of reads, just on windows though
  print(paste0("feather read ", j, "..."))
}

 

 

 

 

Interestingly, a workaround is to call read_feather for just one column at a time. So far, this has not hung.

For example, the following collects every column of the data frame in y and does not hang on repeated reads:

 

y <- lapply(cols, function(col) {
  read_feather(..., col_select = all_of(col))
})
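
A fuller version of this workaround might look like the sketch below. It assumes the column names are known in advance (here they are taken from the data frame that was written), uses a small stand-in data frame and a temporary file, and binds the per-column results back together with cbind; none of those details appear in the original report. all_of is called through the tidyselect namespace to keep the example self-contained.

```r
library(arrow)

# Small stand-in for the large data frame in the report (hypothetical sizes).
m <- data.frame(x = rnorm(1e4), y = rnorm(5), c = c("a", "n"))
path <- tempfile(fileext = ".feather")
write_feather(m, path, version = 2, compression = "lz4")

# Assumption: the column names are known up front; here they come from `m`.
cols <- names(m)

# Read one column per call, giving a list of single-column data frames.
parts <- lapply(cols, function(col) {
  read_feather(path, col_select = tidyselect::all_of(col))
})

# Bind the single-column results back into one data frame.
y <- do.call(cbind, parts)
```

Note that lapply on its own returns a list with one element per column, so the final cbind step is what restores a single data frame.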

 

  was:
On Windows 10, reading a large Feather file in R appears to hang on a repeated read.

This issue has been reproduced on 3 different Windows machines, all running Windows 10 and R 4.0.0 (or later).

read_feather does not hang when the file was written with version = 1, or with version 2 and no compression.

The issue does not occur in tests on Linux (Ubuntu 20.04 at least).

 

Example:

 

library(arrow)

m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))

write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd

for (j in 1:50) {
  y <- read_feather("test.feather4")  # hangs after an unpredictable number of reads, just on windows though
  print(paste0("feather read ", j, "..."))
}


> R's arrow::read_feather hanging on repeat reads of large objects
> ----------------------------------------------------------------
>
>                 Key: ARROW-11579
>                 URL: https://issues.apache.org/jira/browse/ARROW-11579
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: R
>    Affects Versions: 3.0.0
>         Environment: windows 10, R 4.0.0, arrow 3.0.0
>            Reporter: Claymore Marshall
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)