Posted to jira@arrow.apache.org by "Ian Cook (Jira)" <ji...@apache.org> on 2021/02/11 01:43:00 UTC
[jira] [Comment Edited] (ARROW-11579) [R] read_feather hanging on Windows
[ https://issues.apache.org/jira/browse/ARROW-11579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17282830#comment-17282830 ]
Ian Cook edited comment on ARROW-11579 at 2/11/21, 1:42 AM:
------------------------------------------------------------
[~claymoremarshall] Thank you for reporting this. I made two attempts to reproduce this error, both with version 3.0.0 of the arrow package installed from CRAN running in R x64 4.0.3. First I used Windows 10 running in a VM with VirtualBox on a macOS host. The error did not occur there even after repeated attempts. Next I used a laptop running Windows 10 natively, and there I was able to reproduce the issue immediately.
I also experimented with two things:
- running {{Sys.setenv(ARROW_DEFAULT_MEMORY_POOL="system")}} before loading the arrow package
- running {{gc()}} in each iteration of the loop
The hanging behavior occurred regardless.
In my tests, there was no memory starvation when the hanging occurred.
I'll keep investigating.
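For reference, the two experiments listed above could be combined with the reporter's loop roughly as follows. This is only a sketch: the file name and data frame come from the issue description, and calling gc() right after each read is an assumption about where it was placed in the loop.

```r
# Sketch only: the two attempted mitigations applied to the reporter's example.
# The environment variable has to be set before the arrow package is loaded.
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")
library(arrow)

# Data and file name taken from the issue description.
m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
write_feather(m, "test.feather4", version = 2, compression = "lz4")

for (j in 1:50) {
  y <- read_feather("test.feather4")
  gc()  # assumed placement: force a garbage collection after every read
  print(paste0("feather read ", j, "..."))
}
```

On the affected Windows machine the loop still hung with both changes in place, so neither the memory pool setting nor the explicit garbage collection appears to be the deciding factor.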
was (Author: icook):
[~claymoremarshall] Thank you for reporting this. I made two attempts to reproduce this error, both with version 3.0.0 of the arrow package installed from CRAN running in R x64 4.0.3. First I used Windows 10 running in a VM with VirtualBox on a macOS host. The error did not occur there even after repeated attempts. Next I used a laptop running Windows 10 natively, and there I was able to reproduce the issue immediately.
I also experimented with a two of things:
- running {{Sys.setenv(ARROW_DEFAULT_MEMORY_POOL="system")}} before loading the arrow package
- running {{gc()}} in each iteration of the loop
The hanging behavior occurred regardless.
In my tests, there was no memory starvation when the hanging occurred.
I'll keep investigating.
> [R] read_feather hanging on Windows
> -----------------------------------
>
> Key: ARROW-11579
> URL: https://issues.apache.org/jira/browse/ARROW-11579
> Project: Apache Arrow
> Issue Type: Bug
> Components: R
> Affects Versions: 3.0.0
> Environment: windows 10, R 4.0.0, arrow 3.0.0
> Reporter: Claymore Marshall
> Assignee: Ian Cook
> Priority: Major
>
> On Windows 10, reading large feather objects in R seems to hang on a repeated read.
>
> This issue has been reproduced on 3 different Windows machines, all running Windows 10 and R 4.0.0 (or later).
> read_feather does not hang when using version = 1, or when using version 2 uncompressed.
> This issue does not occur in tests on Linux (Ubuntu 20.04 at least).
>
> Example:
>
> library(arrow)
> m <- data.frame(x = rnorm(7e6), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
> write_feather(m, "test.feather4", version = 2, compression = "lz4") # does not hang with uncompressed, but does with lz4 and zstd
> for (j in 1:50) {
>   y <- read_feather("test.feather4") # hangs after an unpredictable number of reads, just on windows though
>   print(paste0("feather read ", j, "..."))
> }
>
> Interestingly, a workaround is to call read_feather on just one column at a time. This has not hung so far.
>
> e.g. here y ends up holding every column of the data frame (one single-column read per element), and this doesn't hang on repeated reads:
>
> cols <- c("x", "y", "b", "n", "c") # assumed: the column names from the example above
> y <- lapply(cols, function(col) {
>   read_feather("test.feather4", col_select = all_of(col))
> })
>
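The per-column workaround quoted above leaves one single-column result per column. If a single data frame is needed, the pieces could be bound back together. Below is a minimal sketch, assuming the column names from the reporter's example; the small row count, the temporary file, and the use of cbind are illustrative choices that are not part of the original report.

```r
# Sketch only: read one column at a time, then reassemble a full data frame.
library(arrow)

# Small stand-in for the reporter's data (hypothetical size, same column names).
m <- data.frame(x = rnorm(10), y = rnorm(5), b = rnorm(5), n = rnorm(5), c = c("a", "n"))
path <- tempfile(fileext = ".feather")  # hypothetical file location
write_feather(m, path, version = 2)     # default compression for version 2

cols <- names(m)
parts <- lapply(cols, function(col) {
  read_feather(path, col_select = all_of(col))
})

# Bind the single-column pieces side by side into one data frame.
full <- do.call(cbind, parts)
```

Whether reassembling the columns this way also avoids the hang on large compressed files has not been checked here.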
--
This message was sent by Atlassian Jira
(v8.3.4#803005)