Posted to issues@arrow.apache.org by "Christian (Jira)" <ji...@apache.org> on 2022/02/18 19:32:00 UTC
[jira] [Created] (ARROW-15730) Memory usage in R
Christian created ARROW-15730:
---------------------------------
Summary: Memory usage in R
Key: ARROW-15730
URL: https://issues.apache.org/jira/browse/ARROW-15730
Project: Apache Arrow
Issue Type: Bug
Components: R
Reporter: Christian
Fix For: 6.0.1
Hi,
I'm trying to load a ~10 GB Arrow file into R.
For whatever reason the memory usage blows up to ~110-120 GB.
The weird thing is that after deleting the object again and running gc(), memory usage only goes down to ~90 GB. The delta of ~20-30 GB is what I would have expected the data frame to take up in memory; that is also roughly what the old arrow version 0.15.1 used, and it matches what R reports when I print the object size.
The commands I'm running are simply:
options(arrow.use_threads = FALSE)
arrow::set_cpu_count(1)   # need this - otherwise it freezes under Windows
df <- arrow::read_arrow('file.arrow5')
Is arrow reserving some resources in the background and not giving them up again? Are there some settings I need to change for this?
Is this something that is known and fixed in a newer version?
*Note* that this does not happen on Linux: there, all the memory is freed when gc() is called. Not sure if it matters, but on Linux I also don't need to set the CPU count to 1.
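In case it helps with diagnosis, here is roughly how I have been checking whether Arrow's own C++ memory pool is still holding the memory after the R object is gone. This is only a sketch: the allocator switch via the ARROW_DEFAULT_MEMORY_POOL environment variable is just my assumption of a possible workaround (and has to be set before the arrow package is loaded), and 'file.arrow5' stands in for the real file.

# Assumption: switching the allocator might change the behaviour; must be set
# before library(arrow) is called.
Sys.setenv(ARROW_DEFAULT_MEMORY_POOL = "system")  # or "jemalloc" / "mimalloc"
library(arrow)

options(arrow.use_threads = FALSE)
set_cpu_count(1)

df <- read_arrow("file.arrow5")

pool <- default_memory_pool()
pool$backend_name       # which allocator Arrow is using
pool$bytes_allocated    # bytes currently allocated by Arrow
pool$max_memory         # high-water mark of the pool

rm(df)
gc()
pool$bytes_allocated    # check whether this drops once the R object is gone

If bytes_allocated still reports tens of GB after gc(), that would point at the allocator (or the pool) holding on to freed pages rather than at the R bindings themselves.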
Any help would be appreciated.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)