Posted to jira@arrow.apache.org by "James Coder (Jira)" <ji...@apache.org> on 2022/09/06 16:11:00 UTC
[jira] [Updated] (ARROW-17634) pyarrow.fs import reserves large amount of memory
[ https://issues.apache.org/jira/browse/ARROW-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
James Coder updated ARROW-17634:
--------------------------------
Description:
In version 9.0.0, `import pyarrow.fs` appears to reserve more than 1 GB (close to 2 GB) of virtual memory. This did not happen in 8.0.0.
Test code:
{code:python}
def memory_snapshot(label=''):
    # util.System appears to be a local helper module (not part of pyarrow).
    from util.System import System
    rss = System.process_rss_gigabytes()
    vms = System.process_gigabytes()
    _max = System.process_max_gigabytes()
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))

memory_snapshot()
import pyarrow
print(pyarrow.__version__)
memory_snapshot()
import pyarrow.fs
memory_snapshot()
{code}
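The test code above relies on a `util.System` helper that is not included in the report. For anyone trying to reproduce this on Linux, a minimal stdlib-only sketch of the same snapshot could look like the following. It assumes that `rss`, `vms` and `max` correspond to the `VmRSS`, `VmSize` and `VmPeak` fields of `/proc/self/status`, which is a guess at what the original helper measures; the names `parse_proc_status` and `format_snapshot` are hypothetical.

```python
import re

_KB_PER_GB = 1024 * 1024

# Assumed mapping from the report's labels to /proc/<pid>/status fields.
_FIELDS = (("VmRSS", "rss"), ("VmSize", "vms"), ("VmPeak", "max"))


def parse_proc_status(text):
    """Extract rss, vms and peak vms (in GB) from /proc/<pid>/status text."""
    values = {}
    for key, name in _FIELDS:
        match = re.search(r"^%s:\s+(\d+)\s+kB" % key, text, re.MULTILINE)
        # A missing field is reported as NaN rather than raising.
        values[name] = int(match.group(1)) / _KB_PER_GB if match else float("nan")
    return values


def format_snapshot(values, label=""):
    """Render the values in the same layout as the output in this report."""
    return "Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (
        label, values["rss"], values["vms"], values["max"])


def memory_snapshot(label=""):
    """Print a snapshot for the current process (Linux only)."""
    with open("/proc/self/status") as f:
        print(format_snapshot(parse_proc_status(f.read()), label))
```

With this in place, the three `memory_snapshot()` calls around `import pyarrow` and `import pyarrow.fs` can be run unchanged. The numbers may differ slightly from the ones reported here if the original helper measures something else.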
8.0.0 output
{code}
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
8.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
{code}
9.0.0 output
{code}
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
9.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
{code}
Digging further into what happens during import, `initialize_s3` appears to be the culprit:
{code}
before s3 initialize
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
after s3 initialize
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
{code}
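The report does not show how the before/after numbers around `initialize_s3` were captured. One way to isolate a single step is to measure the growth in virtual memory around a callable. The sketch below is an assumption about how this could be done on Linux, not the reporter's method; `vms_gigabytes` and `vms_delta` are hypothetical names.

```python
def vms_gigabytes():
    """Current virtual memory size of this process in GB (Linux only)."""
    with open("/proc/self/status") as f:
        for line in f:
            if line.startswith("VmSize:"):
                # Line looks like "VmSize:   2306867 kB".
                return int(line.split()[1]) / (1024 * 1024)
    raise RuntimeError("VmSize not found in /proc/self/status")


def vms_delta(step, read_vms=vms_gigabytes):
    """Run step() and return how much virtual memory grew, in GB.

    read_vms is injectable so the helper can be exercised without /proc.
    """
    before = read_vms()
    step()
    return read_vms() - before
```

In a fresh interpreter, `vms_delta(lambda: importlib.import_module("pyarrow.fs"))` (after `import importlib`) would report the growth caused by the import as a whole. Going by the outputs above, that would be roughly 1.7 GB on 9.0.0.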
> pyarrow.fs import reserves large amount of memory
> -------------------------------------------------
>
> Key: ARROW-17634
> URL: https://issues.apache.org/jira/browse/ARROW-17634
> Project: Apache Arrow
> Issue Type: Bug
> Affects Versions: 9.0.0
> Reporter: James Coder
> Priority: Major
--
This message was sent by Atlassian Jira
(v8.20.10#820010)