Posted to issues@arrow.apache.org by "James Coder (Jira)" <ji...@apache.org> on 2022/09/06 16:09:00 UTC

[jira] [Created] (ARROW-17634) pyarrow.fs import reserves large amount of memory

James Coder created ARROW-17634:
-----------------------------------

             Summary: pyarrow.fs import reserves large amount of memory
                 Key: ARROW-17634
                 URL: https://issues.apache.org/jira/browse/ARROW-17634
             Project: Apache Arrow
          Issue Type: Bug
    Affects Versions: 9.0.0
            Reporter: James Coder


It seems that in version 9.0.0, `import pyarrow.fs` reserves over 1 GB (closer to 2 GB) of virtual memory; this was not the case in 8.0.0.

Test code:
```python
def memory_snapshot(label=''):
    # Project-local helper that reports this process's memory usage in GB.
    from util.System import System
    rss = System.process_rss_gigabytes()
    vms = System.process_gigabytes()
    _max = System.process_max_gigabytes()
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))

memory_snapshot()
import pyarrow
print(pyarrow.__version__)
memory_snapshot()
import pyarrow.fs
memory_snapshot()
```
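
For anyone reproducing this without the project-local `util.System` helper, a roughly equivalent self-contained version can be written with `psutil` and `resource` (an illustration of what the helper measures, not the exact code used for the numbers below):
```python
# Self-contained stand-in for the util.System helper above, built on
# psutil and resource (both assumed available; Unix-only for the peak value).
import psutil
import resource

GB = 1024 ** 3

def memory_snapshot(label=''):
    mem = psutil.Process().memory_info()
    rss = mem.rss / GB  # resident set size
    vms = mem.vms / GB  # virtual memory (address space) size
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    _max = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / (1024 ** 2)
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB"
          % (label, rss, vms, _max))
```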

8.0.0 output:
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
8.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
```
9.0.0 output:
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
9.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```

Digging further into what happens during the import, it seems `initialize_s3` is the culprit:
```
before s3 initialize
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
after s3 initialize
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```
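
The jump can be isolated by triggering the initialization directly. A minimal sketch, assuming an S3-enabled pyarrow build (`pyarrow._s3fs` is an internal module, and `memory_snapshot` is the helper defined above):
```python
# Sketch: measure around the S3 initialization that pyarrow.fs runs at
# import time in 9.0.0 (assumes pyarrow was built with S3 support).
memory_snapshot('before s3 initialize')
from pyarrow._s3fs import initialize_s3  # internal module, may change
initialize_s3()  # starts the AWS SDK; this is where the reservation appears
memory_snapshot('after s3 initialize')
```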



