Posted to issues@arrow.apache.org by "James Coder (Jira)" <ji...@apache.org> on 2022/09/06 16:09:00 UTC
[jira] [Created] (ARROW-17634) pyarrow.fs import reserves large amount of memory
James Coder created ARROW-17634:
-----------------------------------
Summary: pyarrow.fs import reserves large amount of memory
Key: ARROW-17634
URL: https://issues.apache.org/jira/browse/ARROW-17634
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 9.0.0
Reporter: James Coder
It seems that in version 9.0.0, `import pyarrow.fs` reserves more than 1 GB (close to 2 GB) of virtual memory; this was not present in 8.0.0.
Test code:
```python
def memory_snapshot(label=''):
    from util.System import System
    rss = System.process_rss_gigabytes()
    vms = System.process_gigabytes()
    _max = System.process_max_gigabytes()
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))

memory_snapshot()
import pyarrow
print(pyarrow.__version__)
memory_snapshot()
import pyarrow.fs
memory_snapshot()
```
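The helper above depends on an internal `util.System` module that is not shown. For reproducing this elsewhere, a self-contained, Linux-only sketch of an equivalent helper (parsing `VmRSS`/`VmSize`/`VmPeak` out of `/proc/self/status`; the field selection and kB-to-GB conversion are my assumptions, not the original code) could look like:

```python
def memory_snapshot(label=''):
    # Linux-only: parse /proc/self/status for resident, virtual, and peak sizes.
    # Values in the file are reported in kB; convert to GB for printing.
    fields = {}
    with open('/proc/self/status') as f:
        for line in f:
            key, _, value = line.partition(':')
            if key in ('VmRSS', 'VmSize', 'VmPeak'):
                fields[key] = int(value.split()[0]) / (1024 ** 2)
    rss = fields.get('VmRSS', 0.0)
    vms = fields.get('VmSize', 0.0)
    peak = fields.get('VmPeak', 0.0)
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB"
          % (label, rss, vms, peak))
    return rss, vms, peak
```

Calling this before and after `import pyarrow.fs` should reproduce the same kind of jump as the outputs below.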
8.0.0 output
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
8.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
```
9.0.0 output
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
9.0.0
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```
Digging further into what happens during the import, it seems `initialize_s3` is the culprit.
```
before s3 initialize
Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
after s3 initialize
Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
```
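Note that the jump is almost entirely in vms (address space) rather than rss (resident pages), which suggests a reservation rather than memory actually written to. A minimal, Linux-only sketch (the 1 GiB size is arbitrary, chosen only for illustration) showing how an anonymous mapping grows vms while leaving rss nearly untouched until pages are written:

```python
import mmap

def vm_size_kb():
    # Linux-only: read the current virtual memory size (VmSize) in kB.
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('VmSize:'):
                return int(line.split()[1])

before = vm_size_kb()
mapping = mmap.mmap(-1, 1 << 30)  # reserve 1 GiB anonymous memory, untouched
after = vm_size_kb()
print("VmSize grew by %.1f GB" % ((after - before) / (1024 ** 2)))
mapping.close()
```

So a large vms delta on import is not by itself the same as 2 GB of RAM in use, though eagerly reserving it at `import pyarrow.fs` time is still surprising.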
--
This message was sent by Atlassian Jira
(v8.20.10#820010)