Posted to jira@arrow.apache.org by "James Coder (Jira)" <ji...@apache.org> on 2022/12/12 18:56:00 UTC

[jira] [Commented] (ARROW-17634) pyarrow.fs import reserves large amount of memory

    [ https://issues.apache.org/jira/browse/ARROW-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17646282#comment-17646282 ] 

James Coder commented on ARROW-17634:
-------------------------------------

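This seems to be resolved in 10.0.1: after `import pyarrow.fs` the virtual memory size stays at 0.5 GB instead of jumping to about 2.2 GB as it did in 9.0.0.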
```
Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
10.0.1
Memory snapshot (); rss=0.1 vms=0.5 max=0.5 GB
Memory snapshot (); rss=0.1 vms=0.5 max=0.5 GB
```
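The test code in the original report depends on a `util.System` helper that is not included in the issue. A minimal stand-in is sketched below under one loud assumption: the process runs on Linux, so it can read `VmRSS`, `VmSize` and `VmPeak` from `/proc/self/status` (reported in kB) as rough equivalents of the report's rss/vms/max columns. The names `memory_snapshot` and `import_cost` here are hypothetical, not part of pyarrow.

```python
import importlib


def memory_snapshot(label=""):
    """Return (rss, vms, max_vms) in GB for the current process.

    Assumes Linux: parses VmRSS, VmSize and VmPeak from /proc/self/status.
    Falls back to VmSize for the peak if VmPeak is not reported.
    """
    fields = {}
    with open("/proc/self/status") as f:
        for line in f:
            key, _, value = line.partition(":")
            if key in ("VmRSS", "VmSize", "VmPeak"):
                # Values look like "123456 kB"; convert kB to GB (binary units).
                fields[key] = int(value.split()[0]) / (1024 ** 2)
    rss = fields["VmRSS"]
    vms = fields["VmSize"]
    vmax = fields.get("VmPeak", vms)
    print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, vmax))
    return rss, vms, vmax


def import_cost(module_name):
    """Import module_name and return the change in virtual memory size (GB)."""
    _, vms_before, _ = memory_snapshot("before " + module_name)
    importlib.import_module(module_name)
    _, vms_after, _ = memory_snapshot("after " + module_name)
    return vms_after - vms_before
```

Calling `import_cost("pyarrow.fs")` in a fresh interpreter should show whether the jump is present for a given pyarrow version; if `pyarrow.fs` was already imported earlier in the same process, the delta will be close to zero.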

> pyarrow.fs import reserves large amount of memory
> -------------------------------------------------
>
>                 Key: ARROW-17634
>                 URL: https://issues.apache.org/jira/browse/ARROW-17634
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 9.0.0
>            Reporter: James Coder
>            Priority: Major
>
> It seems that in version 9.0.0 `import pyarrow.fs` reserves more than 1 GB (close to 2 GB) of virtual memory; this was not the case in 8.0.0.
> Test code:
> {code:python}
> def memory_snapshot(label=''):
>     # util.System appears to be the reporter's own helper module (not included in the report)
>     from util.System import System
>     rss = System.process_rss_gigabytes()
>     vms = System.process_gigabytes()
>     _max = System.process_max_gigabytes()
>     print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))
> memory_snapshot()
> import pyarrow
> print(pyarrow.__version__)
> memory_snapshot()
> import pyarrow.fs
> memory_snapshot()
> {code}
> 8.0.0 output
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 8.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> {code}
> 9.0.0 output
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 9.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}
> Digging further into what happens during import, `initialize_s3` appears to be the culprit:
> {code}
> before s3 initialize
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> after s3 initialize
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)