Posted to jira@arrow.apache.org by "James Coder (Jira)" <ji...@apache.org> on 2022/12/12 18:57:00 UTC

[jira] [Resolved] (ARROW-17634) pyarrow.fs import reserves large amount of memory

     [ https://issues.apache.org/jira/browse/ARROW-17634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Coder resolved ARROW-17634.
---------------------------------
    Fix Version/s: 10.0.1
       Resolution: Fixed

> pyarrow.fs import reserves large amount of memory
> -------------------------------------------------
>
>                 Key: ARROW-17634
>                 URL: https://issues.apache.org/jira/browse/ARROW-17634
>             Project: Apache Arrow
>          Issue Type: Bug
>    Affects Versions: 9.0.0
>            Reporter: James Coder
>            Priority: Major
>             Fix For: 10.0.1
>
>
> In version 9.0.0, `import pyarrow.fs` appears to reserve more than 1 GB (close to 2 GB) of virtual memory. This did not happen in 8.0.0.
> Test code:
> {code:python}
> def memory_snapshot(label=''):
>     # util.System looks like the reporter's own helper module; it is not part of pyarrow
>     from util.System import System
>     rss = System.process_rss_gigabytes()
>     vms = System.process_gigabytes()
>     _max = System.process_max_gigabytes()
>     print("Memory snapshot (%s); rss=%.1f vms=%.1f max=%.1f GB" % (label, rss, vms, _max))
>
> memory_snapshot()
> import pyarrow
> print(pyarrow.__version__)
> memory_snapshot()
> import pyarrow.fs
> memory_snapshot()
> {code}
> 8.0.0 output
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 8.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> {code}
> 9.0.0 output
> {code}
> Memory snapshot (); rss=0.1 vms=0.4 max=0.4 GB
> 9.0.0
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}
> Digging further into what happens during import, `initialize_s3` appears to be the culprit.
> {code}
> before s3 initialize
> Memory snapshot (); rss=0.1 vms=0.5 max=0.6 GB
> after s3 initialize
> Memory snapshot (); rss=0.2 vms=2.2 max=2.3 GB
> {code}
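
The test code above depends on `util.System`, which is not available outside the reporter's environment. A minimal stand-in for reproducing the measurement on Linux is sketched below. It is an assumption-laden substitute, not the reporter's helper: it reads `VmRSS`, `VmSize`, and `VmPeak` from `/proc/self/status`, and it assumes that the original `max` figure corresponds to peak virtual size. The name `read_status_kb` is hypothetical.

```python
def read_status_kb(field, status_path="/proc/self/status"):
    """Return a /proc status field (e.g. VmSize) in kB, or None if unavailable."""
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith(field + ":"):
                    # Lines look like "VmSize:   123456 kB"
                    return int(line.split()[1])
    except OSError:
        # Not on Linux, or /proc is not mounted
        return None
    return None


def memory_snapshot(label=""):
    """Print and return rss/vms/max for the current process, in GB."""
    kb_per_gb = 1024 ** 2
    # Assumption: "max" in the original report means peak virtual size (VmPeak)
    fields = (("rss", "VmRSS"), ("vms", "VmSize"), ("max", "VmPeak"))
    snapshot = {}
    for name, field in fields:
        kb = read_status_kb(field)
        snapshot[name] = None if kb is None else kb / kb_per_gb
    parts = " ".join(
        "%s=%s" % (name, "n/a" if value is None else "%.1f" % value)
        for name, value in snapshot.items()
    )
    print("Memory snapshot (%s); %s GB" % (label, parts))
    return snapshot


if __name__ == "__main__":
    memory_snapshot("baseline")
    # To reproduce the report, uncomment the following lines:
    # import pyarrow; memory_snapshot("after import pyarrow")
    # import pyarrow.fs; memory_snapshot("after import pyarrow.fs")
```

Calling `memory_snapshot()` before and after `import pyarrow.fs` should make any jump in `vms` visible without the private helper. On platforms without `/proc`, the values print as `n/a`.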



--
This message was sent by Atlassian Jira
(v8.20.10#820010)