Posted to jira@arrow.apache.org by "Mikhail (Jira)" <ji...@apache.org> on 2021/07/20 13:16:00 UTC

[jira] [Updated] (ARROW-13406) [Python] pyarrow.array memory leak on large string arrays

     [ https://issues.apache.org/jira/browse/ARROW-13406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mikhail updated ARROW-13406:
----------------------------
    Environment: 
Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
Python 3.7.6

Darwin  19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
Python 3.8.6

  was:
Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux

Darwin  19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64


> [Python] pyarrow.array memory leak on large string arrays
> ---------------------------------------------------------
>
>                 Key: ARROW-13406
>                 URL: https://issues.apache.org/jira/browse/ARROW-13406
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 4.0.1
>         Environment: Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28) x86_64 GNU/Linux
> Python 3.7.6
> Darwin  19.6.0 Darwin Kernel Version 19.6.0: Mon Aug 31 22:12:52 PDT 2020; root:xnu-6153.141.2~1/RELEASE_X86_64 x86_64
> Python 3.8.6
>            Reporter: Mikhail
>            Priority: Major
>
> For large inputs (starting at roughly 500 MB), the `pyarrow.array` constructor hangs and keeps consuming memory until the process is killed (by hand or by the OOM killer).
> {code:python}
> import pyarrow as pa
> my_string = 'a' * 40
> strings = [my_string for _ in range(100_000_000)]
> pyarrow_array = pa.array(strings[:50_000_000]) # this completes in a couple of seconds
> pyarrow_array = pa.array(strings[:60_000_000]) # this hangs and consumes all free memory
> {code}
>  
> With pyarrow==3.0.0 the same code runs without problems.
>  
>  
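A note on the two slice sizes in the reproducer: Arrow's default `string` type stores offsets as signed 32-bit integers, so a single array can address just under 2 GiB of character data. The sketch below only does the arithmetic for the two cases in the report (40-character ASCII strings) and shows that they fall on opposite sides of that limit. Whether this boundary has anything to do with the hang is an assumption; the report does not establish it. The helper names `payload_bytes` and `fits_int32_offsets` are hypothetical and exist only for illustration.

```python
# Sketch: compare the character-data size of each slice from the report
# against the largest offset a signed 32-bit integer can hold.
# Pure arithmetic; pyarrow is not needed to run this.

INT32_MAX = 2**31 - 1  # 2_147_483_647


def payload_bytes(n_strings: int, string_len: int = 40) -> int:
    """Total bytes of character data for n_strings ASCII strings."""
    return n_strings * string_len


def fits_int32_offsets(n_strings: int, string_len: int = 40) -> bool:
    """True if the final offset still fits in a signed 32-bit integer."""
    return payload_bytes(n_strings, string_len) <= INT32_MAX


for n in (50_000_000, 60_000_000):
    # Print the slice size, its payload in bytes, and whether it fits.
    print(n, payload_bytes(n), fits_int32_offsets(n))
```

If the boundary does turn out to matter, one thing to try would be passing `type=pa.large_string()` to `pa.array`, since that type uses 64-bit offsets. This has not been verified against the reported hang.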



--
This message was sent by Atlassian Jira
(v8.3.4#803005)