Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/03/04 01:59:00 UTC
[jira] [Created] (ARROW-11855) [C++] [Python] Memory leak in to_pandas when converting chunked struct array
Weston Pace created ARROW-11855:
-----------------------------------
Summary: [C++] [Python] Memory leak in to_pandas when converting chunked struct array
Key: ARROW-11855
URL: https://issues.apache.org/jira/browse/ARROW-11855
Project: Apache Arrow
Issue Type: Bug
Components: C++, Python
Reporter: Weston Pace
Assignee: Weston Pace
Reproduction from [~shadowdsp]:
{code:java}
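import io

import pandas as pd
import pyarrow as pa

# Have jemalloc return freed memory to the OS immediately, so the
# profiler output is not skewed by memory jemalloc is still holding.
pa.jemalloc_set_decay_ms(0)

import pyarrow.parquet as pq
from memory_profiler import profile


@profile
def read_file(f):
    table = pq.read_table(f)
    df = table.to_pandas(strings_to_categorical=True)
    # Drop both objects; their memory should be released after this point.
    del table
    del df


def main():
    rows = 2000000
    df = pd.DataFrame({
        # Column of dicts, which Arrow stores as a struct array.
        "string": [{"test": [1, 2], "test1": [3, 4]}] * rows,
        "int": [5] * rows,
        "float": [2.0] * rows,
    })
    table = pa.Table.from_pandas(df, preserve_index=False)
    parquet_stream = io.BytesIO()
    pq.write_table(table, parquet_stream)
    # Read the same in-memory file several times so that memory growth
    # between iterations shows up in the profiler output.
    for i in range(3):
        parquet_stream.seek(0)
        read_file(parquet_stream)


if __name__ == '__main__':
    main()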
{code}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)