Posted to jira@arrow.apache.org by "Ziheng Wang (Jira)" <ji...@apache.org> on 2022/03/27 03:33:00 UTC
[jira] [Created] (ARROW-16037) Possible memory leak in compute.take
Ziheng Wang created ARROW-16037:
-----------------------------------
Summary: Possible memory leak in compute.take
Key: ARROW-16037
URL: https://issues.apache.org/jira/browse/ARROW-16037
Project: Apache Arrow
Issue Type: Bug
Affects Versions: 6.0.1
Environment: Ubuntu
Reporter: Ziheng Wang
Running the following code drives the process's memory usage (RSS) up to about 1 GB, even though pyarrow's allocated bytes stay at ~80 MB throughout. The process memory eventually drops to about 800 MB, but that is still far more than necessary.
'''
import pyarrow as pa
import numpy as np
import pandas as pd
import os, psutil
import pyarrow.compute as compute
import gc
my_table = pa.Table.from_pandas(pd.DataFrame(np.random.normal(size=(10000,1000))))
process = psutil.Process(os.getpid())
print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
for i in range(100):
    print("mem usage", process.memory_info().rss, pa.total_allocated_bytes())
    # Sort by column '0', then materialize a reordered copy of the whole table.
    temp = compute.sort_indices(my_table['0'], sort_keys=[('0', 'ascending')])
    my_table = my_table.take(temp)
    gc.collect()
'''
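One way to narrow this down might be to check how much of the RSS is not accounted for by Arrow's own counter, and whether asking the memory pool to release unused memory changes it. The sketch below is only an illustration: `unaccounted_bytes`, `report`, and `probe_arrow_pool` are hypothetical helper names, and it assumes the installed pyarrow exposes `MemoryPool.release_unused()` and `MemoryPool.backend_name`. A large gap that shrinks after the release would point at allocator retention rather than a true leak, but that is an assumption to verify, not a conclusion.

```python
import gc


def unaccounted_bytes(rss: int, arrow_allocated: int) -> int:
    """RSS not explained by Arrow's allocation counter (never negative)."""
    return max(rss - arrow_allocated, 0)


def report(label: str, rss: int, arrow_allocated: int) -> str:
    """Format one measurement in MiB for easy comparison."""
    mib = 2 ** 20
    gap = unaccounted_bytes(rss, arrow_allocated)
    return (f"{label}: rss={rss / mib:.0f} MiB, "
            f"arrow={arrow_allocated / mib:.0f} MiB, gap={gap / mib:.0f} MiB")


def probe_arrow_pool():
    """Hypothetical probe; needs pyarrow and psutil, imported lazily."""
    import os
    import psutil
    import pyarrow as pa

    process = psutil.Process(os.getpid())
    pool = pa.default_memory_pool()
    before = report("before release",
                    process.memory_info().rss, pool.bytes_allocated())
    gc.collect()
    # Best effort: ask the pool to return unused memory to the OS.
    # Assumes this pyarrow version provides release_unused().
    pool.release_unused()
    after = report("after release",
                   process.memory_info().rss, pool.bytes_allocated())
    return pool.backend_name, before, after
```

Calling `probe_arrow_pool()` right after the loop above would print-friendly return the allocator backend in use together with the two measurements, which could be attached to this issue.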