Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/08/18 12:33:00 UTC

[jira] [Created] (ARROW-13654) [C++][Parquet] Appending a FileMetaData object to itself explodes memory

Joris Van den Bossche created ARROW-13654:
---------------------------------------------

             Summary: [C++][Parquet] Appending a FileMetaData object to itself explodes memory
                 Key: ARROW-13654
                 URL: https://issues.apache.org/jira/browse/ARROW-13654
             Project: Apache Arrow
          Issue Type: Bug
          Components: C++, Parquet
            Reporter: Joris Van den Bossche


Write a tiny Parquet file and read its metadata back to obtain a FileMetaData object, then append that object to itself:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'a': [1, 2, 3], 'b': [4, 5, 6]})
pq.write_table(table, "test_file_for_metadata.parquet")
metadata = pq.read_metadata("test_file_for_metadata.parquet")

metadata.append_row_groups(metadata)
{code}

The last line (appending the metadata object to itself) keeps running with increasing memory usage (I killed the process when it was using 10 GB).

This is not a useful thing to do, but I still wouldn't expect it to blow up, since one can do it accidentally (I was actually trying it in an attempt to create a large FileMetaData object).
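
Until this is handled in the library, a caller-side guard may be enough to avoid the accidental case. The following is only a minimal sketch: `append_row_groups_checked` is a hypothetical helper, not part of pyarrow, and it relies on nothing beyond the `append_row_groups` method used above. It catches only the case where both arguments are the same Python object, as in the reproducer.

```python
def append_row_groups_checked(target, other):
    """Append other's row groups to target, refusing a self-append.

    Hypothetical helper, not part of pyarrow. Both arguments are expected
    to be pyarrow.parquet.FileMetaData objects, but only their
    `append_row_groups` method is used here.
    """
    # Identity check: this only detects the exact same Python object
    # being passed twice, which is the situation in the reproducer.
    if target is other:
        raise ValueError("refusing to append a FileMetaData object to itself")
    target.append_row_groups(other)
    return target
```

With this in place, `append_row_groups_checked(metadata, metadata)` would raise a `ValueError` instead of running indefinitely. To build a larger FileMetaData object, calling `pq.read_metadata(...)` a second time and appending that second object should avoid the self-append entirely, on the assumption that each call returns an independent object.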



--
This message was sent by Atlassian Jira
(v8.3.4#803005)