Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2019/09/20 10:39:00 UTC

[jira] [Created] (ARROW-6642) [Python] chained access of ParquetDataset's metadata segfaults

Joris Van den Bossche created ARROW-6642:
--------------------------------------------

             Summary: [Python] chained access of ParquetDataset's metadata segfaults
                 Key: ARROW-6642
                 URL: https://issues.apache.org/jira/browse/ARROW-6642
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Joris Van den Bossche


Creating and reading a Parquet dataset:

{code}
import pyarrow as pa
import pyarrow.parquet as pq

table = pa.table({'a': [1, 2, 3]})
pq.write_table(table, '__test_statistics_segfault.parquet')
dataset = pq.ParquetDataset('__test_statistics_segfault.parquet')
dataset_piece = dataset.pieces[0]
{code}

If you access the metadata and a column's statistics in steps, this works fine:

{code}
meta = dataset_piece.get_metadata()
row = meta.row_group(0)
col = row.column(0)
{code}

but chaining the same calls in a single expression segfaults:

{code}
dataset_piece.get_metadata().row_group(0).column(0)
{code}

{{dataset_piece.get_metadata().row_group(0)}} on its own still works; the segfault only occurs once {{.column(0)}} is added to the chain.
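
Until this is fixed, a possible workaround is to mirror the stepwise access that works above and keep every intermediate object referenced for as long as the column metadata is in use. The sketch below assumes the crash is related to the intermediate objects being released too early, which is a guess and not something confirmed in this report. The names {{ColumnChunkHandle}} and {{get_column_metadata}} are hypothetical helpers, not part of pyarrow; only {{get_metadata()}}, {{row_group()}} and {{column()}} come from the example above.

```python
from typing import Any, NamedTuple


class ColumnChunkHandle(NamedTuple):
    """Hypothetical container that keeps the parent objects referenced."""
    meta: Any       # result of piece.get_metadata()
    row_group: Any  # result of meta.row_group(i)
    column: Any     # result of row_group.column(j)


def get_column_metadata(piece, row_group_index=0, column_index=0):
    """Access the column metadata in separate steps instead of one chain.

    Assumption: holding references to the intermediate objects avoids the
    crash, as the stepwise example in this report suggests.
    """
    meta = piece.get_metadata()
    row_group = meta.row_group(row_group_index)
    column = row_group.column(column_index)
    # Return the parents together with the column so they stay referenced
    # for as long as the caller holds on to the handle.
    return ColumnChunkHandle(meta, row_group, column)
```

With the dataset from the first snippet this would be called as {{handle = get_column_metadata(dataset_piece)}}, with the column metadata then available as {{handle.column}}.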



--
This message was sent by Atlassian Jira
(v8.3.4#803005)