Posted to dev@arrow.apache.org by "Ying Wang (JIRA)" <ji...@apache.org> on 2018/09/10 19:19:00 UTC

[jira] [Created] (ARROW-3208) Segmentation fault when reading a Parquet partitioned dataset to a Parquet file

Ying Wang created ARROW-3208:
--------------------------------

             Summary: Segmentation fault when reading a Parquet partitioned dataset to a Parquet file
                 Key: ARROW-3208
                 URL: https://issues.apache.org/jira/browse/ARROW-3208
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.9.0
         Environment: Ubuntu 16.04 LTS; System76 Oryx Pro
            Reporter: Ying Wang


Steps to reproduce:
 # Create a partitioned dataset with the following code:

```python
import numpy as np
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0],
})
table = pa.Table.from_pandas(df)

pq.write_to_dataset(table, root_path='/home/yingw787/misc/example_dataset', partition_cols=['one', 'two'])
```
 # Read the partitioned dataset into a PyArrow Table, then write that table to a single Parquet file:

```python
import pyarrow.parquet as pq

table = pq.ParquetDataset('/path/to/dataset').read()
pq.write_table(table, '/path/to/example.parquet')
```

EXPECTED:
 * Successful write

GOT:
 * Segmentation fault
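
A segmentation fault kills the interpreter outright, so it can be convenient to run both steps in a child process and inspect its exit status. The following is a minimal sketch of that approach, not part of the original report. The names `REPRO_SCRIPT`, `run_isolated`, and `describe` are hypothetical helpers. The sketch uses a temporary directory in place of the paths above and assumes step 2 reads the dataset written in step 1. The outcome depends on the installed pyarrow version, and the sketch has not been checked against 0.9.0.

```python
import signal
import subprocess
import sys
import tempfile

# Both reproduction steps as one child-process script (hypothetical wrapper).
# The working directory is passed as argv[1] so no machine-specific path is used.
REPRO_SCRIPT = """
import os, sys
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

root = sys.argv[1]
dataset_dir = os.path.join(root, 'example_dataset')
df = pd.DataFrame({
    'one': [-1, 10, 2.5, 100, 1000, 1, 29.2],
    'two': [-1, 10, 2, 100, 1000, 1, 11],
    'three': [0, 0, 0, 0, 0, 0, 0],
})
pq.write_to_dataset(pa.Table.from_pandas(df), root_path=dataset_dir,
                    partition_cols=['one', 'two'])
table = pq.ParquetDataset(dataset_dir).read()
pq.write_table(table, os.path.join(root, 'example.parquet'))
"""


def run_isolated(code, *args):
    """Run `code` in a fresh interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code, *args])
    return result.returncode


def describe(returncode):
    """Translate an exit status into a short human-readable outcome."""
    if returncode == 0:
        return "completed without error"
    if returncode < 0:
        # On POSIX, a negative status means the child was killed by a signal.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(describe(run_isolated(REPRO_SCRIPT, root)))
```

On a POSIX system, a crash like the one reported here would be expected to show up as `killed by SIGSEGV`, while a missing dependency or a Python exception in the child would show up as a positive exit status.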

Issue reference on GitHub mirror: https://github.com/apache/arrow/issues/2511


