Posted to issues@arrow.apache.org by "Jarno Seppanen (JIRA)" <ji...@apache.org> on 2017/08/31 08:51:00 UTC

[jira] [Created] (ARROW-1440) Segmentation fault after loading parquet file to pandas dataframe

Jarno Seppanen created ARROW-1440:
-------------------------------------

             Summary: Segmentation fault after loading parquet file to pandas dataframe
                 Key: ARROW-1440
                 URL: https://issues.apache.org/jira/browse/ARROW-1440
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 0.6.0
         Environment: ubuntu 16.04.2
            Reporter: Jarno Seppanen
         Attachments: part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet

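Reading the attached Parquet file into a pandas DataFrame and then printing the DataFrame causes a segmentation fault. As the session below shows, len(df) and df.info() both succeed; the crash occurs on print(df).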

{noformat}
Python 3.5.3 |Continuum Analytics, Inc.| (default, Mar  6 2017, 11:58:13) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
>>> import pyarrow
>>> import pyarrow.parquet as pq
>>> pyarrow.__version__
'0.6.0'
>>> df = pq.read_table('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet') \
...        .to_pandas()
>>> len(df)
69
>>> df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 69 entries, 0 to 68
Data columns (total 6 columns):
label               69 non-null int32
account_meta        69 non-null object
features_type       69 non-null int32
features_size       69 non-null int32
features_indices    1 non-null object
features_values     1 non-null object
dtypes: int32(3), object(3)
memory usage: 2.5+ KB
>>> 
>>> print(df)
Segmentation fault (core dumped)
{noformat}
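
One possible way to narrow this down is to print each column in a separate child interpreter, so that a crash on one column does not take down the whole session. The following is only a sketch under assumptions: the helper names (run_isolated, crashed_with_segfault, column_snippet, find_crashing_columns) are made up for illustration, the column names are copied from the df.info() output above, and it relies on POSIX behaviour where a child killed by a signal reports a negative return code. It does not claim to identify the actual cause of the crash.

```python
import signal
import subprocess
import sys


def run_isolated(snippet):
    """Run a Python snippet in a child interpreter and return its exit status.

    On POSIX, a child killed by signal N yields a return code of -N.
    """
    result = subprocess.run(
        [sys.executable, "-c", snippet],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode


def crashed_with_segfault(returncode):
    """True if the child process was terminated by SIGSEGV (POSIX only)."""
    return returncode == -signal.SIGSEGV


def column_snippet(path, column):
    """Build a snippet that loads a single column and prints it."""
    return (
        "import pyarrow.parquet as pq\n"
        "df = pq.read_table({!r}, columns=[{!r}]).to_pandas()\n"
        "print(df)\n"
    ).format(path, column)


def find_crashing_columns(path, columns):
    """Return the columns whose isolated load-and-print ends in SIGSEGV."""
    return [
        column
        for column in columns
        if crashed_with_segfault(run_isolated(column_snippet(path, column)))
    ]


# Column names as listed by df.info() in the report.
REPORTED_COLUMNS = [
    "label",
    "account_meta",
    "features_type",
    "features_size",
    "features_indices",
    "features_values",
]
```

With the attached file in the working directory, calling find_crashing_columns('part-00000-6570e34b-b42c-4a39-8adf-21d3a97fb87d.snappy.parquet', REPORTED_COLUMNS) would list any columns that segfault when loaded and printed on their own. An empty list would suggest the crash only shows up when several columns are combined.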


