Posted to issues@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2018/12/19 04:46:00 UTC

[jira] [Updated] (ARROW-2592) [Python] Error reading old Parquet file due to metadata backwards compatibility issue

     [ https://issues.apache.org/jira/browse/ARROW-2592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney updated ARROW-2592:
--------------------------------
    Summary: [Python] Error reading old Parquet file due to metadata backwards compatibility issue  (was: [Python] AssertionError in to_pandas())

> [Python] Error reading old Parquet file due to metadata backwards compatibility issue
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-2592
>                 URL: https://issues.apache.org/jira/browse/ARROW-2592
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0, 0.9.0, 0.10.0, 0.11.0, 0.11.1
>            Reporter: Dima Ryazanov
>            Assignee: Wes McKinney
>            Priority: Major
>              Labels: parquet
>             Fix For: 0.12.0
>
>
> pyarrow 0.8 and 0.9 raise an AssertionError for one of my datasets (created with an older version of pyarrow). Steps to reproduce:
> {{In [1]: from pyarrow.parquet import ParquetDataset}}
> {{In [2]: d = ParquetDataset(['bug.parq'])}}
> {{In [3]: t = d.read()}}
> {{In [4]: t.to_pandas()}}
> {{---------------------------------------------------------------------------}}
> {{AssertionError                            Traceback (most recent call last)}}
> {{<ipython-input-4-d17c9e2818f1> in <module>()}}
> {{----> 1 t.to_pandas()}}
> {{table.pxi in pyarrow.lib.Table.to_pandas()}}
> {{~/envs/cli3/lib/python3.6/site-packages/pyarrow/pandas_compat.py in table_to_blockmanager(options, table, memory_pool, nthreads, categories)}}
> {{    529     # There must be the same number of field names and physical names}}
> {{    530     # (fields in the arrow Table)}}
> {{--> 531     assert len(logical_index_names) == len(index_columns_set)}}
> {{    532 }}
> {{    533     # It can never be the case in a released version of pyarrow that}}
> {{AssertionError: }}
>  
> Here's the file: [https://www.dropbox.com/s/oja3khjsc5tycfh/bug.parq]
> (I was not able to attach it here due to a "missing token", whatever that means.)
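The comment in the traceback says the assertion requires the number of logical index names to equal the number of physical index columns in the Arrow table. Both come from the JSON that pyarrow stores under the `b'pandas'` key of the Parquet schema metadata, so a file written by an older pyarrow with a different metadata layout can trip it. A rough diagnostic is to inspect that JSON directly. This is a sketch only. It assumes a pyarrow version that provides `pyarrow.parquet.read_schema`, and it assumes each `columns` entry has a `field_name` key (newer writers) or only a `name` key (assumed for older writers). `load_pandas_metadata` and `unmatched_index_columns` are hypothetical helpers, not part of pyarrow, and this is not the fix that went into 0.12.0.

```python
import json
from typing import Any, Dict, List


def load_pandas_metadata(path: str) -> Dict[str, Any]:
    """Return the JSON stored under the b'pandas' schema metadata key.

    pyarrow is imported lazily so the pure check below runs without it.
    Returns an empty dict if the file carries no pandas metadata.
    """
    import pyarrow.parquet as pq

    schema = pq.read_schema(path)
    metadata = schema.metadata or {}
    raw = metadata.get(b"pandas")
    if raw is None:
        return {}
    return json.loads(raw.decode("utf8"))


def unmatched_index_columns(meta: Dict[str, Any]) -> List[str]:
    """List index columns that have no matching entry in meta['columns'].

    Hypothetical helper. Assumes each 'columns' entry has 'field_name'
    (newer writers) or only 'name' (assumed for older writers). Non-string
    index entries (assumed to be index descriptors) are skipped.
    """
    known = set()
    for col in meta.get("columns", []):
        # Prefer the physical field name, fall back to the logical name.
        known.add(col.get("field_name", col.get("name")))
    return [
        name
        for name in meta.get("index_columns", [])
        if isinstance(name, str) and name not in known
    ]
```

For the attached file this would be called as `unmatched_index_columns(load_pandas_metadata("bug.parq"))`. A non-empty result suggests the metadata describes index columns that `to_pandas()` cannot pair up, which is the kind of inconsistency the assertion rejects.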



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)