Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/11/11 16:55:59 UTC

[jira] [Assigned] (ARROW-375) columns parameter in parquet.read_table() raises KeyError for valid column

     [ https://issues.apache.org/jira/browse/ARROW-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wes McKinney reassigned ARROW-375:
----------------------------------

    Assignee: Wes McKinney

> columns parameter in parquet.read_table() raises KeyError for valid column
> --------------------------------------------------------------------------
>
>                 Key: ARROW-375
>                 URL: https://issues.apache.org/jira/browse/ARROW-375
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Christopher Aycock
>            Assignee: Wes McKinney
>
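> Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, reading the whole file works, but passing a valid column name via the columns parameter raises a KeyError: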
> {code:none}
> In [1]: from pyarrow import parquet
> In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
> In [3]: t.to_pandas()
> Out[3]: 
>    age name
> 0    1    A
> 1    2    B
> 2    3    C
> In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> ---------------------------------------------------------------------------
> KeyError                                  Traceback (most recent call last)
> <ipython-input-4-5cf213819489> in <module>()
> ----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
>     143         return reader.read_all()
>     144     else:
> --> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
>     146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
>     147         return Table.from_arrays(columns, arrays)
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
>     102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
>     103 
> --> 104         return self.column_idx_map[column_name]
>     105 
>     106     def read_column(self, int column_index):
> KeyError: 'age'
> {code}
> This happens on both Mac and Linux.
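
The traceback shows column_name_idx building its lookup map with str(...) around the column path string. One possible cause, which is an assumption and is not confirmed anywhere in this message, is that under Python 3 the path arrives as a bytes object, so str() produces the key "b'age'" rather than 'age'. The sketch below reproduces that behaviour in pure Python without pyarrow; raw_names, broken_map, fixed_map and the helper function are hypothetical stand-ins, not the actual pyarrow code.

```python
# Hypothetical stand-in for the column paths returned by the Parquet schema.
# Assumption: under Python 3 they reach Python as bytes objects.
raw_names = [b"age", b"name"]

# Wrapping bytes in str() on Python 3 yields the repr, e.g. "b'age'",
# so the lookup key no longer matches the name the user passes in.
broken_map = {str(name): i for i, name in enumerate(raw_names)}

# Decoding the bytes first gives plain text keys such as 'age'.
fixed_map = {name.decode("utf-8"): i for i, name in enumerate(raw_names)}


def column_name_idx(column_idx_map, column_name):
    """Look up a column index by name, mirroring the failing dict access."""
    return column_idx_map[column_name]


try:
    column_name_idx(broken_map, "age")
    lookup_failed = False
except KeyError:
    # Same failure mode as in the report: KeyError: 'age'
    lookup_failed = True

age_idx = column_name_idx(fixed_map, "age")
```

If this assumption holds, decoding the path before using it as a dictionary key would let columns=['age'] resolve to the expected index.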


