Posted to dev@arrow.apache.org by "Wes McKinney (JIRA)" <ji...@apache.org> on 2016/11/11 16:55:59 UTC
[jira] [Assigned] (ARROW-375) columns parameter in parquet.read_table() raises KeyError for valid column
[ https://issues.apache.org/jira/browse/ARROW-375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney reassigned ARROW-375:
----------------------------------
Assignee: Wes McKinney
> columns parameter in parquet.read_table() raises KeyError for valid column
> --------------------------------------------------------------------------
>
> Key: ARROW-375
> URL: https://issues.apache.org/jira/browse/ARROW-375
> Project: Apache Arrow
> Issue Type: Bug
> Components: Python
> Reporter: Christopher Aycock
> Assignee: Wes McKinney
>
> Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, I see the following:
> {code:none}
> In [1]: from pyarrow import parquet
> In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
> In [3]: t.to_pandas()
> Out[3]:
>    age name
> 0    1    A
> 1    2    B
> 2    3    C
> In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> ---------------------------------------------------------------------------
> KeyError Traceback (most recent call last)
> <ipython-input-4-5cf213819489> in <module>()
> ----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
> 143 return reader.read_all()
> 144 else:
> --> 145 column_idxs = [reader.column_name_idx(column) for column in columns]
> 146 arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
> 147 return Table.from_arrays(columns, arrays)
> /Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
> 102 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
> 103
> --> 104 return self.column_idx_map[column_name]
> 105
> 106 def read_column(self, int column_index):
> KeyError: 'age'
> {code}
> This happens on both Mac and Linux.
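The traceback shows how read_table(..., columns=...) resolves names: ParquetReader.column_name_idx fills column_idx_map with keys built by str(...ToDotString()), then indexes that map with the requested name, so any mismatch in how the keys are spelled surfaces as KeyError: 'age'. One plausible explanation, an assumption this report does not confirm, is that on Python 3 (the build path points at a 3.5 build) the path string reaches str() as bytes, so the keys become "b'age'" instead of 'age'. The pure-Python sketch below mimics that lookup without pyarrow; build_column_idx_map, column_name_idx and raw_paths here are hypothetical stand-ins, not the pyarrow API.

```python
# Hypothetical sketch: mimics the name -> index lookup from the traceback
# (lines 102-104 of parquet.pyx) without depending on pyarrow or parquet-cpp.
# Assumption: schema path strings arrive as bytes, as a C++ std::string
# typically does when Cython converts it under Python 3.


def build_column_idx_map(raw_paths, decode):
    """Map each column path to its index; `decode` selects the key conversion."""
    idx_map = {}
    for i, path in enumerate(raw_paths):
        # On Python 3, str(b'age') gives "b'age'", while b'age'.decode() gives 'age'.
        key = path.decode('utf-8') if decode else str(path)
        idx_map[key] = i
    return idx_map


def column_name_idx(idx_map, column_name):
    # Same plain dict lookup as line 104: a missing key raises KeyError.
    return idx_map[column_name]


raw_paths = [b'age', b'name']  # hypothetical schema paths for sample.parquet

buggy = build_column_idx_map(raw_paths, decode=False)
print(sorted(buggy))  # ["b'age'", "b'name'"]
try:
    column_name_idx(buggy, 'age')
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'age'

fixed = build_column_idx_map(raw_paths, decode=True)
print([column_name_idx(fixed, c) for c in ['age']])  # [0]
```

If this hypothesis holds, decoding the path to text before using it as a key would let columns=['age'] resolve to index 0 as expected.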
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)