Posted to dev@arrow.apache.org by "Christopher Aycock (JIRA)" <ji...@apache.org> on 2016/11/11 16:38:58 UTC
[jira] [Created] (ARROW-375) columns parameter in parquet.read_table() raises KeyError for valid column
Christopher Aycock created ARROW-375:
----------------------------------------
Summary: columns parameter in parquet.read_table() raises KeyError for valid column
Key: ARROW-375
URL: https://issues.apache.org/jira/browse/ARROW-375
Project: Apache Arrow
Issue Type: Bug
Components: Python
Reporter: Christopher Aycock
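Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, reading the whole table works, but passing a valid column name through the columns parameter raises a KeyError: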
{code:none}
In [1]: from pyarrow import parquet
In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')
In [3]: t.to_pandas()
Out[3]:
   age name
0    1    A
1    2    B
2    3    C
In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-4-5cf213819489> in <module>()
----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
143 return reader.read_all()
144 else:
--> 145 column_idxs = [reader.column_name_idx(column) for column in columns]
146 arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
147 return Table.from_arrays(columns, arrays)
/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
102 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
103
--> 104 return self.column_idx_map[column_name]
105
106 def read_column(self, int column_index):
KeyError: 'age'
{code}
This happens on both Mac and Linux.
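
One possible explanation, which I have not confirmed: the traceback shows a Python 3.5 build, and line 102 builds column_idx_map with str(...) around the result of ToDotString(). If that result reaches Python as bytes (an assumption on my part), then str() yields the repr "b'age'" rather than "age", and the lookup on line 104 would fail exactly as shown. The sketch below only illustrates that behavior in plain Python. raw_names is a hypothetical stand-in for the values coming from the Parquet schema, and it does not call pyarrow at all.

```python
# Assumption: on Python 3 the column path arrives from Cython as bytes.
# raw_names is a hypothetical stand-in for the ToDotString() results.
raw_names = [b"age", b"name"]

# Same pattern as line 102 of the traceback: str() applied to each name.
# On Python 3, str(b"age") is the repr "b'age'", not "age".
str_keyed = {str(name): i for i, name in enumerate(raw_names)}

# Alternative: decode explicitly so the keys match what callers pass in.
decoded = {name.decode("utf-8"): i for i, name in enumerate(raw_names)}

print(sorted(str_keyed))   # ["b'age'", "b'name'"]
print("age" in str_keyed)  # False, so str_keyed["age"] raises KeyError
print(decoded["age"])      # 0
```

If this is the cause, decoding the name before using it as a key (or encoding the requested column name before the lookup) should make columns=['age'] resolve.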