Posted to dev@arrow.apache.org by "Christopher Aycock (JIRA)" <ji...@apache.org> on 2016/11/11 16:38:58 UTC

[jira] [Created] (ARROW-375) columns parameter in parquet.read_table() raises KeyError for valid column

Christopher Aycock created ARROW-375:
----------------------------------------

             Summary: columns parameter in parquet.read_table() raises KeyError for valid column
                 Key: ARROW-375
                 URL: https://issues.apache.org/jira/browse/ARROW-375
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
            Reporter: Christopher Aycock


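Using arrow commit 4fa7ac4 and parquet-cpp commit 0024665, passing a valid column name via the columns parameter of parquet.read_table() raises a KeyError, even though the same column is present when the whole file is read: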

{code:none}
In [1]: from pyarrow import parquet

In [2]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet')

In [3]: t.to_pandas()
Out[3]: 
   age name
0    1    A
1    2    B
2    3    C

In [4]: t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-4-5cf213819489> in <module>()
----> 1 t = parquet.read_table('/Users/christophercaycock/Desktop/sample.parquet', columns=['age'])

/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.read_table (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2693)()
    143         return reader.read_all()
    144     else:
--> 145         column_idxs = [reader.column_name_idx(column) for column in columns]
    146         arrays = [reader.read_column(column_idx) for column_idx in column_idxs]
    147         return Table.from_arrays(columns, arrays)

/Users/christophercaycock/Desktop/arrow/python/pyarrow/parquet.pyx in pyarrow.parquet.ParquetReader.column_name_idx (/Users/christophercaycock/Desktop/arrow/python/build/temp.macosx-10.6-x86_64-3.5/parquet.cxx:2232)()
    102                 self.column_idx_map[str(metadata.schema().Column(i).path().get().ToDotString())] = i
    103 
--> 104         return self.column_idx_map[column_name]
    105 
    106     def read_column(self, int column_index):

KeyError: 'age'
{code}

This happens on both Mac and Linux.
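
One possible explanation, offered as an assumption rather than something verified against the commits above: line 102 of the traceback builds the map keys with str(...ToDotString()). If Cython hands the C++ std::string back as a Python 3 bytes object, str() would produce the repr "b'age'" instead of 'age', so the lookup on line 104 would miss. The sketch below reproduces that pattern in plain Python without pyarrow; raw_column_paths, buggy_map, fixed_map and lookup are hypothetical names used only for illustration.

```python
# Hypothetical stand-in for what ToDotString() might yield under Python 3,
# ASSUMING Cython converts the C++ std::string to a bytes object.
raw_column_paths = [b"age", b"name"]

# Mirrors the pattern on line 102 of the traceback: str() applied to bytes
# returns the repr (for example "b'age'"), not the decoded text.
buggy_map = {str(path): i for i, path in enumerate(raw_column_paths)}

# Decoding the bytes explicitly gives the keys a caller would pass in.
fixed_map = {path.decode("utf-8"): i for i, path in enumerate(raw_column_paths)}


def lookup(column_idx_map, column_name):
    """Return the column index, or None if the name is not a key."""
    return column_idx_map.get(column_name)


print(sorted(buggy_map))         # keys as built by the str() pattern
print(lookup(buggy_map, "age"))  # lookup with the plain column name
print(lookup(fixed_map, "age"))  # lookup after decoding the bytes
```

If this is the cause, it would also be consistent with the failure appearing on both Mac and Linux, since it would depend on the Python version rather than the platform.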


