Posted to jira@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2021/01/01 03:38:00 UTC

[jira] [Created] (ARROW-11095) [Python] Access pyarrow.RecordBatch column by name

Will Jones created ARROW-11095:
----------------------------------

             Summary: [Python] Access pyarrow.RecordBatch column by name
                 Key: ARROW-11095
                 URL: https://issues.apache.org/jira/browse/ARROW-11095
             Project: Apache Arrow
          Issue Type: Improvement
          Components: Python
            Reporter: Will Jones


I propose adding support for selecting a column out of a pyarrow.RecordBatch using both __getitem__() and .field(), like we have in pyarrow.Table.

pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have filter and take methods and a schema), but I got tripped up on this difference. pyarrow.Table supports accessing columns by name using both __getitem__ and .field():
{code:python}
import pyarrow as pa

my_array = pa.array(range(10))
table = pa.Table.from_arrays([my_array], names=['my_column'])

# Both of these work on table:
table['my_column']
table.field('my_column')
{code}
Meanwhile, pyarrow.RecordBatch doesn't support either of those. In fact, I had a hard time finding a way to grab a column by name from a RecordBatch without first looking up the integer index.


