Posted to jira@arrow.apache.org by "ASF GitHub Bot (Jira)" <ji...@apache.org> on 2021/01/01 22:47:00 UTC

[jira] [Updated] (ARROW-11095) [Python] Access pyarrow.RecordBatch column by name

     [ https://issues.apache.org/jira/browse/ARROW-11095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated ARROW-11095:
-----------------------------------
    Labels: pull-request-available  (was: )

> [Python] Access pyarrow.RecordBatch column by name
> --------------------------------------------------
>
>                 Key: ARROW-11095
>                 URL: https://issues.apache.org/jira/browse/ARROW-11095
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Will Jones
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> I propose adding support for selecting a column out of a pyarrow.RecordBatch using both __getitem__() and .field(), like we have in pyarrow.Table.
> pyarrow.RecordBatch has a pretty similar API to pyarrow.Table (e.g. both have filter and take methods and a schema), but I got tripped up on this difference. pyarrow.Table supports accessing columns by name using both __getitem__ and .field():
> {code:python}
> import pyarrow as pa
>
> my_array = pa.array(range(10))
> table = pa.Table.from_arrays([my_array], names=['my_column'])
> # Both of these work on table:
> table['my_column']
> table.field('my_column')
> {code}
> Meanwhile, pyarrow.RecordBatch doesn't support either of those. In fact, I had a hard time finding a way to grab a column by name from a RecordBatch without first looking up the integer index.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)