Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/06/04 15:24:00 UTC
[jira] [Comment Edited] (ARROW-12970) efficient "row accessor" for a pyarrow Table
[ https://issues.apache.org/jira/browse/ARROW-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357421#comment-17357421 ]
Antoine Pitrou edited comment on ARROW-12970 at 6/4/21, 3:23 PM:
-----------------------------------------------------------------
Ironically, an efficient solution might be simply to convert Table slices to Python objects and iterate over each slice. Something like (untested):
{code:python}
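def itertuples(table):
    chunk_size = 1024
    for i in range(0, table.num_rows, chunk_size):
        # to_pydict() returns {column name: list of values}
        columns = table[i:i + chunk_size].to_pydict().values()
        # zip(*columns) transposes the column lists into row tuples
        yield from zip(*columns)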
{code}
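A slightly fuller sketch of the same idea, closer to what pandas.DataFrame.itertuples returns, could yield namedtuples instead of plain tuples. The `chunk_size` and `name` parameters and the namedtuple wrapping are illustrative assumptions, not an existing pyarrow API; the function only relies on `column_names`, `num_rows`, `slice()` and `to_pydict()` being available on the table:

```python
from collections import namedtuple


def itertuples(table, chunk_size=1024, name="Row"):
    """Yield one namedtuple per row, converting chunk_size rows at a time.

    `table` is assumed to behave like a pyarrow.Table, i.e. to provide
    `column_names`, `num_rows`, `slice(offset, length)` and `to_pydict()`.
    """
    # rename=True replaces column names that are not valid Python
    # identifiers with positional names such as _0, _1, ...
    row_type = namedtuple(name, table.column_names, rename=True)
    for offset in range(0, table.num_rows, chunk_size):
        # Only the current slice is converted to Python objects.
        columns = table.slice(offset, chunk_size).to_pydict()
        # to_pydict() maps column name -> list of values, so zip(*...)
        # transposes the column lists into rows.
        yield from map(row_type._make, zip(*columns.values()))
```

With a real table this would be used as `for row in itertuples(table): ...`, reading fields as `row.some_column`. A larger `chunk_size` means fewer conversions but more Python objects alive at once, so the default of 1024 is only a starting point.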
cc [~jorisvandenbossche]
was (Author: pitrou):
Ironically, an efficient solution might be simply to convert Table slices and iterate each slice. Something like (untested):
{code:python}
def itertuples(table):
    chunk_size = 1024
    for i in range(0, table.num_rows(), chunk_size):
        rows = table[i:i + chunk_size].values()
        yield from map(tuple, rows)
{code}
cc [~jorisvandenbossche]
> efficient "row accessor" for a pyarrow Table
> --------------------------------------------
>
> Key: ARROW-12970
> URL: https://issues.apache.org/jira/browse/ARROW-12970
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Luke Higgins
> Priority: Minor
>
> It would be nice to have a row accessor for a Table akin to pandas.DataFrame.itertuples.
> I have a lot of code where I convert a parquet file to pandas just to get access to the rows by iterating with itertuples. Having this ability natively in pyarrow would be a nice feature and would avoid the memory copy in the pandas conversion.