Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/06/04 15:24:00 UTC

[jira] [Comment Edited] (ARROW-12970) efficient "row accessor" for a pyarrow Table

    [ https://issues.apache.org/jira/browse/ARROW-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357421#comment-17357421 ] 

Antoine Pitrou edited comment on ARROW-12970 at 6/4/21, 3:23 PM:
-----------------------------------------------------------------

Ironically, an efficient solution might be simply to convert Table slices to Python objects and iterate over each slice. Something like (untested):
{code:python}
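def itertuples(table):
    chunk_size = 1024
    for i in range(0, table.num_rows, chunk_size):
        # to_pydict() maps column names to lists of values (one list per column)
        columns = table[i:i + chunk_size].to_pydict().values()
        # transpose the column lists into one tuple per row
        yield from zip(*columns)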
{code}
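
A slightly more general sketch of the same chunked approach, also untested against large tables. It assumes `table` is a pyarrow Table (anything exposing `num_rows`, `slice(offset, length)` and `to_pydict()` would do). The `chunk_size` parameter and the `demo` helper are illustrative names, not part of pyarrow. Since `to_pydict()` returns one list per column, the lists are transposed with `zip` to get one tuple per row:

```python
def itertuples(table, chunk_size=1024):
    """Yield one tuple per row, converting `chunk_size` rows at a time.

    Assumption: `table` behaves like a pyarrow.Table, i.e. it exposes
    `num_rows`, `slice(offset, length)` and `to_pydict()`.
    """
    for start in range(0, table.num_rows, chunk_size):
        # to_pydict() returns {column name: list of values} for this slice.
        columns = table.slice(start, chunk_size).to_pydict().values()
        # zip(*columns) transposes the column lists into row tuples.
        yield from zip(*columns)


def demo():
    # Hypothetical usage; requires pyarrow, imported lazily so that
    # itertuples() itself carries no import-time dependency.
    import pyarrow as pa

    table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
    return list(itertuples(table, chunk_size=2))
```

Only one slice at a time is materialized as Python objects, so peak memory should scale with `chunk_size` rather than with the full table. A larger chunk size trades memory for fewer conversion calls.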

cc [~jorisvandenbossche]



was (Author: pitrou):
Ironically, an efficient solution might be simply to convert Table slices and iterate each slice. Something like (untested):
{code:python}
def itertuples(table):
    chunk_size = 1024
    for i in range(0, table.num_rows(), chunk_size):
        rows = table[i:i + chunk_size].values()
        yield from map(tuple, rows)
{code}

cc [~jorisvandenbossche]


> efficient "row accessor" for a pyarrow Table
> --------------------------------------------
>
>                 Key: ARROW-12970
>                 URL: https://issues.apache.org/jira/browse/ARROW-12970
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Luke Higgins
>            Priority: Minor
>
> It would be nice to have a convenient row accessor for a Table, akin to pandas.DataFrame.itertuples.
> I have a lot of code where I convert a parquet file to pandas just to access the rows by iterating with itertuples. Having this ability natively in pyarrow would be a nice feature and would avoid the memory copy in the pandas conversion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)