Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/06/04 17:15:00 UTC

[jira] [Issue Comment Deleted] (ARROW-12970) efficient "row accessor" for a pyarrow Table

     [ https://issues.apache.org/jira/browse/ARROW-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Weston Pace updated ARROW-12970:
--------------------------------
    Comment: was deleted

(was: Something like this will allow a row-major "view" into the dictionary...

 
{code:python}
table = {'size': [1, 2, 3], 'type': ['x', 'y', 'z'], 'ready': [True, False, True]}

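class DictRowIterator:
    """Read-only, row-major view over a dict of equal-length columns."""

    def __init__(self, d):
        self.d = d
        self.keys = list(d)  # column names, in insertion order
        # All columns are assumed to be as long as the first one.
        if len(self.keys) == 0:
            self.length = 0
        else:
            self.length = len(d[self.keys[0]])
        self.index = -1  # moved to 0 by the first __next__ call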
    
    def __iter__(self):
        return self
    
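    def __next__(self):
        self.index = self.index + 1
        if self.index >= self.length:
            raise StopIteration
        # The iterator doubles as the current row, so a stored reference
        # follows the iteration instead of staying on one row.
        return self

    def __getitem__(self, key):
        # A str key selects a column by name; anything else by position.
        if isinstance(key, str):
            return self.d[key][self.index]
        else:
            return self.d[self.keys[key]][self.index]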
        
    def __setitem__(self, key, value):
        raise TypeError('DictRowIterator is read-only')

    def __delitem__(self, key):
        raise TypeError('DictRowIterator is read-only')
        
        
for row in DictRowIterator(table):
    row_type = row[1]
    is_ready = row['ready']
    print(f'{row_type} {is_ready}')
{code})
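
For comparison, a generator-based variant of the same row-major view, as a minimal sketch only. It assumes the same plain dict of equal-length columns as the snippet above, and that every column name is a valid Python identifier (a namedtuple requirement). The names iter_rows and Row are hypothetical and are not part of pyarrow. Unlike the class above, each row here is its own immutable tuple, so rows can safely be collected into a list.

```python
from collections import namedtuple


def iter_rows(columns, name="Row"):
    """Yield one namedtuple per row from a dict of equal-length columns.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    keys = list(columns)
    if not keys:
        return  # no columns means no rows
    Row = namedtuple(name, keys)
    # zip walks all columns in lockstep and stops at the shortest one.
    for values in zip(*(columns[k] for k in keys)):
        yield Row._make(values)


table = {'size': [1, 2, 3], 'type': ['x', 'y', 'z'], 'ready': [True, False, True]}

for row in iter_rows(table):
    # Fields are reachable by name (row.type) or by position (row[1]).
    print(f'{row.type} {row.ready}')
```

Field access by attribute mirrors what pandas.DataFrame.itertuples offers, at the cost of building one small tuple per row.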

> efficient "row accessor" for a pyarrow Table
> --------------------------------------------
>
>                 Key: ARROW-12970
>                 URL: https://issues.apache.org/jira/browse/ARROW-12970
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Python
>            Reporter: Luke Higgins
>            Priority: Minor
>
> It would be nice to have a row accessor for a Table akin to pandas.DataFrame.itertuples.
> I have a lot of code that converts a Parquet file to pandas just to iterate over the rows with itertuples. Having this ability natively in pyarrow would be a nice feature and would avoid the memory copy in the pandas conversion.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)