Posted to jira@arrow.apache.org by "Weston Pace (Jira)" <ji...@apache.org> on 2021/06/04 17:15:00 UTC
[jira] [Issue Comment Deleted] (ARROW-12970) efficient "row accessor" for a pyarrow Table
[ https://issues.apache.org/jira/browse/ARROW-12970?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Weston Pace updated ARROW-12970:
--------------------------------
Comment: was deleted
(was: Something like this will allow a row-major "view" into the dictionary...
{code:java}
table = {'size': [1, 2, 3], 'type': ['x', 'y', 'z'], 'ready': [True, False, True]}

class DictRowIterator:
    """Row-major, read-only view over a dict of equal-length columns."""

    def __init__(self, d):
        self.d = d
        self.keys = list(d)
        if len(self.keys) == 0:
            self.length = 0
        else:
            self.length = len(d[self.keys[0]])
        self.index = -1

    def __iter__(self):
        return self

    def __next__(self):
        self.index = self.index + 1
        if self.index >= self.length:
            raise StopIteration
        # The iterator itself acts as the current row, so no data is copied.
        return self

    def __getitem__(self, key):
        # Accept either a column name or a column position.
        if isinstance(key, str):
            return self.d[key][self.index]
        else:
            return self.d[self.keys[key]][self.index]

    def __setitem__(self, key, value):
        raise Exception('DictRowIterator is read-only')

    def __delitem__(self, key):
        raise Exception('DictRowIterator is read-only')

for row in DictRowIterator(table):
    row_type = row[1]
    is_ready = row['ready']
    print(f'{row_type} {is_ready}')
{code})
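One caveat of the view above is that every "row" is the iterator object itself, so rows kept after the loop no longer point at their original position. Where independent row values are wanted, something closer to pandas.DataFrame.itertuples can be sketched with the standard library alone. This is only an illustration over a plain dict of lists, not a pyarrow API: iter_rows is a hypothetical helper, and it assumes every column has the same length and that the column names are valid Python identifiers.

```python
from collections import namedtuple


def iter_rows(columns, name="Row"):
    """Yield one namedtuple per row from a dict of equal-length column lists.

    Hypothetical helper for illustration; not part of pyarrow or pandas.
    """
    # Field names come from the dict keys, in insertion order.
    row_type = namedtuple(name, list(columns))
    # zip(*values) walks all columns in lockstep, one tuple per row.
    for values in zip(*columns.values()):
        yield row_type(*values)


table = {'size': [1, 2, 3], 'type': ['x', 'y', 'z'], 'ready': [True, False, True]}

# Unlike the view, these rows stay valid after iteration finishes.
rows = list(iter_rows(table))
for row in rows:
    print(f'{row.type} {row.ready}')
```

The trade-off is that each row is materialized as a small tuple, so this copies the values per row, whereas the view only indexes into the existing columns.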
> efficient "row accessor" for a pyarrow Table
> --------------------------------------------
>
> Key: ARROW-12970
> URL: https://issues.apache.org/jira/browse/ARROW-12970
> Project: Apache Arrow
> Issue Type: New Feature
> Components: Python
> Reporter: Luke Higgins
> Priority: Minor
>
> It would be nice to have a row accessor for a Table akin to pandas.DataFrame.itertuples.
> I have a lot of code where I convert a Parquet file to pandas just to iterate over the rows with itertuples. Having this ability natively in pyarrow would be a nice feature and would avoid the memory copy in the pandas conversion.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)