Posted to github@arrow.apache.org by "westonpace (via GitHub)" <gi...@apache.org> on 2023/02/27 22:37:25 UTC

[GitHub] [arrow] westonpace commented on issue #34354: `to_numpy().tolist()` is significantly faster than `.tolist()`

westonpace commented on issue #34354:
URL: https://github.com/apache/arrow/issues/34354#issuecomment-1447220076

   I'm not aware of anyone having tried particularly hard to optimize `to_pylist`.  I think the expectation at the moment is that it won't be used all that often on large lists, since a Python list is a very inefficient way to represent the data.
   
   However, at a glance, my guess is that the difference comes from Arrow implementing `to_pylist` mostly in Python:
   
   ```
       def to_pylist(self):
           """
           Convert to a list of native Python objects.
   
           Returns
           -------
           lst : list
           """
           return [x.as_py() for x in self]
   ```
   
   In NumPy, by contrast, the entire `tolist` function is implemented in C.  So in Arrow you get 500k Python-level calls and in NumPy you get one.  It should be fairly straightforward to implement a more efficient version in Arrow, and I would hope it could mostly be done in Cython.  If someone is interested in taking this on, I can try to give a few pointers / suggestions.
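
To illustrate the per-element overhead, here is a minimal standard-library sketch. It does not use pyarrow or NumPy: `Scalar` and `BoxedArray` are hypothetical stand-ins I made up for the illustration, and `array.array.tolist()` plays the role of a conversion that runs entirely in C. It only mimics the shape of the two approaches, so the exact ratio will differ from what the issue reports.

```python
import timeit
from array import array


class Scalar:
    """Hypothetical stand-in for an Arrow scalar; not pyarrow's actual class."""

    __slots__ = ("_value",)

    def __init__(self, value):
        self._value = value

    def as_py(self):
        return self._value


class BoxedArray:
    """Hypothetical container whose iteration yields one Scalar per element."""

    def __init__(self, values):
        # Contiguous buffer of signed 64-bit integers.
        self._values = array("q", values)

    def __iter__(self):
        for v in self._values:
            yield Scalar(v)

    def to_pylist(self):
        # Same shape as the method quoted above:
        # one wrapper object and one Python-level call per element.
        return [x.as_py() for x in self]

    def to_pylist_bulk(self):
        # A single call into C code that builds the whole list at once.
        return self._values.tolist()


def compare(n=500_000, repeat=3):
    """Return the best-of-`repeat` timings (per-element, bulk) in seconds."""
    arr = BoxedArray(range(n))
    per_element = min(timeit.repeat(arr.to_pylist, number=1, repeat=repeat))
    bulk = min(timeit.repeat(arr.to_pylist_bulk, number=1, repeat=repeat))
    return per_element, bulk


if __name__ == "__main__":
    per_element, bulk = compare()
    print(f"per-element: {per_element:.4f}s, bulk: {bulk:.4f}s")
```

Both methods return the same list; only the number of trips through the interpreter differs, which is the kind of gap a Cython implementation would be expected to close.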

