Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/05/24 08:14:02 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #35622: [Python] Fixed size lists of numeric types without nulls could be converted to numpy with zero-copy

jorisvandenbossche commented on issue #35622:
URL: https://github.com/apache/arrow/issues/35622#issuecomment-1560654180

   The problem is that `to_numpy()` for a fixed size list array doesn't give you the flat (or n-dimensional) array of values, but an object-dtype array of sub-arrays:
   
   ```python
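   >>> import pyarrow as pa
   >>> data = pa.array([[1, 2], [3, 4], [5, 6]], type=pa.list_(pa.int64(), 2))
   >>> data.to_numpy(zero_copy_only=False)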
   array([array([1, 2]), array([3, 4]), array([5, 6])], dtype=object)
   ```
   
   So while it is true that, for a numeric type without missing values, the underlying values are converted zero-copy and the sub-arrays in the array above are zero-copy slices of this converted array, the object-dtype numpy array that is returned in the snippet above is still a newly allocated array.
   So it is a bit ambiguous here what "zero copy" would mean exactly.
   
   > But if I work with buffers directly, I can easily get it to work:
   
   Side note: there is actually another API to get this numpy array directly, without having to go through the buffers manually:
   
   ```python
   # the underlying values as a pyarrow array (but beware that this can contain "garbage" values in case of nulls)
   >>> data.values
   <pyarrow.lib.Int64Array object at 0x7f41b7682aa0>
   [
     1,
     2,
     3,
     4,
     5,
     6
   ]
   # in case of no missing values, this can be converted zero-copy to numpy
   >>> data.values.to_numpy()
   array([1, 2, 3, 4, 5, 6])
   ```
   


-- 
This is an automated message from the Apache Git Service.