Posted to github@arrow.apache.org by "jorisvandenbossche (via GitHub)" <gi...@apache.org> on 2023/03/23 11:42:29 UTC

[GitHub] [arrow] jorisvandenbossche commented on issue #29892: [Python] List array conversion to Numpy N-d array

jorisvandenbossche commented on issue #29892:
URL: https://github.com/apache/arrow/issues/29892#issuecomment-1481040252

   To be explicit, there is no "internal" fix to be done: this conversion is already possible zero-copy, preserving the dtype, _if_ you convert the flat values (i.e. what Antoine showed above):
   
   ```
   >>> import pyarrow as pa
   >>> a = pa.array([[1,2,3], [4,5,6]])
   >>> a.flatten().to_numpy()
   array([1, 2, 3, 4, 5, 6])
   >>> a.flatten().to_numpy().reshape(2, 3)
   array([[1, 2, 3],
          [4, 5, 6]])
   ```
   
   So it is really a question of which user-facing API we provide for this. Do we expect users to do this themselves, or do we want to add some "to_numpy_2d" method to FixedSizeListArray that does it for you?
   The existing `to_numpy` cannot do this, because that method is expected to return a 1D array of the same length as the pyarrow array. I personally would lean towards letting users do this themselves, since it is relatively straightforward and gives you full control (a method returning a 2D array would also get messy for list arrays with multiple levels of nesting). So regarding the original topic, I would tend to close this issue.
   
   But @westonpace makes a good point that the FixedShapeTensorArray extension type currently being added might be interesting, depending on your exact use case. Its pyarrow API still needs to be finalized and merged, but we were planning to add a `to_numpy_array` method (or some other name) that gives you the underlying data zero-copy as an N-d array. See the examples in the documentation being added in https://github.com/apache/arrow/pull/33948
   
   

