Posted to issues@arrow.apache.org by "rok (via GitHub)" <gi...@apache.org> on 2024/02/08 00:32:35 UTC

[I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

rok opened a new issue, #39991:
URL: https://github.com/apache/arrow/issues/39991

   ### Describe the enhancement requested
   
   [FixedShapeTensorArray.to_numpy_ndarray](https://arrow.apache.org/docs/python/generated/pyarrow.FixedShapeTensorArray.html#pyarrow.FixedShapeTensorArray.to_numpy_ndarray) was introduced in https://github.com/apache/arrow/pull/33948#discussion_r1107099766. However, we might want to rename it to `FixedShapeTensorArray.to_numpy`, as proposed in https://github.com/apache/arrow/pull/37533#discussion_r1434206512. This would override the current [FixedShapeTensorArray.to_numpy](https://arrow.apache.org/docs/python/generated/pyarrow.FixedShapeTensorArray.html#pyarrow.FixedShapeTensorArray.to_numpy).
   
   ### Component(s)
   
   Python


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #39991:
URL: https://github.com/apache/arrow/issues/39991#issuecomment-1933964112

   Ah, I expect it to be a 1D array of ndarrays (to preserve the shape of each individual element). But right now it gives you 1D subarrays, I assume because it just uses the storage array's `to_numpy`.
   
   ListArray and FixedSizeListArray do exactly the same, returning a 1D array of arrays.
   
   >  But a Series of Numpy arrays is quite specific and unusual, isn't it?
   
   It's not _that_ unusual because that's the only way to store that kind of data in pandas (but that's of course more due to pandas' limited data type system).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #39991:
URL: https://github.com/apache/arrow/issues/39991#issuecomment-1933956627

   > > ```
   > > array([array([1, 2, 3, 4, 5, 6], dtype=int8),
   > >        array([ 7,  8,  9, 10, 11, 12], dtype=int8)], dtype=object)
   > > ```
   > 
   > 
   > In which circumstances is this a desirable behavior? It even makes a copy, right?
   
   I agree it will _typically_ not be the case, but it can still be the desirable behaviour for any context where you need a 1D array (and expect `to_numpy()` to give you that). Another example of such context is constructors like `pd.Series` that expect a 1D array.
   
   FWIW it does not make a copy. Similar to our ListArray -> numpy conversion, each subarray is a slice from one converted parent ndarray (of course it still needs to allocate the object-dtype 1D array that holds those zero-copy subarrays).
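
The zero-copy point above can be illustrated with NumPy alone. This is only a sketch of the idea, not pyarrow's actual conversion code; `parent` stands in for the ndarray converted from the flat values buffer, and `list_size` is a hypothetical name for the fixed element length.

```python
import numpy as np

# Stand-in for the single parent ndarray produced from the flat values.
parent = np.arange(1, 13, dtype=np.int8)
list_size = 6
n = parent.size // list_size

# The only new allocation: a 1D object array with one slot per element.
out = np.empty(n, dtype=object)
for i in range(n):
    # Basic slicing returns a view, so no element data is copied.
    out[i] = parent[i * list_size:(i + 1) * list_size]

# Every sub-array shares memory with the parent buffer.
shares = all(np.shares_memory(sub, parent) for sub in out)
print(shares)
```

Because the sub-arrays are views, a write to `parent` is visible through `out`, which is what "zero-copy" means here.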




Re: [I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

Posted by "rok (via GitHub)" <gi...@apache.org>.
rok commented on issue #39991:
URL: https://github.com/apache/arrow/issues/39991#issuecomment-1977058651

   Can we reach some agreement on this?




Re: [I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

Posted by "jorisvandenbossche (via GitHub)" <gi...@apache.org>.
jorisvandenbossche commented on issue #39991:
URL: https://github.com/apache/arrow/issues/39991#issuecomment-1933937168

   I am personally in favor of keeping the current name, for the reasons I mentioned in the linked thread.
   
   The `to_numpy()` method is an existing method on all pyarrow Arrays, from which you expect to get a 1D numpy array of the same length as the pyarrow array.
   
   I certainly agree that the 2D array is more useful for a FixedShapeTensorArray, but changing `to_numpy()` to return a 2D array just for this class would break consistency with all other types.
   (e.g. generally I expect `pa.array(arr.to_numpy())` to work, though not necessarily preserving the exact type because of limitations of numpy, but that would no longer be the case if we change the return value for FixedShapeTensorArray)




Re: [I] [Python] Consider renaming FixedShapeTensorArray.to_numpy_ndarray to FixedShapeTensorArray.to_numpy [arrow]

Posted by "pitrou (via GitHub)" <gi...@apache.org>.
pitrou commented on issue #39991:
URL: https://github.com/apache/arrow/issues/39991#issuecomment-1933958431

   > I agree it will _typically_ not be the case, but it can still be the desirable behaviour for any context where you need a 1D array (and expect `to_numpy()` to give you that).
   
   It's a 1D array of 1D arrays. What is that good for, exactly?

