Posted to user@arrow.apache.org by Michael <mi...@gmail.com> on 2022/07/17 19:01:43 UTC

Array.to_numpy still experimental?

Hi!

The to_numpy
<https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.to_numpy>
method on the Array class and subclasses is marked as experimental in the
documentation. Is that still the case? In particular I'm most interested in
what would be the current recommended way of converting a TimestampArray or
Date32Array to a numpy datetime64 array. Going through to_pandas
<https://arrow.apache.org/docs/python/generated/pyarrow.Array.html#pyarrow.Array.to_pandas>
isn't ideal as there might be values that are supported in Arrow and numpy
but are outside of the range supported by pandas nanosecond resolution
Timestamp.

I did a quick search on Jira and found this old resolved issue
<https://issues.apache.org/jira/browse/ARROW-6749>, which mentions that you
can just use np.array(arr), where arr is a timestamp('us') array, and that
seems to work. Would that be recommended over to_numpy, or do they do the
same thing?

Thanks!
Michael

Re: Array.to_numpy still experimental?

Posted by Joris Van den Bossche <jo...@gmail.com>.
Hi Michael,

I think it is time to remove that experimental label in the documentation,
as this method should have been stable for many releases now.

For converting timestamps, you can indeed use to_numpy to get datetime64
values and avoid the conversion to nanoseconds (which to_pandas will do).
`np.array(arr)` and `arr.to_numpy()` are basically equivalent (except that
`np.array(arr)` will use `zero_copy_only=False` under the hood, while the
default for that keyword is True in `to_numpy`).

Best,
Joris
