Posted to jira@arrow.apache.org by "Antoine Pitrou (Jira)" <ji...@apache.org> on 2021/10/14 15:01:00 UTC

[jira] [Commented] (ARROW-14320) Pyarrow array to_numpy array corrupts numpy dtype

    [ https://issues.apache.org/jira/browse/ARROW-14320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17428849#comment-17428849 ] 

Antoine Pitrou commented on ARROW-14320:
----------------------------------------

This can't work because you have a PyArrow list array, and Arrow lists are variable-sized, so they cannot be converted to a rectangular 2D array.

However, it would be nice if this could work with a fixed-size-list array:
{code:python}
>>> pa.array([[1,2,3],[4,5,6]], type=pa.list_(pa.int32(), 3))
<pyarrow.lib.FixedSizeListArray object at 0x7f787e2e8ac0>
[
  [
    1,
    2,
    3
  ],
  [
    4,
    5,
    6
  ]
]
>>> pa.array([[1,2,3],[4,5,6]], type=pa.list_(pa.int32(), 3)).to_numpy()
Traceback (most recent call last):
  [...]
ArrowInvalid: Needed to copy 1 chunks with 0 nulls, but zero_copy_only was True
{code}


> Pyarrow array to_numpy array corrupts numpy dtype
> -------------------------------------------------
>
>                 Key: ARROW-14320
>                 URL: https://issues.apache.org/jira/browse/ARROW-14320
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 5.0.0
>            Reporter: Ben Epstein
>            Priority: Major
>
> When converting a single-dimensional array to numpy, the dtype is preserved
> {code:python}
> import pyarrow as pa
> x = pa.array([.234,.345,.456])
> x.to_numpy().dtype # dtype('float64'){code}
> But when doing the same for a multi-dimensional array, the dtype is lost *and cannot be set manually*
> {code:python}
> import numpy as np
> x = pa.array([[1,2,3],[4,5,6]]).to_numpy(zero_copy_only=False)
> print(x.dtype) # object
> x.astype(np.float64) # ValueError: setting an array element with a sequence.{code}
> In other words, numpy believes this array is not uniform. The only way to get the proper dtype is to convert the array to a Python list and then back to a numpy array.
> Is there another way to achieve this? Or, at least, can it be fixed such that we can manually set the dtype of the numpy array after conversion?
> I know that pyarrow doesn't support ndarrays with ndim>1 (https://issues.apache.org/jira/browse/ARROW-5645) but I was curious if this can be achieved going the other way.


