Posted to jira@arrow.apache.org by "Ben Epstein (Jira)" <ji...@apache.org> on 2021/10/14 14:57:00 UTC

[jira] [Created] (ARROW-14320) Pyarrow array to_numpy array corrupts numpy dtype

Ben Epstein created ARROW-14320:
-----------------------------------

             Summary: Pyarrow array to_numpy array corrupts numpy dtype
                 Key: ARROW-14320
                 URL: https://issues.apache.org/jira/browse/ARROW-14320
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 5.0.0
            Reporter: Ben Epstein


When converting a one-dimensional array to numpy, the dtype is preserved:


{code:python}
import pyarrow as pa

x = pa.array([.234, .345, .456])
x.to_numpy().dtype  # dtype('float64'){code}

But when doing the same for a multi-dimensional (nested list) array, the dtype is lost *and cannot be set manually*:
{code:python}
import numpy as np
import pyarrow as pa

x = pa.array([[1, 2, 3], [4, 5, 6]]).to_numpy(zero_copy_only=False)
print(x.dtype)  # object
x.astype(np.float64)  # ValueError: setting an array element with a sequence.{code}
That is, numpy treats the array as non-uniform (an object array of per-row arrays). The only way I have found to get it to the proper dtype is to convert it to a Python list and then back to a numpy array.

Is there another way to achieve this? 

I know that pyarrow doesn't support ndarrays with ndim>1 (https://issues.apache.org/jira/browse/ARROW-5645), but I was curious whether this can be achieved going the other way.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)