Posted to issues@arrow.apache.org by "Bryan Cutler (JIRA)" <ji...@apache.org> on 2019/06/21 17:47:00 UTC
[jira] [Created] (ARROW-5682) [Python] from_pandas conversion casts values to string inconsistently
Bryan Cutler created ARROW-5682:
-----------------------------------
Summary: [Python] from_pandas conversion casts values to string inconsistently
Key: ARROW-5682
URL: https://issues.apache.org/jira/browse/ARROW-5682
Project: Apache Arrow
Issue Type: Improvement
Components: Python
Affects Versions: 0.13.0
Reporter: Bryan Cutler
When calling {{pa.Array.from_pandas}} with primitive data as input and casting to string with {{type=pa.string()}}, the resulting pyarrow Array can contain inconsistent values. For most input types the result is an empty string; however, for some types (int32, int64) the values are '\x01' and so on.
{noformat}
In [8]: s = pd.Series([1, 2, 3], dtype=np.uint8)
In [9]: pa.Array.from_pandas(s, type=pa.string())
Out[9]:
<pyarrow.lib.StringArray object at 0x7f90b6091a48>
[
"",
"",
""
]
In [10]: s = pd.Series([1, 2, 3], dtype=np.uint32)
In [11]: pa.Array.from_pandas(s, type=pa.string())
Out[11]:
<pyarrow.lib.StringArray object at 0x7f9097efca48>
[
"",
"",
""
]
{noformat}
This came from the Spark discussion https://github.com/apache/spark/pull/24930/files#r296187903. Type casting this way is not supported in Spark, but it would be good to make the behavior consistent. Would it be better to raise an UnsupportedOperation error?
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)