Posted to issues@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2019/09/09 14:31:00 UTC

[jira] [Assigned] (ARROW-5682) [Python] from_pandas conversion casts values to string inconsistently

     [ https://issues.apache.org/jira/browse/ARROW-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche reassigned ARROW-5682:
--------------------------------------------

    Assignee: Joris Van den Bossche

> [Python] from_pandas conversion casts values to string inconsistently
> ---------------------------------------------------------------------
>
>                 Key: ARROW-5682
>                 URL: https://issues.apache.org/jira/browse/ARROW-5682
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.13.0
>            Reporter: Bryan Cutler
>            Assignee: Joris Van den Bossche
>            Priority: Minor
>             Fix For: 0.15.0
>
>
> When calling {{pa.Array.from_pandas}} with primitive data as input and casting to string with {{type=pa.string()}}, the resulting pyarrow Array can have inconsistent values. For most input types the result is an array of empty strings; for some types (int32, int64), however, the values are '\x01' etc.
> {noformat}
> In [8]: s = pd.Series([1, 2, 3], dtype=np.uint8)
> In [9]: pa.Array.from_pandas(s, type=pa.string())
> Out[9]:
> <pyarrow.lib.StringArray object at 0x7f90b6091a48>
> [
>   "",
>   "",
>   ""
> ]
> In [10]: s = pd.Series([1, 2, 3], dtype=np.uint32)
> In [11]: pa.Array.from_pandas(s, type=pa.string())
> Out[11]:
> <pyarrow.lib.StringArray object at 0x7f9097efca48>
> [
>   "",
>   "",
>   ""
> ]
> {noformat}
> This came from the Spark discussion https://github.com/apache/spark/pull/24930/files#r296187903. Spark does not support type casting this way, but it would be good to make the behavior consistent. Would it be better to raise an UnsupportedOperation error?



--
This message was sent by Atlassian Jira
(v8.3.2#803003)