Posted to issues@arrow.apache.org by "Krisztian Szucs (JIRA)" <ji...@apache.org> on 2018/04/05 19:12:00 UTC

[jira] [Commented] (ARROW-2101) [Python] from_pandas reads 'str' type as binary Arrow data with Python 2

    [ https://issues.apache.org/jira/browse/ARROW-2101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427462#comment-16427462 ] 

Krisztian Szucs commented on ARROW-2101:
----------------------------------------

I think this is the relevant code: https://github.com/apache/arrow/blob/master/cpp/src/arrow/python/numpy_to_arrow.cc#L839
The comments there explain nicely what happens.

The example with an explicit string data type presumes the opposite direction.
Of course, a high-level decode('utf-8') would work. What is the preferred way to do this kind of conversion?
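For illustration, here is a minimal sketch of the decode('utf-8') workaround mentioned above. It assumes the byte strings are valid UTF-8. The helper name decode_utf8_values is hypothetical and is not part of pyarrow or pandas. It decodes every bytes element (Python 2 'str') to text and leaves text values and None unchanged:

```python
# Hypothetical helper (not a pyarrow/pandas API): decode byte strings to text
# before handing the data to Arrow, assuming the bytes are UTF-8 encoded.


def decode_utf8_values(values, encoding="utf-8"):
    """Return a new list in which every bytes element is decoded to text.

    Elements that are already text (Python 2 'unicode' / Python 3 'str')
    and None (missing values) are passed through unchanged.
    """
    decoded = []
    for value in values:
        if isinstance(value, bytes):
            # On Python 2, bytes is an alias of str, so this also catches 'str'.
            decoded.append(value.decode(encoding))
        else:
            decoded.append(value)
    return decoded


if __name__ == "__main__":
    print(decode_utf8_values([b"a", u"b", None]))
```

The decoded values could then be wrapped in a pd.Series and passed to pa.Array.from_pandas. Going by the report below, unicode input is inferred as DataType(string). pandas also provides Series.str.decode('utf-8') for the same purpose. Either way, decoding in Python is an extra pass over the data, which is presumably why the question of a preferred built-in approach remains open.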

> [Python] from_pandas reads 'str' type as binary Arrow data with Python 2
> ------------------------------------------------------------------------
>
>                 Key: ARROW-2101
>                 URL: https://issues.apache.org/jira/browse/ARROW-2101
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>    Affects Versions: 0.8.0
>            Reporter: Bryan Cutler
>            Priority: Major
>
> Using Python 2, converting Pandas data of 'str' type to Arrow results in Arrow data of binary type, even if the user supplies type information. Conversion of 'unicode' data works and creates Arrow data of string type. For example:
> {code}
> In [25]: pa.Array.from_pandas(pd.Series(['a'])).type
> Out[25]: DataType(binary)
> In [26]: pa.Array.from_pandas(pd.Series(['a']), type=pa.string()).type
> Out[26]: DataType(binary)
> In [27]: pa.Array.from_pandas(pd.Series([u'a'])).type
> Out[27]: DataType(string)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)