Posted to issues@arrow.apache.org by "Igor Yastrebov (JIRA)" <ji...@apache.org> on 2019/08/13 08:51:00 UTC

[jira] [Commented] (ARROW-5566) [Python] Overhaul type unification from Python sequence in arrow::py::InferArrowType

    [ https://issues.apache.org/jira/browse/ARROW-5566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16905960#comment-16905960 ] 

Igor Yastrebov commented on ARROW-5566:
---------------------------------------

[~wesmckinn] I have found another issue of this type: when you pass a list or an np.array of strings that starts with np.nan to pa.array(), the conversion fails because it then expects all elements to be floats. Such an array is easy to obtain in practice by calling .values on a pd.Series of strings with nulls.
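
Until inference handles this case, one possible workaround is to replace NaN with None before the values reach pa.array(). The sketch below is a rough illustration only: nan_to_none is a hypothetical helper, not part of pyarrow, and float("nan") stands in for np.nan, which is itself a plain Python float. It assumes the NaN entries are meant as nulls.

```python
import math


def nan_to_none(values):
    """Return a new list in which every float NaN is replaced by None.

    Hypothetical helper for illustration; it is not a pyarrow function.
    Only instances of Python float are checked, which covers np.nan and
    np.float64 but not narrower NumPy floats such as np.float16.
    """
    return [
        None if isinstance(v, float) and math.isnan(v) else v
        for v in values
    ]


# float("nan") stands in for np.nan here (np.nan is a plain Python float).
mixed = [float("nan"), "a", "b"]
cleaned = nan_to_none(mixed)
print(cleaned)  # [None, 'a', 'b']
```

The cleaned list holds only strings and None, so the leading float no longer drives the inferred type.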

> [Python] Overhaul type unification from Python sequence in arrow::py::InferArrowType
> ------------------------------------------------------------------------------------
>
>                 Key: ARROW-5566
>                 URL: https://issues.apache.org/jira/browse/ARROW-5566
>             Project: Apache Arrow
>          Issue Type: Improvement
>          Components: Python
>            Reporter: Wes McKinney
>            Priority: Major
>
> I'm working on ARROW-4324 and there's some technical debt lying in arrow/python/inference.cc: when NumPy scalars are mixed with non-NumPy Python scalar values, all hell breaks loose. In particular, the innocuous {{numpy.nan}} is a Python float, not a NumPy float64, so the sequence {{[np.float16(1.5), np.nan]}} can be converted incorrectly.
> Part of what's messy is that NumPy dtype unification is split from general type unification. This should all be combined together, with the NumPy types mapping onto an intermediate value (for unification purposes) that then maps ultimately onto an Arrow type.
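
The idea of a single intermediate value for unification can be sketched in a few lines of pure Python. This is an illustration under assumptions, not the code in inference.cc: Kind, classify, unify, infer_kind and ARROW_TYPE_NAME are all hypothetical names, NumPy scalars are recognised only by their type's module and name, and bit widths (float16 versus float64) are ignored for brevity.

```python
from enum import IntEnum


class Kind(IntEnum):
    """Hypothetical intermediate value shared by Python and NumPy scalars."""
    NULL = 0
    BOOL = 1
    INT = 2
    FLOAT = 3
    STRING = 4


def classify(value):
    """Map one scalar, Python or NumPy, onto a Kind."""
    if value is None:
        return Kind.NULL
    tp = type(value)
    if tp.__module__ == "numpy":
        # Assumption: NumPy scalar type names look like bool_, int32,
        # uint8, float16, str_; matched by prefix to avoid importing numpy.
        name = tp.__name__
        for prefix, kind in (("bool", Kind.BOOL), ("int", Kind.INT),
                             ("uint", Kind.INT), ("float", Kind.FLOAT),
                             ("str", Kind.STRING)):
            if name.startswith(prefix):
                return kind
        raise TypeError(f"unsupported NumPy scalar type: {name}")
    if isinstance(value, bool):  # bool must be tested before int
        return Kind.BOOL
    if isinstance(value, int):
        return Kind.INT
    if isinstance(value, float):  # np.nan lands here
        return Kind.FLOAT
    if isinstance(value, str):
        return Kind.STRING
    raise TypeError(f"unsupported type: {tp.__name__}")


def unify(a, b):
    """Combine two kinds; one rule set regardless of where a value came from."""
    if a == Kind.NULL:
        return b
    if b == Kind.NULL or a == b:
        return a
    if {a, b} == {Kind.INT, Kind.FLOAT}:
        return Kind.FLOAT
    raise TypeError(f"cannot unify {a.name} with {b.name}")


def infer_kind(values):
    """Fold unify over the whole sequence."""
    result = Kind.NULL
    for value in values:
        result = unify(result, classify(value))
    return result


# Final step: intermediate kind -> Arrow type (names here are illustrative).
ARROW_TYPE_NAME = {
    Kind.NULL: "null", Kind.BOOL: "bool", Kind.INT: "int64",
    Kind.FLOAT: "double", Kind.STRING: "string",
}

# float("nan") stands in for np.nan, which is a plain Python float.
kind = infer_kind([1.5, float("nan"), None])
print(kind.name, ARROW_TYPE_NAME[kind])  # FLOAT double
```

Because every scalar passes through classify before unify sees it, a NumPy float and a Python float are unified by the same rule rather than by two separate code paths.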



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)