Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/02/08 12:30:00 UTC

[jira] [Commented] (ARROW-11416) [Python] table.to_pandas converts int32 to float64 if column is None

    [ https://issues.apache.org/jira/browse/ARROW-11416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17280999#comment-17280999 ] 

Joris Van den Bossche commented on ARROW-11416:
-----------------------------------------------

[~Jaxing] This is expected behaviour: by default, pandas does not support missing values in integer columns, so an integer column that contains missing values is upcast to a float column. See https://pandas.pydata.org/docs/user_guide/gotchas.html#nan-integer-na-values-and-na-type-promotions for the docs on this, which also mention the experimental nullable integer data types in pandas.

> [Python] table.to_pandas converts int32 to float64 if column is None
> --------------------------------------------------------------------
>
>                 Key: ARROW-11416
>                 URL: https://issues.apache.org/jira/browse/ARROW-11416
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Jesper Jaxing
>            Priority: Minor
>
> table.to_pandas converts int32 to float64 if column is None.
> To recreate:
> import pyarrow as pa
> schema = pa.schema([pa.field('a', pa.int32())])
> a = [None, None, None]
> array = pa.array(a, type=pa.int32())
> table = pa.Table.from_arrays([array], schema=schema)
> df = table.to_pandas()
> df.dtypes
> >> a float64


