Posted to issues@arrow.apache.org by "Joris Van den Bossche (JIRA)" <ji...@apache.org> on 2019/06/26 06:16:00 UTC

[jira] [Updated] (ARROW-5655) [Python] Table.from_pydict/from_arrays not using types in specified schema correctly

     [ https://issues.apache.org/jira/browse/ARROW-5655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joris Van den Bossche updated ARROW-5655:
-----------------------------------------
    Fix Version/s: 1.0.0

> [Python] Table.from_pydict/from_arrays not using types in specified schema correctly 
> -------------------------------------------------------------------------------------
>
>                 Key: ARROW-5655
>                 URL: https://issues.apache.org/jira/browse/ARROW-5655
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Python
>            Reporter: Joris Van den Bossche
>            Priority: Major
>             Fix For: 1.0.0
>
>
> Example with {{from_pydict}} (from https://github.com/apache/arrow/pull/4601#issuecomment-503676534):
> {code:python}
> In [15]: table = pa.Table.from_pydict(
>     ...:     {'a': [1, 2, 3], 'b': [3, 4, 5]},
>     ...:     schema=pa.schema([('a', pa.int64()), ('c', pa.int32())]))
> In [16]: table
> Out[16]: 
> pyarrow.Table
> a: int64
> c: int32
> In [17]: table.to_pandas()
> Out[17]: 
>    a  c
> 0  1  3
> 1  2  0
> 2  3  4
> {code}
> Note that the specified schema 1) has different column names and 2) has a non-default type (int32 instead of int64), which leads to corrupted values.
> This is partly because {{Table.from_pydict}} does not use the type information in the schema when converting the dictionary items to pyarrow arrays. In addition, {{Table.from_arrays}} does not correctly cast the arrays to another type when the schema specifies one.
> An additional question for {{Table.from_pydict}} is whether it should actually map the 'b' key from the dictionary onto column 'c' as defined in the schema (this behaviour depends on the order of the dictionary, which is not guaranteed before Python 3.7).
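
The corrupted values 3, 0, 4 in column 'c' are consistent with an int64 buffer being read as int32 without a cast. The sketch below reproduces that pattern with the standard library only. It is an illustration under the assumption of little-endian storage, not a description of pyarrow internals, and the variable names are made up for this example.

```python
import struct

# Column 'b' as an int64 buffer would store it (little-endian assumed).
int64_buffer = struct.pack("<3q", 3, 4, 5)

# Reinterpret the first three 4-byte slots as int32, with no cast applied.
reinterpreted = struct.unpack("<3i", int64_buffer[:12])

# A real cast converts each value rather than each 4-byte slot.
properly_cast = struct.unpack("<3i", struct.pack("<3i", 3, 4, 5))

print(reinterpreted)
print(properly_cast)
```

Here `reinterpreted` is `(3, 0, 4)`, which matches the column 'c' output in the report, while `properly_cast` keeps the original values `(3, 4, 5)`.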

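One possible answer to the naming question is to look columns up by schema field name instead of by dictionary position, and to fail when a name is missing. The helper below is a hypothetical pure-Python sketch of that idea (`columns_in_schema_order` is not a pyarrow function), and it does not claim to be how the issue was eventually resolved.

```python
def columns_in_schema_order(data, names):
    """Return the dict values ordered by the schema field names.

    Hypothetical helper: matches by name, so the result does not
    depend on the iteration order of `data`.
    """
    missing = [name for name in names if name not in data]
    if missing:
        raise KeyError(f"schema fields not found in data: {missing}")
    return [data[name] for name in names]


# Key order in the dict no longer matters.
ordered = columns_in_schema_order({'b': [3, 4, 5], 'a': [1, 2, 3]}, ['a', 'b'])

# The schema from the report asks for 'c', which the dict does not contain.
try:
    columns_in_schema_order({'a': [1, 2, 3], 'b': [3, 4, 5]}, ['a', 'c'])
    mismatch_detected = False
except KeyError:
    mismatch_detected = True
```

With name-based lookup, the example from the report would raise an error instead of silently relabelling 'b' as 'c'.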

