
Posted to jira@arrow.apache.org by "Uwe Korn (Jira)" <ji...@apache.org> on 2020/06/24 07:41:00 UTC

[jira] [Commented] (ARROW-9215) pyarrow parquet writer converts uint32 columns to int64

    [ https://issues.apache.org/jira/browse/ARROW-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143603#comment-17143603 ] 

Uwe Korn commented on ARROW-9215:
---------------------------------

This is expected behaviour as long as you are writing Parquet files with {{version='1.0'}}. In that format version, unsigned types are not yet supported by the specification, and because uint32 values can exceed the range of int32, such columns must be stored as int64 so that no values are truncated.
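The range argument above can be checked with plain integer arithmetic. The sketch below is illustrative only and is not pyarrow's implementation: `narrowest_signed_physical` is a hypothetical helper, and it assumes the only candidate storage types are Parquet's signed INT32 and INT64 physical types.

```python
# Signed ranges of the two Parquet physical integer types (two's complement).
PHYSICAL_INT_RANGES = [
    ("INT32", -(2**31), 2**31 - 1),
    ("INT64", -(2**63), 2**63 - 1),
]

def narrowest_signed_physical(unsigned_bits):
    """Hypothetical helper: return the narrowest signed physical type that can
    hold every value of an unsigned integer with `unsigned_bits` bits, or None."""
    max_value = 2**unsigned_bits - 1  # largest unsigned value; the smallest is 0
    for name, lo, hi in PHYSICAL_INT_RANGES:
        if lo <= 0 and max_value <= hi:
            return name
    return None

for bits in (8, 16, 32, 64):
    print(f"uint{bits} -> {narrowest_signed_physical(bits)}")
# uint8 -> INT32
# uint16 -> INT32
# uint32 -> INT64
# uint64 -> None
```

Only uint32 is forced to widen: uint8 and uint16 already fit in INT32, while no signed physical type covers every uint64 value, so widening is not an option for uint64 at all.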

> pyarrow parquet writer converts uint32 columns to int64
> -------------------------------------------------------
>
>                 Key: ARROW-9215
>                 URL: https://issues.apache.org/jira/browse/ARROW-9215
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Devavret Makkar
>            Priority: Major
>
> pyarrow's Parquet writer changes uint32 columns to int64. The other unsigned types are not affected: uint8, uint16, and uint64 columns retain their type.
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: import pyarrow.parquet as pq
> In [5]: df = pd.DataFrame({'a':pd.Series([1,2,3], dtype='uint32')})
> In [6]: padf = pa.Table.from_pandas(df)
> In [7]: padf
> Out[7]: 
> pyarrow.Table
> a: uint32
> In [8]: pq.write_table(padf, 'pa.parquet')
> In [9]: pq.read_table('pa.parquet')
> Out[9]: 
> pyarrow.Table
> a: int64
> {code}


