Posted to jira@arrow.apache.org by "Joris Van den Bossche (Jira)" <ji...@apache.org> on 2021/02/03 10:26:00 UTC

[jira] [Commented] (ARROW-9215) pyarrow parquet writer converts uint32 columns to int64

    [ https://issues.apache.org/jira/browse/ARROW-9215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17277886#comment-17277886 ] 

Joris Van den Bossche commented on ARROW-9215:
----------------------------------------------

I stumbled on this issue while further researching PARQUET-1972 (switching to 2.0 as default). 
I had mistakenly assumed that we were not using logical type annotations for ints at all (and so couldn't preserve anything other than int32/int64), but it seems this is actually only true for the {{uint32}} case that I tested with.

Now, what I still don't fully understand from the explanation above is why we don't store uint32 physically as int32, while for uint64 we are actually fine with storing it physically as int64 (also for version 1.0).
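
For illustration only, the arithmetic behind that question can be sketched with the Python standard library. This is not pyarrow or Parquet code, and the helper name {{reinterpret_unsigned_as_signed}} is made up for this example. It shows that every uint32 value fits in a signed 64-bit integer unchanged, whereas putting an unsigned value into a signed slot of the same width changes how large values read back, unless the reader knows to undo the reinterpretation.

```python
import struct

# Hypothetical helper for this sketch; not part of pyarrow or Parquet.
def reinterpret_unsigned_as_signed(value, bits):
    """Return the signed integer that shares the bit pattern of an unsigned one."""
    fmt_unsigned, fmt_signed = {32: ("<I", "<i"), 64: ("<Q", "<q")}[bits]
    return struct.unpack(fmt_signed, struct.pack(fmt_unsigned, value))[0]

UINT32_MAX = 2**32 - 1
UINT64_MAX = 2**64 - 1
INT64_MAX = 2**63 - 1

# Widening: the largest uint32 value still fits in a signed 64-bit integer,
# so the numeric value survives without any extra type information.
fits_in_int64 = UINT32_MAX <= INT64_MAX

# Same-width storage: the bits are kept, but a reader that only sees the
# signed physical type gets a different number for large values.
as_int32 = reinterpret_unsigned_as_signed(UINT32_MAX, 32)
as_int64 = reinterpret_unsigned_as_signed(UINT64_MAX, 64)

# With the unsigned information available, the original value is recoverable.
recovered_uint32 = as_int32 % 2**32
recovered_uint64 = as_int64 % 2**64
```

Both same-width cases behave identically in this sketch, which is exactly the asymmetry I am asking about; the sketch itself does not explain why the writer treats uint32 and uint64 differently.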

> pyarrow parquet writer converts uint32 columns to int64
> -------------------------------------------------------
>
>                 Key: ARROW-9215
>                 URL: https://issues.apache.org/jira/browse/ARROW-9215
>             Project: Apache Arrow
>          Issue Type: Bug
>            Reporter: Devavret Makkar
>            Assignee: Uwe Korn
>            Priority: Major
>
> pyarrow parquet writer changes uint32 columns to int64. This change is not made for other types and uint8, uint16, and uint64 columns retain their type.
> {code:python}
> In [1]: import pandas as pd
> In [2]: import pyarrow as pa
> In [3]: import pyarrow.parquet as pq
> In [5]: df = pd.DataFrame({'a':pd.Series([1,2,3], dtype='uint32')})
> In [6]: padf = pa.Table.from_pandas(df)
> In [7]: padf
> Out[7]: 
> pyarrow.Table
> a: uint32
> In [8]: pq.write_table(padf, 'pa.parquet')
> In [9]: pq.read_table('pa.parquet')
> Out[9]: 
> pyarrow.Table
> a: int64
> {code}


