Posted to issues@arrow.apache.org by "Bruce Allen (Jira)" <ji...@apache.org> on 2021/11/02 23:38:00 UTC

[jira] [Created] (ARROW-14564) [python] uint32 incorrectly saves to Parquet as int64

Bruce Allen created ARROW-14564:
-----------------------------------

             Summary: [python] uint32 incorrectly saves to Parquet as int64
                 Key: ARROW-14564
                 URL: https://issues.apache.org/jira/browse/ARROW-14564
             Project: Apache Arrow
          Issue Type: Bug
          Components: Python
    Affects Versions: 6.0.0
         Environment: Ubuntu 20.10, Python 3.8.10
            Reporter: Bruce Allen
         Attachments: test_u32.py

The function pyarrow.parquet.write_table() incorrectly saves unsigned 32-bit integer (uint32) data as signed int64. The attached script test_u32.py demonstrates the failure.

Output from running test_u32.py, showing the incorrect type change:

pyarrow version: 6.0.0
numpy data:
[(1, 2) (3, 4)]
[('my_u2', '<u2'), ('my_u4', '<u4')]
result:
   my_u2  my_u4
0      1      2
1      3      4
my_u2    uint16
my_u4     int64
dtype: object

The incorrect int64 type is also present in the Parquet file itself, as the "parq" tool shows:

$ parq _test_u32_pq --schema

# Schema 
 <pyarrow._parquet.ParquetSchema object at 0x7ff2e40b2a40>
required group field_id=-1 schema {
 optional int32 field_id=-1 my_u2 (Int(bitWidth=16, isSigned=false));
 optional int64 field_id=-1 my_u4;
}


