Posted to issues@arrow.apache.org by "Bruce Allen (Jira)" <ji...@apache.org> on 2021/11/02 23:38:00 UTC
[jira] [Created] (ARROW-14564) [python] uint32 incorrectly saves to Parquet as int64
Bruce Allen created ARROW-14564:
-----------------------------------
Summary: [python] uint32 incorrectly saves to Parquet as int64
Key: ARROW-14564
URL: https://issues.apache.org/jira/browse/ARROW-14564
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 6.0.0
Environment: Ubuntu 20.10, Python 3.8.10
Reporter: Bruce Allen
Attachments: test_u32.py
The function pyarrow.parquet.write_table() incorrectly saves unsigned 32-bit integer (uint32) data as signed int64. The attached script test_u32.py demonstrates the failure.
Output from running test_u32.py, showing the incorrect retyping of my_u4:
pyarrow version: 6.0.0
numpy data:
[(1, 2) (3, 4)]
[('my_u2', '<u2'), ('my_u4', '<u4')]
result:
   my_u2  my_u4
0      1      2
1      3      4
my_u2    uint16
my_u4     int64
dtype: object
We can also observe that the incorrect int64 type is in the Parquet file by using the "parq" tool:
$ parq _test_u32_pq --schema
# Schema
<pyarrow._parquet.ParquetSchema object at 0x7ff2e40b2a40>
required group field_id=-1 schema {
  optional int32 field_id=-1 my_u2 (Int(bitWidth=16, isSigned=false));
  optional int64 field_id=-1 my_u4;
}
--
This message was sent by Atlassian Jira
(v8.3.4#803005)