Posted to jira@arrow.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2021/06/29 19:08:00 UTC
[jira] [Created] (ARROW-13214) [C++] [Parquet] uint32 does not roundtrip?
Jorge Leitão created ARROW-13214:
------------------------------------
Summary: [C++] [Parquet] uint32 does not roundtrip?
Key: ARROW-13214
URL: https://issues.apache.org/jira/browse/ARROW-13214
Project: Apache Arrow
Issue Type: Bug
Components: Parquet
Reporter: Jorge Leitão
I found that the following columns do not roundtrip through Parquet (each entry is a file name and the type of the failing column):
{code:java}
[('generated_primitive', DataType(uint32)), ('generated_primitive', DataType(uint32))]
[('generated_primitive_no_batches', DataType(uint32)), ('generated_primitive_no_batches', DataType(uint32))]
[('generated_primitive_zerolength', DataType(uint32)), ('generated_primitive_zerolength', DataType(uint32))]
{code}
The exact code I am using to reproduce this:
{code:java}
import os

import pyarrow.ipc
import pyarrow.parquet as pq


def get_file_path(file: str):
    return f"../testing/arrow-testing/data/arrow-ipc-stream/integration/1.0.0-littleendian/{file}.arrow_file"


def _expected(file: str):
    # Read the Arrow IPC integration file into a single table.
    return pyarrow.ipc.RecordBatchFileReader(get_file_path(file)).read_all()


def check_file(file):
    expected = _expected(file)
    path = f"{file}.parquet"
    # Write the table to Parquet, read it back, then clean up.
    pq.write_table(expected, path, compression=None, write_statistics=False)
    table = pq.read_table(path)
    os.remove(path)
    # Collect the columns that differ after the roundtrip.
    failing = []
    for c1, c2 in zip(expected, table):
        if c1 != c2:
            failing.append((file, c1.type))
    return failing


for file in [
    "generated_primitive",
    "generated_primitive_no_batches",
    "generated_primitive_zerolength",
    "generated_null",
    "generated_null_trivial",
    "generated_primitive_large_offsets",
]:
    failing = check_file(file)
    if failing:
        print(failing)
{code}
Note: I generated the same parquet file using the experimental parquet2 and the roundtrip succeeds, which suggests that the error, if there is one, is on the writing side.
Upon further investigation, it seems that the only difference is the type: c1 (the column that was written) has type uint32, while c2 (the column read back) has type int64.
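To make the type difference explicit rather than only flagging unequal columns, the comparison could report the type on both sides. The following is a minimal sketch, not part of the original script: `type_mismatches` is a hypothetical helper, and the column names and type strings are stand-ins for real pyarrow fields so the snippet runs on its own.

```python
def type_mismatches(expected_fields, actual_fields):
    """Return (name, expected_type, actual_type) for each column whose type changed.

    Both arguments are iterables of (column_name, type) pairs in column order.
    """
    mismatches = []
    for (name, expected_type), (_, actual_type) in zip(expected_fields, actual_fields):
        if expected_type != actual_type:
            mismatches.append((name, expected_type, actual_type))
    return mismatches


# Stand-in data: plain strings instead of pyarrow DataType objects,
# and hypothetical column names "a" and "b".
written = [("a", "uint32"), ("b", "int32")]
read_back = [("a", "int64"), ("b", "int32")]
print(type_mismatches(written, read_back))  # [('a', 'uint32', 'int64')]
```

With the tables from the script above, the pairs could presumably be built as `zip(expected.schema.names, expected.schema.types)` and `zip(table.schema.names, table.schema.types)`.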