Posted to issues@arrow.apache.org by "Will Jones (Jira)" <ji...@apache.org> on 2022/02/17 22:29:00 UTC
[jira] [Created] (ARROW-15725) [Python] Legacy dataset can't roundtrip Int64 with nulls if partitioned
Will Jones created ARROW-15725:
----------------------------------
Summary: [Python] Legacy dataset can't roundtrip Int64 with nulls if partitioned
Key: ARROW-15725
URL: https://issues.apache.org/jira/browse/ARROW-15725
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 7.0.0, 4.0.0
Reporter: Will Jones
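When a dataset is written with partitioning and an Int64 column contains nulls, that column may not round-trip correctly through the legacy datasets implementation. In the reproduction below, the value 7753285016841556620 is read back as 7753285016841556992.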
Simple reproduction:
{code:python}
import tempfile

import pyarrow as pa
import pyarrow.dataset as ds
import pyarrow.parquet as pq

# Int64 column containing a null, plus a string column to partition on
table = pa.table({
    'x': pa.array([None, 7753285016841556620]),
    'y': pa.array(['a', 'b'])
})

# Write the table as a dataset partitioned by 'y'
ds_dir = tempfile.mkdtemp()
pq.write_to_dataset(table, ds_dir, partition_cols=['y'])

# Read the dataset back and compare the Int64 column
table_after = ds.dataset(ds_dir).to_table()
print(table['x'])
print(table_after['x'])
assert table['x'] == table_after['x']
{code}
{code}
[
  [
    null,
    7753285016841556620
  ]
]
[
  [
    null
  ],
  [
    7753285016841556992
  ]
]
{code}
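The value read back is 372 larger than the original and is a multiple of 1024, which is the spacing between adjacent float64 values in this range. That pattern is consistent with the column passing through a float64 representation somewhere along the way. This is an assumption about the cause, not something the reproduction above confirms. A minimal pure-Python check of the arithmetic (variable names are illustrative):

```python
# Check whether the altered value matches a round trip through float64.
original = 7753285016841556620
observed = 7753285016841556992  # value printed after reading the dataset back

# float64 has a 53-bit significand, so not every integer above 2**53
# can be represented exactly; the conversion rounds to the nearest one.
via_float = int(float(original))

print(via_float == observed)   # True
print(observed - original)     # 372
print(observed % 1024)         # 0
```

If the loss does happen this way, only values with magnitude above 2**53 would be altered, so smaller integers would round-trip unchanged even with nulls present.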