Posted to jira@arrow.apache.org by "David Quijano (Jira)" <ji...@apache.org> on 2020/12/29 00:41:00 UTC
[jira] [Created] (ARROW-11057) [Python] Data inconsistency with read and write
David Quijano created ARROW-11057:
-------------------------------------
Summary: [Python] Data inconsistency with read and write
Key: ARROW-11057
URL: https://issues.apache.org/jira/browse/ARROW-11057
Project: Apache Arrow
Issue Type: Bug
Components: Python
Affects Versions: 2.0.0
Reporter: David Quijano
I have been reading and writing some tables to Parquet and found some inconsistencies.
{code:java}
import pyarrow as pa
import pyarrow.parquet as pq

# create a table with some data
a = pa.Table.from_pydict({'x': [1]*100, 'y': [2]*100, 'z': [3]*100})
# write it to file
pq.write_table(a, 'test.parquet')
# read the same file back
b = pq.read_table('test.parquet')
# a == b is True, that's good
# write table b to a second file
pq.write_table(b, 'test2.parquet')
# test.parquet is different from test2.parquet{code}
Basically it is:
* Create table in memory
* Write it to file
* Read it again
* Write it to a different file
The files are not the same. The second one contains extra information.
The differences are consistent across different compressions (I tried snappy and zstd).
Also, reading the second file and writing it again produces a file identical to the second one.
Is this a bug or expected behavior?
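For anyone who wants to confirm the difference at the byte level, the comparison can be sketched as follows. This is a minimal sketch using only the Python standard library; the helper name compare_files is hypothetical and is not part of pyarrow.

```python
import hashlib
from pathlib import Path


def compare_files(path_a, path_b):
    """Return a small report on how two files differ at the byte level.

    Hypothetical helper for illustration; not part of pyarrow.
    """
    data_a = Path(path_a).read_bytes()
    data_b = Path(path_b).read_bytes()

    # Offset of the first differing byte within the common prefix, if any.
    first_diff = next(
        (i for i, (x, y) in enumerate(zip(data_a, data_b)) if x != y),
        None,
    )
    # If one file is a strict prefix of the other, the difference
    # starts where the shorter file ends.
    if first_diff is None and len(data_a) != len(data_b):
        first_diff = min(len(data_a), len(data_b))

    return {
        "identical": data_a == data_b,
        "size_a": len(data_a),
        "size_b": len(data_b),
        "first_diff_offset": first_diff,
        "sha256_a": hashlib.sha256(data_a).hexdigest(),
        "sha256_b": hashlib.sha256(data_b).hexdigest(),
    }
```

Calling compare_files('test.parquet', 'test2.parquet') after running the steps above should report the size difference and the offset where the files first diverge. To see where the extra information lives, pq.read_metadata and pq.read_schema can be run on each file and their outputs compared.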