Posted to jira@arrow.apache.org by "Kevin (Jira)" <ji...@apache.org> on 2021/10/22 00:58:00 UTC
[jira] [Created] (ARROW-14432) created_by is not exposed in the
python wrapper, creating reader side issue.
Kevin created ARROW-14432:
-----------------------------
Summary: created_by is not exposed in the python wrapper, creating reader side issue.
Key: ARROW-14432
URL: https://issues.apache.org/jira/browse/ARROW-14432
Project: Apache Arrow
Issue Type: Bug
Reporter: Kevin
The current Python wrapper does not expose created_by:
[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L361]
However, it is available in the C++ version:
[https://github.com/apache/arrow/blob/4591d76fce2846a29dac33bf01e9ba0337b118e9/cpp/src/parquet/properties.h#L249]
[https://github.com/apache/arrow/blob/master/python/pyarrow/_parquet.pxd#L320]
This causes a problem when the Hadoop Parquet reader reads a Parquet file written by pyarrow.
See the Stack Overflow discussion:
[https://stackoverflow.com/questions/69658140/how-to-save-a-parquet-with-pandas-using-same-header-than-hadoop-spark-parquet?noredirect=1#comment123131862_69658140]
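To see which created_by value a given file carries, the footer metadata can be inspected from Python. The following is a minimal sketch, not part of the original report: the helper names read_created_by and parse_created_by are hypothetical, and the parser assumes the "<application> version <version>" layout recommended by the Parquet format, which a writer is not required to follow.

```python
import re
from typing import Optional, Tuple


def read_created_by(path: str) -> str:
    """Return the created_by string stored in a Parquet file's footer."""
    # Imported lazily so the parsing helper below works without pyarrow.
    import pyarrow.parquet as pq

    return pq.read_metadata(path).created_by


def parse_created_by(created_by: str) -> Optional[Tuple[str, str]]:
    """Split a created_by string into (application, version).

    Assumes the "<application> version <version> ..." layout; returns
    None when the string does not match that layout.
    """
    match = re.match(r"^(?P<app>.+?) version (?P<version>\S+)", created_by)
    if match is None:
        return None
    return match.group("app"), match.group("version")
```

Calling parse_created_by(read_created_by(path)) on a file written by pyarrow and on one written by Hadoop or Spark should make the difference in the writer string visible. Reading the value this way already works; what this issue asks for is the ability to set it on the writer side.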
Development effort should be minimal.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)