Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/03/10 18:21:23 UTC

[GitHub] [arrow] dmgcodevil commented on issue #12597: Does Pyarrow support Parquet 2.0

dmgcodevil commented on issue #12597:
URL: https://github.com/apache/arrow/issues/12597#issuecomment-1064360092


   @jorisvandenbossche, the files that Pyarrow successfully reads were written by the Spark/Iceberg data source and Iceberg's [ParquetWriter](https://github.com/apache/iceberg/blob/master/parquet/src/main/java/org/apache/iceberg/parquet/ParquetWriter.java).
   
   The files that Pyarrow fails to read are written via the Trino Iceberg catalog (connector). In theory, both Trino and Spark should use Iceberg's ParquetWriter, which internally uses the Hadoop Parquet writer.
   
   What I've found is that some columns are encoded with DELTA_BYTE_ARRAY. Does Pyarrow support this encoding? I know that Fastparquet does not. I also found this [ticket](https://issues.apache.org/jira/browse/ARROW-6057?src=confmacro); is it still relevant?
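   One way to check this directly is a round-trip test: write a string column with DELTA_BYTE_ARRAY and read it back. This is only a sketch; it assumes a recent pyarrow where the `column_encoding` writer option exists (added in 8.0) and where the DELTA_BYTE_ARRAY encoder is available (the decoder landed earlier than the encoder). The file name is arbitrary.

   ```python
   import pyarrow as pa
   import pyarrow.parquet as pq

   table = pa.table({"s": ["alpha", "alphabet", "alphanumeric", "beta"]})

   # DELTA_BYTE_ARRAY cannot be combined with dictionary encoding,
   # so dictionary encoding must be disabled for the column.
   pq.write_table(
       table,
       "delta_byte_array.parquet",
       use_dictionary=False,
       column_encoding={"s": "DELTA_BYTE_ARRAY"},
   )

   # Inspect the encodings recorded in the file metadata.
   meta = pq.ParquetFile("delta_byte_array.parquet").metadata
   encodings = meta.row_group(0).column(0).encodings
   print(encodings)  # exact tuple may vary, but should include DELTA_BYTE_ARRAY

   # If the decoder is supported, this reads back the original data.
   roundtrip = pq.read_table("delta_byte_array.parquet")
   print(roundtrip.equals(table))
   ```

   If the read side fails with a "not implemented" error for the encoding, that would confirm the decoder is missing in the installed version.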


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org