Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/23 21:17:43 UTC

[GitHub] [arrow-rs] tustvold commented on issue #1932: unable to write parquet file with UTC timestamp

tustvold commented on issue #1932:
URL: https://github.com/apache/arrow-rs/issues/1932#issuecomment-1164881745

   Could you expand a bit on what the expected behaviour is? I honestly cannot find any comprehensive document on how this is supposed to be handled. It's one of the many data model mismatches between arrow and parquet where it isn't clearly defined what is "correct" - #1666.
   
   Ultimately Parquet does not have a native mechanism to encode timezone information in its schema, instead recording only whether a timestamp is adjusted to UTC (the `isAdjustedToUTC` flag) - https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp. The arrow schema is embedded in the parquet file, but as documented in #1663 this cannot be treated as authoritative.
   
   What I can say is the following:
   
   * The timezone is being stored in the embedded schema
   * As of parquet 15.0.0, in particular https://github.com/apache/arrow-rs/pull/1682, parquet-rs roundtrips timezones correctly
   * pqrs is on parquet 12.0.0 where timezones did not roundtrip correctly
   * pyarrow appears to ignore the timezone stored within the arrow schema
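
   To make the mismatch concrete, here is a minimal sketch in plain Rust. It is a simplified model, not the arrow-rs or parquet-rs API: every type and function name below is hypothetical, Arrow's seconds unit is left out, and the rule "any timezone maps to `isAdjustedToUTC = true`" is an assumption made for illustration. It shows why a reader that looks only at the Parquet logical type cannot recover the original zone name, while a reader that also consults the embedded Arrow schema can, provided the hint is consistent with the Parquet type.

```rust
// Simplified, hypothetical model of the two type systems.
// These are NOT the real arrow-rs / parquet-rs types.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Unit {
    Millis,
    Micros,
    Nanos,
}

/// Arrow-style timestamp type: a unit plus an optional timezone string.
#[derive(Debug, Clone, PartialEq)]
pub struct ArrowTimestamp {
    pub unit: Unit,
    pub timezone: Option<String>,
}

/// Parquet TIMESTAMP logical type: a unit plus a single boolean flag.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ParquetTimestamp {
    pub unit: Unit,
    pub is_adjusted_to_utc: bool,
}

/// Assumed mapping: the presence of any timezone means the stored values
/// are instants normalized to UTC. The zone name itself has nowhere to go.
pub fn to_parquet(ts: &ArrowTimestamp) -> ParquetTimestamp {
    ParquetTimestamp {
        unit: ts.unit,
        is_adjusted_to_utc: ts.timezone.is_some(),
    }
}

/// Reader that ignores the embedded Arrow schema: the best it can do
/// for a UTC-adjusted column is to report "UTC".
pub fn from_parquet(pq: ParquetTimestamp) -> ArrowTimestamp {
    ArrowTimestamp {
        unit: pq.unit,
        timezone: if pq.is_adjusted_to_utc {
            Some("UTC".to_string())
        } else {
            None
        },
    }
}

/// Reader that treats the embedded Arrow schema as a hint only: it is
/// used when it agrees with the Parquet type and discarded otherwise.
pub fn from_parquet_with_hint(
    pq: ParquetTimestamp,
    hint: Option<&ArrowTimestamp>,
) -> ArrowTimestamp {
    match hint {
        Some(h) if h.unit == pq.unit && h.timezone.is_some() == pq.is_adjusted_to_utc => {
            h.clone()
        }
        _ => from_parquet(pq),
    }
}
```

   In this model, a column typed as `ArrowTimestamp { unit: Unit::Micros, timezone: Some("Europe/Paris".into()) }` comes back as "UTC" through `from_parquet`, but roundtrips unchanged through `from_parquet_with_hint` when the original type is passed as the hint.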
   
   
   

