
Posted to issues@arrow.apache.org by "Blaž Zupančič (Jira)" <ji...@apache.org> on 2022/07/12 18:06:00 UTC

[jira] [Created] (ARROW-17058) Timezone aware parquet read with schema and filters

Blaž Zupančič created ARROW-17058:
-------------------------------------

             Summary: Timezone aware parquet read with schema and filters
                 Key: ARROW-17058
                 URL: https://issues.apache.org/jira/browse/ARROW-17058
             Project: Apache Arrow
          Issue Type: Bug
          Components: Parquet, Python
    Affects Versions: 8.0.0
            Reporter: Blaž Zupančič
         Attachments: output.txt, pyarrow_bug.py, spark-3.1.parquet, spark-3.2.parquet, spark_parquet.py

The parquet.read_table() method in pyarrow 8.0.0 added a `schema` parameter, which is great for handling timestamps: they are correctly converted from UTC to the timezone specified in the schema.

However, when `schema` is used together with `filters`, timezone conversion fails with the error "Cannot compare timestamp with timezone to timestamp without timezone". This was tested on two files created with different versions of Spark. The test code, files, and output are attached.


