Posted to jira@arrow.apache.org by "Max Burke (Jira)" <ji...@apache.org> on 2021/01/20 00:03:00 UTC

[jira] [Created] (ARROW-11324) [Rust] Querying datetime data in DataFusion with an embedded timezone always fails

Max Burke created ARROW-11324:
---------------------------------

             Summary: [Rust] Querying datetime data in DataFusion with an embedded timezone always fails
                 Key: ARROW-11324
                 URL: https://issues.apache.org/jira/browse/ARROW-11324
             Project: Apache Arrow
          Issue Type: Bug
          Components: Rust - DataFusion
            Reporter: Max Burke


We have hundreds of thousands of Parquet files with embedded Arrow schemas whose time-valued columns have the type Timestamp(TimeUnit::Nanosecond, Some("UTC")).

 

One of the changes in the Arrow 2 -> 3 development window was to make the Parquet loader prefer the embedded Arrow schema over the one generated from the Parquet columns.

 

But because DataFusion has the timezone field of the Timestamp variant hardcoded as None, we can't query any of our data after this upgrade; we get errors like:



{{SELECT * FROM parquet_table WHERE ("timestamp" >= to_timestamp('2010-03-24T13:00:00.000000Z') AND "timestamp" <= to_timestamp('2010-03-25T00:00:00.000000Z')) ORDER BY timestamp ASC NULLS LAST;}}
{{Plan("\'Timestamp(Nanosecond, Some(\"UTC\")) >= Timestamp(Nanosecond, None)\' can\'t be evaluated because there isn\'t a common type to coerce the types to")}}
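
To illustrate why the plan fails, here is a minimal sketch of the mismatch. It assumes the coercion step only accepts timestamp types that are exactly equal. The TimeUnit and DataType enums below are hypothetical stand-ins that mirror the shape of Arrow's types, not the real arrow crate, and strict_common_type and lenient_common_type are made-up names, not DataFusion functions. The lenient rule is just one possible relaxation, not a claim about how DataFusion does or should resolve this.

```rust
// Hypothetical stand-ins for Arrow's types (NOT the real arrow crate).
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    // Unit plus an optional timezone, as in the error message above.
    Timestamp(TimeUnit, Option<String>),
    Utf8,
}

/// Strict rule (assumed): a common type exists only if both sides are identical.
#[allow(dead_code)]
fn strict_common_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    if lhs == rhs {
        Some(lhs.clone())
    } else {
        None
    }
}

/// Lenient rule (hypothetical): timestamps with the same unit are compatible
/// when at most one side names a timezone, or both name the same one.
#[allow(dead_code)]
fn lenient_common_type(lhs: &DataType, rhs: &DataType) -> Option<DataType> {
    match (lhs, rhs) {
        (DataType::Timestamp(lu, ltz), DataType::Timestamp(ru, rtz)) if lu == ru => {
            match (ltz, rtz) {
                // Two different explicit timezones: no common type.
                (Some(l), Some(r)) if l != r => None,
                // Exactly one timezone, or two equal ones: keep it.
                (Some(tz), _) | (None, Some(tz)) => {
                    Some(DataType::Timestamp(lu.clone(), Some(tz.clone())))
                }
                (None, None) => Some(DataType::Timestamp(lu.clone(), None)),
            }
        }
        _ => strict_common_type(lhs, rhs),
    }
}
```

With the column typed as Timestamp(Nanosecond, Some("UTC")) and the to_timestamp result typed as Timestamp(Nanosecond, None), the strict rule finds no common type, which matches the reported Plan error. The lenient rule would resolve both sides to the UTC-tagged type.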

 

Any ideas/thoughts? 


