Posted to user@drill.apache.org by Bjørn Jørgensen <bj...@gmail.com> on 2021/10/06 15:27:06 UTC

Problems with datetime in parquet files

Hi, I filed a bug in Apache Spark about problems with datetime columns.

It looks like Apache Drill only implements TIMESTAMP_MILLIS for Parquet.
TIMESTAMP_MICROS is also part of the Parquet standard, but the read path
for this type seems to be missing in Drill.

The bug report
https://issues.apache.org/jira/browse/SPARK-36934

-- 
Bjørn Jørgensen

Re: Problems with datetime in parquet files

Posted by James Turton <ja...@somecomputer.xyz.INVALID>.
Hi Bjørn

See this PR which will soon be merged.

https://github.com/apache/drill/pull/2370/files

James



Re: Problems with datetime in parquet files

Posted by James Turton <ja...@somecomputer.xyz.INVALID>.
Thank you for reporting this, Bjørn. I found this Drill ticket

https://issues.apache.org/jira/browse/DRILL-6670

which contains the following in the comments:
> This change aims at restoring original functionality - handling 
> `TIMESTAMP_MICROS` as `INT64` with no logical type in both Parquet 
> readers. It doesn't seem to make sense to do more since 
> `TIMESTAMP_MICROS` is deprecated logical type as per Parquet [current
> documentation](https://github.com/apache/parquet-format/blob/master/LogicalTypes.md).

On the parquet-format page I found:
> `TIMESTAMP_MICROS` is the deprecated ConvertedType counterpart of a
> `TIMESTAMP` logical type that is UTC normalized and has `MICROS`
> precision. Like the logical type counterpart, it must annotate an `int64`.

So I guess we won't ever add TIMESTAMP_MICROS, but we would certainly
want to support TIMESTAMP with MICROS precision (I don't know our
current status there).
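
To make the quoted definition concrete, here is a minimal sketch, in Python rather than Drill's Java, of how a reader might interpret such an int64 value: as a count of microseconds since the Unix epoch, normalized to UTC. This is only an illustration under my own assumptions and not Drill's reader code; the function names micros_to_datetime and micros_to_millis are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware UTC instant.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def micros_to_datetime(micros: int) -> datetime:
    """Interpret an int64 count of microseconds since the epoch as a UTC instant."""
    return EPOCH + timedelta(microseconds=micros)


def micros_to_millis(micros: int) -> int:
    """Reduce a microsecond count to millisecond precision.

    Floor division rounds toward negative infinity, so instants before
    the epoch are moved earlier rather than toward zero.
    """
    return micros // 1000


if __name__ == "__main__":
    raw = 1_633_534_026_123_456  # an example int64 value, as it might be stored in a column
    print(micros_to_datetime(raw).isoformat())
    print(micros_to_millis(raw))
```

The second function shows what is lost when a microsecond value is squeezed into a millisecond-only timestamp type: the last three digits are dropped.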
