Posted to dev@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2022/01/01 06:39:00 UTC

[jira] [Created] (DRILL-8099) Parquet record writer does not convert Drill local timestamp to UTC

Paul Rogers created DRILL-8099:
----------------------------------

             Summary: Parquet record writer does not convert Drill local timestamp to UTC
                 Key: DRILL-8099
                 URL: https://issues.apache.org/jira/browse/DRILL-8099
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.19.0
            Reporter: Paul Rogers
            Assignee: Paul Rogers


Drill follows the convention of older SQL engines and stores the `TIMESTAMP` type in the local time zone. This is, of course, highly awkward today, when most products use UTC as the standard for timestamps. However, it is how Drill works. (It would be great to add a `UTC_TIMESTAMP` type, but that is another topic.)

Each reader or writer that works with files that hold UTC timestamps must convert between UTC and Drill's local-time timestamp: readers convert from UTC to local time, and writers convert from local time back to UTC. Otherwise, Drill works correctly only when the server time zone is set to UTC.
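The two directions of conversion can be sketched with `java.time`. This is a minimal sketch, not Drill code: it assumes a Drill local-time timestamp is the local wall-clock date-time encoded as epoch milliseconds as if it were UTC, and the class and method names (`LocalTimestampConversions`, `utcToLocal`, `localToUtc`) are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/**
 * Hypothetical helper (not part of Drill). Assumes a "local-time" timestamp
 * is the wall-clock date-time in some zone, encoded as epoch millis as if UTC.
 */
final class LocalTimestampConversions {

  private LocalTimestampConversions() {}

  /** Reader direction: true UTC instant from the file -> local-time value. */
  static long utcToLocal(long utcMillis, ZoneId zone) {
    // Wall-clock date-time in the given zone at that instant.
    LocalDateTime wallClock =
        LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), zone);
    // Re-encode those wall-clock fields as if they were UTC.
    return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
  }

  /** Writer direction: local-time value -> true UTC instant for the file. */
  static long localToUtc(long localMillis, ZoneId zone) {
    // Decode the wall-clock fields that were encoded as if UTC.
    LocalDateTime wallClock =
        LocalDateTime.ofInstant(Instant.ofEpochMilli(localMillis), ZoneOffset.UTC);
    // Interpret them in the given zone to recover the real instant.
    return wallClock.atZone(zone).toInstant().toEpochMilli();
  }
}
```

When the zone is UTC both methods are the identity, which is why the missing conversion goes unnoticed on a UTC server. Note that `atZone` resolves wall-clock times that fall in a daylight-saving gap or overlap by a fixed rule, so a round trip is not guaranteed to be exact at those transitions.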

Now, perhaps we can convince most shops to run their Drill server in UTC, or at least to set the JVM time zone to UTC. However, this still leaves developers in a lurch: if the development machine's time zone is not UTC, then some tests fail. In particular:

{{TestNestedDateTimeTimestamp.testNestedDateTimeCTASParquet}}

The reason that the above test fails is that the generated Parquet writer code assumes (incorrectly) that the Drill timestamp is in UTC and so no conversion is needed to write that data into Parquet. In particular, in {{ParquetOutputRecordWriter.getNewTimeStampConverter()}}:

{noformat}
    reader.read(holder);
    consumer.addLong(holder.value);
{noformat}
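A possible shape for the writer-side fix is sketched below. It is a hypothetical, self-contained illustration rather than Drill's generated code: a `java.util.function.LongConsumer` stands in for the Parquet consumer's `addLong(long)` call, the class name `UtcTimestampWriterSketch` is invented, and it assumes the same local-time encoding as above (wall-clock date-time stored as epoch millis as if UTC).

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.function.LongConsumer;

/** Hypothetical sketch of a timestamp converter for a Parquet writer. */
final class UtcTimestampWriterSketch {
  private final LongConsumer addLong;  // stand-in for consumer.addLong(...)
  private final ZoneId localZone;      // zone Drill's local-time values are in

  UtcTimestampWriterSketch(LongConsumer addLong, ZoneId localZone) {
    this.addLong = addLong;
    this.localZone = localZone;
  }

  /** Mirrors the current behavior: the local-time value is written as-is. */
  void writeUnconverted(long drillLocalMillis) {
    addLong.accept(drillLocalMillis);
  }

  /** Proposed behavior: convert the local-time value to a UTC instant first. */
  void writeAsUtc(long drillLocalMillis) {
    // Decode the wall-clock fields, then interpret them in the local zone.
    LocalDateTime wallClock = LocalDateTime.ofInstant(
        Instant.ofEpochMilli(drillLocalMillis), ZoneOffset.UTC);
    addLong.accept(wallClock.atZone(localZone).toInstant().toEpochMilli());
  }
}
```

In a real writer the zone would presumably come from the server or JVM default (for example `ZoneId.systemDefault()`); passing it in explicitly, as here, keeps the behavior testable on a machine in any time zone.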



--
This message was sent by Atlassian Jira
(v8.20.1#820001)