Posted to issues@drill.apache.org by "Paul Rogers (Jira)" <ji...@apache.org> on 2022/01/01 07:09:00 UTC

[jira] [Updated] (DRILL-8099) Parquet record writer does not convert Drill local timestamp to UTC

     [ https://issues.apache.org/jira/browse/DRILL-8099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Paul Rogers updated DRILL-8099:
-------------------------------
    Description: 
Drill follows the old SQL engine convention of storing the {{TIMESTAMP}} type in the local time zone. This is, of course, highly awkward today, when UTC is the standard timestamp representation in most products. However, it is how Drill works. (It would be great to add a {{UTC_TIMESTAMP}} type, but that is another topic.)

Each reader or writer that works with files that hold UTC timestamps must convert to (reader) or from (writer) Drill's local-time timestamp. Otherwise, Drill works correctly only when the server time zone is set to UTC.
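To make the conversion concrete, here is a minimal, hedged sketch (not Drill source; the class and method names are made up for illustration). It assumes the usual representation in which Drill's {{TIMESTAMP}} long holds the local wall-clock time encoded as millis "as if" it were UTC:

{noformat}
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Sketch only; names are hypothetical, not existing Drill helpers.
public class DrillTimestampSketch {

  // Writer side: Drill local-time millis -> true UTC epoch millis (e.g. for Parquet).
  static long localToUtcMillis(long drillLocalMillis) {
    LocalDateTime wallClock =
        LocalDateTime.ofInstant(Instant.ofEpochMilli(drillLocalMillis), ZoneOffset.UTC);
    return wallClock.atZone(ZoneId.systemDefault()).toInstant().toEpochMilli();
  }

  // Reader side: true UTC epoch millis -> Drill local-time millis.
  static long utcToLocalMillis(long utcMillis) {
    LocalDateTime wallClock =
        LocalDateTime.ofInstant(Instant.ofEpochMilli(utcMillis), ZoneId.systemDefault());
    return wallClock.toInstant(ZoneOffset.UTC).toEpochMilli();
  }
}
{noformat}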

Now, perhaps we can convince most shops to run their Drill servers in UTC, or at least set the JVM time zone to UTC. However, this still leaves developers in the lurch: if the development machine's time zone is not UTC, then some tests fail. In particular:

{{TestNestedDateTimeTimestamp.testNestedDateTimeCTASParquet}}

The reason that the above test fails is that the generated Parquet writer code assumes (incorrectly) that the Drill timestamp is in UTC and so no conversion is needed to write that data into Parquet. In particular, in {{ParquetOutputRecordWriter.getNewTimeStampConverter()}}:

{noformat}
    reader.read(holder);
    // Drill's local-time millis are written unchanged; no local-to-UTC conversion.
    consumer.addLong(holder.value);
{noformat}
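For contrast, a hedged sketch of the same spot with the conversion applied; {{localToUtcMillis()}} is the hypothetical helper sketched earlier, not an existing Drill method:

{noformat}
    reader.read(holder);
    // Shift Drill's local wall-clock millis to true UTC millis before writing.
    consumer.addLong(localToUtcMillis(holder.value));  // hypothetical helper, see sketch above
{noformat}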

The JSON writer has the same problem:

{noformat}
  @Override
  public void writeTimestamp(FieldReader reader) throws IOException {
    if (reader.isSet()) {
      writeTimestamp(reader.readLocalDateTime());
    } else {
      writeTimeNull();
    }
  }
{noformat}

Basically, it takes a {{LocalDateTime}} and formats it as a UTC timestamp (using the "Z" suffix). This is only valid if the machine is in the UTC time zone, which is why the test for this class attempts to force the local time zone to UTC, something that most users will not do.
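For illustration only, one way to produce a genuinely UTC string from the local wall-clock value; this is a sketch, not the writer's actual code, and it reuses the {{reader}} from the snippet above:

{noformat}
// Sketch only: turn Drill's local wall-clock LocalDateTime into a real UTC instant
// before formatting, so the trailing "Z" is actually true.
java.time.LocalDateTime wallClock = reader.readLocalDateTime();      // local wall-clock time
String utcText = wallClock
    .atZone(java.time.ZoneId.systemDefault())                        // interpret in the server zone
    .withZoneSameInstant(java.time.ZoneOffset.UTC)                   // shift to UTC
    .format(java.time.format.DateTimeFormatter.ISO_OFFSET_DATE_TIME); // ends with "Z"
{noformat}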

> Parquet record writer does not convert Drill local timestamp to UTC
> -------------------------------------------------------------------
>
>                 Key: DRILL-8099
>                 URL: https://issues.apache.org/jira/browse/DRILL-8099
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.19.0
>            Reporter: Paul Rogers
>            Assignee: Paul Rogers
>            Priority: Major
>



--
This message was sent by Atlassian Jira
(v8.20.1#820001)