You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by Parth Chandra <pa...@apache.org> on 2017/03/13 21:44:23 UTC

Drill date & time types encoding

Paul asked this and I'm posting here so someone who knows better can
correct me if I'm wrong ( This is from my notes when I was young)

DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
TIME : Int32 : Milliseconds from midnight on 1/1/1970
TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
TimeZones)
TimeStamp : Int64 : Milliseconds from epoch
Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
Interval Day : Int32 + Int32 : Days + Milliseconds
Interval Year : Int32 : Month

A slightly readable version of these can be found in the C++ client :).
$drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
has a bunch of 'Holder' structs for the date-time types.

HTH

Parth

Re: Drill date & time types encoding

Posted by Jinfeng Ni <jn...@apache.org>.
yes, you are right there is difference between "should" and "may". I
"should" use "may" in the first place.

However, I do not think the interpretation of a timestamp is entirely
up to the user (Do you mean the end user of a database product?).
What if the implementation of database takes a different
interpretation? For instance, when the system has to do conversion
between a timestamp w/ tz and a timestamp w/o tz, "SQL assumes the
value without time zone to be local, subtracts the current default
time zone displacement of the SQL-session from it to give UTC, and
associates that time zone displacement with the result."

I think what we are talking about is the semantics of those different
data types in a database system; it has nothing to do whether the
system is using JVM, or running on linux or other operation system.





On Thu, Mar 16, 2017 at 5:39 PM, Julian Hyde <jh...@apache.org> wrote:
> The difference between “should” and “may”. The interpretation of a timestamp is entirely up to the user.
>
> If I am reading a timestamp value from a database, and I know it to be in local timezone, I use the ResultSet.getTimestamp(int) method, which (per the JDBC spec) interprets the value as being in my JVM’s time zone, and converts it into an instant (a java.sql.Timestamp) accordingly. Thus ‘1970-01-01 00:00:00’ will become ‘1969-12-31 16:00:00 UTC’ since I am in pacific time.
>
> If I am reading a timestamp value from a database, and I know it to be in some other timezone, I use the ResultSet.getTimestamp(int, Calendar) method, which applies the time zone inside the calendar.
>
> I HAVE to provide a timezone, implicitly or explicitly, when reading a TIMESTAMP value from a database via JDBC into a java.sql.Timestamp.  Why? Because I am converting a zoneless value into an instant.
>
> Note that the database can do quite a few operations on a timestamp without knowing its time zone. For instance "CAST(ts AS VARCHAR)" and "EXTRACT(HOUR FROM ts)” and “ts + INTERVAL ‘1’ DAY” all make sense.
>
> Julian
>
>
>
>> On Mar 16, 2017, at 4:54 PM, Jinfeng Ni <jn...@apache.org> wrote:
>>
>> On Thu, Mar 16, 2017 at 4:41 PM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>>>
>>>> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:
>>>>
>>>> Time/Timestamp without t/z should be interpreted as local time.
>>>
>>>
>>> No.
>>>
>>> If I am in pacific time and I have a TIMESTAMP value “1970-01-01 12:00:00” and I send it to you in central european time you receive a TIMESTAMP value “1970-01-01 12:00:00”.
>>>
>>> Its time zone is not my local time zone, or your local time zone, or UTC. It has no time zone.
>>>
>>>
>>
>> Maybe I did not read the SQL 2011 ISO/IEC 9075-2:2011(E) correctly.
>> How do we interpret the following ?
>>
>> "A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
>> WITHOUT TIME ZONE, may represent a local time"
>

Re: Drill date & time types encoding

Posted by Julian Hyde <jh...@apache.org>.
The difference between “should” and “may”. The interpretation of a timestamp is entirely up to the user.

If I am reading a timestamp value from a database, and I know it to be in local timezone, I use the ResultSet.getTimestamp(int) method, which (per the JDBC spec) interprets the value as being in my JVM’s time zone, and converts it into an instant (a java.sql.Timestamp) accordingly. Thus ‘1970-01-01 00:00:00’ will become ‘1969-12-31 16:00:00 UTC’ since I am in pacific time.

If I am reading a timestamp value from a database, and I know it to be in some other timezone, I use the ResultSet.getTimestamp(int, Calendar) method, which applies the time zone inside the calendar.

I HAVE to provide a timezone, implicitly or explicitly, when reading a TIMESTAMP value from a database via JDBC into a java.sql.Timestamp.  Why? Because I am converting a zoneless value into an instant.

Note that the database can do quite a few operations on a timestamp without knowing its time zone. For instance "CAST(ts AS VARCHAR)" and "EXTRACT(HOUR FROM ts)” and “ts + INTERVAL ‘1’ DAY” all make sense.

Julian



> On Mar 16, 2017, at 4:54 PM, Jinfeng Ni <jn...@apache.org> wrote:
> 
> On Thu, Mar 16, 2017 at 4:41 PM, Julian Hyde <jhyde@apache.org <ma...@apache.org>> wrote:
>> 
>>> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:
>>> 
>>> Time/Timestamp without t/z should be interpreted as local time.
>> 
>> 
>> No.
>> 
>> If I am in pacific time and I have a TIMESTAMP value “1970-01-01 12:00:00” and I send it to you in central european time you receive a TIMESTAMP value “1970-01-01 12:00:00”.
>> 
>> Its time zone is not my local time zone, or your local time zone, or UTC. It has no time zone.
>> 
>> 
> 
> Maybe I did not read the SQL 2011 ISO/IEC 9075-2:2011(E) correctly.
> How do we interpret the following ?
> 
> "A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
> WITHOUT TIME ZONE, may represent a local time"


Re: Drill date & time types encoding

Posted by Jinfeng Ni <jn...@apache.org>.
On Thu, Mar 16, 2017 at 4:41 PM, Julian Hyde <jh...@apache.org> wrote:
>
>> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:
>>
>> Time/Timestamp without t/z should be interpreted as local time.
>
>
> No.
>
> If I am in pacific time and I have a TIMESTAMP value “1970-01-01 12:00:00” and I send it to you in central european time you receive a TIMESTAMP value “1970-01-01 12:00:00”.
>
> Its time zone is not my local time zone, or your local time zone, or UTC. It has no time zone.
>
>

Maybe I did not read the SQL 2011 ISO/IEC 9075-2:2011(E) correctly.
How do we interpret the following ?

"A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
WITHOUT TIME ZONE, may represent a local time"

Re: Drill date & time types encoding

Posted by Julian Hyde <jh...@apache.org>.
> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:
> 
> Time/Timestamp without t/z should be interpreted as local time.


No.

If I am in pacific time and I have a TIMESTAMP value “1970-01-01 12:00:00” and I send it to you in central european time you receive a TIMESTAMP value “1970-01-01 12:00:00”.

Its time zone is not my local time zone, or your local time zone, or UTC. It has no time zone.

Julian


Re: Drill date & time types encoding

Posted by Chunhui Shi <cs...@mapr.com>.
I think they are using the same timezone data from IANA,

For Java the timezone data can be found under jre/lib/zi, and Oracle has a timezone update tool too.
For enterprise software vendors, timezone update actually is a big thing.
________________________________
From: Boaz Ben-Zvi <bb...@mapr.com>
Sent: Thursday, March 16, 2017 6:08:01 PM
To: dev@drill.apache.org
Subject: Re: Drill date & time types encoding

  Timezone calculations are not simple ( e.g.,  “2017-03-11 23:30:00-PST” + INTERVAL ‘3’ HOURS  --> need to know about daylight savings time, etc.)



  Linux does have a timezone. The actual implementation is quite complex – it keeps an elaborate “database” under /usr/share/zoneinfo , (which needs to be updated periodically, e.g. by running “yum update tzdata”).



Is Java’s TZ support (https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html) equivalent to Linux ?



-       Boaz



On 3/16/17, 4:48 PM, "Paul Rogers" <pr...@mapr.com> wrote:



    Thanks all for the explanations!



    Did a bit of poking around. See DRILL-5360. For the Timestamp type:



    * Literals are claimed to be in UTC (have not yet tested)

    * Value vectors store Timestamps in server local time

    * Drill clients get the Timestamp in server local time

    * JDBC clients try to convert server local time to UTC, but use the client timezone to do so.



    The result is that clients must know the server timezone, but Drill does not provide this info. Drill clients must convert from server timezone to UTC to get the UTC value of a timestamp.



    JDBC clients must convert from the “UTC” given from JDBC to true UTC by subtracting the difference between server and client timezone offsets.



    I suspect, as Jinfeng points out, that much of the confusion comes from the conflicting use of the term “timestamp” in the SQL 2011 standard [1] and standard Linux/Java practice.



    In Linux and Java, a “timestamp” is ms since the Unix epoch, UTC. (That is, the UTC timestamp is implied, so all machines anywhere agree on what a time means.)



    The SQL TIMESTAMP is what most databases call a DATETIME: a combination of a date and time that are “free floating”: there is no implied time zone. “3 PM” is just that, it does not imply “3 PM in Paris.”



    SQL provides a TIMESTAMP WITH TIME ZONE, but that also differs from Linux practice: it is not a UTC time but rather a DATETIME with a associated timezone.



    The Drill Timestamp is neither of these. It like a TIMESTAMP WITH TIMEZONE where the timezone is the server local timezone. But, Drill does not specify that timezone, so the client “just has to know.” Unlike the Linux timestamp, the client & server don’t agree ahead of time by convention; instead every server can have its own Timestamp timezone and the client must figure out the corresponding UTC or client local time.



    What we have can work with clever adjustment programming. But it would be better (for wider adoption) to provide a cleaner, more deterministic API.



    Unfortunately, we probably can’t fix the existing Timestamp as there is probably already code that tries (like JDBC) to work around the current behavior.



    Instead, we should add the SQL TIMESTAMP WITH TIMEZONE. Or add a non-standard “LinuxTimezone” (or “TimezoneUTC”) that sores times in an agreed-upon UTC format.



    Until then, tread carefully.



    - Paul



    [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip



    > On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:

    >

    > My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without

    > timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill

    > probably does not fully support.

    >

    > SQL standards has  DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT

    > TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.

    > Time/Timestamp without t/z should be interpreted as local time.

    >

    > Here is some descriptions in SQL 2011 : Sec 4.6.2.

    >

    > "

    >

    > A datetime data type that specifies WITH TIME ZONE is a data type that

    > is datetime with time zone, while a datetime data type that specifies

    > WITHOUT TIME ZONE is a data type that is datetime without time zone.

    >

    > The surface of the earth is divided into zones, called time zones, in

    > which every correct clock tells the same time, known as local time.

    > Local time is equal to UTC (Coordinated Universal Time) plus the time

    > zone dis- placement, which is an interval value that ranges between

    > INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE.

    >

    > A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP

    > WITHOUT TIME ZONE, may represent a local time, whereas a datetime

    > value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE

    > represents UTC.

    >

    > "

    >

    > On Tue, Mar 14, 2017 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:

    >> I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing.

    >>

    >> I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range.

    >>

    >> Julian

    >>

    >>> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <bb...@mapr.com> wrote:

    >>>

    >>> Thanks for the detailed research, Paul,

    >>>

    >>> INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.

    >>>

    >>> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)

    >>>

    >>> TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)

    >>>

    >>> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).

    >>>

    >>>   Thanks,

    >>>

    >>>          — Boaz

    >>>

    >>>

    >>> On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:

    >>>

    >>> Thanks Parth!

    >>>

    >>> The date and time definitions are the “classic” ones, but conflict with the Drill documentation:

    >>>

    >>> http://drill.apache.org/docs/supported-data-types/

    >>>

    >>> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC

    >>>

    >>> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss

   >>>

    >>> Which is correct?

    >>>

    >>> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)

    >>>

    >>> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.

    >>>

    >>> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?

    >>>

    >>> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?

    >>>

    >>> Again, the documentation differs:

    >>>

    >>> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval

    >>>

    >>> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

    >>>

    >>> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?

    >>>

    >>> If anyone knows, please let me know, else I need to do some poking around...

    >>>

    >>> Thanks,

    >>>

    >>> - Paul

    >>>

    >>> On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:

    >>>

    >>> Paul asked this and I'm posting here so someone who knows better can

    >>> correct me if I'm wrong ( This is from my notes when I was young)

    >>>

    >>> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00

    >>> TIME : Int32 : Milliseconds from midnight on 1/1/1970

    >>> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of

    >>> TimeZones)

    >>> TimeStamp : Int64 : Milliseconds from epoch

    >>> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds

    >>> Interval Day : Int32 + Int32 : Days + Milliseconds

    >>> Interval Year : Int32 : Month

    >>>

    >>> A slightly readable version of these can be found in the C++ client :).

    >>> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which

    >>> has a bunch of 'Holder' structs for the date-time types.

    >>>

    >>> HTH

    >>>

    >>> Parth

    >>>

    >>>

    >>





Re: Drill date & time types encoding

Posted by Boaz Ben-Zvi <bb...@mapr.com>.
  Timezone calculations are not simple ( e.g.,  “2017-03-11 23:30:00-PST” + INTERVAL ‘3’ HOURS  --> need to know about daylight savings time, etc.)



  Linux does have a timezone. The actual implementation is quite complex – it keeps an elaborate “database” under /usr/share/zoneinfo , (which needs to be updated periodically, e.g. by running “yum update tzdata”).



Is Java’s TZ support (https://docs.oracle.com/javase/8/docs/api/java/util/TimeZone.html) equivalent to Linux ?



-       Boaz



On 3/16/17, 4:48 PM, "Paul Rogers" <pr...@mapr.com> wrote:



    Thanks all for the explanations!



    Did a bit of poking around. See DRILL-5360. For the Timestamp type:



    * Literals are claimed to be in UTC (have not yet tested)

    * Value vectors store Timestamps in server local time

    * Drill clients get the Timestamp in server local time

    * JDBC clients try to convert server local time to UTC, but use the client timezone to do so.



    The result is that clients must know the server timezone, but Drill does not provide this info. Drill clients must convert from server timezone to UTC to get the UTC value of a timestamp.



    JDBC clients must convert from the “UTC” given from JDBC to true UTC by subtracting the difference between server and client timezone offsets.



    I suspect, as Jinfeng points out, that much of the confusion comes from the conflicting use of the term “timestamp” in the SQL 2011 standard [1] and standard Linux/Java practice.



    In Linux and Java, a “timestamp” is ms since the Unix epoch, UTC. (That is, the UTC timestamp is implied, so all machines anywhere agree on what a time means.)



    The SQL TIMESTAMP is what most databases call a DATETIME: a combination of a date and time that are “free floating”: there is no implied time zone. “3 PM” is just that, it does not imply “3 PM in Paris.”



    SQL provides a TIMESTAMP WITH TIME ZONE, but that also differs from Linux practice: it is not a UTC time but rather a DATETIME with a associated timezone.



    The Drill Timestamp is neither of these. It like a TIMESTAMP WITH TIMEZONE where the timezone is the server local timezone. But, Drill does not specify that timezone, so the client “just has to know.” Unlike the Linux timestamp, the client & server don’t agree ahead of time by convention; instead every server can have its own Timestamp timezone and the client must figure out the corresponding UTC or client local time.



    What we have can work with clever adjustment programming. But it would be better (for wider adoption) to provide a cleaner, more deterministic API.



    Unfortunately, we probably can’t fix the existing Timestamp as there is probably already code that tries (like JDBC) to work around the current behavior.



    Instead, we should add the SQL TIMESTAMP WITH TIMEZONE. Or add a non-standard “LinuxTimezone” (or “TimezoneUTC”) that sores times in an agreed-upon UTC format.



    Until then, tread carefully.



    - Paul



    [1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip



    > On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:

    >

    > My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without

    > timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill

    > probably does not fully support.

    >

    > SQL standards has  DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT

    > TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.

    > Time/Timestamp without t/z should be interpreted as local time.

    >

    > Here is some descriptions in SQL 2011 : Sec 4.6.2.

    >

    > "

    >

    > A datetime data type that specifies WITH TIME ZONE is a data type that

    > is datetime with time zone, while a datetime data type that specifies

    > WITHOUT TIME ZONE is a data type that is datetime without time zone.

    >

    > The surface of the earth is divided into zones, called time zones, in

    > which every correct clock tells the same time, known as local time.

    > Local time is equal to UTC (Coordinated Universal Time) plus the time

    > zone dis- placement, which is an interval value that ranges between

    > INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE.

    >

    > A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP

    > WITHOUT TIME ZONE, may represent a local time, whereas a datetime

    > value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE

    > represents UTC.

    >

    > "

    >

    > On Tue, Mar 14, 2017 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:

    >> I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing.

    >>

    >> I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range.

    >>

    >> Julian

    >>

    >>> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <bb...@mapr.com> wrote:

    >>>

    >>> Thanks for the detailed research, Paul,

    >>>

    >>> INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.

    >>>

    >>> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)

    >>>

    >>> TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)

    >>>

    >>> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).

    >>>

    >>>   Thanks,

    >>>

    >>>          — Boaz

    >>>

    >>>

    >>> On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:

    >>>

    >>> Thanks Parth!

    >>>

    >>> The date and time definitions are the “classic” ones, but conflict with the Drill documentation:

    >>>

    >>> http://drill.apache.org/docs/supported-data-types/

    >>>

    >>> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC

    >>>

    >>> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss

   >>>

    >>> Which is correct?

    >>>

    >>> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)

    >>>

    >>> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.

    >>>

    >>> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?

    >>>

    >>> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?

    >>>

    >>> Again, the documentation differs:

    >>>

    >>> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval

    >>>

    >>> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

    >>>

    >>> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?

    >>>

    >>> If anyone knows, please let me know, else I need to do some poking around...

    >>>

    >>> Thanks,

    >>>

    >>> - Paul

    >>>

    >>> On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:

    >>>

    >>> Paul asked this and I'm posting here so someone who knows better can

    >>> correct me if I'm wrong ( This is from my notes when I was young)

    >>>

    >>> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00

    >>> TIME : Int32 : Milliseconds from midnight on 1/1/1970

    >>> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of

    >>> TimeZones)

    >>> TimeStamp : Int64 : Milliseconds from epoch

    >>> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds

    >>> Interval Day : Int32 + Int32 : Days + Milliseconds

    >>> Interval Year : Int32 : Month

    >>>

    >>> A slightly readable version of these can be found in the C++ client :).

    >>> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which

    >>> has a bunch of 'Holder' structs for the date-time types.

    >>>

    >>> HTH

    >>>

    >>> Parth

    >>>

    >>>

    >>





Re: Drill date & time types encoding

Posted by Paul Rogers <pr...@mapr.com>.
Thanks all for the explanations!

Did a bit of poking around. See DRILL-5360. For the Timestamp type:

* Literals are claimed to be in UTC (have not yet tested)
* Value vectors store Timestamps in server local time
* Drill clients get the Timestamp in server local time
* JDBC clients try to convert server local time to UTC, but use the client timezone to do so.

The result is that clients must know the server timezone, but Drill does not provide this info. Drill clients must convert from server timezone to UTC to get the UTC value of a timestamp.

JDBC clients must convert from the “UTC” given from JDBC to true UTC by subtracting the difference between server and client timezone offsets.

I suspect, as Jinfeng points out, that much of the confusion comes from the conflicting use of the term “timestamp” in the SQL 2011 standard [1] and standard Linux/Java practice.

In Linux and Java, a “timestamp” is ms since the Unix epoch, UTC. (That is, the UTC timestamp is implied, so all machines anywhere agree on what a time means.)

The SQL TIMESTAMP is what most databases call a DATETIME: a combination of a date and time that are “free floating”: there is no implied time zone. “3 PM” is just that, it does not imply “3 PM in Paris.”

SQL provides a TIMESTAMP WITH TIME ZONE, but that also differs from Linux practice: it is not a UTC time but rather a DATETIME with a associated timezone.

The Drill Timestamp is neither of these. It like a TIMESTAMP WITH TIMEZONE where the timezone is the server local timezone. But, Drill does not specify that timezone, so the client “just has to know.” Unlike the Linux timestamp, the client & server don’t agree ahead of time by convention; instead every server can have its own Timestamp timezone and the client must figure out the corresponding UTC or client local time.

What we have can work with clever adjustment programming. But it would be better (for wider adoption) to provide a cleaner, more deterministic API.

Unfortunately, we probably can’t fix the existing Timestamp as there is probably already code that tries (like JDBC) to work around the current behavior.

Instead, we should add the SQL TIMESTAMP WITH TIMEZONE. Or add a non-standard “LinuxTimezone” (or “TimezoneUTC”) that sores times in an agreed-upon UTC format.

Until then, tread carefully.

- Paul

[1] http://standards.iso.org/ittf/PubliclyAvailableStandards/c053681_ISO_IEC_9075-1_2011.zip

> On Mar 16, 2017, at 4:25 PM, Jinfeng Ni <jn...@apache.org> wrote:
> 
> My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without
> timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill
> probably does not fully support.
> 
> SQL standards has  DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT
> TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.
> Time/Timestamp without t/z should be interpreted as local time.
> 
> Here is some descriptions in SQL 2011 : Sec 4.6.2.
> 
> "
> 
> A datetime data type that specifies WITH TIME ZONE is a data type that
> is datetime with time zone, while a datetime data type that specifies
> WITHOUT TIME ZONE is a data type that is datetime without time zone.
> 
> The surface of the earth is divided into zones, called time zones, in
> which every correct clock tells the same time, known as local time.
> Local time is equal to UTC (Coordinated Universal Time) plus the time
> zone dis- placement, which is an interval value that ranges between
> INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE.
> 
> A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
> WITHOUT TIME ZONE, may represent a local time, whereas a datetime
> value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE
> represents UTC.
> 
> "
> 
> On Tue, Mar 14, 2017 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:
>> I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing.
>> 
>> I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range.
>> 
>> Julian
>> 
>>> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <bb...@mapr.com> wrote:
>>> 
>>> Thanks for the detailed research, Paul,
>>> 
>>> INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.
>>> 
>>> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)
>>> 
>>> TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)
>>> 
>>> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).
>>> 
>>>   Thanks,
>>> 
>>>          — Boaz
>>> 
>>> 
>>> On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:
>>> 
>>> Thanks Parth!
>>> 
>>> The date and time definitions are the “classic” ones, but conflict with the Drill documentation:
>>> 
>>> http://drill.apache.org/docs/supported-data-types/
>>> 
>>> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC
>>> 
>>> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss
>>> 
>>> Which is correct?
>>> 
>>> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)
>>> 
>>> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.
>>> 
>>> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?
>>> 
>>> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?
>>> 
>>> Again, the documentation differs:
>>> 
>>> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval
>>> 
>>> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS
>>> 
>>> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?
>>> 
>>> If anyone knows, please let me know, else I need to do some poking around...
>>> 
>>> Thanks,
>>> 
>>> - Paul
>>> 
>>> On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:
>>> 
>>> Paul asked this and I'm posting here so someone who knows better can
>>> correct me if I'm wrong ( This is from my notes when I was young)
>>> 
>>> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
>>> TIME : Int32 : Milliseconds from midnight on 1/1/1970
>>> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
>>> TimeZones)
>>> TimeStamp : Int64 : Milliseconds from epoch
>>> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
>>> Interval Day : Int32 + Int32 : Days + Milliseconds
>>> Interval Year : Int32 : Month
>>> 
>>> A slightly readable version of these can be found in the C++ client :).
>>> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
>>> has a bunch of 'Holder' structs for the date-time types.
>>> 
>>> HTH
>>> 
>>> Parth
>>> 
>>> 
>> 


Re: Drill date & time types encoding

Posted by Jinfeng Ni <jn...@apache.org>.
My understanding is TIME/TIMESTAMP in Drill is TIME/TIMESTAMP without
timezone. TimeStampTZ is for TIMESTAMP with timezone, which Drill
probably does not fully support.

SQL standards has  DATE, TIME WITHOUT TIME ZONE, TIMESTAMP WITHOUT
TIME ZONE, TIME WITH TIME ZONE, or TIMESTAMP WITH TIME ZONE.
Time/Timestamp without t/z should be interpreted as local time.

Here is some descriptions in SQL 2011 : Sec 4.6.2.

"

A datetime data type that specifies WITH TIME ZONE is a data type that
is datetime with time zone, while a datetime data type that specifies
WITHOUT TIME ZONE is a data type that is datetime without time zone.

The surface of the earth is divided into zones, called time zones, in
which every correct clock tells the same time, known as local time.
Local time is equal to UTC (Coordinated Universal Time) plus the time
zone dis- placement, which is an interval value that ranges between
INTERVAL '–14:00' HOUR TO MINUTE and INTERVAL '+14:00' HOUR TO MINUTE.

A datetime value, of data type TIME WITHOUT TIME ZONE or TIMESTAMP
WITHOUT TIME ZONE, may represent a local time, whereas a datetime
value of data type TIME WITH TIME ZONE or TIMESTAMP WITH TIME ZONE
represents UTC.

"

On Tue, Mar 14, 2017 at 11:56 AM, Julian Hyde <jh...@apache.org> wrote:
> I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing.
>
> I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range.
>
> Julian
>
>> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <bb...@mapr.com> wrote:
>>
>> Thanks for the detailed research, Paul,
>>
>>  INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.
>>
>> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)
>>
>> TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)
>>
>> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).
>>
>>    Thanks,
>>
>>           — Boaz
>>
>>
>> On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:
>>
>> Thanks Parth!
>>
>> The date and time definitions are the “classic” ones, but conflict with the Drill documentation:
>>
>> http://drill.apache.org/docs/supported-data-types/
>>
>> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC
>>
>> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss
>>
>> Which is correct?
>>
>> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)
>>
>> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.
>>
>> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?
>>
>> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?
>>
>> Again, the documentation differs:
>>
>> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval
>>
>> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS
>>
>> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?
>>
>> If anyone knows, please let me know, else I need to do some poking around...
>>
>> Thanks,
>>
>> - Paul
>>
>> On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:
>>
>> Paul asked this and I'm posting here so someone who knows better can
>> correct me if I'm wrong ( This is from my notes when I was young)
>>
>> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
>> TIME : Int32 : Milliseconds from midnight on 1/1/1970
>> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
>> TimeZones)
>> TimeStamp : Int64 : Milliseconds from epoch
>> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
>> Interval Day : Int32 + Int32 : Days + Milliseconds
>> Interval Year : Int32 : Month
>>
>> A slightly readable version of these can be found in the C++ client :).
>> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
>> has a bunch of 'Holder' structs for the date-time types.
>>
>> HTH
>>
>> Parth
>>
>>
>

Re: Drill date & time types encoding

Posted by Julian Hyde <jh...@apache.org>.
I don’t think 4713 BC comes from the SQL standard. That is a Postgres thing.

I believe that the standard says you should support timestamp precision up to 9 (i.e. nanoseconds). 2 ^ 64 nanoseconds is 584 years. So, it’s not possible to cram all of the timestamp values we’d like into a 64 bit integer if you are going to support such a large time range.

Julian
 
> On Mar 14, 2017, at 11:29 AM, Boaz Ben-Zvi <bb...@mapr.com> wrote:
> 
> Thanks for the detailed research, Paul,
> 
>  INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.
> 
> DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)
> 
> TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)
> 
> TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).
> 
>    Thanks,
> 
>           — Boaz
> 
> 
> On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:
> 
> Thanks Parth!
> 
> The date and time definitions are the “classic” ones, but conflict with the Drill documentation:
> 
> http://drill.apache.org/docs/supported-data-types/
> 
> DATE Years, months, and days in YYYY-MM-DD format since 4713 BC
> 
> TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss
> 
> Which is correct?
> 
> If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)
> 
> Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.
> 
> Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?
> 
> How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?
> 
> Again, the documentation differs:
> 
> INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval
> 
> TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS
> 
> So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?
> 
> If anyone knows, please let me know, else I need to do some poking around...
> 
> Thanks,
> 
> - Paul
> 
> On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:
> 
> Paul asked this and I'm posting here so someone who knows better can
> correct me if I'm wrong ( This is from my notes when I was young)
> 
> DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
> TIME : Int32 : Milliseconds from midnight on 1/1/1970
> TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
> TimeZones)
> TimeStamp : Int64 : Milliseconds from epoch
> Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
> Interval Day : Int32 + Int32 : Days + Milliseconds
> Interval Year : Int32 : Month
> 
> A slightly readable version of these can be found in the C++ client :).
> $drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
> has a bunch of 'Holder' structs for the date-time types.
> 
> HTH
> 
> Parth
> 
> 


Re: Drill date & time types encoding

Posted by Boaz Ben-Zvi <bb...@mapr.com>.
Thanks for the detailed research, Paul,

  INTERVAL by the (SQL-99 ?) standard should be either a SECOND-DAY interval or a MONTH-YEAR interval. This is a result of the inconsistent “number of days in a month”.

DATE expressed in days, starting at 4713-BC sound like the standard as well (at least same as Postgres). (should be implemented as 4 byte)

TIME is only within a 24 hour period, so why should it care about 2001 ?  Probably the documentation should be fixed. (should be implemented as 4 byte)

TIMESTAMP should be an 8 byte type (maybe starting at 4713-BC as well, to match DATE (and Postgres :-)).

    Thanks,

           — Boaz


On Mar 13, 2017, at 3:46 PM, Paul Rogers <pr...@mapr.com>> wrote:

Thanks Parth!

The date and time definitions are the “classic” ones, but conflict with the Drill documentation:

http://drill.apache.org/docs/supported-data-types/

DATE Years, months, and days in YYYY-MM-DD format since 4713 BC

TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss

Which is correct?

If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)

Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.

Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?

How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?

Again, the documentation differs:

INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval

TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?

If anyone knows, please let me know, else I need to do some poking around...

Thanks,

- Paul

On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:

Paul asked this and I'm posting here so someone who knows better can
correct me if I'm wrong ( This is from my notes when I was young)

DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
TIME : Int32 : Milliseconds from midnight on 1/1/1970
TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
TimeZones)
TimeStamp : Int64 : Milliseconds from epoch
Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
Interval Day : Int32 + Int32 : Days + Milliseconds
Interval Year : Int32 : Month

A slightly readable version of these can be found in the C++ client :).
$drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
has a bunch of 'Holder' structs for the date-time types.

HTH

Parth



Re: Drill date & time types encoding

Posted by Paul Rogers <pr...@mapr.com>.
Thanks Parth!

The date and time definitions are the “classic” ones, but conflict with the Drill documentation:

http://drill.apache.org/docs/supported-data-types/

DATE Years, months, and days in YYYY-MM-DD format since 4713 BC

TIME 24-hour based time before or after January 1, 2001 in hours, minutes, seconds format: HH:mm:ss

Which is correct?

If the documentation is wrong, we can file a JIRA to correct it. (It may not even be wrong, since one can convert from one to the other easily, it may just be misleading…)

Also note that, according to C++, DATE and TIME and TIMESTAMP are exactly the same, but the TIME as as 32-bit number, could only hold about 2 years due to limited range.

Also, according to SQL, DATE has no time zone, it is just a date. That is, 2016-03-13 is the same date in PST or GMT. If DATE were seconds since the UTC epoch, dates would be different in different time zones. So, I assume we use the Unix epoch, but without an implied UTC time zone as is usual for Linux and Windows timestamps?

How does a TIMESTAMP differ from a DATE? Perhaps a TIMESTAMP is based on the epoch UTC while DATE has no implied time zone?

Again, the documentation differs:

INTERVAL (Internally, INTERVAL is represented as INTERVALDAY or INTERVALYEAR.) A day-time or year-month interval

TIMESTAMP JDBC timestamp in year, month, date hour, minute, second, and optional milliseconds format: yyyy-MM-dd HH:mm:ss.SSS

So, sounds like we have an INTERVALDAY and INTERVAL year, but do we or do we not have an INTERVAL?

If anyone knows, please let me know, else I need to do some poking around...

Thanks,

- Paul

On Mar 13, 2017, at 2:44 PM, Parth Chandra <pa...@apache.org>> wrote:

Paul asked this and I'm posting here so someone who knows better can
correct me if I'm wrong ( This is from my notes when I was young)

DATE : Int64 : Milliseconds from Unix Epoch : 1/1/1970 00:00:00
TIME : Int32 : Milliseconds from midnight on 1/1/1970
TimeStampTZ : Int64 + Int32 : (Milliseconds from epoch + Index into list of
TimeZones)
TimeStamp : Int64 : Milliseconds from epoch
Interval : Int32 + Int32 + Int32 : Month + Days + Milliseconds
Interval Day : Int32 + Int32 : Days + Milliseconds
Interval Year : Int32 : Month

A slightly readable version of these can be found in the C++ client :).
$drill_src/contrib/native/client/src/include/drill/recordbatch.hpp which
has a bunch of 'Holder' structs for the date-time types.

HTH

Parth