You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Jan Van Besien <ja...@ngdata.com> on 2015/09/25 10:42:30 UTC

timezone confusion

I might be misunderstanding something, but I think calcite is wrong
with respect to how it handles timezones in date, time and timestamp.

Allow me to use the existing CsvTest#testDateType test as an example,
in particular the test with a timestamp on line 385.

The data in the CSV file itself for the timestamp field is the string
"1996-08-03 00:01:02". According to the TIME_FORMAT_TIMESTAMP in the
CsvEnumerator, this String is interpreted as being a timestamp in GMT
timezone. So it is considered "1996-08-03 00:01:02 GMT". So far, so
good.

Note that my local timezone is CEST (2 hours east of GMT). When
querying above timezone, I think the correct thing to do is to return
the Timezone exactly as is, hence "1996-08-03 00:01:02 GMT" or
"1996-08-03 02:01:02 CEST". Given that the string representation of a
Timestamp object returns a String in the local timezone (without
mentioning that local timezone), the resulting string would be
"1996-08-03 02:01:02". This might seem wrong (2 hours to late) but it
is actually correct because it is in the local (CEST) timezone. More
importantly, the millis inside this Timestamp object will in this case
be exactly the same as the millis inside the Timestamp object returned
by the CsvEnumerator.

However, calcite seems to do a negative correction on the millis value
with the local timezone offset, such that the result is actually 2
hours earlier, i.e. "1996-08-02 22:01:02 GMT". Thanks to the way
string formatting on the Timestamp object works, this ends up as being
the string "1996-08-03 00:01:02" which looks correct, but is actually
wrong because it is to be interpreted in the local Timezone which is
CEST, hence it really means "1996-08-02 22:01:02 GMT" which is not
what was in the database (CsvEnumerator).

If you agree with my analysis, this also means that the CsvTest on
line 385 is wrong, because it checks that the timestamp from the
resultset is java.sql.Timestamp.valueOf("1996-08-03 00:01:02") while
(in my timezone) it should really be
java.sql.Timestamp.valueOf("1996-08-03 02:01:02"). The test will have
to be rewritten to work in any timezone obviously, which will imply
creating a millis value from a Calendar with a certain fixed timezone.


Jan

Re: timezone confusion

Posted by Julian Hyde <jh...@apache.org>.
The fundamental cause is that SQL’s TIMESTAMP represents a local time with no stated timezone, whereas Java’s java.util.Date represents a moment in time. Any conversion between those is going to require a timezone (implicit or explicit).

I don’t know the Joda-Time API (part of the JDK since 1.8) very well, but I see that it recognizes “instant” and “local time” as separate concepts. See e.g. https://docs.oracle.com/javase/8/docs/api/java/time/LocalDateTime.html. Maybe we should support resultSet.getObject(0, java.time.LocalDateTime.class), then people can convert the LocalDateTime to an Instant if that’s what they want.

Julian


> On Sep 25, 2015, at 5:55 AM, Jan Van Besien <ja...@ngdata.com> wrote:
> 
> On Fri, Sep 25, 2015 at 11:06 AM, Julian Hyde <jh...@apache.org> wrote:
>> The SQL standard says that TIMESTAMP values do not have a time zone. So, “1970-01-01 00:00:00” means just that; it does not mean “1970-01-01 00:00:00 UTC” or "“1970-01-01 00:00:00 CET” or anything on your local time zone. The time zone is an interpretation placed on the value when they read it.
> 
> I agree.
> 
>> Now JDBC is another matter. When you read a TIMESTAMP value via JDBC, specifically the ResultSet.getTimestamp(int) method, it translates it into the local timezone. So, when you read it, it becomes “1970-01-01 00:00:00 CET”, whose value is -36000000L (one hour, in milliseconds, before the UTC epoch).
> 
> What you are describing here is indeed what calcite is doing. I've
> been going through some tests with hsqldb and mysql which lead me to
> conclude that they seem to do this as well. I am starting to
> understand why this is considered correct, although each time I think
> I understand it I start to doubt again ;-) Do you know where in the
> JDBC spec it is explained that it should be like this? I couldn't find
> anything that suggests this in the JDBC spec nor in the javadoc.
> 
> Anyway, given that at least hsqldb and mysql also seem to do it like
> this, I am willing to accept that it is how it is supposed to be.
> 
> Maybe it is just wrong that JDBC uses timezone-aware objects to
> represent things like SQL TIMESTAMP.
> 
> 
> Jan


Re: timezone confusion

Posted by Jan Van Besien <ja...@ngdata.com>.
On Fri, Sep 25, 2015 at 11:06 AM, Julian Hyde <jh...@apache.org> wrote:
> The SQL standard says that TIMESTAMP values do not have a time zone. So, “1970-01-01 00:00:00” means just that; it does not mean “1970-01-01 00:00:00 UTC” or "“1970-01-01 00:00:00 CET” or anything on your local time zone. The time zone is an interpretation placed on the value when they read it.

I agree.

> Now JDBC is another matter. When you read a TIMESTAMP value via JDBC, specifically the ResultSet.getTimestamp(int) method, it translates it into the local timezone. So, when you read it, it becomes “1970-01-01 00:00:00 CET”, whose value is -36000000L (one hour, in milliseconds, before the UTC epoch).

What you are describing here is indeed what calcite is doing. I've
been going through some tests with hsqldb and mysql which lead me to
conclude that they seem to do this as well. I am starting to
understand why this is considered correct, although each time I think
I understand it I start to doubt again ;-) Do you know where in the
JDBC spec it is explained that it should be like this? I couldn't find
anything that suggests this in the JDBC spec nor in the javadoc.

Anyway, given that at least hsqldb and mysql also seem to do it like
this, I am willing to accept that it is how it is supposed to be.

Maybe it is just wrong that JDBC uses timezone-aware objects to
represent things like SQL TIMESTAMP.


Jan

Re: timezone confusion

Posted by Julian Hyde <jh...@apache.org>.
Here is my understanding of the relevant specifications.

The SQL standard says that TIMESTAMP values do not have a time zone. So, “1970-01-01 00:00:00” means just that; it does not mean “1970-01-01 00:00:00 UTC” or "“1970-01-01 00:00:00 CET” or anything on your local time zone. The time zone is an interpretation placed on the value when they read it.

Internally Calcite represents TIMESTAMP '1970-01-01 00:00:00’ as 0 long.

Now JDBC is another matter. When you read a TIMESTAMP value via JDBC, specifically the ResultSet.getTimestamp(int) method, it translates it into the local timezone. So, when you read it, it becomes “1970-01-01 00:00:00 CET”, whose value is -36000000L (one hour, in milliseconds, before the UTC epoch).

Remember that a Timestamp value has no time zone. It represents a moment in time, e.g. the moment that Apollo 11 landed on the moon. Its internal value is milliseconds from the UTC epoch. But the JVM will “helpfully” render it in your local time.

So, your JVM will render a Timestamp(-36000000L) as “1970-01-01 00:00:00 CET”.

If you want a value to be interpreted in a particular time zone, us the ResultSet.getTimestamp(int, Calendar) method. If you pass in a Calendar whose timezone is UTC, you will get a Timestamp whose value is 0L.

Net net: if you have “YYYY-MM-DD HH:MM:SS" in a CSV file and read it using getTimestamp (i.e. in local timezone), you should expect to that timestamp to print as "YYYY-MM-DD HH:MM:SS <your timezone>”.

Julian


> On Sep 25, 2015, at 1:42 AM, Jan Van Besien <ja...@ngdata.com> wrote:
> 
> I might be misunderstanding something, but I think calcite is wrong
> with respect to how it handles timezones in date, time and timestamp.
> 
> Allow me to use the existing CsvTest#testDateType test as an example,
> in particular the test with a timestamp on line 385.
> 
> The data in the CSV file itself for the timestamp field is the string
> "1996-08-03 00:01:02". According to the TIME_FORMAT_TIMESTAMP in the
> CsvEnumerator, this String is interpreted as being a timestamp in GMT
> timezone. So it is considered "1996-08-03 00:01:02 GMT". So far, so
> good.
> 
> Note that my local timezone is CEST (2 hours east of GMT). When
> querying above timezone, I think the correct thing to do is to return
> the Timezone exactly as is, hence "1996-08-03 00:01:02 GMT" or
> "1996-08-03 02:01:02 CEST". Given that the string representation of a
> Timestamp object returns a String in the local timezone (without
> mentioning that local timezone), the resulting string would be
> "1996-08-03 02:01:02". This might seem wrong (2 hours to late) but it
> is actually correct because it is in the local (CEST) timezone. More
> importantly, the millis inside this Timestamp object will in this case
> be exactly the same as the millis inside the Timestamp object returned
> by the CsvEnumerator.
> 
> However, calcite seems to do a negative correction on the millis value
> with the local timezone offset, such that the result is actually 2
> hours earlier, i.e. "1996-08-02 22:01:02 GMT". Thanks to the way
> string formatting on the Timestamp object works, this ends up as being
> the string "1996-08-03 00:01:02" which looks correct, but is actually
> wrong because it is to be interpreted in the local Timezone which is
> CEST, hence it really means "1996-08-02 22:01:02 GMT" which is not
> what was in the database (CsvEnumerator).
> 
> If you agree with my analysis, this also means that the CsvTest on
> line 385 is wrong, because it checks that the timestamp from the
> resultset is java.sql.Timestamp.valueOf("1996-08-03 00:01:02") while
> (in my timezone) it should really be
> java.sql.Timestamp.valueOf("1996-08-03 02:01:02"). The test will have
> to be rewritten to work in any timezone obviously, which will imply
> creating a millis value from a Calendar with a certain fixed timezone.
> 
> 
> Jan