You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Vladimir Ozerov <pp...@gmail.com> on 2021/03/28 07:51:15 UTC

Time zone management and DST in Avatica

Hi,

Avatica assumes that the underlying engine operates on relative time
without time zone as opposed to global time (UTC). When you set a temporal
value (e.g., a prepared statement parameter), Avatica adds the current
offset to the passed time. When you read a temporal value, Avatica
subtracts the current offset. This may lead to incorrect results if DST
offset changes.

Consider that we have a timezone with DST, that works as follows. D1 and D2
are two consecutive days (e.g., 24 and 25 Oct):
D2 00:00 GMT+2 -> D1 22:00 GMT
D2 01:00 GMT+2 -> D1 23:00 GMT
D2 03:00 GMT+3 -> D2 00:00 GMT
D2 04:00 GMT+3 -> D2 01:00 GMT

Now consider, that we want to save D2 00:00 GMT+2 using Avatica. On write,
Avatica will advance the time by the TZ offset. On read, Avatica will
subtract the TZ offset. The problem is that different offsets will be used,
leading to the incorrect result. The associated logic is located in
AbstractCursor and TypedValue classes.

long initial = [D2 00:00 GMT+2].epochMillis() // D1 22:00 GMT
long onWrite = initial + offsetAt(initial);   // D2 00:00 GMT
long onRead = onWrite - offsetAt(onWrite);    // D1 21:00 GMT
assert initial == onRead;                     // Fails

The fundamental problem is that the current time offset is used, which
might differ before and after adjustment. One potential solution is to
enhance the reading part. It should check whether the offset after the
subtraction is the same and if not - do the additional adjustment to
restore the proper time.

Do you have any objections to the proposed change?

Regards,
Vladimir.

Re: Time zone management and DST in Avatica

Posted by Julian Hyde <jh...@apache.org>.
> I guess anyone who wants the latter would use "Etc/GMT+2".

Oops. That would be "Etc/GMT-2".

On Thu, Apr 1, 2021 at 11:17 AM Julian Hyde <jh...@apache.org> wrote:
>
> Let me state up front: I do think there is an issue here, and we can
> discuss it in another email. But I want to make a couple of pedantic
> points.
>
> Time zones are hard to implement and hard to understand. Many times
> over the years, people have accused Avatica of having the wrong
> behavior, and usually they have been mistaken. Therefore if you are
> claiming that Avatica is wrong you need to provide a watertight case.
>
> Vladimir Ozerov's original example was in terms of the "Israel
> Standard Time" time zone, and he later switched to the "Israel" time
> zone and claimed there was no difference. Vladimir claims that "Israel
> Standard Time" is variable; but according to Wikipedia [1] it is
> fixed. On my system, Java's TimeZones includes "Israel" and
> "Asia/Jerusalem", which are variable, but does not include "Israel
> Standard Time". I guess anyone who wants the latter would use
> "Etc/GMT+2".
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Israel_Standard_Time
>
> On Thu, Apr 1, 2021 at 1:58 AM Vladimir Sitnikov
> <si...@gmail.com> wrote:
> >
> > Vladimir>I can write a dedicated test for that, but it would require some
> > time. But
> > Vladimir>I do not see much value in it during the discussion
> >
> > What is your question then?
> > I assume you want to make a change to Avatica.
> > Typically, we need tests for all code changes, so eventually, a test would
> > be required anyway.
> >
> > Timezones are always hard to get, so there's no question there's
> > something wrong with timezones in Avatica.
> > You don't need to question that, the defect is there, so let's focus on the
> > most important end-to-end cases.
> >
> > ---
> >
> > It looks like Avatica uses "int" to represent all time-like entities, which
> > is not really enough to distinguish "timestamp without time zone" vs
> > "timestamp with time zone".
> > On top of that, databases might support per-cell timezone values.
> >
> > For instance, PostgreSQL always stores all "timestamp with time zone"
> > values in UTC, so users can't store "Europe/Amsterdam" in the DB. The
> > values would always be normalized to UTC.
> > On contrary, OracleDB can store time zone as a part of the value, so if the
> > user stores "09:00 Europe/Amsterdam", then Europe/Amsterdam becomes a part
> > of the value which is stored in the DB.
> >
> > ---
> >
> > I believe "offset calculation" can not be discussed without end-to-end
> > sample.
> > Just in case, "sqlfiddle, psql, etc" do not count since SQL clients do
> > impact the results since
> > SQL client sets connection environment (e.g. client_timestamp), SQL client
> > can use its own timestamp formatting, etc.
> > That is why I suggest focusing on JDBC API rather than sqlfiddle.
> >
> > We can discuss whether Avatica needs "local timestamp" datatype like
> > "2021-04-01 03:05 local time" (~LocalDateTime)
> > We can discuss whether Avatica needs "timestamp with time zone" datatype
> > "2021-04-01 03:05 Europe/Amsterdam" (~ZonedDateTime).
> > We can discuss whether Avatica needs a per-value timezone: should Avatica
> > normalize user-provided instants to UTC or should it propagate per-value
> > timezones to the underlying engine.
> >
> > Does that make sense?
> >
> > Vladimir

Re: Time zone management and DST in Avatica

Posted by Julian Hyde <jh...@apache.org>.
Let me state up front: I do think there is an issue here, and we can
discuss it in another email. But I want to make a couple of pedantic
points.

Time zones are hard to implement and hard to understand. Many times
over the years, people have accused Avatica of having the wrong
behavior, and usually they have been mistaken. Therefore if you are
claiming that Avatica is wrong you need to provide a watertight case.

Vladimir Ozerov's original example was in terms of the "Israel
Standard Time" time zone, and he later switched to the "Israel" time
zone and claimed there was no difference. Vladimir claims that "Israel
Standard Time" is variable; but according to Wikipedia [1] it is
fixed. On my system, Java's TimeZones includes "Israel" and
"Asia/Jerusalem", which are variable, but does not include "Israel
Standard Time". I guess anyone who wants the latter would use
"Etc/GMT+2".

Julian

[1] https://en.wikipedia.org/wiki/Israel_Standard_Time

On Thu, Apr 1, 2021 at 1:58 AM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> Vladimir>I can write a dedicated test for that, but it would require some
> time. But
> Vladimir>I do not see much value in it during the discussion
>
> What is your question then?
> I assume you want to make a change to Avatica.
> Typically, we need tests for all code changes, so eventually, a test would
> be required anyway.
>
> Timezones are always hard to get, so there's no question there's
> something wrong with timezones in Avatica.
> You don't need to question that, the defect is there, so let's focus on the
> most important end-to-end cases.
>
> ---
>
> It looks like Avatica uses "int" to represent all time-like entities, which
> is not really enough to distinguish "timestamp without time zone" vs
> "timestamp with time zone".
> On top of that, databases might support per-cell timezone values.
>
> For instance, PostgreSQL always stores all "timestamp with time zone"
> values in UTC, so users can't store "Europe/Amsterdam" in the DB. The
> values would always be normalized to UTC.
> On contrary, OracleDB can store time zone as a part of the value, so if the
> user stores "09:00 Europe/Amsterdam", then Europe/Amsterdam becomes a part
> of the value which is stored in the DB.
>
> ---
>
> I believe "offset calculation" can not be discussed without end-to-end
> sample.
> Just in case, "sqlfiddle, psql, etc" do not count since SQL clients do
> impact the results since
> SQL client sets connection environment (e.g. client_timestamp), SQL client
> can use its own timestamp formatting, etc.
> That is why I suggest focusing on JDBC API rather than sqlfiddle.
>
> We can discuss whether Avatica needs "local timestamp" datatype like
> "2021-04-01 03:05 local time" (~LocalDateTime)
> We can discuss whether Avatica needs "timestamp with time zone" datatype
> "2021-04-01 03:05 Europe/Amsterdam" (~ZonedDateTime).
> We can discuss whether Avatica needs a per-value timezone: should Avatica
> normalize user-provided instants to UTC or should it propagate per-value
> timezones to the underlying engine.
>
> Does that make sense?
>
> Vladimir

Re: Time zone management and DST in Avatica

Posted by Vladimir Sitnikov <si...@gmail.com>.
Vladimir>I can write a dedicated test for that, but it would require some
time. But
Vladimir>I do not see much value in it during the discussion

What is your question then?
I assume you want to make a change to Avatica.
Typically, we need tests for all code changes, so eventually, a test would
be required anyway.

Timezones are always hard to get, so there's no question there's
something wrong with timezones in Avatica.
You don't need to question that, the defect is there, so let's focus on the
most important end-to-end cases.

---

It looks like Avatica uses "int" to represent all time-like entities, which
is not really enough to distinguish "timestamp without time zone" vs
"timestamp with time zone".
On top of that, databases might support per-cell timezone values.

For instance, PostgreSQL always stores all "timestamp with time zone"
values in UTC, so users can't store "Europe/Amsterdam" in the DB. The
values would always be normalized to UTC.
On contrary, OracleDB can store time zone as a part of the value, so if the
user stores "09:00 Europe/Amsterdam", then Europe/Amsterdam becomes a part
of the value which is stored in the DB.

---

I believe "offset calculation" can not be discussed without end-to-end
sample.
Just in case, "sqlfiddle, psql, etc" do not count since SQL clients do
impact the results since
SQL client sets connection environment (e.g. client_timestamp), SQL client
can use its own timestamp formatting, etc.
That is why I suggest focusing on JDBC API rather than sqlfiddle.

We can discuss whether Avatica needs "local timestamp" datatype like
"2021-04-01 03:05 local time" (~LocalDateTime)
We can discuss whether Avatica needs "timestamp with time zone" datatype
"2021-04-01 03:05 Europe/Amsterdam" (~ZonedDateTime).
We can discuss whether Avatica needs a per-value timezone: should Avatica
normalize user-provided instants to UTC or should it propagate per-value
timezones to the underlying engine.

Does that make sense?

Vladimir

Re: Time zone management and DST in Avatica

Posted by Vladimir Ozerov <pp...@gmail.com>.
Hi Julian, Vladimir,

Israel time zone It is not fixed. Formally, you may find different
abbreviations, like IST and IDT. But in real systems, like Postgres, MS
SQL, or Java, this is a single time zone with DST changes.

The following query in Postgres will return 21:00, 21:00, 22:00 for 24, 25,
and 26 Oct respectively:
select
  cast('October 24 00:00:00 2020 Israel' as timestamp with time zone) at
time zone 'UTC',
  cast('October 25 00:00:00 2020 Israel' as timestamp with time zone) at
time zone 'UTC',
  cast('October 26 00:00:00 2020 Israel' as timestamp with time zone) at
time zone 'UTC'
from t;

Same for SQL Server, even though the zone is named "Israel *Standard* Time"

In Java, the relevant ZoneId is "Israel". And it also returns 21:00, 21:00,
22:00 for these dates:

ZoneId utcZoneId = ZoneId.of("UTC");
ZoneId istZoneId = ZoneId.of("Israel");
LocalDate date24 = LocalDate.of(2020, 10, 24);
LocalDate date25 = LocalDate.of(2020, 10, 25);
LocalDate date26 = LocalDate.of(2020, 10, 26);
System.out.println(ZonedDateTime.ofInstant(date24.atStartOfDay(istZoneId).toInstant(),
utcZoneId));
System.out.println(ZonedDateTime.ofInstant(date25.atStartOfDay(istZoneId).toInstant(),
utcZoneId));
System.out.println(ZonedDateTime.ofInstant(date26.atStartOfDay(istZoneId).toInstant(),
utcZoneId));

I do not have an isolated test for Avatica for now, but I observed the
problem in the real system. The problematic code is in
AbstractCursor.longToDate. The method accepts the absolute GMT time at the
start of the day. E.g., 25-Oct-2020 00:00 GMT. At this time Israel's offset
was +2:00. Then it subtracts that offset, assuming that  25-Oct-2020 00:00
IST == 24-Oct-2020 22:00 GMT. But it is not, because several hours earlier
the offset was +3:00 due to DST. So actually 25-Oct-2020 00:00 IST ==
24-Oct-2020 21:00 GMT. As a result, Avatica will return 22:00 from the
result set, while other databases and Java would return 21:00.

I can write a dedicated test for that, but it would require some time. But
I do not see much value in it during the discussion, because the problem is
relatively clear: the offset at time T2 cannot be used to deduce the offset
at time T1.

This is why other drivers often do some "magic" with Calendar to get the
correct time. Like in PG JDBC [1]. Notice, that for time zones without DST,
they just do some simple math, similarly to Avatica. But for time zones
with DST, they do more complicated calculations to get the correct result.

Regards,
Vladimir.

[1]
https://github.com/pgjdbc/pgjdbc/blob/866c6a9e4cc42d9c279d68b8c756f562eaf0f249/pgjdbc/src/main/java/org/postgresql/jdbc/TimestampUtils.java#L1329

ср, 31 мар. 2021 г. в 23:25, Julian Hyde <jh...@apache.org>:

> Israel Standard Time is fixed at UTC+2. It does not observe daylight
> savings time. So maybe your wall clock should have been in the
> 'Asia/Jerusalem' time zone rather than Israel Standard Time.
>
> On Wed, Mar 31, 2021 at 12:23 PM Vladimir Sitnikov
> <si...@gmail.com> wrote:
> >
> > >Let me provide the exact example
> >
> > Sorry for not being clear.
> > As far as I understand, Avatica means "Java JDBC API" or "something like
> Go
> > API".
> >
> > Could you please provide a test case via Avatica API (e.g. JDBC API)
> along
> > with the actual and expected result?
> > For example, org.apache.calcite.jdbc.CalciteRemoteDriverTest verifies the
> > behavior of Avatica+Calcite
> > integration.
> >
> > Vladimir
>

Re: Time zone management and DST in Avatica

Posted by Julian Hyde <jh...@apache.org>.
Israel Standard Time is fixed at UTC+2. It does not observe daylight
savings time. So maybe your wall clock should have been in the
'Asia/Jerusalem' time zone rather than Israel Standard Time.

On Wed, Mar 31, 2021 at 12:23 PM Vladimir Sitnikov
<si...@gmail.com> wrote:
>
> >Let me provide the exact example
>
> Sorry for not being clear.
> As far as I understand, Avatica means "Java JDBC API" or "something like Go
> API".
>
> Could you please provide a test case via Avatica API (e.g. JDBC API) along
> with the actual and expected result?
> For example, org.apache.calcite.jdbc.CalciteRemoteDriverTest verifies the
> behavior of Avatica+Calcite
> integration.
>
> Vladimir

Re: Time zone management and DST in Avatica

Posted by Vladimir Sitnikov <si...@gmail.com>.
>Let me provide the exact example

Sorry for not being clear.
As far as I understand, Avatica means "Java JDBC API" or "something like Go
API".

Could you please provide a test case via Avatica API (e.g. JDBC API) along
with the actual and expected result?
For example, org.apache.calcite.jdbc.CalciteRemoteDriverTest verifies the
behavior of Avatica+Calcite
integration.

Vladimir

Re: Time zone management and DST in Avatica

Posted by Vladimir Ozerov <pp...@gmail.com>.
Hi,

Thank you for your feedback. Let me provide the exact example of when the
problem happened. Consider the date 2020 Oct 25 in Israel Standard Time
[1]. There was DST end at this date. This date represents exactly 18560
days since 01 Jan 1970. When there was 2020 Oct 25 00:00 IST, it was 2020
Oct 24 21:00 GMT. Therefore, my expectation is that if I return 18560 to
Avatica, I would get  2020 Oct 24 21:00 GMT millis back. This is what at
least PG and SQL Server returns. You may check it in the SQLFiddle [1]
quickly:

Schema:
create table t(c int primary key);
insert into t values (1);

Postgres:
select
  cast('October 25 00:00:00 2020 Israel' as timestamp with time zone) at
time zone 'UTC'
from t;

SQL Server:
select
  ({d'2020-10-25'}  AT TIME ZONE 'Israel Standard Time') AT TIME ZONE 'UTC'
from t;

However, Avatica returns me 2020 Oct 24 22:00 GMT. This is wrong because at
22:00 GMT it was 01:00 on a wall clock in Israel.
The problem is that we cannot use the current time to derive the offset
that should be applied to the past time.

Regards,
Vladimir.

[1] https://www.timeanddate.com/time/zone/israel/jerusalem
[2] http://sqlfiddle.com/

ср, 31 мар. 2021 г. в 10:38, Vladimir Sitnikov <sitnikov.vladimir@gmail.com
>:

> Julian>you should be able to come up with a test case
> Julian>against a reputable database (e.g. PostgreSQL
>
> pgjdbc committer here :)
>
> There are timezone-related tests in
>
> https://github.com/pgjdbc/pgjdbc/blob/c633cc6cf5295bfbd5b6a79bb45dff863c5056f5/pgjdbc/src/test/java/org/postgresql/test/jdbc2/TimezoneTest.java
>
> and
>
> https://github.com/pgjdbc/pgjdbc/blob/c633cc6cf5295bfbd5b6a79bb45dff863c5056f5/pgjdbc/src/test/java/org/postgresql/test/jdbc42/GetObject310Test.java
>
> It might be fun to run the tests through Avatica.
>
> Vladimir, frankly speaking, I don't follow what do you mean by "Avatica
> assumes that the underlying engine operates on relative time"
> Could you please provide the exact API call sequence and the expected
> result?
>
> Vladimir
>

Re: Time zone management and DST in Avatica

Posted by Vladimir Sitnikov <si...@gmail.com>.
Julian>you should be able to come up with a test case
Julian>against a reputable database (e.g. PostgreSQL

pgjdbc committer here :)

There are timezone-related tests in
https://github.com/pgjdbc/pgjdbc/blob/c633cc6cf5295bfbd5b6a79bb45dff863c5056f5/pgjdbc/src/test/java/org/postgresql/test/jdbc2/TimezoneTest.java

and
https://github.com/pgjdbc/pgjdbc/blob/c633cc6cf5295bfbd5b6a79bb45dff863c5056f5/pgjdbc/src/test/java/org/postgresql/test/jdbc42/GetObject310Test.java

It might be fun to run the tests through Avatica.

Vladimir, frankly speaking, I don't follow what do you mean by "Avatica
assumes that the underlying engine operates on relative time"
Could you please provide the exact API call sequence and the expected
result?

Vladimir

Re: Time zone management and DST in Avatica

Posted by Julian Hyde <jh...@apache.org>.
> Avatica assumes that the underlying engine operates on relative time
> without time zone as opposed to global time (UTC).

This is correct. This is standard behavior for SQL's TIMESTAMP data
type. (Hopefully Avatica's behavior for TIMESTAMP WITH TIME ZONE and
TIMESTAMP WITH LOCAL TIME is different.)

> D2 00:00 GMT+2 -> D1 22:00 GMT
> D2 01:00 GMT+2 -> D1 23:00 GMT
> D2 03:00 GMT+3 -> D2 00:00 GMT
> D2 04:00 GMT+3 -> D2 01:00 GMT

This example is a little misleading. I don't think the values on the
right should have GMT. The values on the left are instants (i.e.
values relative to UTC epoch), but the values on the right are
zoneless.

In order to convert an instant to a zoneless timestamp, we need to
know a time zone, and the DST offset of that time zone at that
instant. That question, e.g. "What is the DST offset of the
'America/Los Angeles' time zone at instant D2 01:00 GMT" is
well-defined.

In order to convert a zoneless timestamp to an instant, we need to
know a time zone, and the DST offset of that time zone at that
zoneless timestamp. That question is not well-defined - it usually has
1 answer, but it might have 0 in the spring or 2 in the fall. For
example, "What is the DST offset of the 'America/Los Angeles' time
zone at timestamp '2020-11-01 01:30'" has answers +7 and +8, because
that local time occurred twice. The same question in a time zone that
does not shift (e.g. PST or PDT) has only one answer.

It is possible that Avatica falls into the ambiguity of the second
case. In which case, you should be able to come up with a test case
against a reputable database (e.g. PostgreSQL, hsqldb) where Avatica's
behavior differs. This matter is so complex that I no longer trust my
own (or anyone else's) reasoning.

Julian

On Sun, Mar 28, 2021 at 2:44 AM Alessandro Solimando
<al...@gmail.com> wrote:
>
> Hi Vladimir,
> your analysis seems correct to me, as well as your proposed solution.
>
> Best regards,
> Alessandro
>
> On Sun, 28 Mar 2021 at 09:51, Vladimir Ozerov <pp...@gmail.com> wrote:
>
> > Hi,
> >
> > Avatica assumes that the underlying engine operates on relative time
> > without time zone as opposed to global time (UTC). When you set a temporal
> > value (e.g., a prepared statement parameter), Avatica adds the current
> > offset to the passed time. When you read a temporal value, Avatica
> > subtracts the current offset. This may lead to incorrect results if DST
> > offset changes.
> >
> > Consider that we have a timezone with DST, that works as follows. D1 and D2
> > are two consecutive days (e.g., 24 and 25 Oct):
> > D2 00:00 GMT+2 -> D1 22:00 GMT
> > D2 01:00 GMT+2 -> D1 23:00 GMT
> > D2 03:00 GMT+3 -> D2 00:00 GMT
> > D2 04:00 GMT+3 -> D2 01:00 GMT
> >
> > Now consider, that we want to save D2 00:00 GMT+2 using Avatica. On write,
> > Avatica will advance the time by the TZ offset. On read, Avatica will
> > subtract the TZ offset. The problem is that different offsets will be used,
> > leading to the incorrect result. The associated logic is located in
> > AbstractCursor and TypedValue classes.
> >
> > long initial = [D2 00:00 GMT+2].epochMillis() // D1 22:00 GMT
> > long onWrite = initial + offsetAt(initial);   // D2 00:00 GMT
> > long onRead = onWrite - offsetAt(onWrite);    // D1 21:00 GMT
> > assert initial == onRead;                     // Fails
> >
> > The fundamental problem is that the current time offset is used, which
> > might differ before and after adjustment. One potential solution is to
> > enhance the reading part. It should check whether the offset after the
> > subtraction is the same and if not - do the additional adjustment to
> > restore the proper time.
> >
> > Do you have any objections to the proposed change?
> >
> > Regards,
> > Vladimir.
> >

Re: Time zone management and DST in Avatica

Posted by Alessandro Solimando <al...@gmail.com>.
Hi Vladimir,
your analysis seems correct to me, as well as your proposed solution.

Best regards,
Alessandro

On Sun, 28 Mar 2021 at 09:51, Vladimir Ozerov <pp...@gmail.com> wrote:

> Hi,
>
> Avatica assumes that the underlying engine operates on relative time
> without time zone as opposed to global time (UTC). When you set a temporal
> value (e.g., a prepared statement parameter), Avatica adds the current
> offset to the passed time. When you read a temporal value, Avatica
> subtracts the current offset. This may lead to incorrect results if DST
> offset changes.
>
> Consider that we have a timezone with DST, that works as follows. D1 and D2
> are two consecutive days (e.g., 24 and 25 Oct):
> D2 00:00 GMT+2 -> D1 22:00 GMT
> D2 01:00 GMT+2 -> D1 23:00 GMT
> D2 03:00 GMT+3 -> D2 00:00 GMT
> D2 04:00 GMT+3 -> D2 01:00 GMT
>
> Now consider, that we want to save D2 00:00 GMT+2 using Avatica. On write,
> Avatica will advance the time by the TZ offset. On read, Avatica will
> subtract the TZ offset. The problem is that different offsets will be used,
> leading to the incorrect result. The associated logic is located in
> AbstractCursor and TypedValue classes.
>
> long initial = [D2 00:00 GMT+2].epochMillis() // D1 22:00 GMT
> long onWrite = initial + offsetAt(initial);   // D2 00:00 GMT
> long onRead = onWrite - offsetAt(onWrite);    // D1 21:00 GMT
> assert initial == onRead;                     // Fails
>
> The fundamental problem is that the current time offset is used, which
> might differ before and after adjustment. One potential solution is to
> enhance the reading part. It should check whether the offset after the
> subtraction is the same and if not - do the additional adjustment to
> restore the proper time.
>
> Do you have any objections to the proposed change?
>
> Regards,
> Vladimir.
>