You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by Istvan Toth <st...@apache.org> on 2023/02/27 13:08:54 UTC

[DISCUSS] Timezone handling redux

Hi!

We already have an old rambling thread on fixing Phoenix time zone
handling, but I am starting another one.
Please if you have interest in fixing date handling then take the time to
read it, I promise it is much less messy than the old approach.


*Background:*
The SQL standard defines two groups of Date types.
One is the WITHOUT TIME ZONE types, which are analogous to the
java.time.Local* types, the other is the WITH TIME ZONE types, which are
analogous to java.time.Instant.
Types without qualifiers are interpreted as WITHOUT TIME ZONE types.


*Current state of Phoenix*
Phoenix follows this in most regards, as it stores temporal types as UTC
epoch values, interprets string literals as UTC, and formats them as UTC.
Phoenix date functions also operate in UTC, which is consistent with the
parse/format code. This results in the behaviour required for WITHOUT TIME
ZONE types.
However, this breaks down when we begin to use the java.sql. temporal
types, where Phoenix expects and returns the raw UTC epoch timestamp values.

a brief illustration, for selects, but the situation is the same for upsert
with PreparedStatment.setTimestamp() and friends:


*INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
epoch for '2000-01-01 10:10:10', regardless of the client timezone.


*SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10' ,
regardless of the client timezone.

So far, so good.


*SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
epoch timestamp interpreted in the local timezone, i.e a different string
for every TZ.

This is the root of all of our problems.

This problem is built into the ancient JDBC API, which has no real concept
of WITH TIMEZONE and WITHOUT TIMEZONE types.

*My previous attempt*

Previously, I have changed Phoenix to treat all Temporal data as WITH
TIMEZONE types, which is not wrong, but
1.) Goes against the SQL standard with defines unqualified temporal types
as WITHOUT timezone
2.) Goes against several assumptions built into the Phoenix codebase, and
requires pushing the client TZ to the server side, and
- either using ThreadLocals per connection to store this information,
- or adding the TZ as a parameter to many/most methods in the parser and
compiler code
- or some even deeper refactoring, that I haven't figured out yet.

Implementing real WITH TIMEZONE types in Phoenix in the future is not off
the table, but is a much more involved task.
(As evidenced by the time I have already wasted on it)

*The new proposal*

Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types,
we can keep the current interpretation as WITHOUT TIMEZONE types, and  fix
the get / functions very easily.
The  code change required for this is only a few hundred lines (without
tests).

The solution is as simple as applying the Timezone offset as a displacement
in PreparedStatement and Resultset when setting or returning
java.sql.Date/Time/Timestamp.

This is what most DBs do, and also how the SQL standard defines
conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
case, the java.sql epoch based types transiting the JDBC interface are
treated as WITH TIMEZONE types.

This ensures that non timezone-aware clients get timestamps that, if
interpreted in their local timezone, resolve to the same timezone-less date
that the parser and formatter would return, and any java.sql.temporal types
set in PreparedStatement will be interpreted as the local time in the
client TZ.

The only other change in functionality is the CURRENT_DATE() and
CURRENT_TIME() functions, which now apply the displacement to the generated
values.

As usual, we hide this change behind a property, and default to the old
behaviour to maintain backwards (bug) compatibility.


*Complications*
The SQL standard really talks about a Timezone Offset, and is not talking
about DST.
Many databases take DST into account, as does my patch.
Without that the internal representation would not be really UTC, and the
date functions (i.e. dayofweek()) wouldn't work.

However, the standard DST problems also apply, with one hour each year not
existing, and one hour existing twice.

The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
strings, or use java.time.Local* functions with get/setObject methods.
Phoenix doesn't yet have those, I plan to add them in the near future
(PHOENIX-6881).

When using ROW_TIMESTAMP the displacement rules can easily cause problems
if the user is not expecting it, and expects the java.sql.Timestamp value
to be applied directly.
i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there
is no guarantee that the chronologically last insert will have the latest
timestamp.
This was already the case when setting the column via a String.
This requires extra attention, and should be mentioned in the documentation.

*Feedback*

Please check out the new PR at https://github.com/apache/phoenix/pull/1567,
as I said it is very small.

Any and all comments and questions are welcome, either with the general
approach or with the specific implementation.

I am also collecting some related issues in PHOENIX-6882, those will be
also needed to get Phoenix date handling into a better place.

*References/further reading*

https://issues.apache.org/jira/browse/PHOENIX-5066
https://issues.apache.org/jira/browse/PHOENIX-6882
https://github.com/julianhyde/sqlline/issues/440
https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
Also search for "displacement" in the SQL11 standard. (No public link
available)

best regards
Istvan

Re: [DISCUSS] Timezone handling redux

Posted by Istvan Toth <st...@apache.org>.
I have committed the fix both to master and 5.1 (as it needs to be
explicitly enabled)

Reviews and comments are still welcome.

best regards
Istvan

On Fri, Mar 3, 2023 at 7:46 PM Istvan Toth <st...@apache.org> wrote:

> Hi!
>
> I got one review from Richard on Github - Thank you!
>
> However, even though this is a small (in line count) change, and is off by
> default, it has very high visibility and fundamentally changes the
> date/time semantics of the JDBC interface (even for things like query
> browsers / sqlline) so I'd much prefer to get some eyes on it from
> the SFDC side before commiting.
>
> The actual fix is in the *DateUtil#Apply*Displacement* methods, which are
> called from *PhoenixResultSet* and *PhoenixPreparedStatements*,
> and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some
> (really unrelated) refactoring in PhoenixStatement to avoid re-parsing the
> date configuration parameters.
>
> best regards
> Istvan
>
> On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote:
>
>> Hi!
>>
>> We already have an old rambling thread on fixing Phoenix time zone
>> handling, but I am starting another one.
>> Please if you have interest in fixing date handling then take the time to
>> read it, I promise it is much less messy than the old approach.
>>
>>
>> *Background:*
>> The SQL standard defines two groups of Date types.
>> One is the WITHOUT TIME ZONE types, which are analogous to the
>> java.time.Local* types, the other is the WITH TIME ZONE types, which are
>> analogous to java.time.Instant.
>> Types without qualifiers are interpreted as WITHOUT TIME ZONE types.
>>
>>
>> *Current state of Phoenix*
>> Phoenix follows this in most regards, as it stores temporal types as UTC
>> epoch values, interprets string literals as UTC, and formats them as UTC.
>> Phoenix date functions also operate in UTC, which is consistent with the
>> parse/format code. This results in the behaviour required for WITHOUT TIME
>> ZONE types.
>> However, this breaks down when we begin to use the java.sql. temporal
>> types, where Phoenix expects and returns the raw UTC epoch timestamp values.
>>
>> a brief illustration, for selects, but the situation is the same for
>> upsert with PreparedStatment.setTimestamp() and friends:
>>
>>
>> *INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
>> epoch for '2000-01-01 10:10:10', regardless of the client timezone.
>>
>>
>> *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10'
>> , regardless of the client timezone.
>>
>> So far, so good.
>>
>>
>> *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
>> epoch timestamp interpreted in the local timezone, i.e a different string
>> for every TZ.
>>
>> This is the root of all of our problems.
>>
>> This problem is built into the ancient JDBC API, which has no real
>> concept of WITH TIMEZONE and WITHOUT TIMEZONE types.
>>
>> *My previous attempt*
>>
>> Previously, I have changed Phoenix to treat all Temporal data as WITH
>> TIMEZONE types, which is not wrong, but
>> 1.) Goes against the SQL standard with defines unqualified temporal types
>> as WITHOUT timezone
>> 2.) Goes against several assumptions built into the Phoenix codebase, and
>> requires pushing the client TZ to the server side, and
>> - either using ThreadLocals per connection to store this information,
>> - or adding the TZ as a parameter to many/most methods in the parser and
>> compiler code
>> - or some even deeper refactoring, that I haven't figured out yet.
>>
>> Implementing real WITH TIMEZONE types in Phoenix in the future is not off
>> the table, but is a much more involved task.
>> (As evidenced by the time I have already wasted on it)
>>
>> *The new proposal*
>>
>> Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE
>> types, we can keep the current interpretation as WITHOUT TIMEZONE types,
>> and  fix the get / functions very easily.
>> The  code change required for this is only a few hundred lines (without
>> tests).
>>
>> The solution is as simple as applying the Timezone offset as a
>> displacement in PreparedStatement and Resultset when setting or returning
>> java.sql.Date/Time/Timestamp.
>>
>> This is what most DBs do, and also how the SQL standard defines
>> conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
>> case, the java.sql epoch based types transiting the JDBC interface are
>> treated as WITH TIMEZONE types.
>>
>> This ensures that non timezone-aware clients get timestamps that, if
>> interpreted in their local timezone, resolve to the same timezone-less date
>> that the parser and formatter would return, and any java.sql.temporal types
>> set in PreparedStatement will be interpreted as the local time in the
>> client TZ.
>>
>> The only other change in functionality is the CURRENT_DATE() and
>> CURRENT_TIME() functions, which now apply the displacement to the generated
>> values.
>>
>> As usual, we hide this change behind a property, and default to the old
>> behaviour to maintain backwards (bug) compatibility.
>>
>>
>> *Complications*
>> The SQL standard really talks about a Timezone Offset, and is not talking
>> about DST.
>> Many databases take DST into account, as does my patch.
>> Without that the internal representation would not be really UTC, and the
>> date functions (i.e. dayofweek()) wouldn't work.
>>
>> However, the standard DST problems also apply, with one hour each year
>> not existing, and one hour existing twice.
>>
>> The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
>> strings, or use java.time.Local* functions with get/setObject methods.
>> Phoenix doesn't yet have those, I plan to add them in the near future
>> (PHOENIX-6881).
>>
>> When using ROW_TIMESTAMP the displacement rules can easily cause problems
>> if the user is not expecting it, and expects the java.sql.Timestamp value
>> to be applied directly.
>> i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
>> java.sql.Timestamp(new java.sql.Date()));* from multiple timezones,
>> there is no guarantee that the chronologically last insert will have the
>> latest timestamp.
>> This was already the case when setting the column via a String.
>> This requires extra attention, and should be mentioned in the
>> documentation.
>>
>> *Feedback*
>>
>> Please check out the new PR at
>> https://github.com/apache/phoenix/pull/1567, as I said it is very small.
>>
>> Any and all comments and questions are welcome, either with the general
>> approach or with the specific implementation.
>>
>> I am also collecting some related issues in PHOENIX-6882, those will be
>> also needed to get Phoenix date handling into a better place.
>>
>> *References/further reading*
>>
>> https://issues.apache.org/jira/browse/PHOENIX-5066
>> https://issues.apache.org/jira/browse/PHOENIX-6882
>> https://github.com/julianhyde/sqlline/issues/440
>>
>> https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
>>
>> https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
>>
>> https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
>> Also search for "displacement" in the SQL11 standard. (No public link
>> available)
>>
>> best regards
>> Istvan
>>
>>

Re: [DISCUSS] Timezone handling redux

Posted by Istvan Toth <st...@apache.org>.
Hi!

I got one review from Richard on Github - Thank you!

However, even though this is a small (in line count) change, and is off by
default, it has very high visibility and fundamentally changes the
date/time semantics of the JDBC interface (even for things like query
browsers / sqlline) so I'd much prefer to get some eyes on it from
the SFDC side before commiting.

The actual fix is in the *DateUtil#Apply*Displacement* methods, which are
called from *PhoenixResultSet* and *PhoenixPreparedStatements*,
and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some
(really unrelated) refactoring in PhoenixStatement to avoid re-parsing the
date configuration parameters.

best regards
Istvan

On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote:

> Hi!
>
> We already have an old rambling thread on fixing Phoenix time zone
> handling, but I am starting another one.
> Please if you have interest in fixing date handling then take the time to
> read it, I promise it is much less messy than the old approach.
>
>
> *Background:*
> The SQL standard defines two groups of Date types.
> One is the WITHOUT TIME ZONE types, which are analogous to the
> java.time.Local* types, the other is the WITH TIME ZONE types, which are
> analogous to java.time.Instant.
> Types without qualifiers are interpreted as WITHOUT TIME ZONE types.
>
>
> *Current state of Phoenix*
> Phoenix follows this in most regards, as it stores temporal types as UTC
> epoch values, interprets string literals as UTC, and formats them as UTC.
> Phoenix date functions also operate in UTC, which is consistent with the
> parse/format code. This results in the behaviour required for WITHOUT TIME
> ZONE types.
> However, this breaks down when we begin to use the java.sql. temporal
> types, where Phoenix expects and returns the raw UTC epoch timestamp values.
>
> a brief illustration, for selects, but the situation is the same for
> upsert with PreparedStatment.setTimestamp() and friends:
>
>
> *INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
> epoch for '2000-01-01 10:10:10', regardless of the client timezone.
>
>
> *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10'
> , regardless of the client timezone.
>
> So far, so good.
>
>
> *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
> epoch timestamp interpreted in the local timezone, i.e a different string
> for every TZ.
>
> This is the root of all of our problems.
>
> This problem is built into the ancient JDBC API, which has no real concept
> of WITH TIMEZONE and WITHOUT TIMEZONE types.
>
> *My previous attempt*
>
> Previously, I have changed Phoenix to treat all Temporal data as WITH
> TIMEZONE types, which is not wrong, but
> 1.) Goes against the SQL standard with defines unqualified temporal types
> as WITHOUT timezone
> 2.) Goes against several assumptions built into the Phoenix codebase, and
> requires pushing the client TZ to the server side, and
> - either using ThreadLocals per connection to store this information,
> - or adding the TZ as a parameter to many/most methods in the parser and
> compiler code
> - or some even deeper refactoring, that I haven't figured out yet.
>
> Implementing real WITH TIMEZONE types in Phoenix in the future is not off
> the table, but is a much more involved task.
> (As evidenced by the time I have already wasted on it)
>
> *The new proposal*
>
> Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types,
> we can keep the current interpretation as WITHOUT TIMEZONE types, and  fix
> the get / functions very easily.
> The  code change required for this is only a few hundred lines (without
> tests).
>
> The solution is as simple as applying the Timezone offset as a
> displacement in PreparedStatement and Resultset when setting or returning
> java.sql.Date/Time/Timestamp.
>
> This is what most DBs do, and also how the SQL standard defines
> conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
> case, the java.sql epoch based types transiting the JDBC interface are
> treated as WITH TIMEZONE types.
>
> This ensures that non timezone-aware clients get timestamps that, if
> interpreted in their local timezone, resolve to the same timezone-less date
> that the parser and formatter would return, and any java.sql.temporal types
> set in PreparedStatement will be interpreted as the local time in the
> client TZ.
>
> The only other change in functionality is the CURRENT_DATE() and
> CURRENT_TIME() functions, which now apply the displacement to the generated
> values.
>
> As usual, we hide this change behind a property, and default to the old
> behaviour to maintain backwards (bug) compatibility.
>
>
> *Complications*
> The SQL standard really talks about a Timezone Offset, and is not talking
> about DST.
> Many databases take DST into account, as does my patch.
> Without that the internal representation would not be really UTC, and the
> date functions (i.e. dayofweek()) wouldn't work.
>
> However, the standard DST problems also apply, with one hour each year not
> existing, and one hour existing twice.
>
> The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
> strings, or use java.time.Local* functions with get/setObject methods.
> Phoenix doesn't yet have those, I plan to add them in the near future
> (PHOENIX-6881).
>
> When using ROW_TIMESTAMP the displacement rules can easily cause problems
> if the user is not expecting it, and expects the java.sql.Timestamp value
> to be applied directly.
> i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
> java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there
> is no guarantee that the chronologically last insert will have the latest
> timestamp.
> This was already the case when setting the column via a String.
> This requires extra attention, and should be mentioned in the
> documentation.
>
> *Feedback*
>
> Please check out the new PR at https://github.com/apache/phoenix/pull/1567,
> as I said it is very small.
>
> Any and all comments and questions are welcome, either with the general
> approach or with the specific implementation.
>
> I am also collecting some related issues in PHOENIX-6882, those will be
> also needed to get Phoenix date handling into a better place.
>
> *References/further reading*
>
> https://issues.apache.org/jira/browse/PHOENIX-5066
> https://issues.apache.org/jira/browse/PHOENIX-6882
> https://github.com/julianhyde/sqlline/issues/440
>
> https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
>
> https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
>
> https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
> Also search for "displacement" in the SQL11 standard. (No public link
> available)
>
> best regards
> Istvan
>
>