You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by Istvan Toth <st...@apache.org> on 2023/03/03 18:46:40 UTC

Re: [DISCUSS] Timezone handling redux

Hi!

I got one review from Richard on Github - Thank you!

However, even though this is a small (in line count) change, and is off by
default, it has very high visibility and fundamentally changes the
date/time semantics of the JDBC interface (even for things like query
browsers / sqlline) so I'd much prefer to get some eyes on it from
the SFDC side before commiting.

The actual fix is in the *DateUtil#Apply*Displacement* methods, which are
called from *PhoenixResultSet* and *PhoenixPreparedStatements*,
and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some
(really unrelated) refactoring in PhoenixStatement to avoid re-parsing the
date configuration parameters.

best regards
Istvan

On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote:

> Hi!
>
> We already have an old rambling thread on fixing Phoenix time zone
> handling, but I am starting another one.
> Please if you have interest in fixing date handling then take the time to
> read it, I promise it is much less messy than the old approach.
>
>
> *Background:*
> The SQL standard defines two groups of Date types.
> One is the WITHOUT TIME ZONE types, which are analogous to the
> java.time.Local* types, the other is the WITH TIME ZONE types, which are
> analogous to java.time.Instant.
> Types without qualifiers are interpreted as WITHOUT TIME ZONE types.
>
>
> *Current state of Phoenix*
> Phoenix follows this in most regards, as it stores temporal types as UTC
> epoch values, interprets string literals as UTC, and formats them as UTC.
> Phoenix date functions also operate in UTC, which is consistent with the
> parse/format code. This results in the behaviour required for WITHOUT TIME
> ZONE types.
> However, this breaks down when we begin to use the java.sql. temporal
> types, where Phoenix expects and returns the raw UTC epoch timestamp values.
>
> a brief illustration, for selects, but the situation is the same for
> upsert with PreparedStatment.setTimestamp() and friends:
>
>
> *INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
> epoch for '2000-01-01 10:10:10', regardless of the client timezone.
>
>
> *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10'
> , regardless of the client timezone.
>
> So far, so good.
>
>
> *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
> epoch timestamp interpreted in the local timezone, i.e a different string
> for every TZ.
>
> This is the root of all of our problems.
>
> This problem is built into the ancient JDBC API, which has no real concept
> of WITH TIMEZONE and WITHOUT TIMEZONE types.
>
> *My previous attempt*
>
> Previously, I have changed Phoenix to treat all Temporal data as WITH
> TIMEZONE types, which is not wrong, but
> 1.) Goes against the SQL standard with defines unqualified temporal types
> as WITHOUT timezone
> 2.) Goes against several assumptions built into the Phoenix codebase, and
> requires pushing the client TZ to the server side, and
> - either using ThreadLocals per connection to store this information,
> - or adding the TZ as a parameter to many/most methods in the parser and
> compiler code
> - or some even deeper refactoring, that I haven't figured out yet.
>
> Implementing real WITH TIMEZONE types in Phoenix in the future is not off
> the table, but is a much more involved task.
> (As evidenced by the time I have already wasted on it)
>
> *The new proposal*
>
> Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE types,
> we can keep the current interpretation as WITHOUT TIMEZONE types, and  fix
> the get / functions very easily.
> The  code change required for this is only a few hundred lines (without
> tests).
>
> The solution is as simple as applying the Timezone offset as a
> displacement in PreparedStatement and Resultset when setting or returning
> java.sql.Date/Time/Timestamp.
>
> This is what most DBs do, and also how the SQL standard defines
> conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
> case, the java.sql epoch based types transiting the JDBC interface are
> treated as WITH TIMEZONE types.
>
> This ensures that non timezone-aware clients get timestamps that, if
> interpreted in their local timezone, resolve to the same timezone-less date
> that the parser and formatter would return, and any java.sql.temporal types
> set in PreparedStatement will be interpreted as the local time in the
> client TZ.
>
> The only other change in functionality is the CURRENT_DATE() and
> CURRENT_TIME() functions, which now apply the displacement to the generated
> values.
>
> As usual, we hide this change behind a property, and default to the old
> behaviour to maintain backwards (bug) compatibility.
>
>
> *Complications*
> The SQL standard really talks about a Timezone Offset, and is not talking
> about DST.
> Many databases take DST into account, as does my patch.
> Without that the internal representation would not be really UTC, and the
> date functions (i.e. dayofweek()) wouldn't work.
>
> However, the standard DST problems also apply, with one hour each year not
> existing, and one hour existing twice.
>
> The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
> strings, or use java.time.Local* functions with get/setObject methods.
> Phoenix doesn't yet have those, I plan to add them in the near future
> (PHOENIX-6881).
>
> When using ROW_TIMESTAMP the displacement rules can easily cause problems
> if the user is not expecting it, and expects the java.sql.Timestamp value
> to be applied directly.
> i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
> java.sql.Timestamp(new java.sql.Date()));* from multiple timezones, there
> is no guarantee that the chronologically last insert will have the latest
> timestamp.
> This was already the case when setting the column via a String.
> This requires extra attention, and should be mentioned in the
> documentation.
>
> *Feedback*
>
> Please check out the new PR at https://github.com/apache/phoenix/pull/1567,
> as I said it is very small.
>
> Any and all comments and questions are welcome, either with the general
> approach or with the specific implementation.
>
> I am also collecting some related issues in PHOENIX-6882, those will be
> also needed to get Phoenix date handling into a better place.
>
> *References/further reading*
>
> https://issues.apache.org/jira/browse/PHOENIX-5066
> https://issues.apache.org/jira/browse/PHOENIX-6882
> https://github.com/julianhyde/sqlline/issues/440
>
> https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
>
> https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
>
> https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
> Also search for "displacement" in the SQL11 standard. (No public link
> available)
>
> best regards
> Istvan
>
>

Re: [DISCUSS] Timezone handling redux

Posted by Istvan Toth <st...@apache.org>.
I have committed the fix both to master and 5.1 (as it needs to be
explicitly enabled)

Reviews and comments are still welcome.

best regards
Istvan

On Fri, Mar 3, 2023 at 7:46 PM Istvan Toth <st...@apache.org> wrote:

> Hi!
>
> I got one review from Richard on Github - Thank you!
>
> However, even though this is a small (in line count) change, and is off by
> default, it has very high visibility and fundamentally changes the
> date/time semantics of the JDBC interface (even for things like query
> browsers / sqlline) so I'd much prefer to get some eyes on it from
> the SFDC side before commiting.
>
> The actual fix is in the *DateUtil#Apply*Displacement* methods, which are
> called from *PhoenixResultSet* and *PhoenixPreparedStatements*,
> and from the* CURRENT_DATE/TIME()* functions. The rest is tests, and some
> (really unrelated) refactoring in PhoenixStatement to avoid re-parsing the
> date configuration parameters.
>
> best regards
> Istvan
>
> On Mon, Feb 27, 2023 at 2:08 PM Istvan Toth <st...@apache.org> wrote:
>
>> Hi!
>>
>> We already have an old rambling thread on fixing Phoenix time zone
>> handling, but I am starting another one.
>> Please if you have interest in fixing date handling then take the time to
>> read it, I promise it is much less messy than the old approach.
>>
>>
>> *Background:*
>> The SQL standard defines two groups of Date types.
>> One is the WITHOUT TIME ZONE types, which are analogous to the
>> java.time.Local* types, the other is the WITH TIME ZONE types, which are
>> analogous to java.time.Instant.
>> Types without qualifiers are interpreted as WITHOUT TIME ZONE types.
>>
>>
>> *Current state of Phoenix*
>> Phoenix follows this in most regards, as it stores temporal types as UTC
>> epoch values, interprets string literals as UTC, and formats them as UTC.
>> Phoenix date functions also operate in UTC, which is consistent with the
>> parse/format code. This results in the behaviour required for WITHOUT TIME
>> ZONE types.
>> However, this breaks down when we begin to use the java.sql. temporal
>> types, where Phoenix expects and returns the raw UTC epoch timestamp values.
>>
>> a brief illustration, for selects, but the situation is the same for
>> upsert with PreparedStatment.setTimestamp() and friends:
>>
>>
>> *INSERT INTO X  (DATE) VALUES ('2000-01-01 10:10:10') *stores  the UTC
>> epoch for '2000-01-01 10:10:10', regardless of the client timezone.
>>
>>
>> *SELECT * from DATE; rs.getString("DATE") *returns '2000-01-01 10:10:10'
>> , regardless of the client timezone.
>>
>> So far, so good.
>>
>>
>> *SELECT * from DATE; rs.getTimestamp("DATE").toString()*returns the UTC
>> epoch timestamp interpreted in the local timezone, i.e a different string
>> for every TZ.
>>
>> This is the root of all of our problems.
>>
>> This problem is built into the ancient JDBC API, which has no real
>> concept of WITH TIMEZONE and WITHOUT TIMEZONE types.
>>
>> *My previous attempt*
>>
>> Previously, I have changed Phoenix to treat all Temporal data as WITH
>> TIMEZONE types, which is not wrong, but
>> 1.) Goes against the SQL standard with defines unqualified temporal types
>> as WITHOUT timezone
>> 2.) Goes against several assumptions built into the Phoenix codebase, and
>> requires pushing the client TZ to the server side, and
>> - either using ThreadLocals per connection to store this information,
>> - or adding the TZ as a parameter to many/most methods in the parser and
>> compiler code
>> - or some even deeper refactoring, that I haven't figured out yet.
>>
>> Implementing real WITH TIMEZONE types in Phoenix in the future is not off
>> the table, but is a much more involved task.
>> (As evidenced by the time I have already wasted on it)
>>
>> *The new proposal*
>>
>> Instead of re-imagining Phoenix to treat all types as WITH TIMEZONE
>> types, we can keep the current interpretation as WITHOUT TIMEZONE types,
>> and  fix the get / functions very easily.
>> The  code change required for this is only a few hundred lines (without
>> tests).
>>
>> The solution is as simple as applying the Timezone offset as a
>> displacement in PreparedStatement and Resultset when setting or returning
>> java.sql.Date/Time/Timestamp.
>>
>> This is what most DBs do, and also how the SQL standard defines
>> conversion between WITH TIME ZONE and WITHOUT TIME ZONE types. In this
>> case, the java.sql epoch based types transiting the JDBC interface are
>> treated as WITH TIMEZONE types.
>>
>> This ensures that non timezone-aware clients get timestamps that, if
>> interpreted in their local timezone, resolve to the same timezone-less date
>> that the parser and formatter would return, and any java.sql.temporal types
>> set in PreparedStatement will be interpreted as the local time in the
>> client TZ.
>>
>> The only other change in functionality is the CURRENT_DATE() and
>> CURRENT_TIME() functions, which now apply the displacement to the generated
>> values.
>>
>> As usual, we hide this change behind a property, and default to the old
>> behaviour to maintain backwards (bug) compatibility.
>>
>>
>> *Complications*
>> The SQL standard really talks about a Timezone Offset, and is not talking
>> about DST.
>> Many databases take DST into account, as does my patch.
>> Without that the internal representation would not be really UTC, and the
>> date functions (i.e. dayofweek()) wouldn't work.
>>
>> However, the standard DST problems also apply, with one hour each year
>> not existing, and one hour existing twice.
>>
>> The proper way to handle WITHOUT TIMEZONE types from JDBC is to use
>> strings, or use java.time.Local* functions with get/setObject methods.
>> Phoenix doesn't yet have those, I plan to add them in the near future
>> (PHOENIX-6881).
>>
>> When using ROW_TIMESTAMP the displacement rules can easily cause problems
>> if the user is not expecting it, and expects the java.sql.Timestamp value
>> to be applied directly.
>> i.e when executing *preparedStmt.setTimestamp('ROW_TS_COL', new
>> java.sql.Timestamp(new java.sql.Date()));* from multiple timezones,
>> there is no guarantee that the chronologically last insert will have the
>> latest timestamp.
>> This was already the case when setting the column via a String.
>> This requires extra attention, and should be mentioned in the
>> documentation.
>>
>> *Feedback*
>>
>> Please check out the new PR at
>> https://github.com/apache/phoenix/pull/1567, as I said it is very small.
>>
>> Any and all comments and questions are welcome, either with the general
>> approach or with the specific implementation.
>>
>> I am also collecting some related issues in PHOENIX-6882, those will be
>> also needed to get Phoenix date handling into a better place.
>>
>> *References/further reading*
>>
>> https://issues.apache.org/jira/browse/PHOENIX-5066
>> https://issues.apache.org/jira/browse/PHOENIX-6882
>> https://github.com/julianhyde/sqlline/issues/440
>>
>> https://stackoverflow.com/questions/14070572/is-java-sql-timestamp-timezone-specific
>>
>> https://stackoverflow.com/questions/44574205/does-the-timezone-matter-when-parsing-a-timestamp
>>
>> https://medium.com/@williampuk/sql-timestamp-and-jdbc-timestamp-deep-dive-7ae0ea91e237
>> Also search for "displacement" in the SQL11 standard. (No public link
>> available)
>>
>> best regards
>> Istvan
>>
>>