You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@phoenix.apache.org by Gabriel Reid <ga...@gmail.com> on 2014/03/18 17:38:02 UTC

JDBC compatibility issues with Time, Date, and Timestamp in Phoenix

Hi,

I believe I've come up against a pretty big issue with Date and Time
handling (and to a certain extent Timestamp handling) in Phoenix.

>From what I understand from the JDBC documentation (mostly API docs),
the way that a java.sql.Date should be handled via JDBC is simply as a
day, month, and year, despite the fact that it is internally
represented as a timestamp (the same kind of thing applies to Time
objects, which are a triple of hours, minutes, and seconds).

Further, my understanding is that it is the responsibility of a JDBC
driver to do normalization of incoming Date and Time (and maybe
Timestamp) objects to interpret them as being in the current time
zone, and remove the extra components (i.e. time components for a
Date, and date components for a Time) before storing the value.

This means that today, if I insert a column value consisting of 'new
Date(System.currentTimeMillis())', then I should be able to retrieve
that same value with a filter on 'Date.valueOf("2014-03-18")'.
Additionally, that filter should work regardless of my own local
timezone where I'm querying from after the data has been written.

It also means that if I store 'Time.valueOf("07:00:00")' in a TIME
field in a database in my current timezone, someone should get
"07:00:00" if they run 'ResultSet#getTime(1).toString()' on that
value, even if they're in a different timezone than me.

>From what I can see right now, Phoenix doesn't currently exhibit this
behavior. Instead, the full long representation of Date, Time, and
Timestamps is stored directly in HBase, without dropping the extra
date fields or doing timezone conversion.

>From the current analysis, what is required for Phoenix to be
JDBC-compliant in terms of time/date/timestamp handling is:
* All incoming time-style values should be interpreted in the local
timezone of the driver, then be normalized and converted to UTC before
serialization (unless a Calendar is supplied) in PreparedStatement
calls
* All outgoing time-style values should be converted from UTC into the
local timezone (unless a Calendar is supplied) in ResultSet calls
* Supplying a Calendar to PreparedStatement methods should cause the
time value to be converted from the local timezone to the timezone of
the calendar (instead of UTC) before being serialized
* Supplying a Calendar to ResultSet methods should cause the time
value from the database to be interpreted as if it was serialized in
the timezone of the Calendar, instead of UTC.

I started on this earlier today as part of a smaller issue that I had
come up against (that the Calendar objects are incorrectly used in
PhoenixPreparedStatement and PhoenixResultSet), but digging deeper has
shown that this is a more fundamental issue in Phoenix.

Making the above changes would mean breaking backwards compatibility
with existing Phoenix installs (unless some kind of
backwards-compatibility mode is introduced or something similar). On
the other hand, not making the above changes will mean that the
Phoenix JDBC driver doesn't treat time-style values in the same way as
(most) other JDBC drivers, which can cause issues with external
tooling that goes via JDBC.

Any thoughts on the direction we want to go with this?

- Gabriel

Re: JDBC compatibility issues with Time, Date, and Timestamp in Phoenix

Posted by Gabriel Reid <ga...@gmail.com>.
Sounds good. I've created PHOENIX-868 for this. I also created
PHOENIX-869, which is a small optimization which doesn't change any
behavior (so it can be safely added to all branches), and also
partially somewhat documents part of the underlying issue of
PHOENIX-868 in the code.

- Gabriel


On Tue, Mar 18, 2014 at 5:55 PM, James Taylor <ja...@apache.org> wrote:
> Hi Gabriel,
> We maintain both date and time to a millisecond granularity and timestamp
> to a nano second granularity. If we want to change this, we'll need to do
> it in 5.0 where we are considering change our type system (which would thus
> require some data conversion anyway).
> For 3.0/4.0, clients have the option of using the TRUNC(date,'DAY') if they
> want to truncate the time to a day boundary as they insert date/time values.
> Would you mind filing a JIRA on this?
> Thanks,
> James
>
>
> On Tue, Mar 18, 2014 at 9:38 AM, Gabriel Reid <ga...@gmail.com>wrote:
>
>> Hi,
>>
>> I believe I've come up against a pretty big issue with Date and Time
>> handling (and to a certain extent Timestamp handling) in Phoenix.
>>
>> From what I understand from the JDBC documentation (mostly API docs),
>> the way that a java.sql.Date should be handled via JDBC is simply as a
>> day, month, and year, despite the fact that it is internally
>> represented as a timestamp (the same kind of thing applies to Time
>> objects, which are a triple of hours, minutes, and seconds).
>>
>> Further, my understanding is that it is the responsibility of a JDBC
>> driver to do normalization of incoming Date and Time (and maybe
>> Timestamp) objects to interpret them as being in the current time
>> zone, and remove the extra components (i.e. time components for a
>> Date, and date components for a Time) before storing the value.
>>
>> This means that today, if I insert a column value consisting of 'new
>> Date(System.currentTimeMillis())', then I should be able to retrieve
>> that same value with a filter on 'Date.valueOf("2014-03-18")'.
>> Additionally, that filter should work regardless of my own local
>> timezone where I'm querying from after the data has been written.
>>
>> It also means that if I store 'Time.valueOf("07:00:00")' in a TIME
>> field in a database in my current timezone, someone should get
>> "07:00:00" if they run 'ResultSet#getTime(1).toString()' on that
>> value, even if they're in a different timezone than me.
>>
>> From what I can see right now, Phoenix doesn't currently exhibit this
>> behavior. Instead, the full long representation of Date, Time, and
>> Timestamps is stored directly in HBase, without dropping the extra
>> date fields or doing timezone conversion.
>>
>> From the current analysis, what is required for Phoenix to be
>> JDBC-compliant in terms of time/date/timestamp handling is:
>> * All incoming time-style values should be interpreted in the local
>> timezone of the driver, then be normalized and converted to UTC before
>> serialization (unless a Calendar is supplied) in PreparedStatement
>> calls
>> * All outgoing time-style values should be converted from UTC into the
>> local timezone (unless a Calendar is supplied) in ResultSet calls
>> * Supplying a Calendar to PreparedStatement methods should cause the
>> time value to be converted from the local timezone to the timezone of
>> the calendar (instead of UTC) before being serialized
>> * Supplying a Calendar to ResultSet methods should cause the time
>> value from the database to be interpreted as if it was serialized in
>> the timezone of the Calendar, instead of UTC.
>>
>> I started on this earlier today as part of a smaller issue that I had
>> come up against (that the Calendar objects are incorrectly used in
>> PhoenixPreparedStatement and PhoenixResultSet), but digging deeper has
>> shown that this is a more fundamental issue in Phoenix.
>>
>> Making the above changes would mean breaking backwards compatibility
>> with existing Phoenix installs (unless some kind of
>> backwards-compatibility mode is introduced or something similar). On
>> the other hand, not making the above changes will mean that the
>> Phoenix JDBC driver doesn't treat time-style values in the same way as
>> (most) other JDBC drivers, which can cause issues with external
>> tooling that goes via JDBC.
>>
>> Any thoughts on the direction we want to go with this?
>>
>> - Gabriel
>>

Re: JDBC compatibility issues with Time, Date, and Timestamp in Phoenix

Posted by James Taylor <ja...@apache.org>.
Hi Gabriel,
We maintain both date and time to a millisecond granularity and timestamp
to a nano second granularity. If we want to change this, we'll need to do
it in 5.0 where we are considering change our type system (which would thus
require some data conversion anyway).
For 3.0/4.0, clients have the option of using the TRUNC(date,'DAY') if they
want to truncate the time to a day boundary as they insert date/time values.
Would you mind filing a JIRA on this?
Thanks,
James


On Tue, Mar 18, 2014 at 9:38 AM, Gabriel Reid <ga...@gmail.com>wrote:

> Hi,
>
> I believe I've come up against a pretty big issue with Date and Time
> handling (and to a certain extent Timestamp handling) in Phoenix.
>
> From what I understand from the JDBC documentation (mostly API docs),
> the way that a java.sql.Date should be handled via JDBC is simply as a
> day, month, and year, despite the fact that it is internally
> represented as a timestamp (the same kind of thing applies to Time
> objects, which are a triple of hours, minutes, and seconds).
>
> Further, my understanding is that it is the responsibility of a JDBC
> driver to do normalization of incoming Date and Time (and maybe
> Timestamp) objects to interpret them as being in the current time
> zone, and remove the extra components (i.e. time components for a
> Date, and date components for a Time) before storing the value.
>
> This means that today, if I insert a column value consisting of 'new
> Date(System.currentTimeMillis())', then I should be able to retrieve
> that same value with a filter on 'Date.valueOf("2014-03-18")'.
> Additionally, that filter should work regardless of my own local
> timezone where I'm querying from after the data has been written.
>
> It also means that if I store 'Time.valueOf("07:00:00")' in a TIME
> field in a database in my current timezone, someone should get
> "07:00:00" if they run 'ResultSet#getTime(1).toString()' on that
> value, even if they're in a different timezone than me.
>
> From what I can see right now, Phoenix doesn't currently exhibit this
> behavior. Instead, the full long representation of Date, Time, and
> Timestamps is stored directly in HBase, without dropping the extra
> date fields or doing timezone conversion.
>
> From the current analysis, what is required for Phoenix to be
> JDBC-compliant in terms of time/date/timestamp handling is:
> * All incoming time-style values should be interpreted in the local
> timezone of the driver, then be normalized and converted to UTC before
> serialization (unless a Calendar is supplied) in PreparedStatement
> calls
> * All outgoing time-style values should be converted from UTC into the
> local timezone (unless a Calendar is supplied) in ResultSet calls
> * Supplying a Calendar to PreparedStatement methods should cause the
> time value to be converted from the local timezone to the timezone of
> the calendar (instead of UTC) before being serialized
> * Supplying a Calendar to ResultSet methods should cause the time
> value from the database to be interpreted as if it was serialized in
> the timezone of the Calendar, instead of UTC.
>
> I started on this earlier today as part of a smaller issue that I had
> come up against (that the Calendar objects are incorrectly used in
> PhoenixPreparedStatement and PhoenixResultSet), but digging deeper has
> shown that this is a more fundamental issue in Phoenix.
>
> Making the above changes would mean breaking backwards compatibility
> with existing Phoenix installs (unless some kind of
> backwards-compatibility mode is introduced or something similar). On
> the other hand, not making the above changes will mean that the
> Phoenix JDBC driver doesn't treat time-style values in the same way as
> (most) other JDBC drivers, which can cause issues with external
> tooling that goes via JDBC.
>
> Any thoughts on the direction we want to go with this?
>
> - Gabriel
>