You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by jiang-wu <gi...@git.apache.org> on 2018/03/23 01:10:53 UTC

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

GitHub user jiang-wu opened a pull request:

    https://github.com/apache/drill/pull/1184

    DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes to hold value…

    See Jira ticket for details.  Use java.sql.Date, java.sql.Time, and java.sql.Timestamp as the Java representation for their corresponding Drill types.  This does not lose any precisions as these classes are just simple subclasses of java.util.Date with millisecond precision.  But using these classes allows the command line to properly format the data using org.apache.drill.exec.util.JsonStringArrayList and org.apache.drill.exec.util.JsonStringHashMap.
    
    The changes are simple enough.  But many Test** methods need to be updated to use java.sql.Date|Time|Timestamp.  Opt not to optimize these changes.  Places still use joda DateTime to parse date and time as before, but then converted to the java.sql.Date|Time|Timestamp as appropriate.  

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jiang-wu/drill DRILL-6242-master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/drill/pull/1184.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1184
    
----
commit 7cbb8b81196732cb223c031cd629d9bc941640d9
Author: Jiang Wu <ji...@...>
Date:   2018-03-22T20:42:20Z

    DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes to hold values from corresponding Drill date, time, and timestamp types.

----


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    @parthchandra, the point about the birthday is that is is one of those dates that is implied relative to where you are. You celebrate it the same day regardless of where you are in the world. Same with an order date. So, a key problem is that, for dates, they are not relative to UTC, they are just dates. They become relative to UTC only when a time and timezone is supplied.
    
    As @jiang-wu explained, storing in UTC is fine when times are absolute (date + time + tz). The problem is "2018-04-15" or even "2018-04-15 3 PM" is not an absolute: it is local and cannot be stored as UTC unless we know the TZ. Guessing that the TZ is that of the server really does not help, and actually produces wrong results when client and server timezones differ.
    
    That's why the data structures need to support the data model @jiang-wu suggested:
    
    * Date without TZ
    * Time without a TZ
    * Date/time without TZ, and
    * Timestamp implied in UTC.
    
    And, yes, it is because people abused the Java `Date` class that the Joda time classes were invented. We just need to have Drill types that parallel the Joda types. Granted, this is more than this fix can tackle, but the point stands. 
    
    Agreed that the issue of how to handle JDBC/ODBC needs to be resolved. Can we make up synthetic column names? Implicitly flatten the results so that "context.date" will pick out the "date" element within "context". This will allow JDBC to provide metadata and a reasonable type for each column, at the cost of potentially creating a very wide row (if you have deeply nested maps.) The "auto-flatten" option seems cleaner than any object-based format we make up.
    
    A related solution is to fix `sqlline` so that it has a formatter other than `toString()` as @jiang-wu suggested. We register a format class and `sqlline` uses that to format, say, a Drill Map. That way we don't have to use a JSON object so that its `toString()` produces nice output. 


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r184243856
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    Hmm. That takes us back to the original problem, that of the date|time|timestamp field inside a complex object. 
    ```
    select t.context.date, t.context from test t;
    
    will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2. This doesn't seem like a good thing.
    ```
    Why should that be a bad thing though? Ultimately, the object returned by getObject() is displayed to the end user thru the toString method. The string representation of Local[Date|Time|Timestamp]  should be the same as that of java.sql.[Date|Time|Timestamp]. Isn't it?



---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    @parthchandra @vdiravka I finally completed the changes on using Local{Date|Time|DateTime}.  I made a new clean pull request for that here: https://github.com/apache/drill/pull/1247 



---

Re: [GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by Parth Chandra <pa...@apache.org>.
I'm assuming jiang-wu will update the PR to use the Joda/JDK classes when
he gets the time?

On Thu, Apr 19, 2018 at 6:34 AM, arina-ielchiieva <gi...@git.apache.org>
wrote:

> Github user arina-ielchiieva commented on the issue:
>
>     https://github.com/apache/drill/pull/1184
>
>     So what the next steps required before we merge this PR?
>
>
> ---
>

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by arina-ielchiieva <gi...@git.apache.org>.
Github user arina-ielchiieva commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    So what the next steps required before we merge this PR?


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    >  But, if April 15 is your birthday, it is your birthday in all timezones. We don't say your birthday (or order date, or newspaper issue date or...) is one day in, say London and another day in Los Angeles.
    
    If it is your birthday in California, it may already be the day after your birthday in Japan. :)
    
    IMO, Representing dates, times, and timestamp's as UTC is not the problem. It is, in fact, perfectly correct (since UTC is the timezone). Converting a date|time|timestamp without a timezone to/from UTC, is the problem. The problem is made worse by java.util and JDBC APIs. java.time gets it right though.
    
    However, as Jiang-wu points out, that still does not address the mismatch between Joda/Java8 representation and JDBC. It also does not address his original problem, the issue of how to represent a complex type in JDBC; just return an Object, it says, which is no help at all . It is even worse for ODBC which (last I checked) did not even have an API to return an Object type (which is why in ODBC we return a JSON string representation). 
    For Jiang-wu's use case, since the string representation is not enough, we might look at returning a java.sql.Struct [1] type for Maps and java.sql.Array [2] types.
    
    [1] https://docs.oracle.com/javase/7/docs/api/java/sql/Struct.html
    [2] https://docs.oracle.com/javase/7/docs/api/java/sql/Array.html
    
    
    



---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Can you check the unit tests after rebasing? I applied the PR to the latest master and get errors in the same tests. Thanks.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    One additional note. We noted that JDBC does not support the idea of a nested tuple (a Drill "map".) JDBC does support columns that return a Java object. To bridge the gap, Drill returns a Map column as a Java object.
    
    But, why a JSON object? The answer seems to lie with the `sqline` program. If we query a one-line JSON file with a nested object in `sqlline`, we get the following display:
    
    ```
    SELECT * FROM `json/nested.json`;
    +---------+----------------------------------+
    | custId  |               name               |
    +---------+----------------------------------+
    | 101     | {"first":"John","last":"Smith"}  |
    +---------+----------------------------------+
    ```
    
    So, it seems likely that the the value of a Map object was translated to a JSON object so that when `sqlline` calls `toString()` on it, it ends up formatted nicely as above.
    
    Because of this, it may be hard to change the kind of objects returned from JDBC for a Map column.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r180933601
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    BTW, Drill internally tries to fool around with the timezone to preserve the "textual representation" look complex.  I am not convinced this is the "right" way to handle time.  But in any case, that is outside of the scope of this change.
    
    I mentioned in the Jira a comment on how such timezone manipulation is dangerous and lead to errors.  I ran into that when attempting at creating a unit test for the change made in the pull request.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Sorry, coming late. There seem to be two problems. The original "nested column" issue is an artifact of the JDBC driver. In Drill, a Map (the thing that contains your nested column) is just a nested tuple. But, JDBC does not have the idea of a nested field, there is no way to ask, for, say "myMap.ts". All you can ask for is "myMap" if it is a Map. The Drill JDBC driver has to invent a value to return. It invents a Java map.
    
    For JDBC, which does not support nested fields, you can project your field up to the top level just by naming it in the select clause:
    
    ```
    SELECT `context`.`date` as `context_date` ...
    ```
    
    The second problem that this PR seems to address is how dates are stored. Many tests have been changed to double-down on Drill's original sin: that generic dates (2015-04-15, say) are represented a a timestamp in UTC. But, if April 15 is your birthday, it is your birthday in all timezones. We don't say your birthday (or order date, or newspaper issue date or...) is one day in, say London and another day in Los Angeles.
    
    Drill should have a "Date" type that is not associated with a timezone. But, we don't so we get tied up in the "treat the local time zone as if it were UTC" issue.
    
    The original set of Drill types did have types to handle these, but they didn't quite make it into the final version. Also, this issue has been discussed (with some vigor) on the mailing list once or twice.
    
    One flaw, that you seem to have spotted, is that if I read data on a server in one TZ, then display it on a client in another TZ, the dates and times are all messed up. The server reads dates in local TZ, then simply stores the ms value in a date. The client thinks that date is UTC and adjusts it to its own local TZ. All heck breaks loose. (Or something like that; the original test case was over a year ago and I may have messed up the details...)
    
    The ideal set of date/time types:
    
    * Date: A date in an unspecified time zone, such as your birthday.
    * Time: A time relative to midnight in an (unknown) Date, no association with a TZ. For example, "we have lunch at 11:30 AM" applies regardless of TZ.
    * Timestamp: an absolute time relative to UTC.
    
    Then, functions can convert back and forth. Joda (and, in Java 8, the new Date/time classes) work this way.
    
    This means that the many tests that were modified to turn a generic date into a timestamp are simply making the problem worse: we are saying that 2015-04-15 is midnight, April 15 in London, which is highly unexpected.
    
    Bottom line: we've got two very difficult issues here: how to handle maps in JDBC and how to fix Drill's date/time types.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    I was out of town last week. Will work on the type change to Java 8 Local[Data|Time|Timestamp] this week and then notify you when it is done.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    The unit test failure is due to additional changes in master after the pull request is made.  I can merge and update on the branch to fix them.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r184424638
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    The string representation between LocalDateTime and Timestamp are not exactly the same, but that is potentially fixable since we can alter the way the values are displayed via formatters.  Though JDBC is not just for getting string representations.  It is on the programmatic use cases where we are getting the value objects where one would see the disconnect on the data types.  Will try this out with a use case I have with programmatic JDBC access and see what are the impacts on different types for the same expected value.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by vdiravka <gi...@git.apache.org>.
Github user vdiravka commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181472373
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    I think updating to Java8 LocalDate/Time classes would be good choice. And it will be step forward in the resolving of the Drill's Date/Time issues mentioned in different Jiras: [DRILL-5334](https://issues.apache.org/jira/browse/DRILL-5334), [DRILL-5332](https://issues.apache.org/jira/browse/DRILL-5332) etc.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by priteshm <gi...@git.apache.org>.
Github user priteshm commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    @parthchandra can you please review it. @jiang-wu Parth is traveling and he would be able to review it next week.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181256294
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    sure, I can update to the joda version or the java.time version.  Which one is preferred?  java.time.* is available in java 1.8+


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Actually, JDBC representation is not he hard problem here.  I ran into most of the problems dealing with the timezones surrounding the data|time|timestamp.  java.sql.Struct and Array are interfaces and not actual classes.  So the Current JDBC returning Map|List object for complex values are fine.  You can just declare JsonStringHashMap implements Struct, and JsonStringArryaList implements Array.  
    
    Now the harder issue.  The semantics of "date", "time" are tricker comparing to "timestamp".  Timestamp is understood to be an instant in time (java/joda Instant class).  Timestamp is a single point in time and not impacted by time zones.
    
    date and time can have two uses:
    
    1) logical date and time, which is not fixed point or range in time.  e.g. 2018-01-01 is the new year day and this day happens for a different 24 hour window depending on where your time zone is.  So this is a logical date, and we don't celebrate the start of the New Year day at the same time.
    
    2) date and time with offset/timezone -- This refers to a specific point or range in time.  This type of date/time is absolute.  e.g. NYSE opens on "7:30 am EST".  Regardless of the timezone you are at, this time is the same for everyone.
    
    joda/java8 time package have the proper handling for 1) logical date, time (using local date time classes); 2) absolute date, time (using offset date time classes); and 3) timestamp (instant class)  various method exists for you to apply conversions between these based on the heuristics you want to apply. 
    
    I feel for Drill, we need to first decide what behavior do we want to support first.  Then go from there.



---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    @parthchandra Just to clarify on the JDBC comment.  What do you mean by "Json representation"? Do you instead mean the "Local[Date|Time]" class representation?  There are no "Json" being returned from the JDBC layer.  It uses Java collections Map or List objects.  Inside the Map | List, the change in this pull request properly uses objects of different classes: Local [Date|Time|DateTime] to represent the various date/time/timestamp values.
    
    So far so good.  Now, it is possible in the future, we may want to further translate the Local [Date|Time|DateTime] objects inside the Map|List to java.sql.[Date|Time|Timestamp] upon access.  But to do that inside the SqlAccessor, you would need to deep copy the Map|List and build another version with the date|time translated into java.sql.date|time.  That would seem like a lot of work for little gain. 
    
    I would say let's hold off on that for now.  A few databases seem to be moving toward using non-timezone based representation in JDBC if the database does not support timezones: https://jdbc.postgresql.org/documentation/head/8-date-time.html  It would make sense to consider changing the class used after deciding on what to do with Drill handling of timezones. 


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Yes.  There are at least two issues.  
    
    One is about how Drill represent Date, Time, Timestamp internally using a UTC based instant representation and fudges the timezone in order to make the none time zone fields looking right, that Paul outlined nicely above.  To really fix this, one would need to first define how Drill wishes to handle Date, Time, Timestamp: e.g. no time zone at all, time zone aware but not preserving, time zone aware and preserving, etc.  One can look at how databases handle time zones to get some inspirations.  This part is too ambitious for me to fix here.
    
    The second is to obtain a complex object value from JDBC interface.  Drill doesn't make a JSON object in order to send the data to a JDBC interface.  It looks like the JDBC interface is simple an alternative accessor to vector data being transmitted from the server side to the client side.  Once the vector data arrives on the client side, the JDBC layer builds a Map (or List) object by reading the values from the vectors.  The issue I found was that inside this process, date|time|timestamp values from their respective vector classes were all represented using the same Java class, hence losing its type information all together when the Java object is placed inside the Map|List.
    
    So a fix for this part is simple (in concept), just use different Java types.  Except, when making this change, the first issue popped up as one has to figure out how to make Date and Time out of UTC instant values, which are these fudged values from the existing logic


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    ```
    What do you mean by "Json representation"? 
    ```
    Sorry, my mistake, got all tangled up. 
    ```
     we may want to further translate the Local [Date|Time|DateTime] objects inside the Map|List to java.sql.[Date|Time|Timestamp] upon access. But to do that inside the SqlAccessor, you would need to deep copy the Map|List and build another version with the date|time translated into java.sql.date|time.
    ```
    That is what I thought you wanted to get to. If the current state is something you can work with, then great.  I can review the final changes once you're done and merge them as well. 
    Let's move the other discussion to another thread or JIRA.


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by paul-rogers <gi...@git.apache.org>.
Github user paul-rogers commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Just a quick reminder that the current "JSON Map" returned for a map column in JDBC was very likely done so that calling `toString()` in `sqlline` produces something like this: `{"c":"foo"}`. I realize this is a very obscure point; but worth keeping in mind to avoid bugs from `sqlline` users...


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181248969
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    Agree that messing with Timezones is a Bad Thing. Probably an artifact of the way java.util did things.
    Anyway, I did mean using org.joda.time.Local[Data|Time|TimeStamp]  or the corresponding java.time.* classes.



---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181461481
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    Either one is fine (since java.time is based on Joda). We've switched to Java 8, but just for consistency with the rest of the code, we might as well use Joda.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r183862162
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    As I work through using Local[Date|Time|DateTime] inside the vector package, I notice that it will create the following inconsistency on the JDBC output:
    
    SqlAccessor provides "getDate()", "getTime()", and "getTimestamp()" that are returns java.sql.[Date|Time|Timestamp].  This will convert Local[Date|Time|DateTime] into java.sql.[Date|Time|Timestamp]
    
    For complex objects, SqlAccessor provides "getObject()" which will return JsonStringHashMap or JsonStringArrayList.  If the Local[Date|Time|DateTime] objects are inside the map and list, then they will NOT be converted into java.sql.[Date|Time|Timestamp].
    
    Example:
    
    `select t.context.date, t.context from test t; `
    
    will return a java.sql.Date object for column 1, but a java.time.LocalDate for the same object inside column 2.  This doesn't seem like a good thing.  What should be the right thing to do here?  Introduce SqlAccessor.getLocal[Date|Time|Timestamp] accessors to supplement the existing get[Date|Time|Timestamp]?
    



---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181493581
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    How about we use Java 8 Local[Data|Time|Timestamp] for the public interface methods?  That sets things up for the future.
    
    Internally, I won't change the logic that is using Joda DateTime, that is doing the various time zone stuff.  That behind the scene logic can be separately updated after determine what is the right behavior Drill wants to support.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r180914676
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    @jiang-wu Thanks for making these changes. Your fix is on the right track.  However, I'm not sure if we want to introduce a dependency on JDBC classes in the vectors. 
    Take a look at DateAccessor, TimeAccessor, and TimeStampAccessor. These are generated from [SqlAccessor](https://github.com/apache/drill/blob/master/exec/java-exec/src/main/codegen/templates/SqlAccessors.java). 
    The get methods in these convert from UTC to a Local{Date|Time|TimeStamp}.  Subsequently they convert to the JDBC type since they are used by the JDBC driver.
    The vectors should be able to do the same, just returning the Local{Date|Time|TimeStamp} object.
    I'm not sure if that might affect tests that depend on timezone though. Perhaps @vdiravka can comment. 


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    I think that would be best. 


---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Putting aside the discussion on date/time/timezone for the moment, @jiang-wu let's say getObject returns to you an object that implements java.sql.{Struct|Array}. You now use the Struct|Array apis to get the attribute you are interested in. If the attribute is of type date|time the object returned for that attribute should now correspond to java.sql.{Date|Time} instead of the Json representation. Will that not address your requirement?



---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by parthchandra <gi...@git.apache.org>.
Github user parthchandra commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r181513279
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    Sounds good.


---

[GitHub] drill pull request #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] cl...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on a diff in the pull request:

    https://github.com/apache/drill/pull/1184#discussion_r180925699
  
    --- Diff: exec/vector/src/main/codegen/templates/FixedValueVectors.java ---
    @@ -509,15 +509,15 @@ public long getTwoAsLong(int index) {
         public ${friendlyType} getObject(int index) {
           org.joda.time.DateTime date = new org.joda.time.DateTime(get(index), org.joda.time.DateTimeZone.UTC);
           date = date.withZoneRetainFields(org.joda.time.DateTimeZone.getDefault());
    -      return date;
    +      return new java.sql.Date(date.getMillis());
    --- End diff --
    
    Good point.  Someone with deeper knowledge should take a look.  As far as I can tell, I think the problem with SqlAccessor is that it uses the vector type to know whether to invoke getDate() vs getTimestamp().  However, such vector type knowledge is not there when complex type such as List and Map are materialized in memory to form Java JsonStringArrayList and JsonStringHashMap.  
    
    In https://issues.apache.org/jira/browse/DRILL-6242, the example describes this scenario:
    
    `select t.context.`date`, t.context from test t;`
    
    where the `date` field is inside a Map type.  And we select the field by itself as well as select the Map on the same query.  This query returns:
    
    `+--------+---------+ `
    `| EXPR$0 | context | `
    `+--------+---------+ `
    `| 2018-03-13 | {"date":{"dayOfYear":72,"year":2018, ... |`
    
    One can see that the first column shows the date in the right type.  But the same date is shown as a different type inside the Map.  In the vector package, when reading the [List|Map]Vector, the code produces its nested member values via the generic method "getObject()".  Since all three vector type returned the same DataObject type as the representation, there are no distinction.
    
    For the type information to be carried within the List | Map, it would seem that the value should be of distinct types.  These can be java.sql.[Date|Time|Timestamp] or some other [Date|Time|Timestamp] classes.
    



---

[GitHub] drill issue #1184: DRILL-6242 - Use java.sql.[Date|Time|Timestamp] classes t...

Posted by jiang-wu <gi...@git.apache.org>.
Github user jiang-wu commented on the issue:

    https://github.com/apache/drill/pull/1184
  
    Checked in the patch for the latest master.  The test seems to be passing now.  But if we want to change the code to use joda <or> java.time Local[Time|Date|TimeStamp], then we can hold off any merge until it is updated to not use java.sql.*


---