You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Jiang Wu <ji...@mulesoft.com> on 2018/04/05 16:57:55 UTC

Re: Different output for the same date field

Just a FYI.  Submitted a pull request for https://issues.apache.org/jira/browse/DRILL-6242.  The pull request materialize Drill timestamp, date, and time into java.sql.Timestamp, java.sql.Date, and java.sql.Time.  With these 3 distinct types returned inside nested values through the JDBC layer, the application using the output can now see the proper types.

For example, made a parquet file with date, time, timestamp, list of date, list of time, list of timestamp, and a map of data, time, timestamp.   Now querying this returns:

0: jdbc:drill:zk=local> select * from dfs.tmp.`datetime.parquet` limit 1;
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+
|    date     |     time      |        timestamp         |    date_list    |     time_list     |        timestamp_list        |                                      time_map                                      |
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+
| 1970-01-11  | 00:00:03.600  | 2018-03-21 20:30:07.364  | ["1970-01-11"]  | ["00:00:03.600"]  | ["2018-03-21 20:30:07.364"]  | {"date":"1970-01-11","time":"00:00:03.600","timestamp":"2018-03-21 20:30:07.364"}  |
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+ 

Comparing the above to the existing behavior, which uses org.joda.time.DateTime object for all three Drill types when returning nested values such as in list and maps:

+------+------+-----------+-----------+-----------+----------------+----------+
| date | time | timestamp | date_list | time_list | timestamp_list | time_map |
+------+------+-----------+-----------+-----------+----------------+----------+
| 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":864000000,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}] | {"date":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":864000000,"afterNow":false,"beforeNow":true,"equalNow":false},"time":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false},"timestamp":{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}} |
+------+------+-----------+-----------+-----------+----------------+----------+

-- Jiang


On 3/14/18, 1:44 PM, "Jiang Wu" <ji...@mulesoft.com> wrote:

    I will file a Jira issue for this problem with a more detailed description.  https://issues.apache.org/jira/browse/DRILL-6242
    
    I put some solution possibilities in the Jira.  Someone with more knowledge can take a look at see if any of the solution approach makes sense first.
    
    Thanks.
     
    -- Jiang
    
    
    On 3/13/18, 9:56 PM, "Kunal Khatua" <ku...@apache.org> wrote:
    
        Hi Jiang
        
        Thanks for bringing this to our attention.
        
        However, we're not actively developing the Mongo storage plugin at the
        moment, so it might be a while before this is updated.
        
        Do you think you could help with submitting a pull-request against a JIRA
        filed for this? You already appear to have become familiar with the Drill
        codebase, and we're looking to contributions from the community.
        
        
        ~ Kunal
        
        On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wu <ji...@mulesoft.com> wrote:
        
        > Setup:
        >
        >
        >   1.  Mongodb with this sample data inside:
        >
        > db.test.findOne();
        > {
        >      "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
        >      "name" : "orange",
        >      "context" : {
        >            "time" : ISODate("2018-03-13T21:52:54.940Z"),
        >            "user" : "jack"
        >      }
        > }
        >
        >
        >   1.  Connect with Drill 1.12 and run the query:
        >
        >
        > select t.context.`time` as `time`, t.context from mongo.test.test t;
        >
        > The result is:
        >
        > +------------+---------+
        > | time       | context |
        > +------------+---------+
        > | 2018-03-13 | {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,
        > ... },"user":"jack"} |
        > +------------+---------+
        >
        > Note how the output formatting of the same field “time” is different
        > depending on whether the time is shown as a first level column or as a
        > nested JSON field.  Has anyone seen this behavior?
        >
        > Looking at the source code, it appears that the JSON output is produced
        > from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0).
        > This class uses its own ObjectMapper instance to serialize the map.
        >
        > Would it better to add some Mixin’s to the various complex types that may
        > appear inside this JsonStringHashMap?  For example, DateTime object can be
        > serialized using the logical representation, rather than the getters of the
        > DataTime class used by Drill.
        >
        > Thanks.
        >
        > -- Jiang
        >
        >