You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@drill.apache.org by Jiang Wu <ji...@mulesoft.com> on 2018/04/05 16:57:55 UTC
Re: Different output for the same date field
Just a FYI. Submitted a pull request for https://issues.apache.org/jira/browse/DRILL-6242. The pull request materialize Drill timestamp, date, and time into java.sql.Timestamp, java.sql.Date, and java.sql.Time. With these 3 distinct types returned inside nested values through the JDBC layer, the application using the output can now see the proper types.
For example, made a parquet file with date, time, timestamp, list of date, list of time, list of timestamp, and a map of data, time, timestamp. Now querying this returns:
0: jdbc:drill:zk=local> select * from dfs.tmp.`datetime.parquet` limit 1;
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+
| date | time | timestamp | date_list | time_list | timestamp_list | time_map |
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+
| 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | ["1970-01-11"] | ["00:00:03.600"] | ["2018-03-21 20:30:07.364"] | {"date":"1970-01-11","time":"00:00:03.600","timestamp":"2018-03-21 20:30:07.364"} |
+-------------+---------------+--------------------------+-----------------+-------------------+------------------------------+------------------------------------------------------------------------------------+
Comparing the above to the existing behavior, which uses org.joda.time.DateTime object for all three Drill types when returning nested values such as in list and maps:
+------+------+-----------+-----------+-----------+----------------+----------+
| date | time | timestamp | date_list | time_list | timestamp_list | time_map |
+------+------+-----------+-----------+-----------+----------------+----------+
| 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":864000000,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}] | {"date":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":864000000,"afterNow":false,"beforeNow":true,"equalNow":false},"time":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false},"timestamp":{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}} |
+------+------+-----------+-----------+-----------+----------------+----------+
-- Jiang
On 3/14/18, 1:44 PM, "Jiang Wu" <ji...@mulesoft.com> wrote:
I will file a Jira issue for this problem with a more detailed description. https://issues.apache.org/jira/browse/DRILL-6242
I put some solution possibilities in the Jira. Someone with more knowledge can take a look at see if any of the solution approach makes sense first.
Thanks.
-- Jiang
On 3/13/18, 9:56 PM, "Kunal Khatua" <ku...@apache.org> wrote:
Hi Jiang
Thanks for bringing this to our attention.
However, we're not actively developing the Mongo storage plugin at the
moment, so it might be a while before this is updated.
Do you think you could help with submitting a pull-request against a JIRA
filed for this? You already appear to have become familiar with the Drill
codebase, and we're looking to contributions from the community.
~ Kunal
On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wu <ji...@mulesoft.com> wrote:
> Setup:
>
>
> 1. Mongodb with this sample data inside:
>
> db.test.findOne();
> {
> "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
> "name" : "orange",
> "context" : {
> "time" : ISODate("2018-03-13T21:52:54.940Z"),
> "user" : "jack"
> }
> }
>
>
> 1. Connect with Drill 1.12 and run the query:
>
>
> select t.context.`time` as `time`, t.context from mongo.test.test t;
>
> The result is:
>
> +------------+---------+
> | time | context |
> +------------+---------+
> | 2018-03-13 | {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,
> ... },"user":"jack"} |
> +------------+---------+
>
> Note how the output formatting of the same field “time” is different
> depending on whether the time is shown as a first level column or as a
> nested JSON field. Has anyone seen this behavior?
>
> Looking at the source code, it appears that the JSON output is produced
> from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0).
> This class uses its own ObjectMapper instance to serialize the map.
>
> Would it better to add some Mixin’s to the various complex types that may
> appear inside this JsonStringHashMap? For example, DateTime object can be
> serialized using the logical representation, rather than the getters of the
> DataTime class used by Drill.
>
> Thanks.
>
> -- Jiang
>
>