You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@drill.apache.org by "Ryan Clough (JIRA)" <ji...@apache.org> on 2016/02/26 20:19:18 UTC

[jira] [Created] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

Ryan Clough created DRILL-4447:
----------------------------------

             Summary: Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
                 Key: DRILL-4447
                 URL: https://issues.apache.org/jira/browse/DRILL-4447
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.5.0
         Environment: Centos 6.2/Distributed/CDH5.4.9
            Reporter: Ryan Clough


The issue comes from a larger query, but I've managed to narrow it down to what is a minimally reproducible issue.

Given a list of timestamps (will attach files) associated with 3 days, We want to select the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE function, which does exactly what I want it.

Note, there are 47 distinct timestamps in the data set.

{code:sql}
jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
. . . . . . . > FROM timestamps;
+-------------+
|  data_date  |
+-------------+
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-25  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-24  |
| 2016-02-24  |
| 2016-02-23  |
| 2016-02-23  |
+-------------+
47 rows selected (11.057 seconds)
{code}

As you can see, DRILL has ignored the TO_DATE function when checking for distinct records (note that the 47 rows matches the 47 rows of distinct timestamps).

My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested its self elsewhere.

I tried to get around the problem by converting the dates to a string using TO_CHAR: surely drill will use the resulting strings to do the DISTINCT comparison?

{code:sql}
drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:

Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..

Fragment 0:0

[Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
{code}

As far as I can tell from the docs, you SHOULD be able to convert a date to a string with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report the issue.

Please let me know if you need any further info, query plans, etc, but it should be reproducable with the timestamps data I'll attach in a minute




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)