You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Ryan Clough (JIRA)" <ji...@apache.org> on 2016/02/26 20:20:18 UTC

[jira] [Updated] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

     [ https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan Clough updated DRILL-4447:
-------------------------------
    Attachment: timestamps.txt
                timestamps_parquet.tar.gz

Here is a plaintext list of timestamps, as well as the parquet directory used in my examples

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4447
>                 URL: https://issues.apache.org/jira/browse/DRILL-4447
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Centos 6.2/Distributed/CDH5.4.9
>            Reporter: Ryan Clough
>         Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to what is a minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, We want to select the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE function, which does exactly what I want it.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-------------+
> |  data_date  |
> +-------------+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-------------+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for distinct records (note that the 47 rows matches the 47 rows of distinct timestamps).
> My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested its self elsewhere.
> I tried to get around the problem by converting the dates to a string using TO_CHAR: surely drill will use the resulting strings to do the DISTINCT comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a string with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report the issue.
> Please let me know if you need any further info, query plans, etc, but it should be reproducable with the timestamps data I'll attach in a minute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)