You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Ryan Clough (JIRA)" <ji...@apache.org> on 2016/02/26 20:38:18 UTC

[jira] [Commented] (DRILL-4447) Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY

    [ https://issues.apache.org/jira/browse/DRILL-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15169636#comment-15169636 ] 

Ryan Clough commented on DRILL-4447:
------------------------------------

A bit more info: it seems to fail when encapsulated in a subquery like so: 

{code:sql}
WITH subq AS
(SELECT TO_DATE(data_date) as data_date FROM timestamps)
SELECT DISTINCT(data_date)
FROM subq;
{code}

But it DOES appear to WORK if you save them to their own table, and then run the query:

{code:sql}
0: jdbc:drill:> CREATE TABLE converted_dates AS
. . . . . . . > SELECT TO_DATE(data_date) as data_date FROM timestamps;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 47                         |
+-----------+----------------------------+
1 row selected (25.259 seconds)
0: jdbc:drill:> select distinct(data_date) from converted_dates;
+-------------+
|  data_date  |
+-------------+
| 2016-02-23  |
| 2016-02-25  |
| 2016-02-24  |
+-------------+
3 rows selected (14.062 seconds)
{code}

> Drill seems to ignore TO_DATE(timestamp) when used inside DISTINCT() and GROUP BY
> ---------------------------------------------------------------------------------
>
>                 Key: DRILL-4447
>                 URL: https://issues.apache.org/jira/browse/DRILL-4447
>             Project: Apache Drill
>          Issue Type: Bug
>    Affects Versions: 1.5.0
>         Environment: Centos 6.2/Distributed/CDH5.4.9
>            Reporter: Ryan Clough
>         Attachments: timestamps.txt, timestamps_parquet.tar.gz
>
>
> The issue comes from a larger query, but I've managed to narrow it down to what is a minimally reproducible issue.
> Given a list of timestamps (will attach files) associated with 3 days, We want to select the distinct dates (total: 3 days) from this list. To do this, I decided to use the TO_DATE function, which does exactly what I want it.
> Note, there are 47 distinct timestamps in the data set.
> {code:sql}
> jdbc:drill:> SELECT DISTINCT(TO_DATE(data_date)) AS data_date
> . . . . . . . > FROM timestamps;
> +-------------+
> |  data_date  |
> +-------------+
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-25  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-24  |
> | 2016-02-24  |
> | 2016-02-23  |
> | 2016-02-23  |
> +-------------+
> 47 rows selected (11.057 seconds)
> {code}
> As you can see, DRILL has ignored the TO_DATE function when checking for distinct records (note that the 47 rows matches the 47 rows of distinct timestamps).
> My testing has also shown that this affect GROUP BY. I wouldn't be surprised if it manifested its self elsewhere.
> I tried to get around the problem by converting the dates to a string using TO_CHAR: surely drill will use the resulting strings to do the DISTINCT comparison?
> {code:sql}
> drill:> SELECT DISTINCT(TO_CHAR(TO_DATE(data_date))) FROM timestamps;
> Error: SYSTEM ERROR: SchemaChangeException: Failure while trying to materialize incoming schema.  Errors:
> Error in expression at index -1.  Error: Missing function implementation: [to_char(DATE-OPTIONAL)].  Full expression: --UNKNOWN EXPRESSION--..
> Fragment 0:0
> [Error Id: bcad87f0-3353-4a3b-842e-c68a02b394c3 on lvimhdpa14.lv.vimeows.com:31010] (state=,code=0)
> {code}
> As far as I can tell from the docs, you SHOULD be able to convert a date to a string with TO_CHAR(). I'm not sure what the underlying issue is here, but I thought it good to report the issue.
> Please let me know if you need any further info, query plans, etc, but it should be reproducable with the timestamps data I'll attach in a minute



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)