You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Khurram Faraaz (JIRA)" <ji...@apache.org> on 2017/03/09 06:04:38 UTC

[jira] [Comment Edited] (DRILL-4678) Query HANG - SELECT DISTINCT over date data

    [ https://issues.apache.org/jira/browse/DRILL-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15286232#comment-15286232 ] 

Khurram Faraaz edited comment on DRILL-4678 at 3/9/17 6:04 AM:
---------------------------------------------------------------

Upon reducing the size the query returns results, however if we add one or two more values the time taken to execute almost doubles.

This query took 16 seconds.
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT DISTINCT dt FROM (
. . . . . . . . . . . . . . > VALUES(CAST('1964-03-07' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('2002-03-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1993-08-18' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1959-10-23' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1992-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1994-07-24' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1979-11-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1945-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1982-07-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-06' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1989-05-01' AS DATE)),
. . . . . . . . . . . . . . > (CAST('1951-05-16' AS DATE)))
. . . . . . . . . . . . . . > tbl(dt);
+-------------+
|     dt      |
+-------------+
| 1964-03-07  |
| 2002-03-04  |
| 1966-09-04  |
| 1993-08-18  |
| 1970-06-11  |
| 1959-10-23  |
| 1992-01-14  |
| 1994-07-24  |
| 1979-11-25  |
| 1945-01-14  |
| 1982-07-25  |
| 1966-09-06  |
| 1989-05-01  |
| 1951-05-16  |
+-------------+
14 rows selected (16.718 seconds)
{noformat}

I added few more rows enclosed in parentheses and it took 32 seconds (almost twice the time)
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT DISTINCT dt FROM (
. . . . . . . . . . . . . . > VALUES(CAST('1964-03-07' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('2002-03-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1993-08-18' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1959-10-23' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1992-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1994-07-24' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1979-11-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1945-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1982-07-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-06' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1989-05-01' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1996-03-08' AS DATE)),
. . . . . . . . . . . . . . > (CAST('1951-05-16' AS DATE)))
. . . . . . . . . . . . . . > tbl(dt);
+-------------+
|     dt      |
+-------------+
| 1964-03-07  |
| 2002-03-04  |
| 1966-09-04  |
| 1993-08-18  |
| 1970-06-11  |
| 1959-10-23  |
| 1992-01-14  |
| 1994-07-24  |
| 1979-11-25  |
| 1945-01-14  |
| 1982-07-25  |
| 1966-09-06  |
| 1989-05-01  |
| 1996-03-08  |
| 1951-05-16  |
+-------------+
15 rows selected (32.182 seconds)
{noformat}

Adding any more rows (even adding just one) would keep the query in STARTING state and the query would never complete.
explain plan over the original query reported in this JIRA would not return any results. So it looks like it has to do something with the total number of values enclosed in parentheses inside the VALUES clause.


was (Author: khfaraaz):
Upon reducing the size the query returns results, however if we add one or two more values the time taken to execute almost doubles.

This query took 16 seconds.
{noormat}
0: jdbc:drill:schema=dfs.tmp> SELECT DISTINCT dt FROM (
. . . . . . . . . . . . . . > VALUES(CAST('1964-03-07' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('2002-03-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1993-08-18' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1959-10-23' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1992-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1994-07-24' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1979-11-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1945-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1982-07-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-06' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1989-05-01' AS DATE)),
. . . . . . . . . . . . . . > (CAST('1951-05-16' AS DATE)))
. . . . . . . . . . . . . . > tbl(dt);
+-------------+
|     dt      |
+-------------+
| 1964-03-07  |
| 2002-03-04  |
| 1966-09-04  |
| 1993-08-18  |
| 1970-06-11  |
| 1959-10-23  |
| 1992-01-14  |
| 1994-07-24  |
| 1979-11-25  |
| 1945-01-14  |
| 1982-07-25  |
| 1966-09-06  |
| 1989-05-01  |
| 1951-05-16  |
+-------------+
14 rows selected (16.718 seconds)
{noformat}

I added few more rows enclosed in parentheses and it took 32 seconds (almost twice the time)
{noformat}
0: jdbc:drill:schema=dfs.tmp> SELECT DISTINCT dt FROM (
. . . . . . . . . . . . . . > VALUES(CAST('1964-03-07' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('2002-03-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-04' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1993-08-18' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1970-06-11' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1959-10-23' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1992-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1994-07-24' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1979-11-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1945-01-14' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1982-07-25' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1966-09-06' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1989-05-01' AS DATE)),
. . . . . . . . . . . . . . >       (CAST('1996-03-08' AS DATE)),
. . . . . . . . . . . . . . > (CAST('1951-05-16' AS DATE)))
. . . . . . . . . . . . . . > tbl(dt);
+-------------+
|     dt      |
+-------------+
| 1964-03-07  |
| 2002-03-04  |
| 1966-09-04  |
| 1993-08-18  |
| 1970-06-11  |
| 1959-10-23  |
| 1992-01-14  |
| 1994-07-24  |
| 1979-11-25  |
| 1945-01-14  |
| 1982-07-25  |
| 1966-09-06  |
| 1989-05-01  |
| 1996-03-08  |
| 1951-05-16  |
+-------------+
15 rows selected (32.182 seconds)
{noformat}

Adding any more rows (even adding just one) would keep the query in STARTING state and the query would never complete.
explain plan over the original query reported in this JIRA would not return any results. So it looks like it has to do something with the total number of values enclosed in parentheses inside the VALUES clause.

> Query HANG - SELECT DISTINCT over date data
> -------------------------------------------
>
>                 Key: DRILL-4678
>                 URL: https://issues.apache.org/jira/browse/DRILL-4678
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 1.7.0
>         Environment: 4 node cluster CentOS
>            Reporter: Khurram Faraaz
>            Priority: Critical
>         Attachments: hung_Date_Query.log
>
>
> Below query hangs
> {noformat}
> 2016-05-16 10:33:57,506 [28c65de9-9f67-dadb-5e4e-e1a12f8dda49:foreman] INFO  o.a.drill.exec.work.foreman.Foreman - Query text for query id 28c65de9-9f67-dadb-5e4e-e1a12f8dda49: SELECT DISTINCT dt FROM (
> VALUES(CAST('1964-03-07' AS DATE)),
>       (CAST('2002-03-04' AS DATE)),
>       (CAST('1966-09-04' AS DATE)),
>       (CAST('1993-08-18' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1970-06-11' AS DATE)),
>       (CAST('1959-10-23' AS DATE)),
>       (CAST('1992-01-14' AS DATE)),
>       (CAST('1994-07-24' AS DATE)),
>       (CAST('1979-11-25' AS DATE)),
>       (CAST('1945-01-14' AS DATE)),
>       (CAST('1982-07-25' AS DATE)),
>       (CAST('1966-09-06' AS DATE)),
>       (CAST('1989-05-01' AS DATE)),
>       (CAST('1996-03-08' AS DATE)),
>       (CAST('1998-08-19' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
>       (CAST('2013-08-13' AS DATE)),
> (CAST('1999-07-20' AS DATE)),
>     (CAST('1962-07-03' AS DATE)),
>       (CAST('2011-08-17' AS DATE)),
>       (CAST('2011-05-16' AS DATE)),
>       (CAST('1946-05-08' AS DATE)),
>       (CAST('1994-02-13' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1978-08-09' AS DATE)),
>       (CAST('1958-02-06' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('2012-06-11' AS DATE)),
>       (CAST('1998-03-26' AS DATE)),
>       (CAST('1996-11-04' AS DATE)),
>       (CAST('1953-09-25' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('2003-06-17' AS DATE)),
>       (CAST('1980-07-05' AS DATE)),
>       (CAST('1982-06-15' AS DATE)),
>       (CAST('1951-05-16' AS DATE)))
> tbl(dt)
> {noformat}
> Details from Web UI Profile tab, please note that the query is still in STARTING state
> {noformat}
> Running Queries
> Time	User	Query	State	Foreman
> 05/16/2016 10:33:57	
> mapr
>  SELECT DISTINCT dt FROM ( VALUES(CAST('1964-03-07' AS DATE)), (CAST('2002-03-04' AS DATE)), (CAST('1966-09-04' AS DATE)), (CAST('199
> STARTING
> centos-01.qa.lab
> {noformat}
> There is no other useful information in drillbit.log. jstack output is attached here for your reference.
> The same query works fine on Postgres 9.3



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)