Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/05/22 00:54:17 UTC
[jira] [Assigned] (DRILL-3137) Joining tables with lots of duplicates and LIMIT does not do early termination

     [ https://issues.apache.org/jira/browse/DRILL-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Aman Sinha reassigned DRILL-3137:
---------------------------------

    Assignee: Aman Sinha  (was: Chris Westin)

> Joining tables with lots of duplicates and LIMIT does not do early termination
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-3137
>                 URL: https://issues.apache.org/jira/browse/DRILL-3137
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 0.9.0
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>
> Create a table with duplicate keys: 
> {code}
> create table dfs.tmp.lineitem_dup as select 100 as key1, 200 as key2 from cp.`tpch/lineitem.parquet`;
> +-----------+----------------------------+
> | Fragment  | Number of records written  |
> +-----------+----------------------------+
> | 0_0       | 60175                      |
> +-----------+----------------------------+
> {code}
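> Because every row carries the same key1 value, an equi-join of this table with itself on key1 matches each row with every row, so the full join result would hold 60175 x 60175 = 3,621,030,625 rows. A minimal sketch for checking this, assuming the table created above (the column aliases n and full_join_rows are illustrative):
> {code}
> -- n is the row count of the table; n * n is the size of the full self-join on key1
> select count(*) as n, count(*) * count(*) as full_join_rows from dfs.tmp.lineitem_dup;
> {code}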
> Now run a self-join with a LIMIT. The query should terminate early because of the LIMIT, but it runs for about 2 minutes (note that this plan contains no exchanges):
> {code}
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1  | key2  | key10  | key20  |
> +-------+-------+--------+--------+
> | 100   | 200   | 100    | 200    |
> +-------+-------+--------+--------+
> 1 row selected (111.764 seconds)
> {code}
> Disabling hash join does not help in this case:
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> +-------+-----------------------------------+
> |  ok   |              summary              |
> +-------+-----------------------------------+
> | true  | planner.enable_hashjoin updated.  |
> +-------+-----------------------------------+
> 1 row selected (0.094 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1  | key2  | key10  | key20  |
> +-------+-------+--------+--------+
> | 100   | 200   | 100    | 200    |
> +-------+-------+--------+--------+
> 1 row selected (198.874 seconds)
> {code}
> However, forcing exchanges into the plan by setting `planner.slice_target` to 1 helps, and the query terminates early:
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 1;
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1  | key2  | key10  | key20  |
> +-------+-------+--------+--------+
> | 100   | 200   | 100    | 200    |
> +-------+-------+--------+--------+
> 1 row selected (0.765 seconds)
> {code}
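> One way to confirm the role of exchanges would be to compare the plans for the two cases. The following is a minimal sketch, assuming the same session and table as above; the values used to restore the options (100000 for planner.slice_target, true for planner.enable_hashjoin) are assumed defaults and are not taken from this report:
> {code}
> -- Plan with exchanges forced (planner.slice_target is still 1 from the previous step)
> explain plan for select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> -- Restore the assumed default values, then inspect the plan without exchanges
> alter session set `planner.slice_target` = 100000;
> alter session set `planner.enable_hashjoin` = true;
> explain plan for select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> {code}
> If the two plans differ only in the presence of exchange operators, that would be consistent with the timings above.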



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)