Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/05/18 23:33:59 UTC

[jira] [Created] (DRILL-3137) Joining tables with lots of duplicates and LIMIT does not do early termination

Aman Sinha created DRILL-3137:
---------------------------------

             Summary: Joining tables with lots of duplicates and LIMIT does not do early termination
                 Key: DRILL-3137
                 URL: https://issues.apache.org/jira/browse/DRILL-3137
             Project: Apache Drill
          Issue Type: Bug
          Components: Execution - Relational Operators
    Affects Versions: 0.9.0
            Reporter: Aman Sinha
            Assignee: Chris Westin


Create a table with duplicate keys: 
{code}
create table dfs.tmp.lineitem_dup as select 100 as key1, 200 as key2 from cp.`tpch/lineitem.parquet`;
+-----------+----------------------------+
| Fragment  | Number of records written  |
+-----------+----------------------------+
| 0_0       | 60175                      |
+-----------+----------------------------+
{code}

Now do a self-join with a LIMIT. The query should terminate early because of the LIMIT, but it runs for about 2 minutes (note that this plan contains no exchanges):
{code}
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (111.764 seconds)
{code}

Disabling hash join does not help in this case:
{code}
0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
+-------+-----------------------------------+
|  ok   |              summary              |
+-------+-----------------------------------+
| true  | planner.enable_hashjoin updated.  |
+-------+-----------------------------------+
1 row selected (0.094 seconds)
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (198.874 seconds)
{code}

However, forcing exchanges in the plan helps and the query terminates early:
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 1;

0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1  | key2  | key10  | key20  |
+-------+-------+--------+--------+
| 100   | 200   | 100    | 200    |
+-------+-------+--------+--------+
1 row selected (0.765 seconds)
{code}
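
Since the difference between the runs above appears to come down to whether the plan contains exchanges, the plan under each setting can be inspected with EXPLAIN PLAN FOR. A minimal sketch, assuming the same session and table as above; the plan output was not captured for this report, so none is shown:
{code}
explain plan for select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
{code}
Running the same statement before changing `planner.slice_target` should show the plan without exchanges for comparison.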
