Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/05/18 23:33:59 UTC
[jira] [Created] (DRILL-3137) Joining tables with lots of duplicates and LIMIT does not do early termination
Aman Sinha created DRILL-3137:
---------------------------------
Summary: Joining tables with lots of duplicates and LIMIT does not do early termination
Key: DRILL-3137
URL: https://issues.apache.org/jira/browse/DRILL-3137
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 0.9.0
Reporter: Aman Sinha
Assignee: Chris Westin
Create a table with duplicate keys:
{code}
create table dfs.tmp.lineitem_dup as select 100 as key1, 200 as key2 from cp.`tpch/lineitem.parquet`;
+-----------+----------------------------+
| Fragment | Number of records written |
+-----------+----------------------------+
| 0_0 | 60175 |
+-----------+----------------------------+
{code}
Now do a self-join with a LIMIT. Because of the LIMIT, this query should terminate early, but it runs for about 2 minutes (note that there are no exchanges in this plan):
{code}
select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1 | key2 | key10 | key20 |
+-------+-------+--------+--------+
| 100 | 200 | 100 | 200 |
+-------+-------+--------+--------+
1 row selected (111.764 seconds)
{code}
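Whether a plan contains exchanges can be checked by asking for the plan instead of running the query. A minimal sketch that reuses the query above; the plan output is omitted because it was not captured in this report:
{code}
-- Show the plan only; the join itself is not executed.
-- Every row has key1 = 100, so a full join would produce
-- 60175 * 60175 = 3,621,030,625 rows before the LIMIT applies.
explain plan for
select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
{code}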
Disabling hash join does not help in this case:
{code}
0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
+-------+-----------------------------------+
| ok | summary |
+-------+-----------------------------------+
| true | planner.enable_hashjoin updated. |
+-------+-----------------------------------+
1 row selected (0.094 seconds)
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1 | key2 | key10 | key20 |
+-------+-------+--------+--------+
| 100 | 200 | 100 | 200 |
+-------+-------+--------+--------+
1 row selected (198.874 seconds)
{code}
However, forcing exchanges in the plan helps and the query terminates early:
{code}
0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 1;
0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
+-------+-------+--------+--------+
| key1 | key2 | key10 | key20 |
+-------+-------+--------+--------+
| 100 | 200 | 100 | 200 |
+-------+-------+--------+--------+
1 row selected (0.765 seconds)
{code}
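Both options above are set at session level, so they stay in effect for later queries in the same session. A sketch of restoring them before rerunning the comparison; the value 100000 for `planner.slice_target` is assumed to be the default and should be verified against the running version:
{code}
-- Re-enable hash join (it is enabled by default).
alter session set `planner.enable_hashjoin` = true;
-- Assumption: 100000 is the default slice target; check sys.options to confirm.
alter session set `planner.slice_target` = 100000;
{code}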