You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/05/22 00:54:17 UTC
[jira] [Assigned] (DRILL-3137) Joining tables with lots of
duplicates and LIMIT does not do early termination
[ https://issues.apache.org/jira/browse/DRILL-3137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Aman Sinha reassigned DRILL-3137:
---------------------------------
Assignee: Aman Sinha (was: Chris Westin)
> Joining tables with lots of duplicates and LIMIT does not do early termination
> ------------------------------------------------------------------------------
>
> Key: DRILL-3137
> URL: https://issues.apache.org/jira/browse/DRILL-3137
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 0.9.0
> Reporter: Aman Sinha
> Assignee: Aman Sinha
>
> Create a table with duplicate keys:
> {code}
> create table dfs.tmp.lineitem_dup as select 100 as key1, 200 as key2 from cp.`tpch/lineitem.parquet`;
> +-----------+----------------------------+
> | Fragment | Number of records written |
> +-----------+----------------------------+
> | 0_0 | 60175 |
> +-----------+----------------------------+
> {code}
> Now do a self-join with a LIMIT. This query should do an early termination because of the LIMIT but it runs for about 2 minutes: (note in this plan there are no exchanges):
> {code}
> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1 | key2 | key10 | key20 |
> +-------+-------+--------+--------+
> | 100 | 200 | 100 | 200 |
> +-------+-------+--------+--------+
> 1 row selected (111.764 seconds)
> {code}
> Disabling hash join does not help in this case.
> 0: jdbc:drill:zk=local> alter session set `planner.enable_hashjoin` = false;
> +-------+-----------------------------------+
> | ok | summary |
> +-------+-----------------------------------+
> | true | planner.enable_hashjoin updated. |
> +-------+-----------------------------------+
> 1 row selected (0.094 seconds)
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1 | key2 | key10 | key20 |
> +-------+-------+--------+--------+
> | 100 | 200 | 100 | 200 |
> +-------+-------+--------+--------+
> 1 row selected (198.874 seconds)
> {code}
> However, forcing exchanges in the plan helps and the query terminates early:
> {code}
> 0: jdbc:drill:zk=local> alter session set `planner.slice_target` = 1;
> 0: jdbc:drill:zk=local> select * from dfs.tmp.lineitem_dup t1, dfs.tmp.lineitem_dup t2 where t1.key1 = t2.key1 limit 1;
> +-------+-------+--------+--------+
> | key1 | key2 | key10 | key20 |
> +-------+-------+--------+--------+
> | 100 | 200 | 100 | 200 |
> +-------+-------+--------+--------+
> 1 row selected (0.765 seconds)
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)