Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/12/11 21:39:46 UTC
[jira] [Created] (DRILL-4188) Change the default value of
planner.enable_hash_single_key to false
Aman Sinha created DRILL-4188:
---------------------------------
Summary: Change the default value of planner.enable_hash_single_key to false
Key: DRILL-4188
URL: https://issues.apache.org/jira/browse/DRILL-4188
Project: Apache Drill
Issue Type: Bug
Components: Query Planning & Optimization
Affects Versions: 1.4.0
Reporter: Aman Sinha
Assignee: Aman Sinha
The planner.enable_hash_single_key flag is used by the HashJoin and MergeJoin plans to do hash distribution on both sides of a multi-column join (e.g., T1.a1 = T2.a2 AND T1.b1 = T2.b2). The default value of this parameter is true, which means that Drill generates multiple plans, each with hash distribution on only one of the join columns. The final plan is chosen based on cost.
However, because column statistics are not available, this approach is problematic: if all plans cost the same, we could end up picking the first column for hash distribution, and if that column has a low number of distinct values, the distribution could be substantially skewed.
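The skew concern can be illustrated with a small simulation. This is only a sketch and not Drill code: the fragment count, the row data, and the helper names (distribute, busy_fragments) are assumptions made for illustration, and Python's built-in hash stands in for whatever hash function the exchange operator actually uses.

```python
from collections import Counter

NUM_FRAGMENTS = 8  # hypothetical number of receiving fragments


def distribute(rows, key_columns, num_fragments=NUM_FRAGMENTS):
    """Count how many rows land on each fragment when hashing on key_columns."""
    counts = Counter({fragment: 0 for fragment in range(num_fragments)})
    for row in rows:
        key = tuple(row[column] for column in key_columns)
        counts[hash(key) % num_fragments] += 1
    return counts


def busy_fragments(counts):
    """Number of fragments that received at least one row."""
    return sum(1 for n in counts.values() if n > 0)


# Made-up data: a1 has only 2 distinct values, b1 has 1000.
rows = [{"a1": i % 2, "b1": i % 1000} for i in range(10_000)]

single = distribute(rows, ["a1"])        # hash on the low-cardinality column only
multi = distribute(rows, ["a1", "b1"])   # hash on all join columns

print("single-key: busy fragments =", busy_fragments(single),
      "largest fragment =", max(single.values()))
print("all keys:   busy fragments =", busy_fragments(multi),
      "largest fragment =", max(multi.values()))
```

With only two distinct values of a1, at most two fragments can receive rows under single-key distribution, no matter how many fragments are available. Hashing on both columns spreads the same rows over many more fragments.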
Hash distribution on all join columns should be the default, so I propose changing the default value of planner.enable_hash_single_key to false. Single-column hash distribution may still be desirable when the join is done after some other operation (e.g., a window function or grouped aggregation) whose child already does a hash distribution on one column that is part of the join. For those cases, however, we may want to enable this flag selectively.
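As a rough check of why single-column distribution remains a valid option in that scenario, the sketch below partitions both inputs on one join column only and confirms that the per-fragment joins together produce the same result as a global join on both columns. It is an illustration under assumed data, not Drill's implementation; the tables, fragment count, and helper names are hypothetical.

```python
from collections import defaultdict

NUM_FRAGMENTS = 4  # hypothetical number of fragments


def partition(rows, key_index, num_fragments=NUM_FRAGMENTS):
    """Hash-partition rows on a single column, identified by its tuple index."""
    parts = defaultdict(list)
    for row in rows:
        parts[hash(row[key_index]) % num_fragments].append(row)
    return parts


def join(left, right):
    """Nested-loop equi-join on both columns (a, b); rows are (a, b) tuples."""
    return sorted((l, r) for l in left for r in right if l == r)


# Made-up inputs; rows are (a, b) pairs.
t1 = [(a, b) for a in range(3) for b in range(4)]
t2 = [(a, b) for a in range(2, 5) for b in range(2, 6)]

# Reference result: join without any distribution.
global_result = join(t1, t2)

# Distribute both sides on column a only, then join fragment by fragment.
t1_parts = partition(t1, 0)
t2_parts = partition(t2, 0)
local_result = sorted(
    pair
    for fragment in range(NUM_FRAGMENTS)
    for pair in join(t1_parts[fragment], t2_parts[fragment])
)

print(local_result == global_result)
```

Rows that match on both a and b necessarily match on a, so they always land on the same fragment. Reusing an existing distribution on one join column is therefore correct, and the remaining question is only how evenly that one column spreads the rows.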
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)