Posted to dev@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2015/09/18 19:58:04 UTC

[jira] [Created] (DRILL-3803) Support inequality filter evaluation as part of join operators

Aman Sinha created DRILL-3803:
---------------------------------

             Summary: Support inequality filter evaluation as part of join operators
                 Key: DRILL-3803
                 URL: https://issues.apache.org/jira/browse/DRILL-3803
             Project: Apache Drill
          Issue Type: Improvement
          Components: Execution - Relational Operators
            Reporter: Aman Sinha
            Assignee: Aman Sinha


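Currently Drill evaluates an inequality condition in a separate Filter above the join, after the join itself has applied the equality condition. See the plan below, where Filter (00-03) sits on top of HashJoin (00-04):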
{code}
0: jdbc:drill:zk=local> explain plan for select n1.n_name from cp.`tpch/nation.parquet` n1 inner join cp.`tpch/region.parquet` n2 on n1.n_nationkey = n2.n_nationkey and n1.n_regionkey < n2.n_regionkey;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(n_name=[$2])
00-02        SelectionVectorRemover
00-03          Filter(condition=[<($1, $4)])
00-04            HashJoin(condition=[=($0, $3)], joinType=[inner])
00-06              Project(n_nationkey=[$2], n_regionkey=[$0], n_name=[$1])
00-08                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/nation.parquet]], selectionRoot=classpath:/tpch/nation.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`, `n_name`]]])
00-05              Project(n_nationkey0=[$0], n_regionkey0=[$1])
00-07                Project(n_nationkey=[$1], n_regionkey=[$0])
00-09                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=classpath:/tpch/region.parquet]], selectionRoot=classpath:/tpch/region.parquet, numFiles=1, columns=[`n_nationkey`, `n_regionkey`]]])
{code}

Suppose the inequality filter is highly selective but the equality join alone produces a large output. Every row the filter later discards is still produced by the join first, so it would be substantially better to push this filter into the join and evaluate both the equality and inequality conditions as part of the join.
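
To illustrate the difference, here is a minimal Python sketch of a hash join with the inequality checked either after the join or during the probe. It is not Drill's implementation; the function names, the row layout (dicts with `k` as the equality key and `r` as the inequality column) and the sample data are all made up for illustration. Each function also returns how many joined rows it had to materialize.

```python
from collections import defaultdict

def build_hash_table(build_rows, key):
    """Group build-side rows by their equality key."""
    table = defaultdict(list)
    for row in build_rows:
        table[row[key]].append(row)
    return table

def join_then_filter(build_rows, probe_rows, key, residual):
    """Current plan shape: the join emits every key match, a Filter runs afterwards."""
    table = build_hash_table(build_rows, key)
    joined = [(p, b) for p in probe_rows for b in table.get(p[key], [])]
    kept = [(p, b) for (p, b) in joined if residual(p, b)]
    return kept, len(joined)

def join_with_residual(build_rows, probe_rows, key, residual):
    """Proposed shape: the inequality is checked while probing, before a row is emitted."""
    table = build_hash_table(build_rows, key)
    kept = []
    for p in probe_rows:
        for b in table.get(p[key], []):
            if residual(p, b):
                kept.append((p, b))
    return kept, len(kept)

# Hypothetical data: every row shares the same equality key, so the
# equi-join alone yields 2 x 3 = 6 rows, but only one passes p.r < b.r.
probe = [{"k": 1, "r": 4}, {"k": 1, "r": 9}]
build = [{"k": 1, "r": 0}, {"k": 1, "r": 1}, {"k": 1, "r": 5}]
less_than = lambda p, b: p["r"] < b["r"]

rows_after, materialized_after = join_then_filter(build, probe, "k", less_than)
rows_inside, materialized_inside = join_with_residual(build, probe, "k", less_than)
```

Both variants return the same rows. The only difference is how many joined rows exist before the inequality is applied, which is the cost this enhancement aims to avoid.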

This is an enhancement. We may decide at a later time to split this into two JIRAs: one for HashJoin and one for MergeJoin.



