You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Aman Sinha (JIRA)" <ji...@apache.org> on 2014/06/12 03:36:02 UTC

[jira] [Commented] (DRILL-485) Support non-equijoins as long as there is at least 1 equijoin condition between the same 2 tables

    [ https://issues.apache.org/jira/browse/DRILL-485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14028700#comment-14028700 ] 

Aman Sinha commented on DRILL-485:
----------------------------------

As mentioned in my previous comment, this has been fixed.  I tested it again on latest build (commit level 27a9c98) with the following queries .. both Explain plans and correctness of results w.r.t Postgres.   

Query 1: (Note the Filter above the HashJoin.  The Filter has the inequality '<' condition)

0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
01-02          SelectionVectorRemover
01-03            Filter(condition=[<($1, $3)])
01-04              HashJoin(condition=[=($0, $2)], joinType=[inner])
01-06                Project(o_custkey=[$1], o_orderkey=[$0])
01-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`]]]])
01-05                BroadcastExchange
02-01                  Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
02-02                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath [`c_nationkey`], SchemaPath [`c_name`]]]])  

0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and o.o_orderkey < c.c_nationkey;
+------------+
|   EXPR$0   |
+------------+
| Customer#000000370 |
| Customer#000000781 |
| Customer#000001369 |
| Customer#000000445 |
| Customer#000000557 |
| Customer#000000392 |
+------------+
6 rows selected

Query 2:  (Note the Filter above the HashJoin.  The Filter has the OR condition). 

0: jdbc:drill:zk=local> explain plan for select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey);
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      UnionExchange
01-01        Project(EXPR$0=[CAST($4):VARCHAR(20) CHARACTER SET "ISO-8859-1" COLLATE "ISO-8859-1$en_US$primary"])
01-02          SelectionVectorRemover
01-03            Filter(condition=[OR(=($1, $3), =($0, $3))])
01-04              HashJoin(condition=[=($0, $2)], joinType=[inner])
01-06                Project(o_custkey=[$1], o_orderkey=[$0])
01-07                  Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/orders]], selectionRoot=/Users/asinha/data/tpchmulti/orders, columns=[SchemaPath [`o_custkey`], SchemaPath [`o_orderkey`]]]])
01-05                BroadcastExchange
02-01                  Project(c_custkey=[$2], c_nationkey=[$1], c_name=[$0])
02-02                    Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/Users/asinha/data/tpchmulti/customer]], selectionRoot=/Users/asinha/data/tpchmulti/customer, columns=[SchemaPath [`c_custkey`], SchemaPath [`c_nationkey`], SchemaPath [`c_name`]]]])

0: jdbc:drill:zk=local> select cast(c.c_name as varchar(20)) from orders o, customer c where o.o_custkey = c.c_custkey and (o.o_orderkey = c.c_nationkey or o.o_custkey = c.c_nationkey);
+------------+
|   EXPR$0   |
+------------+
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
| Customer#000000004 |
+------------+
31 rows selected



> Support non-equijoins as long as there is at least 1 equijoin condition between the same 2 tables
> -------------------------------------------------------------------------------------------------
>
>                 Key: DRILL-485
>                 URL: https://issues.apache.org/jira/browse/DRILL-485
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>            Reporter: Aman Sinha
>            Assignee: Aman Sinha
>             Fix For: 1.0.0-BETA1
>
>
> Currently, if the query has a non-equijoin between tables t1, t2 we throw an error that it is unsupported.  However, if there is at least one equijoin between the tables t1, t2 in addition to the non-equijoin, then we should perform the join and then do a filter on top using the non-equijoin condition.  
> example queries where this could be applicable: 
>   SELECT a1 FROM t1, t2 WHERE b1 = b2 AND c1 < c2;
>   SELECT a1 FROM t1, t2 WHERE b1 = b2 AND (c1 = c2 OR d1 = d2);
>   



--
This message was sent by Atlassian JIRA
(v6.2#6252)