You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@tajo.apache.org by "Jihoon Son (JIRA)" <ji...@apache.org> on 2015/05/11 11:48:59 UTC

[jira] [Created] (TAJO-1597) Problem of ignoring theta join condition

Jihoon Son created TAJO-1597:
--------------------------------

             Summary: Problem of ignoring theta join condition
                 Key: TAJO-1597
                 URL: https://issues.apache.org/jira/browse/TAJO-1597
             Project: Tajo
          Issue Type: Bug
          Components: planner/optimizer
            Reporter: Jihoon Son
             Fix For: 0.11.0


Tajo currently does not support theta join, non-equi theta join conditions must be evaluated as a filter after join execution. However, when non-equi theta join conditions are included at on clauses, those conditions are disappeared after filter push down optimization.

This is because we assume that filters are pushed down from top during FPD phase, but theta join conditions from the on clause are not.

For example, let me suppose a query that contains a projection after a join as follows.
{noformat}
projection
       |
     join (contains a theta join condition)
    /     \
scan scan
{noformat}
During FPD optimization, the theta join condition is come up to the projection node (FilterPushDownRule.visitProjection()). This condition is converted based on information of transformedMap, which is created by findCanPushdownAndTransform() before visiting the join node. Obviously, any information for the theta join condition are not contained in transformedMap, it is ignored.

You can reproduce this bug as follows.
{noformat}
default> select n_nationkey, n_name, n_regionkey, t.cnt from nation n join (   select r_regionkey, count(*) as cnt   from nation n   join region r on (n.n_regionkey = r.r_regionkey)   group by r_regionkey ) t  on  (n.n_regionkey = t.r_regionkey) and n.n_nationkey > t.cnt  order by n_nationkey;
...
-----------------------------
Query Block Graph
-----------------------------
|-#ROOT
   |-#QB_0
-----------------------------
Optimization Log:
[LogicalPlan]
	> ProjectionNode is eliminated.
	> ProjectionNode is eliminated.
[#QB_0]
	> Non-optimized join order: (default.nation ⋈θ default.region) (cost: 86513.6)
	> Optimized join order    : (default.nation ⋈θ default.region) (cost: 86513.6)
[#ROOT]
	> Non-optimized join order: (default.nation ⋈θ default.t) (cost: 1.924062464E7)
	> Optimized join order    : (default.nation ⋈θ default.t) (cost: 1.924062464E7)
-----------------------------

SORT(8)
  => Sort Keys: default.n.n_nationkey (INT4) (asc)
   JOIN(12)(INNER)
     => Join Cond: default.n.n_regionkey (INT4) = default.t.r_regionkey (INT4)
     => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)
     => out schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8)}
     => in schema: {(5) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.t.cnt (INT8), default.t.r_regionkey (INT4)}
      TABLE_SUBQUERY(6) as default.t
        => Targets: default.t.cnt (INT8), default.t.r_regionkey (INT4)
        => out schema: {(2) default.t.cnt (INT8), default.t.r_regionkey (INT4)}
        => in  schema: {(2) default.t.r_regionkey (INT4), default.t.cnt (INT8)}
         GROUP_BY(4)(r_regionkey)
           => exprs: (count())
           => target list: default.r.r_regionkey (INT4), cnt (INT8)
           => out schema:{(2) default.r.r_regionkey (INT4), cnt (INT8)}
           => in schema:{(4) default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
            JOIN(11)(INNER)
              => Join Cond: default.n.n_regionkey (INT4) = default.r.r_regionkey (INT4)
              => target list: default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)
              => out schema: {(4) default.r.r_regionkey (INT4), default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
              => in schema: {(4) default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.r.r_regionkey (INT4)}
               SCAN(2) on default.region as r
                 => target list: default.r.r_regionkey (INT4)
                 => out schema: {(1) default.r.r_regionkey (INT4)}
                 => in schema: {(3) default.r.r_regionkey (INT4), default.r.r_name (TEXT), default.r.r_comment (TEXT)}
               SCAN(1) on default.nation as n
                 => target list: default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)
                 => out schema: {(3) default.n.n_regionkey (INT4), default.n.n_nationkey (INT4), default.n.n_name (TEXT)}
                 => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
      SCAN(0) on default.nation as n
        => target list: default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4)
        => out schema: {(3) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4)}
        => in schema: {(4) default.n.n_nationkey (INT4), default.n.n_name (TEXT), default.n.n_regionkey (INT4), default.n.n_comment (TEXT)}
...
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)