You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@drill.apache.org by "Victoria Markman (JIRA)" <ji...@apache.org> on 2015/03/13 23:36:38 UTC

[jira] [Updated] (DRILL-2454) Support for local join filter in outer joins

     [ https://issues.apache.org/jira/browse/DRILL-2454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Victoria Markman updated DRILL-2454:
------------------------------------
    Description: 
Currently in the case of outer join, we can't have local join filter on the preserving side of the outer join. "Can not plan" exception will be thrown.

Below you will find examples of queries that are going to fail:

-- Join filter on the left side of the left outer join 
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 1);
{code}

-- Join filter on the right side of the right outer join
{code:sql}
select
        count(*)
from
        cp.`tpch/lineitem.parquet` l
                right outer join
        cp.`tpch/orders.parquet` o
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
where
        o.o_orderkey = 1;
{code}

Full outer join with the join filter on the left side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

Full outer join with the join filter on the right side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and l.l_orderkey >= 1)
;
{code}

The only case that will currently work is an inner join:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                inner join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

As a reminder, queries with filter in the ON clause of the join condition are semantically different from those with the same condition in the WHERE clause.

This query will return an order and its lines, if any, for order number =100:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey)
where   o.o_orderkey = 100
;
{code}

This query will return ALL orders, but only order 100 will have any lines associated with it:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 100)
;
{code}


  was:
Currently in the case of outer join, we can't have local join filter. "Can not plan" exception will be thrown.

Below you will find examples of queries that are going to fail:

-- Join filter on the left side of the left outer join 
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 1);
{code}

-- Join filter on the right side of the right outer join
{code:sql}
select
        count(*)
from
        cp.`tpch/lineitem.parquet` l
                right outer join
        cp.`tpch/orders.parquet` o
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
where
        o.o_orderkey = 1;
{code}

Full outer join with the join filter on the left side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

Full outer join with the join filter on the right side:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                full outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and l.l_orderkey >= 1)
;
{code}

The only case that will currently work is an inner join:
{code:sql}
select
        count(*)
from
        cp.`tpch/orders.parquet` o
                inner join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
;
{code}

As a reminder, queries with filter in the ON clause of the join condition are semantically different from those with the same condition in the WHERE clause.

This query will return an order and its lines, if any, for order number =100:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l

on      (o.o_orderkey = l.l_orderkey)
where   o.o_orderkey = 100
;
{code}

This query will return ALL orders, but only order 100 will have any lines associated with it:
{code:sql}
select  *
from
        cp.`tpch/orders.parquet` o
                left outer join
        cp.`tpch/lineitem.parquet` l
on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 100)
;
{code}



> Support for local join filter in outer joins
> --------------------------------------------
>
>                 Key: DRILL-2454
>                 URL: https://issues.apache.org/jira/browse/DRILL-2454
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Relational Operators
>            Reporter: Victoria Markman
>            Assignee: Chris Westin
>
> Currently in the case of outer join, we can't have local join filter on the preserving side of the outer join. "Can not plan" exception will be thrown.
> Below you will find examples of queries that are going to fail:
> -- Join filter on the left side of the left outer join 
> {code:sql}
> select
>         count(*)
> from
>         cp.`tpch/orders.parquet` o
>                 left outer join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 1);
> {code}
> -- Join filter on the right side of the right outer join
> {code:sql}
> select
>         count(*)
> from
>         cp.`tpch/lineitem.parquet` l
>                 right outer join
>         cp.`tpch/orders.parquet` o
> on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
> where
>         o.o_orderkey = 1;
> {code}
> Full outer join with the join filter on the left side:
> {code:sql}
> select
>         count(*)
> from
>         cp.`tpch/orders.parquet` o
>                 full outer join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
> ;
> {code}
> Full outer join with the join filter on the right side:
> {code:sql}
> select
>         count(*)
> from
>         cp.`tpch/orders.parquet` o
>                 full outer join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey and l.l_orderkey >= 1)
> ;
> {code}
> The only case that will currently work is an inner join:
> {code:sql}
> select
>         count(*)
> from
>         cp.`tpch/orders.parquet` o
>                 inner join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey and o.o_orderkey >= 1)
> ;
> {code}
> As a reminder, queries with filter in the ON clause of the join condition are semantically different from those with the same condition in the WHERE clause.
> This query will return an order and its lines, if any, for order number =100:
> {code:sql}
> select  *
> from
>         cp.`tpch/orders.parquet` o
>                 left outer join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey)
> where   o.o_orderkey = 100
> ;
> {code}
> This query will return ALL orders, but only order 100 will have any lines associated with it:
> {code:sql}
> select  *
> from
>         cp.`tpch/orders.parquet` o
>                 left outer join
>         cp.`tpch/lineitem.parquet` l
> on      (o.o_orderkey = l.l_orderkey and o.o_orderkey = 100)
> ;
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)