You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Thomas D'Silva <td...@twilio.com.INVALID> on 2019/09/12 21:54:59 UTC

Question about FilterIntoJoinRule

We are using calcite to query data in Kudu. We have implemented our own
sort rule that merges rows returned from kudu when a queries orders by
columns in the same order as the primary key of the kudu table.
For a query that does a left join, in order to get the planner to use my
sort rule I had to disable the FILTER_ON_JOIN rule and use
the DUMB_FILTER_ON_JOIN (which does not try to simplify outer joins). I
also had to modify the following code in FilterJoinRule.perform()

   // Try to push down above filters. These are typically where clause
    // filters. They can be pushed down if they are not on the NULL
    // generating side.
    boolean filterPushed = false;
    if (RelOptUtil.classifyFilters(
        join,
        aboveFilters,
        joinType,
        needsPushInto(join),
        !joinType.generatesNullsOnLeft(),
        !joinType.generatesNullsOnRight(),
        joinFilters,
        leftFilters,
        rightFilters)) {
      filterPushed = true;
    }

The above code sets the pushLeft and pushRight boolean flags only if the
join type does not generate nulls on the left or right side. I was
wondering if any one knows why this is required?
If a filter only references column present in the left or right table
shouldn't we always be able to push it down?

Re: Question about FilterIntoJoinRule

Posted by Thomas D'Silva <td...@twilio.com.INVALID>.
Stamatis,

Thank you for your explanation, this makes sense now.

On Fri, Sep 13, 2019 at 2:58 PM Stamatis Zampetakis <za...@gmail.com>
wrote:

> Hi Thomas,
>
> It is not safe to push the filter in the null generating side of a join
> since
> it changes the semantics of the query. Consider the example below.
>
> SELECT b.title
> FROM Book b
> LEFT JOIN Author a ON b.author = a.id
> WHERE a.name <> 'Victor'
>
> Author
> --------------
> id | name  |
> 1  | Victor  |
>
> Book
> -----------------------------------
> id |           title          |author
> 1  | Les Miserables | 1
>
> If you apply the a.name <> 'Victor' condition before the join
> (over the author table) then you will get the following result.
>
> 'Les Miserables'
>
> If you apply the condition after the join you will get the empty result.
>
> Best,
> Stamatis
>
> On Thu, Sep 12, 2019 at 11:55 PM Thomas D'Silva <tdsilva@twilio.com.invalid
> >
> wrote:
>
> > We are using calcite to query data in Kudu. We have implemented our own
> > sort rule that merges rows returned from kudu when a queries orders by
> > columns in the same order as the primary key of the kudu table.
> > For a query that does a left join, in order to get the planner to use my
> > sort rule I had to disable the FILTER_ON_JOIN rule and use
> > the DUMB_FILTER_ON_JOIN (which does not try to simplify outer joins). I
> > also had to modify the following code in FilterJoinRule.perform()
> >
> >    // Try to push down above filters. These are typically where clause
> >     // filters. They can be pushed down if they are not on the NULL
> >     // generating side.
> >     boolean filterPushed = false;
> >     if (RelOptUtil.classifyFilters(
> >         join,
> >         aboveFilters,
> >         joinType,
> >         needsPushInto(join),
> >         !joinType.generatesNullsOnLeft(),
> >         !joinType.generatesNullsOnRight(),
> >         joinFilters,
> >         leftFilters,
> >         rightFilters)) {
> >       filterPushed = true;
> >     }
> >
> > The above code sets the pushLeft and pushRight boolean flags only if the
> > join type does not generate nulls on the left or right side. I was
> > wondering if any one knows why this is required?
> > If a filter only references column present in the left or right table
> > shouldn't we always be able to push it down?
> >
>

Re: Question about FilterIntoJoinRule

Posted by Stamatis Zampetakis <za...@gmail.com>.
Hi Thomas,

It is not safe to push the filter in the null generating side of a join
since
it changes the semantics of the query. Consider the example below.

SELECT b.title
FROM Book b
LEFT JOIN Author a ON b.author = a.id
WHERE a.name <> 'Victor'

Author
--------------
id | name  |
1  | Victor  |

Book
-----------------------------------
id |           title          |author
1  | Les Miserables | 1

If you apply the a.name <> 'Victor' condition before the join
(over the author table) then you will get the following result.

'Les Miserables'

If you apply the condition after the join you will get the empty result.

Best,
Stamatis

On Thu, Sep 12, 2019 at 11:55 PM Thomas D'Silva <td...@twilio.com.invalid>
wrote:

> We are using calcite to query data in Kudu. We have implemented our own
> sort rule that merges rows returned from kudu when a queries orders by
> columns in the same order as the primary key of the kudu table.
> For a query that does a left join, in order to get the planner to use my
> sort rule I had to disable the FILTER_ON_JOIN rule and use
> the DUMB_FILTER_ON_JOIN (which does not try to simplify outer joins). I
> also had to modify the following code in FilterJoinRule.perform()
>
>    // Try to push down above filters. These are typically where clause
>     // filters. They can be pushed down if they are not on the NULL
>     // generating side.
>     boolean filterPushed = false;
>     if (RelOptUtil.classifyFilters(
>         join,
>         aboveFilters,
>         joinType,
>         needsPushInto(join),
>         !joinType.generatesNullsOnLeft(),
>         !joinType.generatesNullsOnRight(),
>         joinFilters,
>         leftFilters,
>         rightFilters)) {
>       filterPushed = true;
>     }
>
> The above code sets the pushLeft and pushRight boolean flags only if the
> join type does not generate nulls on the left or right side. I was
> wondering if any one knows why this is required?
> If a filter only references column present in the left or right table
> shouldn't we always be able to push it down?
>