You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Yiyun Yin <yy...@gmail.com> on 2022/08/14 15:31:00 UTC

Calcite Join Optimizer Rule

Dear,
    Hello, I am Yiyun Yin. I am trying to develop a data federation query
service using calcite, and I have a question I would like to ask.
<https://stackoverflow.com/posts/73352781/timeline>

    When I use join query with calcite, how can I optimize the query plan
so that it can query the small table first, and then use the result of the
small table as the filter condition of the large table?

    I am looking forward for your answers. Thank you.


Best Regards,
Yours sincerely,
Yiyun Yin

Re: Calcite Join Optimizer Rule

Posted by Julian Hyde <jh...@gmail.com>.
Deciding join order is a basic function of cost-based query optimizers. It occurs not just with ‘dynamic filters’ or external sources, but with pretty much every join query. Consider this query:

  SELECT *
  FROM Emp AS e
    JOIN Dept AS d ON e.deptno = d.deptno
  WHERE e.job = ‘CLERK’

Emp is a smaller table than Dept, and it is made even smaller when we apply the filter condition on job. A cost-based optimizer will know the sizes of the tables, and estimate the selectivity of the ‘job = ‘CLERK’’ condition, and therefore estimate the size of the filtered Emp table.

Based on those size estimates the optimizer can choose the correct join algorithm, and also what should be the inputs to the algorithm. In this case, a hash join would make sense, with Emp on the ‘build’ side (it is read first, and goes into a hash table) and Dept on the ‘probe’ side (read second, trying to pair each row with a row in the hash table).

Julian


> On Aug 16, 2022, at 7:53 AM, Jess Balint <jb...@gmail.com> wrote:
> 
> Hi Yiyun,
> 
> The first requirement is that you will need to make some table cardinality
> estimates available to the query planner (eg. via
> RelMetadataQuery.getRowCount()). This will allow you to implement some
> threshold for deciding when to use the described join algorithm. The
> algorithm is known as "bind join". Some details can be found here:
> https://www.vldb.org/conf/1997/P276.PDF
> 
> Hope this helps.
> Jess
> 
> On Sun, Aug 14, 2022 at 11:00 PM Yiyun Yin <yy...@gmail.com> wrote:
> 
>> Dear,
>>    Hello, I am Yiyun Yin. I am trying to develop a data federation query
>> service using calcite, and I have a question I would like to ask.
>> <https://stackoverflow.com/posts/73352781/timeline>
>> 
>>    When I use join query with calcite, how can I optimize the query plan
>> so that it can query the small table first, and then use the result of the
>> small table as the filter condition of the large table?
>> 
>>    I am looking forward for your answers. Thank you.
>> 
>> 
>> Best Regards,
>> Yours sincerely,
>> Yiyun Yin
>> 


Re: Calcite Join Optimizer Rule

Posted by Jess Balint <jb...@gmail.com>.
Hi Yiyun,

The first requirement is that you will need to make some table cardinality
estimates available to the query planner (eg. via
RelMetadataQuery.getRowCount()). This will allow you to implement some
threshold for deciding when to use the described join algorithm. The
algorithm is known as "bind join". Some details can be found here:
https://www.vldb.org/conf/1997/P276.PDF

Hope this helps.
Jess

On Sun, Aug 14, 2022 at 11:00 PM Yiyun Yin <yy...@gmail.com> wrote:

> Dear,
>     Hello, I am Yiyun Yin. I am trying to develop a data federation query
> service using calcite, and I have a question I would like to ask.
> <https://stackoverflow.com/posts/73352781/timeline>
>
>     When I use join query with calcite, how can I optimize the query plan
> so that it can query the small table first, and then use the result of the
> small table as the filter condition of the large table?
>
>     I am looking forward for your answers. Thank you.
>
>
> Best Regards,
> Yours sincerely,
> Yiyun Yin
>

Re: Calcite Join Optimizer Rule

Posted by Yiyun Yin <yy...@gmail.com>.
Hi,

Many thanks for your prompt reply.Your reply is very helpful to me.

Best Regards,
Yours sincerely,
Yiyun Yin

On Mon, Aug 15, 2022 at 5:45 PM Benchao Li <li...@apache.org> wrote:

> Hi Yiyun,
>
> IIUC, you are talking about "Runtime filter", right?
> I don't know any existing rule which is doing this for now. And I think
> this is more like a
> physical implementation, rather than a pure optimization during logical
> planning.
>
> FYI, Apache Flink has a FLIP[1] that talks about this.
>
> [1]
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning
>
> Yiyun Yin <yy...@gmail.com> 于2022年8月15日周一 12:01写道:
>
> > Dear,
> >     Hello, I am Yiyun Yin. I am trying to develop a data federation query
> > service using calcite, and I have a question I would like to ask.
> > <https://stackoverflow.com/posts/73352781/timeline>
> >
> >     When I use join query with calcite, how can I optimize the query plan
> > so that it can query the small table first, and then use the result of
> the
> > small table as the filter condition of the large table?
> >
> >     I am looking forward for your answers. Thank you.
> >
> >
> > Best Regards,
> > Yours sincerely,
> > Yiyun Yin
> >
>
>
> --
>
> Best,
> Benchao Li
>

Re: Calcite Join Optimizer Rule

Posted by Benchao Li <li...@apache.org>.
Hi Yiyun,

IIUC, you are talking about "Runtime filter", right?
I don't know any existing rule which is doing this for now. And I think
this is more like a
physical implementation, rather than a pure optimization during logical
planning.

FYI, Apache Flink has a FLIP[1] that talks about this.

[1]
https://cwiki.apache.org/confluence/display/FLINK/FLIP-248%3A+Introduce+dynamic+partition+pruning

Yiyun Yin <yy...@gmail.com> 于2022年8月15日周一 12:01写道:

> Dear,
>     Hello, I am Yiyun Yin. I am trying to develop a data federation query
> service using calcite, and I have a question I would like to ask.
> <https://stackoverflow.com/posts/73352781/timeline>
>
>     When I use join query with calcite, how can I optimize the query plan
> so that it can query the small table first, and then use the result of the
> small table as the filter condition of the large table?
>
>     I am looking forward for your answers. Thank you.
>
>
> Best Regards,
> Yours sincerely,
> Yiyun Yin
>


-- 

Best,
Benchao Li