You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@calcite.apache.org by Scott Reynolds <sd...@gmail.com> on 2019/09/13 23:39:33 UTC

Query Compilation happening more often then expected

Hi,

Spent a bunch of time researching and staring at code today to understand
the code compilation path within Calcite. I started down this path because
we noticed whenever we changed the `startDate` or `endDate` for the query
it went through compilation process again. We expected it to use the
previous classes `bind` it with the new RexLiterals. I was *hoping*  the
RexLiterals were passed into the `bind()` method but that does not appear
to be the main goal of `DataContext` objects.

We also found the trick Kylin did to improve their query compilation with
prepared statements:
https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement is
stateful and I don't believe a good way to solve this issue.

I would like to propose a change to Calcite so that Filters are passed into
the `bind()` call alongside or within DataContext. This would allow the
`EnumerableRel` implementations to reference the `Filters` as arguments.
This -- I believe -- would cause any change to the filters to use
the previously compiled class instead of generating a brand new one.

I am emailing everyone on this list for two reasons:
1. Is this a bad idea ?
2. I don't have a design yet so would love any ideas. Should we stick more
stuff into `DataContext`? Should `EnumerableRel` have another method that
is used to gather these RexLiterals?

Re: Query Compilation happening more often then expected

Posted by Julian Hyde <jh...@apache.org>.
See my comments in https://issues.apache.org/jira/browse/CALCITE-963 <https://issues.apache.org/jira/browse/CALCITE-963>.

> On Sep 18, 2019, at 1:58 PM, Scott Reynolds <sd...@gmail.com> wrote:
> 
> Hi all,
> 
> I have submitted my pull request in github and wrote up a design document
> as a comment on CALCITE-963
> <https://issues.apache.org/jira/browse/CALCITE-963?focusedCommentId=16932829&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16932829>
> 
> My implementation deviates considerably from what is being discussed in
> this thread so I rushed it up so you all can let me know what I am missing.
> Here is a excerpt from the design document:
> 
> Goal
> 
> When a query is issued to Calcite it is parsed, optimized and then
> generates a String of Java Class that implements Bindable.
> EnumerableInterpretable creates this string and checks to see if that
> string exists in com.google.common.cache and if it doesn't it will call
> into a Java compiler. Compilation process can take a considerable amount of
> time, Apache Kylin reported 50 to 150ms of additional computation time.
> Today, Apache Calcite will generate unique Java Class strings whenever any
> part of the query changes. This document details out the design and
> implementation of a hoisting technique within Apache Calcite. This design
> and implementation greatly increases the cache hit rate of
> EnumerableInterpretable's BINDABLE_CACHE.
> Non Goals
> 
> This implementation is not designed to change the planning process. It does
> not transform RexLiteral into RexDynamicParam, and doesn't change the cost
> calculation of the query.
> 
> Please take a moment to read the design document let me know what you think.
> 
> On Tue, Sep 17, 2019 at 12:27 PM Vladimir Sitnikov <
> sitnikov.vladimir@gmail.com> wrote:
> 
>> Stamatis>Out of curiosity does anybody know if popular DBMS (Postgres)
>> support "hoisting"?
>> 
>> PostgreSQL does support it, and here's a reproducible case when that
>> feature makes the query 300 times slower:
>> https://gist.github.com/vlsi/df08cbef370b2e86a5c1
>> 
>> Vladimir
>> 


Re: Query Compilation happening more often then expected

Posted by Scott Reynolds <sd...@gmail.com>.
Hi all,

I have submitted my pull request in github and wrote up a design document
as a comment on CALCITE-963
<https://issues.apache.org/jira/browse/CALCITE-963?focusedCommentId=16932829&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16932829>

My implementation deviates considerably from what is being discussed in
this thread so I rushed it up so you all can let me know what I am missing.
Here is a excerpt from the design document:

Goal

When a query is issued to Calcite it is parsed, optimized and then
generates a String of Java Class that implements Bindable.
EnumerableInterpretable creates this string and checks to see if that
string exists in com.google.common.cache and if it doesn't it will call
into a Java compiler. Compilation process can take a considerable amount of
time, Apache Kylin reported 50 to 150ms of additional computation time.
Today, Apache Calcite will generate unique Java Class strings whenever any
part of the query changes. This document details out the design and
implementation of a hoisting technique within Apache Calcite. This design
and implementation greatly increases the cache hit rate of
EnumerableInterpretable's BINDABLE_CACHE.
Non Goals

This implementation is not designed to change the planning process. It does
not transform RexLiteral into RexDynamicParam, and doesn't change the cost
calculation of the query.

Please take a moment to read the design document let me know what you think.

On Tue, Sep 17, 2019 at 12:27 PM Vladimir Sitnikov <
sitnikov.vladimir@gmail.com> wrote:

> Stamatis>Out of curiosity does anybody know if popular DBMS (Postgres)
> support "hoisting"?
>
> PostgreSQL does support it, and here's a reproducible case when that
> feature makes the query 300 times slower:
> https://gist.github.com/vlsi/df08cbef370b2e86a5c1
>
> Vladimir
>

Re: Query Compilation happening more often then expected

Posted by Vladimir Sitnikov <si...@gmail.com>.
Stamatis>Out of curiosity does anybody know if popular DBMS (Postgres)
support "hoisting"?

PostgreSQL does support it, and here's a reproducible case when that
feature makes the query 300 times slower:
https://gist.github.com/vlsi/df08cbef370b2e86a5c1

Vladimir

Re: Query Compilation happening more often then expected

Posted by Julian Hyde <jh...@apache.org>.
I’m just saying the approach isn’t perfect. And it isn’t implemented. If you are able to change your application to use bind variables, that is superior in every way.

You mention “a quick recheck of cost function”. This is a very nice use case for multi-objective parametric query optimization[1], which is a very elegant extension to the query optimization and which I think will gradually become the standard approach. The idea would be to generate plans assuming that the selectivity of a condition is a parameter varies between 0 and 1, and figure out at planning time what is the selectivity value at which plan A ceases to be the optimal plan and plan B becomes the optimal plan.

Julian

[1] https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract <https://cacm.acm.org/magazines/2017/10/221322-multi-objective-parametric-query-optimization/abstract>

> On Sep 17, 2019, at 9:11 AM, Julian Feinauer <j....@pragmaticminds.de> wrote:
> 
> Hi,
> 
> this is a good point Julian. So an implementation should consider a re-planning (possibly triggered by a quick recheck of cost function with the given Literal values). But this should not be a general issue with the approach, or?
> 
> JulianF
> 
> Am 16.09.19, 23:36 schrieb "Julian Hyde" <jh...@apache.org>:
> 
>    I found evidence that MSSQL[1] and Sybase ASE[2] do it.
> 
>    I agree, it's not a free lunch. For instance, if a column has a
>    non-uniform distribution, some values might be much more selective
>    than others, and it would be much better to know which value you are
>    dealing with at planning time, rather than execution time.
> 
>    Julian
> 
>    [1] https://docs.microsoft.com/en-us/sql/relational-databases/performance/specify-query-parameterization-behavior-by-using-plan-guides?view=sql-server-2017
> 
>    [2] http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/BIIIBEJJ.htm
> 
>    On Mon, Sep 16, 2019 at 3:36 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>> 
>> Out of curiosity does anybody know if popular DBMS (Postgres, Oracle, SQL
>> Server, etc.) support "hoisting"?
>> 
>> Performing it all the time does not seem a very good idea (constant
>> reduction, histograms, and other optimization techniques would be
>> impossible)
>> while leaving its configuration to the end-user may not be a
>> straightforward decision.
>> 
>> On Sat, Sep 14, 2019 at 4:29 PM Julian Hyde <jh...@gmail.com> wrote:
>> 
>>> The idea of converting literals into bind variables is called “hoisting”.
>>> We had the idea a while ago but have not implemented it.
>>> 
>>> https://issues.apache.org/jira/browse/CALCITE-963
>>> 
>>> Until that feature is implemented, you will need to create bind variables
>>> explicitly, and bind them before executing the query.
>>> 
>>> Julian
>>> 
>>>> On Sep 13, 2019, at 4:39 PM, Scott Reynolds <sd...@gmail.com>
>>> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Spent a bunch of time researching and staring at code today to understand
>>>> the code compilation path within Calcite. I started down this path
>>> because
>>>> we noticed whenever we changed the `startDate` or `endDate` for the query
>>>> it went through compilation process again. We expected it to use the
>>>> previous classes `bind` it with the new RexLiterals. I was *hoping*  the
>>>> RexLiterals were passed into the `bind()` method but that does not appear
>>>> to be the main goal of `DataContext` objects.
>>>> 
>>>> We also found the trick Kylin did to improve their query compilation with
>>>> prepared statements:
>>>> https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement
>>> is
>>>> stateful and I don't believe a good way to solve this issue.
>>>> 
>>>> I would like to propose a change to Calcite so that Filters are passed
>>> into
>>>> the `bind()` call alongside or within DataContext. This would allow the
>>>> `EnumerableRel` implementations to reference the `Filters` as arguments.
>>>> This -- I believe -- would cause any change to the filters to use
>>>> the previously compiled class instead of generating a brand new one.
>>>> 
>>>> I am emailing everyone on this list for two reasons:
>>>> 1. Is this a bad idea ?
>>>> 2. I don't have a design yet so would love any ideas. Should we stick
>>> more
>>>> stuff into `DataContext`? Should `EnumerableRel` have another method that
>>>> is used to gather these RexLiterals?
>>> 
> 
> 


Re: Query Compilation happening more often then expected

Posted by Julian Feinauer <j....@pragmaticminds.de>.
Hi,

this is a good point Julian. So an implementation should consider a re-planning (possibly triggered by a quick recheck of cost function with the given Literal values). But this should not be a general issue with the approach, or?

JulianF

Am 16.09.19, 23:36 schrieb "Julian Hyde" <jh...@apache.org>:

    I found evidence that MSSQL[1] and Sybase ASE[2] do it.
    
    I agree, it's not a free lunch. For instance, if a column has a
    non-uniform distribution, some values might be much more selective
    than others, and it would be much better to know which value you are
    dealing with at planning time, rather than execution time.
    
    Julian
    
    [1] https://docs.microsoft.com/en-us/sql/relational-databases/performance/specify-query-parameterization-behavior-by-using-plan-guides?view=sql-server-2017
    
    [2] http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/BIIIBEJJ.htm
    
    On Mon, Sep 16, 2019 at 3:36 PM Stamatis Zampetakis <za...@gmail.com> wrote:
    >
    > Out of curiosity does anybody know if popular DBMS (Postgres, Oracle, SQL
    > Server, etc.) support "hoisting"?
    >
    > Performing it all the time does not seem a very good idea (constant
    > reduction, histograms, and other optimization techniques would be
    > impossible)
    > while leaving its configuration to the end-user may not be a
    > straightforward decision.
    >
    > On Sat, Sep 14, 2019 at 4:29 PM Julian Hyde <jh...@gmail.com> wrote:
    >
    > > The idea of converting literals into bind variables is called “hoisting”.
    > > We had the idea a while ago but have not implemented it.
    > >
    > > https://issues.apache.org/jira/browse/CALCITE-963
    > >
    > > Until that feature is implemented, you will need to create bind variables
    > > explicitly, and bind them before executing the query.
    > >
    > > Julian
    > >
    > > > On Sep 13, 2019, at 4:39 PM, Scott Reynolds <sd...@gmail.com>
    > > wrote:
    > > >
    > > > Hi,
    > > >
    > > > Spent a bunch of time researching and staring at code today to understand
    > > > the code compilation path within Calcite. I started down this path
    > > because
    > > > we noticed whenever we changed the `startDate` or `endDate` for the query
    > > > it went through compilation process again. We expected it to use the
    > > > previous classes `bind` it with the new RexLiterals. I was *hoping*  the
    > > > RexLiterals were passed into the `bind()` method but that does not appear
    > > > to be the main goal of `DataContext` objects.
    > > >
    > > > We also found the trick Kylin did to improve their query compilation with
    > > > prepared statements:
    > > > https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement
    > > is
    > > > stateful and I don't believe a good way to solve this issue.
    > > >
    > > > I would like to propose a change to Calcite so that Filters are passed
    > > into
    > > > the `bind()` call alongside or within DataContext. This would allow the
    > > > `EnumerableRel` implementations to reference the `Filters` as arguments.
    > > > This -- I believe -- would cause any change to the filters to use
    > > > the previously compiled class instead of generating a brand new one.
    > > >
    > > > I am emailing everyone on this list for two reasons:
    > > > 1. Is this a bad idea ?
    > > > 2. I don't have a design yet so would love any ideas. Should we stick
    > > more
    > > > stuff into `DataContext`? Should `EnumerableRel` have another method that
    > > > is used to gather these RexLiterals?
    > >
    


Re: Query Compilation happening more often then expected

Posted by Julian Hyde <jh...@apache.org>.
I found evidence that MSSQL[1] and Sybase ASE[2] do it.

I agree, it's not a free lunch. For instance, if a column has a
non-uniform distribution, some values might be much more selective
than others, and it would be much better to know which value you are
dealing with at planning time, rather than execution time.

Julian

[1] https://docs.microsoft.com/en-us/sql/relational-databases/performance/specify-query-parameterization-behavior-by-using-plan-guides?view=sql-server-2017

[2] http://infocenter.sybase.com/help/index.jsp?topic=/com.sybase.infocenter.dc00743.1570/html/queryprocessing/BIIIBEJJ.htm

On Mon, Sep 16, 2019 at 3:36 PM Stamatis Zampetakis <za...@gmail.com> wrote:
>
> Out of curiosity does anybody know if popular DBMS (Postgres, Oracle, SQL
> Server, etc.) support "hoisting"?
>
> Performing it all the time does not seem a very good idea (constant
> reduction, histograms, and other optimization techniques would be
> impossible)
> while leaving its configuration to the end-user may not be a
> straightforward decision.
>
> On Sat, Sep 14, 2019 at 4:29 PM Julian Hyde <jh...@gmail.com> wrote:
>
> > The idea of converting literals into bind variables is called “hoisting”.
> > We had the idea a while ago but have not implemented it.
> >
> > https://issues.apache.org/jira/browse/CALCITE-963
> >
> > Until that feature is implemented, you will need to create bind variables
> > explicitly, and bind them before executing the query.
> >
> > Julian
> >
> > > On Sep 13, 2019, at 4:39 PM, Scott Reynolds <sd...@gmail.com>
> > wrote:
> > >
> > > Hi,
> > >
> > > Spent a bunch of time researching and staring at code today to understand
> > > the code compilation path within Calcite. I started down this path
> > because
> > > we noticed whenever we changed the `startDate` or `endDate` for the query
> > > it went through compilation process again. We expected it to use the
> > > previous classes `bind` it with the new RexLiterals. I was *hoping*  the
> > > RexLiterals were passed into the `bind()` method but that does not appear
> > > to be the main goal of `DataContext` objects.
> > >
> > > We also found the trick Kylin did to improve their query compilation with
> > > prepared statements:
> > > https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement
> > is
> > > stateful and I don't believe a good way to solve this issue.
> > >
> > > I would like to propose a change to Calcite so that Filters are passed
> > into
> > > the `bind()` call alongside or within DataContext. This would allow the
> > > `EnumerableRel` implementations to reference the `Filters` as arguments.
> > > This -- I believe -- would cause any change to the filters to use
> > > the previously compiled class instead of generating a brand new one.
> > >
> > > I am emailing everyone on this list for two reasons:
> > > 1. Is this a bad idea ?
> > > 2. I don't have a design yet so would love any ideas. Should we stick
> > more
> > > stuff into `DataContext`? Should `EnumerableRel` have another method that
> > > is used to gather these RexLiterals?
> >

Re: Query Compilation happening more often then expected

Posted by Stamatis Zampetakis <za...@gmail.com>.
Out of curiosity does anybody know if popular DBMS (Postgres, Oracle, SQL
Server, etc.) support "hoisting"?

Performing it all the time does not seem a very good idea (constant
reduction, histograms, and other optimization techniques would be
impossible)
while leaving its configuration to the end-user may not be a
straightforward decision.

On Sat, Sep 14, 2019 at 4:29 PM Julian Hyde <jh...@gmail.com> wrote:

> The idea of converting literals into bind variables is called “hoisting”.
> We had the idea a while ago but have not implemented it.
>
> https://issues.apache.org/jira/browse/CALCITE-963
>
> Until that feature is implemented, you will need to create bind variables
> explicitly, and bind them before executing the query.
>
> Julian
>
> > On Sep 13, 2019, at 4:39 PM, Scott Reynolds <sd...@gmail.com>
> wrote:
> >
> > Hi,
> >
> > Spent a bunch of time researching and staring at code today to understand
> > the code compilation path within Calcite. I started down this path
> because
> > we noticed whenever we changed the `startDate` or `endDate` for the query
> > it went through compilation process again. We expected it to use the
> > previous classes `bind` it with the new RexLiterals. I was *hoping*  the
> > RexLiterals were passed into the `bind()` method but that does not appear
> > to be the main goal of `DataContext` objects.
> >
> > We also found the trick Kylin did to improve their query compilation with
> > prepared statements:
> > https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement
> is
> > stateful and I don't believe a good way to solve this issue.
> >
> > I would like to propose a change to Calcite so that Filters are passed
> into
> > the `bind()` call alongside or within DataContext. This would allow the
> > `EnumerableRel` implementations to reference the `Filters` as arguments.
> > This -- I believe -- would cause any change to the filters to use
> > the previously compiled class instead of generating a brand new one.
> >
> > I am emailing everyone on this list for two reasons:
> > 1. Is this a bad idea ?
> > 2. I don't have a design yet so would love any ideas. Should we stick
> more
> > stuff into `DataContext`? Should `EnumerableRel` have another method that
> > is used to gather these RexLiterals?
>

Re: Query Compilation happening more often then expected

Posted by Julian Hyde <jh...@gmail.com>.
The idea of converting literals into bind variables is called “hoisting”. We had the idea a while ago but have not implemented it. 

https://issues.apache.org/jira/browse/CALCITE-963

Until that feature is implemented, you will need to create bind variables explicitly, and bind them before executing the query. 

Julian

> On Sep 13, 2019, at 4:39 PM, Scott Reynolds <sd...@gmail.com> wrote:
> 
> Hi,
> 
> Spent a bunch of time researching and staring at code today to understand
> the code compilation path within Calcite. I started down this path because
> we noticed whenever we changed the `startDate` or `endDate` for the query
> it went through compilation process again. We expected it to use the
> previous classes `bind` it with the new RexLiterals. I was *hoping*  the
> RexLiterals were passed into the `bind()` method but that does not appear
> to be the main goal of `DataContext` objects.
> 
> We also found the trick Kylin did to improve their query compilation with
> prepared statements:
> https://issues.apache.org/jira/browse/KYLIN-3434 but PreparedStatement is
> stateful and I don't believe a good way to solve this issue.
> 
> I would like to propose a change to Calcite so that Filters are passed into
> the `bind()` call alongside or within DataContext. This would allow the
> `EnumerableRel` implementations to reference the `Filters` as arguments.
> This -- I believe -- would cause any change to the filters to use
> the previously compiled class instead of generating a brand new one.
> 
> I am emailing everyone on this list for two reasons:
> 1. Is this a bad idea ?
> 2. I don't have a design yet so would love any ideas. Should we stick more
> stuff into `DataContext`? Should `EnumerableRel` have another method that
> is used to gather these RexLiterals?