You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@ignite.apache.org by Ivan Daschinskiy <iv...@gmail.com> on 2020/04/21 11:51:31 UTC

Why OPTIMIZE_REUSE_RESULT is set to 0 explicitly in o.a.i.i.p.query.h2.ConnectionManager#DEFAULT_DB_OPTIONS

Hello folks.

Recently I trying to investigate why when the query, i.e. «SELECT * FROM T1 WHERE ID IN (SELECT T1_ID FROM T2), is executed locally,
subquery evaluated, even if result is the same. I noticed, that in mentioned in subj (.ConnectionManager#DEFAULT_DB_OPTIONS),
parameter OPTIMIZE_REUSE_RESULT explicitly set to 0. This value disable internal H2 optimization (see details here org.h2.command.dml.Query#query(int, org.h2.result.ResultTarget) and lead the mentioned above query to be ineffective.

I cannot understand why this decision was made. Also I don’t find any evidence that setting OPTIMIZE_REUSE_RESULT to 1 could break something. Unfortunatelly, it is impossible to change this behavior per Session without forking H2, because this flag is set in org.h2.engine.Database. 

Let’s discuss possible caveats in enabling this optimization by default in DEFAULT_DB_OPTIONS.


Re: Why OPTIMIZE_REUSE_RESULT is set to 0 explicitly in o.a.i.i.p.query.h2.ConnectionManager#DEFAULT_DB_OPTIONS

Posted by Konstantin Orlov <ko...@gridgain.com>.
Ivan, Nikolay, guys 

Sorry if I didn’t put it very clearly. The only caveat I would like to emphasize is that we must invalidate the results. H2 does this by comparing table's maxDataModificationId with the databases' current modificationDataId.
We have a H2StatementCache, where Query object with lastResult is cached. If the result is not invalidated when update is happen (even not concurrent one), the query will return an incorrect result. Consider the following scenario:
	0) OPTIMIZE_REUSE_RESULT is set to 1, GridH2Table#getMaxDataModificationId still returns 0
	1) execute some query with non-correlated sub-query several times (10 should be enough), for example "select * from outerTbl where inner_id in (select id from innerTbl)"
	2) clear innerTbl table
	3) execute query one more time. Empty result set is expected

This behavior was not obvious to me when I once looked at disabled 'OPTIMIZE_REUSE_RESULT', so I decided to share with you just in case =)

-- 
Regards,
Konstantin Orlov
Software Engineer, St. Petersburg
+7 (921) 445-65-75
https://www.gridgain.com
Powered by Apache® Ignite™



> On 27 Apr 2020, at 13:36, Nikolay Izhikov <ni...@apache.org> wrote:
> 
> Hello, Konstantin.
> 
>> I think we cannot just turn this optimization on
> 
> Why is that?
> 
> If we turn on `OPTIMIZE_REUSE_RESULT` consequences would be the following:
> 
> * concurrent updated of the subquery table will not be used(we have no guarantee on that, for now)
> * performance of the select increase due to the subquery cache.
> 
>> And incrementing org.h2.engine.Database#modificationDataId on each data update may lead to some performance issues. It should be benchmarked first.
> 
> For now, Ignite doesn’t guarantee the visibility of concurrent update into select results [1] [2]
> 
>> It should be benchmarked first.
> 
> +1
> 
> What have I missed?
> 
> [1] https://apacheignite-sql.readme.io/docs/how-ignite-sql-works#subqueries-in-where-clause
> [2] https://apacheignite-sql.readme.io/docs/how-ignite-sql-works#concurrent-modifications
> 
> 
>> 22 апр. 2020 г., в 10:39, Ivan Daschinsky <iv...@gmail.com> написал(а):
>> 
>> Thank you for your response, Konstantin
>> 
>> I think we cannot just turn this optimization
>>> 
>> Of course not, that is the reason why I started this thread.
>> 
>> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table don't change
>>> it neither on remove or update
>> 
>> 
>> Yes, but we can increment this counter and find possible workarounds.
>> 
>> It should be benchmarked first.
>>> 
>> Absolutelly agree.
>> 
>> I started this thread to discuss possible caveats and traps and decide
>> whether to implement this optimization or not.
>> If there is too many traps or moreover this is impossible or SQL
>> maintainers are against this, then it's probably not worth to do it.
>> 
>> 
>> 
>> ср, 22 апр. 2020 г. в 09:52, Konstantin Orlov <ko...@gridgain.com>:
>> 
>>> Hello, Ivan
>>> 
>>> I think we cannot just turn this optimization on because a result
>>> invalidation counts on org.h2.engine.Database#modificationDataId (see
>>> org.h2.command.dml.Query#sameResultAsLast(Session, Value[], Value[],
>>> long)), but org.apache.ignite.internal.processors.query.h2.opt.GridH2Table
>>> don't change it neither on remove or update. And incrementing
>>> org.h2.engine.Database#modificationDataId on each data update may lead to
>>> some performance issues. It should be benchmarked first.
>>> 
>>> --
>>> Regards,
>>> Konstantin Orlov
>>> 
>>> 
>>>> On 21 Apr 2020, at 14:51, Ivan Daschinskiy <iv...@gmail.com> wrote:
>>>> 
>>>> Hello folks.
>>>> 
>>>> Recently I trying to investigate why when the query, i.e. «SELECT * FROM
>>> T1 WHERE ID IN (SELECT T1_ID FROM T2), is executed locally,
>>>> subquery evaluated, even if result is the same. I noticed, that in
>>> mentioned in subj (.ConnectionManager#DEFAULT_DB_OPTIONS),
>>>> parameter OPTIMIZE_REUSE_RESULT explicitly set to 0. This value disable
>>> internal H2 optimization (see details here
>>> org.h2.command.dml.Query#query(int, org.h2.result.ResultTarget) and lead
>>> the mentioned above query to be ineffective.
>>>> 
>>>> I cannot understand why this decision was made. Also I don’t find any
>>> evidence that setting OPTIMIZE_REUSE_RESULT to 1 could break something.
>>> Unfortunatelly, it is impossible to change this behavior per Session
>>> without forking H2, because this flag is set in org.h2.engine.Database.
>>>> 
>>>> Let’s discuss possible caveats in enabling this optimization by default
>>> in DEFAULT_DB_OPTIONS.
>>>> 
>>> 
>>> 
>> 
>> -- 
>> Sincerely yours, Ivan Daschinskiy
> 


Re: Why OPTIMIZE_REUSE_RESULT is set to 0 explicitly in o.a.i.i.p.query.h2.ConnectionManager#DEFAULT_DB_OPTIONS

Posted by Nikolay Izhikov <ni...@apache.org>.
Hello, Konstantin.

> I think we cannot just turn this optimization on

Why is that?

If we turn on `OPTIMIZE_REUSE_RESULT` consequences would be the following:

* concurrent updated of the subquery table will not be used(we have no guarantee on that, for now)
* performance of the select increase due to the subquery cache.

> And incrementing org.h2.engine.Database#modificationDataId on each data update may lead to some performance issues. It should be benchmarked first.

For now, Ignite doesn’t guarantee the visibility of concurrent update into select results [1] [2]

> It should be benchmarked first.

+1

What have I missed?

[1] https://apacheignite-sql.readme.io/docs/how-ignite-sql-works#subqueries-in-where-clause
[2] https://apacheignite-sql.readme.io/docs/how-ignite-sql-works#concurrent-modifications


> 22 апр. 2020 г., в 10:39, Ivan Daschinsky <iv...@gmail.com> написал(а):
> 
> Thank you for your response, Konstantin
> 
> I think we cannot just turn this optimization
>> 
> Of course not, that is the reason why I started this thread.
> 
> org.apache.ignite.internal.processors.query.h2.opt.GridH2Table don't change
>> it neither on remove or update
> 
> 
> Yes, but we can increment this counter and find possible workarounds.
> 
> It should be benchmarked first.
>> 
> Absolutelly agree.
> 
> I started this thread to discuss possible caveats and traps and decide
> whether to implement this optimization or not.
> If there is too many traps or moreover this is impossible or SQL
> maintainers are against this, then it's probably not worth to do it.
> 
> 
> 
> ср, 22 апр. 2020 г. в 09:52, Konstantin Orlov <ko...@gridgain.com>:
> 
>> Hello, Ivan
>> 
>> I think we cannot just turn this optimization on because a result
>> invalidation counts on org.h2.engine.Database#modificationDataId (see
>> org.h2.command.dml.Query#sameResultAsLast(Session, Value[], Value[],
>> long)), but org.apache.ignite.internal.processors.query.h2.opt.GridH2Table
>> don't change it neither on remove or update. And incrementing
>> org.h2.engine.Database#modificationDataId on each data update may lead to
>> some performance issues. It should be benchmarked first.
>> 
>> --
>> Regards,
>> Konstantin Orlov
>> 
>> 
>>> On 21 Apr 2020, at 14:51, Ivan Daschinskiy <iv...@gmail.com> wrote:
>>> 
>>> Hello folks.
>>> 
>>> Recently I trying to investigate why when the query, i.e. «SELECT * FROM
>> T1 WHERE ID IN (SELECT T1_ID FROM T2), is executed locally,
>>> subquery evaluated, even if result is the same. I noticed, that in
>> mentioned in subj (.ConnectionManager#DEFAULT_DB_OPTIONS),
>>> parameter OPTIMIZE_REUSE_RESULT explicitly set to 0. This value disable
>> internal H2 optimization (see details here
>> org.h2.command.dml.Query#query(int, org.h2.result.ResultTarget) and lead
>> the mentioned above query to be ineffective.
>>> 
>>> I cannot understand why this decision was made. Also I don’t find any
>> evidence that setting OPTIMIZE_REUSE_RESULT to 1 could break something.
>> Unfortunatelly, it is impossible to change this behavior per Session
>> without forking H2, because this flag is set in org.h2.engine.Database.
>>> 
>>> Let’s discuss possible caveats in enabling this optimization by default
>> in DEFAULT_DB_OPTIONS.
>>> 
>> 
>> 
> 
> -- 
> Sincerely yours, Ivan Daschinskiy


Re: Why OPTIMIZE_REUSE_RESULT is set to 0 explicitly in o.a.i.i.p.query.h2.ConnectionManager#DEFAULT_DB_OPTIONS

Posted by Ivan Daschinsky <iv...@gmail.com>.
Thank you for your response, Konstantin

I think we cannot just turn this optimization
>
Of course not, that is the reason why I started this thread.

org.apache.ignite.internal.processors.query.h2.opt.GridH2Table don't change
> it neither on remove or update


 Yes, but we can increment this counter and find possible workarounds.

 It should be benchmarked first.
>
Absolutelly agree.

I started this thread to discuss possible caveats and traps and decide
whether to implement this optimization or not.
If there is too many traps or moreover this is impossible or SQL
maintainers are against this, then it's probably not worth to do it.



ср, 22 апр. 2020 г. в 09:52, Konstantin Orlov <ko...@gridgain.com>:

> Hello, Ivan
>
> I think we cannot just turn this optimization on because a result
> invalidation counts on org.h2.engine.Database#modificationDataId (see
> org.h2.command.dml.Query#sameResultAsLast(Session, Value[], Value[],
> long)), but org.apache.ignite.internal.processors.query.h2.opt.GridH2Table
> don't change it neither on remove or update. And incrementing
> org.h2.engine.Database#modificationDataId on each data update may lead to
> some performance issues. It should be benchmarked first.
>
> --
> Regards,
> Konstantin Orlov
>
>
> > On 21 Apr 2020, at 14:51, Ivan Daschinskiy <iv...@gmail.com> wrote:
> >
> > Hello folks.
> >
> > Recently I trying to investigate why when the query, i.e. «SELECT * FROM
> T1 WHERE ID IN (SELECT T1_ID FROM T2), is executed locally,
> > subquery evaluated, even if result is the same. I noticed, that in
> mentioned in subj (.ConnectionManager#DEFAULT_DB_OPTIONS),
> > parameter OPTIMIZE_REUSE_RESULT explicitly set to 0. This value disable
> internal H2 optimization (see details here
> org.h2.command.dml.Query#query(int, org.h2.result.ResultTarget) and lead
> the mentioned above query to be ineffective.
> >
> > I cannot understand why this decision was made. Also I don’t find any
> evidence that setting OPTIMIZE_REUSE_RESULT to 1 could break something.
> Unfortunatelly, it is impossible to change this behavior per Session
> without forking H2, because this flag is set in org.h2.engine.Database.
> >
> > Let’s discuss possible caveats in enabling this optimization by default
> in DEFAULT_DB_OPTIONS.
> >
>
>

-- 
Sincerely yours, Ivan Daschinskiy

Re: Why OPTIMIZE_REUSE_RESULT is set to 0 explicitly in o.a.i.i.p.query.h2.ConnectionManager#DEFAULT_DB_OPTIONS

Posted by Konstantin Orlov <ko...@gridgain.com>.
Hello, Ivan

I think we cannot just turn this optimization on because a result invalidation counts on org.h2.engine.Database#modificationDataId (see org.h2.command.dml.Query#sameResultAsLast(Session, Value[], Value[], long)), but org.apache.ignite.internal.processors.query.h2.opt.GridH2Table don't change it neither on remove or update. And incrementing org.h2.engine.Database#modificationDataId on each data update may lead to some performance issues. It should be benchmarked first.

-- 
Regards,
Konstantin Orlov


> On 21 Apr 2020, at 14:51, Ivan Daschinskiy <iv...@gmail.com> wrote:
> 
> Hello folks.
> 
> Recently I trying to investigate why when the query, i.e. «SELECT * FROM T1 WHERE ID IN (SELECT T1_ID FROM T2), is executed locally,
> subquery evaluated, even if result is the same. I noticed, that in mentioned in subj (.ConnectionManager#DEFAULT_DB_OPTIONS),
> parameter OPTIMIZE_REUSE_RESULT explicitly set to 0. This value disable internal H2 optimization (see details here org.h2.command.dml.Query#query(int, org.h2.result.ResultTarget) and lead the mentioned above query to be ineffective.
> 
> I cannot understand why this decision was made. Also I don’t find any evidence that setting OPTIMIZE_REUSE_RESULT to 1 could break something. Unfortunatelly, it is impossible to change this behavior per Session without forking H2, because this flag is set in org.h2.engine.Database. 
> 
> Let’s discuss possible caveats in enabling this optimization by default in DEFAULT_DB_OPTIONS.
>