Posted to derby-dev@db.apache.org by "Bryan Pendleton (JIRA)" <ji...@apache.org> on 2006/11/29 23:55:23 UTC

[jira] Created: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Optimizer performance slowdown from 10.1 to 10.2
------------------------------------------------

                 Key: DERBY-2130
                 URL: http://issues.apache.org/jira/browse/DERBY-2130
             Project: Derby
          Issue Type: Bug
          Components: Performance, SQL
    Affects Versions: 10.2.1.6, 10.1.3.1, 10.3.0.0
            Reporter: Bryan Pendleton


Attached is 'repro.sql', an IJ script which demonstrates what I
believe to be a serious performance issue in the Optimizer.

I have run this script in a number of configurations:

 - 10.1.2.1: the script runs successfully. The 'prepare' statement
   takes about 90 seconds, on a fairly powerful Windows machine

 - 10.1.3.1: the script produces an NPE. I believe this is DERBY-1777

 - 10.2.1.8/trunk: the script runs successfully. The 'prepare' statement
   often takes about 220 seconds, on the same Windows machine

   Intermittently, on 10.2 and on the trunk, the prepare statement takes
   15+ minutes. I cannot reliably reproduce this; I run the same script
   several times in a row and I cannot predict whether it will take 220
   seconds or whether it will take 15+ minutes.

I am quite motivated to work on this problem, as this is blocking me from
using Derby for a project that I'm quite keen on, but I need some
suggestions and ideas about how to attack it. From my perspective
there are 3 primary topics:

1) Why did optimizer performance for this query degrade so significantly
from 10.1.2.1 to 10.2? For this particular query, the optimizer seems to be at
least 2.5 times slower in 10.2, and sometimes it is 10x slower.

2) What is the source of the non-determinism? Why does the optimizer
often take 4 minutes to optimize this query on the trunk, but sometimes
take 15+ minutes? I don't believe that I'm changing anything from
run to run.

3) Can we improve the optimizer performance even beyond what it was
for 10.1.2? I realize that this is an ugly query, but I was hoping to
see an optimization time of 5-10 seconds, not 90 seconds (and certainly
not 220 seconds).

I have attempted to start answering some of these questions, with
limited success. Here is some of what I think I've discovered so far:

 - the optimizer changes in 10.2 seem to have given the optimizer many
   more choices of possible query plans to consider. I think this means
   that, if the optimizer does not time out, it will spend substantially
   more time optimizing because there are more choices to evaluate. Does
   this by itself mean that the optimizer will take 2.5 times longer in
   10.2 than it did in 10.1?

 - something about this query seems to make the costing mechanism go
   haywire, and produce extreme costs. While stepping through the
   optimization of this query in the debugger I have seen it compute
   costs like 1e63 and 1e200. This might be very closely related to
   DERBY-1905, although I don't think I'm doing any subqueries here.
   But maybe I'm misunderstanding the term "subquery" in DERBY-1905.
   At any rate, due to the enormous estimated costs, timeout does not
   occur.

 - the WHERE clause in this query is converted during compilation to 
   an equivalent IN clause, I believe, which then causes me to run into
   a number of the problems described in DERBY-47 and DERBY-713.
   Specifically, rather than constructing a plan which involves 4
   index probes for the 4 WHERE clause values, the optimizer decides
   that an index scan must be performed and that it will have to process
   the entire index (because the query uses parameter markers, not
    literal values). So perhaps solving DERBY-47 would help me (see the
    short sketch following this list).

 - the optimizer in fact comes up with a "decent" query plan quite quickly.
   I have experimented with placing a hard limit into the optimizer
   timeout code, so that I can force optimization to stop after an
   arbitrary fixed period of time. Then I have been able to set that
   value to as low as 1 second, and the optimizer has produced plans
   that then execute in a few milliseconds. Of course, I have only tried
   this with a trivial amount of data in my database, so it's possible
   that the plan produced by the optimizer after just a second of
   optimizing is in fact poor, and I'm just not noticing it because my
   data sizes are so small.
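
To make the parameter-marker point concrete, here is a small, self-contained
sketch (made-up table, column, and index names -- this is not the actual
repro.sql schema). The literal IN list gives the optimizer the values at
compile time; the prepared statement with markers does not, which is the
situation where the DERBY-47 behavior kicks in:

    // Illustrative only: with literals the optimizer can see the four values
    // when it builds the plan; with parameter markers it cannot, so (per
    // DERBY-47) it plans a scan of the whole index rather than four probes.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class InListSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:derby:inlistdb;create=true");
                 Statement s = conn.createStatement()) {
                s.executeUpdate("CREATE TABLE orders (id INT, status INT)");
                s.executeUpdate("CREATE INDEX orders_status_idx ON orders (status)");

                // Literal values: visible to the optimizer at compile time.
                s.executeQuery(
                    "SELECT id FROM orders WHERE status IN (1, 2, 3, 4)").close();

                // Parameter markers: values unknown when the plan is chosen.
                try (PreparedStatement ps = conn.prepareStatement(
                         "SELECT id FROM orders WHERE status IN (?, ?, ?, ?)")) {
                    for (int i = 1; i <= 4; i++) {
                        ps.setInt(i, i);
                    }
                    ps.executeQuery().close();
                }
            }
        }
    }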

At this point, what would be really helpful to me would be some suggestions
about some general approaches or techniques to try to start breaking down
and analyzing this problem.



Re: [jira] Created: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Army <qo...@gmail.com>.
Bryan Pendleton (JIRA) wrote:
> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------

Just some quick comments based on my first (rather quick) reading of the 
description:

>  - the optimizer changes in 10.2 seem to have given the optimizer many
>    more choices of possible query plans to consider. I think this means
>    that, if the optimizer does not time out, it will spend substantially
>    more time optimizing because there are more choices to evaluate. Does
>    this by itself mean that the optimizer will take 2.5 times longer in
>    10.2 than it did in 10.1?

Not necessarily "will", but "could", yes.  Take as an example the following 
discussion, copied from the html document attached to DERBY-781:

<begin copy>

   select <...> from t1, v1 where v1.xxx = t1.yyy ...

where v1 is a view that constitutes a large, non-flattenable subquery taking 100 
units of time per "decorated" permutation to optimize, while t1 takes 1 unit of 
time per "decorated" permutation to optimize.  Assume there's an index on T1. 
Prior to the changes for DERBY-781, the optimization may have gone something 
like this:

     * Try join order {t1, v1}.  We'd have two permutations of T1 (table scan 
and index scan) and a single permutation (nested loop join) for v1, giving 
optimize time of 2 + 100 = 102.
     * Try join order {v1, t1}.  We'd have a single permutation of V1 (nested 
loop join) and four permutations for T1 (table scan w/ nested loop, table scan 
w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 
104.
     * Total optimize time: 102 + 104 = 206.

With the changes for DERBY-781, though, the optimization would be:

     * Try join order {t1, v1}.  We'd have two permutations of T1 (table scan 
and index scan) and *two* permutations for v1 (nested loop join and hash join), 
giving optimize time of 2 + 200 = *202*.
     * Try join order {v1, t1}.  We'd have a single permutation of V1 (nested 
loop join) and four permutations for T1 (table scan w/ nested loop, table scan 
w/ hash, index w/ nested loop, index w/ hash), giving optimize time of 4 + 100 = 
104.
     * Total optimize time: 202 + 104 = 306.

So the total time it takes to optimize the query in this case jumps from 206 
units to 306 units--i.e. almost 50%.  If the subquery is fairly simple, this 
additional compilation time will be fairly negligible; but if it's a large 
and/or complicated subquery, the time to compile can grow significantly.

<end copy>

So in short, one unfortunate side effect of having the optimizer consider hash 
joins with subqueries is that we spend more time optimizing.  This is typically 
worth it because the optimizer can find a better plan--but as you have 
discovered, this can also make things worse for queries that have relatively 
small execution times.

>  - something about this query seems to make the costing mechanism go
>    haywire, and produce extreme costs. While stepping through the
>    optimization of this query in the debugger I have seen it compute
>    costs like 1e63 and 1e200. This might be very closely related to
>    DERBY-1905, although I don't think I'm doing any subqueries here.
>    But maybe I'm misunderstanding the term "subquery" in DERBY-1905.

Yes, I think this is DERBY-1905.  You have views in your query and you are 
joining with those views.  The views themselves constitute "subqueries" in the 
sense that they are queries which need to be compiled and executed as part of 
the outer query.  Most of the "subquery" discussion that occurs for DERBY-781, 
DERBY-805, DERBY-1905, and other related optimizer issues is dealing with 
exactly this kind of query--i.e. the views themselves are the "subqueries" in 
question.  You could just as easily "in-line" the view declaration directly into 
the query as a subquery; from an optimizer standpoint, I think the result is the 
same...?

>    At any rate, due to the enormous estimated costs, timeout does not
>    occur.

Yes, sounds like DERBY-1905.

>  - the WHERE clause in this query is converted during compilation to 
>    an equivalent IN clause, I believe, which then causes me to run into
>    a number of the problems described in DERBY-47 and DERBY-713.
>    Specifically, rather than constructing a plan which involves 4
>    index probes for the 4 WHERE clause values, the optimizer decides
>    that an index scan must be performed and that it will have to process
>    the entire index (because the query uses parameter markers, not
>    literal values). So perhaps solving DERBY-47 would help me

That might be true.  I haven't actually spent any time looking at DERBY-47, but 
that certainly seems to be a thorn in a lot of people's sides.  That would 
definitely constitute a "pain point" for a number of Derby users...

>  - the optimizer in fact comes up with a "decent" query plan quite quickly.
>    I have experimented with placing a hard limit into the optimizer
>    timeout code, so that I can force optimization to stop after an
>    arbitrary fixed period of time. Then I have been able to set that
>    value to as low as 1 second, and the optimizer has produced plans
>    that then execute in a few milliseconds. Of course, I have only tried
>    this with a trivial amount of data in my database, so it's possible
>    that the plan produced by the optimizer after just a second of
>    optimizing is in fact poor, and I'm just not noticing it because my
>    data sizes are so small.

Does the query plan chosen by the optimizer when it spends 220 seconds (or 15 
minutes (!)) execute in a time similar to the other plans that you've seen 
(including 10.1), or is it slower?

> At this point, what would be really helpful to me would be some suggestions
> about some general approaches or techniques to try to start breaking down
> and analyzing this problem.

Unfortunately, there are no clean-cut rules to help out with this kind of thing.
You have already determined that the problem is with compilation, not
execution.  You have a repro and you verified that the query is much slower with
10.2 than it was with 10.1.  This is all great information--thank you for your
up-front thoroughness.

At this point my inclination would be to get the query plans for 10.1, trunk 
(220 seconds), and trunk (15 minutes).  Then, if you have time and resources, 
get those query plans again with "derby.optimizer.noTimeout" set to true.  With 
that information you can then compare the query plans to see if anything 
interesting shows up.  (This is one reason why it would be nice to have improved
query plan logging; the current plan dump has a whole lot of stuff that is rather
hard to manage.)
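
As a minimal sketch of what I mean (placeholder database name; run the DDL and
the prepare from repro.sql against this connection), the noTimeout property can
be set before booting the embedded driver, along with derby.language.logQueryPlan
(the existing plan-dump property I have in mind) so the chosen plan for each
statement is written to derby.log:

    // Minimal sketch: boot the embedded driver with the optimizer timeout
    // disabled and query-plan logging turned on, then execute the statements
    // from repro.sql against this connection and compare the logged plans.
    import java.sql.Connection;
    import java.sql.DriverManager;

    public class PlanCapture {
        public static void main(String[] args) throws Exception {
            // Let the optimizer cost every permutation instead of timing out.
            System.setProperty("derby.optimizer.noTimeout", "true");
            // Write the plan chosen for each statement to derby.log.
            System.setProperty("derby.language.logQueryPlan", "true");

            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:derby:plansdb;create=true")) {
                // ... run the repro.sql DDL and the prepare here ...
            }
        }
    }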

After that I don't have much wisdom.  I myself would start by figuring out what 
join order is chosen in 10.1.  Then try to figure out how that join order is 
treated in 10.2--do we ever get to it, or do we timeout beforehand?  If we get 
to it, what happens to it?  Do we try out the same decorated permutations (index 
choices, join strategies) for that permutation? Do we throw it out in favor of 
something else?

In terms of the actual code, I added some timeout 'reset' logic for the pushing 
of predicates a while ago (I think it was for DERBY-805?).  In hindsight I think 
that was probably not a good idea, as it could delay timeout for subqueries for 
a potentially very long time.  This might be a place to start; see 
OptimizerImpl.prepForNextRound().
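
To sketch what that 'reset' amounts to (illustrative field and method names
only, not the actual OptimizerImpl code): the effect is to move the
start-of-optimization timestamp forward when a new round begins, so the
elapsed-time test starts over and timeout can be pushed back round after round:

    // Illustrative sketch only (hypothetical names): how a per-round timeout
    // reset can delay timeout for a long time when many rounds occur.
    public class TimeoutResetSketch {
        private long timeOptimizationStarted = System.currentTimeMillis();

        // Roughly what the 'reset' logic amounts to: each new round of
        // optimization gets a fresh time budget.
        void prepForNextRound(boolean resetTimeout) {
            if (resetTimeout) {
                timeOptimizationStarted = System.currentTimeMillis();
            }
        }

        // The timeout test compares the time spent so far against the
        // estimated execution time of the best plan found so far.
        boolean timedOut(double bestPlanEstimatedMillis) {
            long spent = System.currentTimeMillis() - timeOptimizationStarted;
            return spent > bestPlanEstimatedMillis;
        }
    }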

That's all I've got for now.  Hopefully there's something helpful in this email; 
I'll re-read the description again and see if anything else comes to mind.

Thank you for filing the issue and for your interest in resolving it.

Army

PS.  I ran the repro on my laptop with a bunch of apps running in the 
background, and it took a full 35 minutes to complete.  It *did* complete, 
though, which rules out infinite looping.  Not exactly thrilling news, I know, 
but it's something...


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-2130:
---------------------------------

      Issue & fix info: [High Value Fix]
    Bug behavior facts: [Performance, Regression]  (was: [Regression])



Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Bryan Pendleton <bp...@amberpoint.com>.
Mike Matrigali wrote:
> just so we are working on the same case, when you run is there
> also 0 rows in all the tables?

Correct. I am just running that exact script from the JIRA,
and all it does is issue all the CREATE TABLE/INDEX/VIEW
statements and then prepare the query.

So the tables are brand new, and empty.

thanks,

bryan


Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Mike Matrigali <mi...@sbcglobal.net>.
just so we are working on the same case, when you run is there
also 0 rows in all the tables?


Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Bryan Pendleton <bp...@amberpoint.com>.
 >  * I did not see the non-deterministic behavior that Bryan did
 > with 10.2 trunk--instead, I consistently see 7+ minutes.  However,
 > I only ran the query 3 times, which is probably not enough to have
 > seen what Bryan saw.  Also note that I only ran the DDL one time
 > and then I re-ran the query the second two times.  So given Mike's
 > previous comment, this may be the reason I didn't see the variance that Bryan did.

Thanks for all the help and good suggestions so far -- they have given
me a lot of ideas to pursue.

I think that Mike's theory about the source of the non-determinism
being due to whether I recreate the tables each time, or re-use the
existing tables, is a good one, and I'll see if I can confirm it
with more tests of my own.

Meanwhile, I'll also investigate the other pointers you passed along.

Regarding the comment that the DERBY-2130 test case is a hairy one,
and it would be easier to start with a simpler one, I agree. On the
other hand, the appeal of the DERBY-2130 test case is that:
a) it is complicated enough to vividly show that there is a problem
b) it is still pruned down enough so that it runs; some of my other
test cases ran for hours and "never" terminated :)

Again, thanks for all the help and ideas; they are much appreciated.

bryan



[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12454745 ] 
            
A B commented on DERBY-2130:
----------------------------

> Some items might be:

> 1) update the store costs to reflect current machines
> o update the store costs to reflect "actual" machine it is running on
> o figure out why there is such variance in DERBY-2130
> o investigate why optimizer costs are so out of line - is it just
>  wrong logic, is it lack of some sort of cost estimate so one bad
>  guess on top of another.
> o DERBY-47

> I am interested in doing 1, but I don't think it is going to help
> with infinite costs.

> Is anyone else interested in looking at this?

I am interested both in this particular issue (DERBY-2130) and in #3: why the optimizer cost estimates are so out of line, i.e. DERBY-1905.  I haven't started just yet, but I would like to look more into DERBY-1905 to see if I can turn anything up.  I hope to start looking at this very soon...

For what it's worth, I ran the repro attached to this issue with various sets of Derby insane jars, and the timings I saw were as follows.  I ran the repro 3 times for each set of jars on Red Hat Linux using ibm142.  Unit of measurement is milliseconds (returned via the "elapsedtime on" ij command).

10.1 Pre-805:

  - 141570
  - 142463
  - 140686

10.2 Pre-805:

  - 143794
  - 139213
  - 138892

10.1 latest branch:

  - 43494
  - 37229
  - 37532

10.2 Post-805, Pre-781 (10.2.0.4, svn 423199):

  - NPE (caused by DERBY-1777). 

10.2 latest branch:

  - 470067
  - 438084
  - 433038

Things to notice:

 * It looks like the slow-down is *not* caused by the DERBY-805 changes--at least not directly.  On the contrary, the query actually runs quite a bit more quickly with the DERBY-805 changes (and all subsequent patches, esp. DERBY-1777) than it does without.  Or put another way, the query runs faster with the latest 10.1 codeline than it does with 10.1.2.1.  Of course, it's possible that changes for DERBY-805 are mingling with more recent changes to cause the slowdown, but we at least know that the DERBY-805 changes alone are not the problem.

 * The run at revision 423199 fails with an NPE, which corresponds to Bryan's results when he ran with 10.1.3.1.  In both cases the DERBY-805 changes have been committed but the subsequent regression fixes (including DERBY-1777) have not, hence the NPE.

 * I did not see the non-deterministic behavior that Bryan did with 10.2 trunk--instead, I consistently see 7+ minutes.  However, I only ran the query 3 times, which is probably not enough to have seen what Bryan saw.  Also note that I only ran the DDL one time and then I re-ran the query the second two times.  So given Mike's previous comment, this may be the reason I didn't see the variance that Bryan did.

Not sure how useful this information is, but that's what I've found so far.  Still scratching my head and poking around, though, so maybe something will come up. (My inclination is toward DERBY-781, but I don't really know how just yet...)


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Rick Hillegas (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859413#action_12859413 ] 

Rick Hillegas commented on DERBY-2130:
--------------------------------------

Hi Bryan,

You contributed a patch for this issue more than 3 years ago but the patch probably needs to be updated before it can be applied now. Are you interested in pursuing this?

Thanks,
-Rick



Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Army <qo...@gmail.com>.
Bryan Pendleton wrote:

> Independently, I tried fiddling with the cost estimates myself, because as
> I stepped through HeapCostController and BTreeCostController, my reaction
> was that the costs were possibly 2 orders of magnitude too high. In my
> experiments, I didn't see much benefit to the optimizer, but I wonder if
> that's because I *also* need to have some of these other fixes (such as the
> permute state fix from jumpReset.patch and the JUMPING-give-up change that
> you've been studying) in order for the cost estimate changes to take 
> effect?

I was actually going a very different direction with this.  I don't know 
anything about the CostControllers so I tend to avoid them, perhaps to my 
detriment ;)

I've attached a partial patch with my (potentially completely wrong) approach to 
DERBY-1905.  Feel free to look at that and comment/fix it up as needed.  I don't 
have time at the moment to pursue those changes and will not be able to until at 
least next year...

Army


Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> I'm curious: what is the optimization time that you consistently see 
> with jumpReset.patch?  Is it the lesser time (i.e. 170-200 seconds) or 
> the greater time (500-650s)?  Or something else entirely?

The values I saw on the machine in question were consistently 192-198 secs.

> That said, I would like to emphasize that the underlying problem here 
> seems to be DERBY-1905.  Even if we get 10.2 to run repro.sql as 
> 'quickly' as 10.1, it still takes 10.1 way too long to optimize the 
> query (90 seconds on a "fairly powerful Windows machine", according to 
> the description for DERBY-2130), esp. given that there are no rows in 
> any of the tables.  The reason is that the cost estimates are too 
> high (infinity) and thus timeout does not take effect until too late.

I concur!

> Note, though, that it *may* be too risky to port changes for DERBY-1905 
> back to 10.2 (I don't know for sure since I don't know what those 
> changes will ultimately be).  Thus it might still be worth it to 
> investigate the aforementioned if-block angle for the sake of addressing 
> the performance regression seen between 10.1 and 10.2.  I.e. to 
> specifically address the 10.2 slowdown filed as DERBY-2130.

Either plan sounds fine to me. I think that making changes to 10.2 to
bring it back to the 10.1 levels of performance will benefit the entire
community. My specific motivation is to address the problem more fully
and arrive at an optimizer which can handle these queries in just a few
seconds. As more and more people start using these Object/Relational
mapping libraries (which seem to be where these union view subqueries
commonly arise), I think we will see more and more queries like these,
and we want to be able to handle them well.

Independently, I tried fiddling with the cost estimates myself, because as
I stepped through HeapCostController and BTreeCostController, my reaction
was that the costs were possibly 2 orders of magnitude too high. In my
experiments, I didn't see much benefit to the optimizer, but I wonder if
that's because I *also* need to have some of these other fixes (such as the
permute state fix from jumpReset.patch and the JUMPING-give-up change that
you've been studying) in order for the cost estimate changes to take effect?

thanks,

bryan



Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Army <qo...@gmail.com>.
> Bryan Pendleton commented on DERBY-2130:
> ----------------------------------------
> 
> With jumpReset.patch applied, I cannot reproduce the varying optimize times.
> The times are all in a tight range, after nearly 6x as many tests as produced
> the variable optimize times before.

That's good to hear.  Thank you for taking the time to re-run your tests.

I'm curious: what is the optimization time that you consistently see with 
jumpReset.patch?  Is it the lesser time (i.e. 170-200 seconds) or the greater 
time (500-650s)?  Or something else entirely?

Given that jumpReset.patch appears to solve the variance problem, we are then 
left with the increased optimization time vs 10.1, which as I mentioned earlier 
is indirectly caused by the DERBY-1357 fix.  And as I proposed earlier, I think 
the DERBY-1357 fix is itself correct; the "problem" (I use the term loosely) is 
with the following if-block in OptimizerImpl.java:

     if (permuteState == JUMPING && !joinPosAdvanced && joinPosition >= 0)
     {
         //not feeling well in the middle of jump
         // Note: we have to make sure we reload the best plans
         // as we rewind since they may have been clobbered
         // (as part of the current join order) before we gave
         // up on jumping.
         reloadBestPlan = true;
         rewindJoinOrder(); //fall
         permuteState = NO_JUMP; //give up
     }

While just removing this check reduces the optimization time back to what it was 
for 10.1, that in itself is not a complete solution.  The reason is that simple 
removal of this if-block can in certain cases lead to an infinite loop.  As an 
example, when I removed the if-block and then tried running lang/innerjoin.sql, 
the test never finished; instead it hung due to an infinite "JUMPING" loop.  So 
some additional changes would be required.
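
Just to illustrate the kind of additional change I mean (a purely hypothetical,
self-contained sketch -- not a proposed patch): something would need to remember
that a jump has already been abandoned, so the JUMPING state is not re-entered
over and over within a single round:

    // Hypothetical sketch of the JUMPING/NO_JUMP state machine with a guard
    // against re-entering the jump after giving up; not the actual Derby code.
    public class JumpGuardSketch {
        enum PermuteState { NO_JUMP, JUMPING }

        private PermuteState permuteState = PermuteState.NO_JUMP;
        private boolean jumpAbandonedThisRound = false;

        // Only start a jump if we have not already given up on one this round.
        void maybeStartJump() {
            if (permuteState == PermuteState.NO_JUMP && !jumpAbandonedThisRound) {
                permuteState = PermuteState.JUMPING;
            }
        }

        // Mirrors the "give up" branch quoted above, but records the decision.
        void giveUpOnJump() {
            permuteState = PermuteState.NO_JUMP;
            jumpAbandonedThisRound = true;
        }

        // A new round of optimization may try jumping again.
        void prepForNextRound() {
            jumpAbandonedThisRound = false;
        }
    }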

That said, I would like to emphasize that the underlying problem here seems to 
be DERBY-1905.  Even if we get 10.2 to run repro.sql as 'quickly' as 10.1, it 
still takes 10.1 way too long to optimize the query (90 seconds on a "fairly 
powerful Windows machine", according to the description for DERBY-2130), esp. 
given that there are no rows in any of the tables.  The reason is that the 
cost estimates are too high (infinity) and thus timeout does not take effect 
until too late.

I have been piddling around with DERBY-1905 on and off and early experimentation 
shows that if the cost estimates are more reasonable, the repro.sql script 
attached to DERBY-2130 completes in 3 or 4 seconds.  And that's the case even if 
we leave the above if-block exactly as it is right now (i.e. we do *not* remove 
it).  So this seems to confirm that aside from the variance problem, DERBY-2130 
is in some ways an expression of DERBY-1905.

Note, though, that it *may* be too risky to port changes for DERBY-1905 back to 
10.2 (I don't know for sure since I don't know what those changes will 
ultimately be).  Thus it might still be worth it to investigate the 
aforementioned if-block angle for the sake of addressing the performance 
regression seen between 10.1 and 10.2.  I.e. to specifically address the 10.2 
slowdown filed as DERBY-2130.

Army


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12458025 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

With jumpReset.patch applied, I cannot reproduce the varying optimize times.
The times are all in a tight range, after nearly 6x as many tests as produced
the variable optimize times before.


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455118 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

Just speaking abstractly here for a minute, I want to make sure I'm following the discussion.

As I understand it, the Optimizer's overall top-level algorithm is:

-  Consider every possible query plan for executing this query (modulo permutation "jumping")
-  Evaluate the cost of each such plan.
-  Select the cheapest.
-  If at any time we determine that we've now spent longer evaluating plans
     than it would have taken to execute the cheapest plan seen so far, then
     quit and go run that plan (this is known as OptimizerTimeout)
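
In rough, made-up Java (just my mental model, not the real OptimizerImpl code),
I picture that loop something like this:

    // Rough sketch of the top-level loop as summarized above (hypothetical
    // types; not the actual Derby code). The key step is the timeout test:
    // once the time already spent optimizing exceeds the estimated execution
    // time of the best plan found so far, stop and use that plan. Note that
    // if the estimates are enormous (1e63, 1e200), this test never fires.
    import java.util.List;

    public class TimeoutSketch {
        static class Plan {
            final String description;
            final double estimatedExecMillis;
            Plan(String description, double estimatedExecMillis) {
                this.description = description;
                this.estimatedExecMillis = estimatedExecMillis;
            }
        }

        static Plan chooseBestPlan(List<Plan> permutations) {
            long start = System.currentTimeMillis();
            Plan best = null;
            for (Plan candidate : permutations) {
                if (best == null
                        || candidate.estimatedExecMillis < best.estimatedExecMillis) {
                    best = candidate;
                }
                long spentSoFar = System.currentTimeMillis() - start;
                // OptimizerTimeout: further searching would cost more than
                // simply running the cheapest plan seen so far.
                if (spentSoFar > best.estimatedExecMillis) {
                    break;
                }
            }
            return best;
        }
    }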

Is that an accurate summary? If so, it seems that:

1) The most important thing about the Optimizer's cost estimation algorithm
is that it be "relatively accurate": the cheapest actual plan should
have the lowest estimated cost. 

2) The second most important thing is that the algorithm should strive to
be accurate in absolute terms, so that the optimizer timeout works effectively.

3) Another important thing is that the algorithm should be as rapid as
possible, so that the optimizer can evaluate as many plans as possible
prior to timing out.

It seems that most of the discussion so far has been around item (2), regarding
the accuracy and interpretation (units of measurement) of the cost estimate.
DERBY-1905, DERBY-1906, DERBY-1908, DERBY-1259, DERBY-1260 all
have to do with this topic.

I've seen almost no discussion of the first point (that the cheapest actual plan
should have the cheapest estimated cost); does this mean that we are pretty
confident about this aspect of cost estimation at this point? That is, the cost
estimating may be off, but it seems to be off for all queries equally?

And I haven't seen much discussion of the third point (that the Optimizer should
strive to evaluate plans as rapidly as possible, so as to increase its chances of
coming across a decent plan prior to timeout). DERBY-1907 is certainly relevant
here; are there other issues logged like this?

I guess I'm wondering (out loud) whether it is worth investigating a simple tuning
of the cost estimation algorithm. If the optimizer were *much* faster at
generating and estimating possible plans, wouldn't that be a big benefit?

Also, how confident are we that permutation jumping (as described in 
http://wiki.apache.org/db-derby/JoinOrderPermutations) is working properly?



[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455117 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

Army, thank you for the hint about the optimizer overrides. I will investigate this
as a possible workaround for the short term.
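
For my own notes, here is the sort of thing I understand the hint to mean (a
minimal sketch with made-up table and index names, not my real schema): the
overrides ride along in specially formed SQL comments, for example fixing the
join order and forcing a particular index so the optimizer has less searching
to do:

    // Sketch of the workaround being considered (hypothetical schema): Derby
    // optimizer overrides are written as --DERBY-PROPERTIES comments, each
    // terminated by a newline inside the SQL text.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.Statement;

    public class OverrideSketch {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:derby:overridedb;create=true");
                 Statement s = conn.createStatement()) {
                s.executeUpdate("CREATE TABLE t1 (a INT, b INT)");
                s.executeUpdate("CREATE TABLE t2 (a INT, b INT)");
                s.executeUpdate("CREATE INDEX t1_idx ON t1 (a)");

                // Fix the join order and force the index on t1, taking those
                // decisions away from the optimizer.
                String sql =
                    "SELECT t1.a, t2.b\n"
                  + "FROM --DERBY-PROPERTIES joinOrder=FIXED\n"
                  + "     t1 --DERBY-PROPERTIES index=T1_IDX\n"
                  + "   , t2\n"
                  + "WHERE t1.a = t2.a AND t1.b = ?";
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    ps.setInt(1, 4);
                    ps.executeQuery().close();
                }
            }
        }
    }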


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455141 ] 
            
A B commented on DERBY-2130:
----------------------------

Everything written in Bryan's preceding comment sounds correct to me, so I won't dwell.  As for the specific questions:

> I've seen almost no discussion of the first point (that the cheapest actual plan
> should have the cheapest estimated cost); does this mean that we are pretty
> confident about this aspect of cost estimation at this point? That is, the cost
> estimating may be off, but it seems to be off for all queries equally?

This has been my general assumption about the code, yes--at least, after DERBY-1007 was resolved.  It seems to me that at some point in the last year I found a scenario where the optimizer's "best" cost estimate did not (appear to) correspond to the best query plan, but I don't remember the details and it may have ended up being correct after all.  In any event, in all of the discussion that I've had/written, my general assumption has been that "Yes", the *relative* accuracy of the cost estimates is correct--i.e. that better plans have lower cost estimates.  Note, though, that this is just an assumption of mine which I have not bothered trying to debunk; if you find info to the contrary, please say so!

> DERBY-1907 is certainly relevant here; are there other issues logged like this?

None come to mind, no.  But I admit that's an answer based strictly on memory; I didn't actually do any searching...

> I guess I'm wondering (out loud) whether it is worth investigating a simple tuning 
> of the cost estimation algorithm. If the optimizer was *much* faster at generating
> and estimating possible plans, wouldn't that be a big benefit? 

Yes, definitely!

> Also, how confident are we that permutation jumping (as described in
> http://wiki.apache.org/db-derby/JoinOrderPermutations) is working properly? 

I had to laugh out loud when I read this question.  It sounds to me like the kind of question someone asks when they've found a somewhat serious bug but don't want to rock the boat ;)

So far as I know, the jumping code is working properly.  But if you told me there was a problem with the code, I think I'd assume you were right.  Is that the case?


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12457083 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

> My guess is that if you created 10.1 database A, and ran 10.3 against
> it, and then created 10.1 database B, and ran 10.3 against it, you
> might see dramatic differences even in just those two runs.

I was able to confirm this.

I created 10 separate databases using 10.1.2.1, and ran the DDL portion of repro.sql
against each of those 10 databases using 10.1.2.1.

Then I ran just the PREPARE portion of repro.sql against each of those 10 databases,
using the 10.3 trunk, and setting derby.database.allowPreReleaseUpgrade=true so
that 10.3 would be willing to access the 10.1 database.

Often, I observed 170-180 seconds for the prepare.

But I also observed 500-650 seconds for the prepare.

Furthermore, it wasn't on the same database each time!

That is, in one run I saw 500 seconds for the prepare on database 6, but when I reran DB 6
I saw 168 seconds.

And, the first time I did the prepare on DB 8, it was 170 seconds, but the second
time I did the prepare on DB 8, it was 607 seconds.

So my observations are that:
 - 10.3 is non-deterministic regarding the amount of time that the PREPARE takes
 - it happens whether the DB and schema are created using 10.1, or 10.3
 - the same DB, with the same DDL, can sometimes prepare the query in time X,
   and sometimes in time Y, and as far as I can tell nothing is changing
   between runs. I just run IJ with the "prepare" script twice in a row, and
   get two very different elapsed times.
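For anyone trying to repeat these measurements, here is a rough sketch of the kind of
harness they imply: time just the prepare against an existing database, with
derby.database.allowPreReleaseUpgrade set as described above. The database path and the
query file name are placeholders, not part of the attached repro.

    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class PrepareTimer {
        public static void main(String[] args) throws Exception {
            // Allow the trunk (10.3) engine to open a database created by 10.1.
            System.setProperty("derby.database.allowPreReleaseUpgrade", "true");

            // Placeholder names: a pre-created database directory and a file holding
            // only the SELECT statement from repro.sql.
            String dbPath = args.length > 0 ? args[0] : "db1";
            String sql = new String(Files.readAllBytes(Paths.get("query.sql")), "UTF-8");

            try (Connection conn = DriverManager.getConnection("jdbc:derby:" + dbPath)) {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps = conn.prepareStatement(sql)) {
                    long seconds = (System.currentTimeMillis() - start) / 1000;
                    System.out.println(dbPath + ": prepare took " + seconds + " seconds");
                }
            }
        }
    }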

> I'm wondering if there is something going on, correlated to the DDL
> statements executed by the repro script, which sometimes results in
> tables that return "reasonable" estimates, and sometimes results in
> tables that return "infinite" estimates.

I now believe this is *not* an accurate theory, as the non-determinism does not
seem to be correlated to a particular run of the DDL statements.

So at this point I don't think that the non-determinism depends on whether the
database and its tables were created by 10.1, or by 10.3, and I don't think that
the non-determinism is "tied" to a particular database and set of tables; the
same database and tables can sometimes lead to one prepare time, and sometimes
to a different prepare time.



[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Knut Anders Hatlen (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Knut Anders Hatlen updated DERBY-2130:
--------------------------------------

    Issue & fix info: [High Value Fix, Repro attached]  (was: [High Value Fix])
             Urgency: Normal

Triaged for 10.5.2.



[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]

Bryan Pendleton updated DERBY-2130:
-----------------------------------

    Attachment: plan10_1_2_1.txt
                plan10_2.txt
                plans.diff

Attached are the query plans that I get when running with 10.1.2.1 and 10.2.

I also attached 'plans.diff', which is a diff of the two plans after editing them to remove "noisy" differences such as the floating-point values of the estimated costs.

The plans are very similar, but not identical.

It seems that 10.2 is, in several cases, preferring indexed access to tables, which I think shows that the predicate pushdown changes (DERBY-805 and friends) are working.

Other than that, my extremely coarse reading of the plans is that the primary difference between the two plans is the estimated costs.
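The editing step can be mechanized. Below is a rough sketch of the kind of filter used to
strip the floating-point noise before diffing; the regular expression is only a guess at
what counts as "noisy" in a plan dump, and the file names are whatever the plans were
saved as.

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.List;

    public class ScrubPlan {
        public static void main(String[] args) throws IOException {
            // Usage: java ScrubPlan plan10_2.txt > plan10_2.scrubbed.txt
            List<String> lines = Files.readAllLines(Paths.get(args[0]));
            for (String line : lines) {
                // Replace numbers (including exponent forms like 1.0E63) with a fixed
                // token so estimated costs and row counts do not show up in the diff.
                System.out.println(line.replaceAll("\\d+(\\.\\d+)?(E-?\\d+)?", "N"));
            }
        }
    }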


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]

Mike Matrigali updated DERBY-2130:
----------------------------------


The variance in optimizer times is puzzling, as I thought the order of plans picked was pretty much the same.  I think
two things would help in understanding this:
1) Have a trace flag that tells us whether the prepare ended because of timeout or because of plan exhaustion
   (a hypothetical sketch of what such a trace might record follows below).
2) Can you reproduce the variance if you only do the prepare on the same db, without redoing the ddl?  I think I
   remember some issue with the order that tables are looked at being related to hash codes, which might vary from
   ddl to ddl.  Is this an issue only if the entire optimizer is timing out, or are there any "sub-timeouts" where we
   might short-circuit a list of subquery options?
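To be clear about point 1), nothing like this exists in the code today; the following is
only a hypothetical sketch of the kind of information such a trace flag would record, not
actual Derby code.

    // Hypothetical sketch only -- NOT existing Derby code.  It just illustrates the
    // kind of answer a "why did optimization stop?" trace would give.
    public class OptimizerStopTraceSketch {

        enum StopReason { TIMED_OUT, EXHAUSTED_ALL_PERMUTATIONS }

        static void traceStop(StopReason reason, long permutationsTried, long elapsedMillis) {
            // In the real engine this would presumably be routed through derby.log;
            // printing is enough for the sketch.
            System.out.println("optimizer stopped: " + reason
                    + " after " + permutationsTried + " permutations in "
                    + elapsedMillis + " ms");
        }

        public static void main(String[] args) {
            // The two outcomes worth distinguishing, with made-up numbers.
            traceStop(StopReason.TIMED_OUT, 12345, 220_000);
            traceStop(StopReason.EXHAUSTED_ALL_PERMUTATIONS, 40000, 600_000);
        }
    }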


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455197 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

> So far as I know, the jumping code is working properly.

Thanks, that's good to know. It actually was an innocent question; I'm
not hiding anything up my sleeve :) My train of thought was something like:
 - 7+ minutes of CPU time is a *lot* on today's machines
 - therefore, we're either evaluating an astounding number of query plans,
 - or, we're burning a lot of CPU evaluating the plans,
 - or both
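To put a number on "an astounding number of query plans": exhaustive join-order
enumeration grows factorially with the number of tables, so even modest FROM lists reach
plan counts where a slow per-plan costing step adds up to minutes. The arithmetic below is
generic and is not tied to the actual number of tables in repro.sql.

    public class JoinOrderCount {
        public static void main(String[] args) {
            // The number of complete join orders for n tables is n! (before even
            // considering the different access paths and join strategies per table).
            double orders = 1.0;
            for (int n = 1; n <= 12; n++) {
                orders *= n;
                System.out.printf("%2d tables: %,.0f join orders%n", n, orders);
            }
            // At, say, 10 microseconds of costing per plan, 10 tables (~3.6 million
            // orders) is already tens of seconds -- and that is before the extra
            // per-table choices that the 10.2 optimizer changes introduced.
        }
    }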



Re: [jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Bryan Pendleton <bp...@amberpoint.com>.
 > can you apply jumpReset.patch to your codeline and run your tests again?

You bet! This sounds like a very interesting theory to pursue.

thanks,

bryan


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]

A B updated DERBY-2130:
-----------------------

    Attachment: jumpReset.patch

> That is, once I saw 500 seconds for the prepare on database 6, but I
> reran DB 6 and saw 168 seconds. [...] I don't think that the non-
> determinism is "tied" to a particular database and set of tables;
> the same database and tables can sometimes lead to one prepare time,
> and sometimes to a different prepare time.  

As I was looking at this problem I noticed a bug in the code that *may* be causing this variance.  It looks like we are not resetting the permuteState field in OptimizerImpl at the beginning of each "round" of optimization.  The result (in the cases I saw) was that the field has the value WALK_LOW at the start of the second round of optimization, which, in very timing-sensitive situations, can lead to a code-path that causes the bestCost for the round to remain UNSET--i.e. it sits at Double.MAX_VALUE.  Then if we try to add anything to that cost, we end up with Infinity, which effectively ruins most subsequent cost estimates.

I still have not been able to reproduce the variance that Bryan reports, so I cannot say for sure that this is the cause.  I ran the repro 10 times against the same database and I consistently saw times of 550 to 700 seconds--and in each case I could see that the optimizer tried out all 40k permutations before completing.  Then I added code to reset the permuteState variable (see attached jumpReset.patch) and ran the repro 10 more times against the same database.  In this case every execution completed in about 200 to 220 seconds, and I could see that the optimizer timed out before trying out all permutations (which is more what I would expect).

The reason I bring this up is that the compilation time without jumpReset.patch and the compilation time with it appear to correlate with the times that Bryan is seeing in his variance (based on his previous comment).  So I'm wondering if the two are related...?

Bryan, I know you are busy and that running these tests takes a while, but if you have the time/inclination, can you apply jumpReset.patch to your codeline and run your tests again?  I'm wondering what effect (if any) the patch will have on the variance that you are seeing...  
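For anyone following along, here is a tiny standalone illustration of the failure mode
described above. This is not the OptimizerImpl code itself, just the arithmetic: if the
round's best cost is left at its "unset" sentinel of Double.MAX_VALUE and another large
estimate is added to it, the sum overflows to Infinity, after which every comparison
against it is meaningless.

    public class UnsetCostSketch {
        public static void main(String[] args) {
            // Stand-in for a best cost that was never assigned during the round.
            double unsetBestCost = Double.MAX_VALUE;

            // Adding another large (but finite) estimate overflows to Infinity.
            double poisoned = unsetBestCost + 1.0e300;
            System.out.println(poisoned);                    // Infinity

            // Once the "best cost so far" is Infinity, even the wildly inflated
            // estimates seen in this issue look cheap by comparison, so the
            // cost-based timeout never kicks in -- matching the "timeout does not
            // occur" behavior reported here.
            double inflatedEstimate = 1.0e63;
            System.out.println(inflatedEstimate < poisoned); // true
        }
    }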


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12456115 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

> When running against a 10.1 database 10.3 estimates a cost of 1736685, followed up by a 
> cost of 198589 for the next complete order, which leads to eventual timeout. But when 
> running against a 10.3 database, the cost estimate is *Infinity* and thus the timeout value 
> sits at 5E62, which we never reach.

Is it possible that the 10.1 vs 10.3 database is a red herring here, and this is simply the
same "non-determinism" that I was observing in my initial tests? That is, you ran against two
different databases, and thus you ran all the CREATE TABLE/INDEX/VIEW statements twice,
and got hugely different results. I definitely saw dramatically different results even in the
10.3-only case from run to run, when running the entire script which recreates all the tables each time.

My guess is that if you created 10.1 database A, and ran 10.3 against it, and then created
10.1 database B, and ran 10.3 against it, you might see dramatic differences even in just
those two runs. 

I'm wondering if there is something going on, correlated to the DDL statements executed
by the repro script, which sometimes results in tables that return "reasonable"
estimates, and sometimes results in tables that return "infinite" estimates.


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Mike Matrigali (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]

Mike Matrigali updated DERBY-2130:
----------------------------------


What continues to puzzle me about these issues is that the 
estimated costs are so out of control.  Without a major
rewrite of the optimizer, the basic assumption is that the
estimated costs are somewhat close to the actual elapsed
time of the query.  Timeout depends on this, and thus
optimizer "performance" depends on this too, since we tend to
loop through an ever-increasing number of plans as each
variable is added.

I think the key is to get the optimizer to time out correctly.
DERBY-1907 would be nice, but the whole system is just broken
if the estimates are so wrong.

Army posted some "simple" problem cases; that is where I
would start.  It seems like all of the following may apply
to your case, and it is probably good to validate that there is
not something new going on with your query:

DERBY-1908   Investigate: What's the "unit" for optimizer cost estimates?
DERBY-1907   Improve optimizer to prune query plans more aggressively.
DERBY-1906   Investigate appropriateness of current optimizer timeout mechanism.
DERBY-1905   Optimizer cost estimates for subqueries are way (way) too high.
DERBY-47     IN list optimization

One approach that would keep the current basic optimizer technology would be to see if we can fix the estimates.  Some items might be:

1) update the store costs to reflect current machines
2) update the store costs to reflect the "actual" machine the system is running on (see the sketch below)
3) figure out why there is so much variance in DERBY-2130
4) investigate why the optimizer costs are so far out of line - is the logic just wrong, or is some cost estimate missing, so that one bad guess gets stacked on top of another?
5) DERBY-47 (IN list optimization)

I am interested in doing item 1, but I don't think it is going
to help with the infinite costs.
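
As a purely illustrative sketch of what item 2 might involve, a small JDBC
program could time a full scan and derive an observed per-row cost on the
machine at hand, to compare against what the store cost model predicts.
The database name "testdb" and table name "T" below are placeholders, not
part of the repro:

    import java.sql.*;

    // Rough probe (a sketch only): time a full scan of one table to get an
    // observed per-row cost on this machine.
    public class ScanCostProbe {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb");
                 Statement s = conn.createStatement()) {
                long start = System.nanoTime();
                long rows = 0;
                try (ResultSet rs = s.executeQuery("SELECT * FROM T")) {
                    while (rs.next()) {
                        rows++;
                    }
                }
                double ms = (System.nanoTime() - start) / 1e6;
                System.out.println(rows + " rows scanned in " + ms + " ms ("
                        + (ms / Math.max(rows, 1)) + " ms/row)");
            }
        }
    }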

Is anyone else interested in looking at this?  The repro for
DERBY-2130 is pretty ugly; I wonder if we could agree on a smaller
problem to work on that would hopefully shed light on the harder one.


> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Performance
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>            Reporter: Bryan Pendleton
>         Attachments: repro.sql
>
>


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12454479 ] 
            
Bryan Pendleton commented on DERBY-2130:
----------------------------------------

One experiment I tried was to remove all the secondary 
indexes. I thought that that might shrink the overall "search 
space" that the optimizer was dealing with. However, it
actually seemed to make matters worse, not better.

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Performance
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>            Reporter: Bryan Pendleton
>         Attachments: repro.sql
>
>


Re: [jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Bryan Pendleton <bp...@amberpoint.com>.
> A B commented on DERBY-2130:
> ----------------------------

> Have you yourself seen this variance with *10.1* databases, as well? 

No, but it hadn't occurred to me to even try the experiment until
you mentioned it :)

I'll try to confirm or disprove this theory, but I'm not able to
spend as much time on this problem as I would like right now, so it
may take a while.

Thanks again for all the time you've been putting into this, and
all the good ideas. I feel like we're making good progress on
understanding what's going on.

bryan


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12456541 ] 
            
A B commented on DERBY-2130:
----------------------------

> Is it possible that the 10.1 vs 10.3 database is a red herring here,
> and this is simply the same "non-determinism" that I was observing
> in my initial tests?

Good point, very well could be.  That didn't occur to me.

> My guess is that if you created 10.1 database A, and ran 10.3 against
> it, and then created 10.1 database B, and ran 10.3 against it, you
> might see dramatic differences even in just those two runs.

Have you yourself seen this variance with *10.1* databases, as well?  I was limiting myself to the details in the description of this issue, which only mention the variance with 10.2 and trunk.  If you are also seeing the variance in 10.1, then can we assume that the non-deterministic variance aspect of this issue is not a regression, but is an existing problem with Derby since who-knows-when?  That might correlate with Mike's earlier comments about that being a potential issue in the past (hash codes based on DDL)...

> I'm wondering if there is something going on, correlated to the DDL
> statements executed by the repro script, which sometimes results in
> tables that return "reasonable" estimates, and sometimes results in
> tables that return "infinite" estimates.

This is a great question.  A follow-up question would be whether this DDL-based variance is a regression from 10.1 or has been there from the "beginning".

As you can tell, my primary focus here has been on potentially regressed behavior.  As I've been looking at this problem I have been asking myself "what changed from 10.1 to 10.2 to cause this sudden slow-down?" So that's where my mind (and all of my questions) have been leading...

And on that point, I mentioned in my last comment that the JUMP behavior "does not take effect" for 10.3, but it does for 10.1.  I spent some time tracing through the code and quickly realized that this change in behavior is an indirect result of DERBY-1357.  Here's why.

As explained on the wiki page that describes jumping (http://wiki.apache.org/db-derby/JoinOrderPermutations), the optimizer will find the first complete join order for a query before attempting to jump.  In the repro query the first join order is simply "0 1 2 3 4 5 6 7".  The associated cost is 5.607E62.  Using the row counts calculated for this join order, the "target" join order then becomes "1 4 2 6 7 0 3 5".  This means that the next 8 calls to "getNextPermutation()" will return partial join orders that build up to this target, i.e.:

1 -1 -1 -1 -1 -1 -1 -1
1  4 -1 -1 -1 -1 -1 -1
1  4  2 -1 -1 -1 -1 -1
etc.
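
The build-up toward the target can be pictured with a small standalone
sketch (illustrative only, not the optimizer code itself):

    // Sketch of how the "jump" reveals the target join order one position
    // per call, with -1 marking positions that have not been filled in yet.
    public class JumpSketch {
        public static void main(String[] args) {
            int[] target = { 1, 4, 2, 6, 7, 0, 3, 5 };
            int[] current = new int[target.length];
            java.util.Arrays.fill(current, -1);
            for (int pos = 0; pos < target.length; pos++) {
                current[pos] = target[pos];          // one getNextPermutation() call
                System.out.println(java.util.Arrays.toString(current));
            }
        }
    }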

Prior to DERBY-1357 the optimizer "short-circuit" logic was not working, so even if a partial join order was more expensive than the best complete join order so far, the optimizer would continue to waste time optimizing join orders that were always going to return an estimate higher than the best so far.  So in 10.1 (which does not have the fix for DERBY-1357) we merrily continue on our way from [1 4 2 ... ] all the way up until we either time out or find a join order that is cheaper than 5.607E62.  We eventually find a cheaper join order after about 2500 permutations, and that's the point at which optimization stops (see details from my previous comment).

With 10.2 and trunk, though, the DERBY-1357 short-circuit logic has been corrected.  And as it happens, the estimated cost for the partial join order "1 4 2 6 7 0 3" is "Infinity"; since that's greater than 5.607E62, we do not advance the join order and thus we effectively skip the rest of it (because we know it's going to return a cost that's worse than 5.607E62).  That part is, I think, correct.
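
To make the pruning rule concrete, here is a tiny standalone sketch of the
idea (simplified; not the actual OptimizerImpl logic):

    // Simplified sketch of the short-circuit rule fixed by DERBY-1357: only
    // keep filling in a partial join order if it is still cheaper than the
    // best complete join order found so far.
    public class ShortCircuitSketch {
        static boolean shouldAdvance(double partialCost, double bestCompleteCost) {
            return partialCost < bestCompleteCost;
        }

        public static void main(String[] args) {
            double best = 5.607246651648704E62;   // cost of "0 1 2 3 4 5 6 7"
            // The partial order "1 4 2 6 7 0 3" costs Infinity, so it is not
            // advanced and the remainder of the jump target is skipped.
            System.out.println(shouldAdvance(Double.POSITIVE_INFINITY, best)); // false
        }
    }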

But having chosen to NOT advance the join order, now we come to the following block of code in OptimizerImpl:

    if (permuteState == JUMPING && !joinPosAdvanced && joinPosition >= 0)
    {
        //not feeling well in the middle of jump
        // Note: we have to make sure we reload the best plans
        // as we rewind since they may have been clobbered
        // (as part of the current join order) before we gave
        // up on jumping.
        reloadBestPlan = true;
        rewindJoinOrder();  //fall
        permuteState = NO_JUMP;  //give up
    }

Since we're in the middle of jumping and since we did *not* advance the join order, we execute this if block--i.e. we rewind the join order back to the beginning and start all over.  Or put another way, we gave up on the "JUMP" and went back to normal processing.  From there we then have to go through thousands and thousands of permutations before we find a plan that's cheaper than 5.607E62.  Depending on which database is in use, that can be ten thousand permutations or it could be forty thousand, which explains the variance in the time taken to complete optimization.

I think the changes for DERBY-1357 are correct; what's not clear is whether the "if" block shown above is really necessary.  The idea sort of makes sense to me: if we broke away from normal processing and jumped to a _bad_ join order then we want to abort the jump as quickly as possible and return to normal processing.  However, now that the optimizer "short-circuit" logic is fixed, I'm not so sure we need to revert back to "normal" processing when we jump to a bad join order.  If the join order is bad then the short-circuit logic will make sure that we do not waste "too much" (yes, that's an awfully subjective term) time trying out bad join orders.  So would it make sense to just "iterate" the target join order as usual without defaulting back to square one?

As a sanity check I commented the above "if" block out in the trunk codeline and ran the repro; when I did so, the script executed in about the same time as it does on the latest 10.1 branch.  So this seems to confirm my findings.  FAR more testing is needed, though, to see if this simple removal of the "if" block is a viable solution.

All of that said, I still think the BIGGER problems we are seeing are 1) DERBY-1905 and 2) as Bryan has said, the apparent variance in cost estimates based on DDL.  DERBY-1905 is definitely playing a huge role here, what with the Infinity cost estimates followed by NaN.  These kinds of estimates render the vast majority of the join order estimates useless, even if we do remove the "if" block mentioned above.

Also note: when I removed the "if" block as mentioned above, I found that sometimes the cost estimate for the "best" join order ("2 0 1 3 4 5 6 7") was "14116", and sometimes it was "247".  This was on the *same* database and so does not appear to be the result of DDL statements.  I'm still looking into this little tidbit to see what might be going on there...

As usual, some of this may be useful, some of it may not be. Please excuse me if I'm babbling...

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>    Affects Versions: 10.2.1.6, 10.1.3.1, 10.3.0.0
>            Reporter: Bryan Pendleton
>         Attachments: plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sql
>
>


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Dag H. Wanvik (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dag H. Wanvik updated DERBY-2130:
---------------------------------

    Derby Categories: [High Value Fix, Performance]  (was: [High Value Fix])

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: https://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.1.4
>            Reporter: Bryan Pendleton
>         Attachments: jumpReset.patch, plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sql
>
>

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Re: [jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by Manjula G Kutty <ma...@gmail.com>.
Bryan Pendleton (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]
>
>Bryan Pendleton updated DERBY-2130:
>-----------------------------------
>
>    Attachment: repro.sql
>
>reproduction script is 'repro.sql'.
>
>  
>
>>Optimizer performance slowdown from 10.1 to 10.2
>>------------------------------------------------
>>
>>                Key: DERBY-2130
>>                URL: http://issues.apache.org/jira/browse/DERBY-2130
>>            Project: Derby
>>         Issue Type: Bug
>>         Components: SQL, Performance
>>   Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>>           Reporter: Bryan Pendleton
>>        Attachments: repro.sql
>>
>>
>
>  
>
Hi Bryan,

I'm not an optimizer expert, but from my experience verifying the
DERBY-805 fix, I noticed that if you run the queries through ij they
take more time, but if you run the same query through a Java program
the time will be less.  If you want the numbers, I can send them to
you for both 10.1 and 10.2 with the queries I used.
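
For anyone who wants to repeat that comparison, a prepare can be timed from
a small Java program along these lines; the database name and SQL text
below are placeholders, not the actual repro query:

    import java.sql.*;

    // Minimal sketch: time PreparedStatement compilation from a Java
    // program instead of ij.
    public class PrepareTimer {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            try (Connection conn = DriverManager.getConnection("jdbc:derby:testdb")) {
                long start = System.currentTimeMillis();
                try (PreparedStatement ps =
                         conn.prepareStatement("SELECT * FROM T WHERE C1 = ?")) {
                    long elapsed = System.currentTimeMillis() - start;
                    System.out.println("prepare took " + elapsed + " ms");
                }
            }
        }
    }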

Thanks
Manjula

[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
     [ http://issues.apache.org/jira/browse/DERBY-2130?page=all ]

Bryan Pendleton updated DERBY-2130:
-----------------------------------

    Attachment: repro.sql

reproduction script is 'repro.sql'.

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL, Performance
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>            Reporter: Bryan Pendleton
>         Attachments: repro.sql
>
>


[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Bryan Pendleton (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12859624#action_12859624 ] 

Bryan Pendleton commented on DERBY-2130:
----------------------------------------

Hi Rick,

I think that Army's patch is a good one, but I'm afraid I don't have time right
now to push this issue forward.


> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: https://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.1, 10.2.1.6, 10.3.1.4
>            Reporter: Bryan Pendleton
>         Attachments: jumpReset.patch, plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sql
>
>



[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12455802 ] 
            
A B commented on DERBY-2130:
----------------------------

Thanks for the query plans, Bryan.  I took a quick look and nothing stood out, but I'll have to look more closely before I can say anything for sure.  Thank you for taking the time to pull out the "noise"--I think that'll be helpful.

On a different note, I added some System.outs to various places in the Optimizer to see what cost estimates the optimizer is calculating for different complete join orders in the top-level OptimizerImpl.  I then ran the repro query against the latest 10.1 branch and the latest 10.3 trunk.  The abbreviated output from what I saw is shown below.

Some things to note:

  0. Lots of "Infinity" and "NaN" cost estimates, which are probably the result of DERBY-1905 and DERBY-1260, respectively. This pretty much renders all but a handful of the cost estimates useless.

  1. In 10.1.3 we try to jump to a target join order and we SUCCESSFULLY do so.  In 10.3 we also try to jump to the same target join order, BUT for some reason the jump does NOT take effect.  So it looks like Bryan's question about whether or not permutation jumping works correctly was indeed well-founded.  This is, I think, the first thing to investigate.

  2. Because the 10.3 "jump" does not occur we spend a lot of time iterating through different permutations until we get to the "best join order" found by 10.1. Once we get there the estimated cost for 10.3 (on a 10.1 database) is finite but is much higher than the cost estimate of the same join order in 10.1 (namely, "247" in 10.1 vs "1736685" in 10.3).  So this is another point of interest.  This could again be the result of DERBY-1260, but it's hard to say at this point.

  3. This next observation is weird: when running 10.3 on a 10.1 database the query finishes after about 10,000 iterations (roughly 5 minutes on my laptop).  But when running 10.3 on a *10.3* database the query runs until it has exhausted all possible permutations (over 40,000).  The difference is in the calculated cost estimate for the "2 0 1 3 4 5 6 7" join order.  When running against a 10.1 database, 10.3 estimates a cost of 1736685, followed by a cost of 198589 for the next complete order, which then leads to eventual timeout.  But when running against a 10.3 database, the cost estimate is *Infinity* and thus the timeout value sits at 5E62, which we never reach.  I have absolutely no idea why this would be the case.  Anyone out there have any theories?  Bryan, can you perhaps try this out to see if you observe the same behavior?  You can connect to a 10.1 database by setting the "derby.database.allowPreReleaseUpgrade" property to "true".  Note that you should only use that on a _test_ 10.1 database, though; I don't think it's intended for production databases.

Generally speaking lines in the output are of the form:

-=> estimatedCost: j o i n O r d e r
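
(The traces below were produced by throwaway System.out calls; the
following standalone sketch shows the rough shape of such a print, using
made-up parameters rather than the real optimizer internals.)

    // Print one trace line of the form "-=> estimatedCost: j o i n O r d e r".
    public class TraceSketch {
        static void traceJoinOrder(double costEstimate, int[] joinOrder) {
            StringBuilder sb = new StringBuilder("-=> " + costEstimate + ":");
            for (int tableNumber : joinOrder) {
                sb.append(' ').append(tableNumber);
            }
            System.out.println(sb);
        }

        public static void main(String[] args) {
            traceJoinOrder(5.607246651648704E62, new int[] { 0, 1, 2, 3, 4, 5, 6, 7 });
            traceJoinOrder(247.4647753453739, new int[] { 2, 0, 1, 3, 4, 5, 6, 7 });
        }
    }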

--------------------------
10.1.3 on 10.1.3 database:
--------------------------

ij version 10.1
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> Infinity: 1 4 2 6 7 0 3 5
-=> NaN: 1 4 2 6 7 0 5 3
-=> NaN: 1 4 2 6 7 3 0 5
-=> NaN: 1 4 2 6 7 3 5 0
-=> NaN: 1 4 2 6 7 5 0 3
-=> NaN: 1 4 2 6 7 5 3 0
-=> NaN: 1 4 2 7 0 3 5 6
-=> NaN: 1 4 2 7 0 3 6 5
-=> NaN: 1 4 2 7 0 5 3 6
-=> NaN: 1 4 2 7 0 5 6 3
-=> NaN: 1 4 2 7 0 6 3 5
-=> NaN: 1 4 2 7 0 6 5 3

<snip ~2,600 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 0 2 3
-=> NaN: 1 7 6 5 4 0 3 2
-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> 247.4647753453739: 2 0 1 3 4 5 6 7
-==> Just set bestCost to: 247.4647753453739
-=-> TIMED OUT: exceeded 247.4647753453739 ms.
-=> 247.4647753453739: 2 0 1 3 4 5 6 7
ELAPSED TIME = 82961 milliseconds

------------------------
10.3 on *10.1.3* database:
------------------------

(on the *same* database that was created by 10.1)

ij version 10.3
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> 1.8253980349686949E106: 0 1 2 3 4 5 6 7
-=> 3.021440681134236E107: 0 1 2 3 4 5 7 6
-=> 8.374147544754745E119: 0 1 2 3 4 6 5 7
-=> 8.374147544754745E119: 0 1 2 3 4 6 7 5
-=> 5.993308284393933E109: 0 1 2 3 4 7 5 6
-=> 2.736734332816085E108: 0 1 2 3 4 7 6 5
-=> 3.973494182787193E115: 0 1 2 3 5 4 6 7
-=> 6.577018684224087E116: 0 1 2 3 5 4 7 6
-=> 3.973494182787193E115: 0 1 2 3 5 6 4 7
-=> 6.577018684224087E116: 0 1 2 3 5 7 4 6
-=> 8.374147544754745E119: 0 1 2 3 6 4 5 7
-=> 8.374147544754745E119: 0 1 2 3 6 4 7 5
-=> 1.0852895218002162E119: 0 1 2 3 6 5 4 7
-=> 8.374147544754745E119: 0 1 2 3 6 7 4 5
-=> 5.993308284393933E109: 0 1 2 3 7 4 5 6
-=> 2.736734332816085E108: 0 1 2 3 7 4 6 5
-=> 1.2945545894290903E112: 0 1 2 3 7 5 4 6
-=> 2.736734332816085E108: 0 1 2 3 7 6 4 5
-=> NaN: 0 1 2 4 5 3 6 7
-=> NaN: 0 1 2 4 5 3 7 6
-=> NaN: 0 1 2 4 5 6 3 7
-=> NaN: 0 1 2 4 5 6 7 3
-=> NaN: 0 1 2 4 5 7 3 6
-=> NaN: 0 1 2 4 5 7 6 3

<snip ~10,000 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 0 2 3
-=> NaN: 1 7 6 5 4 0 3 2
-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> 1736685.050617207: 2 0 1 3 4 5 6 7
-==> Just set bestCost to: 1736685.050617207
-=> 198589.53739726284: 2 0 1 3 4 5 7 6
-==> Just set bestCost to: 198589.53739726284
-=-> TIMED OUT: exceeded 198589.53739726284 ms.
-=> 198589.53739726284: 2 0 1 3 4 5 7 6
ELAPSED TIME = 336370 milliseconds

------------------------
10.3 on 10.3 database:
------------------------

ij version 10.3
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
-==> Just set bestCost to: 5.607246651648704E62
-=-> JUMPING to: 1 4 2 6 7 0 3 5
-=> 1.8253980349686949E106: 0 1 2 3 4 5 6 7
-=> 3.021440681134236E107: 0 1 2 3 4 5 7 6
-=> 8.374147544754745E119: 0 1 2 3 4 6 5 7
-=> 8.374147544754745E119: 0 1 2 3 4 6 7 5
-=> 5.993308284393933E109: 0 1 2 3 4 7 5 6
-=> 2.736734332816085E108: 0 1 2 3 4 7 6 5
-=> 3.973494182787193E115: 0 1 2 3 5 4 6 7
-=> 6.577018684224087E116: 0 1 2 3 5 4 7 6
-=> 3.973494182787193E115: 0 1 2 3 5 6 4 7
-=> 6.577018684224087E116: 0 1 2 3 5 7 4 6
-=> 8.374147544754745E119: 0 1 2 3 6 4 5 7
-=> 8.374147544754745E119: 0 1 2 3 6 4 7 5
-=> 1.0852895218002162E119: 0 1 2 3 6 5 4 7
-=> 8.374147544754745E119: 0 1 2 3 6 7 4 5
-=> 5.993308284393933E109: 0 1 2 3 7 4 5 6
-=> 2.736734332816085E108: 0 1 2 3 7 4 6 5
-=> 1.2945545894290903E112: 0 1 2 3 7 5 4 6
-=> 2.736734332816085E108: 0 1 2 3 7 6 4 5
-=> NaN: 0 1 2 4 5 3 6 7
-=> NaN: 0 1 2 4 5 3 7 6
-=> NaN: 0 1 2 4 5 6 3 7
-=> NaN: 0 1 2 4 5 6 7 3
-=> NaN: 0 1 2 4 5 7 3 6

<snip ~10,000 permutations with cost estimates of "NaN">

-=> NaN: 1 7 6 5 4 2 0 3
-=> NaN: 1 7 6 5 4 2 3 0
-=> NaN: 1 7 6 5 4 3 0 2
-=> NaN: 1 7 6 5 4 3 2 0
-=> Infinity: 2 0 1 3 4 5 6 7
-=> NaN: 2 0 1 3 4 5 7 6
-=> NaN: 2 0 1 3 4 6 5 7
-=> NaN: 2 0 1 3 4 6 7 5
-=> NaN: 2 0 1 3 4 7 5 6
-=> NaN: 2 0 1 3 4 7 6 5
-=> NaN: 2 0 1 3 5 4 6 7

<snip ~30,000 permutations, most with cost estimates of "NaN", a few with estimates of Infinity>

-=> NaN: 7 6 5 4 3 1 0 2
-=> NaN: 7 6 5 4 3 1 2 0
-=> NaN: 7 6 5 4 3 2 0 1
-=> NaN: 7 6 5 4 3 2 1 0
-=-> ENDED because ran out of permutations.
-=> 5.607246651648704E62: 0 1 2 3 4 5 6 7
ELAPSED TIME = 1145530 milliseconds

I'm still trying to make sense of this data, but perhaps this info will be helpful in tracking down the issue(s) here...

> Optimizer performance slowdown from 10.1 to 10.2
> ------------------------------------------------
>
>                 Key: DERBY-2130
>                 URL: http://issues.apache.org/jira/browse/DERBY-2130
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance, SQL
>    Affects Versions: 10.2.1.6, 10.3.0.0, 10.1.3.1
>            Reporter: Bryan Pendleton
>         Attachments: plan10_1_2_1.txt, plan10_2.txt, plans.diff, repro.sql
>
>


[jira] Updated: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "Kathey Marsden (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/DERBY-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kathey Marsden updated DERBY-2130:
----------------------------------

          Derby Info: [Regression]
    Derby Categories: [High Value Fix]

I wonder if the patch attached to this issue can be pursued as-is, or if there is more work that needs to be done. It seems it would be good to get this performance regression fixed.




[jira] Commented: (DERBY-2130) Optimizer performance slowdown from 10.1 to 10.2

Posted by "A B (JIRA)" <ji...@apache.org>.
    [ http://issues.apache.org/jira/browse/DERBY-2130?page=comments#action_12454963 ] 
            
A B commented on DERBY-2130:
----------------------------

Just a random thought that occurred to me...

The query in question is written as:

prepare p1 as '
select count(distinct t1.c5) 
    from table19 t1 
        inner join table05 t3 on ( t1.c5 = t3.c2)  
        inner join table06 t4 on ( t3.c33 = t4.c1)  
        inner join v3 t5 on ( t4.c34 = t5.c1)  
        inner join table20 t2 on ( t5.c3 = t2.c5)  
        inner join v1 t6 on ( t2.c35 = t6.c1)  
        inner join v2 t7 on ( t6.c3 = t7.c5)  
        inner join v1 t8 on ( t7.c22 = t8.c1)  
    where (((t8.c3 = ? or t8.c3 = ? or t8.c3 = ? or t8.c3 = ?)))';

Is the use of the explicit "inner join" syntax an intentional decision?  Or, more specifically, is the intention that the tables be joined in a specific order?  From the Optimizer's standpoint this query becomes one large outer query with 8 Optimizables in the from list, so the Optimizer will attempt to evaluate all combinations of the 8 Optimizables (up to 8! = 40,320 complete join orders) before quitting--which takes a (long) while.  If, however, the query was written with a specific join order in mind, you could use a Derby optimizer override to fix the join order of the 8 Optimizables to the order in which they are specified:

prepare p1 as '
select count(distinct t1.c5) 
    from  --DERBY-PROPERTIES joinOrder=FIXED
            table19 t1 
            inner join table05 t3 on ( t1.c5 = t3.c2)  
            inner join table06 t4 on ( t3.c33 = t4.c1)  
            inner join v3 t5 on ( t4.c34 = t5.c1)  
            inner join table20 t2 on ( t5.c3 = t2.c5)  
            inner join v1 t6 on ( t2.c35 = t6.c1)  
            inner join v2 t7 on ( t6.c3 = t7.c5)  
            inner join v1 t8 on ( t7.c22 = t8.c1)  
    where (((t8.c3 = ? or t8.c3 = ? or t8.c3 = ? or t8.c3 = ?)))';

When I did this, the 'prepare' finished in about 3 seconds on the same Red Hat Linux machine with the latest trunk (sane build).
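
If it would help while experimenting, one way to double-check that the FIXED join order is actually being honored is to look at the runtime statistics after the statement executes.  Here is a minimal ij sketch; the four parameter values are just placeholders, since I don't know what type t8.c3 actually is:

-- collect runtime statistics on this connection
call SYSCS_UTIL.SYSCS_SET_RUNTIMESTATISTICS(1);

-- execute the prepared statement; replace the placeholder values with real ones
execute p1 using 'values (1, 2, 3, 4)';

-- print the plan that was actually used, including the chosen join order
values SYSCS_UTIL.SYSCS_GET_RUNTIMESTATISTICS();

If the override took effect, the statistics output should show the tables being joined in the order they appear in the FROM clause.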

Of course, I still think this issue is one that needs to be resolved.  But if a) you're looking for a temporary workaround and b) you know the order in which you would like the tables to be joined, maybe the optimizer override can help in the short term.

Please note, though, that Derby optimization as a whole would benefit a *lot* more if you continued to help resolve this problem :)

