You are viewing a plain text version of this content. The canonical link for it is here.

Posted to derby-dev@db.apache.org by "Bryan Pendleton (JIRA)" <de...@db.apache.org> on 2006/07/14 20:07:14 UTC

[jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

    [ http://issues.apache.org/jira/browse/DERBY-1357?page=comments#action_12421183 ] 
            
Bryan Pendleton commented on DERBY-1357:
----------------------------------------

Hi Army,

Your change looks very good to me. I like the way you named the boolean temp variable; it makes the code substantially easier to read. And thanks for adding the good comments in the code.

One question, though: what is the actual symptom of this bug? If I am understanding it correctly, the symptom is that the optimizer may pointlessly continue to investigate a possible query plan which it should already be able to reject as being too expensive.

Does that mean that the only symptom here is that the optimizer may waste some resources?

I guess the thing I'm still struggling with is the bit of the change which adds "reloadBestPlan = true" at about line 530. Does that part of the patch have any actual user-visible impact on which query plan would be chosen, such that we need to have a release note here? Or is that just another bit of resource wastage that we're improving?


> Short-circuit logic in optimizer appears to be incorrect...
> -----------------------------------------------------------
>
>                 Key: DERBY-1357
>                 URL: http://issues.apache.org/jira/browse/DERBY-1357
>             Project: Derby
>          Issue Type: Bug
>          Components: Performance
>    Affects Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.2.0.0, 10.1.2.0, 10.1.1.1, 10.1.1.2, 10.1.2.1, 10.1.2.2, 10.1.2.3, 10.1.2.4
>            Reporter: A B
>         Assigned To: A B
>            Priority: Minor
>             Fix For: 10.2.0.0
>
>         Attachments: d1357_v1.patch, d1357_v1.stat
>
>
> When considering different join orders for the FROM tables in a query, the optimizer will decide to give up on a join order midway through if the cost of that (partial) join order is already higher than the cost of some other *complete* join order that the optimizer previously found.  This "short-circuiting" of a join order can save compilation time.
> That said, the logic to perform this "short-circuit" of a join order is currently as follows (from OptimizerImpl.java):
>   /*
>   ** Pick the next table in the join order, if there is an unused position
>   ** in the join order, and the current plan is less expensive than
>   ** the best plan so far, and the amount of time spent optimizing is
>   ** still less than the cost of the best plan so far, and a best
>   ** cost has been found in the current join position.  Otherwise,
>   ** just pick the next table in the current position.
>   */
>   boolean joinPosAdvanced = false;
>   if ((joinPosition < (numOptimizables - 1)) &&
>     ((currentCost.compare(bestCost) < 0) ||
>     (currentSortAvoidanceCost.compare(bestCost) < 0)) &&
>     ( ! timeExceeded )
>     )
>   {
>     ...
>   }
> There are two "current costs" in this statement: one for the cost if the optimizer is calculating a "sort avoidance" plan (which it does if there is a required row ordering on the results) and one if it is calculating a plan for which row order is not important.
> I admit that I'm not all that familiar with what goes on with the costing of a sort-avoidance plan, but inspection of the code shows that, when there is no required row ordering--i.e. when we aren't looking for a sort-avoidance plan--the cost field of currentSortAvoidanceCost will always be 0.0d. That in turn means that in the above "if" statement, the check for
>   ((currentCost.compare(bestCost) < 0) ||
>     (currentSortAvoidanceCost.compare(bestCost) < 0))
> will always return true (because bestCost should--in theory--always be greater than 0.0d).  Thus, in the case where we don't have a required row ordering, the short-circuit logic will fail even if currentCost is actually greater than bestCost.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

Posted by Army <qo...@gmail.com>.

Bryan Pendleton wrote:

> I agree that this is a hard case, and I can see both sides. But I do think
> that we can craft an umbrella release note which will be valuable to users
> as they move to the new release, covering the multiple changes to the
> optimizer as a single logical unit. It seems to me that the release note
> should observe that:

<snip details of release note suggestions>

This sounds like a great idea.  I will try to write something up and post it to 
DERBY-1357 and/or DERBY-781 shortly.

Thanks for excellent suggestion,
Army

Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

Posted by Bryan Pendleton <bp...@amberpoint.com>.

> I posted possible release notes to DERBY-1357 and DERBY-781. 

Hi Army,

I just wanted to note that I read both the release notes, and the
wiki page too, and I think they are quite clear and will be useful
to customers when they upgrade. Thanks for putting them together!

bryan

Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

Posted by Army <qo...@gmail.com>.

Bryan Pendleton wrote:
> 
> I'd be glad to help by reviewing the release note, etc.

I posted possible release notes to DERBY-1357 and DERBY-781.  If you or anyone 
else has comments/suggestions, please do let me know...

Thanks,
Army

Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

Posted by Bryan Pendleton <bp...@amberpoint.com>.

>  So adding a test case with 
> timeout disabled won't show the symptom, and adding it with timeout 
> enabled will likely lead to intermittent failures on different machines. 

I agree. Unreliable tests can be worse than no tests at all. I'm comfortable
with this decision.

> release note would have to be something generic like "optimizer 
> fixes/enhancements could cause the optimizer to choose different query 
> plans, so performance of existing queries may change". 

I agree that this is a hard case, and I can see both sides. But I do think
that we can craft an umbrella release note which will be valuable to users
as they move to the new release, covering the multiple changes to the
optimizer as a single logical unit. It seems to me that the release note
should observe that:

  - there have been some optimizer changes in this release (with pointers
    to the specific bugs fixed)
  - the optimizer will now consider additional query plans
  - the optimizer's timeout logic has changed
  - we believe that the optimizer will now choose better plans, and choose
    them more quickly.
  - if you think you are seeing an optimizer-related problem as a result of
    upgrading to this release, please capture the actual query plan that is
    being chosen by the old release and by the new release, and send them
    to us so we can help with the analysis, and here's how to do that.

Perhaps that last bullet point should be a page on the wiki, and the release
note could just point to http://wiki.apache.org/db-derby/OptimizerDiagnosisTips
(or whatever the actual page turns out to be).

I'd be glad to help by reviewing the release note, etc.

thanks,

bryan

Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer appears to be incorrect...

Posted by Army <qo...@gmail.com>.

Thank you for reviewing this patch, Bryan.  My comments below.

> One question, though: what is the actual symptom of this bug? 
> If I am understanding it correctly, the symptom is that the 
> optimizer may pointlessly continue to investigate a possible 
> query plan which it should already be able to reject as being
> too expensive.

Yes, this is correct.

> Does that mean that the only symptom here is that the optimizer
> may waste some resources?

Directly, yes.  But indirectly, this could affect the the query plan that 
optimizer chooses because of optimizer "timeout".  The more time the optimizer 
wastes optimizing join orders that it should already be able to reject, the 
fewer join orders it's going to see before timeout occurs.  The patch for 
DERBY-1357 avoids the wasted time, thereby allowing the optimizer to 
(potentially) see more join orders before timeout occurs, and thus potentially 
allowing the optimizer to find a better plan.

That said, you may be wondering why I didn't add any new tests with this patch. 
  The answer is that the only time this patch will show a difference in query 
plan is if timeout is enabled (as explained above), but as I've learned from 
other tests already in the harness (esp. wisconsin, predicatePushdown, 
subquery), a test that enables timeout and then prints query plans is bound to 
fail (usually intermittently) on one system or another (because timeout is 
actually based on milliseconds and the amount of work that the optimizer can do 
in xxx milliseconds depends a lot on what machine is being used).  So adding a 
test case with timeout disabled won't show the symptom, and adding it with 
timeout enabled will likely lead to intermittent failures on different machines. 
  Hence no new tests.

The diff that occurs for predicatePushdown is an admittedly accidental way to 
see that the patch is in fact short-circuiting plans (see above comment on the 
master update), but other than that I'm not sure how else to measure the symptom.

> I guess the thing I'm still struggling with is the bit of the 
> change which adds "reloadBestPlan = true" at about line 530. 
> Does that part of the patch have any actual user-visible impact 
> on which query plan would be chosen, 

Great question.  Timeout effects aside, the plan that the optimizer *chooses* 
will be the same with or without this "reloadBestPlan" change.  However, the 
plan that the optimizer *generates* could change with this patch.  Let's say 
that the optimizer chooses some access plan for a subquery in some specific join 
order and calls it the "best so far", then it starts processing a new join order 
in which the plan for the subquery changes, and then short-circuit happens 
(either because of the DERBY-1357 changes or because of timeout).  Without the 
"reloadBestPlan" change the optimizer would have *chosen* the "best so far" plan 
as the overall best one--but since it didn't reload the best plan for the 
subquery, it would actually end up *generating* the latest plan (i.e. the plan 
that was found as part of the short-circuited join order).  By setting 
reloadBestPlan to true, we make sure that the optimizer will reload the "best so 
far" plan as its best, and thus that's the plan that will ultimately get generated.

> such that we need to have a release note here?

Hmm, another good question.  I guess I didn't think this warranted a release 
note because it's fixing an existing problem in the optimizer (or actually, two: 
incorrect short-circuit logic and failure to reload best plans when 
short-circuit occurs) and now the optimizer is generating the correct plan based 
on short-circuiting.  But since, as you say, this could cause existing query 
plans to change, I guess this might merit a release note after all...?

If that's the case, then the same will be true of DERBY-781 (and was also true 
of DERBY-1007 and DERBY-805 changes...), and in all cases the release note would 
have to be something generic like "optimizer fixes/enhancements could cause the 
optimizer to choose different query plans, so performance of existing queries 
may change".  Theoretically the performance change should always be a *good* 
thing, and it seems a bit odd to include a release note saying "queries might 
run more quickly now!" :)  However, I guess the practical side of it is that 
some queries could potentially end up with worse query plans, so perhaps a 
release note is a good idea...

If that's the consensus, I'll try to post a suggested release note to this issue 
(and maybe to DERBY-781, as well).

Hopefully that answers your questions--but if not, please feel free to ask more.

And thanks again for the review; I appreciate it!
Army