You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by "Bryan Pendleton (JIRA)" <de...@db.apache.org> on 2006/07/14 20:07:14 UTC
[jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
[ http://issues.apache.org/jira/browse/DERBY-1357?page=comments#action_12421183 ]
Bryan Pendleton commented on DERBY-1357:
----------------------------------------
Hi Army,
Your change looks very good to me. I like the way you named the boolean temp variable; it makes the code substantially easier to read. And thanks for adding the good comments in the code.
One question, though: what is the actual symptom of this bug? If I am understanding it correctly, the symptom is that the optimizer may pointlessly continue to investigate a possible query plan which it should already be able to reject as being too expensive.
Does that mean that the only symptom here is that the optimizer may waste some resources?
I guess the thing I'm still struggling with is the bit of the change which adds "reloadBestPlan = true" at about line 530. Does that part of the patch have any actual user-visible impact on which query plan would be chosen, such that we need to have a release note here? Or is that just another bit of resource wastage that we're improving?
> Short-circuit logic in optimizer appears to be incorrect...
> -----------------------------------------------------------
>
> Key: DERBY-1357
> URL: http://issues.apache.org/jira/browse/DERBY-1357
> Project: Derby
> Issue Type: Bug
> Components: Performance
> Affects Versions: 10.0.2.0, 10.0.2.1, 10.1.1.0, 10.2.0.0, 10.1.2.0, 10.1.1.1, 10.1.1.2, 10.1.2.1, 10.1.2.2, 10.1.2.3, 10.1.2.4
> Reporter: A B
> Assigned To: A B
> Priority: Minor
> Fix For: 10.2.0.0
>
> Attachments: d1357_v1.patch, d1357_v1.stat
>
>
> When considering different join orders for the FROM tables in a query, the optimizer will decide to give up on a join order midway through if the cost of that (partial) join order is already higher than the cost of some other *complete* join order that the optimizer previously found. This "short-circuiting" of a join order can save compilation time.
> That said, the logic to perform this "short-circuit" of a join order is currently as follows (from OptimizerImpl.java):
> /*
> ** Pick the next table in the join order, if there is an unused position
> ** in the join order, and the current plan is less expensive than
> ** the best plan so far, and the amount of time spent optimizing is
> ** still less than the cost of the best plan so far, and a best
> ** cost has been found in the current join position. Otherwise,
> ** just pick the next table in the current position.
> */
> boolean joinPosAdvanced = false;
> if ((joinPosition < (numOptimizables - 1)) &&
> ((currentCost.compare(bestCost) < 0) ||
> (currentSortAvoidanceCost.compare(bestCost) < 0)) &&
> ( ! timeExceeded )
> )
> {
> ...
> }
> There are two "current costs" in this statement: one for the cost if the optimizer is calculating a "sort avoidance" plan (which it does if there is a required row ordering on the results) and one if it is calculating a plan for which row order is not important.
> I admit that I'm not all that familiar with what goes on with the costing of a sort-avoidance plan, but inspection of the code shows that, when there is no required row ordering--i.e. when we aren't looking for a sort-avoidance plan--the cost field of currentSortAvoidanceCost will always be 0.0d. That in turn means that in the above "if" statement, the check for
> ((currentCost.compare(bestCost) < 0) ||
> (currentSortAvoidanceCost.compare(bestCost) < 0))
> will always return true (because bestCost should--in theory--always be greater than 0.0d). Thus, in the case where we don't have a required row ordering, the short-circuit logic will fail even if currentCost is actually greater than bestCost.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
Posted by Army <qo...@gmail.com>.
Bryan Pendleton wrote:
> I agree that this is a hard case, and I can see both sides. But I do think
> that we can craft an umbrella release note which will be valuable to users
> as they move to the new release, covering the multiple changes to the
> optimizer as a single logical unit. It seems to me that the release note
> should observe that:
<snip details of release note suggestions>
This sounds like a great idea. I will try to write something up and post it to
DERBY-1357 and/or DERBY-781 shortly.
Thanks for excellent suggestion,
Army
Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
Posted by Bryan Pendleton <bp...@amberpoint.com>.
> I posted possible release notes to DERBY-1357 and DERBY-781.
Hi Army,
I just wanted to note that I read both the release notes, and the
wiki page too, and I think they are quite clear and will be useful
to customers when they upgrade. Thanks for putting them together!
bryan
Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
Posted by Army <qo...@gmail.com>.
Bryan Pendleton wrote:
>
> I'd be glad to help by reviewing the release note, etc.
I posted possible release notes to DERBY-1357 and DERBY-781. If you or anyone
else has comments/suggestions, please do let me know...
Thanks,
Army
Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
Posted by Bryan Pendleton <bp...@amberpoint.com>.
> So adding a test case with
> timeout disabled won't show the symptom, and adding it with timeout
> enabled will likely lead to intermittent failures on different machines.
I agree. Unreliable tests can be worse than no tests at all. I'm comfortable
with this decision.
> release note would have to be something generic like "optimizer
> fixes/enhancements could cause the optimizer to choose different query
> plans, so performance of existing queries may change".
I agree that this is a hard case, and I can see both sides. But I do think
that we can craft an umbrella release note which will be valuable to users
as they move to the new release, covering the multiple changes to the
optimizer as a single logical unit. It seems to me that the release note
should observe that:
- there have been some optimizer changes in this release (with pointers
to the specific bugs fixed)
- the optimizer will now consider additional query plans
- the optimizer's timeout logic has changed
- we believe that the optimizer will now choose better plans, and choose
them more quickly.
- if you think you are seeing an optimizer-related problem as a result of
upgrading to this release, please capture the actual query plan that is
being chosen by the old release and by the new release, and send them
to us so we can help with the analysis, and here's how to do that.
Perhaps that last bullet point should be a page on the wiki, and the release
note could just point to http://wiki.apache.org/db-derby/OptimizerDiagnosisTips
(or whatever the actual page turns out to be).
I'd be glad to help by reviewing the release note, etc.
thanks,
bryan
Re: [jira] Commented: (DERBY-1357) Short-circuit logic in optimizer
appears to be incorrect...
Posted by Army <qo...@gmail.com>.
Thank you for reviewing this patch, Bryan. My comments below.
> One question, though: what is the actual symptom of this bug?
> If I am understanding it correctly, the symptom is that the
> optimizer may pointlessly continue to investigate a possible
> query plan which it should already be able to reject as being
> too expensive.
Yes, this is correct.
> Does that mean that the only symptom here is that the optimizer
> may waste some resources?
Directly, yes. But indirectly, this could affect the the query plan that
optimizer chooses because of optimizer "timeout". The more time the optimizer
wastes optimizing join orders that it should already be able to reject, the
fewer join orders it's going to see before timeout occurs. The patch for
DERBY-1357 avoids the wasted time, thereby allowing the optimizer to
(potentially) see more join orders before timeout occurs, and thus potentially
allowing the optimizer to find a better plan.
That said, you may be wondering why I didn't add any new tests with this patch.
The answer is that the only time this patch will show a difference in query
plan is if timeout is enabled (as explained above), but as I've learned from
other tests already in the harness (esp. wisconsin, predicatePushdown,
subquery), a test that enables timeout and then prints query plans is bound to
fail (usually intermittently) on one system or another (because timeout is
actually based on milliseconds and the amount of work that the optimizer can do
in xxx milliseconds depends a lot on what machine is being used). So adding a
test case with timeout disabled won't show the symptom, and adding it with
timeout enabled will likely lead to intermittent failures on different machines.
Hence no new tests.
The diff that occurs for predicatePushdown is an admittedly accidental way to
see that the patch is in fact short-circuiting plans (see above comment on the
master update), but other than that I'm not sure how else to measure the symptom.
> I guess the thing I'm still struggling with is the bit of the
> change which adds "reloadBestPlan = true" at about line 530.
> Does that part of the patch have any actual user-visible impact
> on which query plan would be chosen,
Great question. Timeout effects aside, the plan that the optimizer *chooses*
will be the same with or without this "reloadBestPlan" change. However, the
plan that the optimizer *generates* could change with this patch. Let's say
that the optimizer chooses some access plan for a subquery in some specific join
order and calls it the "best so far", then it starts processing a new join order
in which the plan for the subquery changes, and then short-circuit happens
(either because of the DERBY-1357 changes or because of timeout). Without the
"reloadBestPlan" change the optimizer would have *chosen* the "best so far" plan
as the overall best one--but since it didn't reload the best plan for the
subquery, it would actually end up *generating* the latest plan (i.e. the plan
that was found as part of the short-circuited join order). By setting
reloadBestPlan to true, we make sure that the optimizer will reload the "best so
far" plan as its best, and thus that's the plan that will ultimately get generated.
> such that we need to have a release note here?
Hmm, another good question. I guess I didn't think this warranted a release
note because it's fixing an existing problem in the optimizer (or actually, two:
incorrect short-circuit logic and failure to reload best plans when
short-circuit occurs) and now the optimizer is generating the correct plan based
on short-circuiting. But since, as you say, this could cause existing query
plans to change, I guess this might merit a release note after all...?
If that's the case, then the same will be true of DERBY-781 (and was also true
of DERBY-1007 and DERBY-805 changes...), and in all cases the release note would
have to be something generic like "optimizer fixes/enhancements could cause the
optimizer to choose different query plans, so performance of existing queries
may change". Theoretically the performance change should always be a *good*
thing, and it seems a bit odd to include a release note saying "queries might
run more quickly now!" :) However, I guess the practical side of it is that
some queries could potentially end up with worse query plans, so perhaps a
release note is a good idea...
If that's the consensus, I'll try to post a suggested release note to this issue
(and maybe to DERBY-781, as well).
Hopefully that answers your questions--but if not, please feel free to ask more.
And thanks again for the review; I appreciate it!
Army