Posted to derby-dev@db.apache.org by Army <qo...@sbcglobal.net> on 2006/03/03 01:47:52 UTC

[OPTIMIZER] Optimizer "timeout" for subqueries?

While working on changes for DERBY-805 and DERBY-1007, I noticed that the 
"timeout" mechanism for the Derby optimizer is never reset for subqueries.  As 
is usually required when talking about the optimizer, let me give an example of 
what I mean.

If we have a query such as:

select <...> from
   (select t1.i, t2.j from t1, t2 where <...>) X1,
    T3
where <...>

then we would have one "outer" query and one "subquery".  The outer query would 
be "select <...> from X1, T3", the subquery would be "select t1.i, t2.j from t1, 
t2".

In this case Derby will create two instances of OptimizerImpl: one for the outer 
query (call it OI_OQ) and one for the subquery (call it OI_SQ).  Each 
OptimizerImpl has its own timeout "clock" that it initializes at creation 
time--but never resets.  If timeout occurs, the OptimizerImpl will stop 
searching for "the" best plan and will just take the best plan found so far.

That said, for every permutation of the outer query a call will be made to 
optimize the subquery.  To simplify things, let's assume there are only two 
permutations of the outer query: one with join order {X1, T3} and another with 
join order {T3, X1}.

Now let's say we're looking at the first permutation {X1, T3}.  OI_OQ will make 
a call to optimize the subquery represented by OI_SQ.  Let's further say that 
the subquery tries some permutation {T1, T2} and then times out.  It then 
returns the plan information for {T1, T2} to the outer query.  The outer query, 
which has *not* yet timed out, then decides to try its second permutation {T3, 
X1}.  So it again makes a call to optimize the subquery.  In this case, the 
subquery--which has already timed out--will *immediately* return without trying 
to optimize anything.  The outer query will then make a decision about its 
second permutation based on the un-optimized subquery's plan results.
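The scenario above can be sketched in a few lines of Java. This is a hypothetical illustration of the timeout logic, not Derby's actual implementation--the class name, constructor, and the simplified getNextPermutation() are mine, though the timeOptimizationStarted and timeExceeded fields are named after the real ones touched by the patch shown later in this mail. The key point is that the clock is set once at construction and, once timeExceeded flips to true, every later call bails out immediately:

```java
// Hypothetical sketch of a per-OptimizerImpl timeout clock that is set
// once at creation and never reset.  Names other than the two timeout
// fields are illustrative, not Derby's actual internals.
class OptimizerSketch {
    private final long timeOptimizationStarted = System.currentTimeMillis();
    private boolean timeExceeded = false;
    private final long maxTimeMillis;

    OptimizerSketch(long maxTimeMillis) { this.maxTimeMillis = maxTimeMillis; }

    /** Returns true if another join-order permutation should be tried. */
    boolean getNextPermutation() {
        if (timeExceeded) {
            // Timed out on a previous call (possibly a previous "round"
            // driven by the outer query): give up immediately.
            return false;
        }
        if (System.currentTimeMillis() - timeOptimizationStarted > maxTimeMillis) {
            timeExceeded = true;
            return false; // stop searching; keep the best plan found so far
        }
        // ... otherwise propose the next join order ...
        return true;
    }
}
```

Because timeExceeded is sticky, the second optimization request from the outer query hits the first "return false" without doing any work--which is exactly the behavior in question.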

My question is: is this intentional behavior?

On the one hand, I can sort of see the logic in not trying to optimize the 
subquery after the first time, because if we've "timed out" then we just want to 
"wrap things up" as quickly as possible.

On the other hand, the outer query--which is the query representing the 
top-level statement that the user is executing--has *not* yet timed out, so it 
seems to me like the second call to optimize the subquery should get a "clean 
start" instead of timing out right away.

This hasn't really been an issue to date because the "best plan" chosen by the 
subquery is typically independent of the outer query's current permutation--with 
the exception of "outerCost", which is passed in from the outer query and is 
factored into the subquery's cost estimates.  Because of this relative 
independence, the plan chosen by the subquery would rarely (if ever?) change 
with different permutations of the outer query, so if the subquery timed out 
once there was no point in trying to re-optimize it again later.

With my changes for DERBY-805, though, I'm introducing the notion of pushing 
predicates from outer queries down into subqueries--which means that the outer 
join order can have a very significant impact on the plan chosen by the 
subquery.  But because the timeout mechanism is never reset, we could end up 
skipping the second optimization phase of the subquery, which means we never get 
a chance to see how much the outer predicates can help, and thus we could end up 
skipping over some plans that have the potential to give us significant 
performance improvement.

It's hard to come up with a concrete example of this because it depends on 
optimizer timeout, which varies with different machines.  But in my work with 
DERBY-805 I've seen cases where the optimizer ignores pushed predicates because 
of subquery timeout and thus ends up choosing a (hugely) sub-optimal plan.

All of that said, in an attempt to resolve this issue I added the following two 
lines to the "prepForNextRound()" method in OptimizerImpl (that method was only 
recently added as part of a change for DERBY-805):

+	/* Reset timeout state since we want to timeout on a per-round
+	 * basis.  Otherwise if we timed out during a previous round and
+	 * then we get here for another round, we'll immediately
+	 * "timeout" again before optimizing any of the Optimizables--
+	 * which means we could be missing out on some helpful
+	 * optimizations that weren't available for the previous
+	 * round (esp. use of indexes/qualifiers that are made possible
+	 * by predicates pushed down from the outer query).  So by
+	 * resetting the timeout state here, we give ourselves a chance
+	 * to do some legitimate optimization before (potentially) timing
+	 * out again.  This could allow us to find some access plans that
+	 * are significantly better than anything we found in previous
+	 * rounds.
+	 */
+	timeOptimizationStarted = System.currentTimeMillis();
+	timeExceeded = false;

I applied this change locally in combination with the DERBY-1007 patch and ran 
derbyall on Linux Red Hat with IBM 1.4.2 with no failures.

Does anyone out there know if this is a safe change?  I would appreciate any 
feedback/advice here.  If I don't hear any objections, I'll file a sub-task to 
DERBY-805 for this issue and post the above patch there.

Many thanks in advance to anyone who might have input/feedback here.
Army


Re: [OPTIMIZER] Optimizer "timeout" for subqueries?

Posted by Army <qo...@sbcglobal.net>.
Satheesh Bandaram wrote:
> Just remembered, during discussions with Jeffl, he mentioned another one... 
> Improve unnesting of subqueries with more than one table. More work. :-)

<snip list of Optimizer To-Do's>

One other thing to add to this list was mentioned a while ago on derby-dev here:

http://mail-archives.apache.org/mod_mbox/db-derby-dev/200412.mbox/%3c41B9F482.4020900@sbcglobal.net%3e

In that email Jack talked about his BackingStoreHashtable changes and the need 
to update Optimizer costing to account for those changes (which were committed 
as part of 10.1).  In that email Jack says:

 > I would like to work on this, changing BackingStoreHashtable to spill to
 > disk when the hash table gets large, and changing the optimizer to
 > understand that large hash tables are allowed, but more costly.

The changes to spill to disk were implemented for 10.1, but I don't think the 
optimizer was ever updated accordingly.  In particular, it still checks the 
memory requirements for the hash join and, if they're too high, the Optimizer will 
skip it.  I believe this is the code that does so, in the 
OptimizerImpl.considerCost() method:

/*
** Skip this access path if it takes too much memory.
**
** NOTE: The default assumption here is that the number of rows in
** a single scan is the total number of rows divided by the number
** of outer rows.  The optimizable may over-ride this assumption.
**
** NOTE: This is probably not necessary here, because we should
** get here only for nested loop joins, which don't use memory.
*/
if (!optimizable.memoryUsageOK(
        estimatedCost.rowCount() / outerCost.rowCount(),
        maxMemoryPerTable))
{
	if (optimizerTrace)
	{
		trace(SKIPPING_DUE_TO_EXCESS_MEMORY, 0, 0, 0.0, null);
	}
	return;
}

Note that the comment regarding "this is probably not necessary" appears to be 
out of date--there are situations where we can and do get to this code when 
considering the cost of hash joins.

So as another "To-Do", I think we need to somehow update the optimizer to 
account for the fact that if the hash join requires a lot of memory and thus 
will (probably?) spill over, the cost should be adjusted accordingly--instead of 
just skipping the plan altogether.
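To make the suggestion concrete, here is one purely illustrative way the costing could account for spill instead of rejecting the plan. Everything here is an assumption of mine--the class, the method, and the flat per-row spill penalty are placeholders, not Derby's API or an actual costing proposal:

```java
// Hypothetical sketch: adjust a hash join's estimated cost upward when
// the hash table would exceed the in-memory limit and spill to disk,
// rather than skipping the access path entirely.  All names and the
// penalty constant are illustrative only.
class HashJoinCostSketch {
    // Assumed extra cost per spilled row (stands in for disk I/O).
    static final double SPILL_PENALTY_PER_ROW = 2.0;

    static double adjustedCost(double baseCost, double rowsPerScan,
                               double maxMemoryRows) {
        if (rowsPerScan <= maxMemoryRows) {
            return baseCost; // hash table fits in memory: no adjustment
        }
        // Rows beyond the in-memory limit spill to disk and cost more,
        // but the plan remains a candidate instead of being skipped.
        double spilledRows = rowsPerScan - maxMemoryRows;
        return baseCost + spilledRows * SPILL_PENALTY_PER_ROW;
    }
}
```

The real penalty would presumably come from the store's I/O cost model rather than a constant, but the shape of the change is the same: memoryUsageOK() failing should feed into the estimate, not short-circuit it.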

Army


Re: [OPTIMIZER] Optimizer "timeout" for subqueries?

Posted by Army <qo...@sbcglobal.net>.
Satheesh Bandaram wrote:

> Good plan. So in your example OI_SQ would be reoptimized by resetting
> the timeout only if join-predicates could come from OI_OQ? 

Only if join predicates *came* from OI_OQ, yes.  I don't think "could come" 
would be the right approach because then we might reset the timeout state even 
though there aren't actually any predicates this round.  It's possible the 
predicate will be pushed for one round and not pushed for another, and we would 
only want to reset the timeout when the predicate was actually pushed.

> If there is another OI_SQ1 which may have new join-predicates this round, 
> your changes would not reset timeout for OI_SQ?

If OI_OQ is pushing a predicate down, it will only push it to the subquery for 
which it is intended--either OI_SQ1 or OI_SQ2.  The predicate will then be scoped 
to that subquery and passed into that subquery's OptimizerImpl.  The 
OptimizerImpl will in turn see the scoped predicate and reset its timeout state. 
  The OptimizerImpl for the other subquery will _not_ have received the 
predicate and so it will not reset its state.

Extending my original example to be:

select <...> from
   (select t1.i, t2.j from t1, t2 where <...>) X1,
   (select t3.a, t4.b from t3, t4 where <...>) X2,
   T5
where T5.p = X2.a

Then we'd have OI_OQ for the outer select, OI_SQ1 for X1 and OI_SQ2 for X2.  In 
this case OI_OQ would only pass the predicate to OI_SQ2 since that's the target 
subquery of the reference "X2.a".  So OI_SQ2 would reset its timeout state. 
OI_SQ1 would not.
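The routing described above can be sketched as follows. This is a toy model, not Derby code--the classes, the string-keyed map, and the column-reference parsing are all illustrative assumptions; the only point is that exactly one subquery optimizer sees the scoped predicate and resets:

```java
// Hypothetical sketch: a pushed predicate is routed only to the
// subquery its column reference targets, and only that subquery's
// optimizer resets its timeout state.  Names are illustrative.
import java.util.HashMap;
import java.util.Map;

class PredicateRoutingSketch {
    static class SubOptimizer {
        boolean timeoutReset = false;
        void receiveScopedPredicate(String pred) {
            // Receiving a scoped predicate triggers a timeout reset.
            timeoutReset = true;
        }
    }

    // Maps a derived-table name like "X2" to that subquery's optimizer.
    final Map<String, SubOptimizer> subqueries = new HashMap<>();

    void pushPredicate(String columnRef, String predicate) {
        String target = columnRef.split("\\.")[0]; // "X2.a" -> "X2"
        SubOptimizer so = subqueries.get(target);
        if (so != null) {
            so.receiveScopedPredicate(predicate); // only the target resets
        }
    }
}
```

So for "T5.p = X2.a", only the optimizer registered under "X2" would reset; "X1" stays timed out if it timed out before.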

All of that said, my changes for DERBY-805 won't actually push the predicate in 
the above example--DERBY-805 will only push predicates into UNION nodes.  But 
the argument is still the same...

Does that answer your question?
Army


Re: [OPTIMIZER] Optimizer "timeout" for subqueries?

Posted by Satheesh Bandaram <sa...@Sourcery.Org>.

Army wrote:

> After re-reading what I wrote, I realized we have a basis for logic to
> only reset the timeout state when it could potentially be beneficial
> to do so--i.e. when we have predicates pushed from the outer query.
>
> I could add logic to check to see if the OptimizerImpl has predicates
> from outer queries and, if so, then reset the timeout state; 

Good plan. So in your example OI_SQ would be reoptimized by resetting
the timeout only if join-predicates could come from OI_OQ? I hope we are
able to correctly recognize the cases where there may be multiple
OptimizerImpl for each subquery. If there is another OI_SQ1 which may
have new join-predicates this round, your changes would not reset
timeout for OI_SQ?

Satheesh

> otherwise, since the subquery's "best plan" probably won't change much
> from the previous round, we would leave the timeout state as it is. 
> This approach allows the optimizer to consider plans that use pushed
> predicates while at the same time reducing the likelihood that queries
> which have timed out in the past (prior to 10.2) would now take longer
> to compile.
>
> I'll code that up and run derbyall tonight; in the meantime, I'd still
> like to hear comments/suggestions from any readers out there...
>
> Thanks,
> Army
>
>
>


Re: [OPTIMIZER] Optimizer "timeout" for subqueries?

Posted by Army <qo...@sbcglobal.net>.
Army wrote:

> This hasn't really been an issue to date because the "best plan" chosen 
> by the subquery is typically independent of the outer query's current 
> permutation--with the exception of "outerCost", which is passed in from 
> the outer query and is factored into the subquery's cost estimates.  
> Because of this relative independence, the plan chosen by the subquery 
> would rarely (if ever?) change with different permutations of the outer 
> query, so if the subquery timed out once there was no point in trying to 
> re-optimize it again later.
> 
> With my changes for DERBY-805, though, I'm introducing the notion of 
> pushing predicates from outer queries down into subqueries--which means 
> that the outer join order can have a very significant impact on the plan 
> chosen by the subquery.

After re-reading what I wrote, I realized we have a basis for logic to only 
reset the timeout state when it could potentially be beneficial to do so--i.e. 
when we have predicates pushed from the outer query.

I could add logic to check to see if the OptimizerImpl has predicates from outer 
queries and, if so, then reset the timeout state; otherwise, since the 
subquery's "best plan" probably won't change much from the previous round, we 
would leave the timeout state as it is.  This approach allows the optimizer to 
consider plans that use pushed predicates while at the same time reducing the 
likelihood that queries which have timed out in the past (prior to 10.2) would 
now take longer to compile.
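In sketch form, the conditional reset would look something like this. The timeout fields are the ones from my earlier patch; everything else (the class, the pushed-predicate flag, the test hook) is an illustrative placeholder rather than the actual DERBY-805 code:

```java
// Hypothetical sketch of prepForNextRound() resetting timeout state
// only when predicates were actually pushed from the outer query this
// round.  Only the two timeout fields mirror real Derby names.
class RoundAwareOptimizerSketch {
    private long timeOptimizationStarted = System.currentTimeMillis();
    private boolean timeExceeded = false;
    private boolean pushedPredicatesThisRound = false;

    void addScopedPredicate() { pushedPredicatesThisRound = true; }

    void prepForNextRound() {
        if (pushedPredicatesThisRound) {
            // Outer predicates may enable index/qualifier plans we have
            // never costed, so give this round a fresh timeout budget.
            timeOptimizationStarted = System.currentTimeMillis();
            timeExceeded = false;
        }
        // Otherwise the best plan is unlikely to change from last round;
        // keep any timed-out state so we don't pay the full optimization
        // cost again for nothing.
        pushedPredicatesThisRound = false;
    }

    boolean timedOut() { return timeExceeded; }
    void forceTimeout() { timeExceeded = true; } // test hook only
}
```

With this shape, a subquery that timed out in a round with no pushed predicates stays timed out, while a round that actually receives a scoped predicate gets a clean start.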

I'll code that up and run derbyall tonight; in the meantime, I'd still like to 
hear comments/suggestions from any readers out there...

Thanks,
Army