You are viewing a plain text version of this content. The canonical link for it is here.
Posted to derby-dev@db.apache.org by Ole Solberg <Ol...@Sun.COM> on 2006/03/13 11:14:49 UTC
Regression Test Observation: increased testsuite execution time for
nist and storetests
I had a look at the testsuite execution times for the last week and see
that two testsuites have had significant increases:
storetests: with svn 383955 (2006-03-07) duration increased
3 to 11 times depending on platform.
http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383955.txt
nist: with svn 383172 (2006-03-04) duration increased
4 to 7 times depending on platform.
http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
Run as part of derbyall these increases are hardly significant.
--
Ole Solberg, Database Technology Group,
Sun Microsystems, Trondheim, Norway
Re: Regression Test Observation: increased testsuite execution time
for nist and storetests
Posted by Mike Matrigali <mi...@sbcglobal.net>.
yea that test, the reclaim_longcol has a few cases with relative
large rows, and some waits for post commit so it is
expected that it would take longer.
I would be surprised if nist changed at all.
Ole Solberg wrote:
> Ole Solberg wrote:
>
>> I had a look at the testsuite execution times for the last week and
>> see that two testsuites have had significant increases:
>>
>> storetests: with svn 383955 (2006-03-07) duration increased
>> 3 to 11 times depending on platform.
>> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383955.txt
>
> Ok: a new test was added:
> A
> java/testing/org/apache/derbyTesting/functionTests/tests/storetests/st_reclaim_longcol.java
>
>
>>
>> nist: with svn 383172 (2006-03-04) duration increased
>> 4 to 7 times depending on platform.
>> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
>>
>>
>> Run as part of derbyall these increases are hardly significant.
>>
>>
>
>
Re: Regression Test Observation: increased testsuite execution time
for nist and storetests
Posted by Ole Solberg <Ol...@Sun.COM>.
Ole Solberg wrote:
> I had a look at the testsuite execution times for the last week and see
> that two testsuites have had significant increases:
>
> storetests: with svn 383955 (2006-03-07) duration increased
> 3 to 11 times depending on platform.
> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383955.txt
Ok: a new test was added:
A
java/testing/org/apache/derbyTesting/functionTests/tests/storetests/st_reclaim_longcol.java
>
> nist: with svn 383172 (2006-03-04) duration increased
> 4 to 7 times depending on platform.
> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
>
>
> Run as part of derbyall these increases are hardly significant.
>
>
--
Ole Solberg, Database Technology Group,
Sun Microsystems, Trondheim, Norway
Re: Regression Test Observation: increased testsuite execution time
for nist and storetests
Posted by Army <qo...@sbcglobal.net>.
Army wrote:
> Ole Solberg wrote:
>
>> I had a look at the testsuite execution times for the last week and
>> see that two testsuites have had significant increases:
>>
>> nist: with svn 383172 (2006-03-04) duration increased
>> 4 to 7 times depending on platform.
>> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
>
>
> Looks like this is caused by my changes for DERBY-805.
Actually, the nist slow-down is the result of my changes for DERBY-1007 and/or
DERBY-1073 (depending on how you look at it) rather than for DERBY-805.
Prior to DERBY-1073, if the optimizer for a subquery timed out in one round and
then we returned for a subsequent round after that, the subquery's optimizer
would immediately return without further optimization. My changes to DERBY-1073
added logic to reset the timeout state in cases where the subquery receives
predicates that have been pushed to it from outer queries, so that the optimizer
can use the predicates to consider better plans. At the same time, in an
attempt to avoid increasing the compilation time for subqueries which do not
receive pushed predicates, I added logic to keep the "immediately return without
further optimization" functionality as the 'default' behavior (i.e. behavior
when there are no pushed predicates). But as the nist slowdown shows, this
"default" approach was incomplete.
The reason the DERBY-1073 patch is incomplete is because of DERBY-1007, which
resets the bestCost field of a subquery's optimizer at the start of every round
of optimization. This behavior (which is correct) combined with the 'default'
behavior of DERBY-1073 means that when we "immediately return without further
optimization" for a subquery, the bestCost value will be uninitialized, i.e.
Double.MAX_VALUE. That value will then be passed back up to the outer query,
which will add it to its running cost and end up with "infinity". If the outer
query has already found a best plan at this point, then the infinite cost for
the current join order will cause the outer query's optimizer to reject that
join order in favor of the previous best. What this ultimately means is that
whatever outer join order was deemed "best" at the time the subquery first timed
out will remain as the "best" from that point on, even if there are other join
orders which, in the absence of a subquery timeout, would actually have been
cheaper. Since the best cost so far is what determines the optimizer's timeout
value, the optimizer for the outer query could end up spending more time trying
to find an optimal join order than we really want to spend.
The above scenario is what is causing the slowdown in nist/dml132.sql after the
DERBY-1007 and DERBY-1073 checkins. The outer query has 10 optimizables, one of
which is subquery. While the outer optimizer is costing different join orders,
the subquery's optimizer eventually times out. At that point the best join
order for the outer query has a "best cost" of 6,616,444. That cost in turn
becomes the timeout value (in milliseconds) for the outer query, translating to
about 110 minutes. Thus the optimizer will continue to optimize for up to 110
minutes or until it has exhausted all possible permutations of its 10
optimizables. In this case it exhausts the permutations first--but only after
about 10 minutes of optimization.
To verify this behavior I ran the nist/dml132.sql test against 10.1 with
derby.optimizer.noTimeout set to true, which tells the optimizer to exhaust all
possible join orders before picking the best one. When I did this the
dml132.sql test took roughly the same amount of time to compile and execute as
it does with the current 10.2 trunk (i.e. just over 10 minutes). That's what I
would expect given the above explanation.
Prior to the fixes for DERBY-1007 and DERBY-1073, the subquery would still
timeout--but since bestCost was never reset, the subquery would return whatever
value was in bestCost from the previous round of optimization. That cost, while
still technically incorrect (see below), was nonetheless more reasonable than
"infinity", and thus the outer optimizer was still able to do meaningful cost
comparisons and to find cheaper join orders. With each new "best" join order
the timeout value for the outer optimizer would decrease accordingly, until it
eventually became something reasonable like 15-ish seconds, at which point the
optimizer would stop and just pick the best plan so far.
Note, though, that while things worked before DERBY-1007 and DERBY-1073, the
cost that the subquery was returning after it timed out was actually
incorrect--it was the "best cost" for the subquery as it was stored at the point
it timed out, instead of the best cost for the subquery w.r.t its current
position in the outer join order.
So all of that said, the fix for the nist slowdown issue is to take the changes
for DERBY-1073 one step further by recognizing when a subquery's optimizer has
timed out and, instead of returning immediately, letting the optimizer continue
until it finds it's first "best plan" for the current round. Or put more
simply, we need to delay the timeout until the optimizer has a legitimate cost
that it can return.
I have a patch to address this issue and I plan to attach it to DERBY-1073 as
follow-up patch to the original. I ran derbyall with the changes last night and
saw no failures, but I did make some slight changes this morning and so would
like to re-run derbyall again before posting.
In the meantime, if anyone who reads this has any feedback/comments, that would
be great.
Thanks much,
Army
Re: Regression Test Observation: increased testsuite execution time
for nist and storetests
Posted by Army <qo...@sbcglobal.net>.
Ole Solberg wrote:
> I had a look at the testsuite execution times for the last week and see
> that two testsuites have had significant increases:
>
> nist: with svn 383172 (2006-03-04) duration increased
> 4 to 7 times depending on platform.
> http://www.multinet.no/~solberg/public/Apache/Derby/UpdateInfo/383172.txt
Looks like this is caused by my changes for DERBY-805. One test in
particular--dml132.sql--is taking much longer now than it used to. I'm looking
into it and will post when I have more...
Thanks for pointing this out.
Army