Posted to derby-dev@db.apache.org by Bryan Pendleton <bp...@amberpoint.com> on 2006/02/08 19:31:25 UTC

Re: Regression Test Harness handling of "duration" field

David W. Van Couvering wrote:
> My understanding is that it's the wall clock time.

>>   derbyall   630 6 624 0
>>     Duration   45.6%

OK. So it's saying that this particular run of 'derbyall'
took 45.6% of the wall clock time that 'derbyall' took
on August 2, 2005.

That makes sense.
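
Or, in code form - a made-up sketch of how I assume the number is
computed, not the harness's actual code:

public class DurationExample {
    public static void main(String[] args) {
        // Hypothetical wall clock times, in milliseconds:
        long baselineMillis = 10000000; // derbyall on August 2, 2005
        long currentMillis  =  4560000; // this particular run
        double durationPct = 100.0 * currentMillis / baselineMillis;
        System.out.printf("Duration   %.1f%%%n", durationPct); // Duration   45.6%
    }
}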

Given that this particular run failed badly, we probably
don't care about the Durations, then.

But in general, we'd probably want to keep our eyes open
for Duration values that started to get significantly
higher than 100%, because that would mean that we might
have accidentally introduced a performance regression.

Perhaps we could have some sort of trigger, so that if a
suite experienced a duration of, say, 150%, that was
treated as a regression failure, even if all the tests in
that suite passed?
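
Something along these lines, just to sketch the idea (the names and
numbers are made up; this isn't actual harness code):

public class DurationTrigger {
    // Fail the suite if it runs at or above 150% of the baseline time.
    private static final double FAIL_THRESHOLD_PCT = 150.0;

    public static boolean isDurationRegression(long currentMillis,
                                               long baselineMillis) {
        double durationPct = 100.0 * currentMillis / baselineMillis;
        return durationPct >= FAIL_THRESHOLD_PCT;
    }

    public static void main(String[] args) {
        // Made-up numbers: a run taking 1.6x the baseline would be
        // flagged even if every individual test passed.
        System.out.println(isDurationRegression(16000000L, 10000000L)); // true
    }
}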

thanks,

bryan


Re: Regression Test Harness handling of "duration" field

Posted by Dy...@Sun.COM.
>>>>> "BP" == Bryan Pendleton <bp...@amberpoint.com> writes:

    BP> [...]

    BP> Perhaps we could have some sort of trigger, so that if a
    BP> suite experienced a duration of, say, 150%, that was
    BP> treated as a regression failure, even if all the tests in
    BP> that suite passed?

Given that the "Derby way" requires all bug fixes to be accompanied
by a regression test verifying that the bug has not been
re-introduced, AND that all such regression tests should be part of
derbyall (this is what I've been told; if it is not correct, please
let me know), it should only be a matter of time before your trigger
fires. Unless you create a new baseline regularly, of course...
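
To put some entirely made-up numbers on that:

public class BaselineDrift {
    public static void main(String[] args) {
        double baselineMinutes = 100.0;  // hypothetical baseline derbyall run
        double minutesPerNewTest = 0.5;  // hypothetical cost of each added test
        // With no per-test slowdown at all, added tests alone push the
        // Duration past a fixed 150% threshold eventually:
        for (int fixes = 0; fixes <= 120; fixes += 40) {
            double pct = 100.0 * (baselineMinutes + fixes * minutesPerNewTest)
                    / baselineMinutes;
            System.out.printf("%3d fixes -> Duration %.0f%%%n", fixes, pct);
        }
        // prints 100%, 120%, 140%, 160% -- the trigger fires by 120 fixes
    }
}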

-- 
dt


Re: Regression Test Harness handling of "duration" field

Posted by "David W. Van Couvering" <Da...@Sun.COM>.
+1 on the trigger, I had suggested that earlier.

David

Bryan Pendleton wrote:
> [...]
> 
> Perhaps we could have some sort of trigger, so that if a
> suite experienced a duration of, say, 150%, that was
> treated as a regression failure, even if all the tests in
> that suite passed?

Re: Regression Test Harness handling of "duration" field

Posted by Mike Matrigali <mi...@sbcglobal.net>.
Tracking test time is nice, but remember that the point of the
tests is not performance measurement.  Having nothing else, it
is reasonable to look at cases as described below - but
we shouldn't fool ourselves that it is a great way to measure
performance regressions.  As has been reported on the list,
tests tend to do one thing once, and any sort of outside
influence from the machine or other processes can easily
skew the numbers (true of any performance measurement, of course).

It would be better if we had some sort of performance
regression test suite.  Personally I like two flavors of
such a beast.  One is a very directed test that is meant
to measure pieces of the system, more like unit testing
than modeling a user application.  Cloudscape had
such a beast, but it was not donated, as it contained a lot
of customer-based data which we could not share.  It also
sort of grew like the current test harness, so it may be
better to start fresh rather than port the code.

The other flavor is standardized open source benchmarks.  It
looks like some contributors on the list are working with
TPC-like benchmarks.  These test the whole system and are
good for system regression testing, but it is hard work for a
developer to go from "this test is 10% slower" to the line
of code that caused it.

I am just wondering if this is a problem that JUnit or some
other standard open source harness can handle, rather than
creating yet another harness in Derby.  I don't really know
much about JUnit.  The features I would like from such a harness
(roughly sketched after the lists below) are:

o control an init/cleanup routine per test
o control an init/cleanup routine per thread in the test
o control the number of threads
o control the number of iterations of the test
o control the number of repeats of those iterations
o collect the elapsed time of each of the pieces (per test, per
user, and overall test)
o allow properties to be passed in to control test behavior

So you could write a simple insert test and then with the same
implementation try out:
o 1000 inserts
o 10 runs of 1000 inserts
o 1000 inserts, 10 users
o 10 runs of 1000 inserts, 10 users

extra credit:
o collect I/O stats (I don't think there is a 100% pure Java way)
o collect system vs. user time
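
Roughly the shape I have in mind, as a plain Java sketch (all names
are made up; this is neither Derby nor JUnit code):

import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class PerfHarness {

    /** A performance test supplies these hooks; the harness drives them. */
    public interface PerfTest {
        void initTest(Properties props) throws Exception;  // once per test
        void initThread(int threadId) throws Exception;    // once per thread
        void iterate(int threadId) throws Exception;       // one unit of work
        void cleanupThread(int threadId) throws Exception;
        void cleanupTest() throws Exception;
    }

    public static void run(final PerfTest test, Properties props)
            throws Exception {
        // Behavior is controlled by properties passed in from outside.
        final int threads    = Integer.parseInt(props.getProperty("threads", "1"));
        final int iterations = Integer.parseInt(props.getProperty("iterations", "1000"));
        final int repeats    = Integer.parseInt(props.getProperty("repeats", "1"));

        test.initTest(props);
        long overallStart = System.currentTimeMillis();
        for (int rep = 0; rep < repeats; rep++) {
            List<Thread> workers = new ArrayList<Thread>();
            for (int t = 0; t < threads; t++) {
                final int threadId = t;
                workers.add(new Thread(new Runnable() {
                    public void run() {
                        try {
                            test.initThread(threadId);
                            long start = System.currentTimeMillis();
                            for (int i = 0; i < iterations; i++) {
                                test.iterate(threadId);
                            }
                            // elapsed time per thread ("per user")
                            System.out.println("thread " + threadId + ": "
                                    + (System.currentTimeMillis() - start) + " ms");
                            test.cleanupThread(threadId);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                }));
            }
            for (Thread w : workers) w.start();
            for (Thread w : workers) w.join();
        }
        // elapsed time for the overall test
        System.out.println("overall: "
                + (System.currentTimeMillis() - overallStart) + " ms");
        test.cleanupTest();
    }
}

Then "10 runs of 1000 inserts, 10 users" is just threads=10,
iterations=1000, repeats=10 with an iterate() that does one insert.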

Bryan Pendleton wrote:
> [...]
> 
> But in general, we'd probably want to keep our eyes open
> for Duration values that started to get significantly
> higher than 100%, because that would mean that we might
> have accidentally introduced a performance regression.
> 
> Perhaps we could have some sort of trigger, so that if a
> suite experienced a duration of, say, 150%, that was
> treated as a regression failure, even if all the tests in
> that suite passed?