Posted to dev@accumulo.apache.org by Bill Havanki <bh...@cloudera.com> on 2013/10/18 17:07:21 UTC

Appropriate changes to increase test reliability

Hello all,

We've often found while running Accumulo tests, at least under 1.4.x and
1.5.x, that we need to tweak aspects of the tests to get them to run more
reliably. This is usually extensions to timeouts defined for how long tests
should be allowed to run, how long shutdowns are expected to take, and the
like.

We suspect the need to make these changes arises from us running tests on
VMs, while the tests were perhaps originally aimed for execution on
dedicated physical resources.

Should we provide patches for these sorts of changes, so that others get
increased reliability? Or, conversely, does this indicate issues with our
test environments? How open are the tests to being changed for reliability?

I welcome your thoughts. Thank you!

Bill

-- 
// - - -
// Bill Havanki
// Solutions Architect, Cloudera
// - - -

Re: Appropriate changes to increase test reliability

Posted by Eric Newton <er...@gmail.com>.
> Should we provide patches for these sorts of changes, so that others get
> increased reliability? Or, conversely, does this indicate issues with our
> test environments? How open are the tests to being changed for reliability?

I live for test reliability; please send patches, suggestions and
criticisms.  Patches preferred. :-)

-Eric

Re: Appropriate changes to increase test reliability

Posted by Bill Havanki <bh...@cloudera.com>.
I'll look into our VM server farm some to see what can be done. It's super
overloaded, that's for sure.

The overridable timeout suggestion (thanks for that!) gave me an idea - I'm
playing with an option to add a multiplier to certain timeouts (the ones
bothering me) via the command line, e.g., run.py -f 3 to triple timeout
lengths. That can preserve the current timeout numbers in the tests but let
you scale them out for slower machines, VMs, etc.
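
A minimal sketch of the multiplier idea, not the actual run.py change: it assumes the script parses its options with argparse and keeps its timeouts in a dictionary. The -f flag comes from the message above; the long option name, the BASE_TIMEOUTS table, and its values are hypothetical placeholders.

```python
import argparse

# Hypothetical base timeouts in seconds; the real tests define their own values.
BASE_TIMEOUTS = {"test_run": 120, "shutdown": 60}


def parse_args(argv=None):
    """Parse the command line, accepting a timeout multiplier via -f."""
    parser = argparse.ArgumentParser(description="Run tests with scalable timeouts")
    parser.add_argument(
        "-f",
        "--timeout-factor",  # long name is an assumption, only -f is in the thread
        dest="timeout_factor",
        type=float,
        default=1.0,
        help="multiply selected timeouts by this factor (default: 1.0)",
    )
    args = parser.parse_args(argv)
    if args.timeout_factor <= 0:
        parser.error("timeout factor must be positive")
    return args


def scaled_timeouts(factor, base=BASE_TIMEOUTS):
    """Return a copy of the base timeouts scaled by factor; base is left untouched."""
    return {name: seconds * factor for name, seconds in base.items()}
```

With this shape, the numbers written in the tests stay as they are, and a slower machine only changes the factor, for example scaled_timeouts(parse_args(["-f", "3"]).timeout_factor).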

Bill


On Mon, Oct 21, 2013 at 6:56 AM, Steve Loughran <st...@hortonworks.com> wrote:

> On 18 October 2013 16:07, Bill Havanki <bh...@cloudera.com> wrote:
>
> > Hello all,
> >
> > We've often found while running Accumulo tests, at least under 1.4.x and
> > 1.5.x, that we need to tweak aspects of the tests to get them to run more
> > reliably. This is usually extensions to timeouts defined for how long tests
> > should be allowed to run, how long shutdowns are expected to take, and the
> > like.
> >
> > We suspect the need to make these changes arises from us running tests on
> > VMs, while the tests were perhaps originally aimed for execution on
> > dedicated physical resources.
> >
>
>
> It may just be slower machines: why not provide some overridable timeouts?
>
>
> >
> > Should we provide patches for these sorts of changes, so that others get
> > increased reliability? Or, conversely, does this indicate issues with our
> > test environments? How open are the tests to being changed for reliability?
> >
> >
>
> VMs have odd clock drift, especially if you don't have enough RAM to stop
> them being swapped out; time gets jerkier.
>
> Try adding more memory to your physical hosts, make sure the VMs are getting
> host time, not NTP, and that the TZs are consistent.
>
> Remember: they're not real machines.




Re: Appropriate changes to increase test reliability

Posted by Steve Loughran <st...@hortonworks.com>.
On 18 October 2013 16:07, Bill Havanki <bh...@cloudera.com> wrote:

> Hello all,
>
> We've often found while running Accumulo tests, at least under 1.4.x and
> 1.5.x, that we need to tweak aspects of the tests to get them to run more
> reliably. This is usually extensions to timeouts defined for how long tests
> should be allowed to run, how long shutdowns are expected to take, and the
> like.
>
> We suspect the need to make these changes arises from us running tests on
> VMs, while the tests were perhaps originally aimed for execution on
> dedicated physical resources.
>


It may just be slower machines: why not provide some overridable timeouts?
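
One way an overridable timeout could look, as a sketch only: each test keeps its built-in default, and an environment variable can replace it on slower machines. The TEST_TIMEOUT_ prefix and the get_timeout helper are hypothetical names, not part of the Accumulo test code.

```python
import os


def get_timeout(name, default_seconds, environ=os.environ):
    """Return the timeout for name, letting an environment variable override it.

    The variable name TEST_TIMEOUT_<NAME> is an assumption made for this sketch.
    Missing, non-numeric, or non-positive values fall back to the default.
    """
    raw = environ.get("TEST_TIMEOUT_" + name.upper())
    if raw is None:
        return float(default_seconds)
    try:
        value = float(raw)
    except ValueError:
        return float(default_seconds)
    return value if value > 0 else float(default_seconds)
```

A test would then call get_timeout("shutdown", 60) instead of hard-coding 60, and a slow VM could export TEST_TIMEOUT_SHUTDOWN=180 without touching the test source.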


>
> Should we provide patches for these sorts of changes, so that others get
> increased reliability? Or, conversely, does this indicate issues with our
> test environments? How open are the tests to being changed for reliability?
>
>

VMs have odd clock drift, especially if you don't have enough RAM to stop
them being swapped out; time gets jerkier.

Try adding more memory to your physical hosts, make sure the VMs are getting
host time, not NTP, and that the TZs are consistent.

Remember: they're not real machines.

-- 
CONFIDENTIALITY NOTICE
NOTICE: This message is intended for the use of the individual or entity to 
which it is addressed and may contain information that is confidential, 
privileged and exempt from disclosure under applicable law. If the reader 
of this message is not the intended recipient, you are hereby notified that 
any printing, copying, dissemination, distribution, disclosure or 
forwarding of this communication is strictly prohibited. If you have 
received this communication in error, please contact the sender immediately 
and delete it from your system. Thank You.