Posted to dev@lucene.apache.org by Michael McCandless <lu...@mikemccandless.com> on 2015/03/19 18:24:46 UTC

Tests with beating hearts

How hard/reasonable would it be to see a full thread dump on every
heartbeat of a slow running test when the test finally times out?

I ask because ... I've seen a number of recent test failures that seem
to be hung, but since we only get a single full thread dump when the
test finally experiences a heart attack, we can't really know.

If we had N thread dumps, we could compare and see if the offending
thread(s) are caught in different places each time?
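
As a rough illustration of what collecting N dumps could look like, here
is a minimal Java sketch that uses only the JDK's
Thread.getAllStackTraces(). The class name ThreadDumpSampler and its
methods are hypothetical; this is not how the test runner produces its
timeout dump.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper, for illustration only.
final class ThreadDumpSampler {
  /** Renders one dump of all live threads in this JVM as plain text. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /** Takes up to count dumps, sleeping intervalMillis between consecutive dumps. */
  static List<String> sample(int count, long intervalMillis) {
    List<String> dumps = new ArrayList<>();
    for (int i = 0; i < count; i++) {
      if (i > 0) {
        try {
          Thread.sleep(intervalMillis);
        } catch (InterruptedException e) {
          // Restore the interrupt flag and stop sampling early.
          Thread.currentThread().interrupt();
          break;
        }
      }
      dumps.add(dumpAllThreads());
    }
    return dumps;
  }
}
```

Comparing the same thread's frames across the returned dumps is then
enough to tell whether it sits in one place or keeps moving.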

Mike McCandless

http://blog.mikemccandless.com

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

Re: Tests with beating hearts

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> I wouldn't want to see the thread stacks until the test finally timed
> out at which point I'd like to see N of them.

I don't think it will be possible since there is no explicit trigger
that signals a timeout... although maybe there is -- I think the test
framework attempts to send an interrupt signal to all threads within
the test group; maybe this could be used as a stimulus for dumping
previously saved stack traces.

I once had a more "intelligent" periodic stack analyzer -- one that
analyzed a series of stack traces and looked for the common root, the
diverging stack frame, etc. It could tell you immediately whether a
given thread was stalled on something or if it was running in a loop
under a given method, etc. It should be in randomized runner's
history.
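
A minimal Java sketch of that kind of analysis, under stated assumptions:
this is not the original analyzer from randomized runner's history, and
the class StackSeriesAnalyzer and its methods are hypothetical. Each
sample is one thread's stack as a list of frame strings, top frame first
(the order StackTraceElement[] uses), so the shared root is the longest
common run of frames at the end of every list.

```java
import java.util.List;

// Hypothetical analyzer, for illustration only.
final class StackSeriesAnalyzer {
  /** Number of frames, counted from the root (bottom) of the stack, shared by all samples. */
  static int commonRootDepth(List<List<String>> samples) {
    int depth = 0;
    while (true) {
      String expected = null;
      for (List<String> sample : samples) {
        int idx = sample.size() - 1 - depth;
        if (idx < 0) {
          return depth; // this sample has no more frames
        }
        String frame = sample.get(idx);
        if (expected == null) {
          expected = frame;
        } else if (!expected.equals(frame)) {
          return depth; // first diverging frame found
        }
      }
      if (expected == null) {
        return depth; // no samples at all
      }
      depth++;
    }
  }

  /** Classifies the series as stalled (identical stacks) or moving under a shared frame. */
  static String describe(List<List<String>> samples) {
    if (samples.isEmpty()) {
      return "no samples";
    }
    int depth = commonRootDepth(samples);
    boolean identical = true;
    for (List<String> sample : samples) {
      identical &= sample.size() == depth;
    }
    if (identical) {
      return depth == 0 ? "no frames" : "stalled at " + samples.get(0).get(0);
    }
    if (depth == 0) {
      return "no common root";
    }
    List<String> first = samples.get(0);
    // The topmost frame shared by every sample; stacks diverge above it.
    return "moving under " + first.get(first.size() - depth);
  }
}
```

Frames are compared as plain strings here. If the strings include line
numbers, a thread looping inside one method will diverge at that method's
own frame, so comparing on class and method name only may be the more
useful choice.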

> It would spawn a new thread for each test case right?  And it'd have
> to stop that thread when the test completes (success or failure)...

Pretty much. Otherwise you'd run into problems with thread leak
detection that's built into the runner. Or you could make this thread
run per the entire suite, not for each individual test -- this would
be much faster (in particular in the presence of repeats).

Dawid

Re: Tests with beating hearts

Posted by Michael McCandless <lu...@mikemccandless.com>.

On Thu, Mar 19, 2015 at 1:38 PM, Dawid Weiss
<da...@cs.put.poznan.pl> wrote:

> There is no extra "heartbeat" emitted from the forked process back to
> the parent, but there could be. If a stalled process emitted a thread
> dump on every heartbeat this would look scary though (especially for
> slower machines/tests), wouldn't it?

I agree: this would be too much.

> It's this issue, Mike:
> https://github.com/carrotsearch/randomizedtesting/issues/132

Thanks.

> I'd love to say I can look at fixing it soon, but I have plenty of
> things on the backlog right now. I promise to return to it eventually.
> Sorry!

OK!

> Also, this does *not* have to be implemented in the runner itself...
> it can be a JUnit test rule that would spawn its own watchdog and, if
> the test doesn't return by the given deadline, take periodic stack
> trace probes, emitting them to regular sysout. You wouldn't get these
> dumps immediately (because sysout is normally suppressed from the
> console), but eventually, if a test times out, all of the sysouts
> would be printed.

This would maybe be a good solution...

I wouldn't want to see the thread stacks until the test finally timed
out at which point I'd like to see N of them.

It would spawn a new thread for each test case right?  And it'd have
to stop that thread when the test completes (success or failure)...

Mike McCandless

http://blog.mikemccandless.com

Re: Tests with beating hearts

Posted by Dawid Weiss <da...@cs.put.poznan.pl>.

> How hard/reasonable would it be to see a full thread dump on every
> heartbeat of a slow running test when the test finally times out?

The "heartbeat" message does not actually originate from the forked
process; it is merely information issued by the test runner to let
you know that:

1) the forked process hasn't died,
2) the forked runner's main method has not completed.

There is no extra "heartbeat" emitted from the forked process back to
the parent, but there could be. If a stalled process emitted a thread
dump on every heartbeat this would look scary though (especially for
slower machines/tests), wouldn't it?

It's this issue, Mike:
https://github.com/carrotsearch/randomizedtesting/issues/132

I'd love to say I can look at fixing it soon, but I have plenty of
things on the backlog right now. I promise to return to it eventually.
Sorry!

Also, this does *not* have to be implemented in the runner itself...
it can be a JUnit test rule that would spawn its own watchdog and, if
the test doesn't return by the given deadline, take periodic stack
trace probes, emitting them to regular sysout. You wouldn't get these
dumps immediately (because sysout is normally suppressed from the
console), but eventually, if a test times out, all of the sysouts
would be printed.
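
A minimal sketch of such a watchdog, in plain Java so that it stands on
its own. The class StackProbeWatchdog and its parameters are
hypothetical, and the JUnit wiring is left out; the assumption is that a
rule would create the watchdog before running the test and close it in a
finally block afterwards.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical watchdog, for illustration only.
final class StackProbeWatchdog implements AutoCloseable {
  private final List<String> probes = new CopyOnWriteArrayList<>();
  private final ScheduledExecutorService scheduler;

  /** Starts probing target every intervalMillis once deadlineMillis has elapsed. */
  StackProbeWatchdog(Thread target, long deadlineMillis, long intervalMillis) {
    scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
      Thread t = new Thread(r, "stack-probe-watchdog");
      t.setDaemon(true); // never keeps the JVM alive on its own
      return t;
    });
    scheduler.scheduleAtFixedRate(() -> {
      StringBuilder sb = new StringBuilder("probe of " + target.getName() + "\n");
      for (StackTraceElement frame : target.getStackTrace()) {
        sb.append("    at ").append(frame).append('\n');
      }
      probes.add(sb.toString());
      System.out.print(sb); // regular sysout, as suggested above
    }, deadlineMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  /** Probes taken so far, oldest first. */
  List<String> probes() {
    return probes;
  }

  /** Stops probing and waits for the watchdog thread to finish. */
  @Override
  public void close() {
    scheduler.shutdownNow();
    try {
      scheduler.awaitTermination(5, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
```

Waiting for the watchdog thread in close() is meant to keep it from
being reported as a leaked thread once the test has finished. A test
that returns before the deadline produces no probes at all.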

Dawid

Re: Tests with beating hearts

Posted by Robert Muir <rc...@gmail.com>.

+1, this is a great idea.

On Thu, Mar 19, 2015 at 1:24 PM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> How hard/reasonable would it be to see a full thread dump on every
> heartbeat of a slow running test when the test finally times out?
>
> I ask because ... I've seen a number of recent test failures that seem
> to be hung, but since we only get a single full thread dump when the
> test finally experiences a heart attack, we can't really know.
>
> If we had N thread dumps, we could compare and see if the offending
> thread(s) are caught in different places each time?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
