Posted to dev@mesos.apache.org by Alex Rukletsov <al...@mesosphere.io> on 2014/08/27 23:55:53 UTC

Crashed task is not reaped

While playing with Rendler <https://github.com/mesosphere/RENDLER> I
noticed that if the task (read: Python executor) crashes, the underlying
executor stays alive and is therefore not reaped, which leaves the task
running indefinitely. Here
<https://gist.github.com/rukletsov/4a74743c5b67f304e661> is part of the
slave log (the exception itself doesn't matter; it's there to test the
behaviour). I'm not sure whether it's a bug or a feature, but to me it
looks like a bug.

Regards,
Alex

Re: Crashed task is not reaped

Posted by Alex Rukletsov <al...@mesosphere.io>.
Hi Brian,

Thanks for the answer. This sounds reasonable. It would be nice to somehow
enforce this "crash-if-fail" behaviour in client executors, but that seems
barely possible.

Alex


On Thu, Aug 28, 2014 at 12:56 AM, Brian Wickman <wi...@apache.org> wrote:

> A "crashed" thread does not terminate the Python interpreter, so the
> executor here will stay alive. [...]

Re: Crashed task is not reaped

Posted by Brian Wickman <wi...@apache.org>.
A "crashed" thread does not terminate the Python interpreter, so the
executor here will stay alive.  If you want an abnormal thread exit to
result in an executor termination, you will have to implement that behavior
explicitly.  We use a thread liveness detector that looks something like:
https://gist.github.com/wickman/dc11896d782f9a2160b8
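A quick way to see the first point, assuming Python 3 (crashing_task and run_demo are illustrative names, not from the thread): the exception kills only the worker thread, and the main thread carries on after join(), so the process keeps running.

```python
import threading


def crashing_task():
    # Stand-in for a task thread that hits an unexpected error.
    raise RuntimeError("simulated task failure")


def run_demo():
    worker = threading.Thread(target=crashing_task)
    worker.start()
    worker.join()  # returns once the worker has died; its traceback goes to stderr
    # Execution continues here: only the worker thread ended, not the interpreter.
    return worker.is_alive()


if __name__ == "__main__":
    print("worker alive after crash: %s" % run_demo())
    print("main thread still running")
```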

When you create a thread, you call registry.register(thread). That thread
should call registry.unregister(self) just before it terminates normally. If
it terminates abnormally, the registry.dead event will be set. In practice,
our MainThread just does something like:

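# Event.wait() returns True once the event is set, False on timeout,
# so keep waking every 10 seconds until registry.dead has been set.
while not registry.dead.wait(timeout=10):
  pass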
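A minimal sketch of such a registry, assuming Python 3 and a daemon checker thread that polls thread liveness; ThreadRegistry, run_task and wait_for_crash are hypothetical names, and this is not the code in the gist above. A thread counts as crashed once it has been started and has finished while still registered.

```python
import threading


class ThreadRegistry(object):
    """Hypothetical thread-liveness registry; a sketch, not the code in the gist.

    Call register(thread) when a thread is created; the thread calls
    unregister() on itself just before returning normally. A daemon checker
    sets `dead` once a registered thread has stopped without unregistering.
    """

    def __init__(self, poll_interval=1.0):
        self.dead = threading.Event()
        self._lock = threading.Lock()
        self._threads = set()
        self._poll_interval = poll_interval
        checker = threading.Thread(target=self._check_loop, name="RegistryChecker")
        checker.daemon = True  # the checker never keeps the process alive by itself
        checker.start()

    def register(self, thread):
        with self._lock:
            self._threads.add(thread)

    def unregister(self, thread):
        with self._lock:
            self._threads.discard(thread)

    @staticmethod
    def _has_stopped(thread):
        """True only for a thread that was started and has since finished."""
        if thread.is_alive():
            return False
        try:
            thread.join(0)  # raises RuntimeError if the thread has not started yet
        except RuntimeError:
            return False
        return not thread.is_alive()

    def _check_loop(self):
        while not self.dead.is_set():
            # Check under the lock so a thread that unregisters and then exits
            # is never mistaken for one that died abnormally.
            with self._lock:
                if any(self._has_stopped(t) for t in self._threads):
                    self.dead.set()
                    return
            self.dead.wait(self._poll_interval)


def run_task(registry, fail):
    """Hypothetical task body: it unregisters itself only on a normal return."""
    if fail:
        raise RuntimeError("simulated task failure")
    registry.unregister(threading.current_thread())


def wait_for_crash(registry, timeout=10):
    """MainThread loop: wake every `timeout` seconds until a thread has died."""
    while not registry.dead.wait(timeout=timeout):
        pass
```

In an executor, MainThread would register each worker before starting it, call wait_for_crash(registry), and then exit with a non-zero status (for example sys.exit(1)) so the executor process, and with it the task, actually terminates. Polling keeps the workers unaware of the detector apart from the single unregister() call, at the cost of up to one poll_interval of detection latency.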

We also have a library (twitter.common.exceptions on PyPI) that provides a
class called ExceptionalThread, which guarantees that sys.excepthook() is
called when an uncaught exception escapes the thread (a plain
threading.Thread does not do this). You could implement similar behavior by
making all threads ExceptionalThreads and wrapping sys.excepthook with a hook
that also sets an event, signalling MainThread to terminate as described
above.
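A rough sketch of that second approach, assuming Python 3; HookedThread and install_terminating_excepthook are hypothetical names, and the code only imitates the idea behind ExceptionalThread rather than reproducing the twitter.common.exceptions API.

```python
import sys
import threading


class HookedThread(threading.Thread):
    """Hypothetical stand-in for ExceptionalThread (not the twitter.common API).

    Plain threads report uncaught exceptions themselves rather than through
    sys.excepthook; this subclass forwards them to sys.excepthook instead.
    """

    def run(self):
        try:
            super(HookedThread, self).run()
        except Exception:
            sys.excepthook(*sys.exc_info())


def install_terminating_excepthook(event):
    """Wrap the current sys.excepthook so it also sets `event` for MainThread.

    Returns the previous hook so callers can restore it later.
    """
    original_hook = sys.excepthook

    def hook(exc_type, exc_value, exc_tb):
        original_hook(exc_type, exc_value, exc_tb)  # keep the normal traceback output
        event.set()

    sys.excepthook = hook
    return original_hook
```

MainThread would call install_terminating_excepthook(event), create every worker as a HookedThread, and then wait on the event with the same kind of polling loop used for registry.dead before exiting non-zero. On Python 3.8 and later, assigning a function to threading.excepthook achieves a similar effect without subclassing Thread.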

~brian

