Posted to issues@mesos.apache.org by "Jie Yu (JIRA)" <ji...@apache.org> on 2018/10/11 05:09:00 UTC

[jira] [Created] (MESOS-9307) Libprocess should have a way to detect stuck actor.

Jie Yu created MESOS-9307:
-----------------------------

             Summary: Libprocess should have a way to detect stuck actor.
                 Key: MESOS-9307
                 URL: https://issues.apache.org/jira/browse/MESOS-9307
             Project: Mesos
          Issue Type: Improvement
          Components: libprocess
            Reporter: Jie Yu


We spent two days on a bug that turned out to be an infinite loop in an actor, which blocked all other events from being processed by that actor.

Currently, the only way to find a stuck actor is to use gdb. We should think about a way to print error logs when an actor has been stuck for longer than a threshold.

For instance, the Linux kernel prints a warning to the kernel log if a task is stuck for more than 120 seconds. Something like this would be extremely helpful.
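
A rough sketch of what such a threshold check could look like, written as standalone C++ rather than against the libprocess internals. The class `StuckActorDetector` and its methods `begin`, `end`, and `stuck` are hypothetical names used only for illustration; nothing like them exists in libprocess today. The idea is that a worker thread records when an actor starts handling an event, clears the record when the handler returns, and a periodic check reports every actor whose current event has been running longer than the threshold.

```cpp
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper, NOT part of libprocess. It only tracks how long
// each actor has been busy with its current event.
class StuckActorDetector
{
public:
  using Clock = std::chrono::steady_clock;

  explicit StuckActorDetector(Clock::duration threshold)
    : threshold_(threshold)
  {
    assert(threshold > Clock::duration::zero());
  }

  // Call just before an actor starts handling an event.
  void begin(const std::string& actor, Clock::time_point now = Clock::now())
  {
    std::lock_guard<std::mutex> lock(mutex_);
    running_[actor] = now;
  }

  // Call when the event handler returns.
  void end(const std::string& actor)
  {
    std::lock_guard<std::mutex> lock(mutex_);
    running_.erase(actor);
  }

  // Returns the actors whose current event has been running for
  // strictly longer than the threshold as of `now`.
  std::vector<std::string> stuck(Clock::time_point now = Clock::now()) const
  {
    std::lock_guard<std::mutex> lock(mutex_);
    std::vector<std::string> result;
    for (const auto& entry : running_) {
      if (now - entry.second > threshold_) {
        result.push_back(entry.first);
      }
    }
    return result;
  }

private:
  const Clock::duration threshold_;
  mutable std::mutex mutex_;
  std::unordered_map<std::string, Clock::time_point> running_;
};
```

A background thread or timer could call `stuck()` periodically and log one error line per returned actor. The check has to run outside the actor being monitored, since an actor spinning in an infinite loop can never report on itself.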

Another option is to expose metrics that would reveal a stuck actor.


