Posted to dev@mesos.apache.org by "Benjamin Mahler (JIRA)" <ji...@apache.org> on 2013/09/20 01:13:53 UTC

[jira] [Updated] (MESOS-540) Executor health checking.

     [ https://issues.apache.org/jira/browse/MESOS-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Benjamin Mahler updated MESOS-540:
----------------------------------

    Description: 
We currently do not health check running executors.

At Twitter, this has led to out-of-band health checking of executors for an internal framework.

For the Storm framework, this has led to out-of-band health checking via ZooKeeper. Health checking would allow Storm to use finer-grained executors for better isolation.

Health checking would also help the Hadoop and Jenkins frameworks, should they want it.

As for implementation, I would propose adding a call on the Executor interface:

/**
 * Invoked by the ExecutorDriver to determine the health of the executor.
 * When this function returns, the Executor is considered healthy.
 */
virtual void heartbeat(ExecutorDriver* driver) = 0;

The driver can then call heartbeat() periodically and kill the Executor when it stops responding to heartbeats. The driver should also detect the executor deadlocking in any of the other callbacks.

  was:
We currently do not health check running executors.

At Twitter, this has led to out-of-band health checking of executors for an internal framework.

For the Storm framework, this has led to out-of-band health checking via ZooKeeper. Health checking would allow Storm to use finer-grained executors for better isolation.

Health checking would also help the Hadoop and Jenkins frameworks, should they want it.

As for implementation, I would propose adding a call on the Executor interface:

/**
 * Invoked by the ExecutorDriver to determine the health of the executor.
 * When this function returns, the Executor is considered healthy.
 */
virtual void heartbeat(ExecutorDriver* driver) = 0;

The driver can then call heartbeat() periodically and kill the Executor when it stops responding to heartbeats.

    
> Executor health checking.
> -------------------------
>
>                 Key: MESOS-540
>                 URL: https://issues.apache.org/jira/browse/MESOS-540
>             Project: Mesos
>          Issue Type: Improvement
>            Reporter: Benjamin Mahler
