Posted to dev@mesos.apache.org by "Benjamin Hindman (JIRA)" <ji...@apache.org> on 2014/06/16 20:50:03 UTC

[jira] [Commented] (MESOS-741) Add health checking for tasks

    [ https://issues.apache.org/jira/browse/MESOS-741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032769#comment-14032769 ] 

Benjamin Hindman commented on MESOS-741:
----------------------------------------

Took a brief look at https://reviews.apache.org/r/22579 and saw the outcome of "I propose that we pass the PID of the executor to the health-checker program and send protobufs back with the health updates". Any reason not to just have the health-checker output the protobufs on its stdout? It could even do this as JSON for human consumption and/or as recordio, as we're already doing in src/usage/main.cpp for the mesos-usage utility. The value here is that the tool becomes a standalone program that someone can compose with other tools even if they don't have the executor-like model.
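To make the stdout idea concrete, here is a rough sketch of what such a standalone checker could look like. It is written in Python purely for illustration and is not the code under review: the function names and the JSON fields (healthy, exit_status, timestamp) are made up for this example and do not correspond to any Mesos protobuf, and the recordio variant is left out.

```python
import json
import subprocess
import sys
import time


def check_once(command, timeout=5.0):
    """Run the health command once; exit status 0 is treated as healthy."""
    try:
        status = subprocess.run(
            command,
            shell=True,
            timeout=timeout,
            stdout=subprocess.DEVNULL,  # keep the command's own output off our stdout
            stderr=subprocess.DEVNULL,
        ).returncode
    except subprocess.TimeoutExpired:
        status = None  # no exit status: the command did not finish in time
    # Field names are hypothetical, chosen only for this sketch.
    return {"healthy": status == 0, "exit_status": status, "timestamp": time.time()}


def emit(update, out=sys.stdout):
    """Write one update as a single JSON line and flush so a reader sees it promptly."""
    out.write(json.dumps(update, sort_keys=True) + "\n")
    out.flush()


def run(command, interval, count, out=sys.stdout):
    """Check `count` times, sleeping `interval` seconds between checks."""
    for i in range(count):
        emit(check_once(command), out)
        if i + 1 < count:
            time.sleep(interval)
```

Because the updates go to stdout one per line, anything that can read a pipe can consume them (an executor, a shell script, or a human at a terminal) without the checker needing to know the PID of whoever is listening.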

> Add health checking for tasks
> -----------------------------
>
>                 Key: MESOS-741
>                 URL: https://issues.apache.org/jira/browse/MESOS-741
>             Project: Mesos
>          Issue Type: Story
>          Components: master, slave
>            Reporter: Niklas Quarfot Nielsen
>            Assignee: Timothy Chen
>
> Determining the health of a task during its lifetime (during start up, while it is running, shutting down etc.) can be considered a more elaborate matter than only observing its process state.
> The task health might be determined by any combination of observable behavior; for example, the process listening on a certain range of ports, writing certain files or pipes, responding to messages, utilizing resources at or below certain thresholds, etc.
> It could be a powerful extension to extend the interface for launching and running tasks by an optional HealthCommand message. This message could encode:
> 1) A command to be run at the slave to determine the health of the task. The return value of the command will tell if the task is healthy or unhealthy. 
> 2) An interval at which to run the health command.
> In connection with this, it could make sense to introduce new healthy and unhealthy task states.
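
For reference, the HealthCommand described in the issue could be modelled roughly as below. This is only a sketch under stated assumptions: the issue names no fields, so command, interval_seconds, and the TaskHealth values are hypothetical, and treating exit status 0 as healthy is an assumed convention rather than something the issue specifies.

```python
from dataclasses import dataclass
from enum import Enum


class TaskHealth(Enum):
    # Hypothetical stand-ins for the proposed healthy/unhealthy task states.
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"


@dataclass(frozen=True)
class HealthCommand:
    # Hypothetical field names; the issue only says "a command" and "an interval".
    command: str
    interval_seconds: float

    def __post_init__(self):
        if not self.command.strip():
            raise ValueError("command must not be empty")
        if self.interval_seconds <= 0:
            raise ValueError("interval_seconds must be positive")


def health_from_exit_status(exit_status):
    """Map the command's return value to a health state (assumes 0 means healthy)."""
    return TaskHealth.HEALTHY if exit_status == 0 else TaskHealth.UNHEALTHY
```

A slave-side loop would then run hc.command every hc.interval_seconds and report health_from_exit_status(...) for each run.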



--
This message was sent by Atlassian JIRA
(v6.2#6252)