Posted to issues@mesos.apache.org by "Gabriel Hartmann (JIRA)" <ji...@apache.org> on 2015/10/02 19:07:28 UTC

[jira] [Issue Comment Deleted] (MESOS-3479) COMMAND Health Checks are not executed if the timeout is exceeded

     [ https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabriel Hartmann updated MESOS-3479:
------------------------------------
    Comment: was deleted

(was: If the health check process exits, is the Task left in an indeterminate health state?  What is an example of a command value which is not executable?  If it's a command health-check with value "foobar", we'll get something like "foobar: command not found" depending on the OS.  Exit code will be something like 127.  That's a fine health-check failure to me.

Again, I make the claim that if a Task is running, then its health-check should be running.  It can fail and, pursuant to the specification of interval, grace period, etc., it should perhaps determine that the Task is unhealthy, but it should never stop running.  In particular, if a health-check times out, I definitely don't think we should stop running health-checks on the Task.  A timeout should just count as a health-check failure.

Maybe I'm misunderstanding something and we actually agree.)
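
To illustrate the "foobar" case from the deleted comment, here is a minimal Python sketch, not Mesos code. It assumes the check command is handed to the system shell and that any non-zero exit code counts as a failed check. The function name run_command_check is hypothetical, and the exact exit code for a missing command depends on the OS and shell.

```python
import subprocess


def run_command_check(command):
    """Run a COMMAND health check through the shell.

    Returns (healthy, exit_code); only exit code 0 counts as healthy.
    """
    completed = subprocess.run(
        command,
        shell=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode == 0, completed.returncode


if __name__ == "__main__":
    # "foobar" is the placeholder command from the comment; on a typical
    # POSIX shell it is not found and the shell reports a non-zero exit code.
    print(run_command_check("foobar"))
```

Under these assumptions a command that cannot be executed needs no special handling: it is simply one more failed attempt.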

> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
>                 Key: MESOS-3479
>                 URL: https://issues.apache.org/jira/browse/MESOS-3479
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Matthias Veit
>            Assignee: haosdent
>            Priority: Critical
>
> The issue first appeared as a Marathon bug. See here for reference: https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior: 
> - The Mesos health check process gets killed, but the defined command process is not (in the example, the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded
> - The health check stops and is not executed any longer
> Expected behavior: 
> - The defined health check command is killed when the timeout is exceeded
> - The check attempt is considered Unhealthy if the timeout is exceeded
> - The health check does not stop 
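
The expected behavior listed above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Mesos health checker: it assumes a POSIX system, runs the command through the shell in its own process group so the command itself can be killed, and uses the hypothetical names run_check and health_check_loop. The loop is bounded by an attempts parameter only to keep the example finite; grace period and consecutive-failure handling are left out.

```python
import os
import signal
import subprocess
import time


def run_check(command, timeout_seconds):
    """Run one check; return True only if it exits 0 within the timeout."""
    # start_new_session gives the shell and its children their own process
    # group, so the defined command can be killed, not only a wrapper.
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        try:
            os.killpg(proc.pid, signal.SIGKILL)  # kill the command's group
        except ProcessLookupError:
            pass  # the command finished just after the timeout fired
        proc.wait()   # reap the process
        return False  # a timeout counts as an unhealthy attempt


def health_check_loop(command, timeout_seconds, interval_seconds, attempts):
    """Run `attempts` checks in a row; a timeout never ends the loop."""
    results = []
    for _ in range(attempts):
        results.append(run_check(command, timeout_seconds))
        time.sleep(interval_seconds)
    return results
```

For example, health_check_loop("sleep 5", 0.2, 0, 2) exercises the timeout path twice: each attempt is killed and recorded as unhealthy, and the second attempt still runs.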



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)