Posted to issues@mesos.apache.org by "Gabriel Hartmann (JIRA)" <ji...@apache.org> on 2015/10/01 01:30:04 UTC

[jira] [Comment Edited] (MESOS-3479) COMMAND Health Checks are not executed if the timeout is exceeded

    [ https://issues.apache.org/jira/browse/MESOS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14939054#comment-14939054 ] 

Gabriel Hartmann edited comment on MESOS-3479 at 9/30/15 11:29 PM:
-------------------------------------------------------------------

[~haosdent@gmail.com]:  I'm seeing this issue as well.  The config is like this:
"gracePeriodSeconds": 300,
"intervalSeconds": 60,
"timeoutSeconds": 60,
"maxConsecutiveFailures": 3

The first 3 attempts fail quickly, one after another.  The 4th attempt then takes more than 60s before it eventually fails or times out.  While it is still running, a 5th attempt is started, and it succeeds.  All of this occurs before the grace period expires.  The 5th attempt is the last one: no more health checks are made, and Marathon never receives a health check report.

Is there an ETA for a fix for this?  It's very disruptive for frameworks that are trying to converge to a healthy or unhealthy state.  In this case Marathon sees the framework as having 1 running task, with 0 staging, 0 healthy, and 0 unhealthy.
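
To make those counts concrete, here is a minimal sketch of how a task with no reported health result ends up in neither the healthy nor the unhealthy bucket. The task record layout and the `summarize` helper are hypothetical, for illustration only; this is not Marathon's actual data model.

```python
from collections import Counter

# Hypothetical task records; `healthy` is None when no health check
# result has ever been reported for the task.
tasks = [{"state": "running", "healthy": None}]


def summarize(tasks):
    """Count tasks by state, plus how many reported healthy/unhealthy."""
    counts = Counter()
    for task in tasks:
        counts[task["state"]] += 1
        if task["healthy"] is True:
            counts["healthy"] += 1
        elif task["healthy"] is False:
            counts["unhealthy"] += 1
        # healthy is None: the task is counted by state only
    keys = ("running", "staging", "healthy", "unhealthy")
    return {key: counts[key] for key in keys}


summary = summarize(tasks)
```

With the single running task above, `summary` has 1 running and 0 for every other key, which matches the state described in this comment.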


> COMMAND Health Checks are not executed if the timeout is exceeded
> -----------------------------------------------------------------
>
>                 Key: MESOS-3479
>                 URL: https://issues.apache.org/jira/browse/MESOS-3479
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.23.0
>            Reporter: Matthias Veit
>            Assignee: haosdent
>            Priority: Critical
>
> The issue first appeared as a Marathon bug; for reference, see https://github.com/mesosphere/marathon/issues/2179.
> A COMMAND health check is defined with a timeout of 20 seconds.
> The command itself takes longer than 20 seconds to execute.
> Current behavior: 
> - The Mesos health check process gets killed, but the defined command process does not (in the example, the curl command returns after 21 seconds).
> - The check attempt is considered healthy if the timeout is exceeded
> - The health check stops and is no longer executed
> Expected behavior: 
> - The defined health check command is killed when the timeout is exceeded
> - The check attempt is considered unhealthy if the timeout is exceeded
> - The health check does not stop 
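
The expected behavior quoted above can be sketched roughly as follows. This is a minimal illustration, not the Mesos health checker's implementation: the function names `run_check` and `health_check_loop` are hypothetical, and it assumes a POSIX system where the command runs in its own process group so that the command itself (for example a long-running curl), and not only the wrapper, is killed on timeout.

```python
import os
import signal
import subprocess
import time


def run_check(command, timeout_seconds):
    """Run one COMMAND-style check; return True only if it exits 0 in time."""
    # start_new_session puts the shell and its children in their own
    # process group, so the whole group can be killed on timeout (POSIX only).
    proc = subprocess.Popen(command, shell=True, start_new_session=True)
    try:
        return proc.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)  # kill the command's process group
        proc.wait()                          # reap the killed process
        return False                         # a timed-out attempt is unhealthy


def health_check_loop(command, timeout_seconds, interval_seconds, attempts):
    """Run `attempts` checks in sequence and return each result."""
    results = []
    for i in range(attempts):
        # A timeout is recorded as a failure and the loop simply continues.
        results.append(run_check(command, timeout_seconds))
        if i + 1 < attempts:
            time.sleep(interval_seconds)
    return results
```

For example, `health_check_loop("sleep 5", 0.2, 0, 3)` records three unhealthy attempts instead of stopping after the first timeout. A real checker would run indefinitely and report each result to the framework; the fixed `attempts` count is only there to keep the sketch finite.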


