Posted to issues@aurora.apache.org by "Zameer Manji (JIRA)" <ji...@apache.org> on 2016/10/11 21:26:20 UTC
[jira] [Commented] (AURORA-1791) Commit ca683 is not backwards compatible.
[ https://issues.apache.org/jira/browse/AURORA-1791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15566667#comment-15566667 ]
Zameer Manji commented on AURORA-1791:
--------------------------------------
Note: I could be wrong here, but this was deployed to a cluster, and tasks that were healthy before started to fail.
> Commit ca683 is not backwards compatible.
> -----------------------------------------
>
> Key: AURORA-1791
> URL: https://issues.apache.org/jira/browse/AURORA-1791
> Project: Aurora
> Issue Type: Bug
> Reporter: Zameer Manji
> Assignee: Kai Huang
> Priority: Blocker
>
> The commit [ca683cb9e27bae76424a687bc6c3af5a73c501b9 | https://github.com/apache/aurora/commit/ca683cb9e27bae76424a687bc6c3af5a73c501b9] is not backwards compatible. The last item in the commit message
> {quote}
> 4. Modified the Health Checker and redefined the meaning initial_interval_secs.
> {quote}
> has serious, unintended consequences.
> Consider the following health check config:
> {noformat}
> initial_interval_secs: 10
> interval_secs: 5
> max_consecutive_failures: 1
> {noformat}
> On the 0.16.0 executor, no health checking occurs for the first 10 seconds, so the earliest a task can fail a health check is at the 10th second.
> On master, health checking starts right away, which means the task can fail in the first second because {{max_consecutive_failures}} is set to 1.
> This is not backwards compatible and needs to be fixed.
> I think a good solution would be to revert the change to the meaning of {{initial_interval_secs}} and have the task transition into RUNNING when {{max_consecutive_successes}} is met.
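
To make the difference described in the issue concrete, here is a rough Python sketch of the two timings. It is a simplified model, not the Aurora executor code: the function name and the grace_period flag are hypothetical, "right away" is modeled as t = 0, and max_consecutive_failures is treated as the number of consecutive failed checks needed to fail the task, which is how the description reads and may differ from the executor's exact comparison.

```python
def earliest_failure_secs(initial_interval_secs, interval_secs,
                          max_consecutive_failures, grace_period):
    """Earliest time, in seconds after task start, at which a task whose
    every health check fails would be marked failed in this toy model.

    grace_period=True  models the 0.16.0 behaviour from the issue:
        no checks run during the first initial_interval_secs.
    grace_period=False models the behaviour on master from the issue:
        checks start right away (assumed here to mean t = 0).
    """
    # Time of the first health check that can count against the task.
    first_check = initial_interval_secs if grace_period else 0
    # Assumption: the task fails on the Nth consecutive failed check,
    # where N = max_consecutive_failures (at least one check is needed).
    failures_needed = max(1, max_consecutive_failures)
    # Later checks are spaced interval_secs apart.
    return first_check + (failures_needed - 1) * interval_secs


# Health check config quoted in the issue.
config = dict(initial_interval_secs=10, interval_secs=5,
              max_consecutive_failures=1)

old = earliest_failure_secs(grace_period=True, **config)
new = earliest_failure_secs(grace_period=False, **config)
print("0.16.0 model: earliest failure at", old, "seconds")
print("master model: earliest failure at", new, "seconds")
```

Under these assumptions the 0.16.0 model gives 10 seconds and the master model gives 0 seconds for the same config, which is the regression the issue describes: a task that needs a few seconds to start listening can be killed before it has had a chance to become healthy.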
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)