Posted to issues@aurora.apache.org by "Stephan Erb (JIRA)" <ji...@apache.org> on 2016/02/08 23:44:39 UTC

[jira] [Commented] (AURORA-1500) Platform SLA gets stuck in DOWN when a replacement PENDING is killed

    [ https://issues.apache.org/jira/browse/AURORA-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15137911#comment-15137911 ] 

Stephan Erb commented on AURORA-1500:
-------------------------------------

The relevant piece of code responsible for untraceable deletes of PENDING tasks: https://github.com/apache/aurora/blob/9ed81a7db58f6a7cb308c8ac6a545705351c8c0e/src/main/java/org/apache/aurora/scheduler/state/TaskStateMachine.java#L442 (thanks to Maxim for pointing it out :-)

> Platform SLA gets stuck in DOWN when a replacement PENDING is killed
> --------------------------------------------------------------------
>
>                 Key: AURORA-1500
>                 URL: https://issues.apache.org/jira/browse/AURORA-1500
>             Project: Aurora
>          Issue Type: Bug
>          Components: Scheduler
>            Reporter: Maxim Khutornenko
>
> The current platform SLA calculation cannot account for special cases in which killed tasks leave no history behind. One example: a task gets LOST (an SLA DOWN interval starts) and its replacement is scheduled immediately. If, however, the replacement task is killed while still in PENDING, no history is left to close the DOWN interval, and the platform SLA stays degraded until either the user creates a new task for the matching instance or the task history is purged.
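
To make the failure mode in the description concrete, here is a minimal Python sketch of a history-based DOWN interval calculation. It is a simplified model and not Aurora's actual SLA code: the function name `down_intervals`, the `(timestamp, status)` event tuples, and the choice of `RUNNING` and `KILLED` as the states that close a DOWN interval are all assumptions made for illustration.

```python
# Hypothetical, simplified model of per-instance SLA accounting.
# Assumption: a DOWN interval opens on LOST and closes on the next
# recorded RUNNING or KILLED event for the same instance.
CLOSING_STATES = {"RUNNING", "KILLED"}


def down_intervals(history, now):
    """Return (start, end, closed) DOWN intervals for one instance.

    history: iterable of (timestamp, status) events still present in storage.
    now: current time, used as the end of an interval that never closed.
    """
    intervals = []
    down_since = None
    for timestamp, status in sorted(history):
        if status == "LOST" and down_since is None:
            # The instance went down; start counting downtime.
            down_since = timestamp
        elif status in CLOSING_STATES and down_since is not None:
            # A later recorded event closes the open DOWN interval.
            intervals.append((down_since, timestamp, True))
            down_since = None
    if down_since is not None:
        # Nothing in the surviving history closes the interval,
        # so downtime keeps growing with the clock.
        intervals.append((down_since, now, False))
    return intervals


# Replacement starts normally: its RUNNING event closes the interval.
replaced = [(0, "RUNNING"), (100, "LOST"), (101, "PENDING"), (130, "RUNNING")]

# Replacement was killed while PENDING and deleted without a trace,
# so none of its events remain in the history.
orphaned = [(0, "RUNNING"), (100, "LOST")]
```

In this model, `down_intervals(replaced, now=1000)` yields one closed interval from 100 to 130, while `down_intervals(orphaned, now=1000)` yields an open interval that extends to `now` and keeps growing on every later evaluation. That matches the reported symptom: the instance stays DOWN until a new event for it is recorded or its history is purged.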


