Posted to issues@aurora.apache.org by "Stephan Erb (JIRA)" <ji...@apache.org> on 2015/05/04 09:45:05 UTC

[jira] [Commented] (AURORA-1275) Don't delay kill sequence when HTTP teardown signal could not be dispatched

    [ https://issues.apache.org/jira/browse/AURORA-1275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526373#comment-14526373 ] 

Stephan Erb commented on AURORA-1275:
-------------------------------------

{code}
This is now on master.

$ git log -1
commit 10e75fc1e3649ddbcf3f810dbaba960ae35ee94a
Author: Stephan Erb <stephan@dev.static-void.de>
Date:   Wed Apr 15 12:01:29 2015 -0700

Only perform escalation wait when http teardown signal could be dispatched

Testing Done:
./pants test.pytest --no-fast --options=-v src/test/python/apache/aurora/executor:thermos_task_runner

In addition, manual verification that shutdown of health-checked services without lifecycle methods is 10 seconds faster.

Bugs closed: AURORA-1275

Reviewed at https://reviews.apache.org/r/3288
{code}

> Don't delay kill sequence when HTTP teardown signal could not be dispatched
> ---------------------------------------------------------------------------
>
>                 Key: AURORA-1275
>                 URL: https://issues.apache.org/jira/browse/AURORA-1275
>             Project: Aurora
>          Issue Type: Story
>          Components: Executor, Thermos
>            Reporter: Stephan Erb
>            Assignee: Stephan Erb
>
> The Thermos task runner performs a kill escalation sequence: it dispatches calls to the HTTP endpoints /quitquitquit and /abortabortabort before stopping a task for good, and waits 5 seconds after each call.
> The runner should skip the wait when the corresponding shutdown request could not be dispatched.
> Advantages:
> * Services that use a health port but don't implement /quitquitquit and /abortabortabort no longer incur a total of 10 seconds of waiting time on each kill operation.
> * The whole system reacts faster when services need to be restarted due to failing health checks. Failing health checks often imply that an application is not responding at all, including on the endpoints /quitquitquit and /abortabortabort.
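
The behaviour requested above can be illustrated with a minimal Python sketch. It is not the code from the actual commit: the function name escalate_kill, the dispatch callable, and the injectable sleep parameter are all hypothetical and exist only for illustration. Only the two endpoint names and the 5 second wait come from the issue description.

```python
import time
from typing import Callable, Iterable

ESCALATION_WAIT_SECS = 5.0  # per-step wait, as stated in the issue description


def escalate_kill(
    dispatch: Callable[[str], bool],
    endpoints: Iterable[str] = ("/quitquitquit", "/abortabortabort"),
    wait_secs: float = ESCALATION_WAIT_SECS,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Walk the teardown endpoints and return the total time spent waiting.

    `dispatch` is a hypothetical callable that returns True only if the
    HTTP request for the given endpoint could actually be sent.
    """
    waited = 0.0
    for endpoint in endpoints:
        if dispatch(endpoint):
            # The signal was dispatched, so give the task time to react.
            sleep(wait_secs)
            waited += wait_secs
        # If dispatch failed, skip the wait and move on to the next step.
    return waited
```

With a dispatch callable that always fails (for example, a service that exposes a health port but neither teardown endpoint), the sketch returns immediately instead of waiting 10 seconds. Passing sleep as a parameter is only a convenience so the logic can be exercised without real delays.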


