Posted to reviews@aurora.apache.org by Santhosh Kumar Shanmugham <sa...@gmail.com> on 2016/12/02 08:43:49 UTC

Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/
-----------------------------------------------------------

Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and Zameer Manji.


Bugs: AURORA-1841
    https://issues.apache.org/jira/browse/AURORA-1841


Repository: aurora


Description
-------

It is possible to configure health checks such that a task can
continually fail health checks with intermittent successes and still
succeed an update. Essentially, a task fails health checks during the
`initial_interval_secs` and for an additional `max_consecutive_failures`
attempts, and then performs a successful health check to become healthy.

To remain backward compatible with the above configuration, include
`max_consecutive_failures` when computing `max_attempts_to_running`.


Diffs
-----

  docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
  src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
  src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 

Diff: https://reviews.apache.org/r/54299/diff/


Testing
-------

build-support/jenkins/build.sh
src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.


> On Dec. 2, 2016, 11:54 a.m., Zameer Manji wrote:
> > It took me a long time to understand this after staring at the tests, but I think this is correct.
> > 
> > This is unfortunately a little complex to understand. For bonus points, would it be possible to encode some of this information in a diagram?
> > 
> > The tests are thorough, which makes me comfortable shipping this change.

https://docs.google.com/document/d/1KOO0LC046k75TqQqJ4c0FQcVGbxvrn71E10wAjMorVY/edit?usp=sharing


- Santhosh Kumar


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157807
-----------------------------------------------------------


On Dec. 2, 2016, 12:43 a.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
> 
> (Updated Dec. 2, 2016, 12:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> It is possible to configure health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially, a task fails health checks during the
> `initial_interval_secs` and for an additional `max_consecutive_failures`
> attempts, and then performs a successful health check to become healthy.
> 
> To remain backward compatible with the above configuration, include
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -----
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> -------
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Zameer Manji <zm...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157807
-----------------------------------------------------------


Ship it!




It took me a long time to understand this after staring at the tests, but I think this is correct.

This is unfortunately a little complex to understand. For bonus points, would it be possible to encode some of this information in a diagram?

The tests are thorough, which makes me comfortable shipping this change.

- Zameer Manji



Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.

> On Dec. 2, 2016, 1:44 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, lines 115-117
> > <https://reviews.apache.org/r/54299/diff/1/?file=1574585#file1574585line115>
> >
> >     There is still a chance of a backwards incompatibility here. Under the previous watch-driven updates, a task could flip between failing and successful health checks, and as long as it was still running at the end of `watch_secs` the updater would consider it healthy and move on. With this new behavior, someone could configure a task in such a way that the max attempts are consumed without reaching `max_consecutive_failures` or `min_consecutive_successes` before `watch_secs` has elapsed, meaning that the task would fail.
> >     
> >     As we discussed earlier, if we make `watch_secs` and `min_consecutive_successes` mutually exclusive in the client, then the executor could only trigger the new behavior if the user opted in by setting `watch_secs` to 0 and `min_consecutive_successes` to non-zero.

I believe that the situation you are describing would occur only when `min_consecutive_successes > 1`, which means that the user has already opted in to the new behavior.

Old behavior:
With `watch_secs`:
Task starts in `RUNNING` state. The task has to report at least 1 success within the first `initial_interval_secs + max_consecutive_failures * interval_secs` (no health checks are done during `initial_interval_secs`, so the multiplier is `max_consecutive_failures`, not `max_consecutive_failures + 1`). After this, the task must report at least 1 success after every `max_consecutive_failures` failures to remain in `RUNNING`, until `watch_secs` expires.

Without `watch_secs`:
Task starts in `RUNNING` state. The task has to report at least 1 success within the first `initial_interval_secs + max_consecutive_failures * interval_secs` (no health checks are done during `initial_interval_secs`, so the multiplier is `max_consecutive_failures`, not `max_consecutive_failures + 1`).
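
As a rough illustration of the old-behavior window described above, here is a minimal sketch. It is not the executor's code: the function name and the sample values are made up for this example, and only the formula comes from the text.

```python
def old_first_success_deadline_secs(initial_interval_secs, interval_secs,
                                    max_consecutive_failures):
    """Seconds after task start by which the first success must be reported.

    Mirrors the formula quoted above:
    initial_interval_secs + max_consecutive_failures * interval_secs
    """
    return initial_interval_secs + max_consecutive_failures * interval_secs


# Hypothetical configuration, chosen only for illustration.
deadline = old_first_success_deadline_secs(
    initial_interval_secs=15, interval_secs=10, max_consecutive_failures=3)
print(deadline)  # 15 + 3 * 10 = 45
```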

New behavior:
With `watch_secs`:
The task has to report at least `min_consecutive_successes` (default=1) consecutive successes within the first `initial_interval_secs + (max_consecutive_failures + min_consecutive_successes) * interval_secs` to move to `RUNNING` state. After this, the task must report at least 1 success after every `max_consecutive_failures` failures to remain in `RUNNING`, until `watch_secs` expires.

Without `watch_secs`:
The task has to report at least `min_consecutive_successes` (default=1) consecutive successes within the first `initial_interval_secs + (max_consecutive_failures + min_consecutive_successes) * interval_secs` to move to `RUNNING` state.
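
The new-behavior window can be sketched the same way. This is only an illustration of the formula above, not the code under review; the function name and the sample values are assumptions made for the example.

```python
def new_running_deadline_secs(initial_interval_secs, interval_secs,
                              max_consecutive_failures,
                              min_consecutive_successes=1):
    """Seconds after task start by which the task must qualify for RUNNING.

    Mirrors the formula quoted above:
    initial_interval_secs
        + (max_consecutive_failures + min_consecutive_successes) * interval_secs
    """
    attempts = max_consecutive_failures + min_consecutive_successes
    return initial_interval_secs + attempts * interval_secs


# Hypothetical configuration, chosen only for illustration.
default_window = new_running_deadline_secs(
    initial_interval_secs=15, interval_secs=10, max_consecutive_failures=3)
stricter_window = new_running_deadline_secs(
    initial_interval_secs=15, interval_secs=10, max_consecutive_failures=3,
    min_consecutive_successes=2)
print(default_window)   # 15 + (3 + 1) * 10 = 55
print(stricter_window)  # 15 + (3 + 2) * 10 = 65
```

With the default `min_consecutive_successes` of 1, the window grows by exactly one `interval_secs` for every extra failure allowed by `max_consecutive_failures`.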

Once in `RUNNING`, `min_consecutive_successes` is irrelevant, since the only transition possible is from `RUNNING` to a terminal state. Hence it is enough for a task to report just 1 success after every `max_consecutive_failures` failures to remain healthy. One might argue that `min_consecutive_successes` is not necessary in the first place. On the other hand, one can argue that it serves as a replacement for `watch_secs`, enforcing tighter healthiness conditions before a task is treated as successfully updated and thereby preventing bad updates from succeeding.
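
To make the "remain healthy" condition concrete, here is a small sketch. It assumes that a task is considered failed once the number of consecutive failures exceeds `max_consecutive_failures`; the function name `stays_running` and the sample sequences are hypothetical and not taken from the executor.

```python
def stays_running(results, max_consecutive_failures):
    """Return True if a RUNNING task survives the given health check results.

    `results` is an iterable of booleans (True means the check passed).
    Assumption: the task fails as soon as the count of consecutive failures
    exceeds `max_consecutive_failures`.
    """
    consecutive_failures = 0
    for passed in results:
        if passed:
            # Any single success resets the failure streak.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures > max_consecutive_failures:
                return False
    return True


# A flapping task survives as long as each failure streak stays in bounds.
print(stays_running([False, False, True, False, False, True], 2))  # True
print(stays_running([False, False, False], 2))                     # False
```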

All in all, setting `min_consecutive_successes` to 1 as the default should provide us with the necessary backward-compatibility.

Please refer to the diagrams in the design document. https://docs.google.com/document/d/1KOO0LC046k75TqQqJ4c0FQcVGbxvrn71E10wAjMorVY/edit?usp=sharing

> On Dec. 2, 2016, 1:44 p.m., Joshua Cohen wrote:
> > src/main/python/apache/aurora/executor/common/health_checker.py, line 113
> > <https://reviews.apache.org/r/54299/diff/1/?file=1574585#file1574585line113>
> >
> >     s/suppose/supposed

Done.

- Santhosh Kumar

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157764
-----------------------------------------------------------


Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Joshua Cohen <jc...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157764
-----------------------------------------------------------




src/main/python/apache/aurora/executor/common/health_checker.py (line 113)
<https://reviews.apache.org/r/54299/#comment228376>

    s/suppose/supposed



src/main/python/apache/aurora/executor/common/health_checker.py (lines 115 - 117)
<https://reviews.apache.org/r/54299/#comment228434>

    There is still a chance of a backwards incompatibility here. Under the previous watch-driven updates, a task could flip between failing and successful health checks, and as long as it was still running at the end of `watch_secs` the updater would consider it healthy and move on. With this new behavior, someone could configure a task in such a way that the max attempts are consumed without reaching `max_consecutive_failures` or `min_consecutive_successes` before `watch_secs` has elapsed, meaning that the task would fail.
    
    As we discussed earlier, if we make `watch_secs` and `min_consecutive_successes` mutually exclusive in the client, then the executor could only trigger the new behavior if the user opted in by setting `watch_secs` to 0 and `min_consecutive_successes` to non-zero.
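
A minimal sketch of the opt-in rule suggested above, assuming the client validates the two settings before submitting an update. The function name and the error message are hypothetical and are not part of the Aurora client.

```python
def uses_new_health_check_behavior(watch_secs, min_consecutive_successes):
    """Hypothetical client-side check for the mutually exclusive settings.

    Returns True only when the user has opted in to the new behavior by
    setting `watch_secs` to 0 and `min_consecutive_successes` to non-zero.
    """
    if watch_secs > 0 and min_consecutive_successes > 0:
        raise ValueError(
            "watch_secs and min_consecutive_successes are mutually exclusive")
    return watch_secs == 0 and min_consecutive_successes > 0


print(uses_new_health_check_behavior(0, 1))   # True: opted in
print(uses_new_health_check_behavior(30, 0))  # False: watch-driven update
```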


- Joshua Cohen


On Dec. 2, 2016, 8:43 a.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
> 
> (Updated Dec. 2, 2016, 8:43 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, Stephan Erb, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> It is possible to configure health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially, a task fails health checks during the
> `initial_interval_secs` and for an additional `max_consecutive_failures`
> attempts, and then performs a successful health check to become healthy.
> 
> To remain backward compatible with the above configuration, include
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -----
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> -------
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Aurora ReviewBot <wf...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review158183
-----------------------------------------------------------


Ship it!




Master (4bc5246) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot


On Dec. 6, 2016, 4:32 p.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
> 
> (Updated Dec. 6, 2016, 4:32 p.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> It is possible to configure health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially, a task fails health checks during the
> `initial_interval_secs` and for an additional `max_consecutive_failures`
> attempts, and then performs a successful health check to become healthy.
> 
> To remain backward compatible with the above configuration, include
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -----
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> -------
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by David McLaughlin <da...@dmclaughlin.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review158182
-----------------------------------------------------------


Ship it!




Ship It!

- David McLaughlin



Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Joshua Cohen <jc...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review158490
-----------------------------------------------------------


Ship it!




Ship It!

- Joshua Cohen


On Dec. 8, 2016, 12:15 a.m., Santhosh Kumar Shanmugham wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/54299/
> -----------------------------------------------------------
> 
> (Updated Dec. 8, 2016, 12:15 a.m.)
> 
> 
> Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.
> 
> 
> Bugs: AURORA-1841
>     https://issues.apache.org/jira/browse/AURORA-1841
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> It is possible to configure health checks such that a task can
> continually fail health checks with intermittent successes and still
> succeed an update. Essentially, a task fails health checks during the
> `initial_interval_secs` and for an additional `max_consecutive_failures`
> attempts, and then performs a successful health check to become healthy.
> 
> To remain backward compatible with the above configuration, include
> `max_consecutive_failures` when computing `max_attempts_to_running`.
> 
> 
> Diffs
> -----
> 
>   docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
>   src/main/python/apache/aurora/executor/aurora_executor.py d01fcb9594552eb6cdfbdbab2d03707738df3443 
>   src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
>   src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 
> 
> Diff: https://reviews.apache.org/r/54299/diff/
> 
> 
> Testing
> -------
> 
> build-support/jenkins/build.sh
> src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh
> 
> 
> Thanks,
> 
> Santhosh Kumar Shanmugham
> 
>

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Aurora ReviewBot <wf...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review158456
-----------------------------------------------------------


Ship it!




Master (91ddb07) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot



Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/
-----------------------------------------------------------

(Updated Dec. 7, 2016, 4:15 p.m.)


Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.


Changes
-------

Return only the reason field instead of the entire StatusResult object.


Bugs: AURORA-1841
    https://issues.apache.org/jira/browse/AURORA-1841


Repository: aurora


Description
-------

It is possible to configure health checks such that a task can
continually fail health checks with intermittent successes and still
succeed an update. Essentially, a task fails health checks during the
`initial_interval_secs` and for an additional `max_consecutive_failures`
attempts, and then performs a successful health check to become healthy.

To remain backward compatible with the above configuration, include
`max_consecutive_failures` when computing `max_attempts_to_running`.


Diffs (updated)
-----

  docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b 
  src/main/python/apache/aurora/executor/aurora_executor.py d01fcb9594552eb6cdfbdbab2d03707738df3443 
  src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49 
  src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2 

Diff: https://reviews.apache.org/r/54299/diff/


Testing
-------

build-support/jenkins/build.sh
src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh


Thanks,

Santhosh Kumar Shanmugham

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Santhosh Kumar Shanmugham <sa...@gmail.com>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/
-----------------------------------------------------------

(Updated Dec. 6, 2016, 5:32 p.m.)

Review request for Aurora, David McLaughlin, Joshua Cohen, and Zameer Manji.

Changes
-------

I don't have the necessary time to review this one properly. Sorry.

Bugs: AURORA-1841
https://issues.apache.org/jira/browse/AURORA-1841

Repository: aurora

Description
-------

It is possible to configure health checks such that a task can
continually fail health checks with intermittent successes and still
succeed an update. Essentially, a task fails health checks during the
`initial_interval_secs` and for an additional `max_consecutive_failures`
attempts, and then performs a successful health check to become healthy.

To remain backward compatible with the above configuration, include
`max_consecutive_failures` when computing `max_attempts_to_running`.

Diffs (updated)
-----

docs/features/services.md 50189eeff26ce9614d092f6abd9246788647fe2b
src/main/python/apache/aurora/executor/common/health_checker.py 12af9d8635a553eabe918a86508aa6ce2fd78a49
src/test/python/apache/aurora/executor/common/test_health_checker.py e2a7f164a24f49dd1f4cdba136e838b9d42d73a2

Diff: https://reviews.apache.org/r/54299/diff/

Testing
-------

build-support/jenkins/build.sh
src/test/sh/org/apache/aurora/e2e/test_end_to_end.sh

Thanks,

Santhosh Kumar Shanmugham

Re: Review Request 54299: Extend warm-up time by `max_consecutive_failures` attempts.

Posted by Aurora ReviewBot <wf...@apache.org>.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54299/#review157722
-----------------------------------------------------------


Ship it!




Master (3ea0331) is green with this patch.
  ./build-support/jenkins/build.sh

I will refresh this build result if you post a review containing "@ReviewBot retry"

- Aurora ReviewBot

