You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Kevin Sweeney <ke...@apache.org> on 2014/10/03 02:03:10 UTC

Review Request 26300: Sleep 10sec instead of 5sec during GC shutdown

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------

Review request for Aurora, Vinod Kone and Brian Wickman.


Bugs: AURORA-788
    https://issues.apache.org/jira/browse/AURORA-788


Repository: aurora


Description
-------

Sleep 10sec instead of 5sec during GC shutdown


Diffs
-----

  src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 

Diff: https://reviews.apache.org/r/26300/diff/


Testing
-------


Thanks,

Kevin Sweeney


Re: Review Request 26300: Sleep 10sec instead of 5sec during GC shutdown

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55299
-----------------------------------------------------------

Ship it!


http://media.giphy.com/media/7LG6PqAubrWBa/giphy.gif

- Brian Wickman


On Oct. 3, 2014, 12:03 a.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 12:03 a.m.)
> 
> 
> Review request for Aurora, Vinod Kone and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Sleep 10sec instead of 5sec during GC shutdown
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Sleep 10sec instead of 5sec during GC shutdown

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55312
-----------------------------------------------------------


Upon discussion with Vinod Kone and Ben Mahler this doesn't seem like the right approach - driver.stop() doesn't actually send a signal to the slave that it should not expect further communications from the executor, so the race will always exist.

Upon further dissection of this code I'm questioning the need to quit after 15 minutes of inactivity. Updated patch incoming.

- Kevin Sweeney


On Oct. 2, 2014, 5:03 p.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 2, 2014, 5:03 p.m.)
> 
> 
> Review request for Aurora, Vinod Kone and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> Sleep 10sec instead of 5sec during GC shutdown
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55318
-----------------------------------------------------------


Thanks Kevin! Looks good to me, will defer to wickman for a Ship It.

- Ben Mahler


On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 1:59 a.m.)
> 
> 
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
> 
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
>   src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:gc_executor
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------

(Updated Oct. 8, 2014, 11:56 a.m.)


Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.


Changes
-------

rebase


Bugs: AURORA-788
    https://issues.apache.org/jira/browse/AURORA-788


Repository: aurora


Description
-------

The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).

Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/gc_executor.py 788671e32d2f995b954e906d8866019ed66c970d 
  src/test/python/apache/aurora/executor/test_gc_executor.py 1905fe35de31e667f499f7945952f04dc13aac06 

Diff: https://reviews.apache.org/r/26300/diff/


Testing
-------

./pants src/test/python/apache/aurora/executor:gc_executor


Thanks,

Kevin Sweeney


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55350
-----------------------------------------------------------

Ship it!


Ship It!

- Brian Wickman


On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 1:59 a.m.)
> 
> 
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
> 
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
>   src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:gc_executor
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55325
-----------------------------------------------------------


lgtm. i'll let wickman give a shipit.

- Vinod Kone


On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 1:59 a.m.)
> 
> 
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
> 
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
>   src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:gc_executor
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55821
-----------------------------------------------------------

Ship it!


Ship It!

- David McLaughlin


On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
> 
> (Updated Oct. 3, 2014, 1:59 a.m.)
> 
> 
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
> 
> 
> Bugs: AURORA-788
>     https://issues.apache.org/jira/browse/AURORA-788
> 
> 
> Repository: aurora
> 
> 
> Description
> -------
> 
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
> 
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
> 
> 
> Diffs
> -----
> 
>   src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
>   src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 
> 
> Diff: https://reviews.apache.org/r/26300/diff/
> 
> 
> Testing
> -------
> 
> ./pants src/test/python/apache/aurora/executor:gc_executor
> 
> 
> Thanks,
> 
> Kevin Sweeney
> 
>


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------

(Updated Oct. 2, 2014, 6:59 p.m.)


Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.


Bugs: AURORA-788
    https://issues.apache.org/jira/browse/AURORA-788


Repository: aurora


Description
-------

The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).

Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.


Diffs
-----

  src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
  src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 

Diff: https://reviews.apache.org/r/26300/diff/


Testing (updated)
-------

./pants src/test/python/apache/aurora/executor:gc_executor


Thanks,

Kevin Sweeney


Re: Review Request 26300: Don't kill GC Executor after period of inactivity

Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------

(Updated Oct. 2, 2014, 6:59 p.m.)


Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.


Changes
-------

update patch and description, +bmahler


Summary (updated)
-----------------

Don't kill GC Executor after period of inactivity


Bugs: AURORA-788
    https://issues.apache.org/jira/browse/AURORA-788


Repository: aurora


Description (updated)
-------

The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).

Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.


Diffs (updated)
-----

  src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47 
  src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943 

Diff: https://reviews.apache.org/r/26300/diff/


Testing
-------


Thanks,

Kevin Sweeney