You are viewing a plain text version of this content. The canonical link for it is here.
Posted to reviews@aurora.apache.org by Kevin Sweeney <ke...@apache.org> on 2014/10/03 02:03:10 UTC
Review Request 26300: Sleep 10sec instead of 5sec during GC shutdown
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------
Review request for Aurora, Vinod Kone and Brian Wickman.
Bugs: AURORA-788
https://issues.apache.org/jira/browse/AURORA-788
Repository: aurora
Description
-------
Sleep 10sec instead of 5sec during GC shutdown
Diffs
-----
src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
Diff: https://reviews.apache.org/r/26300/diff/
Testing
-------
Thanks,
Kevin Sweeney
Re: Review Request 26300: Sleep 10sec instead of 5sec during GC
shutdown
Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55299
-----------------------------------------------------------
Ship it!
http://media.giphy.com/media/7LG6PqAubrWBa/giphy.gif
- Brian Wickman
On Oct. 3, 2014, 12:03 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2014, 12:03 a.m.)
>
>
> Review request for Aurora, Vinod Kone and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Sleep 10sec instead of 5sec during GC shutdown
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Sleep 10sec instead of 5sec during GC
shutdown
Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55312
-----------------------------------------------------------
Upon discussion with Vinod Kone and Ben Mahler this doesn't seem like the right approach - driver.stop() doesn't actually send a signal to the slave that it should not expect further communications from the executor, so the race will always exist.
Upon further dissection of this code I'm questioning the need to quit after 15 minutes of inactivity. Updated patch incoming.
- Kevin Sweeney
On Oct. 2, 2014, 5:03 p.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 2, 2014, 5:03 p.m.)
>
>
> Review request for Aurora, Vinod Kone and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> Sleep 10sec instead of 5sec during GC shutdown
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55318
-----------------------------------------------------------
Thanks Kevin! Looks good to me, will defer to wickman for a Ship It.
- Ben Mahler
On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2014, 1:59 a.m.)
>
>
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
>
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
> src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
> ./pants src/test/python/apache/aurora/executor:gc_executor
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------
(Updated Oct. 8, 2014, 11:56 a.m.)
Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
Changes
-------
rebase
Bugs: AURORA-788
https://issues.apache.org/jira/browse/AURORA-788
Repository: aurora
Description
-------
The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
Diffs (updated)
-----
src/main/python/apache/aurora/executor/gc_executor.py 788671e32d2f995b954e906d8866019ed66c970d
src/test/python/apache/aurora/executor/test_gc_executor.py 1905fe35de31e667f499f7945952f04dc13aac06
Diff: https://reviews.apache.org/r/26300/diff/
Testing
-------
./pants src/test/python/apache/aurora/executor:gc_executor
Thanks,
Kevin Sweeney
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Brian Wickman <wi...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55350
-----------------------------------------------------------
Ship it!
Ship It!
- Brian Wickman
On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2014, 1:59 a.m.)
>
>
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
>
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
> src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
> ./pants src/test/python/apache/aurora/executor:gc_executor
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55325
-----------------------------------------------------------
lgtm. i'll let wickman give a shipit.
- Vinod Kone
On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2014, 1:59 a.m.)
>
>
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
>
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
> src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
> ./pants src/test/python/apache/aurora/executor:gc_executor
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by David McLaughlin <da...@dmclaughlin.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/#review55821
-----------------------------------------------------------
Ship it!
Ship It!
- David McLaughlin
On Oct. 3, 2014, 1:59 a.m., Kevin Sweeney wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/26300/
> -----------------------------------------------------------
>
> (Updated Oct. 3, 2014, 1:59 a.m.)
>
>
> Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
>
>
> Bugs: AURORA-788
> https://issues.apache.org/jira/browse/AURORA-788
>
>
> Repository: aurora
>
>
> Description
> -------
>
> The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
>
> Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
>
>
> Diffs
> -----
>
> src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
> src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
>
> Diff: https://reviews.apache.org/r/26300/diff/
>
>
> Testing
> -------
>
> ./pants src/test/python/apache/aurora/executor:gc_executor
>
>
> Thanks,
>
> Kevin Sweeney
>
>
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------
(Updated Oct. 2, 2014, 6:59 p.m.)
Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
Bugs: AURORA-788
https://issues.apache.org/jira/browse/AURORA-788
Repository: aurora
Description
-------
The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
Diffs
-----
src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
Diff: https://reviews.apache.org/r/26300/diff/
Testing (updated)
-------
./pants src/test/python/apache/aurora/executor:gc_executor
Thanks,
Kevin Sweeney
Re: Review Request 26300: Don't kill GC Executor after period of
inactivity
Posted by Kevin Sweeney <ke...@apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/26300/
-----------------------------------------------------------
(Updated Oct. 2, 2014, 6:59 p.m.)
Review request for Aurora, Ben Mahler, Vinod Kone, and Brian Wickman.
Changes
-------
update patch and description, +bmahler
Summary (updated)
-----------------
Don't kill GC Executor after period of inactivity
Bugs: AURORA-788
https://issues.apache.org/jira/browse/AURORA-788
Repository: aurora
Description (updated)
-------
The GC executor is configured to exit after 15 minutes of inactivity. This leads to a race where the mesos slave gets a launchTask message for a GC executor just as the executor has exited, causing TASK_LOST noise. This also increases the risk that a slave will lose its GC executor due to the scheduler not being able to find a slot for it (since GC executors will have a higher churn rate).
Cluster operators will still be able to deploy new versions of the GC executor as the 24-hour max lifetime limit is still in place. This patch only removes the inactivity limit.
Diffs (updated)
-----
src/main/python/apache/aurora/executor/gc_executor.py 44eb0da984a126536f0d277da3da128089201a47
src/test/python/apache/aurora/executor/test_gc_executor.py 774c9ba0d5c31fc4c46dbe257579e013460fa943
Diff: https://reviews.apache.org/r/26300/diff/
Testing
-------
Thanks,
Kevin Sweeney