Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/04/02 04:04:54 UTC
Review Request: Fixed the slave to wait for all executors to exit and delete
the "latest" slave symlink when shutting down.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman and Ben Mahler.
Description
-------
See summary.
The crux is twofold:
1) Shut down a slave only after all of its executors have terminated.
2) Delete the "latest" symlink under /path/to/meta/slaves/ so that a slave that was shut down comes up as a new slave when restarted.
I had to refactor cleanup() because it turns out there are quite a few edge cases to guard against.
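For illustration only, here is a minimal sketch of item 2. It is not the code in the diff: it assumes C++17 std::filesystem rather than whatever utilities the patch itself uses, and removeLatestSymlink() and its slavesDir parameter are hypothetical names.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper (not from the patch): removes the "latest" symlink
// under `slavesDir`, e.g. /path/to/meta/slaves. Returns true if a symlink
// was removed, false if there was none or the removal failed.
bool removeLatestSymlink(const fs::path& slavesDir)
{
  const fs::path latest = slavesDir / "latest";

  std::error_code ec;

  // symlink_status() does not follow the link, so a dangling symlink
  // is still detected.
  if (!fs::is_symlink(fs::symlink_status(latest, ec))) {
    return false;
  }

  // remove() deletes the link itself, not the directory it points to.
  return fs::remove(latest, ec) && !ec;
}
```

With the link gone, a restarted slave has no "latest" entry to recover from, which is what makes it come up as a new slave; the old slave's directory is left in place.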
Diffs
-----
src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d
src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
src/slave/slave.cpp 325231458a6883019436e7cc5a37f85f0f5735fa
src/slave/state.cpp e5c32257978d8407535e05ed73f8a50bdc2f651d
src/tests/slave_recovery_tests.cpp 47f9b0f215af2fb9bc300e0c92535b6f91afa5cd
Diff: https://reviews.apache.org/r/10233/diff/
Testing
-------
make check
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*ShutdownSlave*" --verbose --gtest_repeat=100 --gtest_break_on_failure
Thanks,
Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review19282
-----------------------------------------------------------
Ship it!
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39940>
Move this above and place the loop in the else branch.
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39936>
Can you move this into an else on the TERMINATED check?
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39937>
Can you move this into an else on the TERMINATED check?
shutdownExecutor() should enforce the states it expects, and callers should only call it when the executor is in one of them.
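A rough sketch of that contract, under stated assumptions: only TERMINATED appears in this thread, so the other state names, the Executor struct, and shutdownAll() are hypothetical, and an exception stands in for whatever check the real code would use.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

// Assumed state set; only TERMINATED is named in the review comments.
enum class ExecutorState { REGISTERING, RUNNING, TERMINATING, TERMINATED };

// Hypothetical stand-in for the slave's executor bookkeeping.
struct Executor
{
  ExecutorState state = ExecutorState::REGISTERING;
};

// Enforces its precondition: only a live executor may be shut down.
void shutdownExecutor(Executor& executor)
{
  if (executor.state != ExecutorState::REGISTERING &&
      executor.state != ExecutorState::RUNNING) {
    throw std::logic_error("shutdownExecutor called in an unexpected state");
  }

  executor.state = ExecutorState::TERMINATING;
}

// The caller checks the state first and calls shutdownExecutor() only in
// the else branch, as the comments above request.
void shutdownAll(std::vector<Executor>& executors)
{
  for (Executor& executor : executors) {
    if (executor.state == ExecutorState::TERMINATING ||
        executor.state == ExecutorState::TERMINATED) {
      continue; // Nothing left to shut down.
    } else {
      shutdownExecutor(executor);
    }
  }
}
```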
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39941>
Can you add a TODO to kill this? That means pushing Futures through and using collect() to terminate the slave once everything has finished.
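To make the idea concrete, here is a small sketch that uses std::future from the standard library in place of the Futures the comment refers to. The substitution is an assumption made for illustration: collectAll() is a hypothetical helper, and unlike a non-blocking collect() it simply blocks until every future is ready.

```cpp
#include <cassert>
#include <future>
#include <vector>

// Hypothetical helper: gathers the exit status of every executor.
// It returns only once all futures are ready, so the caller can
// terminate the slave right after this call.
std::vector<int> collectAll(std::vector<std::future<int>>& exits)
{
  std::vector<int> statuses;
  statuses.reserve(exits.size());

  for (std::future<int>& f : exits) {
    statuses.push_back(f.get()); // Blocks until this executor has exited.
  }

  return statuses;
}
```

The order in which executors exit does not matter; the statuses come back in the order the futures were stored.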
src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/10233/#comment39942>
// TODO base this on the time variable of the reaper?
- Ben Mahler
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/
-----------------------------------------------------------
(Updated April 17, 2013, 6:39 a.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
Addressed comments and rebased. No need for review.
Description
-------
See summary.
The crux is twofold:
1) Shut down a slave only after all of its executors have terminated.
2) Delete the "latest" symlink under /path/to/meta/slaves/ so that a slave that was shut down comes up as a new slave when restarted.
I had to refactor cleanup() because it turns out there are quite a few edge cases to guard against.
Diffs (updated)
-----
src/slave/process_isolator.cpp d8d940f069d596a1f0d14832270ca94d9f2b2314
src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
src/slave/slave.cpp 325231458a6883019436e7cc5a37f85f0f5735fa
src/slave/state.cpp e5c32257978d8407535e05ed73f8a50bdc2f651d
src/tests/slave_recovery_tests.cpp 7be332b74367869e2b4c847fea348535cc526a23
Diff: https://reviews.apache.org/r/10233/diff/
Testing
-------
make check
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*ShutdownSlave*" --verbose --gtest_repeat=100 --gtest_break_on_failure
Thanks,
Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/
-----------------------------------------------------------
(Updated April 16, 2013, 12:39 a.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
Addressed benm's comments.
Description
-------
See summary.
The crux is twofold:
1) Shut down a slave only after all of its executors have terminated.
2) Delete the "latest" symlink under /path/to/meta/slaves/ so that a slave that was shut down comes up as a new slave when restarted.
I had to refactor cleanup() because it turns out there are quite a few edge cases to guard against.
Diffs (updated)
-----
src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d
src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
src/slave/slave.cpp 325231458a6883019436e7cc5a37f85f0f5735fa
src/slave/state.cpp e5c32257978d8407535e05ed73f8a50bdc2f651d
src/tests/slave_recovery_tests.cpp d0ff9b73e06e89a5409f038be2766333e0a0689e
Diff: https://reviews.apache.org/r/10233/diff/
Testing
-------
make check
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*ShutdownSlave*" --verbose --gtest_repeat=100 --gtest_break_on_failure
Thanks,
Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
> On April 15, 2013, 1:06 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 292
> > <https://reviews.apache.org/r/10233/diff/3/?file=281061#file281061line292>
> >
> > Why this TODO if you've introduced the states in another change?
I used that to keep track.
> On April 15, 2013, 1:06 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 492
> > <https://reviews.apache.org/r/10233/diff/3/?file=281062#file281062line492>
> >
> > Just curious whether you're planning to implement the fix in another review?
Not yet. I don't like the timeout solution to be honest. If this becomes a problem in production, I will spend more time on a proper solution.
> On April 15, 2013, 1:06 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 520
> > <https://reviews.apache.org/r/10233/diff/3/?file=281062#file281062line520>
> >
> > You'll need to remove this check. You can also kill the Future argument.
I fixed this in a downstream review.
> On April 15, 2013, 1:06 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1715
> > <https://reviews.apache.org/r/10233/diff/3/?file=281062#file281062line1715>
> >
> > What's the fix?
As you might have seen, this is fixed in a downstream review.
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review19143
-----------------------------------------------------------
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review19143
-----------------------------------------------------------
src/slave/slave.hpp
<https://reviews.apache.org/r/10233/#comment39701>
Why this TODO if you've introduced the states in another change?
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39702>
Just curious whether you're planning to implement the fix in another review?
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39703>
s/,//
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39706>
You'll need to remove this check. You can also kill the Future argument.
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39704>
s/.//
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39705>
What's the fix?
- Ben Mahler
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/
-----------------------------------------------------------
(Updated April 15, 2013, 12:50 a.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
Addressed comments.
Description
-------
See summary.
The crux is twofold:
1) Shut down a slave only after all of its executors have terminated.
2) Delete the "latest" symlink under /path/to/meta/slaves/ so that a slave that was shut down comes up as a new slave when restarted.
I had to refactor cleanup() because it turns out there are quite a few edge cases to guard against.
Diffs (updated)
-----
src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d
src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
src/slave/slave.cpp 325231458a6883019436e7cc5a37f85f0f5735fa
src/slave/state.cpp e5c32257978d8407535e05ed73f8a50bdc2f651d
src/tests/slave_recovery_tests.cpp d0ff9b73e06e89a5409f038be2766333e0a0689e
Diff: https://reviews.apache.org/r/10233/diff/
Testing
-------
make check
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*ShutdownSlave*" --verbose --gtest_repeat=100 --gtest_break_on_failure
Thanks,
Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review19040
-----------------------------------------------------------
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39566>
TODO: Add a timeout
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39565>
utils::copy()
src/slave/slave.cpp
<https://reviews.apache.org/r/10233/#comment39564>
pull this out.
- Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/
-----------------------------------------------------------
(Updated April 9, 2013, 11:14 p.m.)
Review request for mesos, Benjamin Hindman and Ben Mahler.
Changes
-------
Fixed tests to use the latest abstractions.
Description
-------
See summary.
The crux is twofold:
1) Shut down a slave only after all of its executors have terminated.
2) Delete the "latest" symlink under /path/to/meta/slaves/ so that a slave that was shut down comes up as a new slave when restarted.
I had to refactor cleanup() because it turns out there are quite a few edge cases to guard against.
Diffs (updated)
-----
src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d
src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
src/slave/slave.cpp 325231458a6883019436e7cc5a37f85f0f5735fa
src/slave/state.cpp e5c32257978d8407535e05ed73f8a50bdc2f651d
src/tests/slave_recovery_tests.cpp d0ff9b73e06e89a5409f038be2766333e0a0689e
Diff: https://reviews.apache.org/r/10233/diff/
Testing
-------
make check
sudo GLOG_v=1 ./bin/mesos-tests.sh --gtest_filter="*ShutdownSlave*" --verbose --gtest_repeat=100 --gtest_break_on_failure
Thanks,
Vinod Kone
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Vinod Kone <vi...@gmail.com>.
> On April 8, 2013, 5:18 p.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 240
> > <https://reviews.apache.org/r/10233/diff/1/?file=277031#file277031line240>
> >
> > Seems odd to have a function called _terminate() that you're calling from:
> >
> > _initialize()
> > shutdown() // makes sense
> > cleanup(Framework)
> >
> > Seems counter-intuitive that _terminate() only terminates sometimes, and that it's being called in locations that don't seem associated with termination.
This is why _terminate(), which would have been called terminateIfNecessary() in the Java world, is called from multiple locations:
_initialize():
This is essentially the end of recovery. If the slave was started in cleanup mode and we didn't recover any frameworks, we want to terminate the slave.
shutdown():
This one, I think, is obvious.
cleanup(Framework):
Every time this function is called, a framework could potentially be removed from the frameworks struct. If no frameworks remain and the slave is being shut down or is in cleanup mode, the slave should terminate.
I agree that it's not very intuitive, and I am open to suggestions (regarding renaming the function(s) or the abstractions we have).
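The condition described above can be sketched as a single predicate. This is an illustration of the reasoning, not the code under review: SlaveView, its fields, and shouldTerminate() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical snapshot of the slave state that matters for termination.
struct SlaveView
{
  std::size_t frameworks; // Number of frameworks still tracked.
  bool shuttingDown;      // shutdown() has been called.
  bool cleanupMode;       // Slave was started in cleanup mode.
};

// True only when no frameworks remain AND the slave is either shutting
// down or running in cleanup mode.
bool shouldTerminate(const SlaveView& slave)
{
  return slave.frameworks == 0 &&
         (slave.shuttingDown || slave.cleanupMode);
}
```

Each of the three call sites would evaluate the same predicate, which is why a name along the lines of terminateIfNecessary() may describe the behavior better than _terminate().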
- Vinod
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review18783
-----------------------------------------------------------
Re: Review Request: Fixed the slave to wait for all executors to exit and
delete the "latest" slave symlink when shutting down.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10233/#review18783
-----------------------------------------------------------
src/slave/slave.hpp
<https://reviews.apache.org/r/10233/#comment39202>
Seems odd to have a function called _terminate() that you're calling from:
_initialize()
shutdown() // makes sense
cleanup(Framework)
Seems counter-intuitive that _terminate() only terminates sometimes, and that it's being called in locations that don't seem associated with termination.
- Ben Mahler