Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/12/13 05:28:33 UTC
Review Request 16238: Updated the slave to not recover after a reboot.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16238/
-----------------------------------------------------------
Review request for mesos, Benjamin Hindman and Vinod Kone.
Bugs: MESOS-844
https://issues.apache.org/jira/browse/MESOS-844
Repository: mesos-git
Description
-------
See MESOS-844. This uses the boot ID to check whether the underlying machine has rebooted. If it has, we do not attempt recovery, because the PIDs are stale and machine information may have changed.
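For readers without the diff at hand, the check described above can be sketched roughly as follows. This is not the code under review: the helper names (readCheckpointedBootId, checkpointBootId, shouldRecover) and the one-line file layout are assumptions made for illustration, and skipping recovery when no boot ID was checkpointed is a policy chosen for this sketch only.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helper: reads the boot ID checkpointed by a previous run.
// Returns false if the file is missing, unreadable, or empty.
bool readCheckpointedBootId(const std::string& path, std::string* id)
{
  assert(id != nullptr);
  std::ifstream in(path.c_str());
  if (!in || !std::getline(in, *id)) {
    return false;
  }
  return !id->empty();
}

// Hypothetical helper: persists the current boot ID as a single line,
// overwriting any earlier checkpoint.
bool checkpointBootId(const std::string& path, const std::string& id)
{
  std::ofstream out(path.c_str(), std::ios::trunc);
  out << id << '\n';
  out.flush();
  return out.good();
}

// Recovery is attempted only when the checkpointed boot ID matches the
// current one. A mismatch means the machine rebooted, so checkpointed
// PIDs are stale. Assumption for this sketch: with no checkpoint at all,
// recovery is skipped as well.
bool shouldRecover(
    const std::string& checkpointPath,
    const std::string& currentBootId)
{
  std::string checkpointed;
  if (!readCheckpointedBootId(checkpointPath, &checkpointed)) {
    return false;
  }
  return checkpointed == currentBootId;
}
```

A caller would obtain the current boot ID from the operating system, call shouldRecover with the checkpoint path, and then checkpoint the current ID so the next start has something to compare against.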
Diffs
-----
src/slave/paths.hpp 8f80b931a896cd7e317658147e864410558d5028
src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5
src/slave/state.cpp bf267b5ea23a7427c7cb05ab4e98af2e44d9cc43
src/tests/slave_recovery_tests.cpp 250083d380bf6d0bdba09abd239a9139a402008f
Diff: https://reviews.apache.org/r/16238/diff/
Testing
-------
Added an integration test and ran make check.
I still need to clean up the boot time code to handle multiple slaves; the MultipleSlaves test currently fails:
[ RUN ] SlaveRecoveryProcessIsolatorTest/0.MultipleSlaves
F1212 20:26:35.767004 273121280 state.cpp:69] CHECK_SOME(id): No boot time found
*** Check failure stack trace: ***
@ 0x10f35ea18 google::LogMessage::Flush()
@ 0x10f3600d1 google::LogMessageFatal::~LogMessageFatal()
@ 0x10e410a36 _CheckSome::~_CheckSome()
@ 0x10f0eb6a1 mesos::internal::slave::state::recover()
@ 0x10f13924f process::AsyncExecutorProcess::execute<>()
@ 0x10f11dd4f std::tr1::_Mem_fn<>::operator()()
@ 0x10f11ddcb std::tr1::_Bind<>::operator()<>()
@ 0x10f11de1c std::tr1::_Function_handler<>::_M_invoke()
@ 0x10f138fd3 process::internal::rdispatcher<>()
@ 0x10f11fe8b std::tr1::_Bind<>::operator()<>()
@ 0x10f11ffc8 std::tr1::_Function_handler<>::_M_invoke()
Re-registered executor on tw-172-25-24-157.office.twttr.net
@ 0x10f2850d0 process::ProcessManager::resume()
@ 0x10f285ad8 process::schedule()
@ 0x7fff8bd05772 _pthread_start
@ 0x7fff8bcf21a1 thread_start
Thanks,
Ben Mahler
Re: Review Request 16238: Updated the slave to not recover after a reboot.
Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16238/#review30712
-----------------------------------------------------------
Ship it!
src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/16238/#comment58809>
Indentation?
- Benjamin Hindman
On Dec. 18, 2013, 2:16 a.m., Ben Mahler wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/16238/
> -----------------------------------------------------------
>
> (Updated Dec. 18, 2013, 2:16 a.m.)
>
>
> Review request for mesos, Benjamin Hindman and Vinod Kone.
>
>
> Bugs: MESOS-844
> https://issues.apache.org/jira/browse/MESOS-844
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See MESOS-844. This uses the boot ID to check whether the underlying machine has rebooted. If it has, we do not attempt recovery, because the PIDs are stale and machine information may have changed.
>
>
> Diffs
> -----
>
> src/slave/paths.hpp 8f80b931a896cd7e317658147e864410558d5028
> src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5
> src/slave/state.cpp bf267b5ea23a7427c7cb05ab4e98af2e44d9cc43
> src/tests/slave_recovery_tests.cpp 250083d380bf6d0bdba09abd239a9139a402008f
>
> Diff: https://reviews.apache.org/r/16238/diff/
>
>
> Testing
> -------
>
> Added an integration test and ran make check.
>
>
> Thanks,
>
> Ben Mahler
>
>
Re: Review Request 16238: Updated the slave to not recover after a reboot.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16238/
-----------------------------------------------------------
(Updated Dec. 19, 2013, 11:43 p.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
Used 'ID' in comments and removed the .txt file extension.
Bugs: MESOS-844
https://issues.apache.org/jira/browse/MESOS-844
Repository: mesos-git
Description
-------
See MESOS-844. This uses the boot ID to check whether the underlying machine has rebooted. If it has, we do not attempt recovery, because the PIDs are stale and machine information may have changed.
Diffs (updated)
-----
src/slave/paths.hpp 8f80b931a896cd7e317658147e864410558d5028
src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5
src/slave/state.cpp bf267b5ea23a7427c7cb05ab4e98af2e44d9cc43
src/tests/slave_recovery_tests.cpp 250083d380bf6d0bdba09abd239a9139a402008f
Diff: https://reviews.apache.org/r/16238/diff/
Testing
-------
Added an integration test and ran make check.
Thanks,
Ben Mahler
Re: Review Request 16238: Updated the slave to not recover after a reboot.
Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/16238/
-----------------------------------------------------------
(Updated Dec. 18, 2013, 2:16 a.m.)
Review request for mesos, Benjamin Hindman and Vinod Kone.
Changes
-------
This now works correctly on OS X since sysctl is used in favor of utmpx.
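As a rough illustration of the sysctl approach mentioned above (a sketch, not the code in the diff): on OS X the kern.boottime sysctl returns the boot time as a timeval, which can serve as a boot ID. The function name currentBootId is hypothetical, and the Linux branch, which reads the kernel's per-boot UUID from /proc/sys/kernel/random/boot_id, is an assumption added here so the example builds on both platforms.

```cpp
#include <cassert>
#include <fstream>
#include <string>

#ifdef __APPLE__
#include <sys/types.h>
#include <sys/sysctl.h>
#include <sys/time.h>
#endif

// Hypothetical helper: fills `id` with a value that changes on every boot.
// Returns false if the value could not be obtained.
bool currentBootId(std::string* id)
{
  assert(id != nullptr);
#ifdef __APPLE__
  // OS X: kern.boottime holds the boot time as a struct timeval.
  // The seconds field is used as the ID in this sketch.
  struct timeval boot;
  size_t size = sizeof(boot);
  int mib[2] = {CTL_KERN, KERN_BOOTTIME};
  if (sysctl(mib, 2, &boot, &size, nullptr, 0) != 0) {
    return false;
  }
  *id = std::to_string(static_cast<long long>(boot.tv_sec));
  return true;
#else
  // Linux (assumption for this sketch): the kernel exposes a random
  // UUID that is regenerated on each boot.
  std::ifstream in("/proc/sys/kernel/random/boot_id");
  if (!in || !std::getline(in, *id)) {
    return false;
  }
  return !id->empty();
#endif
}
```

Two calls within the same boot should yield the same string, which is the property the recovery check relies on.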
Bugs: MESOS-844
https://issues.apache.org/jira/browse/MESOS-844
Repository: mesos-git
Description
-------
See MESOS-844. This uses the boot ID to check whether the underlying machine has rebooted. If it has, we do not attempt recovery, because the PIDs are stale and machine information may have changed.
Diffs
-----
src/slave/paths.hpp 8f80b931a896cd7e317658147e864410558d5028
src/slave/slave.cpp 6e6107e3551f29566ff233dad47dac8c29b4fab5
src/slave/state.cpp bf267b5ea23a7427c7cb05ab4e98af2e44d9cc43
src/tests/slave_recovery_tests.cpp 250083d380bf6d0bdba09abd239a9139a402008f
Diff: https://reviews.apache.org/r/16238/diff/
Testing (updated)
-------
Added an integration test and ran make check.
Thanks,
Ben Mahler