Posted to reviews@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2017/05/03 00:04:54 UTC
Re: Review Request 56895: Allow agents to recover slave state post a reboot.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56895/#review173678
-----------------------------------------------------------
src/slave/slave.hpp
Lines 333 (patched)
<https://reviews.apache.org/r/56895/#comment246708>
Can you add a comment on what this represents?
src/slave/slave.cpp
Lines 5957 (patched)
<https://reviews.apache.org/r/56895/#comment246709>
s/info/info after a reboot/
Also, please log `future.failure()` here. Are we sure that a failed future here can only be caused by incompatible agent info?
src/slave/slave.cpp
Line 5954 (original), 5963 (patched)
<https://reviews.apache.org/r/56895/#comment246710>
As discussed offline, we should continue by registering as a new agent instead of exiting here.
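To make the two comments on `src/slave/slave.cpp` concrete, here is a minimal sketch of the suggested behavior. It is not the code from the patch: `RecoveryResult`, `Action`, `Decision`, and `onRecovery` are hypothetical names used only for illustration, and exiting on a failure that does not follow a reboot is an assumption about the existing behavior.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the outcome of agent recovery; not a Mesos type.
struct RecoveryResult
{
  bool failed;          // True if the recovery future failed.
  std::string failure;  // Reason carried by the failed future.
};

// Hypothetical set of actions the agent could take once recovery finishes.
enum class Action { REREGISTER, REGISTER_AS_NEW, EXIT };

struct Decision
{
  Action action;
  std::string log;  // Message the agent would log.
};

Decision onRecovery(const RecoveryResult& result, bool rebooted)
{
  if (!result.failed) {
    return Decision{Action::REREGISTER, "Recovered checkpointed agent state"};
  }

  // A failed future is expected to carry a reason.
  assert(!result.failure.empty());

  // Log the underlying reason instead of assuming that the only possible
  // cause is incompatible agent info.
  std::string log = "Failed to recover agent state";
  if (rebooted) {
    log += " after a reboot";
  }
  log += ": " + result.failure;

  // After a reboot, fall back to registering as a new agent rather than
  // exiting. Exiting in the non-reboot case is an assumption of this sketch.
  return Decision{rebooted ? Action::REGISTER_AS_NEW : Action::EXIT, log};
}
```

Calling `onRecovery` with a failed result and `rebooted` set to true yields `REGISTER_AS_NEW` together with a log message that includes the failure reason, whatever that reason is.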
src/slave/state.hpp
Lines 310 (patched)
<https://reviews.apache.org/r/56895/#comment246711>
s/hasRebooted/rebooted/
- Vinod Kone
On April 26, 2017, 6:16 p.m., Megha Sharma wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56895/
> -----------------------------------------------------------
>
> (Updated April 26, 2017, 6:16 p.m.)
>
>
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
>
>
> Bugs: MESOS-6223
> https://issues.apache.org/jira/browse/MESOS-6223
>
>
> Repository: mesos
>
>
> Description
> -------
>
> With partition awareness, agents are now allowed to re-register
> after they have been marked Unreachable. The executors are terminated
> on the agent when it reboots anyway, so there is no harm in
> letting the agent keep its SlaveID, re-register with the master,
> and reconcile the lost executors. This is a prerequisite for
> supporting persistent/restartable tasks in Mesos.
>
>
> Diffs
> -----
>
> src/slave/slave.hpp 77fb93abc701cd34b69c75b6219c219fdb784a67
> src/slave/slave.cpp 4ff522e75bc8de34fe2e7720bdd8ce3d32cbf803
> src/slave/state.hpp a497ce1f58fb8dc7718ee5bb10bc62dd7479efa5
> src/slave/state.cpp 33dcc7a148f9a6b1a3216cce45710da8fd819ba6
> src/tests/reservation_tests.cpp 4504831d77c1bfcf5f2ddf6d28cd45dea2c421ad
> src/tests/slave_recovery_tests.cpp 53f33a2b0411c8158326074ce043c7b1dbeef5b4
>
>
> Diff: https://reviews.apache.org/r/56895/diff/4/
>
>
> Testing
> -------
>
> make check
>
>
> Thanks,
>
> Megha Sharma
>
>
Re: Review Request 56895: Allow agents to recover slave state post a reboot.
Posted by Megha Sharma <ms...@apple.com>.
> On May 3, 2017, 12:04 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp
> > Lines 5957 (patched)
> > <https://reviews.apache.org/r/56895/diff/4/?file=1693973#file1693973line5957>
> >
> > s/info/info after a reboot/
> >
> > Also, please log `future.failure()` here. Are we sure that a failed future here can only be caused by incompatible agent info?
Indeed, the future can fail for reasons other than incompatible agent info, so I agree the message needs to be more generic.
- Megha