Posted to reviews@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2017/05/03 00:04:54 UTC

Re: Review Request 56895: Allow agents to recover slave state post a reboot.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/56895/#review173678
-----------------------------------------------------------




src/slave/slave.hpp
Lines 333 (patched)
<https://reviews.apache.org/r/56895/#comment246708>

    Can you add a comment on what this represents?



src/slave/slave.cpp
Lines 5957 (patched)
<https://reviews.apache.org/r/56895/#comment246709>

    s/info/info after a reboot/
    
    Also, please log the `future.failure()` here. Are we sure that a failed future here can only happen due to incompatible agent info?



src/slave/slave.cpp
Line 5954 (original), 5963 (patched)
<https://reviews.apache.org/r/56895/#comment246710>

    As discussed offline, we should continue by registering as a new agent instead of exiting here.
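
To illustrate the two slave.cpp comments above (log the failure reason, then fall back to registering as a new agent rather than exiting), here is a minimal, self-contained sketch. It is not the code from the patch: `RecoveryResult`, `NextStep`, and `afterRebootRecovery` are hypothetical names invented for illustration, and `RecoveryResult` is only a stand-in for the failed future discussed in the review.

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-in for the outcome of recovering agent state.
// This is NOT the libprocess Future type used in slave.cpp.
struct RecoveryResult
{
  bool failed;          // True if recovery did not succeed.
  std::string failure;  // Reason for the failure; meaningful only if `failed`.
};

// Hypothetical enum naming the two outcomes discussed in the review.
enum class NextStep
{
  REREGISTER_WITH_SAME_ID,  // Keep the SlaveID and re-register.
  REGISTER_AS_NEW_AGENT     // Drop the old identity and register afresh.
};

// Decides how to proceed once recovery after a reboot has completed.
NextStep afterRebootRecovery(const RecoveryResult& result)
{
  if (result.failed) {
    // Log the actual reason with a generic message: the failure may have
    // causes other than incompatible agent info.
    std::cerr << "Failed to recover agent state after a reboot: "
              << result.failure << "; registering as a new agent"
              << std::endl;

    // Continue as a new agent instead of exiting.
    return NextStep::REGISTER_AS_NEW_AGENT;
  }

  return NextStep::REREGISTER_WITH_SAME_ID;
}
```

The point of the sketch is only the control flow: a failure is reported with its reason and turned into a fallback, not a fatal exit.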



src/slave/state.hpp
Lines 310 (patched)
<https://reviews.apache.org/r/56895/#comment246711>

    s/hasRebooted/rebooted/


- Vinod Kone


On April 26, 2017, 6:16 p.m., Megha Sharma wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/56895/
> -----------------------------------------------------------
> 
> (Updated April 26, 2017, 6:16 p.m.)
> 
> 
> Review request for mesos, Neil Conway, Vinod Kone, and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-6223
>     https://issues.apache.org/jira/browse/MESOS-6223
> 
> 
> Repository: mesos
> 
> 
> Description
> -------
> 
> With partition awareness, agents are now allowed to re-register
> after they have been marked Unreachable. The executors on an agent
> are terminated anyway when it reboots, so there is no harm in
> letting the agent keep its SlaveID, re-register with the master,
> and reconcile the lost executors. This is a prerequisite for
> supporting persistent/restartable tasks in Mesos.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 77fb93abc701cd34b69c75b6219c219fdb784a67 
>   src/slave/slave.cpp 4ff522e75bc8de34fe2e7720bdd8ce3d32cbf803 
>   src/slave/state.hpp a497ce1f58fb8dc7718ee5bb10bc62dd7479efa5 
>   src/slave/state.cpp 33dcc7a148f9a6b1a3216cce45710da8fd819ba6 
>   src/tests/reservation_tests.cpp 4504831d77c1bfcf5f2ddf6d28cd45dea2c421ad 
>   src/tests/slave_recovery_tests.cpp 53f33a2b0411c8158326074ce043c7b1dbeef5b4 
> 
> 
> Diff: https://reviews.apache.org/r/56895/diff/4/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Megha Sharma
> 
>


Re: Review Request 56895: Allow agents to recover slave state post a reboot.

Posted by Megha Sharma <ms...@apple.com>.

> On May 3, 2017, 12:04 a.m., Vinod Kone wrote:
> > src/slave/slave.cpp
> > Lines 5957 (patched)
> > <https://reviews.apache.org/r/56895/diff/4/?file=1693973#file1693973line5957>
> >
> >     s/info/info after a reboot/
> >     
> >     Also, please log the `future.failure()` here. Are we sure that a failed future here can only happen due to incompatible agent info?

Indeed, the future can fail for reasons other than incompatible agent info, so I agree the message needs to be more generic.


- Megha

