You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/12/25 09:22:54 UTC

Review Request: Slave Restart (Part 9): Recover executors and isolation module

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp 3ce8081edba669157a14b815503322ea8d367d43 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 0f2975d1adf874dba4d0a539eb5c99233cef6e6b 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 22, 2013, 2:41 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 969
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line969>
> >
> >     Also, need to identify to the master that we're cleaning up so that it doesn't try to run any tasks on this slave.

Actually, looking at the master, slave not registering with master, when in cleanup mode, should do the trick. No need to add more complexity!


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review16924
-----------------------------------------------------------


On Feb. 20, 2013, 2:10 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review16924
-----------------------------------------------------------



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35897>

    Just say unacknowledged, here and everywhere else.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment35898>

    We should be changing the other ones to CopyFrom rather than changing these ones to MergeFrom.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment35899>

    Ditto.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment35903>

    Also, need to identify to the master that we're cleaning up so that it doesn't try to run any tasks on this slave.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment35902>

    s/funcRecoverExecutors/recoverExecutors/



src/slave/state.cpp
<https://reviews.apache.org/r/8762/#comment35900>

    Tell peeps why you had to do this.



src/slave/state.cpp
<https://reviews.apache.org/r/8762/#comment35901>

    Ditto.


- Benjamin Hindman


On Feb. 20, 2013, 2:10 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/exec/exec.cpp, line 348
> > <https://reviews.apache.org/r/8762/diff/6/?file=260150#file260150line348>
> >
> >     Should 'slaveId' be an option? I don't really like the .IsInitialized semantics.

fair enough.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/exec/exec.cpp, line 398
> > <https://reviews.apache.org/r/8762/diff/6/?file=260150#file260150line398>
> >
> >     Why this change?

reverted.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/exec/exec.cpp, line 432
> > <https://reviews.apache.org/r/8762/diff/6/?file=260150#file260150line432>
> >
> >     At this point in the review I still don't exactly know how/why you're saving all the tasks. I see you do a duplicate check but that seems a bit crazy to save the tasks. Also, sometimes you refer to the tasks as "unacknowledged" and sometimes you refer to them as "un-updated" ... some more comments here are probably necessary.

added a comment. note that this map is going to be very small, because we remove a task from this map as soon as we get an ACK for it.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/exec/exec.cpp, line 556
> > <https://reviews.apache.org/r/8762/diff/6/?file=260150#file260150line556>
> >
> >     Doing numify<bool> seems a little weird here. Why not just:
> >     
> >     checkpoint = string(value) == "1";
> >     
> >     Or something like it? Likewise, if you replace all of the getenv calls here with their modern os::getenv versions then you can get a string back instead of a char*.

sure. also, error checking is free with os::getenv(), which is great.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/cgroups_isolation_module.cpp, line 920
> > <https://reviews.apache.org/r/8762/diff/6/?file=260156#file260156line920>
> >
> >     Not your bug, but s/tag/uuid/g. Also, I'm much more inclined to always store UUIDs as the UUID type in code and only use toString to pretty print them. Then people don't have to think about what the representation of this UUID might be given that a string could either be the pretty-printed version or just the bytes. This also isn't really your bug, but we should fix this. Note that when we use the UUID to construct the cgroup name or directories then it's totally fine to use the pretty-printed version, let's just do it there not here.

sgtm. fixed.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 401
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line401>
> >
> >     I think we need a more descriptive name for 'active' (something more along the lines of shutdown like 'halt' or 'halted' or 'halting'.

i like halting.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1122
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line1122>
> >
> >     Maybe we should do this first? Before we handle status updates or relaunch any tasks?

Sure.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1152
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line1152>
> >
> >     What "event" does this deliver? How is it used? Some comments here are necessary. Finally, who owns the promise? Is this is a memory leak?

Added a comment. And yes, this is a memory leak! Thank you. Changed 'recovered' from future to promise to fix this.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1501
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line1501>
> >
> >     s/:/,/

n/a as i got rid of this logic altogether.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1726
> > <https://reviews.apache.org/r/8762/diff/6/?file=260166#file260166line1726>
> >
> >     So, we have the isolation module recover this executor even if we're just killing it? Probably needs some more comments about this interplay.

Isolation module recovery basically entails getting enough information to interact with an executor (change resources, monitor usage, killing executor) etc.

Added a comment.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.cpp, line 71
> > <https://reviews.apache.org/r/8762/diff/6/?file=260169#file260169line71>
> >
> >     Are there other kinds of directories a recover function would need?

No.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/lxc_isolation_module.cpp, line 116
> > <https://reviews.apache.org/r/8762/diff/6/?file=260162#file260162line116>
> >
> >     Just elide the paramater name entirely.

I thought the convention of unused params is to use "_" ?  I see this in another places too.


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 222
> > <https://reviews.apache.org/r/8762/diff/6/?file=260165#file260165line222>
> >
> >     Comment or add a TODO on why const&.

not needed anymore after your refactor of future.then().


> On Feb. 23, 2013, 11:49 p.m., Benjamin Hindman wrote:
> > src/tests/slave_recovery_tests.cpp, line 185
> > <https://reviews.apache.org/r/8762/diff/6/?file=260170#file260170line185>
> >
> >     Is there a way to just pass in the type T and then instantiate that directly? I think Thomas might have done something like this in allocator tests. If we can do without using the factory here that will save someone time when we eliminate these factories for factory functions.

Done. Its too bad that to access templatized base class members you need to use "this->" in C++.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review16990
-----------------------------------------------------------


On Feb. 20, 2013, 2:10 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review16990
-----------------------------------------------------------



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35992>

    s/install </install</



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35993>

    Your error message here makes it sound like this case could happen? Why is this a check then? If this case shouldn't ever happen (because you programmed it that way) then you'll want an error message more along the lines of "Unexpected duplicated task ...".



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35995>

    Should 'slaveId' be an option? I don't really like the .IsInitialized semantics.



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35994>

    s/re-connect/reconnect/



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35996>

    I tend to put spaces around my elipsis.



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35997>

    Why this change?



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35998>

    At this point in the review I still don't exactly know how/why you're saving all the tasks. I see you do a duplicate check but that seems a bit crazy to save the tasks. Also, sometimes you refer to the tasks as "unacknowledged" and sometimes you refer to them as "un-updated" ... some more comments here are probably necessary.



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment35999>

    Or you could remove the other old style comments and put a period at the end of this comment. ;) 



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment36000>

    Doing numify<bool> seems a little weird here. Why not just:
    
    checkpoint = string(value) == "1";
    
    Or something like it? Likewise, if you replace all of the getenv calls here with their modern os::getenv versions then you can get a string back instead of a char*. 



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36002>

    .



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36001>

    Not your bug, but s/tag/uuid/g. Also, I'm much more inclined to always store UUIDs as the UUID type in code and only use toString to pretty print them. Then people don't have to think about what the representation of this UUID might be given that a string could either be the pretty-printed version or just the bytes. This also isn't really your bug, but we should fix this. Note that when we use the UUID to construct the cgroup name or directories then it's totally fine to use the pretty-printed version, let's just do it there not here.



src/slave/lxc_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36003>

    Just elide the paramater name entirely.



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36004>

    Ditto.



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36005>

    Ditto.



src/slave/slave.hpp
<https://reviews.apache.org/r/8762/#comment36006>

    Comment or add a TODO on why const&.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36007>

    Single quote to make it clear what is "unknown".



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36008>

    s/shutdown/shut down/g



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36009>

    I think we need a more descriptive name for 'active' (something more along the lines of shutdown like 'halt' or 'halted' or 'halting'.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36010>

    Your spelling fix is incorrect.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36011>

    s/it is/it was/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36012>

    s/Re-l/rel/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36013>

    s/Re-l/Rel/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36014>

    Maybe we should do this first? Before we handle status updates or relaunch any tasks?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36027>

    What "event" does this deliver? How is it used? Some comments here are necessary. Finally, who owns the promise? Is this is a memory leak?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36015>

    s/:/,/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36022>

    Also, add a todo here to just embed the defer directly in the 'then' below (I have a review forthcoming for that!).



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36016>

    Don't seem like very useful temporaries to me.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36017>

    With naming conflicts like this, I'd prefer s/fw/f/.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36018>

    Same here, although s/exec/e/ is rather unfortunate. I think we probably want to change the name of framework and executor above. If you do that I could see pulling out more temporaries above too (in this case it makes more sense, the temporaries you pulled out earlier didn't really make sense).



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36019>

    Comment doesn't make sense.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36020>

    So, we have the isolation module recover this executor even if we're just killing it? Probably needs some more comments about this interplay.



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36021>

    s/un-registered/unregistered/



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8762/#comment36023>

    Are there other kinds of directories a recover function would need?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8762/#comment36024>

    s/",t/", t/



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8762/#comment36026>

    Is there a way to just pass in the type T and then instantiate that directly? I think Thomas might have done something like this in allocator tests. If we can do without using the factory here that will save someone time when we eliminate these factories for factory functions.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8762/#comment36025>

    This is not Python.


- Benjamin Hindman


On Feb. 20, 2013, 2:10 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 20, 2013, 2:10 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.

> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/exec/exec.cpp, line 393
> > <https://reviews.apache.org/r/8762/diff/7/?file=263157#file263157line393>
> >
> >     Can we maybe spell this out in the public executor interfaces? Considering how harsh this is, I like the harshness if we have this contract, as people will _definitely_ discover they are doing this.
> >     
> >     One would probably assume the driver will store messages for later delivery if sent pre-registration. That would be a nicer contract than doing this, but I see the added complexity =/

Decided to pass SlaveID through environment. So, no need for enforcing this in the contract. Also, now, we actually buffer status updates.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/exec/exec.cpp, line 539
> > <https://reviews.apache.org/r/8762/diff/7/?file=263157#file263157line539>
> >
> >     Were you going to add slave id to these?
> >     
> >     While you're at it: https://issues.apache.org/jira/browse/MESOS-334
> >     
> >     Or maybe let's not do too many things in this change..

done.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 415-416
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line415>
> >
> >     given the context here, "terminating" seems more appropriate
> >     
> >     you would set terminating to true, and then you call terminate, makes more sense IMO

"terminating" would be confusing, because intuitively it sounds like someone called 'terminate()'.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1115
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1115>
> >
> >     Does this imply the StatusUpdateManager provides idempotent primitives in general?

yes.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1151
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1151>
> >
> >     Kill "If we are here, ", I know the comment is for this spot in the code because the comment was written in this spot for a reason! ;)

i like the emphasis.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1153
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1153>
> >
> >     Isn't the reaper only for the process isolation module?

nope.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1702
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1702>
> >
> >     quotes on executor id but not framework id?

as i explained in an earlier review, this has been the convention, for better or worse.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1753
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1753>
> >
> >     s/by/for/?

for tasks, by isolation module.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1797
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1797>
> >
> >     else?

else, no action needed. 


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/tests/slave_recovery_tests.cpp, line 616
> > <https://reviews.apache.org/r/8762/diff/7/?file=263177#file263177line616>
> >
> >     Line wrap at 70

?


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/cgroups_isolation_module.cpp, line 620
> > <https://reviews.apache.org/r/8762/diff/7/?file=263163#file263163line620>
> >
> >     Can you split off the killed case in terms of the failed message?
> >     
> >     I see unknown as more serious than killed (since killed can happen due to known races).

I don't see the benefit, considering the fact that a killed executor becomes unknown as soon as it gets cleaned up. Also note, that we use the same phrasing across a bunch of functions, across isolation modules.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/process_based_isolation_module.cpp, line 329
> > <https://reviews.apache.org/r/8762/diff/7/?file=263171#file263171line329>
> >
> >     ditto on splitting the failure message

see above.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review17260
-----------------------------------------------------------


On Feb. 27, 2013, 7:59 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2013, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
>   src/slave/cgroups_isolation_module.cpp a779de80d13c67e507d7d2ee788fcdaa5e71daca 
>   src/slave/constants.hpp 08db75f3e5af0ae79298546386deddf00beb8de9 
>   src/slave/constants.cpp 159dd12ec1e6b5f33fd75fc3a66d8ac897fc40b7 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
>   src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
>   src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
>   src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp 8c2e1bfc363491c681177676f9dfe5f229276f7d 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp 4600c2f6e0ceabd6a38f8bb762da314b795c23a0 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Ben Mahler <be...@gmail.com>.

> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/cgroups_isolation_module.cpp, line 620
> > <https://reviews.apache.org/r/8762/diff/7/?file=263163#file263163line620>
> >
> >     Can you split off the killed case in terms of the failed message?
> >     
> >     I see unknown as more serious than killed (since killed can happen due to known races).
> 
> Vinod Kone wrote:
>     I don't see the benefit, considering the fact that a killed executor becomes unknown as soon as it gets cleaned up. Also note, that we use the same phrasing across a bunch of functions, across isolation modules.

That's no reason to continue the bad habit! ;)

This would be useful for forensics IMO


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/process_based_isolation_module.cpp, line 329
> > <https://reviews.apache.org/r/8762/diff/7/?file=263171#file263171line329>
> >
> >     ditto on splitting the failure message
> 
> Vinod Kone wrote:
>     see above.

Why wouldn't this be useful for forensics?


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, lines 415-416
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line415>
> >
> >     given the context here, "terminating" seems more appropriate
> >     
> >     you would set terminating to true, and then you call terminate, makes more sense IMO
> 
> Vinod Kone wrote:
>     "terminating" would be confusing, because intuitively it sounds like someone called 'terminate()'.

But you're immediately calling terminate(), so what you're saying it sounds like is what we want ;)


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1151
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1151>
> >
> >     Kill "If we are here, ", I know the comment is for this spot in the code because the comment was written in this spot for a reason! ;)
> 
> Vinod Kone wrote:
>     i like the emphasis.

Can you explain why? It's unnecessary less succinct. I feel like if we apply it here, there's nothing to stop us from applying it everywhere?


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1702
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1702>
> >
> >     quotes on executor id but not framework id?
> 
> Vinod Kone wrote:
>     as i explained in an earlier review, this has been the convention, for better or worse.

fair enough, I guess we'll start killing them when we sanitize executor ids


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1733
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1733>
> >
> >     s/Create executor/Create the executor entry/

Why did you drop? I think either correct it or kill the comment? Currently it's not a sentence.


> On March 1, 2013, 8:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1721
> > <https://reviews.apache.org/r/8762/diff/7/?file=263174#file263174line1721>
> >
> >     s/framework/the framework entry/

Why did you drop? I think either correct it or kill the comment? Currently it's not a sentence.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review17260
-----------------------------------------------------------


On March 4, 2013, 1:26 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated March 4, 2013, 1:26 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
>   src/slave/cgroups_isolation_module.cpp a779de80d13c67e507d7d2ee788fcdaa5e71daca 
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
>   src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
>   src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
>   src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
>   src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review17260
-----------------------------------------------------------

Ship it!


Great tests, pushes me to write better tests for my components! (usage() on the isolation modules in untested from my monitoring changes =/)

Also, I thought you were going to pass in the SlaveID through the environment?


src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment36696>

    Can we maybe spell this out in the public executor interfaces? Considering how harsh this is, I like the harshness if we have this contract, as people will _definitely_ discover they are doing this.
    
    One would probably assume the driver will store messages for later delivery if sent pre-registration. That would be a nicer contract than doing this, but I see the added complexity =/



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment36697>

    ditto



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment36693>

    s/those//
    s/haven't/have not



src/exec/exec.cpp
<https://reviews.apache.org/r/8762/#comment36699>

    Were you going to add slave id to these?
    
    While you're at it: https://issues.apache.org/jira/browse/MESOS-334
    
    Or maybe let's not do too many things in this change..



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36702>

    Can you split off the killed case in terms of the failed message?
    
    I see unknown as more serious than killed (since killed can happen due to known races).



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36703>

    No need for a comment on this in the .cpp file since it's in the header, right?



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36704>

    would be awful nice to pass that option through directly ;)



src/slave/flags.hpp
<https://reviews.apache.org/r/8762/#comment36705>

    great



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment36708>

    ditto on splitting the failure message



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36713>

    given the context here, "terminating" seems more appropriate
    
    you would set terminating to true, and then you call terminate, makes more sense IMO



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36714>

    empty()?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36716>

    Does this imply the StatusUpdateManager provides idempotent primitives in general?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36717>

    Kill "If we are here, ", I know the comment is for this spot in the code because the comment was written in this spot for a reason! ;)



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36718>

    comma after exited



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36719>

    Isn't the reaper only for the process isolation module?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36720>

    quotes on executor id but not framework id?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36722>

    s/recovering/recovery of/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36723>

    ditto



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36725>

    s/framework/the framework entry/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36727>

    s/Create executor/Create the executor entry/



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36728>

    kill newline?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36724>

    ditto



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36729>

    s/,//



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36730>

    s/by/for/?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36732>

    else?



src/slave/slave.cpp
<https://reviews.apache.org/r/8762/#comment36733>

    s/Setup the monitoring./Begin monitoring the executor./



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8762/#comment36734>

    Line wrap at 70


- Ben Mahler


On Feb. 27, 2013, 7:59 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated Feb. 27, 2013, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
>   src/slave/cgroups_isolation_module.cpp a779de80d13c67e507d7d2ee788fcdaa5e71daca 
>   src/slave/constants.hpp 08db75f3e5af0ae79298546386deddf00beb8de9 
>   src/slave/constants.cpp 159dd12ec1e6b5f33fd75fc3a66d8ac897fc40b7 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
>   src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
>   src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
>   src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp 8c2e1bfc363491c681177676f9dfe5f229276f7d 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp 4600c2f6e0ceabd6a38f8bb762da314b795c23a0 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 9): Recover executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated March 13, 2013, 6:14 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 11244802b3210ef1a6900b978faf8bbcaa00266c 
  src/slave/cgroups_isolation_module.cpp 9395d9cd6cf2ac7a720480b778836eb1d704e00d 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated March 12, 2013, 6:15 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

minor fixes based on offline review. no need for review.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 11244802b3210ef1a6900b978faf8bbcaa00266c 
  src/slave/cgroups_isolation_module.cpp 9395d9cd6cf2ac7a720480b778836eb1d704e00d 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated March 12, 2013, 12:51 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Reverted summary.


Summary (updated)
-----------------

Slave Restart (Part 9): Recover executors.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 11244802b3210ef1a6900b978faf8bbcaa00266c 
  src/slave/cgroups_isolation_module.cpp 9395d9cd6cf2ac7a720480b778836eb1d704e00d 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Recover executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/#review17705
-----------------------------------------------------------



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment37601>

    more info in the comment.



src/slave/cgroups_isolation_module.cpp
<https://reviews.apache.org/r/8762/#comment37602>

    return Future::failed().


- Vinod Kone


On March 11, 2013, 5:54 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8762/
> -----------------------------------------------------------
> 
> (Updated March 11, 2013, 5:54 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Properly recovers and reconnects with executors.
> 
> Also, recovers isolation module.
> 
> This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
>   src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
>   src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 11244802b3210ef1a6900b978faf8bbcaa00266c 
>   src/slave/cgroups_isolation_module.cpp 9395d9cd6cf2ac7a720480b778836eb1d704e00d 
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
>   src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
>   src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
>   src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
>   src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
>   src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
>   src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
>   src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 
> 
> Diff: https://reviews.apache.org/r/8762/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Recover executors.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated March 11, 2013, 5:54 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Re-based off of https://reviews.apache.org/r/9847/.

Changed isolation module recovery to do the recovery of all executors at once.

Fixed the tests.


Summary (updated)
-----------------

Recover executors.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 11244802b3210ef1a6900b978faf8bbcaa00266c 
  src/slave/cgroups_isolation_module.cpp 9395d9cd6cf2ac7a720480b778836eb1d704e00d 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated March 4, 2013, 1:26 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's/


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
  src/slave/cgroups_isolation_module.cpp a779de80d13c67e507d7d2ee788fcdaa5e71daca 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/constants.cpp ecb96fd42afdb74aca394411c6ff60276f509cfd 
  src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Feb. 27, 2013, 7:59 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's comments and rebased off trunk.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp a04fc46b15d2741886f5847cadbdc9bed351c588 
  src/slave/cgroups_isolation_module.cpp a779de80d13c67e507d7d2ee788fcdaa5e71daca 
  src/slave/constants.hpp 08db75f3e5af0ae79298546386deddf00beb8de9 
  src/slave/constants.cpp 159dd12ec1e6b5f33fd75fc3a66d8ac897fc40b7 
  src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
  src/slave/isolation_module.hpp d7cc02b8ea0cde591ee61059efd79fbacaa74769 
  src/slave/lxc_isolation_module.hpp 6be99038fb547e43e74ef5469f2339bbe82017cf 
  src/slave/lxc_isolation_module.cpp 1ed3c87f884089e032bba2ed89be109abe6c1c13 
  src/slave/process_based_isolation_module.hpp 6a4e6aef9ab4b13cce9341f6f35c6b24e6f8fde7 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/reaper.cpp c0ee4b4c07fd792bcb39455b666808b712eb32c2 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp 8c2e1bfc363491c681177676f9dfe5f229276f7d 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp 4600c2f6e0ceabd6a38f8bb762da314b795c23a0 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Feb. 20, 2013, 2:10 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

re-enabled the disabled tests, now that we allow partial recovery!


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 8:07 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Updated diff with upstream changes.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 14f549edaf1b37a6bca8f75309864333ae775e7c 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Feb. 5, 2013, 8:25 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk and upstream changes.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 63cefc33cf34eebb82db5d8448b751be8652fa36 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Jan. 29, 2013, 11:08 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

changes based on addressing some upstream reviews.

NOTE: I had to disable couple of slave recovery tests, because we no longer consider the recovery non-fatal in the cases setup by those tests.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp f2d68c5fd1201b75c5a8f96599d7fb36516687fb 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 63cefc33cf34eebb82db5d8448b751be8652fa36 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 9): Recover executors and isolation module

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8762/
-----------------------------------------------------------

(Updated Jan. 4, 2013, 10:35 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

fix for a corner case in executor driver regarding  un-acked tasks.


Description
-------

Properly recovers and reconnects with executors.

Also, recovers isolation module.

This is pretty much the whole of recovery! (I'm going to send out another tiny review for properly doing incompatible upgrades)


Diffs (updated)
-----

  src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
  src/launcher/launcher.hpp ead58c0f60b6bff532f241a224b547b25e292175 
  src/launcher/launcher.cpp 3ce8081edba669157a14b815503322ea8d367d43 
  src/launcher/main.cpp e90df85ae433431defdd251490c74a482a58f743 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
  src/slave/cgroups_isolation_module.cpp 0f2975d1adf874dba4d0a539eb5c99233cef6e6b 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/constants.cpp 1735a6b55a93e6537a5a119e5345961f3d84a000 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8762/diff/


Testing
-------

make check


Thanks,

Vinod Kone