You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/12/19 05:42:18 UTC

Review Request: Slave Restart (Part 8): Recover slave state

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Summary (updated)
-----------------

Slave Restart (Part 8): Recover slave state


Description (updated)
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8680/diff/


Testing (updated)
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated March 13, 2013, 6:13 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/common/type_utils.hpp 66c2dc17655123641433c078931acc9e769d5077 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated March 3, 2013, 10:33 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
  src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 242
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line242>
> >
> >     Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
> >     
> >     This is extremely brittle to changes in the protobufs!
> >     
> >     I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?

As discussed offline, we need this for properly comparing resources. Included source and name fields in the comparison.

Audited the file to confirm everything is up to date.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 293
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line293>
> >
> >     Where did you use this in this review?
> >     What is Task? I don't see Task in the protos.
> >     
> >     Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!

I've used this in 'Recover Executors' review. Not sure why I added this in this review. I'm going to keep it as is, to avoid more work for me.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/

huh?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 113
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line113>
> >
> >     I think it's clearer to have the first two lines as:
> >     
> >     "Enable recovery. Possible recovery techniques:"
> >     ...
> >     
> >     On that note, this seems more appropriate to be named as 'recovery', because:
> >     --recovery=reconnect makes more sense than
> >     --recover=reconnect
> >     
> >     Given this is asking for the recovery technique rather than just whether or not to recover.

These semantics are going to change in the next review, so I will fix it and the related ones (below) then.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 265
> > <https://reviews.apache.org/r/8680/diff/9/?file=261422#file261422line265>
> >
> >     // Whether recovery is completed (including reconciling
> >     // with old executors).

Going to change in the later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 428
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line428>
> >
> >     How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?

The latter.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 484
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line484>
> >
> >     s/Skipping/Postponing
> >     
> >     Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?

Will address in later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 127
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line127>
> >
> >     instead of this can you just have locally scoped message strings?

i don't like it because, it makes some of the "message=" statements 3 lines instead of 2.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.

i can't see why fd wont be restored?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.

Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Ben Mahler <be...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/
> 
> Vinod Kone wrote:
>     huh?

As in, apply the replacement. So:

"reconnect:
"  reconnect:

This will help readability on the command line.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 116
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line116>
> >
> >     indent

ditto


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.
> 
> Vinod Kone wrote:
>     i can't see why fd wont be restored?

because the seek to restore it can fail as well


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.
> 
> Vinod Kone wrote:
>     Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.

ok, just keep in mind the idiomatic way benh and I have been following is:

Try<Nothing> foo = os::foo();

Matching the call name.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On March 3, 2013, 10:33 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated March 3, 2013, 10:33 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------

Ship it!


I like the recovery code style!


src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36520>

    Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
    
    This is extremely brittle to changes in the protobufs!
    
    I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?



src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36521>

    Where did you use this in this review?
    What is Task? I don't see Task in the protos.
    
    Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36528>

    I think it's clearer to have the first two lines as:
    
    "Enable recovery. Possible recovery techniques:"
    ...
    
    On that note, this seems more appropriate to be named as 'recovery', because:
    --recovery=reconnect makes more sense than
    --recover=reconnect
    
    Given this is asking for the recovery technique rather than just whether or not to recover.



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36525>

    indent: s/"r"/"  r/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36526>

    indent



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36527>

    Add a newline? Makes this easier to read on the command line
    
    Also, s/NOTE/NOTES



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36523>

    s/enabled/provided/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36522>

    s/should/must



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36530>

    Can you just print out the operation we're trying? As you're doing below in createSlaveDirectory.
    
    i.e.
    Failed to symlink '/foo/bar/baz' to 'latest'



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36531>

    Quote latest to be consistent.



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment36532>

    // Whether recovery is completed (including reconciling
    // with old executors).



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36534>

    s/. /.\n/



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36539>

    How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36541>

    s/Skipping/Postponing
    
    Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36546>

    Need this to? Or it already does?



src/slave/state.hpp
<https://reviews.apache.org/r/8680/#comment36549>

    The caveat is that there may be orphans for unsafe recovery, correct? Can you document this important distinction?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36563>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36562>

    Can you move the error message inside this if scope?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36565>

    instead of this can you just have locally scoped message strings?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36566>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36567>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36568>

    ditto for this message



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36569>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36571>

    const &?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36570>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36574>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36575>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36578>

    no worry for numify errors?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36579>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36580>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36581>

    It's possible that the fd is not restored by protobuf read, so this is best effort.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36582>

    Kill the newlines if you're just printing the descriptor name.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36583>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36584>

    instead of using result, can you use const& called mkdir?
    
    Seems weird to call a try 'result'.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36585>

    ditto const& and call this one write?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36589>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36587>

    ditto on const& mkdir



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36588>

    ditto on const& write


- Ben Mahler


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 23, 2013, 8:12 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review16907
-----------------------------------------------------------

Ship it!



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment35877>

    These helper functions SHOULD NOT be using CHECK or CHECK_SOME but returning a Try/Result instead.



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment35878>

    Why are you doing this?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment35888>

    :(



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment35882>

    Another good spot to return error (for here and all the other globs).


- Benjamin Hindman


On Feb. 19, 2013, 8:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review16915
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment35889>

    Retry is a word.


- Benjamin Hindman


On Feb. 19, 2013, 8:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 8:05 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Another day, another iteration of how we do recovery :)

Instead of returning None(), recovery functions now return partial states.

Subsequently, reverted the functions to return Try instead of Result.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 6, 2013, 12:39 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

minor fixes


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/protobuf_io_tests.cpp 5c9bf4664d413a34859608dd727deacf3ce4b402 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
  third_party/libprocess/include/stout/protobuf.hpp 25d4634bbc51d14ea78c1a922353f0ce52e458df 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 5, 2013, 8:15 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Added a 'safe' option (turned on by default) for recovery.

Updated recovery accordingly.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/protobuf_io_tests.cpp 5c9bf4664d413a34859608dd727deacf3ce4b402 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
  third_party/libprocess/Makefile.am dad1b65c3fdb7dbdad4e7c3d9c241cc4e89c3325 
  third_party/libprocess/include/stout/protobuf.hpp 25d4634bbc51d14ea78c1a922353f0ce52e458df 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Jan. 28, 2013, 4:56 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's comments


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am 0ab59b75b2955c532d0833f132bdaffe323838d0 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > Biggest piece of feedback here is really deciding what is a recovery error and what can just be printed and ignored. We talked a bit about this offline, but I don't like a semantics where errors are just printed out in logs and require operators to go sleuthing.

As discussed offline, I've decided to short-circuit recovery on ANY and ALL types of recovery errors. While this makes the code safe and easy to reason about, I'm still skeptical that this is going to be practical.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 116
> > <https://reviews.apache.org/r/8680/diff/4/?file=244659#file244659line116>
> >
> >     Do we rename this cleanup later?

yes! it should be in the 'Recover executors' review.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, lines 120-121
> > <https://reviews.apache.org/r/8680/diff/4/?file=244659#file244659line120>
> >
> >     Do these semantics change later?

yes. as you will see in the 'Recover executors' review, this flag is no longer optional.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 433
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line433>
> >
> >     You're only storing the ID if we're checkpointing?

i'm checkpointing info. i had to update 'id' inside info because this is the place a slave is assigned an 'id'. the other fields of 'info' are set in initialize().


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 483-486
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line483>
> >
> >     So when will we attempt to register in the future? Again, I think using something like futures is the key here. Basically we want to future "events" to occur in order for us to attempt to register. The first event is that we have a new master. And the second event is that we've completed recovery.

as discussed offline, this was a bug in the logic that surprisingly didn't get caught in tests!


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 569
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line569>
> >
> >     What do you think about killing the 'id' field and always reading from info.id()?

i think its great. one source of truth.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 974-975
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line974>
> >
> >     Do we need a Checkpointer?

Yes. But this gets a bit complicated when using 'async' because async doesn't support void functions, whereas the callbacks on future only expect void functions. I will leave it as is for now, and revisit this in future as a separate diff (Part 12!)


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 41
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line41>
> >
> >     If the result is none this means the file is empty right? Want to print that out?

i will wait for benm's protobuf read/write reviews to be committed before making any changes.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 98
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line98>
> >
> >     Looking forward to a templated read that just returns type T. ;)

can't wait for benm's code to be committed!


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 102
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line102>
> >
> >     Ditto comment above about printing out the file is empty.

ditto


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 232
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line232>
> >
> >     What does an error here mean? Why not return it? And what about the none case of pid? If we've sufficiently eliminated the none case a CHECK should be put in place until we replace Result with Try in stout/os.hpp.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 338
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line338>
> >
> >     When protobuf::read fails it should set the seek position back to the position before it attempted the read, so you should be able to just truncate at the "current seek position" below instead of adding up size each time. Just wanted to propose the option.

good point. fixed


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/tests/slave_recovery_tests.cpp, line 125
> > <https://reviews.apache.org/r/8680/diff/4/?file=244672#file244672line125>
> >
> >     Again, why protected instead of public?

this is how it was done in google test documentation for setup/teardown. https://code.google.com/p/googletest/wiki/AdvancedGuide.
 any reason it should be public? 


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/tests/slave_recovery_tests.cpp, line 128
> > <https://reviews.apache.org/r/8680/diff/4/?file=244672#file244672line128>
> >
> >     I just want to confirm that you want one work directory for all the SlaveRecoveryTest tests?

it gets cleaned up after every test. but you are right, its cleaner to create a different directory per test.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/tests/slave_recovery_tests.cpp, line 371
> > <https://reviews.apache.org/r/8680/diff/4/?file=244672#file244672line371>
> >
> >     While it makes the code longer, I think a lot of these might be easier read if you broke each '.' on a newline:
> >     
> >     ASSERT_TRUE(state
> >                          .frameworks[frameworkId]
> >                          .executors[executorId]
> >                          ...);
> >     
> >     Just a thought.

agreed. fixed all those that didn't fit on one line


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 334
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line334>
> >
> >     Here's a thought:
> >     
> >     Instead of the boolean 'recovered', make 'recover' return a future. Then, move all of the "installing" and files attaching into a new function called '_initialize' (the continuation of 'initialize'). Now, in 'initialize' do:
> >     
> >     recover(flags.recover.get() == "reconnect")
> >       .then(defer(self(), &Self::_initialize));
> >     
> >     The only crux here is in doReliableRegistration. One solution would be to save the future you get from this chain (called it 'recovery' or 'recovered' even) and use that just like you do (but first checking .isReady() and then doing .get()). I'm not sure how doReliableRegistration actually gets reinvoked as the code stands now, but I'll address that in a comment below.

Its not clear to me, atleast from the point of this review, why we would want recovery to be asynchronous. i will revisit this when i work on the downstream reviews where the complete recovery is implemented. Added a TODO for now.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1438
> > <https://reviews.apache.org/r/8680/diff/4/?file=244666#file244666line1438>
> >
> >     This is where recover could return a failed future ... probably don't want to continue if recovery fails without explicit action from an operator!

made this LOG(FATAL) for now.


> On Jan. 22, 2013, 4:22 a.m., Benjamin Hindman wrote:
> > src/slave/state.cpp, line 332
> > <https://reviews.apache.org/r/8680/diff/4/?file=244668#file244668line332>
> >
> >     s/udpates/updates/
> >     
> >     Why not propagate?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review15559
-----------------------------------------------------------


On Jan. 4, 2013, 2:25 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2013, 2:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review15559
-----------------------------------------------------------


Biggest piece of feedback here is really deciding what is a recovery error and what can just be printed and ignored. We talked a bit about this offline, but I don't like a semantics where errors are just printed out in logs and require operators to go sleuthing.


src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment33589>

    !?
    
    s/Enable recovery/recover/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment33590>

    You don't need to use a hypen with reconnect, it's a word. ;)



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment33591>

    Ditto above.



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment33593>

    Do we rename this cleanup later?



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment33592>

    Do these semantics change later?



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment33595>

    s/its//



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment33594>

    s/re-connect/reconnect/



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment33596>

    s/../.,/



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33597>

    Let's not scare our users, use an if check and EXIT please.



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33598>

    Rather than require keeping this in sync with the message in flags.hpp, how about ask them to use '--help' to see the valid options?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33602>

    Here's a thought:
    
    Instead of the boolean 'recovered', make 'recover' return a future. Then, move all of the "installing" and files attaching into a new function called '_initialize' (the continuation of 'initialize'). Now, in 'initialize' do:
    
    recover(flags.recover.get() == "reconnect")
      .then(defer(self(), &Self::_initialize));
    
    The only crux here is in doReliableRegistration. One solution would be to save the future you get from this chain (called it 'recovery' or 'recovered' even) and use that just like you do (but first checking .isReady() and then doing .get()). I'm not sure how doReliableRegistration actually gets reinvoked as the code stands now, but I'll address that in a comment below.



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33600>

    Maybe move making sure value of recover is valid down here so it's closer with the conditional?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33603>

    You're only storing the ID if we're checkpointing?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33604>

    So when will we attempt to register in the future? Again, I think using something like futures is the key here. Basically we want to future "events" to occur in order for us to attempt to register. The first event is that we have a new master. And the second event is that we've completed recovery.



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33605>

    s/foreachvalue(/foreachvalue (/



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33606>

    What do you think about killing the 'id' field and always reading from info.id()?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33607>

    Do we need a Checkpointer?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment33608>

    This is where recover could return a failed future ... probably don't want to continue if recovery fails without explicit action from an operator!



src/slave/state.hpp
<https://reviews.apache.org/r/8680/#comment33609>

    The style has been to put functions above fields per visibility scoping (public, private, etc). And I tend to put static members (functions and fields) above instance members too.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33610>

    If the result is none this means the file is empty right? Want to print that out?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33611>

    The error message is a little redundant don't you think? ;)



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33612>

    What about isNone?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33614>

    Looking forward to a templated read that just returns type T. ;)



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33613>

    Ditto comment above about printing out the file is empty.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33615>

    What constitutes making this a WARNING instead of an INFO?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33616>

    Ditto above comments.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33617>

    Again, what constitutes WARNING?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33619>

    Let's capitalize our acronyms in comments: s/uuid/UUID/



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33621>

    IMHO, using an else branch instead of a continue here links these two paragraphs together better.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33620>

    Seems like you might want to propagate this error?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33622>

    What does an error here mean? Why not return it? And what about the none case of pid? If we've sufficiently eliminated the none case a CHECK should be put in place until we replace Result with Try in stout/os.hpp.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33624>

    Again, what does an error mean here? Why not propagate?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33623>

    Ditto WARNING comments above.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33625>

    Why not propagate?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33626>

    Another ditto from earlier comments about saying file is empty (and maybe even better, what that possibly means?).



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33627>

    s/udpates/updates/
    
    Why not propagate?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33633>

    When protobuf::read fails it should set the seek position back to the position before it attempted the read, so you should be able to just truncate at the "current seek position" below instead of adding up size each time. Just wanted to propose the option.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33628>

    If you're grabbing the sterror anyway, might as well use PCHECK.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment33629>

    It might be helpful during debugging to demarcate the message using single quotes or newlines. Same goes for the CHECK_SOME above with protobufs, and newlines probably make more sense there.



src/tests/paths_tests.cpp
<https://reviews.apache.org/r/8680/#comment33630>

    s/PathsTest:/PathsTest :/



src/tests/paths_tests.cpp
<https://reviews.apache.org/r/8680/#comment33631>

    Why protected?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33632>

    s/SlaveStateTest:/SlaveStateTest :/



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33635>

    Ditto protected comment.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33634>

    s/SlaveRecoveryTest:/SlaveRecoverTest :/



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33636>

    Again, why protected instead of public?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33638>

    I just want to confirm that you want one work directory for all the SlaveRecoveryTest tests?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33637>

    Move to tests/utils.hpp? Maybe add a few more default parameters?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33640>

    Put Trigger on own line.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33642>

    CHECK or ASSERT? Kill the process or the test?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33639>

    Why was the cast needed? Do you just need to use 1U or 1L?



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8680/#comment33641>

    While it makes the code longer, I think a lot of these might be easier read if you broke each '.' on a newline:
    
    ASSERT_TRUE(state
                         .frameworks[frameworkId]
                         .executors[executorId]
                         ...);
    
    Just a thought.


- Benjamin Hindman


On Jan. 4, 2013, 2:25 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Jan. 4, 2013, 2:25 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Jan. 4, 2013, 2:25 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Addressed benM's offline comment about correctly handling partial updates file.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Dec. 25, 2012, 7:55 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

now we also checkpoint executor info, framework info and slave info.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Dec. 21, 2012, 2:55 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

refactored state recovery.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Dec. 19, 2012, 6:03 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Part 8 --> Part 7


Summary (updated)
-----------------

Slave Restart (Part 7): Recover slave state


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone