You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/02/05 21:15:07 UTC

Re: Review Request: Slave Restart (Part 7): Recover slave state

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 5, 2013, 8:15 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Added a 'safe' option (turned on by default) for recovery.

Updated recovery accordingly.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/protobuf_io_tests.cpp 5c9bf4664d413a34859608dd727deacf3ce4b402 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
  third_party/libprocess/Makefile.am dad1b65c3fdb7dbdad4e7c3d9c241cc4e89c3325 
  third_party/libprocess/include/stout/protobuf.hpp 25d4634bbc51d14ea78c1a922353f0ce52e458df 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated March 13, 2013, 6:13 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/common/type_utils.hpp 66c2dc17655123641433c078931acc9e769d5077 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp d3efa58ef62383af9eb051b23feb950ba6a4f4e3 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated March 3, 2013, 10:33 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
  src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
  src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 242
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line242>
> >
> >     Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
> >     
> >     This is extremely brittle to changes in the protobufs!
> >     
> >     I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?

As discussed offline, we need this for properly comparing resources. Included source and name fields in the comparison.

Audited the file to confirm everything is up to date.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 293
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line293>
> >
> >     Where did you use this in this review?
> >     What is Task? I don't see Task in the protos.
> >     
> >     Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!

I've used this in 'Recover Executors' review. Not sure why I added this in this review. I'm going to keep it as is, to avoid more work for me.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/

huh?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 113
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line113>
> >
> >     I think it's clearer to have the first two lines as:
> >     
> >     "Enable recovery. Possible recovery techniques:"
> >     ...
> >     
> >     On that note, this seems more appropriate to be named as 'recovery', because:
> >     --recovery=reconnect makes more sense than
> >     --recover=reconnect
> >     
> >     Given this is asking for the recovery technique rather than just whether or not to recover.

These semantics are going to change in the next review, so I will fix it and the related ones (below) then.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 265
> > <https://reviews.apache.org/r/8680/diff/9/?file=261422#file261422line265>
> >
> >     // Whether recovery is completed (including reconciling
> >     // with old executors).

Going to change in the later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 428
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line428>
> >
> >     How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?

The latter.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 484
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line484>
> >
> >     s/Skipping/Postponing
> >     
> >     Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?

Will address in later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 127
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line127>
> >
> >     instead of this can you just have locally scoped message strings?

i don't like it because, it makes some of the "message=" statements 3 lines instead of 2.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.

i can't see why fd wont be restored?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.

Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Ben Mahler <be...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/
> 
> Vinod Kone wrote:
>     huh?

As in, apply the replacement. So:

"reconnect:
"  reconnect:

This will help readability on the command line.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 116
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line116>
> >
> >     indent

ditto


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.
> 
> Vinod Kone wrote:
>     i can't see why fd wont be restored?

because the seek to restore it can fail as well


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.
> 
> Vinod Kone wrote:
>     Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.

ok, just keep in mind the idiomatic way benh and I have been following is:

Try<Nothing> foo = os::foo();

Matching the call name.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On March 3, 2013, 10:33 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated March 3, 2013, 10:33 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------

Ship it!


I like the recovery code style!


src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36520>

    Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
    
    This is extremely brittle to changes in the protobufs!
    
    I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?



src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36521>

    Where did you use this in this review?
    What is Task? I don't see Task in the protos.
    
    Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36528>

    I think it's clearer to have the first two lines as:
    
    "Enable recovery. Possible recovery techniques:"
    ...
    
    On that note, this seems more appropriate to be named as 'recovery', because:
    --recovery=reconnect makes more sense than
    --recover=reconnect
    
    Given this is asking for the recovery technique rather than just whether or not to recover.



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36525>

    indent: s/"r"/"  r/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36526>

    indent



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36527>

    Add a newline? Makes this easier to read on the command line
    
    Also, s/NOTE/NOTES



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36523>

    s/enabled/provided/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36522>

    s/should/must



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36530>

    Can you just print out the operation we're trying? As you're doing below in createSlaveDirectory.
    
    i.e.
    Failed to symlink '/foo/bar/baz' to 'latest'



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36531>

    Quote latest to be consistent.



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment36532>

    // Whether recovery is completed (including reconciling
    // with old executors).



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36534>

    s/. /.\n/



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36539>

    How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36541>

    s/Skipping/Postponing
    
    Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36546>

    Need this to? Or it already does?



src/slave/state.hpp
<https://reviews.apache.org/r/8680/#comment36549>

    The caveat is that there may be orphans for unsafe recovery, correct? Can you document this important distinction?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36563>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36562>

    Can you move the error message inside this if scope?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36565>

    instead of this can you just have locally scoped message strings?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36566>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36567>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36568>

    ditto for this message



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36569>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36571>

    const &?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36570>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36574>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36575>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36578>

    no worry for numify errors?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36579>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36580>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36581>

    It's possible that the fd is not restored by protobuf read, so this is best effort.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36582>

    Kill the newlines if you're just printing the descriptor name.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36583>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36584>

    instead of using result, can you use const& called mkdir?
    
    Seems weird to call a try 'result'.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36585>

    ditto const& and call this one write?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36589>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36587>

    ditto on const& mkdir



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36588>

    ditto on const& write


- Ben Mahler


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 23, 2013, 8:12 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review16907
-----------------------------------------------------------

Ship it!



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment35877>

    These helper functions SHOULD NOT be using CHECK or CHECK_SOME but returning a Try/Result instead.



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment35878>

    Why are you doing this?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment35888>

    :(



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment35882>

    Another good spot to return error (for here and all the other globs).


- Benjamin Hindman


On Feb. 19, 2013, 8:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review16915
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment35889>

    Retry is a word.


- Benjamin Hindman


On Feb. 19, 2013, 8:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8:05 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 8:05 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Another day, another iteration of how we do recovery :)

Instead of returning None(), recovery functions now return partial states.

Subsequently, reverted the functions to return Try instead of Result.


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/
-----------------------------------------------------------

(Updated Feb. 6, 2013, 12:39 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

minor fixes


Description
-------

This only covers recovering slave state.

Recovering status update manager and isolation modules will be coming in next reviews.


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
  src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
  src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
  src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
  src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
  src/tests/paths_tests.cpp PRE-CREATION 
  src/tests/protobuf_io_tests.cpp 5c9bf4664d413a34859608dd727deacf3ce4b402 
  src/tests/reaper_tests.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 
  src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 
  src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
  third_party/libprocess/include/stout/protobuf.hpp 25d4634bbc51d14ea78c1a922353f0ce52e458df 

Diff: https://reviews.apache.org/r/8680/diff/


Testing
-------

make check


Thanks,

Vinod Kone