You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/03/01 05:23:25 UTC

Re: Review Request: Slave Restart (Part 7): Recover slave state

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------

Ship it!


I like the recovery code style!


src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36520>

    Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
    
    This is extremely brittle to changes in the protobufs!
    
    I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?



src/common/type_utils.hpp
<https://reviews.apache.org/r/8680/#comment36521>

    Where did you use this in this review?
    What is Task? I don't see Task in the protos.
    
    Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36528>

    I think it's clearer to have the first two lines as:
    
    "Enable recovery. Possible recovery techniques:"
    ...
    
    On that note, this seems more appropriate to be named as 'recovery', because:
    --recovery=reconnect makes more sense than
    --recover=reconnect
    
    Given this is asking for the recovery technique rather than just whether or not to recover.



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36525>

    indent: s/"r"/"  r/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36526>

    indent



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36527>

    Add a newline? Makes this easier to read on the command line
    
    Also, s/NOTE/NOTES



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36523>

    s/enabled/provided/



src/slave/flags.hpp
<https://reviews.apache.org/r/8680/#comment36522>

    s/should/must



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36530>

    Can you just print out the operation we're trying? As you're doing below in createSlaveDirectory.
    
    i.e.
    Failed to symlink '/foo/bar/baz' to 'latest'



src/slave/paths.hpp
<https://reviews.apache.org/r/8680/#comment36531>

    Quote latest to be consistent.



src/slave/slave.hpp
<https://reviews.apache.org/r/8680/#comment36532>

    // Whether recovery is completed (including reconciling
    // with old executors).



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36534>

    s/. /.\n/



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36539>

    How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36541>

    s/Skipping/Postponing
    
    Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?



src/slave/slave.cpp
<https://reviews.apache.org/r/8680/#comment36546>

    Need this to? Or it already does?



src/slave/state.hpp
<https://reviews.apache.org/r/8680/#comment36549>

    The caveat is that there may be orphans for unsafe recovery, correct? Can you document this important distinction?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36563>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36562>

    Can you move the error message inside this if scope?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36565>

    instead of this can you just have locally scoped message strings?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36566>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36567>

    const string&?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36568>

    ditto for this message



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36569>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36571>

    const &?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36570>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36574>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36575>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36578>

    no worry for numify errors?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36579>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36580>

    ditto



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36581>

    It's possible that the fd is not restored by protobuf read, so this is best effort.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36582>

    Kill the newlines if you're just printing the descriptor name.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36583>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36584>

    instead of using result, can you use const& called mkdir?
    
    Seems weird to call a try 'result'.



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36585>

    ditto const& and call this one write?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36589>

    quote path?



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36587>

    ditto on const& mkdir



src/slave/state.cpp
<https://reviews.apache.org/r/8680/#comment36588>

    ditto on const& write


- Ben Mahler


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Vinod Kone <vi...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 242
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line242>
> >
> >     Why did we make this file? We don't even compare frameworkid here? I also introduced source and name, which are now not compared (my fault, didn't know this existed).
> >     
> >     This is extremely brittle to changes in the protobufs!
> >     
> >     I know it's a hassle, but can you please audit this file? Or please create a bug to keep track of this?

As discussed offline, we need this for properly comparing resources. Included source and name fields in the comparison.

Audited the file to confirm everything is up to date.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 293
> > <https://reviews.apache.org/r/8680/diff/9/?file=261413#file261413line293>
> >
> >     Where did you use this in this review?
> >     What is Task? I don't see Task in the protos.
> >     
> >     Introducing even more protobuf equality methods is really brittle, I'd like to see why you need this!

I've used this in 'Recover Executors' review. Not sure why I added this in this review. I'm going to keep it as is, to avoid more work for me.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/

huh?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 113
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line113>
> >
> >     I think it's clearer to have the first two lines as:
> >     
> >     "Enable recovery. Possible recovery techniques:"
> >     ...
> >     
> >     On that note, this seems more appropriate to be named as 'recovery', because:
> >     --recovery=reconnect makes more sense than
> >     --recover=reconnect
> >     
> >     Given this is asking for the recovery technique rather than just whether or not to recover.

These semantics are going to change in the next review, so I will fix it and the related ones (below) then.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 265
> > <https://reviews.apache.org/r/8680/diff/9/?file=261422#file261422line265>
> >
> >     // Whether recovery is completed (including reconciling
> >     // with old executors).

Going to change in the later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 428
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line428>
> >
> >     How do you make sure any changes to createSlaveDirectory don't accidentally break the functionality for using it here to create the meta-directory? Or is the whole intention to mirror a real slave directory?

The latter.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 484
> > <https://reviews.apache.org/r/8680/diff/9/?file=261423#file261423line484>
> >
> >     s/Skipping/Postponing
> >     
> >     Also, seems like this could just take place after recovery,  to avoid the need for this case, add a TODO?

Will address in later review.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 127
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line127>
> >
> >     instead of this can you just have locally scoped message strings?

i don't like it because, it makes some of the "message=" statements 3 lines instead of 2.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.

i can't see why fd wont be restored?


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.

Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp ddf02570caf3793106b3c48e158a5bb48c1ae80c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/http.cpp b94197013ac0d5dd95d6dabb44812905a123184a 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 12a579cba56cd3dac384bc7919b0d5537b0e429d 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/exception_tests.cpp 13355d08788432ed07679daf24c2d74cc12a7f11 
>   src/tests/fault_tolerance_tests.cpp 44eef03aedd89ddfce8491add73918d8f453a0f2 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 7): Recover slave state

Posted by Ben Mahler <be...@gmail.com>.

> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 115
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line115>
> >
> >     indent: s/"r"/"  r/
> 
> Vinod Kone wrote:
>     huh?

As in, apply the replacement. So:

"reconnect:
"  reconnect:

This will help readability on the command line.


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/flags.hpp, line 116
> > <https://reviews.apache.org/r/8680/diff/9/?file=261415#file261415line116>
> >
> >     indent

ditto


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 454
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line454>
> >
> >     It's possible that the fd is not restored by protobuf read, so this is best effort.
> 
> Vinod Kone wrote:
>     i can't see why fd wont be restored?

because the seek to restore it can fail as well


> On March 1, 2013, 4:23 a.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 492
> > <https://reviews.apache.org/r/8680/diff/9/?file=261425#file261425line492>
> >
> >     instead of using result, can you use const& called mkdir?
> >     
> >     Seems weird to call a try 'result'.
> 
> Vinod Kone wrote:
>     Calling it 'mkdir' also seems weird to me. I've used 'result' in the past to mean the result (or return value) of a function. Doesn't necessarily mean result is a type of Result.

ok, just keep in mind the idiomatic way benh and I have been following is:

Try<Nothing> foo = os::foo();

Matching the call name.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8680/#review17217
-----------------------------------------------------------


On March 3, 2013, 10:33 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8680/
> -----------------------------------------------------------
> 
> (Updated March 3, 2013, 10:33 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This only covers recovering slave state.
> 
> Recovering status update manager and isolation modules will be coming in next reviews.
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/slave/constants.hpp dc5b9877e2608f6806f13b6ef4d133b6fddaece4 
>   src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
>   src/slave/http.cpp f6e2fdf61502f8237c3a41013557a250416e6457 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.cpp ff98d105af675dfc66070feaa43b42c1aa438fd8 
>   src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
>   src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 2688567719c5b12f1d8a56fa6193c11d4077685b 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/fault_tolerance_tests.cpp 61509f1c2432dcd1d0c5d1be1e662aff76a29fd5 
>   src/tests/paths_tests.cpp PRE-CREATION 
>   src/tests/reaper_tests.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp b648b631d8c11c26e6adfa3f5b16012044557e25 
> 
> Diff: https://reviews.apache.org/r/8680/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>