You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/12/21 03:59:30 UTC

Review Request: Slave Restart (Part 8): Recover status update manager

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs
-----

  src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/flags.hpp, line 111
> > <https://reviews.apache.org/r/8736/diff/2/?file=243208#file243208line111>
> >
> >     I'd like to make sure we understand and clearly document the semantics of what it means to have the default '--recovery=reconnect' when the slave doesn't actually have any old slave state to reconnect to.

updated the comment. lemme if it looks ok.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 1456
> > <https://reviews.apache.org/r/8736/diff/2/?file=243209#file243209line1456>
> >
> >     I do not like the semantics of an error getting returned and us deciding that we've "successfully" recovered! Considering using a Result to distinguish _no_ slave state to recover from an actual error. This needs to get done everywhere that an error is swallowed (even though it's logged).

we always exit when recovery fails now.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, lines 1466-1468
> > <https://reviews.apache.org/r/8736/diff/2/?file=243209#file243209line1466>
> >
> >     Invoking recover on the status update manager is asynchronous. In the event there was an error recovering, I don't think we should set 'recovered = true' so optimistically!

Made 'recover' asynchronous.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.cpp, line 175
> > <https://reviews.apache.org/r/8736/diff/2/?file=243211#file243211line175>
> >
> >     I don't like that semantics of using the "checkpointing" flag to "hack" the replaying of checkpoints. It seems like replying a stream should be a fundamental primitive, and shouldn't require a breach of isolation. Let's really separate the concerns here such that the StatusUpdateManager doesn't need to concern itself with the internals of the StatusUpdateStream's implementation.

agreed, refactored.


> On Jan. 24, 2013, 8:25 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.cpp, line 138
> > <https://reviews.apache.org/r/8736/diff/2/?file=243211#file243211line138>
> >
> >     Is there a non-hacky way to do this? If so, can you leave a TODO that describes what you're thinking for posterity?

“I never put off till tomorrow what I can possibly do - the day after.” -- O.W.

fixed.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/#review15590
-----------------------------------------------------------


On Dec. 25, 2012, 7:57 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8736/
> -----------------------------------------------------------
> 
> (Updated Dec. 25, 2012, 7:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Status update manager now recovers the update stream and re-sends any pending updates.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8736/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/#review15590
-----------------------------------------------------------



src/slave/flags.hpp
<https://reviews.apache.org/r/8736/#comment33753>

    I'd like to make sure we understand and clearly document the semantics of what it means to have the default '--recovery=reconnect' when the slave doesn't actually have any old slave state to reconnect to.



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment33754>

    s/result/state/
    
    I liked your previous name of the variable better.



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment33674>

    s/if(/if (/



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment33756>

    I do not like the semantics of an error getting returned and us deciding that we've "successfully" recovered! Considering using a Result to distinguish _no_ slave state to recover from an actual error. This needs to get done everywhere that an error is swallowed (even though it's logged).



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment33757>

    Invoking recover on the status update manager is asynchronous. In the event there was an error recovering, I don't think we should set 'recovered = true' so optimistically!



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/8736/#comment33758>

    s/udpates/updates/



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment33761>

    Is there a non-hacky way to do this? If so, can you leave a TODO that describes what you're thinking for posterity?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment33760>

    s/Re-try/Retry/
    
    It's a word! ;)



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment33762>

    I don't like that semantics of using the "checkpointing" flag to "hack" the replaying of checkpoints. It seems like replying a stream should be a fundamental primitive, and shouldn't require a breach of isolation. Let's really separate the concerns here such that the StatusUpdateManager doesn't need to concern itself with the internals of the StatusUpdateStream's implementation.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment33764>

    This variable is not a boolean (or doesn't not have a boolean operator), please don't treat it as such.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8736/#comment33765>

    Indentation.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8736/#comment33766>

    Not Python.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8736/#comment33768>

    As in other places in our code, you should make the 'recover' paramater an Option to explicitly distinguish some or none argument.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8736/#comment33769>

    Indentation, here and everywhere else!



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/8736/#comment33770>

    Newline.


- Benjamin Hindman


On Dec. 25, 2012, 7:57 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8736/
> -----------------------------------------------------------
> 
> (Updated Dec. 25, 2012, 7:57 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Status update manager now recovers the update stream and re-sends any pending updates.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8736/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/#review16916
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment35890>

    s/unr/Unr/



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment35894>

    If you make 'recovered' a Future instead then you can wait for that event in the first doReliableRegistration and not need these complicated semantics.



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment35891>

    s/re-/re/



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment35895>

    s/Re-t/ret/



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment35896>

    But ... unacknowledged is a word!


- Benjamin Hindman


On Feb. 19, 2013, 8:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8736/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8:06 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Status update manager now recovers the update stream and re-sends any pending updates.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8736/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/#review17250
-----------------------------------------------------------

Ship it!


As a higher level comment, have you reasoned about the slave dying _during_ recovery?


src/slave/flags.hpp
<https://reviews.apache.org/r/8736/#comment36664>

    Why not make this change in the review that introduced the 'kill' in the first place? Makes this chain harder to review because changes from an earlier review on this line will get stomped on during rebase, no?
    
    Ah well, next time.



src/slave/slave.hpp
<https://reviews.apache.org/r/8736/#comment36666>

    No comment necessary since the "_" prefix pattern indicates this is a continuation of recover.



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment36667>

    ditto with having made these changes in the right review to make it easier on reviewers and yourself ;)



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment36668>

    Why not do this as the first thing in finalize?



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment36669>

    Ah.. I see you've already taken care of many of my comments from earlier reviews, hence making the changes in the right ones! ;)



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment36670>

    would be nice to have os::exists returned any errors so that we can handle those differently here



src/slave/slave.cpp
<https://reviews.apache.org/r/8736/#comment36672>

    "We would be here" seems to be false, unless you moved the comment inside the loop.
    
    How about just saying: the state is none when....



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/8736/#comment36675>

    const method?



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/8736/#comment36673>

    ditto from earlier review, on why it would be better to not have the open / close as separate functions (rather have them inline in the only spot they are used.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment36676>

    Good call.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment36677>

    kill newline?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/8736/#comment36678>

    newline after the CHECK


- Ben Mahler


On Feb. 23, 2013, 8:12 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8736/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:12 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Status update manager now recovers the update stream and re-sends any pending updates.
> 
> 
> Diffs
> -----
> 
>   src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   src/tests/slave_recovery_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/8736/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated March 13, 2013, 6:13 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
  src/slave/flags.hpp b2441c773b13365a14b3daad4f81ae1ec0733439 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated March 3, 2013, 11:13 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
  src/slave/flags.hpp 9cd436cae5772b906b3566eb1f344904d5894214 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated Feb. 23, 2013, 8:12 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 8:06 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Updated diff with upstream changes.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp 814a6e1af2c7caf77452748dfc9e6935d8386d8f 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated Feb. 5, 2013, 8:18 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk and upstream changes.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp 20384f9f10ec37f03d61b1952424631a3bde9725 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated Jan. 29, 2013, 11:06 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenH's comments


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 8): Recover status update manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8736/
-----------------------------------------------------------

(Updated Dec. 25, 2012, 7:57 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

fixed a bug in the status update manager.


Description
-------

Status update manager now recovers the update stream and re-sends any pending updates.


Diffs (updated)
-----

  src/master/master.cpp fbf7f0775392e3d21a6986203ed8f11729ffda8c 
  src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/tests/slave_recovery_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/8736/diff/


Testing
-------

make check


Thanks,

Vinod Kone