You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/09/21 20:20:36 UTC

Review Request: Slave Restart (Part Deux): Status Update Manager

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

This is Status Update Manager's API and implementation.

Haven't included the integration (into slave) and tests yet.

Rebased off latest trunk


Diffs
-----

  src/Makefile.am df8696920fd48968907270decbef3dda0803f80a 
  src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 215
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line215>
> >
> >     By inlining I meant not storing the bools and using the expressions in the if conditions directly.
> >     
> >      if (protobuf::isTerminalState(update.status().state())) {
> >           if (!stream->pending.empty()) {
> >     
> >     I think it makes it more readable. Or at the very least: s/empty/emptyStream and s/terminal/terminalState but I would prefer the first option.

got rid of terminal and s/empty/emptyStream. i wanted 'empty' in a variable, because its used twice.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/messages/messages.proto, line 61
> > <https://reviews.apache.org/r/7212/diff/2/?file=176985#file176985line61>
> >
> >     s/Consequently, i/NOTE: I
> >     
> >     I think this belongs above the Type enum itself to describe each type in the enum. That way they don't really need to be "NOTE:" comments either.

I think it reads better at top.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.hpp, line 50
> > <https://reviews.apache.org/r/7212/diff/2/?file=176986#file176986line50>
> >
> >     Enqueued for what?

Re-worded a lot of the comments, with the refactor. So, I'm going to drop the below issues.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.hpp, line 94
> > <https://reviews.apache.org/r/7212/diff/2/?file=176986#file176986line94>
> >
> >     I think you can do
> >     
> >     s/acknowledgement/ACK
> >     s/sendStatusUpdateAcknowledgement/sendStatusUpdateAck
> >     
> >     given that the rest of the code / comments talk about 'ACK's.

All the comments now say ACK. I still like the functions to have the Acknowledged prefix.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 86
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line86>
> >
> >     Is this comment accurate? It said to ignore it, but then it creates a stream and opens it?

No longer relevant


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 113
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line113>
> >
> >     Was there a reason for the space? " ."

nope, fixed


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 190
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line190>
> >
> >     It seems like you'd want to pop() here, no?

pop what?


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 253
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line253>
> >
> >     Can you document why you sometimes expect !pid.isSome()? That is, why is pid an Option?

Sometimes slave generates TASK_LOST messages, precisely when it doesnt know about the executor.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 290
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line290>
> >
> >     Extra indentation here.

good eye!


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 292
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line292>
> >
> >     Seems like we should add a bool function to Timeout for this case. Something like:
> >     
> >     timeout.expired();
> >     timeout.timedOut();
> >     
> >     What do you think?
> >

agreed, done.


> On Oct. 16, 2012, 1:16 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 322
> > <https://reviews.apache.org/r/7212/diff/2/?file=176987#file176987line322>
> >
> >     Rather than return null here, you could just return the Option directly, and then there's no need for this function?

This is a convenience function to avoid that boilerpate needed for checking the option and getting the value out of it.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12463
-----------------------------------------------------------


On Oct. 15, 2012, 11:46 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2012, 11:46 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Implementation of Status Updates Manager.
> 
> Features:
> -->  Checkpoints status updates and acks.
> --> Reliably sends stats updates
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12463
-----------------------------------------------------------



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26471>

    By inlining I meant not storing the bools and using the expressions in the if conditions directly.
    
     if (protobuf::isTerminalState(update.status().state())) {
          if (!stream->pending.empty()) {
    
    I think it makes it more readable. Or at the very least: s/empty/emptyStream and s/terminal/terminalState but I would prefer the first option.



src/messages/messages.proto
<https://reviews.apache.org/r/7212/#comment26478>

    s/Consequently, i/NOTE: I
    
    I think this belongs above the Type enum itself to describe each type in the enum. That way they don't really need to be "NOTE:" comments either.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26497>

    Add these includes:
    
    #include <ostream>
    #include <queue>
    #include <string>
    
    #include <process/pid.hpp>



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26482>

    Enqueued for what?



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26485>

    Alternative phrasing:
    After checkpointing, an ACK is immediately sent to the executor.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26481>

    s/re-tried/retried
    s/till/until
    s/received/received from the master/



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26487>

    s/to master/to the master



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26488>

    s/ack/ACK



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26491>

    s/>r/> r/



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26495>

    I think you can do
    
    s/acknowledgement/ACK
    s/sendStatusUpdateAcknowledgement/sendStatusUpdateAck
    
    given that the rest of the code / comments talk about 'ACK's.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26508>

    s/rtype/recordType here and below.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26498>

    s/to master/to the master



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26499>

    s/slave/this slave
    
    right?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26500>

    s/executor/the executor



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26501>

    Is this comment accurate? It said to ignore it, but then it creates a stream and opens it?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26504>

    Was there a reason for the space? " ."



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26505>

    It seems like you'd want to pop() here, no?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26506>

    Kill newline.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26507>

    Can you document why you sometimes expect !pid.isSome()? That is, why is pid an Option?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26509>

    Extra indentation here.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26510>

    Seems like we should add a bool function to Timeout for this case. Something like:
    
    timeout.expired();
    timeout.timedOut();
    
    What do you think?
    



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26511>

    Extra semicolon.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26512>

    Rather than return null here, you could just return the Option directly, and then there's no need for this function?


- Ben Mahler


On Oct. 15, 2012, 11:46 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Oct. 15, 2012, 11:46 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Implementation of Status Updates Manager.
> 
> Features:
> -->  Checkpoints status updates and acks.
> --> Reliably sends stats updates
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 6, 2012, 12:30 a.m., Benjamin Hindman wrote:
> > src/common/protobuf_utils.hpp, line 54
> > <https://reviews.apache.org/r/7212/diff/5/?file=178390#file178390line54>
> >
> >     Do you need ::process?

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14071
-----------------------------------------------------------


On Dec. 3, 2012, 9:54 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 3, 2012, 9:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14071
-----------------------------------------------------------



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/7212/#comment30085>

    Do you need ::process?



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment30086>

    s/"s/<s/g
    s/p"/p>/g



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment30087>

    Not in the header please.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment30088>

    s/(SUM)//



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment30089>

    Formatting (here and everywhere else).



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment30090>

    Kill for now, recover (pun intended) in part four.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30091>

    s/"s/<s/g
    s/p"/p>/g



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30093>

    Kill, see below.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30092>

    Check this here, and return if error.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30097>

    return true; ?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30101>

    Move to StatusUpdateStream (call it write? or checkpoint?).



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30100>

    Formatting.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30102>

    s/recordType/type/



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30103>

    Let's do this in stream.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30105>

    Reorder.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment30106>

    Just call it process, that's the standard other places.



third_party/libprocess/include/process/timeout.hpp
<https://reviews.apache.org/r/7212/#comment30107>

    ?


- Benjamin Hindman


On Dec. 3, 2012, 9:54 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 3, 2012, 9:54 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 214
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line214>
> >
> >     Would it be useful to include the slave id here?
> 
> Vinod Kone wrote:
>     I think its redundant because this is going to be on that slave?
> 
> Ben Mahler wrote:
>     Fair enough, but it's possible to run multiple slaves on one machine, although they'll have their own log directory.
>     
>     The only case to consider is mesos-local where we may want to add the slave id.
>     
>     Come to think of it, will these recovery changes break mesos-local?

The first 2 points are not unique to status update manager or recovery code, so i'm reluctant to add slave id. If (or when) we decide to do that we need to include slave id in all the log lines in all the pieces of code that goes into a slave binary. This is a bigger change and doesn't belong here.

Good catch on checkpoint collision in mesos-local! As long as each slave gets its own working directory, it shouldn't be a problem. They currently dont! Fixed.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14306
-----------------------------------------------------------


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.

> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 214
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line214>
> >
> >     Would it be useful to include the slave id here?
> 
> Vinod Kone wrote:
>     I think its redundant because this is going to be on that slave?

Fair enough, but it's possible to run multiple slaves on one machine, although they'll have their own log directory.

The only case to consider is mesos-local where we may want to add the slave id.

Come to think of it, will these recovery changes break mesos-local?


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14306
-----------------------------------------------------------


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 68
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line68>
> >
> >     This is a unique @return formatting. Was there a reason for this? Can you make it consistent with those in cgroups_isolation_module.hpp and the like?
> >     
> >     i.e.
> >     // @ return  Whether the update is handled successfully (e.g. checkpointed)

There was a discussion about changing the doc style in cgroups. But, I will stick with the same style as cgroups for now, so that we can change it in bulk later, if needed.


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 117
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line117>
> >
> >     Looks to me like you would fail your CHECK in checkpoint below when open fails, so why not check the result here?

My original thought process was that failure to checkpoint shouldn't kill the status update manager / slave, because there might be non-checkpointing frameworks. But, thinking a bit more about it, a failure to checkpoint seems like a fatal enough state. Since we control the paths, the only reason a checkpoint fails is because there is some sort of disk failure or permissions issue. In either case, failing fast (CHECK) seems good. I'm inclined towards this mode, because it would also make the code cleaner and simple.

Thoughts?


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 131
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line131>
> >
> >     CHECK_SOME

fixed, here and everywhere else


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 214
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line214>
> >
> >     Would it be useful to include the slave id here?

I think its redundant because this is going to be on that slave?


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 303
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line303>
> >
> >     why are you using CHECK_NOTNULL? if new fails it will throw an std::bad_alloc, right?

you are right. i think its redundant.


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 129
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line129>
> >
> >     Seems more appropriate as a Try<Nothing>

n/a


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 250
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line250>
> >
> >     In all of these cases where you return false, does this stream stall?
> >     
> >     In that, you don't consider the next update?

No. We currently don't. See the above comment. I think failing fast is better here than adding that complexity.


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 160
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line160>
> >
> >     bool -> Try<Nothing>?

changed to void instead.


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 150
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line150>
> >
> >     Why not log an error when close fails?

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14306
-----------------------------------------------------------


On Dec. 12, 2012, 8:48 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 8:48 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.

> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 117
> > <https://reviews.apache.org/r/7212/diff/7/?file=236514#file236514line117>
> >
> >     Looks to me like you would fail your CHECK in checkpoint below when open fails, so why not check the result here?
> 
> Vinod Kone wrote:
>     My original thought process was that failure to checkpoint shouldn't kill the status update manager / slave, because there might be non-checkpointing frameworks. But, thinking a bit more about it, a failure to checkpoint seems like a fatal enough state. Since we control the paths, the only reason a checkpoint fails is because there is some sort of disk failure or permissions issue. In either case, failing fast (CHECK) seems good. I'm inclined towards this mode, because it would also make the code cleaner and simple.
>     
>     Thoughts?

As discussed offline, you're going to go with CHECKs correct?


> On Dec. 12, 2012, 12:36 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 250
> > <https://reviews.apache.org/r/7212/diff/7/?file=236515#file236515line250>
> >
> >     In all of these cases where you return false, does this stream stall?
> >     
> >     In that, you don't consider the next update?
> 
> Vinod Kone wrote:
>     No. We currently don't. See the above comment. I think failing fast is better here than adding that complexity.

Agreed.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14306
-----------------------------------------------------------


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14306
-----------------------------------------------------------



src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment30460>

    If you want to clean these up:
    
    return stream << frameworkId.value();
    
    here and below in the others.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30461>

    This is a unique @return formatting. Was there a reason for this? Can you make it consistent with those in cgroups_isolation_module.hpp and the like?
    
    i.e.
    // @ return  Whether the update is handled successfully (e.g. checkpointed)



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30462>

    ditto



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30463>

    s/This would also stop re-trying any/This also stops the re-trying of any



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30464>

    s/of framework/of a framework



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30465>

    s/life time/lifetime



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30469>

    Looks to me like you would fail your CHECK in checkpoint below when open fails, so why not check the result here?



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30468>

    Seems more appropriate as a Try<Nothing>



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30506>

    CHECK_SOME



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30508>

    CHECK_SOME



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30510>

    Why not log an error when close fails?



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30516>

    bool -> Try<Nothing>?



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30512>

    CHECK_SOME



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30515>

    There's no harm, but we should be using CopyFrom here.
    
    I've seen us use MergeFrom a lot where it doesn't make sense.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30517>

    kill this newline



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment30528>

    CHECK_SOME



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment30523>

    Would it be useful to include the slave id here?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment30524>

    In all of these cases where you return false, does this stream stall?
    
    In that, you don't consider the next update?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment30522>

    why are you using CHECK_NOTNULL? if new fails it will throw an std::bad_alloc, right?


- Ben Mahler


On Dec. 10, 2012, 7:58 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 10, 2012, 7:58 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 306
> > <https://reviews.apache.org/r/7212/diff/12/?file=261358#file261358line306>
> >
> >     can you quote these ids, given we currently don't enforce no whitespace?

Throughout the code base we only quote executor ids because we know they can have space in them (our command executor does). I would be keep task ids unquoted for now, to be consistent. May be we can do a sweep later on of the code base if we decide to fix it.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/messages/messages.proto, line 59
> > <https://reviews.apache.org/r/7212/diff/12/?file=261360#file261360line59>
> >
> >     Hm.. how is sequence used? Is it perpetually incremented across recoveries? If so let's make this a uint64.

i think this is not used anymore! thanks for catching this.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 69
> > <https://reviews.apache.org/r/7212/diff/12/?file=261361#file261361line69>
> >
> >     s/Whether the update is handled successfully (e.g. checkpointed)./A Future of the update operation./
> >     
> >     You can then describe more return semantics if you like separately but I think if we want to use annotations like @return we should actually describe what we're returning in the first sentence (which is what Jie originally did before I refactored his return types).
> >     
> >     Whether sounds like a boolean to me, I think this pattern is present in the cgroups isolation module because the functions there used to return booleans before I refactored them to return Try's.

Changed it to "A future indicating whether the update is handled successfully (e.g. checkpointed)". I think that is more clear.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 83
> > <https://reviews.apache.org/r/7212/diff/12/?file=261361#file261361line83>
> >
> >     Can you add a ticket or review to track this? If there is one.

there isn't one :/


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 153
> > <https://reviews.apache.org/r/7212/diff/12/?file=261361#file261361line153>
> >
> >     Can you maybe add a comment or note on what the uuid is for and why there can be a mismatch?

Actually I don't remember what the reasoning was. Made this a CHECK to be safe.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 221
> > <https://reviews.apache.org/r/7212/diff/12/?file=261361#file261361line221>
> >
> >     Why are you not using the protobuf write error? We've lost that information here!

good point. fixed.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 169
> > <https://reviews.apache.org/r/7212/diff/12/?file=261362#file261362line169>
> >
> >     Why are you checking the timeout is expired? Can you add a comment?

Because this is the first in the stream, as mentioned in the comment above. Timeout only gets set when we send an update. Let me know if thats not clear?


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 218
> > <https://reviews.apache.org/r/7212/diff/12/?file=261362#file261362line218>
> >
> >     s/might/will

see above.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 295
> > <https://reviews.apache.org/r/7212/diff/12/?file=261362#file261362line295>
> >
> >     Can you make this an Option, that way you can CHECK_NOTNULL when the option is some in the caller?

I think thats a bit of too much overhead. NULL is a perfectly valid sentinel value to check against. Making it an option, doesn't add additional value but adds overhead.


> On Feb. 27, 2013, 10:31 p.m., Ben Mahler wrote:
> > src/slave/status_update_manager.cpp, line 208
> > <https://reviews.apache.org/r/7212/diff/12/?file=261362#file261362line208>
> >
> >     s/might/will/
> >     
> >     Why do we allow this to happen? If it's hard to prevent, please elaborate in this comment.

We use 'might' for 'will', throughout our code base :)

While it might (:)) be easy to stop status updates reaching the SUM, there might be cases where the SUM can receive an ACK for a terminated stream. For eg: updates generated by the master. Added a comment for this.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review17166
-----------------------------------------------------------


On Feb. 23, 2013, 8:11 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review17166
-----------------------------------------------------------



src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment36383>

    can you quote these ids, given we currently don't enforce no whitespace?



src/messages/messages.proto
<https://reviews.apache.org/r/7212/#comment36365>

    Hm.. how is sequence used? Is it perpetually incremented across recoveries? If so let's make this a uint64.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36375>

    s/Whether the update is handled successfully (e.g. checkpointed)./A Future of the update operation./
    
    You can then describe more return semantics if you like separately but I think if we want to use annotations like @return we should actually describe what we're returning in the first sentence (which is what Jie originally did before I refactored his return types).
    
    Whether sounds like a boolean to me, I think this pattern is present in the cgroups isolation module because the functions there used to return booleans before I refactored them to return Try's.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36377>

    ditto



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36379>

    Can you add a ticket or review to track this? If there is one.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36381>

    I don't see the need to make open a separate method, can you please inline it here?
    
    1. open is confusing because it looks like a syscall at first glance.
    2. it's not used anywhere else



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36382>

    ditto here for inlining, I think that will make this more clear and simplify the StatusUpdateStream struct as well!



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36384>

    Can you maybe add a comment or note on what the uuid is for and why there can be a mismatch?



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36385>

    If the StatusUpdateManager stream is in a failed state then yes IMO. Right now an error is indistinguishable from an empty pending queue.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36386>

    Why are you not using the protobuf write error? We've lost that information here!



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36387>

    moving open() into the constructor obviates the need for this CHECK_SOME as well



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment36388>

    moving close() into the destructor obviates the need for this CHECK_SOME



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36399>

    Can you kill the empty definition below and s/;/ {}/ here?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36400>

    Looks like you can move this CHECK to be the first thing in this function.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36402>

    Why are you checking the timeout is expired? Can you add a comment?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36404>

    s/might/will/
    
    Why do we allow this to happen? If it's hard to prevent, please elaborate in this comment.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36405>

    s/might/will



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36406>

    This looks more appropriate as a None() option than a default timeout.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36407>

    kill the quotes on "terminal"



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36408>

    If you made timeout an option you could CHECK here that timeout is some when the stream is not empty, which seems like a good validation of your internal state, right?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36409>

    Can you make this an Option, that way you can CHECK_NOTNULL when the option is some in the caller?



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment36411>

    This fits on one line.



third_party/libprocess/include/process/timeout.hpp
<https://reviews.apache.org/r/7212/#comment36412>

    Ugh, I hate these double time comparisons but I'll be fixing that sometime soon :)


- Ben Mahler


On Feb. 23, 2013, 8:11 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 2): Status Update Manager.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated March 13, 2013, 6:15 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

s/part 1/part 2/


Summary (updated)
-----------------

Slave Restart (Part 2): Status Update Manager.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs
-----

  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp 66c2dc17655123641433c078931acc9e769d5077 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 1): Status Update Manager.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated March 13, 2013, 6:10 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Summary (updated)
-----------------

Slave Restart (Part 1): Status Update Manager.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp 66c2dc17655123641433c078931acc9e769d5077 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated March 1, 2013, 10:39 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Changed Try<Option> to Result<>..duh.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated March 1, 2013, 7:55 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Feb. 23, 2013, 8:11 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 21, 2013, 11:51 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.hpp, line 179
> > <https://reviews.apache.org/r/7212/diff/11/?file=259875#file259875line179>
> >
> >     Return an error or make a big todo at the top of this class to do this.

bit the bullet.


> On Feb. 21, 2013, 11:51 p.m., Benjamin Hindman wrote:
> > src/slave/status_update_manager.hpp, line 130
> > <https://reviews.apache.org/r/7212/diff/11/?file=259875#file259875line130>
> >
> >     Methods in StatusUpdateStream should really be returning a Try (or Result if necessary). You can use the technique of an 'Option<bool> error' to capture when an error might have occurred and then return the error when any calls are made after an error has occured (i.e., a call to handle).

done.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review16888
-----------------------------------------------------------


On Feb. 19, 2013, 7:59 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review16888
-----------------------------------------------------------

Ship it!



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/7212/#comment35834>

    s/reason/message/



src/common/protobuf_utils.hpp
<https://reviews.apache.org/r/7212/#comment35833>

    None().



src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment35835>

    Each argument on newline please.



src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment35836>

    Ditto.



src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment35838>

    I'd love to kill this if it's not being used.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35847>

    If open does not get called from outside constructor, we should make this private.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35842>

    Methods in StatusUpdateStream should really be returning a Try (or Result if necessary). You can use the technique of an 'Option<bool> error' to capture when an error might have occurred and then return the error when any calls are made after an error has occured (i.e., a call to handle).



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35841>

    Wrap path in single quotes.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35848>

    Likewise here ... if this is private to the implementation let's make it so.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35849>

    This guy only really makes sense if open/close are private and your semantics are close only gets called via the destructor.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35850>

    None() (here and everywhere else ... maybe just do one more commit at the end that takes care of this).



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35851>

    Single quotes.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35852>

    Return an error or make a big todo at the top of this class to do this.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment35853>

    s/re-send/resend/


- Benjamin Hindman


On Feb. 19, 2013, 7:59 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 7:59 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 7:59 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14509
-----------------------------------------------------------



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30900>

    This comment is a little confusing, because it sounds like you would want to use os::exists.
    
    I had to read os::mkdir to remind myself about that recursive flag that when set, returns no error when the dir already exists.
    
    So.. maybe change this comment to be:
    // Create the base updates directory if it doesn't exist.
    
    I think os::mkdir does a bit too much magic with that optional flag, but alas.


- Ben Mahler


On Dec. 14, 2012, 8:29 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 14, 2012, 8:29 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 14, 2012, 8:29 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenM's comments


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/local/local.cpp 3402029e07341212717e346a97e7cd30fa52ed51 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > src/common/type_utils.hpp, line 307
> > <https://reviews.apache.org/r/7212/diff/9/?file=237621#file237621line307>
> >
> >     2 line indent

thank you


> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 144
> > <https://reviews.apache.org/r/7212/diff/9/?file=237623#file237623line144>
> >
> >     s/result/close has been the convention that benh and I started following (naming the variable the same as the os::_ call)

fair enough


> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 146
> > <https://reviews.apache.org/r/7212/diff/9/?file=237623#file237623line146>
> >
> >     LOG(ERROR) << "Failed to close " << path.get() << ": " << close.error();

Changed the message but kept the log level. I think ERROR level is too alarming for this.


> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 180
> > <https://reviews.apache.org/r/7212/diff/9/?file=237623#file237623line180>
> >
> >     Looks like you also need a CHECK(result.get()) after since this can return false.

I didn't want to write to CHECK's with similar looking error messages, so I changed CHECK_SOME to CHECK(result.isSome() && result.get().

Need to do some ternary operator trickery in the log output, though. Let me know what you think.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14417
-----------------------------------------------------------


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > See my previous comment, have you considered mesos-local in these changes?
> > In that, is there anything that breaks mesos-local?

as i responded above, it should be fixed now.


> On Dec. 13, 2012, 2:42 a.m., Ben Mahler wrote:
> > src/slave/status_update_manager.hpp, line 180
> > <https://reviews.apache.org/r/7212/diff/9/?file=237623#file237623line180>
> >
> >     Looks like you also need a CHECK(result.get()) after since this can return false.
> 
> Vinod Kone wrote:
>     I didn't want to write to CHECK's with similar looking error messages, so I changed CHECK_SOME to CHECK(result.isSome() && result.get().
>     
>     Need to do some ternary operator trickery in the log output, though. Let me know what you think.

s/write to/write two/

s/Need/Needed/


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14417
-----------------------------------------------------------


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review14417
-----------------------------------------------------------


See my previous comment, have you considered mesos-local in these changes?
In that, is there anything that breaks mesos-local?


src/common/type_utils.hpp
<https://reviews.apache.org/r/7212/#comment30706>

    2 line indent



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30711>

    This TODO looks to be no longer applicable.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30715>

    s/result/close has been the convention that benh and I started following (naming the variable the same as the os::_ call)



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30714>

    LOG(ERROR) << "Failed to close " << path.get() << ": " << close.error();



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30718>

    Looks like you also need a CHECK(result.get()) after since this can return false.



src/slave/status_update_manager.hpp
<https://reviews.apache.org/r/7212/#comment30717>

    Appending the error is done by CHECK_SOME for you, so this last line is redundant.



src/slave/status_update_manager.cpp
<https://reviews.apache.org/r/7212/#comment30719>

    .empty() should be available on hash_map


- Ben Mahler


On Dec. 12, 2012, 11:11 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:11 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
>   src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/slave/status_update_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 12, 2012, 11:11 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

re-based off trunk so the diff of diffs wont be pretty.

addresses BenM's comments and adds fast failing during checkpoint errors.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 12, 2012, 8:48 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

some changes, primarily after discussions with BenH.

This doesn't address BenM's comments yet!


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 10, 2012, 7:58 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

status update stream subsumes the checkpointing functionality.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 7, 2012, 12:56 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenH's comments


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/common/protobuf_utils.hpp 69baab78c8db2d2d33ffbbe7f5e5fc80d65b0e1a 
  src/common/type_utils.hpp fde69aeec403b3455839dca6b0b2e1507d81ba00 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/slave/status_update_manager.cpp PRE-CREATION 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Dec. 3, 2012, 9:54 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Description
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs
-----

  src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
  src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
  src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 30, 2012, 6:06 p.m., Ben Mahler wrote:
> > So all the files will grow unbounded for a given slave run, right?
> > Have you thought about rotation on these files, or do you think their size will be reasonable?

Thats a good point. We haven't thought about how to gc the metadata yet. I will think on it when I do further iterations on the review that includes integration of SUM with the slave (https://reviews.apache.org/r/7655/)


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12913
-----------------------------------------------------------


On Oct. 20, 2012, 2:43 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2012, 2:43 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12913
-----------------------------------------------------------


So all the files will grow unbounded for a given slave run, right?
Have you thought about rotation on these files, or do you think their size will be reasonable?

- Ben Mahler


On Oct. 20, 2012, 2:43 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2012, 2:43 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12672
-----------------------------------------------------------

Ship it!


I have no more comments on the current iteration, so issuing my ship it.


src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment27013>

    I'm surprised boost knows how to hash StatusUpdateStreamID, must be some pretty heavy boost magic.


- Ben Mahler


On Oct. 20, 2012, 2:43 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Oct. 20, 2012, 2:43 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> added shutdownFramework
> 
> 
> Minor fixes for open
> 
> 
> Ben's comments
> 
> 
> Refactoring SUM
> 
> 
> Bens' comments
> 
> 
> Status Update Manager
> 
> Rebased off latest trunk
> 
> Conflicts:
> 	src/Makefile.am
> 	src/common/protobuf_utils.hpp
> 	src/common/utils.hpp
> 	src/slave/slave.cpp
> 	src/tests/protobuf_io_tests.cpp
> 	src/tests/utils_tests.cpp
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
>   src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
>   src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
>   third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Oct. 20, 2012, 2:43 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

add the ability to stop re-trying status updates after the framework terminates.


Description (updated)
-------

added shutdownFramework


Minor fixes for open


Ben's comments


Refactoring SUM


Bens' comments


Status Update Manager

Rebased off latest trunk

Conflicts:
	src/Makefile.am
	src/common/protobuf_utils.hpp
	src/common/utils.hpp
	src/slave/slave.cpp
	src/tests/protobuf_io_tests.cpp
	src/tests/utils_tests.cpp


Diffs (updated)
-----

  src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
  src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
  src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Oct. 18, 2012, 8:53 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

some minor fixes as I was testing the integration.


Description (updated)
-------

Status Update Manager


Diffs (updated)
-----

  src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
  src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
  src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Oct. 18, 2012, 1:30 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Refactored status updates manager and Bens' comments


Description (updated)
-------

Status Update Manager


Diffs (updated)
-----

  src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
  src/common/protobuf_utils.hpp 77b300d7c1a02a836100d3365e205889c48ae99a 
  src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 
  third_party/libprocess/include/process/timeout.hpp cac996070359a3e7ecdd8077af83c8c4cf9735fd 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/
-----------------------------------------------------------

(Updated Oct. 15, 2012, 11:46 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Bens' comments


Description (updated)
-------

Implementation of Status Updates Manager.

Features:
-->  Checkpoints status updates and acks.
--> Reliably sends stats updates


Diffs (updated)
-----

  src/Makefile.am cf9364ea0418ec30a75b1774f32bda8de6e031ac 
  src/messages/messages.proto 4e0538fe929f9091e5cdd4a1bb017d836df52a3e 
  src/slave/status_updates_manager.hpp PRE-CREATION 
  src/slave/status_updates_manager.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7212/diff/


Testing
-------

make check.


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/messages/messages.proto, line 65
> > <https://reviews.apache.org/r/7212/diff/1/?file=159109#file159109line65>
> >
> >     Move type closer to the enum, and make it  '= 1' (it's the thing that differentiates the type of message you have).

done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.hpp, line 19
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line19>
> >
> >     Why not match the filename? s/STATUSUPDATES/STATUS_UPDATES/

done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.hpp, line 58
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line58>
> >
> >     Use an Option (and default it to none) if you're going to optionally send a message based on executorPID == UPID().

done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 39
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line39>
> >
> >     If you used the slave's PID rather than self (line 243) then the slave could dispatch to statusUpdateAcknowledgement which would give the added benefit of pushing any errors in recording the status update acknowledgement back to the slave to take action. It's more work on the slave side, but it enables capturing those errors fairly cleanly. I'm happy to discuss this more in detail.

agreed. i also like the added benefit of symmetry.

passing slave pid via initialize(), so that we can inject different kind of SUMs (mock/no disk checkpointer) into the slave.


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 83
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line83>
> >
> >     If the slave removed the executor from it's data structures, and the slave dispatches to the SUM, how is this case possible? This sounds like it should be more of a CHECK than a LOG.

I dont' remember the scenario that needed me to have this log instead of check. Reverting this to check for now. I will update it later if the scenario is caught in tests.


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 102
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line102>
> >
> >     We might need to rethink the 'acknowledged' set, since it grows (without bound) with the number of tasks.

Discussed offline. The acknowledged set doesnt grow unbounded.


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 147
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line147>
> >
> >     Is this just your guess about how this will work? Or is there code that actually uses this? If so, I'd love to see that review too please.

This is how it is done in the slave. I was in 2 minds regarding where this recovery should happen. I went with all the recovery logic being inside the slave. Open to discussion.

I will send out that review (integration with he slave) next.


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 136
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line136>
> >
> >     How about having sendStatusUpdate return the timeout (since delay returns a Timer from which you can call Timer::timeout)?

done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 187
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line187>
> >
> >     s/no  associated/no associated/

hawk eye!


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 206
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line206>
> >
> >     Just to be pedantic, you need to do this AFTER you write it to disk.

:)


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 248
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line248>
> >
> >     return delay(...).timeout();

done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 282
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line282>
> >
> >     The update can have a large 'data' chunk. Any reason not to just write the UUID so that we don't have to write the data chunk twice?

good point. fixed


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 318
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line318>
> >
> >     Better yet, do a get which should return an Option.

even better, just killed this function in favor of inlining it :)


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 344
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line344>
> >
> >     Again, Option seems best here. And delete after you remove from the data structure please. ;)

How would that avoid double look up here?


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.cpp, line 123
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line123>
> >
> >     You could return the error through the future and let the slave commit suicide.

agreed. done


> On Oct. 15, 2012, 5:01 p.m., Benjamin Hindman wrote:
> > src/slave/status_updates_manager.hpp, line 47
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line47>
> >
> >     There needs to be a high-level comment here describing the semantics of receiving (from an executor, when is the executor ACKed?), recording (what about frameworks that don't want the penalty of recording?), and sending status updates (how long are they resent?). Likewise, describing a few of the data structures here such as what a "status update stream" is, how we "name" it, how long it sticks around, etc.

done. Let me know how it reads.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12429
-----------------------------------------------------------


On Sept. 21, 2012, 6:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2012, 6:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is Status Update Manager's API and implementation.
> 
> Haven't included the integration (into slave) and tests yet.
> 
> Rebased off latest trunk
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am df8696920fd48968907270decbef3dda0803f80a 
>   src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12429
-----------------------------------------------------------


You could have tests for this stuff in isolation.


src/messages/messages.proto
<https://reviews.apache.org/r/7212/#comment26337>

    Move type closer to the enum, and make it  '= 1' (it's the thing that differentiates the type of message you have).



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26338>

    Why not match the filename? s/STATUSUPDATES/STATUS_UPDATES/



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26344>

    There needs to be a high-level comment here describing the semantics of receiving (from an executor, when is the executor ACKed?), recording (what about frameworks that don't want the penalty of recording?), and sending status updates (how long are they resent?). Likewise, describing a few of the data structures here such as what a "status update stream" is, how we "name" it, how long it sticks around, etc.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26339>

    virtual



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26340>

    Use an Option (and default it to none) if you're going to optionally send a message based on executorPID == UPID().



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment26341>

    const &



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26364>

    If you used the slave's PID rather than self (line 243) then the slave could dispatch to statusUpdateAcknowledgement which would give the added benefit of pushing any errors in recording the status update acknowledgement back to the slave to take action. It's more work on the slave side, but it enables capturing those errors fairly cleanly. I'm happy to discuss this more in detail.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26361>

    Space.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26362>

    ?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26342>

    s/master=/master =/



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26343>

    If the slave removed the executor from it's data structures, and the slave dispatches to the SUM, how is this case possible? This sounds like it should be more of a CHECK than a LOG.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26345>

    We might need to rethink the 'acknowledged' set, since it grows (without bound) with the number of tasks.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26346>

    Incomplete sentence.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26363>

    You could return the error through the future and let the slave commit suicide.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26347>

    How about having sendStatusUpdate return the timeout (since delay returns a Timer from which you can call Timer::timeout)?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26353>

    Is this just your guess about how this will work? Or is there code that actually uses this? If so, I'd love to see that review too please.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26365>

    const &



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26354>

    s/no  associated/no associated/



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26355>

    Yes, this should be an ERROR since two status updates should not be in flight at the same time.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26356>

    Can drop the '} else {' and unindent that block (that's why you have the 'return' on the previous line).



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26358>

    Just to be pedantic, you need to do this AFTER you write it to disk.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26359>

    As mentioned above (and below):
    
    stream->timeout = sendStatusUpdate(...);



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26360>

    return delay(...).timeout();



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26352>

    This is TODO.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26357>

    The update can have a large 'data' chunk. Any reason not to just write the UUID so that we don't have to write the data chunk twice?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26348>

    Same comment as above, this could be:
    
    stream->timeout = sendStatusUpdate(update);



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26349>

    Better yet, do a get which should return an Option.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26350>

    That works for me.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment26351>

    Again, Option seems best here. And delete after you remove from the data structure please. ;)


- Benjamin Hindman


On Sept. 21, 2012, 6:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2012, 6:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is Status Update Manager's API and implementation.
> 
> Haven't included the integration (into slave) and tests yet.
> 
> Rebased off latest trunk
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am df8696920fd48968907270decbef3dda0803f80a 
>   src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Vinod Kone <vi...@gmail.com>.

> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.hpp, line 53
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line53>
> >
> >     so.. does each stream have a file where it's checkpointing status updates?
> >     
> >     Perhaps s/path/filepath is more appropriate?

Yes.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.hpp, line 111
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line111>
> >
> >     Might be better to use Option<int> for the fd, rather than using -1 as a sentinel value.

agreed. done


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.hpp, line 151
> > <https://reviews.apache.org/r/7212/diff/1/?file=159110#file159110line151>
> >
> >     where is run used?

This is used by the slave, during recovery. That code hasn't been sent out for review yet.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 66
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line66>
> >
> >     What is path indicating here?

Path is where the status update is checkpointed. It is doc'ed in the header. Let me know if it isn't clear.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 80
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line80>
> >
> >     use path.empty()?

done


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 81
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line81>
> >
> >     capitalize

Its the variable name. quoted it for clarity.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 154
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line154>
> >
> >     any reason for checking this, the input is a hashmap and so by definition the keys are unique?

killed


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 215
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line215>
> >
> >     I think inlining these is more intuitive, and they get evaluated once anyway.

this is an inlined function! also, this will be used in slave.cpp too.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 226
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line226>
> >
> >     who is ACKing these, the master?
> >     
> >     seems like a lot of communication for all the slaves to be sending every single status update in serial?

these are acked by the scheduler driver. agree that this could be a scalability problem, but i think these kinda optimizations are way down the road.


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 234
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line234>
> >
> >     period, also maybe just have the full comment at the signature rather than down here? ditto on the others

done


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 318
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line318>
> >
> >     why not do a find to avoid doing the double lookup?

done


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 332
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line332>
> >
> >     use CHECK_NOTNULL and inline it in the assignment:
> >     
> >     streams[id] = CHECK_NOTNULL(stream);
> >     
> >     actually.. benh might not like inlining it

done. i like it :) 


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 344
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line344>
> >
> >     sanity check for contains (use find to avoid double lookups)

done


> On Sept. 28, 2012, 1:14 a.m., Ben Mahler wrote:
> > src/slave/status_updates_manager.cpp, line 195
> > <https://reviews.apache.org/r/7212/diff/1/?file=159111#file159111line195>
> >
> >     seems indicative of an ERROR

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12010
-----------------------------------------------------------


On Sept. 21, 2012, 6:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2012, 6:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is Status Update Manager's API and implementation.
> 
> Haven't included the integration (into slave) and tests yet.
> 
> Rebased off latest trunk
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am df8696920fd48968907270decbef3dda0803f80a 
>   src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Deux): Status Update Manager

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7212/#review12010
-----------------------------------------------------------



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment25630>

    so.. does each stream have a file where it's checkpointing status updates?
    
    Perhaps s/path/filepath is more appropriate?



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment25624>

    Might be better to use Option<int> for the fd, rather than using -1 as a sentinel value.



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment25623>

    I had written all these comments about fsync() and O_SYNC, then I see you've got it down here! ;)



src/slave/status_updates_manager.hpp
<https://reviews.apache.org/r/7212/#comment25625>

    where is run used?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25629>

    What is path indicating here?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25626>

    use path.empty()?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25627>

    capitalize



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25628>

    The comment you wrote suggests ERROR over WARNING, no?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25621>

    any reason for checking this, the input is a hashmap and so by definition the keys are unique?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25620>

    seems indicative of an ERROR



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25619>

    Many of these comments are missing the trailing period.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25633>

    period



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25631>

    I think inlining these is more intuitive, and they get evaluated once anyway.



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25632>

    who is ACKing these, the master?
    
    seems like a lot of communication for all the slaves to be sending every single status update in serial?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25634>

    period, also maybe just have the full comment at the signature rather than down here? ditto on the others



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25614>

    why not do a find to avoid doing the double lookup?



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25616>

    use CHECK_NOTNULL and inline it in the assignment:
    
    streams[id] = CHECK_NOTNULL(stream);
    
    actually.. benh might not like inlining it



src/slave/status_updates_manager.cpp
<https://reviews.apache.org/r/7212/#comment25617>

    sanity check for contains (use find to avoid double lookups)


- Ben Mahler


On Sept. 21, 2012, 6:20 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7212/
> -----------------------------------------------------------
> 
> (Updated Sept. 21, 2012, 6:20 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> This is Status Update Manager's API and implementation.
> 
> Haven't included the integration (into slave) and tests yet.
> 
> Rebased off latest trunk
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am df8696920fd48968907270decbef3dda0803f80a 
>   src/messages/messages.proto 2a0321cb3d9e3f6499e2108a3b21516a3bd18d2f 
>   src/slave/status_updates_manager.hpp PRE-CREATION 
>   src/slave/status_updates_manager.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7212/diff/
> 
> 
> Testing
> -------
> 
> make check.
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>