You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/12/04 04:59:14 UTC

Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 4, 2012, 3:59 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off the latest status updates manager code (which was rebase off trunk).


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
  src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
  src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
  src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_updates_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 801
> > <https://reviews.apache.org/r/7655/diff/8/?file=238494#file238494line801>
> >
> >     Any reason not to use the: "Failed to checkpoint..." log format?
> 
> Vinod Kone wrote:
>     I want to make sure it says that the failure is in the status update manager.
>     
>     Saying, Failed to checkpoint.....blah..by status update manager doesn't read nice.
>     
>     We might have to say.. "Status update manager failed to..".
>     
>     Also, I don't like "Failed to" for warnings. Sounds more ominous than intended.
> 
> Ben Mahler wrote:
>     Ok please use "could not" in that case.

done


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review15397
-----------------------------------------------------------


On Jan. 16, 2013, 11:10 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2013, 11:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> This addresses bug MESOS-110.
>     https://issues.apache.org/jira/browse/MESOS-110
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 801
> > <https://reviews.apache.org/r/7655/diff/8/?file=238494#file238494line801>
> >
> >     Any reason not to use the: "Failed to checkpoint..." log format?

I want to make sure it says that the failure is in the status update manager.

Saying, Failed to checkpoint.....blah..by status update manager doesn't read nice.

We might have to say.. "Status update manager failed to..".

Also, I don't like "Failed to" for warnings. Sounds more ominous than intended.


> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 903
> > <https://reviews.apache.org/r/7655/diff/8/?file=238494#file238494line903>
> >
> >     In general I think "Could not" reads well, compared to "Couldn't", or even better for these messages:
> >     
> >     "Failed to find framework..."
> >     or
> >     "Unknown framework..."

I rephrased it as "Could not find..."


> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1271
> > <https://reviews.apache.org/r/7655/diff/8/?file=238494#file238494line1271>
> >
> >     So they'll be transitioned by other code, or? Can you add a comment / pointer to where?

Actually, this logic is fixed in one of the latter reviews. So, I'll ignore this for now.


> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/tests/status_update_manager_tests.cpp, line 212
> > <https://reviews.apache.org/r/7655/diff/8/?file=238497#file238497line212>
> >
> >     No easy way to WAIT_UNTIL instead of sleep?

unfortunately, no. there are no side-effects that we can easily wait/depend on.


> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/tests/status_update_manager_tests.cpp, line 220
> > <https://reviews.apache.org/r/7655/diff/8/?file=238497#file238497line220>
> >
> >     Kill the extra space:
> >     s/fd  =/fd =/

good eye


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review15397
-----------------------------------------------------------


On Dec. 14, 2012, 8:31 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 14, 2012, 8:31 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Ben Mahler <be...@gmail.com>.

> On Jan. 16, 2013, 7:33 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 801
> > <https://reviews.apache.org/r/7655/diff/8/?file=238494#file238494line801>
> >
> >     Any reason not to use the: "Failed to checkpoint..." log format?
> 
> Vinod Kone wrote:
>     I want to make sure it says that the failure is in the status update manager.
>     
>     Saying, Failed to checkpoint.....blah..by status update manager doesn't read nice.
>     
>     We might have to say.. "Status update manager failed to..".
>     
>     Also, I don't like "Failed to" for warnings. Sounds more ominous than intended.

Ok please use "could not" in that case.


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review15397
-----------------------------------------------------------


On Jan. 16, 2013, 11:10 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Jan. 16, 2013, 11:10 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> This addresses bug MESOS-110.
>     https://issues.apache.org/jira/browse/MESOS-110
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review15397
-----------------------------------------------------------

Ship it!


Modulo comments below.


src/slave/paths.hpp
<https://reviews.apache.org/r/7655/#comment33141>

    Can you log a warning here? I think this might be useful if the symlink fails because of this.
    
    Try<Nothing> rm = os::rm(latest);
    
    if (rm.isError()) {
      LOG(WARNING) << "Failed to rm latest symlink '" << latest << "': " << rm.error();
    }
    
    I think CHECK_SOME is also ok here:
    
    CHECK_SOME(os::rm(latest)) << "Failed to rm latest symlink '" << latest << "'";



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33148>

    Can you add a string when it's discarded?
    
    s/update manager"/update manager: "
    s/": " + future.failure()/future.failure()
    s/""/"the future was discarded"



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33149>

    Any reason not to use the: "Failed to checkpoint..." log format?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33150>

    In general I think "Could not" reads well, compared to "Couldn't", or even better for these messages:
    
    "Failed to find framework..."
    or
    "Unknown framework..."



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33151>

    ditto



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33152>

    ditto my comment above for outputting discarded



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33153>

    s/size() == 0/empty()/



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33154>

    So they'll be transitioned by other code, or? Can you add a comment / pointer to where?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment33155>

    ditto discarded



src/tests/status_update_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment33157>

    No easy way to WAIT_UNTIL instead of sleep?



src/tests/status_update_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment33158>

    Kill the extra space:
    s/fd  =/fd =/



src/tests/status_update_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment33163>

    I think this is cleaner if you stuff the record, result inside the while loop:
    
    int updates = 0;
    int acks = 0;
    string uuid;
    
    while (true) {
      StatusUpdateRecord record;
      Result<bool> result = protobuf::read(fd.get(), &record);
    
      if (!result.isSome()) {
        break;
      }
    
      if (record.type() == StatusUpdateRecord::UPDATE) {
          EXPECT_EQ(TASK_RUNNING, record.update().status().state());
          uuid = record.update().uuid();
          updates++;
        } else {
          EXPECT_EQ(uuid, record.uuid());
          acks++;
        }
      }
    


- Ben Mahler


On Dec. 14, 2012, 8:31 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 14, 2012, 8:31 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part 3): Integrated status updates manager into the slave.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated March 13, 2013, 6:16 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

s/part 2/part 3/


Summary (updated)
-----------------

Slave Restart (Part 3): Integrated status updates manager into the slave.


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs
-----

  include/mesos/mesos.proto 2a6170e8fc86ffd9b53f1cccab5757f0f9219f8b 
  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 2ba14fcaade0580970196f0eb8dbcd5ce573797e 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part 2): Integrated status updates manager into the slave.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated March 13, 2013, 6:11 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk for posterity. no need for review.


Summary (updated)
-----------------

Slave Restart (Part 2): Integrated status updates manager into the slave.


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs (updated)
-----

  include/mesos/mesos.proto 2a6170e8fc86ffd9b53f1cccab5757f0f9219f8b 
  src/Makefile.am f7242a7fb55de3d1df4930126345274124331d57 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp 11f17aef6d5deefe83b2d4706e4b8b24adaac5f4 
  src/slave/slave.cpp 889e9fe7671c9b158486fd7b85fef139c4ab8c9b 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 2ba14fcaade0580970196f0eb8dbcd5ce573797e 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated March 1, 2013, 10:40 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's.


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs (updated)
-----

  include/mesos/mesos.proto 2a6170e8fc86ffd9b53f1cccab5757f0f9219f8b 
  src/Makefile.am 851237cbf8db071d27de40c01d702514713b861a 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp 7648c33230c1900eda7529045c5df9ccab105d47 
  src/slave/slave.cpp c13d268c5e0e107902f64e30304a18128927a571 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 104610cef3fdaaea75271657c42c4fd2b6b61618 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Feb. 28, 2013, 10:11 p.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 139
> > <https://reviews.apache.org/r/7655/diff/11/?file=261368#file261368line139>
> >
> >     Document the purpose of the optional executor argument, maybe use Option (with None() default) + CHECK_NOTNULL when the option is some?

Aah. I don't like options for pointers. But, added a comment.


> On Feb. 28, 2013, 10:11 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 76
> > <https://reviews.apache.org/r/7655/diff/11/?file=261369#file261369line76>
> >
> >     why does it need to be a pointer at all?

The idea was to make it easy to inject a mock status update manager for testing (like we do isolation module). But, I never got around it :/

I can create this on a stack, if this bothers you.


> On Feb. 28, 2013, 10:11 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 610
> > <https://reviews.apache.org/r/7655/diff/11/?file=261369#file261369line610>
> >
> >     I don't buy this comment, statusUpdate is functionally doing more than forwardUpdate.
> >     
> >     It is modifying the executor state, if found. Talking to the isolation module as well.
> >     
> >     Also, if statusUpdate forwarded for unknown frameworks we could call it here right? Sounds like we should take care of that TODO and eliminate the need for having _both_ statusUpdate and forwardUpdate, wouldn't that be much cleaner?

OK. After our discussions offline, I think the right thing to do here is to make statusUpdate() more permissive (i.e., forwarding updates irrespective of unknown frameworks/executors). We can revisit this strategy when there is persistent state at the master.

I am keeping forwardUpdate() instead of folding it statusUpdate() to make it easy to revert the logic in the future, when the master has persistence.


> On Feb. 28, 2013, 10:11 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 686
> > <https://reviews.apache.org/r/7655/diff/11/?file=261369#file261369line686>
> >
> >     s/shut down/shutdown to be consistent

I think BenH doesn't like 'shutdown' in sentences. The log message above is talking about 'shutdown' framework message, which I think is ok.


> On Feb. 28, 2013, 10:11 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 791
> > <https://reviews.apache.org/r/7655/diff/11/?file=261369#file261369line791>
> >
> >     Use "Failed" style message.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review17204
-----------------------------------------------------------


On Feb. 23, 2013, 8:11 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> This addresses bug MESOS-110.
>     https://issues.apache.org/jira/browse/MESOS-110
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp e140f409bf6e651cc365b5fb8d064ceae1cd8ed8 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review17204
-----------------------------------------------------------


The higher level message expressed in my comments is whether we can push forwardUpdate into statusUpdate and eliminate the need for both calls.
It looks like this is possible if you take care of the TODO for forwarding unknown framework updates, right?


include/mesos/mesos.proto
<https://reviews.apache.org/r/7655/#comment36487>

    I see you're following the documentation style of mesos.proto but I think at some point we should move to document the fields inline over including them in the large paragraph above the proto message.



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment36492>

    Can you group the declaration of the status update methods so they are all together?
    
    i.e. why are the status update ack calls declared separately above?



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment36491>

    Document the purpose of the optional executor argument, maybe use Option (with None() default) + CHECK_NOTNULL when the option is some?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36493>

    why does it need to be a pointer at all?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36497>

    I don't buy this comment, statusUpdate is functionally doing more than forwardUpdate.
    
    It is modifying the executor state, if found. Talking to the isolation module as well.
    
    Also, if statusUpdate forwarded for unknown frameworks we could call it here right? Sounds like we should take care of that TODO and eliminate the need for having _both_ statusUpdate and forwardUpdate, wouldn't that be much cleaner?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36498>

    ditto above.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36499>

    s/shut down/shutdown to be consistent



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36500>

    Use "Failed" style message.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36501>

    Can you include the same information as above? Task id and framework id specifically.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36496>

    s/,//
    s/at/in/



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment36495>

    I don't see why we forward when the executor is unknown but _not_ when the framework is unknown?
    
    I see you have a TODO above for this very question. What exactly would be the harm in forwarding it to the master?


- Ben Mahler


On Feb. 23, 2013, 8:11 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Feb. 23, 2013, 8:11 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> This addresses bug MESOS-110.
>     https://issues.apache.org/jira/browse/MESOS-110
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp e140f409bf6e651cc365b5fb8d064ceae1cd8ed8 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Feb. 23, 2013, 8:11 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benh's


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp e140f409bf6e651cc365b5fb8d064ceae1cd8ed8 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review16898
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment35871>

    I think this comment could be more clear.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment35870>

    Let's add a comment here: "the executor may be null because ...". Also, let's do the actual forwarding at the bottom of this function (put the comment there too). In the future when we change these semantics we'll still want to check if the executor exists (we could have gotten a bogus status update), but we'll only forward it if it's valid.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment35866>

    Maybe LOG(FATAL) for now?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment35867>

    Don't delete it, put it in completed!



src/tests/status_update_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment35869>

    Don't need this temporary.


- Benjamin Hindman


On Feb. 19, 2013, 8 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Feb. 19, 2013, 8 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> This addresses bug MESOS-110.
>     https://issues.apache.org/jira/browse/MESOS-110
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Feb. 19, 2013, 8 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased off trunk.


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp d4721c3eb51db87278d05f6fbe2eadb8a3a9b4dd 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Jan. 16, 2013, 11:10 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's comments


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Jan. 16, 2013, 10:56 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

added the bug id


Description
-------

Integrated SUM into slave.


This addresses bug MESOS-110.
    https://issues.apache.org/jira/browse/MESOS-110


Diffs
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 14, 2012, 8:31 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenM's comments


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 13, 2012, 7:04 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 766
> > <https://reviews.apache.org/r/7655/diff/7/?file=237631#file237631line766>
> >
> >     Is this Future<bool> necessary?

probably not. i just copied it from code elsewhere (cgroups?)


> On Dec. 13, 2012, 7:04 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 790
> > <https://reviews.apache.org/r/7655/diff/7/?file=237631#file237631line790>
> >
> >     Can you add the future failure reason to this log message?
> >     
> >     Do you want to also check for discarded?

!ready catches discarded no ?


> On Dec. 13, 2012, 7:04 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 978
> > <https://reviews.apache.org/r/7655/diff/7/?file=237631#file237631line978>
> >
> >     Is logging an error ok here?
> >     Is it ok to proceed, or do you want to bail out and take some action instead of calling update?

good catch. we should bail. to be consistent with my part 4 review, the guy doing checkpointing should ensure the directory exists. so moved the directory creation to status update manager.


> On Dec. 13, 2012, 7:04 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 1268
> > <https://reviews.apache.org/r/7655/diff/7/?file=237631#file237631line1268>
> >
> >     What cleanup? The gc? The destroyExecutor? Both?

both


> On Dec. 13, 2012, 7:04 p.m., Ben Mahler wrote:
> > src/tests/status_update_manager_tests.cpp, line 160
> > <https://reviews.apache.org/r/7655/diff/7/?file=237634#file237634line160>
> >
> >     Can you add a comment on what each test is exercising? It's a bit hard for me to follow this and our other tests, since they are quite verbose.

added a comment inside the first test. the second one is obvious?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14450
-----------------------------------------------------------


On Dec. 12, 2012, 11:13 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14450
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30744>

    Maybe point out that this allows this to be called directly, but in particular we'd like it to allow Slave::finalize() to call it.
    
    Something like:
    // Allow direct calls to shutdownFramework (in particular, from Slave::finalize()).
    // If called from a message, ensure it's from the leading master.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30745>

    Can you log who asked for the shutdown?
    
    i.e. add 'from' to the log when it's set, seems useful for debugging.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30747>

    I don't see where the framework gets removed from the framework map here, can you add a comment as to where that happens?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30748>

    Is this Future<bool> necessary?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30749>

    Can you add the future failure reason to this log message?
    
    Do you want to also check for discarded?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30757>

    Is logging an error ok here?
    Is it ok to proceed, or do you want to bail out and take some action instead of calling update?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30759>

    /s/" : "/": " for consistency with our other logs



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30760>

    ditto on future failure reason and discarded



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30762>

    What cleanup? The gc? The destroyExecutor? Both?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30763>

    ditto on failure reason and discarded check



src/tests/status_update_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment30764>

    Can you add a comment on what each test is exercising? It's a bit hard for me to follow this and our other tests, since they are quite verbose.


- Ben Mahler


On Dec. 12, 2012, 11:13 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 12, 2012, 11:13 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

re-based off trunk.

the only change is that _forwardUpdate now takes Future<Nothing> instead of Future<bool>, due to the corresponding change in the status manager api.


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
  src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
  src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
  src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
  src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 12, 2012, 8:51 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

Moved 'checkpoint' from Task to Framework.  In other words, the granularity of checkpointing is at the framework level, instead of task level. This makes thing simple.

Doesn't address BenM's comments yet.

P.S: Not sure why post-reviews thinks the test file is new!?


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
  src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
  src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
  src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
  src/tests/status_update_manager_tests.cpp PRE-CREATION 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 12, 2012, 1:19 a.m., Ben Mahler wrote:
> > Updated my review.

and yes Diff7 addresses BenM's comments too.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14339
-----------------------------------------------------------


On Dec. 12, 2012, 11:13 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 11:13 p.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am c94736df660a25b58dc47c07d9c56c3c26152a66 
>   src/slave/gc.cpp fe22f9b557215514e3d432d36af9dc0c377c437b 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 12, 2012, 1:19 a.m., Ben Mahler wrote:
> > src/messages/messages.proto, line 49
> > <https://reviews.apache.org/r/7655/diff/5/?file=236692#file236692line49>
> >
> >     kill TODO since it's going to be difficult to make this required..
> >     
> >     see: https://developers.google.com/protocol-buffers/docs/cpptutorial
> >     
> >     Snippet from the 'Extending a Protocol Buffer' section:
> >     
> >     "Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:
> >     
> >      -you must not change the tag numbers of any existing fields.
> >      -you must not add or delete any required fields.
> >      -you may delete optional or repeated fields.
> >      -you may add new optional or repeated fields but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
> >     "
> >     
> >     Note the following:
> >      -you must not add or delete any required fields.
> >

Moved checkpoint to framework info. But, fyi, as you will see in part 4/5, we are going to add new protobufs and callbacks. So the upgrade is most likely going to be backwards incompatible!


> On Dec. 12, 2012, 1:19 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 335
> > <https://reviews.apache.org/r/7655/diff/5/?file=236695#file236695line335>
> >
> >     I think s/doCheckpoint/shouldCheckpoint is more intuitive

n/a


> On Dec. 12, 2012, 1:19 a.m., Ben Mahler wrote:
> > src/slave/slave.hpp, line 339
> > <https://reviews.apache.org/r/7655/diff/5/?file=236695#file236695line339>
> >
> >     I don't feel strongly about this anymore, since I think what you have now reads easier, so feel free to kill this TODO that I made you add.

n/a


> On Dec. 12, 2012, 1:19 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 991
> > <https://reviews.apache.org/r/7655/diff/5/?file=236696#file236696line991>
> >
> >     missed a return here?

thank you.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14339
-----------------------------------------------------------


On Dec. 12, 2012, 8:51 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 12, 2012, 8:51 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
>   src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
>   src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
>   src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14339
-----------------------------------------------------------


Updated my review.


include/mesos/mesos.proto
<https://reviews.apache.org/r/7655/#comment30533>

    s/better recover/recover
    
    Unless we already have some form of task recovery?



src/messages/messages.proto
<https://reviews.apache.org/r/7655/#comment30534>

    kill TODO since it's going to be difficult to make this required..
    
    see: https://developers.google.com/protocol-buffers/docs/cpptutorial
    
    Snippet from the 'Extending a Protocol Buffer' section:
    
    "Sooner or later after you release the code that uses your protocol buffer, you will undoubtedly want to "improve" the protocol buffer's definition. If you want your new buffers to be backwards-compatible, and your old buffers to be forward-compatible – and you almost certainly do want this – then there are some rules you need to follow. In the new version of the protocol buffer:
    
     -you must not change the tag numbers of any existing fields.
     -you must not add or delete any required fields.
     -you may delete optional or repeated fields.
     -you may add new optional or repeated fields but you must use fresh tag numbers (i.e. tag numbers that were never used in this protocol buffer, not even by deleted fields).
    "
    
    Note the following:
     -you must not add or delete any required fields.
    



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment30536>

    I think s/doCheckpoint/shouldCheckpoint is more intuitive



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment30535>

    I don't feel strongly about this anymore, since I think what you have now reads easier, so feel free to kill this TODO that I made you add.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30539>

    missed a return here?


- Ben Mahler


On Dec. 11, 2012, 12:01 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 11, 2012, 12:01 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
>   src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
>   src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
>   src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
>   src/slave/status_update_manager.hpp PRE-CREATION 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 11, 2012, 12:01 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

some minor fixes that were discussed offline with BenH.


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
  src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
  src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
  src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/
-----------------------------------------------------------

(Updated Dec. 7, 2012, 3:54 a.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

BenH's comments


Description
-------

Integrated SUM into slave.


Diffs (updated)
-----

  include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
  src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
  src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
  src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
  src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
  src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
  src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
  src/slave/status_update_manager.hpp PRE-CREATION 
  src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 

Diff: https://reviews.apache.org/r/7655/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Vinod Kone <vi...@gmail.com>.

> On Dec. 6, 2012, 12:48 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 643
> > <https://reviews.apache.org/r/7655/diff/3/?file=233869#file233869line643>
> >
> >     TASK_LOST or TASK_KILLED?

i think TASK_KILLED makes sense actually. reverted


> On Dec. 6, 2012, 12:48 a.m., Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 519
> > <https://reviews.apache.org/r/7655/diff/3/?file=233869#file233869line519>
> >
> >     Explain why not using statusUpdate directly.

fixed with explanation.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14074
-----------------------------------------------------------


On Dec. 4, 2012, 3:59 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 4, 2012, 3:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
>   src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
>   src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
>   src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_updates_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Tres): StatusUpdatesManager integration into the slave

Posted by Benjamin Hindman <be...@berkeley.edu>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/7655/#review14074
-----------------------------------------------------------



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment30108>

    Kill this.



src/slave/slave.hpp
<https://reviews.apache.org/r/7655/#comment30109>

    forwardUpdate?



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30112>

    Explain why not using statusUpdate directly.



src/slave/slave.cpp
<https://reviews.apache.org/r/7655/#comment30110>

    TASK_LOST or TASK_KILLED?



src/tests/status_updates_manager_tests.cpp
<https://reviews.apache.org/r/7655/#comment30113>

    ...


- Benjamin Hindman


On Dec. 4, 2012, 3:59 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/7655/
> -----------------------------------------------------------
> 
> (Updated Dec. 4, 2012, 3:59 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Integrated SUM into slave.
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 38235157d45bdccb676e5c3241c21b585a6f8801 
>   src/Makefile.am b2d1edf140797c7150cb4644d323296965c4f000 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/gc.cpp 679504e51922c5ea54a476d061262e8e8f2aa4b6 
>   src/slave/paths.hpp 98e7fd402919c50a26f69a2f1a1904cb877c5f43 
>   src/slave/slave.hpp bbba4404e9e2b1ff1e246f017cdad704438973ba 
>   src/slave/slave.cpp 28fd4c336d8ac658cf92811d20066a6cfdf5a95e 
>   src/tests/master_tests.cpp 948ab5dff34eeba1f3ce593a864ddf282c8b19ed 
>   src/tests/status_updates_manager_tests.cpp PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/7655/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>