Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2013/08/03 03:11:20 UTC

Review Request 13253: Fixed slave to not recover completed executors.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13253/
-----------------------------------------------------------

Review request for mesos, Benjamin Hindman and Ben Mahler.


Bugs: MESOS-612
    https://issues.apache.org/jira/browse/MESOS-612


Repository: mesos-git


Description
-------

Added a sentinel file to the executor checkpoint data. This allows the slave, the isolator, and the status update manager (SUM) to skip recovery of executors that have completed (terminated, with all their updates acked).

Also, cleaned up some code.
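
As a rough illustration of the sentinel idea (a sketch, not the code in this patch): once an executor has terminated and all of its updates are acked, an empty file is written into its checkpoint directory, and recovery skips any run where that file exists. The file name `completed.sentinel` and the helpers `markCompleted` and `isCompleted` are hypothetical, and `std::filesystem` (C++17) stands in for the helpers Mesos uses.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <string>

namespace fs = std::filesystem;

// Hypothetical sentinel file name; the name used by the actual patch is
// not shown in this thread.
const std::string kSentinel = "completed.sentinel";

// Marks an executor run as completed by creating an empty sentinel file
// in its checkpoint directory. Returns true if the file could be opened
// for writing.
bool markCompleted(const fs::path& runDir)
{
  assert(!runDir.empty());
  fs::create_directories(runDir);
  std::ofstream file(runDir / kSentinel);
  return file.good();
}

// During recovery, a run whose sentinel file exists can be skipped.
bool isCompleted(const fs::path& runDir)
{
  return fs::exists(runDir / kSentinel);
}
```

With helpers like these, recovery code could set a `completed` flag directly from the existence check instead of branching on it.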


Diffs
-----

  src/slave/cgroups_isolator.cpp 0faf7d50d76887cad02267ab11827664a4b63476 
  src/slave/paths.hpp 9d2a2a40048bbe594723ba3f19aa10eaf1935926 
  src/slave/process_isolator.cpp cd794f6cb301a8c00a4c0ef906f95e53959ed905 
  src/slave/slave.cpp 7f6e6b456890db438092f19a22e4dd816bb33d04 
  src/slave/state.hpp 08e36174a1d88c342ba7a189ed413163bfd22fd8 
  src/slave/state.cpp e910ab71b8b667a076c0fdf31e3322e52fef1b17 
  src/slave/status_update_manager.cpp 9e9e4e2a47a609d65ed69a57de595852144a86c8 
  src/tests/slave_recovery_tests.cpp 1871e3ba41e65dcbd4b95779dda068f6a1a2ecb3 

Diff: https://reviews.apache.org/r/13253/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request 13253: Fixed slave to not recover completed executors.

Posted by Vinod Kone <vi...@gmail.com>.

> On Aug. 5, 2013, 6:33 p.m., Ben Mahler wrote:
> > src/slave/paths.hpp, lines 58-79
> > <https://reviews.apache.org/r/13253/diff/2/?file=336034#file336034line58>
> >
> >     path::join here would avoid mistakes with double forward slashes or missing forward slashes. Just a note.

your wish is my command.
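
To illustrate why a join helper is safer than string concatenation, here is a minimal sketch. The `join` function below is a hypothetical stand-in written for this example, not the `path::join` implementation, and it assumes both components are non-empty.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a path::join-style helper: joins two path
// components with exactly one '/' between them, regardless of whether
// the inputs carry trailing or leading slashes.
std::string join(const std::string& left, const std::string& right)
{
  assert(!left.empty() && !right.empty());

  // Strip trailing slashes from the left component.
  std::string l = left;
  while (!l.empty() && l.back() == '/') {
    l.pop_back();
  }

  // Strip leading slashes from the right component.
  std::size_t start = right.find_first_not_of('/');
  std::string r = (start == std::string::npos) ? "" : right.substr(start);

  return l + "/" + r;
}
```

Plain concatenation such as `root + "/" + name` yields `/tmp/meta//slaves` when `root` already ends in a slash, and `root + name` drops the separator entirely when it does not. Routing every component through one helper removes both mistakes.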


> On Aug. 5, 2013, 6:33 p.m., Ben Mahler wrote:
> > src/slave/state.cpp, lines 380-382
> > <https://reviews.apache.org/r/13253/diff/2/?file=336038#file336038line380>
> >
> >     state.completed = os::exists(path);

doh..thanks


> On Aug. 5, 2013, 6:33 p.m., Ben Mahler wrote:
> > src/slave/state.cpp, line 296
> > <https://reviews.apache.org/r/13253/diff/2/?file=336038#file336038line296>
> >
> >     Consider making a default constructor that sets this false, or a constructor that takes all of the arguments.

i'll punt on this for consistency.
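
For reference, the two options suggested in the comment could look roughly like the sketch below. This is a simplified, hypothetical struct; the real `RunState` in src/slave/state.hpp has other fields, and the `id` member here is made up for the example.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical version of a recovered run's state.
struct RunState
{
  // Option 1: a default constructor that sets 'completed' to false, so
  // the flag is never read uninitialized.
  RunState() : completed(false) {}

  // Option 2: a constructor that takes all of the arguments.
  RunState(const std::string& _id, bool _completed)
    : id(_id), completed(_completed) {}

  std::string id;  // Hypothetical identifier field.

  // True if the executor terminated and all its updates were acked.
  bool completed;
};
```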


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13253/#review24652
-----------------------------------------------------------




Re: Review Request 13253: Fixed slave to not recover completed executors.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/13253/#review24652
-----------------------------------------------------------

Ship it!



src/slave/paths.hpp
<https://reviews.apache.org/r/13253/#comment48753>

    path::join here would avoid mistakes with double forward slashes or missing forward slashes. Just a note.



src/slave/slave.cpp
<https://reviews.apache.org/r/13253/#comment48770>

    s/is being cleaned up/is completed/ ?



src/slave/slave.cpp
<https://reviews.apache.org/r/13253/#comment48767>

    This explanation of 'completed' would be nice in the RunState struct.
    
    s/don't bother recovering it/we do not need to recover it/ ?



src/slave/slave.cpp
<https://reviews.apache.org/r/13253/#comment48768>

    newline



src/slave/state.hpp
<https://reviews.apache.org/r/13253/#comment48765>

    Can you add a comment as to what 'completed' means?



src/slave/state.cpp
<https://reviews.apache.org/r/13253/#comment48757>

    Consider making a default constructor that sets this false, or a constructor that takes all of the arguments.



src/slave/state.cpp
<https://reviews.apache.org/r/13253/#comment48761>

    state.completed = os::exists(path);


- Ben Mahler




Re: Review Request 13253: Fixed slave to not recover completed executors.

Posted by Vinod Kone <vi...@gmail.com>.

(Updated Aug. 6, 2013, 5:53 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

benm's. nnfr.


Bugs: MESOS-612
    https://issues.apache.org/jira/browse/MESOS-612


Repository: mesos-git


Description
-------

Added a sentinel file to the executor checkpoint data. This allows the slave, the isolator, and the status update manager (SUM) to skip recovery of executors that have completed (terminated, with all their updates acked).

Also, cleaned up some code.


Diffs (updated)
-----

  src/slave/cgroups_isolator.cpp 7f6d13ede40c913899cb7a4f6ebea3056d3fa491 
  src/slave/paths.hpp 9d2a2a40048bbe594723ba3f19aa10eaf1935926 
  src/slave/process_isolator.cpp cb074485af9af1ea7c659dcd6fa50c035c5442f2 
  src/slave/slave.cpp 9cd7754b647dde21267f1990edb7d4e1425beacd 
  src/slave/state.hpp 08e36174a1d88c342ba7a189ed413163bfd22fd8 
  src/slave/state.cpp e910ab71b8b667a076c0fdf31e3322e52fef1b17 
  src/slave/status_update_manager.cpp e17ecf4b10423d3239ba0752ea0953e21a61483a 
  src/tests/slave_recovery_tests.cpp c451e0f4c571a646d375aa89e806e1a4058d39e7 

Diff: https://reviews.apache.org/r/13253/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request 13253: Fixed slave to not recover completed executors.

Posted by Vinod Kone <vi...@gmail.com>.

(Updated Aug. 3, 2013, 8:33 p.m.)


Review request for mesos, Benjamin Hindman and Ben Mahler.


Changes
-------

rebased.


Bugs: MESOS-612
    https://issues.apache.org/jira/browse/MESOS-612


Repository: mesos-git


Description
-------

Added a sentinel file to the executor checkpoint data. This allows the slave, the isolator, and the status update manager (SUM) to skip recovery of executors that have completed (terminated, with all their updates acked).

Also, cleaned up some code.


Diffs (updated)
-----

  src/slave/cgroups_isolator.cpp 0faf7d50d76887cad02267ab11827664a4b63476 
  src/slave/paths.hpp 9d2a2a40048bbe594723ba3f19aa10eaf1935926 
  src/slave/process_isolator.cpp cd794f6cb301a8c00a4c0ef906f95e53959ed905 
  src/slave/slave.cpp 7f6e6b456890db438092f19a22e4dd816bb33d04 
  src/slave/state.hpp 08e36174a1d88c342ba7a189ed413163bfd22fd8 
  src/slave/state.cpp e910ab71b8b667a076c0fdf31e3322e52fef1b17 
  src/slave/status_update_manager.cpp 9e9e4e2a47a609d65ed69a57de595852144a86c8 
  src/tests/slave_recovery_tests.cpp 1871e3ba41e65dcbd4b95779dda068f6a1a2ecb3 

Diff: https://reviews.apache.org/r/13253/diff/


Testing
-------

make check


Thanks,

Vinod Kone