You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2013/03/25 02:47:44 UTC

Review Request: Refactored slave::Framework/Executor. Included "states" as well as added recovery methods.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10111/
-----------------------------------------------------------

Review request for mesos and Vinod Kone.


Description
-------

This is work in progress.


Diffs
-----

  src/slave/cgroups_isolator.hpp 1732c4e9ea44eb3b4d6234898f28bf3a02df4a7e 
  src/slave/cgroups_isolator.cpp ebc2843c57d5c1787394d0572aca1ead3e5734f1 
  src/slave/http.cpp 260df43a9fdffe969533f4f2efd37cdcfe0fcb35 
  src/slave/isolator.hpp d702041784f5db159efd7da4d916405e86d99741 
  src/slave/process_isolator.hpp 920a667bb2ecc4c497ada39020579012110d9204 
  src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 
  src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0 
  src/slave/slave.cpp 091ec5ed19924aef31b761e68b70b8d042f9a9b7 

Diff: https://reviews.apache.org/r/10111/diff/


Testing
-------

make


Thanks,

Benjamin Hindman


Re: Review Request: Refactored slave::Framework/Executor. Included "states" as well as added recovery methods.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10111/#review19718
-----------------------------------------------------------


Can you drop this now? I have committed this.


- Vinod Kone


On March 25, 2013, 1:47 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10111/
> -----------------------------------------------------------
> 
> (Updated March 25, 2013, 1:47 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Description
> -------
> 
> This is work in progress.
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolator.hpp 1732c4e9ea44eb3b4d6234898f28bf3a02df4a7e 
>   src/slave/cgroups_isolator.cpp ebc2843c57d5c1787394d0572aca1ead3e5734f1 
>   src/slave/http.cpp 260df43a9fdffe969533f4f2efd37cdcfe0fcb35 
>   src/slave/isolator.hpp d702041784f5db159efd7da4d916405e86d99741 
>   src/slave/process_isolator.hpp 920a667bb2ecc4c497ada39020579012110d9204 
>   src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 
>   src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0 
>   src/slave/slave.cpp 091ec5ed19924aef31b761e68b70b8d042f9a9b7 
> 
> Diff: https://reviews.apache.org/r/10111/diff/
> 
> 
> Testing
> -------
> 
> make
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request: Refactored slave::Framework/Executor. Included "states" as well as added recovery methods.

Posted by Benjamin Mahler <be...@gmail.com>.
Did you forget to update?


On Mon, Mar 25, 2013 at 7:12 PM, Benjamin Hindman <be...@berkeley.edu> wrote:

>
>
> > On March 25, 2013, 4:07 a.m., Vinod Kone wrote:
> > > src/slave/slave.hpp, line 705
> > > <
> https://reviews.apache.org/r/10111/diff/1/?file=274256#file274256line705>
> > >
> > >     Forked pid is checkpointed before libprocess pid.
> > >
> > >     When recovering in unsafe mode, the assumption is that the slave
> can die after checkpointing forked pid but before libprocess pid. So, it is
> not possible for libprocess pid to exist but not forked pid. If so, it is a
> really bad situation (file corruption).
>
> Added a comment.
>
>
> - Benjamin
>
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10111/#review18336
> -----------------------------------------------------------
>
>
> On March 25, 2013, 1:47 a.m., Benjamin Hindman wrote:
> >
> > -----------------------------------------------------------
> > This is an automatically generated e-mail. To reply, visit:
> > https://reviews.apache.org/r/10111/
> > -----------------------------------------------------------
> >
> > (Updated March 25, 2013, 1:47 a.m.)
> >
> >
> > Review request for mesos and Vinod Kone.
> >
> >
> > Description
> > -------
> >
> > This is work in progress.
> >
> >
> > Diffs
> > -----
> >
> >   src/slave/cgroups_isolator.hpp 1732c4e9ea44eb3b4d6234898f28bf3a02df4a7e
> >   src/slave/cgroups_isolator.cpp ebc2843c57d5c1787394d0572aca1ead3e5734f1
> >   src/slave/http.cpp 260df43a9fdffe969533f4f2efd37cdcfe0fcb35
> >   src/slave/isolator.hpp d702041784f5db159efd7da4d916405e86d99741
> >   src/slave/process_isolator.hpp 920a667bb2ecc4c497ada39020579012110d9204
> >   src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d
> >   src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0
> >   src/slave/slave.cpp 091ec5ed19924aef31b761e68b70b8d042f9a9b7
> >
> > Diff: https://reviews.apache.org/r/10111/diff/
> >
> >
> > Testing
> > -------
> >
> > make
> >
> >
> > Thanks,
> >
> > Benjamin Hindman
> >
> >
>
>

Re: Review Request: Refactored slave::Framework/Executor. Included "states" as well as added recovery methods.

Posted by Benjamin Hindman <be...@berkeley.edu>.

> On March 25, 2013, 4:07 a.m., Vinod Kone wrote:
> > src/slave/slave.hpp, line 705
> > <https://reviews.apache.org/r/10111/diff/1/?file=274256#file274256line705>
> >
> >     Forked pid is checkpointed before libprocess pid.
> >     
> >     When recovering in unsafe mode, the assumption is that the slave can die after checkpointing forked pid but before libprocess pid. So, it is not possible for libprocess pid to exist but not forked pid. If so, it is a really bad situation (file corruption).

Added a comment.


- Benjamin


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10111/#review18336
-----------------------------------------------------------


On March 25, 2013, 1:47 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10111/
> -----------------------------------------------------------
> 
> (Updated March 25, 2013, 1:47 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Description
> -------
> 
> This is work in progress.
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolator.hpp 1732c4e9ea44eb3b4d6234898f28bf3a02df4a7e 
>   src/slave/cgroups_isolator.cpp ebc2843c57d5c1787394d0572aca1ead3e5734f1 
>   src/slave/http.cpp 260df43a9fdffe969533f4f2efd37cdcfe0fcb35 
>   src/slave/isolator.hpp d702041784f5db159efd7da4d916405e86d99741 
>   src/slave/process_isolator.hpp 920a667bb2ecc4c497ada39020579012110d9204 
>   src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 
>   src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0 
>   src/slave/slave.cpp 091ec5ed19924aef31b761e68b70b8d042f9a9b7 
> 
> Diff: https://reviews.apache.org/r/10111/diff/
> 
> 
> Testing
> -------
> 
> make
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>


Re: Review Request: Refactored slave::Framework/Executor. Included "states" as well as added recovery methods.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/10111/#review18336
-----------------------------------------------------------


looks much cleaner! thanks for refactoring!!


src/slave/isolator.hpp
<https://reviews.apache.org/r/10111/#comment38543>

    Do you mean in registered()?
    
    See discussion in MESOS-365.
    
    If you meant initialize(), why is slave id special?



src/slave/slave.hpp
<https://reviews.apache.org/r/10111/#comment38544>

    Some comments here would be great.



src/slave/slave.hpp
<https://reviews.apache.org/r/10111/#comment38545>

    Forked pid is checkpointed before libprocess pid.
    
    When recovering in unsafe mode, the assumption is that the slave can die after checkpointing forked pid but before libprocess pid. So, it is not possible for libprocess pid to exist but not forked pid. If so, it is a really bad situation (file corruption).



src/slave/slave.hpp
<https://reviews.apache.org/r/10111/#comment38546>

    ditto..on comments.



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38547>

    How about pushing this responsibility to framework?
    
    In other words can the framework/executor states be private?



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38548>

    Either we need to pull this up (see the NOTE) or add a sleep in the test.



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38549>

    s/is/has/ ?



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38550>

    yes



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38551>

    If we are here, the executor died after fork but before exec. We don't need to do anything here because reregisterExecutorTimeout() cleans it up.



src/slave/slave.cpp
<https://reviews.apache.org/r/10111/#comment38552>

    ditto.


- Vinod Kone


On March 25, 2013, 1:47 a.m., Benjamin Hindman wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/10111/
> -----------------------------------------------------------
> 
> (Updated March 25, 2013, 1:47 a.m.)
> 
> 
> Review request for mesos and Vinod Kone.
> 
> 
> Description
> -------
> 
> This is work in progress.
> 
> 
> Diffs
> -----
> 
>   src/slave/cgroups_isolator.hpp 1732c4e9ea44eb3b4d6234898f28bf3a02df4a7e 
>   src/slave/cgroups_isolator.cpp ebc2843c57d5c1787394d0572aca1ead3e5734f1 
>   src/slave/http.cpp 260df43a9fdffe969533f4f2efd37cdcfe0fcb35 
>   src/slave/isolator.hpp d702041784f5db159efd7da4d916405e86d99741 
>   src/slave/process_isolator.hpp 920a667bb2ecc4c497ada39020579012110d9204 
>   src/slave/process_isolator.cpp 210ea10ad97e08c7a303249da97e70b438dfe11d 
>   src/slave/slave.hpp 2529bf500a3265b10ad4cddde10c2d62a6cdb4a0 
>   src/slave/slave.cpp 091ec5ed19924aef31b761e68b70b8d042f9a9b7 
> 
> Diff: https://reviews.apache.org/r/10111/diff/
> 
> 
> Testing
> -------
> 
> make
> 
> 
> Thanks,
> 
> Benjamin Hindman
> 
>