You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Ben Mahler <be...@gmail.com> on 2013/01/16 20:43:39 UTC

Re: Review Request: Slave Restart (Part Quatre): Checkpointed slave, framework, executor and task

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8535/#review14476
-----------------------------------------------------------


I forgot to publish these comments, they may be stale. I'll do another review as well.


src/slave/lxc_isolation_module.cpp
<https://reviews.apache.org/r/8535/#comment30846>

    Hm.. failing at runtime seems pretty bad, even though we're deprecated lxc, have you considered logging an error and proceeding?
    
    Or an EXIT?



src/slave/process_based_isolation_module.cpp
<https://reviews.apache.org/r/8535/#comment30847>

    Why did you kill all the logging for checkpointing?
    
    Seems useful for debugging any issues with this.



src/slave/slave.cpp
<https://reviews.apache.org/r/8535/#comment30848>

    Instead of a helper, you could use an option and compute it only when checkpoint is true above. A helper seems a bit clunky, no?


- Ben Mahler


On Dec. 19, 2012, 1:32 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8535/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2012, 1:32 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Checkpoints slave id, framework pid, executor pids (libprocess, execed, forked) and task info.
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 0f2975d1adf874dba4d0a539eb5c99233cef6e6b 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
>   src/tests/protobuf_io_tests.cpp 979efac0f28c3d361f5c347f15933d89d9356bc9 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8535/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Quatre): Checkpointed slave, framework, executor and task

Posted by Ben Mahler <be...@gmail.com>.

> On Jan. 16, 2013, 7:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 573
> > <https://reviews.apache.org/r/8535/diff/2-3/?file=237646#file237646line573>
> >
> >     Instead of a helper, you could use an option and compute it only when checkpoint is true above. A helper seems a bit clunky, no?
> 
> Vinod Kone wrote:
>     I don't get it. Option for what? Do you mean make 'path' and 't' as options and pre-compute at the top when if (executor!=NULL)? Doesn't work for when we have to create the executor at the bottom.

Yeah, doesn't work at the bottom but I imagined it would the options would just be contained _inside_ the if (executor != NULL). I guess this doesn't save much since it doesn't eliminate the last computation at the bottom.

It's unfortunate that this function was already fairly complex =/, it's now 175 lines!


- Ben


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8535/#review14476
-----------------------------------------------------------


On Jan. 19, 2013, 1:09 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8535/
> -----------------------------------------------------------
> 
> (Updated Jan. 19, 2013, 1:09 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Checkpoints slave id, framework pid, executor pids (libprocess, execed, forked) and task info.
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 63cefc33cf34eebb82db5d8448b751be8652fa36 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/paths.hpp fbf3fd84fb8f2590311b18d2afec2d2e0d30ef0a 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
>   src/tests/protobuf_io_tests.cpp 979efac0f28c3d361f5c347f15933d89d9356bc9 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8535/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request: Slave Restart (Part Quatre): Checkpointed slave, framework, executor and task

Posted by Vinod Kone <vi...@gmail.com>.

> On Jan. 16, 2013, 7:43 p.m., Ben Mahler wrote:
> > src/slave/lxc_isolation_module.cpp, line 122
> > <https://reviews.apache.org/r/8535/diff/2-3/?file=237642#file237642line122>
> >
> >     Hm.. failing at runtime seems pretty bad, even though we're deprecated lxc, have you considered logging an error and proceeding?
> >     
> >     Or an EXIT?

I think failing fast is better here. Using EXIT though.


> On Jan. 16, 2013, 7:43 p.m., Ben Mahler wrote:
> > src/slave/process_based_isolation_module.cpp, line 198
> > <https://reviews.apache.org/r/8535/diff/2-3/?file=237644#file237644line198>
> >
> >     Why did you kill all the logging for checkpointing?
> >     
> >     Seems useful for debugging any issues with this.

Had to because we don't want to using any GLOG functions (due to lock inheritance issue we found in launcher.cpp).

I will print it out to cerr here though.


> On Jan. 16, 2013, 7:43 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 573
> > <https://reviews.apache.org/r/8535/diff/2-3/?file=237646#file237646line573>
> >
> >     Instead of a helper, you could use an option and compute it only when checkpoint is true above. A helper seems a bit clunky, no?

I don't get it. Option for what? Do you mean make 'path' and 't' as options and pre-compute at the top when if (executor!=NULL)? Doesn't work for when we have to create the executor at the bottom.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/8535/#review14476
-----------------------------------------------------------


On Dec. 19, 2012, 1:32 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/8535/
> -----------------------------------------------------------
> 
> (Updated Dec. 19, 2012, 1:32 a.m.)
> 
> 
> Review request for mesos, Benjamin Hindman and Ben Mahler.
> 
> 
> Description
> -------
> 
> Checkpoints slave id, framework pid, executor pids (libprocess, execed, forked) and task info.
> 
> 
> Diffs
> -----
> 
>   src/exec/exec.cpp 821a94fab1f5969183ecf9e28d7b6bc10920db24 
>   src/messages/messages.proto 815fcbbcb4a8643f50950a294cedf7281b2a187f 
>   src/slave/cgroups_isolation_module.hpp 669efa14ba2603764aa68ae19a44e79dbfdec192 
>   src/slave/cgroups_isolation_module.cpp 0f2975d1adf874dba4d0a539eb5c99233cef6e6b 
>   src/slave/flags.hpp 39e57f4104ee7a1538436ebbb9493581e28c99dd 
>   src/slave/isolation_module.hpp b962365ebeddd047896a66b02a327aa26ae323d3 
>   src/slave/lxc_isolation_module.hpp 2bc844f491befbe588965da2ada7cfcef0b6f0a4 
>   src/slave/lxc_isolation_module.cpp 30cff2a49339bb07030727d30352536a0a22d58c 
>   src/slave/process_based_isolation_module.hpp f1817192582e3646f8dcf17934ba7998829e8fd6 
>   src/slave/process_based_isolation_module.cpp 3d50a4b652e4e09dd57e744e408c8fb79ff3fbf5 
>   src/slave/slave.hpp e9f7b659ca2860501840b3d01e69915ebd162039 
>   src/slave/slave.cpp 9755b46f97173d6fcc9ab1fd63e0e4814b3bc018 
>   src/slave/state.hpp c1e4c782380b0076313d2216c40e86774050d445 
>   src/slave/state.cpp 4d4c2470c44fe630ec0694ee937131b4f0aafc4e 
>   src/tests/protobuf_io_tests.cpp 979efac0f28c3d361f5c347f15933d89d9356bc9 
>   src/tests/slave_state_tests.cpp 1136ea93cffb1483458edad2605b0b4f83b61c44 
>   src/tests/status_update_manager_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp be457117515ee727af101370b26bf9188afb8f45 
> 
> Diff: https://reviews.apache.org/r/8535/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>