Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2012/05/01 04:36:53 UTC

Re: Review Request: Non-disruptive Slave Restart with Recovery!


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > I've only gotten halfway through ... but there is a bunch here already. I'd like to break this up into at least four patches. (1) The utils stuff that was added. (2) The master changes. (3) The slave::path namespace stuff. (3) The status update manager API + implementation (but not the slave using it yet). And (4) the slave using each of these components, and the executor changes that are included.
> > 
> > These comments are across all of those patches, but I'll make future passes on each of those components.
> 
> Vinod Kone wrote:
>     addressed the comments for the utils.hpp part. Will send a review for utils.hpp and protobuf_utils.hpp (forgot to include it in this review) shortly.

Patch for the path refactoring coming shortly.


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 58
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line58>
> >
> >     s/follows/follows:

done


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 66
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line66>
> >
> >     What is the framework PID? How is that different than the Executor PID mentioned below?

We need to store the framework PID because, when an executor re-registers, we need its framework's PID to re-create a new Framework() object.
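For illustration, a minimal sketch of the checkpointing idea, assuming hypothetical helper names (`writeFrameworkPid`, `readFrameworkPid`) and a caller-chosen file path; this is not the actual patch:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical helpers: checkpoint the framework PID to disk so that,
// after a slave restart, a re-registering executor's framework can be
// reconstructed without the master resending it.
void writeFrameworkPid(const std::string& path, const std::string& pid) {
  std::ofstream out(path.c_str());
  out << pid;
}

std::string readFrameworkPid(const std::string& path) {
  std::ifstream in(path.c_str());
  std::string pid;
  std::getline(in, pid);
  return pid;
}
```

On recovery, the slave would read the checkpointed PID back and pass it to the Framework() constructor.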


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 218
> > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line218>
> >
> >     Isn't the default workd_dir mentioned in the addOption above /tmp/mesos? If so, "/tmp" should be "/tmp/mesos" here, and we probably don't want "/tmp/mesos/mesos". Just clean this up so people understand expectations (i.e., should work_dir be a path including the "mesos" directory, or will we create that ourselves).

I agree it's a bit confusing. I did it this way to avoid dumping work/meta directories directly under '/tmp' (e.g., when a user specifies work_dir=/tmp).

I'm indifferent on how we want to do this. I'll revert to 'workRootDir = conf.get("work_dir", "/tmp/mesos") + "/work"' for now.
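To illustrate the behavior being reverted to (a sketch only, not the patch itself): appending "/work" to the configured work_dir means the default "/tmp/mesos" resolves to "/tmp/mesos/work", while a user-supplied work_dir=/tmp resolves to "/tmp/work" instead of scattering directories directly under /tmp.

```cpp
#include <cassert>
#include <string>

// Illustrative only: derive the work root from the configured work_dir
// by appending "/work", mirroring
// workRootDir = conf.get("work_dir", "/tmp/mesos") + "/work".
std::string workRootDir(const std::string& workDir) {
  return workDir + "/work";
}
```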


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, line 56
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line56>
> >
> >     I'd like to stick all of this stuff in it's own file and commit this on it's own (with integration in Slave as applicable and tests).

Refactored the path stuff out into slave/path.hpp.


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.hpp, lines 344-345
> > <https://reviews.apache.org/r/4462/diff/3/?file=103041#file103041line344>
> >
> >     What's the difference between "work" and "work root"? Or "meta" and "meta root"?

workRootDir denotes the root directory (conf['work_dir'/mesos/work]) where the work directories of a slave are stored. It's not exactly a slave's working directory (that would be workRootDir/slaves/slaveId).

I needed the workRootDir for the path layout stuff.

The same goes for metaRootDir.
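A sketch of the distinction, with a hypothetical helper name; the layout `<workRootDir>/slaves/<slaveId>` follows the description above:

```cpp
#include <cassert>
#include <string>

// Hypothetical layout helper: the root directory holds per-slave
// working directories, so a given slave's own work directory lives
// under <workRootDir>/slaves/<slaveId>. The same split applies to
// metaRootDir for checkpointed metadata.
std::string slaveWorkDir(const std::string& workRootDir,
                         const std::string& slaveId) {
  return workRootDir + "/slaves/" + slaveId;
}
```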


> On 2012-04-25 22:11:01, Benjamin Hindman wrote:
> > src/slave/slave.cpp, line 2289
> > <https://reviews.apache.org/r/4462/diff/3/?file=103042#file103042line2289>
> >
> >     Pass these into writeFrameworkPID, no need to make this an instance function. More importantly, you should do this for each of these writers/readers.

Moved these write/read functions into path.hpp.
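The shape of that refactor can be sketched as follows (function and path names are hypothetical, not the actual path.hpp API): instead of an instance method that reads Slave member state, a free function takes everything it needs as arguments, so it can be used and tested without a Slave object.

```cpp
#include <cassert>
#include <string>

namespace paths {

// Hypothetical free function in slave/path.hpp: builds the location of
// a framework's checkpointed PID from explicit arguments rather than
// from Slave instance state.
std::string frameworkPidPath(const std::string& metaRootDir,
                             const std::string& slaveId,
                             const std::string& frameworkId) {
  return metaRootDir + "/slaves/" + slaveId +
         "/frameworks/" + frameworkId + "/framework.pid";
}

} // namespace paths
```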


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4462/#review7232
-----------------------------------------------------------


On 2012-04-19 16:53:07, Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4462/
> -----------------------------------------------------------
> 
> (Updated 2012-04-19 16:53:07)
> 
> 
> Review request for mesos, Benjamin Hindman and John Sirois.
> 
> 
> Summary
> -------
> 
> Sorry for the huge CL!
> 
> Slave restart now supports recovery!
> --> Non-disruptive restart means running tasks are not lost
> --> Re-connects with live executors
> --> Checkpoints and reliably sends status updates
> --> Ability to kill executors if the slave upgrade is incompatible with running executors
> 
> 
> This addresses bug mesos-110.
>     https://issues.apache.org/jira/browse/mesos-110
> 
> 
> Diffs
> -----
> 
>   src/Makefile.am d5edaa2 
>   src/common/hashset.hpp 1feb610 
>   src/common/utils.hpp 1d81e21 
>   src/exec/exec.cpp e8db407 
>   src/launcher/launcher.cpp a141b9a 
>   src/local/local.hpp 55f9eaf 
>   src/local/local.cpp affe432 
>   src/master/master.cpp 4dc9ee0 
>   src/messages/messages.proto 87e1548 
>   src/sched/sched.cpp dcadb10 
>   src/scripts/killtree.sh bceae9d 
>   src/slave/constants.hpp f0c8679 
>   src/slave/http.cpp 19c48a0 
>   src/slave/isolation_module.hpp c896908 
>   src/slave/lxc_isolation_module.hpp b7beefe 
>   src/slave/lxc_isolation_module.cpp 66a2a89 
>   src/slave/main.cpp 85cba25 
>   src/slave/process_based_isolation_module.hpp f6f9554 
>   src/slave/process_based_isolation_module.cpp 2b37d42 
>   src/slave/slave.hpp 279bc7b 
>   src/slave/slave.cpp 3358ec4 
>   src/slave/statusupdates_manager.hpp PRE-CREATION 
>   src/slave/statusupdates_manager.cpp PRE-CREATION 
>   src/tests/external_tests.cpp d1b20e4 
>   src/tests/fault_tolerance_tests.cpp 6772daf 
>   src/tests/slave_restart_tests.cpp PRE-CREATION 
>   src/tests/utils.hpp e81ec82 
> 
> Diff: https://reviews.apache.org/r/4462/diff
> 
> 
> Testing
> -------
> 
> make check.
> 
> Note that only the new test in tests/slave_restart_tests.cpp engages in recovery!
> 
> Recovery is disabled for old tests (though they still checkpoint relevant info!)
> 
> 
> Thanks,
> 
> Vinod
> 
>