You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Benjamin Hindman <be...@berkeley.edu> on 2012/04/04 02:37:57 UTC

Re: Review Request: Made framework failover timeout configurable.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4533/#review6668
-----------------------------------------------------------


Cool Thomas, just a few minor things and then this patch will be tiny! ;)


include/mesos/mesos.proto
<https://reviews.apache.org/r/4533/#comment14424>

    Please update this comment to describe what timeout is used for.



include/mesos/mesos.proto
<https://reviews.apache.org/r/4533/#comment14422>

    How about we use the default mechanism of protobufs, i.e.,:
    
    optional double timeout = 4 [default = 1.0];
    
    Then you can kill the default in constants.hpp. ;)



src/examples/test_framework.cpp
<https://reviews.apache.org/r/4533/#comment14423>

    While I understand you're adding this to demonstrate how to use it, I think it's pretty self explanatory (especially after the comment gets added to mesos.proto) and I'd rather not have code in here that isn't being used (because future developers will assume that it's important for the running of the framework, which it is not). Ditto the rest of the examples.



src/master/constants.hpp
<https://reviews.apache.org/r/4533/#comment14425>

    While I like your intentions, I actually don't think there is a reasonable maximum. At Twitter, we actually set the timeout to several days! I think we should just eliminate this for now until we get a better idea of the right mechanism to build in to keep a framework for lasting forever (maybe the command line interface tool will be have a "destroy running framework" command).



src/master/master.cpp
<https://reviews.apache.org/r/4533/#comment14426>

    To make this more clear in the logs, how about:
    
    LOG(INFO) << "Giving framework " << framework->id << " " << timeout << " seconds to re-register";


- Benjamin


On 2012-03-31 21:57:43, Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4533/
> -----------------------------------------------------------
> 
> (Updated 2012-03-31 21:57:43)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Summary
> -------
> 
> When I started this project, it was somewhat complicated, but with the recent update that passes FrameworkInfo protobufs into the MesosSchedulerDriver, all that was necessary to allow frameworks to set their own timeouts was to add timeout as a field in FrameworkInfo and tell master.cpp to use this value. I made it optional since some people might not care to set it and it would just be an added complication, but I also added it to all three of the test_frameworks (C++, Java, Python) to demonstrate its use.
> 
> The choice for DEFAULT_FRAMEWORK_FAILOVER_TIMEOUT and MAX_FRAMEWORK_FAILOVER_TIMEOUT was basically arbitrary, so I'm open to suggestions on what might be more appropriate.
> 
> 
> This addresses bug MESOS-143.
>     https://issues.apache.org/jira/browse/MESOS-143
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 75eee57 
>   include/mesos/scheduler.hpp dd09e3d 
>   src/examples/java/TestFramework.java 4975760 
>   src/examples/python/test_framework.py dd9cf98 
>   src/examples/test_framework.cpp c408692 
>   src/master/constants.hpp 8248475 
>   src/master/master.hpp 8a34d7e 
>   src/master/master.cpp 4dc9ee0 
> 
> Diff: https://reviews.apache.org/r/4533/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Thomas
> 
>


Re: Review Request: Made framework failover timeout configurable.

Posted by Thomas Marshall <tw...@gmail.com>.

> On 2012-04-04 00:37:57, Benjamin Hindman wrote:
> > include/mesos/mesos.proto, line 91
> > <https://reviews.apache.org/r/4533/diff/2/?file=97830#file97830line91>
> >
> >     Please update this comment to describe what timeout is used for.

Done.


> On 2012-04-04 00:37:57, Benjamin Hindman wrote:
> > include/mesos/mesos.proto, line 97
> > <https://reviews.apache.org/r/4533/diff/2/?file=97830#file97830line97>
> >
> >     How about we use the default mechanism of protobufs, i.e.,:
> >     
> >     optional double timeout = 4 [default = 1.0];
> >     
> >     Then you can kill the default in constants.hpp. ;)

Done.


> On 2012-04-04 00:37:57, Benjamin Hindman wrote:
> > src/examples/test_framework.cpp, line 188
> > <https://reviews.apache.org/r/4533/diff/2/?file=97834#file97834line188>
> >
> >     While I understand you're adding this to demonstrate how to use it, I think it's pretty self explanatory (especially after the comment gets added to mesos.proto) and I'd rather not have code in here that isn't being used (because future developers will assume that it's important for the running of the framework, which it is not). Ditto the rest of the examples.

Done.


> On 2012-04-04 00:37:57, Benjamin Hindman wrote:
> > src/master/constants.hpp, line 57
> > <https://reviews.apache.org/r/4533/diff/2/?file=97835#file97835line57>
> >
> >     While I like your intentions, I actually don't think there is a reasonable maximum. At Twitter, we actually set the timeout to several days! I think we should just eliminate this for now until we get a better idea of the right mechanism to build in to keep a framework for lasting forever (maybe the command line interface tool will be have a "destroy running framework" command).

Done.


> On 2012-04-04 00:37:57, Benjamin Hindman wrote:
> > src/master/master.cpp, line 445
> > <https://reviews.apache.org/r/4533/diff/2/?file=97837#file97837line445>
> >
> >     To make this more clear in the logs, how about:
> >     
> >     LOG(INFO) << "Giving framework " << framework->id << " " << timeout << " seconds to re-register";

Done.


- Thomas


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/4533/#review6668
-----------------------------------------------------------


On 2012-03-31 21:57:43, Thomas Marshall wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/4533/
> -----------------------------------------------------------
> 
> (Updated 2012-03-31 21:57:43)
> 
> 
> Review request for mesos and Benjamin Hindman.
> 
> 
> Summary
> -------
> 
> When I started this project, it was somewhat complicated, but with the recent update that passes FrameworkInfo protobufs into the MesosSchedulerDriver, all that was necessary to allow frameworks to set their own timeouts was to add timeout as a field in FrameworkInfo and tell master.cpp to use this value. I made it optional since some people might not care to set it and it would just be an added complication, but I also added it to all three of the test_frameworks (C++, Java, Python) to demonstrate its use.
> 
> The choice for DEFAULT_FRAMEWORK_FAILOVER_TIMEOUT and MAX_FRAMEWORK_FAILOVER_TIMEOUT was basically arbitrary, so I'm open to suggestions on what might be more appropriate.
> 
> 
> This addresses bug MESOS-143.
>     https://issues.apache.org/jira/browse/MESOS-143
> 
> 
> Diffs
> -----
> 
>   include/mesos/mesos.proto 75eee57 
>   include/mesos/scheduler.hpp dd09e3d 
>   src/examples/java/TestFramework.java 4975760 
>   src/examples/python/test_framework.py dd9cf98 
>   src/examples/test_framework.cpp c408692 
>   src/master/constants.hpp 8248475 
>   src/master/master.hpp 8a34d7e 
>   src/master/master.cpp 4dc9ee0 
> 
> Diff: https://reviews.apache.org/r/4533/diff
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Thomas
> 
>