You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Jiang Yan Xu <ya...@jxu.me> on 2014/04/19 00:09:43 UTC

Review Request 20502: Introduced exponential backoff for slave registration retries.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/
-----------------------------------------------------------

Review request for mesos and Ben Mahler.


Bugs: MESOS-982
    https://issues.apache.org/jira/browse/MESOS-982


Repository: mesos-git


Description
-------

See summary.


Diffs
-----

  src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
  src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 

Diff: https://reviews.apache.org/r/20502/diff/


Testing
-------

make check. Also tested the random backoff logic separately.


Thanks,

Jiang Yan Xu


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review41260
-----------------------------------------------------------

Ship it!


Modified the logic a bit, per offline discussion, and committed.

- Vinod Kone


On April 24, 2014, 1:24 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 24, 2014, 1:24 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
>   src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
>   src/slave/flags.hpp d5c54c0c4bcd80c4849ca31f27896e22b436e045 
>   src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
>   src/slave/slave.cpp b673fd6aa575b30dfe0858881a95de1db8c06e63 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/
-----------------------------------------------------------

(Updated April 23, 2014, 6:24 p.m.)


Review request for mesos, Ben Mahler and Vinod Kone.


Changes
-------

Added a minimum retry interval and slightly changed the backoff algorithm.

Now it's max(user specified min, random value within the exponentially increasing range)


Bugs: MESOS-982
    https://issues.apache.org/jira/browse/MESOS-982


Repository: mesos-git


Description
-------

See summary.


Diffs (updated)
-----

  src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
  src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
  src/slave/flags.hpp d5c54c0c4bcd80c4849ca31f27896e22b436e045 
  src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
  src/slave/slave.cpp b673fd6aa575b30dfe0858881a95de1db8c06e63 

Diff: https://reviews.apache.org/r/20502/diff/


Testing
-------

make check. Also tested the random backoff logic separately.


Thanks,

Jiang Yan Xu


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 22, 2014, 11:31 a.m., Ben Mahler wrote:
> > src/slave/constants.cpp, line 33
> > <https://reviews.apache.org/r/20502/diff/2/?file=564235#file564235line33>
> >
> >     A flag for this would be nice!
> >     
> >     Are you planning to expose the minimum as well? 1 second might be a small minimum for large clusters (10,000+ slaves).

Ok the flag will make https://reviews.apache.org/r/20561/ unnecessary (for now).
I think what should be configurable through flags is not this but the slaves.recoveredTimer: https://github.com/apache/mesos/blob/12e1a9675a630a01988cd30679c7409e123292ae/src/master/master.cpp#L794
We should use that value here. Agreed?

I'll add a minimum.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review41043
-----------------------------------------------------------


On April 21, 2014, 10:10 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 21, 2014, 10:10 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
>   src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
>   src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
>   src/slave/slave.cpp b3c428508c77bf1ca9154f9c22b5d83c27bc9d1e 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review41043
-----------------------------------------------------------



src/slave/constants.cpp
<https://reviews.apache.org/r/20502/#comment74404>

    A flag for this would be nice!
    
    Are you planning to expose the minimum as well? 1 second might be a small minimum for large clusters (10,000+ slaves).


- Ben Mahler


On April 22, 2014, 5:10 a.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 22, 2014, 5:10 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
>   src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
>   src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
>   src/slave/slave.cpp b3c428508c77bf1ca9154f9c22b5d83c27bc9d1e 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/
-----------------------------------------------------------

(Updated April 21, 2014, 10:10 p.m.)


Review request for mesos, Ben Mahler and Vinod Kone.


Changes
-------

Comments.


Bugs: MESOS-982
    https://issues.apache.org/jira/browse/MESOS-982


Repository: mesos-git


Description
-------

See summary.


Diffs (updated)
-----

  src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
  src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
  src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
  src/slave/slave.cpp b3c428508c77bf1ca9154f9c22b5d83c27bc9d1e 

Diff: https://reviews.apache.org/r/20502/diff/


Testing
-------

make check. Also tested the random backoff logic separately.


Thanks,

Jiang Yan Xu


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Mesos ReviewBot <de...@mesos.apache.org>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40857
-----------------------------------------------------------


Patch looks great!

Reviews applied: [20502]

All tests passed.

- Mesos ReviewBot


On April 20, 2014, 9:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 9:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 21, 2014, 5:46 p.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 738
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line738>
> >
> >     Do you need the Duration() wrapper?

Nope. Removed it.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40985
-----------------------------------------------------------


On April 21, 2014, 10:10 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 21, 2014, 10:10 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
>   src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
>   src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
>   src/slave/slave.cpp b3c428508c77bf1ca9154f9c22b5d83c27bc9d1e 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40985
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/20502/#comment74333>

    Why use Duration::create instead of the built in * and + Duration operators?
    
    Duration next = Seconds(1) + backoff * (...);



src/slave/slave.cpp
<https://reviews.apache.org/r/20502/#comment74334>

    Do you need the Duration() wrapper?


- Ben Mahler


On April 20, 2014, 9:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 9:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 21, 2014, 2:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 738
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line738>
> >
> >     Why 16 seconds? Seems too low of a value? How about 10 min like we did for status updates?
> >     
> >     Either way pull this out to a constant (e.g., SLAVE_RETRY_INTERVAL_MAX).

We do want them to register prior to the 75 second time out right?


> On April 21, 2014, 2:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 729-740
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line729>
> >
> >     Instead of doing this way, can we just add the jitter the first time doReliableRegistration() is called (e.g., in in detected())
> >     
> >     That way the code here could be simplified (similar to what we did in status update manager) and easy to reason about?

That way how do I express the randomness between retries? (which I think would be helpful in this case because we don't want everyone to back off too much and thus delay the recovery time after a failover.)


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------


On April 20, 2014, 2:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 2:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 21, 2014, 2:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 624
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line624>
> >
> >     s/backoffInterval/duration/ to be consistent with how we named it in status update manager.

I will make the change to keep our naming consistent but I have my reservations about naming variables less descriptively.

void Slave::doReliableRegistration(const Duration& duration);
vs.
void Slave::doReliableRegistration(const Duration& backoffInterval);

Seems like the latter reads better and is self-documenting.
Maybe this should be discussed elsewhere so I am fine with these suggested renaming in this review.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------


On April 20, 2014, 2:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 2:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Vinod Kone <vi...@gmail.com>.

> On April 21, 2014, 9:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 738
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line738>
> >
> >     Why 16 seconds? Seems too low of a value? How about 10 min like we did for status updates?
> >     
> >     Either way pull this out to a constant (e.g., SLAVE_RETRY_INTERVAL_MAX).
> 
> Jiang Yan Xu wrote:
>     We do want them to register prior to the 75 second time out right?

oh yea thats right. s/10min/75s/. Ideally we should extract out the slave ping timeout and max ping retries into a common/constants.hpp header so that it can be used by both the master and slave. maybe a TODO for now.


> On April 21, 2014, 9:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 729-740
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line729>
> >
> >     Instead of doing this way, can we just add the jitter the first time doReliableRegistration() is called (e.g., in in detected())
> >     
> >     That way the code here could be simplified (similar to what we did in status update manager) and easy to reason about?
> 
> Jiang Yan Xu wrote:
>     That way how do I express the randomness between retries? (which I think would be helpful in this case because we don't want everyone to back off too much and thus delay the recovery time after a failover.)

I don't think I follow. Why would everyone backoff too much?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------


On April 20, 2014, 9:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 9:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Vinod Kone <vi...@gmail.com>.

> On April 21, 2014, 9:47 p.m., Vinod Kone wrote:
> >

also, make sure the tests properly handled the new retry interval.


> On April 21, 2014, 9:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, lines 729-740
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line729>
> >
> >     Instead of doing this way, can we just add the jitter the first time doReliableRegistration() is called (e.g., in in detected())
> >     
> >     That way the code here could be simplified (similar to what we did in status update manager) and easy to reason about?
> 
> Jiang Yan Xu wrote:
>     That way how do I express the randomness between retries? (which I think would be helpful in this case because we don't want everyone to back off too much and thus delay the recovery time after a failover.)
> 
> Vinod Kone wrote:
>     I don't think I follow. Why would everyone backoff too much?

discussed offline. i think this is reasonable to give everyone a fair chance during backoffs.

s/randomized/next/
s/nextInterval/duration_/ ?


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------


On April 20, 2014, 9:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 9:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On April 21, 2014, 2:47 p.m., Vinod Kone wrote:
> > src/slave/slave.cpp, line 738
> > <https://reviews.apache.org/r/20502/diff/1/?file=562543#file562543line738>
> >
> >     Why 16 seconds? Seems too low of a value? How about 10 min like we did for status updates?
> >     
> >     Either way pull this out to a constant (e.g., SLAVE_RETRY_INTERVAL_MAX).
> 
> Jiang Yan Xu wrote:
>     We do want them to register prior to the 75 second time out right?
> 
> Vinod Kone wrote:
>     oh yea thats right. s/10min/75s/. Ideally we should extract out the slave ping timeout and max ping retries into a common/constants.hpp header so that it can be used by both the master and slave. maybe a TODO for now.

Instead of TODO, will follow up with a review for it.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------


On April 21, 2014, 10:10 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 21, 2014, 10:10 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp 6d1cfda97162f9f332e7f92461b281035587e7fa 
>   src/slave/constants.cpp 9811bd39e352a5097eaadb3ae6c6e4f43bc9cd98 
>   src/slave/slave.hpp 438e5b58d732862b3ca1de10f2aabbf5d9071584 
>   src/slave/slave.cpp b3c428508c77bf1ca9154f9c22b5d83c27bc9d1e 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/#review40953
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/20502/#comment74274>

    s/backoffInterval/duration/ to be consistent with how we named it in status update manager.



src/slave/slave.cpp
<https://reviews.apache.org/r/20502/#comment74287>

    Instead of doing this way, can we just add the jitter the first time doReliableRegistration() is called (e.g., in in detected())
    
    That way the code here could be simplified (similar to what we did in status update manager) and easy to reason about?



src/slave/slave.cpp
<https://reviews.apache.org/r/20502/#comment74282>

    Why 16 seconds? Seems too low of a value? How about 10 min like we did for status updates?
    
    Either way pull this out to a constant (e.g., SLAVE_RETRY_INTERVAL_MAX).


- Vinod Kone


On April 20, 2014, 9:49 p.m., Jiang Yan Xu wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
> 
> (Updated April 20, 2014, 9:49 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Vinod Kone.
> 
> 
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 
> 
> Diff: https://reviews.apache.org/r/20502/diff/
> 
> 
> Testing
> -------
> 
> make check. Also tested the random backoff logic separately.
> 
> 
> Thanks,
> 
> Jiang Yan Xu
> 
>


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/20502/
-----------------------------------------------------------

(Updated April 20, 2014, 2:49 p.m.)


Review request for mesos, Ben Mahler and Vinod Kone.


Changes
-------

Add Vinod as reviewer.


Bugs: MESOS-982
    https://issues.apache.org/jira/browse/MESOS-982


Repository: mesos-git


Description
-------

See summary.


Diffs
-----

  src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0 
  src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570 

Diff: https://reviews.apache.org/r/20502/diff/


Testing
-------

make check. Also tested the random backoff logic separately.


Thanks,

Jiang Yan Xu


Re: Review Request 20502: Introduced exponential backoff for slave registration retries.

Posted by Benjamin Mahler <be...@gmail.com>.
Vinod can you review this one?


On Fri, Apr 18, 2014 at 3:09 PM, Jiang Yan Xu <ya...@jxu.me> wrote:

>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/20502/
> -----------------------------------------------------------
>
> Review request for mesos and Ben Mahler.
>
>
> Bugs: MESOS-982
>     https://issues.apache.org/jira/browse/MESOS-982
>
>
> Repository: mesos-git
>
>
> Description
> -------
>
> See summary.
>
>
> Diffs
> -----
>
>   src/slave/slave.hpp 1e9879582315fabb76659e9f8eb03f90188fbfa0
>   src/slave/slave.cpp d6ec87c2232c2172f471ae30711b5da1c7050570
>
> Diff: https://reviews.apache.org/r/20502/diff/
>
>
> Testing
> -------
>
> make check. Also tested the random backoff logic separately.
>
>
> Thanks,
>
> Jiang Yan Xu
>
>