You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mesos.apache.org by Vinod Kone <vi...@gmail.com> on 2014/05/15 00:05:35 UTC

Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/
-----------------------------------------------------------

Review request for mesos, Ben Mahler and Jiang Yan Xu.


Bugs: MESOS-1276
    https://issues.apache.org/jira/browse/MESOS-1276


Repository: mesos-git


Description
-------

See summary.


Diffs
-----

  src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
  src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
  src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
  src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
  src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
  src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
  src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
  src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
  src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
  src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 

Diff: https://reviews.apache.org/r/21464/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Vinod Kone <vi...@gmail.com>.

> On May 15, 2014, 12:40 a.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp, lines 891-893
> > <https://reviews.apache.org/r/21464/diff/2/?file=582039#file582039line891>
> >
> >     Before the change "duration * 2" is capped by "REGISTER_RETRY_INTERVAL_MAX * 2" so it's always safe.
> >     
> >     With this change the duration is growing exponentially and unbounded, it won't be long before it becomes negative.
> >     
> >     We should still keep it bounded.

good point. fixed.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43067
-----------------------------------------------------------


On May 14, 2014, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 11:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43067
-----------------------------------------------------------

Ship it!


The rest looks great!


src/slave/slave.cpp
<https://reviews.apache.org/r/21464/#comment77088>

    Before the change "duration * 2" is capped by "REGISTER_RETRY_INTERVAL_MAX * 2" so it's always safe.
    
    With this change the duration is growing exponentially and unbounded, it won't be long before it becomes negative.
    
    We should still keep it bounded.


- Jiang Yan Xu


On May 14, 2014, 4:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 4:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Vinod Kone <vi...@gmail.com>.

> On May 15, 2014, 12:40 a.m., Ben Mahler wrote:
> > src/slave/slave.cpp, line 559
> > <https://reviews.apache.org/r/21464/diff/2/?file=582039#file582039line559>
> >
> >     I think we need to update 'authenticate' to deal with the fact that it can now be invoked asynchronously. Namely, it needs to check whether we still need to authenticate, much like doReliableRegistration checks if we still need to register.

as discussed, restricted the scope of this patch to only do backoff for (re-)registration and not authentication (added a TODO).


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43057
-----------------------------------------------------------


On May 14, 2014, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 11:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43057
-----------------------------------------------------------



src/slave/constants.hpp
<https://reviews.apache.org/r/21464/#comment77078>

    "jitter"?



src/slave/slave.cpp
<https://reviews.apache.org/r/21464/#comment77087>

    I think we need to update 'authenticate' to deal with the fact that it can now be invoked asynchronously. Namely, it needs to check whether we still need to authenticate, much like doReliableRegistration checks if we still need to register.


- Ben Mahler


On May 14, 2014, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 11:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial (re-)registration attempt.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43233
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/21464/#comment77324>

    It seems like we would need a bit of a refactor to make 'authenticate' asynchronous. Do we want to add a 'state' for 'AUTHENTICATED'?
    
    Authenticating slave:
    
    DISCONNECTED <--> AUTHENTICATED --> RUNNING
          ^                               |
          |                               |
          +-------------------------------+
    
    Non-Auth slave:
    
    DISCONNECTED <--> RUNNING
    
    Maybe a hint about this in the comment?


- Ben Mahler


On May 15, 2014, 2:52 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 15, 2014, 2:52 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp 4a8adf0efdd9a230a1f91d8d944fc6145a26b7c4 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp d74fc94dc63e169b360baac15127b03e124a7cec 
>   src/tests/mesos.cpp 7f59b72105ebf9e7d1e5e97b18d814f5ecdda50d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp 29dc7d434646998b04481e9ae6fe8589d7fed8e7 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial (re-)registration attempt.

Posted by Jiang Yan Xu <ya...@jxu.me>.

> On May 14, 2014, 9:20 p.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp, lines 562-567
> > <https://reviews.apache.org/r/21464/diff/2-3/?file=582039#file582039line562>
> >
> >     So if a large cluster does require slave authentication, then slaves are still going to be bombarding the master (with 1/5 load because it has a 5sec timeout but still) and cause troubles.
> >     
> >     I think we should commit this review first but am wondering if we cut 0.19 without this TODO implemented, whether the release can be considered stable.
> 
> Vinod Kone wrote:
>     It is a "scaling" issue which is not new to 0.19.0. So I think it shouldn't be a blocker for 0.19.0 release.

Yeah but unfortunately registry update significantly slows down slave registration and therefore the scaling issue is made worse this version than previously.


- Jiang Yan


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43086
-----------------------------------------------------------


On May 14, 2014, 7:52 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 7:52 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp 4a8adf0efdd9a230a1f91d8d944fc6145a26b7c4 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp d74fc94dc63e169b360baac15127b03e124a7cec 
>   src/tests/mesos.cpp 7f59b72105ebf9e7d1e5e97b18d814f5ecdda50d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp 29dc7d434646998b04481e9ae6fe8589d7fed8e7 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial (re-)registration attempt.

Posted by Vinod Kone <vi...@gmail.com>.

> On May 15, 2014, 4:20 a.m., Jiang Yan Xu wrote:
> > src/slave/slave.cpp, lines 562-567
> > <https://reviews.apache.org/r/21464/diff/2-3/?file=582039#file582039line562>
> >
> >     So if a large cluster does require slave authentication, then slaves are still going to be bombarding the master (with 1/5 load because it has a 5sec timeout but still) and cause troubles.
> >     
> >     I think we should commit this review first but am wondering if we cut 0.19 without this TODO implemented, whether the release can be considered stable.

It is a "scaling" issue which is not new to 0.19.0. So I think it shouldn't be a blocker for 0.19.0 release.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43086
-----------------------------------------------------------


On May 15, 2014, 2:52 a.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 15, 2014, 2:52 a.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp 4a8adf0efdd9a230a1f91d8d944fc6145a26b7c4 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp d74fc94dc63e169b360baac15127b03e124a7cec 
>   src/tests/mesos.cpp 7f59b72105ebf9e7d1e5e97b18d814f5ecdda50d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp 29dc7d434646998b04481e9ae6fe8589d7fed8e7 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial (re-)registration attempt.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43086
-----------------------------------------------------------

Ship it!



src/slave/slave.cpp
<https://reviews.apache.org/r/21464/#comment77106>

    So if a large cluster does require slave authentication, then slaves are still going to be bombarding the master (with 1/5 load because it has a 5sec timeout but still) and cause troubles.
    
    I think we should commit this review first but am wondering if we cut 0.19 without this TODO implemented, whether the release can be considered stable.


- Jiang Yan Xu


On May 14, 2014, 7:52 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 7:52 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp 4a8adf0efdd9a230a1f91d8d944fc6145a26b7c4 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp d74fc94dc63e169b360baac15127b03e124a7cec 
>   src/tests/mesos.cpp 7f59b72105ebf9e7d1e5e97b18d814f5ecdda50d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp 29dc7d434646998b04481e9ae6fe8589d7fed8e7 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial (re-)registration attempt.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/
-----------------------------------------------------------

(Updated May 15, 2014, 2:52 a.m.)


Review request for mesos, Ben Mahler and Jiang Yan Xu.


Changes
-------

comments. PTAL.


Summary (updated)
-----------------

Added backoff to slave's initial (re-)registration attempt.


Bugs: MESOS-1276
    https://issues.apache.org/jira/browse/MESOS-1276


Repository: mesos-git


Description
-------

See summary.


Diffs (updated)
-----

  src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
  src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
  src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
  src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
  src/slave/slave.cpp 4a8adf0efdd9a230a1f91d8d944fc6145a26b7c4 
  src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
  src/tests/master_tests.cpp d74fc94dc63e169b360baac15127b03e124a7cec 
  src/tests/mesos.cpp 7f59b72105ebf9e7d1e5e97b18d814f5ecdda50d 
  src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
  src/tests/slave_tests.cpp 29dc7d434646998b04481e9ae6fe8589d7fed8e7 

Diff: https://reviews.apache.org/r/21464/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Vinod Kone <vi...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/
-----------------------------------------------------------

(Updated May 14, 2014, 11:06 p.m.)


Review request for mesos, Ben Mahler and Jiang Yan Xu.


Changes
-------

comments.


Bugs: MESOS-1276
    https://issues.apache.org/jira/browse/MESOS-1276


Repository: mesos-git


Description
-------

See summary.


Diffs (updated)
-----

  src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
  src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
  src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
  src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
  src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
  src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
  src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
  src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
  src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
  src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 

Diff: https://reviews.apache.org/r/21464/diff/


Testing
-------

make check


Thanks,

Vinod Kone


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Vinod Kone <vi...@gmail.com>.

> On May 14, 2014, 10:53 p.m., Jiang Yan Xu wrote:
> > src/tests/slave_recovery_tests.cpp, lines 1314-1315
> > <https://reviews.apache.org/r/21464/diff/1/?file=581974#file581974line1314>
> >
> >     Are ReregisterSlaveMessage -> SlaveReregisteredMessage here and below necessary for the backoff change?

they were in general necessary and exacerbated by this change. so decided to fix them here.


- Vinod


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43041
-----------------------------------------------------------


On May 14, 2014, 11:06 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 11:06 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Jiang Yan Xu <ya...@jxu.me>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43041
-----------------------------------------------------------



src/slave/slave.cpp
<https://reviews.apache.org/r/21464/#comment77035>

    Technically register_backoff_jitter is allowed to be equal to REGISTER_RETRY_INTERVAL_MAX, right? In that case the upper bound of the random wait value is always REGISTER_RETRY_INTERVAL_MAX, which is not surprising to the user.



src/tests/slave_recovery_tests.cpp
<https://reviews.apache.org/r/21464/#comment77049>

    Are ReregisterSlaveMessage -> SlaveReregisteredMessage here and below necessary for the backoff change?


- Jiang Yan Xu


On May 14, 2014, 3:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 3:05 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>


Re: Review Request 21464: Added backoff to slave's initial registration/authentication attempt.

Posted by Ben Mahler <be...@gmail.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21464/#review43044
-----------------------------------------------------------



src/slave/flags.hpp
<https://reviews.apache.org/r/21464/#comment77040>

    How about "registration_backoff_factor" or "registration_backoff_constant" since jitter to me means "the undesired deviation from true periodicity" which is not what we're using this for.
    
    Rather this is the multiplicative factor in our binary backoff:
    
    [0, c * 2^0]
    [0, c * 2^1]
    [0, c * 2^2]
        ...
    
    We're setting 'c' here, right?


- Ben Mahler


On May 14, 2014, 10:05 p.m., Vinod Kone wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/21464/
> -----------------------------------------------------------
> 
> (Updated May 14, 2014, 10:05 p.m.)
> 
> 
> Review request for mesos, Ben Mahler and Jiang Yan Xu.
> 
> 
> Bugs: MESOS-1276
>     https://issues.apache.org/jira/browse/MESOS-1276
> 
> 
> Repository: mesos-git
> 
> 
> Description
> -------
> 
> See summary.
> 
> 
> Diffs
> -----
> 
>   src/slave/constants.hpp c0975254d51fc21dbf3bc006b3169886347f0a3f 
>   src/slave/constants.cpp 1854b16f2c1ed2ae51d2061b0338c3eda4f3b403 
>   src/slave/flags.hpp 8616817b9602f6ccf56e2af61f36c1bd295df1a1 
>   src/slave/slave.hpp 29d2a1607a2b3c4443ea14d1cc2b61a065df3cad 
>   src/slave/slave.cpp f8ed65bf7b7e4f3c0834c9e22a525137856e9b23 
>   src/tests/fault_tolerance_tests.cpp 4796149beb10a6c668762f5efecd86520b809c42 
>   src/tests/master_tests.cpp 7aa678afc94869c8243485bd0604532dec43a1e2 
>   src/tests/mesos.cpp 242d98ad195a8ab1256918c48a2956e2e085d26d 
>   src/tests/slave_recovery_tests.cpp 85c57b29f6a56683e0df9788dea64ebb36e00812 
>   src/tests/slave_tests.cpp dfbc6481a3a1c6fc234dea48bcfe14deab0bea86 
> 
> Diff: https://reviews.apache.org/r/21464/diff/
> 
> 
> Testing
> -------
> 
> make check
> 
> 
> Thanks,
> 
> Vinod Kone
> 
>