Posted to reviews@mesos.apache.org by Alex Clemmer <cl...@gmail.com> on 2016/12/21 03:16:13 UTC

Review Request 54928: Added initial random delay to agent (re)registration.

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/
-----------------------------------------------------------

Review request for mesos, Andrew Schwartzmeyer, Daniel Pravat, John Kordich, and Joseph Wu.


Bugs: MESOS-6803
    https://issues.apache.org/jira/browse/MESOS-6803


Repository: mesos


Description
-------

Currently, when a master fails over, agents that do not use
authentication choose a random time to re-register to avoid the
"thundering herd" problem. This is not true of the authenticated
code path -- agents that have a credential will attempt to re-register
immediately.

This patch adds a random delay to the initial authentication to match
the non-authenticated code path.

This patch caps off a chain of reviews that fix tests to work with
this change. It also resolves MESOS-6803.
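
For illustration only, the idea of a random initial delay can be
sketched as below. This is a minimal standalone sketch under stated
assumptions, not the code in this patch: `randomInitialDelay` and
`maxBackoff` are hypothetical names, and the actual change in
src/slave/slave.cpp may compute and schedule the delay differently.

```cpp
#include <cassert>
#include <chrono>
#include <random>

// Hypothetical helper (not part of Mesos): picks a delay between zero
// and `maxBackoff`, so that agents reacting to the same master
// failover do not all contact the new master at the same instant.
std::chrono::milliseconds randomInitialDelay(
    std::chrono::milliseconds maxBackoff,
    std::mt19937& generator)
{
  // A negative upper bound would make no sense for a delay.
  assert(maxBackoff.count() >= 0);

  // Draw a fraction in [0, 1) and scale the upper bound by it.
  std::uniform_real_distribution<double> fraction(0.0, 1.0);

  return std::chrono::milliseconds(
      static_cast<std::chrono::milliseconds::rep>(
          maxBackoff.count() * fraction(generator)));
}
```

Each agent would wait for its own randomly drawn delay before starting
authentication, which spreads the load on the master over the backoff
window instead of concentrating it at the moment of failover.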


Diffs
-----

  src/slave/slave.cpp a7a3a394e5e4b7f40a051663cd70add3890bdf18 

Diff: https://reviews.apache.org/r/54928/diff/


Testing
-------

Ran `make check` on Unix, and ran the test suite 1000 times overnight, excluding the few flaky tests.


Thanks,

Alex Clemmer


Re: Review Request 54928: Added initial random delay to agent (re)registration.

Posted by Joseph Wu <jo...@mesosphere.io>.

> On Dec. 22, 2016, 3:49 a.m., Joseph Wu wrote:
> >

Note: Even with this chain of reviews, the following tests continue to fail when `HAS_AUTHENTICATION=0`:
```
[  FAILED  ] AuthenticationTest.UnauthenticatedFramework
[  FAILED  ] AuthenticationTest.UnauthenticatedSlave
[  FAILED  ] AuthenticationTest.MismatchedFrameworkInfoPrincipal
[  FAILED  ] AuthenticationTest.DisabledFrameworkAuthenticationPrincipalMismatch
[  FAILED  ] AuthenticationTest.RetryFrameworkAuthentication
[  FAILED  ] AuthenticationTest.RetrySlaveAuthentication
[  FAILED  ] AuthenticationTest.DropIntermediateSASLMessage
[  FAILED  ] AuthenticationTest.DropIntermediateSASLMessageForSlave
[  FAILED  ] AuthenticationTest.DropFinalSASLMessage
[  FAILED  ] AuthenticationTest.DropFinalSASLMessageForSlave
[  FAILED  ] AuthenticationTest.MasterFailover
[  FAILED  ] AuthenticationTest.MasterFailoverDuringSlaveAuthentication
[  FAILED  ] AuthenticationTest.LeaderElection
[  FAILED  ] AuthenticationTest.LeaderElectionDuringSlaveAuthentication
[  FAILED  ] AuthenticationTest.SchedulerFailover
[  FAILED  ] AuthenticationTest.RejectedSchedulerFailover
[  FAILED  ] FaultToleranceTest.MasterFailover
[  FAILED  ] FaultToleranceTest.FrameworkReliableRegistration
[  FAILED  ] PartitionTest.PartitionedSlaveStatusUpdates
[  FAILED  ] RateLimitingTest.SchedulerFailover
[  FAILED  ] ExamplesTest.TestFramework
[  FAILED  ] ExamplesTest.NoExecutorFramework
[  FAILED  ] ExamplesTest.DiskFullFramework
[  FAILED  ] ContentType/AgentAPITest.NestedContainerLaunchFalse/0, where GetParam() = application/x-protobuf
[  FAILED  ] ContentType/AgentAPITest.NestedContainerLaunchFalse/1, where GetParam() = application/json
[  FAILED  ] ContentType/AgentAPITest.NestedContainerLaunch/0, where GetParam() = application/x-protobuf
[  FAILED  ] ContentType/AgentAPITest.NestedContainerLaunch/1, where GetParam() = application/json
[  FAILED  ] ContentType/AgentAPITest.LaunchNestedContainerSessionAttachFailure/0, where GetParam() = application/x-protobuf
[  FAILED  ] ContentType/AgentAPITest.LaunchNestedContainerSessionAttachFailure/1, where GetParam() = application/json
[  FAILED  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0, where GetParam() = application/x-protobuf+recordio
[  FAILED  ] ContentType/AgentAPIStreamingTest.AttachContainerInput/1, where GetParam() = application/json+recordio
```

Some of these simply need to be disabled when there is no authentication, but that is an orthogonal issue to address separately.


- Joseph


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/#review159949
-----------------------------------------------------------



Re: Review Request 54928: Added initial random delay to agent (re)registration.

Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/#review159949
-----------------------------------------------------------


Ship it!




- Joseph Wu

