Posted to reviews@mesos.apache.org by Alex Clemmer <cl...@gmail.com> on 2016/12/21 03:16:13 UTC
Review Request 54928: Added initial random delay to agent
(re)registration.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/
-----------------------------------------------------------
Review request for mesos, Andrew Schwartzmeyer, Daniel Pravat, John Kordich, and Joseph Wu.
Bugs: MESOS-6803
https://issues.apache.org/jira/browse/MESOS-6803
Repository: mesos
Description
-------
Currently, when a master fails over, agents that do not use
authentication choose a random time to re-register in order to avoid
the "thundering herd" problem. This is not true of the authenticated
code path: agents that have a credential attempt to re-register
immediately.
This patch adds a random delay to the initial authentication to match
the non-authenticated code path.
This patch caps off a chain of reviews that fix tests to work with
this change. It also resolves MESOS-6803.
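To illustrate the idea described above, here is a minimal sketch of picking a random initial delay so that many agents do not all contact a newly elected master at the same instant. It is not the code from this patch: the helper name `randomInitialDelay`, the `maxBackoff` parameter, and the use of `std::mt19937` are assumptions made purely for illustration.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper (not the Mesos implementation): returns a delay
// drawn uniformly from [0, maxBackoff). Each agent would wait this long
// before its first authentication attempt, which spreads the attempts
// out over the backoff window instead of bunching them together.
std::chrono::milliseconds randomInitialDelay(
    std::chrono::milliseconds maxBackoff,
    std::mt19937& rng)
{
  // A non-positive window means "no delay".
  if (maxBackoff.count() <= 0) {
    return std::chrono::milliseconds(0);
  }

  // uniform_int_distribution is inclusive on both ends, so subtract one
  // to keep the result strictly below maxBackoff.
  std::uniform_int_distribution<long long> dist(0, maxBackoff.count() - 1);
  return std::chrono::milliseconds(dist(rng));
}
```

A caller would seed one generator per agent process (for example from `std::random_device`) and schedule the first authentication attempt after the returned delay, so that both the authenticated and non-authenticated paths start with a randomized wait.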
Diffs
-----
src/slave/slave.cpp a7a3a394e5e4b7f40a051663cd70add3890bdf18
Diff: https://reviews.apache.org/r/54928/diff/
Testing
-------
Ran `make check` on Unix, and ran the test suite 1000 times overnight, excluding the few flaky tests.
Thanks,
Alex Clemmer
Re: Review Request 54928: Added initial random delay to agent
(re)registration.
Posted by Joseph Wu <jo...@mesosphere.io>.
> On Dec. 22, 2016, 3:49 a.m., Joseph Wu wrote:
> >
Note: Even with this chain of reviews, the following tests continue to fail when `HAS_AUTHENTICATION=0`:
```
[ FAILED ] AuthenticationTest.UnauthenticatedFramework
[ FAILED ] AuthenticationTest.UnauthenticatedSlave
[ FAILED ] AuthenticationTest.MismatchedFrameworkInfoPrincipal
[ FAILED ] AuthenticationTest.DisabledFrameworkAuthenticationPrincipalMismatch
[ FAILED ] AuthenticationTest.RetryFrameworkAuthentication
[ FAILED ] AuthenticationTest.RetrySlaveAuthentication
[ FAILED ] AuthenticationTest.DropIntermediateSASLMessage
[ FAILED ] AuthenticationTest.DropIntermediateSASLMessageForSlave
[ FAILED ] AuthenticationTest.DropFinalSASLMessage
[ FAILED ] AuthenticationTest.DropFinalSASLMessageForSlave
[ FAILED ] AuthenticationTest.MasterFailover
[ FAILED ] AuthenticationTest.MasterFailoverDuringSlaveAuthentication
[ FAILED ] AuthenticationTest.LeaderElection
[ FAILED ] AuthenticationTest.LeaderElectionDuringSlaveAuthentication
[ FAILED ] AuthenticationTest.SchedulerFailover
[ FAILED ] AuthenticationTest.RejectedSchedulerFailover
[ FAILED ] FaultToleranceTest.MasterFailover
[ FAILED ] FaultToleranceTest.FrameworkReliableRegistration
[ FAILED ] PartitionTest.PartitionedSlaveStatusUpdates
[ FAILED ] RateLimitingTest.SchedulerFailover
[ FAILED ] ExamplesTest.TestFramework
[ FAILED ] ExamplesTest.NoExecutorFramework
[ FAILED ] ExamplesTest.DiskFullFramework
[ FAILED ] ContentType/AgentAPITest.NestedContainerLaunchFalse/0, where GetParam() = application/x-protobuf
[ FAILED ] ContentType/AgentAPITest.NestedContainerLaunchFalse/1, where GetParam() = application/json
[ FAILED ] ContentType/AgentAPITest.NestedContainerLaunch/0, where GetParam() = application/x-protobuf
[ FAILED ] ContentType/AgentAPITest.NestedContainerLaunch/1, where GetParam() = application/json
[ FAILED ] ContentType/AgentAPITest.LaunchNestedContainerSessionAttachFailure/0, where GetParam() = application/x-protobuf
[ FAILED ] ContentType/AgentAPITest.LaunchNestedContainerSessionAttachFailure/1, where GetParam() = application/json
[ FAILED ] ContentType/AgentAPIStreamingTest.AttachContainerInput/0, where GetParam() = application/x-protobuf+recordio
[ FAILED ] ContentType/AgentAPIStreamingTest.AttachContainerInput/1, where GetParam() = application/json+recordio
```
Some of these tests simply need to be disabled when there is no authentication, but that is an orthogonal issue to address separately.
- Joseph
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/#review159949
-----------------------------------------------------------
Re: Review Request 54928: Added initial random delay to agent
(re)registration.
Posted by Joseph Wu <jo...@mesosphere.io>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/54928/#review159949
-----------------------------------------------------------
Ship it!
- Joseph Wu