Posted to issues@mesos.apache.org by "José Guilherme Vanz (JIRA)" <ji...@apache.org> on 2016/05/21 20:20:12 UTC
[jira] [Commented] (MESOS-5359) The scheduler library should have a
delay before initiating a connection with master.
[ https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295240#comment-15295240 ]
José Guilherme Vanz commented on MESOS-5359:
--------------------------------------------
Looking at the related issue MESOS-5330 and at the scheduler library (i.e. the MesosSchedulerDriver and SchedulerProcess classes), I think the class I need to change is SchedulerProcess. Is that right?
Based on the MESOS-5330 solution, I would move the link() call out of the detected() method and into the authenticate() and doReliableRegistration() methods. This is the current code in detected():
{code:title=sched.cpp|borderStyle=solid}
    if (master.isSome()) {
      LOG(INFO) << "New master detected at " << master.get().pid();
      link(master.get().pid());
      if (credential.isSome()) {
        // Authenticate with the master.
        // TODO(vinod): Do a backoff for authentication similar to what
        // we do for registration.
        authenticate();
      } else {
{code}
Am I on the right path?
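To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It does not use libprocess or any Mesos code: ToySchedulerProcess, connect(), linkCalls() and the string pid are hypothetical stand-ins that I made up for illustration, and link() only counts invocations instead of opening a socket. The point is just that detected() no longer links, and the link happens when authentication or registration actually starts.

```cpp
#include <string>

// Hypothetical stand-in for SchedulerProcess; this is NOT the Mesos class.
class ToySchedulerProcess {
public:
  explicit ToySchedulerProcess(bool hasCredential)
    : hasCredential_(hasCredential) {}

  // Master (re)detection: only record the new master. The link() call
  // that used to live here has moved into the two private methods below.
  // An empty pid models "no master elected" (assumption of this sketch).
  void detected(const std::string& masterPid) {
    masterPid_ = masterPid;
    linked_ = false;  // A new detection invalidates any previous link.
  }

  // Hypothetical entry point that would run after whatever delay the
  // library chooses; the delay itself is not modelled here.
  void connect() {
    if (masterPid_.empty()) {
      return;  // Nothing to connect to yet.
    }
    if (hasCredential_) {
      authenticate();
    } else {
      doReliableRegistration();
    }
  }

  int linkCalls() const { return linkCalls_; }
  bool linked() const { return linked_; }

private:
  // Stand-in for link(): counts invocations, opens no connection.
  void link() {
    if (!linked_) {
      linked_ = true;
      ++linkCalls_;
    }
  }

  // The connection is established only when authentication starts.
  void authenticate() { link(); }

  // Likewise for registration without authentication.
  void doReliableRegistration() { link(); }

  bool hasCredential_;
  std::string masterPid_;
  bool linked_ = false;
  int linkCalls_ = 0;
};
```

With this shape, calling detected() alone leaves linkCalls() at 0; the count only goes up once connect() reaches authenticate() or doReliableRegistration(), and a retry against the same master does not link again.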
> The scheduler library should have a delay before initiating a connection with master.
> -------------------------------------------------------------------------------------
>
> Key: MESOS-5359
> URL: https://issues.apache.org/jira/browse/MESOS-5359
> Project: Mesos
> Issue Type: Bug
> Affects Versions: 0.29.0
> Reporter: Anand Mazumdar
> Assignee: José Guilherme Vanz
> Labels: mesosphere
>
> Currently, the scheduler library does not have an artificially induced delay when trying to initially establish a connection with the master. In the event of a master failover or ZK disconnect, a large number of frameworks can get disconnected and thereby overwhelm the master with TCP SYN requests.
> On a large cluster with many agents, the master is already overwhelmed with handling connection requests from the agents. This compounds the issue further on the master.
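One common way to avoid the burst of simultaneous connection attempts described in the issue is for each client to wait a random amount of time before connecting. The sketch below only illustrates that general technique; randomConnectDelay and its maxDelay parameter are hypothetical names, not part of Mesos, and the issue does not say which delay distribution or upper bound should be used.

```cpp
#include <chrono>
#include <random>

// Hypothetical helper: picks a uniformly random delay in [0, maxDelay].
// If every framework draws its own delay, connection attempts after a
// master failover are spread over the window instead of arriving at once.
template <typename Rng>
std::chrono::milliseconds randomConnectDelay(
    std::chrono::milliseconds maxDelay, Rng& rng) {
  // Guard against a zero or negative bound; connect immediately then.
  if (maxDelay.count() <= 0) {
    return std::chrono::milliseconds(0);
  }

  std::uniform_int_distribution<std::chrono::milliseconds::rep> dist(
      0, maxDelay.count());

  return std::chrono::milliseconds(dist(rng));
}
```

A caller would seed a generator such as std::mt19937 once per process, call randomConnectDelay with its chosen upper bound, and wait that long before starting authentication or registration.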
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)