Posted to issues@mesos.apache.org by "José Guilherme Vanz (JIRA)" <ji...@apache.org> on 2016/05/21 20:20:12 UTC

[jira] [Commented] (MESOS-5359) The scheduler library should have a delay before initiating a connection with master.

    [ https://issues.apache.org/jira/browse/MESOS-5359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15295240#comment-15295240 ] 

José Guilherme Vanz commented on MESOS-5359:
--------------------------------------------

Looking at the related issue MESOS-5330 and the scheduler library (i.e. the MesosSchedulerDriver and SchedulerProcess classes), I think I should change the SchedulerProcess class. Is that right?
Based on the MESOS-5330 solution, I would have to move the link() call out of the detected() method, shown below, and into the authenticate() and doReliableRegistration() methods:

{code:title=sched.cpp|borderStyle=solid}
    if (master.isSome()) {
      LOG(INFO) << "New master detected at " << master.get().pid();
      link(master.get().pid());

      if (credential.isSome()) {
        // Authenticate with the master.
        // TODO(vinod): Do a backoff for authentication similar to what
        // we do for registration.
        authenticate();
      } else {
{code}
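
To make the proposal concrete, here is a minimal, self-contained sketch of the idea. It is not Mesos code: SchedulerSketch is a toy stand-in for SchedulerProcess that only records call order, and initialConnectionDelay() is a hypothetical helper that picks a random delay so that many frameworks would not connect at the same instant. How and where the real library would apply such a delay is an assumption here.

```cpp
#include <cassert>
#include <chrono>
#include <random>
#include <string>
#include <vector>

// Hypothetical helper (not part of Mesos): picks a uniformly random delay in
// [0, maxDelay] so that reconnecting frameworks are spread out over time.
std::chrono::milliseconds initialConnectionDelay(
    std::chrono::milliseconds maxDelay, std::mt19937& rng)
{
  assert(maxDelay.count() >= 0);
  std::uniform_int_distribution<std::chrono::milliseconds::rep> dist(
      0, maxDelay.count());
  return std::chrono::milliseconds(dist(rng));
}

// Toy stand-in for SchedulerProcess; it only records the order of calls.
struct SchedulerSketch
{
  bool hasCredential = false;
  std::vector<std::string> calls;

  void link() { calls.push_back("link"); }

  // detected() no longer links; it only decides which step comes next.
  // Assumption: a real implementation would run that step after a delay.
  void detected()
  {
    calls.push_back("detected");
    if (hasCredential) {
      authenticate();
    } else {
      doReliableRegistration();
    }
  }

  void authenticate()
  {
    link(); // The connection is opened here rather than in detected().
    calls.push_back("authenticate");
  }

  void doReliableRegistration()
  {
    link(); // Likewise, link just before attempting registration.
    calls.push_back("doReliableRegistration");
  }
};
```

With the link() call moved this way, the connection to the master is only opened at the point where the scheduler is about to authenticate or register, which is where a backoff could be inserted.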

Am I on the right path?




> The scheduler library should have a delay before initiating a connection with master.
> -------------------------------------------------------------------------------------
>
>                 Key: MESOS-5359
>                 URL: https://issues.apache.org/jira/browse/MESOS-5359
>             Project: Mesos
>          Issue Type: Bug
>    Affects Versions: 0.29.0
>            Reporter: Anand Mazumdar
>            Assignee: José Guilherme Vanz
>              Labels: mesosphere
>
> Currently, the scheduler library does not have an artificially induced delay when trying to initially establish a connection with the master. In the event of a master failover or ZK disconnect, a large number of frameworks can get disconnected and thereby overwhelm the master with TCP SYN requests.
> On a large cluster with many agents, the master is already overwhelmed with handling connection requests from the agents. This compounds the issue further on the master.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)