Posted to issues@mesos.apache.org by "Vinod Kone (JIRA)" <ji...@apache.org> on 2015/04/30 21:28:06 UTC

[jira] [Commented] (MESOS-2681) Slave process must restart to update ensemble members

    [ https://issues.apache.org/jira/browse/MESOS-2681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522100#comment-14522100 ] 

Vinod Kone commented on MESOS-2681:
-----------------------------------

[~rgs] Is there something Mesos can do in the meantime to shield itself from this issue? I'm assuming the clients "loop forever" because the ZK sessions do not expire even if the servers no longer exist?

> Slave process must restart to update ensemble members
> -----------------------------------------------------
>
>                 Key: MESOS-2681
>                 URL: https://issues.apache.org/jira/browse/MESOS-2681
>             Project: Mesos
>          Issue Type: Bug
>          Components: slave
>            Reporter: Joe Smith
>
> Right now, if a ZooKeeper ensemble has (for instance) more observers added to it, the Mesos Slaves will not see them and will continue to attempt to connect only to the original members. A restart of the slave process is required to call {{getaddrinfo}} again and enumerate the list of hosts in the ensemble.
> Subsequent {{getaddrinfo}} calls _will only_ occur when {{zookeeper_init()}} is called again, that is to say, when the old session expires and you need to create a new one. If you swap all hosts in your ensemble too fast, without allowing time for old sessions to expire, you end up with clients looping forever, trying to connect to the old servers in order to get their old sessions expired.
> This is best tracked by ZOOKEEPER-1998, where there is some discussion about a necessary improvement to the implementation already in the 3.5.x branch, or about putting this functionality (debatably a feature vs. a bug fix) in 3.4.x.
> (Thanks to [~rgs] for reviewing this as well)
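
The behavior described above can be illustrated with a small simulation. This is only a sketch and not the ZooKeeper C client: the class names, the `fake_resolve` function, the `zk.example.com` hostname and the IP addresses are all hypothetical, and a dictionary stands in for the DNS lookup that {{getaddrinfo}} would perform. It contrasts a client that resolves the ensemble once at init time with one that resolves again before each reconnect attempt.

```python
from typing import Callable, Dict, List

Resolver = Callable[[str], List[str]]


class ResolveOnceClient:
    """Resolves the ensemble hostname a single time, at construction
    (analogous to resolving only when the session handle is created)."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.servers = resolve(host)  # never refreshed afterwards

    def candidates(self) -> List[str]:
        # Reconnect attempts only ever see the original address list.
        return self.servers


class ReResolvingClient:
    """Resolves the ensemble hostname again before every reconnect attempt."""

    def __init__(self, host: str, resolve: Resolver) -> None:
        self.host = host
        self.resolve = resolve

    def candidates(self) -> List[str]:
        return self.resolve(self.host)


# Hypothetical DNS table standing in for getaddrinfo.
dns: Dict[str, List[str]] = {
    "zk.example.com": ["10.0.0.1", "10.0.0.2", "10.0.0.3"],
}


def fake_resolve(host: str) -> List[str]:
    # Return a copy so callers cannot mutate the table.
    return list(dns[host])


stale = ResolveOnceClient("zk.example.com", fake_resolve)
fresh = ReResolvingClient("zk.example.com", fake_resolve)

# Swap every member of the ensemble while both clients are alive.
dns["zk.example.com"] = ["10.0.1.1", "10.0.1.2", "10.0.1.3"]

print("resolve-once client tries:", stale.candidates())
print("re-resolving client tries:", fresh.candidates())
```

In this simulation the resolve-once client keeps trying the three original addresses after the swap, which mirrors the "looping forever" case in the description, while the re-resolving client picks up the new members without a restart.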



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)