Posted to issues@mesos.apache.org by "Anand Mazumdar (JIRA)" <ji...@apache.org> on 2016/05/27 05:38:13 UTC
[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework
to handle network partitions.
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303537#comment-15303537 ]
Anand Mazumdar commented on MESOS-5468:
---------------------------------------
[~guoger] I edited the JIRA description a bit. Let me know if it does not align with your observations.
Also, we do close the socket on the master's side upon a framework disconnect/teardown. https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2795
Can you confirm whether you are not seeing this behavior on your end and, if so, provide some steps to reproduce it?
> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
> Issue Type: Task
> Components: framework, master
> Reporter: Jay Guo
>
> Currently long-lived-framework does not handle network partitions, i.e., it does not explicitly try to {{reconnect}} with the master after not receiving {{HEARTBEAT}} events for a prolonged period of time. If the master disconnects a framework without the framework being aware of it (a one-way partition), the framework should explicitly issue a {{reconnect}} request via the scheduler library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when tearing down a framework? Currently the TCP socket is left alive even after the framework has been deactivated. This results in the framework sending invalid {{Call}} messages to the master, and in re-detection.
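
The heartbeat-timeout check described above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in long-lived-framework: HeartbeatMonitor, onHeartbeat, reset, shouldReconnect and the missedAllowed tolerance are hypothetical names introduced here, and the sketch deliberately avoids calling any Mesos API. The framework would call onHeartbeat() for each {{HEARTBEAT}} event and, when shouldReconnect() returns true, issue the {{reconnect}} request through the scheduler library.

```cpp
#include <chrono>

// Hypothetical helper (not part of Mesos): remembers when the last
// HEARTBEAT event arrived and reports when too much time has passed.
class HeartbeatMonitor {
public:
  using Clock = std::chrono::steady_clock;

  // `interval` is the heartbeat interval the framework expects.
  // `missedAllowed` is an assumed tolerance: how many intervals may
  // elapse without a heartbeat before a reconnect is suggested.
  HeartbeatMonitor(std::chrono::milliseconds interval, int missedAllowed)
    : timeout_(interval * missedAllowed) {}

  // Call on every HEARTBEAT event (and once right after subscribing).
  void onHeartbeat(Clock::time_point now) {
    last_ = now;
    armed_ = true;
  }

  // Call when the framework already knows it is disconnected, so the
  // monitor does not ask for a second, redundant reconnect.
  void reset() { armed_ = false; }

  // True if heartbeats were being received and none has arrived for
  // longer than the timeout, i.e. a possible one-way partition.
  bool shouldReconnect(Clock::time_point now) const {
    return armed_ && (now - last_) > timeout_;
  }

private:
  std::chrono::milliseconds timeout_;
  Clock::time_point last_{};
  bool armed_ = false;
};
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the check easy to test. A periodic timer in the framework could call shouldReconnect(HeartbeatMonitor::Clock::now()) and trigger the reconnect when it returns true; the interval and tolerance values would have to be chosen to match the master's actual heartbeat settings.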
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)