Posted to issues@mesos.apache.org by "Anand Mazumdar (JIRA)" <ji...@apache.org> on 2016/05/27 05:38:13 UTC

[jira] [Commented] (MESOS-5468) Add logic in long-lived-framework to handle network partitions.

    [ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303537#comment-15303537 ] 

Anand Mazumdar commented on MESOS-5468:
---------------------------------------

[~guoger] I edited the JIRA description a bit. Let me know if it does not align with your observations.

Also, we do close the socket on the master's side upon a framework disconnect/teardown. https://github.com/apache/mesos/blob/master/src/master/master.cpp#L2795

Can you confirm whether you are still not seeing this behavior on your end and, if so, share some steps to reproduce it?

> Add logic in long-lived-framework to handle network partitions.
> ---------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Task
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently, long-lived-framework does not handle network partitions, i.e., it does not explicitly try to {{reconnect}} with the master after receiving no {{HEARTBEAT}} events for a prolonged period of time. If the master disconnects a framework without the framework being aware of it (a one-way partition), the framework should explicitly issue a {{reconnect}} request via the scheduler library after a certain period of time.
> *On the other hand*, should we close the TCP socket on the master side when tearing down a framework? Currently the TCP socket is left alive even after the framework has been deactivated. This results in the framework sending invalid {{Call}} messages to the master and in re-detection.
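
For illustration only, the heartbeat-timeout logic described above could be sketched roughly as follows. This is not the long-lived-framework code (which is C++) and it does not use the scheduler library; the names HeartbeatMonitor, missed_allowed, and check, as well as the reconnect callback, are hypothetical. It only shows the idea of treating prolonged HEARTBEAT silence as a suspected one-way partition.

```python
import time


class HeartbeatMonitor:
    """Tracks the last HEARTBEAT arrival and flags a suspected partition.

    Hypothetical helper: the timeout is assumed to be a multiple of the
    heartbeat interval, which is a design choice and not something the
    issue specifies.
    """

    def __init__(self, heartbeat_interval, missed_allowed=5, clock=time.monotonic):
        self.timeout = heartbeat_interval * missed_allowed
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.last_seen = clock()

    def on_heartbeat(self):
        # Call this whenever a HEARTBEAT event is received from the master.
        self.last_seen = self.clock()

    def should_reconnect(self):
        # True once the silence has lasted longer than the allowed window.
        return self.clock() - self.last_seen > self.timeout


def check(monitor, reconnect):
    """Run periodically; invokes the (assumed) reconnect callback on timeout."""
    if monitor.should_reconnect():
        reconnect()
        monitor.on_heartbeat()  # restart the window after asking to reconnect
        return True
    return False
```

In a real framework, check would be driven by a timer, and the reconnect callback would wrap whatever reconnect request the scheduler library exposes.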



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)