Posted to issues@mesos.apache.org by "Jay Guo (JIRA)" <ji...@apache.org> on 2016/05/27 05:17:12 UTC

[jira] [Commented] (MESOS-5468) Add logic to long-lived-framework to handle HEARTBEAT timeout

    [ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303509#comment-15303509 ] 

Jay Guo commented on MESOS-5468:
--------------------------------

To reproduce:
* Start master and agent
* Run long-lived-framework
* Run {{# iptables -A OUTPUT -p tcp -d <master-ip> --dport 5050 -j DROP}} on the framework machine to emulate a network partition
* Wait until the master deactivates the framework
* Remove the iptables rule added above to emulate the network rejoining
* Check the logs of both long-lived-framework and the master. {{netstat -tpn}} also shows a very large number of sockets in {{TIME_WAIT}}, which is the result of re-detection
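
To put a number on the {{TIME_WAIT}} observation in the last step, something like the following could be used. It is only a sketch: the helper name {{count_time_wait}} is made up for illustration, and it assumes the usual Linux {{netstat -tn}} column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State) and the master port 5050 from the iptables rule above.

```python
def count_time_wait(netstat_output, port=5050):
    """Count TCP sockets in TIME_WAIT whose remote port matches `port`.

    `netstat_output` is the text printed by `netstat -tn` (assumed
    Linux column layout; other platforms may differ).
    """
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket row.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign_address, state = fields[4], fields[5]
        # The remote port is whatever follows the last ':'.
        remote_port = foreign_address.rsplit(":", 1)[-1]
        if state == "TIME_WAIT" and remote_port == str(port):
            count += 1
    return count
```

On the framework machine the input could come from {{subprocess.run(["netstat", "-tn"], capture_output=True, text=True).stdout}}; sampling the count before and after removing the iptables rule should show whether re-detection is churning connections.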

> Add logic to long-lived-framework to handle HEARTBEAT timeout
> -------------------------------------------------------------
>
>                 Key: MESOS-5468
>                 URL: https://issues.apache.org/jira/browse/MESOS-5468
>             Project: Mesos
>          Issue Type: Bug
>          Components: framework, master
>            Reporter: Jay Guo
>
> Currently long-lived-framework does not handle HEARTBEAT timeouts. If the master tears down the framework without the framework being aware of it (e.g. during a network partition), the framework keeps waiting for an {{Event}} until it is reconnected.
> *On the other hand*, should we close the TCP socket on the master side when tearing down a framework? Currently the TCP socket is left open even after the framework has been deactivated. This results in the framework sending invalid {{Call}}s to the master, followed by re-detection.
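
For illustration, the missing HEARTBEAT handling could look roughly like the sketch below. This is not the long-lived-framework code (which is C++); it is a Python sketch of the idea only. The class name {{HeartbeatMonitor}}, the {{missed_allowed}} threshold of 5 intervals, and the injectable clock are all assumptions made for the example. The heartbeat interval itself would come from the master when the framework subscribes.

```python
import time


class HeartbeatMonitor:
    """Tracks when the last Event (HEARTBEAT or otherwise) arrived.

    Hypothetical helper: if nothing arrives within `missed_allowed`
    heartbeat intervals, the framework should assume it is
    disconnected and resubscribe instead of waiting forever.
    """

    def __init__(self, interval_seconds, missed_allowed=5,
                 clock=time.monotonic):
        self.interval_seconds = interval_seconds
        self.missed_allowed = missed_allowed
        self._clock = clock
        self._last_event = clock()

    def record_event(self):
        # Call this for every Event received, not only HEARTBEAT.
        self._last_event = self._clock()

    def timed_out(self):
        deadline = self.interval_seconds * self.missed_allowed
        return self._clock() - self._last_event > deadline
```

The framework's event loop would call {{record_event()}} on each received {{Event}} and check {{timed_out()}} periodically; on timeout it would drop the current connection and re-detect the master once, rather than sending further {{Call}}s on a stale socket.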


