Posted to issues@mesos.apache.org by "Jay Guo (JIRA)" <ji...@apache.org> on 2016/05/27 05:17:12 UTC
[jira] [Commented] (MESOS-5468) Add logic to long-lived-framework to handle HEARTBEAT timeout
[ https://issues.apache.org/jira/browse/MESOS-5468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15303509#comment-15303509 ]
Jay Guo commented on MESOS-5468:
--------------------------------
To reproduce:
* Start the master and an agent.
* Run long-lived-framework.
* On the framework machine, run {{# iptables -A OUTPUT -p tcp -d <master-ip> --dport 5050 -j DROP}} to emulate a network partition.
* Wait until the master deactivates the framework.
* Remove the iptables rule added above to emulate the network rejoining.
* Check the logs of both long-lived-framework and the master. {{netstat -tpn}} also shows a very large number of {{TIME_WAIT}} sockets, which result from the repeated re-detection.
> Add logic to long-lived-framework to handle HEARTBEAT timeout
> -------------------------------------------------------------
>
> Key: MESOS-5468
> URL: https://issues.apache.org/jira/browse/MESOS-5468
> Project: Mesos
> Issue Type: Bug
> Components: framework, master
> Reporter: Jay Guo
>
> Currently long-lived-framework does not handle HEARTBEAT timeouts. If the master tears down the framework without the framework being aware of it (for example, during a network partition), the framework keeps waiting for an {{Event}} until it is reconnected.
> *On the other hand*, should we close the TCP socket on the master side when tearing down a framework? Currently the TCP socket is left alive even after the framework has been deactivated. This results in the framework sending invalid {{Call}}s to the master and in re-detection.
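One possible shape for the missing logic is a watchdog that records when the last {{Event}} arrived and reports a timeout once several heartbeat intervals pass in silence. This is a minimal sketch, not the actual fix: {{HeartbeatWatchdog}}, {{missed_allowed}} and the injectable {{clock}} are hypothetical names, and it assumes the framework learns the heartbeat interval from the master when it subscribes.

```python
import time


class HeartbeatWatchdog:
    """Tracks the time since the last event from the master.

    Hypothetical helper: `interval_seconds` is assumed to be the heartbeat
    interval announced by the master; `missed_allowed` is an arbitrary
    tolerance chosen for this sketch.
    """

    def __init__(self, interval_seconds, missed_allowed=5, clock=time.monotonic):
        self.timeout = interval_seconds * missed_allowed
        self.clock = clock
        self.last_event = clock()

    def on_event(self):
        # Call for every received event, HEARTBEAT or otherwise.
        self.last_event = self.clock()

    def expired(self):
        # True once no event has arrived for longer than the timeout.
        return self.clock() - self.last_event > self.timeout
```

The framework would call {{on_event()}} from its event handler and poll {{expired()}} from a timer. On expiry it could drop the connection and re-subscribe instead of waiting indefinitely. Passing the clock as a parameter keeps the logic testable without real sleeps.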