Posted to issues@spark.apache.org by "Henry Saputra (JIRA)" <ji...@apache.org> on 2014/06/19 23:19:25 UTC

[jira] [Commented] (SPARK-704) ConnectionManager sometimes cannot detect loss of sending connections

    [ https://issues.apache.org/jira/browse/SPARK-704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14037904#comment-14037904 ] 

Henry Saputra commented on SPARK-704:
-------------------------------------

I am trying to reproduce and understand the issue.
After a new SendingConnection is created, it creates its own channel and then registers with the ConnectionManager#selector to listen for state changes.
When a SendingConnection is asked to send a message, it calls Connection#registerInterest to signal that it is ready to write, which later in the 

A disconnected SendingConnection is detected when there is an attempt to write to its channel, because the write throws an exception. I believe that should be sufficient for the purposes of this issue?
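
For comparison with the write-time detection described above, here is a minimal java.nio sketch of one possible alternative: keeping OP_READ interest on the sending channel, so that an orderly close by the peer shows up as a read returning -1 without any write attempt. This is not Spark's ConnectionManager code; the class and method names are made up for illustration, and it assumes a loopback connection and a peer that closes cleanly. A node that dies without sending FIN or RST would not be noticed this way and would still need a timeout or keepalive.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.Selector;
import java.nio.channels.SelectionKey;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, not part of Spark.
class PeerCloseDetection {

    /**
     * Opens a loopback connection, closes the remote end, and reports whether
     * the local (sending) side observes end-of-stream from a read alone.
     */
    static boolean detectsCloseWithoutWriting() {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));

            try (SocketChannel sender = SocketChannel.open(server.getLocalAddress());
                 SocketChannel remote = server.accept()) {
                // Selectors only accept non-blocking channels.
                sender.configureBlocking(false);

                // Register read interest even though this side only sends data.
                sender.register(selector, SelectionKey.OP_READ);

                // Simulate an orderly shutdown of the peer.
                remote.close();

                // Wait up to five seconds for the channel to become readable.
                if (selector.select(5000) == 0) {
                    return false;
                }

                // End-of-stream (-1) means the peer closed the connection.
                int bytesRead = sender.read(ByteBuffer.allocate(1));
                return bytesRead == -1;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("close detected without writing: "
                + detectsCloseWithoutWriting());
    }
}
```

Whether this is worth doing in ConnectionManager depends on whether the remote side ever sends data on that channel; the sketch only shows that the selector can report the close on its own.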

I just want to confirm that I understand the problem correctly.

> ConnectionManager sometimes cannot detect loss of sending connections
> ---------------------------------------------------------------------
>
>                 Key: SPARK-704
>                 URL: https://issues.apache.org/jira/browse/SPARK-704
>             Project: Spark
>          Issue Type: Bug
>            Reporter: Charles Reiss
>            Assignee: Henry Saputra
>
> ConnectionManager currently does not detect when SendingConnections disconnect except if it is trying to send through them. As a result, a node failure just after a connection is initiated but before any acknowledgement messages can be sent may result in a hang.
> ConnectionManager has code intended to detect this case by detecting the failure of a corresponding ReceivingConnection, but this code assumes that the remote host:port of the ReceivingConnection is the same as the ConnectionManagerId, which is almost never true. Additionally, there does not appear to be any reason to assume a corresponding ReceivingConnection will exist.
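
The host:port mismatch mentioned in the quoted description can be illustrated with plain java.nio, outside of Spark. The sketch below assumes that a ConnectionManagerId carries the port a node listens on; the class name, method name, and the "node A" / "node B" labels are hypothetical. Node A listens on one port and opens an outgoing connection to node B; the remote port that B sees on the accepted channel is the source port of that outgoing connection, not A's listening port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical helper, not part of Spark.
class RemotePortMismatch {

    /**
     * Returns { A's listening port, the remote port B observes on the
     * connection it accepted from A }.
     */
    static int[] listeningAndObservedPorts() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel nodeA = ServerSocketChannel.open();
             ServerSocketChannel nodeB = ServerSocketChannel.open()) {
            // Port 0 lets the OS choose a free listening port for each node.
            nodeA.bind(new InetSocketAddress(loopback, 0));
            nodeB.bind(new InetSocketAddress(loopback, 0));

            int listeningPortOfA =
                    ((InetSocketAddress) nodeA.getLocalAddress()).getPort();

            // A connects out to B; the OS assigns the source port.
            try (SocketChannel aToB = SocketChannel.open(nodeB.getLocalAddress());
                 SocketChannel acceptedOnB = nodeB.accept()) {
                int portObservedByB =
                        ((InetSocketAddress) acceptedOnB.getRemoteAddress()).getPort();
                return new int[] { listeningPortOfA, portObservedByB };
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] ports = listeningAndObservedPorts();
        System.out.println("A listens on " + ports[0]
                + ", B sees remote port " + ports[1]);
    }
}
```

Matching a receiving connection to a sending one by comparing these two ports would therefore fail even though both belong to the same node.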


