
Posted to commits@cassandra.apache.org by "Jonathan Shook (JIRA)" <ji...@apache.org> on 2015/01/16 04:36:35 UTC

[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

    [ https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14279768#comment-14279768 ] 

Jonathan Shook commented on CASSANDRA-8621:
-------------------------------------------

For the scenario that prompted this ticket, the streaming process appeared to be completely stalled. The sending side had an exception that appeared to be a connection reset. The receiving side appeared to consider the connection still active, at least according to the netstats reported by nodetool. We were unable to verify this at the socket level because there were multiple streams between those peers, and there is no simple way to correlate a specific stream with a TCP session.

[~yukim]
If there is a diagnostic method that we can use to provide more information about specific stalled streams, please let us know so that we can approach the user to get more data.


> For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8621
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremy Hanna
>            Assignee: Yuki Morishita
>
> Currently we have a setting (streaming_socket_timeout_in_ms) that will time out and retry the stream operation when the TCP connection is idle for a period of time.  However, when the socket is closed or reset, we do not retry the operation.  This can happen for a number of reasons, including a firewall sending a reset on a socket during a streaming operation such as repairs or nodetool rebuild (which necessarily streams across DCs).
> Doing a retry would make the streaming operations more resilient.  It would be good to log the retry clearly as well (with the stream session ID and node address).
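
As a rough illustration of the behavior requested in the description above, the following is a minimal Python sketch of retrying an operation when the connection is reset or closed, and logging each retry with a session ID and peer address. It is not Cassandra code: stream_with_retry, send_stream, max_retries, and backoff_s are hypothetical names chosen for this example, and the real streaming code would need to decide what state can safely be resumed.

```python
import logging
import time

logger = logging.getLogger("stream_retry")


def stream_with_retry(send_stream, session_id, peer, max_retries=3, backoff_s=0.0):
    """Run send_stream(); retry if the socket is reset or closed.

    All names here are hypothetical. send_stream is any zero-argument
    callable that performs one streaming attempt and raises an OSError
    subclass when the connection is reset or closed.
    """
    attempt = 0
    while True:
        try:
            return send_stream()
        except (ConnectionResetError, ConnectionAbortedError, BrokenPipeError) as exc:
            attempt += 1
            if attempt > max_retries:
                logger.error(
                    "Stream session %s to %s failed after %d retries: %s",
                    session_id, peer, max_retries, exc,
                )
                raise
            # Log the retry with the session ID and node address.
            logger.warning(
                "Stream session %s to %s hit %s; retry %d/%d",
                session_id, peer, type(exc).__name__, attempt, max_retries,
            )
            # Simple linear backoff; 0.0 disables waiting.
            time.sleep(backoff_s * attempt)
```

With max_retries=3, the callable is attempted at most four times: once initially and then once per retry. Other exception types are not caught and propagate immediately.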



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)