Posted to commits@cassandra.apache.org by "Paulo Motta (JIRA)" <ji...@apache.org> on 2015/07/21 18:57:05 UTC

[jira] [Commented] (CASSANDRA-8621) For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream

    [ https://issues.apache.org/jira/browse/CASSANDRA-8621?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635393#comment-14635393 ] 

Paulo Motta commented on CASSANDRA-8621:
----------------------------------------

I'd like to discuss/validate a possible solution before diving into implementation.

Upon receiving a SocketException during an established StreamSession, the reconnection initiator will:
# Mark its view of the StreamSession as "isReconnecting";
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we may ignore/log them while "isReconnecting" is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Send a StreamInitMessage to the session peer with "isReconnecting" flag set to true.
# After the initialization is complete, the "StreamSession.isReconnecting" flag is set to false and onInitializationComplete() is called to resume the streaming protocol.
# In case of failure during the process, the initiator will retry establishing the connection up to the number of times given by the max_streaming_retries property, and fail the stream session if it is still unable to reconnect.
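
The initiator steps above can be sketched roughly as follows. This is only an illustration in Python, not Cassandra code: StreamSessionSketch, connect, close_handlers and make_flaky_connect are hypothetical names, and the sketch assumes max_streaming_retries counts total reconnection attempts.

```python
class StreamSessionSketch:
    """Hypothetical stand-in for a stream session (not Cassandra's StreamSession)."""

    def __init__(self, connect, max_streaming_retries=3):
        # `connect` is an assumed callable that opens fresh sockets and sends
        # the init message; it raises OSError when the attempt fails.
        self.connect = connect
        self.max_streaming_retries = max_streaming_retries
        self.is_reconnecting = False
        self.failed = False
        self.resumed = False
        self.attempts = 0

    def on_socket_error(self, error):
        if self.is_reconnecting:
            # Closing the old sockets may raise further errors; ignore them.
            return
        self.is_reconnecting = True
        self.close_handlers()
        for _ in range(self.max_streaming_retries):
            self.attempts += 1
            try:
                self.connect(is_reconnecting=True)
            except OSError:
                continue  # try again until the retry budget is spent
            self.is_reconnecting = False
            self.on_initialization_complete()
            return
        # Retry budget exhausted: give up on the session.
        self.is_reconnecting = False
        self.failed = True

    def close_handlers(self):
        pass  # placeholder: stop handlers and close both sockets

    def on_initialization_complete(self):
        self.resumed = True  # placeholder: resume the streaming protocol


def make_flaky_connect(failures):
    """Build a fake `connect` that fails `failures` times, then succeeds."""
    remaining = [failures]

    def connect(is_reconnecting):
        if remaining[0] > 0:
            remaining[0] -= 1
            raise ConnectionResetError("simulated reset")

    return connect


session = StreamSessionSketch(make_flaky_connect(2), max_streaming_retries=3)
session.on_socket_error(ConnectionResetError("peer reset"))
print(session.attempts, session.resumed, session.failed)
```

The flag doubles as a guard: any error reported while is_reconnecting is true is treated as a side effect of tearing down the old sockets and does not start a second reconnection.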

Upon receiving a StreamInitMessage with "isReconnecting=true" the reconnection follower will:
# Fetch the StreamSession object for that session: 
#* If StreamSession.isReconnecting is already set to true on the reconnection follower, it means this node is also trying to act as a reconnection initiator, so we have a conflict. We can use the node identifier or IP as a universal tie-breaker. Only the peer with the lowest IP/ID will have its StreamInitMessage accepted by the other peer in case of a conflict. The other peer will have its init socket closed.
#* Otherwise, it will set its StreamSession.isReconnecting flag to true.
# Stop/close both incoming and outgoing message handlers and respective sockets;
#* Since the closing of sockets might generate additional SocketExceptions, we may ignore them while "isReconnecting" is set to true.
# Create new incoming and outgoing message handlers and sockets.
# Attach the outgoing socket to the new outgoing message handler.
# After the incoming socket is attached to the incoming message handler, the session is re-established and the "StreamSession.isReconnecting" flag is set to false.
# The session is re-established and everybody is happy.
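
The conflict rule in the first follower step could look something like the sketch below. It is a Python illustration only: accept_reconnect is a hypothetical helper, and it assumes both nodes are identified by IP addresses of the same family (the standard ipaddress module cannot order an IPv4 address against an IPv6 one).

```python
import ipaddress


def accept_reconnect(local_addr, peer_addr, local_is_reconnecting):
    """Decide whether to accept a peer's init message sent with isReconnecting=true.

    Hypothetical helper: returns True if this node should act as the
    reconnection follower, False if it should close the peer's init socket.
    """
    if not local_is_reconnecting:
        # No conflict: this node is not trying to reconnect on its own.
        return True
    # Conflict: both sides act as initiator. The lowest address wins, so the
    # peer's message is accepted only if the peer's address is lower.
    return ipaddress.ip_address(peer_addr) < ipaddress.ip_address(local_addr)


# Both nodes detected the failure and both sent an init message.
print(accept_reconnect("10.0.0.10", "10.0.0.9", True))
print(accept_reconnect("10.0.0.9", "10.0.0.10", True))
```

Comparing parsed addresses rather than strings matters here: as plain strings "10.0.0.10" sorts before "10.0.0.9", which would pick the wrong winner. Because the comparison is strict and the two addresses differ, exactly one side accepts in a conflict.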

What do you think of this approach [~yukim]?

> For streaming operations, when a socket is closed/reset, we should retry/reinitiate that stream
> -----------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-8621
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8621
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>            Reporter: Jeremy Hanna
>            Assignee: Paulo Motta
>
> Currently we have a setting (streaming_socket_timeout_in_ms) that will time out and retry the stream operation in the case where TCP is idle for a period of time.  However, in the case where the socket is closed or reset, we do not retry the operation.  This can happen for a number of reasons, including when a firewall sends a reset message on a socket during a streaming operation, such as nodetool rebuild (necessarily across DCs) or repairs.
> Doing a retry would make the streaming operations more resilient.  It would be good to log the retry clearly as well (with the stream session ID and node address).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)