Posted to commits@cassandra.apache.org by "Sergio Bossa (JIRA)" <ji...@apache.org> on 2013/06/24 14:53:23 UTC

[jira] [Comment Edited] (CASSANDRA-5692) Race condition in detecting version on a mixed 1.1/1.2 cluster

    [ https://issues.apache.org/jira/browse/CASSANDRA-5692?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13691941#comment-13691941 ] 

Sergio Bossa edited comment on CASSANDRA-5692 at 6/24/13 12:53 PM:
-------------------------------------------------------------------

Because the first message, which should set up the version, is sent over the same connection, this patch doesn't actually work: it causes two 1.2 nodes to block each other during bootstrap.

So I'm attaching a different patch (0005), which implements a simple handshake: it assumes version 6 and tries to read the actual version on a different thread, so that the read can be interrupted (disconnected) and the handshake retried until one of the following happens:
1) The version is confirmed to be >= 6, and the handshake succeeds.
2) The version is an old one, hence it is expected to be found among the tracked versions when the first gossip message is received.
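
A minimal sketch of the retry loop described above, under stated assumptions. It is not the code from the 0005 patch: HandshakeSketch, negotiate, readRemoteVersion, trackedVersions, timeoutMillis and maxAttempts are hypothetical names, the blocking read is modelled as a plain Callable, and the attempt limit is added only so that the example terminates.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class HandshakeSketch {
    // Version assumed for a peer until something else is learned (6, per the comment).
    static final int CURRENT_VERSION = 6;

    /**
     * Hypothetical handshake loop.
     * readRemoteVersion stands in for the blocking read of the peer's version;
     * trackedVersions stands in for the versions recorded from received messages.
     */
    static int negotiate(String peer,
                         Callable<Integer> readRemoteVersion,
                         Map<String, Integer> trackedVersions,
                         long timeoutMillis,
                         int maxAttempts) {
        // The read runs on a separate thread so that it can be interrupted.
        ExecutorService reader = Executors.newSingleThreadExecutor();
        try {
            for (int attempt = 0; attempt < maxAttempts; attempt++) {
                // Outcome 2: an older version has been tracked in the meantime.
                Integer tracked = trackedVersions.get(peer);
                if (tracked != null && tracked < CURRENT_VERSION) {
                    return tracked;
                }
                Future<Integer> pending = reader.submit(readRemoteVersion);
                try {
                    int version = pending.get(timeoutMillis, TimeUnit.MILLISECONDS);
                    // Outcome 1: the peer confirmed a version >= 6.
                    if (version >= CURRENT_VERSION) {
                        return version;
                    }
                } catch (TimeoutException | ExecutionException e) {
                    // Interrupt the stuck read and retry; a real connection
                    // would be closed and reopened at this point.
                    pending.cancel(true);
                } catch (InterruptedException e) {
                    pending.cancel(true);
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted during handshake with " + peer, e);
                }
            }
            throw new IllegalStateException("no version established for " + peer);
        } finally {
            reader.shutdownNow();
        }
    }
}
```

With a peer that answers 6 the loop returns on the first attempt. With a peer that never answers, the read is cancelled after the timeout and the next attempt picks up whatever older version has been tracked by then.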

Sorry for all the different patches, but the implementation details of all the version exchange machinery turned out to be quite subtle.
                
      was (Author: sbtourist):
    Because the first message, which should set up the version, is sent over the same connection, this patch doesn't actually work: it causes two 1.2 nodes to block each other during bootstrap.

So I'm attaching a different patch, which implements a simple handshake: it assumes version 6 and tries to read the actual version on a different thread, so that the read can be interrupted (disconnected) and the handshake retried until one of the following happens:
1) The version is confirmed to be >= 6, and the handshake succeeds.
2) The version is an old one, hence it is expected to be found among the tracked versions when the first gossip message is received.

Sorry for all the different patches, but the implementation details of all the version exchange machinery turned out to be quite subtle.
                  
> Race condition in detecting version on a mixed 1.1/1.2 cluster
> --------------------------------------------------------------
>
>                 Key: CASSANDRA-5692
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5692
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.1.9, 1.2.5
>            Reporter: Sergio Bossa
>            Priority: Minor
>         Attachments: 5692-0001.patch, 5692-0004.patch, 5692-0005.patch
>
>
> On a mixed 1.1 / 1.2 cluster, starting 1.2 nodes sometimes triggers a race condition in version detection, where the 1.2 node wrongly detects version 6 for a 1.1 node.
> It works as follows:
> 1) The newly started 1.2 node quickly opens an OutboundTcpConnection toward a 1.1 node before receiving any messages from the latter.
> 2) Given the version is correctly detected only when the first message is received, the version is momentarily set to 6.
> 3) This opens an OutboundTcpConnection from 1.2 to 1.1 at version 6, which gets stuck in the connect() method.
> Later, the version is corrected, but by then all outbound connections from 1.2 to 1.1 are already stuck.
> Evidence from 1.2 logs:
> TRACE 13:48:31,133 Assuming current protocol version for /127.0.0.2
> DEBUG 13:48:37,837 Setting version 5 for /127.0.0.2
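
The timing window in the issue description can be illustrated with a small sketch. This is an assumption-laden model, not Cassandra's actual code: VersionRaceSketch, getVersion and setVersion are hypothetical names, and the only behaviour taken from the report is that an endpoint with no recorded version is assumed to be at the current version (6) until the first message from it sets the real one.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class VersionRaceSketch {
    // Current protocol version on the 1.2 side, per the issue description.
    static final int CURRENT_VERSION = 6;

    // Hypothetical store of versions learned from received messages.
    private final Map<String, Integer> versions = new ConcurrentHashMap<>();

    // Corresponds to "Assuming current protocol version" in the log:
    // with nothing recorded yet, the caller gets 6.
    int getVersion(String endpoint) {
        Integer known = versions.get(endpoint);
        return known == null ? CURRENT_VERSION : known;
    }

    // Corresponds to "Setting version 5" in the log:
    // called once the first message from the endpoint has been received.
    void setVersion(String endpoint, int version) {
        versions.put(endpoint, version);
    }
}
```

Any outbound connection that calls getVersion before setVersion has run is opened at version 6, which is the window between the two log lines above.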
