Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2013/01/03 23:26:14 UTC

[jira] [Updated] (CASSANDRA-5102) upgrading from 1.1.7 to 1.2.0 caused upgraded nodes to only know about other 1.2.0 nodes

     [ https://issues.apache.org/jira/browse/CASSANDRA-5102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams updated CASSANDRA-5102:
----------------------------------------

    Attachment: 5102.txt

Here is a sad story of how multiple release cycles ended up causing a regression.

The cause of these exceptions is CASSANDRA-4576.  There, we added checks against VERSION_11 to avoid using compatibility mode with newer nodes that didn't need it.  VERSION_11 has an actual value of 4.  We closed the ticket on Sept 18, and that was that.

Fast forward to November, when we closed CASSANDRA-4880.  To do this, we needed a protocol version bump, and created VERSION_117, which has an actual value of 5.  Unfortunately, CASSANDRA-4576 used <= comparisons against VERSION_11, and we had now created a version higher than VERSION_11 that still needed compatibility mode, so we got our original bug back.
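
The regression can be sketched roughly as follows. This is a minimal illustration, not the actual Cassandra code: the class and method names are hypothetical, and only the values of VERSION_11 (4) and VERSION_117 (5) come from the description above.

```java
// Hypothetical sketch of the regression; not the actual Cassandra source.
class VersionGateSketch {
    static final int VERSION_11 = 4;   // value stated in the description
    static final int VERSION_117 = 5;  // value stated in the description

    // A check in the style described for CASSANDRA-4576:
    // use compatibility mode for any peer at or below VERSION_11.
    static boolean usesCompatMode(int peerVersion) {
        return peerVersion <= VERSION_11;
    }

    public static void main(String[] args) {
        System.out.println(usesCompatMode(VERSION_11));  // true
        System.out.println(usesCompatMode(VERSION_117)); // false, although a 1.1.7 peer still needs it
    }
}
```

Because the check is anchored to the old version rather than to the first version that no longer needs compatibility mode, any later bump within the old branch silently falls outside it.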

The effect is that if you upgrade nodes from 1.1.7 or later to 1.2.0, the 1.2.0 nodes won't be able to gossip with the 1.1.7 nodes, and the 1.1.7 nodes won't be visible in ring output on a 1.2.0 node until they too are on 1.2.0.  The 1.1.7 nodes will still know about the 1.2.0 nodes, but they won't be able to successfully gossip with them, and will keep them marked down.

The attached patch fixes this by comparing more explicitly against VERSION_12, but I think it highlights a deeper problem: if we ever do need another protocol bump in a minor, stable branch, we're out of luck, because there's no space between VERSION_117 and VERSION_12.
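
One way a comparison against VERSION_12 might look is sketched below. This is an illustration under stated assumptions, not the contents of 5102.txt: the class and method names are hypothetical, and the value 6 for VERSION_12 is an assumption inferred from there being no space after VERSION_117 (5).

```java
// Hypothetical sketch of an explicit comparison; not the attached patch.
class ExplicitVersionGateSketch {
    static final int VERSION_11 = 4;   // value stated in the description
    static final int VERSION_117 = 5;  // value stated in the description
    static final int VERSION_12 = 6;   // ASSUMPTION: next integer after VERSION_117

    // Anchor the check to the first version that does not need compatibility mode.
    static boolean usesCompatMode(int peerVersion) {
        return peerVersion < VERSION_12;
    }

    // Number of unused protocol version values between two versions.
    static int freeSlotsBetween(int lower, int higher) {
        return higher - lower - 1;
    }

    public static void main(String[] args) {
        System.out.println(usesCompatMode(VERSION_11));   // true
        System.out.println(usesCompatMode(VERSION_117));  // true
        System.out.println(usesCompatMode(VERSION_12));   // false
        // No room for another bump in the 1.1 branch:
        System.out.println(freeSlotsBetween(VERSION_117, VERSION_12)); // 0
    }
}
```

With the check written this way, a new version inserted below VERSION_12 would be covered automatically, but under the assumed numbering there is no unused value left to assign to it.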
                
> upgrading from 1.1.7 to 1.2.0 caused upgraded nodes to only know about other 1.2.0 nodes
> ----------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5102
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5102
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0
>            Reporter: Michael Kjellman
>            Assignee: Brandon Williams
>            Priority: Blocker
>         Attachments: 5102.txt
>
>
> I upgraded as I have since 0.86 and things didn't go very smoothly.
> I did a nodetool drain to my 1.1.7 node and changed my puppet config to use the new merged config. When it came back up (without any errors in the log) a nodetool ring only showed itself. I upgraded another node and sure enough now nodetool ring showed two nodes.
> I tried resetting the local schema. The upgraded node happily grabbed the schema again but still only 1.2 nodes were visible in the ring to any upgraded nodes.
> "Interesting" Log Lines:
>  INFO 14:43:41,997 Using saved token [42535295865117307932921825928971026436]
> ....
>  WARN 23:04:03,361 No host ID found, created 5cef7f51-688d-46c3-9fe4-6c82bde4bb98 (Note: This should happen exactly once per node).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira