Posted to commits@cassandra.apache.org by "Brandon Williams (JIRA)" <ji...@apache.org> on 2011/07/28 00:35:09 UTC

[jira] [Resolved] (CASSANDRA-2603) node stuck in 'Down' in nodetool ring, until disablegossip/enablegossip flapped it back into submission

     [ https://issues.apache.org/jira/browse/CASSANDRA-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Brandon Williams resolved CASSANDRA-2603.
-----------------------------------------

    Resolution: Cannot Reproduce

I don't think we're going to get anywhere on this without at least debug logs, preferably trace. I suspect that, as with CASSANDRA-2947, there was a problem that was fixed by CASSANDRA-2496.

> node stuck in 'Down' in nodetool ring, until disablegossip/enablegossip flapped it back into submission
> -------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-2603
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2603
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.7.4
>            Reporter: Peter Schuller
>            Assignee: Brandon Williams
>
> Cluster with 0.7.4 and 9 machines. I was doing rolling restarts, so nodes were expected to have flapped up/down a bit.
> After cleanup, I noticed that 'nodetool ring' on one of the nodes claimed that another node was Down. I'll call the node that considered the *other* one to be down "UpNode" and the node that was considered *down* "DownNode".
> DownNode was the next successor on the ring relative to UpNode. Only UpNode thought it was down; all other members of the cluster agreed it was up. This stayed the case for almost 24 hours.
> In system.log on UpNode, it is clearly visible that DownNode flapped to state UP recently with no notification of flapping to state DOWN afterwards. Yet 'nodetool ring' reported Down.
> Today, I did disablegossip+wait-for-a-bit+enablegossip on DownNode. This caused 'nodetool ring' on UpNode to again reflect the reality that DownNode is in fact up.
> I do not have a reproducible test case but wanted to file it since I don't remember seeing, and didn't easily find, a JIRA bug indicating a bug with this effect has recently been fixed.
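
For reference, the workaround described in the report (disable gossip on DownNode, wait, re-enable it) could be scripted roughly as follows. This is only a sketch: the function name bounce_gossip, the NODETOOL override, the host name in the usage note, and the 30-second default wait are all assumptions for illustration; the report says only "wait for a bit".

```shell
#!/bin/sh
# Sketch of the gossip bounce from the report. Names and defaults here are
# illustrative assumptions, not taken from the ticket.

# Allow the binary to be overridden, e.g. NODETOOL=echo for a dry run.
NODETOOL="${NODETOOL:-nodetool}"

bounce_gossip() {
    host="$1"                 # node that a peer wrongly reports as Down
    wait_secs="${2:-30}"      # "wait for a bit": 30 s is an arbitrary choice

    # Stop gossiping; abort if this fails so gossip is not re-enabled blindly.
    "$NODETOOL" -h "$host" disablegossip || return 1
    sleep "$wait_secs"
    # Resume gossiping so peers re-evaluate the node's state.
    "$NODETOOL" -h "$host" enablegossip
}
```

A call such as `bounce_gossip downnode.example.com 60` (hypothetical host name) would then be followed by running `nodetool -h <UpNode> ring` to check whether UpNode's view has recovered.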

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira