Posted to commits@cassandra.apache.org by "Peter Haggerty (JIRA)" <ji...@apache.org> on 2013/12/14 14:51:07 UTC

[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission

    [ https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13848357#comment-13848357 ] 

Peter Haggerty commented on CASSANDRA-5780:
-------------------------------------------

We just ran into this again when a node rebooted and came back up thinking everything was fine, while every other node in the ring disagreed. We resolved it with our normal "manual restart" procedure: stop thrift, stop gossip, flush the node, drain the node, then restart cassandra. It definitely caused some confusion, though, because "nodetool status" and "nodetool info" reported that the node was up and a working part of the cluster when in fact it wasn't.
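
For anyone wanting to script that "manual restart", here is a rough sketch of the sequence, not our actual tooling. It assumes the nodetool subcommands disablethrift, disablegossip, flush and drain, that it runs on the affected node itself, and that cassandra is restarted with "sudo service cassandra restart"; the restart command and the function names are illustrative and will differ per install.

```python
import subprocess

# Order follows the procedure described above. The subcommand names are
# assumed to match the nodetool shipped with your Cassandra version.
NODETOOL_STEPS = ["disablethrift", "disablegossip", "flush", "drain"]


def restart_commands(host="localhost",
                     restart_cmd=("sudo", "service", "cassandra", "restart")):
    """Return the commands for the manual restart, in execution order.

    restart_cmd is an assumption: adjust it to however cassandra is
    managed on the node (init script, supervisor, and so on).
    """
    commands = [["nodetool", "-h", host, step] for step in NODETOOL_STEPS]
    commands.append(list(restart_cmd))
    return commands


def manual_restart(host="localhost", run=subprocess.check_call):
    """Run each step in order; check_call raises on the first failure,
    so a failed flush or drain prevents the restart from happening."""
    for command in restart_commands(host):
        run(command)
```

Calling manual_restart() on the affected node runs the four nodetool steps and then the restart. Passing a different "run" callable (for example one that only prints) gives a dry run.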

The nodes in this state definitely do *not* make it clear that they are not part of the cluster anymore.

> nodetool status and ring report incorrect/stale information after decommission
> ------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5780
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5780
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Tools
>            Reporter: Peter Haggerty
>            Priority: Trivial
>              Labels: lhf, ponies
>
> Cassandra 1.2.6 ring of 12 instances, each with 256 tokens.
> Decommission 3 of the 12 nodes, one after another, resulting in a 9-instance ring.
> The 9 instances of cassandra that are in the ring all correctly report nodetool status information for the ring and have the same data.
> After the first node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> After the second node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> After the third node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> "nodetool status" on "decommissioned-3rd" reports 9 nodes
> The storage load information is similarly stale on the various decommissioned nodes. The nodetool status and ring commands continue to return information as if they were part of a cluster and they appear to return the last information that they saw.
> In contrast the nodetool info command fails with an exception, which isn't ideal but at least indicates that there was a failure rather than returning stale information.
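
One rough way to spot the disagreement described above is to count the node rows in each node's "nodetool status" output and compare the counts. The sketch below assumes each node row starts with a two-letter status/state code such as UN or DN, and that the output has already been collected per host by some other means; the function names are illustrative and not part of Cassandra.

```python
import re

# A node row is assumed to start with a two-letter code:
# U/D (up/down) followed by N/L/J/M (normal/leaving/joining/moving).
NODE_ROW = re.compile(r"^[UD][NLJM]\s+\S+")


def count_nodes(status_output):
    """Count the node rows in one node's 'nodetool status' output."""
    return sum(1 for line in status_output.splitlines() if NODE_ROW.match(line))


def disagreeing_views(outputs_by_host):
    """Return {host: node_count} if the hosts report different ring sizes,
    or an empty dict if every host reports the same count."""
    counts = {host: count_nodes(out) for host, out in outputs_by_host.items()}
    return counts if len(set(counts.values())) > 1 else {}
```

In the scenario above, the 9 live nodes would each report 9 rows while "decommissioned-1st" would report 11, so its entry stands out in the returned mapping.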



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)