Posted to commits@cassandra.apache.org by "Joel Knighton (JIRA)" <ji...@apache.org> on 2016/01/05 21:45:39 UTC

[jira] [Commented] (CASSANDRA-10969) long-running cluster sees bad gossip generation when a node restarts

    [ https://issues.apache.org/jira/browse/CASSANDRA-10969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15083749#comment-15083749 ] 

Joel Knighton commented on CASSANDRA-10969:
-------------------------------------------

Your observations on this ticket and your comment on [CASSANDRA-8113] are correct; we should properly handle a cluster that has legitimately been running for longer than a year. In my opinion, the current behavior is a bug, and I'll work on a fix.

You are also correct that a rolling restart should fix this, because a local generation of 0 (as after a restart) is special-cased in the check introduced by CASSANDRA-8113.
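For context, a minimal sketch of the shape of that check, assuming the structure of the CASSANDRA-8113 patch (the MAX_GENERATION_DIFFERENCE constant matches the patch; the class and method names here are illustrative, not the actual Gossiper.java source):

    // Sketch only; the real check is in Gossiper.java.
    public class GenerationCheckSketch
    {
        // One year in seconds, as added by CASSANDRA-8113.
        static final long MAX_GENERATION_DIFFERENCE = 86400L * 365;

        // Returns true if the remote generation should be accepted.
        static boolean isCredibleGeneration(long localGeneration, long remoteGeneration)
        {
            // Special case: a local generation of 0 means we hold no prior
            // state for the peer (e.g. this node just restarted), so the
            // threshold does not apply; this is why a rolling restart clears
            // the error.
            if (localGeneration == 0)
                return true;
            // Reject generations more than a year ahead of the last one seen.
            return remoteGeneration <= localGeneration + MAX_GENERATION_DIFFERENCE;
        }

        public static void main(String[] args)
        {
            System.out.println(isCredibleGeneration(0L, 1450978722L));           // true: no prior state
            System.out.println(isCredibleGeneration(1414613355L, 1450978722L));  // false: the case in this ticket
        }
    }

Since localGeneration is the epoch-seconds timestamp of the peer's previous start, any cluster that runs for more than a year between restarts will see a legitimate restart rejected by this check, which is exactly the behavior reported below.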

> long-running cluster sees bad gossip generation when a node restarts
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-10969
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10969
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination
>         Environment: 4-node Cassandra 2.1.1 cluster, each node running on a Linux 2.6.32-431.20.3.el6.x86_64 VM
>            Reporter: T. David Hudson
>            Assignee: Joel Knighton
>            Priority: Minor
>
> One of the nodes in a long-running Cassandra 2.1.1 cluster (not under my control) restarted.  The remaining nodes are logging errors like this:
>     "received an invalid gossip generation for peer xxx.xxx.xxx.xxx; local generation = 1414613355, received generation = 1450978722"
> The gap between the local and received generation numbers exceeds the one-year threshold added for CASSANDRA-8113.  The system clocks are up-to-date for all nodes.
> If this is a bug, it appears to be unfixed: the latest released Gossiper.java code in 2.1.x, 2.2.x, and 3.0.x does not seem to change the behavior I'm seeing.
> I presume that restarting the remaining nodes will clear up the problem, hence the minor priority.
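
For concreteness, the reported generations (both epoch seconds: local ~2014-10-29, received ~2015-12-24) do exceed the one-year threshold:

    received - local = 1450978722 - 1414613355 = 36,365,367 s  (~1.15 years)
    threshold        = 86400 * 365             = 31,536,000 s  (one year)

36,365,367 > 31,536,000, so the otherwise legitimate new generation is rejected.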



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)