Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2015/07/23 12:08:04 UTC

[jira] [Commented] (CASSANDRA-9871) Cannot replace token does not exist - DN node removed as Fat Client

    [ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638597#comment-14638597 ] 

Stefania commented on CASSANDRA-9871:
-------------------------------------

I've reproduced the problem with this test:

{code}
    def can_replace_down_node_test(self):
        """
        @jira_ticket CASSANDRA-9871
        Test that we can replace a node that is down and in status normal (DN) by using
        -Dcassandra.replace_address
        """
        cluster = self.cluster
        cluster.populate(3)
        cluster.start(wait_for_binary_proto=True)

        version = cluster.version()
        stress_table = 'keyspace1.standard1' if version >= '2.1' else '"Keyspace1"."Standard1"'

        # write some data
        node1, node2, node3 = cluster.nodelist()
        if version < "2.1":
            node1.stress(['-n', '10000'])
        else:
            node1.stress(['write', 'n=10000', '-rate', 'threads=8'])

        # Stop node 3
        node3.stop(gently=True)

        # Sleep a bit to let GOSSIP settle
        time.sleep(2)

        out, err = node1.nodetool('status')
        self.assertEquals('', err)
        debug(out)

        # Create a new node to replace node3
        node4 = new_node(cluster, bootstrap=True)
        node4.start(jvm_args=["-Dcassandra.replace_address=127.0.0.3"], wait_for_binary_proto=True)
{code}

Interestingly, if the old node is shut down with kill -9 (gently=False in the stop method), then it can be replaced without problems.

Here is the code determining if it's a fat client:

{code}
    public boolean isFatClient(InetAddress endpoint)
    {
        EndpointState epState = endpointStateMap.get(endpoint);
        if (epState == null)
        {
            return false;
        }
        return !isDeadState(epState) && !StorageService.instance.getTokenMetadata().isMember(endpoint);
    }
{code}

The dead states are REMOVING, REMOVED, LEFT and HIBERNATE. The state after a clean shutdown should be SHUTDOWN, which is not one of them, so {{!isDeadState(epState)}} should be true. I still need to work out why the endpoint is not a member, but it is probably related to whatever happens when "is now DOWN" is logged, since that message is not present when the old node is killed with -9.
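
To make the reasoning above concrete, here is a minimal Python model of the check. It is only an illustration, not Cassandra code: {{is_fat_client}}, {{endpoint_states}} and {{token_members}} are hypothetical stand-ins for {{isFatClient}}, {{endpointStateMap}} and the token metadata membership test, and the dead-state list is simply the one quoted in this comment.

```python
# Simplified model of the isFatClient check; all names here are hypothetical
# stand-ins and do not exist in Cassandra or in the dtest helpers.
DEAD_STATES = {"REMOVING", "REMOVED", "LEFT", "HIBERNATE"}  # as listed in this comment


def is_dead_state(status):
    """Return True if the gossip status counts as a dead state."""
    return status in DEAD_STATES


def is_fat_client(endpoint, endpoint_states, token_members):
    """Mirror the Java logic: unknown endpoints are never fat clients."""
    status = endpoint_states.get(endpoint)
    if status is None:
        return False
    return not is_dead_state(status) and endpoint not in token_members


# node3 after a clean shutdown: status SHUTDOWN (assumed), not a dead state.
states = {"127.0.0.3": "SHUTDOWN"}

# If the endpoint has lost its token membership, it is treated as a fat client.
print(is_fat_client("127.0.0.3", states, token_members=set()))

# If it is still a member, it is not.
print(is_fat_client("127.0.0.3", states, token_members={"127.0.0.3"}))
```

In this model the only thing that separates the two outcomes is token membership, which is why the open question is where the cleanly stopped node loses it.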



> Cannot replace token does not exist - DN node removed as Fat Client
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-9871
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9871
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sebastian Estevez
>            Assignee: Stefania
>             Fix For: 2.1.x
>
>
> We lost a node due to disk failure and tried to replace it via -Dcassandra.replace_address, per http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> The node would not come up and logged these errors in the system.log:
> {code}
> INFO  [main] 2015-07-22 03:20:06,722  StorageService.java:500 - Gathering node replacement information for /10.171.115.233
> ...
> INFO  [SharedPool-Worker-1] 2015-07-22 03:22:34,281  Gossiper.java:954 - InetAddress /10.111.183.101 is now UP
> INFO  [GossipTasks:1] 2015-07-22 03:22:59,300  Gossiper.java:735 - FatClient /10.171.115.233 has been silent for 30000ms, removing from gossip
> ERROR [main] 2015-07-22 03:23:28,485  CassandraDaemon.java:541 - Exception encountered during startup
> java.lang.UnsupportedOperationException: Cannot replace token -1013652079972151677 which does not exist!
> {code}
> It is not clear why Gossiper removed the node as a FatClient, given that it was a full node before it died and it had tokens assigned to it (including -1013652079972151677) in system.peers and nodetool ring. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)