Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2015/07/23 12:08:04 UTC
[jira] [Commented] (CASSANDRA-9871) Cannot replace token does not exist - DN node removed as Fat Client
[ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638597#comment-14638597 ]
Stefania commented on CASSANDRA-9871:
-------------------------------------
I've reproduced the problem with this test:
{code}
def can_replace_down_node_test(self):
    """
    @jira_ticket CASSANDRA-9871
    Test that we can replace a node that is down and in status normal (DN) by using
    -Dcassandra.replace_address
    """
    cluster = self.cluster
    cluster.populate(3)
    cluster.start(wait_for_binary_proto=True)
    version = cluster.version()
    stress_table = 'keyspace1.standard1' if self.cluster.version() >= '2.1' else '"Keyspace1"."Standard1"'

    # write some data
    node1, node2, node3 = cluster.nodelist()
    if version < "2.1":
        node1.stress(['-n', '10000'])
    else:
        node1.stress(['write', 'n=10000', '-rate', 'threads=8'])

    # Stop node 3
    node3.stop(gently=True)

    # Sleep a bit to let GOSSIP settle
    time.sleep(2)
    out, err = node1.nodetool('status')
    self.assertEquals('', err)
    debug(out)

    # Create a new node to replace node3
    node4 = new_node(cluster, bootstrap=True)
    node4.start(jvm_args=["-Dcassandra.replace_address=127.0.0.3"], wait_for_binary_proto=True)
{code}
Interestingly, if the old node is shut down with kill -9 (gently=False in the stop method), then it can be replaced without problems.
Here is the code determining if it's a fat client:
{code}
public boolean isFatClient(InetAddress endpoint)
{
    EndpointState epState = endpointStateMap.get(endpoint);
    if (epState == null)
    {
        return false;
    }
    return !isDeadState(epState) && !StorageService.instance.getTokenMetadata().isMember(endpoint);
}
{code}
The dead states are REMOVING, REMOVED, LEFT and HIBERNATE. The state for a clean shutdown should be SHUTDOWN, so {{!isDeadState(epState)}} should be true. I still need to work out why the endpoint is not a member, but it is probably related to the "is now DOWN" log message, which is not present when the old node is killed with -9.
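To make the reasoning above concrete, here is a minimal Python sketch of the same boolean check. It is only a toy model, not Cassandra code: the names {{DEAD_STATES}}, {{is_fat_client}}, {{endpoint_states}} and {{token_members}} are hypothetical, the state is reduced to a plain string, and the scenario (status SHUTDOWN combined with missing token membership) is the hypothesis described in this comment, not a confirmed root cause.

```python
# Toy model of the isFatClient check quoted above (hypothetical names, not Cassandra code).
# Dead states as listed in this comment; SHUTDOWN is deliberately not among them.
DEAD_STATES = {"REMOVING", "REMOVED", "LEFT", "HIBERNATE"}


def is_fat_client(endpoint, endpoint_states, token_members):
    """Return True if the endpoint is known to gossip, is not in a dead state,
    and is not a member of the token metadata."""
    status = endpoint_states.get(endpoint)
    if status is None:
        # Unknown endpoint: mirrors the epState == null branch.
        return False
    return status not in DEAD_STATES and endpoint not in token_members


# Assumed scenario: node3 was stopped cleanly, so its status is SHUTDOWN.
endpoint_states = {"127.0.0.3": "SHUTDOWN"}

# If the endpoint is (for whatever reason) no longer a token member,
# the check classifies it as a fat client.
removed_as_fat_client = is_fat_client("127.0.0.3", endpoint_states, set())

# If it were still a member, the same check would not flag it.
kept_as_member = is_fat_client("127.0.0.3", endpoint_states, {"127.0.0.3"})
```

In this model the SHUTDOWN status alone never triggers the fat-client path; the result depends entirely on the membership lookup, which is why the missing membership is the part that still needs explaining.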
> Cannot replace token does not exist - DN node removed as Fat Client
> -------------------------------------------------------------------
>
> Key: CASSANDRA-9871
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9871
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sebastian Estevez
> Assignee: Stefania
> Fix For: 2.1.x
>
>
> We lost a node due to disk failure and tried to replace it via -Dcassandra.replace_address, per http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> The node would not come up, and these errors appeared in the system.log:
> {code}
> INFO [main] 2015-07-22 03:20:06,722 StorageService.java:500 - Gathering node replacement information for /10.171.115.233
> ...
> INFO [SharedPool-Worker-1] 2015-07-22 03:22:34,281 Gossiper.java:954 - InetAddress /10.111.183.101 is now UP
> INFO [GossipTasks:1] 2015-07-22 03:22:59,300 Gossiper.java:735 - FatClient /10.171.115.233 has been silent for 30000ms, removing from gossip
> ERROR [main] 2015-07-22 03:23:28,485 CassandraDaemon.java:541 - Exception encountered during startup
> java.lang.UnsupportedOperationException: Cannot replace token -1013652079972151677 which does not exist!
> {code}
> It is not clear why Gossiper removed the node as a FatClient, given that it was a full node before it died and it had tokens assigned to it (including -1013652079972151677) in system.peers and nodetool ring.