Posted to commits@cassandra.apache.org by "Stefania (JIRA)" <ji...@apache.org> on 2015/07/23 16:41:04 UTC

[jira] [Comment Edited] (CASSANDRA-9871) Cannot replace token does not exist - DN node removed as Fat Client

    [ https://issues.apache.org/jira/browse/CASSANDRA-9871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638902#comment-14638902 ] 

Stefania edited comment on CASSANDRA-9871 at 7/23/15 2:40 PM:
--------------------------------------------------------------

bq. can you provide a dump of both nodetool gossipinfo and nodetool status?

{code}
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load       Tokens  Owns    Host ID                               Rack
UN  127.0.0.1  82.71 KB   256     ?       af23fcbb-fce4-495c-b5b5-b0b90ccc71c1  rack1
UN  127.0.0.2  51.57 KB   256     ?       11814d51-5120-4f9f-b5fc-d0ffa534f964  rack1
DN  127.0.0.3  51.59 KB   256     ?       0101e850-7f3a-499c-a80c-092ecf4e27e3  rack1

Note: Non-system keyspaces don't have the same replication settings, effective ownership information is meaningless

/127.0.0.1
  generation:1437661129
  heartbeat:164
  RELEASE_VERSION:2.1.8-SNAPSHOT
  SEVERITY:0.0
  STATUS:NORMAL,-107708216716906722
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:af23fcbb-fce4-495c-b5b5-b0b90ccc71c1
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.1
  LOAD:52781.0
/127.0.0.2
  generation:1437661129
  heartbeat:166
  SEVERITY:0.0
  RELEASE_VERSION:2.1.8-SNAPSHOT
  STATUS:NORMAL,-1054644930469012369
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:11814d51-5120-4f9f-b5fc-d0ffa534f964
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.2
  LOAD:52807.0
/127.0.0.3
  generation:1437661129
  heartbeat:2147483647
  RELEASE_VERSION:2.1.8-SNAPSHOT
  SEVERITY:0.0
  STATUS:shutdown,true
  DC:datacenter1
  NET_VERSION:8
  RACK:rack1
  HOST_ID:0101e850-7f3a-499c-a80c-092ecf4e27e3
  SCHEMA:fa2a3033-51b7-30c0-8926-a2b71bf0fd8a
  RPC_ADDRESS:127.0.0.3
  LOAD:52826.0
{code}
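To scan a dump like the one above for nodes that gossip does not consider NORMAL, a small parser can help. This is only a sketch: it assumes the gossipinfo layout shown here (a `/address` line followed by indented `KEY:value` lines), and the names `parse_gossipinfo`, `not_normal` and `SAMPLE` are made up for illustration. The sample is an abbreviated copy of the dump above.

```python
def parse_gossipinfo(text):
    """Parse gossipinfo-style text into {endpoint: {key: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("/"):
            # A line such as "/127.0.0.3" opens a new endpoint section.
            current = nodes.setdefault(line[1:], {})
        elif current is not None and ":" in line:
            # Indented "KEY:value" lines belong to the current endpoint.
            key, _, value = line.partition(":")
            current[key] = value
    return nodes


def not_normal(nodes):
    """Return the endpoints whose STATUS does not start with NORMAL."""
    return sorted(ep for ep, state in nodes.items()
                  if not state.get("STATUS", "").startswith("NORMAL"))


# Abbreviated copy of the dump above (assumed layout).
SAMPLE = """\
/127.0.0.1
  generation:1437661129
  heartbeat:164
  STATUS:NORMAL,-107708216716906722
/127.0.0.3
  generation:1437661129
  heartbeat:2147483647
  STATUS:shutdown,true
"""

nodes = parse_gossipinfo(SAMPLE)
print(not_normal(nodes))
# The heartbeat reported for 127.0.0.3 equals 2**31 - 1.
print(int(nodes["127.0.0.3"]["heartbeat"]) == 2**31 - 1)
```

In this dump only 127.0.0.3 is flagged: its STATUS is `shutdown,true` and it carries no token, unlike the two NORMAL nodes.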

bq. isFatClient returns true as the endpoint is not a member in TokenMetadata and that's why we fail in SS.joinTokenRing (we check to see if the token is associated with a TokenMetadata member).

Yes, this is the root cause, but why would the node not be a member? I guess handleStateNormal() is never called, so once again isFatClient() is at fault, just like in CASSANDRA-9765?
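To make the suspected chain of events concrete, here is a toy model of it. It is not Cassandra code: `ToyRing` and all of its methods are hypothetical, Python's `LookupError` stands in for the Java exception, and the 30-second timeout is taken from the "silent for 30000ms" log line in the issue description. It only illustrates the hypothesis that an endpoint known to gossip but never registered as a token owner looks like a fat client, gets evicted, and then cannot be replaced.

```python
class ToyRing:
    """Toy stand-in for gossip state plus TokenMetadata (hypothetical)."""

    def __init__(self):
        self.gossip_state = {}  # endpoint -> seconds since last heard from
        self.token_owner = {}   # token -> endpoint (the "TokenMetadata" view)

    def add_gossip(self, endpoint, silent_s=0):
        self.gossip_state[endpoint] = silent_s

    def handle_state_normal(self, endpoint, tokens):
        # In this model, only this step makes an endpoint a token owner.
        for token in tokens:
            self.token_owner[token] = endpoint

    def is_member(self, endpoint):
        return endpoint in self.token_owner.values()

    def is_fat_client(self, endpoint):
        # Known to gossip, but owning no tokens.
        return endpoint in self.gossip_state and not self.is_member(endpoint)

    def evict_silent_fat_clients(self, timeout_s=30):
        evicted = [ep for ep, silent in self.gossip_state.items()
                   if self.is_fat_client(ep) and silent >= timeout_s]
        for ep in evicted:
            del self.gossip_state[ep]
        return evicted

    def replace_token(self, token):
        if token not in self.token_owner:
            raise LookupError(
                f"Cannot replace token {token} which does not exist!")
        return self.token_owner[token]


ring = ToyRing()
# The dead node is in gossip, but handle_state_normal() never ran for it.
ring.add_gossip("10.171.115.233", silent_s=30)
print(ring.is_fat_client("10.171.115.233"))
print(ring.evict_silent_fat_clients())
try:
    ring.replace_token(-1013652079972151677)
except LookupError as err:
    print(err)
```

If `handle_state_normal()` had been called for the dead node first, the same endpoint would count as a member, survive the eviction pass, and `replace_token()` would find its token.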

Anyway, I plan to add more debug logging tomorrow to find out when the TokenMetadata (TM) is modified.


> Cannot replace token does not exist - DN node removed as Fat Client
> -------------------------------------------------------------------
>
>                 Key: CASSANDRA-9871
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9871
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Sebastian Estevez
>            Assignee: Stefania
>             Fix For: 2.1.x
>
>
> We lost a node due to disk failure and tried to replace it via -Dcassandra.replace_address, per http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/opsReplaceNode.html
> The node would not come up, and system.log showed these errors:
> {code}
> INFO  [main] 2015-07-22 03:20:06,722  StorageService.java:500 - Gathering node replacement information for /10.171.115.233
> ...
> INFO  [SharedPool-Worker-1] 2015-07-22 03:22:34,281  Gossiper.java:954 - InetAddress /10.111.183.101 is now UP
> INFO  [GossipTasks:1] 2015-07-22 03:22:59,300  Gossiper.java:735 - FatClient /10.171.115.233 has been silent for 30000ms, removing from gossip
> ERROR [main] 2015-07-22 03:23:28,485  CassandraDaemon.java:541 - Exception encountered during startup
> java.lang.UnsupportedOperationException: Cannot replace token -1013652079972151677 which does not exist!
> {code}
> It is not clear why Gossiper removed the node as a FatClient, given that it was a full node before it died and it had tokens assigned to it (including -1013652079972151677) in system.peers and nodetool ring. 


