Posted to commits@cassandra.apache.org by "Jens Rantil (JIRA)" <ji...@apache.org> on 2014/11/14 15:29:33 UTC
[jira] [Updated] (CASSANDRA-8318) Unable to replace a node
[ https://issues.apache.org/jira/browse/CASSANDRA-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jens Rantil updated CASSANDRA-8318:
-----------------------------------
Description:
We had a hardware failure on a node. I followed the DataStax documentation[1] to replace node X.X.X.51 with a brand new node using the same IP. Since the new node hadn't come up after about 5 minutes, I decided to replace X.X.X.51 with a node on a brand new, unused IP, X.X.X.56, instead. It now seems like gossip is in some weird state. When I start the replacement node I see lines like
{noformat}
INFO [GossipStage:1] 2014-11-14 14:57:03,025 Gossiper.java (line 901) InetAddress /X.X.X.51 is now DOWN
INFO [GossipStage:1] 2014-11-14 14:57:03,042 Gossiper.java (line 901) InetAddress /X.X.X.56 is now DOWN
{noformat}
The latter is somewhat surprising, since that is the IP of the replacement node itself. It doesn't surprise me that it can't talk to itself if it hasn't started!
Eventually the replacement node shuts down with
{noformat}
ERROR [main] 2014-11-14 14:58:06,031 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.UnsupportedOperationException: Cannot replace token -2 which does not exist!
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:782)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:614)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:503)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:374)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:615)
INFO [Thread-2] 2014-11-14 14:58:06,035 DseDaemon.java (line 461) DSE shutting down...
INFO [StorageServiceShutdownHook] 2014-11-14 14:58:06,037 Gossiper.java (line 1307) Announcing shutdown
INFO [Thread-2] 2014-11-14 14:58:06,046 PluginManager.java (line 355) All plugins are stopped.
INFO [Thread-2] 2014-11-14 14:58:06,047 CassandraDaemon.java (line 463) Cassandra shutting down...
ERROR [Thread-2] 2014-11-14 14:58:06,047 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-2,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.CassandraDaemon.stop(CassandraDaemon.java:464)
at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:464)
at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:364)
{noformat}
All nodes are showing
{noformat}
jrantil@machine-2:~$ nodetool status company
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.50  18.35 GB   1       16.7%             25efdbcd-14d3-4e9c-803a-3db5795d4efa  rack1
DN  X.X.X.51  195.67 KB  1       16.7%             d97cf86f-bfaf-4488-b716-26d71635a8fc  rack1
UN  X.X.X.52  18.7 GB    1       16.7%             caa32f68-5a6b-4d87-80bd-baa66a9b61ce  rack1
UN  X.X.X.53  18.56 GB   1       16.7%             e219321e-a6d5-48c4-9bad-d2e25429b1d2  rack1
UN  X.X.X.54  19.69 GB   1       16.7%             3cd36895-ee47-41c1-a5f5-41cb0f8526a6  rack1
UN  X.X.X.55  18.88 GB   1       16.7%             7d3f73c4-724e-45a6-bcf9-d3072dfc157f  rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.33  128.95 GB  256     100.0%            871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  rack1
UN  X.X.X.32  115.3 GB   256     100.0%            d7cacd89-8613-4de5-8a5e-a2c53c41ea45  rack1
UN  X.X.X.31  130.45 GB  256     100.0%            48cb0782-6c9a-4805-9330-38e192b6b680  rack1
{noformat}
but when X.X.X.56 is starting, it shows
{noformat}
root@machine-1:/var/lib/cassandra# nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  X.X.X.50  18.41 GB   1       0.2%   25efdbcd-14d3-4e9c-803a-3db5795d4efa  rack1
UN  X.X.X.52  19.07 GB   1       0.0%   caa32f68-5a6b-4d87-80bd-baa66a9b61ce  rack1
UN  X.X.X.53  18.65 GB   1       0.1%   e219321e-a6d5-48c4-9bad-d2e25429b1d2  rack1
UN  X.X.X.54  19.69 GB   1       0.0%   3cd36895-ee47-41c1-a5f5-41cb0f8526a6  rack1
UN  X.X.X.55  18.97 GB   1       0.2%   7d3f73c4-724e-45a6-bcf9-d3072dfc157f  rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  X.X.X.33  129.72 GB  256     21.7%  871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  rack1
UN  X.X.X.32  116 GB     256     12.4%  d7cacd89-8613-4de5-8a5e-a2c53c41ea45  rack1
UN  X.X.X.31  130.62 GB  256     65.3%  48cb0782-6c9a-4805-9330-38e192b6b680  rack1
{noformat}
The cluster state shown above (without X.X.X.51) has not propagated to the rest of the cluster so far.
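To make the difference between the two views easier to see, here is a minimal sketch that parses two `nodetool status` outputs and lists the addresses whose presence or state differs. The function names (`parse_status`, `diff_views`) and the row pattern are my own assumptions based only on the output format pasted in this report; this is not part of Cassandra or nodetool.

```python
import re

# Matches a node row such as:
#   UN X.X.X.50 18.35 GB 1 16.7% 25efdbcd-14d3-4e9c-803a-3db5795d4efa rack1
# The pattern is an assumption derived from the output pasted in this report.
ROW = re.compile(
    r"^(?P<state>[UD][NLJM])\s+(?P<address>\S+)\s+.*?"
    r"(?P<host_id>[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12})\s+(?P<rack>\S+)\s*$"
)


def parse_status(text):
    """Map address -> (state, host_id) for every node row in the given text."""
    nodes = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            nodes[match["address"]] = (match["state"], match["host_id"])
    return nodes


def diff_views(view_a, view_b):
    """Return {address: (state_in_a, state_in_b)} where the two views disagree.

    A state of None means the address is missing from that view.
    """
    a, b = parse_status(view_a), parse_status(view_b)
    result = {}
    for address in sorted(set(a) | set(b)):
        state_a = a.get(address, (None, None))[0]
        state_b = b.get(address, (None, None))[0]
        if state_a != state_b:
            result[address] = (state_a, state_b)
    return result
```

Passing the output from an existing node as `view_a` and the output seen while X.X.X.56 is starting as `view_b` should report X.X.X.51 as `DN` in the first view and missing from the second.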
Any input on how I can restore world order is appreciated.
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
was:
We had a hardware failure on a node. I followed the DataStax documentation[1] to replace node X.X.X.51 with a brand new node using the same IP. Since the new node hadn't come up after about 5 minutes, I decided to replace X.X.X.51 with a node on a brand new, unused IP, X.X.X.56, instead. It now seems like gossip is in some weird state. When I start the replacement node I see lines like
{noformat}
INFO [GossipStage:1] 2014-11-14 14:57:03,025 Gossiper.java (line 901) InetAddress /X.X.X.51 is now DOWN
INFO [GossipStage:1] 2014-11-14 14:57:03,042 Gossiper.java (line 901) InetAddress /X.X.X.56 is now DOWN
{noformat}
The latter is somewhat surprising, since that is the IP of the replacement node itself. It doesn't surprise me that it can't talk to itself if it hasn't started!
Eventually the replacement node shuts down with
{noformat}
ERROR [main] 2014-11-14 14:58:06,031 CassandraDaemon.java (line 513) Exception encountered during startup
java.lang.UnsupportedOperationException: Cannot replace token -2 which does not exist!
at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:782)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:614)
at org.apache.cassandra.service.StorageService.initServer(StorageService.java:503)
at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:378)
at com.datastax.bdp.server.DseDaemon.setup(DseDaemon.java:374)
at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:496)
at com.datastax.bdp.server.DseDaemon.main(DseDaemon.java:615)
INFO [Thread-2] 2014-11-14 14:58:06,035 DseDaemon.java (line 461) DSE shutting down...
INFO [StorageServiceShutdownHook] 2014-11-14 14:58:06,037 Gossiper.java (line 1307) Announcing shutdown
INFO [Thread-2] 2014-11-14 14:58:06,046 PluginManager.java (line 355) All plugins are stopped.
INFO [Thread-2] 2014-11-14 14:58:06,047 CassandraDaemon.java (line 463) Cassandra shutting down...
ERROR [Thread-2] 2014-11-14 14:58:06,047 CassandraDaemon.java (line 199) Exception in thread Thread[Thread-2,5,main]
java.lang.NullPointerException
at org.apache.cassandra.service.CassandraDaemon.stop(CassandraDaemon.java:464)
at com.datastax.bdp.server.DseDaemon.stop(DseDaemon.java:464)
at com.datastax.bdp.server.DseDaemon$1.run(DseDaemon.java:364)
{noformat}
All nodes are showing
{noformat}
jrantil@analytics-2:~$ nodetool status company
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.50  18.35 GB   1       16.7%             25efdbcd-14d3-4e9c-803a-3db5795d4efa  rack1
DN  X.X.X.51  195.67 KB  1       16.7%             d97cf86f-bfaf-4488-b716-26d71635a8fc  rack1
UN  X.X.X.52  18.7 GB    1       16.7%             caa32f68-5a6b-4d87-80bd-baa66a9b61ce  rack1
UN  X.X.X.53  18.56 GB   1       16.7%             e219321e-a6d5-48c4-9bad-d2e25429b1d2  rack1
UN  X.X.X.54  19.69 GB   1       16.7%             3cd36895-ee47-41c1-a5f5-41cb0f8526a6  rack1
UN  X.X.X.55  18.88 GB   1       16.7%             7d3f73c4-724e-45a6-bcf9-d3072dfc157f  rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  X.X.X.33  128.95 GB  256     100.0%            871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  rack1
UN  X.X.X.32  115.3 GB   256     100.0%            d7cacd89-8613-4de5-8a5e-a2c53c41ea45  rack1
UN  X.X.X.31  130.45 GB  256     100.0%            48cb0782-6c9a-4805-9330-38e192b6b680  rack1
{noformat}
but when X.X.X.56 is starting, it shows
{noformat}
root@analytics-1:/var/lib/cassandra# nodetool status
Note: Ownership information does not include topology; for complete information, specify a keyspace
Datacenter: Analytics
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  X.X.X.50  18.41 GB   1       0.2%   25efdbcd-14d3-4e9c-803a-3db5795d4efa  rack1
UN  X.X.X.52  19.07 GB   1       0.0%   caa32f68-5a6b-4d87-80bd-baa66a9b61ce  rack1
UN  X.X.X.53  18.65 GB   1       0.1%   e219321e-a6d5-48c4-9bad-d2e25429b1d2  rack1
UN  X.X.X.54  19.69 GB   1       0.0%   3cd36895-ee47-41c1-a5f5-41cb0f8526a6  rack1
UN  X.X.X.55  18.97 GB   1       0.2%   7d3f73c4-724e-45a6-bcf9-d3072dfc157f  rack1
Datacenter: Cassandra
=====================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  X.X.X.33  129.72 GB  256     21.7%  871968c9-1d6b-4f06-ba90-8b3a8d92dcf0  rack1
UN  X.X.X.32  116 GB     256     12.4%  d7cacd89-8613-4de5-8a5e-a2c53c41ea45  rack1
UN  X.X.X.31  130.62 GB  256     65.3%  48cb0782-6c9a-4805-9330-38e192b6b680  rack1
{noformat}
Any input on how I can restore world order is appreciated.
[1] http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html
> Unable to replace a node
> ------------------------
>
> Key: CASSANDRA-8318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8318
> Project: Cassandra
> Issue Type: Bug
> Environment: 2.0.8.39 (Datastax DSE 4.5.3)
> Reporter: Jens Rantil
> Attachments: X.X.X.56.log
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)