Posted to commits@cassandra.apache.org by "Joel Knighton (JIRA)" <ji...@apache.org> on 2016/07/12 05:50:10 UTC
[jira] [Commented] (CASSANDRA-12172) Fail to bootstrap new node.

    [ https://issues.apache.org/jira/browse/CASSANDRA-12172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15372209#comment-15372209 ] 

Joel Knighton commented on CASSANDRA-12172:
-------------------------------------------

I'm not sure this is a bug - if so, we'll need more information to resolve it.

This error message is correct. Cassandra will try to bootstrap from the replica being replaced unless you advise it otherwise with "-Dcassandra.consistent.rangemovement=false".

Cassandra's failure detection builds up a statistical estimate based on gossip updates as to whether a node is up or down. It may be that you have an unreliable network and need to tune the phi conviction threshold appropriately - https://github.com/apache/cassandra/blob/91392edbe812c722adcf35cf167bf400d25dc99a/conf/cassandra.yaml#L855. Otherwise, it may be the case that some behavior on the hosts being marked down is preventing them from gossiping/performing other tasks, such as a long GC pause. In this sense, the bug is not in the failure detection but in some other component.
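To illustrate how such a statistical estimate reacts to a pause, here is a minimal sketch of a phi-accrual style detector. It is not Cassandra's code: the class and function names are illustrative, and it assumes heartbeat intervals follow an exponential distribution, so that phi is the negative base-10 logarithm of the probability that a live node would stay silent this long. The default threshold of 8 matches the phi_convict_threshold default in cassandra.yaml.

```python
import math
from collections import deque

# Converts the natural-log form of the exponential tail into base 10.
PHI_FACTOR = 1.0 / math.log(10.0)


class ArrivalWindow:
    """Sliding window of heartbeat inter-arrival times (illustrative name)."""

    def __init__(self, size=1000):
        self.intervals = deque(maxlen=size)
        self.last = None

    def heartbeat(self, now):
        """Record a gossip heartbeat observed at time `now` (seconds)."""
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now):
        """Suspicion level: grows linearly with the silence since the last heartbeat."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        # P(silence > t) = exp(-t / mean), so phi = -log10(P) = t / (mean * ln 10)
        return PHI_FACTOR * (now - self.last) / mean


def is_convicted(window, now, threshold=8.0):
    """Mark the peer down once phi exceeds the conviction threshold."""
    return window.phi(now) > threshold


if __name__ == "__main__":
    w = ArrivalWindow()
    for t in range(11):  # heartbeats once per second from t=0 to t=10
        w.heartbeat(float(t))
    # A 20 second silence, e.g. a long GC pause on the peer:
    print(is_convicted(w, 30.0))                  # default threshold of 8
    print(is_convicted(w, 30.0, threshold=12.0))  # raised threshold
```

Under these assumptions, a peer that normally gossips once per second is convicted after roughly 18.4 seconds of silence at a threshold of 8, and raising the threshold to 12 extends that to roughly 27.6 seconds. A higher threshold tolerates longer pauses at the cost of slower detection of nodes that are genuinely down.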

We could get a better perspective on this with trace/debug level logs from the bootstrapping node and also a node marked down at the time of bootstrap.

> Fail to bootstrap new node.
> ---------------------------
>
>                 Key: CASSANDRA-12172
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12172
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Dikang Gu
>
> When I try to bootstrap a new node in the cluster, it sometimes fails with the following exceptions.
> {code}
> 2016-07-12_05:14:55.58509 INFO  05:14:55 [main]: JOINING: Starting to bootstrap...
> 2016-07-12_05:14:56.07491 INFO  05:14:56 [GossipTasks:1]: InetAddress /2401:db00:2011:50c7:face:0:9:0 is now DOWN
> 2016-07-12_05:14:56.32219 Exception (java.lang.RuntimeException) encountered during startup: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
> 2016-07-12_05:14:56.32582 ERROR 05:14:56 [main]: Exception encountered during startup
> 2016-07-12_05:14:56.32583 java.lang.RuntimeException: A node required to move the data consistently is down (/2401:db00:2011:50c7:face:0:9:0). If you wish to move the data from a potentially inconsistent replica, restart the node with -Dcassandra.consistent.rangemovement=false
> 2016-07-12_05:14:56.32584       at org.apache.cassandra.dht.RangeStreamer.getAllRangesWithStrictSourcesFor(RangeStreamer.java:264) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584       at org.apache.cassandra.dht.RangeStreamer.addRanges(RangeStreamer.java:147) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584       at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:82) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584       at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1230) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32584       at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:924) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:709) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585       at org.apache.cassandra.service.StorageService.initServer(StorageService.java:585) ~[apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32585       at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:300) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32586       at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32586       at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:625) [apache-cassandra-2.2.5+git20160315.c29948b.jar:2.2.5+git20160315.c29948b]
> 2016-07-12_05:14:56.32730 WARN  05:14:56 [StorageServiceShutdownHook]: No local state or state is in silent shutdown, not announcing shutdown
> {code}
> Here are more logs: https://gist.github.com/DikangGu/c6a83eafdbc091250eade4a3bddcc40b
> I'm pretty sure there are no DOWN or restarted nodes in the cluster, but I still see a lot of nodes flapping between UP and DOWN in the gossip log, which eventually caused the bootstrap to fail. Is this a known bug?


