You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Heiko Sommer (JIRA)" <ji...@apache.org> on 2015/11/12 11:28:10 UTC

[jira] [Created] (CASSANDRA-10693) Bootstrapping failure that at least needs improved logging

Heiko Sommer created CASSANDRA-10693:
----------------------------------------

             Summary: Bootstrapping failure that at least needs improved logging
                 Key: CASSANDRA-10693
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10693
             Project: Cassandra
          Issue Type: Bug
          Components: Streaming and Messaging
         Environment: Cassandra 2.1.11
Cluster of 2 or 3 xeon servers (8 core, 24 GB RAM each)
            Reporter: Heiko Sommer


I'm playing with a test cluster, trying to add a 3rd server while writing to the cluster at max rate (executeAsync with pushing back). Single table for time series data, with RF=2, STCS.

The bootstrap failure that I repeatedly get may be the same as in CASSANDRA-9440 (or also CASSANDRA-6565).
Even if I could join the 3rd node using temporarily modified configuation ({{streaming_socket_timeout_in_ms}}, {{phi_convict_threshold}} etc), I think that at least improving the logs for this failure case could be worthwhile.

When starting the new node (134.171.18.228), the underlying problem seems to happen very early in the bootstrapping, on one of the existing cluster nodes (134.171.18.231):
{noformat}
INFO  [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:109 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Creating new streaming plan for Bootstrap
INFO  [STREAM-INIT-/134.171.18.228:45331] 2015-11-11 16:21:30,344 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap
INFO  [STREAM-INIT-/134.171.18.228:45333] 2015-11-11 16:21:30,350 StreamResultFuture.java:116 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Received streaming plan for Bootstrap

ERROR [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,505 StreamSession.java:505 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Streaming error occurred
java.lang.AssertionError: Illegal bounds [2270144..2270152); size: 2270144
	at org.apache.cassandra.io.util.Memory.checkBounds(Memory.java:328) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.io.util.SafeMemory.checkBounds(SafeMemory.java:98) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.io.util.Memory.getLong(Memory.java:249) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.io.compress.CompressionMetadata.getChunksForSections(CompressionMetadata.java:253) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.messages.OutgoingFileMessage.<init>(OutgoingFileMessage.java:70) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.StreamTransferTask.addTransferFile(StreamTransferTask.java:58) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.StreamSession.addTransferFiles(StreamSession.java:378) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.StreamSession.addTransferRanges(StreamSession.java:272) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.StreamSession.prepare(StreamSession.java:521) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:453) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256) ~[apache-cassandra-2.1.11.jar:2.1.11]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
INFO  [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,510 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.228 is complete
WARN  [STREAM-IN-/134.171.18.228] 2015-11-11 16:21:41,511 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed
{noformat}

On the joining node, this failure is not reported at all:
{noformat}
INFO  [main] 2015-11-11 16:21:30,000 StorageService.java:1120 - JOINING: Starting to bootstrap...
INFO  [main] 2015-11-11 16:21:30,336 StreamResultFuture.java:86 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Executing streaming plan for Bootstrap
INFO  [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.229
INFO  [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,337 StreamSession.java:220 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Starting streaming to /134.171.18.231
INFO  [StreamConnectionEstablisher:1] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.231
INFO  [StreamConnectionEstablisher:2] 2015-11-11 16:21:30,348 StreamCoordinator.java:209 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc, ID#0] Beginning stream session with /134.171.18.229
INFO  [STREAM-IN-/134.171.18.231] 2015-11-11 16:21:41,508 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.231 is complete
INFO  [STREAM-IN-/134.171.18.229] 2015-11-11 16:21:42,639 StreamResultFuture.java:166 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc ID#0] Prepare completed. Receiving 66 files(120500434815 bytes), sending 0 files(0 bytes)
{noformat}

However, ~9 hours later, the bootstrapping fails, with logs:
{noformat}
INFO  [StreamReceiveTask:8] 2015-11-12 01:25:26,837 StreamResultFuture.java:180 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Session with /134.171.18.229 is complete
WARN  [StreamReceiveTask:8] 2015-11-12 01:25:26,839 StreamResultFuture.java:207 - [Stream #43deef20-8890-11e5-a633-c16ac7efddbc] Stream failed
ERROR [main] 2015-11-12 01:25:26,843 CassandraDaemon.java:579 - Exception encountered during startup
java.lang.RuntimeException: Error during boostrap: Stream failed
        at org.apache.cassandra.dht.BootStrapper.bootstrap(BootStrapper.java:86) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1148) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:938) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:734) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:611) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:387) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:562) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:651) [apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.streaming.StreamSession.maybeCompleted(StreamSession.java:688) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.streaming.StreamSession.taskCompleted(StreamSession.java:651) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:139) ~[apache-cassandra-2.1.11-SNAPSHOT.jar:2.1.11-SNAPSHOT]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_40]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_40]
WARN  [StorageServiceShutdownHook] 2015-11-12 01:25:26,855 Gossiper.java:1453 - No local state or state is in silent shutdown, not announcing shutdown
{noformat}

I hacked {{StreamState#hasFailedSession()}} to log all failed sessions, and it showed exactly one to the node that had the "AssertionError: Illegal bounds" before. 

It would be great to get some help with the following questions:
* What could cause the "AssertionError: Illegal bounds"? I'm now running a cleanup just in case and will re-run later.
* Is the streaming failure with one of the existing cluster nodes considered so unimportant (because other nodes are supposed to cover this) that it should not be logged on the bootstrapping node? In that case the failure should not let the bootstrapping fail, I suppose.
* Is it normal that bootstrapping can take 9 hours, or am I doing something fundamentally wrong?
* OpsCenter 5.2.2 shows my new node for the whole time under "Unknown Datacenter", even though "nodetool status" shows it as joining the existing datacenter. Is that an indication of a problem in my setup, or just the way that OpsCenter displays joining nodes?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)