Posted to commits@cassandra.apache.org by "Kurt Greaves (JIRA)" <ji...@apache.org> on 2018/07/02 08:23:00 UTC

[jira] [Commented] (CASSANDRA-14525) streaming failure during bootstrap makes new node into inconsistent state

    [ https://issues.apache.org/jira/browse/CASSANDRA-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16529525#comment-16529525 ] 

Kurt Greaves commented on CASSANDRA-14525:
------------------------------------------

{quote}Also I've discovered another bug exists in current open source code in which if isSurveyMode is true and streaming fails (i.e. isBootstrapMode is true) then also one can call nodetool join without nodetool bootstrap resume and have that node join the ring.
{quote}
Great catch. I found a couple more small issues w.r.t. {{nodetool join}} as well while I was testing this.
 # If in write_survey and you join the ring after bootstrap, transports won't be enabled. Can we call {{CassandraDaemon#start()}} here?
 # {{nodetool join}} fails silently if write_survey is true and we haven't completed bootstrapping, but the server log prints the following:
{code:java}
WARN [RMI TCP Connection(5)-127.0.0.1] 2018-06-29 12:39:49,735 StorageService.java:1008 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
{code}
nodetool join should say something along the lines of "{{Can't join the ring because in write_survey mode and bootstrap hasn't completed}}"
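
A minimal sketch of the kind of check meant here, assuming the join path can see both flags. {{JoinRingCheck}}, {{refusalReason}} and {{joinRing}} are hypothetical names for illustration, not existing Cassandra methods, and only the write_survey case discussed above is covered:

```java
// Hypothetical sketch: none of these names exist in Cassandra.
final class JoinRingCheck {

    static final String SURVEY_NOT_BOOTSTRAPPED =
        "Can't join the ring because in write_survey mode and bootstrap hasn't completed";

    /** Returns null if joining may proceed, otherwise the message the operator should see. */
    static String refusalReason(boolean writeSurveyMode, boolean bootstrapComplete) {
        if (writeSurveyMode && !bootstrapComplete) {
            return SURVEY_NOT_BOOTSTRAPPED;
        }
        return null;
    }

    /** Throws instead of only logging a WARN, so the caller is not left with a silent failure. */
    static void joinRing(boolean writeSurveyMode, boolean bootstrapComplete) {
        String reason = refusalReason(writeSurveyMode, bootstrapComplete);
        if (reason != null) {
            throw new IllegalStateException(reason);
        }
        // ... the real join logic would go here (omitted in this sketch) ...
    }
}
```

Whether the error would actually reach the nodetool user this way depends on how the exception is propagated over JMX, which this sketch does not model.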

Another minor nit w.r.t. logging: you can get the following log message after successfully bootstrapping if you were in write_survey mode:
{code:java}
INFO [main] 2018-06-29 12:12:39,071 CassandraDaemon.java:479 - Not starting client transports as bootstrap has not completed
{code}
Probably better to split the {{if}} block in {{CassandraDaemon.start()}} so that we print "{{Not starting client transports as write_survey mode is enabled.}}"
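
Roughly what the split could look like, as a sketch only. {{TransportStartupMessage}} and {{skipReason}} are hypothetical names, and checking the bootstrap state before write_survey is an assumption made for this illustration, not something the current code is known to do:

```java
// Hypothetical sketch: models only the choice of log message, not the real start() method.
final class TransportStartupMessage {

    static final String BOOTSTRAP_INCOMPLETE =
        "Not starting client transports as bootstrap has not completed";
    static final String WRITE_SURVEY =
        "Not starting client transports as write_survey mode is enabled.";

    /** Returns the reason transports are skipped, or null if they may start. */
    static String skipReason(boolean writeSurveyMode, boolean bootstrapComplete) {
        if (!bootstrapComplete) {
            // Bootstrap really has not finished, so the existing message is accurate.
            return BOOTSTRAP_INCOMPLETE;
        }
        if (writeSurveyMode) {
            // Bootstrap succeeded; transports are held back only because of write_survey.
            return WRITE_SURVEY;
        }
        return null;
    }
}
```

With two separate branches, a node that bootstrapped successfully in write_survey mode no longer reports that bootstrap has not completed.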

And finally, there are still 2 occurrences of the misspelling "bootstraped" in the exception messages in {{startNativeTransport}} and {{startRPCServer}}.

> streaming failure during bootstrap makes new node into inconsistent state
> -------------------------------------------------------------------------
>
>                 Key: CASSANDRA-14525
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14525
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Major
>             Fix For: 4.0, 2.2.x, 3.0.x
>
>
> If bootstrap fails for a newly joining node (most commonly due to a streaming failure), Cassandra remains in the {{joining}} state, which is fine, but it also enables native transport, which makes the overall state inconsistent. This in turn causes a NullPointerException if auth is enabled on the new node. Steps to reproduce:
> For example, if bootstrap fails due to streaming errors like
> {quote}java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
>  at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1256) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:894) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:660) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.StorageService.initServer(StorageService.java:573) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:330) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:567) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:695) [apache-cassandra-3.0.16.jar:3.0.16]
>  Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
>  at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
>  at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
>  at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:440) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:540) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:307) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:79) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_121]
> {quote}
> then the variable [StorageService.java::dataAvailable |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L892] will be {{false}}. Since {{dataAvailable}} is {{false}}, it will not call [StorageService.java::finishJoiningRing |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L933] and as a result [StorageService.java::doAuthSetup|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L999] will not be invoked.
> The API [StorageService.java::joinTokenRing |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/StorageService.java#L763] returns without any problem. After this, [CassandraDaemon.java::start|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L584] is invoked, which starts native transport at
>  [CassandraDaemon.java::startNativeTransport |https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/service/CassandraDaemon.java#L478]
> At this point the daemon’s bootstrap is still not finished, yet the transport is enabled. So a client will connect to the node and will encounter {{java.lang.NullPointerException}} as follows:
> {quote}ERROR [SharedPool-Worker-2] Message.java:647 - Unexpected exception during request; channel = [id: 0x412a26b3, L:/a.b.c.d:9042 - R:/p.q.r.s:20121]
>  java.lang.NullPointerException: null
>  at org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:160) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:82) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:198) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) ~[apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:535) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:429) [apache-cassandra-3.0.16.jar:3.0.16]
>  at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.ChannelHandlerInvokerUtil.invokeChannelReadNow(ChannelHandlerInvokerUtil.java:83) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at io.netty.channel.DefaultChannelHandlerInvoker$7.run(DefaultChannelHandlerInvoker.java:159) [netty-all-4.1.0.CR6.jar:4.1.0.CR6]
>  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_121]
>  at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) [apache-cassandra-3.0.16.jar:3.0.16]
>  at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [apache-cassandra-3.0.16.jar:3.0.16]
>  at java.lang.Thread.run(Thread.java:745) [na:1.8.0_121]
> {quote}
> At this point, if we run {{nodetool status}}, it will show this new node in {{UJ}} state; however, clients can connect to this node over {{CQL}} and will receive {{java.lang.NullPointerException}}.
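
The sequence described in the issue can be modelled with a toy class. This is a simplified sketch, not Cassandra code: {{BootstrapFlowSketch}} and its fields are hypothetical, {{startAsReported}} mirrors the behaviour described in the report, and {{startGuarded}} is only one assumed way of closing the gap.

```java
// Toy model of the reported startup sequence; all names here are hypothetical.
final class BootstrapFlowSketch {
    private boolean authSetupDone = false;
    private boolean nativeTransportRunning = false;

    /** Returns normally even when streaming failed (dataAvailable == false). */
    void joinTokenRing(boolean dataAvailable) {
        if (dataAvailable) {
            finishJoiningRing();
        }
    }

    private void finishJoiningRing() {
        authSetupDone = true; // stands in for doAuthSetup()
    }

    /** Behaviour as described in the report: transport starts regardless of bootstrap state. */
    void startAsReported() {
        nativeTransportRunning = true;
    }

    /** Assumed alternative: start the transport only once joining has finished. */
    void startGuarded() {
        if (authSetupDone) {
            nativeTransportRunning = true;
        }
    }

    /** True when clients can connect although auth was never set up. */
    boolean isInconsistent() {
        return nativeTransportRunning && !authSetupDone;
    }

    public static void main(String[] args) {
        BootstrapFlowSketch reported = new BootstrapFlowSketch();
        reported.joinTokenRing(false); // streaming failed
        reported.startAsReported();
        System.out.println("reported flow inconsistent: " + reported.isInconsistent());

        BootstrapFlowSketch guarded = new BootstrapFlowSketch();
        guarded.joinTokenRing(false); // streaming failed
        guarded.startGuarded();
        System.out.println("guarded flow inconsistent: " + guarded.isInconsistent());
    }
}
```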


