You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by KZ Win <kz...@pelotoncycle.com> on 2015/07/03 02:01:55 UTC

joining a node caused loads on some existing nodes to skyrocket

We had six node clusters and when we attempted to join a node to this, cpu
load on two gradually climbed to abnormally high number.   Stopping the
join and shutting down cassandra on two high-load nodes restored the
cluster health (we have RF=3)

Anyone have any insight on this cassandra behavior?  We have done node join
many times before; most recent was just 4 days before.  The

The following unusual messages in the relevant time period for two nodes.
We are using cassandra 2.0.10


Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR [GossipStage:1]
CassandraDaemon.java (line 199) Exception in thread
Thread[GossipStage:1,5,main]

Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException

Jun 30 16:47:30 cass-22.pelotime.com     at
org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301)

Jun 30 16:47:30 cass-22.pelotime.com     at
org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)

Jun 30 16:47:30 cass-22.pelotime.com     at
org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)

Jun 30 16:47:30 cass-22.pelotime.com     at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)

Jun 30 16:47:30 cass-22.pelotime.com     at
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)

Jun 30 16:47:30 cass-22.pelotime.com     at
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)

Jun 30 16:47:30 cass-22.pelotime.com     at java.lang.Thread.run(Unknown
Source)

Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO [GossipStage:2]
Gossiper.java (line 910) Node /10.0.251.77 is now part of the cluster

Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77

Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO
[RequestResponseStage:138] Gossiper.java (line 876) InetAddress /10.0.251.77
is now UP

Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO [GossipStage:2]
Gossiper.java (line 890) InetAddress /10.0.251.77 is now DOWN

Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77

Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [GossipTasks:1]
Gossiper.java (line 658) FatClient /10.0.251.77 has been silent for
30000ms, removing from gossip

Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve



Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77

Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO
[RequestResponseStage:26] Gossiper.java (line 876) InetAddress /10.0.251.77
is now UP

Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileChannelImpl.transferTo0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.SocketDispatcher.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.write(Unknown
Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.SocketChannelImpl.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.SocketDispatcher.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at sun.nio.ch.IOUtil.write(Unknown
Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.SocketChannelImpl.write(Unknown Source)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)

Jun 30 16:50:52 cass-24.pelotime.com     at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)

Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
Source)

Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred

Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe

Jun 30 16:50:52 cass-24.pelotime.com     at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)

Re: joining a node caused loads on some existing nodes to skyrocket

Posted by Sebastian Estevez <se...@datastax.com>.
Hi KZ,

We're you monitoring tpstats and netstats during the bootstrap? This could
give us some more information.

Your logs look a little like 8318 but there would have to be another error
earlier in the logs and the ticket is for a node replacement not a fresh
bootstrap... Can you check for an error earlier in the logs?

It might also make sense to check your system.peers for a rogue entry after
this failed bootstrap. If necessary, remove it.

Was there anything interesting around the same time in the logs of the
nodes that got hot?
On Jul 2, 2015 8:02 PM, "KZ Win" <kz...@pelotoncycle.com> wrote:

> We had six node clusters and when we attempted to join a node to this, cpu
> load on two gradually climbed to abnormally high number.   Stopping the
> join and shutting down cassandra on two high-load nodes restored the
> cluster health (we have RF=3)
>
> Anyone have any insight on this cassandra behavior?  We have done node
> join many times before; most recent was just 4 days before.  The
>
> The following unusual messages in the relevant time period for two nodes.
> We are using cassandra 2.0.10
>
>
> Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR
> [GossipStage:1] CassandraDaemon.java (line 199) Exception in thread
> Thread[GossipStage:1,5,main]
>
> Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com     at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO
> [GossipStage:2] Gossiper.java (line 910) Node /10.0.251.77 is now part of
> the cluster
>
> Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO
> [RequestResponseStage:138] Gossiper.java (line 876) InetAddress /
> 10.0.251.77 is now UP
>
> Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO
> [GossipStage:2] Gossiper.java (line 890) InetAddress /10.0.251.77 is now
> DOWN
>
> Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO
> [GossipTasks:1] Gossiper.java (line 658) FatClient /10.0.251.77 has been
> silent for 30000ms, removing from gossip
>
> Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve
>
>
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO
> [RequestResponseStage:26] Gossiper.java (line 876) InetAddress /
> 10.0.251.77 is now UP
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.SocketDispatcher.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.IOUtil.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.SocketDispatcher.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.IOUtil.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com     at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com     at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
>
>