You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@cassandra.apache.org by KZ Win <kz...@pelotoncycle.com> on 2015/07/03 02:01:55 UTC
joining a node caused loads on some existing nodes to skyrocket
We had six node clusters and when we attempted to join a node to this, cpu
load on two gradually climbed to abnormally high number. Stopping the
join and shutting down cassandra on two high-load nodes restored the
cluster health (we have RF=3)
Anyone have any insight on this cassandra behavior? We have done node join
many times before; most recent was just 4 days before. The
The following unusual messages in the relevant time period for two nodes.
We are using cassandra 2.0.10
Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR [GossipStage:1]
CassandraDaemon.java (line 199) Exception in thread
Thread[GossipStage:1,5,main]
Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException
Jun 30 16:47:30 cass-22.pelotime.com at
org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301)
Jun 30 16:47:30 cass-22.pelotime.com at
org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)
Jun 30 16:47:30 cass-22.pelotime.com at
org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
Jun 30 16:47:30 cass-22.pelotime.com at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
Jun 30 16:47:30 cass-22.pelotime.com at
java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
Jun 30 16:47:30 cass-22.pelotime.com at
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
Jun 30 16:47:30 cass-22.pelotime.com at java.lang.Thread.run(Unknown
Source)
Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO [GossipStage:2]
Gossiper.java (line 910) Node /10.0.251.77 is now part of the cluster
Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77
Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO
[RequestResponseStage:138] Gossiper.java (line 876) InetAddress /10.0.251.77
is now UP
Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO [GossipStage:2]
Gossiper.java (line 890) InetAddress /10.0.251.77 is now DOWN
Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77
Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [GossipTasks:1]
Gossiper.java (line 658) FatClient /10.0.251.77 has been silent for
30000ms, removing from gossip
Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve
Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77
Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO
[RequestResponseStage:26] Gossiper.java (line 876) InetAddress /10.0.251.77
is now UP
Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version with
/10.0.251.77
Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
Source)
Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.SocketDispatcher.write(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at sun.nio.ch.IOUtil.write(Unknown
Source)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
Source)
Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.SocketDispatcher.write(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at sun.nio.ch.IOUtil.write(Unknown
Source)
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.SocketChannelImpl.write(Unknown Source)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
Jun 30 16:50:52 cass-24.pelotime.com at
org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
Source)
Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
10.0.251.77] StreamSession.java (line 454) [Stream
#5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
Jun 30 16:50:52 cass-24.pelotime.com at
sun.nio.ch.FileDispatcherImpl.write0(Native Method)
Re: joining a node caused loads on some existing nodes to skyrocket
Posted by Sebastian Estevez <se...@datastax.com>.
Hi KZ,
We're you monitoring tpstats and netstats during the bootstrap? This could
give us some more information.
Your logs look a little like 8318 but there would have to be another error
earlier in the logs and the ticket is for a node replacement not a fresh
bootstrap... Can you check for an error earlier in the logs?
It might also make sense to check your system.peers for a rogue entry after
this failed bootstrap. If necessary, remove it.
Was there anything interesting around the same time in the logs of the
nodes that got hot?
On Jul 2, 2015 8:02 PM, "KZ Win" <kz...@pelotoncycle.com> wrote:
> We had six node clusters and when we attempted to join a node to this, cpu
> load on two gradually climbed to abnormally high number. Stopping the
> join and shutting down cassandra on two high-load nodes restored the
> cluster health (we have RF=3)
>
> Anyone have any insight on this cassandra behavior? We have done node
> join many times before; most recent was just 4 days before. The
>
> The following unusual messages in the relevant time period for two nodes.
> We are using cassandra 2.0.10
>
>
> Jun 30 16:47:30 cass-22.pelotime.com cassandra-serverERROR
> [GossipStage:1] CassandraDaemon.java (line 199) Exception in thread
> Thread[GossipStage:1,5,main]
>
> Jun 30 16:47:30 cass-22.pelotime.com java.lang.NullPointerException
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> org.apache.cassandra.gms.Gossiper.convict(Gossiper.java:301)
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> org.apache.cassandra.gms.FailureDetector.forceConviction(FailureDetector.java:251)
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> org.apache.cassandra.gms.GossipShutdownVerbHandler.doVerb(GossipShutdownVerbHandler.java:37)
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62)
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:47:30 cass-22.pelotime.com cassandra-server INFO
> [GossipStage:2] Gossiper.java (line 910) Node /10.0.251.77 is now part of
> the cluster
>
> Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:47:35 cass-22.pelotime.com cassandra-server INFO
> [RequestResponseStage:138] Gossiper.java (line 876) InetAddress /
> 10.0.251.77 is now UP
>
> Jun 30 16:47:38 cass-22.pelotime.com cassandra-server INFO
> [GossipStage:2] Gossiper.java (line 890) InetAddress /10.0.251.77 is now
> DOWN
>
> Jun 30 16:48:02 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO
> [GossipTasks:1] Gossiper.java (line 658) FatClient /10.0.251.77 has been
> silent for 30000ms, removing from gossip
>
> Jun 30 16:48:05 cass-22.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking ve
>
>
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO
> [RequestResponseStage:26] Gossiper.java (line 876) InetAddress /
> 10.0.251.77 is now UP
>
> Jun 30 16:48:59 cass-24.pelotime.com cassandra-server INFO [HANDSHAKE-/
> 10.0.251.77] OutboundTcpConnection.java (line 386) Handshaking version
> with /10.0.251.77
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:74)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:59)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:42)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.SocketDispatcher.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.IOUtil.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.SocketDispatcher.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.IOUtil.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.SocketChannelImpl.write(Unknown Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
>
> Jun 30 16:50:52 cass-24.pelotime.com at java.lang.Thread.run(Unknown
> Source)
>
> Jun 30 16:50:52 cass-24.pelotime.com cassandra-serverERROR [STREAM-OUT-/
> 10.0.251.77] StreamSession.java (line 454) [Stream
> #5f2251e0-1f69-11e5-94c0-d9033a25abe9] Streaming error occurred
>
> Jun 30 16:50:52 cass-24.pelotime.com java.io.IOException: Broken pipe
>
> Jun 30 16:50:52 cass-24.pelotime.com at
> sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>
>
>