Posted to user@cassandra.apache.org by George Sigletos <si...@textkernel.nl> on 2016/06/02 12:36:55 UTC

Re: Error while rebuilding a node: Stream failed

I gave up completely with rebuild.

Now I am running `nodetool repair`, and when network issues interrupt it I
retry only the token ranges that failed, using the -st and -et options of
`nodetool repair`.

That will be good enough for now, until we fix our network problems.
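
A minimal sketch of the retry described above, assuming the failed range does
not wrap around the ring (start token lower than end token). It only prints
one `nodetool repair -st ... -et ...` command per subrange, so each piece can
be run and retried separately. The function name `split_repair_range`, the
keyspace `my_keyspace` and the token values are hypothetical placeholders.
Splitting a failed range into smaller pieces is an assumption here, not
something the thread confirms helps.

```shell
#!/usr/bin/env bash
# Print (do not run) one repair command per subrange of a failed token range.
# Arguments: start token, end token, number of pieces, keyspace.
# All four are illustrative inputs; take the real tokens from the repair log.
split_repair_range() {
  local start=$1 end=$2 pieces=$3 keyspace=$4
  # Divide before subtracting so that even the full 64-bit token range
  # does not overflow the shell's signed 64-bit arithmetic.
  local step=$(( end / pieces - start / pieces ))
  local st=$start et i
  for (( i = 1; i <= pieces; i++ )); do
    if (( i == pieces )); then
      et=$end                 # last piece always ends exactly at the end token
    else
      et=$(( st + step ))
    fi
    echo "nodetool repair -st $st -et $et $keyspace"
    st=$et                    # next subrange starts where this one ended
  done
}

# Example: split the hypothetical failed range (0, 3000] into three pieces.
split_repair_range 0 3000 3 my_keyspace
```

The printed commands can be reviewed first and then run one at a time, so a
network drop only costs the subrange that was in flight.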

On Sat, May 28, 2016 at 7:05 PM, George Sigletos <si...@textkernel.nl>
wrote:

> No luck, unfortunately. It seems that the connection to the destination
> node was lost.
>
> However, there was progress compared to the previous attempts: a lot more
> data was streamed.
>
> (From source node)
> INFO  [GossipTasks:1] 2016-05-28 17:53:57,155 Gossiper.java:1008 -
> InetAddress /54.172.235.227 is now DOWN
> INFO  [HANDSHAKE-/54.172.235.227] 2016-05-28 17:53:58,238
> OutboundTcpConnection.java:487 - Handshaking version with /54.172.235.227
> ERROR [STREAM-IN-/54.172.235.227] 2016-05-28 17:54:08,938
> StreamSession.java:505 - [Stream #d25a05c0-241f-11e6-bb50-1b05ac77baf9]
> Streaming error occurred
> java.io.IOException: Connection timed out
>         at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> ~[na:1.7.0_79]
>         at sun.nio.ch.SocketDispatcher.read(Unknown Source) ~[na:1.7.0_79]
>         at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
> ~[na:1.7.0_79]
>         at sun.nio.ch.IOUtil.read(Unknown Source) ~[na:1.7.0_79]
>         at sun.nio.ch.SocketChannelImpl.read(Unknown Source) ~[na:1.7.0_79]
>         at sun.nio.ch.SocketAdaptor$SocketInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
>         at sun.nio.ch.ChannelInputStream.read(Unknown Source)
> ~[na:1.7.0_79]
>         at java.nio.channels.Channels$ReadableByteChannelImpl.read(Unknown
> Source) ~[na:1.7.0_79]
>         at
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:51)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:257)
> ~[apache-cassandra-2.1.14.jar:2.1.14]
>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
> INFO  [SharedPool-Worker-1] 2016-05-28 17:54:59,612 Gossiper.java:993 -
> InetAddress /54.172.235.227 is now UP
>
> On Fri, May 27, 2016 at 5:37 PM, George Sigletos <si...@textkernel.nl>
> wrote:
>
>> I am trying once more, this time with more aggressive TCP keep-alive
>> settings, as recommended here
>> <https://docs.datastax.com/en/cassandra/2.1/cassandra/troubleshooting/trblshootIdleFirewall.html>
>>
>> sudo sysctl -w net.ipv4.tcp_keepalive_time=60 net.ipv4.tcp_keepalive_probes=3 net.ipv4.tcp_keepalive_intvl=10
>>
>> (added to /etc/sysctl.conf, then ran sysctl -p /etc/sysctl.conf on all
>> nodes)
>>
>> Let's see what happens. I don't know what else to try. I have even
>> further increased streaming_socket_timeout_in_ms
>>
>>
>>
>> On Fri, May 27, 2016 at 4:56 PM, Paulo Motta <pa...@gmail.com>
>> wrote:
>>
>>> I'm afraid raising streaming_socket_timeout_in_ms won't help much in
>>> this case, because the incoming connection on the source node is timing
>>> out at the network layer. streaming_socket_timeout_in_ms controls the
>>> socket timeout at the app layer, which throws SocketTimeoutException (not
>>> java.io.IOException: Connection timed out). So you should probably use
>>> more aggressive TCP keep-alive settings (net.ipv4.tcp_keepalive_*) on both
>>> hosts. Did you try tuning that? Even that might not be sufficient, as some
>>> routers tend to ignore TCP keep-alives and just kill idle connections.
>>>
>>> As said before, this will ultimately be fixed by adding keep-alive to
>>> the app layer in CASSANDRA-11841. If tuning TCP keep-alives does not help,
>>> one extreme approach would be to backport this to 2.1 (unless some
>>> experienced operator out there has a more creative approach).
>>>
>>> @eevans, I'm not sure he is using a mixed-version cluster; it seems he
>>> finished the upgrade from 2.1.13 to 2.1.14 before performing the rebuild.
>>>
>>> 2016-05-27 11:39 GMT-03:00 Eric Evans <jo...@gmail.com>:
>>>
>>>> From the various stacktraces in this thread, it's obvious you are
>>>> mixing versions 2.1.13 and 2.1.14.  Topology changes like this aren't
>>>> supported with mixed Cassandra versions.  Sometimes it will work,
>>>> sometimes it won't (and it will definitely not work in this instance).
>>>>
>>>> You should either upgrade your 2.1.13 nodes to 2.1.14 first, or add
>>>> the new nodes using 2.1.13, and upgrade after.
>>>>
>>>> On Fri, May 27, 2016 at 8:41 AM, George Sigletos <
>>>> sigletos@textkernel.nl> wrote:
>>>>
>>>> >>>> ERROR [STREAM-IN-/192.168.1.141] 2016-05-26 09:08:05,027
>>>> >>>> StreamSession.java:505 - [Stream
>>>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>>> >>>> Streaming error occurred
>>>> >>>> java.lang.RuntimeException: Outgoing stream handler has been closed
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:138)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:568)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:457)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:263)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at java.lang.Thread.run(Unknown Source) [na:1.7.0_79]
>>>> >>>>
>>>> >>>> And this is from the source node:
>>>> >>>>
>>>> >>>> ERROR [STREAM-OUT-/172.31.22.104] 2016-05-26 11:08:05,097
>>>> >>>> StreamSession.java:505 - [Stream
>>>> #74c57bc0-231a-11e6-a698-1b05ac77baf9]
>>>> >>>> Streaming error occurred
>>>> >>>> java.io.IOException: Broken pipe
>>>> >>>>         at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
>>>> >>>> ~[na:1.7.0_79]
>>>> >>>>         at sun.nio.ch.FileChannelImpl.transferToDirectly(Unknown
>>>> Source)
>>>> >>>> ~[na:1.7.0_79]
>>>> >>>>         at sun.nio.ch.FileChannelImpl.transferTo(Unknown Source)
>>>> >>>> ~[na:1.7.0_79]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.compress.CompressedStreamWriter.write(CompressedStreamWriter.java:84)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage.serialize(OutgoingFileMessage.java:88)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:49)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.messages.OutgoingFileMessage$1.serialize(OutgoingFileMessage.java:41)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:45)
>>>> >>>> ~[apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:358)
>>>> >>>> [apache-cassandra-2.1.14.jar:2.1.14]
>>>> >>>>         at
>>>> >>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:330)
>>>> >>>> [apache-cassandra-2.1.14.jar:2.1.14]
>>>>
>>>>
>>>> >>>>>>>>>>> ERROR [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:57,704
>>>> >>>>>>>>>>> StreamSession.java:620 - [Stream
>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> >>>>>>>>>>> Remote peer 192.168.1.140 failed stream session.
>>>> >>>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:57,705
>>>> >>>>>>>>>>> StreamSession.java:505 - [Stream
>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> >>>>>>>>>>> Streaming error occurred
>>>> >>>>>>>>>>> java.io.IOException: Connection timed out
>>>> >>>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native
>>>> Method)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown
>>>> >>>>>>>>>>> Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown
>>>> Source)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>> >>>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:323)
>>>> >>>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at java.lang.Thread.run(Unknown Source)
>>>> [na:1.7.0_79]
>>>> >>>>>>>>>>> INFO  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,625
>>>> >>>>>>>>>>> StreamResultFuture.java:180 - [Stream
>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> >>>>>>>>>>> Session with /192.168.1.140 is complete
>>>> >>>>>>>>>>> WARN  [STREAM-IN-/192.168.1.140] 2016-05-24 22:44:58,627
>>>> >>>>>>>>>>> StreamResultFuture.java:207 - [Stream
>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> >>>>>>>>>>> Stream failed
>>>> >>>>>>>>>>> ERROR [RMI TCP Connection(24)-127.0.0.1] 2016-05-24
>>>> 22:44:58,628
>>>> >>>>>>>>>>> StorageService.java:1075 - Error while rebuilding node
>>>> >>>>>>>>>>> org.apache.cassandra.streaming.StreamException: Stream
>>>> failed
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> com.google.common.util.concurrent.Futures$4.run(Futures.java:1172)
>>>> >>>>>>>>>>> ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>>>> >>>>>>>>>>> ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>>>> >>>>>>>>>>> ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>>>> >>>>>>>>>>> ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>>>> >>>>>>>>>>> ~[guava-16.0.jar:na]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:208)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:184)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:415)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.StreamSession.sessionFailed(StreamSession.java:621)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:475)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:256)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at java.lang.Thread.run(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>> ERROR [STREAM-OUT-/192.168.1.140] 2016-05-24 22:44:58,629
>>>> >>>>>>>>>>> StreamSession.java:505 - [Stream
>>>> #2c290460-20d4-11e6-930f-1b05ac77baf9]
>>>> >>>>>>>>>>> Streaming error occurred
>>>> >>>>>>>>>>> java.io.IOException: Broken pipe
>>>> >>>>>>>>>>>         at sun.nio.ch.FileDispatcherImpl.write0(Native
>>>> Method)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.SocketDispatcher.write(Unknown Source)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.IOUtil.writeFromNativeBuffer(Unknown
>>>> >>>>>>>>>>> Source) ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.IOUtil.write(Unknown Source)
>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at sun.nio.ch.SocketChannelImpl.write(Unknown
>>>> Source)
>>>> >>>>>>>>>>> ~[na:1.7.0_79]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.io.util.DataOutputStreamAndChannel.write(DataOutputStreamAndChannel.java:48)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
>>>> >>>>>>>>>>> ~[apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:351)
>>>> >>>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at
>>>> >>>>>>>>>>>
>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:331)
>>>> >>>>>>>>>>> [apache-cassandra-2.1.13.jar:2.1.13]
>>>> >>>>>>>>>>>         at java.lang.Thread.run(Unknown Source)
>>>> [na:1.7.0_79]
>>>>
>>>>
>>>>
>>>> --
>>>> Eric Evans
>>>> john.eric.evans@gmail.com
>>>>
>>>
>>>
>>
>