Posted to user@cassandra.apache.org by Jérôme Mainaud <je...@mainaud.com> on 2016/08/15 18:37:44 UTC

New node block in autobootstrap

Hello,

A client of mine has a problem adding a node to the cluster.
After 4 days, the node is still in joining mode, it doesn't have the same
level of load as the others, and there seems to be no streaming to or from
the new node.

This node has a history.

   1. In the beginning, it was a seed in the cluster.
   2. Ops detected that clients had problems with it.
   3. They tried to reset it but failed. In the process they launched
   several repair and rebuild processes on the node.
   4. Then they asked me to help them.
   5. We stopped the node,
   6. removed it from the list of seeds (more precisely it was replaced by
   another node),
   7. removed it from the cluster (I chose not to use decommission since
   node data was compromised)
   8. deleted all files from the data, commitlog and saved_caches directories.
   9. after the leaving process ended, it was started as a fresh new node
   and began autobootstrap.


As I don’t have direct access to the cluster, I don't have much
information yet, but I will have more tomorrow (logs and the results of some
commands). I can also ask people there for any required information.

Does anyone have an idea of what could have happened and what I should
investigate first?
What would you do to unblock the situation?

Context: The cluster consists of two DCs, each with 15 nodes. Average load
is around 3 TB per node. The joining node froze a little after 2 TB.

Thank you for your help.
Cheers,


-- 
Jérôme Mainaud
jerome@mainaud.com

Re: New node block in autobootstrap

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
>
> Forgot to set replication for new data center :(


I had a feeling that could be it :-). From the other thread:


> It should be run from the DC3 servers, after altering the keyspaces to add
> them to the new datacenter. Is this the way you're doing it?
>
>    - Are all the nodes using the same version ('nodetool version')?
>    - What does 'nodetool status keyspace_name1' output?
>    - Are you sure you are using NetworkTopologyStrategy on
>    'keyspace_name1'? Have you modified this schema to add replication
>    on DC3?
>
> My guess is something could be wrong with the configuration.
>


I was starting to wonder about this one though, so thanks for letting us
know about it :-).
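
For anyone hitting the same symptom, the check behind the last question can be
sketched in a few lines of Python. This is only an illustration: it assumes the
keyspace's replication settings have already been read into a dict (for
example, copied from the DESCRIBE KEYSPACE output), and the function name and
sample replication factors are hypothetical.

```python
def missing_datacenters(replication, expected_dcs):
    """Return the data centers in expected_dcs that have no replicas configured.

    `replication` mirrors a keyspace's replication map, e.g.
    {'class': 'NetworkTopologyStrategy', 'DC1': 3}.
    """
    strategy = replication.get("class", "")
    # Only NetworkTopologyStrategy carries per-data-center replication factors.
    if not strategy.endswith("NetworkTopologyStrategy"):
        return list(expected_dcs)
    return [dc for dc in expected_dcs if int(replication.get(dc, 0)) <= 0]


# Hypothetical keyspace that was never altered to include DC3.
replication = {"class": "NetworkTopologyStrategy", "DC1": 3, "DC2": 3}
print(missing_datacenters(replication, ["DC1", "DC2", "DC3"]))  # ['DC3']
```

A data center that shows up in this list holds no replicas for the keyspace,
which would be consistent with a rebuild that finishes without streaming
anything.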

C*heers,
-----------------------
Alain Rodriguez - @arodream - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-09-28 23:54 GMT+02:00 techpyaasa . <te...@gmail.com>:

> Forgot to set replication for new data center :(
>
> On Wed, Sep 28, 2016 at 11:33 PM, Jonathan Haddad <jo...@jonhaddad.com>
> wrote:
>
>> What was the reason?
>>
>> On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . <te...@gmail.com>
>> wrote:
>>
>>> Very sorry...I got the reason for this issue..
>>> Please ignore.
>>>
>>>
>>> On Wed, Sep 28, 2016 at 10:14 PM, techpyaasa . <te...@gmail.com>
>>> wrote:
>>>
>>>> @Paulo
>>>>
>>>> We made the changes you suggested:
>>>> net.ipv4.tcp_keepalive_time=60
>>>> net.ipv4.tcp_keepalive_probes=3
>>>> net.ipv4.tcp_keepalive_intvl=10
>>>>
>>>> and increased streaming_socket_timeout_in_ms to 48 hours, with
>>>> "phi_convict_threshold: 9".
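
As a rough reading of these values: on Linux, an idle connection gets its first
keepalive probe after tcp_keepalive_time seconds, and the peer is considered
dead after tcp_keepalive_probes unanswered probes sent tcp_keepalive_intvl
seconds apart. A small Python sketch of that arithmetic (the function name is
only illustrative):

```python
def keepalive_detection_seconds(time_s, probes, intvl_s):
    """Approximate seconds before Linux gives up on an idle, dead TCP peer."""
    # First probe after `time_s` of idleness, then `probes` unanswered probes
    # spaced `intvl_s` seconds apart.
    return time_s + probes * intvl_s


print(keepalive_detection_seconds(60, 3, 10))    # 90   (values quoted above)
print(keepalive_detection_seconds(7200, 9, 75))  # 7875 (Linux defaults)
```

With the values above, a probe goes out after 60 seconds of idleness, so the
connection should no longer look idle to a firewall whose idle timeout is
longer than that.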
>>>>
>>>> We then recommissioned the new data center (DC3) once again and ran
>>>> "nodetool rebuild 'DC1'", but this time NO data was streamed and
>>>> 'nodetool rebuild' exited without any exception.
>>>>
>>>> Please check the logs below:
>>>>
>>>> *INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571
>>>> StorageService.java (line 914) rebuild from dc: IDC*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520
>>>> StreamResultFuture.java (line 87) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.75*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.132*
>>>> * INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.75*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.133*
>>>> * INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.132*
>>>> * INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.133*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.167*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.78*
>>>> * INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.167*
>>>> * INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.78*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.126*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.191*
>>>> * INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.126*
>>>> * INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.191*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.168*
>>>> * INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527
>>>> StreamResultFuture.java (line 91) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with
>>>> /xxx.xxx.198.169*
>>>> * INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.168*
>>>> * INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528
>>>> StreamSession.java (line 214) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to
>>>> /xxx.xxx.198.169*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776
>>>> StreamResultFuture.java (line 186) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is
>>>> complete*
>>>> * INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778
>>>> StreamResultFuture.java (line 220) [Stream
>>>> #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed*
>>>>
>>>>
>>>> As you can see from the logs above, nodetool rebuild finished without
>>>> any data being streamed, and all streaming sessions completed within no
>>>> time (see the timestamps in the logs).
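
To put a number on "no time", the first and last timestamps of the quoted
rebuild log can be subtracted. A minimal Python sketch, assuming the timestamp
format shown in those log lines (the helper name is made up):

```python
from datetime import datetime


def elapsed_seconds(start, end, fmt="%Y-%m-%d %H:%M:%S,%f"):
    """Seconds between two timestamps written in the quoted log format."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# "rebuild from dc" line vs. "All sessions completed" line.
print(elapsed_seconds("2016-09-28 09:18:44,571", "2016-09-28 09:18:47,778"))
```

That is a little over three seconds for the whole rebuild, far too short for
any real amount of data to have been transferred.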
>>>>
>>>>
>>>> Also, "nodetool status" looks fine from the new nodes (from which I
>>>> ran 'nodetool rebuild').
>>>>
>>>> Please let us know what could be the issue here.
>>>>
>>>> Thanks in advance.
>>>>
>>>> On Wed, Sep 28, 2016 at 1:04 AM, Paulo Motta <pa...@gmail.com>
>>>> wrote:
>>>>
>>>>> Yeah this is likely to be caused by idle connections being shut down,
>>>>> so you may need to update your tcp_keepalive* and/or network/firewall
>>>>> settings.
>>>>>
>>>>>
>>>>> 2016-09-27 15:29 GMT-03:00 laxmikanth sadula <la...@gmail.com>
>>>>> :
>>>>>
>>>>>> Hi Paulo,
>>>>>>
>>>>>> Thanks for the reply...
>>>>>>
>>>>>> I'm getting the following streaming exceptions during nodetool rebuild in
>>>>>> c*-2.0.17:
>>>>>>
>>>>>> *04:24:49,759 StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.io.IOException: Connection timed out*
>>>>>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>>>>>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>>>>>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>>>>>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>>>>>> *    at
>>>>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>> *DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> ConnectionHandler.java (line 104) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on
>>>>>> /xxx.xxx.98.168*
>>>>>> * INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> StreamResultFuture.java (line 186) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is
>>>>>> complete*
>>>>>> *ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764
>>>>>> StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.io.IOException: Broken pipe*
>>>>>> *    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)*
>>>>>> *    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)*
>>>>>> *    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)*
>>>>>> *    at sun.nio.ch.IOUtil.write(IOUtil.java:65)*
>>>>>> *    at
>>>>>> sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>> *DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>>>>>> ConnectionHandler.java (line 244) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId:
>>>>>> 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys:
>>>>>> 4736, transfer size: 2306880, compressed?: true), file:
>>>>>> /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)*
>>>>>> *ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909
>>>>>> StreamSession.java (line 461) [Stream
>>>>>> #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
>>>>>> *java.lang.RuntimeException: Outgoing stream handler has been closed*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)*
>>>>>> *    at
>>>>>> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)*
>>>>>> *    at java.lang.Thread.run(Thread.java:745)*
>>>>>>
>>>>>> On Sep 27, 2016 11:48 PM, "Paulo Motta" <pa...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What type of streaming timeout are you getting? Do you have a stack
>>>>>>> trace? What version are you on?
>>>>>>>
>>>>>>> See more information about tuning tcp_keepalive* here:
>>>>>>> https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
>>>>>>>
>>>>>>> 2016-09-27 14:07 GMT-03:00 laxmikanth sadula <
>>>>>>> laxmikanth524@gmail.com>:
>>>>>>>
>>>>>>>> @Paulo Motta
>>>>>>>>
>>>>>>>> We are also facing streaming timeout exceptions during 'nodetool
>>>>>>>> rebuild'. I set streaming_socket_timeout_in_ms to 86400000 (24 hours) as
>>>>>>>> suggested in the DataStax article -
>>>>>>>> https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
>>>>>>>> , but we are still getting streaming exceptions.
>>>>>>>>
>>>>>>>> What are the recommended settings/values for the kernel
>>>>>>>> tcp_keepalive parameters that would help streaming succeed?
>>>>>>>>
>>>>>>>> Thank you
>>>>>>>>
>>>>>>>> On Tue, Aug 16, 2016 at 12:21 AM, Paulo Motta <
>>>>>>>> pauloricardomg@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> What version are you on? This seems like a typical case where there
>>>>>>>>> was a problem with streaming (hanging, etc.). Do you have access to the
>>>>>>>>> logs? Maybe look for streaming errors? Typically streaming errors are
>>>>>>>>> related to timeouts, so you should review your cassandra
>>>>>>>>> streaming_socket_timeout_in_ms and kernel tcp_keepalive settings.
>>>>>>>>>
>>>>>>>>> If you're on 2.2+ you can resume a failed bootstrap with nodetool
>>>>>>>>> bootstrap resume. There were also some streaming hanging problems fixed
>>>>>>>>> recently, so I'd advise you to upgrade to the latest version of your
>>>>>>>>> particular series for a more robust version.
>>>>>>>>>
>>>>>>>>> Is there any reason why you didn't use the replace procedure
>>>>>>>>> (-Dreplace_address) to replace the node with the same tokens? This would be
>>>>>>>>> a bit faster than the remove + bootstrap procedure.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Regards,
>>>>>>>> Laxmikanth
>>>>>>>> 99621 38051
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>

Re: New node block in autobootstrap

Posted by "techpyaasa ." <te...@gmail.com>.
Forgot to set replication for new data center :(
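
For reference, the missing step amounts to an ALTER KEYSPACE that lists the new
data center. A minimal Python sketch that builds such a statement; the data
center names and replication factors below are placeholders rather than the
values used on this cluster, and the helper name is made up:

```python
def alter_keyspace_cql(keyspace, dc_replication):
    """Build an ALTER KEYSPACE statement using NetworkTopologyStrategy.

    `dc_replication` maps data center names to replication factors.
    """
    options = ["'class': 'NetworkTopologyStrategy'"]
    options += ["'%s': %d" % (dc, rf) for dc, rf in dc_replication.items()]
    return "ALTER KEYSPACE %s WITH replication = {%s};" % (keyspace, ", ".join(options))


# Placeholder names and factors; adjust to the real topology.
print(alter_keyspace_cql("keyspace_name1", {"DC1": 3, "DC3": 3}))
```

Once the keyspace replicates to the new data center, 'nodetool rebuild' run
from the new nodes should have ranges to stream.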

>>>>>>>>> Jérôme Mainaud
>>>>>>>>> jerome@mainaud.com
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Laxmikanth
>>>>>>> 99621 38051
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>>

Re: New node block in autobootstrap

Posted by Jonathan Haddad <jo...@jonhaddad.com>.
What was the reason?

On Wed, Sep 28, 2016 at 9:58 AM techpyaasa . <te...@gmail.com> wrote:

> Very sorry... I found the reason for this issue.
> Please ignore.

Re: New node block in autobootstrap

Posted by "techpyaasa ." <te...@gmail.com>.
Very sorry... I found the reason for this issue.
Please ignore.



Re: New node block in autobootstrap

Posted by "techpyaasa ." <te...@gmail.com>.
@Paulo

We made the changes you suggested:
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

and increased streaming_socket_timeout_in_ms to 48 hours, with
phi_convict_threshold: 9.
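
As a small sanity check on the timeout value: streaming_socket_timeout_in_ms
is expressed in milliseconds, so "48 hours" has to be converted before it goes
into cassandra.yaml. The sketch below only does that arithmetic; the helper
name hours_to_ms is made up for illustration, and the 24-hour figure
(86400000) is the one quoted elsewhere in this thread.

```python
# Convert a timeout given in hours to the millisecond value expected by
# settings whose name ends in "_in_ms" (hypothetical helper, not a Cassandra API).
MS_PER_HOUR = 60 * 60 * 1000


def hours_to_ms(hours: int) -> int:
    """Return the number of milliseconds in the given number of hours."""
    return hours * MS_PER_HOUR


if __name__ == "__main__":
    # 24 hours is the value mentioned elsewhere in the thread, 48 hours the new one.
    for hours in (24, 48):
        print(f"{hours} hours = {hours_to_ms(hours)} ms")
```

If the value in cassandra.yaml does not match the result for 48 hours, the
timeout is not what was intended.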

We then recommissioned the new data center (DC3) once again and ran
"nodetool rebuild 'DC1'", but this time NO data was streamed and
'nodetool rebuild' exited without any exception.

Please check the logs below:

INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:44,571 StorageService.java (line 914) rebuild from dc: IDC
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 StreamResultFuture.java (line 87) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Executing streaming plan for Rebuild
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,521 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.75
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.132
INFO [StreamConnectionEstablisher:1] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.75
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,522 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.133
INFO [StreamConnectionEstablisher:2] 2016-09-28 09:18:47,522 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.132
INFO [StreamConnectionEstablisher:3] 2016-09-28 09:18:47,523 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.133
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,523 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.167
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.78
INFO [StreamConnectionEstablisher:4] 2016-09-28 09:18:47,524 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.167
INFO [StreamConnectionEstablisher:5] 2016-09-28 09:18:47,525 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.78
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,524 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.126
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,525 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.191
INFO [StreamConnectionEstablisher:6] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.126
INFO [StreamConnectionEstablisher:7] 2016-09-28 09:18:47,526 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.191
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,526 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.168
INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,527 StreamResultFuture.java (line 91) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Beginning stream session with /xxx.xxx.198.169
INFO [StreamConnectionEstablisher:8] 2016-09-28 09:18:47,527 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.168
INFO [StreamConnectionEstablisher:9] 2016-09-28 09:18:47,528 StreamSession.java (line 214) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Starting streaming to /xxx.xxx.198.169
INFO [STREAM-IN-/xxx.xxx.198.132] 2016-09-28 09:18:47,713 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.132 is complete
INFO [STREAM-IN-/xxx.xxx.198.191] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.191 is complete
INFO [STREAM-IN-/xxx.xxx.198.133] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.133 is complete
INFO [STREAM-IN-/xxx.xxx.198.169] 2016-09-28 09:18:47,716 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.169 is complete
INFO [STREAM-IN-/xxx.xxx.198.167] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.167 is complete
INFO [STREAM-IN-/xxx.xxx.198.126] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.126 is complete
INFO [STREAM-IN-/xxx.xxx.198.78] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.78 is complete
INFO [STREAM-IN-/xxx.xxx.198.168] 2016-09-28 09:18:47,715 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.168 is complete
INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,776 StreamResultFuture.java (line 186) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] Session with /xxx.xxx.198.75 is complete
INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 StreamResultFuture.java (line 220) [Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] All sessions completed


As you can see in the logs above, nodetool rebuild finished without any data
being streamed, and all streaming sessions completed in no time (see the
timestamps in the logs).
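
To put a number on "in no time", the timestamps can be compared directly.
The following is only a sketch: it takes two lines from the log above (each
joined onto one line), and the helper names log_time and elapsed are invented
for this example. It assumes the "YYYY-MM-DD HH:MM:SS,mmm" timestamp layout
seen in these logs.

```python
import re
from datetime import datetime, timedelta

# First and last StreamResultFuture lines of the rebuild, copied from the log above.
LOG_LINES = [
    "INFO [RMI TCP Connection(10)-xxx.xxx.12.140] 2016-09-28 09:18:47,520 "
    "StreamResultFuture.java (line 87) "
    "[Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] "
    "Executing streaming plan for Rebuild",
    "INFO [STREAM-IN-/xxx.xxx.198.75] 2016-09-28 09:18:47,778 "
    "StreamResultFuture.java (line 220) "
    "[Stream #3a47f8d0-8597-11e6-bd17-3f6744d54a01] "
    "All sessions completed",
]

# Matches the date, time and millisecond part of a log line.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def log_time(line: str) -> datetime:
    """Extract the timestamp of a single log line."""
    match = TIMESTAMP.search(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    return datetime.strptime(match.group(0), "%Y-%m-%d %H:%M:%S,%f")


def elapsed(lines: list[str]) -> timedelta:
    """Time between the earliest and the latest timestamp in the given lines."""
    times = [log_time(line) for line in lines]
    return max(times) - min(times)


if __name__ == "__main__":
    print(f"streaming plan ran for {elapsed(LOG_LINES).total_seconds()} s")
```

A whole rebuild plan that starts and completes in well under a second is
consistent with no data having been transferred at all.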


Also, "nodetool status" looks fine from the new nodes (the ones from which I
ran 'nodetool rebuild').

Please let us know what could be the issue here.

Thanks in advance.


Re: New node block in autobootstrap

Posted by laxmikanth sadula <la...@gmail.com>.
OK, thanks for the reply.
I'm going to retry nodetool rebuild with the following changes, as you
suggested:

net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10

I hope these changes are enough on the new node where I'm running
'nodetool rebuild', and are NOT required on all the existing nodes from
which the data will be streamed. Am I right?


Re: New node block in autobootstrap

Posted by Paulo Motta <pa...@gmail.com>.
Yeah this is likely to be caused by idle connections being shut down, so
you may need to update your tcp_keepalive* and/or network/firewall settings.
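
For orientation, here is a rough sketch of what the three Linux keepalive
settings mean in terms of timing. On an idle connection the kernel sends the
first probe after tcp_keepalive_time seconds, then one probe every
tcp_keepalive_intvl seconds, and gives up after tcp_keepalive_probes
unanswered probes. The class and method names below are invented for
illustration; the "thread" values (60/3/10) are the ones mentioned elsewhere
in this thread, and 7200/9/75 are the usual Linux defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Keepalive:
    """Linux TCP keepalive settings, all expressed in seconds or counts."""

    time: int    # net.ipv4.tcp_keepalive_time: idle seconds before the first probe
    probes: int  # net.ipv4.tcp_keepalive_probes: unanswered probes before giving up
    intvl: int   # net.ipv4.tcp_keepalive_intvl: seconds between probes

    def first_probe_after(self) -> int:
        """Seconds an idle connection stays silent before any probe is sent."""
        return self.time

    def dead_peer_detected_after(self) -> int:
        """Seconds of silence before the kernel declares the peer dead."""
        return self.time + self.probes * self.intvl


LINUX_DEFAULTS = Keepalive(time=7200, probes=9, intvl=75)
THREAD_VALUES = Keepalive(time=60, probes=3, intvl=10)

if __name__ == "__main__":
    for name, cfg in (("defaults", LINUX_DEFAULTS), ("thread", THREAD_VALUES)):
        print(
            f"{name}: first probe after {cfg.first_probe_after()} s, "
            f"dead peer detected after {cfg.dead_peer_detected_after()} s"
        )
```

The point of lowering tcp_keepalive_time is the first number: a firewall that
silently drops connections idle for less than two hours will never see a
probe with the defaults, but will see one every minute with the lower value.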


Re: New node block in autobootstrap

Posted by laxmikanth sadula <la...@gmail.com>.
Hi Paulo,

Thanks for the reply.

I'm getting the following streaming exceptions during nodetool rebuild on
Cassandra 2.0.17:

*04:24:49,759 StreamSession.java (line 461) [Stream
#5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred*
java.io.IOException: Connection timed out
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:311)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 ConnectionHandler.java (line 104) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Closing stream connection handler on /xxx.xxx.98.168
 INFO [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamResultFuture.java (line 186) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Session with /xxx.xxx.98.168 is complete
ERROR [STREAM-OUT-/xxx.xxx.98.168] 2016-09-27 04:24:49,764 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.io.IOException: Broken pipe
    at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
    at sun.nio.ch.SocketDispatcher.write(SocketDispatcher.java:47)
    at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
    at sun.nio.ch.IOUtil.write(IOUtil.java:65)
    at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:487)
    at org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:44)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:339)
    at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:319)
    at java.lang.Thread.run(Thread.java:745)
DEBUG [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 ConnectionHandler.java (line 244) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Received File (Header (cfId: 68af9ee0-96f8-3b1d-a418-e5ae844f2cc2, #3, version: jb, estimated keys: 4736, transfer size: 2306880, compressed?: true), file: /home/cassandra/data_directories/data/keyspace_name1/archiving_metadata/keyspace_name1-archiving_metadata-tmp-jb-27-Data.db)
ERROR [STREAM-IN-/xxx.xxx.98.168] 2016-09-27 04:24:49,909 StreamSession.java (line 461) [Stream #5e1b7f40-8496-11e6-8847-1b88665e430d] Streaming error occurred
java.lang.RuntimeException: Outgoing stream handler has been closed
    at org.apache.cassandra.streaming.ConnectionHandler.sendMessage(ConnectionHandler.java:126)
    at org.apache.cassandra.streaming.StreamSession.receive(StreamSession.java:524)
    at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:413)
    at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:245)
    at java.lang.Thread.run(Thread.java:745)
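
For anyone who needs to find entries like these in a large log, here is a minimal Python sketch that groups streaming errors by cause. It assumes each log entry sits on a single line in the file (the wrapping above comes from the mail client) and that the layout matches the entries quoted in this message. The function names `streaming_errors` and `summarize` are made up for illustration and are not part of Cassandra.

```python
import re
from collections import Counter

# Header of a streaming error entry, in the log layout quoted above.
ERROR_RE = re.compile(
    r"ERROR \[(?P<thread>STREAM-(?:IN|OUT)-/[^\]]+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*?"
    r"\[Stream #(?P<stream>[0-9a-f-]+)\] Streaming error occurred"
)


def streaming_errors(lines):
    """Yield (timestamp, thread, stream_id, cause) for each streaming error."""
    lines = list(lines)
    for i, line in enumerate(lines):
        m = ERROR_RE.search(line)
        if not m:
            continue
        # The exception class and message sit on the line after the header.
        cause = lines[i + 1].strip() if i + 1 < len(lines) else ""
        yield m["ts"], m["thread"], m["stream"], cause


def summarize(lines):
    """Count streaming errors by their cause line."""
    return Counter(cause for _, _, _, cause in streaming_errors(lines))
```

Passing an open log file to `summarize` and printing `most_common()` gives a quick view of whether the failures are mostly timeouts, broken pipes, or something else.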


Re: New node block in autobootstrap

Posted by Paulo Motta <pa...@gmail.com>.
What type of streaming timeout are you getting? Do you have a stack trace?
What version are you on?

See more information about tuning tcp_keepalive* here:
https://docs.datastax.com/en/cassandra/2.0/cassandra/troubleshooting/trblshootIdleFirewall.html
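
To make the keepalive advice concrete, here is a rough Python sketch of the arithmetic behind the three kernel settings. The stock Linux defaults are 7200 / 75 / 9. The tuned values 60 / 10 / 3 are what I recall the linked page suggesting, so check them there before relying on them. The 3600-second firewall idle timeout is purely a hypothetical number for illustration, and both function names are made up.

```python
def keepalive_timings(time_s, intvl_s, probes):
    """Return (first_probe_after, dead_peer_detected_after) in seconds.

    time_s  -> net.ipv4.tcp_keepalive_time   (idle time before the first probe)
    intvl_s -> net.ipv4.tcp_keepalive_intvl  (gap between unanswered probes)
    probes  -> net.ipv4.tcp_keepalive_probes (unanswered probes before giving up)
    """
    return time_s, time_s + intvl_s * probes


def survives_idle_firewall(time_s, firewall_idle_timeout_s):
    """True if a probe goes out before the firewall would drop the idle flow."""
    return time_s < firewall_idle_timeout_s


# Stock Linux defaults: first probe after 2 hours of idle time.
linux_default = keepalive_timings(7200, 75, 9)

# Assumed tuned values (verify against the linked page).
tuned = keepalive_timings(60, 10, 3)

# Hypothetical firewall that silently drops flows idle for one hour.
FIREWALL_IDLE_TIMEOUT_S = 3600
default_ok = survives_idle_firewall(7200, FIREWALL_IDLE_TIMEOUT_S)
tuned_ok = survives_idle_firewall(60, FIREWALL_IDLE_TIMEOUT_S)
```

The point of the comparison: with the defaults, an idle streaming connection sends nothing for two hours, so any device that drops idle flows sooner can cut it silently and the next write fails with a timeout or broken pipe.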


Re: New node block in autobootstrap

Posted by laxmikanth sadula <la...@gmail.com>.
@Paulo Motta

We are also seeing streaming timeout exceptions during 'nodetool rebuild'.
I set streaming_socket_timeout_in_ms to 86400000 (24 hours), as suggested in
this DataStax article:
https://support.datastax.com/hc/en-us/articles/206502913-FAQ-How-to-reduce-the-impact-of-streaming-errors-or-failures
but we are still getting streaming exceptions.

Also, which kernel tcp_keepalive settings/values would you suggest to help
streaming succeed?

Thank you



-- 
Regards,
Laxmikanth
99621 38051

Re: New node block in autobootstrap

Posted by Jérôme Mainaud <je...@mainaud.com>.
Hello Paulo,

Thank you for your reply.
The version is 2.2.6.

I received the logs today and can confirm that three streams failed after a
timeout. We will try to resume the bootstrap as you recommended.

I didn't use -Dreplace_address for two reasons:

   1. Someone had already tried to reset the node in some way. That person
   is on vacation, so nobody really knows what he did. I suppose he just
   trashed the data directory and started the node again, without using
   -Dreplace_address and without removing the node first. I was unsure how
   valid the tokens were, so I preferred to remove the node and go back to a
   clean situation.
   2. Since the replaced node and the new node have the same endpoint
   address (this is a fresh version of the same node), I was not sure that
   replace_address would not get confused.

Since I had time and was not sure that replacing the node would work in my
situation, I chose the slow, safe way. Maybe I could have used it.





-- 
Jérôme Mainaud
jerome@mainaud.com


Re: New node block in autobootstrap

Posted by Paulo Motta <pa...@gmail.com>.
What version are you on? This looks like a typical case where streaming ran
into a problem (hanging, etc.). Do you have access to the logs? If so, look
for streaming errors. Streaming errors are typically related to timeouts, so
you should review your Cassandra streaming_socket_timeout_in_ms and kernel
tcp_keepalive settings.

If you're on 2.2+ you can resume a failed bootstrap with nodetool bootstrap
resume. Some streaming hang problems were also fixed recently, so I'd advise
you to upgrade to the latest release of your particular series for more
robust streaming.

Is there any reason why you didn't use the replace procedure
(-Dreplace_address) to replace the node with the same tokens? This would be
a bit faster than the remove + bootstrap procedure.
