Posted to user@cassandra.apache.org by David CHARBONNIER <Da...@rgsystem.com> on 2015/07/01 16:55:52 UTC

RE: Stream failure while adding a new node

Hi Alain,

We still have the timeout problem in OpsCenter and haven’t solved it yet, so no, we didn’t run an entire repair with the repair service.
And yes, during this attempt, we set auto_bootstrap to true and ran a repair on the 9th node after it finished streaming.

Thank you for your help.

Best regards,


David CHARBONNIER

Sysadmin

T : +33 411 934 200

david.charbonnier@rgsystem.com<ma...@rgsystem.com>


ZAC Aéroport

125 Impasse Adam Smith

34470 Pérols - France

www.rgsystem.com<http://www.rgsystem.com/>






From: Alain RODRIGUEZ [mailto:arodrime@gmail.com]
Sent: Tuesday, June 30, 2015 15:18
To: user@cassandra.apache.org
Subject: Re: Stream failure while adding a new node

Hi David,

Are you sure the repair ran to completion (9 days, plus repair logs OK on the OpsCenter server) before adding the 10th node? This is important to avoid potential data loss! Did you set auto_bootstrap to true on this 10th node?

C*heers,

Alain



2015-06-29 14:54 GMT+02:00 David CHARBONNIER <Da...@rgsystem.com>:
Hi,

We’re using Cassandra 2.0.8.39 through DataStax Enterprise 4.5.1 with a 9-node cluster.
We need to add a few new nodes to the cluster, but we’ve hit an issue we don’t know how to solve.
Here is exactly what we did:

- We had 8 nodes and needed to add a few more.

- We tried to add a 9th node, but the stream was stuck for a very long time and the bootstrap never finished (related to the default value of streaming_socket_timeout_in_ms in cassandra.yaml).

- We applied a solution given by a DataStax architect: restart the node with auto_bootstrap set to false and run a repair.

- After this issue, we patched the default configuration on all our nodes to avoid this problem and did a rolling restart of the cluster.

- Then we tried adding a 10th node, but it receives a stream from only one node (node2).
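As an illustration of the two settings mentioned in these steps, here is a minimal Python sketch that reads them from the text of a cassandra.yaml file. The function names and the sample fragment are hypothetical, and the check assumes that an unset or 0 value of streaming_socket_timeout_in_ms means streams never time out, which is how the setting is described for Cassandra 2.0 as far as I know.

```python
import re

def read_setting(yaml_text, key):
    """Return the raw value of a top-level `key: value` line,
    or None if the key is absent or commented out."""
    pattern = rf"^{re.escape(key)}\s*:\s*(\S+)"
    m = re.search(pattern, yaml_text, flags=re.MULTILINE)
    return m.group(1) if m else None

def review(yaml_text):
    """Collect notes about the two settings discussed in this thread."""
    notes = []
    timeout = read_setting(yaml_text, "streaming_socket_timeout_in_ms")
    # Assumption: unset or 0 means a stalled stream is never timed out.
    if timeout is None or int(timeout) == 0:
        notes.append("streaming_socket_timeout_in_ms is unset or 0")
    # auto_bootstrap defaults to true when the key is absent.
    if read_setting(yaml_text, "auto_bootstrap") == "false":
        notes.append("auto_bootstrap is false: the node joins without streaming data")
    return notes

# Hypothetical fragment, not taken from the cluster in this thread.
SAMPLE = """\
cluster_name: 'Example Cluster'
# streaming_socket_timeout_in_ms: 0
auto_bootstrap: false
"""

for note in review(SAMPLE):
    print(note)
```

To check a real node, pass the contents of its cassandra.yaml to review() instead of SAMPLE.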

Here are the logs on the problematic node (node10):
INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 87) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Executing streaming plan for Bootstrap
INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node6
INFO [main] 2015-06-26 15:25:59,491 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node5
INFO [main] 2015-06-26 15:25:59,492 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node4
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node3
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node9
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node8
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node7
INFO [main] 2015-06-26 15:25:59,494 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node1
INFO [main] 2015-06-26 15:25:59,494 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node2
INFO [STREAM-IN-/node6] 2015-06-26 15:25:59,515 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node6 is complete
INFO [STREAM-IN-/node4] 2015-06-26 15:25:59,516 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node4 is complete
INFO [STREAM-IN-/node5] 2015-06-26 15:25:59,517 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node5 is complete
INFO [STREAM-IN-/node3] 2015-06-26 15:25:59,527 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node3 is complete
INFO [STREAM-IN-/node1] 2015-06-26 15:25:59,528 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node1 is complete
INFO [STREAM-IN-/node8] 2015-06-26 15:25:59,530 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node8 is complete
INFO [STREAM-IN-/node7] 2015-06-26 15:25:59,531 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node7 is complete
INFO [STREAM-IN-/node9] 2015-06-26 15:25:59,533 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node9 is complete
INFO [STREAM-IN-/node2] 2015-06-26 15:26:04,874 StreamResultFuture.java (line 173) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Prepare completed. Receiving 171 files(14844054090 bytes), sending 0 files(0 bytes)
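A log like the one above can be summarised with a short script to see which peers actually announced files. The following is a minimal Python sketch, assuming the three StreamResultFuture message formats shown in this log; the function name and the choice of sample lines are illustrative only.

```python
import re

# Patterns for the three StreamResultFuture messages seen in the log above.
BEGIN = re.compile(r"Beginning stream session with /(\S+)")
COMPLETE = re.compile(r"Session with /(\S+) is complete")
PREPARE = re.compile(
    r"\[STREAM-IN-/(\S+?)\].*Prepare completed\. "
    r"Receiving (\d+) files\((\d+) bytes\)"
)

def summarize(lines):
    """Return (begun, completed, receiving) where receiving maps
    a peer name to the (files, bytes) it announced."""
    begun, completed, receiving = set(), set(), {}
    for line in lines:
        if (m := PREPARE.search(line)):
            receiving[m.group(1)] = (int(m.group(2)), int(m.group(3)))
        elif (m := COMPLETE.search(line)):
            completed.add(m.group(1))
        elif (m := BEGIN.search(line)):
            begun.add(m.group(1))
    return begun, completed, receiving

# Four lines copied from the log above (node6 and node2 only).
SAMPLE = [
    "INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node6",
    "INFO [main] 2015-06-26 15:25:59,494 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node2",
    "INFO [STREAM-IN-/node6] 2015-06-26 15:25:59,515 StreamResultFuture.java (line 186) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Session with /node6 is complete",
    "INFO [STREAM-IN-/node2] 2015-06-26 15:26:04,874 StreamResultFuture.java (line 173) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Prepare completed. Receiving 171 files(14844054090 bytes), sending 0 files(0 bytes)",
]

begun, completed, receiving = summarize(SAMPLE)
# Peers whose session completed without ever announcing any files.
print("completed without files:", sorted(completed - receiving.keys()))
print("sending files:", receiving)
```

Run against the full log, this would list node1 and node3 through node9 as completed without files and node2 as the only peer sending data, which matches the symptom described here.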

On the other nodes (all except node2, which streams data), there is an error saying that node10 has no host ID.

Have you run into this issue, or do you have any idea how to solve it?

Thank you for your help.

Best regards,


Re: Stream failure while adding a new node

Posted by Jan <cn...@yahoo.com>.
David,
Bring down all the nodes except the 'seed' node. Now bring up the 10th node. Run 'nodetool status' and wait until this 10th node is UP. Bring up the rest of the nodes after that. Run 'nodetool status' again and check that all the nodes are UP.
Alternatively, decommission the 10th node completely and drop it from the cluster. Build a new node with the same IP and hostname and have it join the running cluster.
Hope this helps,
Jan
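
The 'nodetool status' check described above can be scripted. Here is a minimal Python sketch that parses the command's output and lists every node that is not Up/Normal. It assumes the usual layout in which each node row starts with a two-letter code (U or D, then N, L, J or M) followed by the address; the sample output, addresses and host IDs are made up for illustration.

```python
import re

# A node row starts with a two-letter code: U(p)/D(own), then
# N(ormal)/L(eaving)/J(oining)/M(oving), followed by the address.
ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")

def node_states(status_output):
    """Map each node address to its two-letter state code."""
    states = {}
    for line in status_output.splitlines():
        m = ROW.match(line.strip())
        if m:
            states[m.group(3)] = m.group(1) + m.group(2)
    return states

def not_up_normal(status_output):
    """Return the sorted addresses of nodes that are not in state UN."""
    states = node_states(status_output)
    return sorted(addr for addr, state in states.items() if state != "UN")

# Hypothetical output, not taken from the cluster in this thread.
SAMPLE = """\
Datacenter: DC1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address    Load      Tokens  Owns   Host ID                               Rack
UN  10.0.0.1   14.2 GB   256     10.1%  11111111-1111-1111-1111-111111111111  RAC1
UJ  10.0.0.10  1.1 GB    256     ?      22222222-2222-2222-2222-222222222222  RAC1
DN  10.0.0.2   13.9 GB   256     9.8%   33333333-3333-3333-3333-333333333333  RAC1
"""

print(not_up_normal(SAMPLE))
```

On a live node, the text could come from running the command through subprocess.run(["nodetool", "status"], capture_output=True, text=True) and passing its stdout to not_up_normal(); an empty list would mean every node is UN.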


 

