Posted to user@cassandra.apache.org by Alexandru Sicoe <ad...@gmail.com> on 2012/06/25 11:57:02 UTC

repair never finishing 1.0.7

Hello everyone,

 I have a 2 DC (DC1:3 and DC2:6) Cassandra 1.0.7 setup. I have about
300GB/node in DC2.

 The DCs are communicating over a gateway where I do NAT for ports 7000,
9160 and 7199.

 I did a "nodetool repair" on a node in DC2 without any external load on
the system.

 It took 5 hrs to finish the Merkle tree calculations (which is fine for
me), but then in the streaming phase nothing happens (0% seen in "nodetool
netstats") and it stays like that forever. Note: it has to stream to/from
nodes in DC1!

 I tried a second time, with the same result.

 Looking around I found this thread,
http://www.mail-archive.com/user@cassandra.apache.org/msg22167.html
which seems to describe the same problem.

The thread gives 2 suggestions:
- a full cluster restart allows the first attempted repair to complete
(haven't tested yet; this is not practical even if it works)
- issue https://issues.apache.org/jira/browse/CASSANDRA-4223 can be the
problem

Questions:
1) How can I make sure that the JIRA issue above is my real problem? (I see
no errors or warnings in the logs, and no other activity.)
2) What should I do to make the repairs work? (If the JIRA issue is the
problem, then I see there is a fix for it in version 1.0.11, which is not
released yet.)

Thanks,
Alex

Re: repair never finishing 1.0.7

Posted by aaron morton <aa...@thelastpickle.com>.
The nodes in DC1 need to be able to reach the nodes in DC2 on the public (NAT'd) IP.  

Others may be able to provide some more details.

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 27/06/2012, at 9:51 PM, Andras Szerdahelyi wrote:

> Aaron,
> 
>> The broadcast_address allows a node to broadcast an address that is different to the ones it's bound to on the local interfaces https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270
> 
> 
> Yes, and that's not where the problem is, IMO. If you broadcast your translated address (say 1.2.3.4, a public IP), nodes outside your VPN'd network will have no problems connecting as long as they can route to this address (which they should), but any other nodes on the local net (e.g. 10.0.1.2) won't be able to connect/route to their neighbor, who is telling them to open the return socket to 1.2.3.4.
> 
> Am I getting this right? At least this is what I experienced not so long ago:
> 
> DC1 nodes
> a) 10.0.1.1 translated to 1.2.3.4 on NAT
> b) 10.0.1.2 translated to 1.2.3.5 on NAT 
> 
> DC2 nodes
> a) 10.0.2.1 translated to 1.2.4.4 on NAT
> b) 10.0.2.2 translated to 1.2.4.5 on NAT
> 
> Let's assume DC2 nodes' broadcast_addresses are their public addresses.
> 
> If DC1:a and DC1:b broadcast their public addresses, 1.2.3.4 and 1.2.3.5, they are advertising addresses that are not routable on their own network (loopback), but DC2:a and DC2:b can connect/route to them just fine. "nodetool ring" on any DC1 node says the others in DC1 are down and everything else is up. "nodetool ring" on any DC2 node says everything is up.
> 
> If DC1:a and DC1:b broadcast their private addresses, they can connect to each other fine, but DC2:a and DC2:b will have no chance to route to them. "nodetool ring" on any DC1 node says everything is up. "nodetool ring" on any DC2 node says the DC1 nodes are down.
> 
> regards,
> Andras


Re: repair never finishing 1.0.7

Posted by Andras Szerdahelyi <an...@ignitionone.com>.
Aaron,

> The broadcast_address allows a node to broadcast an address that is different to the ones it's bound to on the local interfaces https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270

Yes, and that's not where the problem is, IMO. If you broadcast your translated address (say 1.2.3.4, a public IP), nodes outside your VPN'd network will have no problems connecting as long as they can route to this address (which they should), but any other nodes on the local net (e.g. 10.0.1.2) won't be able to connect/route to their neighbor, who is telling them to open the return socket to 1.2.3.4.

Am I getting this right? At least this is what I experienced not so long ago:

DC1 nodes
a) 10.0.1.1 translated to 1.2.3.4 on NAT
b) 10.0.1.2 translated to 1.2.3.5 on NAT

DC2 nodes
a) 10.0.2.1 translated to 1.2.4.4 on NAT
b) 10.0.2.2 translated to 1.2.4.5 on NAT

Let's assume DC2 nodes' broadcast_addresses are their public addresses.

If DC1:a and DC1:b broadcast their public addresses, 1.2.3.4 and 1.2.3.5, they are advertising addresses that are not routable on their own network (loopback), but DC2:a and DC2:b can connect/route to them just fine. "nodetool ring" on any DC1 node says the others in DC1 are down and everything else is up. "nodetool ring" on any DC2 node says everything is up.

If DC1:a and DC1:b broadcast their private addresses, they can connect to each other fine, but DC2:a and DC2:b will have no chance to route to them. "nodetool ring" on any DC1 node says everything is up. "nodetool ring" on any DC2 node says the DC1 nodes are down.
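
The two cases above can be sketched as a small reachability model. This is only an illustration under stated assumptions: Node, can_reach, ring_view and the hairpin_dcs parameter are hypothetical names, not part of Cassandra or nodetool; the addresses are the example ones listed above; and DC2's router is assumed to support NAT loopback, which is not stated in the message but seems implied by its "everything is up" outcome for DC2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    dc: str
    private_ip: str
    public_ip: str


# Example addresses taken from the message above.
NODES = [
    Node("DC1:a", "DC1", "10.0.1.1", "1.2.3.4"),
    Node("DC1:b", "DC1", "10.0.1.2", "1.2.3.5"),
    Node("DC2:a", "DC2", "10.0.2.1", "1.2.4.4"),
    Node("DC2:b", "DC2", "10.0.2.2", "1.2.4.5"),
]


def can_reach(src, dst, dst_broadcasts_public, hairpin_dcs):
    """Return True if src can connect to the address dst advertises."""
    if src.dc != dst.dc:
        # Across the NAT only the translated (public) address is routable.
        return dst_broadcasts_public
    if not dst_broadcasts_public:
        # Same LAN, private address: directly routable.
        return True
    # Same LAN, public address: works only if the router loops it back
    # (assumption: only the DCs listed in hairpin_dcs can do that).
    return dst.dc in hairpin_dcs


def ring_view(observer, dc1_broadcasts_public, hairpin_dcs=frozenset({"DC2"})):
    """Map every other node to "Up" or "Down" as seen from observer.

    DC2 nodes always broadcast their public address, as assumed in the message.
    """
    view = {}
    for node in NODES:
        if node == observer:
            continue
        public = dc1_broadcasts_public if node.dc == "DC1" else True
        reachable = can_reach(observer, node, public, hairpin_dcs)
        view[node.name] = "Up" if reachable else "Down"
    return view
```

For example, ring_view(NODES[0], True) gives the view from DC1:a when DC1 broadcasts public addresses, and ring_view(NODES[2], False) gives the view from DC2:a when DC1 broadcasts private ones. Real gossip liveness depends on traffic in both directions, so treat this as a way to reason about the routing, not as a model of the failure detector.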

regards,
Andras




On 27 Jun 2012, at 11:29, aaron morton wrote:

>> Setting up a Cassandra ring across NAT ( without a VPN ) is impossible in my experience.
> The broadcast_address allows a node to broadcast an address that is different to the ones it's bound to on the local interfaces https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270
>
>>> 1) How can I make sure that the JIRA issue above is my real problem? (I see no errors or warns in the logs; no other activity)
> If the errors are not there, it is not your problem.
>
>>> - a full cluster restart allows the first attempted repair to complete (haven't tested yet; this is not practical even if it works)
> A rolling restart of the nodes involved in the repair is sufficient.
>
> Double-check the networking, and check the logs on both sides of the transfer for errors or warnings. The code around streaming is better at failing loudly nowadays.
>
> If you don't see anything, set DEBUG logging on org.apache.cassandra.streaming.FileStreamTask. That will let you know whether things start and progress.
>
> Hope that helps.
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com



Re: repair never finishing 1.0.7

Posted by aaron morton <aa...@thelastpickle.com>.
> Setting up a Cassandra ring across NAT ( without a VPN ) is impossible in my experience.
The broadcast_address allows a node to broadcast an address that is different to the ones it's bound to on the local interfaces https://github.com/apache/cassandra/blob/trunk/conf/cassandra.yaml#L270

>> 1) How can I make sure that the JIRA issue above is my real problem? (I see no errors or warns in the logs; no other activity)
If the errors are not there, it is not your problem.

>> - a full cluster restart allows the first attempted repair to complete (haven't tested yet; this is not practical even if it works)
A rolling restart of the nodes involved in the repair is sufficient.

Double-check the networking, and check the logs on both sides of the transfer for errors or warnings. The code around streaming is better at failing loudly nowadays.

If you don't see anything, set DEBUG logging on org.apache.cassandra.streaming.FileStreamTask. That will let you know whether things start and progress.
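
One way to double-check the networking is to probe, from each node, the addresses its peers advertise. The following is a minimal sketch, assuming that plain TCP reachability is a useful first signal; port_open and check_peers are hypothetical helper names, and the default ports are simply the three mentioned earlier in this thread (7000, 9160 and 7199).

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_peers(peers, ports=(7000, 9160, 7199), timeout=3.0):
    """Probe every (host, port) pair and return a dict of results."""
    return {
        (host, port): port_open(host, port, timeout)
        for host in peers
        for port in ports
    }
```

Running check_peers with the broadcast addresses of the remote DC, on nodes in both DCs, shows which directions are blocked. A successful connect only shows that the port is reachable from that side; it does not prove that streaming in the reverse direction will work, so the logs on both ends still need checking.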

Hope that helps. 


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 26/06/2012, at 6:16 PM, Alexandru Sicoe wrote:

> Hi Andras,
> 
> I am not using a VPN. The system has been running successfully in this configuration for a couple of weeks until I noticed the repair is not working.
> 
> What happens is that I configure iptables on each Cassandra node so that packets addressed to any of the IPs in the other DC (on ports 7000, 9160 and 7199) are forwarded to the gateway IP. The gateway does the NAT, sending the packets on the other side to the real destination IP, having replaced the source IP with the initial sender's IP (at least as I understand it).
> 
> What might be the problem given this configuration? How can I fix it?
> 
> Cheers,
> Alex


Re: repair never finishing 1.0.7

Posted by Alexandru Sicoe <ad...@gmail.com>.
Hi Andras,

I am not using a VPN. The system has been running successfully in this
configuration for a couple of weeks until I noticed the repair is not
working.

What happens is that I configure iptables on each Cassandra node so that
packets addressed to any of the IPs in the other DC (on ports 7000, 9160
and 7199) are forwarded to the gateway IP. The gateway does the NAT,
sending the packets on the other side to the real destination IP, having
replaced the source IP with the initial sender's IP (at least as I
understand it).

What might be the problem given this configuration? How can I fix it?

Cheers,
Alex

On Mon, Jun 25, 2012 at 12:47 PM, Andras Szerdahelyi <
andras.szerdahelyi@ignitionone.com> wrote:

>
>   The DCs are communicating over a gateway where I do NAT for ports 7000,
> 9160 and 7199.
>
>
>  Ah, that sounds familiar. You don't mention if you are VPN'd or not.
> I'll assume you are not.
>
>  So, your nodes are behind network address translation - is that to say
> they advertise ( broadcast ) their internal or translated/forwarded IP to
> each other? Setting up a Cassandra ring across NAT ( without a VPN ) is
> impossible in my experience. Either the nodes on your local network won't
> be able to communicate with each other, because they broadcast their
> translated ( public ) address which is normally ( router configuration )
> not routable from within the local network, or the nodes broadcast their
> internal IP, in which case the "outside" nodes are helpless in trying to
> connect to a local net. On DC2 nodes/the node you issue the repair on,
> check for any sockets being opened to the internal addresses of the nodes
> in DC1.
>
>
>  regards,
> Andras

Re: repair never finishing 1.0.7

Posted by Andras Szerdahelyi <an...@ignitionone.com>.
> The DCs are communicating over a gateway where I do NAT for ports 7000, 9160 and 7199.

Ah, that sounds familiar. You don't mention if you are VPN'd or not. I'll assume you are not.

So, your nodes are behind network address translation. Is that to say they advertise (broadcast) their internal or their translated/forwarded IP to each other? Setting up a Cassandra ring across NAT (without a VPN) is impossible in my experience. Either the nodes on your local network won't be able to communicate with each other, because they broadcast their translated (public) address, which is normally (depending on router configuration) not routable from within the local network, or the nodes broadcast their internal IP, in which case the "outside" nodes are helpless in trying to connect to a local net. On the DC2 nodes (or the node you issue the repair on), check for any sockets being opened to the internal addresses of the nodes in DC1.


regards,
Andras


