Posted to user@cassandra.apache.org by Sasha Dolgy <sd...@gmail.com> on 2011/09/12 20:39:35 UTC

AntiEntropyService.getNeighbors pulls information from where?

This relates to the issue I opened the other day:
https://issues.apache.org/jira/browse/CASSANDRA-3175
Basically, 'nodetool ring' throws an exception on two of the four nodes.

In my fancy little world, the problems appear to be related to one of
the nodes thinking that someone is its neighbor ... and that someone
moved away a long time ago.

/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:5] 2011-09-10 21:20:02,182 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-d8cdb59a-04a4-4596-b73f-cba3bd2b9eab failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:7] 2011-09-11 21:20:02,258 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-ad17e938-f474-469c-9180-d88a9007b6b9 failed.
/mnt/cassandra/logs/system.log: INFO [AntiEntropySessions:9] 2011-09-12 21:20:02,256 AntiEntropyService.java (line 658) Could not proceed on repair because a neighbor (/10.130.185.136) is dead: manual-repair-636150a5-4f0e-45b7-b400-24d8471a1c88 failed.

It appears only in the logs of one of the nodes generating the issue: 172.16.12.10.

Where does AntiEntropyService.getNeighbors(tablename, range) pull its
information from?

On the two nodes that work:

[default@system] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.Ec2Snitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
[default@system]

From the two nodes that don't work:

[default@unknown] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.Ec2Snitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
1b871300-dbdc-11e0-0000-564008fe649f: [172.16.12.10, 172.16.12.11, 172.16.14.12, 172.16.14.10]
UNREACHABLE: [10.130.185.136] --> which is really 172.16.14.10
[default@unknown]

Really now. Where is 10.130.185.136 coming from? It's in none of my
configurations, AND the full ring has been shut down and started up
again. (Not trying to give Vijay a hard time by posting here, btw!)

I'm just thinking it could be something super silly that a wider
audience has come across.

-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: AntiEntropyService.getNeighbors pulls information from where?

Posted by Sasha Dolgy <sd...@gmail.com>.
use system;
del LocationInfo[52696e67];

I ran this on the nodes that had the problems, then stopped and started
them. They re-did their job ... job done. All fixed, with a new bug!
https://issues.apache.org/jira/browse/CASSANDRA-3186
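The row key in the del command above is hex-encoded: 52696e67 is the bytes of the ASCII string "Ring", so the command deletes the "Ring" row of the system keyspace's LocationInfo column family. As far as I understand the 0.8-era code, that row is where a node saves the ring state it reloads at startup, which would explain why deleting it and restarting cleared the ghost. A minimal Java sketch that decodes such a key; the class and method names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: decode a hex-encoded row key (as typed in cassandra-cli) back to text.
class HexKeyDecoder {
    // Assumes an even-length string of hex digits, two digits per byte.
    static String decodeHexKey(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeHexKey("52696e67")); // prints Ring
    }
}
```

Going the other way (text to hex) is how you would build the key for a different row of LocationInfo, if you ever needed to.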

On Tue, Sep 13, 2011 at 2:09 AM, aaron morton <aa...@thelastpickle.com> wrote:
> I'm pretty sure I'm behind on how to deal with this problem.
>
> The best I know of is to start the node with "-Dcassandra.load_ring_state=false" as a JVM option. But if the ghost IP address is in gossip, that will not work, and I would expect it to be in gossip.
>
> Does the ghost IP show up in nodetool ring?
>
> Does anyone know a way to remove a ghost IP from gossip when it has no token associated with it?
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> @aaronmorton
> http://www.thelastpickle.com
>



-- 
Sasha Dolgy
sasha.dolgy@gmail.com

Re: AntiEntropyService.getNeighbors pulls information from where?

Posted by aaron morton <aa...@thelastpickle.com>.
I'm pretty sure I'm behind on how to deal with this problem. 

The best I know of is to start the node with "-Dcassandra.load_ring_state=false" as a JVM option. But if the ghost IP address is in gossip, that will not work, and I would expect it to be in gossip.
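The reasoning above can be sketched as a toy Java model. Assumptions, stated loudly: my understanding is that repair's neighbor set is the set of replicas for the range in the node's token metadata, minus the node itself, and that this view is fed both by gossip and (unless load_ring_state is false) by the ring state saved in the system keyspace. The class and method names are hypothetical, not Cassandra's, and the model ignores the replication strategy by treating every known endpoint as a replica of the range.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified model of one node's ring view; not Cassandra's real classes.
// Simplification: every endpoint in the view is assumed to replicate the range being repaired.
class RingViewModel {
    // Gossip always contributes endpoints; the saved ring state contributes only
    // when loadRingState is true (modeling -Dcassandra.load_ring_state).
    static Set<String> ringView(Set<String> savedRingState, Set<String> gossipEndpoints,
                                boolean loadRingState) {
        Set<String> view = new TreeSet<>(gossipEndpoints);
        if (loadRingState) {
            view.addAll(savedRingState);
        }
        return view;
    }

    // Repair neighbors: every endpoint in the view except the local node.
    static Set<String> neighbors(Set<String> view, String local) {
        Set<String> result = new TreeSet<>(view);
        result.remove(local);
        return result;
    }

    // Repair can proceed only if every neighbor is currently alive.
    static boolean canRepair(Set<String> neighbors, Set<String> alive) {
        return alive.containsAll(neighbors);
    }

    public static void main(String[] args) {
        String local = "172.16.12.10";
        String ghost = "10.130.185.136";
        Set<String> alive = new TreeSet<>(Arrays.asList(
                "172.16.12.10", "172.16.12.11", "172.16.14.12", "172.16.14.10"));

        Set<String> saved = new TreeSet<>(alive);
        saved.add(ghost);                                   // stale entry in the saved ring state
        Set<String> gossipClean = new TreeSet<>(alive);     // ghost not in gossip
        Set<String> gossipWithGhost = new TreeSet<>(saved); // ghost still circulating in gossip

        for (boolean load : new boolean[] {true, false}) {
            for (Set<String> gossip : Arrays.asList(gossipClean, gossipWithGhost)) {
                Set<String> n = neighbors(ringView(saved, gossip, load), local);
                System.out.println("load_ring_state=" + load
                        + " ghostInGossip=" + gossip.contains(ghost)
                        + " canRepair=" + canRepair(n, alive));
            }
        }
    }
}
```

In this model, turning off load_ring_state only helps when the ghost is absent from gossip, which matches the caveat above.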

Does the ghost IP show up in nodetool ring?

Does anyone know a way to remove a ghost IP from gossip when it has no token associated with it?

Cheers
  
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
