Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2013/06/04 12:08:10 UTC
Can't reach itself
Hi,
I have had an issue since switching to multiple DCs. I use AWS EC2 instances,
C* 1.2.2, 12 nodes in eu-west + 6 nodes in us-east (the new DC).
Datacenter: eu-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID
UN public ip 133.43 GB 8.3% ae33d60c-1c24-4c10-b58c-59d06faac5ca
UN public ip 171.3 GB 8.3% bb94c428-c98d-454d-af80-6612548a8125
UN public ip 140.26 GB 8.3% 136bbced-25ed-4a37-abd9-7ab0d146d1c7
UN public ip 132.14 GB 8.3% 086ebf3e-c58f-4b76-b4d5-6600f7b79cf7
UN public ip 178.26 GB 8.3% 9255d30f-848f-4251-800b-2c61b4e0cfbf
UN public ip 153.79 GB 8.3% 7b4fd83a-ca9c-4115-b146-222ab040abd6
UN public ip 146.82 GB 8.3% bf233d59-d7a4-482f-adaf-d48531d16305
UN public ip 151.1 GB 8.3% fa3b617d-5d31-4db2-87bf-494ee8a9f95f
UN public ip 131.78 GB 8.3% dac399dc-ac7c-4ee3-9503-f55e8a9f1675
UN public ip 130.18 GB 8.3% 56b8654a-f8b3-43d4-8b15-2e74d5dfe81b
UN public ip 161.96 GB 8.3% 97624d02-ba48-42e7-88f7-2d3b0175d6ef
UN public ip 130.26 GB 8.3% 868c45b3-4afc-43db-b2d0-5c0f89d018fb
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Owns Host ID
UN public ip 246.74 GB 0.0% 212888f6-ecf8-4953-8f83-c5653fb176cb
UN public ip 320.15 GB 0.0% bcd696da-433b-4e6b-8030-11629eaf5b84
UN public ip 353.22 GB 0.0% 3f5cb04a-3ac3-46f3-b101-31a9ae7682bc
UN public ip 348.91 GB 0.0% 836b3b76-418a-4a22-bab4-c1a0bd49de65
UN public ip 269.37 GB 0.0% 9408c7ff-ec47-4824-af81-92aa311a1984
UN public ip 244.94 GB 0.0% 668eb3ca-8ee4-40ae-98e7-987c471bd675
Each node of the new DC owns 0% according to the status view. Running
nodetool ring myks gives me:
Datacenter: eu-west
==========
Replicas: 3
Address    Rack  Status  State   Load       Owns    Token
public ip  1b    Up      Normal  131.78 GB  25.00%  113427455640312821154458202477256070485
public ip  1b    Up      Normal  161.96 GB  25.00%  141784319550391026443072753096570088106
public ip  1b    Up      Normal  153.43 GB  25.00%  70892159775195513221536376548285044053
public ip  1b    Up      Normal  151.1 GB   25.00%  99249023685273718510150927167599061674
public ip  1b    Up      Normal  130.26 GB  25.00%  155962751505430129087380028406227096917
public ip  1b    Up      Normal  146.82 GB  25.00%  85070591730234615865843651857942052864
public ip  1b    Up      Normal  171.35 GB  25.00%  14178431955039102644307275309657008810
public ip  1b    Up      Normal  132.14 GB  25.00%  42535295865117307932921825928971026432
public ip  1b    Up      Normal  140.26 GB  25.00%  28356863910078205288614550619314017621
public ip  1b    Up      Normal  133.43 GB  25.00%  0
public ip  1b    Up      Normal  130.18 GB  25.00%  127605887595351923798765477786913079296
public ip  1b    Up      Normal  178.27 GB  25.00%  56713727820156410577229101238628035242
Datacenter: us-east
==========
Replicas: 3
Address    Rack  Status  State   Load       Owns    Token
                                                    100
public ip  1b    Up      Normal  320.15 GB  50.00%  28356863910078205288614550619314017721
public ip  1b    Up      Normal  353.14 GB  50.00%  56713727820156410577229101238628035342
public ip  1b    Up      Normal  348.35 GB  50.00%  85070591730234615865843651857942052964
public ip  1b    Up      Normal  269.35 GB  50.00%  113427455640312821154458202477256070585
public ip  1b    Up      Normal  244.94 GB  50.00%  141784319550391026443072753096570088206
public ip  1b    Up      Normal  246.74 GB  50.00%  100
This seems to be ok.
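The 0.0% in the status view looks consistent with how the us-east tokens were picked: each one sits exactly 100 above an eu-west token. Here is a minimal sketch of the arithmetic, assuming that the status view without a keyspace reports raw token ownership (the slice of the ring between a token and its predecessor) and that the RandomPartitioner ring spans 0 to 2**127. The function name raw_ownership is only illustrative, it is not a Cassandra API.

```python
# Sketch only: reproduces the token layout shown above and computes
# raw (topology-unaware) ownership per token.
RING_SIZE = 2 ** 127  # RandomPartitioner token space (assumption stated above)

# 12 evenly spaced eu-west tokens, matching the values in the ring output.
eu_west = [i * RING_SIZE // 12 for i in range(12)]
# 6 us-east tokens: every other eu-west token, shifted by 100.
us_east = [t + 100 for t in eu_west[::2]]


def raw_ownership(tokens, ring_size=RING_SIZE):
    """Map each token to the fraction of the ring between it and its predecessor."""
    ring = sorted(tokens)
    owns = {}
    for idx, tok in enumerate(ring):
        prev = ring[idx - 1]  # idx 0 wraps around to the largest token
        owns[tok] = ((tok - prev) % ring_size) / ring_size
    return owns


owns = raw_ownership(eu_west + us_east)
for tok in sorted(owns):
    dc = "us-east" if tok in us_east else "eu-west"
    print(f"{dc}  {owns[tok] * 100:.1f}%  {tok}")
```

Under these assumptions every us-east token owns 100 / 2**127 of the ring, which rounds to 0.0%, while the per-keyspace view from nodetool ring myks takes the replication settings into account.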
When I run "describe cluster;" in cassandra-cli on an eu-west node:
[default@unknown] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
e968865b-3b96-3c87-af0a-6294067a832f: [My 18 publics ip]
So far so good.
From a us-east node now:
[default@unknown] describe cluster;
Cluster Information:
Snitch: org.apache.cassandra.locator.Ec2MultiRegionSnitch
Partitioner: org.apache.cassandra.dht.RandomPartitioner
Schema versions:
UNREACHABLE: [public ip of the node itself]
e968865b-3b96-3c87-af0a-6294067a832f: [17 others publics ip]
Why isn't this node able to see itself? What port / service is in use
while describing the cluster? I have tried opening all ports, with no success.
I also tried the following script to help the node find itself, but it
doesn't seem to work...
--------------------- script ---------------------
#!/bin/bash
PUBLIC_IP=$(wget -qO- http://instance-data/latest/meta-data/public-ipv4)
/sbin/ifconfig eth0:1 $PUBLIC_IP netmask 255.255.255.255 broadcast $PUBLIC_IP
--------------------- end of script ---------------------
eth0:1 Link encap:Ethernet HWaddr 12:31:39:22:c1:41
inet addr:xx.xx.xx.xx Bcast:xx.xx.xx.xx Mask:255.255.255.255
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
Interrupt:47
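One way to narrow this down could be to check, from the us-east node, whether it can open a TCP connection to its own public IP on the inter-node port. This is only a sketch under assumptions: it assumes the default storage_port of 7000 (7001 when inter-node SSL is used) and that the check behind "describe cluster" goes through inter-node messaging. The helper name can_connect is hypothetical, and the public IP has to be filled in by hand.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened, else False."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example usage (replace the placeholder with the node's own public IP):
#   print(can_connect("xx.xx.xx.xx", 7000))
#   print(can_connect("xx.xx.xx.xx", 7001))
```

If this returns False for the node's own public IP but True for the other nodes, the problem is more likely in the firewall or security group rules than in Cassandra itself.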
I also see a lot of hinted handoff compactions.
Any clue as to what's wrong?
Re: Can't reach itself
Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"I see a lot of hinted handoff compactions too."
I may not have been clear enough: I see a lot of "compaction of
system.hints", which I interpret as meaning that a lot of data could not
reach its destination.