Posted to user@cassandra.apache.org by Oleg Dulin <ol...@gmail.com> on 2013/09/27 12:23:45 UTC
Unbalanced ring mystery multi-DC issue with 1.1.11
Consider this output from nodetool ring:
Address  DC   Rack  Status  State   Load      Effective-Ownership  Token
                                                                   127605887595351923798765477786913079396
dc1.5    DC1  RAC1  Up      Normal  32.07 GB  50.00%               0
dc2.100  DC2  RAC1  Up      Normal  8.21 GB   50.00%               100
dc1.6    DC1  RAC1  Up      Normal  32.82 GB  50.00%               42535295865117307932921825928971026432
dc2.101  DC2  RAC1  Up      Normal  12.41 GB  50.00%               42535295865117307932921825928971026532
dc1.7    DC1  RAC1  Up      Normal  28.37 GB  50.00%               85070591730234615865843651857942052864
dc2.102  DC2  RAC1  Up      Normal  12.27 GB  50.00%               85070591730234615865843651857942052964
dc1.8    DC1  RAC1  Up      Normal  27.34 GB  50.00%               127605887595351923798765477786913079296
dc2.103  DC2  RAC1  Up      Normal  13.46 GB  50.00%               127605887595351923798765477786913079396
I concealed IPs and DC names for confidentiality.
All of the data loading was happening against DC1 at a brisk rate of roughly 200K writes per minute.
Note how my tokens are offset by 100. Shouldn't that mean the load on each node is roughly identical? In DC1 it is around 30 GB per node; in DC2 each node holds barely a third of what the DC1 node covering nearly the same token range does.
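For reference, an offset layout like this comes from giving each DC evenly spaced RandomPartitioner tokens over the 2^127 ring and shifting the second DC by a small constant so no two tokens collide. A minimal sketch of the arithmetic (class and variable names are illustrative):

    import java.math.BigInteger;

    // Sketch: balanced RandomPartitioner tokens for two DCs of four nodes,
    // with DC2 shifted by 100 so its tokens interleave with DC1's.
    public class TokenLayout {
        private static final BigInteger RING = BigInteger.valueOf(2).pow(127);

        public static void main(String[] args) {
            int nodesPerDc = 4;
            for (int i = 0; i < nodesPerDc; i++) {
                // Evenly spaced token for DC1 node i: i * 2^127 / N
                BigInteger dc1Token = RING.multiply(BigInteger.valueOf(i))
                                          .divide(BigInteger.valueOf(nodesPerDc));
                // Same spacing for DC2, offset by 100
                BigInteger dc2Token = dc1Token.add(BigInteger.valueOf(100));
                System.out.println("DC1 node " + i + ": " + dc1Token);
                System.out.println("DC2 node " + i + ": " + dc2Token);
            }
        }
    }

Run with nodesPerDc = 4, this reproduces the posted tokens exactly, which is why every node reports 50.00% effective ownership within its DC. One caveat worth noting: with tokens interleaved this way, each DC2 node's primary range (the only range a --partitioner-range repair compares) is just 100 tokens wide, which may be why the repairs described next finish almost instantly.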
To verify that the nodes are in sync, I ran nodetool -h localhost repair MyKeySpace --partitioner-range on each node in DC2. Watching the logs, I saw that each repair completed very quickly and reported all column families in sync!
I need help making sense of this. Is it because DC1 is not fully compacted? Is it because DC2 is not fully synced and I am not checking correctly? How can I tell whether replication is still in progress? (Note: I started my load yesterday at 9:50am.)
--
Regards,
Oleg Dulin
http://www.olegdulin.com
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Posted by Aaron Morton <aa...@thelastpickle.com>.
Check the logs for messages about nodes going up and down, and also look at the MessagingService MBean for timeouts. If the node in DC2 times out replying to DC1, the DC1 node will store a hint.
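A quick way to read those counters is a JMX probe against the MessagingService MBean; the sketch below assumes the default JMX port (7199) and the TimeoutsPerHost attribute, so verify both against your build:

    import java.util.Map;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: dump per-host message timeouts from a DC1 node over JMX.
    // Host name is a placeholder; attribute names may differ by version.
    public class TimeoutCheck {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://dc1-node:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                ObjectName ms = new ObjectName(
                    "org.apache.cassandra.net:type=MessagingService");
                @SuppressWarnings("unchecked")
                Map<String, Long> timeouts =
                    (Map<String, Long>) mbs.getAttribute(ms, "TimeoutsPerHost");
                for (Map.Entry<String, Long> e : timeouts.entrySet()) {
                    System.out.println(e.getKey() + " -> " + e.getValue()
                                       + " timed-out messages");
                }
            } finally {
                jmxc.close();
            }
        }
    }

A steadily growing count against the DC2 peers would point at the WAN link rather than the nodes themselves.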
Also, when hints are stored they are TTL'd to the gc_grace_seconds of the CF (IIRC). If that's low, the hints may have expired before they could be delivered.
Am not aware of any specific tracking for failed hints other than log messages.
A
-----------------
Aaron Morton
New Zealand
@aaronmorton
Co-Founder & Principal Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com
On 28/09/2013, at 12:01 AM, Oleg Dulin <ol...@gmail.com> wrote:
> Here is some more information.
>
> I am running a full repair on one of the nodes and observing some strange behavior.
>
> Both DCs were up during the data load, yet repair is reporting a lot of out-of-sync data. Why would that be? Is there a way for me to tell whether the WAN may be dropping hinted handoff traffic?
>
> Regards,
> Oleg
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Posted by Oleg Dulin <ol...@gmail.com>.
Here is some more information.
I am running a full repair on one of the nodes and observing some strange behavior.
Both DCs were up during the data load, yet repair is reporting a lot of out-of-sync data. Why would that be? Is there a way for me to tell whether the WAN may be dropping hinted handoff traffic?
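One rough way to check, assuming the 1.1 storage layout: undelivered hints sit in the system keyspace's HintsColumnFamily, one row per target endpoint, so an approximate key count from that column family's MBean signals a backlog. The MBean path and the estimateKeys() operation are assumptions to verify against your version:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch: estimate how many endpoints a node still holds hints for.
    // Host name is a placeholder; verify the MBean name for your version.
    public class PendingHints {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://dc1-node:7199/jmxrmi");
            JMXConnector jmxc = JMXConnectorFactory.connect(url);
            try {
                MBeanServerConnection mbs = jmxc.getMBeanServerConnection();
                ObjectName hintsCf = new ObjectName(
                    "org.apache.cassandra.db:type=ColumnFamilies,"
                    + "keyspace=system,columnfamily=HintsColumnFamily");
                // estimateKeys() gives an approximate row count; each row
                // is keyed by an endpoint that still has undelivered hints.
                Long pending = (Long) mbs.invoke(
                    hintsCf, "estimateKeys", new Object[0], new String[0]);
                System.out.println("~endpoints with pending hints: " + pending);
            } finally {
                jmxc.close();
            }
        }
    }

A non-zero estimate on the DC1 nodes well after the load has finished would mean hints are still queued (or have expired) rather than delivered.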
Regards,
Oleg
--
Regards,
Oleg Dulin
http://www.olegdulin.com
Re: Unbalanced ring mystery multi-DC issue with 1.1.11
Posted by Oleg Dulin <ol...@gmail.com>.
Wanted to add one more thing:
I can also tell that the numbers are not consistent across DCs this way -- I have a column family with really wide rows (a couple of million columns).
DC1 reports higher column counts than DC2. DC2 only becomes consistent after I run the count a couple of times and trigger a read repair. But why would the nodetool repair logs show that everything is in sync?
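For what it's worth, that behavior matches consistency-level semantics: a count at a low consistency level can return a stale number while (with some probability, per read_repair_chance) also triggering a background read repair. A hedged sketch with the Thrift client of that era; the column family name and row key below are made up for illustration:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnParent;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.cassandra.thrift.SlicePredicate;
    import org.apache.cassandra.thrift.SliceRange;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TFramedTransport;
    import org.apache.thrift.transport.TSocket;

    // Sketch: count columns in one wide row at two consistency levels.
    // CL.ONE may read stale data (and can trigger read repair); CL.ALL
    // must consult every replica, so both DCs have to agree.
    public class WideRowCount {
        public static void main(String[] args) throws Exception {
            TFramedTransport transport =
                new TFramedTransport(new TSocket("dc2-node", 9160));
            transport.open();
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            client.set_keyspace("MyKeySpace");

            ByteBuffer rowKey = ByteBuffer.wrap("wide-row-1".getBytes("UTF-8"));
            ColumnParent parent = new ColumnParent("WideRows");
            // An unbounded slice; counting millions of columns this way is
            // expensive and only meant for spot checks.
            SlicePredicate all = new SlicePredicate().setSlice_range(
                new SliceRange(ByteBuffer.allocate(0), ByteBuffer.allocate(0),
                               false, Integer.MAX_VALUE));

            System.out.println("CL.ONE count: "
                + client.get_count(rowKey, parent, all, ConsistencyLevel.ONE));
            System.out.println("CL.ALL count: "
                + client.get_count(rowKey, parent, all, ConsistencyLevel.ALL));

            transport.close();
        }
    }

Comparing the two numbers (or running the CL.ONE count repeatedly and watching it converge) is a cheap way to see whether the DCs actually hold the same data, independent of what repair reports.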
Regards,
Oleg
--
Regards,
Oleg Dulin
http://www.olegdulin.com