Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2012/12/18 18:26:49 UTC
High disk read throughput on only one node.
Hi,
I am experiencing a strange issue in my C* 1.1.6, 3-node, RF=3 cluster.
root@ip-10-64-177-38:~# nodetool ring
Note: Ownership information does not include topology, please specify a keyspace.
Address DC Rack Status State Load Owns Token
141784319550391032739561396922763706368
10.64.167.32 eu-west 1b Up Normal 178.37 GB 33.33% 28356863910078203714492389662765613056
10.250.202.154 eu-west 1b Up Normal 174.93 GB 33.33% 85070591730234615865843651857942052863
10.64.177.38 eu-west 1b Up Normal 167.13 GB 33.33% 141784319550391032739561396922763706368

root@ip-10-64-177-38:~# nodetool ring cassa_teads
Address DC Rack Status State Load Effective-Ownership Token
141784319550391032739561396922763706368
10.64.167.32 eu-west 1b Up Normal 178.37 GB 100.00% 28356863910078203714492389662765613056
10.250.202.154 eu-west 1b Up Normal 174.93 GB 100.00% 85070591730234615865843651857942052863
10.64.177.38 eu-west 1b Up Normal 167.13 GB 100.00% 141784319550391032739561396922763706368
My cluster is well balanced and all the nodes have an identical configuration, yet I have a lot of disk reads on one of them, as you can see in these screenshots:
Datastax OpsCenter :
http://img4.imageshack.us/img4/2528/datastaxopscenterheighr.png
or
AWS console : http://img59.imageshack.us/img59/5223/ec2managementconsole.png
I have tried to see what is being read on each node with "inotifywatch -r -t300
/raid0 > inotifywatch5min" and got the following results:
root@ip-10-64-177-38:~# cat inotifywatch5min
total access close_nowrite open filename
234580 113280 60691 60609 /raid0/cassandra/data/cassa_teads/data_viewer/
56013 27108 14454 14451 /raid0/cassandra/data/cassa_teads/data_ip_viewer/
30748 14998 7884 7866 /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
301 147 76 78 /raid0/cassandra/data/cassa_teads/data_transac/
191 95 48 48 /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
6 0 3 3 /raid0/cassandra/
2 0 1 1 /raid0/cassandra/data/
2 0 1 1 /raid0/cassandra/commitlog/
2 0 1 1 /raid0/cassandra/saved_caches/
root@ip-10-250-202-154:~# cat inotifywatch5min
total access modify close_write close_nowrite open moved_from moved_to create delete filename
307378 115456 77706 12 57119 57035 /raid0/cassandra/data/cassa_teads/data_viewer/
55395 26878 0 0 14259 14258 /raid0/cassandra/data/cassa_teads/data_ip_viewer/
36155 17653 0 0 9256 9246 /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
7377 188 7153 6 4 11 /raid0/cassandra/data/cassa_teads/data_action/
401 0 364 6 4 12 /raid0/cassandra/data/cassa_teads/stats_ad_uv/
244 120 0 0 62 62 /raid0/cassandra/data/cassa_teads/data_transac/
160 76 0 0 42 42 /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
26 0 0 0 13 13 /raid0/cassandra/data/cassa_teads/
12 0 2 2 1 3 /raid0/cassandra/commitlog/
6 0 0 0 3 3 /raid0/cassandra/
2 0 0 0 1 1 /raid0/cassandra/data/
2 0 0 0 1 1 /raid0/cassandra/saved_caches/
root@ip-10-64-167-32:~# cat inotifywatch5min
total access modify close_write close_nowrite open filename
218489 105379 0 0 56615 56495 /raid0/cassandra/data/cassa_teads/data_viewer/
61553 29733 0 0 15911 15909 /raid0/cassandra/data/cassa_teads/data_ip_viewer/
33262 16214 0 0 8528 8520 /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
407 0 370 6 4 12 /raid0/cassandra/data/cassa_teads/stats_ad_uv/
280 134 0 0 73 73 /raid0/cassandra/data/cassa_teads/data_transac/
163 79 0 0 42 42 /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
17 0 3 3 1 4 /raid0/cassandra/commitlog/
8 0 0 0 4 4 /raid0/cassandra/data/cassa_teads/
6 0 0 0 3 3 /raid0/cassandra/
2 0 0 0 1 1 /raid0/cassandra/data/
2 0 0 0 1 1 /raid0/cassandra/saved_caches/
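(Note that inotifywatch only counts filesystem events, not bytes, so as a cross-check I can also sample how much the Cassandra process itself reads from disk. A rough sketch, assuming /proc/<pid>/io is available on these boxes and that the JVM can be found by matching "CassandraDaemon":

# sample disk read throughput of the Cassandra JVM over 60 seconds
PID=$(pgrep -f CassandraDaemon | head -n1)
R1=$(awk '/^read_bytes/ {print $2}' /proc/$PID/io)
sleep 60
R2=$(awk '/^read_bytes/ {print $2}' /proc/$PID/io)
echo "cassandra read MB/s: $(( (R2 - R1) / 1048576 / 60 ))"

Running the same thing on each node should show whether the extra reads really come from Cassandra or from something else on the box.)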
I see no reason for the md0 disk read throughput to be up to 8 times higher on
the .38 node...
root@ip-10-64-167-32:~# iostat -mx 5 100

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 0.40 0.20 0.40 0.00 0.00 18.67 0.00 6.67 6.67 0.40
xvdb 3.60 0.00 471.60 0.00 13.48 0.00 58.53 33.31 69.58 1.79 84.20
xvdc 3.40 0.00 474.40 0.00 13.76 0.00 59.42 30.06 62.63 1.72 81.60
xvdd 2.60 0.00 478.20 0.00 13.69 0.00 58.61 28.74 61.74 1.76 84.40
xvde 3.40 0.00 472.40 0.00 13.34 0.00 57.82 23.08 47.96 1.74 82.40
md0 0.00 0.00 1954.20 0.00 55.57 0.00 58.24 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
11.95 11.88 11.54 48.33 0.20 16.10

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 2.40 0.40 1.00 0.00 0.01 25.14 0.15 110.00 55.71 7.80
xvdb 3.40 0.00 481.60 63.20 13.94 2.59 62.13 33.79 62.86 1.56 84.80
xvdc 3.60 0.00 469.20 64.40 13.43 2.68 61.81 40.35 76.23 1.54 82.40
xvdd 6.60 0.20 480.80 64.40 13.93 2.65 62.28 45.99 84.72 1.63 88.60
xvde 6.60 0.00 482.00 65.60 14.20 2.68 63.14 48.76 89.77 1.61 88.00
md0 0.00 0.00 1910.80 257.80 54.92 10.60 61.88 0.00 0.00 0.00 0.00

avg-cpu: %user %nice %system %iowait %steal %idle
12.41 12.82 10.93 36.15 0.21 27.48

Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
xvdap1 0.00 0.60 2.40 0.60 0.03 0.00 23.47 0.18 58.67 24.67 7.40
xvdb 3.80 0.00 429.20 4.80 12.41 0.20 59.48 16.44 37.94 1.70 73.80
xvdc 3.00 0.00 432.60 4.80 12.47 0.20 59.33 20.74 47.44 1.71 75.00
xvdd 2.20 0.00 439.60 4.80 12.56 0.20 58.80 17.57 39.44 1.78 79.20
xvde 2.00 0.00 449.20 3.60 12.84 0.15 58.73 15.96 35.29 1.68 76.00
md0 0.00 0.00 1744.40 18.00 49.63 0.75 58.55 0.00 0.00 0.00 0.00
Has anyone experienced something similar?
How can I dig further to see what is wrong?
Any idea what in C* could be responsible for this kind of targeted reads?
Alain
Re: High disk read throughput on only one node.
Posted by Alain RODRIGUEZ <ar...@gmail.com>.
It looks like nobody has run into this kind of trouble before or even has a
clue about it.
Under heavy load this creates high latency (because of iowait) in my app in
production, and we can't handle it much longer. If there is nothing new in the
next few days I think I'll drop this node and replace it, hoping this will
fix my issue...
I'm waiting a bit more because I'm hoping we will find out what the issue is,
and that this will help the C* community.
Re: High disk read throughput on only one node.
Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"routing more traffic to it?"
So shouldn't I see more "network in" on that node in the AWS console?
It seems that each node is receiving and sending an equal amount of data.
What value should I use for dynamic_snitch_badness_threshold to give it a
try?
Re: High disk read throughput on only one node.
Posted by Bryan Talbot <bt...@aeriagames.com>.
Oh, you're on ec2. Maybe the dynamic snitch is detecting that one node is
performing better than the others so is routing more traffic to it?
http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold
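For reference, the knob lives in cassandra.yaml on each node; the value below is just the 1.1 default, not a recommendation, and a node restart is needed for a change to take effect:

# cassandra.yaml
# How much worse (as a fraction) a replica's latency score must be before the
# dynamic snitch stops preferring the statically "closest" replica for reads.
dynamic_snitch_badness_threshold: 0.1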
-Bryan
--
Bryan Talbot
Architect / Platform team lead, Aeria Games and Entertainment
Silicon Valley | Berlin | Tokyo | Sao Paulo
Re: High disk read throughput on only one node.
Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Aaron
"Is there a sustained difference or did it settle back ? "
Sustained, clearly. During the day all nodes read at about 6 MB/s while this
one reads at 30-40 MB/s. At night, while the others read 2 MB/s, the "broken"
node reads at 8-10 MB/s.
"Could this have been compaction or repair or upgrade tables working ? "
That was my first thought, but definitely not: this occurs continuously.
"Do the read / write counts available in nodetool cfstats show anything
different ? "
cfstats shows different counts (a lot fewer reads/writes for the "bad" node),
but the nodes didn't join the ring at the same time. I've attached the cfstats
just in case it helps somehow.
Node 38: http://pastebin.com/ViS1MR8d (bad one)
Node 32: http://pastebin.com/MrSTHH9F
Node 154: http://pastebin.com/7p0Usvwd
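Since those counters are cumulative from the time each node started, a fairer comparison is probably to diff two samples taken a few minutes apart on each node, something like:

nodetool cfstats | grep -E 'Column Family:|Read Count:|Write Count:' > cfstats.before
sleep 300
nodetool cfstats | grep -E 'Column Family:|Read Count:|Write Count:' > cfstats.after
diff cfstats.before cfstats.after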
@Bryan
"clients always connect to that server"
I didn't include it in the screenshot from the AWS console, but AWS reports an
(almost) equal "network in" across the nodes (same for network out and CPU).
The CPU load is a lot higher on the broken node, as shown by OpsCenter, but
that's due to the high iowait.
Re: High disk read throughput on only one node.
Posted by Bryan Talbot <bt...@aeriagames.com>.
Or maybe the clients always connect to that server, which can satisfy all
reads since they have 3 nodes with RF=3.
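An easy way to check would be to count established client connections on each node; assuming the default Thrift rpc_port of 9160, something like:

# rough count of client connections to the Thrift port on this node
netstat -tn | grep ':9160' | grep -c ESTABLISHED

If one node has far more connections than the others, the clients are probably pinned to it.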
-Bryan
> On 19/12/2012, at 6:26 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>
> Hi,
>
> I am experimenting a strange issue in my C* 1.1.6, 3 node, RF=3 cluster.
>
> root@ip-10-64-177-38:~# nodetool ring
> Note: Ownership information does not include topology, please specify a
> keyspace.
> Address DC Rack Status State Load
> Owns Token
>
> 141784319550391032739561396922763706368
> 10.64.167.32 eu-west 1b Up Normal 178.37 GB
> 33.33% 28356863910078203714492389662765613056
> 10.250.202.154 eu-west 1b Up Normal 174.93 GB
> 33.33% 85070591730234615865843651857942052863
> 10.64.177.38 eu-west 1b Up Normal 167.13 GB
> 33.33% 141784319550391032739561396922763706368
>
> root@ip-10-64-177-38:~# nodetool ring cassa_teads
> Address DC Rack Status State Load
> Effective-Ownership Token
>
> 141784319550391032739561396922763706368
> 10.64.167.32 eu-west 1b Up Normal 178.37 GB
> 100.00% 28356863910078203714492389662765613056
> 10.250.202.154 eu-west 1b Up Normal 174.93 GB
> 100.00% 85070591730234615865843651857942052863
> 10.64.177.38 eu-west 1b Up Normal 167.13 GB
> 100.00% 141784319550391032739561396922763706368
>
> My cluster is well balanced, all the nodes have an identical
> configuration, but yet I have a lot of disk reads on one of them as you can
> see in these screenshots:
>
> Datastax OpsCenter :
> http://img4.imageshack.us/img4/2528/datastaxopscenterheighr.png
> or
> AWS console :
> http://img59.imageshack.us/img59/5223/ec2managementconsole.png
>
> I have tried to see what is read from any nodes with "inotifywatch -r
> -t300 /raid0 > inotifywatch5min" and get the following result:
>
> root@ip-10-64-177-38:~# cat inotifywatch5min
> total access close_nowrite open filename
> 234580 113280 60691 60609
> /raid0/cassandra/data/cassa_teads/data_viewer/
> 56013 27108 14454 14451
> /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 30748 14998 7884 7866
> /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 301 147 76 78
> /raid0/cassandra/data/cassa_teads/data_transac/
> 191 95 48 48
> /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 6 0 3 3 /raid0/cassandra/
> 2 0 1 1
> /raid0/cassandra/data/
> 2 0 1 1
> /raid0/cassandra/commitlog/
> 2 0 1 1
> /raid0/cassandra/saved_caches/
>
>
> root@ip-10-250-202-154:~# cat inotifywatch5min
> total access modify close_write close_nowrite open moved_from
> moved_to create delete filename
> 307378 115456 77706 12 57119 57035
> /raid0/cassandra/data/cassa_teads/data_viewer/
> 55395 26878 0 0 14259 14258
> /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 36155 17653 0 0 9256 9246
> /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 7377 188 7153 6 4 11
> /raid0/cassandra/data/cassa_teads/data_action/
> 401 0 364 6 4 12
> /raid0/cassandra/data/cassa_teads/stats_ad_uv/
> 244 120 0 0 62 62
> /raid0/cassandra/data/cassa_teads/data_transac/
> 160 76 0 0 42 42
> /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 26 0 0 0 13
> 13 /raid0/cassandra/data/cassa_teads/
> 12 0 2 2 1
> 3 /raid0/cassandra/commitlog/
> 6 0 0 0 3
> 3 /raid0/cassandra/
> 2 0 0 0 1
> 1 /raid0/cassandra/data/
> 2 0 0 0 1
> 1 /raid0/cassandra/saved_caches/
>
>
> root@ip-10-64-167-32:~# cat inotifywatch5min
> total access modify close_write close_nowrite open filename
> 218489 105379 0 0 56615 56495
> /raid0/cassandra/data/cassa_teads/data_viewer/
> 61553 29733 0 0 15911 15909
> /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 33262 16214 0 0 8528 8520
> /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 407 0 370 6 4 12
> /raid0/cassandra/data/cassa_teads/stats_ad_uv/
> 280 134 0 0 73 73
> /raid0/cassandra/data/cassa_teads/data_transac/
> 163 79 0 0 42 42
> /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 17 0 3 3 1
> 4 /raid0/cassandra/commitlog/
> 8 0 0 0 4
> 4 /raid0/cassandra/data/cassa_teads/
> 6 0 0 0 3
> 3 /raid0/cassandra/
> 2 0 0 0 1
> 1 /raid0/cassandra/data/
> 2 0 0 0 1
> 1 /raid0/cassandra/saved_caches/
>
> I see no reason to have a md0 disk read throughput up to 8 time bigger on
> the .38 node...
>
> root@ip-10-64-167-32:~# iostat -mx 5 100
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
> avgqu-sz await svctm %util
> xvdap1 0.00 0.40 0.20 0.40 0.00 0.00 18.67
> 0.00 6.67 6.67 0.40
> xvdb 3.60 0.00 471.60 0.00 13.48 0.00 58.53
> 33.31 69.58 1.79 84.20
> xvdc 3.40 0.00 474.40 0.00 13.76 0.00 59.42
> 30.06 62.63 1.72 81.60
> xvdd 2.60 0.00 478.20 0.00 13.69 0.00 58.61
> 28.74 61.74 1.76 84.40
> xvde 3.40 0.00 472.40 0.00 13.34 0.00 57.82
> 23.08 47.96 1.74 82.40
> md0 0.00 0.00 1954.20 0.00 55.57 0.00 58.24
> 0.00 0.00 0.00 0.00
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 11.95 11.88 11.54 48.33 0.20 16.10
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
> avgqu-sz await svctm %util
> xvdap1 0.00 2.40 0.40 1.00 0.00 0.01 25.14
> 0.15 110.00 55.71 7.80
> xvdb 3.40 0.00 481.60 63.20 13.94 2.59 62.13
> 33.79 62.86 1.56 84.80
> xvdc 3.60 0.00 469.20 64.40 13.43 2.68 61.81
> 40.35 76.23 1.54 82.40
> xvdd 6.60 0.20 480.80 64.40 13.93 2.65 62.28
> 45.99 84.72 1.63 88.60
> xvde 6.60 0.00 482.00 65.60 14.20 2.68 63.14
> 48.76 89.77 1.61 88.00
> md0 0.00 0.00 1910.80 257.80 54.92 10.60 61.88
> 0.00 0.00 0.00 0.00
>
> avg-cpu: %user %nice %system %iowait %steal %idle
> 12.41 12.82 10.93 36.15 0.21 27.48
>
> Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz
> avgqu-sz await svctm %util
> xvdap1 0.00 0.60 2.40 0.60 0.03 0.00 23.47
> 0.18 58.67 24.67 7.40
> xvdb 3.80 0.00 429.20 4.80 12.41 0.20 59.48
> 16.44 37.94 1.70 73.80
> xvdc 3.00 0.00 432.60 4.80 12.47 0.20 59.33
> 20.74 47.44 1.71 75.00
> xvdd 2.20 0.00 439.60 4.80 12.56 0.20 58.80
> 17.57 39.44 1.78 79.20
> xvde 2.00 0.00 449.20 3.60 12.84 0.15 58.73
> 15.96 35.29 1.68 76.00
> md0 0.00 0.00 1744.40 18.00 49.63 0.75 58.55
> 0.00 0.00 0.00 0.00
>
> Did anyone experiments something similar ?
>
> How can I dig further to see what is wrong ?
>
> Any idea on what on C* can be responsible of this king of targeted reads ?
>
> Alain
>
>
>
Re: High disk read throughput on only one node.
Posted by aaron morton <aa...@thelastpickle.com>.
Is there a sustained difference or did it settle back ?
Could this have been compaction or repair or upgrade tables working ?
Do the read / write counts available in nodetool cfstats show anything different ?
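Something like this, run on the busy node, would show whether any of those are going on right now (just a sketch):

nodetool compactionstats   # pending / active compactions (includes upgradesstables)
nodetool netstats          # streaming activity from a repair or a move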
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand
@aaronmorton
http://www.thelastpickle.com