Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2012/12/18 18:26:49 UTC

High disk read throughput on only one node.

Hi,

I am experiencing a strange issue in my C* 1.1.6, 3 node, RF=3 cluster.

root@ip-10-64-177-38:~# nodetool ring
Note: Ownership information does not include topology, please specify a
keyspace.
Address         DC          Rack        Status State   Load            Owns                Token
                                                                                           141784319550391032739561396922763706368
10.64.167.32    eu-west     1b          Up     Normal  178.37 GB       33.33%              28356863910078203714492389662765613056
10.250.202.154  eu-west     1b          Up     Normal  174.93 GB       33.33%              85070591730234615865843651857942052863
10.64.177.38    eu-west     1b          Up     Normal  167.13 GB       33.33%              141784319550391032739561396922763706368

root@ip-10-64-177-38:~# nodetool ring cassa_teads
Address         DC          Rack        Status State   Load            Effective-Ownership Token
                                                                                           141784319550391032739561396922763706368
10.64.167.32    eu-west     1b          Up     Normal  178.37 GB       100.00%             28356863910078203714492389662765613056
10.250.202.154  eu-west     1b          Up     Normal  174.93 GB       100.00%             85070591730234615865843651857942052863
10.64.177.38    eu-west     1b          Up     Normal  167.13 GB       100.00%             141784319550391032739561396922763706368

My cluster is well balanced and all the nodes have an identical configuration,
yet one of them shows far more disk reads than the others, as you can see in
these screenshots:

Datastax OpsCenter :
http://img4.imageshack.us/img4/2528/datastaxopscenterheighr.png
or
AWS console : http://img59.imageshack.us/img59/5223/ec2managementconsole.png

I tried to see what is being read on each node with "inotifywatch -r -t300
/raid0 > inotifywatch5min" and got the following results:

root@ip-10-64-177-38:~# cat inotifywatch5min
total       access  close_nowrite  open      filename
234580   113280  60691             60609    /raid0/cassandra/data/cassa_teads/data_viewer/
56013      27108   14454            14451     /raid0/cassandra/data/cassa_teads/data_ip_viewer/
30748      14998   7884              7866      /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
301          147      76                  78         /raid0/cassandra/data/cassa_teads/data_transac/
191           95       48                  48         /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
6               0        3                    3           /raid0/cassandra/
2               0        1                    1           /raid0/cassandra/data/
2               0        1                    1           /raid0/cassandra/commitlog/
2               0        1                    1           /raid0/cassandra/saved_caches/


root@ip-10-250-202-154:~# cat inotifywatch5min
total       access  modify  close_write  close_nowrite  open  moved_from  moved_to  create  delete  filename
307378  115456  77706   12                57119             57035        /raid0/cassandra/data/cassa_teads/data_viewer/
55395    26878   0           0                 14259             14258       /raid0/cassandra/data/cassa_teads/data_ip_viewer/
36155    17653   0           0                 9256               9246        /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
7377      188      7153      6                 4                    11           /raid0/cassandra/data/cassa_teads/data_action/
401        0         364        6                 4                    12           /raid0/cassandra/data/cassa_teads/stats_ad_uv/
244        120      0           0                 62                   62          /raid0/cassandra/data/cassa_teads/data_transac/
160        76        0           0                 42                   42         /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
26          0         0           0                 13                   13          /raid0/cassandra/data/cassa_teads/
12          0         2           2                 1                     3           /raid0/cassandra/commitlog/
6            0         0           0                 3                     3          /raid0/cassandra/
2            0         0           0                 1                     1          /raid0/cassandra/data/
2            0         0           0                 1                     1          /raid0/cassandra/saved_caches/


root@ip-10-64-167-32:~# cat inotifywatch5min
total       access  modify  close_write  close_nowrite  open    filename
218489   105379  0            0                  56615         56495  /raid0/cassandra/data/cassa_teads/data_viewer/
61553     29733   0             0                  15911        15909  /raid0/cassandra/data/cassa_teads/data_ip_viewer/
33262     16214   0             0                  8528          8520    /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
407         0         370          6                  4               12        /raid0/cassandra/data/cassa_teads/stats_ad_uv/
280         134      0             0                  73              73       /raid0/cassandra/data/cassa_teads/data_transac/
163          79       0             0                  42              42       /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
17            0        3             3                  1                 4        /raid0/cassandra/commitlog/
8             0        0              0                  4                 4        /raid0/cassandra/data/cassa_teads/
6             0        0              0                  3                 3        /raid0/cassandra/
2             0        0              0                  1                 1        /raid0/cassandra/data/
2             0        0              0                  1                 1        /raid0/cassandra/saved_caches/
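
As a side note, the same watch can be narrowed to read-type events only and
pointed at the data directory, which keeps the report shorter. This is just a
variant of the command above (inotify-tools lets you pick the events to count
with -e):

inotifywatch -r -t 300 -e access -e open /raid0/cassandra/data > inotifywatch5min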

I see no reason for the md0 disk read throughput to be up to 8 times higher
on the .38 node...

root@ip-10-64-167-32:~# iostat -mx 5 100

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.40    0.20    0.40     0.00     0.00    18.67     0.00    6.67   6.67   0.40
xvdb              3.60     0.00  471.60    0.00    13.48     0.00    58.53    33.31   69.58   1.79  84.20
xvdc              3.40     0.00  474.40    0.00    13.76     0.00    59.42    30.06   62.63   1.72  81.60
xvdd              2.60     0.00  478.20    0.00    13.69     0.00    58.61    28.74   61.74   1.76  84.40
xvde              3.40     0.00  472.40    0.00    13.34     0.00    57.82    23.08   47.96   1.74  82.40
md0               0.00     0.00 1954.20    0.00    55.57     0.00    58.24     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          11.95   11.88   11.54   48.33    0.20   16.10

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     2.40    0.40    1.00     0.00     0.01    25.14     0.15  110.00  55.71   7.80
xvdb              3.40     0.00  481.60   63.20    13.94     2.59    62.13    33.79   62.86   1.56  84.80
xvdc              3.60     0.00  469.20   64.40    13.43     2.68    61.81    40.35   76.23   1.54  82.40
xvdd              6.60     0.20  480.80   64.40    13.93     2.65    62.28    45.99   84.72   1.63  88.60
xvde              6.60     0.00  482.00   65.60    14.20     2.68    63.14    48.76   89.77   1.61  88.00
md0               0.00     0.00 1910.80  257.80    54.92    10.60    61.88     0.00    0.00   0.00   0.00

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          12.41   12.82   10.93   36.15    0.21   27.48

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdap1            0.00     0.60    2.40    0.60     0.03     0.00    23.47     0.18   58.67  24.67   7.40
xvdb              3.80     0.00  429.20    4.80    12.41     0.20    59.48    16.44   37.94   1.70  73.80
xvdc              3.00     0.00  432.60    4.80    12.47     0.20    59.33    20.74   47.44   1.71  75.00
xvdd              2.20     0.00  439.60    4.80    12.56     0.20    58.80    17.57   39.44   1.78  79.20
xvde              2.00     0.00  449.20    3.60    12.84     0.15    58.73    15.96   35.29   1.68  76.00
md0               0.00     0.00 1744.40   18.00    49.63     0.75    58.55     0.00    0.00   0.00   0.00
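
For a quicker side-by-side view, something like the following prints the md0
line from every node at once (it assumes passwordless ssh between the nodes;
the IPs are the ones from nodetool ring above):

for h in 10.64.167.32 10.250.202.154 10.64.177.38; do echo "== $h"; ssh $h 'iostat -dmx 5 2 | grep md0'; done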

Has anyone experienced something similar?

How can I dig further to see what is wrong?

Any idea what in C* could be responsible for this kind of targeted reads?

Alain

Re: High disk read throughput on only one node.

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
It looks like nobody has experienced this kind of trouble before, or even
has a clue about it.

Under heavy load this creates high latency (because of iowait) in my app in
production and we can't handle it much longer. If there is nothing new in the
next few days I think I'll drop this node and replace it, hoping this will
fix my issue...
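
In the meantime, to double-check that these reads really come from the
Cassandra process and not something else on the box, a per-process view
should show it (iotop and sysstat's pidstat are optional installs, so this
assumes they are present):

iotop -b -o -d 5 -n 3     # batch mode, only processes actually doing I/O
pidstat -d 5 3            # per-process read/write in kB/s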

I'll wait a bit longer because I am hoping we will find out what the issue
is, and that this will help the C* community.




2012/12/20 Alain RODRIGUEZ <ar...@gmail.com>

> "routing more traffic to it?"
>
> So shouldn't I see more "network in" on that node in the AWS console ?
>
> It seems that each node is receiving and sending an equal amount of data.
>
> What value should I use for dynamic-snitch-badness-threshold to give it a
> try ?
> On 20 Dec 2012 at 00:37, "Bryan Talbot" <bt...@aeriagames.com> wrote:
>
> Oh, you're on ec2.  Maybe the dynamic snitch is detecting that one node is
>> performing better than the others so is routing more traffic to it?
>>
>>
>> http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold
>>
>> -Bryan
>>
>>
>>
>>
>> On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>>
>>> @Aaron
>>> "Is there a sustained difference or did it settle back ? "
>>>
>>> Sustained, clearly. During the day all nodes read at about 6MB/s while
>>> this one reads at 30-40 MB/s. At night, while the others read 2MB/s, the
>>> "broken" node reads at 8-10MB/s.
>>>
>>> "Could this have been compaction or repair or upgrade tables working ? "
>>>
>>> That was my first thought, but definitely not: this occurs continuously.
>>>
>>> "Do the read / write counts available in nodetool cfstats show anything
>>> different ? "
>>>
>>> cfstats shows different counts (a lot fewer reads/writes for the "bad"
>>> node), but the nodes didn't join the ring at the same time. I've included
>>> the cfstats below just in case they help somehow.
>>>
>>> Node  38: http://pastebin.com/ViS1MR8d (bad one)
>>> Node  32: http://pastebin.com/MrSTHH9F
>>> Node 154: http://pastebin.com/7p0Usvwd
>>>
>>> @Bryan
>>>
>>>  "clients always connect to that server"
>>>
>>> I didn't include it in the screenshot from the AWS console, but AWS
>>> reports an (almost) equal network in across the nodes (same for network
>>> out and CPU). The CPU load is a lot higher on the broken node, as shown
>>> by OpsCenter, but that's due to the high iowait...
>>>
>>
>>
>>
>> --
>> Bryan Talbot
>> Architect / Platform team lead, Aeria Games and Entertainment
>> Silicon Valley | Berlin | Tokyo | Sao Paulo
>>
>>
>>

Re: High disk read throughput on only one node.

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"routing more traffic to it?"

So shouldn't I see more "network in" on that node in the AWS console ?

It seems that each node is receiving and sending an equal amount of data.
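
For what it's worth, the interface counters can also be compared directly on
each node instead of through the console (sar comes with sysstat, and eth0 is
just an assumption about the interface name):

sar -n DEV 60 1 | grep eth0
cat /proc/net/dev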

What value should I use for dynamic-snitch-badness-threshold to give it a
try ?
On 20 Dec 2012 at 00:37, "Bryan Talbot" <bt...@aeriagames.com> wrote:

> Oh, you're on ec2.  Maybe the dynamic snitch is detecting that one node is
> performing better than the others so is routing more traffic to it?
>
>
> http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold
>
> -Bryan
>
>
>
>
> On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:
>
>> @Aaron
>> "Is there a sustained difference or did it settle back ? "
>>
>> Sustained, clearly. During the day all nodes read at about 6MB/s while
>> this one reads at 30-40 MB/s. At night, while the others read 2MB/s, the
>> "broken" node reads at 8-10MB/s.
>>
>> "Could this have been compaction or repair or upgrade tables working ? "
>>
>> That was my first thought, but definitely not: this occurs continuously.
>>
>> "Do the read / write counts available in nodetool cfstats show anything
>> different ? "
>>
>> cfstats shows different counts (a lot fewer reads/writes for the "bad"
>> node), but the nodes didn't join the ring at the same time. I've included
>> the cfstats below just in case they help somehow.
>>
>> Node  38: http://pastebin.com/ViS1MR8d (bad one)
>> Node  32: http://pastebin.com/MrSTHH9F
>> Node 154: http://pastebin.com/7p0Usvwd
>>
>> @Bryan
>>
>>  "clients always connect to that server"
>>
>> I didn't include it in the screenshot from the AWS console, but AWS
>> reports an (almost) equal network in across the nodes (same for network
>> out and CPU). The CPU load is a lot higher on the broken node, as shown
>> by OpsCenter, but that's due to the high iowait...
>>
>
>
>
> --
> Bryan Talbot
> Architect / Platform team lead, Aeria Games and Entertainment
> Silicon Valley | Berlin | Tokyo | Sao Paulo
>
>
>

Re: High disk read throughput on only one node.

Posted by Bryan Talbot <bt...@aeriagames.com>.
Oh, you're on ec2.  Maybe the dynamic snitch is detecting that one node is
performing better than the others so is routing more traffic to it?

http://www.datastax.com/docs/1.1/configuration/node_configuration#dynamic-snitch-badness-threshold
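
The knob lives in cassandra.yaml (note the underscores in the actual setting
name). As a rough sketch, with 0.2 being only an example value to experiment
with:

dynamic_snitch_badness_threshold: 0.2

With 0 the dynamic snitch always routes to whichever replica currently looks
fastest; a value greater than 0 keeps requests pinned to the "natural"
replica unless it is performing that much (20% here) worse than the best one,
so raising it makes the snitch less aggressive about routing around nodes.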

-Bryan




On Wed, Dec 19, 2012 at 2:30 PM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> @Aaron
> "Is there a sustained difference or did it settle back ? "
>
> Sustained, clearly. During the day all nodes read at about 6MB/s while
> this one reads at 30-40 MB/s. At night, while the others read 2MB/s, the
> "broken" node reads at 8-10MB/s.
>
> "Could this have been compaction or repair or upgrade tables working ? "
>
> That was my first thought, but definitely not: this occurs continuously.
>
> "Do the read / write counts available in nodetool cfstats show anything
> different ? "
>
> cfstats shows different counts (a lot fewer reads/writes for the "bad"
> node), but the nodes didn't join the ring at the same time. I've included
> the cfstats below just in case they help somehow.
>
> Node  38: http://pastebin.com/ViS1MR8d (bad one)
> Node  32: http://pastebin.com/MrSTHH9F
> Node 154: http://pastebin.com/7p0Usvwd
>
> @Bryan
>
>  "clients always connect to that server"
>
> I didn't include it in the screenshot from the AWS console, but AWS
> reports an (almost) equal network in across the nodes (same for network
> out and CPU). The CPU load is a lot higher on the broken node, as shown
> by OpsCenter, but that's due to the high iowait...
>



-- 
Bryan Talbot
Architect / Platform team lead, Aeria Games and Entertainment
Silicon Valley | Berlin | Tokyo | Sao Paulo

Re: High disk read throughput on only one node.

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
@Aaron
"Is there a sustained difference or did it settle back ? "

Sustained, clearly. During the day all nodes read at about 6MB/s while this
one reads at 30-40 MB/s. At night, while the others read 2MB/s, the "broken"
node reads at 8-10MB/s.

"Could this have been compaction or repair or upgrade tables working ? "

That was my first thought, but definitely not: this occurs continuously.

"Do the read / write counts available in nodetool cfstats show anything
different ? "

cfstats shows different counts (a lot fewer reads/writes for the "bad"
node), but the nodes didn't join the ring at the same time. I've included
the cfstats below just in case they help somehow.

Node  38: http://pastebin.com/ViS1MR8d (bad one)
Node  32: http://pastebin.com/MrSTHH9F
Node 154: http://pastebin.com/7p0Usvwd

@Bryan

"clients always connect to that server"

I didn't include it in the screenshot from the AWS console, but AWS reports
an (almost) equal network in across the nodes (same for network out and CPU).
The CPU load is a lot higher on the broken node, as shown by OpsCenter, but
that's due to the high iowait...

Re: High disk read throughput on only one node.

Posted by Bryan Talbot <bt...@aeriagames.com>.
Or maybe the clients always connect to that server which can satisfy all
reads.  They have 3 nodes with RF=3.
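
A quick way to check would be to count established client connections to the
Thrift port on each node (9160 is the default rpc_port; adjust if yours
differs):

netstat -tn | grep ':9160' | grep ESTABLISHED | wc -l

If the .38 node shows far more connections than the other two, the clients
are probably not balancing across the cluster.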

-Bryan


On Wed, Dec 19, 2012 at 12:15 PM, aaron morton <aa...@thelastpickle.com> wrote:

> Is there a sustained difference or did it settle back ?
> Could this have been compaction or repair or upgrade tables working ?
>
> Do the read / write counts available in nodetool cfstats show anything
> different ?
>
> Cheers
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com

Re: High disk read throughput on only one node.

Posted by aaron morton <aa...@thelastpickle.com>.
Is there a sustained difference or did it settle back ? 
Could this have been compaction or repair or upgrade tables working ? 

Do the read / write counts available in nodetool cfstats show anything different ? 
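
For example, something along these lines on each node would make the per-CF
numbers easy to compare (just a grep over the cfstats output; the label names
are the ones printed by 1.1):

nodetool -h localhost cfstats | grep -E 'Column Family:|Read Count:|Write Count:'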

Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 19/12/2012, at 6:26 AM, Alain RODRIGUEZ <ar...@gmail.com> wrote:

> Hi,
> 
> I am experiencing a strange issue in my C* 1.1.6, 3 node, RF=3 cluster.
> 
> root@ip-10-64-177-38:~# nodetool ring
> Note: Ownership information does not include topology, please specify a keyspace.
> Address         DC          Rack        Status State   Load            Owns                Token
>                                                                                            141784319550391032739561396922763706368
> 10.64.167.32    eu-west     1b          Up     Normal  178.37 GB       33.33%              28356863910078203714492389662765613056
> 10.250.202.154  eu-west     1b          Up     Normal  174.93 GB       33.33%              85070591730234615865843651857942052863
> 10.64.177.38    eu-west     1b          Up     Normal  167.13 GB       33.33%              141784319550391032739561396922763706368
> 
> root@ip-10-64-177-38:~# nodetool ring cassa_teads
> Address         DC          Rack        Status State   Load            Effective-Ownership Token
>                                                                                            141784319550391032739561396922763706368
> 10.64.167.32    eu-west     1b          Up     Normal  178.37 GB       100.00%             28356863910078203714492389662765613056
> 10.250.202.154  eu-west     1b          Up     Normal  174.93 GB       100.00%             85070591730234615865843651857942052863
> 10.64.177.38    eu-west     1b          Up     Normal  167.13 GB       100.00%             141784319550391032739561396922763706368
> 
> My cluster is well balanced and all the nodes have an identical configuration, yet one of them shows far more disk reads than the others, as you can see in these screenshots:
> 
> Datastax OpsCenter : http://img4.imageshack.us/img4/2528/datastaxopscenterheighr.png
> or
> AWS console : http://img59.imageshack.us/img59/5223/ec2managementconsole.png
> 
> I tried to see what is being read on each node with "inotifywatch -r -t300 /raid0 > inotifywatch5min" and got the following results:
> 
> root@ip-10-64-177-38:~# cat inotifywatch5min
> total       access  close_nowrite  open      filename
> 234580   113280  60691             60609    /raid0/cassandra/data/cassa_teads/data_viewer/
> 56013      27108   14454            14451     /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 30748      14998   7884              7866      /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 301          147      76                  78         /raid0/cassandra/data/cassa_teads/data_transac/
> 191           95       48                  48         /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 6               0        3                    3           /raid0/cassandra/
> 2               0        1                    1           /raid0/cassandra/data/
> 2               0        1                    1           /raid0/cassandra/commitlog/
> 2               0        1                    1           /raid0/cassandra/saved_caches/
> 
> 
> root@ip-10-250-202-154:~# cat inotifywatch5min
> total       access  modify  close_write  close_nowrite  open  moved_from  moved_to  create  delete  filename
> 307378  115456  77706   12                57119             57035        /raid0/cassandra/data/cassa_teads/data_viewer/
> 55395    26878   0           0                 14259             14258       /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 36155    17653   0           0                 9256               9246        /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 7377      188      7153      6                 4                    11           /raid0/cassandra/data/cassa_teads/data_action/
> 401        0         364        6                 4                    12           /raid0/cassandra/data/cassa_teads/stats_ad_uv/
> 244        120      0           0                 62                   62          /raid0/cassandra/data/cassa_teads/data_transac/
> 160        76        0           0                 42                   42         /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 26          0         0           0                 13                   13          /raid0/cassandra/data/cassa_teads/
> 12          0         2           2                 1                     3           /raid0/cassandra/commitlog/
> 6            0         0           0                 3                     3          /raid0/cassandra/
> 2            0         0           0                 1                     1          /raid0/cassandra/data/
> 2            0         0           0                 1                     1          /raid0/cassandra/saved_caches/
> 
> 
> root@ip-10-64-167-32:~# cat inotifywatch5min
> total       access  modify  close_write  close_nowrite  open    filename
> 218489   105379  0            0                  56615         56495  /raid0/cassandra/data/cassa_teads/data_viewer/
> 61553     29733   0             0                  15911        15909  /raid0/cassandra/data/cassa_teads/data_ip_viewer/
> 33262     16214   0             0                  8528          8520    /raid0/cassandra/data/cassa_teads/algo_ad_newcapping/
> 407         0         370          6                  4               12        /raid0/cassandra/data/cassa_teads/stats_ad_uv/
> 280         134      0             0                  73              73       /raid0/cassandra/data/cassa_teads/data_transac/
> 163          79       0             0                  42              42       /raid0/cassandra/data/cassa_teads/data_cust_website_viewer/
> 17            0        3             3                  1                 4        /raid0/cassandra/commitlog/
> 8             0        0              0                  4                 4        /raid0/cassandra/data/cassa_teads/
> 6             0        0              0                  3                 3        /raid0/cassandra/
> 2             0        0              0                  1                 1        /raid0/cassandra/data/
> 2             0        0              0                  1                 1        /raid0/cassandra/saved_caches/
> 
> I see no reason for the md0 disk read throughput to be up to 8 times higher on the .38 node...
> 
> root@ip-10-64-167-32:~# iostat -mx 5 100
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> xvdap1            0.00     0.40    0.20    0.40     0.00     0.00    18.67     0.00    6.67   6.67   0.40
> xvdb              3.60     0.00  471.60    0.00    13.48     0.00    58.53    33.31   69.58   1.79  84.20
> xvdc              3.40     0.00  474.40    0.00    13.76     0.00    59.42    30.06   62.63   1.72  81.60
> xvdd              2.60     0.00  478.20    0.00    13.69     0.00    58.61    28.74   61.74   1.76  84.40
> xvde              3.40     0.00  472.40    0.00    13.34     0.00    57.82    23.08   47.96   1.74  82.40
> md0               0.00     0.00 1954.20    0.00    55.57     0.00    58.24     0.00    0.00   0.00   0.00
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           11.95   11.88   11.54   48.33    0.20   16.10
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> xvdap1            0.00     2.40    0.40    1.00     0.00     0.01    25.14     0.15  110.00  55.71   7.80
> xvdb              3.40     0.00  481.60   63.20    13.94     2.59    62.13    33.79   62.86   1.56  84.80
> xvdc              3.60     0.00  469.20   64.40    13.43     2.68    61.81    40.35   76.23   1.54  82.40
> xvdd              6.60     0.20  480.80   64.40    13.93     2.65    62.28    45.99   84.72   1.63  88.60
> xvde              6.60     0.00  482.00   65.60    14.20     2.68    63.14    48.76   89.77   1.61  88.00
> md0               0.00     0.00 1910.80  257.80    54.92    10.60    61.88     0.00    0.00   0.00   0.00
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           12.41   12.82   10.93   36.15    0.21   27.48
> 
> Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> xvdap1            0.00     0.60    2.40    0.60     0.03     0.00    23.47     0.18   58.67  24.67   7.40
> xvdb              3.80     0.00  429.20    4.80    12.41     0.20    59.48    16.44   37.94   1.70  73.80
> xvdc              3.00     0.00  432.60    4.80    12.47     0.20    59.33    20.74   47.44   1.71  75.00
> xvdd              2.20     0.00  439.60    4.80    12.56     0.20    58.80    17.57   39.44   1.78  79.20
> xvde              2.00     0.00  449.20    3.60    12.84     0.15    58.73    15.96   35.29   1.68  76.00
> md0               0.00     0.00 1744.40   18.00    49.63     0.75    58.55     0.00    0.00   0.00   0.00
> 
> Has anyone experienced something similar?
> 
> How can I dig further to see what is wrong?
> 
> Any idea what in C* could be responsible for this kind of targeted reads?
> 
> Alain