Posted to user@cassandra.apache.org by Mark Jones <MJ...@imagehawk.com> on 2010/04/08 18:25:07 UTC

SAR results don't seem overwhelming

I stopped writing to the cluster more than 8 hours ago; at worst I could only be getting a periodic memtable dump (I think).
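(As far as I can tell, the flush cadence comes from the memtable settings in the 0.6-era storage-conf.xml; the names and defaults below are from memory, so treat them as assumptions rather than gospel:

  <MemtableThroughputInMB>64</MemtableThroughputInMB>
  <MemtableOperationsInMillions>0.3</MemtableOperationsInMillions>
  <MemtableFlushAfterMinutes>60</MemtableFlushAfterMinutes>

A memtable is flushed when it hits any one of those thresholds, and MemtableFlushAfterMinutes in particular is time-based rather than write-driven.)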

Running 16 QUORUM read threads, getting 600 records/second.

Sar output for all 3 nodes (collected almost simultaneously):
Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     10.86      0.00      2.61     44.47      0.00     42.06

Average:          tps      rtps      wtps   bread/s   bwrtn/s
Average:       284.76    283.96      0.80  14541.83      7.17
----------------
Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     14.33      0.00      2.99     31.45      0.00     51.23

Average:          tps      rtps      wtps   bread/s   bwrtn/s
Average:       219.26    217.96      1.30   4320.16     90.22
----------------
Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all     51.76      0.00      7.50     28.38      0.00     12.35

Average:          tps      rtps      wtps   bread/s   bwrtn/s
Average:       164.72    163.73      0.99  15892.17      8.72

And the client:
----------------
Average:        CPU     %user     %nice   %system   %iowait    %steal     %idle
Average:        all      0.35      0.00      0.89      0.00      0.00     98.77

Average:          tps      rtps      wtps   bread/s   bwrtn/s
Average:         0.90      0.10      0.80     25.60     27.20
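For reference, the tables above are the sar CPU and block-I/O reports; a sketch of the invocations (the intervals are just examples, and exact field names vary with the sysstat version):

  sar -u 10 6      # CPU utilization: %user/%system/%iowait/%idle
  sar -b 10 6      # block I/O rates: tps, bread/s, bwrtn/s
  sar -n DEV 10 6  # per-interface network traffic

The last one reports rxbyt/s and txbyt/s (rxkB/s / txkB/s on newer sysstat) per interface, which should answer the bytes/sec received/transmitted question from Avinash's mail below.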


From: Avinash Lakshman [mailto:avinash.lakshman@gmail.com]
Sent: Thursday, April 08, 2010 10:15 AM
To: user@cassandra.apache.org
Subject: Re: Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL

The sawtooth wave in memory utilization could be memtable dumps. I/O wait on the TCP side happens when you are overwhelming the server with requests. Could you run sar and find out how many bytes/sec you are receiving/transmitting?

Cheers
Avinash
On Thu, Apr 8, 2010 at 7:45 AM, Mark Jones <MJ...@imagehawk.com> wrote:
I don't see any way to increase the # of active Deserializers in storage-conf.xml
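As far as I can tell, the closest things in the 0.6-era storage-conf.xml are the stage sizes below (shipped defaults as I remember them, so treat the exact names and values as assumptions), and neither of them touches the deserializer pool:

  <ConcurrentReads>8</ConcurrentReads>
  <ConcurrentWrites>32</ConcurrentWrites>

ConcurrentReads lines up with the 8 active ROW-READ-STAGE threads in the tpstats below; MESSAGE-DESERIALIZER-POOL doesn't appear to have a corresponding setting in that file.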

Tpstats more than 8 hours after insert/read stop

Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            227
STREAM-STAGE                      0         0              1
RESPONSE-STAGE                    0         0       76724280
ROW-READ-STAGE                    8      4091        1138277
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1   1849826       78135012
GMFD                              0         0         136886
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           1803
ROW-MUTATION-STAGE                0         0       68669717
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            438
FLUSH-WRITER-POOL                 0         0            438
AE-SERVICE-STAGE                  0         0              3
HINTED-HANDOFF-POOL               0         0              3

More than 30 minutes later (with no reads or writes to the cluster)

Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            227
STREAM-STAGE                      0         0              1
RESPONSE-STAGE                    0         0       76724280
ROW-READ-STAGE                    8      4098        1314304
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         1   1663578       78336771
GMFD                              0         0         142651
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0           1803
ROW-MUTATION-STAGE                0         0       68669717
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            438
FLUSH-WRITER-POOL                 0         0            438
AE-SERVICE-STAGE                  0         0              3
HINTED-HANDOFF-POOL               0         0              3
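(Both listings above are nodetool tpstats runs against this node, i.e. something like "nodetool -host <node> tpstats"; the host/port flags may differ depending on your nodetool version and JMX setup.)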

The other 2 nodes in the cluster have Pending Counts of 0, but this node seems hung
indefinitely processing requests that should have long ago timed out for the client.

TOP is showing a huge amount of I/O wait, but I'm not sure how to track down where the wait is coming from below that level.  I now have jconsole up and running on this machine, and the memory usage appears to be a sawtooth wave, going from 1GB up to 4GB over 3 hours, then plunging back to 1GB and resuming its climb.

top - 08:33:40 up 1 day, 19:25,  4 users,  load average: 7.75, 7.96, 8.16
Tasks: 177 total,   2 running, 175 sleeping,   0 stopped,   0 zombie
Cpu(s): 16.6%us,  7.2%sy,  0.0%ni, 34.5%id, 41.1%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:   8123068k total,  8062240k used,    60828k free,     2624k buffers
Swap: 12699340k total,  1951504k used, 10747836k free,  3757300k cached
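One way to narrow down where that I/O wait is coming from would be per-device statistics, e.g.:

  iostat -x 5   # extended per-device stats; watch the await and %util columns
  sar -d 10 6   # per-block-device activity from sysstat

(Intervals are only examples.) A data or commitlog disk sitting near 100% util while the CPUs are mostly in iowait would point at that disk as the bottleneck.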


Re: SAR results don't seem overwhelming

Posted by Jonathan Ellis <jb...@gmail.com>.
Can you keep this to one thread please?  It is hard to follow when the
subject keeps changing.