Posted to user@cassandra.apache.org by Mark Jones <MJ...@imagehawk.com> on 2010/04/08 18:25:07 UTC
SAR results don't seem overwhelming
I stopped writing to the cluster more than 8 hours ago; at worst, I could only be getting a periodic memtable dump (I think).
Running 16 QUORUM read threads getting 600 records/second
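For context, a quick back-of-envelope (a sketch; it assumes the 16 reader threads are synchronous and fully busy, so threads / throughput approximates per-request latency):

```python
# Rough per-request latency implied by 16 synchronous reader threads
# sustaining 600 reads/sec: threads / throughput = seconds per read.
threads = 16
reads_per_sec = 600

latency_ms = threads / reads_per_sec * 1000
print(f"implied latency ~ {latency_ms:.1f} ms per read")  # ~ 26.7 ms
```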
sar for all 3 nodes (collected almost simultaneously):
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 10.86 0.00 2.61 44.47 0.00 42.06
Average: tps rtps wtps bread/s bwrtn/s
Average: 284.76 283.96 0.80 14541.83 7.17
----------------
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 14.33 0.00 2.99 31.45 0.00 51.23
Average: tps rtps wtps bread/s bwrtn/s
Average: 219.26 217.96 1.30 4320.16 90.22
----------------
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 51.76 0.00 7.50 28.38 0.00 12.35
Average: tps rtps wtps bread/s bwrtn/s
Average: 164.72 163.73 0.99 15892.17 8.72
And the client:
----------------
Average: CPU %user %nice %system %iowait %steal %idle
Average: all 0.35 0.00 0.89 0.00 0.00 98.77
Average: tps rtps wtps bread/s bwrtn/s
Average: 0.90 0.10 0.80 25.60 27.20
From: Avinash Lakshman [mailto:avinash.lakshman@gmail.com]
Sent: Thursday, April 08, 2010 10:15 AM
To: user@cassandra.apache.org
Subject: Re: Some insight into the slow read speed. Where to go from here? RC1 MESSAGE-DESERIALIZER-POOL
The sawtooth wave in memory utilization could be memtable dumps. I/O wait on TCP happens when you are overwhelming the server with requests. Could you run sar and find out how many bytes/sec you are receiving/transmitting?
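One way to get those per-interface byte rates without sar (a Linux-only sketch that samples /proc/net/dev twice — the same counters `sar -n DEV` reads; the 1-second interval is arbitrary):

```python
# Compute bytes/sec received and transmitted per network interface
# by sampling /proc/net/dev twice and differencing the counters.
import time

def net_bytes():
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev."""
    counters = {}
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:          # skip the two header lines
            iface, data = line.split(":", 1)
            fields = data.split()
            # field 0 is rx bytes, field 8 is tx bytes
            counters[iface.strip()] = (int(fields[0]), int(fields[8]))
    return counters

def sample(interval=1.0):
    """Return {interface: (rx_bytes_per_sec, tx_bytes_per_sec)}."""
    before = net_bytes()
    time.sleep(interval)
    after = net_bytes()
    return {i: ((after[i][0] - before[i][0]) / interval,
                (after[i][1] - before[i][1]) / interval)
            for i in before if i in after}

if __name__ == "__main__":
    for iface, (rx, tx) in sample().items():
        print(f"{iface}: rx {rx:.0f} B/s, tx {tx:.0f} B/s")
```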
Cheers
Avinash
On Thu, Apr 8, 2010 at 7:45 AM, Mark Jones <MJ...@imagehawk.com> wrote:
I don't see any way to increase the number of active deserializer threads in storage-conf.xml.
tpstats, more than 8 hours after inserts/reads stopped:
Pool Name Active Pending Completed
FILEUTILS-DELETE-POOL 0 0 227
STREAM-STAGE 0 0 1
RESPONSE-STAGE 0 0 76724280
ROW-READ-STAGE 8 4091 1138277
LB-OPERATIONS 0 0 0
MESSAGE-DESERIALIZER-POOL 1 1849826 78135012
GMFD 0 0 136886
LB-TARGET 0 0 0
CONSISTENCY-MANAGER 0 0 1803
ROW-MUTATION-STAGE 0 0 68669717
MESSAGE-STREAMING-POOL 0 0 0
LOAD-BALANCER-STAGE 0 0 0
FLUSH-SORTER-POOL 0 0 0
MEMTABLE-POST-FLUSHER 0 0 438
FLUSH-WRITER-POOL 0 0 438
AE-SERVICE-STAGE 0 0 3
HINTED-HANDOFF-POOL 0 0 3
More than 30 minutes later (with no reads or writes to the cluster):
Pool Name Active Pending Completed
FILEUTILS-DELETE-POOL 0 0 227
STREAM-STAGE 0 0 1
RESPONSE-STAGE 0 0 76724280
ROW-READ-STAGE 8 4098 1314304
LB-OPERATIONS 0 0 0
MESSAGE-DESERIALIZER-POOL 1 1663578 78336771
GMFD 0 0 142651
LB-TARGET 0 0 0
CONSISTENCY-MANAGER 0 0 1803
ROW-MUTATION-STAGE 0 0 68669717
MESSAGE-STREAMING-POOL 0 0 0
LOAD-BALANCER-STAGE 0 0 0
FLUSH-SORTER-POOL 0 0 0
MEMTABLE-POST-FLUSHER 0 0 438
FLUSH-WRITER-POOL 0 0 438
AE-SERVICE-STAGE 0 0 3
HINTED-HANDOFF-POOL 0 0 3
The other 2 nodes in the cluster have Pending Counts of 0, but this node seems hung
indefinitely processing requests that should have long ago timed out for the client.
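Comparing the two MESSAGE-DESERIALIZER-POOL pending counts in the snapshots above gives a rough drain rate (a back-of-envelope sketch; it assumes the gap between snapshots was exactly 30 minutes, which the text only gives as a lower bound):

```python
# Back-of-envelope from the two tpstats snapshots: how fast is
# MESSAGE-DESERIALIZER-POOL draining, and how long until it empties?
pending_before = 1_849_826   # first snapshot
pending_after  = 1_663_578   # ~30 minutes later
elapsed_s = 30 * 60          # assumed exact; text says "more than 30 minutes"

drain_rate = (pending_before - pending_after) / elapsed_s   # msgs/sec
hours_to_empty = pending_after / drain_rate / 3600

print(f"drain rate ~ {drain_rate:.0f} msg/s")       # ~ 103 msg/s
print(f"time to empty ~ {hours_to_empty:.1f} h")    # ~ 4.5 h
```

At that rate the backlog of already-timed-out requests would take hours more to clear, which matches the "hung indefinitely" impression.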
top is showing a huge amount of I/O wait, but I'm not sure how to track down where the wait is happening below that. I now have jconsole up and running on this machine, and the memory usage appears to be a sawtooth wave: it climbs from 1GB up to 4GB over 3 hours, then plunges back to 1GB and resumes its climb.
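To confirm the system-wide wait independently of top, a minimal Linux-only sketch that derives %iowait from /proc/stat (the same counter behind top's "wa" column and sar's %iowait):

```python
# Measure system-wide %iowait by sampling the aggregate "cpu" line of
# /proc/stat twice and comparing the iowait delta to the total delta.
import time

def cpu_times():
    """Return the aggregate CPU tick counters from the first /proc/stat line."""
    with open("/proc/stat") as f:
        fields = f.readline().split()[1:]
    return [int(x) for x in fields]

def iowait_pct(interval=1.0):
    """Percentage of CPU time spent in iowait over the sample interval."""
    a = cpu_times()
    time.sleep(interval)
    b = cpu_times()
    deltas = [y - x for x, y in zip(a, b)]
    total = sum(deltas)
    # field order: user nice system idle iowait irq softirq steal ...
    return 100.0 * deltas[4] / total if total else 0.0

if __name__ == "__main__":
    print(f"%iowait ~ {iowait_pct():.1f}")
```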
top - 08:33:40 up 1 day, 19:25, 4 users, load average: 7.75, 7.96, 8.16
Tasks: 177 total, 2 running, 175 sleeping, 0 stopped, 0 zombie
Cpu(s): 16.6%us, 7.2%sy, 0.0%ni, 34.5%id, 41.1%wa, 0.0%hi, 0.6%si, 0.0%st
Mem: 8123068k total, 8062240k used, 60828k free, 2624k buffers
Swap: 12699340k total, 1951504k used, 10747836k free, 3757300k cached
Re: SAR results don't seem overwhelming
Posted by Jonathan Ellis <jb...@gmail.com>.
Can you keep this to one thread please? It is hard to follow when the
subject keeps changing.