Posted to user@cassandra.apache.org by Jeff Ferland <jb...@tubularlabs.com> on 2016/02/04 22:27:02 UTC

System block cache vs. disk access and metrics

We struggled for a while to upgrade because of an out-of-order SSTables bug. During this time, load continued to increase, and eventually we were accessing the disk a lot. When we could finally expand the cluster, disk access went down by an order of magnitude. This leads me to conclude that we had blown out the block cache.

Unfortunately, Linux doesn’t have a metric for tracking the block cache hit ratio. SystemTap may be the way we have to go, but I’m wondering about Cassandra counters as well. If I can track the ratio of SSTable reads to actual disk reads, I’ll have good enough data to avoid spending my time writing a SystemTap script.
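
To make the ratio idea concrete, here is a minimal sketch of the arithmetic. It assumes the disk-read count comes from the "reads completed" field of /proc/diskstats and that an SSTable read counter is available from some Cassandra metric (which one is exactly the open question here). The function names are hypothetical, and because one SSTable read can trigger several disk reads, the result is only a rough estimate.

```python
def reads_completed(diskstats_text, device):
    """Return the 'reads completed' counter for one device.

    Expects text in /proc/diskstats format, where the whitespace-separated
    fields start with: major, minor, device name, reads completed.
    """
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == device:
            return int(fields[3])
    raise KeyError(device)


def cache_hit_estimate(sstable_reads, disk_reads):
    """Rough fraction of SSTable reads served without touching the disk.

    Both arguments are deltas over the same sampling interval.
    Returns None when there were no SSTable reads in the interval.
    The result is clamped at 0.0 because disk reads can exceed SSTable
    reads (compaction, multi-block reads, other processes).
    """
    if sstable_reads <= 0:
        return None
    return max(0.0, 1.0 - disk_reads / sstable_reads)
```

Sampling both counters at the start and end of an interval and passing the two differences to cache_hit_estimate would give a trend to graph, even if the absolute number is not trustworthy.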

This raises the following specific questions:
 * Which metric, if any, corresponds to the number of queries made by clients?
 * Which metric, if any, corresponds to the number of SSTable reads performed?

It isn’t clear what metrics such as cassandra.ReadCount do and don’t count, so any feedback before I go on another source-code reading adventure is welcome.

Cheers all,
-Jeff