Posted to user@cassandra.apache.org by Nimi Wariboko Jr <ni...@channelmeter.com> on 2015/06/20 02:15:07 UTC

Performance tuning for read throughput w/ token scans?

[Cassandra 2.1.5]

I'm exploring my options for increasing read throughput with token scans
(SELECT * FROM x WHERE token(y) > L AND token(y) < U, where L and U are the
lower and upper token bounds). So far I've started by reading an entire
virtual token range from a single node.
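
For reference, one lever for this kind of scan is to split a token range into smaller sub-ranges and issue one query per sub-range, possibly in parallel. The following is only a minimal sketch of the splitting step, assuming the default Murmur3Partitioner (tokens from -2^63 to 2^63 - 1). The helper names split_range and scan_queries are hypothetical, x and y are the placeholder table and key from the query above, and the upper bound is made inclusive (<=) so adjacent sub-ranges neither overlap nor leave gaps.

```python
# Assumption: the cluster uses Murmur3Partitioner, whose tokens are signed 64-bit integers.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_range(start, end, parts):
    """Split the token interval (start, end] into `parts` contiguous sub-intervals.

    Each returned pair (lo, hi) is meant to be queried as lo < token <= hi.
    """
    if parts < 1:
        raise ValueError("parts must be >= 1")
    if end <= start:
        raise ValueError("end must be greater than start")
    width = end - start
    # Integer arithmetic only, so 64-bit tokens do not lose precision.
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(table, key, start, end, parts):
    """Build one CQL token-scan statement per sub-range (hypothetical helper)."""
    return [
        f"SELECT * FROM {table} WHERE token({key}) > {lo} AND token({key}) <= {hi}"
        for lo, hi in split_range(start, end, parts)
    ]


if __name__ == "__main__":
    for query in scan_queries("x", "y", MIN_TOKEN, MAX_TOKEN, 4):
        print(query)
```

Each statement can then be handed to whatever client is in use. How many sub-ranges to run concurrently is something to measure rather than assume.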

Currently a single query reads about 57,286.03 rows/s, which translates to
5.5 MiB/s. However, even under heavy load, my disk utilization stays low
(SSDs, less than 10%), and so does my network utilization (1 Gbit link).
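
As a rough sanity check on those numbers, the arithmetic below derives the implied average row size and the share of a 1 Gbit/s link that 5.5 MiB/s represents. It is a back-of-the-envelope sketch only: it assumes the link carries exactly 10^9 bits per second and ignores protocol overhead and compression.

```python
rows_per_s = 57_286.03          # measured rows per second (from the post)
mib_per_s = 5.5                 # measured throughput in MiB/s (from the post)

bytes_per_s = mib_per_s * 1024**2
bytes_per_row = bytes_per_s / rows_per_s

# Assumption: a 1 Gbit link carries 1e9 bits per second; overhead is ignored.
link_bits_per_s = 1e9
link_utilization = bytes_per_s * 8 / link_bits_per_s

print(f"average row size: {bytes_per_row:.1f} bytes")
print(f"link utilization: {link_utilization:.1%}")
```

This works out to roughly 100 bytes per row and under 5% of the link, which is consistent with the network not being the bottleneck.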

So far I've tried -
 - Moving to the G1 collector (started with the cassandra-env that was
linked from CASSANDRA-7486) - which reduced timeouts that I think were
caused by longish pauses
 - Enabling TIMEHORIZON message coalescing

I'm still very new to JVM tuning, but I used jstack to inspect what was
going on in threads with high CPU usage. It's almost always either an
OutboundTcpConnection stack/thread or a SEPWorker stack/thread - and judging
by what the SEPWorker does (I mostly see compares like
https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/db/composites/AbstractCType.java#L185),
I think I might be CPU bound? (I'm still new to the actual Cassandra source
code, so apologies if that doesn't make sense either).
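
For anyone repeating the jstack step: one common way to tie high-CPU threads to stacks is to take the decimal thread IDs shown by a per-thread CPU view (for example top -H) and match them against the hexadecimal nid=0x... field in each jstack thread header. The sketch below does only that lookup. It assumes HotSpot-style jstack output where each thread header starts with the quoted thread name and contains nid in hex; the function names are hypothetical.

```python
import re

# Matches a jstack thread header line: quoted thread name, then a hex nid field.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def threads_by_nid(jstack_text):
    """Map each native thread ID (as a decimal int) to its thread name."""
    result = {}
    for line in jstack_text.splitlines():
        match = THREAD_HEADER.match(line)
        if match:
            result[int(match.group("nid"), 16)] = match.group("name")
    return result


def names_for_hot_threads(jstack_text, hot_tids):
    """Look up thread names for decimal thread IDs taken from a CPU monitor."""
    by_nid = threads_by_nid(jstack_text)
    return [by_nid.get(tid, "<not in dump>") for tid in hot_tids]
```

Taking several dumps a few seconds apart and counting which names recur gives a slightly better picture than a single snapshot.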

Given this information, does anyone have any pointers on what levers I
could pull next or other things I can look to measure?

Thanks for any help,
Nimi