Posted to user@cassandra.apache.org by Chris Baron <Ch...@ip-soft.net> on 2011/02/16 05:25:32 UTC

Hinted Handoff/GC Tuning Headache

Recently upgraded my 8-node cluster from 0.6.6 to 0.7.0 (and, even more recently, to 0.7.1) for ExpiringColumn, among many other spectacular improvements.

Retuned the GC settings based on experience from 0.6.6 and new defaults.

After about a week, two of the nodes were very far behind on minor compactions (2k+ SSTables per CF and growing, 20k+ pending compactions). The SSTable switch rate on these two nodes was about 10x higher than on the other nodes. I also observed rolling long-pause deaths (Gossip saying node X is dead): seemingly every three minutes, one of the nodes would go into a long GC pause. I saw the same behavior when I upgraded from 0.6.6 to 0.6.8, but I rolled back to 0.6.6 because I did not have time for a deeper investigation then. (Found this: https://issues.apache.org/jira/browse/CASSANDRA-1656)

I eventually traced this behavior back to a nasty interaction between Hinted Handoff and GC tuned for normal operating conditions.

If I understand the code correctly, when a node replays a hint it reads the hinted data directly from the application tables (read: my ColumnFamily). If the replaying node also happens to be a replica, it will resend the entire row, even if only one column was mutated. Because of the rolling GC pause deaths, the HHs rarely succeeded, and when they did it wasn’t long before a new set of hints was recorded.
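A rough way to picture the amplification described above. This is a toy model, not Cassandra's replay code: it assumes a hint records only the row key, that each hinted row is replayed once in full, and that each hint stands for a single-column write. The function names and row sizes are hypothetical.

```python
def columns_sent_on_replay(row_sizes, hinted_keys):
    """Columns shipped if every hinted row is resent in full (once per row key)."""
    return sum(row_sizes[key] for key in set(hinted_keys))


def columns_actually_mutated(hinted_keys):
    """Columns that changed, assuming each hint stands for one single-column write."""
    return len(hinted_keys)


# Hypothetical numbers: three wide rows, one hinted single-column write per row.
row_sizes = {"row-a": 5000, "row-b": 12000, "row-c": 800}
hinted_keys = ["row-a", "row-b", "row-c"]

sent = columns_sent_on_replay(row_sizes, hinted_keys)
needed = columns_actually_mutated(hinted_keys)
print(f"columns replayed: {sent}, columns mutated: {needed}, "
      f"amplification: {sent / needed:.0f}x")
```

With wide rows, even a handful of hints turns into a large burst of memtable data on the target node, which would fit the memtable growth I saw during replay.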

Disabling Hinted Handoff has fixed this problem for me.

Looking further into the intermittent GC issues, I found ParNew promotion failures in the verbose GC log, so I conservatively lowered CMSInitiatingOccupancyFraction, MAX_NEWSIZE, and in_memory_compaction_limit_in_mb. I’m now seeing long CMS times (8000ms+) but no failures, which leads me to believe a 6G heap may be too large for the current tuning.
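For anyone who wants to check their own logs for the same symptoms, here is a minimal sketch of a scan for promotion failures and long pauses. It assumes a HotSpot log written with -XX:+PrintGCDetails, where a failed young collection is tagged "(promotion failed)" and durations are printed as "N.NNN secs]". The exact format varies between JVM versions, and the two sample lines are made up for illustration, not copied from my logs.

```python
import re

# Duration as printed at the end of a GC phase, e.g. "8.3120450 secs]".
PAUSE_RE = re.compile(r"(\d+\.\d+) secs\]")


def summarize_gc_log(lines, long_pause_secs=1.0):
    """Return (promotion failure count, list of pauses >= long_pause_secs)."""
    promotion_failures = 0
    long_pauses = []
    for line in lines:
        if "promotion failed" in line:
            promotion_failures += 1
        durations = [float(d) for d in PAUSE_RE.findall(line)]
        # The largest duration on a line is taken as the total pause for that event.
        if durations and max(durations) >= long_pause_secs:
            long_pauses.append(max(durations))
    return promotion_failures, long_pauses


# Illustrative lines only; real logs will differ in detail.
sample = [
    "[GC [ParNew: 104960K->13056K(118016K), 0.0412340 secs] "
    "2345678K->2253774K(6278144K), 0.0413560 secs]",
    "[GC [ParNew (promotion failed): 118016K->118016K(118016K), 0.2103450 secs]"
    "[CMS: 4915200K->3801088K(6160128K), 8.3120450 secs] "
    "5020000K->3801088K(6278144K), 8.5225100 secs]",
]

failures, long_pauses = summarize_gc_log(sample)
print(f"promotion failures: {failures}, long pauses (s): {long_pauses}")
```

To run it against a real file, pass the open file object as `lines`; the one-second threshold is an arbitrary default.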

It’s worth noting that I saw no increase in ColumnFamily WriteCount or StorageProxy.WriteOperations; only ColumnFamily MemtableColumnsCount and MemtableDataSize were increasing very rapidly on the target node while the Hinted Handoffs were replaying.

--
Chris