Posted to user@cassandra.apache.org by Romain Hardouin <ro...@yahoo.fr> on 2016/10/14 19:15:08 UTC

Repair: huge boost on C* 2.1 with CASSANDRA-12580

Hi all,

Many people here have trouble with repair, so I would like to share my experience with backporting CASSANDRA-12580 "Fix merkle tree size calculation" (thanks Paulo!) to our C* 2.1.16. I was expecting some minor improvements, but the results are impressive on some tables.

Because of a slow VPN between our EU and US AWS DCs, the massive drop in overstreaming is a big win for us. On top of that, before the backport I used to see many RepairException errors, and their number increased with each repair. With this fix the graph shows only one exception on one node, which is negligible. Such exceptions are not critical because Cassandra-reaper retries, but each retry is a waste of time.


I run repairs one set of tables at a time (some sets of tables being more critical than others, etc.).
The most impressive result so far for a set is:
* Before: 23 days (days, not hours)
* With CASSANDRA-12580: 16 hours (yes, hours!)

The improvement is not always dramatic (e.g. 8 hours instead of 39 hours on another set) but still significant and valuable.
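
To put the durations above on a common scale, here is a minimal sketch of the arithmetic. The `speedup` helper and the variable names are only illustrative; the inputs are the figures quoted in this message.

```python
# Durations quoted above, converted to hours.
HOURS_PER_DAY = 24


def speedup(before_hours: float, after_hours: float) -> float:
    """Return how many times faster the repair completed (illustrative helper)."""
    return before_hours / after_hours


best_set = speedup(23 * HOURS_PER_DAY, 16)  # 23 days = 552 hours, down to 16 hours
other_set = speedup(39, 8)                  # 39 hours down to 8 hours

print(f"best set:  {best_set:.1f}x faster")   # best set:  34.5x faster
print(f"other set: {other_set:.1f}x faster")  # other set: 4.9x faster
```

So the range observed so far is roughly 5x to 35x, depending on the set of tables.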

Moreover, considering that:
* repair is a mandatory operation in many use cases
* Paulo already made the patch for 2.1
* C* 2.1 is widely used (the most used?)
I think this bugfix is critical from an Ops point of view and should land in 2.1.17 so that it is available to people who don't deploy from source.

Best,

Romain