Posted to user@cassandra.apache.org by Daniel Doubleday <da...@gmx.net> on 2011/03/07 18:18:18 UTC

Alternative to repair

Hi all

We're still on 0.6 and are facing problems with repairs.

That is, a repair for one CF takes around 60h, and we have to do it twice (RF=3, 5 nodes). During that time the cluster is under pretty heavy IO load. It kind of works, but during peak times we see lots of dropped messages (including writes). So we are actually creating the very inconsistencies that we are trying to fix with the repair.

Since we already have a very simple Hadoop-ish framework in place that allows us to do token range walks with multiple workers and restart at a given position in case of failure, I created a simple worker that reads everything with CL_ALL. With only one worker, and almost no performance impact, one scan took 7h.
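For illustration, here is a minimal, runnable sketch of that restartable range-walk idea. It is not the actual framework from this mail: `read_rows` stands in for a CL_ALL range read against the cluster, the checkpoint is just an in-memory dict, and all names are hypothetical.

```python
# Illustrative sketch of a restartable token-range walk. read_rows() is a
# stub for a CL_ALL range read (e.g. a Thrift get_range_slices call); it is
# faked in-memory here so the control flow is runnable on its own.

CHECKPOINT = {"token": None}  # in practice this would be persisted somewhere


def read_rows(start_token, batch=100):
    """Stub for one range-read batch at ConsistencyLevel.ALL.

    Returns (rows, last_token) for the next batch after start_token."""
    start = 0 if start_token is None else start_token
    rows = list(range(start, min(start + batch, 1000)))
    return rows, (rows[-1] if rows else None)


def walk_range(resume=True):
    """Walk the whole range, optionally resuming from the saved checkpoint."""
    token = CHECKPOINT["token"] if resume else None
    scanned = 0
    while True:
        rows, last = read_rows(token)
        if not rows:
            return scanned
        scanned += len(rows)
        token = last + 1             # advance past the last row seen
        CHECKPOINT["token"] = token  # checkpoint so a crashed worker resumes here


print(walk_range())  # scans all 1000 stub rows
```

Multiple workers would simply run this loop over disjoint token sub-ranges, each with its own checkpoint.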

My understanding is that, at that point, read repair has given me the same result as I would have achieved with repair runs.

Is that true or am I missing something?

Cheers,
Daniel


Re: Alternative to repair

Posted by Daniel Doubleday <da...@gmx.net>.
Thanks for the reply!

> Not really:
> 
> - range scans do not perform read repair

OK, I obviously overlooked that RangeSliceResponseResolver does not repair rows on nodes that never saw a write for a given key at all. But that's not a big problem for us, since we are mainly interested in fixing missed deletions; and read repair for conflicting updates seems to work fine too.

> - if you converted it to range scan + [multi]get, the RR messages are
> fair game to drop to cope with load ("active" repair messages are
> never dropped in 0.6.7+)

We actually have a little hack for that: in the special case of CL_ALL on range slice queries, we perform synchronous mutations as read repairs (instead of sending read repair messages). This way we get a timeout on the read when a repair fails; in that case we restart at the given token and continue from there once the cluster load is lower.
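A hedged sketch of that restart-on-timeout loop, with the cluster read stubbed out (`scan_from` and the injected TimeoutError are illustrative stand-ins for a real client call and its TimedOutException, not actual code from our hack):

```python
# Sketch of "restart at the last token when a synchronous repair times out".
# scan_from() fakes a CL_ALL range read over tokens 0..9 and raises once at
# token 7 on the first pass, standing in for a read that fails because a
# synchronous read-repair mutation could not complete.
import time


def scan_from(token, fail_at=None):
    """Stub scan: yields tokens, raising TimeoutError when it hits fail_at."""
    for t in range(token, 10):
        if fail_at is not None and t == fail_at:
            raise TimeoutError(f"repair failed at token {t}")
        yield t


def scan_with_restart(backoff=0.0):
    token, seen, failed_once = 0, [], False
    while True:
        try:
            for t in scan_from(token, fail_at=None if failed_once else 7):
                seen.append(t)
                token = t + 1        # remember where to resume after a failure
            return seen
        except TimeoutError:
            failed_once = True
            time.sleep(backoff)      # in practice: wait until cluster load drops
```

The key property is that a failed synchronous repair surfaces as a read timeout, so the worker knows exactly which token to resume from instead of silently skipping the row.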

For the time being I guess that's good enough, and we hope that 0.7 works a little more smoothly when doing repairs.

Cheers,
Daniel


On Mar 7, 2011, at 7:22 PM, Jonathan Ellis wrote:

> On Mon, Mar 7, 2011 at 11:18 AM, Daniel Doubleday
> <da...@gmx.net> wrote:
>> Since we already have a very simple hadoopish framework in place which allows us to do token range walks with multiple workers and restart at a given position in case of failure I created a simple worker that would read everything with CL_ALL. With only one worker and almost no performance impact one scan took 7h.
>> 
>> My understanding is that at that point due to read repair I got the same as I would have achieved with repair runs.
> -- 
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com


Re: Alternative to repair

Posted by Jonathan Ellis <jb...@gmail.com>.
On Mon, Mar 7, 2011 at 11:18 AM, Daniel Doubleday
<da...@gmx.net> wrote:
> Since we already have a very simple hadoopish framework in place which allows us to do token range walks with multiple workers and restart at a given position in case of failure I created a simple worker that would read everything with CL_ALL. With only one worker and almost no performance impact one scan took 7h.
>
> My understanding is that at that point due to read repair I got the same as I would have achieved with repair runs.

Not really:

- range scans do not perform read repair
- if you converted it to range scan + [multi]get, the RR messages are
fair game to drop to cope with load ("active" repair messages are
never dropped in 0.6.7+)
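To make the second point concrete, here is a hedged sketch of what that conversion might look like: the range scan is used only to enumerate keys, and each batch is then re-read with a multiget so the per-key read-repair path can fire. Both calls are stubbed in-memory; in a real 0.6 Thrift client they would be get_range_slices and multiget_slice.

```python
# Sketch of the "range scan + [multi]get" shape: enumerate keys with a range
# scan, then re-read each batch via multiget, which goes through the normal
# read path (and thus read repair). STORE fakes the cluster's data.

STORE = {f"key{i}": f"value{i}" for i in range(10)}


def range_scan_keys(batch=4):
    """Stub: yield key batches as a range scan over the token ring would."""
    keys = sorted(STORE)
    for i in range(0, len(keys), batch):
        yield keys[i:i + batch]


def multiget(keys):
    """Stub for a multiget read, which triggers read repair per key."""
    return {k: STORE[k] for k in keys}


def repair_pass():
    repaired = 0
    for batch in range_scan_keys():
        repaired += len(multiget(batch))  # each multiget exercises the RR path
    return repaired


print(repair_pass())  # re-reads all 10 stub keys
```

The caveat Jonathan raises still applies: in 0.6 the resulting read-repair messages are droppable under load, unlike "active" repair messages in 0.6.7+.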

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com