Posted to user@cassandra.apache.org by Hefeng Yuan <hf...@rhapsody.com> on 2011/08/18 21:33:38 UTC
Nodetool repair takes 4+ hours for about 10G data
Hi,
Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this?
The ring looks like below, we're using 0.8.1. Our repair is scheduled to run once per week for all nodes.
Compaction related configuration is like this:
#concurrent_compactors: 1
compaction_throughput_mb_per_sec: 16
Address       DC         Rack  Load      Owns
10.150.13.92  Cassandra  RAC1  13.31 GB  16.67%
10.150.12.61  Brisk      RAC1   5.89 GB   8.33%
10.150.13.48  Cassandra  RAC1   8.62 GB   8.33%
10.150.13.62  Cassandra  RAC1  12.62 GB  16.67%
10.150.12.58  Brisk      RAC1   5.98 GB   8.33%
10.150.13.88  Cassandra  RAC1  16.69 GB   8.33%
10.150.13.89  Cassandra  RAC1  15.26 GB  16.67%
10.150.12.62  Brisk      RAC1   3.72 GB   8.33%
10.150.13.90  Cassandra  RAC1  35.01 GB   8.33%
Thanks,
Hefeng
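A quick back-of-the-envelope on the numbers above (my own sketch, not from the thread; it assumes validation compaction reads every byte once at the configured compaction_throughput_mb_per_sec throttle, which is a simplification):

```python
# Estimate how long a throttled sequential scan of each node's load would take
# at the configured rate (compaction_throughput_mb_per_sec: 16).
throttle_mb_s = 16

# Smallest and largest Cassandra-DC loads from the ring output above.
loads_gb = {"10.150.13.48": 8.62, "10.150.13.90": 35.01}

for node, gb in loads_gb.items():
    minutes = gb * 1024 / throttle_mb_s / 60
    print(f"{node}: ~{minutes:.0f} min to scan {gb} GB at {throttle_mb_s} MB/s")
```

Even the heaviest node (~35 GB) works out to well under an hour of throttled scanning, so a 4+ hour repair suggests time is going elsewhere (tree comparison, streaming, or rebuilds on several replicas), which is what the replies below probe.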
Re: Nodetool repair takes 4+ hours for about 10G data
Posted by Peter Schuller <pe...@infidyne.com>.
> The compaction settings do not affect repair. (Thinking out loud, or does it? Validation compactions and table builds.)
It does.
--
/ Peter Schuller (@scode on twitter)
Re: Nodetool repair takes 4+ hours for about 10G data
Posted by aaron morton <aa...@thelastpickle.com>.
The compaction settings do not affect repair. (Thinking out loud, or does it? Validation compactions and table builds.)
Watch the logs or check
nodetool compactionstats to see when the validation completes.
and
nodetool netstats to see how long the data transfer takes
It sounds a little long. It could be either the time taken to work out the differences or the time taken to stream the data across.
Cheers
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com
On 19/08/2011, at 7:33 AM, Hefeng Yuan wrote:
> Hi,
>
> Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this?
>
> The ring looks like below, we're using 0.8.1. Our repair is scheduled to run once per week for all nodes.
>
> Compaction related configuration is like this:
> #concurrent_compactors: 1
> compaction_throughput_mb_per_sec: 16
>
> Address       DC         Rack  Load      Owns
>
> 10.150.13.92  Cassandra  RAC1  13.31 GB  16.67%
> 10.150.12.61  Brisk      RAC1   5.89 GB   8.33%
> 10.150.13.48  Cassandra  RAC1   8.62 GB   8.33%
> 10.150.13.62  Cassandra  RAC1  12.62 GB  16.67%
> 10.150.12.58  Brisk      RAC1   5.98 GB   8.33%
> 10.150.13.88  Cassandra  RAC1  16.69 GB   8.33%
> 10.150.13.89  Cassandra  RAC1  15.26 GB  16.67%
> 10.150.12.62  Brisk      RAC1   3.72 GB   8.33%
> 10.150.13.90  Cassandra  RAC1  35.01 GB   8.33%
>
> Thanks,
> Hefeng
Re: Nodetool repair takes 4+ hours for about 10G data
Posted by Peter Schuller <pe...@infidyne.com>.
> Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this?
It does not seem entirely crazy, depending on the nature of your data
and how CPU-intensive it is "per byte" to compact.
Assuming there is no functional problem that is delaying this, the
question is what the bottleneck is. If you have a lot of read traffic
that is keeping the drives busy, it might be that compaction is
throttling on reading from disk (despite being sequential for the
compaction) because of the live reads. Else you might be CPU bound
(you can use something like htop to gauge fairly well whether you seem
to be saturating a core doing compaction).
To be clear, the processes to watch for are:
* The "validating compaction" happening on the node repairing AND ITS
NEIGHBORS - can be CPU or I/O bound (or throttled) - nodetool
compactionstats, htop, iostat -x -k 1
* Streaming of data - can be network or disk bound (maybe throttled if
the streaming throttling is in the version you're running) - nodetool
netstats, ifstat, iostat -x -k 1
* The "sstable rebuild" compaction happening after streaming, building
bloom filters and indexes. Can be CPU or I/O bound (or throttled) -
nodetool compactionstats, htop, iostat -x -k 1
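The diagnostics above can be gathered in one pass with a small script (a sketch; the function name repair_checklist is mine, and it assumes it is run on the Cassandra host, skipping any tool that is not installed):

```shell
# Run each diagnostic Peter lists, in order, skipping tools not on this host.
repair_checklist() {
  for cmd in "nodetool compactionstats" "nodetool netstats" "iostat -x -k 1 2"; do
    echo "== $cmd =="
    # command -v checks only the binary name (first word of the command line)
    if command -v "${cmd%% *}" >/dev/null 2>&1; then
      $cmd
    else
      echo "   (${cmd%% *} not available on this host)"
    fi
  done
}
repair_checklist
```

Run it on the repairing node and on its replica neighbors while the repair is in flight; compactionstats shows the validation progress, netstats the streams, and iostat whether the disks are the bottleneck.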
--
/ Peter Schuller (@scode on twitter)