Posted to user@cassandra.apache.org by Hefeng Yuan <hf...@rhapsody.com> on 2011/08/18 21:33:38 UTC

Nodetool repair takes 4+ hours for about 10G data

Hi,

Is it normal for repair to take 4+ hours on every node, with only about 10G of data? If this is not expected, is there any hint as to what could be causing it?

The ring looks like the listing below; we're using 0.8.1. Our repair is scheduled to run once per week for all nodes.

Compaction-related configuration is like this:
#concurrent_compactors: 1
compaction_throughput_mb_per_sec: 16

Address         DC         Rack  Load      Owns
10.150.13.92    Cassandra  RAC1  13.31 GB  16.67%
10.150.12.61    Brisk      RAC1  5.89 GB   8.33%
10.150.13.48    Cassandra  RAC1  8.62 GB   8.33%
10.150.13.62    Cassandra  RAC1  12.62 GB  16.67%
10.150.12.58    Brisk      RAC1  5.98 GB   8.33%
10.150.13.88    Cassandra  RAC1  16.69 GB  8.33%
10.150.13.89    Cassandra  RAC1  15.26 GB  16.67%
10.150.12.62    Brisk      RAC1  3.72 GB   8.33%
10.150.13.90    Cassandra  RAC1  35.01 GB  8.33%
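
For context, a weekly schedule like this can be as simple as one cron
entry per node; the following is only an illustrative sketch (placeholder
host, time, and log path), not our exact configuration:

# /etc/cron.d/cassandra-repair (one entry per node, staggered in practice)
# Full repair every Sunday at 02:00, output appended to a log file.
0 2 * * 0  cassandra  /usr/bin/nodetool -h 127.0.0.1 repair >> /var/log/cassandra/repair.log 2>&1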

Thanks,
Hefeng

Re: Nodetool repair takes 4+ hours for about 10G data

Posted by Peter Schuller <pe...@infidyne.com>.
> The compaction settings do not affect repair. (Thinking out loud, or do they? Validation compactions and sstable rebuilds.)

It does.
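
If the throttle is what is slowing things down, one knob to look at (a
sketch only, assuming the defaults quoted in the original mail) is
compaction_throughput_mb_per_sec in cassandra.yaml, which also caps the
validation compaction and the sstable rebuild after streaming; the value
below is arbitrary:

# cassandra.yaml (per node; re-read only on restart)
# 16 MB/s is the default; raising it, or setting 0 to disable throttling,
# lets validation/rebuild compactions run faster at the cost of more I/O
# pressure on live reads.
compaction_throughput_mb_per_sec: 32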

-- 
/ Peter Schuller (@scode on twitter)

Re: Nodetool repair takes 4+ hours for about 10G data

Posted by aaron morton <aa...@thelastpickle.com>.
The compaction settings do not affect repair. (Thinking out loud, or do they? Validation compactions and sstable rebuilds.)

Watch the logs, or check:

nodetool compactionstats to see when the validation compaction completes, and
nodetool netstats to see how long the data transfer takes.
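
For example (a sketch, assuming nodetool is on the PATH and the default
packaged-install log location; adjust host and paths as needed):

# in one terminal: progress of the validation compaction
watch -n 30 nodetool -h localhost compactionstats
# in another: streaming progress
watch -n 30 nodetool -h localhost netstats
# and/or follow the log for repair activity
tail -f /var/log/cassandra/system.log | grep -iE 'repair|AntiEntropy|stream'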

It sounds a little long. It could be either the time taken to work out the differences or the time taken to stream the data across.

Cheers
 
-----------------
Aaron Morton
Freelance Cassandra Developer
@aaronmorton
http://www.thelastpickle.com

On 19/08/2011, at 7:33 AM, Hefeng Yuan wrote:

> Hi,
> 
> Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this?
> 
> The ring looks like below, we're using 0.8.1. Our repair is scheduled to run once per week for all nodes.
> 
> Compaction related configuration is like this:
> #concurrent_compactors: 1
> compaction_throughput_mb_per_sec: 16
> 
> Address         DC         Rack  Load      Owns
> 10.150.13.92    Cassandra  RAC1  13.31 GB  16.67%
> 10.150.12.61    Brisk      RAC1  5.89 GB   8.33%
> 10.150.13.48    Cassandra  RAC1  8.62 GB   8.33%
> 10.150.13.62    Cassandra  RAC1  12.62 GB  16.67%
> 10.150.12.58    Brisk      RAC1  5.98 GB   8.33%
> 10.150.13.88    Cassandra  RAC1  16.69 GB  8.33%
> 10.150.13.89    Cassandra  RAC1  15.26 GB  16.67%
> 10.150.12.62    Brisk      RAC1  3.72 GB   8.33%
> 10.150.13.90    Cassandra  RAC1  35.01 GB  8.33%
> 
> Thanks,
> Hefeng


Re: Nodetool repair takes 4+ hours for about 10G data

Posted by Peter Schuller <pe...@infidyne.com>.
> Is it normal that the repair takes 4+ hours for every node, with only about 10G data? If this is not expected, do we have any hint what could be causing this?

It does not seem entirely crazy, depending on the nature of your data
and how CPU-intensive it is "per byte" to compact.

Assuming there is no functional problem that is delaying it, the
question is what the bottleneck is. If you have a lot of read traffic
keeping the drives busy, compaction may be bottlenecked on reading
from disk (despite being sequential for the compaction) because of
the live reads. Otherwise you may be CPU bound (you can use something
like htop to gauge fairly well whether you are saturating a core
doing compaction).

To be clear, the processes to watch for are:

* The "validating compaction" happening on the node repairing AND ITS
NEIGHBORS - can be CPU or I/O bound (or throttled) - nodetool
compactionstats, htop, iostat -x -k 1
* Streaming of data - can be network or disk bound (maybe throttled if
the streaming throttling is in the version you're running) - nodetool
netstats, ifstat, iostat -x -k 1
* The "sstable rebuild" compaction happening after streaming, building
bloom filters and indexes. Can be CPU or I/O bound (or throttled) -
nodetool compactionstats, htop, iostat -x -k 1
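
Putting that together, a rough monitoring recipe to run on the repairing
node and its neighbours while the repair is in progress (the exact
invocations and intervals are only a sketch, not prescribed by Cassandra):

# Cassandra's own view: validation compactions and post-streaming rebuilds
nodetool compactionstats
# streaming sessions and their progress
nodetool netstats
# CPU: is a single core saturated by compaction?
htop            # or: top -H
# disk: are the data drives already busy with live reads?
iostat -x -k 1
# network: is the streaming phase the slow part?
ifstat 1        # or: sar -n DEV 1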

-- 
/ Peter Schuller (@scode on twitter)