Posted to user@cassandra.apache.org by Stefano Ortolani <os...@gmail.com> on 2016/11/01 09:37:25 UTC

Re: Incremental repairs leading to unrepaired data

That is not happening anymore since I am repairing a keyspace with
much less data (the other one is still there in write-only mode).
The command I am using is the most boring one (I even shed the -pr option so
as to keep anticompactions to a minimum):
  nodetool -h localhost repair <keyspace>
It's executed sequentially on each node (no overlapping, next node
waits for the previous to complete).
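
For reference, a minimal sketch of that sequential loop, assuming SSH access
to each node; the hostnames and keyspace name are hypothetical placeholders:

  #!/bin/sh
  # Hypothetical hostnames and keyspace; each iteration blocks until that
  # node's repair has completed, so repairs never overlap.
  KEYSPACE=my_keyspace
  for host in cass-node1 cass-node2 cass-node3; do
      ssh "$host" nodetool repair "$KEYSPACE"
  done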

Regards,
Stefano Ortolani

On Mon, Oct 31, 2016 at 11:18 PM, kurt Greaves <ku...@instaclustr.com> wrote:
> Blowing out to 1k SSTables seems a bit full on. What args are you passing to
> repair?
>
> Kurt Greaves
> kurt@instaclustr.com
> www.instaclustr.com
>
> On 31 October 2016 at 09:49, Stefano Ortolani <os...@gmail.com> wrote:
>>
>> I've collected some more data-points, and I still see dropped
>> mutations with compaction_throughput_mb_per_sec set to 8.
>> The only notable thing regarding the current setup is that I have
>> another keyspace (not being repaired though) with really wide rows
>> (100MB per partition), but that shouldn't have any impact in theory.
>> Nodes do not seem that overloaded either, and I don't see any GC spikes
>> while those mutations are dropped :/
>>
>> Hitting a dead end here; any ideas on where to look next?
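
(For reference, the dropped mutation counters can be read per node with
nodetool tpstats; the MUTATION row in the dropped-messages section is the one
to compare before and after a repair run:

  nodetool tpstats
)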
>>
>> Regards,
>> Stefano
>>
>> On Wed, Aug 10, 2016 at 12:41 PM, Stefano Ortolani <os...@gmail.com>
>> wrote:
>> > That's what I was thinking. Maybe GC pressure?
>> > Some more details: during anticompaction I have some CFs exploding to 1K
>> > SSTables (they drop back to ~200 upon completion).
>> > HW specs should be quite good (12 cores/32 GB RAM) but, I admit, I am still
>> > relying on spinning disks, with ~150GB per node.
>> > Current version is 3.0.8.
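
(For reference, the SSTable blow-up during anticompaction can be watched live
with nodetool while the repair runs; tablestats is the 3.x name, cfstats on
older versions, and <keyspace> is a placeholder:

  # running and pending compactions; anticompaction tasks show up here too
  nodetool compactionstats
  # per-table SSTable counts for the keyspace being repaired
  nodetool tablestats <keyspace>
)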
>> >
>> >
>> > On Wed, Aug 10, 2016 at 12:36 PM, Paulo Motta <pa...@gmail.com>
>> > wrote:
>> >>
>> >> That's pretty low already, but perhaps you should lower it further to see
>> >> if that improves the dropped mutations during anti-compaction (even if it
>> >> increases repair time); otherwise the problem might be somewhere else.
>> >> Generally, dropped mutations are a signal of cluster overload, so if there's
>> >> nothing else wrong perhaps you need to increase your capacity. What version
>> >> are you on?
>> >>
>> >> 2016-08-10 8:21 GMT-03:00 Stefano Ortolani <os...@gmail.com>:
>> >>>
>> >>> Not yet. Right now I have it set at 16.
>> >>> Would halving it more or less double the repair time?
>> >>>
>> >>> On Tue, Aug 9, 2016 at 7:58 PM, Paulo Motta <pa...@gmail.com>
>> >>> wrote:
>> >>>>
>> >>>> Anticompaction throttling can be done by setting the usual
>> >>>> compaction_throughput_mb_per_sec knob in cassandra.yaml or via
>> >>>> nodetool setcompactionthroughput. Did you try lowering that and
>> >>>> checking if that improves the dropped mutations?
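
(For reference, those two options look like this; the value shown is only an
example:

  # persistent setting, in cassandra.yaml (picked up on restart):
  #   compaction_throughput_mb_per_sec: 8
  # or change it live on a node and read it back:
  nodetool setcompactionthroughput 8
  nodetool getcompactionthroughput
)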
>> >>>>
>> >>>> 2016-08-09 13:32 GMT-03:00 Stefano Ortolani <os...@gmail.com>:
>> >>>>>
>> >>>>> Hi all,
>> >>>>>
>> >>>>> I am running incremental repairs on a weekly basis (I can't do it every
>> >>>>> day as one single run takes 36 hours), and every time I have at least one
>> >>>>> node dropping mutations as part of the process (this happens almost always
>> >>>>> during the anticompaction phase). Ironically, this leads to a system where
>> >>>>> repairing makes data consistent at the cost of making some other data
>> >>>>> inconsistent.
>> >>>>>
>> >>>>> Does anybody know why this is happening?
>> >>>>>
>> >>>>> My feeling is that this might be caused by anticompacting column
>> >>>>> families with really wide rows and with many SSTables. If that is the
>> >>>>> case, is there any way I can throttle that?
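
(For reference, whether a table really has very wide partitions can be checked
with nodetool tablehistograms, cfhistograms on older versions; the partition
size column reports percentiles and the maximum. The keyspace and table names
here are placeholders:

  nodetool tablehistograms <keyspace> <table>
)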
>> >>>>>
>> >>>>> Thanks!
>> >>>>> Stefano
>> >>>>
>> >>>>
>> >>>
>> >>
>> >
>
>

Re: Incremental repairs leading to unrepaired data

Posted by kurt Greaves <ku...@instaclustr.com>.
Can't say I have too many ideas. If load is low during the repair it
shouldn't be happening. Your disks aren't overutilised, correct? No other
processes writing loads of data to them?
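
For reference, disk utilisation during repair/anticompaction can be checked
with iostat from the sysstat package; the %util and await columns for the data
disks are the ones to watch:

  # extended device statistics, refreshed every 5 seconds
  iostat -x 5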