Posted to user@cassandra.apache.org by Fabrice Facorat <fa...@gmail.com> on 2016/03/04 16:55:22 UTC

Re: Increase compaction performance

Any news on this ?

We also have issues during repairs when using many LCS tables. We end
up with 8k SSTables, many pending compaction tasks, and dropped mutations.

We are using Cassandra 2.0.10, on 24-core servers, with
multithreaded compaction enabled.

~$ nodetool getstreamthroughput
Current stream throughput: 200 MB/s

~$ nodetool getcompactionthroughput
Current compaction throughput: 16 MB/s

Most SSTables are tiny, 4K or 8K/12K files:

~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | grep -Ev 'M' | wc -l
7405
~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | wc -l
7440

~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | grep -Ev 'M' |
cut -f1 -d" " | sort | uniq -c
     36
   7003 4.0K
    396 8.0K


Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0      258098148         0                 0
RequestResponseStage              0         0      613994884         0                 0
MutationStage                     0         0      332242206         0                 0
ReadRepairStage                   0         0        3360040         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        2471033         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0          25160         0                 0
FlushWriter                       1         1         134083         0               521
ValidationExecutor                1         1          89514         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0         636471         0                 0
MemtablePostFlusher               1         1         334667         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0            181         0                 0
commitlog_archiver                0         0              0         0                 0
CompactionExecutor               24        24        5241768         0                 0
AntiEntropySessions               0         0          15184         0                 0
HintedHandoff                     0         0            278         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                267
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                150970
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0
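
For what it's worth, something like this can be used to keep an eye on the
backlog while repair runs (a sketch only; it assumes nothing beyond a
standard nodetool on the PATH):

~$ # every 60s: pending compactions, plus CompactionExecutor and dropped MUTATION counts
~$ watch -n 60 'nodetool compactionstats | head -n 1; nodetool tpstats | grep -E "CompactionExecutor|^MUTATION"'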


2016-02-12 20:08 GMT+01:00 Michał Łowicki <ml...@gmail.com>:
> I had to decrease streaming throughput to 10 (from the default 200) in order
> to avoid a rising number of SSTables and compaction tasks while running
> repair. It's working very slowly but it's stable and doesn't hurt the whole
> cluster. Will try to adjust the configuration gradually to see if I can make
> it any better. Thanks!
>
> On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki <ml...@gmail.com> wrote:
>>
>>
>>
>> On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <ar...@gmail.com>
>> wrote:
>>>
>>> Also, are you using incremental repairs (not sure about the available
>>> options in Spotify Reaper)? What command did you run?
>>>
>>
>> No.
>>
>>>
>>> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <ar...@gmail.com>:
>>>>>
>>>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>>>
>>>>
>>>>
>>>> What is your current compaction throughput? What is the current value of
>>>> 'concurrent_compactors' (cassandra.yaml or through JMX)?
>>
>>
>>
>> Throughput was initially set to 1024 and I've gradually increased it to
>> 2048, 4K and 16K but haven't seen any change. I tried to change it both via
>> `nodetool` and in cassandra.yaml (with a restart after changes).
>>
>>>>
>>>>
>>>> nodetool getcompactionthroughput
>>>>
>>>>> How to speed up compaction? Increased compaction throughput and
>>>>> concurrent compactors but no change. Seems there is plenty idle resources
>>>>> but can't force C* to use it.
>>>>
>>>>
>>>> You might want to try un-throttling the compaction throughput with:
>>>>
>>>> nodetool setcompactionthroughput 0
>>>>
>>>> Choose a canary node. Monitor pending compactions and disk throughput
>>>> (make sure the server is OK too - CPU...)
>>
>>
>>
>> Yes, I'll try it out, but if increasing it 16 times didn't help, I'm a bit
>> sceptical about it.
>>
>>>>
>>>>
>>>> Some other information could be useful:
>>>>
>>>> What is your number of cores per machine, and what are the compaction
>>>> strategies for the 'most compacting' tables? What are the write/update
>>>> patterns, any TTLs or tombstones? Do you use a high number of vnodes?
>>
>>
>> I'm using bare-metal boxes: 40 CPUs, 64GB RAM, 2 SSDs each. num_tokens is
>> set to 256.
>>
>> Using LCS for all tables. Write/update heavy. No warnings about a large
>> number of tombstones, but we're removing items frequently.
>>
>>
>>>>
>>>>
>>>> Also, what is your repair routine and your value for gc_grace_seconds?
>>>> When was your last repair, and do you think your cluster is suffering
>>>> from high entropy?
>>
>>
>> We've been having problems with repair for months (CASSANDRA-9935).
>> gc_grace_seconds is set to 345600 now. Yes, as we haven't run it
>> successfully for a long time, I guess the cluster is suffering from high
>> entropy.
>>
>>>>
>>>>
>>>> You can lower the stream throughput to make sure nodes can cope with
>>>> what repairs are feeding them.
>>>>
>>>> nodetool getstreamthroughput
>>>> nodetool setstreamthroughput X
>>
>>
>> Yes, this sounds interesting. As we've been having problems with repair for
>> months, it could be that a lot of data is being transferred between nodes.
>>
>> Thanks!
>>
>>>>
>>>>
>>>> C*heers,
>>>>
>>>> -----------------
>>>> Alain Rodriguez
>>>> France
>>>>
>>>> The Last Pickle
>>>> http://www.thelastpickle.com
>>>>
>>>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <ml...@gmail.com>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>>>>> using Cassandra Reaper, but after a couple of hours nodes are full of
>>>>> pending compaction tasks (regular ones, not validation compactions).
>>>>>
>>>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>>>>
>>>>> How can I speed up compaction? I increased compaction throughput and
>>>>> concurrent compactors but saw no change. There seem to be plenty of idle
>>>>> resources but I can't force C* to use them.
>>>>>
>>>>> Any clue where there might be a bottleneck?
>>>>>
>>>>>
>>>>> --
>>>>> BR,
>>>>> Michał Łowicki
>>>>>
>>>>
>>>
>>
>>
>>
>> --
>> BR,
>> Michał Łowicki
>
>
>
>
> --
> BR,
> Michał Łowicki



-- 
Close the World, Open the Net
http://www.linux-wizard.net

Re: Increase compaction performance

Posted by Fabrice Facorat <fa...@gmail.com>.
@Alain:

Indeed, when repairing (or bootstrapping), all SSTables end up in L0, as the
original level is not passed down to the node. So Cassandra ends up
compacting a lot of SSTables in L0 before trying to move them to upper
levels.
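
A quick way to see the L0 pile-up is through cfstats (a sketch only, reusing
the placeholder keyspace/table names from above; the exact labels may vary a
bit between versions):

~$ nodetool cfstats xxxx.xxx | grep -E "SSTable count|SSTables in each level"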

The issue still exists in 2.1 and is even worse as you have fewer concurrent
compactors available (we had 24 with our 24-core servers).

Presently (with Cassandra 2.1) we set concurrent compactors to 4 and
compaction throughput to 128 MB/s, as this is the setup that uses the least
CPU.

I will try to reduce the number of replicas sending SSTables to the node
by switching to sequential repairs (we were using parallel repairs), and
if this is not enough, we will try to throttle stream throughput (and also
reduce the number of LCS tables).
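
In terms of commands, that would be something like this (a sketch only;
'my_keyspace' is just a placeholder, and this assumes 2.1 nodetool syntax,
where repair is sequential unless -par is passed):

~$ nodetool setstreamthroughput 10      # throttle streaming to 10 Mb/s
~$ nodetool repair -pr my_keyspace      # primary-range, sequential repair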

Thanks :)





-- 
Close the World, Open the Net
http://www.linux-wizard.net

Re: Increase compaction performance

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Hi Michal,

Sorry about the delay answering here.

The value you gave (10) looks a lot like what I had to do in the past in
the cluster I managed. I described the issue here:
https://issues.apache.org/jira/browse/CASSANDRA-9509

A few people hit this issue already. Hope you were able to complete the first
repair without harming the cluster too much.

@Fabrice,

People answering the list probably missed your post as it was on an existing
thread, and I have been away, so I missed it too.

I would:

Set concurrent compactors to 8 (max) - can be updated through JMX.
Set compaction throughput to 32, 64 or even 0 (go incrementally, and on one
node first). Use 0 only if you have SSDs; otherwise you'll probably make disk
throughput a bottleneck - can be updated through nodetool.
Set stream throughput to 10 Mb/s - can be updated through nodetool (by the
way, it is Mb and not MB).

Monitor resources and the number of SSTables, and see how it goes.
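
For the nodetool part, on the canary node that would be something like the
following (a sketch; values as above, to be adjusted while watching the node,
with the get* calls just confirming the change took effect):

nodetool setcompactionthroughput 0    # try 32 or 64 first; 0 = unthrottled
nodetool setstreamthroughput 10       # in Mb/s
nodetool getcompactionthroughput
nodetool getstreamthroughput

Concurrent compactors are exposed over JMX (I believe through the
CompactionManager MBean, though the exact attribute names depend on the
version), so jconsole or any JMX client can change that one without a
restart.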

You're also probably hitting
https://issues.apache.org/jira/browse/CASSANDRA-9509.

Also, with LCS, I read (but was not able to find the reference or the fix
version) that repaired data was put back in L0, inducing even more
compactions. I have no more info about this, but upgrading to the latest 2.0
release is needed, and I would probably go to the latest 2.1, as a lot of
repair-related issues were fixed there and Cassandra 2.0 is no longer
supported.

Hope you will find a way to mitigate things, though, or already have. Good
luck ;-).

C*heers,
-----------------------
Alain Rodriguez - alain@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



