Posted to user@cassandra.apache.org by Carl Hu <me...@carlhu.com> on 2015/06/26 16:26:46 UTC

Mixing incremental repair with sequential

Dear colleagues,

We are using incremental repair and have noticed that every few repairs,
the cluster experiences pauses.

We run the repair with the following command: nodetool repair -par -inc

I have tried running it without the parallel option, but get the following error:
"It is not possible to mix sequential repair and incremental repairs."

Does anyone have any suggestions?

Many thanks in advance,
Carl

Re: Mixing incremental repair with sequential

Posted by Carl Hu <me...@carlhu.com>.
Alain,

The reduction in compaction throughput is having a significant impact on
response times for us, especially at the 90th percentile.

For the record, we are using AWS i2.2xlarge instance types (SSD-backed).
We were running compaction_throughput_mb_per_sec at 18 and are now running
at 10. Latency variation for reads is hugely reduced. This is very
promising.
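
For anyone following along, here is a minimal sketch of how a change like
this can be applied (the value 10 is simply what we settled on):

  # adjust the live value on a node without a restart (reverts on restart)
  nodetool setcompactionthroughput 10

  # confirm the current value
  nodetool getcompactionthroughput

  # and in cassandra.yaml, to make the change permanent:
  #   compaction_throughput_mb_per_sec: 10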

Thanks, Alain.

Best,
Carl


Re: Mixing incremental repair with sequential

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
Here is something I wrote some time ago:

http://planetcassandra.org/blog/interview/video-advertising-platform-teads-chose-cassandra-spm-and-opscenter-to-monitor-a-personalized-ad-experience/

Monitoring is absolutely necessary to understand what is happening in the
system. There is no magic in there; once you find bottlenecks, you can
think about how to alleviate them. I would say it matters at least as much
as the design of your data models.

"I've lowered compaction threshhold from 18 to 10mb/s. Will see what
happens."
If you have no SSD and compactions are creating a bottleneck at the disk
the disk, this looks reasonable as long as the "compactions pending" metric
remains low enough.
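
A quick way to keep an eye on that metric from the shell, for example:

  # shows active compactions and the number of pending compaction tasks
  nodetool compactionstats

  # or poll it every 30 seconds
  watch -n 30 nodetool compactionstats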

If it is a CPU issue and you have many cores, I would advise you to try
lowering the concurrent_compactors setting (by default, one compactor per
core).
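
A small sketch of that setting in cassandra.yaml (the value 4 is only an
example; the right number depends on your hardware, and the node must be
restarted for it to take effect):

  # cassandra.yaml -- cap the number of compaction tasks running at once
  concurrent_compactors: 4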

Once again, it will depend on where the pressure is. In any case, you
might want to test anything you try on one node first. Also, change one
option at a time (or a couple that you believe have synergy), and monitor
how things evolve.

C*heers,

Alain


Re: Mixing incremental repair with sequential

Posted by Carl Hu <me...@carlhu.com>.
Thank you, Alain, for the response. We're using 2.1 indeed. I've lowered
compaction throughput from 18 to 10 MB/s. We'll see what happens.

>  I hope you have a monitoring tool up and running and an easy way to
detect errors on your logs.

We do not have this. What do you use for this?

Thank you,
Carl



Re: Mixing incremental repair with sequential

Posted by Alain RODRIGUEZ <ar...@gmail.com>.
"It is not possible to mix sequential repair and incremental repairs."

I guess that is a system limitation, though I am not sure of it (I have
not used C* 2.1 yet).

I would focus on tuning your repair by:
- Monitoring performance / logs (to see why the cluster hangs)
- Using range repairs (as a workaround to the Merkle tree 32K limit) or at
least running it per table; see the sketch just after this list (
http://www.datastax.com/dev/blog/advanced-repair-techniques)
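
For example, something along these lines (the keyspace, table and token
values here are only placeholders):

  # repair one table at a time
  nodetool repair -par -inc my_keyspace my_table

  # repair only a sub-range of the token ring
  nodetool repair -st -3074457345618258603 -et 3074457345618258602 my_keyspace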

Without knowing the root issue that makes your cluster hang, it is hard
to help you.

- If CPU is the limit, then some tuning around compactions or GC might be
needed (or a few more things)
- If you have disk I/O limitations, you might want to add machines or tune
compaction throughput
- If your network is the issue, there are commands to tune the bandwidth
used by streams (see the example after this list).
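
For the network case, a hedged example of the kind of command I mean (the
value is in megabits per second, and 200 is only an example):

  # check and cap the bandwidth used for streaming between nodes
  nodetool getstreamthroughput
  nodetool setstreamthroughput 200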

You need to troubleshoot this and give us more information. I hope you
have a monitoring tool up and running and an easy way to detect errors in
your logs.
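
If nothing else is in place yet, even a simple check of the logs helps
(assuming the default log location):

  # surface recent warnings and errors on a node
  grep -E "WARN|ERROR" /var/log/cassandra/system.log | tail -n 50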

C*heers,

Alain
