Posted to user@cassandra.apache.org by Alain RODRIGUEZ <ar...@gmail.com> on 2013/07/04 16:43:56 UTC

Restart node => hinted handoff flood

Hi,

I'm running C* 1.2.2 on a 12-node EC2 xLarge cluster.

When I restart a node that has been down for a few minutes, all of its CPUs
are pegged at 100% once it comes back up, even with compactions disabled.
This causes very high, intolerable latency in my app. I suspect Hinted
Handoff is the cause. Disabling gossip fixes the problem; enabling
it again brings the latency back (along with a lot of GC, dropped messages...).

Is there a way to disable HH? Is it responsible for this issue?

I currently have this node down, so any quick insight would be appreciated.

Alain

Re: Restart node => hinted handoff flood

Posted by Michał Michalski <mi...@opera.com>.

My blind guess is: https://issues.apache.org/jira/browse/CASSANDRA-5179

In our case the only sensible solution was to pause hint delivery and
disable storing new hints (both done with nodetool: pausehandoff and
disablehandoff). Once the stored hints had expired (the TTL is 3 hours by
default, I believe?), I turned HH on again and started a repair. However,
the problem returned the next day, so I had to do a quick C* upgrade to a
version that includes this patch (we use a "self-built" 1.2.1 with a few
additional patches applied).

M.
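
The workaround described above can be sketched as a small Python helper that
only builds (and, by default, only prints) the nodetool invocations. This is a
rough illustration, not a tested procedure: the function names, the host
addresses, and the choice of which nodes to target are assumptions. Since
hints are stored on the nodes that coordinated the writes, the sketch assumes
the handoff commands are run against the nodes holding hints rather than the
restarted one. The resumehandoff call and the node chosen for repair are also
assumptions; the message above only says HH was turned on again and a repair
was started.

```python
import subprocess


def stop_hints(hint_holders):
    """Build commands that pause hint delivery and stop storing new hints."""
    commands = []
    for host in hint_holders:
        commands.append(["nodetool", "-h", host, "pausehandoff"])
        commands.append(["nodetool", "-h", host, "disablehandoff"])
    return commands


def resume_hints_and_repair(hint_holders, restarted_node):
    """Build commands to run once the old hints have expired."""
    commands = []
    for host in hint_holders:
        commands.append(["nodetool", "-h", host, "enablehandoff"])
        # Assumption: delivery paused earlier is resumed explicitly.
        commands.append(["nodetool", "-h", host, "resumehandoff"])
    # Assumption: the repair is run on the node that missed the writes.
    commands.append(["nodetool", "-h", restarted_node, "repair"])
    return commands


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    rendered = [" ".join(cmd) for cmd in commands]
    for cmd, line in zip(commands, rendered):
        print(line)
        if not dry_run:
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    holders = ["10.0.0.1", "10.0.0.2"]  # hypothetical addresses
    run(stop_hints(holders))
    # ... wait here until the stored hints have expired ...
    run(resume_hints_and_repair(holders, "10.0.0.3"))
```

With dry_run left at its default, nothing is executed, so the printed list can
be reviewed before anything touches the cluster.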

Re: Restart node => hinted handoff flood

Posted by Alain RODRIGUEZ <ar...@gmail.com>.

The point is that, afaik, there is no way to limit the speed of Hinted
Handoff, since it is not a stream like repair or bootstrap. There is also no
way to keep the node out of the ring while it is receiving hints, since
hints and "normal" traffic both go through the gossip protocol on
port 7000.

How can I avoid this Hinted Handoff flood on returning nodes?

Alain

