Posted to user@cassandra.apache.org by Andrew Bialecki <an...@klaviyo.com> on 2017/10/27 05:49:13 UTC

Hinted handoff throttled even after "nodetool sethintedhandoffthrottlekb 0"

We have a 96-node cluster running 3.11 with 256 vnodes per node. We're doing
a rolling restart. As we restart nodes, we notice that each node takes a
while to mark all other nodes as up, and the delay corresponds to nodes
that haven't finished playing hints.

We looked at the hinted handoff throttling, noticed it was still the
default of 1024, so we tried to turn it off by setting it to zero. Reading
the source, it looks like that rate limiting won't take effect until the
current set of hints has finished. So we made that change cluster-wide and
then restarted the next node. However, we still saw the same issue.

Looking at iftop and network throughput, it's very low (~10kB/s) and
therefore the few 100k of hints that accumulate while a node is restarting
end up taking several minutes to get sent.
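
One thing that may be worth checking against the ~10kB/s figure: the
cassandra.yaml comment for hinted_handoff_throttle_in_kb says the throttle
is reduced in proportion to the number of nodes in the cluster. Below is a
minimal sketch of that arithmetic in Python. The divisor (nodes - 1), the
integer division, and "0 means unthrottled" are assumptions here, not
something verified against the 3.11 source, and the 3 MB backlog is a
made-up figure for illustration only.

```python
def effective_throttle_kb(configured_kb: int, cluster_nodes: int) -> int:
    """Per-destination throttle in KB/s.

    Assumption: the configured value is divided (integer division) by the
    number of *other* nodes in the cluster, as the cassandra.yaml comment
    suggests.
    """
    peers = max(1, cluster_nodes - 1)
    return configured_kb // peers


def drain_seconds(backlog_kb: float, throttle_kb: int) -> float:
    """Seconds needed to send backlog_kb at throttle_kb KB/s.

    Assumption: a throttle of 0 is treated as unthrottled, so the
    rate limit contributes no delay.
    """
    if throttle_kb == 0:
        return 0.0
    return backlog_kb / throttle_kb


# Default throttle on a 96-node cluster.
rate = effective_throttle_kb(1024, 96)
print(rate)  # 10

# Hypothetical 3 MB hint backlog for one restarted node.
print(drain_seconds(3072, rate))  # 307.2
```

Under these assumptions the default of 1024 on 96 nodes works out to about
10 KB/s per destination, which is in line with what iftop showed. It would
not explain why a setting of 0 makes no difference.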

Any other knobs we should be tuning to increase hinted handoff throughput?
Or other reasons why hinted handoff runs so slowly?

-- 
Andrew Bialecki

Re: Hinted handoff throttled even after "nodetool sethintedhandoffthrottlekb 0"

Posted by Andrew Bialecki <an...@klaviyo.com>.
Bit more information. Using jmxterm and inspecting the state of a node when
it's "slow" playing hints, I can see the following from the node that has
hints to play:

$>get MaxHintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
MaxHintsInProgress = 2048;

$>get HintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
HintsInProgress = 0;

$>get TotalHints
#mbean = org.apache.cassandra.db:type=StorageProxy:
TotalHints = 129687;
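
To watch these values over time rather than reading them by hand, the
jmxterm output can be parsed into numbers. This is a small sketch in Python
that assumes the output looks exactly like the lines above (attribute lines
of the form "Name = value;"). The function name parse_jmxterm is made up for
this example.

```python
import re


def parse_jmxterm(output: str) -> dict:
    """Extract 'Name = 123;' attribute lines from jmxterm output.

    Prompt lines ('$>get ...') and '#mbean = ...' lines are skipped
    because they do not match the attribute pattern.
    """
    values = {}
    for line in output.splitlines():
        match = re.match(r"^(\w+) = (-?\d+);$", line.strip())
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


sample = """\
$>get MaxHintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
MaxHintsInProgress = 2048;

$>get HintsInProgress
#mbean = org.apache.cassandra.db:type=StorageProxy:
HintsInProgress = 0;

$>get TotalHints
#mbean = org.apache.cassandra.db:type=StorageProxy:
TotalHints = 129687;
"""

stats = parse_jmxterm(sample)
print(stats)
```

Sampling this a few seconds apart and comparing the readings would show
whether the counters are moving at all while the node looks "slow".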

Is there some throttling that would cause hints to not be played at all if,
for instance, the cluster has enough load or something related to a timeout
setting?

-- 
Andrew Bialecki

<https://www.klaviyo.com/>