Posted to issues@kudu.apache.org by "Todd Lipcon (JIRA)" <ji...@apache.org> on 2018/05/22 22:34:00 UTC

[jira] [Commented] (KUDU-2452) Prevent follower from causing pre-elections when UpdateConsensus is slow

    [ https://issues.apache.org/jira/browse/KUDU-2452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16484697#comment-16484697 ] 

Todd Lipcon commented on KUDU-2452:
-----------------------------------

I think we already do stop the failure detector during UpdateReplica, at least while we're waiting on the log, don't we?

The issue I've seen more is that enough tablets are blocked in UpdateConsensus that other (unrelated) tablets can't get their heartbeats processed due to queue overflows. Those _other_ victim tablets then end up calling pre-elections, which only contribute more to the load.

I think the best starting point for this would be KUDU-1707, which would allow simple liveness heartbeats to continue to get through even when tablets are holding up the threads.
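To make the queue-starvation effect concrete, here is a toy sketch (not Kudu's actual RPC service implementation; names, the tiny queue depth, and the two-queue split are all illustrative assumptions) of why a shared bounded service queue lets slow UpdateConsensus work on busy tablets reject an unrelated tablet's liveness heartbeat, while a dedicated liveness queue in the KUDU-1707 direction would let it through:

```python
from collections import deque

QUEUE_CAPACITY = 3  # hypothetical queue depth, far below any real default


class ServiceQueue:
    """Toy bounded RPC service queue: enqueue fails when the queue is full,
    which the caller observes as a 'service queue overflow'."""

    def __init__(self, capacity):
        self.q = deque()
        self.capacity = capacity

    def enqueue(self, rpc):
        if len(self.q) >= self.capacity:
            return False  # rejected: queue overflow
        self.q.append(rpc)
        return True


# Shared queue: slow UpdateConsensus calls from busy tablets fill it up...
shared = ServiceQueue(QUEUE_CAPACITY)
for i in range(QUEUE_CAPACITY):
    shared.enqueue(("update_consensus", f"busy-tablet-{i}"))

# ...so an unrelated tablet's heartbeat is rejected, its failure detector
# eventually fires, and it calls a pre-election.
heartbeat_ok = shared.enqueue(("heartbeat", "quiet-tablet"))

# With a separate lightweight queue reserved for liveness heartbeats, the
# same heartbeat still gets through even while the bulk queue is full.
liveness = ServiceQueue(QUEUE_CAPACITY)
heartbeat_ok_split = liveness.enqueue(("heartbeat", "quiet-tablet"))

print(heartbeat_ok, heartbeat_ok_split)  # False True
```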

> Prevent follower from causing pre-elections when UpdateConsensus is slow
> ------------------------------------------------------------------------
>
>                 Key: KUDU-2452
>                 URL: https://issues.apache.org/jira/browse/KUDU-2452
>             Project: Kudu
>          Issue Type: Improvement
>    Affects Versions: 1.7.0
>            Reporter: Will Berkeley
>            Priority: Major
>
> Thanks to pre-elections (KUDU-1365), slow UpdateConsensus calls on a single follower don't disturb the whole tablet by calling elections. However, sometimes I see situations where one or more followers are constantly calling pre-elections, and only rarely, if ever, overflowing their service queues. Occasionally, in 3x replicated tablets, the followers will get "lucky" and detect a leader failure at around the same time, and an election will happen.
> This background instability has caused bugs like KUDU-2343 that should be rare to occur pretty frequently, plus the extra RequestConsensusVote RPCs add a little more stress on the consensus service and on replicas' consensus locks. It also spams the logs, since there's generally no exponential backoff for these pre-elections because there's a successful heartbeat in between them.
> It seems like we can get into a situation where the average number of in-flight consensus requests is constant over time, so on average we are processing each heartbeat in less than the heartbeat interval; however, some individual heartbeats take longer. Since UpdateConsensus calls to a replica are serialized, a few of these slow calls in a row trigger the failure detector, despite the follower receiving every heartbeat in a timely manner and eventually responding successfully (and on average in a timely manner).
> It'd be nice to prevent these worthless pre-elections. A couple of ideas:
> 1. Separately calculate a backoff for failed pre-elections, and reset it when a pre-election succeeds or more generally when there's an election.
> 2. Don't count the time the follower is executing UpdateConsensus against the failure detector. [~mpercy] suggested stopping the failure detector during UpdateReplica() and resuming it when the function returns.
> 3. Move leader failure detection out-of-band of UpdateConsensus entirely.
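The timing argument in the description can be checked with a small sketch (the interval, timeout, and per-call processing times below are made-up illustrative numbers, not Kudu's actual flag defaults). Because UpdateConsensus calls to one replica are serialized, each call starts only after the previous one completes; if the failure detector is reset only on completion, a short burst of slow calls can open a gap larger than the failure timeout even though the mean processing time is below the heartbeat interval and every heartbeat arrived on time:

```python
# Hypothetical numbers for illustration only (not Kudu's real defaults).
HEARTBEAT_INTERVAL = 0.5  # leader heartbeats every 500 ms
FAILURE_TIMEOUT = 1.5     # follower suspects leader death after 1.5 s

# Per-heartbeat processing times on the follower. The mean (~0.39 s) is
# below the heartbeat interval, but a few slow calls cluster together.
processing = [0.1, 0.1, 1.2, 1.3, 0.1, 0.1, 0.1, 0.1]


def completion_times(proc, interval):
    """Heartbeat i arrives at i * interval, but calls are serialized, so
    each one starts at max(its arrival, the previous call's completion)."""
    done, times = 0.0, []
    for i, p in enumerate(proc):
        done = max(i * interval, done) + p
        times.append(done)
    return times


completions = completion_times(processing, HEARTBEAT_INTERVAL)

# If the failure detector is only reset when processing *completes*, what
# matters is the largest gap between consecutive completions.
gaps = [b - a for a, b in zip([0.0] + completions, completions)]

# max(gaps) exceeds FAILURE_TIMEOUT here, so the follower calls a
# pre-election. Under idea 2 (snooze the detector during UpdateReplica(),
# i.e. effectively reset it on heartbeat *arrival*), the relevant gaps are
# the arrival gaps, which are always exactly HEARTBEAT_INTERVAL and never
# trip the detector.
print(max(gaps))
```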



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)