Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2015/11/09 18:15:11 UTC

[jira] [Resolved] (SLING-5285) more aggressive self-check for heartbeat timeout

     [ https://issues.apache.org/jira/browse/SLING-5285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved SLING-5285.
--------------------------------
    Resolution: Fixed

Fixed in http://svn.apache.org/viewvc?rev=1713477&view=rev (that revision also covers SLING-5284)

> more aggressive self-check for heartbeat timeout
> ------------------------------------------------
>
>                 Key: SLING-5285
>                 URL: https://issues.apache.org/jira/browse/SLING-5285
>             Project: Sling
>          Issue Type: Improvement
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.2.0
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: Discovery Impl 1.2.2
>
>
> SLING-5195 introduced a self-check that monitors whether the HeartbeatHandler is storing heartbeats regularly. This is needed because there are several reasons why it might not be, e.g. the HeartbeatHandler could be blocked by another long-running commit happening locally, or by thread-pool exhaustion, or by something else entirely.
> The check raised an alarm when the time since the last heartbeat was bigger than *heartbeatTimeout*. This however is not sufficient. The comparison should be much more aggressive: it should compare against *heartbeatTimeout minus 2 times heartbeatInterval* to have enough safety margin. _2 times_ because 1 time is the very minimum: this background check only _runs_ every heartbeatInterval, so in the worst case it could run just under one _heartbeatInterval_ before the timeout hits, still pass, and then be too late by a fraction on its next run. The _2_ therefore adds a safety margin of only 1 _heartbeatInterval_.
> *Note:* this also means that you should configure heartbeatTimeout to be at least 4-5 times heartbeatInterval.
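
For illustration, the threshold arithmetic described above can be sketched in Java as follows. This is not the code committed in rev 1713477. The class name, the method names and the example values of 120 and 30 seconds are hypothetical and chosen only to show the calculation, and the strict "bigger than" comparison is an assumption taken from the wording of the description.

```java
// Hypothetical sketch of the self-check arithmetic; not the actual Discovery Impl code.
final class HeartbeatSelfCheck {

    private HeartbeatSelfCheck() {
    }

    // Alarm threshold: heartbeatTimeout minus 2 times heartbeatInterval (both in seconds).
    static long alarmThresholdSeconds(long heartbeatTimeout, long heartbeatInterval) {
        if (heartbeatTimeout <= 0 || heartbeatInterval <= 0) {
            throw new IllegalArgumentException("timeout and interval must be positive");
        }
        return heartbeatTimeout - 2 * heartbeatInterval;
    }

    // True when the time since the last stored heartbeat exceeds the threshold.
    static boolean shouldAlarm(long secondsSinceLastHeartbeat,
                               long heartbeatTimeout,
                               long heartbeatInterval) {
        return secondsSinceLastHeartbeat
                > alarmThresholdSeconds(heartbeatTimeout, heartbeatInterval);
    }

    // Checks the note's lower bound: heartbeatTimeout at least 4 times heartbeatInterval.
    static boolean hasRecommendedMargin(long heartbeatTimeout, long heartbeatInterval) {
        return heartbeatTimeout >= 4 * heartbeatInterval;
    }

    public static void main(String[] args) {
        long timeout = 120;  // illustrative value, in seconds
        long interval = 30;  // illustrative value, in seconds
        System.out.println("alarm threshold: " + alarmThresholdSeconds(timeout, interval));
        System.out.println("alarm at 100s:   " + shouldAlarm(100, timeout, interval));
        System.out.println("margin ok:       " + hasRecommendedMargin(timeout, interval));
    }
}
```

With these example values, a gap of 100 seconds since the last heartbeat would already trigger the alarm, whereas a check against the plain heartbeatTimeout would still pass. A timeout of only 2 times the interval would push the threshold down to zero, which is one way to see why the note asks for at least 4-5 times.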



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)