Posted to dev@sling.apache.org by "Timothee Maret (JIRA)" <ji...@apache.org> on 2019/07/10 10:05:00 UTC

[jira] [Commented] (SLING-8531) Support JournalAvailabilityChecker exponential backoff

    [ https://issues.apache.org/jira/browse/SLING-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16881918#comment-16881918 ] 

Timothee Maret commented on SLING-8531:
---------------------------------------

The approach in [~cschneider]'s PR is actually better than the one proposed. It only produces load when the journal is not available, and this load decreases exponentially. PR LGTM, thanks!

> Support JournalAvailabilityChecker exponential backoff 
> -------------------------------------------------------
>
>                 Key: SLING-8531
>                 URL: https://issues.apache.org/jira/browse/SLING-8531
>             Project: Sling
>          Issue Type: Improvement
>          Components: Content Distribution
>    Affects Versions: Content Distribution Journal Core 0.1.2
>            Reporter: Timothee Maret
>            Assignee: Christian Schneider
>            Priority: Major
>             Fix For: Content Distribution Journal Core 0.1.4
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> The average load generated by JournalAvailabilityChecker multiplies quickly for multi-tenant deployments. The checker can be configured (via Sling Scheduler {{scheduler.period}}) to reduce the polling frequency, but doing so also reduces the sensitivity to availability changes.
> To improve the sensitivity we should support an exponential backoff algorithm. The algorithm would divide the rate by two (up to a limit) every time the availability status does not change, and reset the rate when the status changes. Steady states (available or unavailable) would eventually yield the least load. In the average case (availability status is steady) the load will be reduced up to the limit. In the worst case (availability changes all the time) the load will not be reduced compared to today.
> The base rate would be Sling Scheduler {{scheduler.period}}. The rate at time t + 1 would be computed as follows: Rate~t+1~ = Multiplier~t+1~ * Rate~base~, where Rate~base~ is the base rate. The table below summarises how the multiplier would evolve according to the availability status change.
> ||State~t~||State~t+1~||Multiplier~t+1~||
> |unavailable|unavailable|min(2 * Multiplier~t~, limit)|
> |unavailable|available|1|
> |available|unavailable|1|
> |available|available|min(2 * Multiplier~t~, limit)|
> The limit would be hardcoded to 16, which would reduce the load by an order of magnitude. We could expose the limit as a configuration later if needed.
> There should be no need to randomise the multiplier for now, as the checkers are expected to be started at random times. If we hit a scenario where the checkers start at the same time, we could simply randomise the first scheduled event.
>  
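For illustration only, the multiplier rule described in the issue above could be sketched roughly as follows. This is not the code from the PR (which, per the comment, takes a different approach). The class and method names (BackoffSketch, onCheck, nextPeriod) are hypothetical, and treating the very first check as a reset to multiplier 1 is an assumption, since the description does not cover that case. The multiplier is capped at the limit of 16 and the period is the base {{scheduler.period}} times the multiplier.

```java
// Hypothetical sketch of the proposed multiplier rule; not the actual
// JournalAvailabilityChecker implementation.
class BackoffSketch {
    // Limit taken from the issue description (hardcoded to 16).
    static final int LIMIT = 16;

    private int multiplier = 1;
    // null until the first check has been observed (assumption: the first
    // check behaves like a status change and yields multiplier 1).
    private Boolean lastAvailable = null;

    /** Records the latest availability status and returns the new multiplier. */
    int onCheck(boolean available) {
        if (lastAvailable != null && lastAvailable.booleanValue() == available) {
            // Status unchanged: double the multiplier, capped at the limit.
            multiplier = Math.min(2 * multiplier, LIMIT);
        } else {
            // Status changed (or first check): reset to the base rate.
            multiplier = 1;
        }
        lastAvailable = available;
        return multiplier;
    }

    /** Delay before the next check, in the same unit as basePeriod. */
    long nextPeriod(long basePeriod) {
        return basePeriod * multiplier;
    }

    public static void main(String[] args) {
        BackoffSketch backoff = new BackoffSketch();
        boolean[] observed = {true, true, true, true, true, true, false};
        for (boolean available : observed) {
            int m = backoff.onCheck(available);
            System.out.println("available=" + available + " multiplier=" + m);
        }
    }
}
```

With six consecutive "available" checks followed by one "unavailable" check, the multiplier goes 1, 2, 4, 8, 16, 16 and then resets to 1, which matches the table in the description.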



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)