Posted to oak-issues@jackrabbit.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2015/08/26 14:47:45 UTC

[jira] [Resolved] (OAK-3238) fine tune clock-sync check vs lease-check settings

     [ https://issues.apache.org/jira/browse/OAK-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli resolved OAK-3238.
------------------------------
    Resolution: Fixed

Changed the lease behavior as follows (http://svn.apache.org/r1697913):
 * the lease update is now done after 20 sec already (instead of the previous default of 30 sec) - this should not have any negative performance implications. The lease timeout is left unchanged at 60 sec
 * the lease-check is now done with a margin of 20 sec (1/3 of the leaseTime): if the lease is valid for less than 20 sec, the check now considers that a failure.
/fyi: [~mreutegg], [~chetanm], [~reschke]
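
For illustration, the margin-based check described above could look roughly like the following. This is only a sketch and not the code from r1697913: the class, constant and method names are hypothetical, and only the numbers (60 sec lease, 20 sec update, 20 sec margin) come from this comment.

```java
// Sketch only: names are hypothetical, values are the ones quoted in the comment.
class LeaseCheckSketch {

    static final long LEASE_DURATION_MS = 60_000;         // lease timeout, unchanged
    static final long LEASE_UPDATE_INTERVAL_MS = 20_000;  // lease is renewed this often
    static final long LEASE_FAILURE_MARGIN_MS = LEASE_DURATION_MS / 3; // 20 sec

    /**
     * Passes only while the lease is still valid for at least the margin.
     * A lease with less than 20 sec remaining is treated as a failure,
     * even though it has not formally expired yet.
     */
    static boolean leaseCheckPasses(long leaseEndTimeMs, long nowMs) {
        long remainingMs = leaseEndTimeMs - nowMs;
        return remainingMs >= LEASE_FAILURE_MARGIN_MS;
    }

    public static void main(String[] args) {
        long leaseEnd = LEASE_DURATION_MS; // lease acquired at t=0
        // One missed update (t=20s) is survivable, a second one (t=40s) is the limit.
        for (long now = 0; now <= LEASE_DURATION_MS; now += LEASE_UPDATE_INTERVAL_MS) {
            System.out.println("t=" + (now / 1000) + "s passes=" + leaseCheckPasses(leaseEnd, now));
        }
    }
}
```

With these values an instance that stops updating its lease starts failing the check once less than 20 sec remain, i.e. well before any other instance could consider the lease timed out.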

> fine tune clock-sync check vs lease-check settings
> --------------------------------------------------
>
>                 Key: OAK-3238
>                 URL: https://issues.apache.org/jira/browse/OAK-3238
>             Project: Jackrabbit Oak
>          Issue Type: Improvement
>          Components: core
>    Affects Versions: 1.3.4
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>             Fix For: 1.3.5
>
>
> There are now two components that try to assure 'discovery-lite' (OAK-2844) is reporting a coherent cluster view to the upper layers:
> * OAK-2682 : time difference detection: by default fails if the clock is off by more than 2 seconds at startup. That results in a maximum clock difference of 4 sec between instances in a document-cluster
> * OAK-2739 : lease-checking: every instance checks if the local lease is valid upon any document access. This check is done against the actual 'leaseEndTime' - which is updated every (by default) 30 seconds to be valid for (by default) another 60 seconds.
> With these two factors combined, in the worst case you could still end up with a 4 second time window in which the local instance has failed to update the lease (e.g. the lease-thread died) but still considers itself to own a valid lease - while a remote instance, whose clock might be those 4 seconds off, already considers the lease as timed out.
> So overall: the 3 factors 'lease duration', 'lease update frequency' and 'maximum allowed clock difference' must be better tuned to end up in a stable mechanism.
> Suggestion:
> * increase the 'lease duration' to be 3 x 'lease update frequency', i.e. 90 sec lease duration
> * reduce the lease check failure limit from 'lease duration' to 2 x 'lease update frequency' - assuming that one 'lease update interval' is much larger than the 'maximum allowed clock difference'
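
The worst-case window from the description can be played through with a small model. This is a sketch under simplifying assumptions, not Oak code: all names are hypothetical, the remote clock is assumed to run ahead of the local one by the full 4 sec, and the remote instance is assumed to treat the lease as timed out as soon as its own clock passes 'leaseEndTime'.

```java
// Sketch only: a toy model of the local/remote disagreement, names are hypothetical.
class ClockSkewWindowSketch {

    /** Local view: the lease is trusted while at least marginMs remain. */
    static boolean localTrustsLease(long leaseEndMs, long localClockMs, long marginMs) {
        return leaseEndMs - localClockMs >= marginMs;
    }

    /** Remote view: its clock is skewMs ahead and has passed the lease end. */
    static boolean remoteSeesTimeout(long leaseEndMs, long localClockMs, long skewMs) {
        return localClockMs + skewMs > leaseEndMs;
    }

    /** True if, at some point, local still trusts a lease that remote sees as timed out. */
    static boolean hasConflictWindow(long leaseEndMs, long skewMs, long marginMs) {
        for (long t = 0; t <= leaseEndMs; t += 100) {
            if (localTrustsLease(leaseEndMs, t, marginMs)
                    && remoteSeesTimeout(leaseEndMs, t, skewMs)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        long leaseEnd = 60_000; // lease-thread died right after the last update
        long skew = 4_000;      // maximum allowed clock difference
        System.out.println("no margin:     conflict=" + hasConflictWindow(leaseEnd, skew, 0));
        System.out.println("20 sec margin: conflict=" + hasConflictWindow(leaseEnd, skew, 20_000));
    }
}
```

In this model the window disappears as soon as the margin is at least as large as the clock difference, which is why the suggestion assumes one 'lease update interval' to be much larger than the 'maximum allowed clock difference'.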



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)