Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2014/11/05 15:12:34 UTC

[jira] [Updated] (SLING-3434) Make intra-cluster discovery-heartbeats independent from machine clock differences

     [ https://issues.apache.org/jira/browse/SLING-3434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Egli updated SLING-3434:
-------------------------------
    Fix Version/s:     (was: Discovery Impl 1.0.14)

(Removed the fix version, as this does not seem critical at the moment.)

An intermediate step could be to start with a clock-diff-detection mechanism (e.g. by means of a 'clock vote' where each instance writes down its own UTC time) and simply issue a log.error if the clocks differ substantially. That would be non-intrusive and could be rolled out with few side effects or risks.
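
To make the 'clock vote' idea concrete, here is a minimal sketch in plain Java. It is not discovery.impl code: the class name, the method names and the 5 second threshold are all assumptions for illustration, and it ignores the latency between an instance writing its vote and another instance reading it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch of a 'clock vote'; none of these names exist in discovery.impl. */
class ClockVote {

    /** Assumed threshold; what counts as "substantially" is left open above. */
    static final long MAX_SKEW_MILLIS = 5_000L;

    /**
     * Returns the largest difference between any two voted UTC timestamps.
     * Assumes all votes were written at roughly the same moment.
     */
    static long maxSkewMillis(Map<String, Long> votes) {
        if (votes.isEmpty()) {
            return 0L;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long utcMillis : votes.values()) {
            min = Math.min(min, utcMillis);
            max = Math.max(max, utcMillis);
        }
        return max - min;
    }

    /** True if the spread of the votes exceeds the assumed threshold. */
    static boolean clocksDiffer(Map<String, Long> votes) {
        return maxSkewMillis(votes) > MAX_SKEW_MILLIS;
    }

    public static void main(String[] args) {
        // Each instance writes down its own UTC time (instance ids are made up).
        Map<String, Long> votes = new LinkedHashMap<>();
        votes.put("instance-a", 1_415_200_000_000L);
        votes.put("instance-b", 1_415_200_000_800L);
        votes.put("instance-c", 1_415_200_012_000L); // 12 s ahead of instance-a

        if (clocksDiffer(votes)) {
            // Stand-in for the log.error mentioned above.
            System.err.println("ERROR: cluster clocks differ by "
                    + maxSkewMillis(votes) + " ms");
        }
    }
}
```

In a real implementation the votes would presumably be read from the repository and the check would only log, never change behaviour, which is what keeps this step non-intrusive.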

As a next step, once we have gathered some experience with the stability and feasibility of the above, that mechanism could be used to establish a 'cluster time zone', with each instance applying its discovered delta. That would remove the need to warn when the clocks are not in sync.
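
One possible reading of the 'cluster time zone' idea, again only as a sketch: take a reference time from the voted timestamps, let each instance compute its own delta to that reference, and add the delta whenever it writes or compares a heartbeat time. Using the median as the reference is an assumption made here, as are all class and method names.

```java
import java.util.Arrays;

/** Hypothetical sketch of a cluster-wide time base; not part of discovery.impl. */
class ClusterTime {

    /**
     * Reference time for the cluster: the (lower) median of all voted UTC
     * timestamps. The median is an assumption; the leader's clock would be
     * another option.
     */
    static long referenceMillis(long[] votes) {
        if (votes.length == 0) {
            throw new IllegalArgumentException("no votes");
        }
        long[] sorted = votes.clone();
        Arrays.sort(sorted);
        return sorted[(sorted.length - 1) / 2];
    }

    /** Delta an instance has to add to its own clock to reach cluster time. */
    static long deltaMillis(long ownVote, long[] allVotes) {
        return referenceMillis(allVotes) - ownVote;
    }

    /** Converts a local timestamp into cluster time using the discovered delta. */
    static long toClusterTime(long localMillis, long deltaMillis) {
        return localMillis + deltaMillis;
    }

    public static void main(String[] args) {
        long[] votes = {1_000L, 1_200L, 9_000L}; // third instance is far ahead
        long delta = deltaMillis(9_000L, votes); // delta for the third instance
        System.out.println("delta = " + delta + " ms");
        System.out.println("cluster time = " + toClusterTime(9_500L, delta));
    }
}
```

With such a delta applied on every instance, heartbeats would be compared on a common time base even when the machine clocks themselves disagree.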

> Make intra-cluster discovery-heartbeats independent from machine clock differences
> ----------------------------------------------------------------------------------
>
>                 Key: SLING-3434
>                 URL: https://issues.apache.org/jira/browse/SLING-3434
>             Project: Sling
>          Issue Type: Bug
>          Components: Extensions
>    Affects Versions: Discovery Impl 1.0.2
>            Reporter: Stefan Egli
>
> SLING-2967 fixed an issue where topology connectors were dependent on having machine clocks in sync - so inter-cluster we're no longer dependent on NTP-synching.
> Inside a cluster though, this problem is still there. Since heartbeats are written as absolute time - based on the originator's machine clock - it still only works fine if the whole cluster is NTP-synched.
> In general I think this is not a problem as it is best-practice to make sure machines have NTP set up.
> Nevertheless, it would help if discovery.impl could become independent from this.
> Also, if clocks are off by too much, pseudo-network-partitions can occur, with the result of having multiple leaders in a cluster (also see SLING-3432).
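
For illustration, the failure mode described in the issue can be reproduced with a few lines of Java. This is a simplified model, not the actual discovery.impl check: the class name, the method and the 20 second timeout are assumptions.

```java
/** Hypothetical model of an absolute-time heartbeat check; not discovery.impl code. */
class HeartbeatCheck {

    /**
     * A heartbeat is considered alive if, measured on the observer's clock,
     * it is not older than the timeout. The heartbeat itself carries the
     * writer's clock, which is where skew leaks in.
     */
    static boolean isAlive(long heartbeatMillis, long observerNowMillis, long timeoutMillis) {
        return observerNowMillis - heartbeatMillis <= timeoutMillis;
    }

    public static void main(String[] args) {
        long timeout = 20_000L;            // assumed heartbeat timeout
        long writerNow = 1_000_000L;       // writer's clock when it writes the heartbeat
        long observerNow = writerNow + 30_000L; // observer's clock runs 30 s ahead

        // The heartbeat was written just now, yet the observer sees it as 30 s old.
        System.out.println("alive? " + isAlive(writerNow, observerNow, timeout));
    }
}
```

An instance that is perfectly healthy is thus treated as gone by its peers, which is how a pseudo-network-partition can come about.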



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)