Posted to dev@sling.apache.org by "Stefan Egli (Jira)" <ji...@apache.org> on 2021/08/24 14:26:00 UTC

[jira] [Comment Edited] (SLING-10489) Ignore partially started, newly joining instances to avoid disturbing discovery (for a while)

    [ https://issues.apache.org/jira/browse/SLING-10489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17403834#comment-17403834 ] 

Stefan Egli edited comment on SLING-10489 at 8/24/21, 2:25 PM:
---------------------------------------------------------------

* created [discovery.commons PR#4|https://github.com/apache/sling-org-apache-sling-discovery-commons/pull/4] - with the following improvements:
** skip activeIds in OakBacklogClusterSyncService if they are partially started
** ignore the syncToken in view change checks if there are partially started instances
** introduce LogSilencer, which reduces log.info spam caused by discovery
* updated [discovery.oak PR#4|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/4] - which currently doesn't build because it depends on the LogSilencer from the PR above (these 2 bundles need to be built together, and Travis can't know that). The PR contains the following improvements:
** only ever consider new slingIds for suppression and never those already seen
** when joining a cluster, also apply the JoinerDelay if other instances are partially started, e.g. because they are starting concurrently
** use LogSilencer to reduce log.info spam caused by discovery
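
To illustrate the idea of reducing repeated log.info output, here is a minimal sketch. It is not the LogSilencer from the PR: the class name InfoLogThrottle, its methods and the quiet-period parameter are all hypothetical, and the real class may work differently.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch only, not the LogSilencer from discovery.commons.
 * Remembers when each message category was last logged at info level and
 * tells the caller to stay quiet until a configurable period has passed.
 */
final class InfoLogThrottle {

    private final long quietMillis;
    private final Map<String, Long> lastLogged = new HashMap<>();

    InfoLogThrottle(long quietMillis) {
        this.quietMillis = quietMillis;
    }

    /** Returns true if a message of this category should be logged at info now. */
    synchronized boolean shouldLog(String category, long nowMillis) {
        Long last = lastLogged.get(category);
        if (last != null && nowMillis - last < quietMillis) {
            return false; // logged recently, stay silent
        }
        lastLogged.put(category, nowMillis);
        return true;
    }

    /** Forgets all categories, e.g. after a topology change has completed. */
    synchronized void reset() {
        lastLogged.clear();
    }
}
```

A caller could log at info when shouldLog returns true and fall back to debug otherwise, so that no message is lost entirely.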




was (Author: egli):
* created [discovery.commons PR#4|https://github.com/apache/sling-org-apache-sling-discovery-commons/pull/4] - which improves partial-startup-suppression stability and introduces the LogSilencer
* updated [discovery.oak PR#4|https://github.com/apache/sling-org-apache-sling-discovery-oak/pull/4] - which currently doesn't build because it depends on the LogSilencer from the PR above (these 2 bundles need to be built together, and Travis can't know that)

> Ignore partially started, newly joining instances to avoid disturbing discovery (for a while)
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-10489
>                 URL: https://issues.apache.org/jira/browse/SLING-10489
>             Project: Sling
>          Issue Type: Improvement
>          Components: Discovery
>    Affects Versions: Discovery Oak 1.2.34
>            Reporter: Stefan Egli
>            Assignee: Stefan Egli
>            Priority: Major
>             Fix For: Discovery Oak 1.2.36
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Discovery.oak requires that both Oak and Sling are operating normally in order to declare victory and announce a new topology.
> The startup phase is especially tricky in this regard, since there are multiple elements that need to get updated (some are in the Oak layer, some in Sling):
>  * lease & clusterNodeId : this is maintained by Oak
>  * idMap : this is maintained by IdMapService (Sling)
>  * leaderElectionId : this is maintained by OakViewChecker (Sling)
>  * syncToken : this is maintained by SyncTokenService (Sling)
> Situations have been seen where Oak starts up fine, but higher-level (e.g. Sling) bundles were not activated within a reasonable amount of time. This led to discovery staying in TOPOLOGY_CHANGING state for longer than expected.
> There should be a mechanism that ignores (suppresses) newly joining instances if they start up only partially. However, after a certain timeout this mechanism should give up.
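
The mechanism requested above could be sketched roughly as follows. This is an illustration under assumptions, not the discovery.oak implementation: the class PartialStartupSuppressor, its filter method and the use of String ids are hypothetical. It suppresses only ids it has never accepted before, and gives up on suppressing an id once the timeout has elapsed.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch only, not the discovery.oak code.
 * Hides newly joining, partially started instances from the view
 * until they are fully started or a timeout expires.
 */
final class PartialStartupSuppressor {

    private final long timeoutMillis;
    /** Ids that were accepted once; these are never suppressed again. */
    private final Set<String> seenIds = new HashSet<>();
    /** Time at which each new, partially started id was first observed. */
    private final Map<String, Long> firstSeenPartial = new HashMap<>();

    PartialStartupSuppressor(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /**
     * @param activeIds       ids of all instances currently active (assumed input)
     * @param fullyStartedIds subset of activeIds that completed startup (assumed input)
     * @param nowMillis       current time in milliseconds
     * @return the ids that should be considered for the topology
     */
    Set<String> filter(Set<String> activeIds, Set<String> fullyStartedIds, long nowMillis) {
        Set<String> result = new HashSet<>();
        for (String id : activeIds) {
            boolean partial = !fullyStartedIds.contains(id);
            if (!partial || seenIds.contains(id)) {
                accept(id, result);
                continue;
            }
            long first = firstSeenPartial.computeIfAbsent(id, k -> nowMillis);
            if (nowMillis - first >= timeoutMillis) {
                accept(id, result); // timeout reached: stop suppressing
            }
        }
        // drop bookkeeping for instances that are no longer active
        firstSeenPartial.keySet().retainAll(activeIds);
        return result;
    }

    private void accept(String id, Set<String> result) {
        result.add(id);
        seenIds.add(id);
        firstSeenPartial.remove(id);
    }
}
```

Once the timeout is reached, the partially started instance is no longer hidden, so discovery would again wait for it as it did before suppression existed.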


