Posted to dev@sling.apache.org by "Stefan Egli (JIRA)" <ji...@apache.org> on 2018/08/16 09:28:00 UTC

[jira] [Comment Edited] (SLING-7830) Defined leader switch

    [ https://issues.apache.org/jira/browse/SLING-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16582255#comment-16582255 ] 

Stefan Egli edited comment on SLING-7830 at 8/16/18 9:27 AM:
-------------------------------------------------------------

The leader election is based on the leaderElectionId stored in the repository under {{/var/discovery/oak/clusterInstances}}. When an instance starts up, it stores its own leaderElectionId there. As Carsten mentioned, the ID is made up of a prefix, then the start time and the slingId (to avoid clashes). At the time the cluster view is analysed, the leader is the instance with the *lowest* leaderElectionId (String comparison).
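
As a rough illustration of the comparison described above (not the actual discovery code), the following Python sketch builds IDs in the prefix, start time, slingId shape and picks the lowest one by plain string comparison. The function names {{make_leader_election_id}} and {{elect_leader}} are hypothetical, the second slingId is made up, and the 19-digit zero padding is an assumption inferred from the example ID quoted in this comment.

```python
def make_leader_election_id(prefix: int, start_time_ms: int, sling_id: str) -> str:
    # Assumed layout: <prefix>_<start time in ms, zero-padded to 19 digits>_<slingId>
    return f"{prefix}_{start_time_ms:019d}_{sling_id}"


def elect_leader(leader_election_ids):
    # The leader is the entry with the lowest ID under plain string comparison.
    return min(leader_election_ids)


ids = [
    make_leader_election_id(1, 1534409616936, "374019fc-68bd-4c8d-a4cf-8ee8b07c63bc"),
    # Hypothetical second instance that started later.
    make_leader_election_id(1, 1534409999999, "aaaaaaaa-0000-0000-0000-000000000000"),
]
leader = elect_leader(ids)  # the instance that started first
```

Because the comparison is on strings rather than numbers, a prefix of {{10}} would sort before {{2}}, which is worth keeping in mind when choosing prefix values.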

That fact can be used for the following procedure:
 * before bringing up the new (e.g. blue) instances, put the old (e.g. green) instances' leaderElectionIds at the back of the leader comparison 'queue', for example by incrementing the prefix, i.e. change each leaderElectionId from *{{1}}*{{_0000001534409616936_374019fc-68bd-4c8d-a4cf-8ee8b07c63bc}} to *{{2}}*{{_0000001534409616936_374019fc-68bd-4c8d-a4cf-8ee8b07c63bc}} (and do the same for *all* old instances). Do this in *1 jcr transaction* (otherwise there will be an *unwanted* leader change in the old cluster)
 * then bring up the new (e.g. blue) instances. (One of) the new instance(s) will automatically become leader, since its prefix is {{1}} by default and thus lower than that of the old instances.
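
The procedure above can be simulated with a small Python sketch. It is only a model under stated assumptions: a dict stands in for the entries under {{/var/discovery/oak/clusterInstances}}, the instance names, the helpers {{bump_prefix}} and {{leader}}, and all IDs except the first are hypothetical, and the real change would have to be made in the repository in a single save.

```python
def bump_prefix(leader_election_id: str) -> str:
    # Increment the numeric prefix, e.g. "1_<rest>" becomes "2_<rest>".
    prefix, rest = leader_election_id.split("_", 1)
    return f"{int(prefix) + 1}_{rest}"


def leader(cluster: dict) -> str:
    # Name of the instance with the lowest leaderElectionId (string comparison).
    return min(cluster, key=cluster.get)


# Hypothetical old (green) cluster; a dict stands in for the repository.
old = {
    "old-a": "1_0000001534409616936_374019fc-68bd-4c8d-a4cf-8ee8b07c63bc",
    "old-b": "1_0000001534409700000_11111111-2222-3333-4444-555555555555",
}
before = leader(old)

# Bumping only one instance mimics a non-atomic update being observed halfway:
# leadership moves inside the old cluster, which is the unwanted change.
partial = dict(old)
partial["old-a"] = bump_prefix(old["old-a"])
leader_after_partial = leader(partial)

# Bumping all old instances together keeps their relative order intact.
demoted = {name: bump_prefix(value) for name, value in old.items()}
leader_after_demotion = leader(demoted)

# A new (blue) instance starts with the default prefix 1 and wins the election.
with_new = dict(demoted)
with_new["new-a"] = "1_0000001534500000000_99999999-8888-7777-6666-555555555555"
leader_after_deploy = leader(with_new)
```

In this model the old leader stays leader after the full demotion and only loses leadership once a new instance joins, which is the intended defined switch.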

We could look at automating something like this and providing it via some API/JMX.


was (Author: egli):
The leader election is based on the leaderElectionId stored in the repository under {{/var/discovery/oak/clusterInstances}}. When an instance starts up, it stores its own leaderElectionId there. As Carsten mentioned, that's made up of a prefix, then the start time and the slingId (to avoid clashes). At the time the cluster view is analysed, the leader is the one with the lowest leaderElectionId.

That fact can be used for the following procedure:
* before bringing up the new (e.g. blue) instances, put the old (green) instances' leaderElectionIds at the back of the leader comparison, for example by incrementing the prefix, i.e. change each leaderElectionId from *{{1}}*{{_0000001534409616936_374019fc-68bd-4c8d-a4cf-8ee8b07c63bc}} to *{{2}}*{{_0000001534409616936_374019fc-68bd-4c8d-a4cf-8ee8b07c63bc}} (and do the same for *all* old instances). Do this in *1 jcr transaction*
* then bring up the new (e.g. blue) instances. (One of) the new instance(s) will automatically become leader, since its prefix is {{1}} by default and thus lower than that of the old instances.

We could be looking at automating something like this and providing it via some API/JMX..

> Defined leader switch
> ---------------------
>
>                 Key: SLING-7830
>                 URL: https://issues.apache.org/jira/browse/SLING-7830
>             Project: Sling
>          Issue Type: Improvement
>          Components: Discovery
>            Reporter: Carsten Ziegeler
>            Priority: Major
>
> The current leader selection is based (mainly) on startup time and sling id and is stable across changes in the topology for as long as the leader is up and running.
> However, there are use cases like blue-green deployment where new instances with a new version are started and take over the functionality. With the current discovery setup, the leader would still be one of the instances with the old version.
> With a new deployed version, tasks currently bound to the leader should run on the new version.
> Therefore the leader needs to switch and stay the leader (until it dies).
> We probably need an additional criterion for the leader selection.
> /cc [~egli]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)