Posted to dev@sling.apache.org by "Timothee Maret (JIRA)" <ji...@apache.org> on 2016/01/29 15:21:39 UTC

[jira] [Comment Edited] (SLING-5435) Decouple processes that depend on cluster leader elections from the cluster leader elections.

    [ https://issues.apache.org/jira/browse/SLING-5435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15123504#comment-15123504 ] 

Timothee Maret edited comment on SLING-5435 at 1/29/16 2:21 PM:
----------------------------------------------------------------

bq. If those "processes" don't exist, then it sounds like there is nothing to stop a faster leader election implementation that is not slowed down by the latency required to ensure a repository reaches a consistent state.

As I wrote in my previous comment, I think that all consumers of the {{TopologyEventListener}} currently in Sling have a legitimate reason to wait on the repository. However, my point is that not every {{TopologyEventListener}} needs to wait on the repository. I have shared a list of use cases as well. Thus, instead of imposing the repository wait on everyone, we could allow a listener to be configured so that it does not have to wait for a repository sync. The implementation proposal has been discussed above as well.
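
To make the idea concrete, here is a minimal sketch of such a per-listener opt-out. It is only an illustration under my own assumptions: {{TopologyListener}}, {{DecoupledDispatcher}} and the {{waitForRepositorySync}} flag are hypothetical names, not the Sling Discovery API and not the implementation discussed above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical listener contract, used only for this sketch
// (this is NOT org.apache.sling.discovery.TopologyEventListener).
interface TopologyListener {
    void onTopologyChanged(String view);
}

// Hypothetical dispatcher: listeners that opted out of the repository wait
// are notified at once, the others only after the repository has synced.
final class DecoupledDispatcher {

    private static final class Registration {
        final TopologyListener listener;
        final boolean waitForRepositorySync; // assumed per-listener setting

        Registration(TopologyListener listener, boolean waitForRepositorySync) {
            this.listener = listener;
            this.waitForRepositorySync = waitForRepositorySync;
        }
    }

    private final List<Registration> registrations = new ArrayList<>();
    private String pendingView; // announced view that has not been synced yet

    void register(TopologyListener listener, boolean waitForRepositorySync) {
        registrations.add(new Registration(listener, waitForRepositorySync));
    }

    // Called as soon as the election result is known.
    void topologyChanged(String view) {
        pendingView = view;
        for (Registration r : registrations) {
            if (!r.waitForRepositorySync) {
                r.listener.onTopologyChanged(view);
            }
        }
    }

    // Called once the repository has reached a consistent state.
    void repositorySynced() {
        if (pendingView == null) {
            return; // nothing to deliver
        }
        String view = pendingView;
        pendingView = null;
        for (Registration r : registrations) {
            if (r.waitForRepositorySync) {
                r.listener.onTopologyChanged(view);
            }
        }
    }
}
```

With this shape, existing consumers would keep today's behaviour by registering with the flag set to true, while a listener that does not read from the repository could register with false and react right after the election.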

I propose to keep the focus of this issue on that goal. Alternatively, I can open a separate issue for it.

bq.  If that's really the case, then this issue can be closed and replaced by a new issue titled something like "Implement leader election using RAFT over the network" in the same way that systems like etcd perform elections.
I have already opened SLING-5423 to track that and I am working on it.
Generally, it is possible to make a "too fast" discovery "slow enough" thanks to the piece of code [~egli] mentioned earlier in this thread.


> Decouple processes that depend on cluster leader elections from the cluster leader elections.
> ---------------------------------------------------------------------------------------------
>
>                 Key: SLING-5435
>                 URL: https://issues.apache.org/jira/browse/SLING-5435
>             Project: Sling
>          Issue Type: Improvement
>          Components: General
>            Reporter: Ian Boston
>
> Currently there are many processes in Sling that must complete before a Sling Discovery cluster leader election is declared complete. These processes include things like transferring all Jobs from the old leader to the new leader and waiting for the data to become visible on the new leader. This adds overhead to the leader election process, which forces a higher than desirable timeout for elections and heartbeats. That timeout precludes the use of more efficient election and distributed consensus algorithms as implemented in etcd, ZooKeeper or implementations of Raft.
> If the election could be declared complete, leaving individual components to manage their own post-election operations (i.e. decoupling those processes from the election), then faster election or alternative Discovery implementations, such as the one implemented on etcd, could be used.


