Posted to issues@solr.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2022/10/13 18:16:00 UTC

[jira] [Commented] (SOLR-14368) SyncStrategy result should not prevent a replica to become leader

    [ https://issues.apache.org/jira/browse/SOLR-14368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17617247#comment-17617247 ] 

David Smiley commented on SOLR-14368:
-------------------------------------

Looking at where the election code calls SyncStrategy [here|https://github.com/apache/solr/blob/branch_9_1/solr/core/src/java/org/apache/solr/cloud/ShardLeaderElectionContext.java#L214]: are you saying that the election algorithm should still initiate the sync here but ignore its result, and just loop around and hope that the shard terms have updated appropriately?  Does it even need to initiate the sync here -- will the replica sync on its own?

> SyncStrategy result should not prevent a replica to become leader
> -----------------------------------------------------------------
>
>                 Key: SOLR-14368
>                 URL: https://issues.apache.org/jira/browse/SOLR-14368
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>
> h2. History
> In the early days of SolrCloud, a replica had to _sync_ with the other replicas before it could become leader. This process includes:
>  * Comparing the current replica’s (the leader candidate’s) tlog with those of the other replicas. For example, if the candidate’s data is too far behind the others, that replica should not become leader.
>  * Requesting the other replicas to sync back before becoming leader. Imagine that the old leader was shut down while it was trying to send multiple updates (u1, u2, u3, u4) to the others:
>  * Replica A may receive updates (u1, u2)
>  * Replica B may receive updates (u3, u4)
>  * If replica A becomes leader and does not request replica B to sync back, replica B then needs to go into a recovery process, which is costly.
> But this process has some problems:
>  # We only sync with live replicas, so if there are no other live replicas at the time of the election, the current replica can blindly become leader -> data loss. This problem was fixed with SOLR-11702.
>  # Any IOException that is not caught properly during communication between the current replica and the others can prevent that replica from becoming leader.
> h2. Idea
> Basically, with the new ShardTerms information, we can pick any replica with the highest _term_ to become leader. The reason is that a replica’s _term_ effectively represents how up-to-date that replica is with the leader.
> The only purpose of _sync_ with other replicas now is to prevent costly recovery processes from happening. Therefore SyncStrategy should not prevent a replica from becoming leader.
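
The idea quoted above can be illustrated with a minimal Java sketch. It is not Solr's actual election code: the class `TermBasedElectionSketch`, the methods `leaderCandidates` and `tryBecomeLeader`, and the plain `Map` of replica name to term are all hypothetical stand-ins chosen for illustration. It only shows eligibility being decided by term, with the sync treated as best effort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Solr.
class TermBasedElectionSketch {

    /** Returns every replica whose term equals the highest term in the map. */
    public static List<String> leaderCandidates(Map<String, Long> terms) {
        if (terms.isEmpty()) {
            return Collections.emptyList();
        }
        long highest = Collections.max(terms.values());
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Long> entry : terms.entrySet()) {
            if (entry.getValue() == highest) {
                candidates.add(entry.getKey());
            }
        }
        return candidates;
    }

    /**
     * Decides leadership from terms alone. The sync is attempted only to
     * spare other replicas a costly recovery; its failure is ignored.
     */
    public static boolean tryBecomeLeader(
            String replica, Map<String, Long> terms, Runnable bestEffortSync) {
        if (!leaderCandidates(terms).contains(replica)) {
            return false; // not among the most up-to-date replicas
        }
        try {
            bestEffortSync.run();
        } catch (RuntimeException e) {
            // Assumption for this sketch: a failed sync only means peers may
            // have to recover later; it does not block the election.
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Long> terms = new LinkedHashMap<>();
        terms.put("replicaA", 3L);
        terms.put("replicaB", 5L);
        terms.put("replicaC", 5L);

        System.out.println(leaderCandidates(terms));
        System.out.println(tryBecomeLeader("replicaA", terms, () -> {}));
        System.out.println(tryBecomeLeader("replicaB", terms, () -> {
            throw new RuntimeException("simulated sync failure");
        }));
    }
}
```

In this sketch a replica with a lower term is rejected regardless of sync, while a highest-term replica becomes leader even when the simulated sync throws.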


