Posted to dev@lucene.apache.org by "Commit Tag Bot (JIRA)" <ji...@apache.org> on 2013/03/22 17:41:16 UTC
[jira] [Commented] (SOLR-3813) When a new leader syncs, we need to ask all shards to sync back, not just those that are active.

    [ https://issues.apache.org/jira/browse/SOLR-3813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13610847#comment-13610847 ] 

Commit Tag Bot commented on SOLR-3813:
--------------------------------------

[branch_4x commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1384937

SOLR-3833: When an election is started because a leader went down, the new leader candidate should decline if the last state it published was not active.

SOLR-3836: When doing peer sync, we should only count sync attempts that cannot reach the given host as success when the candidate leader is syncing with the replicas - not when replicas are syncing to the leader.

SOLR-3835: In our leader election algorithm, if on connection loss we found we did not create our election node, we should retry, not throw an exception.

SOLR-3834: A new leader on cluster startup should also run the leader sync process in case there was a bad cluster shutdown.

SOLR-3772: On cluster startup, we should wait until we see all registered replicas before running the leader process - or, if they do not all come up, until N amount of time has passed.
  
SOLR-3756: If we are elected the leader of a shard, but we fail to publish this for any reason, we should clean up and re-trigger a leader election.

SOLR-3812: ConnectionLoss during recovery can cause lost updates, leading to shard inconsistency.
  
SOLR-3813: When a new leader syncs, we need to ask all shards to sync back, not just those that are active.

SOLR-3807: Currently during recovery we pause for a number of seconds after waiting for the leader to see a recovering state so that any previous updates will have finished before our commit on the leader - we don't need this wait for peersync.
  
SOLR-3837: When a leader is elected and asks replicas to sync back to it and that fails, we should ask those nodes to recover asynchronously rather than synchronously.
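
Two of the rules above (SOLR-3833 and SOLR-3836) reduce to simple decisions, sketched below in Java. This is an illustration only, not the code from the commit: the class LeaderRules, its enums, and both method names are hypothetical and do not correspond to Solr's actual election or PeerSync classes.

```java
// Hypothetical sketch of two decision rules from the commit message.
// None of these names come from Solr; they are assumptions for illustration.
final class LeaderRules {

    // States a core may have published last (simplified, assumed set).
    enum PublishedState { ACTIVE, RECOVERING, DOWN }

    // Who is driving the peer sync attempt.
    enum SyncRole { CANDIDATE_LEADER, REPLICA }

    // SOLR-3833: a candidate should decline leadership unless the last
    // state it published was active.
    static boolean shouldDeclineLeadership(PublishedState lastPublished) {
        return lastPublished != PublishedState.ACTIVE;
    }

    // SOLR-3836: an unreachable host counts as a successful sync only when
    // the candidate leader is syncing with its replicas, never when a
    // replica is syncing to the leader.
    static boolean countsAsSuccess(SyncRole role, boolean hostReachable, boolean syncSucceeded) {
        if (!hostReachable) {
            return role == SyncRole.CANDIDATE_LEADER;
        }
        return syncSucceeded;
    }

    public static void main(String[] args) {
        System.out.println(shouldDeclineLeadership(PublishedState.RECOVERING));
        System.out.println(countsAsSuccess(SyncRole.CANDIDATE_LEADER, false, false));
        System.out.println(countsAsSuccess(SyncRole.REPLICA, false, false));
    }
}
```

Keeping the rules as pure functions of their inputs makes the asymmetry in SOLR-3836 easy to see: the same unreachable host yields a different outcome depending on which side initiated the sync.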

                
> When a new leader syncs, we need to ask all shards to sync back, not just those that are active.
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-3813
>                 URL: https://issues.apache.org/jira/browse/SOLR-3813
>             Project: Solr
>          Issue Type: Bug
>          Components: SolrCloud
>            Reporter: Mark Miller
>            Assignee: Mark Miller
>            Priority: Critical
>             Fix For: 4.0, 5.0
>
>
> Otherwise there is a race where a shard can complete recovery against the old leader and publish as active, while missing the sync stage with the leader - resulting in possible lost updates and shard inconsistency.
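
The difference between the old and the fixed behavior can be sketched as a choice of which replicas the new leader asks to sync back. This is a minimal illustration under stated assumptions, not the actual patch: the class SyncBackTargets, the Replica type, the State enum, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: which replicas does a new leader ask to sync back?
// All names here are assumptions for illustration, not Solr's API.
final class SyncBackTargets {

    enum State { ACTIVE, RECOVERING, DOWN }

    static final class Replica {
        final String name;
        final State state;

        Replica(String name, State state) {
            this.name = name;
            this.state = state;
        }
    }

    // Old behavior: only replicas currently published as active are asked.
    // A replica still recovering against the old leader is skipped and can
    // later publish as active without ever syncing with the new leader.
    static List<String> activeOnly(List<Replica> shard, String leaderName) {
        List<String> targets = new ArrayList<>();
        for (Replica r : shard) {
            if (!r.name.equals(leaderName) && r.state == State.ACTIVE) {
                targets.add(r.name);
            }
        }
        return targets;
    }

    // Fixed behavior: every other replica in the shard is asked,
    // regardless of its published state.
    static List<String> allReplicas(List<Replica> shard, String leaderName) {
        List<String> targets = new ArrayList<>();
        for (Replica r : shard) {
            if (!r.name.equals(leaderName)) {
                targets.add(r.name);
            }
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Replica> shard = List.of(
                new Replica("core1", State.ACTIVE),      // the new leader
                new Replica("core2", State.ACTIVE),
                new Replica("core3", State.RECOVERING)); // mid-recovery against the old leader
        System.out.println(activeOnly(shard, "core1"));  // prints [core2]
        System.out.println(allReplicas(shard, "core1")); // prints [core2, core3]
    }
}
```

In this sketch core3 is the replica exposed to the race: with the active-only selection it is never asked to sync, which matches the lost-update scenario described in the issue.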
