Posted to dev@lucene.apache.org by "Cao Manh Dat (JIRA)" <ji...@apache.org> on 2018/07/10 02:03:00 UTC

[jira] [Comment Edited] (SOLR-12412) Leader should give up leadership when IndexWriter.tragedy occur

    [ https://issues.apache.org/jira/browse/SOLR-12412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16537875#comment-16537875 ] 

Cao Manh Dat edited comment on SOLR-12412 at 7/10/18 2:02 AM:
--------------------------------------------------------------

Thanks [~steve_rowe], I will take a look at the failure.

[~tomasflobbe] I tried to do that, but it would be quite complex. The process would be as follows (not to mention the race conditions we could run into):
* The core publishes itself as DOWN
* The core cancels its election context
* The core deletes its index dir
* ... 

Given that a tragic exception is not a frequent event, using the Overseer will bring us some benefits:
* The update request that hit the exception does not get blocked (async)
* It is a much cleaner and well-tested approach
* We can easily improve the solution to make it more robust. For example, when deleting the replica fails because the node went down, the Overseer can remove the replica from the clusterstate (so even when the node comes back, the replica will be removed automatically); then the Overseer can add a new replica on another node.
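
As a rough illustration of the delete-then-add sequence described above, the sketch below builds the Collections API requests an operator could issue by hand: DELETEREPLICA followed by ADDREPLICA, each with an async request id so the caller is not blocked. This is not the code in the patch. The class name, base URL, collection, shard, replica, node, and request ids are all made up for the example.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, for illustration only; not part of Solr or the patch.
final class ReplicaReplacementSketch {

    // Builds a Collections API URL for the given action and parameters.
    static String collectionsUrl(String solrBase, String action, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        query.add("action=" + action);
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(encode(e.getKey()) + "=" + encode(e.getValue()));
        }
        return solrBase + "/admin/collections?" + query;
    }

    // URL-encodes a single query component (requires Java 10 or later).
    static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Step 1: remove the replica whose index hit the tragic exception.
    static String deleteReplicaUrl(String solrBase, String collection, String shard,
                                   String replica, String asyncId) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("collection", collection);
        p.put("shard", shard);
        p.put("replica", replica);
        p.put("async", asyncId); // run asynchronously; the caller does not wait
        return collectionsUrl(solrBase, "DELETEREPLICA", p);
    }

    // Step 2: add a fresh replica of the same shard on another node.
    static String addReplicaUrl(String solrBase, String collection, String shard,
                                String node, String asyncId) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("collection", collection);
        p.put("shard", shard);
        p.put("node", node);
        p.put("async", asyncId);
        return collectionsUrl(solrBase, "ADDREPLICA", p);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed local node
        System.out.println(deleteReplicaUrl(base, "collection1", "shard1", "core_node3", "req-1"));
        System.out.println(addReplicaUrl(base, "collection1", "shard1", "host2:8983_solr", "req-2"));
    }
}
```

The two requests are kept separate on purpose: if the delete fails because the node is down, the second step can still be retried once the clusterstate has been cleaned up.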



> Leader should give up leadership when IndexWriter.tragedy occur
> ---------------------------------------------------------------
>
>                 Key: SOLR-12412
>                 URL: https://issues.apache.org/jira/browse/SOLR-12412
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Cao Manh Dat
>            Assignee: Cao Manh Dat
>            Priority: Major
>         Attachments: SOLR-12412.patch, SOLR-12412.patch
>
>
> When a leader meets some kind of unrecoverable exception (e.g. CorruptedIndexException), the shard goes into the readable state and a human has to intervene. In that case, it would be best if the leader gave up its leadership and let another replica become the leader.


