Posted to dev@lucene.apache.org by "Ishan Chattopadhyaya (JIRA)" <ji...@apache.org> on 2015/11/04 12:50:27 UTC

[jira] [Comment Edited] (SOLR-7569) Create an API to force a leader election between nodes

    [ https://issues.apache.org/jira/browse/SOLR-7569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14989410#comment-14989410 ] 

Ishan Chattopadhyaya edited comment on SOLR-7569 at 11/4/15 11:49 AM:
----------------------------------------------------------------------

Thanks for your review, [~noble.paul].

bq. Let's not keep the core admin command as OVERRIDELASTPUBLISHED. This means it can be a generic enough API which may be abused by others for other things. Let's not tell others what we are doing internally and keep the command name opaque

This patch uses FORCEPREPAREFORLEADERSHIP from SOLR-8233. Does this sound fine?

bq. This particular collection admin operation does not really have to go to overseer, it can be performed by the receiving node itself because the clearing of LIR node does not have to be done at overseer anyway

I wanted to keep it at the Overseer because most of the cluster management code lives there. I can move this to CollectionsHandler instead of OCMH.


> Create an API to force a leader election between nodes
> ------------------------------------------------------
>
>                 Key: SOLR-7569
>                 URL: https://issues.apache.org/jira/browse/SOLR-7569
>             Project: Solr
>          Issue Type: New Feature
>          Components: SolrCloud
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>              Labels: difficulty-medium, impact-high
>         Attachments: SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569.patch, SOLR-7569_lir_down_state_test.patch
>
>
> There are many reasons why Solr will not elect a leader for a shard, e.g. all replicas' last published state was recovery, or bugs that cause a leader to be marked as 'down'. While the best solution is that a shard never gets into this state, we need a manual way to fix it when it does. Right now we can go through a dance of bouncing the node (since the recovery paths for bouncing and REQUESTRECOVERY are different), but that is difficult when running a large cluster. Such a manual API may lead to some data loss, but in some cases it is the only option for restoring availability.
> This issue proposes to build a new collection API which can be used to force replicas into recovering a leader while avoiding data loss on a best effort basis.


