Posted to dev@lucene.apache.org by "Jessica Cheng Mallet (JIRA)" <ji...@apache.org> on 2015/09/18 06:17:04 UTC

[jira] [Comment Edited] (SOLR-8069) Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.

    [ https://issues.apache.org/jira/browse/SOLR-8069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14804934#comment-14804934 ] 

Jessica Cheng Mallet edited comment on SOLR-8069 at 9/18/15 4:17 AM:
---------------------------------------------------------------------

The scenario I have in mind is one where leadership switches back and forth because nodes go into GC after receiving retries of an expensive query. Suppose a node is the leader at time T1 and decides to put another node into LiR, but goes into GC before it does so and loses leadership. The other node then briefly gains leadership at T2 and maybe processes an update or two, but then also goes into GC and loses its leadership. The first node then wakes up from GC, becomes the leader once more at T3, and only then does this code execute. My question is whether it's absolutely safe for this node to put the other node into LiR simply because it's the leader now, even though it made that decision when it was the leader at T1.
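
To make the timeline concrete, here is a minimal sketch of the T1/T2/T3 sequence. It is not Solr's code: the Shard class, the term counter, and the node names nodeA and nodeB are all hypothetical, introduced only to show that an "am I the leader now?" check cannot tell T3 apart from T1, whereas a check that also compares the leadership term captured at decision time can.

```java
/** Hypothetical model of the T1/T2/T3 timeline; these are not Solr's actual classes. */
public class LirTermCheckSketch {

    /** Minimal stand-in for a shard's leadership state (assumed, for illustration only). */
    static final class Shard {
        private String leader;
        private long term = 0; // incremented on every leadership change

        synchronized void elect(String node) {
            leader = node;
            term++;
        }

        /** The check as described in the comment: is this node the leader right now? */
        synchronized boolean isLeader(String node) {
            return node.equals(leader);
        }

        /** A stricter check: is this node the leader in the same term in which it decided? */
        synchronized boolean isLeaderInTerm(String node, long decisionTerm) {
            return node.equals(leader) && term == decisionTerm;
        }

        synchronized long term() {
            return term;
        }
    }

    public static void main(String[] args) {
        Shard shard = new Shard();

        shard.elect("nodeA");             // T1: nodeA is the leader
        long decisionTerm = shard.term(); // nodeA decides to put nodeB into LiR, then stalls in GC

        shard.elect("nodeB");             // T2: nodeB leads briefly and may accept updates
        shard.elect("nodeA");             // T3: nodeA wakes up and is the leader again

        // The delayed LiR request now runs. A leader-now check passes,
        // although the decision was made in an earlier term.
        System.out.println("leader-now check passes: " + shard.isLeader("nodeA"));
        System.out.println("same-term check passes: " + shard.isLeaderInTerm("nodeA", decisionTerm));
    }
}
```

In this model the first check passes and the second does not, which is exactly the gap the question is about. Whether the stale decision is actually harmful in Solr depends on what the other node did at T2.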


was (Author: mewmewball):
The scenario I have in mind is one where leadership switches back and forth because nodes go into GC after receiving retries of an expensive query. Suppose a node is the leader at time T1 and decides to put another node into LiR, but goes into GC before it does so and loses leadership. The other node then briefly gains leadership at T2, but then also goes into GC and loses its leadership. The first node then wakes up from GC, becomes the leader once more at T3, and only then does this code execute. My question is whether it's absolutely safe for this node to put the other node into LiR simply because it's the leader now, even though it made that decision when it was the leader at T1.

> Leader Initiated Recovery can put the replica with the latest data into LIR and a shard will have no leader even on restart.
> ----------------------------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-8069
>                 URL: https://issues.apache.org/jira/browse/SOLR-8069
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Mark Miller
>         Attachments: SOLR-8069.patch, SOLR-8069.patch
>
>
> I've seen this twice now. Need to work on a test.
> When some issues hit all the replicas at once, you can end up in a situation where the rightful leader was put or put itself into LIR. Even on restart, this rightful leader won't take leadership and you have to manually clear the LIR nodes.
> It seems that if all the replicas participate in election on startup, LIR should just be cleared.


