Posted to dev@lucene.apache.org by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org> on 2016/08/24 21:04:20 UTC

[jira] [Commented] (SOLR-9438) Shard split can lose data

    [ https://issues.apache.org/jira/browse/SOLR-9438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15435682#comment-15435682 ] 

Shalin Shekhar Mangar commented on SOLR-9438:
---------------------------------------------

A simple fix is for the overseer to check live node information before setting the parent shard to ‘inactive’. This works because, by the time the leader vote wait period expires, the ephemeral nodes of the killed former leader should have expired. The overseer can therefore see that the former leader is no longer live.
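The guard above can be sketched as a small Python model. This is a sketch under stated assumptions and not Solr's `Overseer` code. The names `SplitState` and `maybe_complete_split`, and the representation of live nodes as a set of node names, are hypothetical. Leaving the slice states unchanged when the check fails is also an assumption, because the comment does not say what the overseer should do instead.

```python
from dataclasses import dataclass


@dataclass
class SplitState:
    # Hypothetical model of the slice states touched by a shard split.
    parent_leader_node: str          # node hosting the parent shard leader
    parent_state: str = "active"
    sub_shard_state: str = "recovery"


def maybe_complete_split(state: SplitState, live_nodes: set[str],
                         all_sub_replicas_active: bool) -> bool:
    """Flip parent -> inactive and sub-shard -> active only when it looks safe.

    Returns True if the slice states were switched.
    """
    if not all_sub_replicas_active:
        return False
    # The proposed guard: if the parent leader's live node is gone (its
    # ephemeral node expired after a crash), the sub-shard replica may have
    # become leader without ever receiving the parent's documents.
    if state.parent_leader_node not in live_nodes:
        return False  # assumption: keep the current states
    state.parent_state = "inactive"
    state.sub_shard_state = "active"
    return True
```

For example, with the hypothetical node names `"n1"` and `"n2"`, `maybe_complete_split(SplitState("n1"), {"n2"}, True)` returns `False` because the parent leader's node `"n1"` is not live, so neither slice changes state.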

But it gets trickier if the former leader comes back online and recovers from the new (incomplete) replica. Once it publishes itself as active, the overseer again sees every sub-shard replica as active, and the live node check above now passes. The overseer would therefore mark the sub-shard as active anyway. To prevent this, the overseer must ensure that the live node of the sub-shard leader still exists, with the same sequence number assigned at the time of the split, before changing the sub-slice state to active.
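The sequence-number check can be sketched the same way. This is a hypothetical model. The `LiveNode` record, the `safe_to_activate_sub_slice` function, and the mapping from live node name to an integer `seq` are illustrative only. The sketch does not claim how Solr or ZooKeeper actually expose that number. It only shows why comparing against the value recorded at split time catches a restart that a plain "is the node live?" check would miss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LiveNode:
    # Hypothetical: a live node name plus the identifier it had when the
    # split started (the comment calls this a sequence number). A node that
    # restarts re-registers and gets a new value.
    name: str
    seq: int


def safe_to_activate_sub_slice(leader_at_split: LiveNode,
                               live_nodes: dict[str, int]) -> bool:
    """Return True only if the sub-shard leader's node never went away.

    live_nodes maps each currently live node name to its current seq.
    """
    current_seq = live_nodes.get(leader_at_split.name)
    if current_seq is None:
        return False  # the node is down
    # Same name but a different seq means the node restarted after the
    # split, so it may have recovered from the incomplete replica.
    return current_seq == leader_at_split.seq
```

With `recorded = LiveNode("n1", seq=7)`, the check passes for `{"n1": 7}`. It fails for `{"n1": 12}`, where the node restarted, and for `{}`, where the node is down.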

> Shard split can lose data
> -------------------------
>
>                 Key: SOLR-9438
>                 URL: https://issues.apache.org/jira/browse/SOLR-9438
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: 4.10.4, 5.5.2, 6.1
>            Reporter: Shalin Shekhar Mangar
>            Assignee: Shalin Shekhar Mangar
>              Labels: difficulty-medium, impact-high
>             Fix For: master (7.0), 6.3
>
>
> Solr’s shard split can lose documents if the parent/sub-shard leader is killed (or crashes) after the new sub-shard replica is created but before that replica recovers. In that case the slice has already been set to the ‘recovery’ state. The sub-shard replica comes up, finds that no other replica is up, waits out the leader vote wait period, and then becomes the leader and publishes itself as active. Once that happens, the overseer sees that all replicas of the sub-shard are ‘active’, sets the parent slice to ‘inactive’, and sets the new sub-shard to ‘active’.
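The failure sequence in the description can be replayed as a toy Python model. This is not Solr code. The `naive_complete_split` function, the document IDs, and the single sub-shard are simplifications introduced here for illustration. The model shows that a rule keyed only on "all sub-shard replicas are active" flips the slice states even when the only active replica never received the parent's documents.

```python
def naive_complete_split(slice_states: dict[str, str],
                         sub_replica_states: list[str]) -> None:
    # Hypothetical model of the rule in the issue: once every sub-shard
    # replica reports 'active', retire the parent and activate the sub-shard.
    if all(s == "active" for s in sub_replica_states):
        slice_states["parent"] = "inactive"
        slice_states["sub"] = "active"


# Documents held by the parent shard before the split (illustrative IDs).
parent_docs = {"doc1", "doc2", "doc3"}
# The sub-shard leader crashes before the new replica recovers from it,
# so the replica's index is still empty.
replica_docs: set[str] = set()
slice_states = {"parent": "active", "sub": "recovery"}

# After the leader vote wait period the replica becomes leader and
# publishes itself as active; it is the only live sub-shard replica.
naive_complete_split(slice_states, ["active"])

# The active sub-shard is now missing every document the parent had.
missing = parent_docs - replica_docs
```

After this runs, `slice_states` is `{"parent": "inactive", "sub": "active"}` and `missing` contains all three documents.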



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)