Posted to dev@lucene.apache.org by "Yonik Seeley (JIRA)" <ji...@apache.org> on 2015/12/10 04:14:10 UTC

[jira] [Commented] (SOLR-8372) Canceled recovery can lead to data loss

    [ https://issues.apache.org/jira/browse/SOLR-8372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15049944#comment-15049944 ] 

Yonik Seeley commented on SOLR-8372:
------------------------------------

I've been thinking about possible ways to deal with this:
- stop updates at a higher level... if the distributed update processor knows it's not in the right state to accept updates, then reject them
  -- this has problems with race conditions unless the check/reject is with the bucket lock held
- keep buffering updates when recovery is canceled.  When another call to bufferUpdates() is made, reset the starting position so we know where replay needs to start from.
- introduce a new state into UpdateLog (the current states are REPLAYING, BUFFERING, APPLYING_BUFFERED, ACTIVE)
  -- this new state would do what?  Silently drop updates it receives?  Throw an exception?  The latter would seem to complicate things further if it could possibly cause another node to put us into LIR again.
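
To illustrate the race mentioned under the first option, here is a minimal sketch in which the "are we accepting updates?" check and the apply step happen under the same lock. The class GuardedBucket and the acceptingUpdates flag are hypothetical stand-ins; this is not Solr's distributed update processor or its version bucket code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy class; names are assumptions made for illustration only.
class GuardedBucket {
    private boolean acceptingUpdates = true;               // assumed per-replica flag
    private final List<String> applied = new ArrayList<>(); // stand-in for applied updates

    // The check and the apply share one lock, so a state change cannot
    // slip in between "we are allowed to accept" and "we applied it".
    synchronized boolean add(String update) {
        if (!acceptingUpdates) {
            return false;                                   // rejected
        }
        applied.add(update);
        return true;
    }

    // Flipping the flag takes the same lock as add().
    synchronized void setAcceptingUpdates(boolean accepting) {
        acceptingUpdates = accepting;
    }

    synchronized int appliedCount() {
        return applied.size();
    }
}
```

If the check were done outside the lock, the flag could flip after the check but before the apply, and the update would still be accepted in the wrong state.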

In really hairy scenarios, one might think that keeping updates might be useful rather than dropping them.  So perhaps the "keep buffering" option may be simplest as it also avoids introducing another state?  It should normally only be a few more updates coming in that were in the pipeline when something happened to our recovery attempt anyway (like the leader dying)?
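
A rough sketch of the "keep buffering" idea, using a toy model rather than the real UpdateLog. ToyUpdateLog, recoveryCanceled(), the in-memory list standing in for the transaction log, and the replayStart index are all assumptions for illustration. Only the state names and bufferUpdates() come from the discussion above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only; it does not reflect how Solr's UpdateLog is implemented.
class ToyUpdateLog {
    enum State { ACTIVE, BUFFERING }

    private State state = State.ACTIVE;
    private final List<String> tlog = new ArrayList<>(); // stand-in for the transaction log
    private int replayStart = 0;                         // first entry a replay would start from

    // Start (or restart) buffering and reset where replay needs to begin.
    synchronized void bufferUpdates() {
        state = State.BUFFERING;
        replayStart = tlog.size();
    }

    // Hypothetical hook for a canceled recovery: deliberately stays in
    // BUFFERING instead of dropping buffered updates and going ACTIVE.
    synchronized void recoveryCanceled() {
        // no state change
    }

    // Logs the update; returns true only if it was applied as a normal
    // (ACTIVE) update rather than merely buffered.
    synchronized boolean add(String update) {
        tlog.add(update);
        return state == State.ACTIVE;
    }

    // Returns the updates buffered since the most recent bufferUpdates()
    // call and goes back to ACTIVE.
    synchronized List<String> applyBufferedUpdates() {
        List<String> replay = new ArrayList<>(tlog.subList(replayStart, tlog.size()));
        state = State.ACTIVE;
        return replay;
    }

    synchronized State state() {
        return state;
    }
}
```

In this model, an update that arrives between a canceled recovery and the next bufferUpdates() call is still only buffered, and the later call moves the replay start past it, so it is never treated as a normally applied update.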


> Canceled recovery can lead to data loss
> ---------------------------------------
>
>                 Key: SOLR-8372
>                 URL: https://issues.apache.org/jira/browse/SOLR-8372
>             Project: Solr
>          Issue Type: Bug
>            Reporter: Yonik Seeley
>
> A recovery via index replication tells the update log to start buffering updates.  If that recovery is canceled for whatever reason by the replica, the RecoveryStrategy calls ulog.dropBufferedUpdates() which stops buffering and places the UpdateLog back in active mode.  If updates come from the leader after this point (and before RecoveryStrategy retries recovery), the update will be processed as normal and added to the transaction log. If the server is bounced, those last updates to the transaction log look normal (no FLAG_GAP) and can be used to determine who is more up to date.



