Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2019/11/04 18:52:00 UTC

[jira] [Updated] (SOLR-13872) Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)

     [ https://issues.apache.org/jira/browse/SOLR-13872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris M. Hostetter updated SOLR-13872:
--------------------------------------
    Attachment: SOLR-13872.patch
      Assignee: Chris M. Hostetter
        Status: Open  (was: Open)


Ok, I think I have enough of a handle on what's going on, and what *needs* to be going on, to move forward...

First off -- I'm attaching a patch with some new/additional test logic:
* TestStressThreadBackup in particular is a starting point for a robust test similar to the manual steps to reproduce that I posted above.
** It tests using both the ReplicationHandler and the CoreAdmin API for doing core backups.
** It fails easily for me using the ReplicationHandler, but I've never actually seen it fail using the CoreAdmin API (which jibes with my previous comment about the window of time for the race condition being shorter in that code path)
* The changes to TestCoreBackup and TestReplicationHandler are largely just to prove to myself that most of the complexity in the IndexDeletionPolicyWrapper around allowing callers to pass in an arbitrary commit (instead of the "latest" commit that IndexDeletionPolicyWrapper knows about) is really not needed (AFAICT ... I may have missed a use case).
** So most of the "If no latest commit in IndexDeletionPolicyWrapper, then use & reserve latest commit from searcher" logic is not needed
** In fact, because of how the "NRT" readers in use by the SolrIndexSearcher work, it's a really bad idea to do this
*** See additions to TestIndexWriterReader and TestCoreBackup.testDemoWhyBackupCodeShouldNeverUseIndexCommitFromSearcher

So with all this in mind, I'm going to move forward with the basic API changes I proposed before, and -- I think -- make the delete() method in the Delegate IndexCommit wrappers synchronize on the outer IndexDeletionPolicyWrapper to address the main synchronization concerns I had before.  From what I can see so far, that change, combined with a new (synchronized) method to atomically "getAndReserveLatestCommit()", should fix the API flaws (when used properly in the caller code, which I'll also work on).
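
The locking idea in the paragraph above can be sketched roughly as follows. This is a minimal illustration and not the code in the attached patch: ReservingPolicySketch is a hypothetical stand-in for IndexDeletionPolicyWrapper, commits are modeled as plain generation numbers rather than Lucene IndexCommit objects, and every method name other than getAndReserveLatestCommit() and delete() is an assumption made for the example.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical stand-in for IndexDeletionPolicyWrapper; commits are modeled as generation numbers. */
class ReservingPolicySketch {
    private long latestGeneration = -1;                       // -1 means no commit has been seen yet
    private final Map<Long, Integer> reservations = new HashMap<>();
    private final Set<Long> deleted = new HashSet<>();

    /** Records a new commit as the latest one (assumed hook, called when a commit happens). */
    synchronized void onCommit(long generation) {
        latestGeneration = generation;
    }

    /** Looks up and reserves the latest commit in one step, so no delete can run in between. */
    synchronized long getAndReserveLatestCommit() {
        if (latestGeneration < 0) {
            throw new IllegalStateException("no commit available");
        }
        reservations.merge(latestGeneration, 1, Integer::sum);
        return latestGeneration;
    }

    /** Drops one reservation previously taken on the given commit. */
    synchronized void release(long generation) {
        Integer count = reservations.get(generation);
        if (count == null) {
            return;
        }
        if (count > 1) {
            reservations.put(generation, count - 1);
        } else {
            reservations.remove(generation);
        }
    }

    /** Delete request; holds the same lock as the reserve call and is refused while reserved. */
    synchronized boolean delete(long generation) {
        if (reservations.containsKey(generation)) {
            return false;
        }
        deleted.add(generation);
        return true;
    }

    synchronized boolean isDeleted(long generation) {
        return deleted.contains(generation);
    }
}
```

Because delete() and getAndReserveLatestCommit() share one monitor in this sketch, a backup caller either reserves the commit before a delete is attempted, or never sees that commit as the latest at all.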

I'm also going to try to remove some of the duplicate code paths in SnapShooter -- there's no reason why createSnapshot and createSnapAsync should look so similar and still be so different -- the async code path should just call the same methods as the sync code path, but in a thread.
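
The shape of that refactoring might look something like the sketch below. It only illustrates the control flow: SnapshotSketch is a hypothetical class, not the real SnapShooter, and the method signatures, the String return value, and the callback parameter are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

/** Hypothetical stand-in for SnapShooter; only the sync/async relationship is illustrated. */
class SnapshotSketch {
    /** Synchronous path: all of the actual work lives here. */
    String createSnapshot(String commitName) {
        // A real implementation would copy the reserved commit's files to the backup location.
        return "snapshot." + commitName;
    }

    /** Async path: runs the same synchronous method on another thread, then reports the result. */
    CompletableFuture<Void> createSnapAsync(String commitName, Consumer<String> onDone) {
        return CompletableFuture
                .supplyAsync(() -> createSnapshot(commitName))
                .thenAccept(onDone);
    }
}
```

With this structure, any fix to how the commit is reserved or copied only has to be made once, in the synchronous method.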


>  Backup can fail to read index files w/NoSuchFileException during merges (SOLR-11616 regression)
> ------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13872
>                 URL: https://issues.apache.org/jira/browse/SOLR-13872
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Chris M. Hostetter
>            Assignee: Chris M. Hostetter
>            Priority: Major
>         Attachments: SOLR-13872.patch, index_churn.pl
>
>
> SOLR-11616 purports to fix a bug in Solr's backup functionality that causes 'NoSuchFileException' errors when attempting to back up an index while it is undergoing indexing (and segment merging).
> Although SOLR-11616 is marked with "Fix Version: 7.2" it's pretty easy to demonstrate that this bug still exists on master, branch_8x, and even in 7.2 - so it seems less like the current problem is a "regression" and more that the original fix didn't work.
> ----
> The crux of the problem seems to be concurrency bugs in if/how a commit is "reserved" before attempting to copy the files in that commit to the backup location.  
> A possible workaround, discussed in more depth in the comments below, is to update {{solrconfig.xml}} to explicitly configure the {{SolrDeletionPolicy}} with either the {{maxCommitsToKeep}} or {{maxCommitAge}} options to ensure the commits are kept around long enough for the backup to be created.



