Posted to issues@lucene.apache.org by "Chris M. Hostetter (Jira)" <ji...@apache.org> on 2019/11/08 18:17:00 UTC

[jira] [Created] (SOLR-13908) Possible bugs when using HdfsDirectoryFactory w/ softCommit=true + openSearcher=true

Chris M. Hostetter created SOLR-13908:
-----------------------------------------

             Summary: Possible bugs when using HdfsDirectoryFactory w/ softCommit=true + openSearcher=true
                 Key: SOLR-13908
                 URL: https://issues.apache.org/jira/browse/SOLR-13908
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
          Components: hdfs
            Reporter: Chris M. Hostetter


While working on SOLR-13872, I noticed something that seems fishy...

*Background:*

SOLR-4916 introduced the API {{DirectoryFactory.searchersReserveCommitPoints()}}, a method that {{SolrIndexSearcher}} uses to decide whether it needs to explicitly save/release the {{IndexCommit}} point of its {{DirectoryReader}} with the {{IndexDeletionPolicyWrapper}}. This is intended for filesystems that don't in some way "protect" open files...

{code:title=SolrIndexSearcher}
    if (directoryFactory.searchersReserveCommitPoints()) {
      // reserve commit point for life of searcher
      core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());
    }
{code}

{code:title=DirectoryFactory}
  /**
   * If your implementation can count on delete-on-last-close semantics
   * or throws an exception when trying to remove a file in use, return
   * false (eg NFS). Otherwise, return true. Defaults to returning false.
   * 
   * @return true if factory impl requires that Searcher's explicitly
   * reserve commit points.
   */
  public boolean searchersReserveCommitPoints() {
    return false;
  }
{code}

{{HdfsDirectoryFactory}} is (still) the only {{DirectoryFactory}} Impl that returns {{true}}.
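A minimal, self-contained sketch of the "reserve for the life of the searcher" pattern described above might look like the following. All names here ({{FactoryModel}}, {{CommitReservations}}, {{SearcherModel}}, {{KeepLatestPolicyModel}}) are hypothetical stand-ins and not Solr's actual classes. The sketch only models the idea: a searcher saves its commit generation on open and releases it on close, using a ref count, and a "keep only the newest commit" deletion policy skips any generation that is still reserved.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo; none of these types are Solr classes.
class ReserveDemo {
  public static void main(String[] args) {
    FactoryModel hdfsLike = () -> true; // behaves like HdfsDirectoryFactory in the text
    CommitReservations reservations = new CommitReservations();
    List<Long> commits = List.of(4L, 5L, 6L);
    try (SearcherModel searcher = new SearcherModel(hdfsLike, reservations, 5L)) {
      // While the searcher is open, generation 5 survives even though 6 is newer.
      System.out.println(KeepLatestPolicyModel.deletable(commits, reservations)); // prints [4]
    }
    // Once the searcher is closed, generation 5 is no longer protected.
    System.out.println(KeepLatestPolicyModel.deletable(commits, reservations)); // prints [4, 5]
  }
}

// Hypothetical stand-in for the DirectoryFactory hook.
interface FactoryModel {
  boolean searchersReserveCommitPoints();
}

// Hypothetical ref-counted registry of reserved commit generations.
class CommitReservations {
  private final Map<Long, Integer> counts = new HashMap<>();

  synchronized void save(long generation) {
    counts.merge(generation, 1, Integer::sum);
  }

  synchronized void release(long generation) {
    // Decrement the count; returning null removes the entry once it reaches zero.
    counts.computeIfPresent(generation, (g, c) -> (c > 1) ? Integer.valueOf(c - 1) : null);
  }

  synchronized boolean isReserved(long generation) {
    return counts.containsKey(generation);
  }
}

// Hypothetical searcher that reserves its commit point for its whole lifetime.
class SearcherModel implements AutoCloseable {
  private final FactoryModel factory;
  private final CommitReservations reservations;
  private final long commitGeneration;

  SearcherModel(FactoryModel factory, CommitReservations reservations, long commitGeneration) {
    this.factory = factory;
    this.reservations = reservations;
    this.commitGeneration = commitGeneration;
    if (factory.searchersReserveCommitPoints()) {
      reservations.save(commitGeneration); // reserve on open
    }
  }

  @Override
  public void close() {
    if (factory.searchersReserveCommitPoints()) {
      reservations.release(commitGeneration); // release on close
    }
  }
}

// Hypothetical "keep only the newest commit" policy that skips reserved generations.
class KeepLatestPolicyModel {
  static List<Long> deletable(List<Long> commitGenerations, CommitReservations reservations) {
    long newest = commitGenerations.stream().mapToLong(Long::longValue).max().orElse(-1L);
    List<Long> result = new ArrayList<>();
    for (long gen : commitGenerations) {
      if (gen != newest && !reservations.isReserved(gen)) {
        result.add(gen);
      }
    }
    return result;
  }
}
```

The ref count matters because several searchers can be open on the same commit at once; the generation stays protected until the last of them is closed.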

----

*Concern:*

As noted in LUCENE-9040, the behavior of {{DirectoryReader.getIndexCommit()}} is a little weird / underspecified when dealing with an "NRT" {{IndexReader}} (opened directly off of an {{IndexWriter}} using "un-committed" changes), which is exactly what {{SolrIndexSearcher}} uses in Solr setups that use {{softCommit=true&openSearcher=true}}.

In particular, the {{IndexCommit.getGeneration()}} value used when {{SolrIndexSearcher}} executes {{core.getDeletionPolicy().saveCommitPoint(reader.getIndexCommit().getGeneration());}} will be (as of the current code) the {{generation}} of the last _hard_ commit. This means that segment/data files written since the last hard commit will not be protected from deletion if additional commits/merges happen on the index during the life of the {{SolrIndexSearcher}}, either via concurrent rapid commits or via {{commit=true&softCommit=false&openSearcher=false}}.
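To make the concern concrete, here is a deliberately simplified model in plain Java (no Lucene/Solr dependencies). It assumes, as described above, that the NRT searcher reserves only the generation of the last hard commit, and that cleanup deletes every file not referenced by the newest commit or a reserved commit. It ignores any additional file reference-counting Lucene itself may perform, so it illustrates the suspected risk rather than demonstrating that the bug actually occurs. The file names and the {{filesDeletedAfterCleanup}} helper are hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class NrtReservationGapDemo {
  // Hypothetical cleanup: keep files of the newest commit and of reserved commits,
  // delete everything else on disk. Simplified model, not Lucene's IndexFileDeleter.
  static Set<String> filesDeletedAfterCleanup(
      Map<Long, Set<String>> commits, Set<Long> reservedGenerations, Set<String> allFilesOnDisk) {
    long newest = commits.keySet().stream().mapToLong(Long::longValue).max().orElse(-1L);
    Set<String> protectedFiles = new HashSet<>();
    for (Map.Entry<Long, Set<String>> e : commits.entrySet()) {
      if (e.getKey() == newest || reservedGenerations.contains(e.getKey())) {
        protectedFiles.addAll(e.getValue());
      }
    }
    Set<String> deleted = new HashSet<>(allFilesOnDisk);
    deleted.removeAll(protectedFiles);
    return deleted;
  }

  public static void main(String[] args) {
    Map<Long, Set<String>> commits = new TreeMap<>();
    commits.put(5L, Set.of("segments_5", "_0.cfs")); // last hard commit

    // Soft commit: the NRT searcher sees _0 plus newly flushed _1, but
    // its getIndexCommit().getGeneration() still reports 5, so only 5 is reserved.
    Set<String> searcherNeeds = Set.of("_0.cfs", "_1.cfs");
    long reservedGen = 5L;

    // Later hard commit (openSearcher=false) after _0 and _1 were merged into _2.
    commits.put(6L, Set.of("segments_6", "_2.cfs"));
    Set<String> onDisk = Set.of("segments_5", "_0.cfs", "_1.cfs", "segments_6", "_2.cfs");

    Set<String> deleted = filesDeletedAfterCleanup(commits, Set.of(reservedGen), onDisk);
    Set<String> lost = new HashSet<>(searcherNeeds);
    lost.retainAll(deleted);
    // prints: Files still needed by the open searcher but deleted: [_1.cfs]
    System.out.println("Files still needed by the open searcher but deleted: " + lost);
  }
}
```

In this model {{_1.cfs}} is visible to the open searcher but belongs to no protected commit, so a cleanup that only honours reserved commit generations would remove it out from under the searcher.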

I have not investigated this in depth, but I believe there is a risk here of unpredictable bugs when using HDFS in conjunction with {{softCommit=true&openSearcher=true}}.





