You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2014/03/20 17:00:44 UTC

[jira] [Created] (LUCENE-5541) FileExistsCachingDirectory, to work around unreliable File.exists

Michael McCandless created LUCENE-5541:
------------------------------------------

             Summary: FileExistsCachingDirectory, to work around unreliable File.exists
                 Key: LUCENE-5541
                 URL: https://issues.apache.org/jira/browse/LUCENE-5541
             Project: Lucene - Core
          Issue Type: Bug
          Components: core/store
            Reporter: Michael McCandless


File.exists is a dangerous method in Java, because if there is a
low-level IOException (permission denied, out of file handles, etc.)
the method can return false when it should return true.

Fortunately, as of Lucene 4.x, we rely much less on File.exists,
because we track which files the codec components created, and we know
those files then exist.

But, unfortunately, going from 3.0.x to 3.6.x, we increased our
reliance on File.exists, e.g. when creating CFS we check File.exists
on each sub-file before trying to add it, and I have a customer
corruption case where apparently a transient low level IOE caused
File.exists to incorrectly return false for one of the sub-files.  It
results in corruption like this:

{noformat}
  java.io.FileNotFoundException: No sub-file with id .fnm found (fileName=_1u7.cfs files: [.tis, .tii, .frq, .prx, .fdt, .nrm, .fdx])
      org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:157)
      org.apache.lucene.index.CompoundFileReader.openInput(CompoundFileReader.java:146)
      org.apache.lucene.index.FieldInfos.<init>(FieldInfos.java:71)
      org.apache.lucene.index.IndexWriter.getFieldInfos(IndexWriter.java:1212)
      org.apache.lucene.index.IndexWriter.getCurrentFieldInfos(IndexWriter.java:1228)
      org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:1161)
{noformat}

I think typically local file systems don't often hit such low level
errors, but if you have an index on a remote filesystem, where network
hiccups can cause problems, it's more likely.

As a simple workaround, I created a basic Directory delegator that
holds a Set of all created but not deleted files, and short-circuits
fileExists to return true if the file is in that set.

I don't plan to commit this: we aren't doing bug-fix releases on
3.6.x anymore (it's very old by now), and this problem is already
"fixed" in 4.x (by reducing our reliance on File.exists), but I wanted
to post the code here in case others hit it.  It looks like it was hit
e.g. https://netbeans.org/bugzilla/show_bug.cgi?id=189571 and
https://issues.jboss.org/browse/ISPN-2981 




--
This message was sent by Atlassian JIRA
(v6.2#6252)

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org