Posted to java-user@lucene.apache.org by Vitaly Funstein <vf...@gmail.com> on 2014/01/30 09:03:55 UTC

NRT index readers and new commits

Suppose I have an IndexReader instance obtained with this API:

DirectoryReader.open(IndexWriter, boolean);

(I actually use a ReaderManager in front of it, but that's beside the
point).

There is no manual commit happening prior to this call. Now, I would like
to keep this reader around until no longer needed, i.e. until the app is
done with the data this reader will see. Since the index is live, there may
be new data added after the reader is returned, of course - followed by one
or more commits at arbitrary points in time.

The question is - what will happen to the flushed segment files this reader
is backed by, when there is a later commit on the writer that is tied to
the NRT reader? If I'm reading the code correctly, IndexFileDeleter will
detect reference counts to those old segment files reaching 0 and try to
delete them. This is because IndexWriter.getReader(boolean) doesn't
checkpoint with the IndexFileDeleter associated with it, which would
increase ref counts on the managed files. Also, actually closing this
reader appears to provide no feedback to the backing writer; it only
closes the data stream(s) but doesn't seem to release used segment files to
the deleter...

Does all this just rely on the fact that most OSs allow deletion of a file
that is still open (Windows being a notable exception)? It seems the whole
IndexWriter.deletePendingFiles() API exists to work around the situation
when it's not allowed... And is it valid to assume it's a safe thing to do,
even when the OS supports it?

Re: NRT index readers and new commits

Posted by Michael McCandless <lu...@mikemccandless.com>.
Lucene absolutely relies on this behavior, which most filesystems support.

I.e., if you have an open file, and someone else deletes the file
behind it, your open file will continue to work until you close it,
and then it's "really" deleted.  ("delete on last close")

Unix achieves this by allowing the deletion of the directory entry
(but the file bytes / inode still remain allocated on disk).  Windows
achieves it by refusing to delete still-open files.
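
A minimal sketch of that behavior, using only the JDK (no Lucene calls).
The class and method names are hypothetical, chosen for illustration. It
opens a file, attempts to delete it while the stream is still open, and then
reads from the open stream. Whether the delete succeeds or is refused depends
on the platform and filesystem, so the code tolerates both outcomes:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class DeleteOnLastClose {

    /**
     * Writes content to a temp file, opens it, tries to delete it while it
     * is still open, and returns whatever the open stream can still read.
     */
    static String readAfterDelete(String content) throws IOException {
        Path file = Files.createTempFile("segment", ".tmp");
        Files.write(file, content.getBytes(StandardCharsets.UTF_8));
        try (InputStream in = Files.newInputStream(file)) {
            try {
                // May succeed (directory entry removed, data still readable
                // through the open handle) or be refused by the platform.
                Files.delete(file);
            } catch (IOException refused) {
                // Delete was refused; the file simply stays in place.
            }
            // Either way, the already-open stream should still be readable.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } finally {
            // Clean up in case the earlier delete was refused.
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read after delete: " + readAfterDelete("hello segment"));
    }
}
```

This only demonstrates the local-filesystem case; it says nothing about
what happens when the delete comes from a different NFS client.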

But some filesystems, e.g. NFS, do not do this when the two operations
are on separate clients (this results in the "Stale NFS file handle"
error), which is why you must use a custom IndexDeletionPolicy if your
index is shared via NFS.  Separately, sharing an index over NFS usually
results in poor search performance ...

In your case, since you're using NRT, NFS is a non-issue: even if you
did have your index on NFS, the NFS client handles "delete on last
close" for operations on that single client.

Mike McCandless

http://blog.mikemccandless.com


