Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2007/06/22 22:41:26 UTC

[jira] Closed: (LUCENE-673) Exceptions when using Lucene over NFS

     [ https://issues.apache.org/jira/browse/LUCENE-673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless closed LUCENE-673.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 2.2

This issue is now resolved by both LUCENE-701 and LUCENE-710 being fixed.
As far as I know, there are no other outstanding issues preventing Lucene
from working over NFS.  Here's an excerpt from an email I just sent to
java-user:

As far as I know, Lucene should now work over NFS, except that you will
have to create a custom deletion policy that works for your application.

Lucene had issues with NFS in three areas: locking, stale client-side
file caches, and how NFS handles deletion of open files.  The first two
were fixed in Lucene 2.1 with lock-less commits (LUCENE-701), and the
last is fixed in 2.2 with the addition of "custom deletion policies"
(LUCENE-710).

For a custom deletion policy you need to implement the
org.apache.lucene.index.IndexDeletionPolicy interface in your own
class and pass an instance of that class to your IndexWriter.  This
class tells IndexWriter when it's safe to delete older commits.  By
default Lucene uses an instance of KeepOnlyLastCommitDeletionPolicy.

The basic idea is to implement logic that can tell when your readers
are done using an older commit in the index.  For example, if you know
your readers refresh themselves once per hour, then your deletion
policy can safely delete any commit older than one hour.
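
For illustration, here is a rough sketch of such an age-based policy.
It is written against the IndexDeletionPolicy/IndexCommit API of recent
Lucene releases, where the policy is an abstract class; in 2.x the
policy is an interface and the commit lists are untyped, so adjust
accordingly.  The class name and the bookkeeping details are just
placeholders, not code shipped with Lucene:

    import java.io.IOException;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    import org.apache.lucene.index.IndexCommit;
    import org.apache.lucene.index.IndexDeletionPolicy;

    // Keeps every commit around until it has been visible for maxAgeMillis,
    // giving readers that long to finish with it before its files go away.
    public class ExpireOldCommitsDeletionPolicy extends IndexDeletionPolicy {

      private final long maxAgeMillis;

      // Wall-clock time at which this policy first saw each commit,
      // keyed by the commit's segments file name.
      private final Map<String, Long> firstSeen = new HashMap<>();

      public ExpireOldCommitsDeletionPolicy(long maxAgeMillis) {
        this.maxAgeMillis = maxAgeMillis;
      }

      @Override
      public void onInit(List<? extends IndexCommit> commits) throws IOException {
        onCommit(commits);
      }

      @Override
      public void onCommit(List<? extends IndexCommit> commits) throws IOException {
        long now = System.currentTimeMillis();
        // The last commit in the list is the one just written; never delete it.
        IndexCommit newest = commits.get(commits.size() - 1);
        for (IndexCommit commit : commits) {
          String name = commit.getSegmentsFileName();
          firstSeen.putIfAbsent(name, now);
          if (commit != newest && now - firstSeen.get(name) > maxAgeMillis) {
            commit.delete();        // tells IndexWriter this commit's files may be removed
            firstSeen.remove(name); // no longer needed once the commit is gone
          }
        }
      }
    }

With the IndexWriterConfig-based API you would hand the policy to the
writer with something like
config.setIndexDeletionPolicy(new ExpireOldCommitsDeletionPolicy(60 * 60 * 1000L));
in 2.x there are IndexWriter constructors that accept the policy
directly.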

But please note that while I believe NFS should work fine, this has
not been heavily tested yet.  Also note that performance over NFS is
generally not great.  If you do go down this route please report back
on any success or failure!  Thanks.


> Exceptions when using Lucene over NFS
> -------------------------------------
>
>                 Key: LUCENE-673
>                 URL: https://issues.apache.org/jira/browse/LUCENE-673
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Index
>    Affects Versions: 2.0.0
>         Environment: NFS server/client
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>             Fix For: 2.2
>
>
> I'm opening this issue to track details on the known problems with
> Lucene over NFS.
> The summary is: if you have one machine writing to an index stored on
> an NFS mount, and other machine(s) reading (and periodically
> re-opening the index), then sometimes on re-opening the index the
> reader will hit a FileNotFoundException.
> This has hit many users because this is a natural way to "scale up"
> your searching (single writer, multiple readers) across machines.  The
> best current workaround (I think?) is to take the approach Solr takes
> (either by actually using Solr or by copying/modifying its approach):
> take snapshots of the index and then have the readers open the
> snapshots instead of the "live" index being written to.
> I've been working on two patches for Lucene:
>   * A locking (LockFactory) implementation using native OS locks
>   * Lock-less commits
> (I'll open separate issues with the details for those).
> I have a simple stress test where one machine is constantly adding
> docs to an index over NFS, and another machine is constantly
> re-opening the index searcher over NFS.
> These tests have revealed new details (at least for me!) about the
> root cause of our NFS problems:
>   * Even when using native locks over NFS, Lucene still hits these
>     exceptions!
>     I was surprised by this because I had always thought (assumed?)
>     the NFS problem was because the "simple" file-based locking was
>     not correct over NFS, and that switching to native OS filesystem
>     locking would resolve it, but it doesn't.
>     I can reproduce the "FileNotFound" exceptions even when using NFS
>     V4 (the latest NFS protocol), so this is not just a "your NFS
>     server is too old" issue.
>   * Then, when running the same stress test with the lock-less
>     changes, I don't hit any exceptions.  I've tested on NFS version
>     2, 3 and 4 (using the "nfsvers=N" mount option).
> I think this means that in fact (as Hoss at one point suggested, I
> believe), the NFS problems are likely due to the cache coherence of
> the NFS file system (I think the "segments" file in particular)
> against the existence of the actual segment data files.
> In other words, even if you lock correctly, the reader will sometimes
> see stale contents of the "segments" file, which lead it to try to
> open a now-deleted segment data file.
> So I think this is good news / bad news: the bad news is, native
> locking doesn't fix our problems with NFS (as at least I had expected
> it to).  But the good news is, it looks like (still need to do more
> thorough testing of this) the changes for lock-less commits do enable
> Lucene to work fine over NFS.
> [One quick side note in case it helps others: to get native locks
> working over NFS on Ubuntu/Debian Linux 6.06, I had to "apt-get
> install nfs-common" on the NFS client machines.  Before I did this I
> would hit "No locks available" IOExceptions on calling the "tryLock"
> method.  The default NFS server install on the server machine just
> worked because it runs in kernel mode and starts a lockd process.]

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org