You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Paul Mellor <pm...@hemscott.co.uk> on 2005/02/16 11:21:09 UTC

Concurrent searching & re-indexing

Hi,

I've read from various sources on the Internet that it is perfectly safe to
simultaneously search a Lucene index that is being updated from another
Thread, as long as all write access to the index is synchronized.  But does
this apply only to updating the index (i.e. deleting and adding documents),
or to a complete re-indexing (i.e. create a new IndexWriter with the
'create' argument true and then re-add all the documents)?

I have a class which encapsulates all access to my index, so that writes can
be synchronized.  This class also exposes a method to obtain an
IndexSearcher for the index.  I'm running unit tests to test this which
create many threads - each thread does a complete re-indexing and then
obtains an IndexSearcher and does a query.

I'm finding that with sufficiently high numbers of threads, I'm getting the
occasional failure, with the following exception thrown when attempting to
construct a new IndexWriter (during the reindexing) -

java.io.IOException: couldn't delete _a.f1
at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
...

The exception occurs quite infrequently (usually for somewhere between 1-5%
of the Threads).

Does the IndexSearcher take a 'snapshot' of the index at creation?  Or does
it access the filesystem whilst searching?  I am also synchronizing creation
of the IndexSearcher with the write lock, so that the IndexSearcher is not
created whilst the index is being recreated (and vice versa).  But do I need
to ensure that the IndexSearcher cannot search whilst the index is being
recreated as well?

Note that a similar unit test where the threads update the index (rather
than recreate it from scratch) works fine, as expected.

This is running on Windows 2000.

Any help would be much appreciated!

Paul

This e-mail and any files transmitted with it are confidential and intended
solely for the use of the individual or entity to whom they are addressed.
If you are not the intended recipient, you should not copy, retransmit or
use the e-mail and/or files transmitted with it  and should not disclose
their contents. In such a case, please notify netwebmaster@hemscott.co.uk
and delete the message from your own system. Any opinions expressed in this
e-mail and/or files transmitted with it that do not relate to the official
business of this company are those solely of the author and should not be
interpreted as being endorsed by this company.

Re: Concurrent searching & re-indexing

Posted by Doug Cutting <cu...@apache.org>.
Paul Mellor wrote:
> I've read from various sources on the Internet that it is perfectly safe to
> simultaneously search a Lucene index that is being updated from another
> Thread, as long as all write access to the index is synchronized.  But does
> this apply only to updating the index (i.e. deleting and adding documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
[ ...]
> java.io.IOException: couldn't delete _a.f1
> at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
[...]
> This is running on Windows 2000.

On Windows one cannot delete a file while it is still open.  So, no, on 
Windows one cannot remove an index entirely while an IndexReader or 
Searcher is still open on it, since it is simply impossible to remove 
all the files in the index.

We might attempt to patch this by keeping a list of such files and 
attempt to delete them later (as is done when updating an index).  But 
this could cause problems, as a new index will eventually try to use 
these same file names again, and it would then conflict with the open 
IndexReader.  This is not a problem when updating an existing index, 
since filenames (except for a few which are not kept open, like 
"segments") are never reused in the lifetime of an index.  So, in order 
for such a fix to work we would need to switch to globally unique 
segment names, e.g., long random strings, rather than increasing integers.

In the meantime, the safe way to rebuild an index from scratch while 
other processes are reading it is simply to delete all of its documents, 
then start adding new ones.

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Concurrent searching & re-indexing

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Hi Paul,

If I understand your setup correctly, it looks like you are running
multiple threads that create IndexWriter for the ame directory.  That's
a "no no".

This section (first hit) describes all various concurrency issues with
regards to adds, updates, optimization, and searches:
  http://www.lucenebook.com/search?query=concurrent

IndexSearcher (IndexReader, really) does take a snapshot of the index
state when it is opened, so at that time the index segments listed in
segments should be in a complete state.  It also reads index files when
searching, of course.

Otis


--- Paul Mellor <pm...@hemscott.co.uk> wrote:

> Hi,
> 
> I've read from various sources on the Internet that it is perfectly
> safe to
> simultaneously search a Lucene index that is being updated from
> another
> Thread, as long as all write access to the index is synchronized. 
> But does
> this apply only to updating the index (i.e. deleting and adding
> documents),
> or to a complete re-indexing (i.e. create a new IndexWriter with the
> 'create' argument true and then re-add all the documents)?
> 
> I have a class which encapsulates all access to my index, so that
> writes can
> be synchronized.  This class also exposes a method to obtain an
> IndexSearcher for the index.  I'm running unit tests to test this
> which
> create many threads - each thread does a complete re-indexing and
> then
> obtains an IndexSearcher and does a query.
> 
> I'm finding that with sufficiently high numbers of threads, I'm
> getting the
> occasional failure, with the following exception thrown when
> attempting to
> construct a new IndexWriter (during the reindexing) -
> 
> java.io.IOException: couldn't delete _a.f1
> at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:166)
> at
>
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:135)
> at
>
org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:113)
> at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:151)
> ...
> 
> The exception occurs quite infrequently (usually for somewhere
> between 1-5%
> of the Threads).
> 
> Does the IndexSearcher take a 'snapshot' of the index at creation? 
> Or does
> it access the filesystem whilst searching?  I am also synchronizing
> creation
> of the IndexSearcher with the write lock, so that the IndexSearcher
> is not
> created whilst the index is being recreated (and vice versa).  But do
> I need
> to ensure that the IndexSearcher cannot search whilst the index is
> being
> recreated as well?
> 
> Note that a similar unit test where the threads update the index
> (rather
> than recreate it from scratch) works fine, as expected.
> 
> This is running on Windows 2000.
> 
> Any help would be much appreciated!
> 
> Paul
> 
> This e-mail and any files transmitted with it are confidential and
> intended
> solely for the use of the individual or entity to whom they are
> addressed.
> If you are not the intended recipient, you should not copy,
> retransmit or
> use the e-mail and/or files transmitted with it  and should not
> disclose
> their contents. In such a case, please notify
> netwebmaster@hemscott.co.uk
> and delete the message from your own system. Any opinions expressed
> in this
> e-mail and/or files transmitted with it that do not relate to the
> official
> business of this company are those solely of the author and should
> not be
> interpreted as being endorsed by this company.
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org