Posted to java-user@lucene.apache.org by Pa...@saaconsultants.com on 2005/09/07 10:44:05 UTC

Updating the index and searching




Hello,

I have an index into which documents get added and updated (by deleting and
adding). When I run queries on the index these have to take into account
all changes on the index so I open a new IndexReader. What I am finding is
that when the index is large the opening of the index takes a considerable
time with the effect that queries appear to take a long time to run. If a
new document is added or an existing document is updated then I need to make
the changes and then open a new IndexReader so that I am able to see the
changes.

Does anybody else have a similar problem? Any solutions?

One idea I have been thinking about that may get around the problem is to
have two indexes, one which is only ever open for reading, the other used
for reading/writing. The idea is that new documents only get added to the
read/write index whilst queries and deletes would operate on both indexes.
Periodically (overnight or when the read/write index reaches a certain
size) then the read/write index will be merged into the read only index and
a new empty read/write index created. The thinking behind this is the
read/write index will be kept small and so constantly opening and closing
will be relatively quick whilst the read only index will be able to be kept
open.
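
The routing described above can be sketched with a toy in-memory model. This
is only an illustration of the data flow, not Lucene code: plain maps stand in
for the two indexes, and the class and method names (TwoTierIndex, add,
delete, update, search, merge) are hypothetical. In a real setup the search
step would presumably go through something like MultiSearcher or MultiReader,
and the merge step through IndexWriter.addIndexes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two-index scheme; maps stand in for Lucene indexes. */
final class TwoTierIndex {
    private final Map<String, String> main = new HashMap<>(); // large, kept open
    private Map<String, String> delta = new HashMap<>();      // small, reopened often

    /** New documents only ever go to the small read/write index. */
    void add(String id, String text) {
        delta.put(id, text);
    }

    /** Deletes hit both indexes, since the document may live in either one. */
    void delete(String id) {
        main.remove(id);
        delta.remove(id);
    }

    /** An update is a delete followed by an add. */
    void update(String id, String text) {
        delete(id);
        add(id, text);
    }

    /** Queries consult both indexes; returns matching ids in sorted order. */
    List<String> search(String term) {
        List<String> hits = new ArrayList<>();
        collect(main, term, hits);
        collect(delta, term, hits);
        Collections.sort(hits);
        return hits;
    }

    /** Periodic merge: fold the small index into the large one, start afresh. */
    void merge() {
        main.putAll(delta);
        delta = new HashMap<>();
    }

    int deltaSize() {
        return delta.size();
    }

    private static void collect(Map<String, String> index, String term, List<String> hits) {
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (e.getValue().contains(term)) {
                hits.add(e.getKey());
            }
        }
    }
}
```

The point to note is update(): because the old copy of a document may already
have been merged into the large index, the delete has to be applied to both
indexes before the new copy is added to the small one.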

One worry I have is that keeping the read only index open for long periods
without closing it means that the list of deleted documents will be kept in
memory until the index is closed. If something happens to the server
hosting the index then these deletes will be lost. Is there any way to
force the deletes to be flushed to the disk without closing and reopening
the index?

Any help/ideas would be greatly appreciated.

Regards

Paul I.

P.S. I know this subject has been touched upon before and I have had a look
through the mailing lists but couldn't find anything. It would be nice if
the mailing lists had a search facility...


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


aslib cranfield test collection

Posted by Gusenbauer Stefan <gu...@eduhi.at>.
Sorry for the off-topic message, but does anyone have experience with the
Aslib Cranfield test collection, or does anyone know where I can get it?
Thanks in advance,
stefan


Re: Updating the index and searching

Posted by Chris Hostetter <ho...@fucit.org>.
The best advice I can give on this topic is: don't open a new IndexReader
for every search.  There is a lot of caching that goes on under the
covers (particularly when you sort by things other than SCORE) which is
completely wasted if you open a new IndexReader every time.  If you use
Filters, then you should almost always use the CachingWrapperFilter,
whose cache is also wasted if you open a new IndexReader every time.

At a minimum, use a singleton IndexReader for your whole application, and
don't replace it with a new IndexReader unless
IndexReader.getCurrentVersion says the index has changed since the last
time you opened it.

Ideally, don't check IndexReader.getCurrentVersion on every search; check
it only if the last time it was checked was more than N seconds ago,
where N can be defined by your needs, but make it as large as possible.
Even if N is very small, and you are making constant updates to your
index, at least this way concurrent searches can take advantage of
each other's cached data.
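
The "shared reader, checked at most every N seconds" pattern can be sketched
roughly as follows. This is a minimal sketch and not code from the thread: the
class name RefreshingHolder and its parameters are hypothetical, it uses Java 8
functional interfaces, and it is kept independent of Lucene so it runs as is.
The assumption is that you would plug in IndexReader.getCurrentVersion(...) as
the version check and IndexReader.open(...) as the opener, wrapping their
IOExceptions yourself.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Holds one shared resource (for example an IndexReader) and reopens it only
 * when a version check, run at most once per interval, reports a change.
 */
final class RefreshingHolder<T> {
    private final LongSupplier versionCheck; // assumed: wraps IndexReader.getCurrentVersion(dir)
    private final Supplier<T> opener;        // assumed: wraps IndexReader.open(dir)
    private final LongSupplier clock;        // current time in milliseconds
    private final long minCheckIntervalMillis;

    private T current;
    private long openedVersion;
    private long lastCheckMillis;

    RefreshingHolder(LongSupplier versionCheck, Supplier<T> opener,
                     LongSupplier clock, long minCheckIntervalMillis) {
        this.versionCheck = versionCheck;
        this.opener = opener;
        this.clock = clock;
        this.minCheckIntervalMillis = minCheckIntervalMillis;
        this.openedVersion = versionCheck.getAsLong();
        this.current = opener.get();
        this.lastCheckMillis = clock.getAsLong();
    }

    /** Returns the shared instance, reopening it only if the version moved. */
    synchronized T get() {
        long now = clock.getAsLong();
        if (now - lastCheckMillis >= minCheckIntervalMillis) {
            lastCheckMillis = now;
            long latest = versionCheck.getAsLong();
            if (latest != openedVersion) {
                openedVersion = latest;
                // The old instance is not closed here: searches that are
                // still running may be using it.
                current = opener.get();
            }
        }
        return current;
    }
}
```

Every search would call get() and build its searcher on the returned reader,
so all searches between two refreshes share one reader and its caches. When
and how to close a replaced reader is deliberately left out of the sketch.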



-Hoss



Re: Updating the index and searching

Posted by Daniel Naber <lu...@danielnaber.de>.
On Wednesday 07 September 2005 10:44, Paul.Illingworth@saaconsultants.com 
wrote:

> account all changes on the index so I open a new IndexReader. What I am
> finding is that when the index is large the opening of the index takes a
> considerable time with the effect that queries appear to take a long
> time to run. 

You could try Lucene from subversion. Opening indexes should be faster 
there.

> P.S. I know this subject has been touched upon before and I have had a
> look through the mailing lists but couldn't find anything. It would be
> nice if the mailing lists had a search facility...

Here's an archive with a search feature:
http://www.gossamer-threads.com/lists/lucene/

Regards
 Daniel

-- 
http://www.danielnaber.de
