Posted to java-user@lucene.apache.org by Chris Kimm <ch...@seeqa.com> on 2004/03/11 18:35:41 UTC
update performance
The standard pattern for updating an index - removing a document then
re-adding the modified document to the index - is currently a
significant performance bottleneck in my application. I sometimes need
to update ~1000 documents at a time. The major cost of this pattern as
far as I can see is IndexWriter.close(). Average times for an update
to an FSDirectory look like this:
delete document: 7 ms
create document: 6 ms
add document: 11 ms
IndexWriter.close: 59 ms
Is there a way to synchronize IndexWriter and IndexReader so that a call
to IndexWriter.close is not required for each update? I guess I mean to
ask if there is a *simple* way to do this. I imagine that one could
write an IndexUpdater class which manages the synchronization of Locks,
temp files, etc.
Thanks,
Chris
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: update performance
Posted by "Kevin A. Burton" <bu...@newsmonster.org>.
Chris Kimm wrote:
> Unfortunately, I'm not able to batch the updates. The application
> needs to make some decisions based on what each document looks like
> before and after the update, so I have to do it one at a time. I
> guess this is not a common usage scenario for Lucene. Otherwise, an
> update() might already be built in somewhere.
>
> Is there anything in the locking/sync framework which precludes saving
> the cost of closing the Directory object and deleting the temp lock
> file each time an update is made?
>
Use a RAM directory... then when you're pretty sure you're done call
IndexWriter.addIndexes() on the disk index.
Will that work for you?
You can also do this every N documents, or minutes, or memory usage, and
have the commit work with a synchronized thread.
Kevin
--
Please reply using PGP.
http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
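The buffer-and-flush idea above might be sketched like this. Note that BufferedIndexer and the mergeToDisk callback are hypothetical names for illustration, not Lucene's API: documents accumulate in memory, and every N additions a merge callback fires, standing in for a RAMDirectory being folded into the disk index via IndexWriter.addIndexes().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sketch of the suggestion: buffer documents in memory and merge into
// the disk index every N additions. BufferedIndexer and mergeToDisk are
// hypothetical names, not Lucene API; in real code the buffer would be
// a RAMDirectory and the callback a call to IndexWriter.addIndexes()
// on the FSDirectory-backed index.
public class BufferedIndexer {
    private final int flushEvery;
    private final Consumer<List<String>> mergeToDisk; // stands in for addIndexes()
    private final List<String> buffer = new ArrayList<>();

    public BufferedIndexer(int flushEvery, Consumer<List<String>> mergeToDisk) {
        this.flushEvery = flushEvery;
        this.mergeToDisk = mergeToDisk;
    }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= flushEvery) {
            flush();
        }
    }

    // Hand the buffered documents to the merge callback and start over.
    public void flush() {
        if (!buffer.isEmpty()) {
            mergeToDisk.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

As the post notes, the same flush() could equally be driven by a timer or a memory threshold, and making add() and flush() synchronized (or confining them to a single writer thread) would keep the periodic commit safe under concurrent additions.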
Re: update performance
Posted by Doug Cutting <cu...@apache.org>.
Chris Kimm wrote:
> Unfortunately, I'm not able to batch the updates. The application needs
> to make some decisions based on what each document looks like before
> and after the update, so I have to do it one at a time.
Are these decisions dependent on other documents? If not, you should be
able to queue the updates and apply them as a batch, no?
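The queueing idea might look roughly like this. UpdateQueue and its method names are hypothetical, for illustration only: each document is still inspected before and after its modification, one at a time, but the index is only touched when the queue is applied as a batch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Sketch of queueing per-document decisions and applying them as one
// batch (hypothetical names, not Lucene API). The per-document
// before/after decision happens in queue(); with Lucene, applyBatch()
// would then be one IndexReader session for all the deletes followed
// by one IndexWriter session for all the adds.
public class UpdateQueue {
    // A queued update: the old document to delete, the new one to add.
    static class Update {
        final String oldDoc, newDoc;
        Update(String oldDoc, String newDoc) { this.oldDoc = oldDoc; this.newDoc = newDoc; }
    }

    private final List<Update> pending = new ArrayList<>();

    // Per-document step: the caller makes its before/after decision here.
    public void queue(String oldDoc, UnaryOperator<String> modify) {
        pending.add(new Update(oldDoc, modify.apply(oldDoc)));
    }

    // Batch step: all deletes in one pass, then all adds in one pass.
    public List<String> applyBatch() {
        List<String> ops = new ArrayList<>();
        for (Update u : pending) ops.add("delete:" + u.oldDoc);
        for (Update u : pending) ops.add("add:" + u.newDoc);
        pending.clear();
        return ops;
    }
}
```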
> I guess this
> is not a common usage scenario for Lucene. Otherwise, an update()
> might already be built in somewhere.
Rather, Lucene's API makes it convenient to do what is efficient, and
less convenient to do what is inefficient. Batching is inherently more
efficient.
> Is there anything in the locking/sync framework which precludes saving
> the cost of closing the Directory object and deleting the temp lock file
> each time an update is made?
You could disable locking, but I doubt it will make it much faster.
Doug
Re: update performance
Posted by Chris Kimm <ch...@seeqa.com>.
Unfortunately, I'm not able to batch the updates. The application needs
to make some decisions based on what each document looks like before
and after the update, so I have to do it one at a time. I guess this
is not a common usage scenario for Lucene. Otherwise, an update()
might already be built in somewhere.
Is there anything in the locking/sync framework which precludes saving
the cost of closing the Directory object and deleting the temp lock file
each time an update is made?
-Chris
Doug Cutting wrote:
> It sounds like you're not batching your updates.
>
> The most efficient approach to update 1000 documents would be to:
>
> 1. Open an IndexReader;
> 2. Delete all 1000 documents;
> 3. Close the reader;
> 4. Open an IndexWriter;
> 5. Add all 1000 updated documents;
> 6. Close the IndexWriter.
>
> Is that what you're doing?
>
> Doug
>
> Chris Kimm wrote:
>
>> The standard pattern for updating an index - removing a document then
>> re-adding the modified document to the index - is currently a
>> significant performance bottleneck in my application. I sometimes
>> need to update ~1000 documents at a time. The major cost of this
>> pattern as far as I can see is IndexWriter.close (). Average times
>> for an update to an FSDirectory look like this:
>>
>> delete document: 7 ms
>> create document: 6 ms
>> add document: 11 ms
>> IndexWriter.close: 59 ms
>>
>> Is there a way to synchronize IndexWriter and IndexReader so that a
>> call to IndexWriter.close is not required for each update? I guess I
>> mean to ask if there is a *simple* way to do this. I imagine that one
>> could write an IndexUpdater class which manages the synchronization of
>> Locks, temp files, etc.
>>
>> Thanks,
>>
>> Chris
Re: update performance
Posted by Doug Cutting <cu...@apache.org>.
It sounds like you're not batching your updates.
The most efficient approach to update 1000 documents would be to:
1. Open an IndexReader;
2. Delete all 1000 documents;
3. Close the reader;
4. Open an IndexWriter;
5. Add all 1000 updated documents;
6. Close the IndexWriter.
Is that what you're doing?
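The six steps could be sketched as follows. This is a toy stand-in that just records the operations, not Lucene code; the point is that the reader and writer are each opened and closed once for the whole batch, so the ~59 ms close cost from the measurements above is paid once, not once per document.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the six-step batch update (not Lucene code): it logs the
// operations so the open/close pattern is visible. All deletes happen
// inside one reader session, all adds inside one writer session.
public class BatchUpdate {
    public static List<String> run(List<String> docs) {
        List<String> ops = new ArrayList<>();
        ops.add("open reader");                       // 1. Open an IndexReader
        for (String d : docs) ops.add("delete " + d); // 2. Delete all documents
        ops.add("close reader");                      // 3. Close the reader
        ops.add("open writer");                       // 4. Open an IndexWriter
        for (String d : docs) ops.add("add " + d);    // 5. Add all updated documents
        ops.add("close writer");                      // 6. Close the IndexWriter
        return ops;
    }
}
```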
Doug
Chris Kimm wrote:
> The standard pattern for updating an index - removing a document then
> re-adding the modified document to the index - is currently a
> significant performance bottleneck in my application. I sometimes need
> to update ~1000 documents at a time. The major cost of this pattern as
> far as I can see is IndexWriter.close(). Average times for an update
> to an FSDirectory look like this:
>
> delete document: 7 ms
> create document: 6 ms
> add document: 11 ms
> IndexWriter.close: 59 ms
>
> Is there a way to synchronize IndexWriter and IndexReader so that a call
> to IndexWriter.close is not required for each update? I guess I mean to
> ask if there is a *simple* way to do this. I imagine that one could
> write an IndexUpdater class which manages the synchronization of Locks,
> temp files, etc.
>
> Thanks,
>
> Chris