Posted to java-user@lucene.apache.org by Dan Quaroni <dq...@OPENRATINGS.com> on 2003/08/20 20:03:43 UTC

Fastest batch indexing with 1.3-rc1

Hey there.  What's the fastest way to do a batch index with Lucene 1.3-rc1
on a dual or quad-processor box?  The files I'm indexing are very easy to
divide among multiple threads.

Here's what I've done at this point:

Each thread has its own IndexWriter writing to its own RAMDirectory.  Every
<number> documents, I merge the thread's index into the main disk
index.

The thread writers have a mergeFactor of 50.
The disk IndexWriter has a mergeFactor of 30.
I call optimize only on the main disk index, and only once at the very end.
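
To make the structure concrete, here is a rough sketch of the threading
pattern only.  It is not Lucene code: plain Java lists stand in for the
per-thread RAMDirectory and for the main disk index, and the class name
BatchMergeSketch, its methods, and the numbers passed in main are all made up
for illustration.  In the real setup the merge step would go through the disk
IndexWriter instead of a list.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchMergeSketch {
    // Stand-in for the main disk index; only touched while holding its lock.
    private final List<String> mainIndex = new ArrayList<>();

    // Stand-in for merging a thread's in-memory index into the main index.
    // Merges are serialized on the shared target, then the buffer is reset.
    private void mergeIntoMain(List<String> threadBuffer) {
        synchronized (mainIndex) {
            mainIndex.addAll(threadBuffer);
        }
        threadBuffer.clear();
    }

    // Runs `threads` workers; each adds `docsPerThread` documents to a private
    // buffer and merges it into the main index every `flushEvery` documents.
    public int indexAll(int threads, int docsPerThread, int flushEvery)
            throws InterruptedException {
        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < threads; t++) {
            final int id = t;
            Thread worker = new Thread(() -> {
                // Private to this thread, so adding documents needs no locking.
                List<String> buffer = new ArrayList<>();
                for (int d = 0; d < docsPerThread; d++) {
                    buffer.add("doc-" + id + "-" + d);
                    if (buffer.size() >= flushEvery) {
                        mergeIntoMain(buffer);
                    }
                }
                // Merge whatever is left after the last full batch.
                if (!buffer.isEmpty()) {
                    mergeIntoMain(buffer);
                }
            });
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            worker.join();
        }
        synchronized (mainIndex) {
            return mainIndex.size();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        int total = new BatchMergeSketch().indexAll(4, 1000, 250);
        System.out.println("documents in main index: " + total);
    }
}
```

The only shared state is the main index, so the threads contend with each
other only during the periodic merge, not while adding documents.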

Just doing this has shown great improvements for me, but I want to squeeze
out every bit of performance I can.  What's the fastest way to merge the
indexes?  Should I use a low mergeFactor when working with RAMDirectory
instances?  Should I optimize the thread index before I merge it into the
main one?

Thanks!
