Posted to java-user@lucene.apache.org by sunil goyal <su...@gmail.com> on 2005/03/23 08:50:24 UTC

incremental indexing - efficiency

Hello all,

I am trying to use Lucene for incremental indexing on the order of a
million records per day on a single machine (P4 2.4GHz, 1 GB RAM).
Messages arrive or are updated every few minutes, and the index has to
be updated to reflect them.

I am using a StandardAnalyzer and writing documents with an IndexWriter
over an FSDirectory, using the following structure:

Document document = new Document();
// message id: indexed as-is (not analyzed) and stored (Field.Keyword)
document.add(Field.Keyword(INDEX_MESG_ID, LongField.longToString(mesg_id)));
// title and description: analyzed and indexed, but not stored (Field.UnStored)
document.add(Field.UnStored(INDEX_MESG_TITLE, mesgTitle));
document.add(Field.UnStored(INDEX_MESG_DESCRIPTION, mesgDescription));
// message date: indexed as-is (not analyzed) and stored (Field.Keyword)
document.add(Field.Keyword(INDEX_MESG_DATE, mesgDate));

mesgTitle is a string on the order of 20-50 words.
mesgDescription is a string on the order of 500-1000 words.

I profiled my application and the index writer only manages about 16
messages per second, which is far too low for the volume I need to
handle. I have tried various options (a rough sketch of the second one
follows this list):
- Increasing mergeFactor and minMergeDocs from 10 to 100 to 1000 didn't help.
- Switching the IndexWriter from an FSDirectory to a RAMDirectory and later
synchronizing the changes to the FSDirectory didn't help either.
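
For concreteness, the second approach looked roughly like the sketch
below. indexBatch, indexPath and the docs array are placeholders for my
own code, the tuning values are just the ones I experimented with, and
this assumes the Lucene 1.4-style API where mergeFactor and minMergeDocs
are public fields on IndexWriter and addIndexes(Directory[]) merges the
in-memory index into the on-disk one:

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

public class BatchIndexer {

    // indexBatch is a placeholder; docs is one batch of already-built Documents.
    public static void indexBatch(String indexPath, Document[] docs) throws IOException {
        // 1. Buffer the whole batch in an in-memory index first.
        RAMDirectory ramDir = new RAMDirectory();
        IndexWriter ramWriter = new IndexWriter(ramDir, new StandardAnalyzer(), true);
        ramWriter.mergeFactor = 100;    // default 10: fewer, larger segment merges
        ramWriter.minMergeDocs = 1000;  // default 10: more docs buffered before writing a segment
        for (int i = 0; i < docs.length; i++) {
            ramWriter.addDocument(docs[i]);
        }
        ramWriter.close();

        // 2. Fold the in-memory index into the on-disk index in a single call,
        //    instead of re-adding the documents one by one.
        Directory fsDir = FSDirectory.getDirectory(indexPath, false);
        IndexWriter fsWriter = new IndexWriter(fsDir, new StandardAnalyzer(), false);
        fsWriter.mergeFactor = 100;
        fsWriter.addIndexes(new Directory[] { ramDir });
        fsWriter.close();
    }
}

The idea was to pay the disk cost once per batch (the addIndexes call)
rather than once per document, but as noted above it did not improve
the throughput for me.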

It would be great if someone could give me pointers for speeding up the
index-writing process. Can anyone share practical figures for the
maximum indexing rate they have achieved for a similar workload on a
single machine?

Thanks

Regards
Sunil
