Posted to java-user@lucene.apache.org by "Goel, Nikhil" <ni...@verizon.com> on 2005/04/05 04:14:20 UTC

Time taken in Indexing when the index is already huge

Hi, 

   

We have been using lucene-1.3.jar for quite some time, together with another library that stores the index in a database.

When we started indexing, writer.optimize() used to return in the range of 600-800 milliseconds. Now the index has grown to around 10 MB, and optimize() takes 30-40 seconds, which is not acceptable for our solution. I added timings around the calls, and writer.optimize() is the one that accounts for most of this time.

 

So I am wondering whether anyone else has run into this problem when adding to an already large index, or whether there is a better way to manage an index of this size.

 

Here is the simple code which we use to index the data. 

IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false); // open the existing index (false = do not create a new one)

writer.addDocument(doc); // doc is an org.apache.lucene.document.Document

writer.optimize(); // this is the call that takes most of the time and causes the delay

writer.close(); // close the index writer
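 

One thing we are considering is whether we are simply calling optimize() too often, i.e. once per document instead of once per batch. A rough sketch of what the batched version might look like (the docs array and the loop are just illustrative, not our actual code; dbDirectory is the same Directory as above):

IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false); // open the existing index
for (int i = 0; i < docs.length; i++) {   // docs: a batch of org.apache.lucene.document.Document (illustrative)
    writer.addDocument(docs[i]);
}
writer.optimize(); // one optimize per batch instead of one per document
writer.close();

The idea would be that the segments get merged once per batch rather than after every single document, but we have not verified whether this fits our update pattern or helps with the DB-backed directory.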

 

 

The time taken by the optimize() call grows a lot as the index gets larger. I also looked in Erik Hatcher and Otis Gospodnetić's book (http://www.manning.com/hatcher2), but everywhere it says Lucene is quite scalable and does not have trouble indexing even huge amounts of data. Can anyone please provide some insight into this?

 

Thanks.

Nikhil