Posted to java-user@lucene.apache.org by Chuck Williams <ch...@allthingslocal.com> on 2005/04/05 05:53:54 UTC

[Fwd: Re: Time taken in Indexing when the index is already huge]

Goel, Nikhil writes (4/4/2005 7:14 PM):

>Hi,
>
>I have been using lucene-1.3.jar for quite some time, and we use another library to store the index in a database.
>
>When we started indexing, writer.optimize() took 600-800 milliseconds to return. Our index has since grown to around 10 MB, and writer.optimize() now takes 30-40 seconds, which is not acceptable for our solution. I timed the calls, and writer.optimize() accounts for most of that time.
>
> 
>
>So I am wondering whether anyone else has faced the same problem when adding data to an index that is already huge, or whether there is another way to manage such a large index.
>
> 
>
>Here is the simple code we use to index the data:
>
>IndexWriter writer = new IndexWriter(dbDirectory, new StandardAnalyzer(), false); // open an IndexWriter on the existing index (create = false)
>
>writer.addDocument(doc); // doc is an org.apache.lucene.document.Document
>
>writer.optimize(); // called on the IndexWriter; this call takes most of the time and causes the delay
>
>writer.close(); // close the IndexWriter
>  
>
Does this code imply you are optimizing after every new document is 
indexed?  10MB is actually a pretty small index.  Depending on your 
inflow of documents, you should be able to optimize maybe once a day, 
during your application's least busy period.  Your IndexSearcher can 
still search your documents effectively while the index is unoptimized.  
As a first step, try not optimizing at all.
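
To make that concrete, here is a rough sketch of the restructuring I have in mind. It does not use the real Lucene classes: Writer is a hypothetical stand-in interface for the three IndexWriter calls in your code (addDocument, optimize, close), documents are plain strings, and CountingWriter is a fake that only counts calls. The point is just the shape: add all pending documents in one writer session, and make optimize a separate, optional step instead of something done per document.

```java
import java.util.Arrays;
import java.util.List;

public class DeferredOptimize {

    /** Hypothetical stand-in for the three IndexWriter calls used in the quoted code. */
    interface Writer {
        void addDocument(String doc);
        void optimize();
        void close();
    }

    /**
     * Adds every pending document in a single writer session.
     * optimize() runs at most once, and only when the caller asks for it
     * (for example from a scheduled job during the least busy period).
     */
    static void indexBatch(Writer writer, List<String> docs, boolean optimizeAtEnd) {
        try {
            for (String doc : docs) {
                writer.addDocument(doc);
            }
            if (optimizeAtEnd) {
                writer.optimize();
            }
        } finally {
            writer.close(); // always release the writer, even if adding fails
        }
    }

    /** Fake writer that only counts calls, so the call pattern can be inspected. */
    static class CountingWriter implements Writer {
        int added;
        int optimized;
        int closed;

        public void addDocument(String doc) { added++; }
        public void optimize() { optimized++; }
        public void close() { closed++; }
    }

    public static void main(String[] args) {
        CountingWriter writer = new CountingWriter();
        indexBatch(writer, Arrays.asList("doc1", "doc2", "doc3"), false);
        System.out.println("added=" + writer.added
                + " optimized=" + writer.optimized
                + " closed=" + writer.closed);
    }
}
```

With optimizeAtEnd set to false, the normal indexing path never pays the optimize cost; a separate daily run can pass true. In your application the fake would be replaced by a thin wrapper around your real IndexWriter.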

Chuck

> 
>
> 
>
>The time taken by the optimize call grows a lot as the index gets larger. I tried to look this up in Erik Hatcher and Otis Gospodnetić's book <http://www.manning.com/hatcher2#author#author> too, but everywhere it says Lucene is quite scalable and doesn't have trouble indexing even huge amounts of data. Can anyone please provide some insight into this?
>
> 
>
>Thanks.
>
>Nikhil

