Posted to java-user@lucene.apache.org by Volodymyr Bychkoviak <vb...@i-hypergrid.com> on 2005/03/01 18:19:24 UTC

Large Index managing

Hi,

Just an idea for how to manage a large index that is updated very often.

Very often there is a need to update a document in the index. To update
a document you have to delete the old document from the index and then
add the new one. In most cases this requires you to open an IndexReader,
delete the document, close the IndexReader, create an IndexWriter, add
the document, close the IndexWriter, and re-open the IndexSearcher (if
the index is searched heavily). Profiling some applications, I found
that most of the time is spent in the IndexReader.open() method. It also
creates many objects, which adds GC overhead.

The idea to optimize this process is to keep two indexes: one main index,
which could be very large, and a second index that serves as a "change
buffer". We can keep one IndexReader open on the main index (and use it
both for searching and for deleting old documents). The second index is
small, so we can reopen its IndexReader frequently, whenever needed.

When the second index reaches some number of documents, we can merge it
into the main index.
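
To make the scheme concrete, here is a toy model in plain Java. It does
not use the Lucene API at all: the two "indexes" are just in-memory maps,
and the class name, method names, and merge threshold are all made up for
illustration. It only shows the bookkeeping: deletes go through the
long-lived main index, new versions go to the small buffer, and the main
index is rebuilt (the expensive reopen) only at merge time.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of "main index + change buffer". Hypothetical; not Lucene code. */
final class TwoTierIndex {
    // Stand-in for the large index whose reader stays open between merges.
    private final Map<String, String> main = new LinkedHashMap<>();
    // Ids deleted through the open main reader since the last merge.
    private final Set<String> deletedInMain = new HashSet<>();
    // Stand-in for the small index that is cheap to reopen.
    private final Map<String, String> buffer = new LinkedHashMap<>();
    private final int mergeThreshold;
    private int mainReopens = 0;

    TwoTierIndex(int mergeThreshold) {
        this.mergeThreshold = mergeThreshold;
    }

    /** Adds or replaces a document without reopening the main index. */
    void update(String id, String text) {
        if (main.containsKey(id)) {
            deletedInMain.add(id);      // hide the old version in the main index
        }
        buffer.put(id, text);           // the new version lives in the buffer
        if (buffer.size() >= mergeThreshold) {
            merge();
        }
    }

    /** Folds the buffer into the main index; the only step that reopens it. */
    void merge() {
        main.keySet().removeAll(deletedInMain);
        main.putAll(buffer);
        deletedInMain.clear();
        buffer.clear();
        mainReopens++;
    }

    /** Returns ids of live documents whose text contains the given word. */
    List<String> search(String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : main.entrySet()) {
            if (!deletedInMain.contains(e.getKey()) && e.getValue().contains(word)) {
                hits.add(e.getKey());
            }
        }
        for (Map.Entry<String, String> e : buffer.entrySet()) {
            if (e.getValue().contains(word)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    int mainReopens() { return mainReopens; }

    int bufferSize() { return buffer.size(); }
}
```

With a threshold of 3, for example, the first two updates never touch the
main index; only the third one triggers a merge. How well this maps onto
real IndexReader and IndexWriter behaviour is exactly the untested part.
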
To search this "multi" index we could use a MultiSearcher over the two
indexes, with a little trick: the first IndexSearcher stays the same the
whole time until the second index is merged into the main one, and the
second IndexSearcher is reopened whenever the second index changes.
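
The searcher trick could look roughly like the sketch below. Again this is
plain Java and not the Lucene classes: SimpleSearcher and PairSearcher are
hypothetical names standing in for IndexSearcher and MultiSearcher. The
point is only that the main searcher is fixed for the whole merge interval
while the buffer searcher can be swapped at any time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a searcher over one index; not a Lucene type. */
interface SimpleSearcher {
    List<String> search(String word);
}

/** Combines a long-lived main searcher with a replaceable buffer searcher. */
final class PairSearcher implements SimpleSearcher {
    // Kept unchanged until the buffer is merged into the main index.
    private final SimpleSearcher main;
    // Swapped for a fresh searcher whenever the small index changes.
    private final AtomicReference<SimpleSearcher> buffer;

    PairSearcher(SimpleSearcher main, SimpleSearcher buffer) {
        this.main = main;
        this.buffer = new AtomicReference<>(buffer);
    }

    /** Cheap reopen: only the searcher over the small index is replaced. */
    void reopenBuffer(SimpleSearcher fresh) {
        buffer.set(fresh);
    }

    /** Main hits first, then hits from the current buffer searcher. */
    @Override
    public List<String> search(String word) {
        List<String> hits = new ArrayList<>(main.search(word));
        hits.addAll(buffer.get().search(word));
        return hits;
    }
}
```

After a merge, a new PairSearcher would be built over the reopened main
index and an empty buffer. Score merging across the two indexes is ignored
in this sketch.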

It is just an idea (it has not been tested).

Would it help to speed up updates to a large index and lower the memory
overhead?
Any comments?

Regards,
Volodymyr Bychkoviak


