Posted to java-user@lucene.apache.org by Monsur Hossain <mo...@monsur.com> on 2005/04/29 00:10:45 UTC

IndexSearcher hanging on to old index files in Windows

Hi all.  I'm running Lucene.NET in a Windows/ASP.NET environment.  We are
searching a 300 MB index in a web environment, where the IndexSearcher is
cached.  Every 10-30 minutes, a separate process updates the index.  When
ASP.NET's cache detects a changed index, it drops the current IndexSearcher
(which the garbage collector takes care of at some later point [1]) and
creates a new one.

Now, while the index is being updated, the current IndexSearcher in cache
holds a reference to the old index files.  Therefore, the IndexWriter can't
delete them, and they accumulate in the folder, which keeps growing.  Since
the IndexSearcher is left to the GC, there's no guarantee of when the files
will be released.

I was considering such previously mentioned systems as reference counting
[2] and swapping between two indexes [3].  But in both these cases, I don't
think I'm ever guaranteed that an old IndexSearcher will have released its
grasp on the old files in time to delete them.  
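
A minimal sketch of the reference-counting idea, for illustration only.  It
assumes the searcher can be closed explicitly (in Java Lucene,
IndexSearcher.close()) instead of being left to the GC.  RefCounted is a
hypothetical helper, not a Lucene class, and a plain java.io.Closeable stands
in for the IndexSearcher so the sketch runs without Lucene on the classpath.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper, not part of Lucene: T stands in for an IndexSearcher.
final class RefCounted<T extends Closeable> {
    private final T resource;
    // Starts at 1: the cache owns the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource) {
        this.resource = resource;
    }

    // Take a reference for one search; fails once the resource is closed.
    boolean tryAcquire() {
        while (true) {
            int n = refs.get();
            if (n <= 0) {
                return false;
            }
            if (refs.compareAndSet(n, n + 1)) {
                return true;
            }
        }
    }

    T get() {
        return resource;
    }

    // Drop one reference; the last release closes the resource immediately.
    void release() {
        int n = refs.decrementAndGet();
        if (n < 0) {
            throw new IllegalStateException("released more often than acquired");
        }
        if (n == 0) {
            try {
                resource.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }
}
```

Each search would call tryAcquire() before using get() and release() in a
finally block; the cache calls release() once when it drops the searcher.  The
files are then released as soon as the last in-flight search finishes rather
than whenever the GC runs, although the updating process may still have to
retry a delete while a long search is running.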

Anyway, I'd like to hear if others are dealing with this issue.

Also, I'm curious: is this a Windows-specific issue?  I haven't seen any
mention of it on UNIX.

Thanks,
Monsur

[1] http://tinyurl.com/8qzo4
[2] http://tinyurl.com/8enzh
[3] I can't find a link to it, but it was suggested by George Aroush in a
previous thread of mine.



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: IndexSearcher hanging on to old index files in Windows

Posted by Chuck Williams <ch...@allthingslocal.com>.
Monsur Hossain writes (4/28/2005 3:10 PM):

>Hi all.  I'm running Lucene.NET in a Windows/ASP.NET environment.  We are
>searching a 300 MB index in a web environment, where the IndexSearcher is
>cached.  Every 10-30 minutes, a separate process updates the index.  When
>ASP.NET's cache detects a changed index, it drops the current IndexSearcher
>(which the garbage collector takes care of at some later point [1]) and
>creates a new one.
>
>Now, while the index is being updated, the current IndexSearcher in cache
>holds a reference to the old index files.  Therefore, the IndexWriter can't
>delete them, and they accumulate in the folder, which keeps growing.  Since
>the IndexSearcher is left to the GC, there's no guarantee of when the files
>will be released.
>
>I was considering such previously mentioned systems as reference counting
>[2] and swapping between two indexes [3].  But in both these cases, I don't
>think I'm ever guaranteed that an old IndexSearcher will have released its
>grasp on the old files in time to delete them.  
>
>Anyway, I'd like to hear if others are dealing with this issue.
Perhaps I'm not fully understanding your issue, but I did a stress test 
recently with a large Lucene index (growing to about 10 million large 
documents on a single node) and didn't encounter this problem.  The 
system did continual round-the-clock indexing at about 100k 
documents/hour with nightly optimizations.  Searching was performed on 
the same index on the same node in parallel (taking generally 20 to 
200ms per search).  The test harness closed the underlying IndexReader 
and reopened a new one every 2 minutes, thus guaranteeing that search 
results were up-to-date within 2 minutes.  I wasn't doing deletes, but 
old segment files caused by incremental merging and/or optimization were 
not hanging around as far as I could tell. This was on the Java version 
on Windows.
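
A rough sketch of the close-and-reopen pattern described above, not the actual
test harness.  ReopeningHolder and its opener callback are hypothetical names;
java.io.Closeable stands in for the IndexReader so the sketch runs without
Lucene, and the 2-minute interval is passed in as maxAgeMillis.  The current
time is a parameter only to keep the example deterministic.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Hypothetical helper, not part of Lucene: T stands in for an IndexReader.
final class ReopeningHolder<T extends Closeable> {
    private final Supplier<T> opener;   // assumed callback that opens the index
    private final long maxAgeMillis;    // e.g. 120_000 for "every 2 minutes"
    private T current;
    private long openedAt;

    ReopeningHolder(Supplier<T> opener, long maxAgeMillis) {
        this.opener = opener;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Return the current resource, reopening it first if it is too old.
    synchronized T get(long nowMillis) {
        if (current == null || nowMillis - openedAt >= maxAgeMillis) {
            T fresh = opener.get();     // open the new one before swapping
            T old = current;
            current = fresh;
            openedAt = nowMillis;
            if (old != null) {
                try {
                    old.close();        // explicit close, nothing left to the GC
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            }
        }
        return current;
    }
}
```

Because the old instance is closed explicitly at swap time, its file handles
are released right away.  The sketch assumes no search is still using the old
instance when it is closed; with concurrent searches, some form of reference
counting would be needed on top of this.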

Chuck

