You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Monsur Hossain <mo...@monsur.com> on 2005/04/29 00:10:45 UTC
IndexSearcher hanging on to old index files in Windows
Hi all. I'm running Lucene.NET in a Windows/ASP.NET environment. We are
searching a 300meg index in a web environment, where the IndexSearcher is
cached. Every 10-30 minutes, a separate process updates the index. When
ASP.NET's cache detects a changed index, it drops the current IndexSearcher
(which the Garbage collector takes care of in the future [1]) and creates a
new one.
Now, while the index is being updated, the current IndexSearcher in cache
holds a reference to the old index files. Therefore, the IndexWriter can't
delete them, and they sit around in the folder, continuing to grow. Since
the IndexSearcher is left to the GC, there's no guarantee of when the files
will be released.
I was considering such previously mentioned systems as reference counting
[2] and swapping between two indexes [3]. But in both these cases, I don't
think I'm ever guaranteed that an old IndexSearcher will have released its
grasp on the old files in time to delete them.
Anyway, I'd like to hear if others are dealing with this issue.
Also, I'm curious, is this a Windows specific issue; I haven't seen any
mention of this on UNIX?
Thanks,
Monsur
[1] http://tinyurl.com/8qzo4
[2] http://tinyurl.com/8enzh
[3] I can't find a link to it, but it was suggested by George Aroush in a
previous thread of mine.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: IndexSearcher hanging on to old index files in Windows
Posted by Chuck Williams <ch...@allthingslocal.com>.
Monsur Hossain writes (4/28/2005 3:10 PM):
>Hi all. I'm running Lucene.NET in a Windows/ASP.NET environment. We are
>searching a 300meg index in a web environment, where the IndexSearcher is
>cached. Every 10-30 minutes, a separate process updates the index. When
>ASP.NET's cache detects a changed index, it drops the current IndexSearcher
>(which the Garbage collector takes care of in the future [1]) and creates a
>new one.
>
>Now, while the index is being updated, the current IndexSearcher in cache
>holds a reference to the old index files. Therefore, the IndexWriter can't
>delete them, and they sit around in the folder, continuing to grow. Since
>the IndexSearcher is left to the GC, there's no guarantee of when the files
>will be released.
>
>I was considering such previously mentioned systems as reference counting
>[2] and swapping between two indexes [3]. But in both these cases, I don't
>think I'm ever guaranteed that an old IndexSearcher will have released its
>grasp on the old files in time to delete them.
>
>Anyway, I'd like to hear if others are dealing with this issue.
>
>
Perhaps I'm not fully understanding your issue, but I did a stress test
recently with a large Lucene index (growing to about 10 million large
documents on a single node) and didn't encounter this problem. The
system did continual round-the-clock indexing at about 100k
documents/hour with nightly optimizations. Searching was performed on
the same index on the same node in parallel (taking generally 20 to
200ms per search). The test harness closed the underlying IndexReader
and reopened a new one every 2 minutes, thus guaranteeing that search
results were up-to-date within 2 minutes. I wasn't doing deletes, but
old segment files caused by incremental merging and/or optimization were
not hanging around as far as I could tell. This was on the Java version
on Windows.
Chuck
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org