Posted to java-user@lucene.apache.org by Paul Smith <ps...@aconex.com> on 2006/02/13 07:43:15 UTC

CompoundFileReader question/'leaking' file descriptors ?

I've been hunting an insidious problem: during heavy incremental
indexing in production on a Red Hat EL3 machine, I notice that the
Java process has a lot of open files which appear to be deleted.

Now, before anyone jumps in: yes, I know the open file limit needs
to be increased, and I've done that (it's at a hideous 16000 at the
moment..).  Things I've verified include that Writers/readers/searchers
get closed when they should (finally blocks etc).

Using the 'lsof' command to track the open files, we see tonnes of  
these entries:

[root@index1 logs]# lsof -p `ps -efww | grep '[m]el.xml' | awk '{print $2}'` | grep deleted | head
java    23749 root  120r   REG       8,3     17507  61079633 /aconex/index/current/project/39/56/0000025639/corr/000001/_dga.cfs (deleted)
java    23749 root  121r   REG       8,3     21775  61079684 /aconex/index/current/project/39/56/0000025639/corr/000001/_dlc.cfs (deleted)
java    23749 root  123r   REG       8,3     17507  61079728 /aconex/index/current/project/39/56/0000025639/corr/000001/_dq4.cfs (deleted)
......

What is REALLY weird is that they eventually do get released.  And  
scarily enough, it seems to track with when the garbage collector  
does a major collection (we managed to figure this out using Yourkit  
profiler and hitting the force GC), and lo, they disappear...  We  
have many indexes (2000, one for each project-entity), and not an  
UberIndex, and hence having indexes leak file handles is much more  
noticeable.

We're using Lucene 1.4.3, and after hunting around in the source code  
just to see what I might be missing, I came across this, and I'd just  
like some comments.

CompoundFileReader has an inner-class CSInputStream which is used to  
read the stream (and we're using the Compound format, so this is  
relevant here).

However, it overrides InputStream.close() but does not call
super.close().  After tracing around where this is all used, I believe
that this method REALLY SHOULD be calling super.close() (or not
overriding it), because CompoundFileReader will be given an
InputStream to wrap, eventually coming down to FSInputStream, which
apparently then calls Descriptor.close().

Scarily enough, this ends up calling RandomAccessFile.close(), which
goes into native library calls and, presumably, closes the file.

The guard here is that the finalizer method in FSInputStream does  
call close() so that would well explain the releasing of file handles  
at garbage collection intervals.

Why would CompoundFileReader not need to call .close()?

Am I going mad here and just seeing ghosts? Comments appreciated.

Paul Smith


Re: CompoundFileReader question/'leaking' file descriptors ?

Posted by Doug Cutting <cu...@apache.org>.
Paul Smith wrote:
> is 1.9 binary backward compatible? (both source code and index format).

That is the intent.  Try a nightly build:

http://cvs.apache.org/dist/lucene/java/nightly/

Doug

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: CompoundFileReader question/'leaking' file descriptors ?

Posted by Paul Smith <ps...@aconex.com>.
On 14/02/2006, at 7:44 AM, Doug Cutting wrote:

> Paul Smith wrote:
>> We're using Lucene 1.4.3, and after hunting around in the source  
>> code  just to see what I might be missing, I came across this, and  
>> I'd just  like some comments.
>
> Please try using a 1.9 build to see if this is something that's  
> perhaps already been fixed.
>
is 1.9 binary backward compatible? (both source code and index format).

>> CompoundFileReader has an inner-class CSInputStream which is used  
>> to  read the stream (and we're using the Compound format, so this  
>> is  relevant here).
>> However, it overrides InputStream.close() but does not call
>> super.close().  After tracing around where this is all used, I
>> believe that this method REALLY SHOULD be calling super.close()
>> (or not overriding it), because CompoundFileReader will be given
>> an InputStream to wrap, eventually coming down to FSInputStream,
>> which apparently then calls Descriptor.close().
>
> No, all CSInputStreams share a single FSInputStream, so the
> FSInputStream shouldn't be closed until all of the CSInputStreams
> have been closed.  This is done by CompoundFileReader.close().  It
> sounds like that's what's not getting called.  As you update
> indexes, how do you close stale readers?

Yes, after looking at the code again I see that it extends from a  
Lucene class called InputStream, and I had assumed it was  
java.io.InputStream.

I'm going to take yet another look at our code, but nothing is
obvious.  Our current production environment is running in a
side-by-side mode. There are no searches being performed, only
indexing, as we test out our indexing support/monitoring and
fail-over techniques.

Paul



Re: CompoundFileReader question/'leaking' file descriptors ?

Posted by Paul Smith <ps...@aconex.com>.
>
> No, all CSInputStreams share a single FSInputStream, so the
> FSInputStream shouldn't be closed until all of the CSInputStreams
> have been closed.  This is done by CompoundFileReader.close().  It
> sounds like that's what's not getting called.  As you update
> indexes, how do you close stale readers?

Total false alarm (as even I expected this would be).  After digging
further and further around our code I have discovered that the
Incremental Indexer does a search of the index to find related
records to delete (when preparing for an update of an item, it needs
to know what child items to also delete; in our case, which
Distributions belonging to a mail must be removed when that mail is
updated).

This search ends up creating an IndexSearcher which is held open for
a short time.  A cache of IndexSearchers is used to smooth out lots
of change, and each searcher is held open for at least 15 seconds to
ensure any existing queries get a chance to complete (the index
searcher goes into a 'wait' queue to be closed).
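
For readers following along, the 'wait' queue idea can be sketched
roughly as below. This is only an illustration under assumptions: the
class name DelayedCloser, its methods, and the use of a
ScheduledExecutorService are hypothetical and are not taken from our
code or from Lucene. It just shows a resource that stays open for a
grace period after being retired.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: closes retired resources only after a grace period,
// so in-flight users of the resource get a chance to finish first.
class DelayedCloser {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long graceMillis;

    DelayedCloser(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Schedule the resource to be closed once the grace period has elapsed.
    // Until then, any file descriptors it holds remain open.
    void retire(final Closeable resource) {
        Runnable task = () -> {
            try {
                resource.close();
            } catch (IOException e) {
                // A real system would log this; the sketch just reports it.
                System.err.println("close failed: " + e);
            }
        };
        scheduler.schedule(task, graceMillis, TimeUnit.MILLISECONDS);
    }

    // Stop accepting new work and wait for the pending closes to run.
    void shutdown() throws InterruptedException {
        scheduler.shutdown();
        scheduler.awaitTermination(graceMillis + 1000, TimeUnit.MILLISECONDS);
    }
}
```

With a grace period of 15000 ms, anything passed to retire() keeps
its files open for about 15 seconds, which matches the lingering
descriptors seen in lsof.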

This search of course is what is holding the files open.  The search
is done just prior to the delete, which is why the segment files show
up as deleted while still open.  Once the wait time elapses and the
IndexSearcher is closed, the file descriptor is released.
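
The '(deleted)' entries in lsof follow from ordinary Unix file
semantics: a file that is deleted while a process still has it open
stays readable through that descriptor until it is closed. A small
sketch of that behaviour, assuming a Linux-like filesystem (the class
and method names are made up for illustration, and the temp file only
stands in for a real .cfs segment):

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical demo: a deleted file remains readable through an open handle.
class DeletedButOpenDemo {

    // Writes one byte to a temp file, opens it, deletes it, then reads it back
    // through the still-open handle. Returns the byte that was read.
    static int readAfterDelete() throws IOException {
        File segment = File.createTempFile("segment", ".cfs");
        Files.write(segment.toPath(), new byte[] {42});

        RandomAccessFile handle = new RandomAccessFile(segment, "r");
        try {
            // On Unix-like systems this removes the directory entry only;
            // the data lives on until the last descriptor is closed.
            if (!segment.delete()) {
                throw new IOException("platform refused to delete an open file");
            }
            return handle.read();
        } finally {
            // Closing the handle is what finally releases the descriptor.
            handle.close();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("read after delete: " + readAfterDelete());
    }
}
```

On platforms that refuse to delete open files the sketch throws
instead, so it is only meaningful on Unix-like systems such as the
one described here.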

Sorry for the intrusion.

cheers,

Paul Smith



Re: CompoundFileReader question/'leaking' file descriptors ?

Posted by Doug Cutting <cu...@apache.org>.
Paul Smith wrote:
> We're using Lucene 1.4.3, and after hunting around in the source code  
> just to see what I might be missing, I came across this, and I'd just  
> like some comments.

Please try using a 1.9 build to see if this is something that's perhaps 
already been fixed.

> CompoundFileReader has an inner-class CSInputStream which is used to  
> read the stream (and we're using the Compound format, so this is  
> relevant here).
> 
> However, it overrides InputStream.close() but does not call
> super.close().  After tracing around where this is all used, I believe
> that this method REALLY SHOULD be calling super.close() (or not
> overriding it), because CompoundFileReader will be given an InputStream
> to wrap, eventually coming down to FSInputStream, which apparently then
> calls Descriptor.close().

No, all CSInputStreams share a single FSInputStream, so the
FSInputStream shouldn't be closed until all of the CSInputStreams have
been closed.  This is done by CompoundFileReader.close().  It sounds
like that's what's not getting called.  As you update indexes, how do
you close stale readers?

Doug
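
The ownership pattern described in the reply above (many sub-streams
sharing one underlying file, with only the owning reader closing it)
might look roughly like the sketch below. This is not the Lucene 1.4.3
source: SharedFileReader, Slice, and openSlice are hypothetical names,
and RandomAccessFile stands in for FSInputStream.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical stand-in for a compound-file reader: it owns the one real
// file handle, and hands out lightweight slices that read through it.
class SharedFileReader implements Closeable {
    private final RandomAccessFile file; // the single shared descriptor
    private boolean closed = false;

    SharedFileReader(String path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    // Returns a view over bytes [offset, offset + length) of the shared file.
    Slice openSlice(long offset, long length) {
        return new Slice(offset, length);
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // Only the owner closes the underlying file, once, for all slices.
    @Override
    public synchronized void close() throws IOException {
        if (!closed) {
            closed = true;
            file.close();
        }
    }

    final class Slice implements Closeable {
        private final long offset;
        private final long length;
        private long position = 0;

        Slice(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }

        // Reads the next byte of this slice, or -1 at the end of the slice.
        int read() throws IOException {
            if (position >= length) {
                return -1;
            }
            synchronized (SharedFileReader.this) {
                file.seek(offset + position);
                int b = file.read();
                if (b >= 0) {
                    position++;
                }
                return b;
            }
        }

        // Deliberately a no-op: other slices still depend on the shared
        // file, so closing one slice must not close the descriptor.
        @Override
        public void close() {
        }
    }
}
```

In this arrangement, forgetting to close the owning reader leaves the
descriptor open no matter how carefully the individual slices are
closed.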
