Posted to java-user@lucene.apache.org by Paul Smith <ps...@aconex.com> on 2006/02/13 07:43:15 UTC
CompoundFileReader question/'leaking' file descriptors ?
I've been hunting an insidious problem: during heavy incremental
indexing operations in production on a Red Hat EL3 machine, I notice
that the java process has a lot of open files which appear to be
deleted.
Now, before anyone jumps in, yes I know the open file limit needs
to be raised, and I've done that (it's at a hideous 16000 at the
moment..). I've also verified that writers/readers/searchers get
closed when they should (finally blocks etc).
Using the 'lsof' command to track the open files, we see tonnes of
these entries:
[root@index1 logs]# lsof -p `ps -efww | grep '[m]el.xml' | awk '{print $2}'` | grep deleted | head
java 23749 root 120r REG 8,3 17507 61079633 /aconex/index/current/project/39/56/0000025639/corr/000001/_dga.cfs (deleted)
java 23749 root 121r REG 8,3 21775 61079684 /aconex/index/current/project/39/56/0000025639/corr/000001/_dlc.cfs (deleted)
java 23749 root 123r REG 8,3 17507 61079728 /aconex/index/current/project/39/56/0000025639/corr/000001/_dq4.cfs (deleted)
......
What is REALLY weird is that they eventually do get released. And
scarily enough, it seems to track with when the garbage collector
does a major collection (we managed to figure this out using the
YourKit profiler and forcing a GC), and lo, they disappear... We
have many indexes (2000, one for each project-entity) rather than one
UberIndex, so indexes leaking file handles is much more noticeable.
We're using Lucene 1.4.3, and after hunting around in the source code
just to see what I might be missing, I came across this, and I'd just
like some comments.
CompoundFileReader has an inner-class CSInputStream which is used to
read the stream (and we're using the Compound format, so this is
relevant here).
However it overrides InputStream.close(), but does not call
super.close(). After tracing around where this is all used I believe
that this method REALLY SHOULD be calling super.close() (or not
overriding it), because CompoundFileReader will be given an
InputStream to wrap, eventually coming down to FSInputStream which
apparently then calls Descriptor.close().
Scarily enough this ends up calling RandomAccessFile.close, which
goes into native library calls and, presumably, closes the file.
The guard here is that the finalizer method in FSInputStream does
call close() so that would well explain the releasing of file handles
at garbage collection intervals.
Why would CompoundFileReader not need to call .close()?
Am I going mad here and just seeing ghosts? Comments appreciated.
Paul Smith
Re: CompoundFileReader question/'leaking' file descriptors ?
Posted by Doug Cutting <cu...@apache.org>.
Paul Smith wrote:
> is 1.9 binary backward compatible? (both source code and index format).
That is the intent. Try a nightly build:
http://cvs.apache.org/dist/lucene/java/nightly/
Doug
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: CompoundFileReader question/'leaking' file descriptors ?
Posted by Paul Smith <ps...@aconex.com>.
On 14/02/2006, at 7:44 AM, Doug Cutting wrote:
> Paul Smith wrote:
>> We're using Lucene 1.4.3, and after hunting around in the source
>> code just to see what I might be missing, I came across this, and
>> I'd just like some comments.
>
> Please try using a 1.9 build to see if this is something that's
> perhaps already been fixed.
>
is 1.9 binary backward compatible? (both source code and index format).
>> CompoundFileReader has an inner-class CSInputStream which is used
>> to read the stream (and we're using the Compound format, so this
>> is relevant here).
>> However it overrides InputStream.close(), but does not call
>> super.close(). After tracing around where this is all used I
>> believe that this method REALLY SHOULD be calling super.close()
>> (or not overriding it), because CompoundFileReader will be given
>> an InputStream to wrap, eventually coming down to FSInputStream
>> which apparently then calls Descriptor.close().
>
> No, all CSInputStreams share a single FSInputStream, so the
> FSInputStream shouldn't be closed until all of the CSInputStreams
> have been closed. This is done by CompoundFileReader.close(). It
> sounds like that's what's not getting called. As you update
> indexes, how do you close stale readers?
Yes, after looking at the code again I see that it extends from a
Lucene class called InputStream, and I had assumed it was
java.io.InputStream.
I'm going to take yet another look at our code, but nothing is
obvious. Our current production environment is running in a
side-by-side mode. There are no searches being performed, only
indexing, as we test out our indexing support/monitoring and
fail-over techniques.
Paul
Re: CompoundFileReader question/'leaking' file descriptors ?
Posted by Paul Smith <ps...@aconex.com>.
>
> No, all CSInputStreams share a single FSInputStream, so the
> FSInputStream shouldn't be closed until all of the CSInputStreams
> have been closed. This is done by CompoundFileReader.close(). It
> sounds like that's what's not getting called. As you update
> indexes, how do you close stale readers?
Total false alarm (as even I expected this would be). After digging
further and further around our code I have discovered that the
Incremental Indexer does a search of the index to find related
records to delete (when preparing for an update of an item, it needs
to know what child items to also delete; in our case, which
Distributions of a mail need to be removed on a mail update).
This search ends up creating an IndexSearcher which is held open for
a short time. A cache of IndexSearchers is used to smooth out lots
of change, and each searcher is held open for at least 15 seconds to
ensure any existing queries get a chance to complete (the index
searcher goes into a 'wait' queue to be closed).
This search of course is what is holding the files open. The search
is done just prior to the delete, which is why the segment is marked
for deletion. Once the wait time has passed and the IndexSearcher is
closed, the file descriptor is released.
Sorry for the intrusion.
cheers,
Paul Smith
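
For readers following along, the delayed-close behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Aconex code: DelayedCloser, retire() and shutdownAndWait() are hypothetical names, and a plain java.io.Closeable stands in for the IndexSearcher. The point is that anything retired this way keeps its files open (and, if the files have been deleted meanwhile, shows up as "(deleted)" in lsof) until the grace period has passed.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: closes retired resources only after a grace period,
// so that in-flight users get a chance to finish first.
class DelayedCloser {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long graceMillis;

    DelayedCloser(long graceMillis) {
        this.graceMillis = graceMillis;
    }

    // Queue a stale resource; its files stay open until the delay elapses.
    void retire(final Closeable stale) {
        scheduler.schedule(() -> {
            try {
                stale.close();
            } catch (IOException e) {
                System.err.println("close failed: " + e);
            }
        }, graceMillis, TimeUnit.MILLISECONDS);
    }

    // Stop accepting new work and wait for the pending closes to run.
    boolean shutdownAndWait(long timeout, TimeUnit unit) throws InterruptedException {
        scheduler.shutdown();
        return scheduler.awaitTermination(timeout, unit);
    }
}

class DelayedCloserDemo {
    public static void main(String[] args) throws InterruptedException {
        DelayedCloser closer = new DelayedCloser(500);
        // The lambda stands in for a searcher's close() method.
        closer.retire(() -> System.out.println("stale resource closed"));
        System.out.println("retired, still open during the grace period");
        closer.shutdownAndWait(5, TimeUnit.SECONDS);
    }
}
```

With a 15 second grace period and many small indexes being updated, a fair number of such handles can be pending at any moment, which would match the lsof output earlier in the thread.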
Re: CompoundFileReader question/'leaking' file descriptors ?
Posted by Doug Cutting <cu...@apache.org>.
Paul Smith wrote:
> We're using Lucene 1.4.3, and after hunting around in the source code
> just to see what I might be missing, I came across this, and I'd just
> like some comments.
Please try using a 1.9 build to see if this is something that's perhaps
already been fixed.
> CompoundFileReader has an inner-class CSInputStream which is used to
> read the stream (and we're using the Compound format, so this is
> relevant here).
>
> However it overrides InputStream.close(), but does not call
> super.close(). After tracing around where this is all used I believe
> that this method REALLY SHOULD be calling super.close() (or not
> overriding it), because CompoundFileReader will be given an InputStream
> to wrap, eventually coming down to FSInputStream which apparently then
> calls Descriptor.close().
No, all CSInputStreams share a single FSInputStream, so the
FSInputStream shouldn't be closed until all of the CSInputStreams have
been closed. This is done by CompoundFileReader.close(). It sounds
like that's what's not getting called. As you update indexes, how do
you close stale readers?
Doug
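
The ownership pattern described in this reply can be sketched in plain Java. This is a minimal illustration, not Lucene's source: SharedFileReader, SubStream and openSlice() are hypothetical names, standing in loosely for CompoundFileReader, CSInputStream and the shared FSInputStream. It assumes the sub-streams only ever read byte slices of one underlying file.

```java
import java.io.Closeable;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;

// Hypothetical stand-in for a compound-file reader: it owns the single
// OS-level file handle that every sub-stream reads through.
class SharedFileReader implements Closeable {
    private final RandomAccessFile file;
    private boolean closed = false;

    SharedFileReader(File path) throws IOException {
        this.file = new RandomAccessFile(path, "r");
    }

    // Hand out a view onto the byte range [offset, offset + length).
    SubStream openSlice(long offset, long length) {
        return new SubStream(offset, length);
    }

    synchronized boolean isClosed() {
        return closed;
    }

    // The only place where the descriptor is released.
    @Override
    public synchronized void close() throws IOException {
        if (!closed) {
            closed = true;
            file.close();
        }
    }

    // A lightweight view; it never owns the underlying handle.
    class SubStream implements Closeable {
        private final long start;
        private final long length;
        private long position = 0;

        SubStream(long start, long length) {
            this.start = start;
            this.length = length;
        }

        // Returns the next byte of the slice, or -1 at the end of the slice.
        int read() throws IOException {
            if (position >= length) {
                return -1;
            }
            synchronized (SharedFileReader.this) {
                file.seek(start + position);
                int b = file.read();
                if (b >= 0) {
                    position++;
                }
                return b;
            }
        }

        // Deliberately a no-op: other sub-streams may still be reading.
        @Override
        public void close() {
        }
    }
}

class SharedHandleDemo {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("shared", ".bin");
        f.deleteOnExit();
        Files.write(f.toPath(), new byte[] {10, 20, 30, 40});

        SharedFileReader reader = new SharedFileReader(f);
        SharedFileReader.SubStream first = reader.openSlice(0, 2);
        SharedFileReader.SubStream second = reader.openSlice(2, 2);

        first.close();                      // does not release the descriptor
        System.out.println(second.read());  // still readable via the shared handle
        reader.close();                     // this is what releases the descriptor
    }
}
```

If the sub-stream's close() forwarded to the shared handle instead, closing any one slice would break every other slice still in use, which is why the release has to happen in the owning reader.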