Posted to java-user@lucene.apache.org by Dmitry Serebrennikov <dm...@earthlink.net> on 2002/05/01 22:16:48 UTC
Re: FileNotFoundException: Too many open files
PA,
> On average, there seem to be less than one hundred Lucene files per
index.
You are probably past this point by now, but since I didn't see anyone
pick up on this, I wanted to respond.
"Less then a hundred" is definetely too many files for a Lucene index,
unless you have a very large number of stored fields!
An optimized index should have about a dozen. So this either means that
you have many stored fields, or you are not calling optimize, or that,
if you are, there are unclosed IndexReader instances floating around
that are still using segments that existed before the optimization
(which replaces all segments with one new one).
About file names:
Here's the naming convention for the files in the index. This might help
you understand which kind of situation you are facing:
The index directory has the following files:
deletable - one, lists segment files that are pending deletion but
could not be deleted yet because they are still open; they are
removed once the filesystem releases them
segments - one, lists segment ids of the current set of segments
_<n>.tii - one per segment, "term index" file
_<n>.tis - one per segment, "term infos" file
_<n>.frq - one per segment, "term frequency" file
_<n>.prx - one per segment, "term positions" file
_<n>.fdx - one per segment, "field index" file
_<n>.fdt - one per segment, "field data" file
_<n>.fnm - one per segment, "field infos" file
_<n>.f<m> - one per segment per stored field, "field data" file
<n> - is the segment number, encoded using numbers and letters
<m> - is the field number, which is a unique field id in that segment.
(I realize that this is still too vague, but I haven't looked through
that code in a while, so I can't do better than "term infos" and "field
infos" right now. However, this should give you an idea of what to
expect, I think.)
An index should have 2 + n * (7 + m) files, where n is the number of
segments and m is the number of stored fields. For an optimized index
with one stored field this gives 10 files (not 100!).
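The arithmetic above can be sketched as a tiny helper. The class and
method names here are mine, purely for illustration; they are not part
of Lucene:

```java
public class IndexFileCount {
    // 2 fixed files ("segments" and "deletable"), plus, for each
    // segment, 7 fixed-extension files (.tii, .tis, .frq, .prx,
    // .fdx, .fdt, .fnm) and one ".f<m>" file per stored field.
    static int expectedFileCount(int segments, int storedFields) {
        return 2 + segments * (7 + storedFields);
    }

    public static void main(String[] args) {
        // Optimized index (one segment) with one stored field:
        System.out.println(expectedFileCount(1, 1)); // prints 10
    }
}
```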
About garbage collection:
I believe that IndexReader instances will attempt to close themselves
upon finalization, but finalization timing varies widely between VMs
and OSs. So, unless IndexReaders are closed explicitly, this might
explain why an application runs fine under Windows, but has problems
under OSX, or whatever.
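The close-explicitly pattern looks like the sketch below. To keep it
self-contained I use a plain FileReader as a stand-in for IndexReader
(IndexReader.close() is called the same way); nothing here is
Lucene-specific:

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class ExplicitClose {
    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".txt");
        FileWriter w = new FileWriter(f);
        w.write("hello");
        w.close();

        // Release the OS file handle deterministically instead of
        // waiting for the garbage collector to finalize the reader.
        FileReader reader = new FileReader(f);
        try {
            System.out.println((char) reader.read()); // prints 'h'
        } finally {
            reader.close(); // handle is freed here, on every VM
        }
        f.delete();
    }
}
```

With the close in a finally block, the handle is released even if the
read throws, which is exactly the guarantee finalization does not give.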
About the file handles:
I'm not familiar with BSD (which is the basis for OSX on which you are
having these problems, right?), so I don't know how the number of open
files is managed there. I know that on Solaris it is a per-process
setting with a "soft" limit and a "hard" limit, both adjustable by
each user, and a system-wide maximum on the "hard" limit which only
root can change.
I agree that a desktop application should not require changes to system
configuration, but it might reasonably expect a default value to be
present and it might change the soft limit (which is usually set very
low) in the startup script.
On NT, as far as I know, there is no explicit setting for the number of
open files. Rather, it is limited by the amount of available memory in a
particular NT kernel memory pool (not just the free memory on the
system). The pool size can probably be tuned, but I've found that it is
usually generous enough - more so than the Solaris settings.
If BSD is like NT in this regard (at least to some degree), the number
of open files will be determined for the entire system, so depending on
what other applications are running, your tests may produce different
results.
Good luck.
Dmitry.
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
Re: FileNotFoundException: Too many open files
Posted by petite_abeille <pe...@mac.com>.
On Wednesday, May 1, 2002, at 10:16 PM, Dmitry Serebrennikov wrote:
> "Less then a hundred" is definetely too many files for a Lucene index,
> unless you have a very large number of stored fields!
Since changing indexing strategy, I have between 12 and 20 files per
index (including deletable and segments).
> An optimized index should have about a dozen.
That's what I see for small objects (e.g. with few fields).
> So this either means that you have many stored fields
My "richest" object has around a dozen fields.
> or you are not calling optimize, or that, if you are, there are
> unclosed IndexReader instances floating around that are still using
> segments that existed before the optimization (which replaces all
> segments with one new one).
I guess I have this part under control now.
> About file names:
Thanks for the explanation :-) I mostly have _<n>.f<m> type files, as
one may expect.
> An index should have 2 + n * (7 + m) files, where n is the number of
> segments and m is the number of stored fields. For an optimized index
> with one stored field this gives 10 files (not a 100!).
It seems that I'm getting there... ;-)
> So, unless IndexReaders are closed explicitly, this might explain why
> an application runs fine under Windows, but has problems under OSX, or
> whatever.
I decided to be much more "aggressive" with all the file handles... But
I still rely heavily on the garbage collector as I'm using the reference
API extensively... Seems to work fine so far...
> I agree that a desktop application should not require changes to system
> configuration, but it might reasonably expect a default value to be
> present and it might change the soft limit (which is usually set very
> low) in the startup script.
So far, my app seems to be doing fine without having to mess around with
any system parameters... Also, it seems to be more responsive since I
have more indexes... Go figure ;-)
> Good luck.
Thanks.
PA.