You are viewing a plain text version of this content. The canonical link for it is here.

Posted to java-user@lucene.apache.org by pb...@gmx.net on 2007/04/28 01:03:23 UTC

How to index a lot of fields (without FileNotFoundException: Too many open files)

Hello,

What would be the best strategy to support an index with thousands or even hundreds of thousands of individual field names?

I have client applications that create a lot of key/value type data. I use the key as document field name so I end up with _a lot_ of .f<m> files and
eventually the my application breaks with: FileNotFoundException: Too many open files.

I searched this list and it seems like others had this problem before, but I could not find a solution.

>From what I read in the Lucene docs, these .f<m> files store the normalization factor for the corresponding field. What exactly is this used for and more importantly, can this be disabled so that the files are not created in the first place?

Any advice is appreciated.

Thanks,
Rico.


-- 
"Feel free" - 10 GB Mailbox, 100 FreeSMS/Monat ...
Jetzt GMX TopMail testen: http://www.gmx.net/de/go/topmail

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

Posted by Doron Cohen <DO...@il.ibm.com>.

Just in case norms info cannot be spared, note that since Lucene 2.1 norms
are maintained in a single file, no matter how many fields there are.

However due to a bug in 2.1 this did not prevent the too many open files
problem. This bug was already fixed but not yet released. For more details
on this fix see LUCENE-821.

Doron

Chris Hostetter <ho...@fucit.org> wrote on 27/04/2007 17:25:08:

>
> : >From what I read in the Lucene docs, these .f<m> files store the
> : normalization factor for the corresponding field. What exactly is this
> : used for and more importantly, can this be disabled so that the files
> : are not created in the first place?
>
> field norms are primarily used for length normalization, but they also
> store field and document boosts ... more info can be found in the scoring
> and Similarity docs.
>
> if you don't care about these things, there is an omitNorms option on the
> Field class that can be used to not only reduce the number of open
> files you need, but also make your index smaller on disk, and reduce your
> memory usage.  (i believe it was added in 1.9)
>
> -Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: How to index a lot of fields (without FileNotFoundException: Too many open files)

Posted by Chris Hostetter <ho...@fucit.org>.

: >From what I read in the Lucene docs, these .f<m> files store the
: normalization factor for the corresponding field. What exactly is this
: used for and more importantly, can this be disabled so that the files
: are not created in the first place?

field norms are primarily used for length normalization, but they also
store field and document boosts ... more info can be found in the scoring
and Similarity docs.

if you don't care about these things, there is an omitNorms option on the
Field class that can be used to not only reduce the number of open
files you need, but also make your index smaller on disk, and reduce your
memory usage.  (i believe it was added in 1.9)



-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org