Posted to java-user@lucene.apache.org by "Wilton, Reece" <Re...@dig.com> on 2003/10/07 18:47:27 UTC

Too Many Open Files

Hi,

The index directory that Lucene created has 2,322 files in it.  When I
try to open it I get the dreaded "Too Many Open Files" problem:
    java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
files)

The index has about 50,000 docs in it.  It was created with a merge
factor of 5,000.  Is there a way that I can reduce the number of files
or increase the number of files that windows can open?

Any help is appreciated!
Reece

---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Re: Too Many Open Files

Posted by Doug Cutting <cu...@lucene.com>.
Wilton, Reece wrote:
> The index directory that Lucene created has 2,322 files in it.  When I
> try to open it I get the dreaded "Too Many Open Files" problem:
>     java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
> files)
> 
> The index has about 50,000 docs in it.  It was created with a merge
> factor of 5,000.  Is there a way that I can reduce the number of files
> or increase the number of files that windows can open?

5000 is way too large for the merge factor.  Please read the FAQ and 
other messages on this list for guidelines.  I've personally never found 
use for a merge factor larger than 50.

Doug




document score for a page

Posted by Maurice Coyle <ma...@ucd.ie>.
hi,

if i have an index containing a page and i want to know the score that page
has for a given query, is there a way of finding out the score without
performing a search?  it seems like a strange question but the reason is
that if i perform a search using the given query on my index, sometimes the
page i want is not returned in the Hits object and so i can't find out the
score of the page.

i'm performing some analysis of search results so for each query i need to
know the score for every page in the index for that query, even if it's not
returned when a search is performed on the query.  will i need to implement
a method to do this, or does one already exist?

i guess as an addendum to this question it would be useful for me to know
when lucene decides to stop returning results.  is it just when all pages
containing the query term have been returned?

maurice




Re: Document-Document similarity

Posted by Steve Rowe <sa...@gwmail.syr.edu>.
Maurice,

Why not perform document-as-query?  That is, parse a document to 
produce a query, submit the query, and get a list of documents ranked 
by similarity.

Are you trying to do clustering?  Write a custom analyzer which saves 
the analysis of each document as it's parsed for the indexing process, 
then iterate through all of the documents, submit each as a query, and 
collect the results.

Or pseudo-relevance feedback?  Re-parse the top N documents resulting 
from a given query, bundle up the results as another query, then 
recombine the scores after you weight the components (Rocchio's 
formula; the full thing also involves a negatively reinforcing 
component -- re-parse the bottom M documents resulting from the 
initial query, package as another query, then use a negative weight 
when combining with other components' scores -- but this step doesn't 
seem to contribute positively in a reliable fashion to the overall 
outcome).
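
For reference, the textbook vector form of Rocchio's formula is sketched below. The symbols are not from this thread and are assumptions for illustration: q_0 is the original query vector, D_r is the set of top N documents treated as relevant, D_n is the set of bottom M documents treated as non-relevant, and alpha, beta, gamma are tunable weights. The approach described above recombines scores from separate queries rather than vectors, so treat this as the underlying idea, not the exact procedure. Setting gamma to 0 drops the negative component.

```latex
\vec{q}_m = \alpha\,\vec{q}_0
          + \frac{\beta}{|D_r|} \sum_{\vec{d}_j \in D_r} \vec{d}_j
          - \frac{\gamma}{|D_n|} \sum_{\vec{d}_k \in D_n} \vec{d}_k
```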

Steve Rowe

Maurice Coyle wrote:
> does anyone know of a way to get the similarity between two documents as
> opposed to between a document and a query?  at the moment, i'm forced to
> make a term-frequency vector for each document and get the cosine of the
> angle between them, but i was hoping there was a more elegant way of doing
> this using either the lucene api (although from my study of it, it doesn't
> look like this is the case) or some other class library that another lucene
> user has created.
> 
> any help much appreciated.
> 
> maurice




Document-Document similarity

Posted by Maurice Coyle <ma...@ucd.ie>.
hi,

does anyone know of a way to get the similarity between two documents as
opposed to between a document and a query?  at the moment, i'm forced to
make a term-frequency vector for each document and get the cosine of the
angle between them, but i was hoping there was a more elegant way of doing
this using either the lucene api (although from my study of it, it doesn't
look like this is the case) or some other class library that another lucene
user has created.

any help much appreciated.

maurice
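
The term-frequency and cosine approach described above can be sketched in plain Java as follows. This is a minimal illustration, not a Lucene API: the class name `CosineSimilarity`, its methods, and the whitespace tokenization are all assumptions made for the example. A real setup would presumably reuse the same analyzer that was used for indexing.

```java
import java.util.HashMap;
import java.util.Map;

public class CosineSimilarity {

    // Build a term-frequency vector from lower-cased, whitespace-separated text.
    // (Hypothetical tokenization; a real analyzer would do more.)
    public static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    public static double norm(Map<String, Integer> v) {
        double sumOfSquares = 0.0;
        for (int freq : v.values()) {
            sumOfSquares += (double) freq * freq;
        }
        return Math.sqrt(sumOfSquares);
    }

    // Cosine of the angle between two term-frequency vectors.
    // Returns 0.0 when either vector is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += (double) entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = termFrequencies("lucene index search");
        Map<String, Integer> doc2 = termFrequencies("lucene index");
        System.out.println(cosine(doc1, doc2));
    }
}
```

Raw term counts are used here for simplicity; weighting the counts (for example by inverse document frequency) before taking the cosine is a common variation.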




Re: Too Many Open Files

Posted by Rociel Buico <bu...@yahoo.com>.
Check your code: if you are opening a file, make sure you close it after using it.
 
--buics
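
As a generic illustration of closing a file after use, here is a minimal plain-Java sketch. The class and method names are hypothetical, and try-with-resources is a newer Java feature than this thread; in older code the equivalent is calling close() in a finally block. The same habit applies to any Lucene readers or searchers the application opens.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CloseAfterUse {

    // Reads the first line of a file. The try-with-resources statement
    // closes the reader (and its file handle) when the block exits,
    // whether normally or through an exception.
    public static String firstLine(Path path) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path)) {
            return reader.readLine();
        }
    }
}
```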

"Wilton, Reece" <Re...@dig.com> wrote:
Hi,

The index directory that Lucene created has 2,322 files in it. When I
try to open it I get the dreaded "Too Many Open Files" problem:
java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
files)

The index has about 50,000 docs in it. It was created with a merge
factor of 5,000. Is there a way that I can reduce the number of files
or increase the number of files that windows can open?

Any help is appreciated!
Reece
