Posted to java-user@lucene.apache.org by "Wilton, Reece" <Re...@dig.com> on 2003/10/07 18:47:27 UTC
Too Many Open Files
Hi,
The index directory that Lucene created has 2,322 files in it. When I
try to open it I get the dreaded "Too Many Open Files" problem:
java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
files)
The index has about 50,000 docs in it. It was created with a merge
factor of 5,000. Is there a way that I can reduce the number of files
or increase the number of files that Windows can open?
Any help is appreciated!
Reece
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org
Re: Too Many Open Files
Posted by Doug Cutting <cu...@lucene.com>.
Wilton, Reece wrote:
> The index directory that Lucene created has 2,322 files in it. When I
> try to open it I get the dreaded "Too Many Open Files" problem:
> java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
> files)
>
> The index has about 50,000 docs in it. It was created with a merge
> factor of 5,000. Is there a way that I can reduce the number of files
> or increase the number of files that Windows can open?
5000 is way too large for the merge factor. Please read the FAQ and
other messages on this list for guidelines. I've personally never found
use for a merge factor larger than 50.
Doug
document score for a page
Posted by Maurice Coyle <ma...@ucd.ie>.
hi,
if i have an index containing a page and i want to know the score that page
has for a given query, is there a way of finding out the score without
performing a search? it seems like a strange question but the reason is
that if i perform a search using the given query on my index, sometimes the
page i want is not returned in the Hits object and so i can't find out the
score of the page.
i'm performing some analysis of search results so for each query i need to
know the score for every page in the index for that query, even if it's not
returned when a search is performed on the query. will i need to implement
a method to do this, or does one already exist?
i guess as an addendum to this question it would be useful for me to know
when lucene decides to stop returning results. is it just when all pages
containing the query term have been returned?
maurice
Re: Document-Document similarity
Posted by Steve Rowe <sa...@gwmail.syr.edu>.
Maurice,
Why not perform document-as-query? That is, parse a document to
produce a query, submit the query, and get a list of documents ranked
by similarity.
Are you trying to do clustering? Write a custom analyzer which saves
the analysis of each document as it's parsed for the indexing process,
then iterate through all of the documents, submit each as a query, and
collect the results.
Or pseudo-relevance feedback? Re-parse the top N documents resulting
from a given query, bundle up the results as another query, then
recombine the scores after you weight the components (Rocchio's
formula; the full thing also involves a negatively reinforcing
component -- re-parse the bottom M documents resulting from the
initial query, package as another query, then use a negative weight
when combining with other components' scores -- but this step doesn't
seem to contribute positively in a reliable fashion to the overall
outcome).
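
As a rough illustration of the feedback step described above, here is the textbook vector form of Rocchio's formula in plain Java. It is only a sketch and not Lucene API: the class name RocchioSketch, the term-weight maps, and the alpha/beta/gamma values are assumptions made for the example, and it reweights query terms rather than recombining the scores of separate queries. Terms that end up with a non-positive weight are dropped, which is a common simplification.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RocchioSketch {
    // new query = alpha * query
    //           + beta  * mean(relevant docs)
    //           - gamma * mean(non-relevant docs)
    static Map<String, Double> expand(Map<String, Double> query,
                                      List<Map<String, Double>> relevant,
                                      List<Map<String, Double>> nonRelevant,
                                      double alpha, double beta, double gamma) {
        Map<String, Double> result = new HashMap<>();
        addScaled(result, query, alpha);
        // Positive component: the top N documents from the initial query.
        for (Map<String, Double> doc : relevant) {
            addScaled(result, doc, beta / relevant.size());
        }
        // Negative component: the bottom M documents, weighted negatively.
        for (Map<String, Double> doc : nonRelevant) {
            addScaled(result, doc, -gamma / nonRelevant.size());
        }
        // Drop terms whose combined weight is zero or negative.
        result.values().removeIf(w -> w <= 0.0);
        return result;
    }

    // Adds factor * source to target, term by term.
    static void addScaled(Map<String, Double> target,
                          Map<String, Double> source, double factor) {
        for (Map.Entry<String, Double> e : source.entrySet()) {
            target.merge(e.getKey(), e.getValue() * factor, Double::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Double> query = Map.of("lucene", 1.0);
        List<Map<String, Double>> top = List.of(
                Map.of("lucene", 1.0, "index", 2.0),
                Map.of("index", 2.0));
        List<Map<String, Double>> bottom = List.of(Map.of("spam", 1.0));
        System.out.println(expand(query, top, bottom, 1.0, 0.5, 0.25));
    }
}
```

Setting gamma to 0 disables the negative component, which matches the observation above that it does not reliably help.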
Steve Rowe
Maurice Coyle wrote:
> does anyone know of a way to get the similarity between two documents as
> opposed to between a document and a query? at the moment, i'm forced to
> make a term-frequency vector for each document and get the cosine of the
> angle between them, but i was hoping there was a more elegant way of doing
> this using either the lucene api (although from my study of it it doesn't
> look like this is the case) or some other class library that another lucene
> user has created.
>
> any help much appreciated.
>
> maurice
Document-Document similarity
Posted by Maurice Coyle <ma...@ucd.ie>.
hi,
does anyone know of a way to get the similarity between two documents as
opposed to between a document and a query? at the moment, i'm forced to
make a term-frequency vector for each document and get the cosine of the
angle between them, but i was hoping there was a more elegant way of doing
this using either the lucene api (although from my study of it it doesn't
look like this is the case) or some other class library that another lucene
user has created.
any help much appreciated.
maurice
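
For reference, the term-frequency and cosine approach described in this message can be sketched in plain Java as follows. This is a minimal illustration, not Lucene API: the class name CosineSimilarity, the whitespace tokenizer, and the use of raw term counts (no idf weighting, stemming, or stop-word removal) are all assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

class CosineSimilarity {
    // Builds a term-frequency vector from lowercased, whitespace-separated text.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Cosine of the angle between two term-frequency vectors.
    // Returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("lucene index search");
        Map<String, Integer> d2 = termFrequencies("lucene index merge");
        System.out.println(cosine(d1, d2));
    }
}
```

Documents that share no terms score 0, and a document compared with itself scores 1 (up to floating-point rounding).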
Re: Too Many Open Files
Posted by Rociel Buico <bu...@yahoo.com>.
Check your code: if you are opening a file, make sure you close it after using it.
--buics
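
A minimal sketch of the "close it after using" advice in modern Java, using try-with-resources. The Handle class here is a hypothetical stand-in for whatever is being opened (a file stream, an index reader or searcher); it is not a Lucene class, and the counter exists only to make leaked handles visible.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CloseAfterUse {
    // Number of handles currently open; a leak shows up as a growing count.
    static final AtomicInteger OPEN_HANDLES = new AtomicInteger();

    // Hypothetical stand-in for a file stream or index reader.
    static final class Handle implements AutoCloseable {
        private final String name;

        Handle(String name) {
            this.name = name;
            OPEN_HANDLES.incrementAndGet();
        }

        int read() {
            return name.length();
        }

        @Override
        public void close() {
            OPEN_HANDLES.decrementAndGet();
        }
    }

    // The handle is closed when the try block exits, even if read() throws.
    static int useAndClose(String name) {
        try (Handle h = new Handle(name)) {
            return h.read();
        }
    }

    // Anti-pattern: the handle is never closed, so it stays open.
    static int useAndLeak(String name) {
        Handle h = new Handle(name);
        return h.read();
    }

    public static void main(String[] args) {
        for (int i = 0; i < 5000; i++) {
            useAndClose("segment" + i);
        }
        System.out.println("open after try-with-resources: " + OPEN_HANDLES.get());
        for (int i = 0; i < 5000; i++) {
            useAndLeak("segment" + i);
        }
        System.out.println("open after leaking: " + OPEN_HANDLES.get());
    }
}
```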
"Wilton, Reece" <Re...@dig.com> wrote:
> Hi,
> The index directory that Lucene created has 2,322 files in it. When I
> try to open it I get the dreaded "Too Many Open Files" problem:
> java.io.FileNotFoundException: C:\Index\_1lvq.f107 (Too many open
> files)
> The index has about 50,000 docs in it. It was created with a merge
> factor of 5,000. Is there a way that I can reduce the number of files
> or increase the number of files that Windows can open?
> Any help is appreciated!
> Reece