
Posted to java-user@lucene.apache.org by Declan Newman <go...@gmail.com> on 2008/07/14 20:46:57 UTC

MultiSearcher and TopFieldDocCollector

Hi,

I'm in the process of trying to optimize searches and avoid the dreaded
OutOfMemoryErrors.

We currently return the entire document from each of the search results 
and then filter the results using parameters obtained from a database. 
Not very efficient.

The idea was to override TopFieldDocCollector to do the sorting etc. and 
only load the full document for those we need to display. But, I haven't 
found an easy way to use TopFieldDocCollector (FieldSortedHitQueue etc.) 
with MultiSearcher.

Is there an easy way I've missed?

Thanks for any advice.

Declan

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: MultiSearcher and TopFieldDocCollector

Posted by Chris Hostetter <ho...@fucit.org>.

: The idea was to override TopFieldDocCollector to do the sorting etc. and only
: load the full document for those we need to display. But, I haven't found an
: easy way to use TopFieldDocCollector (FieldSortedHitQueue etc.) with
: MultiSearcher.

I don't understand this statement ... I mean, I haven't ever used
MultiSearcher, but it has a search(Weight,Filter,HitCollector) method ...
what exactly is the problem you are facing?
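
For illustration only, here is a library-free sketch of the collector idea being discussed: keep just the best N (doc id, sort key) pairs while hits stream in, and load full documents only for those N afterwards. TopNCollector, Hit, and the long sort key are hypothetical names and simplifications, not Lucene classes; the collect callback only loosely mirrors the shape of a HitCollector, and sorting on a stored field value is assumed to be available as a plain number.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical stand-in for a hit collector (not a Lucene class).
// It sees (doc id, sort key) pairs and retains only the best numHits.
final class TopNCollector {

    /** One retained hit: a document id plus the value it is sorted by. */
    static final class Hit {
        final int doc;
        final long sortKey;

        Hit(int doc, long sortKey) {
            this.doc = doc;
            this.sortKey = sortKey;
        }
    }

    private final int numHits;
    // Min-heap on sortKey, so the weakest retained hit is at the head.
    private final PriorityQueue<Hit> queue;

    TopNCollector(int numHits) {
        if (numHits < 1) {
            throw new IllegalArgumentException("numHits must be >= 1");
        }
        this.numHits = numHits;
        this.queue = new PriorityQueue<>(
                numHits, Comparator.comparingLong((Hit h) -> h.sortKey));
    }

    /** Called once per matching document; no stored fields are loaded here. */
    void collect(int doc, long sortKey) {
        if (queue.size() < numHits) {
            queue.add(new Hit(doc, sortKey));
        } else if (sortKey > queue.peek().sortKey) {
            queue.poll();                      // evict the weakest retained hit
            queue.add(new Hit(doc, sortKey));
        }
    }

    /** Returns the retained doc ids, highest sort key first. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>(queue);
        hits.sort(Comparator.comparingLong((Hit h) -> h.sortKey).reversed());
        List<Integer> ids = new ArrayList<>();
        for (Hit h : hits) {
            ids.add(h.doc);
        }
        return ids;
    }
}
```

Memory use is bounded by numHits rather than by the number of matches, and only the ids returned by topDocs() would need their full documents loaded for display.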


-Hoss



Re: Stable score scaling; LSI again

Posted by Asad Sayeed <ab...@us.ibm.com>.

In other words, for my first question, what I want to know is how I might
consistently and correctly get the same max score for any two pairs of
identical documents without having to rewrite major parts of Lucene. I
could find ALL the scores and divide them by the max, but that seems
somehow wrong and not robust, especially since if I put the identical
documents several times into the index, I get slightly different scores
from a MoreLikeThis query.
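
A minimal sketch of the "divide by the max" idea, mainly to show why it may not give a stable threshold. MaxNormalizer is a hypothetical helper, not part of Lucene, and the raw scores are assumed to be non-negative floats as returned by a search.

```java
// Hypothetical helper (not a Lucene class): rescales a result set so that
// its best hit scores exactly 1.0.
final class MaxNormalizer {

    /** Returns a new array with every score divided by the largest score. */
    static float[] normalize(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        if (max <= 0f) {
            return out;                 // nothing to scale; all zeros
        }
        for (int i = 0; i < scores.length; i++) {
            out[i] = scores[i] / max;
        }
        return out;
    }
}
```

With this scheme the raw scores {2.0, 1.0} and {0.5, 0.25} both normalize to {1.0, 0.5}, so the top hit always looks like an exact match no matter how weak it was. That is one way to see why the result is hard to threshold across queries.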

Yours,
--Asad.


                                                                           
From: Asad Sayeed/Watson/IBM@IBMUS
Date: 07/14/2008 10:15 PM
To: java-user@lucene.apache.org
Subject: Stable score scaling; LSI again
Please respond to java-user@lucene.apache.org

Hi, I have a couple of questions about how to alter the similarity scores.
I need scores that can be thresholded, and whose thresholds remain stable
even when I add documents to the IndexWriter, i.e., identity should be a
fixed value such as 1.0. I know that for efficiency reasons, Lucene
doesn't do this. However, that level of efficiency is not as big a concern
for me as getting a stable, thresholdable similarity score from, e.g.,
"normal" cosine similarity. Is there a way to change the DefaultSimilarity
trivially to get this feature, or is it a major overhaul? The reason is that
the search results from Lucene are fed to another analyzer, so when the
"identity" score changes as docs are added to the index, it messes up the
rest of the processing.

The other question I had was about scoring via Latent Semantic Indexing.  I
read in the archives of this list from way back when that LSI was hard to
integrate into Lucene.  Is that still the case?  I mean, from what I
understand, it is just transforming the index in some way.

Yours,
--Asad.



Stable score scaling; LSI again

Posted by Asad Sayeed <ab...@us.ibm.com>.

Hi, I have a couple of questions about how to alter the similarity scores.
I need scores that can be thresholded, and whose thresholds remain stable
even when I add documents to the IndexWriter, i.e., identity should be a
fixed value such as 1.0. I know that for efficiency reasons, Lucene
doesn't do this. However, that level of efficiency is not as big a concern
for me as getting a stable, thresholdable similarity score from, e.g.,
"normal" cosine similarity. Is there a way to change the DefaultSimilarity
trivially to get this feature, or is it a major overhaul? The reason is that
the search results from Lucene are fed to another analyzer, so when the
"identity" score changes as docs are added to the index, it messes up the
rest of the processing.
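
To make the "normal" cosine similarity mentioned above concrete, here is a small standalone sketch over raw term-frequency vectors. It is not Lucene's scoring and does not use the Similarity API; CosineSimilarity, termFrequencies, and the whitespace tokenization are assumptions made for illustration. Because it uses no corpus-wide statistics, two identical documents score 1.0 regardless of what else has been indexed.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical standalone helper (not a Lucene class): plain cosine
// similarity between two bags of words.
final class CosineSimilarity {

    /** Counts lower-cased, whitespace-separated tokens. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Cosine of the angle between two term-frequency vectors, in [0, 1]. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            double va = e.getValue();
            normA += va * va;
            Integer vb = b.get(e.getKey());
            if (vb != null) {
                dot += va * vb;         // only shared terms contribute
            }
        }
        for (int v : b.values()) {
            normB += (double) v * v;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;                 // an empty document matches nothing
        }
        return dot / Math.sqrt(normA * normB);
    }
}
```

Comparing a document with itself gives 1.0 and comparing documents with no terms in common gives 0.0, which is the kind of fixed scale a downstream threshold needs. The trade-off is that term rarity (idf) plays no part in the score.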

The other question I had was about scoring via Latent Semantic Indexing.  I
read in the archives of this list from way back when that LSI was hard to
integrate into Lucene.  Is that still the case?  I mean, from what I
understand, it is just transforming the index in some way.

Yours,
--Asad.

