Posted to solr-user@lucene.apache.org by Otis Gospodnetic <ot...@yahoo.com> on 2007/05/02 20:31:15 UTC

Custom HitCollector with SolrIndexSearcher and caching

Hi,

I have a situation where I have some external weight information that I'd like to use for the computation of the final "weighted" ranking and I'm trawling through Solr sources for a good place to plug this in.  What I have is an index in which each Document has an identifier that I can map to some numeric weight stored externally (e.g. in a text file with identifier->weight, which I read on startup).  Searches return the regular hits, but before returning the final responses, I'd like to take each hit's score and multiply it by the appropriate weight.

This sounds like a job for a custom WeightedHitCollector that does the multiplication as it gets hits' docIds and scores, and a custom WeightedRequestHandler.  Sounds right?  If so, I'm looking at SolrIndexSearcher to see what this WRHandler could call and still make use of all caching and other goodness in there.  From what I can tell so far, there is no way I can pass my custom WHCollector to any of the SISearcher methods and benefit from caching.
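
A rough sketch of what such a collector might look like, not a tested implementation. The HitCollector class below is a stand-in for Lucene's org.apache.lucene.search.HitCollector so the snippet compiles on its own. WeightedHitCollector, RecordingCollector, and the weightByDocId array are hypothetical names. The sketch assumes the identifier->weight file has already been resolved to an array indexed by Lucene docId, which is a separate step not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector, declared here only so the
// sketch compiles without Lucene on the classpath.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// Hypothetical collector: multiplies each raw score by an external
// weight before handing the hit to a delegate collector.
class WeightedHitCollector extends HitCollector {
  // Assumption: weights were pre-resolved from external identifiers
  // to Lucene docIds when the index was opened.
  private final float[] weightByDocId;
  private final HitCollector delegate;

  WeightedHitCollector(float[] weightByDocId, HitCollector delegate) {
    this.weightByDocId = weightByDocId;
    this.delegate = delegate;
  }

  @Override
  public void collect(int doc, float score) {
    // Documents without an external weight keep their original score.
    float weight = doc < weightByDocId.length ? weightByDocId[doc] : 1.0f;
    delegate.collect(doc, score * weight);
  }
}

// Minimal delegate that just records the hits it receives.
class RecordingCollector extends HitCollector {
  final List<Integer> docs = new ArrayList<>();
  final List<Float> scores = new ArrayList<>();

  @Override
  public void collect(int doc, float score) {
    docs.add(doc);
    scores.add(score);
  }
}
```

With Lucene itself, a collector like this would be passed to IndexSearcher.search(Query, HitCollector), which is exactly the path that bypasses the SolrIndexSearcher caches.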

I feel like I might be missing something, and there is in fact a way to use a custom HitCollector and benefit from caching, but I just don't see it now.

Thanks,
Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share



Re: Custom HitCollector with SolrIndexSearcher and caching

Posted by Chris Hostetter <ho...@fucit.org>.
: I feel like I might be missing something, and there is in fact a way to
: use a custom HitCollector and benefit from caching, but I just don't see
: it now.

I can't think of any easy way to do what you describe ... you can always
use the low level IndexSearcher methods with a custom HitCollector that
wraps a DocSetHitCollector and then explicitly cache the DocSet yourself,
but that doesn't really help you with the DocList ... there definitely
doesn't seem to be an *easy* way to do what you're describing at the
moment, but with a little refactoring methods like getDocListAndSet
*could* take in some sort of CompositeHitCollector class with an API
like...

   /**
    * A HitCollector whose collect method will delegate to a specified
    * HitCollector for each match it wants collected.
    */
   public abstract class CompositeHitCollector extends HitCollector {
     /** Sets the inner HitCollector that accepted matches are passed to. */
     public abstract void setComposed(HitCollector inner);
   }

...then the meat and potatoes methods of SolrIndexSearcher could take in
your custom written CompositeHitCollector, specify the anonymous inner
HitCollector it needs to use for the case it finds itself in, and now
you've got a window into the collection process where you can muck with
scores or ignore certain matches.

It would be a nontrivial change, but it would be possible.
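
To make the idea concrete, here is a minimal sketch of how the two sides might fit together. Nothing below exists in Solr today: HitCollector is a stand-in for the Lucene class so the snippet compiles on its own, WeightingCompositeCollector is a hypothetical user-written subclass, and FakeSearcher only plays the role that a refactored method like getDocListAndSet would play (it walks a fixed list of matches instead of running a query).

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for Lucene's HitCollector, so the sketch needs no Lucene jar.
abstract class HitCollector {
  public abstract void collect(int doc, float score);
}

// The proposed API from above.
abstract class CompositeHitCollector extends HitCollector {
  public abstract void setComposed(HitCollector inner);
}

// Hypothetical user-written collector: rescales scores and drops
// documents whose weight is zero or negative.
class WeightingCompositeCollector extends CompositeHitCollector {
  private final float[] weightByDocId; // assumption: indexed by docId
  private HitCollector inner;

  WeightingCompositeCollector(float[] weightByDocId) {
    this.weightByDocId = weightByDocId;
  }

  @Override
  public void setComposed(HitCollector inner) {
    this.inner = inner;
  }

  @Override
  public void collect(int doc, float score) {
    if (inner == null) {
      throw new IllegalStateException("setComposed was never called");
    }
    float weight = doc < weightByDocId.length ? weightByDocId[doc] : 1.0f;
    if (weight <= 0f) {
      return; // ignore this match entirely
    }
    inner.collect(doc, score * weight);
  }
}

// Hypothetical searcher side: installs its own inner collector, then
// feeds every raw match through the caller's composite collector.
class FakeSearcher {
  static List<String> search(int[] docs, float[] scores, CompositeHitCollector custom) {
    final List<String> collected = new ArrayList<>();
    custom.setComposed(new HitCollector() {
      @Override
      public void collect(int doc, float score) {
        // A real searcher would build its DocList/DocSet here.
        collected.add(doc + ":" + score);
      }
    });
    for (int i = 0; i < docs.length; i++) {
      custom.collect(docs[i], scores[i]);
    }
    return collected;
  }
}
```

The searcher keeps ownership of the inner collector (and so of whatever it caches), while the caller only sees each match on its way through.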




-Hoss