Posted to solr-dev@lucene.apache.org by "jayson.minard" <ja...@gmail.com> on 2009/01/21 08:58:12 UTC

Intersecting a precalculated OpenBitSet with results

Hello.

We are currently intersecting a precalculated OpenBitSet (one that does not
affect score) with the search results by wrapping it in a Filter and a
ConstantScoreQuery and adding that to the filters in the ResponseBuilder.  We
do this just before QueryComponent does its job.

The code looks like this:



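    // rb is the ResponseBuilder; obs is the precalculated (final) OpenBitSet.
    List<Query> filters = rb.getFilters();
    if (filters == null) {
      filters = new ArrayList<Query>();
      rb.setFilters(filters);
    }
    // OpenBitSet is itself a DocIdSet, so the filter can return it directly.
    Filter filter = new Filter() {
      @Override
      public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
        return obs;
      }
    };
    filters.add(new ConstantScoreQuery(filter));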



This works as expected and is a common solution mentioned here on the forum.

But it does more work than needed.  Wouldn't it be better to just intersect
this set with the results after they are determined?  To do this,
QueryComponent needs a place to allow intersections before it stuffs the
result into the response.

Is it a correct assumption that we can do that?

Old code...



    SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
    cmd.setTimeAllowed(timeAllowed);
    SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
    searcher.search(result,cmd);
    rb.setResult( result );



New code, assuming the bitsets were stored in a (new) ResponseBuilder
getIntersections() list... (but I'm not sure what to do with a DocList)



    SolrIndexSearcher.QueryCommand cmd = rb.getQueryCommand();
    cmd.setTimeAllowed(timeAllowed);
    SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
    searcher.search(result,cmd);
    if (rb.getIntersections() != null) {
      for (OpenBitSet bitsetToApply : rb.getIntersections()) {
        if (result.getDocSet() != null) {
          result.setDocSet(result.getDocSet().intersection(new BitDocSet(bitsetToApply)));
        }
        else if (result.getDocList() != null) {
          result.setDocSet(result.getDocList().intersection(new BitDocSet(bitsetToApply)));
          result.setDocList(null); // can this change on the fly like this?
        }
        else {
          throw new RuntimeException("unexpected: result contains neither a DocSet nor a DocList");
        }
      }
    }
    rb.setResult( result );



Or is this the wrong path to go down?  

Any advice is helpful; most likely this would need a patch to make it
happen.

Re: Intersecting a precalculated OpenBitSet with results

Posted by Chris Hostetter <ho...@fucit.org>.
: We currently are intersecting a precalculated OpenBitSet (that does not
: affect score) with search by adding it to the filters in the ResponseBuilder
: via a wrapped in a ConstantScoreQuery and Filter.  We do this just before
: QueryComponent does its job.   

	...

: But, it does more work than needed.  Wouldn't it be better to just intersect
: this set with the results after they are determined?  To do this, Query
: Component needs a place to allow intersections before it stuffs the response
: in the response.  

I think a simpler approach might be to write a component that runs after 
QueryComponent which:
 * extracts the doclist/docset currently in the response (put there by QueryComponent)
 * intersects them with the computed bitsets
 * writes the new doclist/docset back into the response
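
The three steps above can be sketched roughly as follows. This is only a
stand-in written against plain Java: java.util.BitSet plays the role of both
the DocSet in the response and the precalculated OpenBitSet, and the
PostQueryIntersection and Response names are hypothetical, not Solr API. A
real component would work with the ResponseBuilder and Solr's DocSet instead.

```java
import java.util.Arrays;
import java.util.BitSet;

// Stand-in sketch only: none of these names are Solr classes.
class PostQueryIntersection {

    /** Hypothetical holder for the matched documents in the response. */
    static final class Response {
        BitSet docSet;
        Response(BitSet docSet) { this.docSet = docSet; }
    }

    /** Runs after the query step: extract, intersect, write back. */
    static void process(Response rsp, Iterable<BitSet> precomputed) {
        // Extract a copy so the set produced by the query step is not mutated.
        BitSet current = (BitSet) rsp.docSet.clone();
        for (BitSet bits : precomputed) {
            current.and(bits); // intersect with each precalculated bitset
        }
        rsp.docSet = current;  // write the narrowed set back into the response
    }

    public static void main(String[] args) {
        BitSet matched = new BitSet();
        matched.set(1); matched.set(4); matched.set(7); matched.set(9);
        BitSet allowed = new BitSet();
        allowed.set(4); allowed.set(5); allowed.set(9);

        Response rsp = new Response(matched);
        process(rsp, Arrays.asList(allowed));
        System.out.println(rsp.docSet); // prints {4, 9}
    }
}
```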


However: I wouldn't actually assume that's better than your current 
approach of wrapping the OpenBitSet into a ConstantScoreQuery -- your 
current approach is probably helping the IndexSearcher "skip" past a 
lot of docs w/o scoring them, but applying the intersection after the fact 
would eliminate that optimization.
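
As a toy illustration of that tradeoff (not how IndexSearcher is actually
implemented), the sketch below just counts how many documents would get
scored in each case. The FilterTimingSketch class and its methods are
hypothetical, and java.util.BitSet stands in for OpenBitSet.

```java
import java.util.BitSet;

// Toy model only: counts "scoring" calls, it does not score anything.
class FilterTimingSketch {

    /** Filter consulted during the search: rejected docs are never scored. */
    static int scoredWithFilter(BitSet matches, BitSet filter) {
        int scored = 0;
        for (int doc = matches.nextSetBit(0); doc >= 0; doc = matches.nextSetBit(doc + 1)) {
            if (!filter.get(doc)) {
                continue; // skipped without scoring
            }
            scored++;
        }
        return scored;
    }

    /** Intersection applied afterwards: every matching doc is scored first. */
    static int scoredAll(BitSet matches) {
        return matches.cardinality();
    }

    public static void main(String[] args) {
        BitSet matches = new BitSet();
        matches.set(0, 1000);              // docs 0..999 match the query
        BitSet filter = new BitSet();
        for (int i = 0; i < 1000; i += 10) {
            filter.set(i);                 // the bitset allows every 10th doc
        }
        System.out.println(scoredWithFilter(matches, filter) + " vs " + scoredAll(matches));
        // prints 100 vs 1000
    }
}
```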

It's also very hard to compute a DocList from an intersection after the 
fact ... you essentially need to re-execute the search anyway since the 
DocList only keeps track of a set number of sorted documents.
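
A small sketch of why that is, using made-up scores and plain Java
collections rather than Solr's DocList (the TopNTruncationSketch class is
hypothetical): filtering a top-3 page after the fact leaves one document,
while filtering before ranking yields a full page of three.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Stand-in sketch: a List<Integer> of doc ids plays the role of a DocList.
class TopNTruncationSketch {

    /** Ranks docs by score (descending) and keeps the top n; a null filter allows all docs. */
    static List<Integer> topN(double[] scores, BitSet filter, int n) {
        List<Integer> docs = new ArrayList<>();
        for (int d = 0; d < scores.length; d++) {
            if (filter == null || filter.get(d)) {
                docs.add(d);
            }
        }
        docs.sort((a, b) -> Double.compare(scores[b], scores[a]));
        return new ArrayList<>(docs.subList(0, Math.min(n, docs.size())));
    }

    /** Intersects an already-truncated page with the filter. */
    static List<Integer> postFilter(List<Integer> page, BitSet filter) {
        List<Integer> kept = new ArrayList<>();
        for (int d : page) {
            if (filter.get(d)) {
                kept.add(d);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.5, 0.4}; // doc id = array index
        BitSet filter = new BitSet();
        filter.set(1); filter.set(3); filter.set(5);

        List<Integer> page = topN(scores, null, 3);      // [0, 1, 2]
        System.out.println(postFilter(page, filter));    // prints [1]
        System.out.println(topN(scores, filter, 3));     // prints [1, 3, 5]
    }
}
```

Docs 3 and 5 pass the filter but were never in the truncated page, so the
only way to get them back is to run the ranking again with the filter applied.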




-Hoss