Posted to java-user@lucene.apache.org by John Patterson <jd...@gmail.com> on 2007/10/26 22:51:08 UTC

Cache BitSet or doc number?

Hi,

I am thinking about caching search results for common queries and just want
to check that for small numbers of results it would be better to store the
doc numbers as ints or shorts than to store a Filter with a BitSet.  I guess
if your results contain fewer than 1/32 (for ints) or 1/16 (for shorts) of
the number of documents then it would take less memory.
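
A rough sketch of that arithmetic, assuming one bit per document for the
bit set and ignoring object and array headers. The class and method names
are hypothetical and not part of Lucene. Note that a short can only hold
doc numbers up to 32767, so the 1/16 case applies to small indexes only.

```java
class DocIdCacheSize {
    /** Bytes for a bit set with one bit per document, rounded up to 64-bit words. */
    static long bitSetBytes(int maxDoc) {
        return ((maxDoc + 63L) / 64L) * 8L;
    }

    /** Bytes for an array of doc numbers at bytesPerDoc each (4 for int, 2 for short). */
    static long arrayBytes(int hits, int bytesPerDoc) {
        return (long) hits * bytesPerDoc;
    }

    /** True when an int array of the hits is strictly smaller than the bit set. */
    static boolean intArrayIsSmaller(int hits, int maxDoc) {
        return arrayBytes(hits, 4) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println(bitSetBytes(maxDoc));              // 125000
        System.out.println(arrayBytes(31_250, 4));            // 125000, break-even at maxDoc / 32
        System.out.println(arrayBytes(62_500, 2));            // 125000, break-even at maxDoc / 16
        System.out.println(intArrayIsSmaller(10_000, maxDoc)); // true
    }
}
```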

Is there anything else to consider?

Thanks,

John
-- 
View this message in context: http://www.nabble.com/Cache-BitSet-or-doc-number--tf4699716.html#a13435244
Sent from the Lucene - Java Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Cache BitSet or doc number?

Posted by Thom Nelson <th...@gmail.com>.
It all depends on whether you want to cache the top n doc ids or to cache
some sort of filter.  Solr uses DocSets (unordered by design) to filter
search results in a HitCollector, and there is also the Lucene Filter
mechanism which relies on BitSet.  Decoupling Filter from BitSet is a great
thing to do, but if all you want to do is cache the search results and not a
Filter, your idea of just storing ordered doc ids is probably fine.  If you
store them as VInts you can keep Integer-level precision and save some
memory.
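
To illustrate the VInt idea, here is a minimal sketch that stores sorted
doc ids as gaps, each gap written 7 bits per byte with the high bit as a
continuation flag. It is only an illustration under those assumptions, not
the SortedVIntList code from LUCENE-584; the class and method names are
made up for the example.

```java
import java.io.ByteArrayOutputStream;

class DeltaVIntDocIds {
    /** Encodes strictly increasing, non-negative doc ids as VInt-coded gaps. */
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            int gap = docId - previous;
            // Emit 7 bits at a time, low-order bits first; set the high bit
            // on every byte except the last one of this gap.
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
            previous = docId;
        }
        return out.toByteArray();
    }

    /** Decodes count doc ids written by encode(). */
    static int[] decode(byte[] bytes, int count) {
        int[] docIds = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int gap = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += gap;
            docIds[i] = previous;
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] ids = {3, 130, 131, 100000};
        byte[] packed = encode(ids);
        System.out.println(packed.length); // 6 bytes, versus 16 bytes as four ints
    }
}
```

The saving depends on the gaps being small; widely scattered doc ids need
more bytes per entry, and random access into the list is lost.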


Re: Cache BitSet or doc number?

Posted by Paul Elschot <pa...@xs4all.nl>.
Have a look at decoupling Filter from BitSet:

http://issues.apache.org/jira/browse/LUCENE-584

There is also a SortedVIntList there that stores document numbers
more compactly than BitSet, and an implementation of
CachingFilterQuery (iirc) that chooses the more compact of the two
representations, BitSet or SortedVIntList.

Regards,
Paul Elschot


On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote:
> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> > Thom Nelson wrote:
> > > Check out the HashDocSet from Solr, this is the best way to cache small
> > > sets of search results.  In general, the Solr BitSet/DocSet classes are
> > > more efficient than using the standard java.util.BitSet.  You can use
> > > these independent of the rest of Solr (though I recommend checking out
> > > Solr if you want to do complex caching).
> >
> > I imagine the fastest way to combine cached results is to store them in
> > an array ordered by doc number so that the ConjunctionQuery can use them
> > directly.  The Javadoc for HashDocSet says that they are stored out of
> > order which would make this impossible.
>
> You're speaking at quite an abstract level... it really depends on
> what specific issue you are seeing that you're trying to solve.
>
> -Yonik



Re: Cache BitSet or doc number?

Posted by Yonik Seeley <yo...@apache.org>.
On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> Thom Nelson wrote:
> > Check out the HashDocSet from Solr, this is the best way to cache small
> > sets of search results.  In general, the Solr BitSet/DocSet classes are
> > more efficient than using the standard java.util.BitSet.  You can use
> > these independent of the rest of Solr (though I recommend checking out
> > Solr if you want to do complex caching).
> >
>
> I imagine the fastest way to combine cached results is to store them in an
> array ordered by doc number so that the ConjunctionQuery can use them
> directly.  The Javadoc for HashDocSet says that they are stored out of order
> which would make this impossible.

You're speaking at quite an abstract level... it really depends on
what specific issue you are seeing that you're trying to solve.

-Yonik



Re: Cache BitSet or doc number?

Posted by John Patterson <jd...@gmail.com>.


Thom Nelson wrote:
> 
> Check out the HashDocSet from Solr, this is the best way to cache small 
> sets of search results.  In general, the Solr BitSet/DocSet classes are 
> more efficient than using the standard java.util.BitSet.  You can use 
> these independent of the rest of Solr (though I recommend checking out 
> Solr if you want to do complex caching).
> 

I imagine the fastest way to combine cached results is to store them in an
array ordered by doc number so that the ConjunctionQuery can use them
directly.  The Javadoc for HashDocSet says that they are stored out of order
which would make this impossible.
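
To make the ordering point concrete, here is a minimal sketch of
intersecting two cached results held as sorted int arrays, using a single
linear merge pass. It is only an illustration and not how Lucene's
conjunction scoring is implemented; the class and method names are
hypothetical.

```java
import java.util.Arrays;

class SortedDocIdIntersection {
    /** Returns the doc ids present in both inputs; both must be sorted ascending. */
    static int[] intersect(int[] a, int[] b) {
        int[] result = new int[Math.min(a.length, b.length)];
        int i = 0, j = 0, n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;                 // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;                 // b is behind, advance it
            } else {
                result[n++] = a[i];  // match: keep it and advance both
                i++;
                j++;
            }
        }
        return Arrays.copyOf(result, n);
    }

    public static void main(String[] args) {
        int[] first = {1, 4, 7, 9, 20};
        int[] second = {4, 9, 10, 20, 31};
        System.out.println(Arrays.toString(intersect(first, second))); // [4, 9, 20]
    }
}
```

With an unordered set the same intersection needs a membership lookup per
doc id instead of one merge pass over both lists.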



Re: Cache BitSet or doc number?

Posted by Thom Nelson <th...@gmail.com>.
Check out the HashDocSet from Solr; this is the best way to cache small
sets of search results.  In general, the Solr BitSet/DocSet classes are
more efficient than using the standard java.util.BitSet.  You can use
these independently of the rest of Solr (though I recommend checking out
Solr if you want to do complex caching).

- Thom


