Posted to java-user@lucene.apache.org by John Patterson <jd...@gmail.com> on 2007/10/26 22:51:08 UTC
Cache BitSet or doc number?
Hi,
I am thinking about caching search results for common queries and just want
to check that for small numbers of results it would be better to store the
doc numbers as ints or shorts than to store a Filter with a BitSet. I guess
that if your results contain fewer than 1/32 (for ints) or 1/16 (for shorts)
of the number of documents then it would take less memory.
Is there anything else to consider?
Thanks,
John
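[Editor's sketch] The break-even point in the question can be checked with a little arithmetic. The class and method names below are made up for illustration and are not part of Lucene or Solr. The sketch counts payload bytes only and ignores object headers and other JVM overhead. Note that shorts can only hold doc numbers up to 65,535 (treated as unsigned), so that variant only applies to small indexes.

```java
// Rough payload sizes only; object headers and JVM overhead are ignored.
final class CacheSizeEstimate {

    /** Bytes for a bit set holding one bit per document in the index. */
    static long bitSetBytes(int maxDoc) {
        // java.util.BitSet keeps its bits in 64-bit words.
        return 8L * ((maxDoc + 63L) / 64L);
    }

    /** Bytes for an int[] holding one 32-bit doc number per hit. */
    static long intArrayBytes(int hits) {
        return 4L * hits;
    }

    /** True when the plain int[] is strictly smaller than the bit set. */
    static boolean intArrayIsSmaller(int hits, int maxDoc) {
        return intArrayBytes(hits) < bitSetBytes(maxDoc);
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        System.out.println(bitSetBytes(maxDoc));               // 125000
        System.out.println(intArrayIsSmaller(31_249, maxDoc)); // true
        System.out.println(intArrayIsSmaller(31_250, maxDoc)); // false
    }
}
```

For a one-million-document index the crossover sits at 1,000,000 / 32 = 31,250 hits, which matches the 1/32 estimate above.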
--
View this message in context: http://www.nabble.com/Cache-BitSet-or-doc-number--tf4699716.html#a13435244
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Cache BitSet or doc number?
Posted by Thom Nelson <th...@gmail.com>.
It all depends on whether you want to cache the top n doc ids or to cache
some sort of filter. Solr uses DocSets (unordered by design) to filter
search results in a HitCollector, and there is also the Lucene Filter
mechanism which relies on BitSet. Decoupling Filter from BitSet is a great
thing to do, but if all you want to do is cache the search results and not a
Filter, your idea of just storing ordered doc ids is probably fine. If you
store them as VInts you can keep Integer-level precision and save some
memory.
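[Editor's sketch] One way to read the VInt suggestion: keep the doc ids sorted, store the gap between neighbouring ids, and write each gap as a variable-length integer with 7 payload bits per byte. The class below is a hypothetical, self-contained illustration. It does not use Lucene's own VInt readers and writers, and it assumes the input is sorted ascending with no negative ids.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper: delta + variable-length encoding of sorted doc ids.
final class DeltaVIntCodec {

    /** Encodes ascending, non-negative doc ids as VInt-encoded gaps. */
    static byte[] encode(int[] sortedDocs) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int doc : sortedDocs) {
            int delta = doc - previous; // non-negative because input is sorted
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // final byte, continuation bit clear
            previous = doc;
        }
        return out.toByteArray();
    }

    /** Decodes count doc ids from bytes produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] docs = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            docs[i] = previous;
        }
        return docs;
    }
}
```

Gaps below 128 cost one byte each, so closely spaced hits compress well, while the full int range is still representable.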
On 10/27/07, Paul Elschot <pa...@xs4all.nl> wrote:
>
> Have a look at decoupling Filter from BitSet:
>
> http://issues.apache.org/jira/browse/LUCENE-584
>
> There also is a SortedVIntList there that stores document numbers
> more compactly than BitSet, and an implementation of
> CachingFilterQuery (iirc) that chooses the more compact representation
> of BitSet and SortedVIntList.
>
> Regards,
> Paul Elschot
>
>
> On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote:
> > On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> > > Thom Nelson wrote:
> > > > Check out the HashDocSet from Solr, this is the best way to cache
> > > > small sets of search results. In general, the Solr BitSet/DocSet
> > > > classes are more efficient than using the standard
> > > > java.util.BitSet. You can use these independent of the rest of
> > > > Solr (though I recommend checking out Solr if you want to do
> > > > complex caching).
> > >
> > > I imagine the fastest way to combine cached results is to store
> > > them in an array ordered by doc number so that the ConjunctionQuery
> > > can use them directly. The Javadoc for HashDocSet says that they
> > > are stored out of order which would make this impossible.
> >
> > You're speaking at quite an abstract level... it really depends on
> > what specific issue you are seeing that you're trying to solve.
> >
> > -Yonik
Re: Cache BitSet or doc number?
Posted by Paul Elschot <pa...@xs4all.nl>.
Have a look at decoupling Filter from BitSet:
http://issues.apache.org/jira/browse/LUCENE-584
There also is a SortedVIntList there that stores document numbers
more compactly than BitSet, and an implementation of
CachingFilterQuery (iirc) that chooses the more compact representation
of BitSet and SortedVIntList.
Regards,
Paul Elschot
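[Editor's sketch] The idea of picking the more compact of the two representations can be illustrated without the patch itself. The code below is not the LUCENE-584 implementation. It is a hypothetical, standalone comparison that counts how many bytes a sorted, gap-encoded VInt list would need and compares that with one bit per document.

```java
// Hypothetical chooser: compares payload sizes only, ignoring object overhead.
final class CompactRepresentationChooser {

    /** Number of bytes a non-negative value needs with 7 payload bits per byte. */
    static int vIntLength(int value) {
        int length = 1;
        while ((value & ~0x7F) != 0) {
            value >>>= 7;
            length++;
        }
        return length;
    }

    /** Bytes needed to store ascending doc ids as VInt-encoded gaps. */
    static long deltaVIntBytes(int[] sortedDocs) {
        long total = 0;
        int previous = 0;
        for (int doc : sortedDocs) {
            total += vIntLength(doc - previous);
            previous = doc;
        }
        return total;
    }

    /** Bytes needed for one bit per document, rounded up to whole bytes. */
    static long bitSetBytes(int maxDoc) {
        return (maxDoc + 7L) / 8L;
    }

    /** True when the gap-encoded list is strictly smaller than the bit set. */
    static boolean preferVIntList(int[] sortedDocs, int maxDoc) {
        return deltaVIntBytes(sortedDocs) < bitSetBytes(maxDoc);
    }
}
```

With a sparse result in a large index the list wins easily. With a dense result the bit set wins, because every hit costs at least one byte in the list but only one bit in the set.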
On Saturday 27 October 2007 02:15:48 Yonik Seeley wrote:
> On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> > Thom Nelson wrote:
> > > Check out the HashDocSet from Solr, this is the best way to cache small
> > > sets of search results. In general, the Solr BitSet/DocSet classes are
> > > more efficient than using the standard java.util.BitSet. You can use
> > > these independent of the rest of Solr (though I recommend checking out
> > > Solr if you want to do complex caching).
> >
> > I imagine the fastest way to combine cached results is to store them in
> > an array ordered by doc number so that the ConjunctionQuery can use them
> > directly. The Javadoc for HashDocSet says that they are stored out of
> > order which would make this impossible.
>
> You're speaking at quite an abstract level... it really depends on
> what specific issue you are seeing that you're trying to solve.
>
> -Yonik
Re: Cache BitSet or doc number?
Posted by Yonik Seeley <yo...@apache.org>.
On 10/26/07, John Patterson <jd...@gmail.com> wrote:
> Thom Nelson wrote:
> > Check out the HashDocSet from Solr, this is the best way to cache small
> > sets of search results. In general, the Solr BitSet/DocSet classes are
> > more efficient than using the standard java.util.BitSet. You can use
> > these independent of the rest of Solr (though I recommend checking out
> > Solr if you want to do complex caching).
> >
>
> I imagine the fastest way to combine cached results is to store them in an
> array ordered by doc number so that the ConjunctionQuery can use them
> directly. The Javadoc for HashDocSet says that they are stored out of order
> which would make this impossible.
You're speaking at quite an abstract level... it really depends on
what specific issue you are seeing that you're trying to solve.
-Yonik
Re: Cache BitSet or doc number?
Posted by John Patterson <jd...@gmail.com>.
Thom Nelson wrote:
>
> Check out the HashDocSet from Solr, this is the best way to cache small
> sets of search results. In general, the Solr BitSet/DocSet classes are
> more efficient than using the standard java.util.BitSet. You can use
> these independent of the rest of Solr (though I recommend checking out
> Solr if you want to do complex caching).
>
I imagine the fastest way to combine cached results is to store them in an
array ordered by doc number so that the ConjunctionQuery can use them
directly. The Javadoc for HashDocSet says that they are stored out of order
which would make this impossible.
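[Editor's sketch] The reason ordering matters for combining results: two ascending lists of doc numbers can be intersected in a single linear pass, which an unordered hash set cannot offer in the same way. The class below is a hypothetical illustration on plain int arrays. It does not touch Lucene's scorer or query classes, and it assumes both inputs are sorted ascending without duplicates.

```java
import java.util.Arrays;

// Hypothetical helper: AND of two cached results held as sorted int arrays.
final class SortedDocIdIntersection {

    /** Returns the doc ids present in both ascending arrays, in ascending order. */
    static int[] intersect(int[] a, int[] b) {
        int[] out = new int[Math.min(a.length, b.length)];
        int i = 0;
        int j = 0;
        int n = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;            // a is behind, advance it
            } else if (a[i] > b[j]) {
                j++;            // b is behind, advance it
            } else {
                out[n++] = a[i]; // match: keep it and advance both
                i++;
                j++;
            }
        }
        return Arrays.copyOf(out, n);
    }
}
```

The pass costs at most a.length + b.length comparisons and produces output that is itself sorted, so it can feed another intersection directly.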
Re: Cache BitSet or doc number?
Posted by Thom Nelson <th...@gmail.com>.
Check out the HashDocSet from Solr, this is the best way to cache small
sets of search results. In general, the Solr BitSet/DocSet classes are
more efficient than using the standard java.util.BitSet. You can use
these independent of the rest of Solr (though I recommend checking out
Solr if you want to do complex caching).
- Thom
John Patterson wrote:
> Hi,
>
> I am thinking about caching search results for common queries and just want
> to check that for small numbers of results it would be better to store the
> doc numbers as ints or shorts than to store a Filter with a BitSet. I guess
> that if your results contain fewer than 1/32 (for ints) or 1/16 (for shorts)
> of the number of documents then it would take less memory.
>
> Is there anything else to consider?
>
> Thanks,
>
> John
>