Posted to user@nutch.apache.org by Dennis Kubes <nu...@dragonflymc.com> on 2006/10/24 16:32:34 UTC
Re: Plugin HitCollector
Our problem is that we need to count hits per sub-category, and there
are over 550,000 categories. I am assuming I can't do this with a
bitset? Is there a good way to get this kind of functionality?
Dennis
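
For reference, here is a minimal sketch of what the bitset approach suggested in the quoted message below amounts to, and of why it gets expensive with this many categories. It uses only java.util.BitSet, not Lucene. The class and method names are hypothetical, and the figure of 1,000,000 documents is an assumption chosen purely for illustration.

```java
import java.util.BitSet;

public class BitSetCategoryCount {

    /** Counts docs that are both in the query result and in the category. */
    public static int countInCategory(BitSet resultDocs, BitSet categoryDocs) {
        BitSet both = (BitSet) resultDocs.clone(); // and() mutates, so work on a copy
        both.and(categoryDocs);
        return both.cardinality();
    }

    /** Rough memory, in bytes, for one bitset of maxDoc bits per category. */
    public static long bytesForAllCategories(long maxDoc, long categories) {
        return (maxDoc + 7) / 8 * categories;
    }

    public static void main(String[] args) {
        BitSet result = new BitSet();
        result.set(1); result.set(3); result.set(4); result.set(6);

        BitSet category = new BitSet();
        category.set(3); category.set(4); category.set(5);

        // Docs 3 and 4 are in both sets.
        System.out.println(countInCategory(result, category));

        // Assumed index size of 1,000,000 docs, 550,000 categories.
        System.out.println(bytesForAllCategories(1_000_000L, 550_000L));
    }
}
```

Under that assumed index size, one uncompressed bitset per category comes to roughly 68 GB, and each query would also need one intersection per category, which is why the question above doubts that plain bitsets fit this case.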
Andrzej Bialecki wrote:
> Dennis Kubes wrote:
>> We are running into the same issue. Remember that hits just give you
>> the doc id, and getting hit details from a hit does another read. So
>> looping through the hits to access every document will do a read per
>> document. If it is a small number of hits, no big deal, but the more
>> hits you access, the more time it takes. In our situation limiting the
>> query doesn't work; we need to know information about the hit itself
>> (i.e. a certain field, so we can do a count based on the field). We
>> implemented it using HitCollector modifications in Lucene. This
>> works but is not ideal in terms of speed, so we are looking at making
>> modifications to the IndexReader itself so that when it gets the Hits
>> it also gets our field. Understand that doing something like this
>> changes core Lucene functionality, though. I am not necessarily
>> recommending doing it this way; we just couldn't find another way.
>
> Well, it all depends on what kind of details you need to get from each
> hit. Have you tried using FieldCache instead? Or pre-populated BitSets
> which you would then intersect with the result BitSet to get counts of
> matching docs?
>
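
The FieldCache idea in the quoted message can be sketched roughly as follows. This is plain Java rather than Lucene code: the array docToCategory stands in for the kind of per-document value array that FieldCache can supply, and it assumes each document carries a single integer category id. The class name, method name, and sample data are all hypothetical.

```java
import java.util.BitSet;

public class CategoryOrdinalCount {

    /**
     * Counts matching docs per category with one array lookup per hit,
     * so no stored document has to be read.
     *
     * docToCategory[doc] is the (assumed) integer category id of that doc.
     */
    public static int[] countByCategory(BitSet resultDocs, int[] docToCategory,
                                        int numCategories) {
        int[] counts = new int[numCategories];
        for (int doc = resultDocs.nextSetBit(0);
             doc >= 0 && doc < docToCategory.length;
             doc = resultDocs.nextSetBit(doc + 1)) {
            counts[docToCategory[doc]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical index of six docs spread over three categories.
        int[] docToCategory = {0, 2, 1, 2, 2, 0};

        BitSet result = new BitSet();
        result.set(1); result.set(3); result.set(5);

        int[] counts = countByCategory(result, docToCategory, 3);
        for (int cat = 0; cat < counts.length; cat++) {
            System.out.println("category " + cat + ": " + counts[cat]);
        }
    }
}
```

The point of the design is that memory grows with the number of documents plus the number of categories (one int each), not with their product as in the one-bitset-per-category layout. The same increment could presumably be done inside a HitCollector callback instead of over a result BitSet.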