Posted to dev@lucene.apache.org by Nadav Har'El <ny...@math.technion.ac.il> on 2006/06/27 18:08:23 UTC

Combining Hits and HitCollector

Hi,

Searcher.search(Query) returns a Hits object, useful for the display of top
results. Searcher.search(Query, HitCollector) runs a HitCollector for doing
some sort of processing over all results.
Unfortunately, there is currently no method to do both at the same time.

For some uses, for example faceted search (that was discussed on this list
a few times in the past), you need to do both: go over all results (and,
for example, count how many results belong to each value), and at the same
time build a Hits object (for displaying the top search results).

Changing Searcher, and/or Hits to allow for doing both things at once should
not be too hard, but before I go and do it (and submit the change as a patch),
I was wondering if I'm not reinventing the wheel, and if perhaps someone has
already done this, or there were already discussions on how or how not to do
it.

Thanks,
Nadav.


-- 
Nadav Har'El                        |      Tuesday, Jun 27 2006, 1 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |Unix is user friendly - it's just picky
http://nadav.harel.org.il           |about its friends.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Re: Combining Hits and HitCollector

Posted by Nadav Har'El <ny...@math.technion.ac.il>.
On Tue, Jun 27, 2006, Chuck Williams wrote about "Re: Combining Hits and HitCollector":
> IMHO, Hits is the worst class in Lucene.  Its atrocities are numerous,
> including the hardwired "50" and the strange normalization of dividing
> all scores by the top score if the top score happens to be greater than
> 1.0 (which destroys any notion of score values having any absolute
> meaning, although many apps erroneously assume they do).  It is quite
> easy to use a TopDocCollector or a TopFieldDocCollector and do a better
> job than Hits does.

Thanks for the suggestion.

You've made a very good point, and indeed I'm beginning to question the
value in my idea of combining Hits and a HitCollector, when for almost
any application I can think of, a TopDocs would be just as good as Hits,
and when (as you said) it's much easier to combine the collector building
a TopDocs (TopDocCollector or TopFieldDocCollector) with another collector.

Perhaps a "MultiHitCollector" combining several other collectors could be
useful, although you're right that it's very easy to write one when needed
and it doesn't really need to be part of Lucene's core.
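
As a rough illustration of the "MultiHitCollector" idea, here is a minimal
sketch. It assumes only that a collector exposes collect(int doc, float score),
as Lucene's HitCollector does. The nested HitCollector class is a stand-in so
the example compiles without Lucene on the classpath, and MultiHitCollector,
CountingCollector and RecordingCollector are hypothetical names, not existing
Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;

public class MultiHitCollectorSketch {
    /** Stand-in for Lucene's abstract HitCollector (assumption: same collect signature). */
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Hypothetical collector that forwards every hit to each wrapped collector, in order. */
    static class MultiHitCollector extends HitCollector {
        private final HitCollector[] collectors;

        MultiHitCollector(HitCollector... collectors) {
            this.collectors = collectors;
        }

        @Override
        public void collect(int doc, float score) {
            for (HitCollector c : collectors) {
                c.collect(doc, score);
            }
        }
    }

    /** Toy collector that only counts the hits it sees. */
    static class CountingCollector extends HitCollector {
        int count;

        @Override
        public void collect(int doc, float score) {
            count++;
        }
    }

    /** Toy collector that remembers the document ids it sees. */
    static class RecordingCollector extends HitCollector {
        final List<Integer> docs = new ArrayList<>();

        @Override
        public void collect(int doc, float score) {
            docs.add(doc);
        }
    }

    public static void main(String[] args) {
        CountingCollector counter = new CountingCollector();
        RecordingCollector recorder = new RecordingCollector();
        HitCollector both = new MultiHitCollector(counter, recorder);

        // Simulate a search calling collect() once per matching document.
        both.collect(3, 0.9f);
        both.collect(7, 0.4f);

        System.out.println(counter.count + " hits, docs " + recorder.docs);
    }
}
```

With the real library, an instance like "both" would presumably be passed to
Searcher.search(Query, HitCollector) in place of a single collector.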

> This all notwithstanding, a built-in class that combined Hits with a
> second HitCollector probably would be used by many people, although I
> would recommend the approach above as a better alternative.

I wonder: if Hits is considered a problematic class, should we really go
ahead and expand its capabilities, like I proposed initially? Perhaps not...
Perhaps it's better to recommend other approaches in javadoc, FAQs, or in
the form of new code, say, two new simple methods in Searcher:

	TopDocs search(Query, Filter, int, HitCollector)
	TopFieldDocs search(Query, Filter, int, Sort, HitCollector)

In the long run, perhaps we need to give some thought as to whether we
should continue demonstrating the use of Hits (rather than TopDocs) in most
Lucene examples, and whether, perhaps, the Hits API should be deprecated.


Nadav.

-- 
Nadav Har'El                        |      Tuesday, Jun 27 2006, 2 Tammuz 5766
IBM Haifa Research Lab              |-----------------------------------------
                                    |"Never be afraid to tell the world who
http://nadav.harel.org.il           |you are." -- Anonymous



Re: Combining Hits and HitCollector

Posted by Chuck Williams <ch...@manawiz.com>.
IMHO, Hits is the worst class in Lucene.  Its atrocities are numerous,
including the hardwired "50" and the strange normalization of dividing
all scores by the top score if the top score happens to be greater than
1.0 (which destroys any notion of score values having any absolute
meaning, although many apps erroneously assume they do).  It is quite
easy to use a TopDocCollector or a TopFieldDocCollector and do a better
job than Hits does.
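
To make the normalization complaint concrete, here is a small sketch of the
arithmetic as described above: divide every score by the top score, but only
when the top score is greater than 1.0. The class and method names are
hypothetical and this is not the actual Hits source, just the rule applied to
plain arrays.

```java
import java.util.Arrays;

public class HitsNormalizationSketch {
    /** Divides all scores by the top score, but only if the top score exceeds 1.0. */
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        // Scores are left untouched when the top score is at most 1.0.
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Top score 4.0 > 1.0, so everything is rescaled relative to it.
        System.out.println(Arrays.toString(normalize(new float[] {4.0f, 2.0f, 1.0f})));
        // prints [1.0, 0.5, 0.25]

        // Top score 0.8 <= 1.0, so the raw scores pass through unchanged.
        System.out.println(Arrays.toString(normalize(new float[] {0.8f, 0.4f})));
        // prints [0.8, 0.4]
    }
}
```

A raw score of 2.0 becomes 0.5 in the first result set, while a raw 0.4 stays
0.4 in the second, so the reported values cannot be compared across queries.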

For faceted search I use a SamplingHitCollector to gather the
facet-determination sample.  It takes as one of its constructor
parameters, rankingCollector, an arbitrary HitCollector to gather the
top scoring or top sorted results.  Then it only takes one line of code
to combine the two collectors:  rankingCollector.collect(doc, score)
within SamplingHitCollector.collect().
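
A minimal sketch of that delegation pattern follows. The post does not say how
the sampling or the facet lookup works, so both are assumptions here: the
sketch samples every k-th hit and reads facet values from a plain
String[] indexed by document id. HitCollector is a stand-in for the Lucene
class, and TopScoresCollector is a hypothetical toy ranking collector.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class SamplingCollectorSketch {
    /** Stand-in for Lucene's abstract HitCollector (assumption: same collect signature). */
    abstract static class HitCollector {
        public abstract void collect(int doc, float score);
    }

    /** Toy ranking collector: keeps the n highest scores seen, using a min-heap. */
    static class TopScoresCollector extends HitCollector {
        private final int n;
        final PriorityQueue<Float> heap = new PriorityQueue<>();

        TopScoresCollector(int n) {
            this.n = n;
        }

        @Override
        public void collect(int doc, float score) {
            heap.add(score);
            if (heap.size() > n) {
                heap.poll(); // drop the lowest score currently held
            }
        }
    }

    /** Counts facet values on a sample of hits and forwards every hit to rankingCollector. */
    static class SamplingHitCollector extends HitCollector {
        private final HitCollector rankingCollector;
        private final String[] facetOfDoc; // hypothetical doc id -> facet value lookup
        private final int sampleEvery;     // assumption: sample every k-th hit
        final Map<String, Integer> facetCounts = new HashMap<>();
        private int seen;

        SamplingHitCollector(HitCollector rankingCollector, String[] facetOfDoc, int sampleEvery) {
            this.rankingCollector = rankingCollector;
            this.facetOfDoc = facetOfDoc;
            this.sampleEvery = sampleEvery;
        }

        @Override
        public void collect(int doc, float score) {
            if (seen++ % sampleEvery == 0) {
                facetCounts.merge(facetOfDoc[doc], 1, Integer::sum);
            }
            rankingCollector.collect(doc, score); // the one line that combines the two collectors
        }
    }

    public static void main(String[] args) {
        String[] facetOfDoc = {"a", "b", "a", "b", "a"};
        TopScoresCollector ranking = new TopScoresCollector(2);
        SamplingHitCollector sampler = new SamplingHitCollector(ranking, facetOfDoc, 2);

        // Simulate a search that matches documents 0..4 with increasing scores.
        for (int doc = 0; doc < facetOfDoc.length; doc++) {
            sampler.collect(doc, doc + 1.0f);
        }

        System.out.println("facet counts: " + sampler.facetCounts);
        System.out.println("scores kept: " + ranking.heap.size());
    }
}
```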

This all notwithstanding, a built-in class that combined Hits with a
second HitCollector probably would be used by many people, although I
would recommend the approach above as a better alternative.

Chuck


Nadav Har'El wrote on 06/27/2006 09:08 AM:
> Hi,
>
> Searcher.search(Query) returns a Hits object, useful for the display of top
> results. Searcher.search(Query, HitCollector) runs a HitCollector for doing
> some sort of processing over all results.
> Unfortunately, there is currently no method to do both at the same time.
>
> For some uses, for example faceted search (that was discussed on this list
> a few times in the past), you need to do both: go over all results (and,
> for example, count how many results belong to each value), and at the same
> time build a Hits object (for displaying the top search results).
>
> Changing Searcher, and/or Hits to allow for doing both things at once should
> not be too hard, but before I go and do it (and submit the change as a patch),
> I was wondering if I'm not reinventing the wheel, and if perhaps someone has
> already done this, or there were already discussions on how or how not to do
> it.
>
> Thanks,
> Nadav.
>
>
>   


