Posted to user@lucenenet.apache.org by "Kaufmann M." <ka...@gmail.com> on 2010/07/07 14:18:48 UTC

Counting occurrences with HitCollector

Hello everybody,
I have a running project in which I'd like to build an overview table of
the search results (similar to faceted searching).
So far I've tried two approaches:

- A DataTable in a HitCollector to count occurrences
- Faceted BooleanQueries

Now in both cases I have a problem:
I have multiple fields I'd like to count:
- Main category (numerical value between 0 and 50)
- Subcategories (string values, 5-15 per result)

With the DataTable method I can count both categories, but when the number
of results gets large it becomes miserably slow.
With the faceted BooleanQueries I cannot count the subcategories (I
would have to run queries for thousands of different strings).

Does anybody have an idea how to solve this?

As for the end result, I'd like to display an overview like this:
Maincategory 1 [50 Hits]
 - Subcategory 1 [20 Hits]
 - Subcategory 2 [10 Hits]
 ... Top 10 subcategories
... all Maincategories

Any help would be greatly appreciated.
Best Regards

Re: Counting occurrences with HitCollector

Posted by Jokin Cuadrado <jo...@gmail.com>.
If you are reading the data via the "doc" property, that is a very bad idea:
it has to fetch the whole document from the hard disk, which is terribly slow.

The best approach in this case is to get a FieldCache and read the field
values from there. You can see this approach in a simple facets application
that I wrote a long time ago:
http://lucene.apache.org/~digy/files/SimpleFacets.zip

You can also take a look at the various discussions about facets on this
same list.
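
To illustrate the idea, here is a minimal sketch in Python rather than Lucene.Net code. It does not use any Lucene API: the two lists at the top are made-up data standing in for what a field cache would give you (one entry per document id, loaded once and kept in memory), and count_facets and format_overview are hypothetical helper names. The point is only that the per-hit work becomes an in-memory lookup plus a counter increment, with no document loaded from disk.

```python
from collections import Counter, defaultdict

# Stand-ins for a field cache: index = document id, value = field value(s).
# Illustrative data only; in a real index these would be loaded once per reader.
main_category_by_doc = [1, 1, 2, 1, 2]
subcategories_by_doc = [
    ["red", "blue"],
    ["red"],
    ["green", "blue"],
    ["red", "green"],
    ["blue"],
]


def count_facets(hit_doc_ids, main_by_doc, subs_by_doc):
    """Count hits per main category and, within each, per subcategory."""
    main_counts = Counter()
    sub_counts = defaultdict(Counter)
    for doc_id in hit_doc_ids:
        # In-memory lookups only: no stored document is fetched here.
        main = main_by_doc[doc_id]
        main_counts[main] += 1
        sub_counts[main].update(subs_by_doc[doc_id])
    return main_counts, sub_counts


def format_overview(main_counts, sub_counts, top_n=10):
    """Build the overview lines: every main category, top_n subcategories each."""
    lines = []
    for main in sorted(main_counts):
        lines.append(f"Maincategory {main} [{main_counts[main]} Hits]")
        for sub, hits in sub_counts[main].most_common(top_n):
            lines.append(f" - {sub} [{hits} Hits]")
    return lines


if __name__ == "__main__":
    # Pretend the search matched documents 0, 1, 3 and 4.
    mains, subs = count_facets([0, 1, 3, 4],
                               main_category_by_doc, subcategories_by_doc)
    print("\n".join(format_overview(mains, subs)))
```

In a collector, the body of the loop in count_facets is what would run once per collected document id. Since the main category is a small number (0 to 50), a plain integer array indexed by category would also work in place of the Counter.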




-- 
Jokin