Posted to dev@lucene.apache.org by "Martijn van Groningen (JIRA)" <ji...@apache.org> on 2011/05/15 12:59:47 UTC

[jira] [Updated] (LUCENE-3098) Grouped total count

     [ https://issues.apache.org/jira/browse/LUCENE-3098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated LUCENE-3098:
------------------------------------------

    Attachment: LUCENE-3098.patch

Attached an initial patch for computing the total group count.

Currently it is implemented as a separate collector. The collector can be executed in the first and second pass if the MultiCollector is used.
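
To illustrate how one search pass can feed several collectors, here is a minimal sketch of the fan-out idea. It is not Lucene's MultiCollector and not code from the patch: DocCollectorSketch and FanOutCollectorSketch are hypothetical names, and the real Collector API has more methods than the single collect(int) assumed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for a collector: it only receives doc ids.
interface DocCollectorSketch {
    void collect(int doc);
}

// Forwards every collected document to each delegate, so a group count
// collector can share a pass with a first or second pass grouping collector.
final class FanOutCollectorSketch implements DocCollectorSketch {
    private final List<DocCollectorSketch> delegates;

    FanOutCollectorSketch(List<DocCollectorSketch> delegates) {
        // Defensive copy so later changes to the caller's list have no effect.
        this.delegates = new ArrayList<>(delegates);
    }

    @Override
    public void collect(int doc) {
        for (DocCollectorSketch delegate : delegates) {
            delegate.collect(doc);
        }
    }
}
```

With a wrapper like this, the index is traversed once per pass, however many collectors are attached.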

bq. We may want to just fold this into the 1st pass collector since it's already looking up group ord + value? 
The 1st pass collector is more concerned with finding the top N groups. For this it takes into account the sort within a group to choose the right group head. The total group count collector doesn't care about the group sort. It just increments the count when it detects a group it has not seen before. The group count collector needs to do this for all groups, not just the top N. Therefore I think it is best implemented in a separate collector.
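
The counting logic described above can be sketched as follows. This is a simplified illustration, not the code in the attached patch: TotalGroupCountSketch is a hypothetical class, and it takes the group value directly as a String instead of resolving it per document through the Lucene Collector API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: counts distinct group values across all matching docs.
final class TotalGroupCountSketch {
    private final Set<String> seenGroups = new HashSet<>();

    // Called once per matching document with that document's group value.
    // A null value (document without a group field) is treated as one group;
    // that is an assumption of this sketch.
    void collect(String groupValue) {
        // add() leaves the set unchanged if the group was already seen,
        // so the count only grows for unseen groups.
        seenGroups.add(groupValue);
    }

    // Number of distinct groups seen so far, regardless of any group sort.
    int getGroupCount() {
        return seenGroups.size();
    }
}
```

Unlike the 1st pass collector, this keeps no sort state and never discards a group, so its memory use grows with the number of unique group values rather than with N.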

I also measured some basic performance. I used a machine with a 2.16 GHz Core 2 Duo processor and 4GB RAM. I used an index of 30M documents. The group field has around 7500 unique values. The average search time was around 350 ms. The average heap usage was 122 MB. I ran 50 searches in parallel with only the total group count collector.

> Grouped total count
> -------------------
>
>                 Key: LUCENE-3098
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3098
>             Project: Lucene - Java
>          Issue Type: New Feature
>            Reporter: Martijn van Groningen
>             Fix For: 3.2, 4.0
>
>         Attachments: LUCENE-3098.patch
>
>
> When grouping, you can currently get two counts:
> * Total hit count, which counts all documents that matched the query.
> * Total grouped hit count, which counts all documents that have been grouped in the top N groups.
> Since with grouping the end user gets groups in the search result instead of plain documents, the total number of groups makes more sense as the total count in many situations.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
