Posted to dev@lucene.apache.org by "Michael McCandless (JIRA)" <ji...@apache.org> on 2011/05/10 19:31:47 UTC

[jira] [Commented] (LUCENE-1421) Ability to group search results by field

    [ https://issues.apache.org/jira/browse/LUCENE-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031266#comment-13031266 ] 

Michael McCandless commented on LUCENE-1421:
--------------------------------------------

bq. I think that grouping code should be part of Lucene instead of Solr.

+1

This is a very popular issue (currently tied for 2nd place in votes).

Unfortunately, I think the single-pass collector attached here doesn't
scale very well to a large maxDoc and/or a large number of unique
groups.  Also, it pulls a DocTermsIndex on the top-level reader (costly
in an NRT/reopen setting since it's not per-segment).

So I decided to factor out parts of Solr's current two-pass approach
into a shared "grouping" module.

The downside of the two-pass approach is you run the query twice,
automatically halving your QPS.  (It's even worse because the grouping
itself is somewhat compute-intensive too.)  To try to help mitigate
this, I also added a new CachingCollector, which just holds hits
(docID and optionally score) up to a max allowed RAM consumption, and
can then replay them for the 2nd pass.  It includes a "max RAM"
setting so that if too many hits are found, it stops caching (and you
must then re-execute the query).
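
To make the caching idea concrete, here is a minimal sketch in plain
Java.  It is not the CachingCollector from the patch and does not use
any Lucene API: the class name HitCacheSketch and its methods are
hypothetical, it caches docIDs only (no scores), and the RAM budget
counts only the stored docIDs rather than the array's full capacity.

```java
import java.util.Arrays;
import java.util.function.IntConsumer;

/** Hypothetical sketch of a RAM-bounded hit cache; not Lucene's CachingCollector. */
final class HitCacheSketch {
    private final long maxBytes;     // RAM budget for cached docIDs
    private int[] docs = new int[16];
    private int size = 0;
    private boolean cached = true;   // flips to false once the budget is exceeded

    HitCacheSketch(long maxBytes) {
        this.maxBytes = maxBytes;
    }

    /** Called once per hit during the first pass. */
    void collect(int docID) {
        if (!cached) {
            return;                  // already gave up on caching
        }
        if ((long) (size + 1) * Integer.BYTES > maxBytes) {
            // Too many hits: stop caching and release what we held.
            cached = false;
            docs = null;
            size = 0;
            return;
        }
        if (size == docs.length) {
            docs = Arrays.copyOf(docs, size * 2);
        }
        docs[size++] = docID;
    }

    /** True if every hit seen so far is still held in the cache. */
    boolean isCached() {
        return cached;
    }

    /** Feeds the cached hits, in collection order, to the second pass. */
    void replay(IntConsumer secondPass) {
        if (!cached) {
            throw new IllegalStateException("cache overflowed; re-execute the query");
        }
        for (int i = 0; i < size; i++) {
            secondPass.accept(docs[i]);
        }
    }
}
```

The caller checks isCached() after the first pass: if it is true, the
second pass runs off replay(); otherwise the query is executed again.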

But one nice side effect of the two-phased approach is that sharding
is in theory straightforward (I think?).  I.e., all shards would do the
first phase, concurrently, to get the top N groups.  Then you
merge-sort the resulting top groups, then run the second phase (finding
docs w/in the top groups) on all shards, then merge results from the
same group across all shards.
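
The sharded flow described above could be sketched roughly as follows.
This is plain Java over in-memory lists, not the grouping module's
API: ShardedGroupingSketch, Hit and all method names are hypothetical,
groups are ranked by their best hit score (an assumption; any sort
would do), and each shard is just a List of hits.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical sketch of two-phase grouping across shards. */
final class ShardedGroupingSketch {
    /** One hit on one shard: group value, shard-local docID, score. */
    static final class Hit {
        final String group;
        final int doc;
        final float score;

        Hit(String group, int doc, float score) {
            this.group = group;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Phase 1 on one shard: best score per group, keeping the top n groups. */
    static Map<String, Float> firstPhase(List<Hit> shard, int n) {
        Map<String, Float> best = new HashMap<>();
        for (Hit h : shard) {
            best.merge(h.group, h.score, Float::max);
        }
        return topN(best, n);
    }

    /** Merges the per-shard top groups into the global top n group values. */
    static List<String> mergeGroups(List<Map<String, Float>> perShard, int n) {
        Map<String, Float> best = new HashMap<>();
        for (Map<String, Float> m : perShard) {
            m.forEach((g, s) -> best.merge(g, s, Float::max));
        }
        return new ArrayList<>(topN(best, n).keySet());
    }

    /** Phase 2 on one shard: hits that fall inside one of the chosen groups. */
    static List<Hit> secondPhase(List<Hit> shard, Collection<String> groups) {
        return shard.stream()
                .filter(h -> groups.contains(h.group))
                .collect(Collectors.toList());
    }

    /** Merges phase-2 results: per group, all shards' hits by descending score. */
    static Map<String, List<Hit>> mergeDocs(List<String> groups, List<List<Hit>> perShard) {
        Map<String, List<Hit>> out = new LinkedHashMap<>();
        for (String g : groups) {
            out.put(g, new ArrayList<>());
        }
        for (List<Hit> hits : perShard) {
            for (Hit h : hits) {
                out.get(h.group).add(h);
            }
        }
        for (List<Hit> l : out.values()) {
            l.sort((a, b) -> Float.compare(b.score, a.score));
        }
        return out;
    }

    /** Keeps the n entries with the highest score, in descending score order. */
    private static Map<String, Float> topN(Map<String, Float> best, int n) {
        return best.entrySet().stream()
                .sorted(Map.Entry.<String, Float>comparingByValue().reversed())
                .limit(n)
                .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue,
                        (a, b) -> a, LinkedHashMap::new));
    }
}
```

A real implementation would also need to carry the shard ID with each
hit (docIDs here are shard-local) and to cap the number of docs kept
per group; both are left out to keep the sketch short.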


> Ability to group search results by field
> ----------------------------------------
>
>                 Key: LUCENE-1421
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1421
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>            Reporter: Artyom Sokolov
>            Priority: Minor
>         Attachments: lucene-grouping.patch
>
>
> It would be awesome to group search results by specified field. Some functionality was provided for Apache Solr but I think it should be done in Core Lucene. There could be some useful information like total hits about collapsed data like total count and so on.
> Thanks,
> Artyom
