Posted to commits@lucene.apache.org by "David Smiley (Confluence)" <co...@apache.org> on 2013/08/03 23:39:00 UTC

[CONF] Apache Solr Reference Guide > Result Grouping

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Result Grouping (https://cwiki.apache.org/confluence/display/solr/Result+Grouping)

Change Comment:
---------------------------------------------------------------------
group.truncate in fact works in distributed grouping -- SOLR-2776

Edited by David Smiley:
---------------------------------------------------------------------
Result Grouping groups documents with a common field value into groups and returns the top documents for each group. For example, if you searched for "DVD" on an electronic retailer's e-commerce site, Solr might return three categories such as "TV and Video," "Movies," and "Computers," with three results per category. In this case, the query term "DVD" appeared in all three categories, so Solr groups the results by category in order to increase relevancy for the user.

Result Grouping is separate from [Faceting]. Though the two are conceptually similar, faceting returns all relevant results and allows the user to refine the results based on the facet category. For example, if you searched for "shoes" on a footwear retailer's e-commerce site, Solr would return all results for that query term, along with selectable facets such as "size," "color," "brand," and so on.

However, with Solr 4 you can also group facets. Grouped faceting works with the first {{group.field}} parameter; any other {{group.field}} parameters are ignored. Grouped faceting only supports {{facet.field}} for string-based fields that are not tokenized and are not multivalued.

Grouped faceting currently doesn't support date and pivot faceting, but it does support range faceting.

Grouped faceting differs from non-grouped faceting, where (sum of all facet counts) == (total number of products with that property), as shown in the following example:

Object 1
- name: Phaser 4620a
- ppm: 62
- product_range: 6

Object 2
- name: Phaser 4620i
- ppm: 65
- product_range: 6

Object 3
- name: ML6512
- ppm: 62
- product_range: 7

If you ask Solr to group these documents by "product_range", then the total number of groups is 2, but the facet counts for ppm are 2 for 62 and 1 for 65.
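
The counts above can be reproduced with a small model of grouped faceting. This is only a sketch of the counting rule implied by the example (one count per distinct group that contains a facet value), not Solr's implementation; the function names and the fourth document are hypothetical additions for illustration.

```python
from collections import Counter

# The three example documents from the text above.
docs = [
    {"name": "Phaser 4620a", "ppm": 62, "product_range": 6},
    {"name": "Phaser 4620i", "ppm": 65, "product_range": 6},
    {"name": "ML6512", "ppm": 62, "product_range": 7},
]

def plain_facets(docs, facet_field):
    """Non-grouped faceting: every document counts once for its facet value."""
    return Counter(doc[facet_field] for doc in docs)

def grouped_facets(docs, facet_field, group_field):
    """Grouped faceting (modelled): every distinct group counts once per facet value."""
    pairs = {(doc[facet_field], doc[group_field]) for doc in docs}
    return Counter(value for value, _group in pairs)

group_count = len({doc["product_range"] for doc in docs})
print(group_count)                                                    # 2
print(sorted(grouped_facets(docs, "ppm", "product_range").items()))  # [(62, 2), (65, 1)]

# Hypothetical fourth document (not part of the original example): a second
# ppm=62 product in range 6 raises the plain count but not the grouped count.
extra = docs + [{"name": "hypothetical", "ppm": 62, "product_range": 6}]
print(sorted(plain_facets(extra, "ppm").items()))                     # [(62, 3), (65, 1)]
print(sorted(grouped_facets(extra, "ppm", "product_range").items()))  # [(62, 2), (65, 1)]
```

Note that the grouped facet counts sum to 3 even though there are only 2 groups, because a single group can contribute to more than one facet value.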

h2. Request Parameters

Result Grouping takes the following request parameters. Any number of these request parameters can be included in a single request:

|| Parameter || Type || Description ||
| group | Boolean | If true, query results will be grouped. |
| group.field | string | The name of the field by which to group results. The field must be single-valued, and must either be indexed or be of a field type that has a value source and works in a function query, such as {{ExternalFileField}}. It must also be a string-based field, such as {{StrField}} or {{TextField}}. |
| group.func | query | Group based on the unique values of a function query. Supported only in Solr 4.0. |
| group.query | query | Return a single group of documents that match the given query. |
| rows | integer | The number of groups to return. The default value is 10. |
| start | integer | Specifies an initial offset for the list of groups. |
| group.limit | integer | Specifies the number of results to return for each group. The default value is 1. |
| group.offset | integer | Specifies an initial offset for the document list of each group. |
| sort | sortspec | Specifies how Solr sorts the groups relative to each other. For example, {{sort=popularity desc}} will cause the groups to be sorted according to the highest popularity document in each group. The default value is {{score desc}}. |
| group.sort | sortspec | Specifies how Solr sorts documents within a single group. The default value is {{score desc}}. |
| group.format | grouped/simple | If this parameter is set to {{simple}}, the grouped documents are presented in a single flat list, and the {{start}} and {{rows}} parameters affect the numbers of documents instead of groups. |
| group.main | Boolean | If true, the result of the first field grouping command is used as the main result list in the response, using {{group.format=simple}}. |
| group.ngroups | Boolean | If true, Solr includes the number of groups that have matched the query in the results. The default value is false. |
| group.truncate | Boolean | If true, facet counts are based on the most relevant document of each group matching the query. The default value is false. |
| group.facet | Boolean | Determines whether to compute grouped facets for the field facets specified in {{facet.field}} parameters. Grouped facets are computed based on the first specified group. As with normal field faceting, fields shouldn't be tokenized (otherwise counts are computed for each token). Grouped faceting supports single and multivalued fields. The default value is false. New with Solr 4. |
| group.cache.percent | integer between 0 and 100 | Setting this parameter to a number greater than 0 enables caching for result grouping. Result Grouping executes two searches; this option caches the second search. The default value is 0. Testing has shown that group caching only improves search time with Boolean, wildcard, and fuzzy queries. For simple queries like term or "match all" queries, group caching degrades performance. |

Any number of group commands ({{group.field}}, {{group.func}}, {{group.query}}) may be specified in a single request.

Grouping is also supported for distributed searches. Currently {{group.func}} is the only parameter that doesn't support distributed searches.

h2. Examples

All of the following examples work with the data provided in the Solr Example directory.

h3. Grouping Results by Field

In this example, we will group results based on the {{manu_exact}} field, which specifies the manufacturer of the items in the sample dataset.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name&q=solr+memory&group=true&group.field=manu_exact]}}

{code:borderStyle=solid|borderColor=#666666}
{
...
"grouped":{
  "manu_exact":{
    "matches":6,
    "groups":[{
        "groupValue":"Apache Software Foundation",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"SOLR1000",
              "name":"Solr, the Enterprise Search Server"}]
        }},
      {
        "groupValue":"Corsair Microsystems Inc.",
        "doclist":{"numFound":2,"start":0,"docs":[
            {
              "id":"VS1GB400C3",
              "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"}]
        }},
      {
        "groupValue":"A-DATA Technology Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"VDBDB1A16",
              "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"}]
        }},
      {
        "groupValue":"Canon Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"0579B002",
              "name":"Canon PIXMA MP500 All-In-One Photo Printer"}]
        }},
      {
        "groupValue":"ASUS Computer Inc.",
        "doclist":{"numFound":1,"start":0,"docs":[
            {
              "id":"EN7800GTX/2DHTV/256M",
              "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
        }
      }
    ]
    }
  }
}
{code}

The response indicates that there are six total matches for our query. For each unique value of {{group.field}}, Solr returns a {{doclist}} with the top scoring document. The {{doclist}} also includes the total number of matches in that group as the {{numFound}} value. The groups are sorted by the score of the top document within each group.
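
A client might walk this structure as follows. This is a minimal sketch using an abbreviated copy of the response above (two groups, with {{matches}} adjusted to fit the abbreviated data); the function name {{summarize_groups}} is hypothetical.

```python
import json

# Abbreviated copy of the grouped response shown above (first two groups only).
sample = json.loads("""
{"grouped": {"manu_exact": {"matches": 3, "groups": [
  {"groupValue": "Apache Software Foundation",
   "doclist": {"numFound": 1, "start": 0, "docs": [{"id": "SOLR1000"}]}},
  {"groupValue": "Corsair Microsystems Inc.",
   "doclist": {"numFound": 2, "start": 0, "docs": [{"id": "VS1GB400C3"}]}}
]}}}
""")

def summarize_groups(response, field):
    """Return (groupValue, numFound, ids of returned docs) for each group."""
    command = response["grouped"][field]
    return [
        (
            group["groupValue"],
            group["doclist"]["numFound"],
            [doc["id"] for doc in group["doclist"]["docs"]],
        )
        for group in command["groups"]
    ]

for value, found, ids in summarize_groups(sample, "manu_exact"):
    # numFound can exceed len(ids) because group.limit defaults to 1.
    print(value, found, ids)
```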

We can run the same query with the request parameter {{group.main=true}}. This will format the results as a single flat document list. This flat format does not include as much information as the normal result grouping query results, but it may be easier for existing Solr clients to parse.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true]}}

{code:borderStyle=solid|borderColor=#666666}
{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "fl":"id,name,manufacturer",
      "indent":"true",
      "q":"solr memory",
      "group.field":"manu_exact",
      "group.main":"true",
      "group":"true",
      "wt":"json"}},
  "grouped":{},
  "response":{"numFound":6,"start":0,"docs":[
      {
        "id":"SOLR1000",
        "name":"Solr, the Enterprise Search Server"},
      {
        "id":"VS1GB400C3",
        "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail"},
      {
        "id":"VDBDB1A16",
        "name":"A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM"},
      {
        "id":"0579B002",
        "name":"Canon PIXMA MP500 All-In-One Photo Printer"},
      {
        "id":"EN7800GTX/2DHTV/256M",
        "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)"}]
  }
}
{code}

h3. Grouping by Query

In this example, we will use the {{group.query}} parameter to find the top three results for "memory" in two different price ranges: 0.00 to 99.99, and 100 and over.

{{[http://localhost:8983/solr/select?wt=json&indent=true&fl=name,price&q=memory&group=true&group.query=price:\[0+TO+99.99\]&group.query=price:\[100+TO+*\]&group.limit=3]}}

{code:borderStyle=solid|borderColor=#666666}
{
  "responseHeader":{
    "status":0,
    "QTime":42,
    "params":{
      "fl":"name,price",
      "indent":"true",
      "q":"memory",
      "group.limit":"3",
      "group.query":["price:[0 TO 99.99]",
        "price:[100 TO *]"],
      "group":"true",
      "wt":"json"}},
  "grouped":{
    "price:[0 TO 99.99]":{
      "matches":5,
      "doclist":{"numFound":1,"start":0,"docs":[
          {
            "name":"CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail",
            "price":74.99}]
      }},
    "price:[100 TO *]":{
      "matches":5,
      "doclist":{"numFound":3,"start":0,"docs":[
          {
            "name":"CORSAIR  XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail",
            "price":185.0},
          {
            "name":"Canon PIXMA MP500 All-In-One Photo Printer",
            "price":179.99},
          {
            "name":"ASUS Extreme N7800GTX/2DHTV (256 MB)",
            "price":479.95}]
      }
     }
   }
 }
{code}

In this case, Solr found five matches for "memory," but only returns four results grouped by price. This is because one result for "memory" did not have a price assigned to it.
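
The arithmetic behind that observation can be checked from the response itself. The sketch below assumes the group queries do not overlap (as with the two price ranges here), so that the {{numFound}} values can simply be summed; the function name is hypothetical and the sample keeps only the fields it needs from the response above.

```python
# Only the counts from the response above; the docs lists are omitted.
grouped = {
    "price:[0 TO 99.99]": {"matches": 5, "doclist": {"numFound": 1}},
    "price:[100 TO *]": {"matches": 5, "doclist": {"numFound": 3}},
}

def unmatched_by_group_queries(grouped):
    """Number of matching documents that fall into none of the group queries.

    Assumes the group queries are mutually exclusive; overlapping queries
    would count some documents more than once.
    """
    matches = next(iter(grouped.values()))["matches"]
    covered = sum(command["doclist"]["numFound"] for command in grouped.values())
    return matches - covered

print(unmatched_by_group_queries(grouped))  # 1
```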

h2. Distributed Result Grouping

Solr also supports result grouping on distributed indexes. If you are using result grouping on the "/select" request handler, you must provide the {{shards}} parameter described here. If you are using result grouping on a request handler other than "/select", you must also provide the {{shards.qt}} parameter:

|| Parameter || Description ||
| shards | Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see [Distributed Search with Index Sharding]. |
| shards.qt | Specifies the request handler Solr uses for requests to shards. This parameter is not required for the {{/select}} request handler. |

For example: {{[http://localhost:8983/solr/select?wt=json&indent=true&fl=id,name,manufacturer&q=solr+memory&group=true&group.field=manu_exact&group.main=true&shards=solr-shard1:8983/solr,solr-shard2:8983/solr]}}
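
The rule for when {{shards.qt}} is needed can be sketched as a small helper. The function name {{distributed_params}} and the {{/products}} handler are hypothetical; the shard addresses are the ones used in the example URL above.

```python
from urllib.parse import urlencode

def distributed_params(handler, shards):
    """Return the extra parameters for a distributed grouping request.

    `shards.qt` is added only for handlers other than /select, following
    the rule described in the table above.
    """
    params = {"shards": ",".join(shards)}
    if handler != "/select":
        params["shards.qt"] = handler
    return params

shards = ["solr-shard1:8983/solr", "solr-shard2:8983/solr"]

# Keep ":", "/" and "," unescaped so the output resembles the example URL.
print(urlencode(distributed_params("/select", shards), safe=":/,"))
# shards=solr-shard1:8983/solr,solr-shard2:8983/solr

# "/products" is a hypothetical custom request handler.
print(distributed_params("/products", shards))
```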

{scrollbar}

