Posted to commits@lucene.apache.org by "Cassandra Targett (Confluence)" <co...@apache.org> on 2013/08/20 23:29:00 UTC

[CONF] Apache Solr Reference Guide > The Terms Component

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: The Terms Component (https://cwiki.apache.org/confluence/display/solr/The+Terms+Component)

Change Comment:
---------------------------------------------------------------------
Updated example responses

Edited by Cassandra Targett:
---------------------------------------------------------------------
The Terms Component provides access to the indexed terms in a field and the number of documents that match each term. This can be useful for building an auto-suggest feature or any other feature that operates at the term level instead of the search or document level. Retrieving terms in index order is very fast because the implementation iterates directly over the term dictionary using Lucene's TermsEnum.

In a sense, this component provides fast field-faceting over the whole index, not restricted by the base query or any filters. The document frequencies returned are the number of documents that match the term, including any documents that have been marked for deletion but not yet removed from the index. 

The Terms Component accepts the following parameters, which control which terms are returned:

||Parameter||Description||Syntax||
|terms|If set to true, enables the Terms Component. By default, the Terms Component is off; the {{/terms}} request handler in the sample configuration enables it.|{{terms=\{true\|false\}}}|
|terms.fl|Specifies the field from which to retrieve terms. |{{terms.fl=_field_}}|
|terms.lower|Specifies the term at which to start. If not specified, the empty string is used, causing Solr to start at the beginning of the field. |{{terms.lower=_term_}}|
|terms.lower.incl|If set to true, includes the lower-bound term in the result set. By default, this parameter is set to true.|{{terms.lower.incl=\{true\|false\}}}|
|terms.mincount|Specifies the minimum document frequency a term must have in order to be included in a query response. Results are inclusive of the mincount (that is, >= mincount). This parameter is optional.|{{terms.mincount=_integer_}}|
|terms.maxcount|Specifies the maximum document frequency a term may have in order to be included in a query response. The default setting is -1, which sets no upper bound. Results are inclusive of the maxcount (that is, <= maxcount). This parameter is optional.|{{terms.maxcount=_integer_}}|
|terms.prefix|Restricts matches to terms that begin with the specified string.|{{terms.prefix=\{string\}}}|
|terms.limit|Specifies the maximum number of terms to return. The default is 10. If the limit is set to a number less than 0, then no maximum limit is enforced.|{{terms.limit=_integer_}}|
|terms.upper|Specifies the term to stop at. Any application using the Terms Component must set either {{terms.limit}} or {{terms.upper}}. |{{terms.upper=_upper_term_}}|
|terms.upper.incl|If set to true, includes the upper bound term in the result set. The default is false. |{{terms.upper.incl=\{true\|false\}}}|
|terms.raw|If set to true, returns the raw characters of the indexed term, regardless of whether it is human-readable. For instance, the indexed form of numbers is not human-readable. The default is false.|{{terms.raw=\{true\|false\}}}|

The output is a list of the terms and their document frequency values.
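
As a rough illustration of how the parameters above combine into a request, the following Python sketch assembles a query URL. The helper name {{build_terms_url}}, the keyword-argument naming scheme, and the base URL are assumptions made for this example; they are not part of Solr.

```python
from urllib.parse import urlencode

# Assumed base URL: the /terms handler of the sample configuration on localhost.
TERMS_URL = "http://localhost:8983/solr/terms"


def build_terms_url(field, base_url=TERMS_URL, **options):
    """Build a Terms Component request URL (hypothetical helper).

    Keyword options map to the parameters in the table above with the
    "terms." prefix dropped and dots written as underscores, so
    lower_incl=False becomes terms.lower.incl=false.
    """
    params = [("terms.fl", field)]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # lowercase, as in the Syntax column
        params.append(("terms." + name.replace("_", "."), value))
    return base_url + "?" + urlencode(params)


print(build_terms_url("name", lower="a", lower_incl=False, limit=5))
# http://localhost:8983/solr/terms?terms.fl=name&terms.lower=a&terms.lower.incl=false&terms.limit=5
```

The helper only formats the request; it does not check that a parameter name is one the Terms Component recognizes.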

h2. Examples

The following examples use the sample Solr configuration located in the {{<Solr>/example}} directory and the sample documents in the {{exampledocs}} directory.

h3. Get First 10 Terms

This query requests the first ten terms in the name field: {{http://localhost:8983/solr/terms?terms.fl=name}}

Results:

{code:xml|language=html/xml|borderStyle=solid|borderColor=#666666}
<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2</int>
   </lst>
   <lst name="terms">
      <lst name="name">
         <int name="one">5</int>
         <int name="184">3</int>
         <int name="1gb">3</int>
         <int name="3200">3</int>
         <int name="400">3</int>
         <int name="ddr">3</int>
         <int name="gb">3</int>
         <int name="ipod">3</int>
         <int name="memory">3</int>
         <int name="pc">3</int>
      </lst>
   </lst>
</response>
{code}
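
A client might read this XML response along the following lines. This is a minimal sketch that uses only the Python standard library; the function name {{parse_terms}} is hypothetical, and the sample is an abbreviated copy of the response above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the response shown above (bytes, as received over HTTP).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">2</int>
   </lst>
   <lst name="terms">
      <lst name="name">
         <int name="one">5</int>
         <int name="184">3</int>
         <int name="1gb">3</int>
      </lst>
   </lst>
</response>
"""


def parse_terms(xml_bytes):
    """Return {field: [(term, document_frequency), ...]} in response order."""
    root = ET.fromstring(xml_bytes)
    result = {}
    # Each child <lst> of <lst name="terms"> holds the terms of one field.
    for field in root.findall("./lst[@name='terms']/lst"):
        result[field.get("name")] = [
            (term.get("name"), int(term.text)) for term in field.findall("int")
        ]
    return result


print(parse_terms(SAMPLE))
# {'name': [('one', 5), ('184', 3), ('1gb', 3)]}
```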

h3. Get First 10 Terms, Starting with Letter 'a'

This query requests the first ten terms in the name field, starting at the first term that sorts at or after the letter a: {{http://localhost:8983/solr/terms?terms.fl=name&terms.lower=a}}

Results: 

{code:xml|language=html/xml|borderStyle=solid|borderColor=#666666}
<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">3</int>
   </lst>
   <lst name="terms">
      <lst name="name">
         <int name="one">5</int>
         <int name="ddr">3</int>
         <int name="gb">3</int>
         <int name="ipod">3</int>
         <int name="memory">3</int>
         <int name="pc">3</int>
         <int name="pin">3</int>
         <int name="sdram">3</int>
         <int name="system">3</int>
         <int name="unbuffered">3</int>
      </lst>
   </lst>
</response>
{code}

h2. Using the Terms Component for an Auto-Suggest Feature

If the [Suggester|Suggester] doesn't suit your needs, you can use the Terms Component to build a similar feature for your own search application. Submit a query that specifies whatever characters the user has typed so far as a prefix. For example, if the user has typed "at", the search engine's interface would submit the following query:

{{http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=at}}

Result:

{code:language=html/xml|borderStyle=solid|borderColor=#666666}
<?xml version="1.0" encoding="UTF-8"?>
<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
   </lst>
   <lst name="terms">
      <lst name="name">
         <int name="ata">1</int>
         <int name="ati">1</int>
      </lst>
   </lst>
</response>
{code}

You can use the parameter {{omitHeader=true}} to omit the response header from the query response. The following example does so and also returns the response in JSON format: {{http://localhost:8983/solr/terms?terms.fl=name&terms.prefix=at&indent=true&wt=json&omitHeader=true}}

Result: 

{code:language=none|borderStyle=solid|borderColor=#666666}
{
    "terms": {
        "name": [
            "ata",
            1,
            "ati",
            1
        ]
    }
}
{code}
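
In this JSON form, each field's terms arrive as a flat list that alternates between a term and its document frequency. A suggestion list could be built from it roughly as follows; this is a sketch, the function name {{suggestions}} is hypothetical, and the sample is the response above on one line.

```python
import json

# The JSON response shown above, collapsed onto one line.
SAMPLE = '{"terms": {"name": ["ata", 1, "ati", 1]}}'


def suggestions(json_text, field):
    """Return [(term, document_frequency), ...] for one field."""
    flat = json.loads(json_text)["terms"][field]
    # Even positions hold terms, odd positions hold their document frequencies.
    return list(zip(flat[0::2], flat[1::2]))


print(suggestions(SAMPLE, "name"))
# [('ata', 1), ('ati', 1)]
```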

h2. Distributed Search Support

The Terms Component also supports distributed indexes. For the {{/terms}} request handler, you must provide the following two parameters:

||Parameter||Description||
|shards| Specifies the shards in your distributed indexing configuration. For more information about distributed indexing, see [solr:Distributed Search with Index Sharding]. |
|shards.qt| Specifies the request handler Solr uses for requests to shards. |
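
The following Python sketch shows how a distributed request might be assembled from these two parameters. The shard addresses are placeholders for a two-node test setup, the helper name is hypothetical, and the comma-separated {{host:port/path}} form of {{shards}} is assumed to match your distributed search configuration.

```python
from urllib.parse import urlencode


def build_distributed_terms_url(base_url, field, shards, shards_qt="/terms"):
    """Build a /terms request that fans out to the given shards (hypothetical helper)."""
    params = [
        ("terms.fl", field),
        ("shards", ",".join(shards)),  # comma-separated shard addresses
        ("shards.qt", shards_qt),      # request handler used on each shard
    ]
    # Leave "/", ":" and "," unescaped so the URL stays readable.
    return base_url + "?" + urlencode(params, safe="/:,")


# Placeholder shard addresses; substitute the nodes of your own installation.
print(build_distributed_terms_url(
    "http://localhost:8983/solr/terms",
    "name",
    ["localhost:8983/solr", "localhost:7574/solr"],
))
# http://localhost:8983/solr/terms?terms.fl=name&shards=localhost:8983/solr,localhost:7574/solr&shards.qt=/terms
```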

{scrollbar}

