Posted to commits@lucene.apache.org by "Cassandra Targett (Confluence)" <co...@apache.org> on 2013/08/13 19:28:00 UTC

[CONF] Apache Solr Reference Guide > Suggester

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: Suggester (https://cwiki.apache.org/confluence/display/solr/Suggester)


Edited by Cassandra Targett:
---------------------------------------------------------------------
{section}
{column:width=75%}
Solr includes an autosuggest component called Suggester, which is built on the [SpellCheck search component|Spell Checking]. The autocomplete suggestions that Suggester provides come from a dictionary that is either based on the main index or on a dictionary file that you provide. It is common to provide only the top-N suggestions, either ranked alphabetically or according to their usefulness for an average user (such as popularity or the number of returned results).

Because this feature is based on the [SpellCheck search component|Spell Checking], configuring Suggester is similar to configuring spell checking. Unlike the SpellCheck Component, however, Suggester has no direct indexing option at this time.

In {{solrconfig.xml}}, we need to add a search component and a request handler.
{column}
{column:width=25%}
{panel}
Covered in this section:
{toc:maxLevel=2}
{panel}
{column}
{section}

h2. Adding the Suggest Search Component

The first step is to add a search component to {{solrconfig.xml}} to extend the SpellChecker. Here is some sample code that could be used.

{code:xml|borderStyle=solid|borderColor=#666666}
<searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">name</str>  <!-- the indexed field to derive suggestions from -->
      <float name="threshold">0.005</float>
      <str name="buildOnCommit">true</str>
<!--
      <str name="sourceLocation">american-english</str>
-->
   </lst>
</searchComponent>
{code}

One of the most important parameters is {{lookupImpl}}, which is described in more detail below. In this example, {{sourceLocation}} is commented out, so no dictionary file is used. Instead, the field named by the {{field}} parameter serves as the dictionary. The unused {{sourceLocation}} is included in the example to demonstrate its usage.

h3. Suggester Search Component Parameters

The Suggester search component takes the following configuration parameters:

|| Parameter || Description ||
| searchComponent name | Arbitrary name for the search component. |
| name | A symbolic name for this spellchecker. You can refer to this name in the URL parameters and in the SearchHandler configuration. |
| classname | The full class name of the component: {{org.apache.solr.spelling.suggest.Suggester}} |
| lookupImpl | Lookup implementation. Choose one of these four: \\
\\
{{{*}org.apache.solr.spelling.suggest.fst.FSTLookup{*}}}: automaton-based lookup. This implementation is slower to build, but has the lowest memory cost. We recommend using this implementation unless you need more sophisticated matching results, in which case you should use the Jaspell implementation. \\
\\
{{{*}org.apache.solr.suggest.wfst.WFSTLookup{*}}}: weighted automaton representation; an alternative to FSTLookup for more fine-grained ranking. WFSTLookup does not use buckets, but instead a shortest path algorithm. Note that it expects weights to be whole numbers. If a weight is missing, it is assumed to be 1.0. Weights affect the sorting of matching suggestions when {{spellcheck.onlyMorePopular=true}} is selected: weights are treated as a "popularity" score, and suggestions with higher weights are preferred over those with lower weights. \\
\\
{{{*}org.apache.solr.spelling.suggest.jaspell.JaspellLookup{*}}}: a more complex lookup based on a ternary trie from the [JaSpell|http://jaspell.sourceforge.net/] project. Use this implementation if you need more sophisticated matching results. \\
\\
{{{*}org.apache.solr.spelling.suggest.tst.TSTLookup{*}}}: a simple, compact lookup based on a ternary trie.\\
\\
All four implementations will likely run at similar speed when requests are made through HTTP. Direct benchmarks of these classes indicate that FSTLookup performs better than the other three implementations, and at a much lower memory cost. |
| buildOnCommit or buildOnOptimize | Both are *false* by default. Set {{buildOnCommit}} to *true* to rebuild the lookup data structure after every commit, or set {{buildOnOptimize}} to *true* to rebuild it only when the index is optimized. If both are *false*, the lookup data is built only when requested with the URL parameter {{spellcheck.build=true}}. {note}Currently implemented lookups keep their data in memory, so unlike spellchecker data, this data is discarded on core reload and is not available until you invoke the build command, either explicitly or implicitly during a commit.{note} |
| queryConverter | Allows defining an alternate converter that can parse phrases in dictionary files. It passes the whole string to the query analyzer rather than analyzing it for spelling. Define it in {{solrconfig.xml}} as {{<queryConverter name="queryConverter" class="org.apache.solr.spelling.SuggestQueryConverter"/>}}. |
| sourceLocation | The path to the dictionary file. If this value is empty then the main index will be used as a source of terms and weights. |
| field | If {{sourceLocation}} is empty then terms from this field in the index will be used when building the trie. See also the section [#Defining a Field for Suggester] for more information on setting up a field to use. |
| threshold | A value between zero and one representing the minimum fraction of the total documents where a term should appear in order to be added to the lookup dictionary.\\
\\
When you use the index as the dictionary, you may encounter many invalid or uncommon terms. The {{threshold}} parameter addresses this issue. By setting the {{threshold}} parameter to a value just above zero, you can greatly reduce the number of unusable terms in your dictionary while maintaining most of the common terms. The example above sets the {{threshold}} value to 0.5%. The {{threshold}} parameter does not affect file-based dictionaries. |
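
As a sketch of how these parameters combine, the fragment below rebuilds the lookup only on optimize and registers the {{SuggestQueryConverter}} described in the table. The component name and field are carried over from the earlier example. Placing {{queryConverter}} as a sibling of the search component in {{solrconfig.xml}} is an assumption about your configuration layout.

{code:xml|borderStyle=solid|borderColor=#666666}
<searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">name</str>
      <!-- rebuild the lookup only when the index is optimized, not on every commit -->
      <str name="buildOnOptimize">true</str>
   </lst>
</searchComponent>

<!-- pass the whole query string to the query analyzer (assumed placement: top level of solrconfig.xml) -->
<queryConverter name="queryConverter" class="org.apache.solr.spelling.SuggestQueryConverter"/>
{code}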

h3. Using a Dictionary File

If you use a dictionary file, it should be a plain text file in UTF-8 encoding. Blank lines and lines that start with a '#' are ignored. Each remaining line must consist of either a string without literal TAB (\u0009) characters, or a string followed by a TAB and a floating-point weight. You can use both single terms and phrases in a dictionary file.

{code:xml|borderStyle=solid|borderColor=#666666}
# This is a sample dictionary file.

acquire
accidentally\t2.0
accommodate\t3.0
{code}
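
To use a file like this instead of the index, set {{sourceLocation}} and leave out {{field}}. The following is a minimal sketch, assuming the file is saved under the hypothetical name {{suggest-dictionary.txt}} in a location the core can resolve (assumed here to be its {{conf}} directory).

{code:xml|borderStyle=solid|borderColor=#666666}
<searchComponent class="solr.SpellCheckComponent" name="suggest">
   <lst name="spellchecker">
      <str name="name">suggest</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <!-- hypothetical file name; no field or threshold, because the index is not the source -->
      <str name="sourceLocation">suggest-dictionary.txt</str>
   </lst>
</searchComponent>
{code}

Because lookups are held in memory, the dictionary still has to be built, for example with {{spellcheck.build=true}}, before suggestions are returned.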

h2. Adding the Suggest Request Handler

After adding the search component, a request handler must be added to {{solrconfig.xml}}. This request handler will set a number of parameters for serving suggestion requests and incorporate the "suggest" search component defined in the previous step. Because the Suggester is based on the SpellCheckComponent, the request handler shares many of the same parameters.

{code:xml|borderStyle=solid|borderColor=#666666}
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest">
   <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str name="spellcheck.count">5</str>
      <str name="spellcheck.collate">true</str>
   </lst>
   <arr name="components">
      <str>suggest</str>
   </arr>
</requestHandler>
{code}

h3. Suggest Request Handler Parameters

The Suggest request handler takes the following configuration parameters:

|| Parameter || Description ||
| spellcheck=true | This parameter should always be true, because we always want to run the Suggester for queries submitted to this handler. |
| spellcheck.dictionary | The name of the dictionary to use: the {{name}} value of the spellchecker configured in the search component. |
| spellcheck.onlyMorePopular | If true, suggestions are sorted by weight ("popularity"), which is the recommended setting. The {{spellcheck.count}} parameter then effectively limits the result to a top-N list of the best suggestions. If false, suggestions are sorted alphabetically. |
| spellcheck.count | Specifies the number of suggestions for Solr to return. |
| spellcheck.collate | If true, Solr provides a query collated with the first matching suggestion. |
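
As an illustration of these parameters, the variant below returns up to ten suggestions in alphabetical order rather than by weight. The handler name {{/suggest-alpha}} is hypothetical; everything else reuses the configuration above.

{code:xml|borderStyle=solid|borderColor=#666666}
<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/suggest-alpha">
   <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <!-- false: sort suggestions alphabetically instead of by weight -->
      <str name="spellcheck.onlyMorePopular">false</str>
      <str name="spellcheck.count">10</str>
   </lst>
   <arr name="components">
      <str>suggest</str>
   </arr>
</requestHandler>
{code}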

h2. Defining a Field for Suggester

Any field can be used as the basis of the dictionary (if not using an explicit dictionary file). You may want to create a custom field for this purpose, and use the copy fields feature to copy text from various fields to the dedicated "suggester" field.

{code:xml|borderStyle=solid|borderColor=#666666}
 <field indexed="true" multiValued="true" name="suggestions" stored="false" type="textSpell"/>
{code}
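
The copying step might look like the following in {{schema.xml}}. The source field names {{name}} and {{title}} are placeholders for whatever fields your schema actually defines.

{code:xml|borderStyle=solid|borderColor=#666666}
<!-- copy text from existing fields into the dedicated suggestions field -->
<copyField source="name" dest="suggestions"/>
<copyField source="title" dest="suggestions"/>
{code}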

You may want to define a custom {{fieldType}} in {{schema.xml}} to prevent over-analysis of the content of a field for use in suggestions. For example, if you have some analysis that stems terms, you wouldn't want the stemmed terms in the suggestion list, since the stemmed forms of words would be presented to users. Here is an example that could be used: 

{code:xml|borderStyle=solid|borderColor=#666666}
<fieldType class="solr.TextField" name="textSpell" positionIncrementGap="100">
   <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
   </analyzer>
</fieldType> 
{code}

Once the field is configured, reference it in the [Suggest search component|#Adding the Suggest Search Component] with the {{field}} parameter.

h2. Related Topics

* [solr:RequestHandlers and SearchComponents in SolrConfig]
* [solr:Solr Field Types]
* [solr:Copying Fields]

{scrollbar}

