You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@lucene.apache.org by "Cassandra Targett (Confluence)" <co...@apache.org> on 2013/09/13 17:50:00 UTC

[CONF] Apache Solr Reference Guide > FastVector Highlighter

Space: Apache Solr Reference Guide (https://cwiki.apache.org/confluence/display/solr)
Page: FastVector Highlighter (https://cwiki.apache.org/confluence/display/solr/FastVector+Highlighter)

Change Comment:
---------------------------------------------------------------------
changes in progress: add params for FVH, reformat

Edited by Cassandra Targett:
---------------------------------------------------------------------
The FastVectorHighlighter is a TermVector-based highlighter that offers higher performance than the standard highlighter in many cases. To use the FastVectorHighlighter, set the {{hl.useFastVectorHighlighter}} parameter to {{true}}.

You must also turn on {{termVectors}}, {{termPositions}}, and {{termOffsets}} for each field that will be highlighted. Lastly, you should use a boundary scanner to prevent the FastVectorHighlighter from truncating your terms. In most cases, using the {{breakIterator}} boundary scanner will give you excellent results. See the section [Using Boundary Scanners with the Fast Vector Highlighter|#Using Boundary Scanners with the Fast Vector Highlighter] for more details about boundary scanners.

h2. FastVector Highlighter Parameters

The table below describes Solr's parameters for this highlighter. These parameters can be defined in the highlight search component, as defaults for the specific request handler, or passed to the request handler with the query.

|| Parameter || Default || Description ||
| hl | blank (no highlighting) | When set to *true*, enables highlighted snippets to be generated in the query response. A *false* or blank value disables highlighting. |
| hl.useFastVectorHighligter | false | When set to *true*, enables the FastVector Highlighter. |
| hl.q | blank | Specifies an overriding query term for highlighting. If {{hl.q}} is specified, the highlighter will use that term rather than the main query term. |
| hl.fl | blank | Specifies a list of fields to highlight. Accepts a comma\- or space-delimited list of fields for which Solr should generate highlighted snippets. If left blank, highlights the defaultSearchField (or the field specified the {{df}} parameter if used) for the StandardRequestHandler. For the DisMaxRequestHandler, the {{qf}} fields are used as defaults. \\
\\
A '\*' can be used to match field globs, such as 'text_\*' or even '\*' to highlight on all fields where highlighting is possible. When using '\*', consider adding {{hl.requireFieldMatch=true}}. |
| hl.snippets | 1 | Specifies maximum number of highlighted snippets to generate per field. It is possible for any number of snippets from zero to this value to be generated. This parameter accepts per-field overrides. |
| hl.fragsize | 100 | Specifies the size, in characters, of fragments to consider for highlighting. *0* indicates that no fragmenting should be considered and the whole field value should be used. This parameter accepts per-field overrides. | 
| hl.requireFieldMatch | false | If set to *true*, highlights terms only if they appear in the specified field. If *false*, terms are highlighted in all requested fields regardless of which field matched the query. | 
| hl.maxMultiValuedToExamine | {{integer.MAX_VALUE}} | Specifies the maximum number of entries in a multi-valued field to examine before stopping. This can potentially return zero results if the limit is reached before any matches are found. If used with the {{maxMultiValuedToMatch}}, whichever limit is reached first will determine when to stop looking. | 
| hl.maxMultiValuedToMatch | {{integer.MAX_VALUE}} | Specifies the maximum number of matches in a multi-valued field that are found before stopping. If {{hl.maxMultiValuedToExamine}} is also defined, whichever limit is reached first will determine when to stop looking. | 
| hl.alternateField | blank | Specifies a field to be used as a backup default summary if Solr cannot generate a snippet (i.e., because no terms match). This parameter accepts per-field overrides. | 
| hl.maxAlternateFieldLength | unlimited | Specifies the maximum number of characters of the field to return. Any value less than or equal to 0 means the field's length is unlimited. This parameter is only used in conjunction with the {{hl.alternateField}} parameter. | 
| hl.tag.pre \\
hl.tag.post | <em> and </em> | Specifies the text that should appear before ({{hl.tag.pre}}) and after ({{hl.tag.post}}) a highlighted term. This parameter accepts per-field overrides. | 
| hl.phraseLimit | {{integer.MAX_VALUE}} | To improve the performance of the FastVectorHighlighter, you can set a limit on the number (int) of phrases to be analyzed for highlighting. |
| hl.usePhraseHighlighter | true | If set to *true*, Solr will use the Lucene SpanScorer class to highlight phrase terms only when they appear within the query phrase in the document. | 
| hl.preserveMulti | | If *true*, multi-valued fields will return all values in the order they were saved in the index. If *false*, the default, only values that match the highlight request will be returned. |
| hl.fragListBuilder | simple | |
| hl.fragmentsBuilder | | |



| hl.boundaryScanner | | Specifies one of two boundary scanners to use with the FastVectorHighlighter: {{simple}} or {{breakIterator}}. See the section [Using Boundary Scanners with the Fast Vector Highlighter|#Using Boundary Scanners with the Fast Vector Highlighter] for more information about the boundary scanners. |

h2. Using Boundary Scanners with the Fast Vector Highlighter

The Fast Vector Highlighter will occasionally truncate highlighted words. To prevent this, implement a boundary scanner in {{solrconfig.xml}}, then use the {{hl.boundaryScanner}} parameter to specify the boundary scanner for highlighting.

Solr supports two boundary scanners: {{breakIterator}} and {{simple}}.

h5. The {{breakIterator}} Boundary Scanner

The {{breakIterator}} boundary scanner offers excellent performance right out of the box by taking locale and boundary type into account. In most cases you will want to use the {{breakIterator}} boundary scanner. To implement the {{breakIterator}} boundary scanner, add this code to the {{highlighting}} section of your {{solrconfig.xml}} file, adjusting the type, language, and country values as appropriate to your application:

{code:xml|borderStyle=solid|borderColor=#666666}
<boundaryScanner name="breakIterator" class="solr.highlight.BreakIteratorBoundaryScanner">
  <lst name="defaults">
    <str name="hl.bs.type">WORD</str>
    <str name="hl.bs.language">en</str>
    <str name="hl.bs.country">US</str>
 </lst>
</boundaryScanner>
{code}

Possible values for the {{hl.bs.type}} parameter are WORD, LINE, SENTENCE, and CHARACTER.

h5. The {{simple}} Boundary Scanner

The {{simple}} boundary scanner scans term boundaries for a specified maximum character value and for common delimiters such as punctuation marks. The {{simple}} boundary scanner may be useful for some custom To implement the {{simple}} boundary scanner, add this code to the {{highlighting}} section of your {{solrconfig.xml}} file, adjusting the values as appropriate to your application:

{code:xml|borderStyle=solid|borderColor=#666666}
<boundaryScanner name="simple" class="solr.highlight.SimpleBoundaryScanner" default="true">
<lst name="defaults">
 <str name="hl.bs.maxScan">10</str>
 <str name="hl.bs.chars">.,!?\t\n</str>
 </lst>
</boundaryScanner>
{code}

{scrollbar}


Stop watching space: https://cwiki.apache.org/confluence/users/removespacenotification.action?spaceKey=solr
Change email notification preferences: https://cwiki.apache.org/confluence/users/editmyemailsettings.action