Posted to solr-user@lucene.apache.org by Julien Massiera <ju...@francelabs.com> on 2018/07/12 08:48:56 UTC

Unified highlighter

Hi Solr community,

I would like some help with a strange behavior that I observe with the 
unified highlighter.

Here is the configuration of my highlighter:

<str name="hl">on</str>
<str name="hl.method">unified</str>
<str name="hl.defaultSummary">false</str>
<str name="hl.tag.pre">&lt;span class="em"&gt;</str>
<str name="hl.tag.post">&lt;/span&gt;</str>
<str name="hl.fl">content_fr content_en exactContent</str>
<str name="hl.requireFieldMatch">true</str>
<str name="hl.bs.type">CHARACTER</str>
<str name="hl.encoder">html</str>
<str name="hl.fragsize">200</str>
<str name="hl.maxAnalyzedChars">51200</str>


I indexed some html documents from the www.datafari.com website.

The problem is that for some documents (not all), the highlighted 
fragment contains too little context around the matched search terms.

For example, when searching for "France labs", here is the highlighting 
obtained for one such document:

"content_en":["<span class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]
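
To make the size of that fragment concrete, here is a small Python 
sketch that decodes the HTML entities and strips the highlight tags. 
The fragment string is copied from the response above with the JSON 
escaping removed; the helper name plain_text is only illustrative and 
is not part of Solr.

```python
import html
import re

# Fragment returned with hl.bs.type=CHARACTER (JSON escaping removed).
fragment = '<span class="em">France</span>&#32;<span class="em">Labs</span>'

def plain_text(snippet):
    """Decode HTML entities, then drop the highlight tags."""
    decoded = html.unescape(snippet)        # &#32; becomes a space
    return re.sub(r"<[^>]+>", "", decoded)  # remove <span ...> and </span>

text = plain_text(fragment)
print(text, len(text))  # France Labs 11
```

So the fragment is 11 characters of text, far below the hl.fragsize of 
200.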

Now, if I perform the same query with hl.bs.type set to SENTENCE 
instead of CHARACTER, I obtain the following highlighting for the same 
document:

"content_en":["Trusted&#32;by&#32;About&#32;Contact&#32;Home&#32;Migrating&#32;GSA&#32;&#169;&#32;2018&#32;Datafari&#32;by&#32;<span 
class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]
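
For anyone who wants to compare the two settings, here is a rough 
Python sketch that builds one request URL per hl.bs.type value. The 
host, port and core name (localhost:8983, mycore) are placeholders and 
not my real setup, and it assumes the highlighting parameters are 
declared as handler defaults (not invariants), so that request 
parameters can override them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def highlight_url(bs_type, fragsize=200):
    """Build a select URL that overrides hl.bs.type for a single request."""
    params = {
        "q": "France labs",
        "hl": "on",
        "hl.method": "unified",
        "hl.fl": "content_fr content_en exactContent",
        "hl.bs.type": bs_type,
        "hl.fragsize": fragsize,
    }
    return SOLR_SELECT + "?" + urlencode(params)

# One URL per boundary scanner type, ready to paste into a browser or curl.
for bs_type in ("CHARACTER", "SENTENCE"):
    print(highlight_url(bs_type))
```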

This is much better, but I strongly prefer the WORD or CHARACTER 
types because fragments can become too long with the SENTENCE or LINE 
types, depending on the indexed documents.

I tried changing hl.bs.type to WORD, and also increasing hl.fragsize 
up to 1000, but with any hl.bs.type other than SENTENCE or LINE, the 
highlighting is limited to the matched words only, which is not enough 
for what I need.

Is there something I am missing in the configuration? For information, 
I am using Solr 6.6.4.

Thanks for your help.

Julien