Posted to solr-user@lucene.apache.org by Shamik Bandopadhyay <sh...@gmail.com> on 2014/11/11 00:17:42 UTC

Highlighting simple.pre and simple.post values getting ignored

Hi,

  I'm facing a weird issue where the "hl.simple.pre" and "hl.simple.post"
values I've specified for highlighting are being ignored. In my test
handler, I have the following entries:

<!-- Highlighting defaults -->
<str name="hl">true</str>
<str name="hl.simple.pre"><![CDATA[<span class="vivbold qt0">]]></str>
<str name="hl.simple.post"><![CDATA[</span>]]></str>
<str name="hl.fl">name subject</str>
<str name="hl.encoder">html</str>
<str name="f.subject.hl.fragsize">200</str>
<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>


 <searchComponent class="solr.HighlightComponent" name="highlight">
    <highlighting>
      <fragmenter name="gap"
                  default="true"
                  class="solr.highlight.GapFragmenter">
        <lst name="defaults">
          <int name="hl.fragsize">100</int>
        </lst>
      </fragmenter>

      <fragmenter name="regex"
                  class="solr.highlight.RegexFragmenter">
        <lst name="defaults">
          <int name="hl.fragsize">70</int>
          <float name="hl.regex.slop">0.5</float>
          <str name="hl.regex.pattern">[-\w ,/\n\&quot;&apos;]{20,200}</str>
        </lst>
      </fragmenter>

      <formatter name="html"
                 default="true"
                 class="solr.highlight.HtmlFormatter">
        <lst name="defaults">
          <str name="hl.simple.pre"><![CDATA[<span class="vivbold qt0">]]></str>
          <str name="hl.simple.post"><![CDATA[</span>]]></str>
        </lst>
      </formatter>

      <encoder name="html"
               class="solr.highlight.HtmlEncoder" />

      <fragListBuilder name="simple"
                       class="solr.highlight.SimpleFragListBuilder"/>

      <fragListBuilder name="single"
                       class="solr.highlight.SingleFragListBuilder"/>

      <fragListBuilder name="weighted"
                       default="true"
                       class="solr.highlight.WeightedFragListBuilder"/>

      <!-- default tag FragmentsBuilder -->
      <fragmentsBuilder name="default"
                        default="true"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">
      </fragmentsBuilder>

      <!-- multi-colored tag FragmentsBuilder -->
      <fragmentsBuilder name="colored"
                        class="solr.highlight.ScoreOrderFragmentsBuilder">
        <lst name="defaults">
          <str name="hl.tag.pre"><![CDATA[
               <b style="background:yellow">,<b style="background:lawgreen">,
               <b style="background:aquamarine">,<b style="background:magenta">,
               <b style="background:palegreen">,<b style="background:coral">,
               <b style="background:wheat">,<b style="background:khaki">,
               <b style="background:lime">,<b style="background:deepskyblue">]]></str>
          <str name="hl.tag.post"><![CDATA[</b>]]></str>
        </lst>
      </fragmentsBuilder>

      <boundaryScanner name="default"
                       default="false"
                       class="solr.highlight.SimpleBoundaryScanner">
        <lst name="defaults">
          <str name="hl.bs.maxScan">10</str>
          <str name="hl.bs.chars">.,!? &#9;&#10;&#13;</str>
        </lst>
      </boundaryScanner>

      <boundaryScanner name="breakIterator"
                       class="solr.highlight.BreakIteratorBoundaryScanner">
        <lst name="defaults">
          <!-- type should be one of CHARACTER, WORD(default), LINE and SENTENCE -->
          <str name="hl.bs.type">SENTENCE</str>
          <!-- language and country are used when constructing Locale object.  -->
          <!-- And the Locale object will be used when getting instance of BreakIterator -->
          <str name="hl.bs.language">en</str>
          <str name="hl.bs.country">US</str>
        </lst>
      </boundaryScanner>
    </highlighting>
  </searchComponent>

As you can see, I've specified the simple.pre and simple.post values in the
request handler as well as under the standard (html) formatter.

However, the search results always wrap the matched term in <em></em>, and
I'm not sure where that value is coming from. There's no reference to it in
the solrconfig file. It looks like Solr is ignoring the values from
solrconfig and defaulting to <em>.

Can someone provide any pointers? I'm using Solr 4.7.

Thanks,
Shamik

Re: Highlighting simple.pre and simple.post values getting ignored

Posted by shamik <sh...@gmail.com>.
Found the issue: when using FastVectorHighlighter, the pre and post tags are
set with different parameters (hl.tag.pre and hl.tag.post instead of
hl.simple.pre and hl.simple.post):

<str name="hl.tag.pre"></str>
<str name="hl.tag.post"></str>

This worked out as expected.
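
For reference, a minimal sketch of what the handler defaults might look like
with this change. The tag values in this reply were stripped by the archive,
so the span markup below is an assumption copied from the original message,
not necessarily the exact values used. FastVectorHighlighter falls back to
<em> when hl.tag.pre is not set, which would explain the <em></em> wrapping
seen earlier.

```xml
<!-- Sketch only: span markup is assumed from the original message -->
<str name="hl">true</str>
<str name="hl.fl">name subject</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>
<!-- FastVectorHighlighter reads hl.tag.pre / hl.tag.post,
     not hl.simple.pre / hl.simple.post -->
<str name="hl.tag.pre"><![CDATA[<span class="vivbold qt0">]]></str>
<str name="hl.tag.post"><![CDATA[</span>]]></str>
```

Leaving the hl.simple.pre and hl.simple.post entries in place alongside these
should be harmless, since they only apply when the standard highlighter is
used.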



--
View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-simple-pre-and-simple-post-values-getting-ignored-tp4168657p4168663.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Highlighting simple.pre and simple.post values getting ignored

Posted by shamik <sh...@gmail.com>.
It looks like this has to do with selecting the fast vector highlighter and
breakIterator as the boundary scanner. I'm using them to make sure that the
highlighted snippet starts at the beginning of a sentence and not in
the middle.

<str name="hl.usePhraseHighlighter">false</str>
<str name="hl.useFastVectorHighlighter">true</str>
<str name="hl.boundaryScanner">breakIterator</str>

If I don't use them, I get the right pre and post tags:

<str name="hl">on</str>
<str name="hl.fl">title name</str>
<str name="hl.encoder">html</str>
<str name="hl.simple.pre"></str>
<str name="hl.simple.post"></str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">manu</str>
<str name="f.name.hl.fragsize">0</str>
<str name="f.name.hl.alternateField">name</str>
<str name="f.content.hl.snippets">3</str>
<str name="f.content.hl.fragsize">200</str>


Do I need any separate setting for "breakIterator" to support custom pre and
post tags?



--
View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-simple-pre-and-simple-post-values-getting-ignored-tp4168657p4168662.html