Posted to solr-user@lucene.apache.org by "Krlin, Jiri" <ji...@bridgepointeducation.com> on 2011/09/20 19:26:34 UTC

Issues with Solr Highlight

Our organization is adopting Solr to facilitate our search functionality.

One of the features we are using is highlighting, so that we can give the user a list of search results with the context in which each hit appears. We are experiencing two issues with the snippets being returned.

I have tried everything I can think of, including searching the Lucidworks documentation and forums, but have come up empty. Any help would be greatly appreciated.

These are the indexing settings:

  <fieldType name="text_general_nohtml" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <!-- in this example, we will only use synonyms at query time
        <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
        -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
  </fieldType>

Given our search parameters:


<lst name="params">

<str name="hl.fragsize">100</str>

<str name="explainOther" />

<str name="indent">on</str>

<str name="hl.fl">*</str>

<str name="wt">standard</str>

<str name="hl">on</str>

<str name="rows">10</str>

<str name="version">2.2</str>

<str name="fl">*</str>

<str name="hl.snippets">10</str>

<str name="start">0</str>

<str name="q">+id:9765 +content:psychology</str>

<str name="qt">standard</str>

<str name="fq" />

</lst>
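
For anyone who wants to reproduce the request, the parameters above can be turned into a query URL. This is only a minimal sketch: the endpoint `http://localhost:8983/solr/select` is an assumption (it is not stated in this post), and `build_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Assumed endpoint; replace host, port, and path with your own Solr instance.
BASE_URL = "http://localhost:8983/solr/select"

# Parameters copied from the <lst name="params"> block above.
PARAMS = {
    "q": "+id:9765 +content:psychology",
    "fq": "",
    "fl": "*",
    "start": "0",
    "rows": "10",
    "qt": "standard",
    "wt": "standard",
    "version": "2.2",
    "indent": "on",
    "explainOther": "",
    "hl": "on",
    "hl.fl": "*",
    "hl.fragsize": "100",
    "hl.snippets": "10",
}


def build_url(base: str, params: dict) -> str:
    """Return the request URL with all parameters percent-encoded."""
    return f"{base}?{urlencode(params)}"


url = build_url(BASE_URL, PARAMS)
print(url)

# Round-trip check: decoding the query string gives back the original query.
print(parse_qs(urlsplit(url).query)["q"][0])
```

Encoding matters here because the query contains `+` and `:`; sent unencoded, a `+` in a URL is read as a space.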


1) The hit text appears in a random location within the fragment.

Below is a highlight result when searching for "psychology". As can be seen, the hit text appears at a random position within each returned fragment. We need the hit word to appear at the center of the fragment. How can this be achieved?


<lst name="highlighting">

<lst name="9765">

<arr name="id">

<str><em>9765</em></str>

</arr>

<arr name="content">

<str></span><span id="w202" class="werd"> of</span><span id="w203" class="werd"> the</span><span id="w204" class="werd"> <em>psychology</em></span><span id="w205" class="werd"> department</str>

<str></span><span id="w215" class="werd"> in</span><span id="w216" class="werd"> education</span><span id="w217" class="werd"> and</span><span id="w218" class="werd"> <em>psychology</em></str>

<str></span><span id="w248" class="werd"> in</span><span id="w249" class="werd"> counseling</span><span id="w250" class="werd"> <em>psychology</em></str>

<str></span><span id="w260" class="werd"> and</span><span id="w261" class="werd"> sports</span><span id="w262" class="werd"> <em>psychology</em>,</span><span id="w263" class="werd"> Ron</str>

</arr>

</lst>

</lst>


2) If two instances of the hit text are within hl.fragsize of each other, only one highlight result is returned instead of the expected two, so each returned result contains multiple hits.

We would still like to get 4 distinct results. How can this be achieved?


<lst name="highlighting">

<lst name="9765">

<arr name="id">

<str><em>9765</em></str>

</arr>

<arr name="content">

<str></span><span id="w198" class="werd"> Mossler</span><span id="w199" class="werd"> is</span><span id="w200" class="werd"> currently</span><span id="w201" class="werd"> chair</span><span id="w202" class="werd"> of</span><span id="w203" class="werd"> the</span><span id="w204" class="werd"> <em>psychology</em></span><span id="w205" class="werd"> department</span><span id="w206" class="werd"> at</span><span id="w207" class="werd"> Los</span><span id="w208" class="werd"> Angeles</span><span id="w209" class="werd"> Valley</span><span id="w210" class="werd"> College.</span><span id="w211" class="werd"> He</span><span id="w212" class="werd"> began</span><span id="w213" class="werd"> his</span><span id="w214" class="werd"> career</span><span id="w215" class="werd"> in</span><span id="w216" class="werd"> education</span><span id="w217" class="werd"> and</span><span id="w218" class="werd"> <em>psychology</em></span><span id="w219" class="werd"> in</span><span id="w220" class="werd"> 1980</span><span id="w221" class="werd"> when</str>

<str></span><span id="w246" class="werd"> from</span><span id="w247" class="werd"> UCLA</span><span id="w248" class="werd"> in</span><span id="w249" class="werd"> counseling</span><span id="w250" class="werd"> <em>psychology</em>.</span><span id="w251" class="werd"> In</span><span id="w252" class="werd"> addition</span><span id="w253" class="werd"> to</span><span id="w254" class="werd"> numerous</span><span id="w255" class="werd"> magazine</span><span id="w256" class="werd"> columns</span><span id="w257" class="werd"> on</span><span id="w258" class="werd"> childhood</span><span id="w259" class="werd"> discipline</span><span id="w260" class="werd"> and</span><span id="w261" class="werd"> sports</span><span id="w262" class="werd"> <em>psychology</em>,</span><span id="w263" class="werd"> Ron</span><span id="w264" class="werd"> has</span><span id="w265" class="werd"> authored</span><span id="w266" class="werd"> and</span><span id="w267" class="werd"> contributed</span><span id="w268" class="werd"> to</span><span id="w269" class="werd"> reading</str>

</arr>

</lst>

</lst>
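
To make issue 2 concrete, here is a toy model of the behavior described above. It is not Solr's highlighter code, only a simplified illustration under the assumption that the text is cut into consecutive windows of roughly the fragment size and that every window containing the term becomes one result. The function name `fragments_with_hits` and the sample text (the sentence from the output above, with the span markup removed) are introduced purely for this sketch.

```python
TEXT = (
    "chair of the psychology department at Los Angeles Valley College. "
    "He began his career in education and psychology in 1980"
)


def fragments_with_hits(text: str, term: str, fragsize: int) -> list[str]:
    """Cut text into windows of at most about `fragsize` characters,
    breaking only at whitespace, and return the windows containing `term`."""
    windows: list[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if current and len(candidate) > fragsize:
            # Adding this word would overflow the window: close it.
            windows.append(current)
            current = word
        else:
            current = candidate
    if current:
        windows.append(current)
    return [w for w in windows if term in w]


# Small windows: each hit lands in its own fragment.
small = fragments_with_hits(TEXT, "psychology", 40)
# Large window: both hits fall into a single fragment.
large = fragments_with_hits(TEXT, "psychology", 200)

print(len(small), len(large))
```

In this model the number of results depends on how hits fall relative to window boundaries, not on the number of hits, which matches what we see: two nearby hits share one fragment once the window is wide enough to hold both.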


John Krlin  |  Software Developer
Bridgepoint Education  |  Higher access to higher education

858.668.2586 x 4904 (office)
760.505.0814 (cell)
jiri.krlin@bridgepointeducation.com
www.bridgepointeducation.com