Posted to solr-user@lucene.apache.org by "Ramanathapuram, Rajesh" <Ra...@turner.com> on 2011/04/20 15:43:03 UTC

Solr - Multi Term highlighting issue

Hello,

I am dealing with a highlighting issue in Solr; I will try to explain it.

When I search for a single term in Solr, it wraps an <em> tag around the
words I want to highlight, and all works well.
But if I search for multiple terms, highlighting mostly works; for some
of the terms, however, the highlight returns multiple terms in a single <em> tag:
    ... <em>srchtrm1) <br><b><p>.... srchtrm2</em>
I expect Solr to return the highlighted terms like this:
    ... <em>srchtrm1</em>) <br><b><p>... <em>srchtrm2</em>

When I search for 'US mec chile', here is how my results appear:
  ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: We had ... with <em>US</em> and <em>Chile</em> ...,
  (<em>MEC)</b></p><p></p><p><b>US</em> ....

This is what I was expecting instead:
  ... Corboba. (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST FIRES: We had ... with <em>US</em> and <em>Chile</em> ...,
  (<em>MEC</em>)</b></p><p></p><p><b><em>US</em> ....
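
One way to catch this symptom in a regression test is to scan returned snippets for <em> spans that swallowed other HTML markup. A minimal Python sketch, assuming the <em>/</em> tags shown above (swap in whatever hl.simple.pre / hl.simple.post values you actually send); the helper name merged_spans is hypothetical, and "contains a tag" is only a heuristic for "several terms share one <em>":

```python
import re

# Highlight spans using the <em>/</em> tags from this thread (adjust if you
# pass different hl.simple.pre / hl.simple.post values).
EM_SPAN = re.compile(r"<em>(.*?)</em>", re.DOTALL)
# Any HTML open or close tag, e.g. </b> or <p>.
TAG = re.compile(r"</?[A-Za-z][^>]*>")

def merged_spans(snippet):
    """Return the text of every <em> span that contains other HTML tags,
    the symptom reported here when several terms share one <em>."""
    return [m.group(1) for m in EM_SPAN.finditer(snippet) if TAG.search(m.group(1))]

observed = ("... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: "
            "We had ... with <em>US</em> and <em>Chile</em> ...")
print(merged_spans(observed))
```

On the "expected" output above, the same check returns an empty list.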

Here are my query params (as echoed in the response header):
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">26</int>
    <lst name="params">
      <str name="hl.fragsize">100000</str>
      <str name="explainOther" />
      <str name="indent">on</str>
      <str name="hl.fl">story, slug</str>
      <str name="wt">standard</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="hl.highlightMultiTerm">true</str>
      <str name="fl">*</str>
      <str name="start">0</str>
      <str name="q">mec us chile</str>
      <str name="qt">standard</str>
      <str name="hl.usePhraseHighlighter">true</str>
      <str name="fq">storyid="XXXX XXXX XXXXX"</str>
    </lst>
  </lst>
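
To reproduce the request outside a browser, the same parameters can be assembled into a request URL. A minimal Python sketch that only builds the URL (no network call): the base URL is the one from the request quoted in this thread, so treat host, port and core name as placeholders, and the elided storyid filter is kept as-is.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Base URL from the request quoted in this thread; host, port and core are placeholders.
BASE = "http://localhost:8983/searchsolr/mainCore/select"

# Parameters as echoed back in the responseHeader (the empty explainOther is left out).
params = {
    "q": "mec us chile",
    "fq": 'storyid="XXXX XXXX XXXXX"',  # elided story ID, left as a placeholder
    "start": 0,
    "rows": 10,
    "fl": "*",
    "qt": "standard",
    "wt": "standard",
    "version": "2.2",
    "indent": "on",
    "hl": "on",
    "hl.fl": "story, slug",
    "hl.fragsize": 100000,
    "hl.highlightMultiTerm": "true",
    "hl.usePhraseHighlighter": "true",
}

# urlencode percent-escapes the values (spaces become "+", quotes become %22).
url = BASE + "?" + urlencode(params)
print(url)
```

Erick's suggestion from later in the thread would be one more entry in the dict: "hl.mergeContiguous": "false".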

Here are some other links I found in the forum, but they reach no real conclusion:

http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr_highlighting_question#78163c42a67cb533

I am going to try this patch, although it also had no conclusive results:
   https://issues.apache.org/jira/browse/SOLR-1394

Has anyone come across this issue?
Any suggestions on how to fix this issue are much appreciated.


thanks & regards,
Rajesh Ramana 

RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Hi Robert, 

Thanks for your help. 

This looks much closer to my issue (maybe not). Unfortunately, I can't
switch to Solr 3.1 yet.
I hope to revisit and update this post when I do.

Thanks

thanks & regards,
Rajesh Ramana 
Enterprise Applications, Turner Broadcasting System, Inc.
404.878.7474 


-----Original Message-----
From: Ramanathapuram, Rajesh [mailto:Rajesh.Ramanathapuram@turner.com] 
Sent: Sunday, April 24, 2011 1:58 AM
To: solr-user@lucene.apache.org
Cc: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

I think I am using version 1.4; I'll try to review the link you provided
later today.

Rajesh Ramana




On Apr 24, 2011, at 12:52 AM, "Robert Muir" <rc...@gmail.com> wrote:

> On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh
> <Ra...@turner.com> wrote:
>> What is really weird is if I search for srchterm1 and srchterm2
>> separately, the results come up fine. If I search for multiple terms,
>> this issue seems to happen when the terms are separated by html tags
>> and special characters like ') / \' etc...
>>
>
> What version of Solr are you using? Since you say the issue
> only happens when terms involve special characters, it's possible it
> could be this bug: https://issues.apache.org/jira/browse/LUCENE-2874,
> with the overlapping terms being created by the WordDelimiterFilter.
>
> This is fixed in 3.1.

Re: Solr - Multi Term highlighting issue

Posted by Robert Muir <rc...@gmail.com>.
On Sat, Apr 23, 2011 at 11:36 PM, Ramanathapuram, Rajesh
<Ra...@turner.com> wrote:
> What is really weird is if I search for srchterm1 and srchterm2
> separately, the results come up fine. If I search for multiple terms,
> this issue seems to happen when the terms are separated by html tags and
> special characters like ') / \' etc...
>

What version of Solr are you using? Since you say the issue
only happens when terms involve special characters, it's possible it
could be this bug: https://issues.apache.org/jira/browse/LUCENE-2874,
with the overlapping terms being created by the WordDelimiterFilter.

This is fixed in 3.1.
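
To make this explanation concrete: with catenateWords="1", WordDelimiterFilter can emit a catenated token whose offsets span every word part of a whitespace token, so it overlaps all of them. The Lucene highlighter groups overlapping tokens and, roughly speaking, wraps a group from the first matching token's start to the last matching token's end. The Python below is a toy model of that idea, not Lucene's code: it splits only on non-alphanumerics, ignores positions, stemming and case-change splitting, and assumes the stored text has no whitespace inside "(MEC)</b>...<b>CHILE/FOREST". The sample sentence is a hypothetical reconstruction of the snippet in the first message, with "talks" standing in for the elided "...".

```python
import re

# Query "mec us chile" after lowercasing (toy model: no stemming or stopwords).
QUERY_TERMS = {"mec", "us", "chile"}

# Hypothetical sample modelled on the snippet in the first message.
SAMPLE = ("Corboba. (MEC)</b></p><p></p><p><b>CHILE/FOREST FIRES: "
          "We had talks with US and Chile")

def analyze(text, catenate=True):
    """Toy stand-in for WhitespaceTokenizer + WordDelimiterFilter
    (generateWordParts=1, catenateWords=1) + LowerCaseFilter.
    Returns (term, start_offset, end_offset) tuples sorted by offset."""
    tokens = []
    for ws in re.finditer(r"\S+", text):
        parts = [(m.group().lower(), ws.start() + m.start(), ws.start() + m.end())
                 for m in re.finditer(r"[A-Za-z0-9]+", ws.group())]
        tokens.extend(parts)
        if catenate and len(parts) > 1:
            # One catenated token spanning all parts, so it overlaps every part.
            tokens.append(("".join(p[0] for p in parts), parts[0][1], parts[-1][2]))
    return sorted(tokens, key=lambda t: (t[1], t[2]))

def group_overlapping(tokens):
    """Put tokens whose offsets overlap into the same group."""
    groups = []  # each entry: [tokens_in_group, max_end_offset]
    for term, start, end in tokens:
        if groups and start < groups[-1][1]:
            groups[-1][0].append((term, start, end))
            groups[-1][1] = max(groups[-1][1], end)
        else:
            groups.append([[(term, start, end)], end])
    return [members for members, _ in groups]

def highlight(text, terms, catenate=True, pre="<em>", post="</em>"):
    """Wrap each group from its first matching token's start offset to its
    last matching token's end offset; any text in between is wrapped too."""
    out, last = [], 0
    for group in group_overlapping(analyze(text, catenate)):
        hits = [t for t in group if t[0] in terms]
        if not hits:
            continue
        start = min(h[1] for h in hits)
        end = max(h[2] for h in hits)
        out.append(text[last:start] + pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)

print(highlight(SAMPLE, QUERY_TERMS))                  # merged span, as reported
print(highlight(SAMPLE, QUERY_TERMS, catenate=False))  # one <em> per term
```

With the catenated token, the first call reproduces the merged "<em>MEC)...CHILE</em>" span from the first message; with catenate=False the toy model gives the expected output. That is consistent with the WordDelimiterFilter diagnosis, but it is only a model; the fix referred to here is the one in 3.1.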

RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Hi Koji, 

Thanks for taking the time to look into this issue; I really appreciate
your efforts.

I am wondering whether the problem might be a document format issue
(just my guess).

What is really weird is that if I search for srchterm1 and srchterm2
separately, the results come up fine. With multiple terms, the issue
seems to happen when the terms are separated by HTML tags and special
characters like ') / \' etc.

I am fairly new to Solr and still trying to understand how things work.
My guess is that a whitespace is missing somewhere, and the regex-based
highlight fragmenter is messing things up.

Here is the highlighting section of my solrconfig.xml for review, in case
anything obvious stands out:

  <highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
     <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
   </formatter>
  </highlighting>
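
To test the guess about the regex fragmenter, it may help to see which runs of text the configured hl.regex.pattern accepts on a snippet like the one in the first message. A minimal Python sketch; it assumes Python's re treats this character class the same way Java's regex does for ASCII text, and the sample string is hypothetical. Note also that in this config the gap fragmenter is marked default="true", so the regex fragmenter should only come into play when hl.fragmenter=regex is passed, which the posted parameters do not show.

```python
import re

# hl.regex.pattern copied from the RegexFragmenter defaults above.
# Assumption: Python's re handles this character class like Java's regex
# for ASCII input (\w = letters, digits, underscore).
PATTERN = re.compile(r"[-\w ,/\n\"']{20,200}")

# Hypothetical sample modelled on the snippet in the first message.
SAMPLE = ("Corboba. (MEC)</b></p><p></p><p><b>CHILE/FOREST FIRES: "
          "We had talks with US and Chile")

def candidate_fragments(text):
    """Return the runs of text that the configured pattern accepts."""
    return PATTERN.findall(text)

print(candidate_fragments(SAMPLE))
```

No accepted run crosses an HTML tag, so this pattern on its own does not seem able to produce the MEC...CHILE span. As far as we understand, fragmenters only decide where snippets start and end, while the <em> wrapping is applied by the formatter to matched tokens.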


I will try to open a JIRA issue in the next couple of weeks, when my
schedule slows down.

Once again, thanks much for your help.

thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Koji Sekiguchi [mailto:koji@r.email.ne.jp] 
Sent: Saturday, April 23, 2011 11:07 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Thank you for sending the definitions. I thought you had defined an n-gram
based field for story and slug, but your definitions look fine to me.

I don't understand why you got such strange snippets.
I think you could open a JIRA issue for this problem (sorry, I cannot take
it on at the moment); a test case that reproduces the problem would be
very helpful.

Koji

(11/04/24 11:50), Ramanathapuram, Rajesh wrote:
> Hi Koji,
>
> My apologies for misunderstanding the question ...
>
> Here are the fields:
>
> <fields>
>
>   <field name="storyid" type="string" indexed="true" stored="true" required="true" />
>   <field name="slug" type="text" indexed="false" stored="true" />
>   <field name="author" type="string" indexed="true" stored="true" />
>   <field name="status" type="string" indexed="false" stored="true" />
>   <field name="docdate" type="tdate" indexed="true" stored="true" />
>   <field name="createdate" type="tdate" indexed="false" stored="true" />
>   <field name="modifyby" type="string" indexed="true" stored="true" />
>   <field name="story" type="text" indexed="false" stored="true" />
>   <field name="queue" type="lowercase" indexed="true" stored="true" />
>   <field name="modifydate" type="tdate" indexed="false" stored="true" />
>   <field name="endorser" type="string" indexed="false" stored="true" />
>   <field name="slug_sort" type="lowercase" indexed="true" stored="false" />
>   <field name="url" type="string" indexed="false" stored="true" />
>   <field name="showtitle" type="string" indexed="true" stored="true" />
>   <field name="date_sort" type="pdate" indexed="true" stored="false" sortMissingFirst="true" />
>
>   <field name="site" type="string" stored="true" indexed="true"/>
>   <field name="segment" type="string" stored="true" indexed="false"/>
>   <field name="digest" type="string" stored="true" indexed="false"/>
>   <field name="boost" type="float" stored="true" indexed="false"/>
>   <field name="host" type="url" stored="false" indexed="true"/>
>   <field name="tstamp" type="long" stored="true" indexed="false" />
>   <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
>   <field name="headline" type="string" indexed="true" stored="true" />
>   <field name="highlight" type="string" indexed="true" stored="true" />
>   <field name="guests" type="string" indexed="true" stored="true" />
>   <field name="transcriptnum" type="string" indexed="false" stored="true" />
>   <field name="additionalinewsfields" type="text" indexed="false" stored="true" />
>
>   <field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>
>
>   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
>
>   <dynamicField name="*_kstem" type="text_kstem" indexed="true" stored="true" multiValued="true"/>
> </fields>
> <uniqueKey>storyid</uniqueKey>
>
>
> And here are the types:
>
> <types>
>       <fieldType name="string" class="solr.StrField"
> sortMissingLast="true" omitNorms="true"/>
>
>       <fieldType name="boolean" class="solr.BoolField"
> sortMissingLast="true" omitNorms="true"/>
>
>      <fieldtype name="binary" class="solr.BinaryField"/>
>
>      <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>
>      <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>
>      <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>
>
>      <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>
>
>
>      <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
>      <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
>      <fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
>      <fieldType name="pdouble" class="solr.DoubleField" omitNorms="true"/>
>      <fieldType name="pdate" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
>
>      <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
>
>      <fieldType name="random" class="solr.RandomSortField" indexed="true" />
>
>      <!-- A text field that only splits on whitespace for exact 
> matching of words -->
>      <fieldType name="text_ws" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text_kstem" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="false" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>
>          <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
>                  protected="protwords.txt"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
>                  protected="protwords.txt"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="textTight" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts="0" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="textgen" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text_rev" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.ReversedWildcardFilterFactory"
> withOriginal="true"
>             maxPosAsterisk="3" maxPosQuestion="2"
> maxFractionAsterisk="0.33"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>      </fieldType>
>      <fieldType name="alphaOnlySort" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>        <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory" />
>          <filter class="solr.TrimFilterFactory" />
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement="" replace="all"
>          />
>        </analyzer>
>      </fieldType>
>
>      <fieldtype name="phonetic" stored="false" indexed="true"
> class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.StandardTokenizerFactory"/>
>          <filter class="solr.DoubleMetaphoneFilterFactory"
> inject="false"/>
>        </analyzer>
>      </fieldtype>
>
>      <fieldtype name="payloads" stored="false" indexed="true"
> class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.DelimitedPayloadTokenFilterFactory"
> encoder="float"/>
>        </analyzer>
>      </fieldtype>
>
>      <fieldType name="lowercase" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory" />
> 		<filter class="solr.TrimFilterFactory" />
>        </analyzer>
>      </fieldType>
> 	
> 	<fieldType name="url" class="solr.TextField"
>              positionIncrementGap="100">
>              <analyzer>
>                  <tokenizer class="solr.StandardTokenizerFactory"/>
>                  <filter class="solr.LowerCaseFilterFactory"/>
>                  <filter class="solr.WordDelimiterFilterFactory"
>                      generateWordParts="1" generateNumberParts="1"/>
>                  <filter
> class="solr.RemoveDuplicatesTokenFilterFactory"/>
>              </analyzer>
>          </fieldType>
>
>      <fieldtype name="ignored" stored="false" indexed="false"
> multiValued="true" class="solr.StrField" />
>
>   </types>
>
>
>
> thanks & regards,
> Rajesh Ramana
>
>
> -----Original Message-----
> From: Koji Sekiguchi [mailto:koji@r.email.ne.jp]
> Sent: Saturday, April 23, 2011 9:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr - Multi Term highlighting issue
>
> Hi Rajesh,
>
> My question was how the story and slug fields are defined in schema.xml.
> In other words, please show us your <fieldType/> and <field/> for
> those fields.
>
> Koji
> --
> http://www.rondhuit.com/en/
>
>
> (11/04/24 10:18), Ramanathapuram, Rajesh wrote:
>> I don't have hl.fl defined in my schema.xml; I am passing it in as a
>> query parameter:
>>
>> <str name="hl.fl">story, slug</str>
>>
>> The full set of parameters is sent like this...
>>       'hl' => 'on',
>>       'hl.fragsize' => $fragsize,
>>       'hl.maxAnalyzedChars' => $fragsize,
>>       'hl.fl' => 'slug,story',
>>       'hl.simple.pre' => '<span class="' . $className . '">',
>>       'hl.simple.post' => '</span>',
>>
>> Here is my query params in response
>>
>>    <response>
>>    <lst name="responseHeader">
>>      <int name="status">0</int>
>>      <int name="QTime">26</int>
>>    <lst name="params">
>>         <str name="hl.fragsize">100000</str>
>>         <str name="explainOther" />
>>         <str name="indent">on</str>
>>         <str name="hl.fl">story, slug</str>
>>         <str name="wt">standard</str>
>>         <str name="hl">on</str>
>>         <str name="rows">10</str>
>>         <str name="version">2.2</str>
>>         <str name="hl.highlightMultiTerm">true</str>
>>         <str name="fl">*</str>
>>         <str name="start">0</str>
>>         <str name="q">mec us chile</str>
>>         <str name="qt">standard</str>
>>         <str name="hl.usePhraseHighlighter">true</str>
>>         <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>>      </lst>
>>      </lst>
>>
>> Please let me know.
>>
>> thanks & regards,
>> Rajesh Ramana
>>
>>
>> -----Original Message-----
>> From: Koji Sekiguchi [mailto:koji@r.email.ne.jp]
>> Sent: Friday, April 22, 2011 8:38 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr - Multi Term highlighting issue
>>
>> How are your hl.fl fields defined in schema.xml?
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>> (11/04/23 1:23), Ramanathapuram, Rajesh wrote:
>>> Does anybody have other suggestions?
>>>
>>> thanks & regards,
>>> Rajesh Ramana
>>> Enterprise Applications, Turner Broadcasting System, Inc.
>>> 404.878.7474
>>>
>>>
>>> -----Original Message-----
>>> From: Ramanathapuram, Rajesh [mailto:Rajesh.Ramanathapuram@turner.com]
>>> Sent: Wednesday, April 20, 2011 2:51 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr - Multi Term highlighting issue
>>>
>>> Thanks Erick.
>>>
>>> I tried your suggestion; the issue still exists.
>>>
>>>
>>> http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2&q=mec+us+chile&fq=storyid%3DXXXXXXX%22&start=0&rows=10&fl=*&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=100000&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeContiguous=false
>>>
>>> <lst name="params">
>>>      <str name="hl.fragsize">100000</str>
>>>      <str name="explainOther" />
>>>      <str name="indent">on</str>
>>>      <str name="hl.mergeContiguous">false</str>    ....
>>>
>>>
>>> ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES ...
>>>
>>>
>>> thanks & regards,
>>> Rajesh Ramana
>>>
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Sent: Wednesday, April 20, 2011 11:59 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr - Multi Term highlighting issue
>>>
>>> Does your configuration have "hl.mergeContiguous" set to true by any
>>> chance? And what happens if you explicitly set this to "false" on
>>> your query?
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh <Ra...@turner.com> wrote:
>>>> Hello,
>>>>
>>>> I am dealing with a highlighting issue in Solr; I will try to explain it.
>>>>
>>>> When I search for a single term in Solr, it wraps an <em> tag around the
>>>> words I want to highlight, and all works well.
>>>> But if I search for multiple terms, highlighting mostly works; for some
>>>> of the terms, however, the highlight returns multiple terms in a single <em> tag:
>>>>     ... <em>srchtrm1) <br><b><p>.... srchtrm2</em>
>>>> I expect Solr to return the highlighted terms like this:
>>>>     ... <em>srchtrm1</em>) <br><b><p>... <em>srchtrm2</em>
>>>>
>>>> When I search for 'US mec chile', here is how my results appear:
>>>>     ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: We had ... with <em>US</em> and <em>Chile</em> ...,
>>>>     (<em>MEC)</b></p><p></p><p><b>US</em> ....
>>>>
>>>> This is what I was expecting instead:
>>>>     ... Corboba. (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST FIRES: We had ... with <em>US</em> and <em>Chile</em> ...,
>>>>     (<em>MEC</em>)</b></p><p></p><p><b><em>US</em> ....
>>>>
>>>> Here are my query params:
>>>> <response>
>>>>   <lst name="responseHeader">
>>>>     <int name="status">0</int>
>>>>     <int name="QTime">26</int>
>>>>     <lst name="params">
>>>>       <str name="hl.fragsize">100000</str>
>>>>       <str name="explainOther" />
>>>>       <str name="indent">on</str>
>>>>       <str name="hl.fl">story, slug</str>
>>>>       <str name="wt">standard</str>
>>>>       <str name="hl">on</str>
>>>>       <str name="rows">10</str>
>>>>       <str name="version">2.2</str>
>>>>       <str name="hl.highlightMultiTerm">true</str>
>>>>       <str name="fl">*</str>
>>>>       <str name="start">0</str>
>>>>       <str name="q">mec us chile</str>
>>>>       <str name="qt">standard</str>
>>>>       <str name="hl.usePhraseHighlighter">true</str>
>>>>       <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>>>>     </lst>
>>>>   </lst>
>>>>
>>>> Here are some other links I found in the forum, but they reach no real
>>>> conclusion:
>>>>
>>>> http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr_highlighting_question#78163c42a67cb533
>>>>
>>>> I am going to try this patch, although it also had no conclusive results:
>>>>      https://issues.apache.org/jira/browse/SOLR-1394
>>>>
>>>> Has anyone come across this issue?
>>>> Any suggestions on how to fix this issue are much appreciated.
>>>>
>>>>
>>>> thanks & regards,
>>>> Rajesh Ramana
>>>>
>>>
>>
>>
>>
>
>
>


--
http://www.rondhuit.com/en/

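Koji's question was how the hl.fl fields are defined. Given the schema quoted in this thread, a short script can check whether the highlighted fields (story and slug) go through a WordDelimiterFilterFactory with catenation enabled at index time, which is one way that filter produces the overlapping terms mentioned in the LUCENE-2874 explanation. A minimal Python sketch on a trimmed copy of that schema; the function name catenating_fields is hypothetical, it only looks at <fieldType> elements (not the lowercase <fieldtype> spelling also used in this schema), and it assumes the highlighter re-analyzes the stored text with the index-time analyzer.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the schema quoted in this thread: the hl.fl fields plus the
# index-time analyzer of their "text" type (stopwords, stemming and the query
# analyzer omitted).
SCHEMA = """<schema>
  <types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
                generateNumberParts="1" catenateWords="1" catenateNumbers="1"
                catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="storyid" type="string" indexed="true" stored="true"/>
    <field name="slug" type="text" indexed="false" stored="true"/>
    <field name="story" type="text" indexed="false" stored="true"/>
  </fields>
</schema>"""

def catenating_fields(schema_xml, hl_fields):
    """Return the hl.fl fields whose index-time analyzer contains a
    WordDelimiterFilterFactory with catenateWords or catenateAll set to 1."""
    root = ET.fromstring(schema_xml)
    types = {ft.get("name"): ft for ft in root.iter("fieldType")}
    flagged = set()
    for field in root.iter("field"):
        ftype = types.get(field.get("type"))
        if field.get("name") not in hl_fields or ftype is None:
            continue
        for analyzer in ftype.iter("analyzer"):
            # An analyzer without a type attribute is used at index and query time.
            if analyzer.get("type") not in (None, "index"):
                continue
            for flt in analyzer.iter("filter"):
                if (flt.get("class") == "solr.WordDelimiterFilterFactory"
                        and "1" in (flt.get("catenateWords"), flt.get("catenateAll"))):
                    flagged.add(field.get("name"))
    return sorted(flagged)

print(catenating_fields(SCHEMA, {"story", "slug"}))
```

Pointing the same function at a full schema.xml (read from disk) would flag every highlighted field that shares this analyzer chain.
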
Re: Solr - Multi Term highlighting issue

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Thank you for sending the definitions. I thought you had defined an n-gram based
field for story and slug, but your definitions look fine to me.

I don't understand why you got such strange snippets.
I think you could open a JIRA issue for this problem (sorry, I cannot take it on
at the moment); a test case that reproduces the problem would be very helpful.

Koji

(11/04/24 11:50), Ramanathapuram, Rajesh wrote:
> Hi Koji,
>
> My apologies for misunderstanding the question ...
>
> Here are the fields:
>
> <fields>
>
>   <field name="storyid" type="string" indexed="true" stored="true"
> required="true" />
>      <field name="slug" type="text" indexed="false" stored="true" />
>      <field name="author" type="string" indexed="true" stored="true" />
>      <field name="status" type="string" indexed="false" stored="true" />
>      <field name="docdate" type="tdate" indexed="true" stored="true"  />	
>      <field name="createdate" type="tdate" indexed="false" stored="true"
> />
>      <field name="modifyby" type="string" indexed="true" stored="true" />
> 	<field name="story" type="text" indexed="false" stored="true" />
> 	<field name="queue" type="lowercase" indexed="true"
> stored="true" />
> 	<field name="modifydate" type="tdate" indexed="false"
> stored="true" />
> 	<field name="endorser" type="string" indexed="false"
> stored="true" />
> 	<field name="slug_sort" type="lowercase" indexed="true"
> stored="false" />
> 	<field name="url" type="string" indexed="false" stored="true" />
> 	<field name="showtitle" type="string" indexed="true"
> stored="true" />
> 	<field name="date_sort" type="pdate" indexed="true"
> stored="false" sortMissingFirst="true" />
>
> 	<field name="site" type="string" stored="true" indexed="true"/>
> 	<field name="segment" type="string" stored="true"
> indexed="false"/>
>      <field name="digest" type="string" stored="true" indexed="false"/>
>      <field name="boost" type="float" stored="true" indexed="false"/>
> 	<field name="host" type="url" stored="false" indexed="true"/>
> 	<field name="tstamp" type="long" stored="true" indexed="false"
> />
> 	<field name="anchor" type="string" stored="true" indexed="true"
>              multiValued="true"/>
> 	<field name="headline" type="string" indexed="true"
> stored="true" />
> 	<field name="highlight" type="string" indexed="true"
> stored="true" />
> 	<field name="guests" type="string" indexed="true" stored="true"
> />
> 	<field name="transcriptnum" type="string" indexed="false"
> stored="true" />
> 	<field name="additionalinewsfields" type="text" indexed="false"
> stored="true" />
>
>     <field name="all_text" type="text" indexed="true" stored="false"
> multiValued="true"/>
>
>     <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false"/>
>
>     <dynamicField name="*_kstem"  type="text_kstem" indexed="true"
> stored="true" multiValued="true"/>
> </fields>
> <uniqueKey>storyid</uniqueKey>
>
>
> And here are the types:
>
> <types>
>       <fieldType name="string" class="solr.StrField"
> sortMissingLast="true" omitNorms="true"/>
>
>       <fieldType name="boolean" class="solr.BoolField"
> sortMissingLast="true" omitNorms="true"/>
>
>      <fieldtype name="binary" class="solr.BinaryField"/>
>
>      <fieldType name="int" class="solr.TrieIntField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="float" class="solr.TrieFloatField"
> precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="long" class="solr.TrieLongField" precisionStep="0"
> omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="double" class="solr.TrieDoubleField"
> precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
>
>      <fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
> omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tfloat" class="solr.TrieFloatField"
> precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8"
> omitNorms="true" positionIncrementGap="0"/>
>      <fieldType name="tdouble" class="solr.TrieDoubleField"
> precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
>
>      <fieldType name="date" class="solr.TrieDateField" omitNorms="true"
> precisionStep="0" positionIncrementGap="0"/>
>
>      <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true"
> precisionStep="6" positionIncrementGap="0"/>
>
>
>      <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
>      <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
>      <fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
>      <fieldType name="pdouble" class="solr.DoubleField"
> omitNorms="true"/>
>      <fieldType name="pdate" class="solr.DateField"
> sortMissingLast="true" omitNorms="true"/>
>
>      <fieldType name="sint" class="solr.SortableIntField"
> sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="slong" class="solr.SortableLongField"
> sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="sfloat" class="solr.SortableFloatField"
> sortMissingLast="true" omitNorms="true"/>
>      <fieldType name="sdouble" class="solr.SortableDoubleField"
> sortMissingLast="true" omitNorms="true"/>
>
>      <fieldType name="random" class="solr.RandomSortField" indexed="true"
> />
>
>      <!-- A text field that only splits on whitespace for exact matching
> of words -->
>      <fieldType name="text_ws" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text_kstem" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="false" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>
>          <filter
> class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
> protected="protwords.txt"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter
> class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
> protected="protwords.txt"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="textTight" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"/>
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="0" generateNumberParts="0" catenateWords="1"
> catenateNumbers="1" catenateAll="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.SnowballPorterFilterFactory"
> language="English" protected="protwords.txt"/>
>          <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="textgen" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>      </fieldType>
>
>      <fieldType name="text_rev" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer type="index">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true" />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>          <filter class="solr.ReversedWildcardFilterFactory"
> withOriginal="true"
>             maxPosAsterisk="3" maxPosQuestion="2"
> maxFractionAsterisk="0.33"/>
>        </analyzer>
>        <analyzer type="query">
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.SynonymFilterFactory"
> synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>          <filter class="solr.StopFilterFactory"
>                  ignoreCase="true"
>                  words="stopwords.txt"
>                  enablePositionIncrements="true"
>                  />
>          <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
>          <filter class="solr.LowerCaseFilterFactory"/>
>        </analyzer>
>      </fieldType>
>      <fieldType name="alphaOnlySort" class="solr.TextField"
> sortMissingLast="true" omitNorms="true">
>        <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory" />
>          <filter class="solr.TrimFilterFactory" />
>          <filter class="solr.PatternReplaceFilterFactory"
>                  pattern="([^a-z])" replacement="" replace="all"
>          />
>        </analyzer>
>      </fieldType>
>
>      <fieldtype name="phonetic" stored="false" indexed="true"
> class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.StandardTokenizerFactory"/>
>          <filter class="solr.DoubleMetaphoneFilterFactory"
> inject="false"/>
>        </analyzer>
>      </fieldtype>
>
>      <fieldtype name="payloads" stored="false" indexed="true"
> class="solr.TextField">
>        <analyzer>
>          <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>          <filter class="solr.DelimitedPayloadTokenFilterFactory"
> encoder="float"/>
>        </analyzer>
>      </fieldtype>
>
>      <fieldType name="lowercase" class="solr.TextField"
> positionIncrementGap="100">
>        <analyzer>
>          <tokenizer class="solr.KeywordTokenizerFactory"/>
>          <filter class="solr.LowerCaseFilterFactory" />
> 		<filter class="solr.TrimFilterFactory" />
>        </analyzer>
>      </fieldType>
> 	
> 	<fieldType name="url" class="solr.TextField"
>              positionIncrementGap="100">
>              <analyzer>
>                  <tokenizer class="solr.StandardTokenizerFactory"/>
>                  <filter class="solr.LowerCaseFilterFactory"/>
>                  <filter class="solr.WordDelimiterFilterFactory"
>                      generateWordParts="1" generateNumberParts="1"/>
>                  <filter
> class="solr.RemoveDuplicatesTokenFilterFactory"/>
>              </analyzer>
>          </fieldType>
>
>      <fieldtype name="ignored" stored="false" indexed="false"
> multiValued="true" class="solr.StrField" />
>
>   </types>
>
>
>
> thanks&  regards,
> Rajesh Ramana
>
>
> -----Original Message-----
> From: Koji Sekiguchi [mailto:koji@r.email.ne.jp]
> Sent: Saturday, April 23, 2011 9:51 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr - Multi Term highlighting issue
>
> Hi Rajesh,
>
> My question was how story and slug fields are defined in schema.xml.
> In other words, please show us your<fieldType/>  and<field/>  for those
> fields.
>
> Koji
> --
> http://www.rondhuit.com/en/
>
>
> (11/04/24 10:18), Ramanathapuram, Rajesh wrote:
>> I don't have hl.fl defined in my schema.xml, I am passing it in as my
>> query parameters
>>
>> <str name="hl.fl">story, slug</str>
>>
>> The elongated parameters is sent like this...
>> 'hl' =>   'on',
>>       'hl.fragsize' =>   $fragsize,
>>       'hl.maxAnalyzedChars' =>   $fragsize,
>>       'hl.fl' =>   'slug,story',
>>       'hl.simple.pre' =>   '<span class="' .$className . '">',
>>       'hl.simple.post' =>   '</span>',
>>
>> Here is my query params in response
>>
>>    -<response>
>>    -<lst name="responseHeader">
>>      <int name="status">0</int>
>>      <int name="QTime">26</int>
>>    -<lst name="params">
>>         <str name="hl.fragsize">100000</str>
>>         <str name="explainOther" />
>>         <str name="indent">on</str>
>>         <str name="hl.fl">story, slug</str>
>>         <str name="wt">standard</str>
>>         <str name="hl">on</str>
>>         <str name="rows">10</str>
>>         <str name="version">2.2</str>
>>         <str name="hl.highlightMultiTerm">true</str>
>>         <str name="fl">*</str>
>>         <str name="start">0</str>
>>         <str name="q">mec us chile</str>
>>         <str name="qt">standard</str>
>>         <str name="hl.usePhraseHighlighter">true</str>
>>         <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>>      </lst>
>>      </lst>
>>
>> Please let me know.
>>
>> thanks&   regards,
>> Rajesh Ramana
>>
>>
>> -----Original Message-----
>> From: Koji Sekiguchi [mailto:koji@r.email.ne.jp]
>> Sent: Friday, April 22, 2011 8:38 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr - Multi Term highlighting issue
>>
>> How are your hl.fl fields defined in schema.xml?
>>
>> Koji
>> --
>> http://www.rondhuit.com/en/
>>
>> (11/04/23 1:23), Ramanathapuram, Rajesh wrote:
>>> Does anybody has other suggestions?
>>>
>>> thanks&    regards,
>>> Rajesh Ramana
>>> Enterprise Applications, Turner Broadcasting System, Inc.
>>> 404.878.7474
>>>
>>>
>>> -----Original Message-----
>>> From: Ramanathapuram, Rajesh
> [mailto:Rajesh.Ramanathapuram@turner.com]
>>> Sent: Wednesday, April 20, 2011 2:51 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: RE: Solr - Multi Term highlighting issue
>>>
>>> Thanks Erick.
>>>
>>> I tried your suggestion, the issue still exists.
>>>
>>>
> http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2
>>>
> &q=mec+us+chile&fq=storyid%3DXXXXXXX%22&start=0&rows=10&fl=*&qt=standa
>>>
> rd&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=100
>>>
> 000&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeCo
>>> ntiguous=false
>>>
>>> -<lst name="params">
>>>      <str name="hl.fragsize">100000</str>
>>>      <str name="explainOther" />
>>>      <str name="indent">on</str>
>>>      <str name="hl.mergeContiguous">false</str>    ....
>>>
>>>
>>> ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES
> ...
>>>
>>>
>>> thanks&    regards,
>>> Rajesh Ramana
>>>
>>>
>>> -----Original Message-----
>>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>>> Sent: Wednesday, April 20, 2011 11:59 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Solr - Multi Term highlighting issue
>>>
>>> Does your configuration have "hl.mergeContiguous" set to true by any
>> chance? And what happens if you explicitly set this to "false" on your
>> query?
>>>
>>> Best
>>> Erick
>>>
>>> On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram,
>> Rajesh<Ra...@turner.com>    wrote:
>>>> Hello,
>>>>
>>>> I am dealing with a highlighting issue in SOLR, I will try to
> explain
>>
>>>> the issue.
>>>>
>>>> When I search for a single term in solr, it wraps<em>    tag around
> the
>>
>>>> words I want to highlight, all works well.
>>>> But if I search multiple term, for most part highlighting works good
>>>> and then for some of the terms, the highlight return multiple terms
>> in
>>>> a sing<em>    tag     ...
>>>> <em>srchtrm1)<br><b><p>.... srchtrm2</em>    I expect solr to return
>>>> highlight terms like    ...<em>srchtrm1</em>)<br><b><p>...
>>>> <em>srchtrm2</em>
>>>>
>>>> When I search for 'US mec chile', here is how my result appears
>>>>     ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST
> FIRES:
>>>> We had ... with<em>US</em>    and<em>Chile</em>    ...,
>>>>     (<em>MEC)</b></p><p></p><p><b>US</em>      ....
>>>>
>>>> This is what I was expecting it to be
>>>>     ... Corboba.
>>>> (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST
>>>> FIRES: We had ... with<em>US</em>    and<em>Chile</em>    ...,
>>>> (<em>MEC</em>)</b></p><p></p><p><b><em>US</em>    ....
>>>>
>>>> Here is my query params
>>>> -<response>
>>>> -<lst name="responseHeader">
>>>>     <int name="status">0</int>
>>>>     <int name="QTime">26</int>
>>>> -<lst name="params">
>>>>        <str name="hl.fragsize">100000</str>
>>>>        <str name="explainOther" />
>>>>        <str name="indent">on</str>
>>>>        <str name="hl.fl">story, slug</str>
>>>>        <str name="wt">standard</str>
>>>>        <str name="hl">on</str>
>>>>        <str name="rows">10</str>
>>>>        <str name="version">2.2</str>
>>>>        <str name="hl.highlightMultiTerm">true</str>
>>>>        <str name="fl">*</str>
>>>>        <str name="start">0</str>
>>>>        <str name="q">mec us chile</str>
>>>>        <str name="qt">standard</str>
>>>>        <str name="hl.usePhraseHighlighter">true</str>
>>>>        <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>>>>     </lst>
>>>>     </lst>
>>>>
>>>> Here are some other links I found in the forum, but no real
>>>> conclusion
>>>>
>>>>
> http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr
>>>> _
>>>> hi
>>>> ghlighting_question#78163c42a67cb533
>>>>
>>>> I am going to try this patch, which also had no conclusive results
>>>>      https://issues.apache.org/jira/browse/SOLR-1394
>>>>
>>>> Has anyone come across this issue?
>>>> Any suggestions on how to fix this issue is much appreciated.
>>>>
>>>>
>>>> thanks&    regards,
>>>> Rajesh Ramana
>>>>
>>>
>>
>>
>>
>
>
>


-- 
http://www.rondhuit.com/en/

RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Hi Koji,

My apologies for misunderstanding the question.

Here are the fields:

<fields>

  <field name="storyid" type="string" indexed="true" stored="true" required="true" />
  <field name="slug" type="text" indexed="false" stored="true" />
  <field name="author" type="string" indexed="true" stored="true" />
  <field name="status" type="string" indexed="false" stored="true" />
  <field name="docdate" type="tdate" indexed="true" stored="true" />
  <field name="createdate" type="tdate" indexed="false" stored="true" />
  <field name="modifyby" type="string" indexed="true" stored="true" />
  <field name="story" type="text" indexed="false" stored="true" />
  <field name="queue" type="lowercase" indexed="true" stored="true" />
  <field name="modifydate" type="tdate" indexed="false" stored="true" />
  <field name="endorser" type="string" indexed="false" stored="true" />
  <field name="slug_sort" type="lowercase" indexed="true" stored="false" />
  <field name="url" type="string" indexed="false" stored="true" />
  <field name="showtitle" type="string" indexed="true" stored="true" />
  <field name="date_sort" type="pdate" indexed="true" stored="false" sortMissingFirst="true" />

  <field name="site" type="string" stored="true" indexed="true"/>
  <field name="segment" type="string" stored="true" indexed="false"/>
  <field name="digest" type="string" stored="true" indexed="false"/>
  <field name="boost" type="float" stored="true" indexed="false"/>
  <field name="host" type="url" stored="false" indexed="true"/>
  <field name="tstamp" type="long" stored="true" indexed="false" />
  <field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
  <field name="headline" type="string" indexed="true" stored="true" />
  <field name="highlight" type="string" indexed="true" stored="true" />
  <field name="guests" type="string" indexed="true" stored="true" />
  <field name="transcriptnum" type="string" indexed="false" stored="true" />
  <field name="additionalinewsfields" type="text" indexed="false" stored="true" />

  <field name="all_text" type="text" indexed="true" stored="false" multiValued="true"/>

  <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>

  <dynamicField name="*_kstem" type="text_kstem" indexed="true" stored="true" multiValued="true"/>
</fields>
<uniqueKey>storyid</uniqueKey>
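Koji's question was how the hl.fl fields are defined. A minimal Python sketch, assuming the schema is available as XML text, that follows each field named in hl.fl to its field type and lists the index-time tokenizer and filters. SCHEMA is a trimmed copy of the story, slug and "text" definitions from this message (stop, lowercase and stemming filters omitted), and index_chain is a hypothetical helper name, not part of Solr.

```python
import re
import xml.etree.ElementTree as ET

# Trimmed copy of this message's schema: only story, slug and the index
# analyzer of their "text" type (stop, lowercase, stemming filters omitted).
SCHEMA = """
<schema>
  <fields>
    <field name="slug" type="text" indexed="false" stored="true" />
    <field name="story" type="text" indexed="false" stored="true" />
  </fields>
  <types>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
      </analyzer>
    </fieldType>
  </types>
</schema>
"""

def index_chain(schema_xml, hl_fl):
    """For each field named in hl.fl, return (type, index tokenizer, [(filter class, catenateWords)])."""
    root = ET.fromstring(schema_xml.strip())
    fields = {f.get("name"): f for f in root.iter("field")}
    types = {t.get("name"): t for t in root.iter("fieldType")}
    result = {}
    # The sketch splits a value like "story, slug" on commas and/or whitespace.
    for name in filter(None, re.split(r"[,\s]+", hl_fl)):
        ftype = fields[name].get("type")
        analyzer = types[ftype].find("analyzer[@type='index']")
        filters = [(f.get("class"), f.get("catenateWords")) for f in analyzer.findall("filter")]
        result[name] = (ftype, analyzer.find("tokenizer").get("class"), filters)
    return result

print(index_chain(SCHEMA, "story, slug"))
```

Both highlighted fields resolve to the same "text" type, so whatever that type's index analyzer does to the stored HTML applies to story and slug alike.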


And here are the types:

<types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>

    <fieldtype name="binary" class="solr.BinaryField"/>

    <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tfloat" class="solr.TrieFloatField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tlong" class="solr.TrieLongField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
    <fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>

    <fieldType name="date" class="solr.TrieDateField" omitNorms="true" precisionStep="0" positionIncrementGap="0"/>

    <fieldType name="tdate" class="solr.TrieDateField" omitNorms="true" precisionStep="6" positionIncrementGap="0"/>

    <fieldType name="pint" class="solr.IntField" omitNorms="true"/>
    <fieldType name="plong" class="solr.LongField" omitNorms="true"/>
    <fieldType name="pfloat" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="pdouble" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="pdate" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>

    <fieldType name="random" class="solr.RandomSortField" indexed="true" />

    <!-- A text field that only splits on whitespace for exact matching of words -->
    <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="false" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>

        <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
                protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory"
                protected="protwords.txt"/>
      </analyzer>
    </fieldType>

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
      </analyzer>
    </fieldType>

    <fieldType name="textTight" class="solr.TextField" positionIncrementGap="100" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="0" generateNumberParts="0" catenateWords="1"
                catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.SnowballPorterFilterFactory"
                language="English" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="1"
                catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
                maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory"
                synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="stopwords.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateWords="0"
                catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldType name="alphaOnlySort" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
        <filter class="solr.PatternReplaceFilterFactory"
                pattern="([^a-z])" replacement="" replace="all"
        />
      </analyzer>
    </fieldType>

    <fieldtype name="phonetic" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.DoubleMetaphoneFilterFactory" inject="false"/>
      </analyzer>
    </fieldtype>

    <fieldtype name="payloads" stored="false" indexed="true" class="solr.TextField" >
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float"/>
      </analyzer>
    </fieldtype>

    <fieldType name="lowercase" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.TrimFilterFactory" />
      </analyzer>
    </fieldType>

    <fieldType name="url" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

    <fieldtype name="ignored" stored="false" indexed="false" multiValued="true" class="solr.StrField" />

</types>
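One possible explanation, offered only as an unconfirmed hypothesis: story and slug use the "text" type, whose index analyzer splits on whitespace and then runs WordDelimiterFilterFactory with catenateWords="1". The stored string "(MEC)</b></p><p></p><p><b>CHILE/FOREST" contains no whitespace, so it reaches the word delimiter as one token, and the catenated token then spans every part from MEC to FOREST. If the highlighter re-analyzes the stored text with this analyzer (assumed here) and merges tokens with overlapping offsets into one group, wrapping that group from its first match to its last, the result is exactly "<em>MEC)...CHILE</em>". The Python below is a simplified model of that pipeline, not Solr or Lucene code: the tokenizer and word delimiter are crude regex stand-ins, stop words and stemming are ignored, the catenated token is assumed to follow the first part in the stream, and the function names and sample string (built from the excerpt quoted in this thread) are hypothetical.

```python
import re

def whitespace_tokens(text):
    """Rough stand-in for WhitespaceTokenizer: (term, start, end) for each non-space run."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def word_delimiter(token, catenate_words=True):
    """Rough stand-in for WordDelimiterFilter: split on non-alphanumerics and, if
    catenate_words, emit one joined token (right after the first part) spanning all parts."""
    _, start, _ = token
    parts = [(m.group(), start + m.start(), start + m.end())
             for m in re.finditer(r"[A-Za-z0-9]+", token[0])]
    if catenate_words and len(parts) > 1:
        joined = ("".join(p[0] for p in parts), parts[0][1], parts[-1][2])
        return [parts[0], joined] + parts[1:]
    return parts

def highlight(text, query_terms, catenate_words=True, pre="<em>", post="</em>"):
    """Model of offset-overlap grouping: a token starting before the current group's
    end joins that group; a group with matches is wrapped from first to last match."""
    stream = [t for tok in whitespace_tokens(text)
              for t in word_delimiter(tok, catenate_words)]
    spans, group, group_end = [], [], -1
    for tok in stream + [None]:  # None flushes the final group
        if tok is None or (group and tok[1] >= group_end):
            hits = [t for t in group if t[0].lower() in query_terms]
            if hits:
                spans.append((min(t[1] for t in hits), max(t[2] for t in hits)))
            group, group_end = [], -1
        if tok is not None:
            group.append(tok)
            group_end = max(group_end, tok[2])
    out, last = [], 0
    for start, end in spans:
        out.append(text[last:start] + pre + text[start:end] + post)
        last = end
    out.append(text[last:])
    return "".join(out)

SAMPLE = "Corboba. (MEC)</b></p><p></p><p><b>CHILE/FOREST FIRES: with US and Chile"
TERMS = {"mec", "us", "chile"}
print(highlight(SAMPLE, TERMS))
# Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: with <em>US</em> and <em>Chile</em>
print(highlight(SAMPLE, TERMS, catenate_words=False))
# Corboba. (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST FIRES: with <em>US</em> and <em>Chile</em>
```

With catenation on, the model reproduces the reported output; with it off, it produces the expected one. If the hypothesis holds, an experiment worth trying is catenateWords="0" in the index analyzer of the "text" type followed by a re-index, bearing in mind that a query such as "wifi" would then presumably no longer match "Wi-Fi" in the stored text.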



thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Koji Sekiguchi [mailto:koji@r.email.ne.jp] 
Sent: Saturday, April 23, 2011 9:51 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Hi Rajesh,

My question was how story and slug fields are defined in schema.xml.
In other words, please show us your <fieldType/> and <field/> for those
fields.

Koji
--
http://www.rondhuit.com/en/


(11/04/24 10:18), Ramanathapuram, Rajesh wrote:
> I don't have hl.fl defined in my schema.xml; I am passing it in as a
> query parameter:
>
> <str name="hl.fl">story, slug</str>
>
> The full set of parameters is sent like this:
> 'hl'                  => 'on',
> 'hl.fragsize'         => $fragsize,
> 'hl.maxAnalyzedChars' => $fragsize,
> 'hl.fl'               => 'slug,story',
> 'hl.simple.pre'       => '<span class="' . $className . '">',
> 'hl.simple.post'      => '</span>',
>
> Here are my query params, as echoed in the response:
>
> <response>
>   <lst name="responseHeader">
>     <int name="status">0</int>
>     <int name="QTime">26</int>
>     <lst name="params">
>       <str name="hl.fragsize">100000</str>
>       <str name="explainOther" />
>       <str name="indent">on</str>
>       <str name="hl.fl">story, slug</str>
>       <str name="wt">standard</str>
>       <str name="hl">on</str>
>       <str name="rows">10</str>
>       <str name="version">2.2</str>
>       <str name="hl.highlightMultiTerm">true</str>
>       <str name="fl">*</str>
>       <str name="start">0</str>
>       <str name="q">mec us chile</str>
>       <str name="qt">standard</str>
>       <str name="hl.usePhraseHighlighter">true</str>
>       <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>     </lst>
>   </lst>
>
> Please let me know.
>
> thanks&  regards,
> Rajesh Ramana
>
>
> -----Original Message-----
> From: Koji Sekiguchi [mailto:koji@r.email.ne.jp]
> Sent: Friday, April 22, 2011 8:38 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr - Multi Term highlighting issue
>
> How are your hl.fl fields defined in schema.xml?
>
> Koji
> --
> http://www.rondhuit.com/en/
>
> (11/04/23 1:23), Ramanathapuram, Rajesh wrote:
>> Does anybody has other suggestions?
>>
>> thanks&   regards,
>> Rajesh Ramana
>> Enterprise Applications, Turner Broadcasting System, Inc.
>> 404.878.7474
>>
>>
>> -----Original Message-----
>> From: Ramanathapuram, Rajesh
[mailto:Rajesh.Ramanathapuram@turner.com]
>> Sent: Wednesday, April 20, 2011 2:51 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Solr - Multi Term highlighting issue
>>
>> Thanks Erick.
>>
>> I tried your suggestion, the issue still exists.
>>
>>
http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2
>>
&q=mec+us+chile&fq=storyid%3DXXXXXXX%22&start=0&rows=10&fl=*&qt=standa
>>
rd&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=100
>>
000&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeCo
>> ntiguous=false
>>
>> -<lst name="params">
>>     <str name="hl.fragsize">100000</str>
>>     <str name="explainOther" />
>>     <str name="indent">on</str>
>>     <str name="hl.mergeContiguous">false</str>   ....
>>
>>
>> ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES
...
>>
>>
>> thanks&   regards,
>> Rajesh Ramana
>>
>>
>> -----Original Message-----
>> From: Erick Erickson [mailto:erickerickson@gmail.com]
>> Sent: Wednesday, April 20, 2011 11:59 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Solr - Multi Term highlighting issue
>>
>> Does your configuration have "hl.mergeContiguous" set to true by any
> chance? And what happens if you explicitly set this to "false" on your
> query?
>>
>> Best
>> Erick
>>
>> On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram,
> Rajesh<Ra...@turner.com>   wrote:
>>> Hello,
>>>
>>> I am dealing with a highlighting issue in SOLR, I will try to
explain
>
>>> the issue.
>>>
>>> When I search for a single term in solr, it wraps<em>   tag around
the
>
>>> words I want to highlight, all works well.
>>> But if I search multiple term, for most part highlighting works good
>>> and then for some of the terms, the highlight return multiple terms
> in
>>> a sing<em>   tag     ...
>>> <em>srchtrm1)<br><b><p>.... srchtrm2</em>   I expect solr to return
>>> highlight terms like    ...<em>srchtrm1</em>)<br><b><p>...
>>> <em>srchtrm2</em>
>>>
>>> When I search for 'US mec chile', here is how my result appears
>>>    ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST
FIRES:
>>> We had ... with<em>US</em>   and<em>Chile</em>   ...,
>>>    (<em>MEC)</b></p><p></p><p><b>US</em>     ....
>>>
>>> This is what I was expecting it to be
>>>    ... Corboba.
>>> (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST
>>> FIRES: We had ... with<em>US</em>   and<em>Chile</em>   ...,
>>> (<em>MEC</em>)</b></p><p></p><p><b><em>US</em>   ....
>>>
>>> Here is my query params
>>> -<response>
>>> -<lst name="responseHeader">
>>>    <int name="status">0</int>
>>>    <int name="QTime">26</int>
>>> -<lst name="params">
>>>       <str name="hl.fragsize">100000</str>
>>>       <str name="explainOther" />
>>>       <str name="indent">on</str>
>>>       <str name="hl.fl">story, slug</str>
>>>       <str name="wt">standard</str>
>>>       <str name="hl">on</str>
>>>       <str name="rows">10</str>
>>>       <str name="version">2.2</str>
>>>       <str name="hl.highlightMultiTerm">true</str>
>>>       <str name="fl">*</str>
>>>       <str name="start">0</str>
>>>       <str name="q">mec us chile</str>
>>>       <str name="qt">standard</str>
>>>       <str name="hl.usePhraseHighlighter">true</str>
>>>       <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>>>    </lst>
>>>    </lst>
>>>
>>> Here are some other links I found in the forum, but no real
>>> conclusion
>>>
>>>
http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr
>>> _
>>> hi
>>> ghlighting_question#78163c42a67cb533
>>>
>>> I am going to try this patch, which also had no conclusive results
>>>     https://issues.apache.org/jira/browse/SOLR-1394
>>>
>>> Has anyone come across this issue?
>>> Any suggestions on how to fix this issue is much appreciated.
>>>
>>>
>>> thanks&   regards,
>>> Rajesh Ramana
>>>
>>
>
>
>



Re: Solr - Multi Term highlighting issue

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
Hi Rajesh,

My question was how story and slug fields are defined in schema.xml.
In other words, please show us your <fieldType/> and <field/>
for those fields.

Koji
-- 
http://www.rondhuit.com/en/



RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Also, I found this in solrconfig.xml:

<requestHandler name="dismax" class="solr.SearchHandler" >
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     <str name="pf">
        text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
     </str>
     <str name="bf">
        popularity^0.5 recip(price,1,1000,1000)^0.3
     </str>
     <str name="fl">
        id,name,price,score
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
     <!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl.fl">text features name</str>
     <!-- for this field, we want no fragmenting, just highlighting -->
     <str name="f.name.hl.fragsize">0</str>
     <!-- instructs Solr to return the field itself if no query terms are
          found -->
     <str name="f.name.hl.alternateField">name</str>
     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>
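The mm (minimum should match) value above is the XML-escaped form of 2<-1 5<-2 6<90%. According to the dismax documentation, each N<spec condition takes effect once there are more than N optional clauses. The Python sketch below models that rule under simplifying assumptions: only integer and percentage specs are supported, and percentages are truncated toward zero. These defaults apply only to requests routed to this dismax handler, whereas the query in this thread uses qt=standard.

```python
def _apply(count: int, spec: str) -> int:
    """Evaluate one unconditional spec: "3", "-1", "90%" or "-25%"."""
    if spec.endswith("%"):
        calc = count * int(spec[:-1]) / 100
        # int() truncates toward zero; a negative value means "all but".
        return count + int(calc) if calc < 0 else int(calc)
    n = int(spec)
    return count + n if n < 0 else n


def min_should_match(optional_clauses: int, spec: str) -> int:
    """Sketch of how an mm spec such as "2<-1 5<-2 6<90%" is evaluated.

    Each "N<sub" condition replaces the result with `sub` when there are
    more than N optional clauses. Evaluation stops at the first condition
    whose N is not exceeded.
    """
    result = optional_clauses
    if "<" in spec:
        for cond in spec.split():
            upper, sub = cond.split("<", 1)
            if optional_clauses <= int(upper):
                break
            result = _apply(optional_clauses, sub)
    else:
        result = _apply(optional_clauses, spec.strip())
    # Never require more clauses than exist, nor fewer than zero.
    return max(0, min(result, optional_clauses))


if __name__ == "__main__":
    spec = "2<-1 5<-2 6<90%"  # the unescaped value of mm above
    for n in (1, 2, 3, 5, 6, 7, 10):
        print(n, min_should_match(n, spec))
```

If all three terms of mec us chile survived analysis, the sketch gives min_should_match(3, spec) == 2, so two of the three terms would have to match.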

And also this:

  <highlighting>
   <!-- Configure the standard fragmenter -->
   <!-- This could most likely be commented out in the "default" case -->
   <fragmenter name="gap" class="org.apache.solr.highlight.GapFragmenter" default="true">
    <lst name="defaults">
     <int name="hl.fragsize">100</int>
    </lst>
   </fragmenter>

   <!-- A regular-expression-based fragmenter (f.i., for sentence extraction) -->
   <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter">
    <lst name="defaults">
      <!-- slightly smaller fragsizes work better because of slop -->
      <int name="hl.fragsize">70</int>
      <!-- allow 50% slop on fragment sizes -->
      <float name="hl.regex.slop">0.5</float>
      <!-- a basic sentence pattern -->
      <str name="hl.regex.pattern">[-\w ,/\n\"']{20,200}</str>
    </lst>
   </fragmenter>

   <!-- Configure the standard formatter -->
   <formatter name="html" class="org.apache.solr.highlight.HtmlFormatter" default="true">
    <lst name="defaults">
     <str name="hl.simple.pre"><![CDATA[<em>]]></str>
     <str name="hl.simple.post"><![CDATA[</em>]]></str>
    </lst>
   </formatter>
  </highlighting>
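The regex fragmenter's hl.regex.pattern defines what counts as a sentence-like fragment. Because gap is marked default="true", regex is used only where it is explicitly selected (in the config above, for the text field through the dismax defaults). The Python sketch below shows which runs of a sample string the configured pattern matches. The sample is adapted from the thread's output with the tags kept. re.ASCII is used because \w in Java regexes is ASCII-only by default.

```python
import re
from typing import List

# hl.regex.pattern from the "regex" fragmenter above. re.ASCII restricts \w
# to [A-Za-z0-9_], matching the default behaviour of \w in Java regexes.
SENTENCE_PATTERN = re.compile(r"[-\w ,/\n\"']{20,200}", re.ASCII)


def candidate_runs(text: str) -> List[str]:
    """Return the non-overlapping runs of `text` that the pattern matches."""
    return SENTENCE_PATTERN.findall(text)


if __name__ == "__main__":
    sample = ("Corboba. (MEC)</b><p><b>CHILE/FOREST FIRES: "
              "We had a long dry season in the south")
    print(candidate_runs(sample))
```

Tag and punctuation characters such as <, >, (, ), . and : fall outside the character class. As a result, CHILE/FOREST FIRES (18 characters) falls short of the 20-character minimum, and only the plain-text sentence after the colon matches.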

Hope this sheds some light on identifying this issue. 

thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Ramanathapuram, Rajesh [mailto:Rajesh.Ramanathapuram@turner.com] 
Sent: Saturday, April 23, 2011 9:18 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Multi Term highlighting issue

I don't have hl.fl defined in my schema.xml; I am passing it in as a query parameter:

<str name="hl.fl">story, slug</str> 

The full set of parameters is sent like this:
'hl' => 'on',
    'hl.fragsize' => $fragsize,
    'hl.maxAnalyzedChars' => $fragsize,
    'hl.fl' => 'slug,story',
    'hl.simple.pre' => '<span class="' .$className . '">',
    'hl.simple.post' => '</span>',

Here are my query params as echoed in the response:

  <response>
  <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">26</int>
  <lst name="params">
      <str name="hl.fragsize">100000</str>
      <str name="explainOther" />
      <str name="indent">on</str>
      <str name="hl.fl">story, slug</str>
      <str name="wt">standard</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="hl.highlightMultiTerm">true</str>
      <str name="fl">*</str>
      <str name="start">0</str>
      <str name="q">mec us chile</str>
      <str name="qt">standard</str>
      <str name="hl.usePhraseHighlighter">true</str>
      <str name="fq">storyid="XXXX XXXX XXXXX"</str>
   </lst>
   </lst>

Please let me know.

thanks & regards,
Rajesh Ramana 



RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
I don't have hl.fl defined in my schema.xml; I am passing it in as a query parameter:

<str name="hl.fl">story, slug</str> 

The full set of parameters is sent like this:
'hl' => 'on',
    'hl.fragsize' => $fragsize,
    'hl.maxAnalyzedChars' => $fragsize,
    'hl.fl' => 'slug,story',
    'hl.simple.pre' => '<span class="' .$className . '">',
    'hl.simple.post' => '</span>',
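To compare what the client sends with what Solr echoes back, here is a Python rendering of the PHP parameter array above, encoded as it would appear in a GET query string. This is only a sketch. fragsize and class_name are hypothetical stand-ins for $fragsize and $className, and q is copied from the echoed params rather than from the PHP code.

```python
from urllib.parse import urlencode

# Hypothetical stand-ins for the PHP variables $fragsize and $className.
fragsize = 100000
class_name = "hl"

params = {
    "q": "mec us chile",  # taken from the echoed params, not the PHP array
    "hl": "on",
    "hl.fragsize": fragsize,
    "hl.maxAnalyzedChars": fragsize,
    "hl.fl": "slug,story",
    "hl.simple.pre": '<span class="' + class_name + '">',
    "hl.simple.post": "</span>",
}

# urlencode() applies quote_plus to every key and value: spaces become "+",
# and characters such as < > " / , are percent-encoded.
query_string = urlencode(params)
print(query_string)
```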

Here are my query params as echoed in the response:

  <response>
  <lst name="responseHeader">
   <int name="status">0</int>
   <int name="QTime">26</int>
  <lst name="params">
      <str name="hl.fragsize">100000</str>
      <str name="explainOther" />
      <str name="indent">on</str>
      <str name="hl.fl">story, slug</str>
      <str name="wt">standard</str>
      <str name="hl">on</str>
      <str name="rows">10</str>
      <str name="version">2.2</str>
      <str name="hl.highlightMultiTerm">true</str>
      <str name="fl">*</str>
      <str name="start">0</str>
      <str name="q">mec us chile</str>
      <str name="qt">standard</str>
      <str name="hl.usePhraseHighlighter">true</str>
      <str name="fq">storyid="XXXX XXXX XXXXX"</str>
   </lst>
   </lst>

Please let me know.

thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Koji Sekiguchi [mailto:koji@r.email.ne.jp] 
Sent: Friday, April 22, 2011 8:38 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

How are your hl.fl fields defined in schema.xml?

Koji
--
http://www.rondhuit.com/en/



Re: Solr - Multi Term highlighting issue

Posted by Koji Sekiguchi <ko...@r.email.ne.jp>.
How are your hl.fl fields defined in schema.xml?

Koji
-- 
http://www.rondhuit.com/en/

(11/04/23 1:23), Ramanathapuram, Rajesh wrote:
> Does anybody have other suggestions?
>
> thanks & regards,
> Rajesh Ramana
> Enterprise Applications, Turner Broadcasting System, Inc.
> 404.878.7474



RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Does anybody have other suggestions?

thanks & regards,
Rajesh Ramana 
Enterprise Applications, Turner Broadcasting System, Inc.
404.878.7474 


-----Original Message-----
From: Ramanathapuram, Rajesh [mailto:Rajesh.Ramanathapuram@turner.com] 
Sent: Wednesday, April 20, 2011 2:51 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr - Multi Term highlighting issue

Thanks Erick. 

I tried your suggestion; the issue still exists.

http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2&q=mec+us+chile&fq=storyid%3DXXXXXXX%22&start=0&rows=10&fl=*&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=100000&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeContiguous=false

<lst name="params">
  <str name="hl.fragsize">100000</str>
  <str name="explainOther" />
  <str name="indent">on</str>
  <str name="hl.mergeContiguous">false</str> ....


... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES ...


thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com]
Sent: Wednesday, April 20, 2011 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Does your configuration have "hl.mergeContiguous" set to true by any chance? And what happens if you explicitly set this to "false" on your query?

Best
Erick

On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh <Ra...@turner.com> wrote:
> Hello,
>
> I am dealing with a highlighting issue in SOLR, I will try to explain 
> the issue.
>
> When I search for a single term in solr, it wraps <em> tag around the 
> words I want to highlight, all works well.
> But if I search multiple term, for most part highlighting works good 
> and then for some of the terms, the highlight return multiple terms in 
> a sing <em> tag     ...
> <em>srchtrm1) <br><b><p>.... srchtrm2</em> I expect solr to return 
> highlight terms like    ... <em>srchtrm1</em>) <br><b><p>...
> <em>srchtrm2</em>
>
> When I search for 'US mec chile', here is how my result appears
>  ... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: 
> We had ... with <em>US</em> and <em>Chile</em> ...,
>  (<em>MEC)</b></p><p></p><p><b>US</em>  ....
>
> This is what I was expecting it to be
>  ... Corboba. (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST
> FIRES: We had ... with <em>US</em> and <em>Chile</em> ..., 
> (<em>MEC</em>)</b></p><p></p><p><b><em>US</em> ....
>
> Here is my query params
> - <response>
> - <lst name="responseHeader">
>  <int name="status">0</int>
>  <int name="QTime">26</int>
> - <lst name="params">
>     <str name="hl.fragsize">100000</str>
>     <str name="explainOther" />
>     <str name="indent">on</str>
>     <str name="hl.fl">story, slug</str>
>     <str name="wt">standard</str>
>     <str name="hl">on</str>
>     <str name="rows">10</str>
>     <str name="version">2.2</str>
>     <str name="hl.highlightMultiTerm">true</str>
>     <str name="fl">*</str>
>     <str name="start">0</str>
>     <str name="q">mec us chile</str>
>     <str name="qt">standard</str>
>     <str name="hl.usePhraseHighlighter">true</str>
>     <str name="fq">storyid="XXXX XXXX XXXXX"</str>
>  </lst>
>  </lst>
>
> Here are some other links I found in the forum, but no real conclusion
>
> http://www.lucidimagination.com/search/document/ac64e4f0abb6e4fc/solr_
> hi
> ghlighting_question#78163c42a67cb533
>
> I am going to try this patch, though its results were also inconclusive:
>   https://issues.apache.org/jira/browse/SOLR-1394
>
> Has anyone come across this issue?
> Any suggestions on how to fix it are much appreciated.
>
>
> thanks & regards,
> Rajesh Ramana
>
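The symptom described above is a single `<em>...</em>` span that swallows two query terms plus the markup between them. A minimal Python sketch for spotting such spans in returned snippets is shown below. It assumes the highlighter's pre/post markers are `<em>`/`</em>`, as in the output quoted in this thread. `merged_spans` is a hypothetical helper name, and the snippets are abbreviated from the thread. It uses regular expressions rather than an HTML parser and only diagnoses the problem; it does not change what the highlighter produces.

```python
import re

# Non-greedy match of each highlighted span; DOTALL lets a span cross line breaks.
EM_SPAN = re.compile(r"<em>(.*?)</em>", re.DOTALL)
# Any HTML tag inside a highlighted span suggests several terms were merged.
TAG = re.compile(r"</?[a-zA-Z][^>]*>")


def merged_spans(snippet: str) -> list[str]:
    """Return the text of highlighted spans that contain HTML markup."""
    return [m.group(1) for m in EM_SPAN.finditer(snippet) if TAG.search(m.group(1))]


# Abbreviated snippets copied from the thread (actual vs. expected output).
actual = ("Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES: "
          "We had ... with <em>US</em> and <em>Chile</em>")
expected = ("Corboba. (<em>MEC</em>)</b></p><p></p><p><b><em>CHILE</em>/FOREST "
            "FIRES: We had ... with <em>US</em> and <em>Chile</em>")

print(merged_spans(actual))    # ['MEC)</b></p><p></p><p><b>CHILE']
print(merged_spans(expected))  # []
```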

RE: Solr - Multi Term highlighting issue

Posted by "Ramanathapuram, Rajesh" <Ra...@turner.com>.
Thanks Erick. 

I tried your suggestion, but the issue still exists:

http://localhost:8983/searchsolr/mainCore/select?indent=on&version=2.2&q=mec+us+chile&fq=storyid%3DXXXXXXX%22&start=0&rows=10&fl=*&qt=standard&wt=standard&explainOther=&hl=on&hl.fl=story%2C+slug&hl.fragsize=100000&hl.highlightMultiTerm=true&hl.usePhraseHighlighter=true&hl.mergeContiguous=false

<lst name="params">
  <str name="hl.fragsize">100000</str> 
  <str name="explainOther" /> 
  <str name="indent">on</str> 
  <str name="hl.mergeContiguous">false</str> ....


... Corboba. (<em>MEC)</b></p><p></p><p><b>CHILE</em>/FOREST FIRES ...


thanks & regards,
Rajesh Ramana 


-----Original Message-----
From: Erick Erickson [mailto:erickerickson@gmail.com] 
Sent: Wednesday, April 20, 2011 11:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr - Multi Term highlighting issue

Does your configuration have "hl.mergeContiguous" set to true by any chance? And what happens if you explicitly set this to "false" on your query?

Best
Erick

On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh <Ra...@turner.com> wrote:

Re: Solr - Multi Term highlighting issue

Posted by Erick Erickson <er...@gmail.com>.
Does your configuration have "hl.mergeContiguous" set to true by any
chance? And what happens if you explicitly set this to "false" on your query?

Best
Erick
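To rule out a default coming from the configuration, the parameter can be sent explicitly on the request. Request parameters generally override a request handler's `defaults` (though not its `invariants`). Below is a minimal Python sketch that rebuilds a subset of the request from this thread with `hl.mergeContiguous=false` using only the standard library. The base URL and parameter values are copied from the thread (the `fq` value is the poster's redacted placeholder). Which parameters to include is an assumption.

```python
from urllib.parse import urlencode

# Host, port, and core path as posted in the thread.
BASE = "http://localhost:8983/searchsolr/mainCore/select"

params = {
    "q": "mec us chile",
    "fq": 'storyid="XXXX XXXX XXXXX"',  # redacted placeholder from the thread
    "hl": "on",
    "hl.fl": "story, slug",
    "hl.fragsize": "100000",
    "hl.usePhraseHighlighter": "true",
    "hl.highlightMultiTerm": "true",
    # Erick's suggestion: explicitly turn off merging of contiguous fragments.
    "hl.mergeContiguous": "false",
}

# urlencode uses quote_plus, so spaces become "+" and commas become "%2C".
url = BASE + "?" + urlencode(params)
print(url)
```

The follow-up in this thread reports that the merged `<em>` span still appeared with this parameter set, so it is a check to eliminate one cause rather than a fix.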

On Wed, Apr 20, 2011 at 9:43 AM, Ramanathapuram, Rajesh
<Ra...@turner.com> wrote: