Posted to solr-user@lucene.apache.org by Szűcs Roland <sz...@bookandwalk.hu> on 2020/04/04 19:58:02 UTC

Highlighting when the query field and hl.fl have different analysis

Hi folks,
I have an author field with a very simple definition:
<field name="author" type="short_text_hu" multiValued="true" stored="true"/>

<fieldType name="short_text_hu" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I also have a suggester-friendly definition of this field:
<copyField source="author" dest="author_ngram"/>
<field name="author_ngram" type="short_text_hu_suggester" multiValued="true" indexed="true" stored="false"/>

<fieldType name="short_text_hu_suggester" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
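
To make the difference between the two index-time chains concrete, here is a rough Python approximation of what each one emits. This is only a sketch, not Solr's actual implementation: whitespace splitting stands in for StandardTokenizerFactory, accent stripping via Unicode decomposition stands in for ASCIIFoldingFilterFactory, and all function names are made up for illustration.

```python
import unicodedata


def tokenize(text):
    # Simplified stand-in for StandardTokenizerFactory: whitespace split only.
    return text.split()


def ascii_fold(token):
    # Simplified stand-in for ASCIIFoldingFilterFactory: drop combining accents.
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold_preserving_original(tokens):
    # Mimics preserveOriginal="true": keep the token and add its folded form.
    out = []
    for token in tokens:
        out.append(token)
        folded = ascii_fold(token)
        if folded != token:
            out.append(folded)
    return out


def edge_ngrams(token, min_size=1, max_size=15):
    # Prefixes of the token, as in EdgeNGramFilterFactory with these sizes.
    return [token[:n] for n in range(min_size, min(len(token), max_size) + 1)]


def index_author(text):
    # short_text_hu index chain: tokenize, lowercase, fold.
    return fold_preserving_original([t.lower() for t in tokenize(text)])


def index_author_ngram(text):
    # short_text_hu_suggester index chain: tokenize, lowercase, n-gram, fold.
    lowered = [t.lower() for t in tokenize(text)]
    grams = [g for t in lowered for g in edge_ngrams(t)]
    return fold_preserving_original(grams)


def analyze_query(text):
    # Both field types share this query chain: tokenize, lowercase.
    return [t.lower() for t in tokenize(text)]
```

Under these simplifications, the query term "já" appears among the tokens of author_ngram for "Arany János" but not among the tokens of author, while "arany" appears in both.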

I do not use the suggester component because it returns strings and I need
specific documents, so I apply the following approach in solrconfig.xml:

<requestHandler name="/suggestquery" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <str name="defType">edismax</str>
    <str name="qf">author_ngram^5 title_ngram^10</str>
    <str name="fl">id,imageUrl,title,price,author</str>
    <str name="mm">3&lt;74%</str>
    <str name="pf">author_ngram^15 title_ngram^30</str>
    <str name="tie">0.1</str>

    <str name="hl">true</str>
    <str name="hl.fl">author title</str>
    <str name="hl.method">original</str>
  </lst>
</requestHandler>
As you can see, my query parser searches the author_ngram field (see qf),
which is a copyField target and of course not stored. On the other hand, I
would like to show customers the meaningful fields such as author, so hl.fl
points at author and title.

Despite this, the highlighter returns partially correct results:

If the author field is Arany János and I search for Arany Já, I get back
<b>Arany</b> János. The second term is not highlighted.
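
For anyone who wants to reproduce this, the request can be built roughly as below. It is only a sketch: the host and the core name "books" are placeholders I made up, and only the /suggestquery handler path comes from the config above. All other parameters come from the handler defaults.

```python
from urllib.parse import urlencode

# Hypothetical base URL: host, port and core name are placeholders.
SOLR_BASE = "http://localhost:8983/solr/books"


def suggest_url(user_input):
    # Only q is passed; qf, hl, hl.fl etc. come from the handler defaults.
    return f"{SOLR_BASE}/suggestquery?{urlencode({'q': user_input})}"


if __name__ == "__main__":
    print(suggest_url("Arany Já"))
```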

I need help with two issues:
1. Why does it work, even partially, when the analysis of the query field
and the analysis of the highlight fields are different?
2. If the highlighter can handle the different analysis, what can I do to
support multi-field highlighting?

Thanks,
Roland