Posted to solr-user@lucene.apache.org by Bjørn Hjelle <bj...@gmail.com> on 2015/12/21 14:34:00 UTC

Solr 5.4, NGramFilterFactory highlighting

Hi,

I am having trouble getting hit highlighting to work on NGram fields when
the search term is longer than 8 characters. With luceneMatchVersion="4.3"
set on the NGram filter (as in the field type below), such terms are not
highlighted at all. Without that parameter, the whole word is highlighted,
not just the search term.


Here are the exact steps to reproduce the issue:

Download Solr 5.4.0:

$ wget http://archive.apache.org/dist/lucene/solr/5.4.0/solr-5.4.0.tgz
$ tar xzvf solr-5.4.0.tgz

Start solr:

$ cd solr-5.4.0
$ bin/solr start

In another command prompt, create a core:

$ bin/solr create_core -c test -d server/solr/configsets/sample_techproducts_configs


Add to server/solr/test/conf/schema.xml:


    <dynamicField name="*_ngram" type="text_ngram" indexed="true" stored="true"/>

    <fieldType name="text_ngram" class="solr.TextField">
        <analyzer type="index">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.NGramFilterFactory" maxGramSize="20" minGramSize="3" luceneMatchVersion="4.3"/>
        </analyzer>
        <analyzer type="query">
            <tokenizer class="solr.StandardTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
        </analyzer>
    </fieldType>

Reload the core to pick up config changes:
$ curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"


Create file doc.xml with contents:

<add>
  <doc>
    <field name="id">DOC2</field>
    <field name="name_ngram">thisisalongword in the document</field>
  </doc>
</add>


Index the document:

$ bin/post -c test doc.xml


Perform a search that shows that we find the document and the search term
is highlighted:
http://localhost:8983/solr/test/select?q=name_ngram%3Athis&wt=json&indent=true&hl=true&hl.fl=name_ngram&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

  "highlighting":{
    "DOC2":{
      "name_ngram":["<em>this</em>isalongword in the document"]}}}


With a longer search term the document is still found, but the search
term is now NOT highlighted:

http://localhost:8983/solr/test/select?q=name_ngram%3Athisisalong&wt=json&indent=true&hl=true&hl.fl=name_ngram&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E

  "highlighting":{
    "DOC2":{
      "name_ngram":["thisisalongword in the document"]}}}
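
For reference, here is a rough sketch of which grams the index analyzer
above should emit for a single lowercased token. It is a simplified model,
not Lucene's implementation: ngrams is a hypothetical helper, and it
ignores token order, positions and offsets (which are what the highlighter
actually works with). It only illustrates that, with minGramSize="3" and
maxGramSize="20", the 11-character query term is itself one of the indexed
grams, which is consistent with the document being found:

```shell
# Hypothetical helper: print every n-gram of a word, one per line.
# $1 = word, $2 = minGramSize, $3 = maxGramSize
ngrams() {
  printf '%s\n' "$1" | awk -v lo="$2" -v hi="$3" '{
    len = length($0)
    # For each start position, emit grams of every allowed length that fit.
    for (i = 1; i <= len; i++)
      for (n = lo; n <= hi && i + n - 1 <= len; n++)
        print substr($0, i, n)
  }'
}

# Count exact matches of the query term among the grams of the indexed word.
ngrams thisisalongword 3 20 | grep -cx thisisalong
```

The last command prints 1, so under this model the term matches exactly
one indexed gram; the missing highlight is therefore not explained by the
term being absent from the index.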


Thank you,
Bjørn Hjelle