Posted to solr-user@lucene.apache.org by Bjørn Hjelle <bj...@gmail.com> on 2015/12/21 14:34:00 UTC
Solr 5.4, NGramFilterFactory highlighting
Hi,
I am having trouble getting hit highlighting to work on NGram fields when
the search term is longer than 8 characters.
If I leave the luceneMatchVersion="4.3" parameter out of the field type
definition, the whole word is highlighted instead of just the search term.
Here are the exact steps to reproduce the issue:
Download Solr 5.4.0:
$ wget http://archive.apache.org/dist/lucene/solr/5.4.0/solr-5.4.0.tgz
$ tar xvfz solr-5.4.0.tgz
Start Solr:
$ cd solr-5.4.0
$ bin/solr start
In another command prompt, create a core:
$ bin/solr create_core -c test -d server/solr/configsets/sample_techproducts_configs
Add to server/solr/test/conf/schema.xml:
<dynamicField name="*_ngram" type="text_ngram" indexed="true" stored="true"/>
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" maxGramSize="20" minGramSize="3" luceneMatchVersion="4.3"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
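For reference, my understanding is that with these settings the index-time NGram filter emits every substring of each token that is between minGramSize and maxGramSize characters long. Below is a rough shell sketch of that idea. The function name ngrams is made up for illustration, it is not Solr's implementation, and the order in which Solr emits grams may differ:

```shell
# Print every character n-gram of a token, one per line.
# Hypothetical helper for illustration only, not part of Solr.
# Usage: ngrams TOKEN MIN MAX
ngrams() {
  token=$1 min=$2 max=$3
  len=${#token}
  start=1
  while [ "$start" -le "$len" ]; do
    size=$min
    # Grow the gram until it reaches MAX or runs past the end of the token.
    while [ "$size" -le "$max" ] && [ $((start + size - 1)) -le "$len" ]; do
      printf '%s\n' "$token" | cut -c "$start-$((start + size - 1))"
      size=$((size + 1))
    done
    start=$((start + 1))
  done
}

# Check whether the longer search term is among the grams of the indexed token.
ngrams thisisalongword 3 20 | grep -x thisisalong
```

Under that assumption, thisisalong (11 characters) is one of the grams for thisisalongword, which is consistent with the longer query still finding the document.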
Reload the core to pick up config changes:
$ curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=test"
Create file doc.xml with contents:
<add>
  <doc>
    <field name="id">DOC2</field>
    <field name="name_ngram">thisisalongword in the document</field>
  </doc>
</add>
Index the document:
$ bin/post -c test doc.xml
Run a search with a short term. The document is found and the search term
is highlighted:
http://localhost:8983/solr/test/select?q=name_ngram%3Athis&wt=json&indent=true&hl=true&hl.fl=name_ngram&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
"highlighting":{
"DOC2":{
"name_ngram":["<em>this</em>isalongword in the document"]}}}
Add more characters to the search term. The document is still found, but the
search term is now NOT highlighted:
http://localhost:8983/solr/test/select?q=name_ngram%3Athisisalong&wt=json&indent=true&hl=true&hl.fl=name_ngram&hl.simple.pre=%3Cem%3E&hl.simple.post=%3C%2Fem%3E
"highlighting":{
"DOC2":{
"name_ngram":["thisisalongword in the document"]}}}
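To narrow down the term length at which highlighting stops, the same request can be generated for terms of increasing length. This is a minimal sketch: the helper name hl_url is hypothetical, and it assumes the term consists of plain ASCII letters that need no URL-encoding.

```shell
# Build the highlighting request URL used in the steps above for a given term.
# Hypothetical helper; assumes TERM needs no URL-encoding.
# Usage: hl_url TERM
hl_url() {
  printf 'http://localhost:8983/solr/test/select?q=name_ngram%%3A%s&wt=json&indent=true&hl=true&hl.fl=name_ngram&hl.simple.pre=%%3Cem%%3E&hl.simple.post=%%3C%%2Fem%%3E\n' "$1"
}

# Example (requires the Solr instance from the steps above to be running):
#   for term in this thisisal thisisalo thisisalong; do
#     curl -s "$(hl_url "$term")"
#   done
```

The example terms are 4, 8, 9 and 11 characters long, so comparing the "highlighting" section of each response should show where the behaviour changes.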
Thank you,
Bjørn Hjelle