Posted to solr-user@lucene.apache.org by "Ing. Jorge Luis Betancourt Gonzalez" <jl...@uci.cu> on 2013/11/13 16:38:59 UTC
Strange behavior of gap fragmenter on highlighting
I'm seeing strange behavior from the gap fragmenter on Solr 3.6. This is my current configuration for the gap fragmenter:
<fragmenter name="gap"
default="true"
class="solr.highlight.GapFragmenter">
<lst name="defaults">
<int name="hl.fragsize">150</int>
</lst>
</fragmenter>
This is the basic configuration; I only tweaked the hl.fragsize parameter to get shorter fragments. The problem is that for one particular PDF document in my results I get a really long snippet, far longer than 150 characters. Stranger still, if I change the value from 150 to 100, the snippet for the same document comes out at a normal length of roughly 100 characters. The type of the field being highlighted is this:
<fieldType name="text" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StandardFilterFactory"/>
<filter class="solr.ISOLatin1AccentFilterFactory"/>
<filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1" generateNumberParts="1"
catenateWords="1" catenateNumbers="1" catenateAll="0"
splitOnCaseChange="1" types="characters.txt"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
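To compare the 150 and 100 cases without editing the fragmenter defaults each time, the same hl.fragsize parameter could also be set as a highlighting default on the search handler, or passed directly on the request. The fragment below is only a sketch: the handler name "search" and the field name "content" are placeholders assumed for illustration, not values taken from the actual setup.

```xml
<!-- Sketch only: handler name and field name are assumed placeholders -->
<requestHandler name="search" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- turn highlighting on for every request to this handler -->
    <str name="hl">true</str>
    <!-- field to highlight (hypothetical name) -->
    <str name="hl.fl">content</str>
    <!-- fragment size under test; change to 150 to compare -->
    <int name="hl.fragsize">100</int>
  </lst>
</requestHandler>
```

Sending hl.fragsize=150 or hl.fragsize=100 as a request parameter should take precedence over these defaults, which would make it easier to compare the two snippets for the same document side by side.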
Any ideas about what is happening? Or how could I debug what is really going on?
Greetings!