Posted to solr-user@lucene.apache.org by "Ing. Jorge Luis Betancourt Gonzalez" <jl...@uci.cu> on 2013/11/13 16:38:59 UTC

Strange behavior of gap fragmenter on highlighting

I'm seeing strange behavior from the gap fragmenter on Solr 3.6. This is my current configuration for the gap fragmenter:

      <fragmenter name="gap"
                  default="true"
                  class="solr.highlight.GapFragmenter">
        <lst name="defaults">
          <int name="hl.fragsize">150</int>
        </lst>
      </fragmenter>

This is the basic configuration; I only tweaked the hl.fragsize parameter to get shorter fragments. The problem is that for one particular PDF document in my results I get a really long snippet, way over 150 characters. Odder still, if I change the value from 150 to 100, the snippet for the same document is normal, around 100 characters. The type of the field being highlighted is this:

        <fieldType name="text" class="solr.TextField"
            positionIncrementGap="100">
            <analyzer type="index">
                <tokenizer class="solr.WhitespaceTokenizerFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.ISOLatin1AccentFilterFactory"/>
                <filter class="solr.SnowballPorterFilterFactory" language="Spanish"/>
                <charFilter class="solr.HTMLStripCharFilterFactory"/>
                <filter class="solr.StopFilterFactory"
                    ignoreCase="true" words="stopwords.txt"/>
                <filter class="solr.WordDelimiterFilterFactory"
                    generateWordParts="1" generateNumberParts="1"
                    catenateWords="1" catenateNumbers="1" catenateAll="0"
                    splitOnCaseChange="1" types="characters.txt"/>
                <filter class="solr.LowerCaseFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
            </analyzer>
        </fieldType>
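
To put numbers on the problem, a small script along these lines could flag every snippet that is longer than the configured hl.fragsize. This is only a sketch under a few assumptions: the response is requested with wt=json, the highlighter uses the default <em> markers, and the field name "content", the document ids, and the sample response are hypothetical. Since the fragment size is, as far as I understand, only approximate, a tolerance parameter is included.

```python
def oversized_snippets(response, fragsize, tolerance=0):
    """Return (doc_id, field, length) for each snippet longer than fragsize + tolerance.

    `response` is a Solr JSON response already parsed into a dict; only its
    "highlighting" section is inspected.
    """
    flagged = []
    for doc_id, fields in response.get("highlighting", {}).items():
        for field, snippets in fields.items():
            for snippet in snippets:
                # Strip the default highlight markers so only the text is measured.
                text = snippet.replace("<em>", "").replace("</em>", "")
                if len(text) > fragsize + tolerance:
                    flagged.append((doc_id, field, len(text)))
    return flagged


# Hypothetical response: doc1 has a 300-character snippet, doc2 a short one.
sample = {
    "highlighting": {
        "doc1": {"content": ["x" * 300]},
        "doc2": {"content": ["<em>solr</em> " + "y" * 90]},
    }
}

for doc_id, field, length in oversized_snippets(sample, fragsize=150):
    print(doc_id, field, length)
```

Running the same check with hl.fragsize set to 100 and then to 150 would show whether only this one PDF document exceeds the limit.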

Any ideas about what's happening? Or how could I debug what is really going on?

Greetings!