Posted to solr-user@lucene.apache.org by Oliver Messner <me...@synyx.de> on 2010/12/30 13:36:05 UTC

Highlighter problem when using WordDelimiterFilter and term vectors

Hi,

When I use WordDelimiterFilterFactory in the fieldType definition and
set termVectors="true" termPositions="true" termOffsets="true" on
the field, Solr gives me the following response for the query request
?q=warmwasserspeicher&version=2.2&indent=on&hl=true

<lst name="highlighting">
  <lst name="id-1">
    <arr name="content">
      <str>some text Warm<em>WarmWasserSpeicher</em> here</str>
    </arr>
  </lst>
</lst>

As you can see, the highlighter does not work as expected (at least
not as I expected): "Warm" appears twice in the highlighted text. If
the term vectors are not stored in the index, I get the expected result
<str>some text <em>WarmWasserSpeicher</em> here</str>.

I'm using Solr version 1.4.1.
BTW, this problem does not occur when using the FastVectorHighlighter
(after applying the patches from https://issues.apache.org/jira/browse/SOLR-1268).


Any ideas?


Uploaded document:
<add>
  <doc>
    <field name="id">id-1</field>
    <field name="content">some text WarmWasserSpeicher here</field>
  </doc>
</add>


Field type definition:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
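
To make the index-time settings above easier to reason about, here is a rough Python sketch of the tokens and character offsets they would presumably produce for the uploaded document. It is only an approximation written for illustration: `index_tokens` is a hypothetical helper, not the real WordDelimiterFilter, and it models just whitespace tokenization, splitOnCaseChange="1", catenateWords="1" and lowercasing. The actual filter may emit tokens in a different order and with different position increments.

```python
import re

def index_tokens(text):
    """Approximate the index-time analyzer: returns (term, start, end) tuples.

    Hypothetical helper for illustration only; not Solr's implementation.
    """
    tokens = []
    for word in re.finditer(r"\S+", text):            # whitespace tokenizer
        base = word.start()
        # split on lower-to-upper case changes (splitOnCaseChange="1"),
        # lowercasing each part (LowerCaseFilterFactory)
        parts = [(m.group().lower(), base + m.start(), base + m.end())
                 for m in re.finditer(r"[A-Z]?[a-z]+|[A-Z]+", word.group())]
        tokens.extend(parts)
        if len(parts) > 1:                            # catenateWords="1"
            joined = "".join(term for term, _, _ in parts)
            tokens.append((joined, parts[0][1], parts[-1][2]))
    return tokens

tokens = index_tokens("some text WarmWasserSpeicher here")
for token in tokens:
    print(token)
```

In this simplified model the catenated term "warmwasserspeicher" (the only term the query analyzer produces, since splitOnCaseChange="0" there) covers the same character range as the three word parts, and it starts at the same offset as "warm". Whether that overlap is what confuses the highlighter when it reads tokens back from the term vectors is only a guess on my part.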


Field definition:
<fields>
  ...
  <field name="content" type="text" indexed="true" stored="true"
         termVectors="true" termPositions="true" termOffsets="true"/>
</fields>


solrconfig.xml:
<requestHandler name="dismax" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <bool name="tv">true</bool>
    <str name="defType">dismax</str>
    <str name="qf">content</str>
    <str name="mm">1</str>
    <str name="hl">true</str>
    <str name="fl">score</str>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
...
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>


Thanks,
Oliver