Posted to solr-user@lucene.apache.org by Christian Vogler <ch...@gmail.com> on 2008/02/27 09:16:19 UTC

Seeing strange highlighting in multi-valued field (was: Why does highlight use the index analyzer)

On Wednesday 27 February 2008 03:58:14 Chris Hostetter wrote:
> I'm not much of a highlighter expert, but this *seems* like it was probably
> intentional ... you are talking about the use case where you have a stored
> field, and no term positions, correct? ... so in order to highlight, the
> highlighter needs to analyze the stored text to find the word positions?

Yes, that is correct. I index and store the field, and have term positions 
disabled. Your explanation makes sense, thanks. 

However, to follow up, I have run into some strange highlighter behavior on 
multi-valued text fields. In particular, I have a field like this:

<fieldType name="text_de" class="solr.TextField" 
positionIncrementGap="100">...</fieldType>

The analyzers for indexing and query are identical, except that I put a 
compound word splitter in the indexer chain. I use this in a multi-valued 
category field:

<field name="category" type="text_de" indexed="true" stored="true" 
multiValued="true" />

Typical values from documents are:
<arr name="category"><str>Gebärdensprache</str><str>Recht</str></arr>

where the indexed terms, after analysis, are "gebard", "sprach", and "recht", 
respectively. Now, if I query for "Gebärden" (which the analyzer transforms 
into "gebard"), I get matches, as expected, but the highlighter retrieves 
only the match on the first token of the first field, like this:

<arr name="category"><str>&lt;em&gt;Gebärden&lt;/em&gt;</str></arr>

The fragment, snippet, and merging parameters have no effect on this behavior; 
hl.requireFieldMatch is off; hl.fragmenter is gap.
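
A request with these settings can be sketched roughly as follows. This is only 
an illustration: the host, port, and handler path are assumptions, and the 
hl.snippets and hl.fragsize values are placeholders, not the values actually 
used; only hl.requireFieldMatch and hl.fragmenter come from the description 
above.

```python
from urllib.parse import urlencode

# Assumed base URL (hypothetical); adjust to the actual Solr instance.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_highlight_query(term, field="category"):
    """Build a select URL that requests highlighting on a single field."""
    params = {
        "q": f"{field}:{term}",
        "hl": "true",
        "hl.fl": field,
        "hl.requireFieldMatch": "false",  # "off", as described above
        "hl.fragmenter": "gap",           # as described above
        "hl.snippets": "5",               # placeholder value
        "hl.fragsize": "100",             # placeholder value
    }
    # urlencode percent-encodes the umlaut as UTF-8 by default.
    return SOLR_SELECT_URL + "?" + urlencode(params)


print(build_highlight_query("Gebärden"))
```

Varying the placeholder values is an easy way to confirm that the snippet and 
fragment settings really make no difference to the output.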

What is a bit strange is that if the field has only one value, then the 
highlighter retrieves the entire contents of the field; that is, if we have 
indexed

<arr name="category"><str>Gebärdensprache</str></arr>

then the highlighter will show

<arr name="category"><str>&lt;em&gt;Gebärden&lt;/em&gt;sprache</str></arr>

which is the behavior that I expected, irrespective of whether the field has 
one or more values.

Any idea what could be going on here?

Best regards
- Christian

Re: Seeing strange highlighting in multi-valued field (was: Why does highlight use the index analyzer)

Posted by Chris Hostetter <ho...@fucit.org>.
: which is the behavior that I expected, irrespective of whether the field has 
: one or more values.
: 
: Any idea what could be going on here?

not really ... but like i said, i'm not really a "highlighter guy".  I 
can't think of any reason why having multiple values would cause this 
behavior ... does the behavior change if the "value" that matches isn't 
the first one?  what if positionIncrementGap="0" ?

either way, it seems like a bug to me ... unless someone else chimes in 
with a "that's by design because..." reply, i would open a bug and attach 
a small test case demonstrating the problem (which should be fairly 
straightforward since it doesn't require a lot of data)
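
A minimal sketch of the data for such a test case is below. It assumes a 
unique key field named "id" (hypothetical, not stated in this thread) and 
reuses the multi-valued "category" field from the original message. The 
three documents cover the single-value case, the match in the first value, 
and the match in a later value.

```python
import xml.etree.ElementTree as ET


def make_add_xml(docs):
    """Serialize {doc id: [category values]} into a Solr <add> message."""
    add = ET.Element("add")
    for doc_id, categories in docs.items():
        doc = ET.SubElement(add, "doc")
        # "id" is an assumed unique key field name.
        ET.SubElement(doc, "field", name="id").text = doc_id
        # A multi-valued field is sent as repeated <field> elements.
        for value in categories:
            ET.SubElement(doc, "field", name="category").text = value
    return ET.tostring(add, encoding="unicode")


TEST_DOCS = {
    "single": ["Gebärdensprache"],
    "match-first": ["Gebärdensprache", "Recht"],
    "match-second": ["Recht", "Gebärdensprache"],
}

payload = make_add_xml(TEST_DOCS)
```

Indexing the payload once with positionIncrementGap="100" and once with "0", 
then running the same highlighting query against both, would show whether 
either the value order or the gap affects the result.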



-Hoss