Posted to solr-user@lucene.apache.org by Christian Vogler <ch...@gmail.com> on 2008/02/27 09:16:19 UTC
Seeing strange highlighting in multi-valued field (was: Why does highlight use the index analyzer)
On Wednesday 27 February 2008 03:58:14 Chris Hostetter wrote:
> I'm not much of a highlighter expert, but this *seems* like it was probably
> intentional ... you are talking about the use case where you have a stored
> field, and no term positions, correct? ... so in order to highlight, the
> highlighter needs to analyze the stored text to find the word positions?
Yes, that is correct. I index and store the field, and have term positions
disabled. Your explanation makes sense, thanks.
However, to follow up, I have run into some strange highlighter behavior on
multi-valued text fields. In particular, I have a field like this:
<fieldType name="text_de" class="solr.TextField"
positionIncrementGap="100">...</fieldType>
The analyzers for indexing and query are identical, except that I put a
compound word splitter in the indexer chain. I use this in a multi-valued
category field:
<field name="category" type="text_de" indexed="true" stored="true"
multiValued="true" />
Typical values from documents are:
<arr name="category"><str>Gebärdensprache</str><str>Recht</str></arr>
where the indexed terms, after analysis, are: "gebard" "sprach" and "recht",
respectively. Now, if I query for "Gebärden" (which the analyzer transforms
into "gebard"), I get matches, as expected, but the highlighter retrieves
only the match on the first token of the first field, like this:
<arr name="category"><str><em>Gebärden</em></str></arr>
The fragment, snippet, and merging parameters have no effect on this behavior;
hl.requireFieldMatch is off; hl.fragmenter is gap.
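For anyone trying to reproduce this, a request with the settings described above could be assembled roughly as follows. This is only a sketch: the host, port, and core path are placeholders, and the hl.snippets and hl.mergeContiguous values are illustrative guesses rather than the exact values used here.

```python
from urllib.parse import urlencode


def build_highlight_query(base_url, term, field="category"):
    """Build a Solr select URL with highlighting enabled on one field."""
    params = {
        "q": f"{field}:{term}",
        "hl": "true",
        "hl.fl": field,                    # field to highlight
        "hl.requireFieldMatch": "false",   # "off", as described above
        "hl.fragmenter": "gap",
        "hl.snippets": "10",               # assumed value, for illustration only
        "hl.mergeContiguous": "true",      # assumed value, for illustration only
    }
    return f"{base_url}/select?{urlencode(params)}"


# base_url is a placeholder, not the actual server
url = build_highlight_query("http://localhost:8983/solr", "Gebärden")
print(url)
```

urlencode percent-encodes the umlaut as UTF-8, so the servlet container must also be set up to decode the query string as UTF-8.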
What is a bit strange is that if the field has only one value, then the
highlighter retrieves the entire contents of the field; that is, if we have
indexed
<arr name="category"><str>Gebärdensprache</str></arr>
then the highlighter will show
<arr name="category"><str><em>Gebärden</em>sprache</str></arr>
which is the behavior that I expected, irrespective of whether the field has
one or more values.
Any idea what could be going on here?
Best regards
- Christian
Re: Seeing strange highlighting in multi-valued field (was: Why does
highlight use the index analyzer)
Posted by Chris Hostetter <ho...@fucit.org>.
: which is the behavior that I expected, irrespective of whether the field has
: one or more values.
:
: Any idea what could be going on here?
not really ... but like i said, i'm not really a "highlighter guy". I
can't think of any reason why having multiple values would cause this
behavior ... does the behavior change if the "value" that matches isn't
the first one? what if positionIncrementGap="0" ?
either way, it seems like a bug to me ... unless someone else chimes in
with a "that's by design because..." reply, i would open a bug and attach
a small test case demonstrating the problem (which should be fairly
straightforward since it doesn't require a lot of data)
-Hoss
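
A minimal sketch of the data such a test case might need: one document with two category values and one with a single value, serialized as a Solr XML update message. The "id" field and the document ids are assumptions (the thread does not show the schema's unique key); the category values are the ones quoted above.

```python
import xml.etree.ElementTree as ET


def build_add_message(docs):
    """Serialize (doc_id, [category values]) pairs as a Solr <add> message."""
    add = ET.Element("add")
    for doc_id, categories in docs:
        doc = ET.SubElement(add, "doc")
        # "id" is an assumed unique-key field name
        id_field = ET.SubElement(doc, "field", name="id")
        id_field.text = doc_id
        # one <field> element per value of the multi-valued field
        for value in categories:
            field = ET.SubElement(doc, "field", name="category")
            field.text = value
    return ET.tostring(add, encoding="unicode")


message = build_add_message([
    ("multi", ["Gebärdensprache", "Recht"]),   # shows the truncated highlight
    ("single", ["Gebärdensprache"]),           # shows the expected highlight
])
print(message)
```

Posting this message to the update handler, committing, and then querying category:Gebärden with highlighting on should be enough to compare the two cases side by side.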