Posted to users@solr.apache.org by Clive Lewis <vi...@gmail.com> on 2021/08/13 08:39:26 UTC

UnifiedHighlighter BreakIterator

Hello!

*Problem:*
I have a multivalued field that stores the paragraphs of a text (1
paragraph = 1 value), with a position gap of 5000 between values. Right now
I use fastVectorHighlighter, and it works as expected for queries like
"Big Bang Theory"~5000 (with a slop of 5000, the phrase is matched only
inside one value, i.e. one paragraph). But apparently fastVectorHighlighter
doesn't support sloppy phrase queries whose terms are out of order. So if I
search for "Big Theory Bang"~5000, the document is found, but no snippet is
returned.

*Possible solution:*
I noticed that UnifiedHighlighter supports slop and returns snippets for
the query above. But sometimes the snippets are empty for queries where
even fastVectorHighlighter returned something. I assume that's because of
how UnifiedHighlighter splits the text.

*Question:*
I want UnifiedHighlighter to search for snippets in each value of the
field separately.
As I understand it, by default UH splits the text into sentences. You can
change that with the parameter *hl.bs.type*. Possible values are CHARACTER,
WORD, LINE, SENTENCE, WHOLE, SEPARATOR. But how can I tell it to "treat
each value of my multivalued field separately and search inside of it"?

*I could probably add a special constant symbol at the end of each
paragraph and split on it with SEPARATOR, but that feels like a shitty
hack, plus it would require me to reindex millions of documents.*
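
For reference, here is a minimal sketch of the request parameters the
SEPARATOR workaround would need. The field name "paragraphs", the separator
character and the helper unified_hl_params are hypothetical, made up only
for illustration. hl.bs.separator is, as far as I know, the parameter that
goes with hl.bs.type=SEPARATOR, and I am assuming it takes a single
character.

```python
from urllib.parse import urlencode

# Break iterator types listed above for hl.bs.type.
BS_TYPES = {"CHARACTER", "WORD", "LINE", "SENTENCE", "WHOLE", "SEPARATOR"}


def unified_hl_params(query, field, bs_type="SENTENCE", separator=None):
    """Build Solr select parameters for the unified highlighter.

    Hypothetical helper: it only assembles a dict of parameters and
    does not talk to Solr.
    """
    if bs_type not in BS_TYPES:
        raise ValueError(f"unknown hl.bs.type: {bs_type}")
    params = {
        "q": query,
        "hl": "true",
        "hl.method": "unified",
        "hl.fl": field,
        "hl.bs.type": bs_type,
    }
    if bs_type == "SEPARATOR":
        # Assumption: the separator must be exactly one character.
        if not separator or len(separator) != 1:
            raise ValueError("SEPARATOR needs a single separator character")
        params["hl.bs.separator"] = separator
    return params


# Assumed setup: every paragraph value was indexed with a trailing
# pilcrow, which is exactly the reindexing cost mentioned above.
params = unified_hl_params(
    'paragraphs:"Big Theory Bang"~5000',
    "paragraphs",
    bs_type="SEPARATOR",
    separator="\u00b6",
)
print(urlencode(params))
```

The printed string can be appended to a /select URL. The sketch says
nothing about whether the highlighter can be made to respect value
boundaries without such a marker, which is the actual question.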