Posted to java-user@lucene.apache.org by Raf <r....@gmail.com> on 2012/07/03 21:22:40 UTC

How to extract only highlight spans?

Hi,
is it possible to use the Lucene Highlighter classes to extract highlight
spans instead of getting the "highlighted" string?
I am using Lucene 3.0.3 (and I cannot upgrade for now).

I have the following snippet of code:

QueryScorer scorer = new QueryScorer(highlightQuery);  // already rewritten
scorer.init(tokenStream);
tokenStream.reset();

Highlighter highlighter = new Highlighter(formatter, scorer);
highlighter.setTextFragmenter(fragmenter); // a NullFragmenter
String bestFragments = highlighter.getBestFragments(tokenStream,
        textToHighlight, maxNumFragments, fragmentsSeparator);

This returns the highlighted text (with HTML span tags in it).

Instead, I would like to get only a list of "spans" (e.g. <4,10>
<15,27> ...) giving the text positions to highlight (the same positions
read by the tokenStream).
I need them because I have to merge the Lucene query highlights with some
custom highlight info (already expressed as start/end spans), and it is very
difficult to merge the two if Lucene gives me only the highlighted text.
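
To illustrate the merge step mentioned above, here is a minimal sketch in
plain Java. It does not use Lucene at all: Span and SpanMerger are
hypothetical names made up for this example, and it assumes that both
sources express a span as <start,end> character offsets into the same text
and that overlapping or touching spans should be fused into one.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: a <start,end> character span.
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }

    @Override
    public String toString() {
        return "<" + start + "," + end + ">";
    }
}

// Hypothetical helper, not part of Lucene.
final class SpanMerger {
    // Combines two span lists and fuses spans that overlap or touch.
    static List<Span> merge(List<Span> querySpans, List<Span> customSpans) {
        List<Span> all = new ArrayList<Span>(querySpans);
        all.addAll(customSpans);
        // Sort by start offset so overlapping spans become neighbours.
        all.sort((a, b) -> Integer.compare(a.start, b.start));

        List<Span> merged = new ArrayList<Span>();
        for (Span s : all) {
            int last = merged.size() - 1;
            if (last >= 0 && s.start <= merged.get(last).end) {
                // Overlaps (or touches) the previous span: extend it.
                Span prev = merged.get(last);
                merged.set(last, new Span(prev.start, Math.max(prev.end, s.end)));
            } else {
                merged.add(s);
            }
        }
        return merged;
    }
}
```

With these assumptions, merging <4,10> <15,27> with the custom spans
<8,12> <30,35> gives <4,12> <15,27> <30,35>. The open part of the question
is how to obtain the first list from the Highlighter.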

Is there a way to extract this information using only the user query, the
text to highlight and the token stream of the search field?

Thank you in advance.

Bye
*Raf*