
Posted to java-user@lucene.apache.org by Paul Hill <pa...@metajure.com> on 2012/04/04 22:06:11 UTC

Hit highlighting: which highlighter to use?

Using the original org.apache.lucene.search.highlight.Highlighter, should I be able to give it a query like [ My AND Words AND "My Words"^100 ] (the actual phrase in this query is converted to a span query with a slop of 1)
and expect it to find the fragment many pages into the file that contains the span "My Words", and rank it above fragments earlier in the document that contain "My" and "Words" separately (or lots of "My" and "Words")?
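To make the expected ranking concrete, here is a toy scoring model in plain Java. It is not Lucene's QueryScorer; the class and method names are hypothetical, tokenization is a simple lower-case split, and it assumes the slop-1 span means "my" followed in order by "words" with at most one token in between. Each term occurrence adds 1 and each phrase occurrence adds the boost (100 here, mirroring the ^100 in the query).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Toy phrase-aware fragment scoring (hypothetical; not Lucene's QueryScorer). */
public class ToyFragmentScorer {

    // Lower-case the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Count occurrences of `first` followed, in order, by `second` with at most `slop` tokens between.
    static int countPhrase(List<String> tokens, String first, String second, int slop) {
        int hits = 0;
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) continue;
            for (int j = i + 1; j <= i + 1 + slop && j < tokens.size(); j++) {
                if (tokens.get(j).equals(second)) { hits++; break; }
            }
        }
        return hits;
    }

    // Score = 1 per query-term occurrence + phraseBoost per phrase occurrence.
    static double score(String fragment, String first, String second, int slop, double phraseBoost) {
        List<String> tokens = tokenize(fragment);
        double s = 0;
        for (String t : tokens) {
            if (t.equals(first) || t.equals(second)) s += 1;
        }
        return s + phraseBoost * countPhrase(tokens, first, second, slop);
    }

    // Index of the highest-scoring fragment; the earliest fragment wins ties.
    static int bestFragment(List<String> fragments, String first, String second, int slop, double phraseBoost) {
        int best = 0;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (int i = 0; i < fragments.size(); i++) {
            double s = score(fragments.get(i), first, second, slop, phraseBoost);
            if (s > bestScore) { bestScore = s; best = i; }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
            "My own notes: words, more words, my list of words.",  // 5 term hits, no phrase
            "Nothing relevant here.",
            "Many pages later: My Words appear together.");        // 2 term hits + 1 phrase
        System.out.println(bestFragment(fragments, "my", "words", 1, 100.0)); // prints 2
    }
}
```

Under this model the late fragment scores 102 against 5 for the early one, which is the behaviour I am hoping the real Highlighter can be configured to give.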

I ask because currently I'm not getting the fragment with the phrase as the best fragment, so I do some hacky post-processing to look down the list for a "better" match. I'm wondering whether we have the HitHighlighter wired up wrong.
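The post-processing workaround looks roughly like the sketch below: walk the fragments in the order the highlighter returned them (assumed here to be plain strings with `<B>` markup, as a default SimpleHTMLFormatter would produce) and prefer the first one that contains the phrase with slop 1. The class and method names are hypothetical, and the regex only approximates what the span query matches, since it does not use the field's analyzer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical post-processing: prefer the first fragment that contains the phrase. */
public class PhraseFirstReranker {

    // Remove highlight markup such as <B>...</B> before matching.
    static String stripTags(String fragment) {
        return fragment.replaceAll("<[^>]+>", "");
    }

    // `first`, then at most one intervening word, then `second` (approximates slop 1, in order).
    static Pattern phraseWithSlopOne(String first, String second) {
        return Pattern.compile(
            "\\b" + Pattern.quote(first) + "\\W+(?:\\w+\\W+)?" + Pattern.quote(second) + "\\b",
            Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
    }

    // First fragment (in highlighter order) containing the phrase; otherwise the top fragment.
    static String pickBest(List<String> fragments, String first, String second) {
        Pattern p = phraseWithSlopOne(first, second);
        for (String f : fragments) {
            if (p.matcher(stripTags(f)).find()) return f;
        }
        return fragments.isEmpty() ? null : fragments.get(0);
    }

    public static void main(String[] args) {
        List<String> fragments = Arrays.asList(
            "<B>My</B> list of <B>Words</B>",      // two words in between: no phrase match
            "more <B>my</B> <B>words</B> here");   // adjacent: phrase match
        System.out.println(pickBest(fragments, "my", "words")); // prints more <B>my</B> <B>words</B> here
    }
}
```

This works, but it only reorders what the highlighter already chose, so a phrase fragment that never made it into the returned list is still missed.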

At this time, my index does not store term vectors with positions and offsets for all tokenized fields, and the body "text" field has only positions.

I understand that FastVectorHighlighter is fast, but would it do a better job of finding the phrase or span in the text if I added positions and offsets to the "text" field?
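As a rough illustration of why offsets matter: positions are enough to tell that "my" and "words" are within slop 1 of each other, while offsets let the match be mapped straight back to character ranges in the stored text to cut the fragment, without re-analyzing it. The sketch below is a toy analogue, not how FastVectorHighlighter is actually implemented; `termVector` stands in for the term vector that would be stored at index time, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Toy analogue of highlighting from positions + offsets (hypothetical names). */
public class OffsetPhraseLocator {

    // One entry of a toy per-document term vector: term text, position, character offsets.
    static final class TermOccurrence {
        final String term; final int position; final int start; final int end;
        TermOccurrence(String term, int position, int start, int end) {
            this.term = term; this.position = position; this.start = start; this.end = end;
        }
    }

    // Stand-in for a stored term vector: lower-cased letter/digit runs with positions and offsets.
    static List<TermOccurrence> termVector(String text) {
        List<TermOccurrence> out = new ArrayList<>();
        Matcher m = Pattern.compile("[\\p{L}\\p{N}]+").matcher(text);
        int pos = 0;
        while (m.find()) {
            out.add(new TermOccurrence(m.group().toLowerCase(Locale.ROOT), pos++, m.start(), m.end()));
        }
        return out;
    }

    // Positions find `first ... second` (in order, at most `slop` tokens between);
    // offsets then cut the fragment from the stored text. Returns null if there is no match.
    static String fragmentAroundPhrase(String text, String first, String second, int slop, int context) {
        List<TermOccurrence> tv = termVector(text);
        for (int i = 0; i < tv.size(); i++) {
            if (!tv.get(i).term.equals(first)) continue;
            for (int j = i + 1; j < tv.size() && tv.get(j).position - tv.get(i).position <= slop + 1; j++) {
                if (tv.get(j).term.equals(second)) {
                    int from = Math.max(0, tv.get(i).start - context);
                    int to = Math.min(text.length(), tv.get(j).end + context);
                    return text.substring(from, to);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String text = "My list of words. Later on: my words together.";
        System.out.println(fragmentAroundPhrase(text, "my", "words", 1, 0)); // prints my words
    }
}
```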

When highlighting small fields like title, path, etc., should I add term vectors with positions and offsets and use FastVectorHighlighter, or is it just not worth storing that extra information only for highlighting?

-Paul