Posted to java-user@lucene.apache.org by "V. Svetov" <te...@gmail.com> on 2016/12/12 02:02:13 UTC

How to highlight and fragment Multi-value field

Hi all

How can I highlight a fragmented string that is built from a
multi-value field, using Lucene 5.3.2?
Is there a known solution for this?

1. Our first approach: merge all the values into one single string and
apply

highlighter.getBestTextFragments(tokenStream, text, false, maxNumFragments);

However, some of the returned fragments may cross the boundary between two
original values, since no delimiters are added to the merged string.
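
One way to keep the value boundaries visible in the first approach would be
to join the values with an explicit separator and to split every returned
fragment at that separator afterwards. The following is a minimal plain-Java
sketch, not Lucene API: the class MergedValues and both method names are
hypothetical. It assumes the separator never occurs inside a value and that
the separator survives unchanged in the fragment text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene.
final class MergedValues {
    private MergedValues() {}

    // Joins all values of the field into one string, marking each boundary.
    static String merge(List<String> values, String separator) {
        return String.join(separator, values);
    }

    // Splits each fragment wherever it contains the separator, so that no
    // resulting piece spans two original values. Blank pieces are dropped.
    static List<String> splitAtBoundaries(List<String> fragments, String separator) {
        List<String> result = new ArrayList<>();
        for (String fragment : fragments) {
            // Pattern.quote keeps the separator from being read as a regex.
            for (String piece : fragment.split(Pattern.quote(separator))) {
                if (!piece.trim().isEmpty()) {
                    result.add(piece);
                }
            }
        }
        return result;
    }
}
```

Splitting after the fact means a piece can be shorter than the requested
fragment size, so this only addresses the boundary problem.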


2. Our second approach: get the highlighted fragments from each value of the
multi-value field separately, and then build an indexed list from the
top-scored fragments.
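
The selection step of the second approach could be sketched roughly as
follows. This is plain Java under stated assumptions, not Lucene API:
ScoredFragment and FragmentPool are hypothetical names, and the score is
assumed to be whatever the highlighter reported for each fragment of each
value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder for one highlighted fragment; not a Lucene class.
final class ScoredFragment {
    final int valueIndex; // which value of the multi-value field it came from
    final String text;    // highlighted fragment text
    final float score;    // fragment score reported by the highlighter

    ScoredFragment(int valueIndex, String text, float score) {
        this.valueIndex = valueIndex;
        this.text = text;
        this.score = score;
    }
}

// Hypothetical helper that pools the fragments of all values.
final class FragmentPool {
    private FragmentPool() {}

    // Returns at most maxNumFragments fragments, highest score first.
    static List<ScoredFragment> top(List<List<ScoredFragment>> perValue, int maxNumFragments) {
        List<ScoredFragment> all = new ArrayList<>();
        for (List<ScoredFragment> fragments : perValue) {
            for (ScoredFragment fragment : fragments) {
                if (fragment.score > 0) { // skip fragments without a match
                    all.add(fragment);
                }
            }
        }
        // List.sort is stable, so ties keep their original value order.
        all.sort((a, b) -> Float.compare(b.score, a.score));
        return new ArrayList<>(all.subList(0, Math.min(maxNumFragments, all.size())));
    }
}
```

Keeping valueIndex with each fragment makes it possible to report which
original value a selected fragment belongs to.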

We got:

   Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token three exceeds length of provided text sized 32
   org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:224)

when we use setStoreTermVectorOffsets while indexing the field.
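
A possible explanation, offered as an assumption rather than a confirmed
diagnosis: offsets stored for a multi-value field accumulate across the
values, so a token from a later value carries an offset that lies beyond
the end of any earlier value's text. If a token stream rebuilt from the term
vector is paired with the text of a single value, the highlighter may then
see a token past the end of that text. The sketch below is plain Java, not
Lucene API. ValueOffsets is a hypothetical helper, and the gap of 1 between
values is assumed to match the default of Analyzer.getOffsetGap.

```java
// Hypothetical helper, not part of Lucene. Maps an offset in the
// accumulated (whole-field) offset space back to a single value.
final class ValueOffsets {
    private final int[] starts;  // accumulated start offset of each value
    private final int[] lengths; // character length of each value

    ValueOffsets(String[] values, int offsetGap) {
        starts = new int[values.length];
        lengths = new int[values.length];
        int base = 0;
        for (int i = 0; i < values.length; i++) {
            starts[i] = base;
            lengths[i] = values[i].length();
            // Assumption: the next value starts after this value plus the gap.
            base += lengths[i] + offsetGap;
        }
    }

    // Index of the value containing this token start offset,
    // or -1 if the offset falls in a gap or outside all values.
    int valueIndexOf(int globalStartOffset) {
        for (int i = 0; i < starts.length; i++) {
            if (globalStartOffset >= starts[i] && globalStartOffset < starts[i] + lengths[i]) {
                return i;
            }
        }
        return -1;
    }

    // Offset relative to the start of the given value.
    int localOffset(int valueIndex, int globalOffset) {
        return globalOffset - starts[valueIndex];
    }
}
```

For the two values "alpha beta" and "gamma delta" with a gap of 1, the token
"gamma" would start at accumulated offset 11, which is already past the
10 characters of the first value.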

If the index does *not* set setStoreTermVectorOffsets, the exception is not
thrown:
      FieldType fType = new FieldType();
      ...
      // fType.setStoreTermVectorOffsets(true);

However, the resulting fragments are much bigger than the requested fragment
size. Please advise on the correct technique for getting fragmented,
highlighted strings for a multi-value field.


 Thanks