
Posted to issues@solr.apache.org by "David Smiley (Jira)" <ji...@apache.org> on 2021/03/14 17:13:00 UTC

[jira] [Created] (SOLR-15260) Precompute snippet delimiter breaks for the UnifiedHighlighter

David Smiley created SOLR-15260:
-----------------------------------

             Summary: Precompute snippet delimiter breaks for the UnifiedHighlighter
                 Key: SOLR-15260
                 URL: https://issues.apache.org/jira/browse/SOLR-15260
             Project: Solr
          Issue Type: Improvement
      Security Level: Public (Default Security Level. Issues are Public)
          Components: highlighter
            Reporter: David Smiley


The "BreakIterator" implementation used by the UnifiedHighlighter can play a significant role in highlighting performance.  The default ones come from the JDK, so we have no control over them; they may well be optimized, but they have a complicated job to do.  I propose that the break locations be computed at indexing time in a Solr UpdateRequestProcessor and placed into a pre-analyzed common field, perhaps named {{\_highlighter_breaks_}}, that needs indexed=true plus offsets.  In this field, the term is the actual field name, the position is meaningless, and the offset pair refers to the span delimited by the break iterator (typically a sentence).  This data can be stored efficiently in Lucene.  The UnifiedHighlighter already has a flexible BreakIterator producer, but it is not notified of the current document, so changes would be needed there (a separate LUCENE issue).
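
To illustrate the kind of data that would be precomputed, here is a minimal sketch that uses only the JDK's sentence BreakIterator to turn a field value into (start, end) offset pairs.  The class and method names (BreakSpans, sentenceSpans) are hypothetical, and the choice of Locale.ROOT is an assumption; wiring this into an UpdateRequestProcessor and writing the pairs into a pre-analyzed field is not shown.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: not part of Solr or Lucene.
final class BreakSpans {

    // Returns one {startOffset, endOffset} pair per span reported by the
    // JDK sentence BreakIterator. Spans are contiguous and end-exclusive.
    static List<int[]> sentenceSpans(String text) {
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ROOT);
        it.setText(text);

        List<int[]> spans = new ArrayList<>();
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; end = it.next()) {
            spans.add(new int[] {start, end});
            start = end;
        }
        return spans;
    }
}
```

Under the proposal, each such pair would become the offsets of one token in the breaks field, with the source field's name as the term, so that highlighting could look up the breaks instead of rerunning the BreakIterator over the text.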



--
This message was sent by Atlassian Jira
(v8.3.4#803005)