Posted to dev@lucene.apache.org by "Timothy M. Rodriguez (JIRA)" <ji...@apache.org> on 2016/10/27 19:28:58 UTC

[jira] [Created] (LUCENE-7526) Improvements to UnifiedHighlighter OffsetStrategies

Timothy M. Rodriguez created LUCENE-7526:
--------------------------------------------

             Summary: Improvements to UnifiedHighlighter OffsetStrategies
                 Key: LUCENE-7526
                 URL: https://issues.apache.org/jira/browse/LUCENE-7526
             Project: Lucene - Core
          Issue Type: Improvement
            Reporter: Timothy M. Rodriguez
            Priority: Minor


This ticket improves several of the UnifiedHighlighter FieldOffsetStrategies by reducing reliance on creating or re-creating TokenStreams.

The primary changes are as follows:

* AnalysisOffsetStrategy - split into two offset strategies
  * MemoryIndexOffsetStrategy - the primary analysis mode, which uses a MemoryIndex to produce offsets
  * TokenStreamOffsetStrategy - an offset strategy that avoids creating a MemoryIndex. It can be used only when the query reduces to terms and automata.

* TokenStream removal
  * MemoryIndexOffsetStrategy - previously, a TokenStream was created to fill the MemoryIndex. If automata (wildcard and other MultiTermQuery queries) were involved, a second TokenStream was then generated by uninverting the MemoryIndex once the first had been consumed. This is now avoided, which should save memory and avoid a second pass over the data.
  * TermVectorOffsetStrategy - refactored in the same way, so it no longer generates a TokenStream when automata are involved.
  * PostingsWithTermVectorsOffsetStrategy - refactored similarly.

* CompositePostingsEnum - aggregates several underlying PostingsEnums for wildcard/MultiTermQuery queries. This should improve relevancy by providing unified metrics for a wildcard across all of its term matches.

* Added a HighlightFlag for enabling the newly separated TokenStreamOffsetStrategy, since it can adversely affect passage relevancy.
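To illustrate the idea behind TokenStreamOffsetStrategy, here is a minimal plain-Java sketch, not Lucene code: analyzed tokens are checked one at a time against the exact query terms and against a list of java.util.regex Patterns that stand in for Lucene automata, and the offsets of matching tokens are collected in a single pass without building any index. The Token and Hit records, the match method, and the sample text are hypothetical. The sketch assumes that the restriction to "terms and automata" exists because each such check needs only one token at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

public class TokenStreamMatchSketch {

    /** Hypothetical analyzed token: term text plus character offsets (not a Lucene class). */
    public record Token(String text, int start, int end) {}

    /** Offsets of one matched token. */
    public record Hit(int start, int end) {}

    /**
     * Single pass over the tokens. Each token is tested on its own against the
     * exact query terms, then against each pattern (a stand-in for automata).
     * No index is built; every check looks at one token only.
     */
    public static List<Hit> match(List<Token> tokens, Set<String> terms, List<Pattern> automata) {
        List<Hit> hits = new ArrayList<>();
        for (Token t : tokens) {
            boolean matched = terms.contains(t.text());
            for (int i = 0; !matched && i < automata.size(); i++) {
                matched = automata.get(i).matcher(t.text()).matches();
            }
            if (matched) {
                hits.add(new Hit(t.start(), t.end()));
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical tokens for the text "quick lighting fix".
        List<Token> tokens = List.of(
            new Token("quick", 0, 5),
            new Token("lighting", 6, 14),
            new Token("fix", 15, 18));
        // Query reduced to one exact term ("fix") and one wildcard-like pattern ("light*").
        List<Hit> hits = match(tokens, Set.of("fix"), List.of(Pattern.compile("light.*")));
        System.out.println(hits);
    }
}
```

The sketch has no notion of positions across tokens, so a phrase could not be matched this way.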
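The aggregation performed by CompositePostingsEnum can be pictured as a k-way merge. Below is a rough plain-Java sketch of that idea, written without Lucene's real PostingsEnum API. Each term matched by a wildcard contributes its own list of matches sorted by start offset. A priority queue interleaves those lists into one ordered stream, and the size of that stream serves as a single combined frequency for the whole wildcard. The Match and TermCursor types, the merge method, and the sample offsets are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class CompositeOffsetsSketch {

    /** One match of a single expanded term, with character offsets (hypothetical type). */
    public record Match(String term, int start, int end) {}

    /** Cursor over one term's matches, which must be sorted by start offset. */
    static final class TermCursor {
        final List<Match> matches;
        int index = 0;
        TermCursor(List<Match> matches) { this.matches = matches; }
        Match current() { return matches.get(index); }
        boolean advance() { return ++index < matches.size(); }
    }

    /**
     * Merges the per-term match lists of a wildcard expansion into one list
     * ordered by start offset, so the wildcard can be treated as a single unit.
     */
    public static List<Match> merge(List<List<Match>> perTerm) {
        PriorityQueue<TermCursor> queue = new PriorityQueue<>(
            (a, b) -> Integer.compare(a.current().start(), b.current().start()));
        for (List<Match> matches : perTerm) {
            if (!matches.isEmpty()) {
                queue.add(new TermCursor(matches));
            }
        }
        List<Match> merged = new ArrayList<>();
        while (!queue.isEmpty()) {
            TermCursor top = queue.poll();   // cursor with the smallest current start offset
            merged.add(top.current());
            if (top.advance()) {
                queue.add(top);              // re-insert, now positioned at its next match
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical expansion of the wildcard "light*" over some text.
        List<List<Match>> perTerm = List.of(
            List.of(new Match("light", 0, 5), new Match("light", 40, 45)),
            List.of(new Match("lighter", 12, 19)),
            List.of(new Match("lightning", 25, 34)));
        List<Match> merged = merge(perTerm);
        System.out.println("combined freq=" + merged.size());
        for (Match m : merged) {
            System.out.println(m.term() + " [" + m.start() + "," + m.end() + ")");
        }
    }
}
```

A cursor is changed only after it has been removed from the queue, so the heap ordering stays valid. The merged stream visits light, lighter, lightning, and light again in offset order, and the wildcard as a whole has a combined frequency of 4.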



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)