Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2019/12/26 10:30:36 UTC

[GitHub] [lucene-solr] Traktormaster commented on a change in pull request #1123: LUCENE-9093: Unified highlighter with word separator never gives context to the left

URL: https://github.com/apache/lucene-solr/pull/1123#discussion_r361426082
 
 

 ##########
 File path: lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/LengthGoalBreakIterator.java
 ##########
 @@ -174,7 +175,48 @@ private int moveToBreak(int idx) { // precondition: idx is a known break
   // called at start of new Passage given first word start offset
   @Override
   public int preceding(int offset) {
-    return baseIter.preceding(offset); // no change needed
+    final int fragmentStart = Math.max(baseIter.preceding(offset), 0); // convert DONE to 0
+    fragmentEndFromPreceding = baseIter.following(fragmentStart);
 
 Review comment:
   Unfortunately no. The fragmentStart argument is the start of the match, which could be anything depending on the tokenizer in the index analyzer chain. Even if we assume it's the start of a word or a phrase, the underlying BI can break at different places. In the case of SENTENCE, the preceding() call here will find the beginning of the sentence. In the case of SEPARATOR, which is customizable by query, the breaks can be anywhere else.
   We could only assume fragmentStart is a break point if the underlying BI were the same as the tokenizer in the index analyzer chain. (I'm not sure, but I think the query analyzer chain could be different too.)
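
To illustrate the point, here is a minimal sketch using the JDK's java.text.BreakIterator as a stand-in for the underlying BI. The class name BreakOffsetSketch, the helper methods, and the sample text are all made up for this example and are not part of the PR. It only shows that a match offset in the middle of a sentence is not a break of a sentence iterator, while preceding() moves back to one.

```java
import java.text.BreakIterator;
import java.util.Locale;

class BreakOffsetSketch {
    // Builds a JDK sentence iterator over the text (a stand-in for the base BI).
    static BreakIterator sentenceIterator(String text) {
        BreakIterator bi = BreakIterator.getSentenceInstance(Locale.ROOT);
        bi.setText(text);
        return bi;
    }

    // Same clamping as in the diff: preceding() returns DONE (-1) when there
    // is no earlier break, so that case is mapped to 0.
    static int fragmentStart(BreakIterator bi, int offset) {
        return Math.max(bi.preceding(offset), 0);
    }

    public static void main(String[] args) {
        String text = "First sentence here. Second sentence with a match.";
        BreakIterator bi = sentenceIterator(text);
        int matchStart = text.indexOf("match"); // hypothetical match offset

        // The match start sits mid-sentence, so it is not a sentence break.
        System.out.println("matchStart is a break: " + bi.isBoundary(matchStart));
        // preceding() walks back to the break before the match.
        System.out.println("fragmentStart: " + fragmentStart(bi, matchStart));
    }
}
```

With a different base iterator (for example a separator-based one) the breaks would fall elsewhere, which is why the match start cannot be assumed to be a break in general.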
