Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/08/15 23:49:58 UTC

[GitHub] [lucene] elliotzlin opened a new pull request, #1069: [LUCENE-2587] Highlighter fragment bug

elliotzlin opened a new pull request, #1069:
URL: https://github.com/apache/lucene/pull/1069

   ### Description (or a Jira issue link if you have one)
   [LUCENE-2587](https://issues.apache.org/jira/browse/LUCENE-2587)
   
   The issue has a good write-up of the bug.
   
   To summarize, we currently start each new fragment at the end offset of the previous fragment instead of at the start offset of the new fragment's first token, which can pull spurious un-analyzed characters into the fragment. In the test case, for example, the analyzer strips punctuation when tokenizing the string, yet the highlighted fragment containing the hit starts with a period (`.`).
   
   The fix here starts each new fragment at the start offset of the token that leads it. We also store the end offset of the preceding fragment so that we can use it to determine whether contiguous fragments can be merged.
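
As a rough illustration of the offset difference described above, here is a minimal, self-contained sketch. It does not use Lucene's `Highlighter` or `TextFragment` classes; the `FragmentOffsetSketch` class, its `Token` type, the `fragments` method, and the fixed tokens-per-fragment rule are all hypothetical stand-ins, and the token offsets are written by hand on the assumption that the analyzer drops punctuation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of fragment boundaries.
// This is NOT Lucene's Highlighter API; names and logic are illustrative only.
class FragmentOffsetSketch {

    // A token with character offsets into the original text (end is exclusive).
    static final class Token {
        final int start;
        final int end;

        Token(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    // Cuts the text into fragments of `tokensPerFragment` tokens each.
    // startAtTokenOffset == false models the old behavior: a new fragment
    // begins at the end offset of the previous fragment.
    // startAtTokenOffset == true models the fix: a new fragment begins at
    // the start offset of its first token.
    static List<String> fragments(String text, List<Token> tokens,
                                  int tokensPerFragment, boolean startAtTokenOffset) {
        List<String> result = new ArrayList<>();
        int fragStart = 0; // start offset of the fragment being built
        int prevEnd = 0;   // end offset of the last token seen so far
        for (int i = 0; i < tokens.size(); i++) {
            Token t = tokens.get(i);
            if (i > 0 && i % tokensPerFragment == 0) {
                // Close the current fragment at the end of its last token.
                result.add(text.substring(fragStart, prevEnd));
                fragStart = startAtTokenOffset ? t.start : prevEnd;
            }
            prevEnd = t.end;
        }
        if (!tokens.isEmpty()) {
            result.add(text.substring(fragStart, prevEnd));
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "Hello. World again";
        // Hand-written offsets, assuming punctuation is analyzed out.
        List<Token> tokens = List.of(new Token(0, 5), new Token(7, 12), new Token(13, 18));
        System.out.println(fragments(text, tokens, 1, false)); // old behavior
        System.out.println(fragments(text, tokens, 1, true));  // behavior after the fix
    }
}
```

With one token per fragment, the old rule yields a second fragment of `". World"`, which starts with the un-analyzed period, while the token-start rule yields `"World"`. The sketch leaves out the merge check; in the actual change, the stored end offset of the preceding fragment is what allows contiguous fragments to be recognized.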
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dsmiley commented on pull request #1069: [LUCENE-2587] Highlighter fragment bug

Posted by GitBox <gi...@apache.org>.
dsmiley commented on PR #1069:
URL: https://github.com/apache/lucene/pull/1069#issuecomment-1278245086

   If only we renamed "Highlighter" to "OriginalHighlighter", maybe folks wouldn't continue using this thing. Is the UnifiedHighlighter not satisfying you, and if so, why not?




Re: [PR] [LUCENE-2587] Highlighter fragment bug [lucene]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #1069:
URL: https://github.com/apache/lucene/pull/1069#issuecomment-1880904461

   This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the dev@lucene.apache.org list. Thank you for your contribution!




[GitHub] [lucene] elliotzlin commented on pull request #1069: [LUCENE-2587] Highlighter fragment bug

Posted by "elliotzlin (via GitHub)" <gi...@apache.org>.
elliotzlin commented on PR #1069:
URL: https://github.com/apache/lucene/pull/1069#issuecomment-1711167476

   @dsmiley apologies for my delay in getting back to your comment! I don't have any qualms about refactoring to deter people from using this. I took up this ticket, which I found in the backlog, more to get involved in contributing to the Lucene project than because I was using the Highlighter in a project.

