You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/03/13 05:29:45 UTC

[GitHub] [lucene] zacharymorn opened a new pull request #16: LUCENE-9634: Fix highlighting of extended intervals matched using offset

zacharymorn opened a new pull request #16:
URL: https://github.com/apache/lucene/pull/16


   **This PR is currently in draft state**
   
   # Description
   
   Fix highlighting of extended intervals matched using offset
   
   # Proposed Solution
   
   In `OffsetsFromMatchIterator`, retrieves `ExtendedIntervalsSource` and its `before` / `after` position values, and use them to adjust highlight offset range matched by offset
   
   # Tests
   * gradle precommit currently failing for nocommit comment
   * passed previously failed test
   * passed new tests
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/lucene/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Lucene maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `main` branch.
   - [x] I have run `./gradlew check`.
   - [x] I have added tests for my changes.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on a change in pull request #16: LUCENE-9634: Fix highlighting of extended intervals matched using offset

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on a change in pull request #16:
URL: https://github.com/apache/lucene/pull/16#discussion_r593615132



##########
File path: lucene/core/src/java/org/apache/lucene/search/DisjunctionMatchesIterator.java
##########
@@ -250,6 +250,10 @@ public MatchesIterator getSubMatches() throws IOException {
 
   @Override
   public Query getQuery() {
-    return queue.top().getQuery();
+    if (queue.size() > 0) {

Review comment:
       This change is needed to fix two other tests that would throw NPE from here when `matchesIterator.getQuery()` is called in `OffsetsFromMatchIterator`




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on pull request #16: LUCENE-9634: Fix highlighting of extended intervals matched using offset

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #16:
URL: https://github.com/apache/lucene/pull/16#issuecomment-810722944


   Hi @romseygeek, just want to circle back to this PR to see if you could provide any guidance here? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] zacharymorn commented on pull request #16: LUCENE-9634: Fix highlighting of extended intervals matched using offset

Posted by GitBox <gi...@apache.org>.
zacharymorn commented on pull request #16:
URL: https://github.com/apache/lucene/pull/16#issuecomment-798974587


   No problem Dawid and thanks for the feedback! I agree that re-analyzing the token stream does seems to be wasteful if the index already has position and offset information. I think the central issue here is that the current `Intervals#extend` API here only takes before / after values based on position instead of offset. But on the other hand, additional API that takes in / provides offset based before / after values may not be very meaningful to most client applications (I think), as offset varies based on each token length.  So highlighting offset range computed based on non-positional APIs may always require adjustment here if the before / after values are provided based on position (this may also include `OffsetsFromTokens` and `OffsetsFromTokens` strategies as well). 
   
   https://github.com/apache/lucene/blob/471f38c0310671f9856ad86ad98c4238d2b770dc/lucene/queries/src/java/org/apache/lucene/queries/intervals/Intervals.java#L251-L252
   
   However, I'm also wondering if the position to / from offset mappings were already computed / stored for other purposes and readily available in OffsetRetrievalStrategy classes? If so, we could skip the re-analysis as well.
   
   Alan, could you please let me know if you have any pointer here? I'm happy to try out different approaches to find the best solution here.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org