You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2022/11/30 13:19:57 UTC

[GitHub] [lucene] romseygeek opened a new pull request, #11990: Don't let merged passages push out lower-scoring ones

romseygeek opened a new pull request, #11990:
URL: https://github.com/apache/lucene/pull/11990

   PassageScorer uses a priority queue of size `maxPassages` to keep track of
   which highlighted passages are worth returning to the user.  Once all
   passages have been collected, we go through and merge overlapping
   passages together, but this reduction in the number of passages is not 
   compensated for by re-adding the highest-scoring passages that were pushed
   out of the queue by passages which have been merged away.
   
   This commit increases the size of the priority queue to try and account for
   overlapping passages that will subsequently be merged together.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] romseygeek merged pull request #11990: Don't let merged passages push out lower-scoring ones

Posted by GitBox <gi...@apache.org>.
romseygeek merged PR #11990:
URL: https://github.com/apache/lucene/pull/11990


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a diff in pull request #11990: Don't let merged passages push out lower-scoring ones

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #11990:
URL: https://github.com/apache/lucene/pull/11990#discussion_r1036450814


##########
lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/PassageSelector.java:
##########
@@ -89,8 +89,9 @@ public List<Passage> pickBest(
     }
 
     // Best passages so far.
+    int pqSize = Math.max(markers.size(), maxPassages);

Review Comment:
   This can make the priority queue huge in degenerate cases when there is a lot of markers. And there may be for queries that expand to hundreds of terms (prefix queries are an example of this). I wonder if this shouldn't be a function of maxPassages rather than the number of markers...



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a diff in pull request #11990: Don't let merged passages push out lower-scoring ones

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #11990:
URL: https://github.com/apache/lucene/pull/11990#discussion_r1036988933


##########
lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/PassageSelector.java:
##########
@@ -89,8 +89,9 @@ public List<Passage> pickBest(
     }
 
     // Best passages so far.
+    int pqSize = Math.max(markers.size(), maxPassages);

Review Comment:
   That sounds good to me as well.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] romseygeek commented on a diff in pull request #11990: Don't let merged passages push out lower-scoring ones

Posted by GitBox <gi...@apache.org>.
romseygeek commented on code in PR #11990:
URL: https://github.com/apache/lucene/pull/11990#discussion_r1036896059


##########
lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/PassageSelector.java:
##########
@@ -89,8 +89,9 @@ public List<Passage> pickBest(
     }
 
     // Best passages so far.
+    int pqSize = Math.max(markers.size(), maxPassages);

Review Comment:
   > min(markers.size(), maxPassages * 3)
   
   Another option might be to just have a minimum size of 16, given that this is only really a problem when you're asking for two or three passages.  Once you get above 10 passages then one or two less becomes less of an issue



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] dweiss commented on a diff in pull request #11990: Don't let merged passages push out lower-scoring ones

Posted by GitBox <gi...@apache.org>.
dweiss commented on code in PR #11990:
URL: https://github.com/apache/lucene/pull/11990#discussion_r1036458813


##########
lucene/highlighter/src/java/org/apache/lucene/search/matchhighlight/PassageSelector.java:
##########
@@ -89,8 +89,9 @@ public List<Passage> pickBest(
     }
 
     // Best passages so far.
+    int pqSize = Math.max(markers.size(), maxPassages);

Review Comment:
   I looked at it and I think it was actually a deliberate decision (the selection of passages being independent from merging and capped at the user-requested max) so that regardless of how many markers there are, the overhead of the pq remains fairly low. I realize viewpoints will vary - I use this code in cases where the highlighter takes hits from multiple queries (and like I said, there can be hundreds of markers...). This change will degrade the performance significantly at literally no gain.
   
   I'd try overestimating pqSize based on maxPassages: say, min(markers.size(), maxPassages * 3). The parameter could even be configurable so that the overhead can be tuned from the outside (with a reasonable default). WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org