You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2021/07/20 12:27:19 UTC

[GitHub] [lucene] gtroitskiy opened a new pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

gtroitskiy opened a new pull request #217:
URL: https://github.com/apache/lucene/pull/217


   https://issues.apache.org/jira/browse/LUCENE-10030
   disabling scoring for near-miss hits, since it's quite heavy and it isn't being used anywhere
   please check the Jira issue for more details


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r673540568



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       but this will break `collectNearMiss` logic? `failedCollector` expects its own "near-miss" `collectDocId`:
   https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L598
   we can set some default score for `else` case to make it more consistent, what do you think? (pushed with --force)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r673540568



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       but this will break `collectNearMiss` logic? `failedCollector` expects its own "near-miss" `collectDocId`:
   https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java#L598
   we can set some default score for `else` case to make it more consistent, what do you think? (pushed with --force)
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r675819557



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       So it turns out, in a crazy stroke of coincidence, I actually need a fix for this for some work I'm doing as well. I sketched out an approach I _think_ works [here](https://github.com/gsmiller/lucene/commit/0e908f25701a050ca3e2117ab3d844c3a88c0681) if you want to have a look. I haven't spent much time on this so it's possible (likely?) this is flawed in some way. If you're interested, feel free to borrow ideas for your PR here.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r673575108



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       Sorry, I removed my comment, when I saw your Jira comment :slightly_smiling_face: 
   Yes, it truly makes sense to support lazy scoring for near-misses: users may override some places to enable `keepScores` and it's not good to prevent them from getting scores on dimensional collectors.
   I pushed (with --force) some changes to fix it for `doQueryFirstScoring`
   Though it doesn't look like same thing can be supported for `doDrillDownAdvanceScoring` and `doUnionScoring`, since scores array is computed prior collecting near-misses




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676888240



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       oh, my bad, missed the general case :slightly_frowning_face: 
   Thanks for your patience and elegant solution!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r673515186



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       What about also updating `collectDocID` in the same place (i.e., inside the `if (failedCollector == null)` check) so that the instance of `ScoreAndDoc` we make available to the various Collectors stays consistent (i.e., the result of calling `score()` always reports the score for the doc returned by `docID()`)? 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676894541



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       No worries at all. Thanks for sticking with the PR!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676879074



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       Thanks, this looks great!
   
   I think the caching could be useful down in `collectHit()` in the case that a `sidewaysLeafCollector` decides to call back into `score()` (e.g., if `FacetsCollector` has `keepScores` set to `true` and calls back to get the score [here](https://github.com/apache/lucene/blob/main/lucene/facet/src/java/org/apache/lucene/facet/FacetsCollector.java#L125)). Without using something like `ScoreCachingWrappingScorer`, the underlying score would need to be recomputed for the same docid if I'm not mistaken. Does that sound right or am I overlooking something?
   
   Thanks again for taking this up! Excited to get this change merged once we figure out whether-or-not we need the caching layer in place.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r673552569



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       Ah right of course. It actually makes this a bit trickier then huh? Sorry, I wasn't thinking about this deeply enough at first, but technically, even "near miss" collectors could call into `ScoreAndDoc` and expect to get a score. In fact, `FacetCollector`s could do this (see the `keepScores` boolean that can be passed to the ctor). So I think we need a meaningful score available for the "near miss" cases after all.
   
   Let's think about this a little more and see if we can come up with a way to avoid computing a score until a collector accesses it. Maybe we could leverage a `Supplier` in some way? I'll think about this a bit more.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r675059243



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       I was thinking about this a little more and wanted to see what you thought about a slightly different approach. Full disclosure, I’ve been away from my computer for a day and am on my phone right now, so I haven’t looked into this as deeply as I’d like. 
   
   It looks to me, when doing “query first scoring”, that the ScoreAndDoc wrapper is essentially useless. The baseScorer will be positioned on the “collectDoc” whenever it collects a full hit or near miss. Can’t we just pass baseScorer into the collectors (via setScorer) when doing queryFirstScoring and avoid using a ScoreAndDoc instance completely? That would also allow the score to be lazily evaluated since the collectors would just be calling into baseScorer directly if they need the score. If I remember correctly, there’s a caching scorer wrapper we could use here as well to ensure the score is only computed once in the case that multiple drill sideways collectors call back for the score. 
   
   Like I said, I’m on my phone and haven’t been able to try this out at all, so maybe it’s completely flawed for some reason. But maybe you could have a look and see if something along these lines would work?
   
   Thanks again for taking this on!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller merged pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller merged pull request #217:
URL: https://github.com/apache/lucene/pull/217


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gtroitskiy commented on a change in pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gtroitskiy commented on a change in pull request #217:
URL: https://github.com/apache/lucene/pull/217#discussion_r676822527



##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysScorer.java
##########
@@ -195,11 +195,8 @@ private void doQueryFirstScoring(Bits acceptDocs, LeafCollector collector, DocsA
 
       collectDocID = docID;

Review comment:
       sorry, I've been away for a while
   thank you for the sketch, your approach is definitely more elegant :slightly_smiling_face: 
   except I'm not sure we need caching, since by design only one drill-down collector is being called for a specific docId
   
   PS force-pushed changes




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on pull request #217: LUCENE-10030: [DrillSidewaysScorer.doQueryFirstScoring] disable scoring for near-miss hits

Posted by GitBox <gi...@apache.org>.
gsmiller commented on pull request #217:
URL: https://github.com/apache/lucene/pull/217#issuecomment-886989427


   Thanks again @gtroitskiy !


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org