Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2021/03/11 08:03:14 UTC

[GitHub] [solr] limingnihao opened a new pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

limingnihao opened a new pull request #8:
URL: https://github.com/apache/solr/pull/8


   # Description
   When the `timeAllowed` parameter is set, `SolrQueryTimeoutImpl` is used to check for a timeout whenever a term is loaded. If the time limit has been exceeded, an ExitingReaderException is thrown.
   
   During scoreFeatures in LTRQuery, the ExitingReaderException can occur at two stages:

   *  scorer: occurs when a term needs to be loaded for the LeafReaderContext while creating the Weight.
   *  score: occurs when a term needs to be loaded because some function calls getValue, for example FloatPayloadValueSource.
   
   The code can therefore handle this ExitingReaderException and return partial results, which is at least better than returning nothing.
   
   # Solution
   In the scoreFeatures method, catch ExitingReaderException and return the documents loaded so far.
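
A minimal sketch of the idea described above, not the actual patch. `TimeoutSignal` and `HitScorer` are hypothetical stand-ins (for `ExitableDirectoryReader.ExitingReaderException` and the LTR scorer) so that the example is self-contained; the real `scoreFeatures` signature differs.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for ExitableDirectoryReader.ExitingReaderException. */
class TimeoutSignal extends RuntimeException {
  TimeoutSignal(String message) {
    super(message);
  }
}

/** Hypothetical scorer; a real one would load terms from the index. */
interface HitScorer {
  float score(int docId);
}

final class PartialRescoreSketch {
  /** Scores hits in order; on timeout, keeps whatever was scored so far. */
  static List<Float> scoreUntilTimeout(int[] docIds, HitScorer scorer) {
    List<Float> scored = new ArrayList<>();
    for (int docId : docIds) {
      try {
        scored.add(scorer.score(docId));
      } catch (TimeoutSignal timeout) {
        break; // stop early and return the partial list
      }
    }
    return scored;
  }
}
```

The caller receives fewer scores than hits when a timeout occurs, so it can tell how many documents were actually processed.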
   
   # Tests
   Simulate an ExitingReaderException thrown in the scorer stage and check that the documents loaded so far are returned.
   Simulate an ExitingReaderException thrown in the score stage of a Feature and check that the documents calculated so far are returned.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
   - [x] I have created a Jira issue and added the issue ID to my pull request title.
   - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
   - [x] I have developed this patch against the `master` branch.
   - [x] I have run `ant precommit` and the appropriate test suite.
   - [x] I have added tests for my changes.
   - [x] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [solr] limingnihao commented on pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

Posted by GitBox <gi...@apache.org>.
limingnihao commented on pull request #8:
URL: https://github.com/apache/solr/pull/8#issuecomment-803781815


   @alessandrobenedetti Add a parameter to LTR that lets the user choose what to do with the timeout? What do you think?





[GitHub] [solr] alessandrobenedetti commented on pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on pull request #8:
URL: https://github.com/apache/solr/pull/8#issuecomment-804151653


   > @alessandrobenedetti Add a parameter to LTR that lets the user choose what to do with the timeout? What do you think?
   
   Hmm, personally I don't think it is useful to have some results reranked and some not, depending on a timeout and with little control over knowing "up to what" was reranked.
   So, in my opinion, I would not add it as a parameter; it would just add unnecessary complexity.





[GitHub] [solr] cpoerschke commented on a change in pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

Posted by GitBox <gi...@apache.org>.
cpoerschke commented on a change in pull request #8:
URL: https://github.com/apache/solr/pull/8#discussion_r592622736



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##########
@@ -181,14 +185,19 @@ public void scoreFeatures(IndexSearcher indexSearcher,
         readerContext = leaves.get(readerUpto);
         endDoc = readerContext.docBase + readerContext.reader().maxDoc();
       }
-      // We advanced to another segment
-      if (readerContext != null) {
-        docBase = readerContext.docBase;
-        scorer = modelWeight.scorer(readerContext);
+      try{
+        // We advanced to another segment
+        if (readerContext != null) {
+          docBase = readerContext.docBase;
+          scorer = modelWeight.scorer(readerContext);
+        }
+        scoreSingleHit(indexSearcher, topN, modelWeight, docBase, hitUpto, hit, docID, scoringQuery, scorer, reranked);
+        hitUpto++;
+      } catch (ExitableDirectoryReader.ExitingReaderException ex) {
+        break;

Review comment:
       An advantage of handling the exception here is that the code has an opportunity to do "as much as possible" reranking in the available time i.e. some reranking rather than no reranking. A disadvantage is that the calling code and end user don't know if the returned results were the "true" results or if they are "partial" because there was only enough time for part of the reranking work.

##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRScoringQuery.java
##########
@@ -249,6 +250,8 @@ private void createWeights(IndexSearcher searcher, boolean needsScores,
       try{
         Feature.FeatureWeight fw = f.createWeight(searcher, needsScores, req, originalQuery, efi);
         featureWeights.add(fw);
+      } catch (ExitableDirectoryReader.ExitingReaderException ex) {
+        throw ex;

Review comment:
       Differentiating `ExitableDirectoryReader.ExitingReaderException` from other `Exception` types here would give calling code an opportunity to handle the exception. I haven't yet explored how https://github.com/apache/solr/blob/main/solr/contrib/ltr/src/test/org/apache/solr/ltr/TestLTRScoringQuery.java could perhaps include tests for `ExitableDirectoryReader.ExitingReaderException` exceptions.

##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##########
@@ -124,10 +125,13 @@ public TopDocs rescore(IndexSearcher searcher, TopDocs firstPassTopDocs,
     final LTRScoringQuery.ModelWeight modelWeight = (LTRScoringQuery.ModelWeight) searcher
         .createWeight(searcher.rewrite(scoringQuery), ScoreMode.COMPLETE, 1);
 
-    scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);
+    int hitUpto = scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);

Review comment:
       With this change, `scoreFeatures` handles the `ExitableDirectoryReader.ExitingReaderException` scenario, but in the [artificial scenario](https://issues.apache.org/jira/browse/SOLR-14607?focusedCommentId=17299792&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17299792) using the `techproducts` example, `searcher.createWeight` also encountered the exception.

##########
File path: solr/core/src/java/org/apache/solr/search/ReRankCollector.java
##########
@@ -150,6 +160,12 @@ public TopDocs topDocs(int start, int howMany) {
         rescoredDocs.scoreDocs = scoreDocs;
         return rescoredDocs;
       }
+    } catch (ExitableDirectoryReader.ExitingReaderException ex) {
+      if (mainDocs != null) {
+        throw new UnReRankedTopDocs(mainDocs);
+      } else {
+        throw ex;

Review comment:
       If we managed to get the top docs but then rescoring encountered the timeout exception then we could provide the top docs to the caller, for the caller to decide if un-reranked documents are better than no documents at all.
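
A rough, self-contained sketch of this hand-off, under stated assumptions: `RerankTimeout`, `UnrerankedResults`, and `RerankFallbackSketch` are hypothetical stand-ins that only mirror the shape of `UnReRankedTopDocs` in the diff, with plain `int[]` doc ids in place of `TopDocs`.

```java
/** Hypothetical stand-in for the timeout exception thrown during rescoring. */
class RerankTimeout extends RuntimeException {}

/** Carries the first-pass results out of a rerank that timed out. */
class UnrerankedResults extends RuntimeException {
  final int[] docIds;

  UnrerankedResults(int[] docIds, Throwable cause) {
    super(cause);
    this.docIds = docIds;
  }
}

final class RerankFallbackSketch {
  /** Hypothetical reranker; may throw RerankTimeout. */
  interface Reranker {
    int[] rerank(int[] firstPass);
  }

  boolean partialResults = false;

  /** Collector side: wrap the timeout so the first-pass docs are not lost. */
  int[] collect(int[] firstPass, Reranker reranker) {
    try {
      return reranker.rerank(firstPass);
    } catch (RerankTimeout timeout) {
      throw new UnrerankedResults(firstPass, timeout);
    }
  }

  /** Caller side: decide that un-reranked docs are better than none. */
  int[] search(int[] firstPass, Reranker reranker) {
    try {
      return collect(firstPass, reranker);
    } catch (UnrerankedResults fallback) {
      partialResults = true; // signal that reranking did not complete
      return fallback.docIds;
    }
  }
}
```

Keeping the decision in the caller means the collector does not need to know whether partial output is acceptable.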

##########
File path: solr/core/src/java/org/apache/solr/search/ReRankCollector.java
##########
@@ -97,9 +105,11 @@ public ScoreMode scoreMode() {
   @SuppressWarnings({"unchecked"})
   public TopDocs topDocs(int start, int howMany) {
 
+    TopDocs mainDocs = null;
+
     try {
 
-      TopDocs mainDocs = mainCollector.topDocs(0,  Math.max(reRankDocs, length));
+      mainDocs = mainCollector.topDocs(0,  Math.max(reRankDocs, length));

Review comment:
       Here we get the top docs (and I imagine we could hit the timeout exception at this point) and then at line 119 we rescore the documents (and there the timeout exception can also happen).

##########
File path: solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java
##########
@@ -1599,7 +1599,13 @@ public ScoreMode scoreMode() {
       ScoreMode scoreModeUsed = buildAndRunCollectorChain(qr, query, collector, cmd, pf.postFilter).scoreMode();
 
       totalHits = topCollector.getTotalHits();
-      TopDocs topDocs = topCollector.topDocs(0, len);
+      TopDocs topDocs = null;
+      try {
+        topDocs = topCollector.topDocs(0, len);
+      } catch (ReRankCollector.UnReRankedTopDocs urrTopDocs) {
+        topDocs = urrTopDocs.topDocs;
+        qr.setPartialResults(true);
+      }

Review comment:
       `buildAndRunCollectorChain` already handles the timeout exception but afterwards `topCollector.topDocs` can also encounter the timeout exception. If un-reranked documents are available then we provide them here and indicate to the caller that the response is partial. The `timeAllowed` query parameter [1] and the `partialResults` response flag go together i.e. the client would already know to check about it. Also this would apply to both reranking via `ltr` and reranking via the `rerank` [2] parser code paths.
   
   [1] https://solr.apache.org/guide/8_8/common-query-parameters.html#timeallowed-parameter
   [2] https://solr.apache.org/guide/8_8/query-re-ranking.html#rerank-query-parser







[GitHub] [solr] alessandrobenedetti commented on a change in pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

Posted by GitBox <gi...@apache.org>.
alessandrobenedetti commented on a change in pull request #8:
URL: https://github.com/apache/solr/pull/8#discussion_r594462247



##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##########
@@ -124,10 +125,13 @@ public TopDocs rescore(IndexSearcher searcher, TopDocs firstPassTopDocs,
     final LTRScoringQuery.ModelWeight modelWeight = (LTRScoringQuery.ModelWeight) searcher
         .createWeight(searcher.rewrite(scoringQuery), ScoreMode.COMPLETE, 1);
 
-    scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);
+    int hitUpto = scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);
+    final ScoreDoc[] rerankHited = new ScoreDoc[hitUpto];

Review comment:
       rerankHited ?
   -> rerankedHits ?

##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##########
@@ -124,10 +125,13 @@ public TopDocs rescore(IndexSearcher searcher, TopDocs firstPassTopDocs,
     final LTRScoringQuery.ModelWeight modelWeight = (LTRScoringQuery.ModelWeight) searcher
         .createWeight(searcher.rewrite(scoringQuery), ScoreMode.COMPLETE, 1);
 
-    scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);
+    int hitUpto = scoreFeatures(searcher,topN, modelWeight, firstPassResults, leaves, reranked);
+    final ScoreDoc[] rerankHited = new ScoreDoc[hitUpto];
+    System.arraycopy(reranked, 0, rerankHited,0, hitUpto);

Review comment:
       What are we introducing a copy of `reranked` for?

##########
File path: solr/contrib/ltr/src/java/org/apache/solr/ltr/LTRRescorer.java
##########
@@ -181,14 +185,19 @@ public void scoreFeatures(IndexSearcher indexSearcher,
         readerContext = leaves.get(readerUpto);
         endDoc = readerContext.docBase + readerContext.reader().maxDoc();
       }
-      // We advanced to another segment
-      if (readerContext != null) {
-        docBase = readerContext.docBase;
-        scorer = modelWeight.scorer(readerContext);
+      try{
+        // We advanced to another segment
+        if (readerContext != null) {
+          docBase = readerContext.docBase;
+          scorer = modelWeight.scorer(readerContext);
+        }
+        scoreSingleHit(indexSearcher, topN, modelWeight, docBase, hitUpto, hit, docID, scoringQuery, scorer, reranked);
+        hitUpto++;
+      } catch (ExitableDirectoryReader.ExitingReaderException ex) {
+        break;

Review comment:
       I was reasoning about this, and I think I prefer an all-or-nothing approach:
   try to rerank up to the topK I am targeting in the allowed time, and if that does not succeed in the allowed time, don't rerank at all.
   Having some hits rescored and others not, because of time, is not ideal, given that reRankDocs should regulate what is reranked and what is not.
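
For contrast with the `break` in the diff, the all-or-nothing alternative could look roughly like the sketch below. `RescoreTimeout` and `AllOrNothingSketch` are hypothetical names used only for illustration, and plain arrays stand in for the Lucene types.

```java
/** Hypothetical stand-in for the timeout exception. */
class RescoreTimeout extends RuntimeException {}

final class AllOrNothingSketch {
  /** Hypothetical scorer; may throw RescoreTimeout. */
  interface Scorer {
    float score(int docId);
  }

  /** Returns new scores only if every hit was scored; otherwise the originals. */
  static float[] rescoreAllOrNothing(int[] docIds, float[] originalScores, Scorer scorer) {
    float[] rescored = new float[docIds.length];
    try {
      for (int i = 0; i < docIds.length; i++) {
        rescored[i] = scorer.score(docIds[i]);
      }
    } catch (RescoreTimeout timeout) {
      return originalScores; // discard partial work, keep the first-pass scores
    }
    return rescored;
  }
}
```

With this shape the result is never a mix of rescored and un-rescored hits, at the cost of throwing away work done before the timeout.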







[GitHub] [solr] limingnihao commented on pull request #8: SOLR-14607: LTR Query, the TimeAllowed parameter causes a timeout exception after no results

Posted by GitBox <gi...@apache.org>.
limingnihao commented on pull request #8:
URL: https://github.com/apache/solr/pull/8#issuecomment-796583433


   @cpoerschke I have resubmitted it. Please review this PR again, thanks.

