You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@solr.apache.org by GitBox <gi...@apache.org> on 2022/12/12 19:55:36 UTC

[GitHub] [solr] magibney opened a new pull request, #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

magibney opened a new pull request, #1236:
URL: https://github.com/apache/solr/pull/1236

   See: https://issues.apache.org/jira/browse/SOLR-16585


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mkhludnev commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by "mkhludnev (via GitHub)" <gi...@apache.org>.
mkhludnev commented on PR #1236:
URL: https://github.com/apache/solr/pull/1236#issuecomment-1442489544

   @magibney what about https://github.com/apache/solr/pull/1378/files ? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047579554


##########
solr/CHANGES.txt:
##########
@@ -177,6 +177,12 @@ Other Changes
 
 * SOLR-16569: Add java system property to overseer queue size (Nick Ginther via noble)
 
+==================  9.1.1 ==================
+
+Bug Fixes
+---------------------
+* SOLR-16585: Fix NPE in MatchAllDocs pagination (Michael Gibney)

Review Comment:
   when....>   Clarify to users and us when this happens.  IMO we (collectively) don't take enough care on these summaries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney commented on PR #1236:
URL: https://github.com/apache/solr/pull/1236#issuecomment-1347260186

   >It's worth to add some similar test case into. Solr examples/CLI smoke tests. Also, it's worth to add more variety into this test case.
   
   Definitely; didn't want to delay getting something up though. Will look into building out testing further.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047195884


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -2203,15 +2203,19 @@ public DocListAndSet getDocListAndSet(Query query, Sort lsort, int offset, int l
 
   private DocList constantScoreDocList(int offset, int length, DocSet docs) {
     final int size = docs.size();
-    if (length == 0 || size <= offset) {
-      return new DocSlice(0, 0, new int[0], null, size, 0f, TotalHits.Relation.EQUAL_TO);
-    }
-    final DocIterator iter = docs.iterator();
-    for (int i = offset; i > 0; i--) {
-      iter.nextDoc(); // discard
-    }
-    final int returnSize = Math.min(length, size - offset);
+
+    // NOTE: it would be possible to special-case `length == 0 || size <= offset` here
+    // (returning a DocList backed by an empty array) -- but the cases that would practically
+    // benefit from doing so would be extremely unusual, and likely pathological:
+    //   1. length==0 in conjunction with offset>0 (why?)
+    //   2. specifying offset>size (paging beyond end of results)
+    // This would require special consideration in dealing with cache handling (and generation
+    // of the final DocList via `DocSlice.subset(int, int)`), and it's just not worth it.
+
+    final int returnSize = Math.min(offset + length, size);

Review Comment:
   Tangential: I think it would be quite straightforward to add a DocList implementation that could support a backing docs int[] that _doesn't_ cover the whole range (basically how I was incorrectly treating DocSlice). If we did that, it would allow to efficiently support arbitrarily deep offset-based paging over constant-score queries.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
janhoy commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047283207


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -2203,15 +2203,19 @@ public DocListAndSet getDocListAndSet(Query query, Sort lsort, int offset, int l
 
   private DocList constantScoreDocList(int offset, int length, DocSet docs) {
     final int size = docs.size();
-    if (length == 0 || size <= offset) {
-      return new DocSlice(0, 0, new int[0], null, size, 0f, TotalHits.Relation.EQUAL_TO);
-    }
-    final DocIterator iter = docs.iterator();
-    for (int i = offset; i > 0; i--) {
-      iter.nextDoc(); // discard
-    }
-    final int returnSize = Math.min(length, size - offset);
+
+    // NOTE: it would be possible to special-case `length == 0 || size <= offset` here
+    // (returning a DocList backed by an empty array) -- but the cases that would practically
+    // benefit from doing so would be extremely unusual, and likely pathological:
+    //   1. length==0 in conjunction with offset>0 (why?)
+    //   2. specifying offset>size (paging beyond end of results)
+    // This would require special consideration in dealing with cache handling (and generation
+    // of the final DocList via `DocSlice.subset(int, int)`), and it's just not worth it.
+
+    final int returnSize = Math.min(offset + length, size);

Review Comment:
   Makes sense, thanks for the explanation. I was just afraid of over-allocating and causing GC. I'll resolve this thread.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] dsmiley commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
dsmiley commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047582860


##########
solr/CHANGES.txt:
##########
@@ -177,6 +177,12 @@ Other Changes
 
 * SOLR-16569: Add java system property to overseer queue size (Nick Ginther via noble)
 
+==================  9.1.1 ==================
+
+Bug Fixes
+---------------------
+* SOLR-16585: Fix NPE in MatchAllDocs pagination (Michael Gibney)

Review Comment:
   ```suggestion
   * SOLR-16585: Fixed NPE when paginating MatchAllDocs like q=*:*&start=10 (Michael Gibney)
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney merged pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney merged PR #1236:
URL: https://github.com/apache/solr/pull/1236


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] janhoy commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
janhoy commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047094224


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -2203,15 +2203,19 @@ public DocListAndSet getDocListAndSet(Query query, Sort lsort, int offset, int l
 
   private DocList constantScoreDocList(int offset, int length, DocSet docs) {
     final int size = docs.size();
-    if (length == 0 || size <= offset) {
-      return new DocSlice(0, 0, new int[0], null, size, 0f, TotalHits.Relation.EQUAL_TO);
-    }
-    final DocIterator iter = docs.iterator();
-    for (int i = offset; i > 0; i--) {
-      iter.nextDoc(); // discard
-    }
-    final int returnSize = Math.min(length, size - offset);
+
+    // NOTE: it would be possible to special-case `length == 0 || size <= offset` here
+    // (returning a DocList backed by an empty array) -- but the cases that would practically
+    // benefit from doing so would be extremely unusual, and likely pathological:
+    //   1. length==0 in conjunction with offset>0 (why?)
+    //   2. specifying offset>size (paging beyond end of results)
+    // This would require special consideration in dealing with cache handling (and generation
+    // of the final DocList via `DocSlice.subset(int, int)`), and it's just not worth it.
+
+    final int returnSize = Math.min(offset + length, size);

Review Comment:
   Why is returnSize changed from earlier `Math.min(length, size - offset)`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by "magibney (via GitHub)" <gi...@apache.org>.
magibney commented on PR #1236:
URL: https://github.com/apache/solr/pull/1236#issuecomment-1441691759

   [testMatchAllDocsPlain()](https://github.com/apache/solr/pull/1236/files#diff-87d99efc408087bfac0b86439e9d396699535d80c21f4e5792ff79e90184ec16R244-R256) was updated to hit a wide variety of offsets, in and out of range of the target docset. That's all for now, but that was the big gap that prevented catching this. 
   
   At a high level, what other combinations were you thinking should be tested? I don't actually think smoke tests add that much with this. Couldn't hurt; but the problem here was "no tests", not "no integration tests". As alarming as this regression was, I don't see anything inherent in this functionality to indicate a need for integration tests for this, any more than any other functionality.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] mkhludnev commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by "mkhludnev (via GitHub)" <gi...@apache.org>.
mkhludnev commented on PR #1236:
URL: https://github.com/apache/solr/pull/1236#issuecomment-1441427045

   > > It's worth to add some similar test case into. Solr examples/CLI smoke tests. Also, it's worth to add more variety into this test case.
   > 
   > Definitely; didn't want to delay getting something up though. Will look into building out testing further.
   
   Hello, @magibney. I'm wondering if we have such test cases.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney commented on PR #1236:
URL: https://github.com/apache/solr/pull/1236#issuecomment-1361455049

   Thanks everyone; committed and backported to `branch_9x` and `branch_9_1`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047678837


##########
solr/CHANGES.txt:
##########
@@ -177,6 +177,12 @@ Other Changes
 
 * SOLR-16569: Add java system property to overseer queue size (Nick Ginther via noble)
 
+==================  9.1.1 ==================
+
+Bug Fixes
+---------------------
+* SOLR-16585: Fix NPE in MatchAllDocs pagination (Michael Gibney)

Review Comment:
   Good point; addressed in 026506e439d61b7fdd775990af5b056609bd9da9 (incorporating a slightly modified version of your suggestion).
   
   For thoroughness, I'd note that this also could affect any constant-score query when `useFilterForSortedQuery=true` -- but given that the matchAllDocs case is so prominent I feel like it's only really worth mentioning that one in CHANGES.txt.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org


[GitHub] [solr] magibney commented on a diff in pull request #1236: SOLR-16585: Fix NPE in MatchAllDocs pagination

Posted by GitBox <gi...@apache.org>.
magibney commented on code in PR #1236:
URL: https://github.com/apache/solr/pull/1236#discussion_r1047173635


##########
solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java:
##########
@@ -2203,15 +2203,19 @@ public DocListAndSet getDocListAndSet(Query query, Sort lsort, int offset, int l
 
   private DocList constantScoreDocList(int offset, int length, DocSet docs) {
     final int size = docs.size();
-    if (length == 0 || size <= offset) {
-      return new DocSlice(0, 0, new int[0], null, size, 0f, TotalHits.Relation.EQUAL_TO);
-    }
-    final DocIterator iter = docs.iterator();
-    for (int i = offset; i > 0; i--) {
-      iter.nextDoc(); // discard
-    }
-    final int returnSize = Math.min(length, size - offset);
+
+    // NOTE: it would be possible to special-case `length == 0 || size <= offset` here
+    // (returning a DocList backed by an empty array) -- but the cases that would practically
+    // benefit from doing so would be extremely unusual, and likely pathological:
+    //   1. length==0 in conjunction with offset>0 (why?)
+    //   2. specifying offset>size (paging beyond end of results)
+    // This would require special consideration in dealing with cache handling (and generation
+    // of the final DocList via `DocSlice.subset(int, int)`), and it's just not worth it.
+
+    final int returnSize = Math.min(offset + length, size);

Review Comment:
   That's an indirect consequence of the essence of the bug. In SOLR-14765 I incorrectly assumed it was ok to trim the result list at the point of initial DocSlice creation -- i.e., if you have offset=200, len=10, that it should be possible to return a DocSlice backed by a 10-element int[]. But in fact there's an assumption, in all cases, that index 0 of the backing docs int[] is the top doc (and consequently we always need to populate the array out as far as docs[offset+length].
   
   So at this stage, DocSlice is initialized with an `offset=0` ctor arg, where offset reflects both the offset into the array _and_ the offset of the first element of the array . The _actual_ returned docList is created subsequently by calling initialDocList.subset(actualRequestedOffset, actualRequestedLength). In fact, in the codebase the initial DocSlice is almost always created with offset=0 -- I toyed with the idea of actually enforcing this, making a public ctor that always sets offset=0 and length=docs.length, and only setting explicit offset/len via a private ctor (invoked by DocSlice.subset(int, int)). But it became more complex than I'd hoped, so I figured to keep this change as small as possible.
   
   When initial DocSlices are created in this way, it simplifies the logic about how they can be cached and repurposed to serve other ranges of requested docs. The special-casing of length=0 I think would also have messed with this, hence the removal of the special-casing. Worst-case consequence of removing the special-casing is that if somebody requests offset>0, length=0, we now have to build and populate an array of `offset` elements, or if someone requests offset>docSet.size(), length>0, we now have to build and populate an array of `docSet.size()` elements. Neither of these would actually be used to satisfy the immediate request, but both cases are weird/pathological edge cases so probably not worth special-casing, given the hoops we'd have to jump through, e.g. to avoid caching them, etc.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@solr.apache.org
For additional commands, e-mail: issues-help@solr.apache.org