You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "gashutos (via GitHub)" <gi...@apache.org> on 2023/05/25 20:37:15 UTC

[GitHub] [lucene] gashutos opened a new pull request, #12334: fix searchafer high latency when after value is out of range for segment

gashutos opened a new pull request, #12334:
URL: https://github.com/apache/lucene/pull/12334

   ### Description
   As mentioned in issue #12333, if ```after``` value is out of range for a field in a segment, ```PagingFieldCollector``` is not able to put any item in the ```FieldhitQueue```.  Due to which ```queueFull``` always stays ```false``` and it wont ever update PointValues based ```competitiveIterator``` to skip non-competitive hits. 
   
   I think, it makes sense to update ```competitiveIterator``` even if queue is not full and the query is ```after``` query because we have at least ```topValue``` in ```after``` query always. That itself should be capable enough to skip non-competitive hits. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590663201

   @gashutos I think we should make users aware of this optimization, would you be up for opening another PR that adds a CHANGES entry?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590703139

   Looks great, thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590666069

   Let's also update the title/description of this PR?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gashutos commented on a diff in pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gashutos (via GitHub)" <gi...@apache.org>.
gashutos commented on code in PR #12334:
URL: https://github.com/apache/lucene/pull/12334#discussion_r1211208875


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   >>we also don't know if we'll ever need these arrays. If the queue never fills, we could unnecessarily have allocated one of them.
   
   if ```queueFull``` is ```false```, that mean we will be breaching this condition ```|| (queueFull == false && topValueSet == false)) return;``` only if its ```after``` query where ```topValueSet``` is set to ```true```. We dont allocate unnecessary here, i.e we initiliaze min/max values only if we are to call ```encodeTop``` or ```encodeBottom``` for that.
   
   I gave explanation w.r.t code in current PR, but if you were talking in context of existing code, yes, we are allocating unnecessary in case ```queueFull``` is always false.
   
   >>I think we still have enough information upfront though to eagerly allocate these like we do today? Is it just a question of being eager vs. lazy with these?
   
   There is a problem if we follow the same approach, check this code, 
   ```
   } else {
           if (queueFull) { // bottom is avilable only when queue is full
             minValueAsBytes = minValueAsBytes == null ? new byte[bytesCount] : minValueAsBytes;
             encodeBottom(minValueAsBytes);
           }
           if (topValueSet) {
             maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;
             encodeTop(maxValueAsBytes);
           }
         }
   ```
   if ```queueFUll``` is always false & there is a ```topValueSet```, we will have ```minValueAsBytes``` set as ```[0, 0, 0, 0, 0, 0, 0, 0, 0]``` instead ```null```. This will result in incorrect ```minValueAsBytes``` in further calculations.



##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   Does that clarify ?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12334:
URL: https://github.com/apache/lucene/pull/12334#discussion_r1206813319


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   Do we need the lazy initialization? I thought `topValueSet` would already be set before the `NumericLeafComparator` gets constructed. Maybe I'm misunderstanding that?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gashutos commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gashutos (via GitHub)" <gi...@apache.org>.
gashutos commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590638213

   > We just upgraded Elasticsearch to a Lucene snapshot that has this change, and this triggered major speedups on some queries. In my opinion, the PR title and description don't do justice to this change since it does not only help when after is out of range, also when after is within the range but filtering only based on the after value significantly reduces the number of hits to evaluate.
   
   @jpountz Agreed !
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller merged pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller merged PR #12334:
URL: https://github.com/apache/lucene/pull/12334


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12334:
URL: https://github.com/apache/lucene/pull/12334#discussion_r1210741210


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   Apologies in advance if I'm misunderstanding, but as the code is currently written, we also don't know if we'll ever _need_ these arrays. If the queue never fills, we could unnecessarily have allocated one of them. I think we still have enough information upfront though to eagerly allocate these like we do today? Is it just a question of being eager vs. lazy with these?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590631585

   We just upgraded Elasticsearch to a Lucene snapshot that has this change, and this triggered major speedups on some queries. In my opinion, the PR title and description don't do justice to this change since it does not only help when `after` is out of range, also when `after` is within the range but filtering only based on the `after` value significantly reduces the number of hits to evaluate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gashutos commented on a diff in pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gashutos (via GitHub)" <gi...@apache.org>.
gashutos commented on code in PR #12334:
URL: https://github.com/apache/lucene/pull/12334#discussion_r1206958937


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   We dont know upfront at the time of construction (where currently initialization is done) that would we be needing maxValueAsBytes & minValuesAsBytes both. Like about the case, where we dont have any competitive hit collected in queue hence no bottom but has ```after``` value so the topValue.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gashutos commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

Posted by "gashutos (via GitHub)" <gi...@apache.org>.
gashutos commented on PR #12334:
URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590701060

   Sure, changes title/description, LMK if looks good.
   CHANGES.txt PR https://github.com/apache/lucene/pull/12368


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] gsmiller commented on a diff in pull request #12334: Fix searchafter query high latency when after value is out of range for segment

Posted by "gsmiller (via GitHub)" <gi...@apache.org>.
gsmiller commented on code in PR #12334:
URL: https://github.com/apache/lucene/pull/12334#discussion_r1212360862


##########
lucene/core/src/java/org/apache/lucene/search/comparators/NumericComparator.java:
##########
@@ -204,13 +200,21 @@ private void updateCompetitiveIterator() throws IOException {
         return;
       }
       if (reverse == false) {
-        encodeBottom(maxValueAsBytes);
+        if (queueFull) { // bottom is avilable only when queue is full
+          maxValueAsBytes = maxValueAsBytes == null ? new byte[bytesCount] : maxValueAsBytes;

Review Comment:
   Oh, I see. Yes, that helps clarify. Thanks! The difference I missed is that we never start updating the competitive iterator until the queue fills in the current code, so it doesn't matter that these byte arrays are initialized as zeros, but now it does matter because `null` has meaning in the case of the competitive iterator update. OK got it. Thanks for walking me through it :) 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org