Posted to issues@lucene.apache.org by GitBox <gi...@apache.org> on 2023/01/12 13:36:19 UTC

[GitHub] [lucene] jpountz opened a new pull request, #12081: Speed up DocIdMerger on sorted indexes.

jpountz opened a new pull request, #12081:
URL: https://github.com/apache/lucene/pull/12081

   When an index is sorted on a low-cardinality field, or when the index sort order correlates with the order in which documents get ingested, we can optimize `SortedDocIDMerger` by doing a single comparison with the doc ID of the next sub. This check covers at the same time whether the priority queue needs reordering and whether the current sub has reached `NO_MORE_DOCS`.
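
To illustrate the idea, here is a minimal, self-contained sketch of a k-way merge that keeps the current sub outside the priority queue and compares its next doc ID against the smallest doc ID still in the queue. It is not the code from this PR: the class `SortedMergeSketch`, the `Sub` type and the `queueMinDocID` variable are hypothetical names chosen for the example, and plain `int[]` arrays stand in for real subs. The one assumption that matters is that `NO_MORE_DOCS` is larger than any real doc ID, so a single `<` comparison fails both when the sub is exhausted and when another sub should go first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch of the single-comparison fast path; not Lucene's actual code. */
final class SortedMergeSketch {
  // Sentinel returned by an exhausted sub; assumed larger than any real doc ID.
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** One input stream of doc IDs, already in increasing order. */
  static final class Sub {
    private final int[] docs;
    private int pos = -1;
    int docID = -1;

    Sub(int[] docs) {
      this.docs = docs;
    }

    int nextDoc() {
      pos++;
      docID = pos < docs.length ? docs[pos] : NO_MORE_DOCS;
      return docID;
    }
  }

  /** Merges the sorted inputs into one sorted array of doc IDs. */
  static int[] merge(int[][] inputs) {
    PriorityQueue<Sub> queue =
        new PriorityQueue<>((a, b) -> Integer.compare(a.docID, b.docID));
    for (int[] input : inputs) {
      Sub sub = new Sub(input);
      if (sub.nextDoc() != NO_MORE_DOCS) {
        queue.add(sub);
      }
    }

    List<Integer> out = new ArrayList<>();
    Sub current = queue.poll();
    // Smallest doc ID among the subs that are still waiting in the queue.
    int queueMinDocID = queue.isEmpty() ? NO_MORE_DOCS : queue.peek().docID;

    while (current != null) {
      out.add(current.docID);
      int next = current.nextDoc();
      if (next < queueMinDocID) {
        // Fast path: one comparison shows that the current sub is not
        // exhausted (next < NO_MORE_DOCS) and still sorts before every queued sub.
        continue;
      }
      // Slow path: put the current sub back unless it is exhausted, then
      // switch to whichever sub now has the smallest doc ID.
      if (next != NO_MORE_DOCS) {
        queue.add(current);
      }
      current = queue.poll();
      queueMinDocID = queue.isEmpty() ? NO_MORE_DOCS : queue.peek().docID;
    }
    return out.stream().mapToInt(Integer::intValue).toArray();
  }
}
```

With inputs such as `{0, 1, 2, 7}` and `{3, 4, 9}`, most iterations stay on the fast path because long runs of consecutive doc IDs come from the same sub, which is the situation described above for sorted indexes.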


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


[GitHub] [lucene] jpountz merged pull request #12081: Speed up DocIdMerger on sorted indexes.

Posted by GitBox <gi...@apache.org>.
jpountz merged PR #12081:
URL: https://github.com/apache/lucene/pull/12081




[GitHub] [lucene] jpountz commented on pull request #12081: Speed up DocIdMerger on sorted indexes.

Posted by GitBox <gi...@apache.org>.
jpountz commented on PR #12081:
URL: https://github.com/apache/lucene/pull/12081#issuecomment-1380369577

   Here are timings of the first doc value merges on the IndexTaxis benchmark and a sorted dense index:
   
   ```
   SM 0 [2023-01-12T13:17:47.581987785Z; Thread-0]: 564 ms to merge doc values [367596 docs]
   SM 0 [2023-01-12T13:17:52.428441520Z; Thread-0]: 392 ms to merge doc values [365886 docs]
   SM 0 [2023-01-12T13:17:57.028935759Z; Thread-0]: 377 ms to merge doc values [365212 docs]
   SM 0 [2023-01-12T13:18:01.642897130Z; Thread-0]: 379 ms to merge doc values [367056 docs]
   SM 0 [2023-01-12T13:18:06.191013759Z; Thread-0]: 368 ms to merge doc values [364343 docs]
   SM 0 [2023-01-12T13:18:10.726133213Z; Thread-0]: 367 ms to merge doc values [363438 docs]
   SM 0 [2023-01-12T13:18:15.299072784Z; Thread-0]: 374 ms to merge doc values [366954 docs]
   SM 0 [2023-01-12T13:18:19.907346178Z; Thread-0]: 370 ms to merge doc values [365804 docs]
   SM 0 [2023-01-12T13:18:24.545160224Z; Thread-0]: 371 ms to merge doc values [366997 docs]
   SM 0 [2023-01-12T13:18:29.163147828Z; Thread-0]: 374 ms to merge doc values [367039 docs]
   SM 0 [2023-01-12T13:18:34.031698013Z; Thread-0]: 3709 ms to merge doc values [3660325 docs]
   SM 0 [2023-01-12T13:18:41.003270469Z; Thread-0]: 369 ms to merge doc values [365722 docs]
   SM 0 [2023-01-12T13:18:45.598362451Z; Thread-0]: 369 ms to merge doc values [365728 docs]
   SM 0 [2023-01-12T13:18:50.243855116Z; Thread-0]: 376 ms to merge doc values [367656 docs]
   SM 0 [2023-01-12T13:18:54.862259612Z; Thread-0]: 368 ms to merge doc values [365076 docs]
   SM 0 [2023-01-12T13:18:59.503139644Z; Thread-0]: 372 ms to merge doc values [365477 docs]
   SM 0 [2023-01-12T13:19:04.151794532Z; Thread-0]: 374 ms to merge doc values [367064 docs]
   SM 0 [2023-01-12T13:19:08.810873192Z; Thread-0]: 373 ms to merge doc values [367412 docs]
   SM 0 [2023-01-12T13:19:13.460733970Z; Thread-0]: 371 ms to merge doc values [366363 docs]
   SM 0 [2023-01-12T13:19:18.065982734Z; Thread-0]: 384 ms to merge doc values [366018 docs]
   SM 0 [2023-01-12T13:19:22.840259876Z; Thread-0]: 384 ms to merge doc values [365759 docs]
   SM 0 [2023-01-12T13:19:27.916927516Z; Thread-0]: 3842 ms to merge doc values [3662275 docs]
   SM 0 [2023-01-12T13:19:35.187165246Z; Thread-0]: 386 ms to merge doc values [366840 docs]
   ```
   
   Now the same log with the change for the same set of merges:
   ```
   SM 0 [2023-01-12T13:27:49.359139222Z; Thread-0]: 482 ms to merge doc values [367596 docs]
   SM 0 [2023-01-12T13:27:54.237137785Z; Thread-0]: 380 ms to merge doc values [365886 docs]
   SM 0 [2023-01-12T13:27:59.001734090Z; Thread-0]: 371 ms to merge doc values [365212 docs]
   SM 0 [2023-01-12T13:28:03.713076713Z; Thread-0]: 365 ms to merge doc values [367056 docs]
   SM 0 [2023-01-12T13:28:08.355025029Z; Thread-0]: 352 ms to merge doc values [364343 docs]
   SM 0 [2023-01-12T13:28:13.032506866Z; Thread-0]: 352 ms to merge doc values [363438 docs]
   SM 0 [2023-01-12T13:28:17.722795068Z; Thread-0]: 354 ms to merge doc values [366954 docs]
   SM 0 [2023-01-12T13:28:22.468102730Z; Thread-0]: 367 ms to merge doc values [365804 docs]
   SM 0 [2023-01-12T13:28:27.314385233Z; Thread-0]: 363 ms to merge doc values [366997 docs]
   SM 0 [2023-01-12T13:28:32.162119724Z; Thread-0]: 366 ms to merge doc values [367039 docs]
   SM 0 [2023-01-12T13:28:37.027154229Z; Thread-0]: 3678 ms to merge doc values [3660325 docs]
   SM 0 [2023-01-12T13:28:44.367846338Z; Thread-0]: 364 ms to merge doc values [365722 docs]
   SM 0 [2023-01-12T13:28:49.208172350Z; Thread-0]: 365 ms to merge doc values [365728 docs]
   SM 0 [2023-01-12T13:28:54.096854303Z; Thread-0]: 367 ms to merge doc values [367656 docs]
   SM 0 [2023-01-12T13:28:58.937724069Z; Thread-0]: 355 ms to merge doc values [365076 docs]
   SM 0 [2023-01-12T13:29:03.645241892Z; Thread-0]: 350 ms to merge doc values [365477 docs]
   SM 0 [2023-01-12T13:29:08.355256074Z; Thread-0]: 355 ms to merge doc values [367064 docs]
   SM 0 [2023-01-12T13:29:13.091695689Z; Thread-0]: 353 ms to merge doc values [367412 docs]
   SM 0 [2023-01-12T13:29:17.813632707Z; Thread-0]: 353 ms to merge doc values [366363 docs]
   SM 0 [2023-01-12T13:29:22.522670731Z; Thread-0]: 355 ms to merge doc values [366018 docs]
   SM 0 [2023-01-12T13:29:27.252547913Z; Thread-0]: 357 ms to merge doc values [365759 docs]
   SM 0 [2023-01-12T13:29:32.005065809Z; Thread-0]: 3580 ms to merge doc values [3662275 docs]
   SM 0 [2023-01-12T13:29:39.224502172Z; Thread-0]: 359 ms to merge doc values [366840 docs]
   ```
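
For readers who want to compare the two logs numerically, the following is a small sketch that extracts the merge time and document count from a log line and converts them to a per-document cost. The class `MergeLogSketch` and its methods are hypothetical helpers written for this example; the two sample lines are copied from the logs above (the same 3660325-doc merge, without and with the change).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for reading the "ms to merge doc values" log lines. */
final class MergeLogSketch {
  private static final Pattern LINE =
      Pattern.compile("(\\d+) ms to merge doc values \\[(\\d+) docs\\]");

  /** Returns {millis, docs} parsed from one log line, or null if the line does not match. */
  static long[] parse(String line) {
    Matcher m = LINE.matcher(line);
    if (!m.find()) {
      return null;
    }
    return new long[] {Long.parseLong(m.group(1)), Long.parseLong(m.group(2))};
  }

  /** Nanoseconds spent per merged document for one log line. */
  static double nanosPerDoc(String line) {
    long[] parsed = parse(line);
    if (parsed == null) {
      throw new IllegalArgumentException("Not a doc values merge line: " + line);
    }
    return parsed[0] * 1_000_000.0 / parsed[1];
  }

  public static void main(String[] args) {
    String before =
        "SM 0 [2023-01-12T13:18:34.031698013Z; Thread-0]: 3709 ms to merge doc values [3660325 docs]";
    String after =
        "SM 0 [2023-01-12T13:28:37.027154229Z; Thread-0]: 3678 ms to merge doc values [3660325 docs]";
    System.out.printf(
        "before: %.1f ns/doc, after: %.1f ns/doc%n", nanosPerDoc(before), nanosPerDoc(after));
  }
}
```

A single pair of lines is only one data point; applying the same helper to every line of both logs would give a fairer picture of the difference.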

