Posted to issues@lucene.apache.org by "slow-J (via GitHub)" <gi...@apache.org> on 2023/11/02 12:27:47 UTC

[I] Explore partially decoding blocks (within-block skipping) [lucene]

slow-J opened a new issue, #12749:
URL: https://github.com/apache/lucene/issues/12749

   ### Description
   
   Idea from @mikemccand's comment in https://github.com/apache/lucene/issues/12696#issuecomment-1770461719
   
   ```
   Another exciting optimization such a "patch-less" encoding could implement is within-block skipping (I believe Tantivy does this).
   
   Today, our skipper is forced to align to block boundaries, so when we skip to a given docid, we go to the block that may contain this docid, decode all 128 int[], then linearly scan within those 128 ints. This is quite a bit of overhead for each skip request!
   
   If we could lower that linear scan cost to maybe 16 or 8 or something, the conjunctive queries should get even faster. But perhaps it becomes trickier to take advantage of SIMD optimizations if we are decoding a subset of ints, not sure.
   ```
   
   After the change in https://github.com/apache/lucene/pull/12741 , we will no longer use patching when encoding doc blocks.
   This may allow us to partially decode blocks. Skipping could then jump to the middle of a block instead of being forced to align to block boundaries, as it is today.
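
   To make the cost described in the quoted comment concrete, here is a minimal sketch of a block-aligned skip: the whole block of 128 deltas is decoded with a running sum, then scanned linearly for the target. The class and method names (`BlockAlignedSkip`, `decodeBlock`, `linearScan`) are hypothetical and are not Lucene's API; the real postings code also bit-packs the deltas, which is left out here.

   ```java
   import java.util.Arrays;

   public class BlockAlignedSkip {
       static final int BLOCK_SIZE = 128;

       /** Decodes a full delta-coded block: docs[i] = base + deltas[0] + ... + deltas[i]. */
       static int[] decodeBlock(int base, int[] deltas) {
           int[] docs = new int[deltas.length];
           int doc = base;
           for (int i = 0; i < deltas.length; i++) {
               doc += deltas[i];   // running sum over every delta in the block
               docs[i] = doc;
           }
           return docs;
       }

       /** Returns the index of the first doc ID >= target, or docs.length if there is none. */
       static int linearScan(int[] docs, int target) {
           int i = 0;
           while (i < docs.length && docs[i] < target) {
               i++;
           }
           return i;
       }

       public static void main(String[] args) {
           int[] deltas = new int[BLOCK_SIZE];
           Arrays.fill(deltas, 3);                    // doc IDs 1003, 1006, ..., 1384
           int[] docs = decodeBlock(1000, deltas);    // all 128 values are decoded
           int idx = linearScan(docs, 1300);          // then up to 128 values are scanned
           System.out.println("index=" + idx + " doc=" + docs[idx]);
       }
   }
   ```

   In this toy block the target sits at index 99, so the skip pays for 128 decoded values plus 99 comparisons to land on a single doc ID.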


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org


Re: [I] Explore partially decoding blocks (within-block skipping) [lucene]

Posted by "Tony-X (via GitHub)" <gi...@apache.org>.
Tony-X commented on issue #12749:
URL: https://github.com/apache/lucene/issues/12749#issuecomment-1799460960

   > How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some checkpoints separately, but then it may be easier/better to simply go with smaller blocks (e.g. 64 doc IDs instead of 128)?
   
   +1. Delta-encoding here is the blocker, unless we change the encoding scheme. 




Re: [I] Explore partially decoding blocks (within-block skipping) [lucene]

Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on issue #12749:
URL: https://github.com/apache/lucene/issues/12749#issuecomment-1792592185

   How would it work? Since blocks are delta-coded, you can't know the value at a given index without decoding all previous values and computing their sum? Or you need to store some checkpoints separately, but then it may be easier/better to simply go with smaller blocks (e.g. 64 doc IDs instead of 128)?
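
   The "checkpoints" option raised above could look roughly like the sketch below. It assumes an absolute doc ID is stored every 16 entries, so a skip only needs to sum the deltas of one sub-block. Everything here (`CheckpointedBlock`, `STRIDE`, `advance`) is hypothetical and not an existing Lucene API. The checkpoints take extra space, and the result behaves much like smaller blocks, which is the trade-off the comment points out.

   ```java
   public class CheckpointedBlock {
       static final int STRIDE = 16;   // hypothetical sub-block size

       final int[] deltas;        // deltas[i] = doc[i] - doc[i - 1], with doc[-1] = base
       final int[] checkpoints;   // checkpoints[k] = doc ID just before sub-block k

       CheckpointedBlock(int base, int[] deltas) {
           this.deltas = deltas;
           this.checkpoints = new int[(deltas.length + STRIDE - 1) / STRIDE];
           int doc = base;
           for (int i = 0; i < deltas.length; i++) {
               if (i % STRIDE == 0) {
                   checkpoints[i / STRIDE] = doc;   // absolute value, no prefix sum needed later
               }
               doc += deltas[i];
           }
       }

       /** Returns the first doc ID >= target, or -1 if the block holds no such doc ID. */
       int advance(int target) {
           // Pick the last sub-block whose preceding doc ID is still below the target.
           int k = 0;
           while (k + 1 < checkpoints.length && checkpoints[k + 1] < target) {
               k++;
           }
           // Sum deltas only inside that sub-block: at most STRIDE additions.
           int doc = checkpoints[k];
           int end = Math.min(deltas.length, (k + 1) * STRIDE);
           for (int i = k * STRIDE; i < end; i++) {
               doc += deltas[i];
               if (doc >= target) {
                   return doc;
               }
           }
           return -1;
       }

       public static void main(String[] args) {
           int[] deltas = new int[128];
           java.util.Arrays.fill(deltas, 3);   // doc IDs 1003, 1006, ..., 1384
           CheckpointedBlock block = new CheckpointedBlock(1000, deltas);
           System.out.println(block.advance(1300));
       }
   }
   ```

   For a 128-entry block this adds 8 checkpoint values and caps the in-block work at 16 delta additions per skip. Whether that beats simply using 64-entry blocks would need benchmarking.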

