You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@lucene.apache.org by "jpountz (via GitHub)" <gi...@apache.org> on 2023/11/24 16:26:18 UTC
[PR] Faster prefix sum for bitsPerValue up to 9. [lucene]
jpountz opened a new pull request, #12843:
URL: https://github.com/apache/lucene/pull/12843
In order to speed up computing prefix sums, we currently pack two deltas in a long and then sum up longs. This allows us to sum up two deltas in a single instruction. We can do better for numbers of bits per value up to 9: since `128 * 2^9 = 2^16`, we can actually compute the prefix sum on only 16 bits per value, summing up 4 deltas at once.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Faster prefix sum for bitsPerValue up to 9. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12843:
URL: https://github.com/apache/lucene/pull/12843#issuecomment-1825884854
luceneutil doesn't see a noticeable difference (all p-values are high) but the micro-benchmark that is attached to this PR seems to see an improvement:
```
main
Benchmark (bpv) Mode Cnt Score Error Units
ForUtilBenchmark.decodeAndPrefixSum 6 thrpt 25 18.762 ± 0.739 ops/us
ForUtilBenchmark.decodeAndPrefixSum 7 thrpt 25 18.075 ± 0.220 ops/us
ForUtilBenchmark.decodeAndPrefixSum 8 thrpt 25 21.040 ± 0.285 ops/us
ForUtilBenchmark.decodeAndPrefixSum 9 thrpt 25 16.790 ± 0.896 ops/us
ForUtilBenchmark.decodeAndPrefixSum 10 thrpt 25 17.441 ± 1.260 ops/us
ForUtilBenchmark.decodeAndPrefixSum 11 thrpt 25 16.697 ± 0.883 ops/us
PR
Benchmark (bpv) Mode Cnt Score Error Units
ForUtilBenchmark.decodeAndPrefixSum 6 thrpt 25 19.171 ± 0.277 ops/us
ForUtilBenchmark.decodeAndPrefixSum 7 thrpt 25 18.875 ± 0.203 ops/us
ForUtilBenchmark.decodeAndPrefixSum 8 thrpt 25 22.075 ± 0.497 ops/us
ForUtilBenchmark.decodeAndPrefixSum 9 thrpt 25 18.689 ± 0.792 ops/us
ForUtilBenchmark.decodeAndPrefixSum 10 thrpt 25 17.696 ± 0.252 ops/us
ForUtilBenchmark.decodeAndPrefixSum 11 thrpt 25 16.623 ± 0.856 ops/us
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org
Re: [PR] Faster prefix sum for bitsPerValue up to 9. [lucene]
Posted by "jpountz (via GitHub)" <gi...@apache.org>.
jpountz commented on PR #12843:
URL: https://github.com/apache/lucene/pull/12843#issuecomment-1826052610
Actually we can do even better by better tuning the disk layout for the prefix sum. Converting this PR to a draft until this is implemented.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@lucene.apache.org
For additional commands, e-mail: issues-help@lucene.apache.org