Posted to dev@lucy.apache.org by Nathan Kurz <na...@verse.com> on 2012/09/12 22:10:46 UTC

Re: [lucy-dev] lucene block postings format

A new paper on a similar topic:

http://lemire.me/blog/archives/2012/09/12/fast-integer-compression-decoding-billions-of-integers-per-second/

http://arxiv.org/pdf/1209.2137v1.pdf

"we introduce a novel vectorized scheme called SIMD-BP128 that
improves over previously proposed vectorized approaches. It is nearly
twice as fast as the previously fastest schemes on desktop processors
(varint-G8IU and PFOR). At the same time, SIMD-BP128 saves up to 2
bits per integer. For even better compression, we propose another new
vectorized scheme (SIMD-FastPFOR) that has a compression rate within
10% of a state-of-the-art scheme (Simple-8b) while being two times
faster during decoding."

I haven't read the details, but from a quick skim I think they reach the
same conclusion we did: that PackedBinary (FOR) is significantly faster
than PFOR while giving up only slightly on compression. But they then go
on to show that this benefit is minimal compared to the advantage of
using vectorized SIMD instructions.

I'm leery of their thought processes, though:
"According to our analysis, one decoding step of this algorithms [sic]
would require at least 11 assembly instructions. In contrast, one step of
our interleaved bit unpacking from 3 (used by our SIMD-BP128) would
typically require only 5 and 10 assembly instructions in a typical and
worst case, respectively."

In a paper that is about the benefits of SIMD, an argument presuming
that fewer instructions means faster is radically out of place. Even
without SIMD, different instructions take vastly different numbers of
cycles.  I'm tempted to think that one of the authors "gets it" and
the other doesn't.

--nate