You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@lucene.apache.org by Andrzej Bialecki <ab...@getopt.org> on 2010/10/27 11:27:51 UTC

VSEncoding - int codec faster than p4delta

Hi,

I found this paper recently published in proceedings of CIKM'10:

VSEncoding: efficient coding and fast decoding of integer lists via
dynamic programming, F. Silvestri, R. Venturini

Available here http://portal.acm.org/citation.cfm?id=1871437.1871592

My favorite quotes:

"We point out that only our methods, together with In-
terpolative, are able to beat the entropy of the lists on the
datasets. This quasi-paradoxical eect is, indeed, present
because entropy does not consider context information. En-
tropy, or to use a notation commonly used in text compres-
sion, zeroth-order entropy, does not take into account pat-
terns (i.e. the context) that can be present in lists of blocks
of integers. By grouping together blocks of integers, in fact,
we are able to assign codewords to more than a single value
at a time. Therefore, it appears obvious that we can beat
the entropy in the case of VSE, VSE-R and Interpolative. Es-
sentially, this is possible since we exploit regularities on the
lists on these very skewed d-gaps lists (e.g., small values close
to each other or quite long runs of 1s). We remark that beat
the entropy is not possible with any prex code (e.g., sta-
tistical compressors like Arithmetic and Human or integer
encoders like , , 's, Golomb, and so on). Therefore, our
methods is certainly better in compression than any of these
kind of methods without the need of any comparison. Any-
way, for the sake of completeness, we report those results as
well in Table 3."

"As expected, Interpolative is the slowest as opposed to VSE
which tops others with more than 800 millions of integers
per second. Our methods, VSE and VSE-R, are among the
fastest in decoding with a number of mis (millions of integers per
second) decoded ranging from 450 of VSE-R to 835 of
VSE both of them measured using the gov2 collection. It
is interesting to observe the better performance in terms
of decoding speed of VSE with respect to others, and in
particular with respect to OPT-P4D, Simple9 and Simple16
which are considered state-of-the-art as far as decompression
speed is concerned."

P4D compression achieved a speed of 460 mis in this test, which means
the VSE method is twice as fast.

-- 
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: VSEncoding - int codec faster than p4delta

Posted by Robert Muir <rc...@gmail.com>.
On Wed, Oct 27, 2010 at 11:12 AM, Michael McCandless
<lu...@mikemccandless.com> wrote:
> This sounds fabulous!
>
> FOR/PFOR already show great gains for me over standard codec... if we
> can do even better, that's awesome.
>

good thing Mike spent(present tense too!) all that work on the whole
flexible indexing thing :)

Really, seeing stuff like this shows it was the right thing to do.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: VSEncoding - int codec faster than p4delta

Posted by Michael McCandless <lu...@mikemccandless.com>.
This sounds fabulous!

FOR/PFOR already show great gains for me over standard codec... if we
can do even better, that's awesome.

Mike

On Wed, Oct 27, 2010 at 10:36 AM, Paul Elschot <pa...@xs4all.nl> wrote:
> Op woensdag 27 oktober 2010 11:27:51 schreef Andrzej Bialecki:
>> Hi,
>>
>> I found this paper recently published in proceedings of CIKM'10:
>>
>> VSEncoding: efficient coding and fast decoding of integer lists via
>> dynamic programming, F. Silvestri, R. Venturini
>>
>> Available here http://portal.acm.org/citation.cfm?id=1871437.1871592
>
> A search for "VSEncoding" (with quotes) showed that the paper is currently also available via:
>
> http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html
>
> That page refers to this open access pdf version:
> (the url should be on single line, sorry for the format here)
>
> http://puma.isti.cnr.it/download.php?DocFile=2010-TR-016.pdf&langver=it&idcode=2010-TR-016&authority=cnr.isti&collection=cnr.isti&check=
>
> A very interesting read indeed, thanks.
>
> Regards,
> Paul Elschot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: dev-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Re: VSEncoding - int codec faster than p4delta

Posted by Paul Elschot <pa...@xs4all.nl>.
Op woensdag 27 oktober 2010 11:27:51 schreef Andrzej Bialecki:
> Hi,
> 
> I found this paper recently published in proceedings of CIKM'10:
> 
> VSEncoding: efficient coding and fast decoding of integer lists via
> dynamic programming, F. Silvestri, R. Venturini
> 
> Available here http://portal.acm.org/citation.cfm?id=1871437.1871592

A search for "VSEncoding" (with quotes) showed that the paper is currently also available via:

http://puma.isti.cnr.it/publichtml/section_cnr_isti/cnr_isti_2010-TR-016.html

That page refers to this open access pdf version: 
(the url should be on single line, sorry for the format here)

http://puma.isti.cnr.it/download.php?DocFile=2010-TR-016.pdf&langver=it&idcode=2010-TR-016&authority=cnr.isti&collection=cnr.isti&check=

A very interesting read indeed, thanks.

Regards,
Paul Elschot

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org