Posted to dev@parquet.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2021/04/17 20:28:00 UTC

[jira] [Created] (PARQUET-2028) The example in delta-encoding seems incorrect

Jorge Leitão created PARQUET-2028:
-------------------------------------

             Summary: The example in delta-encoding seems incorrect
                 Key: PARQUET-2028
                 URL: https://issues.apache.org/jira/browse/PARQUET-2028
             Project: Parquet
          Issue Type: Bug
          Components: parquet-format
            Reporter: Jorge Leitão


In the delta-encoding example, which encodes [1, 2, 3, 4, 5], we state that

{code:java}
The final encoded data is:

header: 8 (block size), 1 (miniblock count), 5 (value count), 1 (first value)

block 1 (minimum delta), 0 (bitwidth), (no data needed for bitwidth 0)
{code}

I believe that the correct result should be

header: [8, 1, 5, 2]
block: [2, 0]

I.e., first_value and min_delta should be 2, not 1.

This is because the zig-zag ULEB128 encoding of 1 is 2: the plain ULEB128 encoding of 1 is 1, but AFAIK zig-zag encoding maps a non-negative n to 2n, so the zig-zag encoding of 1 is 2 (see e.g. [here|https://stackoverflow.com/a/2211086/931303]).
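
To make the arithmetic concrete, here is a minimal Python sketch of the two encodings. The helper names zigzag and uleb128 are made up for illustration and are not taken from any Parquet implementation. It assumes 64-bit signed inputs, and that in the header and block only first_value and min_delta are zig-zag encoded, while the other header fields are plain ULEB128 and the bitwidth is a single byte.

```python
def zigzag(n: int, bits: int = 64) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    # Python's >> on a negative int is an arithmetic shift, so n >> (bits - 1)
    # is -1 for negative n and 0 otherwise; the mask keeps the result unsigned.
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def uleb128(n: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


# Header for [1, 2, 3, 4, 5]: block size 8, 1 miniblock, 5 values, first value 1.
header = uleb128(8) + uleb128(1) + uleb128(5) + uleb128(zigzag(1))

# Block: minimum delta 1 (zig-zag encoded), then one bitwidth byte of 0.
block = uleb128(zigzag(1)) + bytes([0])
```

Under these assumptions, list(header) gives [8, 1, 5, 2] and list(block) gives [2, 0], which matches the result proposed above.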

Alternatively, we could re-phrase "The final encoded data is:" to "The final data prior to zig-zag encoding is:"






