Posted to dev@parquet.apache.org by "Jorge Leitão (Jira)" <ji...@apache.org> on 2021/04/17 20:28:00 UTC
[jira] [Created] (PARQUET-2028) The example in delta-encoding seems incorrect
Jorge Leitão created PARQUET-2028:
-------------------------------------
Summary: The example in delta-encoding seems incorrect
Key: PARQUET-2028
URL: https://issues.apache.org/jira/browse/PARQUET-2028
Project: Parquet
Issue Type: Bug
Components: parquet-format
Reporter: Jorge Leitão
In the delta-encoding example, which encodes [1, 2, 3, 4, 5], we state that:
{code:java}
The final encoded data is:
header: 8 (block size), 1 (miniblock count), 5 (value count), 1 (first value)
block 1 (minimum delta), 0 (bitwidth), (no data needed for bitwidth 0)
{code}
I believe that the correct result should be:
header: [8, 1, 5, 2]
block: [2, 0]
I.e., first_value and min_delta should be 2, not 1.
This is because both fields are written as zig-zag ULEB128 integers: the plain ULEB128 encoding of 1 is 1, but AFAIK the zig-zag encoding of 1 is 2 (see e.g. [here|https://stackoverflow.com/a/2211086/931303]), so the zig-zag ULEB128 encoding of 1 is 2.
Alternatively, we could rephrase "The final encoded data is:" as "The final data prior to zig-zag encoding is:".
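
To make the arithmetic concrete, here is a minimal Python sketch of the two encodings and of the byte sequence they would give for this example. The function names (zigzag_encode, uleb128_encode, encode_example) are hypothetical and not taken from any Parquet implementation. The sketch assumes 64-bit zig-zag and assumes that only first_value and min_delta are zig-zag encoded, while the block size, miniblock count and value count are plain ULEB128 and each bitwidth is a single byte.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed integer to an unsigned one: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    # Python's >> on a negative int is an arithmetic shift, so n >> (bits - 1)
    # is -1 for negative n and 0 otherwise; the mask keeps the result unsigned.
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def uleb128_encode(n: int) -> bytes:
    """Encode a non-negative integer as ULEB128 (7 bits per byte, low bits first)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)  # last byte, continuation bit clear
            return bytes(out)


def encode_example() -> list:
    """Byte values for [1, 2, 3, 4, 5] under the assumptions stated above."""
    values = [1, 2, 3, 4, 5]
    deltas = [b - a for a, b in zip(values, values[1:])]  # [1, 1, 1, 1]
    min_delta = min(deltas)
    # Every delta equals min_delta, so the adjusted deltas are all 0 and the
    # single miniblock has bitwidth 0 (no packed data follows).
    assert all(d - min_delta == 0 for d in deltas)
    header = (
        uleb128_encode(8)  # block size
        + uleb128_encode(1)  # miniblock count
        + uleb128_encode(len(values))  # value count
        + uleb128_encode(zigzag_encode(values[0]))  # first value, zig-zag
    )
    block = uleb128_encode(zigzag_encode(min_delta)) + bytes([0])  # min delta, bitwidth
    return list(header + block)


print(encode_example())  # [8, 1, 5, 2, 2, 0]
```

Under these assumptions the header comes out as [8, 1, 5, 2] and the block as [2, 0], which matches the correction proposed above.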