Posted to commits@parquet.apache.org by ga...@apache.org on 2023/03/24 09:52:16 UTC

[parquet-format] branch master updated: PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2

This is an automated email from the ASF dual-hosted git repository.

gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 2a481fe  PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2
2a481fe is described below

commit 2a481fe1aad64ff770e21734533bb7ef5a057dac
Author: Gang Wu <us...@gmail.com>
AuthorDate: Fri Mar 24 17:52:09 2023 +0800

    PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2
    
    This closes #193
---
 Encodings.md | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/Encodings.md b/Encodings.md
index a70ae6f..5e38d48 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -68,6 +68,7 @@ This encoding uses a combination of bit-packing and run length encoding to more
 The grammar for this encoding looks like this, given a fixed bit-width known in advance:
 ```
 rle-bit-packed-hybrid: <length> <encoded-data>
+// length is not always prepended, please check the table below for more detail
 length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
 encoded-data := <run>*
 run := <bit-packed-run> | <rle-run>
@@ -123,6 +124,23 @@ data:
 * Dictionary indices
 * Boolean values in data pages, as an alternative to PLAIN encoding
 
+Whether prepending the four-byte `length` to the `encoded-data` is summarized as the table below:
+```
++--------------+------------------------+-----------------+
+| Page kind    | RLE-encoded data kind  | Prepend length? |
++--------------+------------------------+-----------------+
+| Data page v1 | Definition levels      | Y               |
+|              | Repetition levels      | Y               |
+|              | Dictionary indices     | N               |
+|              | Boolean values         | Y               |
++--------------+------------------------+-----------------+
+| Data page v2 | Definition levels      | N               |
+|              | Repetition levels      | N               |
+|              | Dictionary indices     | N               |
+|              | Boolean values         | Y               |
++--------------+------------------------+-----------------+
+```
+
 ### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
 
 This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
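
The table added by this commit can be read as a lookup that tells a reader whether to expect the four-byte little-endian `length` before `<encoded-data>`. The following is a minimal sketch of that reading, not code from parquet-format or any Parquet implementation. The names `PREPEND_LENGTH`, `split_encoded_data`, and `known_size`, and the string keys used for page and data kinds, are hypothetical and chosen only for illustration. It assumes that when no length is prepended, the caller already knows the size of the encoded data from elsewhere.

```python
import struct

# (page kind, RLE-encoded data kind) -> is the 4-byte length prepended?
# Transcribed from the table in the diff above; the key names are illustrative.
PREPEND_LENGTH = {
    ("v1", "definition_levels"): True,
    ("v1", "repetition_levels"): True,
    ("v1", "dictionary_indices"): False,
    ("v1", "boolean_values"): True,
    ("v2", "definition_levels"): False,
    ("v2", "repetition_levels"): False,
    ("v2", "dictionary_indices"): False,
    ("v2", "boolean_values"): True,
}


def split_encoded_data(buf, page_kind, data_kind, known_size=None):
    """Return the <encoded-data> bytes found at the start of buf.

    If the table says a length is prepended, it is read as an unsigned
    little-endian int32 and used to slice the data. Otherwise the caller
    passes known_size (an assumption of this sketch), or the whole buffer
    is returned.
    """
    if PREPEND_LENGTH[(page_kind, data_kind)]:
        if len(buf) < 4:
            raise ValueError("buffer too short to hold the 4-byte length")
        (length,) = struct.unpack_from("<I", buf, 0)
        if 4 + length > len(buf):
            raise ValueError("prepended length exceeds the buffer size")
        return buf[4:4 + length]
    # No prefix: the size has to come from outside the encoded bytes.
    return buf if known_size is None else buf[:known_size]
```

For example, the same bytes are interpreted differently for definition levels depending on the page kind: `split_encoded_data(data, "v1", "definition_levels")` consumes the first four bytes as the length, while `split_encoded_data(data, "v2", "definition_levels", known_size=n)` treats the first `n` bytes as the encoded runs.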