Posted to commits@parquet.apache.org by ga...@apache.org on 2023/03/24 09:52:16 UTC
[parquet-format] branch master updated: PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2
This is an automated email from the ASF dual-hosted git repository.
gangwu pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git
The following commit(s) were added to refs/heads/master by this push:
new 2a481fe PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2
2a481fe is described below
commit 2a481fe1aad64ff770e21734533bb7ef5a057dac
Author: Gang Wu <us...@gmail.com>
AuthorDate: Fri Mar 24 17:52:09 2023 +0800
PARQUET-2222: Fix incorrect spec for RLE encoding of data page v2
This closes #193
---
Encodings.md | 18 ++++++++++++++++++
1 file changed, 18 insertions(+)
diff --git a/Encodings.md b/Encodings.md
index a70ae6f..5e38d48 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -68,6 +68,7 @@ This encoding uses a combination of bit-packing and run length encoding to more
The grammar for this encoding looks like this, given a fixed bit-width known in advance:
```
rle-bit-packed-hybrid: <length> <encoded-data>
+// length is not always prepended; see the table below for details
length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
encoded-data := <run>*
run := <bit-packed-run> | <rle-run>
@@ -123,6 +124,23 @@ data:
* Dictionary indices
* Boolean values in data pages, as an alternative to PLAIN encoding
+Whether the four-byte `length` is prepended to the `encoded-data` is summarized in the table below:
+```
++--------------+------------------------+-----------------+
+| Page kind    | RLE-encoded data kind  | Prepend length? |
++--------------+------------------------+-----------------+
+| Data page v1 | Definition levels      | Y               |
+|              | Repetition levels      | Y               |
+|              | Dictionary indices     | N               |
+|              | Boolean values         | Y               |
++--------------+------------------------+-----------------+
+| Data page v2 | Definition levels      | N               |
+|              | Repetition levels      | N               |
+|              | Dictionary indices     | N               |
+|              | Boolean values         | Y               |
++--------------+------------------------+-----------------+
+```
+
### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
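
Reader's note on the table added by this commit: a decoder has to decide, per page kind and data kind, whether to read a four-byte little-endian length before the RLE/bit-packed hybrid runs. The following is a minimal Python sketch of that decision, not code from parquet-format or any Parquet implementation. The names PREPEND_LENGTH and split_rle_section and the string keys are hypothetical and chosen only for illustration.

```python
import struct

# (data page version, data kind) -> is a 4-byte little-endian length prepended?
# Values are copied from the table added to Encodings.md in this commit.
PREPEND_LENGTH = {
    (1, "definition_levels"): True,
    (1, "repetition_levels"): True,
    (1, "dictionary_indices"): False,
    (1, "boolean_values"): True,
    (2, "definition_levels"): False,
    (2, "repetition_levels"): False,
    (2, "dictionary_indices"): False,
    (2, "boolean_values"): True,
}


def split_rle_section(buf, page_version, data_kind):
    """Return (encoded_data, rest) for an RLE/bit-packed hybrid section.

    When no length is prepended, the whole buffer is returned as
    encoded_data and the caller must bound it by other means.
    """
    if not PREPEND_LENGTH[(page_version, data_kind)]:
        return buf, b""
    if len(buf) < 4:
        raise ValueError("buffer too short for the 4-byte length prefix")
    # length := size of <encoded-data> in bytes, unsigned int32, little endian
    (length,) = struct.unpack_from("<I", buf, 0)
    end = 4 + length
    if end > len(buf):
        raise ValueError("length prefix exceeds the available bytes")
    return buf[4:end], buf[end:]
```

For the cases without a prefix, the sketch assumes the caller already knows where the section ends, for example from byte lengths recorded in the page header or because the data runs to the end of the page.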