Posted to commits@parquet.apache.org by ga...@apache.org on 2021/06/24 11:56:34 UTC

[parquet-format] branch master updated: Document dictionary page position (#177)

This is an automated email from the ASF dual-hosted git repository.

gabor pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/parquet-format.git


The following commit(s) were added to refs/heads/master by this push:
     new 43c891a  Document dictionary page position (#177)
43c891a is described below

commit 43c891a4494f85e2fe0e56f4ef408bcc60e8da48
Author: Gabor Szadovszky <ga...@apache.org>
AuthorDate: Thu Jun 24 13:56:23 2021 +0200

    Document dictionary page position (#177)
---
 README.md                      | 7 +++++++
 src/main/thrift/parquet.thrift | 5 +++++
 2 files changed, 12 insertions(+)

diff --git a/README.md b/README.md
index ac7c791..f5478c8 100644
--- a/README.md
+++ b/README.md
@@ -191,6 +191,13 @@ header and readers can skip over pages they are not interested in.  The data for
 page follows the header and can be compressed and/or encoded.  The compression and
 encoding is specified in the page metadata.
 
+A column chunk might be partly or completely dictionary encoded. It means that
+dictionary indexes are saved in the data pages instead of the actual values. The
+actual values are stored in the dictionary page. See details in
+[Encodings.md](https://github.com/apache/parquet-format/blob/master/Encodings.md#dictionary-encoding-plain_dictionary--2-and-rle_dictionary--8).
+The dictionary page must be placed at the first position of the column chunk. At
+most one dictionary page can be placed in a column chunk.
+
 Additionally, files can contain an optional column index to allow readers to
 skip pages more efficiently. See [PageIndex.md](PageIndex.md) for details and
 the reasoning behind adding these to the format.
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 05690e4..81a7cf8 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -534,6 +534,11 @@ struct IndexPageHeader {
   // TODO
 }
 
+/**
+ * The dictionary page must be placed at the first position of the column chunk
+ * if it is partly or completely dictionary encoded. At most one dictionary page
+ * can be placed in a column chunk.
+ **/
 struct DictionaryPageHeader {
   /** Number of values in the dictionary **/
   1: required i32 num_values;
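
Note on the rule documented above: the two constraints (a dictionary page, if present, is the first page of the column chunk, and there is at most one) can be checked mechanically once the page types of a chunk are known in file order. The following is a minimal sketch, not part of the commit and not an API of any Parquet library. The `PageType` enum only mirrors the page type names from parquet.thrift for illustration, and `check_dictionary_page_position` is a hypothetical helper name.

```python
from enum import Enum
from typing import Sequence


class PageType(Enum):
    # Illustrative stand-in for the PageType enum in parquet.thrift.
    DATA_PAGE = 0
    INDEX_PAGE = 1
    DICTIONARY_PAGE = 2
    DATA_PAGE_V2 = 3


def check_dictionary_page_position(pages: Sequence[PageType]) -> None:
    """Raise ValueError if the page order of one column chunk breaks the rule.

    `pages` is assumed to list the page types of a single column chunk
    in the order they appear in the file (hypothetical input shape).
    """
    positions = [i for i, p in enumerate(pages) if p is PageType.DICTIONARY_PAGE]
    if len(positions) > 1:
        raise ValueError(
            f"column chunk has {len(positions)} dictionary pages, at most one is allowed"
        )
    if positions and positions[0] != 0:
        raise ValueError(
            f"dictionary page found at position {positions[0]}, expected position 0"
        )


# A dictionary-encoded chunk: dictionary page first, then data pages.
check_dictionary_page_position(
    [PageType.DICTIONARY_PAGE, PageType.DATA_PAGE, PageType.DATA_PAGE]
)
# A chunk without dictionary encoding is also valid.
check_dictionary_page_position([PageType.DATA_PAGE, PageType.DATA_PAGE_V2])
```

A chunk such as `[DATA_PAGE, DICTIONARY_PAGE]` or one with two dictionary pages would raise `ValueError` in this sketch.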