Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/09/25 16:46:01 UTC

[jira] [Commented] (PARQUET-41) Add bloom filters to parquet statistics

    [ https://issues.apache.org/jira/browse/PARQUET-41?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16627622#comment-16627622 ] 

ASF GitHub Bot commented on PARQUET-41:
---------------------------------------

cjjnjust closed pull request #62: PARQUET-41: Add bloom filter for parquet
URL: https://github.com/apache/parquet-format/pull/62
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/.gitignore b/.gitignore
index cb047215..4587f73e 100644
--- a/.gitignore
+++ b/.gitignore
@@ -1,3 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 generated/*
 target
 dependency-reduced-pom.xml
diff --git a/.travis.yml b/.travis.yml
index dc339c51..42ba8baf 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -1,3 +1,20 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
 language: java
 dist: precise
 before_install:
diff --git a/CHANGES.md b/CHANGES.md
index befe5321..c0c529ab 100644
--- a/CHANGES.md
+++ b/CHANGES.md
@@ -19,6 +19,70 @@
 
 # Parquet #
 
+### Version 2.5.0 ###
+
+#### Bug
+
+*   [PARQUET-323](https://issues.apache.org/jira/browse/PARQUET-323) - INT96 should be marked as deprecated
+*   [PARQUET-1064](https://issues.apache.org/jira/browse/PARQUET-1064) - Deprecate type-defined sort ordering for INTERVAL type
+*   [PARQUET-1065](https://issues.apache.org/jira/browse/PARQUET-1065) - Deprecate type-defined sort ordering for INT96 type
+*   [PARQUET-1145](https://issues.apache.org/jira/browse/PARQUET-1145) - Add license to .gitignore and .travis.yml
+*   [PARQUET-1156](https://issues.apache.org/jira/browse/PARQUET-1156) - dev/merge\_parquet\_pr.py problems
+*   [PARQUET-1236](https://issues.apache.org/jira/browse/PARQUET-1236) - Upgrade org.slf4j:slf4j-api:1.7.2 to 1.7.12
+*   [PARQUET-1242](https://issues.apache.org/jira/browse/PARQUET-1242) - parquet.thrift refers to wrong releases for the new compressions
+*   [PARQUET-1251](https://issues.apache.org/jira/browse/PARQUET-1251) - Clarify ambiguous min/max stats for FLOAT/DOUBLE
+*   [PARQUET-1258](https://issues.apache.org/jira/browse/PARQUET-1258) - Update scm developer connection to github
+
+#### New Feature
+
+*   [PARQUET-1201](https://issues.apache.org/jira/browse/PARQUET-1201)  - Write column indexes
+
+#### Improvement
+
+*   [PARQUET-1171](https://issues.apache.org/jira/browse/PARQUET-1171) - \[C++\] Clarify valid uses for RLE, BIT_PACKED encodings
+*   [PARQUET-1197](https://issues.apache.org/jira/browse/PARQUET-1197) - Log rat failures
+
+#### Task
+
+*   [PARQUET-1234](https://issues.apache.org/jira/browse/PARQUET-1234) - Release Parquet format 2.5.0
+
+### Version 2.4.0 ###
+
+#### Bug
+
+*   [PARQUET-255](https://issues.apache.org/jira/browse/PARQUET-255) - Typo in decimal type specification
+*   [PARQUET-322](https://issues.apache.org/jira/browse/PARQUET-322) - Document ENUM as a logical type
+*   [PARQUET-412](https://issues.apache.org/jira/browse/PARQUET-412) - Format: Do not shade slf4j-api
+*   [PARQUET-419](https://issues.apache.org/jira/browse/PARQUET-419) - Update dev script in parquet-cpp to remove incubator.
+*   [PARQUET-655](https://issues.apache.org/jira/browse/PARQUET-655) - The LogicalTypes.md link in README.md points to the old Parquet GitHub repository
+*   [PARQUET-1031](https://issues.apache.org/jira/browse/PARQUET-1031) - Fix spelling errors, whitespace, GitHub urls
+*   [PARQUET-1032](https://issues.apache.org/jira/browse/PARQUET-1032) - Change link in Encodings.md for variable length encoding
+*   [PARQUET-1050](https://issues.apache.org/jira/browse/PARQUET-1050) - The comment of Parquet Format Thrift definition file error
+*   [PARQUET-1076](https://issues.apache.org/jira/browse/PARQUET-1076) - [Format] Switch to long key ids in KEYs file
+*   [PARQUET-1091](https://issues.apache.org/jira/browse/PARQUET-1091) - Wrong and broken links in README
+*   [PARQUET-1102](https://issues.apache.org/jira/browse/PARQUET-1102) - Travis CI builds are failing for parquet-format PRs
+*   [PARQUET-1134](https://issues.apache.org/jira/browse/PARQUET-1134) - Release Parquet format 2.4.0
+*   [PARQUET-1136](https://issues.apache.org/jira/browse/PARQUET-1136) - Makefile is broken
+
+#### Improvement
+
+*   [PARQUET-371](https://issues.apache.org/jira/browse/PARQUET-371) - Bumps Thrift version to 0.9.3
+*   [PARQUET-407](https://issues.apache.org/jira/browse/PARQUET-407) - Incorrect delta-encoding example
+*   [PARQUET-428](https://issues.apache.org/jira/browse/PARQUET-428) - Support INT96 and FIXED_LEN_BYTE_ARRAY types
+*   [PARQUET-601](https://issues.apache.org/jira/browse/PARQUET-601) - Add support in Parquet to configure the encoding used by ValueWriters
+*   [PARQUET-609](https://issues.apache.org/jira/browse/PARQUET-609) - Add Brotli compression to Parquet format
+*   [PARQUET-757](https://issues.apache.org/jira/browse/PARQUET-757) - Add NULL type to Bring Parquet logical types to par with Arrow
+*   [PARQUET-804](https://issues.apache.org/jira/browse/PARQUET-804) - parquet-format README.md still links to the old Google group
+*   [PARQUET-922](https://issues.apache.org/jira/browse/PARQUET-922) - Add index pages to the format to support efficient page skipping
+*   [PARQUET-1049](https://issues.apache.org/jira/browse/PARQUET-1049) - Make thrift version a property in pom.xml
+
+#### Task
+
+*   [PARQUET-450](https://issues.apache.org/jira/browse/PARQUET-450) - Small typos/issues in parquet-format documentation
+*   [PARQUET-667](https://issues.apache.org/jira/browse/PARQUET-667) - Update committers lists to point to apache website
+*   [PARQUET-1124](https://issues.apache.org/jira/browse/PARQUET-1124) - Add new compression codecs to the Parquet spec
+*   [PARQUET-1125](https://issues.apache.org/jira/browse/PARQUET-1125) - Add UUID logical type
+
 ### Version 2.2.0 ###
 
 * [PARQUET-23](https://issues.apache.org/jira/browse/PARQUET-23): Rename packages and maven coordinates to org.apache
diff --git a/Encodings.md b/Encodings.md
index c4cdf704..9358b137 100644
--- a/Encodings.md
+++ b/Encodings.md
@@ -27,30 +27,30 @@ This file contains the specification of all supported encodings.
 Supported Types: all
 
 This is the plain encoding that must be supported for types.  It is
-intended to be the simplest encoding.  Values are encoded back to back. 
+intended to be the simplest encoding.  Values are encoded back to back.
 
-The plain encoding is used whenever a more efficient encoding can not be used. It 
+The plain encoding is used whenever a more efficient encoding can not be used. It
 stores the data in the following format:
  - BOOLEAN: [Bit Packed](#RLE), LSB first
  - INT32: 4 bytes little endian
  - INT64: 8 bytes little endian
- - INT96: 12 bytes little endian
+ - INT96: 12 bytes little endian (deprecated)
  - FLOAT: 4 bytes IEEE little endian
  - DOUBLE: 8 bytes IEEE little endian
  - BYTE_ARRAY: length in 4 bytes little endian followed by the bytes contained in the array
  - FIXED_LEN_BYTE_ARRAY: the bytes contained in the array
 
 For native types, this outputs the data as little endian. Floating
-    point types are encoded in IEEE.  
+    point types are encoded in IEEE.
 
 For the byte array type, it encodes the length as a 4 byte little
 endian, followed by the bytes.
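
As a rough illustration of the byte layouts listed above, here is a minimal Java sketch (illustrative only, not part of this patch) of PLAIN-encoding an INT32 and a BYTE_ARRAY value:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch of PLAIN encoding (not from the Parquet codebase):
// numbers are fixed-width little endian, BYTE_ARRAY is a 4-byte little-endian
// length followed by the raw bytes.
public class PlainEncodingSketch {
  static byte[] plainInt32(int value) {
    return ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value).array();
  }

  static byte[] plainByteArray(byte[] value) {
    return ByteBuffer.allocate(4 + value.length)
        .order(ByteOrder.LITTLE_ENDIAN)
        .putInt(value.length) // 4-byte little-endian length
        .put(value)           // followed by the bytes of the array
        .array();
  }
}
```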
 
-### Dictionary Encoding (PLAIN_DICTIONARY = 2)
-The dictionary encoding builds a dictionary of values encountered in a given column. The 
+### Dictionary Encoding (PLAIN_DICTIONARY = 2 and RLE_DICTIONARY = 8)
+The dictionary encoding builds a dictionary of values encountered in a given column. The
 dictionary will be stored in a dictionary page per column chunk. The values are stored as integers
 using the [RLE/Bit-Packing Hybrid](#RLE) encoding. If the dictionary grows too big, whether in size
-or number of distinct values, the encoding will fall back to the plain encoding. The dictionary page is 
+or number of distinct values, the encoding will fall back to the plain encoding. The dictionary page is
 written first, before the data pages of the column chunk.
 
 Dictionary page format: the entries in the dictionary - in dictionary order - using the [plain](#PLAIN) encoding.
@@ -58,22 +58,28 @@ Dictionary page format: the entries in the dictionary - in dictionary order - us
 Data page format: the bit width used to encode the entry ids stored as 1 byte (max bit width = 32),
 followed by the values encoded using RLE/Bit packed described above (with the given bit width).
 
+Using the PLAIN_DICTIONARY enum value is deprecated in the Parquet 2.0 specification. Prefer using RLE_DICTIONARY
+in a data page and PLAIN in a dictionary page for Parquet 2.0+ files.
+
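For illustration only (not part of this patch), a sketch of how the value section of a dictionary-encoded data page is laid out; `writeRleBitPackedHybrid` is a hypothetical stand-in for the hybrid encoding described in the next section:

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch only: layout of the value section of a dictionary-encoded
// data page. writeRleBitPackedHybrid() is a hypothetical stand-in for the
// RLE/Bit-Packing Hybrid encoding described in the next section.
public class DictionaryDataPageSketch {
  static byte[] encodeDictionaryIds(int[] ids, int dictionarySize) {
    // bit width needed to represent the largest possible dictionary id
    int bitWidth = 32 - Integer.numberOfLeadingZeros(Math.max(dictionarySize - 1, 1));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(bitWidth);                          // 1 byte: bit width (max 32)
    writeRleBitPackedHybrid(out, ids, bitWidth);  // RLE/bit-packed entry ids
    return out.toByteArray();
  }

  static void writeRleBitPackedHybrid(ByteArrayOutputStream out, int[] ids, int bitWidth) {
    // see the RLE/Bit-Packing Hybrid section below
  }
}
```
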
 ### <a name="RLE"></a>Run Length Encoding / Bit-Packing Hybrid (RLE = 3)
+
 This encoding uses a combination of bit-packing and run length encoding to more efficiently store repeated values.
 
 The grammar for this encoding looks like this, given a fixed bit-width known in advance:
 ```
 rle-bit-packed-hybrid: <length> <encoded-data>
-length := length of the <encoded-data> in bytes stored as 4 bytes little endian
+length := length of the <encoded-data> in bytes stored as 4 bytes little endian (unsigned int32)
 encoded-data := <run>*
-run := <bit-packed-run> | <rle-run>  
-bit-packed-run := <bit-packed-header> <bit-packed-values>  
-bit-packed-header := varint-encode(<bit-pack-count> << 1 | 1)  
-// we always bit-pack a multiple of 8 values at a time, so we only store the number of values / 8  
-bit-pack-count := (number of values in this run) / 8  
-bit-packed-values := *see 1 below*  
-rle-run := <rle-header> <repeated-value>  
-rle-header := varint-encode( (number of times repeated) << 1)  
+run := <bit-packed-run> | <rle-run>
+bit-packed-run := <bit-packed-header> <bit-packed-values>
+bit-packed-header := varint-encode(<bit-pack-scaled-run-len> << 1 | 1)
+// we always bit-pack a multiple of 8 values at a time, so we only store the number of values / 8
+bit-pack-scaled-run-len := (bit-packed-run-len) / 8
+bit-packed-run-len := *see 3 below*
+bit-packed-values := *see 1 below*
+rle-run := <rle-header> <repeated-value>
+rle-header := varint-encode( (rle-run-len) << 1)
+rle-run-len := *see 3 below*
 repeated-value := value that is repeated, using a fixed-width of round-up-to-next-byte(bit-width)
 ```
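
A minimal Java sketch of the two run headers defined by this grammar (illustrative only, not part of this patch; the method names are not taken from any Parquet implementation):

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of the run headers defined by the grammar above.
public class RleHybridHeaderSketch {

  // varint-encode(): ULEB-128, 7 bits per byte starting with the least
  // significant bits, high bit set on every byte except the last.
  static void writeUleb128(ByteArrayOutputStream out, int value) {
    while ((value & ~0x7F) != 0) {
      out.write((value & 0x7F) | 0x80);
      value >>>= 7;
    }
    out.write(value);
  }

  // rle-header := varint-encode(rle-run-len << 1)
  static void writeRleRunHeader(ByteArrayOutputStream out, int runLength) {
    writeUleb128(out, runLength << 1);
  }

  // bit-packed-header := varint-encode((bit-packed-run-len / 8) << 1 | 1)
  // bit-packed-run-len must be a multiple of 8 values.
  static void writeBitPackedRunHeader(ByteArrayOutputStream out, int runLength) {
    writeUleb128(out, (runLength / 8) << 1 | 1);
  }
}
```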
 
@@ -82,14 +88,14 @@ repeated-value := value that is repeated, using a fixed-width of round-up-to-nex
    though the order of the bits in each value remains in the usual order of most significant to least
    significant. For example, to pack the same values as the example in the deprecated encoding above:
 
-   The numbers 1 through 7 using bit width 3:  
+   The numbers 1 through 7 using bit width 3:
    ```
    dec value: 0   1   2   3   4   5   6   7
    bit value: 000 001 010 011 100 101 110 111
    bit label: ABC DEF GHI JKL MNO PQR STU VWX
    ```
-   
-   would be encoded like this where spaces mark byte boundaries (3 bytes):  
+
+   would be encoded like this where spaces mark byte boundaries (3 bytes):
    ```
    bit value: 10001000 11000110 11111010
    bit label: HIDEFABC RMNOJKLG VWXSTUPQ
@@ -101,9 +107,24 @@ repeated-value := value that is repeated, using a fixed-width of round-up-to-nex
    shifting and ORing with a mask. (to make this optimization work on a big-endian machine,
    you would have to use the ordering used in the [deprecated bit-packing](#BITPACKED) encoding)
 
-2. varint-encode() is ULEB-128 encoding, see http://en.wikipedia.org/wiki/Variable-length_quantity
+2. varint-encode() is ULEB-128 encoding, see https://en.wikipedia.org/wiki/LEB128
+
+3. bit-packed-run-len and rle-run-len must be in the range \[1, 2<sup>31</sup> - 1\].
+   This means that a Parquet implementation can always store the run length in a signed
+   32-bit integer. This length restriction was not part of the Parquet 2.5.0 and earlier
+   specifications, but longer runs were not readable by the most common Parquet
+   implementations so, in practice, were not safe for Parquet writers to emit.
+
+
+Note that the RLE encoding method is only supported for the following types of
+data:
+
+* Repetition and definition levels
+* Dictionary indices
+* Boolean values in data pages, as an alternative to PLAIN encoding
 
 ### <a name="BITPACKED"></a>Bit-packed (Deprecated) (BIT_PACKED = 4)
+
 This is a bit-packed only encoding, which is deprecated and will be replaced by the [RLE/bit-packing](#RLE) hybrid encoding.
 Each value is encoded back to back using a fixed width.
 There is no padding between values (except for the last byte) which is padded with 0s.
@@ -114,18 +135,21 @@ This implementation is deprecated because the [RLE/bit-packing](#RLE) hybrid is
 For compatibility reasons, this implementation packs values from the most significant bit to the least significant bit,
 which is not the same as the [RLE/bit-packing](#RLE) hybrid.
 
-For example, the numbers 1 through 7 using bit width 3:  
+For example, the numbers 1 through 7 using bit width 3:
 ```
 dec value: 0   1   2   3   4   5   6   7
 bit value: 000 001 010 011 100 101 110 111
 bit label: ABC DEF GHI JKL MNO PQR STU VWX
 ```
-would be encoded like this where spaces mark byte boundaries (3 bytes):  
+would be encoded like this where spaces mark byte boundaries (3 bytes):
 ```
 bit value: 00000101 00111001 01110111
 bit label: ABCDEFGH IJKLMNOP QRSTUVWX
 ```
 
+Note that the BIT_PACKED encoding method is only supported for encoding
+repetition and definition levels.
+
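For comparison with the hybrid encoding's LSB-first packing, a small illustrative sketch (not part of this patch) of the deprecated MSB-first packing; for the values 0 through 7 at bit width 3 it produces the bytes 0x05 0x39 0x77 shown above:

```java
// Illustrative sketch of the deprecated MSB-first bit packing described above;
// the RLE/bit-packing hybrid instead packs each value LSB-first within a byte.
public class MsbFirstBitPackingSketch {
  static byte[] packMsbFirst(int[] values, int bitWidth) {
    byte[] out = new byte[(values.length * bitWidth + 7) / 8];
    int bitPos = 0;                                   // counted from the MSB of out[0]
    for (int v : values) {
      for (int b = bitWidth - 1; b >= 0; b--) {       // most significant bit of the value first
        if (((v >>> b) & 1) != 0) {
          out[bitPos / 8] |= 1 << (7 - (bitPos % 8)); // fill each byte from MSB to LSB
        }
        bitPos++;
      }
    }
    return out;
  }
}
```
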
 ### <a name="DELTAENC"></a>Delta Encoding (DELTA_BINARY_PACKED = 5)
 Supported Types: INT32, INT64
 
@@ -141,7 +165,7 @@ The header is defined as follows:
  * the total value count is stored as a VLQ int
  * the first value is stored as a zigzag VLQ int
 
-Each block contains 
+Each block contains
 ```
 <min delta> <list of bitwidths of miniblocks> <miniblocks>
 ```
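
A minimal sketch (illustrative only, not part of this patch) of the zigzag VLQ encoding used above for the first value and the min deltas:

```java
import java.io.ByteArrayOutputStream;

// Illustrative sketch of zigzag + VLQ encoding; not taken from the Parquet codebase.
public class ZigZagVlqSketch {
  // ZigZag maps signed values to unsigned ones so that small negative
  // numbers stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
  static long zigzag(long value) {
    return (value << 1) ^ (value >> 63);
  }

  // VLQ/ULEB-128: 7 bits per byte, least significant group first,
  // continuation bit set on all bytes except the last.
  static void writeVlq(ByteArrayOutputStream out, long value) {
    while ((value & ~0x7FL) != 0) {
      out.write((int) ((value & 0x7F) | 0x80));
      value >>>= 7;
    }
    out.write((int) value);
  }
}
```
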
@@ -230,7 +254,7 @@ Supported Types: BYTE_ARRAY
 This is also known as incremental encoding or front compression: for each element in a
 sequence of strings, store the prefix length of the previous entry plus the suffix.
 
-For a longer description, see http://en.wikipedia.org/wiki/Incremental_encoding.
+For a longer description, see https://en.wikipedia.org/wiki/Incremental_encoding.
 
 This is stored as a sequence of delta-encoded prefix lengths (DELTA_BINARY_PACKED), followed by
-the suffixes encoded as delta length byte arrays (DELTA_LENGTH_BYTE_ARRAY). 
+the suffixes encoded as delta length byte arrays (DELTA_LENGTH_BYTE_ARRAY).
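
A minimal sketch (illustrative only, not part of this patch) of the front-coding step described above, shown on strings for readability even though Parquet operates on byte arrays; encoding the prefix lengths and suffixes with the two delta encodings is omitted:

```java
import java.util.List;

// Illustrative sketch of front coding: for each value, record the length of
// the prefix shared with the previous value and the remaining suffix.
public class FrontCodingSketch {
  static void frontCode(List<String> values, List<Integer> prefixLengths, List<String> suffixes) {
    String previous = "";
    for (String value : values) {
      int max = Math.min(previous.length(), value.length());
      int prefix = 0;
      while (prefix < max && previous.charAt(prefix) == value.charAt(prefix)) {
        prefix++;
      }
      prefixLengths.add(prefix);              // length shared with the previous entry
      suffixes.add(value.substring(prefix));  // remaining suffix
      previous = value;
    }
  }
}
```
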
diff --git a/KEYS b/KEYS
index d99427de..47079383 100644
--- a/KEYS
+++ b/KEYS
@@ -2,19 +2,17 @@ This file contains the PGP keys of various developers.
 
 Users: pgp < KEYS
   gpg --import KEYS
-Developers: 
+Developers:
   pgp -kxa <your name> and append it to this file.
   (pgpk -ll <your name> && pgpk -xa <your name>) >> this file.
-  (gpg --list-sigs <your name>
+  (gpg --list-sigs --keyid-format long <your name>
     && gpg --armor --export <your name>) >> this file.
 
-pub   2048R/7AE7E47B 2013-04-10 [expires: 2017-04-10]
-uid                  Julien Le Dem <ju...@ledem.net>
-sig 3        7AE7E47B 2013-04-10  Julien Le Dem <ju...@ledem.net>
-sig          D3924CCD 2014-09-08  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sig          71F0F13B 2014-09-08  Tianshuo Deng <td...@twitter.com>
-sub   2048R/03C4E111 2013-04-10 [expires: 2017-04-10]
-sig          7AE7E47B 2013-04-10  Julien Le Dem <ju...@ledem.net>
+pub   2048R/97D7E8647AE7E47B 2013-04-10 [expired: 2017-04-10]
+uid                          Julien Le Dem <ju...@ledem.net>
+sig 3        97D7E8647AE7E47B 2013-04-10  Julien Le Dem <ju...@ledem.net>
+sig          FCB3CBD9D3924CCD 2014-09-08  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sig          7CD8278971F0F13B 2014-09-08  Tianshuo Deng <td...@twitter.com>
 
 -----BEGIN PGP PUBLIC KEY BLOCK-----
 Version: GnuPG v1
@@ -70,13 +68,13 @@ cqCIvQIvmBpPdlyaoglwJ8wWb76uIE6VFcN71FF3EfV51/yUeQGJaoExWLY6IH8x
 Xtn3IWkBWA==
 =xpC8
 -----END PGP PUBLIC KEY BLOCK-----
-pub   2048R/71F0F13B 2013-08-26
-uid                  Tianshuo Deng <td...@twitter.com>
-sig 3        71F0F13B 2013-08-26  Tianshuo Deng <td...@twitter.com>
-sig          D3924CCD 2014-09-08  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sig          7AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
-sub   2048R/0CEDD7ED 2013-08-26
-sig          71F0F13B 2013-08-26  Tianshuo Deng <td...@twitter.com>
+pub   2048R/7CD8278971F0F13B 2013-08-26
+uid                          Tianshuo Deng <td...@twitter.com>
+sig 3        7CD8278971F0F13B 2013-08-26  Tianshuo Deng <td...@twitter.com>
+sig          FCB3CBD9D3924CCD 2014-09-08  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sig          97D7E8647AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
+sub   2048R/F98EFADB0CEDD7ED 2013-08-26
+sig          7CD8278971F0F13B 2013-08-26  Tianshuo Deng <td...@twitter.com>
 
 -----BEGIN PGP PUBLIC KEY BLOCK-----
 Version: GnuPG v1
@@ -125,19 +123,19 @@ woq9HFeHZ5sSVQ56GOwENRvjCeqDOTmbwus0MIymrcs3yHC6O0UEHPlpHzePNLJl
 /Otnj+wHn/N9LAoxr7gDpu/cpBFiPSLD189FCbU15FnFVAEuC5Vd9Y/3IhMwvg==
 =Gd1/
 -----END PGP PUBLIC KEY BLOCK-----
-pub   1024D/4318F669 2009-06-30
-uid                  Tom White (CODE SIGNING KEY) <to...@apache.org>
-sig          68E327C1 2010-09-23  [User ID not found]
-sig          A7239D59 2010-09-23  Doug Cutting (Lucene guy) <cu...@apache.org>
-sig          299EB32C 2010-09-25  [User ID not found]
-sig          AEC77EAF 2010-09-27  [User ID not found]
-sig          1F27E622 2010-10-27  [User ID not found]
-sig 3        4318F669 2009-06-30  Tom White (CODE SIGNING KEY) <to...@apache.org>
-sig          C987200D 2010-10-02  [User ID not found]
-sig          3D0C92B9 2010-09-24  [User ID not found]
-sig          D3924CCD 2014-09-04  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sub   2048g/BAEBF3E3 2009-06-30
-sig          4318F669 2009-06-30  Tom White (CODE SIGNING KEY) <to...@apache.org>
+pub   1024D/4FB955854318F669 2009-06-30
+uid                          Tom White (CODE SIGNING KEY) <to...@apache.org>
+sig          E22A746A68E327C1 2010-09-23  [User ID not found]
+sig          DBAF69BEA7239D59 2010-09-23  [User ID not found]
+sig          E952F459299EB32C 2010-09-25  [User ID not found]
+sig          5E43CAB9AEC77EAF 2010-09-27  [User ID not found]
+sig          220F69801F27E622 2010-10-27  [User ID not found]
+sig 3        4FB955854318F669 2009-06-30  Tom White (CODE SIGNING KEY) <to...@apache.org>
+sig          2C89EE98C987200D 2010-10-02  [User ID not found]
+sig          1209E7F13D0C92B9 2010-09-24  [User ID not found]
+sig          FCB3CBD9D3924CCD 2014-09-04  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sub   2048g/A306EFF1BAEBF3E3 2009-06-30
+sig          4FB955854318F669 2009-06-30  Tom White (CODE SIGNING KEY) <to...@apache.org>
 
 -----BEGIN PGP PUBLIC KEY BLOCK-----
 Version: GnuPG v1
@@ -210,19 +208,19 @@ yDChOIyfrt/T8ooJQVaI8IhJBBgRAgAJBQJKSfejAhsMAAoJEE+5VYVDGPZpviQA
 njVeVF9MewkYAYXYwxDQs6J+KIx4AJ9xqFuYD+KbUSGjAUcDyaJPufpZng==
 =kUv7
 -----END PGP PUBLIC KEY BLOCK-----
-pub   4096R/D3924CCD 2014-08-13
-uid                  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sig 3        D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sig          4318F669 2014-09-04  Tom White (CODE SIGNING KEY) <to...@apache.org>
-sig          7AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
-uid                  Ryan Blue <bl...@apache.org>
-sig 3        D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sig          4318F669 2014-09-04  Tom White (CODE SIGNING KEY) <to...@apache.org>
-sig          7AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
-sub   4096R/A8B58800 2014-08-13
-sig          D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
-sub   4096R/A4B2E9B5 2014-08-13
-sig          D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+pub   4096R/FCB3CBD9D3924CCD 2014-08-13
+uid                          Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sig 3        FCB3CBD9D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sig          4FB955854318F669 2014-09-04  Tom White (CODE SIGNING KEY) <to...@apache.org>
+sig          97D7E8647AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
+uid                          Ryan Blue <bl...@apache.org>
+sig 3        FCB3CBD9D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sig          4FB955854318F669 2014-09-04  Tom White (CODE SIGNING KEY) <to...@apache.org>
+sig          97D7E8647AE7E47B 2014-09-08  Julien Le Dem <ju...@ledem.net>
+sub   4096R/F16C5528A8B58800 2014-08-13
+sig          FCB3CBD9D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
+sub   4096R/86781D4FA4B2E9B5 2014-08-13
+sig          FCB3CBD9D3924CCD 2014-08-13  Ryan Blue (CODE SIGNING KEY) <bl...@apache.org>
 
 -----BEGIN PGP PUBLIC KEY BLOCK-----
 Version: GnuPG v1
@@ -337,4 +335,64 @@ rEx//hthc5qG2W49kASK+2sK0gIqeHEkCBudcdH8rpfoIXx7cRfR3Pk+3o5GrZVf
 BM83UyGyWEjVQCR3/E/ag0jKwmsnlX6ofGFfS6xSqKK+H/FoLsbI23dS4o6bF4QA
 HT2hxY8ondF9eKU5rnzLGRFYmm1+Pw==
 =gSQT
+
 -----END PGP PUBLIC KEY BLOCK-----
+pub   4096R/90DE59A3 2018-03-23
+uid                  Zoltan Ivanfi (CODE SIGNING KEY) <zi...@apache.org>
+sig 3        90DE59A3 2018-03-23  Zoltan Ivanfi (CODE SIGNING KEY) <zi...@apache.org>
+sub   4096R/5842E3B5 2018-03-23
+sig          90DE59A3 2018-03-23  Zoltan Ivanfi (CODE SIGNING KEY) <zi...@apache.org>
+
+-----BEGIN PGP PUBLIC KEY BLOCK-----
+Version: GnuPG v1
+
+mQINBFq1Ew4BEADHh5yEROn9b0g2iVFdNeSNBidHKuErYQReqWWEYfReRL5gu8OX
+AePJyIC94inupY38vt6yxj9oQzoSwbSP9jRJODGH2AMxbZhMHqrfrAJLBVYHmv8x
+J8BP1lG/A0TVkQTTSkysKllWcz+QJB8sz5EksLOOTp/hFjJrGMntzmM94wJorCo7
+9kGksY195WJEYaFGwf5ZRbYksPj8c6il45b5eFxAZ1H3cNoCZDAMxVDayezY81Do
+MBHfdZO6/scZ13KDGO0zHXFHxp44AZIyCbqB09QRz7RPlrrUiHa4oV8gJEav8BqV
+833m0ajfncpeqtyLoQ2bweRPdc7WokhqgwFx/5YIXTE7xrEECxzFv0n2Ekg2na1K
+Z/uf7B5rduoNGNvuf/M6ySdzSfHV0Q7/oYXeUaFRqHlVtH4+HMxKt/oOlAxRsnRf
+6NjtxRd93u2WJarUK2tGyo+KcNck+0/W8s987WwhYXnMq8YgP/YhPD0Zw8A4axOa
+wrhZ8SePEtLTffk3h5uJDQZdzopONVLvmufvbvUL1vqYQ6bTM6C06FurQfI3aJA9
+b3Vlr/JkZI2gmfLmQ4ReJsC1XfZ1IVjibzvyi0njIvlTQhMd5qluBbKlFRcf2S15
+Fn1WRX1gNSeZdpEbR62NcAnqgIycuYPVDhfs9fm+Ogd7mRfCrhpOvIMCFQARAQAB
+tDVab2x0YW4gSXZhbmZpIChDT0RFIFNJR05JTkcgS0VZKSA8eml2YW5maUBhcGFj
+aGUub3JnPokCNwQTAQIAIQIbAwIeAQIXgAUCWrUZrAULCQgHAwUVCgkICwUWAgMB
+AAAKCRDzAcr1kN5Zo4PqEACGahN0HtTbt1kJhtYS3nMwQYTI73PjL5QSWqHlTdNx
+OfjRU5jMjaNpeNwjdx6hxLp/KnI5DZR+19MwA5trUQ3ZEAYkCqU19dmfaIB9rsVv
+JMeLXNLuSv11reOrvLYFs8AcWzwIzhPBNz4q9xZqloVE4aCsRqm25xpJae5a8eDG
+mPZdbjIBSD6Na+hai9l2egNQdYbvzD6Qydb4XDq8Se3RMq05f2RLOTYId8qb4inD
+es1jQi+apUDSZ+WIL7C5UtS6nlzXDnXQtIfHfJsAJl2IW91b6wnoJPMlHtt+3BJg
+82nI8XGIEeDRQGGhLC/ZfkWc5OXapOhDhYykxuGBurvLzq7dPp+iJcs5F1W4PX7I
+xzZD/2x23G/Eg09DmVWYkeKeh3HmwqcDbYN0ApgrUmRuwueAqXvhoEe6kxaZcLrj
+otDSmZD0vECOadhOgst0kYHdFCgQL5MoPQqJNHZDPsciq7WiiAU9aF9DtWJy+6Zb
+0b5TyaCoT4RaqdJj7AU5bR44BYwHwVTy55UEsa8jZxyvK4kGPFgXwqPwW+lxteiv
+k3edHALBEdVZEFs2+xmiz0ns3F4QZHdj9qBG5GGw3jf9iKBDqaerIviEdJS1/yzm
+u980v7jcpOwg2ZsyTKh/PFmUO8tDHszj68RbhPzdBPNXpXhtEYSdOfSOdeK/g87c
+L7kCDQRatRMOARAAzSPx83m+FbeODkApJreD7A14rlT+gMsMaQTapjD5XDHmuS42
+sO4PtV4pGAD4q/KnZzorV2u9tcRxteinALcCoKlP7PoB87tpqUELLkUwgDZjNfNz
+/GipyJFSdcT2waBY+/03bVpthceCxIV3b6xTm2owrJgS0Exd0b21X3zELKiV9UC6
+Pjtd1qLsKgf6N+RvIbT8De2CrFzyy+iISvnZTFMEDE9rnkXuwY93OLtOHjW9rncp
+x2aLYmxuoUh8fKZTcWTXe/uG7/elED08aUwb8JINjSNTYBugs/2OTOpKW3jbti0h
+GOGk/AD+sKNndTG66/nYD5ED6NW0/NleHCDNO+vh0vzjSds08daotj21Z/2sWY06
+qxYGOkTEQy4i0DyTxylxxvPk+c5pTIHupcLsRjmjl3J45vPANnkj4lkNMTdlkabJ
+P2lglwOV+fmW+nxGmW/83AxvNun1dMrHCV5oZXIR5eblyHGMwBpzonl7kOFTIagG
+wcJJK/erJxvFOdAYuiXkq51/DxlK5KNBIT/G1U71EzFRCU/jK+rdI+fAMmoiJ794
+F2PTQwF5NxEr28lM6qOC1QjF5gxVAQU2N6klP5R2Ir1OrIo6RFrhWO+j1AGnUYjE
+zcKLf/DuNzGkO1CTp25Z2mROHSc9vdhSm17EcfCzSPKIrCjkEKeW6Xi7N98AEQEA
+AYkCHwQYAQIACQIbDAUCWrUYgwAKCRDzAcr1kN5Zo5jVD/0UUCdJL4rEQ0PfQoMs
+Gtxx0xMl4ASQQM4ENVBPIzfhXMe3g9iRZkOrNAuRF2KZ3Hr1ekfM4FtcOX4ZGB7t
+TL9ai0QIWJYHj7eWQIpno1sHIQQhx0VpA2Av4gxVdfR7aL3O+rm7QLZU2TPXWd3o
+wiBn3BnWKgv0j6XmvWH1Yn13OpFuWjt+QEcE2W0wNg8MP7J+fz3XjC84BucMnBQv
+hgz7WkFATnWfwwDm+UB3pmibTqC/Kvia/GZzWrwGc/v73XckxnALMfUXV35KHAY4
+YXaLDrHu3h5SnXdoKFnyBkHwFZFlFYWSt47SYpeYvaWDUF1aplMXgH/xYoySeGMt
+2GL0xZKE9SI2xwNblqR2dmTOfTjO9HnkI6fYW4VuulBrp850DAWDaluKGoggQaq0
+t7qTBxOB4xA9tci9x347Oeq1QnBJZJnkOnEqY56GVG/0ACyemVaPNEg+0B/sD4Uq
+3JyQhtn/+UAlyL8Qg98ExOXqVMGK2+wo9P3aZJbR/TCjmNEsPJPWIITVxVHrr5is
+3Y4InJ6F8pt4etNyRtreOA7OpJfL4z2fYgtxPeOeSkKtI8/hU/x7pbJP40PKiNog
+EHa3g1YBk2sRqia3cCVZDEYjLymiJAUnyCWMktGWajs+931V44QSGGM+vWi/DauA
+VHP5p3w+PsIm1Xf2o1gQl2N2rA==
+=a8/z
+-----END PGP PUBLIC KEY BLOCK-----
+
diff --git a/LogicalTypes.md b/LogicalTypes.md
index 29cf5272..762769e7 100644
--- a/LogicalTypes.md
+++ b/LogicalTypes.md
@@ -32,12 +32,34 @@ This file contains the specification for all logical types.
 The parquet format's `ConvertedType` stores the type annotation. The annotation
 may require additional metadata fields, as well as rules for those fields.
 
-### UTF8 (Strings)
+## String Types
+
+### UTF8
 
 `UTF8` may only be used to annotate the binary primitive type and indicates
 that the byte array should be interpreted as a UTF-8 encoded character string.
 
-The sort order used for `UTF8` strings is `UNSIGNED` byte-wise comparison.
+The sort order used for `UTF8` strings is unsigned byte-wise comparison.
+
+### ENUM
+
+`ENUM` annotates the binary primitive type and indicates that the value
+was converted from an enumerated type in another data model (e.g. Thrift, Avro, Protobuf).
+Applications using a data model lacking a native enum type should interpret `ENUM`
+annotated fields as UTF-8 encoded strings.
+
+The sort order used for `ENUM` values is unsigned byte-wise comparison.
+
+### UUID
+
+`UUID` annotates a 16-byte fixed-length binary. The value is encoded using
+big-endian, so that `00112233-4455-6677-8899-aabbccddeeff` is encoded as the
+bytes `00 11 22 33 44 55 66 77 88 99 aa bb cc dd ee ff`
+(This example is from [wikipedia's UUID page][wiki-uuid]).
+
+The sort order used for `UUID` values is unsigned byte-wise comparison.
+
+[wiki-uuid]: https://en.wikipedia.org/wiki/Universally_unique_identifier
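
A minimal Java sketch (illustrative only, not part of this patch) of producing the 16-byte big-endian value described above from a `java.util.UUID`:

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Illustrative sketch: a UUID as the 16-byte big-endian FIXED_LEN_BYTE_ARRAY
// value described above. ByteBuffer is big-endian by default.
public class UuidEncodingSketch {
  static byte[] toParquetUuidBytes(UUID uuid) {
    ByteBuffer buffer = ByteBuffer.allocate(16);
    buffer.putLong(uuid.getMostSignificantBits());
    buffer.putLong(uuid.getLeastSignificantBits());
    return buffer.array();
  }
}
```

For the example above, the most significant bits are 0x0011223344556677 and the least significant bits are 0x8899aabbccddeeff, so the buffer holds exactly the bytes listed.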
 
 ## Numeric Types
 
@@ -57,7 +79,7 @@ allows.
 implied by the `int32` and `int64` primitive types if no other annotation is
 present and should be considered optional.
 
-The sort order used for signed integer types is `SIGNED`.
+The sort order used for signed integer types is signed.
 
 ### Unsigned Integers
 
@@ -74,7 +96,7 @@ allows.
 `UINT_8`, `UINT_16`, and `UINT_32` must annotate an `int32` primitive type and
 `UINT_64` must annotate an `int64` primitive type.
 
-The sort order used for unsigned integer types is `UNSIGNED`.
+The sort order used for unsigned integer types is unsigned.
 
 ### DECIMAL
 
@@ -104,8 +126,8 @@ integer. A precision too large for the underlying type (see below) is an error.
 A `SchemaElement` with the `DECIMAL` `ConvertedType` must also have both
 `scale` and `precision` fields set, even if scale is 0 by default.
 
-The sort order used for `DECIMAL` values is `SIGNED`. The order is equivalent
-to signed comparison of decimal values.
+The sort order used for `DECIMAL` values is signed comparison of the represented
+value.
 
 If the column uses `int32` or `int64` physical types, then signed comparison of
 the integer values produces the correct ordering. If the physical type is
@@ -121,7 +143,7 @@ comparison.
 annotate an `int32` that stores the number of days from the Unix epoch, 1
 January 1970.
 
-The sort order used for `DATE` is `SIGNED`.
+The sort order used for `DATE` is signed.
 
 ### TIME\_MILLIS
 
@@ -129,7 +151,7 @@ The sort order used for `DATE` is `SIGNED`.
 without a date. It must annotate an `int32` that stores the number of
 milliseconds after midnight.
 
-The sort order used for `TIME\_MILLIS` is `SIGNED`.
+The sort order used for `TIME\_MILLIS` is signed.
 
 ### TIME\_MICROS
 
@@ -137,7 +159,7 @@ The sort order used for `TIME\_MILLIS` is `SIGNED`.
 without a date. It must annotate an `int64` that stores the number of
 microseconds after midnight.
 
-The sort order used for `TIME\_MICROS` is `SIGNED`.
+The sort order used for `TIME\_MICROS` is signed.
 
 ### TIMESTAMP\_MILLIS
 
@@ -145,7 +167,7 @@ The sort order used for `TIME\_MICROS` is `SIGNED`.
 millisecond precision. It must annotate an `int64` that stores the number of
 milliseconds from the Unix epoch, 00:00:00.000 on 1 January 1970, UTC.
 
-The sort order used for `TIMESTAMP\_MILLIS` is `SIGNED`.
+The sort order used for `TIMESTAMP\_MILLIS` is signed.
 
 ### TIMESTAMP\_MICROS
 
@@ -153,7 +175,7 @@ The sort order used for `TIMESTAMP\_MILLIS` is `SIGNED`.
 microsecond precision. It must annotate an `int64` that stores the number of
 microseconds from the Unix epoch, 00:00:00.000000 on 1 January 1970, UTC.
 
-The sort order used for `TIMESTAMP\_MICROS` is `SIGNED`.
+The sort order used for `TIMESTAMP\_MICROS` is signed.
 
 ### INTERVAL
 
@@ -169,8 +191,9 @@ example, there is no requirement that a large number of days should be
 expressed as a mix of months and days because there is not a constant
 conversion from days to months.
 
-The sort order used for `INTERVAL` is `UNSIGNED`, produced by sorting by
-the value of months, then days, then milliseconds with unsigned comparison.
+The sort order used for `INTERVAL` is undefined. When writing data, no min/max
+statistics should be saved for this type and if such non-compliant statistics
+are found during reading, they must be ignored.
 
 ## Embedded Types
 
@@ -184,6 +207,8 @@ string of valid JSON as defined by the [JSON specification][json-spec]
 
 [json-spec]: http://json.org/
 
+The sort order used for `JSON` is unsigned byte-wise comparison.
+
 ### BSON
 
 `BSON` is used for an embedded BSON document. It must annotate a `binary`
@@ -192,6 +217,8 @@ defined by the [BSON specification][bson-spec].
 
 [bson-spec]: http://bsonspec.org/spec.html
 
+The sort order used for `BSON` is unsigned byte-wise comparison.
+
 ## Nested Types
 
 This section specifies how `LIST` and `MAP` can be used to encode nested types
diff --git a/Makefile b/Makefile
index bb682803..17750c12 100644
--- a/Makefile
+++ b/Makefile
@@ -17,7 +17,14 @@
 # under the License.
 #
 
+.PHONY: doc
+
 thrift:
 	mkdir -p generated
-	thrift --gen cpp -o generated src/thrift/parquet.thrift 
-	thrift --gen java -o generated src/thrift/parquet.thrift 
+	thrift --gen cpp -o generated src/main/thrift/parquet.thrift
+	thrift --gen java -o generated src/main/thrift/parquet.thrift
+
+%.html: %.md
+	pandoc -f markdown_github -t html -o $@ $<
+
+doc: README.html PageIndex.html LogicalTypes.html
diff --git a/PageIndex.md b/PageIndex.md
new file mode 100644
index 00000000..7ac6e423
--- /dev/null
+++ b/PageIndex.md
@@ -0,0 +1,101 @@
+<!--
+  - Licensed to the Apache Software Foundation (ASF) under one
+  - or more contributor license agreements.  See the NOTICE file
+  - distributed with this work for additional information
+  - regarding copyright ownership.  The ASF licenses this file
+  - to you under the Apache License, Version 2.0 (the
+  - "License"); you may not use this file except in compliance
+  - with the License.  You may obtain a copy of the License at
+  -
+  -   http://www.apache.org/licenses/LICENSE-2.0
+  -
+  - Unless required by applicable law or agreed to in writing,
+  - software distributed under the License is distributed on an
+  - "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  - KIND, either express or implied.  See the License for the
+  - specific language governing permissions and limitations
+  - under the License.
+  -->
+
+# ColumnIndex Layout to Support Page Skipping
+
+This document describes the format for column index pages in the Parquet
+footer. These pages contain statistics for DataPages and can be used to skip
+pages when scanning data in ordered and unordered columns.
+
+## Problem Statement
+In previous versions of the format, Statistics are stored for ColumnChunks in
+ColumnMetaData and for individual pages inside DataPageHeader structs. When
+reading pages, a reader had to process the page header in order to determine
+whether the page could be skipped based on the statistics. This means the reader
+had to access all pages in a column, thus likely reading most of the column
+data from disk.
+
+## Goals
+1. Make both range scans and point lookups I/O efficient by allowing direct
+   access to pages based on their min and max values. In particular:
+    * A single-row lookup in a rowgroup based on the sort column of that
+      rowgroup will only read one data page per retrieved column.
+    * Range scans on the sort column will only need to read the exact data
+      pages that contain relevant data.
+    * Make other selective scans I/O efficient: if we have a very selective
+      predicate on a non-sorting column, for the other retrieved columns we
+      should only need to access data pages that contain matching rows.
+2. No additional decoding effort for scans without selective predicates, e.g.,
+   full-row group scans. If a reader determines that it does not need to read
+   the index data, it does not incur any overhead.
+3. Index pages for sorted columns use minimal storage by storing only the
+   boundary elements between pages.
+
+## Non-Goals
+* Support for the equivalent of secondary indices, i.e., an index structure
+  sorted on the key values over non-sorted data.
+
+
+## Technical Approach
+
+We add two new per-column structures to the row group metadata:
+* ColumnIndex: this allows navigation to the pages of a column based on column
+  values and is used to locate data pages that contain matching values for a
+  scan predicate
+* OffsetIndex: this allows navigation by row index and is used to retrieve
+  values for rows identified as matches via the ColumnIndex. Once rows of a
+  column are skipped, the corresponding rows in the other columns have to be
+  skipped. Hence the OffsetIndexes for each column in a RowGroup are stored
+  together.
+
+The new index structures are stored separately from RowGroup, near the footer,
+so that a reader does not have to pay the I/O and deserialization cost for
+reading them if it is not doing selective scans. The index structures'
+location and length are stored in ColumnChunk.
+
+ ![Page Index Layout](doc/images/PageIndexLayout.png)
+
+Some observations:
+* We don't need to record the lower bound for the first page and the upper
+  bound for the last page, because the row group Statistics can provide that.
+  We still include those for the sake of uniformity, and the overhead should be
+  negligible.
+* We store lower and upper bounds for the values of each page. These may be the
+  actual minimum and maximum values found on a page, but can also be (more
+  compact) values that do not exist on a page. For example, instead of storing
+  ""Blart Versenwald III", a writer may set `min_values[i]="B"`,
+  "Blart Versenwald III", a writer may set `min_values[i]="B"`,
+  should use this to enforce some reasonable bound on the size of the index
+  structures.
+* Readers that support ColumnIndex should not also use page statistics. The
+  only reason to write page-level statistics when writing ColumnIndex structs
+  is to support older readers (not recommended).
+
+For ordered columns, this allows a reader to find matching pages by performing
+a binary search in `min_values` and `max_values`. For unordered columns, a
+reader can find matching pages by sequentially reading `min_values` and
+`max_values`.
+
+For range scans this approach can be extended to return ranges of rows, page
+indices, and page offsets to scan in each column. The reader can then
+initialize a scanner for each column and fast forward them to the start row of
+the scan.
+
+
+
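A minimal sketch (illustrative only, not part of this patch) of the sequential matching described above for an equality predicate. The bounds are shown as already-decoded comparable values, whereas the real `min_values`/`max_values` are binary and must be compared according to the column's sort order; for ordered columns the same bounds admit a binary search instead.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: find candidate pages for an equality predicate using
// per-page lower/upper bounds and the per-page null flags.
public class ColumnIndexScanSketch {
  static <T extends Comparable<T>> List<Integer> candidatePages(
      List<T> minValues, List<T> maxValues, List<Boolean> nullPages, T predicateValue) {
    List<Integer> candidates = new ArrayList<>();
    for (int i = 0; i < minValues.size(); i++) {
      if (nullPages.get(i)) {
        continue; // page contains only nulls, cannot match an equality predicate
      }
      // Bounds may be truncated values that do not occur on the page, so this
      // test can yield false positives but never false negatives.
      if (minValues.get(i).compareTo(predicateValue) <= 0
          && maxValues.get(i).compareTo(predicateValue) >= 0) {
        candidates.add(i);
      }
    }
    return candidates;
  }
}
```
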
diff --git a/README.md b/README.md
index 786cddc1..c759be95 100644
--- a/README.md
+++ b/README.md
@@ -50,11 +50,11 @@ Java resources can be build using `mvn package`. The current stable version shou
 
 C++ thrift resources can be generated via make.
 
-Thrift can be also code-genned into any other thrift-supported language.
+Thrift can also be code-generated into any other thrift-supported language.
 
 ## Glossary
-  - Block (HDFS block): This means a block in HDFS and the meaning is 
-    unchanged for describing this file format.  The file format is 
+  - Block (HDFS block): This means a block in HDFS and the meaning is
+    unchanged for describing this file format.  The file format is
     designed to work well on top of HDFS.
 
   - File: A HDFS file that must include the metadata for the file.
@@ -73,7 +73,7 @@ Thrift can be also code-genned into any other thrift-supported language.
 
 Hierarchically, a file consists of one or more row groups.  A row group
 contains exactly one column chunk per column.  Column chunks contain one or
-more pages. 
+more pages.
 
 ## Unit of parallelization
   - MapReduce - File/Row Group
@@ -101,14 +101,14 @@ This file and the [thrift definition](src/main/thrift/parquet.thrift) should be
     4-byte length in bytes of file metadata (little endian)
     4-byte magic number "PAR1"
 
-In the above example, there are N columns in this table, split into M row 
-groups.  The file metadata contains the locations of all the column metadata 
-start locations.  More details on what is contained in the metadata can be found 
+In the above example, there are N columns in this table, split into M row
+groups.  The file metadata contains the locations of all the column metadata
+start locations.  More details on what is contained in the metadata can be found
 in the thrift definition.
 
 Metadata is written after the data to allow for single pass writing.
 
-Readers are expected to first read the file metadata to find all the column 
+Readers are expected to first read the file metadata to find all the column
 chunks they are interested in.  The columns chunks should then be read sequentially.
 
  ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
@@ -146,36 +146,37 @@ documented in
 [logical-types]: LogicalTypes.md
 
 ## Nested Encoding
-To encode nested columns, Parquet uses the Dremel encoding with definition and 
-repetition levels.  Definition levels specify how many optional fields in the 
+To encode nested columns, Parquet uses the Dremel encoding with definition and
+repetition levels.  Definition levels specify how many optional fields in the
 path for the column are defined.  Repetition levels specify at what repeated field
 in the path has the value repeated.  The max definition and repetition levels can
 be computed from the schema (i.e. how much nesting there is).  This defines the
 maximum number of bits required to store the levels (levels are defined for all
-values in the column).  
+values in the column).
 
 Two encodings for the levels are supported BIT_PACKED and RLE. Only RLE is now used as it supersedes BIT_PACKED.
 
 ## Nulls
-Nullity is encoded in the definition levels (which is run-length encoded).  NULL values 
-are not encoded in the data.  For example, in a non-nested schema, a column with 1000 NULLs 
+Nullity is encoded in the definition levels (which is run-length encoded).  NULL values
+are not encoded in the data.  For example, in a non-nested schema, a column with 1000 NULLs
 would be encoded with run-length encoding (0, 1000 times) for the definition levels and
-nothing else.  
+nothing else.
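
Working that example through the RLE/bit-packing hybrid grammar in Encodings.md (assuming a maximum definition level of 1, so the repeated value is stored in one byte), the definition levels come out as:

```
rle-header     = varint(1000 << 1) = varint(2000) = D0 0F
repeated-value = 00                                 (definition level 0, 1 byte)
encoded-data   = D0 0F 00                           (3 bytes)
length prefix  = 03 00 00 00                        (4-byte little endian)
```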
 
 ## Data Pages
 For data pages, the 3 pieces of information are encoded back to back, after the page
-header.  We have the 
- - repetition levels data, 
- - definition levels data,  
- - encoded values.
+header.
+In order we have:
+ 1. repetition levels data
+ 1. definition levels data
+ 1. encoded values
 
-The value of `uncompressed_page_size` specified in the header is for all 3 pieces combined.
+The value of `uncompressed_page_size` specified in the header is for all the 3 pieces combined.
 
-The data for the data page is always required.  The definition and repetition levels
+The encoded values for the data page are always required.  The definition and repetition levels
 are optional, based on the schema definition.  If the column is not nested (i.e.
 the path to the column has length 1), we do not encode the repetition levels (it would
 always have the value 1).  For data that is required, the definition levels are
-skipped (if encoded, it will always have the value of the max definition level). 
+skipped (if encoded, it will always have the value of the max definition level).
 
 For example, in the case where the column is non-nested and required, the data in the
 page is only the encoded values.
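
A minimal sketch (illustrative only, not part of this patch) of the read order described above for an uncompressed data page body; `readLevels` and `readValues` are hypothetical helpers:

```java
// Illustrative sketch of the order in which the three pieces are read.
public class DataPageBodySketch {
  static void readDataPageBody(byte[] body, int maxRepetitionLevel, int maxDefinitionLevel) {
    int offset = 0;
    if (maxRepetitionLevel > 0) {       // nested column: repetition levels present
      offset = readLevels(body, offset, maxRepetitionLevel);
    }
    if (maxDefinitionLevel > 0) {       // optional somewhere in the path: definition levels present
      offset = readLevels(body, offset, maxDefinitionLevel);
    }
    readValues(body, offset);           // encoded values are always present
  }

  static int readLevels(byte[] body, int offset, int maxLevel) { return offset; } // stub
  static void readValues(byte[] body, int offset) {}                              // stub
}
```
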
@@ -183,53 +184,57 @@ page is only the encoded values.
 The supported encodings are described in [Encodings.md](https://github.com/apache/parquet-format/blob/master/Encodings.md)
 
 ## Column chunks
-Column chunks are composed of pages written back to back.  The pages share a common 
-header and readers can skip over pages they are not interested in.  The data for the 
-page follows the header and can be compressed and/or encoded.  The compression and 
+Column chunks are composed of pages written back to back.  The pages share a common
+header and readers can skip over pages they are not interested in.  The data for the
+page follows the header and can be compressed and/or encoded.  The compression and
 encoding is specified in the page metadata.
 
+Additionally, files can contain an optional column index to allow readers to
+skip pages more efficiently. See [PageIndex.md](PageIndex.md) for details and
+the reasoning behind adding these to the format.
+
 ## Checksumming
-Data pages can be individually checksummed.  This allows disabling of checksums at the 
+Data pages can be individually checksummed.  This allows disabling of checksums at the
 HDFS file level, to better support single row lookups.
 
 ## Error recovery
-If the file metadata is corrupt, the file is lost.  If the column metadata is corrupt, 
-that column chunk is lost (but column chunks for this column in other row groups are 
-okay).  If a page header is corrupt, the remaining pages in that chunk are lost.  If 
-the data within a page is corrupt, that page is lost.  The file will be more 
+If the file metadata is corrupt, the file is lost.  If the column metadata is corrupt,
+that column chunk is lost (but column chunks for this column in other row groups are
+okay).  If a page header is corrupt, the remaining pages in that chunk are lost.  If
+the data within a page is corrupt, that page is lost.  The file will be more
 resilient to corruption with smaller row groups.
 
-Potential extension: With smaller row groups, the biggest issue is placing the file 
-metadata at the end.  If an error happens while writing the file metadata, all the 
-data written will be unreadable.  This can be fixed by writing the file metadata 
-every Nth row group.  
-Each file metadata would be cumulative and include all the row groups written so 
-far.  Combining this with the strategy used for rc or avro files using sync markers, 
-a reader could recover partially written files.  
+Potential extension: With smaller row groups, the biggest issue is placing the file
+metadata at the end.  If an error happens while writing the file metadata, all the
+data written will be unreadable.  This can be fixed by writing the file metadata
+every Nth row group.
+Each file metadata would be cumulative and include all the row groups written so
+far.  Combining this with the strategy used for rc or avro files using sync markers,
+a reader could recover partially written files.
 
 ## Separating metadata and column data.
 The format is explicitly designed to separate the metadata from the data.  This
 allows splitting columns into multiple files, as well as having a single metadata
-file reference multiple parquet files.  
+file reference multiple parquet files.
 
 ## Configurations
-- Row group size: Larger row groups allow for larger column chunks which makes it 
-possible to do larger sequential IO.  Larger groups also require more buffering in 
-the write path (or a two pass write).  We recommend large row groups (512MB - 1GB). 
-Since an entire row group might need to be read, we want it to completely fit on 
-one HDFS block.  Therefore, HDFS block sizes should also be set to be larger.  An 
-optimized read setup would be: 1GB row groups, 1GB HDFS block size, 1 HDFS block 
+- Row group size: Larger row groups allow for larger column chunks which makes it
+possible to do larger sequential IO.  Larger groups also require more buffering in
+the write path (or a two pass write).  We recommend large row groups (512MB - 1GB).
+Since an entire row group might need to be read, we want it to completely fit on
+one HDFS block.  Therefore, HDFS block sizes should also be set to be larger.  An
+optimized read setup would be: 1GB row groups, 1GB HDFS block size, 1 HDFS block
 per HDFS file.
-- Data page size: Data pages should be considered indivisible so smaller data pages 
-allow for more fine grained reading (e.g. single row lookup).  Larger page sizes 
-incur less space overhead (less page headers) and potentially less parsing overhead 
-(processing headers).  Note: for sequential scans, it is not expected to read a page 
+- Data page size: Data pages should be considered indivisible so smaller data pages
+allow for more fine grained reading (e.g. single row lookup).  Larger page sizes
+incur less space overhead (less page headers) and potentially less parsing overhead
+(processing headers).  Note: for sequential scans, it is not expected to read a page
 at a time; this is not the IO chunk.  We recommend 8KB for page sizes.
 
 ## Extensibility
 There are many places in the format for compatible extensions:
 - File Version: The file metadata contains a version.
-- Encodings: Encodings are specified by enum and more can be added in the future.  
+- Encodings: Encodings are specified by enum and more can be added in the future.
 - Page types: Additional page types can be added and safely skipped.
 
 ## Contributing
@@ -238,7 +243,7 @@ Changes to this core format definition are proposed and discussed in depth on th
 
 ## Code of Conduct
 
-We hold ourselves and the Parquet developer community to a code of conduct as described by [Twitter OSS](https://engineering.twitter.com/opensource): <https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md>. 
+We hold ourselves and the Parquet developer community to a code of conduct as described by [Twitter OSS](https://engineering.twitter.com/opensource): <https://github.com/twitter/code-of-conduct/blob/master/code-of-conduct.md>.
 
 ## License
 Copyright 2013 Twitter, Cloudera and other contributors.
diff --git a/dev/merge_parquet_pr.py b/dev/merge_parquet_pr.py
index dcb7af6b..b1a63d41 100755
--- a/dev/merge_parquet_pr.py
+++ b/dev/merge_parquet_pr.py
@@ -45,8 +45,21 @@
 print "PARQUET_HOME = " + PARQUET_HOME
 print "PROJECT_NAME = " + PROJECT_NAME
 
-# Remote name which points to the Gihub site
-PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME", "apache-github")
+def lines_from_cmd(cmd):
+    return subprocess.check_output(cmd.split(" ")).strip().split("\n")
+
+# Remote name which points to the GitHub site
+PR_REMOTE_NAME = os.environ.get("PR_REMOTE_NAME")
+available_remotes = lines_from_cmd("git remote")
+if PR_REMOTE_NAME is not None:
+    if PR_REMOTE_NAME not in available_remotes:
+        print "ERROR: git remote '%s' is not defined." % PR_REMOTE_NAME
+        sys.exit(-1)
+else:
+    remote_candidates = ["github-apache", "apache-github"]
+    # Get first available remote from the list of candidates
+    PR_REMOTE_NAME = next((remote for remote in available_remotes if remote in remote_candidates), None)
+
 # Remote name which points to Apache git
 PUSH_REMOTE_NAME = os.environ.get("PUSH_REMOTE_NAME", "apache")
 # ASF JIRA username
@@ -79,11 +92,19 @@ def fail(msg):
 
 
 def run_cmd(cmd):
-    if isinstance(cmd, list):
-        return subprocess.check_output(cmd)
-    else:
-        return subprocess.check_output(cmd.split(" "))
-
+    try:
+        if isinstance(cmd, list):
+            return subprocess.check_output(cmd)
+        else:
+            return subprocess.check_output(cmd.split(" "))
+    except subprocess.CalledProcessError as e:
+        # this avoids hiding the stdout / stderr of failed processes
+        print 'Command failed: %s' % cmd
+        print 'With output:'
+        print '--------------'
+        print e.output
+        print '--------------'
+        raise e
 
 def continue_maybe(prompt):
     result = raw_input("\n%s (y/n): " % prompt)
@@ -210,9 +231,9 @@ def fix_version_from_branch(branch, versions):
         return filter(lambda x: x.name.startswith(branch_ver), versions)[-1]
 
 def exctract_jira_id(title):
-    m = re.search('^(PARQUET-[0-9]+):.*$', title)
+    m = re.search(r'^(PARQUET-[0-9]+)\b.*$', title, re.IGNORECASE)
     if m and m.groups > 0:
-        return m.group(1)
+        return m.group(1).upper()
     else:
         fail("PR title should be prefixed by a jira id \"PARQUET-XXX: ...\", found: \"%s\"" % title)
 
@@ -287,12 +308,29 @@ def get_version_json(version_str):
 
     print "Succesfully resolved %s with fixVersions=%s!" % (jira_id, fix_versions)
 
-
-if not JIRA_USERNAME:
-    JIRA_USERNAME =  raw_input("Env JIRA_USERNAME not set, please enter your JIRA username:")
-
-if not JIRA_PASSWORD:
-    JIRA_PASSWORD =  getpass.getpass("Env JIRA_PASSWORD not set, please enter your JIRA password:")
+if JIRA_IMPORTED:
+    jira_login_accepted = False
+    while not jira_login_accepted:
+        if JIRA_USERNAME:
+            print "JIRA username: %s" % JIRA_USERNAME
+        else:
+            JIRA_USERNAME = raw_input("Enter JIRA username: ")
+
+        if not JIRA_PASSWORD:
+            JIRA_PASSWORD = getpass.getpass("Enter JIRA password: ")
+
+        try:
+            asf_jira = jira.client.JIRA({'server': JIRA_API_BASE},
+                                        basic_auth=(JIRA_USERNAME, JIRA_PASSWORD))
+            jira_login_accepted = True
+        except Exception as e:
+            print "\nJIRA login failed, try again\n"
+            JIRA_USERNAME = None
+            JIRA_PASSWORD = None
+else:
+    print "WARNING: Could not find jira python library. Run 'sudo pip install jira' to install."
+    print "The tool will continue to run but won't handle the JIRA."
+    print
 
 branches = get_json("%s/branches" % GITHUB_API_BASE)
 branch_names = filter(lambda x: x.startswith("branch-"), [x['name'] for x in branches])
@@ -305,7 +343,8 @@ def get_version_json(version_str):
 
 url = pr["url"]
 title = pr["title"]
-check_jira(title)
+if JIRA_IMPORTED:
+    check_jira(title)
 body = pr["body"]
 target_ref = pr["base"]["ref"]
 user_login = pr["user"]["login"]
@@ -350,5 +389,5 @@ def get_version_json(version_str):
     jira_comment = "Issue resolved by pull request %s\n[%s/%s]" % (pr_num, GITHUB_BASE, pr_num)
     resolve_jira(title, merged_refs, jira_comment)
 else:
-    print "Could not find jira-python library. Run 'sudo pip install jira-python' to install."
+    print "WARNING: Could not find jira python library. Run 'sudo pip install jira' to install."
     print "Exiting without trying to close the associated JIRA."
diff --git a/dev/source-release.sh b/dev/source-release.sh
index a58c520d..8d4e281c 100644
--- a/dev/source-release.sh
+++ b/dev/source-release.sh
@@ -58,8 +58,7 @@ git archive $release_hash --prefix $tag/ -o $tarball
 
 # sign the archive
 gpg --armor --output ${tarball}.asc --detach-sig $tarball
-gpg --print-md MD5 $tarball > ${tarball}.md5
-shasum $tarball > ${tarball}.sha
+shasum -a 512 $tarball > ${tarball}.sha512
 
 # check out the parquet RC folder
 svn co --depth=empty https://dist.apache.org/repos/dist/dev/parquet tmp
diff --git a/doc/images/PageIndexLayout.png b/doc/images/PageIndexLayout.png
new file mode 100644
index 00000000..83c5b02f
Binary files /dev/null and b/doc/images/PageIndexLayout.png differ
diff --git a/pom.xml b/pom.xml
index b7183a9f..0b0c1141 100644
--- a/pom.xml
+++ b/pom.xml
@@ -28,7 +28,7 @@
 
   <groupId>org.apache.parquet</groupId>
   <artifactId>parquet-format</artifactId>
-  <version>2.3.2-SNAPSHOT</version>
+  <version>2.5.1-SNAPSHOT</version>
   <packaging>jar</packaging>
 
   <name>Apache Parquet Format</name>
@@ -38,7 +38,7 @@
   <scm>
     <connection>scm:git:git@github.com:apache/parquet-format.git</connection>
     <url>scm:git:git@github.com:apache/parquet-format.git</url>
-    <developerConnection>scm:git:https://git-wip-us.apache.org/repos/asf/parquet-format.git</developerConnection>
+    <developerConnection>scm:git:git@github.com:apache/parquet-format.git</developerConnection>
     <tag>HEAD</tag>
   </scm>
 
@@ -170,6 +170,7 @@
       <plugin>
         <groupId>org.apache.rat</groupId>
         <artifactId>apache-rat-plugin</artifactId>
+        <version>0.12</version>
         <executions>
           <execution>
             <phase>test</phase>
@@ -179,6 +180,7 @@
           </execution>
         </executions>
         <configuration>
+          <consoleOutput>true</consoleOutput>
           <excludes>
             <exclude>**/*.avro</exclude>
             <exclude>**/*.avsc</exclude>
@@ -192,10 +194,8 @@
             <exclude>**/build/**</exclude>
             <exclude>**/target/**</exclude>
             <exclude>.git/**</exclude>
-            <exclude>.gitignore</exclude>
             <exclude>.idea/**</exclude>
             <exclude>*/jdiff/*.xml</exclude>
-            <exclude>.travis.yml</exclude>
             <exclude>licenses/**</exclude>
             <exclude>thrift-${thrift.version}/**</exclude>
             <exclude>thrift-${thrift.version}.tar.gz</exclude>
@@ -209,12 +209,7 @@
     <dependency>
       <groupId>org.slf4j</groupId>
       <artifactId>slf4j-api</artifactId>
-      <version>1.7.2</version>
-    </dependency>
-    <dependency>
-      <groupId>org.slf4j</groupId>
-      <artifactId>slf4j-nop</artifactId>
-      <version>1.7.2</version>
+      <version>1.7.12</version>
     </dependency>
     <dependency>
       <groupId>org.apache.thrift</groupId>
diff --git a/src/main/java/org/apache/parquet/format/LogicalTypes.java b/src/main/java/org/apache/parquet/format/LogicalTypes.java
new file mode 100644
index 00000000..7c63e41d
--- /dev/null
+++ b/src/main/java/org/apache/parquet/format/LogicalTypes.java
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.parquet.format;
+
+/**
+ * Convenience instances of logical type classes.
+ */
+public class LogicalTypes {
+  public static class TimeUnits {
+    public static final TimeUnit MILLIS = TimeUnit.MILLIS(new MilliSeconds());
+    public static final TimeUnit MICROS = TimeUnit.MICROS(new MicroSeconds());
+  }
+
+  public static LogicalType DECIMAL(int scale, int precision) {
+    return LogicalType.DECIMAL(new DecimalType(scale, precision));
+  }
+
+  public static final LogicalType UTF8 = LogicalType.STRING(new StringType());
+  public static final LogicalType MAP  = LogicalType.MAP(new MapType());
+  public static final LogicalType LIST = LogicalType.LIST(new ListType());
+  public static final LogicalType ENUM = LogicalType.ENUM(new EnumType());
+  public static final LogicalType DATE = LogicalType.DATE(new DateType());
+  public static final LogicalType TIME_MILLIS = LogicalType.TIME(new TimeType(true, TimeUnits.MILLIS));
+  public static final LogicalType TIME_MICROS = LogicalType.TIME(new TimeType(true, TimeUnits.MICROS));
+  public static final LogicalType TIMESTAMP_MILLIS = LogicalType.TIMESTAMP(new TimestampType(true, TimeUnits.MILLIS));
+  public static final LogicalType TIMESTAMP_MICROS = LogicalType.TIMESTAMP(new TimestampType(true, TimeUnits.MICROS));
+  public static final LogicalType INT_8 = LogicalType.INTEGER(new IntType((byte) 8, true));
+  public static final LogicalType INT_16 = LogicalType.INTEGER(new IntType((byte) 16, true));
+  public static final LogicalType INT_32 = LogicalType.INTEGER(new IntType((byte) 32, true));
+  public static final LogicalType INT_64 = LogicalType.INTEGER(new IntType((byte) 64, true));
+  public static final LogicalType UINT_8 = LogicalType.INTEGER(new IntType((byte) 8, false));
+  public static final LogicalType UINT_16 = LogicalType.INTEGER(new IntType((byte) 16, false));
+  public static final LogicalType UINT_32 = LogicalType.INTEGER(new IntType((byte) 32, false));
+  public static final LogicalType UINT_64 = LogicalType.INTEGER(new IntType((byte) 64, false));
+  public static final LogicalType UNKNOWN = LogicalType.UNKNOWN(new NullType());
+  public static final LogicalType JSON = LogicalType.JSON(new JsonType());
+  public static final LogicalType BSON = LogicalType.BSON(new BsonType());
+}
diff --git a/src/main/java/org/apache/parquet/format/Util.java b/src/main/java/org/apache/parquet/format/Util.java
index 09cae2bd..55d61ff4 100644
--- a/src/main/java/org/apache/parquet/format/Util.java
+++ b/src/main/java/org/apache/parquet/format/Util.java
@@ -57,6 +57,22 @@
  */
 public class Util {
 
+  public static void writeColumnIndex(ColumnIndex columnIndex, OutputStream to) throws IOException {
+    write(columnIndex, to);
+  }
+
+  public static ColumnIndex readColumnIndex(InputStream from) throws IOException {
+    return read(from, new ColumnIndex());
+  }
+
+  public static void writeOffsetIndex(OffsetIndex offsetIndex, OutputStream to) throws IOException {
+    write(offsetIndex, to);
+  }
+
+  public static OffsetIndex readOffsetIndex(InputStream from) throws IOException {
+    return read(from, new OffsetIndex());
+  }
+
   public static void writePageHeader(PageHeader pageHeader, OutputStream to) throws IOException {
     write(pageHeader, to);
   }
diff --git a/src/main/thrift/parquet.thrift b/src/main/thrift/parquet.thrift
index 47812abc..160c161a 100644
--- a/src/main/thrift/parquet.thrift
+++ b/src/main/thrift/parquet.thrift
@@ -28,23 +28,12 @@ namespace java org.apache.parquet.format
  * with the encodings to control the on disk storage format.
  * For example INT16 is not included as a type since a good encoding of INT32
  * would handle this.
- *
- * When a logical type is not present, the type-defined sort order of these
- * physical types are:
- * * BOOLEAN - false, true
- * * INT32 - signed comparison
- * * INT64 - signed comparison
- * * INT96 - signed comparison
- * * FLOAT - signed comparison
- * * DOUBLE - signed comparison
- * * BYTE_ARRAY - unsigned byte-wise comparison
- * * FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
  */
 enum Type {
   BOOLEAN = 0;
   INT32 = 1;
   INT64 = 2;
-  INT96 = 3;
+  INT96 = 3;  // deprecated, only used by legacy implementations.
   FLOAT = 4;
   DOUBLE = 5;
   BYTE_ARRAY = 6;
@@ -95,12 +84,12 @@ enum ConvertedType {
    * Stored as days since Unix epoch, encoded as the INT32 physical type.
    *
    */
-  DATE = 6; 
+  DATE = 6;
 
-  /** 
-   * A time 
+  /**
+   * A time
    *
-   * The total number of milliseconds since midnight.  The value is stored 
+   * The total number of milliseconds since midnight.  The value is stored
    * as an INT32 physical type.
    */
   TIME_MILLIS = 7;
@@ -115,11 +104,11 @@ enum ConvertedType {
 
   /**
    * A date/time combination
-   * 
+   *
    * Date and time recorded as milliseconds since the Unix epoch.  Recorded as
    * a physical type of INT64.
    */
-  TIMESTAMP_MILLIS = 9; 
+  TIMESTAMP_MILLIS = 9;
 
   /**
    * A date/time combination
@@ -130,11 +119,11 @@ enum ConvertedType {
   TIMESTAMP_MICROS = 10;
 
 
-  /** 
-   * An unsigned integer value.  
-   * 
-   * The number describes the maximum number of meainful data bits in 
-   * the stored value. 8, 16 and 32 bit values are stored using the 
+  /**
+   * An unsigned integer value.
+   *
+   * The number describes the maximum number of meaningful data bits in
+   * the stored value. 8, 16 and 32 bit values are stored using the
    * INT32 physical type.  64 bit values are stored using the INT64
    * physical type.
    *
@@ -158,40 +147,33 @@ enum ConvertedType {
   INT_32 = 17;
   INT_64 = 18;
 
-  /** 
+  /**
    * An embedded JSON document
-   * 
+   *
    * A JSON document embedded within a single UTF8 column.
    */
   JSON = 19;
 
-  /** 
+  /**
    * An embedded BSON document
-   * 
-   * A BSON document embedded within a single BINARY column. 
+   *
+   * A BSON document embedded within a single BINARY column.
    */
   BSON = 20;
 
   /**
    * An interval of time
-   * 
+   *
    * This type annotates data stored as a FIXED_LEN_BYTE_ARRAY of length 12
    * This data is composed of three separate little endian unsigned
    * integers.  Each stores a component of a duration of time.  The first
    * integer identifies the number of months associated with the duration,
    * the second identifies the number of days associated with the duration
-   * and the third identifies the number of milliseconds associated with 
+   * and the third identifies the number of milliseconds associated with
    * the provided duration.  This duration of time is independent of any
    * particular timezone or date.
    */
   INTERVAL = 21;
-
-  /**
-   * Annotates a column that is always null
-   * Sometimes when discovering the schema of existing data
-   * values are always null
-   */
-  NULL = 25;
 }
 
 /**
@@ -219,12 +201,12 @@ struct Statistics {
     * Values are encoded using PLAIN encoding, except that variable-length byte
     * arrays do not include a length prefix.
     *
-    * These fields encode min and max values determined by SIGNED comparison
+    * These fields encode min and max values determined by signed comparison
     * only. New files should use the correct order for a column's logical type
     * and store the values in the min_value and max_value fields.
     *
     * To support older readers, these may be set when the column order is
-    * SIGNED.
+    * signed.
     */
    1: optional binary max;
    2: optional binary min;
@@ -242,6 +224,116 @@ struct Statistics {
    6: optional binary min_value;
 }
 
+/** Empty structs to use as logical type annotations */
+struct StringType {}  // allowed for BINARY, must be encoded with UTF-8
+struct UUIDType {}    // allowed for FIXED[16], must be encoded as raw UUID bytes
+struct MapType {}     // see LogicalTypes.md
+struct ListType {}    // see LogicalTypes.md
+struct EnumType {}    // allowed for BINARY, must be encoded with UTF-8
+struct DateType {}    // allowed for INT32
+
+/**
+ * Logical type to annotate a column that is always null.
+ *
+ * Sometimes when discovering the schema of existing data, values are always
+ * null and the physical type can't be determined. This annotation signals
+ * the case where the physical type was guessed from all null values.
+ */
+struct NullType {}    // allowed for any physical type, only null values stored
+
+/**
+ * Decimal logical type annotation
+ *
+ * To maintain forward-compatibility in v1, implementations using this logical
+ * type must also set scale and precision on the annotated SchemaElement.
+ *
+ * Allowed for physical types: INT32, INT64, FIXED, and BINARY
+ */
+struct DecimalType {
+  1: required i32 scale
+  2: required i32 precision
+}
+
+/** Time units for logical types */
+struct MilliSeconds {}
+struct MicroSeconds {}
+union TimeUnit {
+  1: MilliSeconds MILLIS
+  2: MicroSeconds MICROS
+}
+
+/**
+ * Timestamp logical type annotation
+ *
+ * Allowed for physical types: INT64
+ */
+struct TimestampType {
+  1: required bool isAdjustedToUTC
+  2: required TimeUnit unit
+}
+
+/**
+ * Time logical type annotation
+ *
+ * Allowed for physical types: INT32 (millis), INT64 (micros)
+ */
+struct TimeType {
+  1: required bool isAdjustedToUTC
+  2: required TimeUnit unit
+}
+
+/**
+ * Integer logical type annotation
+ *
+ * bitWidth must be 8, 16, 32, or 64.
+ *
+ * Allowed for physical types: INT32, INT64
+ */
+struct IntType {
+  1: required byte bitWidth
+  2: required bool isSigned
+}
+
+/**
+ * Embedded JSON logical type annotation
+ *
+ * Allowed for physical types: BINARY
+ */
+struct JsonType {
+}
+
+/**
+ * Embedded BSON logical type annotation
+ *
+ * Allowed for physical types: BINARY
+ */
+struct BsonType {
+}
+
+/**
+ * LogicalType annotations to replace ConvertedType.
+ *
+ * To maintain compatibility, implementations using LogicalType for a
+ * SchemaElement must also set the corresponding ConvertedType from the
+ * following table.
+ */
+union LogicalType {
+  1:  StringType STRING       // use ConvertedType UTF8
+  2:  MapType MAP             // use ConvertedType MAP
+  3:  ListType LIST           // use ConvertedType LIST
+  4:  EnumType ENUM           // use ConvertedType ENUM
+  5:  DecimalType DECIMAL     // use ConvertedType DECIMAL
+  6:  DateType DATE           // use ConvertedType DATE
+  7:  TimeType TIME           // use ConvertedType TIME_MICROS or TIME_MILLIS
+  8:  TimestampType TIMESTAMP // use ConvertedType TIMESTAMP_MICROS or TIMESTAMP_MILLIS
+  // 9: reserved for INTERVAL
+  10: IntType INTEGER         // use ConvertedType INT_* or UINT_*
+  11: NullType UNKNOWN        // no compatible ConvertedType
+  12: JsonType JSON           // use ConvertedType JSON
+  13: BsonType BSON           // use ConvertedType BSON
+  14: UUIDType UUID
+}
+
 /**
  * Represents a element inside a schema definition.
  *  - if it is a group (inner node) then type is undefined and num_children is defined
@@ -289,6 +381,13 @@ struct SchemaElement {
    */
   9: optional i32 field_id;
 
+  /**
+   * The logical type of this SchemaElement
+   *
+   * LogicalType replaces ConvertedType, but ConvertedType is still required
+   * for some logical types to ensure forward-compatibility in format v1.
+   */
+  10: optional LogicalType logicalType
 }
 
 /**
@@ -321,7 +420,7 @@ enum Encoding {
    */
   PLAIN_DICTIONARY = 2;
 
-  /** Group packed run length encoding. Usable for definition/reptition levels
+  /** Group packed run length encoding. Usable for definition/repetition levels
    * encoding and Booleans (on one bit: 0 is false; 1 is true.)
    */
   RLE = 3;
@@ -353,13 +452,20 @@ enum Encoding {
 
 /**
  * Supported compression algorithms.
+ *
+ * Codecs added in 2.4 can be read by readers based on 2.4 and later.
+ * Codec support may vary between readers based on the format version and
+ * libraries available at runtime. Gzip, Snappy, and LZ4 codecs are
+ * widely available, while Zstd and Brotli require additional libraries.
  */
 enum CompressionCodec {
   UNCOMPRESSED = 0;
   SNAPPY = 1;
   GZIP = 2;
   LZO = 3;
-  BROTLI = 4;
+  BROTLI = 4; // Added in 2.4
+  LZ4 = 5;    // Added in 2.4
+  ZSTD = 6;   // Added in 2.4
 }
 
 enum PageType {
@@ -369,6 +475,16 @@ enum PageType {
   DATA_PAGE_V2 = 3;
 }
 
+/**
+ * Enum to annotate whether lists of min/max elements inside ColumnIndex
+ * are ordered and if so, in which direction.
+ */
+enum BoundaryOrder {
+  UNORDERED = 0;
+  ASCENDING = 1;
+  DESCENDING = 2;
+}
+
 /** Data page header */
 struct DataPageHeader {
   /** Number of values, including NULLs, in this data page. **/
@@ -403,7 +519,7 @@ struct DictionaryPageHeader {
 }
 
 /**
- * New page format alowing reading levels without decompressing the data
+ * New page format allowing reading levels without decompressing the data
  * Repetition and definition levels are uncompressed
  * The remaining section containing the data is compressed if is_compressed is true
  **/
@@ -420,9 +536,9 @@ struct DataPageHeaderV2 {
 
   // repetition levels and definition levels are always using RLE (without size in it)
 
-  /** length of the repetition levels */
-  5: required i32 definition_levels_byte_length;
   /** length of the definition levels */
+  5: required i32 definition_levels_byte_length;
+  /** length of the repetition levels */
   6: required i32 repetition_levels_byte_length;
 
   /**  whether the values are compressed.
@@ -542,6 +658,74 @@ struct ColumnMetaData {
    * This information can be used to determine if all data pages are
    * dictionary encoded for example **/
   13: optional list<PageEncodingStats> encoding_stats;
+
+  /** Byte offset from the beginning of the file to the bloom filter data. The bloom filter
+   * data of the columns is stored together before the start of the row group it describes. **/
+  14: optional i64 bloom_filter_offset;
+}
+
+/**
+  * Block-based algorithm type annotation.
+  */
+struct BlockAlgorithm {
+}
+
+/**
+  * Definition of the bloom filter algorithm.
+  * For forward compatibility, a union is used here instead of an enum.
+  */
+union BloomFilterAlgorithm {
+  /** Block-based bloom filter.
+   * The bloom filter bitset is split into tiny buckets, each acting as a tiny
+   * bloom filter: the high 32 bits of the hash value select a bucket, and the
+   * lower 32 bits of the hash value set bits within that tiny bloom filter.
+   * See “Cache-, Hash- and Space-Efficient Bloom Filters”. Specifically,
+   * one tiny bloom filter contains eight 32-bit words (4 bytes each, stored
+   * in little endian), and the algorithm sets one bit in each 32-bit word.
+   *
+   * To set the bits in a bucket, the algorithm needs 8 SALT values
+   * (0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU, 0x705495c7U,
+   * 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U) to calculate the bit indexes
+   * with the formula: index[i] = (hash * SALT[i]) >> 27
+   **/
+   1: BlockAlgorithm BLOCK;
+}
+
+/**
+  * Hash strategy type annotation.
+  */
+struct Murmur3 {
+}
+
+/** 
+ * Definition of the hash function used to compute the hash of a column value.
+ * Note that the hash function takes the plain encoding (little-endian order for
+ * primitive types, see the Encoding definition for details) of column values as input.
+ *
+ * For forward compatibility, a union is used here instead of an enum.
+ */
+union BloomFilterHash {
+  /** Murmur3 hash strategy.
+   * Murmur3 has 32-bit and 128-bit hash variants; the least significant 64 bits
+   * of the result of the x64 128-bit function murmur3hash_x64_128 are used, in
+   * little-endian order.
+   **/
+  1: Murmur3 MURMUR3;
+}
+
+/**
+  * The bloom filter header is stored at the beginning of each column's bloom filter
+  * data and is followed by its bitset.
+  */
+struct BloomFilterHeader {
+  /** The size of the bitset in bytes, must be a power of 2 **/
+  1: required i32 numBytes;
+
+  /** The algorithm for setting bits. **/
+  2: required BloomFilterAlgorithm algorithm;
+
+  /** The hash function used for the bloom filter. **/
+  3: required BloomFilterHash hash;
 }
 
 struct ColumnChunk {
@@ -558,6 +742,18 @@ struct ColumnChunk {
    * metadata.
    **/
   3: optional ColumnMetaData meta_data
+
+  /** File offset of ColumnChunk's OffsetIndex **/
+  4: optional i64 offset_index_offset
+
+  /** Size of ColumnChunk's OffsetIndex, in bytes **/
+  5: optional i32 offset_index_length
+
+  /** File offset of ColumnChunk's ColumnIndex **/
+  6: optional i64 column_index_offset
+
+  /** Size of ColumnChunk's ColumnIndex, in bytes **/
+  7: optional i32 column_index_length
 }
 
 struct RowGroup {
@@ -582,7 +778,9 @@ struct RowGroup {
 struct TypeDefinedOrder {}
 
 /**
- * Union to specify the order used for min, max, and sorting values in a column.
+ * Union to specify the order used for the min_value and max_value fields for a
+ * column. This union takes the role of an enhanced enum that allows rich
+ * elements (which will be needed for a collation-based ordering in the future).
  *
  * Possible values are:
  * * TypeDefinedOrder - the column uses the order defined by its logical or
@@ -592,9 +790,116 @@ struct TypeDefinedOrder {}
  * for this column should be ignored.
  */
 union ColumnOrder {
+
+  /**
+   * The sort orders for logical types are:
+   *   UTF8 - unsigned byte-wise comparison
+   *   INT8 - signed comparison
+   *   INT16 - signed comparison
+   *   INT32 - signed comparison
+   *   INT64 - signed comparison
+   *   UINT8 - unsigned comparison
+   *   UINT16 - unsigned comparison
+   *   UINT32 - unsigned comparison
+   *   UINT64 - unsigned comparison
+   *   DECIMAL - signed comparison of the represented value
+   *   DATE - signed comparison
+   *   TIME_MILLIS - signed comparison
+   *   TIME_MICROS - signed comparison
+   *   TIMESTAMP_MILLIS - signed comparison
+   *   TIMESTAMP_MICROS - signed comparison
+   *   INTERVAL - unsigned comparison
+   *   JSON - unsigned byte-wise comparison
+   *   BSON - unsigned byte-wise comparison
+   *   ENUM - unsigned byte-wise comparison
+   *   LIST - undefined
+   *   MAP - undefined
+   *
+   * In the absence of logical types, the sort order is determined by the physical type:
+   *   BOOLEAN - false, true
+   *   INT32 - signed comparison
+   *   INT64 - signed comparison
+   *   INT96 (only used for legacy timestamps) - undefined
+   *   FLOAT - signed comparison of the represented value (*)
+   *   DOUBLE - signed comparison of the represented value (*)
+   *   BYTE_ARRAY - unsigned byte-wise comparison
+   *   FIXED_LEN_BYTE_ARRAY - unsigned byte-wise comparison
+   *
+   * (*) Because the sorting order is not specified properly for floating
+   *     point values (relations vs. total ordering) the following
+   *     compatibility rules should be applied when reading statistics:
+   *     - If the min is a NaN, it should be ignored.
+   *     - If the max is a NaN, it should be ignored.
+   *     - If the min is +0, the row group may contain -0 values as well.
+   *     - If the max is -0, the row group may contain +0 values as well.
+   *     - When looking for NaN values, min and max should be ignored.
+   */
   1: TypeDefinedOrder TYPE_ORDER;
 }
 
+struct PageLocation {
+  /** Offset of the page in the file **/
+  1: required i64 offset
+
+  /**
+   * Size of the page, including header. Sum of compressed_page_size and header
+   * length
+   */
+  2: required i32 compressed_page_size
+
+  /**
+   * Index within the RowGroup of the first row of the page; this means pages
+   * change on record boundaries (r = 0).
+   */
+  3: required i64 first_row_index
+}
+
+struct OffsetIndex {
+  /**
+   * PageLocations, ordered by increasing PageLocation.offset. It is required
+   * that page_locations[i].first_row_index < page_locations[i+1].first_row_index.
+   */
+  1: required list<PageLocation> page_locations
+}
+
+/**
+ * Description for ColumnIndex.
+ * Each <array-field>[i] refers to the page at OffsetIndex.page_locations[i]
+ */
+struct ColumnIndex {
+  /**
+   * A list of Boolean values to determine the validity of the corresponding
+   * min and max values. If true, a page contains only null values, and writers
+   * have to set the corresponding entries in min_values and max_values to
+   * byte[0], so that all lists have the same length. If false, the
+   * corresponding entries in min_values and max_values must be valid.
+   */
+  1: required list<bool> null_pages
+
+  /**
+   * Two lists containing lower and upper bounds for the values of each page.
+   * These may be the actual minimum and maximum values found on a page, but
+   * can also be (more compact) values that do not exist on a page. For
+   * example, instead of storing "Blart Versenwald III", a writer may set
+   * min_values[i]="B", max_values[i]="C". Such more compact values must still
+   * be valid values within the column's logical type. Readers must make sure
+   * that list entries are populated before using them by inspecting null_pages.
+   */
+  2: required list<binary> min_values
+  3: required list<binary> max_values
+
+  /**
+   * Stores whether both min_values and max_values are ordered and if so, in
+   * which direction. This allows readers to perform binary searches in both
+   * lists. Readers cannot assume that max_values[i] <= min_values[i+1], even
+   * if the lists are ordered.
+   */
+  4: required BoundaryOrder boundary_order
+
+  /** A list containing the number of null values for each page **/
+  5: optional list<i64> null_counts
+}
+
 /**
  * Description for file metadata
  */
@@ -626,11 +931,16 @@ struct FileMetaData {
   6: optional string created_by
 
   /**
-   * Sort order used for each column in this file.
+   * Sort order used for the min_value and max_value fields of each column in
+   * this file. Each sort order corresponds to one column, determined by its
+   * position in the list, matching the position of the column in the schema.
+   *
+   * Without column_orders, the meaning of the min_value and max_value fields is
+   * undefined. To ensure well-defined behaviour, if min_value and max_value are
+   * written to a Parquet file, column_orders must be written as well.
    *
-   * If this list is not present, then the order for each column is assumed to
-   * be Signed. In addition, min and max values for INTERVAL or DECIMAL stored
-   * as fixed or bytes should be ignored.
+   * The obsolete min and max fields are always sorted by signed comparison
+   * regardless of column_orders.
    */
   7: optional list<ColumnOrder> column_orders;
 }

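To make the BLOCK algorithm above easier to follow, here is a minimal Java sketch of a split-block bloom filter that applies the SALT constants and the index[i] = (hash * SALT[i]) >> 27 formula from the Thrift comment. It is illustrative only: the class name, the assumption that the bitset holds whole 32-byte buckets, and the bucket-selection step (masking the high 32 bits by numBuckets - 1, which relies on the power-of-two size) are choices made for this sketch, not mandated by the format text, and computing the 64-bit Murmur3 hash is out of scope here.

/**
 * Illustrative sketch only; not the reference implementation. The constructor
 * and the bucket-selection step are assumptions made for this example.
 */
public class BlockSplitBloomFilterSketch {

  // SALT values from the BloomFilterAlgorithm.BLOCK comment above.
  private static final int[] SALT = {
      0x47b6137b, 0x44974d91, 0x8824ad5b, 0xa2b7289d,
      0x705495c7, 0x2df1424b, 0x9efc4947, 0x5c6bfb31};

  private static final int WORDS_PER_BUCKET = 8;  // eight 32-bit words per tiny filter

  private final int[] words;      // the bitset, viewed as 32-bit words
  private final int numBuckets;

  /** numBytes must be a power of 2 (see BloomFilterHeader.numBytes) and at least 32. */
  public BlockSplitBloomFilterSketch(int numBytes) {
    this.words = new int[numBytes / 4];
    this.numBuckets = words.length / WORDS_PER_BUCKET;
  }

  /** One bit per word, chosen by index[i] = (hash * SALT[i]) >> 27 (top 5 bits). */
  private static int[] mask(int lowerHashBits) {
    int[] mask = new int[WORDS_PER_BUCKET];
    for (int i = 0; i < WORDS_PER_BUCKET; i++) {
      mask[i] = 1 << ((lowerHashBits * SALT[i]) >>> 27);
    }
    return mask;
  }

  /** The high 32 bits of the hash select the bucket (mapping assumed for this sketch). */
  private int bucketIndex(long hash) {
    return (int) ((hash >>> 32) & (numBuckets - 1));
  }

  public void insert(long hash) {
    int base = bucketIndex(hash) * WORDS_PER_BUCKET;
    int[] mask = mask((int) hash);                 // lower 32 bits of the hash set the bits
    for (int i = 0; i < WORDS_PER_BUCKET; i++) {
      words[base + i] |= mask[i];
    }
  }

  /** false: value is definitely absent; true: value may be present. */
  public boolean mightContain(long hash) {
    int base = bucketIndex(hash) * WORDS_PER_BUCKET;
    int[] mask = mask((int) hash);
    for (int i = 0; i < WORDS_PER_BUCKET; i++) {
      if ((words[base + i] & mask[i]) == 0) {
        return false;
      }
    }
    return true;
  }
}

In such a scheme, a writer would call insert(hash) for each distinct value of a column in a row group, and a reader would call mightContain(hash) to decide whether the row group can be skipped entirely.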

 
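The LogicalTypes helper class added in this change is a convenience for populating the new SchemaElement.logicalType field while still setting the old ConvertedType, as the compatibility table in the LogicalType union requires. A small sketch of that pattern follows; it assumes the standard Thrift-generated Java setters (setName, setType, setConverted_type, setScale, setPrecision, setLogicalType), and the column name and decimal parameters are made up for illustration.

import org.apache.parquet.format.ConvertedType;
import org.apache.parquet.format.LogicalTypes;
import org.apache.parquet.format.SchemaElement;
import org.apache.parquet.format.Type;

public class LogicalTypeAnnotationExample {
  public static void main(String[] args) {
    // Hypothetical DECIMAL(precision=9, scale=2) column stored as INT32.
    SchemaElement amount = new SchemaElement();
    amount.setName("amount");
    amount.setType(Type.INT32);

    // New-style annotation via the LogicalType union...
    amount.setLogicalType(LogicalTypes.DECIMAL(2, 9));

    // ...plus the ConvertedType and scale/precision fields that older readers
    // still expect (forward compatibility, as noted in the Thrift comments).
    amount.setConverted_type(ConvertedType.DECIMAL);
    amount.setScale(2);
    amount.setPrecision(9);

    System.out.println(amount);
  }
}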

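The new Util.writeColumnIndex/readColumnIndex (and the matching OffsetIndex helpers) follow the existing writePageHeader/readPageHeader pattern: Thrift serialization to and from a stream. Below is a hedged round-trip sketch; the page bounds and null counts are made up, and the setter names assume the standard Thrift-generated Java bindings for ColumnIndex and BoundaryOrder.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

import org.apache.parquet.format.BoundaryOrder;
import org.apache.parquet.format.ColumnIndex;
import org.apache.parquet.format.Util;

public class ColumnIndexRoundTrip {
  public static void main(String[] args) throws Exception {
    // Two pages with made-up bounds; neither page is all-null.
    ColumnIndex index = new ColumnIndex();
    index.setNull_pages(Arrays.asList(false, false));
    index.setMin_values(Arrays.asList(bytes("B"), bytes("M")));
    index.setMax_values(Arrays.asList(bytes("L"), bytes("Z")));
    index.setBoundary_order(BoundaryOrder.ASCENDING);
    index.setNull_counts(Arrays.asList(0L, 3L));

    // Serialize and read back with the new helpers added in this change.
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Util.writeColumnIndex(index, out);
    ColumnIndex readBack =
        Util.readColumnIndex(new ByteArrayInputStream(out.toByteArray()));
    System.out.println(readBack);
  }

  private static ByteBuffer bytes(String s) {
    return ByteBuffer.wrap(s.getBytes(StandardCharsets.UTF_8));
  }
}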

> Add bloom filters to parquet statistics
> ---------------------------------------
>
>                 Key: PARQUET-41
>                 URL: https://issues.apache.org/jira/browse/PARQUET-41
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-format, parquet-mr
>            Reporter: Alex Levenson
>            Assignee: Junjie Chen
>            Priority: Major
>              Labels: filter2, pull-request-available
>
> For row groups with no dictionary, we could still produce a bloom filter. This could be very useful in filtering entire row groups.
> Pull request:
> https://github.com/apache/parquet-mr/pull/215


