Posted to commits@impala.apache.org by bo...@apache.org on 2019/05/10 12:04:37 UTC

[impala] branch master updated (1e49b6a -> d423979)

This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 1e49b6a  IMPALA-2029. Implement our own getJNIEnv equivalent
     new 9075099  Drop statestore update frequency during data loading
     new d423979  IMPALA-5843: Use page index in Parquet files to skip pages

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/common/global-flags.cc                      |   2 +-
 be/src/exec/hdfs-scan-node-base.cc                 |  26 +-
 be/src/exec/hdfs-scan-node-base.h                  |  14 +-
 be/src/exec/parquet/CMakeLists.txt                 |   3 +
 be/src/exec/parquet/hdfs-parquet-scanner.cc        | 258 +++++++-
 be/src/exec/parquet/hdfs-parquet-scanner.h         |  69 +-
 be/src/exec/parquet/parquet-column-readers.cc      | 312 ++++++++-
 be/src/exec/parquet/parquet-column-readers.h       | 133 +++-
 be/src/exec/parquet/parquet-column-stats.cc        |  50 +-
 be/src/exec/parquet/parquet-column-stats.h         |  12 +
 be/src/exec/parquet/parquet-common-test.cc         | 122 ++++
 be/src/exec/parquet/parquet-common.cc              |  99 +++
 be/src/exec/parquet/parquet-common.h               |  79 +++
 be/src/exec/parquet/parquet-level-decoder.h        |  29 +-
 be/src/exec/parquet/parquet-page-index-test.cc     | 108 ++++
 be/src/exec/parquet/parquet-page-index.cc          | 147 +++++
 be/src/exec/parquet/parquet-page-index.h           |  83 +++
 be/src/exprs/literal.cc                            |  16 +-
 be/src/runtime/scoped-buffer.h                     |   4 +-
 be/src/service/query-options.cc                    |   7 +-
 be/src/service/query-options.h                     |   4 +-
 common/thrift/ImpalaInternalService.thrift         |   3 +
 common/thrift/ImpalaService.thrift                 |   5 +
 testdata/bin/create-load-data.sh                   |   3 +-
 testdata/data/README                               | 124 +++-
 testdata/data/alltypes_tiny_pages.parquet          | Bin 0 -> 454233 bytes
 testdata/data/alltypes_tiny_pages_plain.parquet    | Bin 0 -> 811756 bytes
 testdata/data/decimals_1_10.parquet                | Bin 0 -> 3874 bytes
 testdata/data/double_nested_decimals.parquet       | Bin 0 -> 3846 bytes
 testdata/data/nested_decimals.parquet              | Bin 0 -> 2369 bytes
 .../QueryTest/nested-types-parquet-page-index.test | 704 +++++++++++++++++++++
 ...rquet-page-index-alltypes-tiny-pages-plain.test | 234 +++++++
 .../parquet-page-index-alltypes-tiny-pages.test    | 234 +++++++
 .../QueryTest/parquet-page-index-large.test        | 357 +++++++++++
 .../queries/QueryTest/parquet-page-index.test      | 219 +++++++
 .../queries/QueryTest/stats-extrapolation.test     |  12 +-
 tests/query_test/test_parquet_stats.py             |  24 +
 37 files changed, 3399 insertions(+), 97 deletions(-)
 create mode 100644 be/src/exec/parquet/parquet-common-test.cc
 create mode 100644 be/src/exec/parquet/parquet-page-index-test.cc
 create mode 100644 be/src/exec/parquet/parquet-page-index.cc
 create mode 100644 be/src/exec/parquet/parquet-page-index.h
 create mode 100644 testdata/data/alltypes_tiny_pages.parquet
 create mode 100644 testdata/data/alltypes_tiny_pages_plain.parquet
 create mode 100644 testdata/data/decimals_1_10.parquet
 create mode 100644 testdata/data/double_nested_decimals.parquet
 create mode 100644 testdata/data/nested_decimals.parquet
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test


[impala] 02/02: IMPALA-5843: Use page index in Parquet files to skip pages

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d423979866c737005882f54d157819e43897a5e8
Author: Zoltan Borok-Nagy <bo...@cloudera.com>
AuthorDate: Thu Jan 24 18:41:23 2019 +0100

    IMPALA-5843: Use page index in Parquet files to skip pages
    
    This commit implements page filtering based on the Parquet page index.
    
    The page index is read and evaluated by the HdfsParquetScanner.
    First we determine the row ranges we are interested in, then, based
    on those row ranges, we determine the candidate pages for each
    column that we are reading.
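
    Conceptually, the range computation looks roughly like the sketch
    below (illustration only; the real helpers added in parquet-common,
    such as ComputeCandidateRanges(), have different signatures and
    error handling):

      #include <algorithm>
      #include <cstdint>
      #include <vector>

      // Closed interval of top-level row indexes within a row group.
      struct RowRange { int64_t first; int64_t last; };

      // Turn the row ranges that page statistics proved non-matching
      // ('skip_ranges') into the surviving candidate ranges in
      // [0, num_rows).
      std::vector<RowRange> CandidateRanges(
          int64_t num_rows, std::vector<RowRange> skip_ranges) {
        std::sort(skip_ranges.begin(), skip_ranges.end(),
            [](const RowRange& a, const RowRange& b) {
              return a.first < b.first;
            });
        std::vector<RowRange> candidates;
        int64_t next_row = 0;  // first row not yet covered by a skip range
        for (const RowRange& skip : skip_ranges) {
          if (skip.first > next_row) {
            candidates.push_back({next_row, skip.first - 1});
          }
          next_row = std::max(next_row, skip.last + 1);
        }
        if (next_row < num_rows) candidates.push_back({next_row, num_rows - 1});
        return candidates;
      }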
    
    We still issue one ScanRange per column chunk, but we specify
    sub-ranges that cover the candidate pages, i.e. we don't read the
    whole column chunk, only the parts of it that we need.
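
    For illustration, a hedged sketch of how the sub-ranges of a column
    chunk could be assembled from the candidate pages (PageLocation and
    SubRange below are simplified stand-ins for parquet::PageLocation
    and io::ScanRange::SubRange; the committed logic lives in
    BaseScalarColumnReader::CreateSubRanges() in the diff):

      #include <cstdint>
      #include <vector>

      struct PageLocation { int64_t offset; int32_t compressed_page_size; };
      struct SubRange { int64_t offset; int64_t length; };

      // Keep only the byte ranges of the candidate pages, so the I/O
      // layer reads fractions of the column chunk instead of all of it.
      std::vector<SubRange> SubRangesForCandidatePages(
          const std::vector<PageLocation>& page_locations,
          const std::vector<int>& candidate_pages) {
        std::vector<SubRange> sub_ranges;
        for (int page_idx : candidate_pages) {
          const PageLocation& loc = page_locations[page_idx];
          sub_ranges.push_back({loc.offset, loc.compressed_page_size});
        }
        return sub_ranges;
      }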
    
    Pages are not aligned across column chunks, i.e. page #2 of column A
    might store completely different rows than page #2 of column B.
    This means we need row-skipping logic when we read the data pages.
    This logic is implemented in BaseScalarColumnReader and
    ScalarColumnReader; collection column readers know nothing about
    page filtering.
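
    A minimal sketch of the skipping idea for the simplest case (a
    required, non-nested column, where one top-level row is exactly one
    encoded value; the committed SkipTopLevelRows() also walks
    definition/repetition levels for nullable and nested columns):

      #include <cstdint>

      struct ColumnPageCursor {
        int64_t current_row;          // last top-level row consumed
        int64_t num_buffered_values;  // values left in the current page
      };

      // Advance the cursor over 'num_rows' rows and ask the decoder
      // (stand-in callback) to skip that many encoded values.
      bool SkipTopLevelRowsFlat(ColumnPageCursor* cursor, int64_t num_rows,
          bool (*skip_encoded_values)(int64_t)) {
        if (num_rows > cursor->num_buffered_values) return false;
        cursor->current_row += num_rows;
        cursor->num_buffered_values -= num_rows;
        return skip_encoded_values(num_rows);
      }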
    
    Page filtering can be turned off by setting the query option
    'parquet_read_page_index' to false.
    
    Testing:
     * added some unit tests for the row range and
       page selection logic
     * generated various Parquet files with Parquet-MR
     * enabled page index writing and wrote selective queries against
       tables written by Impala. Existing tests are likely to exercise
       page filtering transparently.
    
    Performance:
     * Measured locally, observed 3x to 20x speedup for selective queries.
       The speedup was proportional to the I/O operations that needed to
       be done.
    
     * The TPC-H benchmark didn't show a significant performance change.
       This is not a surprise since the data is not sorted in any useful
       way, so the main goal was to not introduce a perf regression.
    
    TODO:
       * measure performance for remote reads
    
    Change-Id: I0cc99f129f2048dbafbe7f5a51d1ea3a5005731a
    Reviewed-on: http://gerrit.cloudera.org:8080/12065
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/common/global-flags.cc                      |   2 +-
 be/src/exec/hdfs-scan-node-base.cc                 |  26 +-
 be/src/exec/hdfs-scan-node-base.h                  |  14 +-
 be/src/exec/parquet/CMakeLists.txt                 |   3 +
 be/src/exec/parquet/hdfs-parquet-scanner.cc        | 258 +++++++-
 be/src/exec/parquet/hdfs-parquet-scanner.h         |  69 +-
 be/src/exec/parquet/parquet-column-readers.cc      | 312 ++++++++-
 be/src/exec/parquet/parquet-column-readers.h       | 133 +++-
 be/src/exec/parquet/parquet-column-stats.cc        |  50 +-
 be/src/exec/parquet/parquet-column-stats.h         |  12 +
 be/src/exec/parquet/parquet-common-test.cc         | 122 ++++
 be/src/exec/parquet/parquet-common.cc              |  99 +++
 be/src/exec/parquet/parquet-common.h               |  79 +++
 be/src/exec/parquet/parquet-level-decoder.h        |  29 +-
 be/src/exec/parquet/parquet-page-index-test.cc     | 108 ++++
 be/src/exec/parquet/parquet-page-index.cc          | 147 +++++
 be/src/exec/parquet/parquet-page-index.h           |  83 +++
 be/src/exprs/literal.cc                            |  16 +-
 be/src/runtime/scoped-buffer.h                     |   4 +-
 be/src/service/query-options.cc                    |   7 +-
 be/src/service/query-options.h                     |   4 +-
 common/thrift/ImpalaInternalService.thrift         |   3 +
 common/thrift/ImpalaService.thrift                 |   5 +
 testdata/data/README                               | 124 +++-
 testdata/data/alltypes_tiny_pages.parquet          | Bin 0 -> 454233 bytes
 testdata/data/alltypes_tiny_pages_plain.parquet    | Bin 0 -> 811756 bytes
 testdata/data/decimals_1_10.parquet                | Bin 0 -> 3874 bytes
 testdata/data/double_nested_decimals.parquet       | Bin 0 -> 3846 bytes
 testdata/data/nested_decimals.parquet              | Bin 0 -> 2369 bytes
 .../QueryTest/nested-types-parquet-page-index.test | 704 +++++++++++++++++++++
 ...rquet-page-index-alltypes-tiny-pages-plain.test | 234 +++++++
 .../parquet-page-index-alltypes-tiny-pages.test    | 234 +++++++
 .../QueryTest/parquet-page-index-large.test        | 357 +++++++++++
 .../queries/QueryTest/parquet-page-index.test      | 219 +++++++
 .../queries/QueryTest/stats-extrapolation.test     |  12 +-
 tests/query_test/test_parquet_stats.py             |  24 +
 36 files changed, 3397 insertions(+), 96 deletions(-)

diff --git a/be/src/common/global-flags.cc b/be/src/common/global-flags.cc
index 5e2f019..ca1261a 100644
--- a/be/src/common/global-flags.cc
+++ b/be/src/common/global-flags.cc
@@ -264,7 +264,7 @@ DEFINE_double_hidden(invalidate_tables_fraction_on_memory_pressure, 0.1,
     "The fraction of tables to invalidate when CatalogdTableInvalidator considers the "
     "old GC generation to be almost full.");
 
-DEFINE_bool_hidden(enable_parquet_page_index_writing_debug_only, false, "If true, Impala "
+DEFINE_bool_hidden(enable_parquet_page_index_writing_debug_only, true, "If true, Impala "
     "will write the Parquet page index. It is not advised to use it in a production "
     "environment, only for testing and development. This flag is meant to be temporary. "
     "We plan to remove this flag once Impala is able to read the page index and has "
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 11fd066..9f453fc 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -585,13 +585,31 @@ ScanRange* HdfsScanNodeBase::AllocateScanRange(hdfsFS fs, const char* file,
     const ScanRange* original_split) {
   ScanRangeMetadata* metadata = runtime_state_->obj_pool()->Add(
         new ScanRangeMetadata(partition_id, original_split));
-  return AllocateScanRange(fs, file, len, offset, metadata, disk_id, expected_local,
+  return AllocateScanRange(fs, file, len, offset, {}, metadata, disk_id, expected_local,
       is_erasure_coded, buffer_opts);
 }
 
 ScanRange* HdfsScanNodeBase::AllocateScanRange(hdfsFS fs, const char* file,
-    int64_t len, int64_t offset, ScanRangeMetadata* metadata, int disk_id, bool expected_local,
-    bool is_erasure_coded, const BufferOpts& buffer_opts) {
+    int64_t len, int64_t offset, ScanRangeMetadata* metadata, int disk_id,
+    bool expected_local, bool is_erasure_coded, const BufferOpts& buffer_opts) {
+  return AllocateScanRange(fs, file, len, offset, {}, metadata, disk_id, expected_local,
+      is_erasure_coded, buffer_opts);
+}
+
+ScanRange* HdfsScanNodeBase::AllocateScanRange(hdfsFS fs, const char* file,
+    int64_t len, int64_t offset, vector<ScanRange::SubRange>&& sub_ranges,
+    int64_t partition_id, int disk_id, bool expected_local, bool is_erasure_coded,
+    const BufferOpts& buffer_opts, const ScanRange* original_split) {
+  ScanRangeMetadata* metadata = runtime_state_->obj_pool()->Add(
+      new ScanRangeMetadata(partition_id, original_split));
+  return AllocateScanRange(fs, file, len, offset, move(sub_ranges), metadata,
+      disk_id, expected_local, is_erasure_coded, buffer_opts);
+}
+
+ScanRange* HdfsScanNodeBase::AllocateScanRange(hdfsFS fs, const char* file,
+    int64_t len, int64_t offset, vector<ScanRange::SubRange>&& sub_ranges,
+    ScanRangeMetadata* metadata, int disk_id, bool expected_local, bool is_erasure_coded,
+    const BufferOpts& buffer_opts) {
   DCHECK_GE(disk_id, -1);
   // Require that the scan range is within [0, file_length). While this cannot be used
   // to guarantee safety (file_length metadata may be stale), it avoids different
@@ -606,7 +624,7 @@ ScanRange* HdfsScanNodeBase::AllocateScanRange(hdfsFS fs, const char* file,
 
   ScanRange* range = runtime_state_->obj_pool()->Add(new ScanRange);
   range->Reset(fs, file, len, offset, disk_id, expected_local, is_erasure_coded,
-      buffer_opts, metadata);
+      buffer_opts, move(sub_ranges), metadata);
   return range;
 }
 
diff --git a/be/src/exec/hdfs-scan-node-base.h b/be/src/exec/hdfs-scan-node-base.h
index 8397d4e..d90f66f 100644
--- a/be/src/exec/hdfs-scan-node-base.h
+++ b/be/src/exec/hdfs-scan-node-base.h
@@ -259,7 +259,19 @@ class HdfsScanNodeBase : public ScanNode {
   /// Same as above, but it takes a pointer to a ScanRangeMetadata object which contains
   /// the partition_id, original_splits, and other information about the scan range.
   io::ScanRange* AllocateScanRange(hdfsFS fs, const char* file, int64_t len,
-      int64_t offset, ScanRangeMetadata* metadata, int disk_id, bool expected_local,
+      int64_t offset, ScanRangeMetadata* metadata, int disk_id,
+      bool expected_local, bool is_erasure_coded, const io::BufferOpts& buffer_opts);
+
+  /// Same as the first overload, but it takes sub-ranges as well.
+  io::ScanRange* AllocateScanRange(hdfsFS fs, const char* file, int64_t len,
+      int64_t offset, std::vector<io::ScanRange::SubRange>&& sub_ranges,
+      int64_t partition_id, int disk_id, bool expected_local, bool is_erasure_coded,
+      const io::BufferOpts& buffer_opts, const io::ScanRange* original_split = NULL);
+
+  /// Same as above, but it takes both sub-ranges and metadata.
+  io::ScanRange* AllocateScanRange(hdfsFS fs, const char* file, int64_t len,
+      int64_t offset, std::vector<io::ScanRange::SubRange>&& sub_ranges,
+      ScanRangeMetadata* metadata, int disk_id, bool expected_local,
       bool is_erasure_coded, const io::BufferOpts& buffer_opts);
 
   /// Old API for compatibility with text scanners (e.g. LZO text scanner).
diff --git a/be/src/exec/parquet/CMakeLists.txt b/be/src/exec/parquet/CMakeLists.txt
index 7bf24e2..8d11780 100644
--- a/be/src/exec/parquet/CMakeLists.txt
+++ b/be/src/exec/parquet/CMakeLists.txt
@@ -34,11 +34,14 @@ add_library(Parquet
   parquet-level-decoder.cc
   parquet-metadata-utils.cc
   parquet-common.cc
+  parquet-page-index.cc
 )
 
 add_dependencies(Parquet gen-deps)
 
 ADD_BE_LSAN_TEST(parquet-bool-decoder-test)
+ADD_BE_LSAN_TEST(parquet-common-test)
+ADD_BE_LSAN_TEST(parquet-page-index-test)
 ADD_BE_LSAN_TEST(parquet-plain-test)
 ADD_BE_LSAN_TEST(parquet-version-test)
 ADD_BE_LSAN_TEST(hdfs-parquet-scanner-test)
diff --git a/be/src/exec/parquet/hdfs-parquet-scanner.cc b/be/src/exec/parquet/hdfs-parquet-scanner.cc
index 611eeba..65cf788 100644
--- a/be/src/exec/parquet/hdfs-parquet-scanner.cc
+++ b/be/src/exec/parquet/hdfs-parquet-scanner.cc
@@ -27,7 +27,6 @@
 #include "exec/hdfs-scan-node.h"
 #include "exec/parquet/parquet-collection-column-reader.h"
 #include "exec/parquet/parquet-column-readers.h"
-#include "exec/parquet/parquet-column-stats.h"
 #include "exec/scanner-context.inline.h"
 #include "rpc/thrift-util.h"
 #include "runtime/collection-value-builder.h"
@@ -37,6 +36,7 @@
 #include "runtime/runtime-filter.inline.h"
 #include "runtime/runtime-state.h"
 #include "util/dict-encoding.h"
+#include "util/scope-exit-trigger.h"
 
 #include "common/names.h"
 
@@ -103,7 +103,8 @@ HdfsParquetScanner::HdfsParquetScanner(HdfsScanNodeBase* scan_node, RuntimeState
     parquet_compressed_page_size_counter_(nullptr),
     parquet_uncompressed_page_size_counter_(nullptr),
     coll_items_read_counter_(0),
-    codegend_process_scratch_batch_fn_(nullptr) {
+    codegend_process_scratch_batch_fn_(nullptr),
+    page_index_(this) {
   assemble_rows_timer_.Stop();
 }
 
@@ -117,6 +118,13 @@ Status HdfsParquetScanner::Open(ScannerContext* context) {
           TUnit::UNIT);
   num_row_groups_counter_ =
       ADD_COUNTER(scan_node_->runtime_profile(), "NumRowGroups", TUnit::UNIT);
+  num_row_groups_with_page_index_counter_ =
+      ADD_COUNTER(scan_node_->runtime_profile(), "NumRowGroupsWithPageIndex",
+          TUnit::UNIT);
+  num_stats_filtered_pages_counter_ =
+      ADD_COUNTER(scan_node_->runtime_profile(), "NumStatsFilteredPages", TUnit::UNIT);
+  num_pages_counter_ =
+      ADD_COUNTER(scan_node_->runtime_profile(), "NumPages", TUnit::UNIT);
   num_scanners_with_no_reads_counter_ =
       ADD_COUNTER(scan_node_->runtime_profile(), "NumScannersWithNoReads", TUnit::UNIT);
   num_dict_filtered_row_groups_counter_ =
@@ -127,6 +135,8 @@ Status HdfsParquetScanner::Open(ScannerContext* context) {
       scan_node_->runtime_profile(), "ParquetCompressedPageSize", TUnit::BYTES);
   parquet_uncompressed_page_size_counter_ = ADD_SUMMARY_STATS_COUNTER(
       scan_node_->runtime_profile(), "ParquetUncompressedPageSize", TUnit::BYTES);
+  process_page_index_stats_ =
+      ADD_SUMMARY_STATS_TIMER(scan_node_->runtime_profile(), "PageIndexProcessingTime");
 
   codegend_process_scratch_batch_fn_ = reinterpret_cast<ProcessScratchBatchFn>(
       scan_node_->GetCodegenFn(THdfsFileFormat::PARQUET));
@@ -501,18 +511,16 @@ Status HdfsParquetScanner::EvaluateStatsConjuncts(
     DCHECK_LT(col_idx, row_group.columns.size());
 
     const vector<parquet::ColumnOrder>& col_orders = file_metadata.column_orders;
-    const parquet::ColumnOrder* col_order = nullptr;
-    if (col_idx < col_orders.size()) col_order = &col_orders[col_idx];
+    const parquet::ColumnOrder* col_order = col_idx < col_orders.size() ?
+        &col_orders[col_idx] : nullptr;
 
     const parquet::ColumnChunk& col_chunk = row_group.columns[col_idx];
     const ColumnType& col_type = slot_desc->type();
 
     DCHECK(node->element != nullptr);
 
-    ColumnStatsReader stat_reader(col_chunk, col_type, col_order,  *node->element);
-    if (col_type.IsTimestampType()) {
-      stat_reader.SetTimestampDecoder(CreateTimestampDecoder(*node->element));
-    }
+    ColumnStatsReader stat_reader = CreateColumnStatsReader(col_chunk, col_type,
+        col_order, *node->element);
 
     int64_t null_count = 0;
     bool null_count_result = stat_reader.ReadNullCountStat(&null_count);
@@ -523,16 +531,7 @@ Status HdfsParquetScanner::EvaluateStatsConjuncts(
 
     const string& fn_name = eval->root().function_name();
     ColumnStatsReader::StatsField stats_field;
-    if (fn_name == "lt" || fn_name == "le") {
-      // We need to get min stats.
-      stats_field = ColumnStatsReader::StatsField::MIN;
-    } else if (fn_name == "gt" || fn_name == "ge") {
-      // We need to get max stats.
-      stats_field = ColumnStatsReader::StatsField::MAX;
-    } else {
-      DCHECK(false) << "Unsupported function name for statistics evaluation: " << fn_name;
-      continue;
-    }
+    if (!ColumnStatsReader::GetRequiredStatsField(fn_name, &stats_field)) continue;
 
     void* slot = min_max_tuple_->GetSlot(slot_desc->tuple_offset());
     bool stats_read = stat_reader.ReadFromThrift(stats_field, slot);
@@ -609,6 +608,10 @@ Status HdfsParquetScanner::NextRowGroup() {
     }
 
     COUNTER_ADD(num_row_groups_counter_, 1);
+    if (!row_group.columns.empty() &&
+        row_group.columns.front().__isset.offset_index_offset) {
+      COUNTER_ADD(num_row_groups_with_page_index_counter_, 1);
+    }
 
     // Evaluate row group statistics.
     bool skip_row_group_on_stats;
@@ -619,6 +622,26 @@ Status HdfsParquetScanner::NextRowGroup() {
       continue;
     }
 
+    // Evaluate page index.
+    if (!min_max_conjunct_evals_.empty() &&
+        state_->query_options().parquet_read_page_index) {
+      bool filter_pages;
+      Status page_index_status = ProcessPageIndex(&filter_pages);
+      if (!page_index_status.ok()) {
+        RETURN_IF_ERROR(state_->LogOrReturnError(page_index_status.msg()));
+      }
+      if (filter_pages && candidate_ranges_.empty()) {
+        // Page level statistics filtered the whole row group. It can happen when there
+        // is a gap in the data between the pages and the user's predicate hit that gap.
+        // E.g. column chunk 'A' has two pages with statistics {min: 0, max: 5},
+        // {min: 10, max: 20}, and query is 'select * from T where A = 8'.
+        // It can also happen when there are predicates against different columns, and
+        // the passing row ranges of the predicates don't have a common subset.
+        COUNTER_ADD(num_stats_filtered_row_groups_counter_, 1);
+        continue;
+      }
+    }
+
     InitCollectionColumns();
     RETURN_IF_ERROR(InitScalarColumns());
 
@@ -664,6 +687,141 @@ Status HdfsParquetScanner::NextRowGroup() {
   return Status::OK();
 }
 
+bool HdfsParquetScanner::ReadStatFromIndex(const ColumnStatsReader& stats_reader,
+    const parquet::ColumnIndex& column_index, int page_idx,
+    ColumnStatsReader::StatsField stats_field, bool* is_null_page, void* slot) {
+  *is_null_page = column_index.null_pages[page_idx];
+  if (*is_null_page) return false;
+  switch (stats_field) {
+    case ColumnStatsReader::StatsField::MIN:
+      return stats_reader.ReadFromString(
+          stats_field, column_index.min_values[page_idx], slot);
+    case ColumnStatsReader::StatsField::MAX:
+      return stats_reader.ReadFromString(
+          stats_field, column_index.max_values[page_idx], slot);
+    default:
+      DCHECK(false);
+  }
+  return false;
+}
+
+Status HdfsParquetScanner::ProcessPageIndex(bool* filter_pages) {
+  MonotonicStopWatch single_process_page_index_timer;
+  single_process_page_index_timer.Start();
+  candidate_ranges_.clear();
+  *filter_pages = false;
+  for (auto& scalar_reader : scalar_readers_) scalar_reader->ResetPageFiltering();
+  RETURN_IF_ERROR(page_index_.ReadAll(row_group_idx_));
+  if (page_index_.IsEmpty()) return Status::OK();
+  // We can release the raw page index buffer when we exit this function.
+  const auto scope_exit = MakeScopeExitTrigger([this](){page_index_.Release();});
+  RETURN_IF_ERROR(EvaluatePageIndex(filter_pages));
+  RETURN_IF_ERROR(ComputeCandidatePagesForColumns(filter_pages));
+  single_process_page_index_timer.Stop();
+  process_page_index_stats_->UpdateCounter(single_process_page_index_timer.ElapsedTime());
+  return Status::OK();
+}
+
+Status HdfsParquetScanner::EvaluatePageIndex(bool* filter_pages) {
+  parquet::RowGroup& row_group = file_metadata_.row_groups[row_group_idx_];
+  vector<RowRange> skip_ranges;
+
+  for (int i = 0; i < min_max_conjunct_evals_.size(); ++i) {
+    ScalarExprEvaluator* eval = min_max_conjunct_evals_[i];
+    SlotDescriptor* slot_desc = scan_node_->min_max_tuple_desc()->slots()[i];
+
+    // Resolve column path to determine col idx.
+    SchemaNode* node = nullptr;
+    bool pos_field, missing_field;
+    RETURN_IF_ERROR(schema_resolver_->ResolvePath(slot_desc->col_path(), &node,
+        &pos_field, &missing_field));
+    if (pos_field || missing_field) continue;
+
+    int col_idx = node->col_idx;
+    DCHECK_LT(col_idx, row_group.columns.size());
+    const parquet::ColumnChunk& col_chunk = row_group.columns[col_idx];
+    if (col_chunk.column_index_length == 0) continue;
+
+    parquet::ColumnIndex column_index;
+    RETURN_IF_ERROR(page_index_.DeserializeColumnIndex(col_chunk, &column_index));
+
+    const ColumnType& col_type = slot_desc->type();
+    const vector<parquet::ColumnOrder>& col_orders = file_metadata_.column_orders;
+    const parquet::ColumnOrder* col_order = col_idx < col_orders.size() ?
+        &col_orders[col_idx] : nullptr;
+    ColumnStatsReader stats_reader = CreateColumnStatsReader(col_chunk, col_type,
+        col_order, *node->element);
+
+    min_max_tuple_->Init(scan_node_->min_max_tuple_desc()->byte_size());
+    void* slot = min_max_tuple_->GetSlot(slot_desc->tuple_offset());
+
+    const int num_of_pages = column_index.null_pages.size();
+    const string& fn_name = eval->root().function_name();
+    ColumnStatsReader::StatsField stats_field;
+    if (!ColumnStatsReader::GetRequiredStatsField(fn_name, &stats_field)) continue;
+
+    for (int page_idx = 0; page_idx < num_of_pages; ++page_idx) {
+      bool value_read, is_null_page;
+      value_read = ReadStatFromIndex(stats_reader, column_index, page_idx, stats_field,
+          &is_null_page, slot);
+      if (!is_null_page && !value_read) continue;
+      TupleRow row;
+      row.SetTuple(0, min_max_tuple_);
+      if (is_null_page || !ExecNode::EvalPredicate(eval, &row)) {
+        BaseScalarColumnReader* scalar_reader = scalar_reader_map_[col_idx];
+        RETURN_IF_ERROR(page_index_.DeserializeOffsetIndex(col_chunk,
+            &scalar_reader->offset_index_));
+        RowRange row_range;
+        GetRowRangeForPage(row_group, scalar_reader->offset_index_, page_idx, &row_range);
+        skip_ranges.push_back(row_range);
+      }
+    }
+  }
+  if (skip_ranges.empty()) return Status::OK();
+
+  for (BaseScalarColumnReader* scalar_reader : scalar_readers_) {
+    const parquet::ColumnChunk& col_chunk = row_group.columns[scalar_reader->col_idx()];
+    if (col_chunk.offset_index_length > 0) {
+      parquet::OffsetIndex& offset_index = scalar_reader->offset_index_;
+      if (!offset_index.page_locations.empty()) continue;
+      RETURN_IF_ERROR(page_index_.DeserializeOffsetIndex(col_chunk, &offset_index));
+    } else {
+      // We can only filter pages based on the page index if we have the offset index
+      // for all columns.
+      return Status(Substitute("Found column index, but no offset index for '$0' in "
+          "file '$1'", scalar_reader->schema_element().name, filename()));
+    }
+  }
+  if (!ComputeCandidateRanges(row_group.num_rows, &skip_ranges, &candidate_ranges_)) {
+    return Status(Substitute("Invalid offset index in Parquet file $0.", filename()));
+  }
+  *filter_pages = true;
+  return Status::OK();
+}
+
+Status HdfsParquetScanner::ComputeCandidatePagesForColumns(bool* filter_pages) {
+  if (candidate_ranges_.empty()) return Status::OK();
+
+  parquet::RowGroup& row_group = file_metadata_.row_groups[row_group_idx_];
+  for (BaseScalarColumnReader* scalar_reader : scalar_readers_) {
+    const auto& page_locations = scalar_reader->offset_index_.page_locations;
+    if (!ComputeCandidatePages(page_locations, candidate_ranges_, row_group.num_rows,
+        &scalar_reader->candidate_data_pages_)) {
+      *filter_pages = false;
+      return Status(Substitute("Invalid offset index in Parquet file $0.", filename()));
+    }
+  }
+  for (BaseScalarColumnReader* scalar_reader : scalar_readers_) {
+    const auto& page_locations = scalar_reader->offset_index_.page_locations;
+    int total_page_count = page_locations.size();
+    int candidate_pages_count = scalar_reader->candidate_data_pages_.size();
+    COUNTER_ADD(num_stats_filtered_pages_counter_,
+        total_page_count - candidate_pages_count);
+    COUNTER_ADD(num_pages_counter_, total_page_count);
+  }
+  return Status::OK();
+}
+
 void HdfsParquetScanner::FlushRowGroupResources(RowBatch* row_batch) {
   DCHECK(row_batch != nullptr);
   row_batch->tuple_data_pool()->AcquireData(dictionary_pool_.get(), false);
@@ -973,6 +1131,7 @@ Status HdfsParquetScanner::AssembleRows(
       }
       last_num_tuples = scratch_batch_->num_tuples;
     }
+    RETURN_IF_ERROR(CheckPageFiltering());
     num_rows_read += scratch_batch_->num_tuples;
     int num_row_to_commit = TransferScratchTuples(row_batch);
     RETURN_IF_ERROR(CommitRows(row_batch, num_row_to_commit));
@@ -986,6 +1145,20 @@ Status HdfsParquetScanner::AssembleRows(
   return Status::OK();
 }
 
+Status HdfsParquetScanner::CheckPageFiltering() {
+  if (candidate_ranges_.empty() || scalar_readers_.empty()) return Status::OK();
+
+  int64_t current_row = scalar_readers_[0]->LastProcessedRow();
+  for (int i = 1; i < scalar_readers_.size(); ++i) {
+    if (current_row != scalar_readers_[i]->LastProcessedRow()) {
+      DCHECK(false);
+      return Status(Substitute(
+          "Top level rows aren't in sync during page filtering in file $0.", filename()));
+    }
+  }
+  return Status::OK();
+}
+
 Status HdfsParquetScanner::CommitRows(RowBatch* dst_batch, int num_rows) {
   DCHECK(dst_batch != nullptr);
   dst_batch->CommitRows(num_rows);
@@ -1383,6 +1556,9 @@ Status HdfsParquetScanner::CreateColumnReaders(const TupleDescriptor& tuple_desc
           static_cast<CollectionColumnReader*>(col_reader);
       RETURN_IF_ERROR(CreateColumnReaders(
           *item_tuple_desc, schema_resolver, collection_reader->children()));
+    } else {
+      scalar_reader_map_[node->col_idx] = static_cast<BaseScalarColumnReader*>(
+          col_reader);
     }
   }
 
@@ -1642,13 +1818,27 @@ Status HdfsParquetScanner::ValidateEndOfRowGroup(
       << parse_status_.GetDetail();
 
   if (column_readers[0]->max_rep_level() == 0) {
-    // These column readers materialize table-level values (vs. collection values). Test
-    // if the expected number of rows from the file metadata matches the actual number of
-    // rows read from the file.
-    int64_t expected_rows_in_group = file_metadata_.row_groups[row_group_idx].num_rows;
-    if (rows_read != expected_rows_in_group) {
-      return Status(TErrorCode::PARQUET_GROUP_ROW_COUNT_ERROR, filename(), row_group_idx,
-          expected_rows_in_group, rows_read);
+    if (candidate_ranges_.empty()) {
+      // These column readers materialize table-level values (vs. collection values).
+      // Test if the expected number of rows from the file metadata matches the actual
+      // number of rows read from the file.
+      int64_t expected_rows_in_group = file_metadata_.row_groups[row_group_idx].num_rows;
+      if (rows_read != expected_rows_in_group) {
+        return Status(TErrorCode::PARQUET_GROUP_ROW_COUNT_ERROR, filename(),
+            row_group_idx, expected_rows_in_group, rows_read);
+      }
+    } else {
+      // In this case we filter out row ranges. Validate that the number of rows read
+      // matches the number of rows determined by the candidate row ranges.
+      int64_t expected_rows_in_group = 0;
+      for (auto& range : candidate_ranges_) {
+        expected_rows_in_group += range.last - range.first + 1;
+      }
+      if (rows_read != expected_rows_in_group) {
+        return Status(Substitute("Based on the page index of group $0($1) there are $2 ",
+            "rows need to be scanned, but $3 rows were read.", filename(), row_group_idx,
+            expected_rows_in_group, rows_read));
+      }
     }
   }
 
@@ -1671,14 +1861,25 @@ Status HdfsParquetScanner::ValidateEndOfRowGroup(
     // fail if this not the case though, since num_values_read_ is only updated at the end
     // of a data page).
     if (num_values_read == -1) num_values_read = reader->num_values_read_;
-    DCHECK_EQ(reader->num_values_read_, num_values_read);
+    if (candidate_ranges_.empty()) DCHECK_EQ(reader->num_values_read_, num_values_read);
     // ReadDataPage() uses metadata_->num_values to determine when the column's done
-    DCHECK(reader->num_values_read_ == reader->metadata_->num_values ||
-        !state_->abort_on_error());
+    DCHECK(!candidate_ranges_.empty() ||
+        (reader->num_values_read_ == reader->metadata_->num_values ||
+        !state_->abort_on_error()));
   }
   return Status::OK();
 }
 
+ColumnStatsReader HdfsParquetScanner::CreateColumnStatsReader(
+    const parquet::ColumnChunk& col_chunk, const ColumnType& col_type,
+    const parquet::ColumnOrder* col_order, const parquet::SchemaElement& element) {
+  ColumnStatsReader stats_reader(col_chunk, col_type, col_order, element);
+  if (col_type.IsTimestampType()) {
+    stats_reader.SetTimestampDecoder(CreateTimestampDecoder(element));
+  }
+  return stats_reader;
+}
+
 ParquetTimestampDecoder HdfsParquetScanner::CreateTimestampDecoder(
     const parquet::SchemaElement& element) {
   bool timestamp_conversion_needed_for_int96_timestamps =
@@ -1697,4 +1898,5 @@ void HdfsParquetScanner::UpdateUncompressedPageSizeCounter(
     int64_t uncompressed_page_size) {
   parquet_uncompressed_page_size_counter_->UpdateCounter(uncompressed_page_size);
 }
+
 }
diff --git a/be/src/exec/parquet/hdfs-parquet-scanner.h b/be/src/exec/parquet/hdfs-parquet-scanner.h
index 09da2bb..ae47cf8 100644
--- a/be/src/exec/parquet/hdfs-parquet-scanner.h
+++ b/be/src/exec/parquet/hdfs-parquet-scanner.h
@@ -21,8 +21,10 @@
 
 #include "codegen/impala-ir.h"
 #include "exec/hdfs-scanner.h"
+#include "exec/parquet/parquet-column-stats.h"
 #include "exec/parquet/parquet-common.h"
 #include "exec/parquet/parquet-metadata-utils.h"
+#include "exec/parquet/parquet-page-index.h"
 #include "exec/parquet/parquet-scratch-tuple-batch.h"
 #include "runtime/scoped-buffer.h"
 #include "util/runtime-profile-counters.h"
@@ -41,6 +43,7 @@ class ParquetLevelDecoder;
 /// Per column reader.
 class ParquetColumnReader;
 class CollectionColumnReader;
+class ColumnStatsReader;
 class BaseScalarColumnReader;
 template<typename InternalType, parquet::Type::type PARQUET_TYPE, bool MATERIALIZED>
 class ScalarColumnReader;
@@ -321,6 +324,15 @@ class BoolColumnReader;
 /// excluded from output. Only partition-column filters are applied at AssembleRows(). The
 /// FilterContexts for these filters are cloned from the parent scan node and attached to
 /// the ScannerContext.
+///
+/// ---- Page filtering ----
+/// A Parquet file can contain a so called "page index". It has two parts, a column index
+/// and an offset index. The column index contains statistics like minimum and maximum
+/// values for each page. The offset index contains information about page locations in
+/// the Parquet file and top-level row ranges. HdfsParquetScanner evaluates the min/max
+/// conjuncts against the column index and determines the surviving pages with the help of
+/// the offset index. Then it will configure the column readers to only scan the pages
+/// and row ranges that have a chance to store rows that pass the conjuncts.
 class HdfsParquetScanner : public HdfsScanner {
  public:
   HdfsParquetScanner(HdfsScanNodeBase* scan_node, RuntimeState* state);
@@ -343,6 +355,11 @@ class HdfsParquetScanner : public HdfsScanner {
       llvm::Function** process_scratch_batch_fn)
       WARN_UNUSED_RESULT;
 
+  /// Helper function to create ColumnStatsReader object. 'col_order' might be NULL.
+  ColumnStatsReader CreateColumnStatsReader(
+      const parquet::ColumnChunk& col_chunk, const ColumnType& col_type,
+      const parquet::ColumnOrder* col_order, const parquet::SchemaElement& element);
+
   /// Initializes a ParquetTimestampDecoder depending on writer, timezone, and the schema
   /// of the column.
   ParquetTimestampDecoder CreateTimestampDecoder(const parquet::SchemaElement& element);
@@ -358,6 +375,7 @@ class HdfsParquetScanner : public HdfsScanner {
   friend class ScalarColumnReader;
   friend class BoolColumnReader;
   friend class HdfsParquetScannerTest;
+  friend class ParquetPageIndex;
 
   /// Index of the current row group being processed. Initialized to -1 which indicates
   /// that we have not started processing the first row group yet (GetNext() has not yet
@@ -408,6 +426,11 @@ class HdfsParquetScanner : public HdfsScanner {
   /// pages in a column chunk.
   boost::scoped_ptr<MemPool> dictionary_pool_;
 
+  /// Contains the leftover ranges after evaluating the page index.
+  /// If all rows were eliminated, then the row group is skipped immediately after
+  /// evaluating the page index.
+  std::vector<RowRange> candidate_ranges_;
+
   /// Column readers that are eligible for dictionary filtering.
   /// These are pointers to elements of column_readers_. Materialized columns that are
   /// dictionary encoded correspond to scalar columns that are either top-level columns
@@ -426,6 +449,9 @@ class HdfsParquetScanner : public HdfsScanner {
   /// Flattened collection column readers that point to readers in column_readers_.
   std::vector<CollectionColumnReader*> collection_readers_;
 
+  /// Mapping from Parquet column indexes to scalar readers.
+  std::unordered_map<int, BaseScalarColumnReader*> scalar_reader_map_;
+
   /// Memory used to store the tuples used for dictionary filtering. Tuples owned by
   /// perm_pool_.
   std::unordered_map<const TupleDescriptor*, Tuple*> dict_filter_tuple_map_;
@@ -436,15 +462,29 @@ class HdfsParquetScanner : public HdfsScanner {
   /// Average and min/max time spent processing the footer by each split.
   RuntimeProfile::SummaryStatsCounter* process_footer_timer_stats_;
 
+  /// Average and min/max time spent processing the page index for each row group.
+  RuntimeProfile::SummaryStatsCounter* process_page_index_stats_;
+
   /// Number of columns that need to be read.
   RuntimeProfile::Counter* num_cols_counter_;
 
-  /// Number of row groups that are skipped because of Parquet row group statistics.
+  /// Number of row groups that are skipped because of Parquet statistics, either by
+  /// row group level statistics, or page level statistics.
   RuntimeProfile::Counter* num_stats_filtered_row_groups_counter_;
 
   /// Number of row groups that need to be read.
   RuntimeProfile::Counter* num_row_groups_counter_;
 
+  /// Number of row groups with page index.
+  RuntimeProfile::Counter* num_row_groups_with_page_index_counter_;
+
+  /// Number of pages that are skipped because of Parquet page statistics.
+  RuntimeProfile::Counter* num_stats_filtered_pages_counter_;
+
+  /// Number of pages need to be examined. We need to scan
+  /// 'num_pages_counter_ - num_stats_filtered_pages_counter_' pages.
+  RuntimeProfile::Counter* num_pages_counter_;
+
   /// Number of scanners that end up doing no reads because their splits don't overlap
   /// with the midpoint of any row-group in the file.
   RuntimeProfile::Counter* num_scanners_with_no_reads_counter_;
@@ -472,6 +512,8 @@ class HdfsParquetScanner : public HdfsScanner {
   /// The codegen'd version of ProcessScratchBatch() if available, NULL otherwise.
   ProcessScratchBatchFn codegend_process_scratch_batch_fn_;
 
+  ParquetPageIndex page_index_;
+
   const char* filename() const { return metadata_range_->file(); }
 
   virtual Status GetNextInternal(RowBatch* row_batch) WARN_UNUSED_RESULT;
@@ -490,6 +532,31 @@ class HdfsParquetScanner : public HdfsScanner {
   /// to be OK as well.
   Status NextRowGroup() WARN_UNUSED_RESULT;
 
+  /// High-level function for initializing page filtering for the scalar readers.
+  /// Sets 'filter_pages' to true if found any page to filter out.
+  Status ProcessPageIndex(bool* filter_pages);
+
+  /// Evaluates 'min_max_conjunct_evals_' against the column index and determines the row
+  /// ranges that might contain data we are looking for.
+  /// Sets 'filter_pages' to true if found any page to filter out.
+  Status EvaluatePageIndex(bool* filter_pages);
+
+  /// Based on 'candidate_ranges_' it determines the candidate pages for each
+  /// scalar reader.
+  Status ComputeCandidatePagesForColumns(bool* filter_pages);
+
+  /// Check that the scalar readers agree on the top-level row being scanned.
+  Status CheckPageFiltering();
+
+  /// Reads statistics data for a page of given column. Page is specified by 'page_idx',
+  /// column is specified by 'column_index'. 'stats_field' specifies whether we should
+  /// read the min or max value. 'is_null_page' is set to true for null pages, in that
+  /// case there is no min nor max value. Returns true if the read of the min or max
+  /// value was successful, in this case 'slot' will hold the min or max value.
+  static bool ReadStatFromIndex(const ColumnStatsReader& stats_reader,
+      const parquet::ColumnIndex& column_index, int page_idx,
+      ColumnStatsReader::StatsField stats_field, bool* is_null_page, void* slot);
+
   /// Reads data using 'column_readers' to materialize top-level tuples into 'row_batch'.
   /// Returns a non-OK status if a non-recoverable error was encountered and execution
   /// of this query should be terminated immediately.
diff --git a/be/src/exec/parquet/parquet-column-readers.cc b/be/src/exec/parquet/parquet-column-readers.cc
index 58cf9dc..16cdc2c 100644
--- a/be/src/exec/parquet/parquet-column-readers.cc
+++ b/be/src/exec/parquet/parquet-column-readers.cc
@@ -35,7 +35,6 @@
 #include "runtime/exec-env.h"
 #include "runtime/io/disk-io-mgr.h"
 #include "runtime/io/request-context.h"
-#include "runtime/mem-pool.h"
 #include "runtime/runtime-state.h"
 #include "runtime/tuple-row.h"
 #include "runtime/tuple.h"
@@ -201,6 +200,8 @@ class ScalarColumnReader : public BaseScalarColumnReader {
 
   virtual Status InitDataPage(uint8_t* data, int size) override;
 
+  virtual bool SkipEncodedValuesInPage(int64_t num_values) override;
+
  private:
   /// Writes the next value into the appropriate destination slot in 'tuple'. Returns
   /// false if execution should be aborted for some reason, e.g. parse_error_ is set, the
@@ -385,6 +386,23 @@ Status ScalarColumnReader<bool, parquet::Type::BOOLEAN, true>::InitDataPage(
   return GetUnsupportedDecodingError();
 }
 
+template <typename InternalType, parquet::Type::type PARQUET_TYPE, bool MATERIALIZED>
+bool ScalarColumnReader<InternalType, PARQUET_TYPE,
+    MATERIALIZED>::SkipEncodedValuesInPage(int64_t num_values) {
+  if (bool_decoder_) {
+    return bool_decoder_->SkipValues(num_values);
+  }
+  if (page_encoding_ == Encoding::PLAIN_DICTIONARY) {
+    return dict_decoder_.SkipValues(num_values);
+  } else {
+    DCHECK_EQ(page_encoding_, Encoding::PLAIN);
+    int64_t encoded_len = ParquetPlainEncoder::EncodedLen<PARQUET_TYPE>(
+        data_, data_end_, fixed_len_size_, num_values);
+    if (encoded_len < 0) return false;
+    data_ += encoded_len;
+  }
+  return true;
+}
 
 template <typename InternalType, parquet::Type::type PARQUET_TYPE, bool MATERIALIZED>
 template <bool IN_COLLECTION>
@@ -434,7 +452,8 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE, MATERIALIZED>::ReadValueBatc
   bool continue_execution = true;
   while (val_count < max_values && !RowGroupAtEnd() && continue_execution) {
     DCHECK_GE(num_buffered_values_, 0);
-    // Read next page if necessary.
+    // Read next page if necessary. It will skip values if necessary, so we can start
+    // materializing the values right after.
     if (num_buffered_values_ == 0) {
       if (!NextPage()) {
         continue_execution = parent_->parse_status_.ok();
@@ -445,6 +464,8 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE, MATERIALIZED>::ReadValueBatc
     // Not materializing anything - skip decoding any levels and rely on the value
     // count from page metadata to return the correct number of rows.
     if (!MATERIALIZED && !IN_COLLECTION) {
+      // We cannot filter pages in this context.
+      DCHECK(!DoesPageFiltering());
       int vals_to_add = min(num_buffered_values_, max_values - val_count);
       val_count += vals_to_add;
       num_buffered_values_ -= vals_to_add;
@@ -453,10 +474,12 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE, MATERIALIZED>::ReadValueBatc
     }
     // Fill the rep level cache if needed. We are flattening out the fields of the
     // nested collection into the top-level tuple returned by the scan, so we don't
-    // care about the nesting structure unless the position slot is being populated.
-    if (IN_COLLECTION && pos_slot_desc_ != nullptr && !rep_levels_.CacheHasNext()) {
+    // care about the nesting structure unless the position slot is being populated,
+    // or we filter out rows.
+    if (IN_COLLECTION && (pos_slot_desc_ != nullptr || DoesPageFiltering()) &&
+        !rep_levels_.CacheHasNext()) {
       parent_->parse_status_.MergeStatus(
-          rep_levels_.CacheNextBatch(num_buffered_values_));
+            rep_levels_.CacheNextBatch(num_buffered_values_));
       if (UNLIKELY(!parent_->parse_status_.ok())) return false;
     }
 
@@ -483,6 +506,21 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE, MATERIALIZED>::ReadValueBatc
           remaining_val_capacity, tuple_size, next_tuple, &ret_val_count);
       val_count += ret_val_count;
     }
+    // Now that we have read some values, let's check whether we should skip some
+    // due to page filtering.
+    if (DoesPageFiltering() && ConsumedCurrentCandidateRange<IN_COLLECTION>()) {
+      if (IsLastCandidateRange()) {
+        *num_values = val_count;
+        num_buffered_values_ = 0;
+        return val_count > 0;
+      }
+      AdvanceCandidateRange();
+      if (PageHasRemainingCandidateRows()) {
+        if(!SkipRowsInPage()) return false;
+      } else {
+        if (!JumpToNextPage()) return false;
+      }
+    }
     if (SHOULD_TRIGGER_COL_READER_DEBUG_ACTION(val_count)) {
       continue_execution &= ColReaderDebugAction(&val_count);
     }
@@ -499,25 +537,33 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE, MATERIALIZED>::MaterializeVa
   DCHECK(MATERIALIZED || IN_COLLECTION);
   DCHECK_GT(num_buffered_values_, 0);
   DCHECK(def_levels_.CacheHasNext());
-  if (IN_COLLECTION && pos_slot_desc_ != nullptr) DCHECK(rep_levels_.CacheHasNext());
-  const int cache_start_idx = def_levels_.CacheCurrIdx();
+  if (IN_COLLECTION && (pos_slot_desc_ != nullptr || DoesPageFiltering())) {
+    DCHECK(rep_levels_.CacheHasNext());
+  }
+  int cache_start_idx = def_levels_.CacheCurrIdx();
   uint8_t* curr_tuple = tuple_mem;
   int val_count = 0;
   DCHECK_LE(def_levels_.CacheRemaining(), num_buffered_values_);
   max_values = min(max_values, num_buffered_values_);
   while (def_levels_.CacheHasNext() && val_count < max_values) {
+    if (DoesPageFiltering()) {
+      int peek_rep_level = IN_COLLECTION ? rep_levels_.PeekLevel() : 0;
+      if (RowsRemainingInCandidateRange() == 0 && peek_rep_level == 0) break;
+    }
+
+    int rep_level = IN_COLLECTION ? rep_levels_.ReadLevel() : 0;
+    if (rep_level == 0) ++current_row_;
+
     Tuple* tuple = reinterpret_cast<Tuple*>(curr_tuple);
     int def_level = def_levels_.CacheGetNext();
 
     if (IN_COLLECTION) {
       if (def_level < def_level_of_immediate_repeated_ancestor()) {
-        // A containing repeated field is empty or NULL. Skip the value but
-        // move to the next repetition level if necessary.
-        if (pos_slot_desc_ != nullptr) rep_levels_.CacheSkipLevels(1);
+        // A containing repeated field is empty or NULL, skip the value.
         continue;
       }
       if (pos_slot_desc_ != nullptr) {
-        ReadPositionBatched(rep_levels_.CacheGetNext(),
+        ReadPositionBatched(rep_level,
             tuple->GetBigIntSlot(pos_slot_desc_->tuple_offset()));
       }
     }
@@ -546,7 +592,10 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE,
     MATERIALIZED>::MaterializeValueBatchRepeatedDefLevel(int max_values, int tuple_size,
     uint8_t* RESTRICT tuple_mem, int* RESTRICT num_values) RESTRICT {
   DCHECK_GT(num_buffered_values_, 0);
-  if (pos_slot_desc_ != nullptr) DCHECK(rep_levels_.CacheHasNext());
+  if (max_rep_level_ > 0 &&
+      (pos_slot_desc_ != nullptr || DoesPageFiltering())) {
+    DCHECK(rep_levels_.CacheHasNext());
+  }
   int32_t def_level_repeats = def_levels_.NextRepeatedRunLength();
   DCHECK_GT(def_level_repeats, 0);
   // Peek at the def level. The number of def levels we'll consume depends on several
@@ -554,6 +603,7 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE,
   uint8_t def_level = def_levels_.GetRepeatedValue(0);
   int32_t num_def_levels_to_consume = 0;
 
+  // Find the upper limit of how many def levels we can consume.
   if (def_level < def_level_of_immediate_repeated_ancestor()) {
     DCHECK_GT(max_rep_level_, 0) << "Only possible if in a collection.";
     // A containing repeated field is empty or NULL. We don't need to return any values
@@ -561,18 +611,40 @@ bool ScalarColumnReader<InternalType, PARQUET_TYPE,
     if (pos_slot_desc_ != nullptr) {
       num_def_levels_to_consume =
           min<uint32_t>(def_level_repeats, rep_levels_.CacheRemaining());
-      rep_levels_.CacheSkipLevels(num_def_levels_to_consume);
     } else {
       num_def_levels_to_consume = def_level_repeats;
     }
-    *num_values = 0;
   } else {
     // Cannot consume more levels than allowed by buffered input values and output space.
-    num_def_levels_to_consume =
-        min(num_buffered_values_, min(max_values, def_level_repeats));
+    num_def_levels_to_consume = min(min(
+        num_buffered_values_, max_values), def_level_repeats);
     if (pos_slot_desc_ != nullptr) {
       num_def_levels_to_consume =
           min<uint32_t>(num_def_levels_to_consume, rep_levels_.CacheRemaining());
+    }
+  }
+  // Page filtering can also put an upper limit on 'num_def_levels_to_consume'.
+  if (DoesPageFiltering()) {
+    int rows_remaining = RowsRemainingInCandidateRange();
+    if (max_rep_level_ == 0) {
+      num_def_levels_to_consume = min(num_def_levels_to_consume, rows_remaining);
+      current_row_ += num_def_levels_to_consume;
+    } else {
+      // We need to calculate how many 'primitive' values are there until the end
+      // of the current candidate range. In the meantime we also fill the position
+      // slots because we are consuming the repetition levels.
+      num_def_levels_to_consume = FillPositionsInCandidateRange(rows_remaining,
+          num_def_levels_to_consume, tuple_mem, tuple_size);
+    }
+  }
+  // Now we have 'num_def_levels_to_consume' set, let's read the slots.
+  if (def_level < def_level_of_immediate_repeated_ancestor()) {
+    if (pos_slot_desc_ != nullptr && !DoesPageFiltering()) {
+      rep_levels_.CacheSkipLevels(num_def_levels_to_consume);
+    }
+    *num_values = 0;
+  } else {
+    if (pos_slot_desc_ != nullptr && !DoesPageFiltering()) {
       ReadPositions(num_def_levels_to_consume, tuple_size, tuple_mem);
     }
     if (MATERIALIZED) {
@@ -974,6 +1046,29 @@ static bool RequiresSkippedDictionaryHeaderCheck(
   return v.VersionEq(1,1,0) || (v.VersionEq(1,2,0) && v.is_impala_internal);
 }
 
+void BaseScalarColumnReader::CreateSubRanges(vector<ScanRange::SubRange>* sub_ranges) {
+  sub_ranges->clear();
+  if (!DoesPageFiltering()) return;
+  int64_t data_start = metadata_->data_page_offset;
+  int64_t data_start_based_on_offset_index = offset_index_.page_locations[0].offset;
+  if (metadata_->__isset.dictionary_page_offset) {
+    int64_t dict_start = metadata_->dictionary_page_offset;
+    // This assumes that the first data page is coming right after the dictionary page
+    sub_ranges->push_back( { dict_start, data_start - dict_start });
+  } else if (data_start < data_start_based_on_offset_index) {
+    // 'dictionary_page_offset' is not set, but the offset index and
+    // column chunk metadata disagree on the data start => column chunk's data start
+    // is actually the location of the dictionary page. Parquet-MR (at least
+    // version 1.10 and earlier versions) writes Parquet files like that.
+    int64_t dict_start = data_start;
+    sub_ranges->push_back({dict_start, data_start_based_on_offset_index - dict_start});
+  }
+  for (int candidate_page_idx : candidate_data_pages_) {
+    auto page_loc = offset_index_.page_locations[candidate_page_idx];
+    sub_ranges->push_back( { page_loc.offset, page_loc.compressed_page_size });
+  }
+}
+
 Status BaseScalarColumnReader::Reset(const HdfsFileDesc& file_desc,
     const parquet::ColumnChunk& col_chunk, int row_group_idx) {
   // Ensure metadata is valid before using it to initialize the reader.
@@ -1020,6 +1115,7 @@ Status BaseScalarColumnReader::Reset(const HdfsFileDesc& file_desc,
     int64_t bytes_remaining = file_desc.file_length - col_end;
     int64_t pad = min<int64_t>(MAX_DICT_HEADER_SIZE, bytes_remaining);
     col_len += pad;
+    col_end += pad;
   }
 
   // TODO: this will need to change when we have co-located files and the columns
@@ -1037,9 +1133,11 @@ Status BaseScalarColumnReader::Reset(const HdfsFileDesc& file_desc,
   bool col_range_local = split_range->expected_local()
       && col_start >= split_range->offset()
       && col_end <= split_range->offset() + split_range->len();
+  vector<ScanRange::SubRange> sub_ranges;
+  CreateSubRanges(&sub_ranges);
   scan_range_ = parent_->scan_node_->AllocateScanRange(metadata_range->fs(),
-      filename(), col_len, col_start, partition_id, split_range->disk_id(),
-      col_range_local, split_range->is_erasure_coded(),
+      filename(), col_len, col_start, move(sub_ranges), partition_id,
+      split_range->disk_id(),col_range_local, split_range->is_erasure_coded(),
       BufferOpts(split_range->try_cache(), file_desc.mtime));
   ClearDictionaryDecoder();
   return Status::OK();
@@ -1297,7 +1395,9 @@ Status BaseScalarColumnReader::ReadDataPage() {
   // the pages).
   while (true) {
     DCHECK_EQ(num_buffered_values_, 0);
-    if (num_values_read_ == metadata_->num_values) {
+    if ((DoesPageFiltering() &&
+         candidate_page_idx_ == candidate_data_pages_.size() - 1) ||
+        num_values_read_ == metadata_->num_values) {
       // No more pages to read
       // TODO: should we check for stream_->eosr()?
       break;
@@ -1354,6 +1454,7 @@ Status BaseScalarColumnReader::ReadDataPage() {
       return Status(Substitute("Error reading data page in Parquet file '$0'. "
           "Invalid number of values in metadata: $1", filename(), num_values));
     }
+
     num_buffered_values_ = num_values;
     num_values_read_ += num_buffered_values_;
 
@@ -1421,6 +1522,11 @@ Status BaseScalarColumnReader::ReadDataPage() {
 
     // Data can be empty if the column contains all NULLs
     RETURN_IF_ERROR(InitDataPage(data_, data_size));
+
+    // Skip rows if needed.
+    RETURN_IF_ERROR(StartPageFiltering());
+
+    if (parent_->candidate_ranges_.empty()) COUNTER_ADD(parent_->num_pages_counter_, 1);
     break;
   }
 
@@ -1443,9 +1549,24 @@ template <bool ADVANCE_REP_LEVEL>
 bool BaseScalarColumnReader::NextLevels() {
   if (!ADVANCE_REP_LEVEL) DCHECK_EQ(max_rep_level(), 0) << slot_desc()->DebugString();
 
+  levels_readahead_ = true;
   if (UNLIKELY(num_buffered_values_ == 0)) {
     if (!NextPage()) return parent_->parse_status_.ok();
   }
+  if (DoesPageFiltering() && RowsRemainingInCandidateRange() == 0) {
+    if (!ADVANCE_REP_LEVEL || max_rep_level() == 0 || rep_levels_.PeekLevel() == 0) {
+      if (!IsLastCandidateRange()) AdvanceCandidateRange();
+      if (PageHasRemainingCandidateRows()) {
+        auto current_range = parent_->candidate_ranges_[current_row_range_];
+        int64_t skip_rows = current_range.first - current_row_ - 1;
+        DCHECK_GE(skip_rows, 0);
+        if (!SkipTopLevelRows(skip_rows)) return false;
+      } else {
+        if (!JumpToNextPage()) return parent_->parse_status_.ok();
+      }
+    }
+  }
+
   --num_buffered_values_;
   DCHECK_GE(num_buffered_values_, 0);
 
@@ -1467,11 +1588,164 @@ bool BaseScalarColumnReader::NextLevels() {
     }
     // Reset position counter if we are at the start of a new parent collection.
     if (rep_level_ <= max_rep_level() - 1) pos_current_value_ = 0;
+    if (rep_level_ == 0) ++current_row_;
+  } else {
+    ++current_row_;
   }
 
   return parent_->parse_status_.ok();
 }
 
+void BaseScalarColumnReader::ResetPageFiltering() {
+  offset_index_.page_locations.clear();
+  candidate_data_pages_.clear();
+  candidate_page_idx_ = -1;
+  current_row_ = -1;
+  levels_readahead_ = false;
+}
+
+Status BaseScalarColumnReader::StartPageFiltering() {
+  if (!DoesPageFiltering()) return Status::OK();
+  ++candidate_page_idx_;
+  current_row_ = FirstRowIdxInCurrentPage() - 1;
+  // Move to the next candidate range.
+  auto& candidate_row_ranges = parent_->candidate_ranges_;
+  while (current_row_ >= candidate_row_ranges[current_row_range_].last) {
+    DCHECK_LT(current_row_range_, candidate_row_ranges.size() - 1);
+    ++current_row_range_;
+  }
+  int64_t range_start = candidate_row_ranges[current_row_range_].first;
+  if (range_start > current_row_ + 1) {
+    int64_t skip_rows = range_start - current_row_ - 1;
+    if (!SkipTopLevelRows(skip_rows)) {
+      return Status(Substitute("Couldn't skip rows in file $0.", filename()));
+    }
+    DCHECK_EQ(current_row_, range_start - 1);
+  }
+  return Status::OK();
+}
+
+bool BaseScalarColumnReader::SkipTopLevelRows(int64_t num_rows) {
+  DCHECK_GE(num_buffered_values_, num_rows);
+  // Fastest path: field is required and not nested.
+  // So row count equals value count, and every value is stored in the page data.
+  if (max_def_level() == 0 && max_rep_level() == 0) {
+    current_row_ += num_rows;
+    num_buffered_values_ -= num_rows;
+    return SkipEncodedValuesInPage(num_rows);
+  }
+  int64_t num_values_to_skip = 0;
+  if (max_rep_level() == 0) {
+    // No nesting, but the field is not required.
+    // Skip as many values in the page data as the number of non-NULL values encountered.
+    int i = 0;
+    while (i < num_rows) {
+      int repeated_run_length = def_levels_.NextRepeatedRunLength();
+      if (repeated_run_length > 0) {
+        int read_count = min<int64_t>(num_rows - i, repeated_run_length);
+        int16_t def_level = def_levels_.GetRepeatedValue(read_count);
+        if (def_level >= max_def_level_) num_values_to_skip += read_count;
+        i += read_count;
+        num_buffered_values_ -= read_count;
+      } else if (def_levels_.CacheHasNext()) {
+        int read_count = min<int64_t>(num_rows - i, def_levels_.CacheRemaining());
+        for (int j = 0; j < read_count; ++j) {
+          if (def_levels_.CacheGetNext() >= max_def_level_) ++num_values_to_skip;
+        }
+        i += read_count;
+        num_buffered_values_ -= read_count;
+      } else {
+        if (!def_levels_.CacheNextBatch(num_buffered_values_).ok()) return false;
+      }
+    }
+    current_row_ += num_rows;
+  } else {
+    // 'rep_level_' being zero denotes the start of a new top-level row.
+    // From the 'def_level_' we can determine the number of non-NULL values.
+    while (!(num_rows == 0 && rep_levels_.PeekLevel() == 0)) {
+      def_level_ = def_levels_.ReadLevel();
+      rep_level_ = rep_levels_.ReadLevel();
+      --num_buffered_values_;
+      if (def_level_ >= max_def_level()) ++num_values_to_skip;
+      if (rep_level_ == 0) {
+        ++current_row_;
+        --num_rows;
+      }
+    }
+  }
+  return SkipEncodedValuesInPage(num_values_to_skip);
+}
+
+int BaseScalarColumnReader::FillPositionsInCandidateRange(int rows_remaining,
+    int max_values, uint8_t* RESTRICT tuple_mem, int tuple_size) {
+  DCHECK_GT(max_rep_level_, 0);
+  DCHECK_EQ(rows_remaining, RowsRemainingInCandidateRange());
+  int row_count = 0;
+  int val_count = 0;
+  int64_t *pos_slot = nullptr;
+  if (pos_slot_desc_ != nullptr) {
+    const int pos_slot_offset = pos_slot_desc()->tuple_offset();
+    pos_slot = reinterpret_cast<Tuple*>(tuple_mem)->GetBigIntSlot(pos_slot_offset);
+  }
+  StrideWriter<int64_t> pos_writer{pos_slot, tuple_size};
+  while (rep_levels_.CacheRemaining() && row_count <= rows_remaining &&
+         val_count < max_values) {
+    if (row_count == rows_remaining && rep_levels_.CachePeekNext() == 0) break;
+    int rep_level = rep_levels_.CacheGetNext();
+    if (rep_level == 0) ++row_count;
+    ++val_count;
+    if (pos_slot_desc_ != nullptr) {
+      if (rep_level <= max_rep_level() - 1) pos_current_value_ = 0;
+      *pos_writer.Advance() = pos_current_value_++;
+    }
+  }
+  current_row_ += row_count;
+  return val_count;
+}
+
+void BaseScalarColumnReader::AdvanceCandidateRange() {
+  DCHECK(DoesPageFiltering());
+  auto& candidate_ranges = parent_->candidate_ranges_;
+  DCHECK_LT(current_row_range_, candidate_ranges.size());
+  DCHECK_EQ(current_row_, candidate_ranges[current_row_range_].last);
+  ++current_row_range_;
+  DCHECK_LE(current_row_, candidate_ranges[current_row_range_].last);
+}
+
+bool BaseScalarColumnReader::PageHasRemainingCandidateRows() const {
+  DCHECK(DoesPageFiltering());
+  DCHECK_LT(current_row_range_, parent_->candidate_ranges_.size());
+  auto current_range = parent_->candidate_ranges_[current_row_range_];
+  if (candidate_page_idx_ != candidate_data_pages_.size() - 1) {
+    auto& next_page_loc =
+        offset_index_.page_locations[candidate_data_pages_[candidate_page_idx_+1]];
+    // If the next page contains rows with index higher than the start of the
+    // current candidate range, it means we still have interesting rows in the
+    // current page.
+    return next_page_loc.first_row_index > current_range.first;
+  }
+  if (candidate_page_idx_ == candidate_data_pages_.size() - 1) {
+    // We are in the last page; we need to skip rows if the current top-level row
+    // precedes the next candidate range.
+    return current_row_ < current_range.first;
+  }
+  return false;
+}
+
+bool BaseScalarColumnReader::SkipRowsInPage() {
+  auto current_range = parent_->candidate_ranges_[current_row_range_];
+  DCHECK_LT(current_row_, current_range.first);
+  int64_t skip_rows = current_range.first - current_row_ - 1;
+  DCHECK_GE(skip_rows, 0);
+  return SkipTopLevelRows(skip_rows);
+}
+
+bool BaseScalarColumnReader::JumpToNextPage() {
+  DCHECK(DoesPageFiltering());
+  num_buffered_values_ = 0;
+  return NextPage();
+}
+
 Status BaseScalarColumnReader::GetUnsupportedDecodingError() {
   return Status(Substitute(
       "File '$0' is corrupt: unexpected encoding: $1 for data page of column '$2'.",
diff --git a/be/src/exec/parquet/parquet-column-readers.h b/be/src/exec/parquet/parquet-column-readers.h
index cde4fe1..6571379 100644
--- a/be/src/exec/parquet/parquet-column-readers.h
+++ b/be/src/exec/parquet/parquet-column-readers.h
@@ -22,6 +22,7 @@
 
 #include "exec/parquet/hdfs-parquet-scanner.h"
 #include "exec/parquet/parquet-level-decoder.h"
+#include "runtime/io/request-ranges.h"
 #include "util/bit-stream-utils.h"
 #include "util/codec.h"
 
@@ -340,7 +341,8 @@ class BaseScalarColumnReader : public ParquetColumnReader {
   // Less frequently used members that are not accessed in inner loop should go below
   // here so they do not occupy precious cache line space.
 
-  /// The number of values seen so far. Updated per data page.
+  /// The number of values seen so far. Updated per data page. It is only used for
+  /// validation. It is not used when we filter rows based on the page index.
   int64_t num_values_read_ = 0;
 
   /// Metadata for the column for the current row group.
@@ -367,6 +369,50 @@ class BaseScalarColumnReader : public ParquetColumnReader {
   /// Header for current data page.
   parquet::PageHeader current_page_header_;
 
+  /////////////////////////////////////////
+  /// BEGIN: Members used for page filtering
+  /// They are only set when page filtering is in use.
+
+  /// The parquet OffsetIndex of this column chunk. It stores information about the page
+  /// locations and row indexes.
+  parquet::OffsetIndex offset_index_;
+
+  /// Indexes of the data pages that we are going to read. When we use page filtering,
+  /// we issue a scan-range with sub-ranges that belong to the candidate data pages, i.e.
+  /// we will not even see the bytes of the filtered out pages.
+  /// It is set in HdfsParquetScanner::CalculateCandidatePagesForColumns().
+  std::vector<int> candidate_data_pages_;
+
+  /// Stores an index into 'candidate_data_pages_'. When we have candidate pages, it
+  /// denotes the data page currently being read.
+  int candidate_page_idx_ = -1;
+
+  /// Stores an index into 'parent_->candidate_ranges_'. When we have candidate pages, we
+  /// are processing values in this range. When we leave this range, we need to skip
+  /// rows and increment this field.
+  int current_row_range_ = 0;
+
+  /// Index of the current top-level row. It is updated together with the rep/def levels.
+  /// After an update, a value of N means that we have already completely processed the
+  /// Nth row.
+  int64_t current_row_ = -1;
+
+  /// This flag is needed for the proper tracking of the last processed row.
+  /// The batched and non-batched interfaces behave differently. E.g. when using the
+  /// batched interface you don't need to invoke NextLevels() in advance, while you need
+  /// to do that for the non-batched interface. In fact, the batched interface doesn't
+  /// call NextLevels() at all. It directly reads the levels and then the corresponding
+  /// value in a loop. On the other hand, the non-batched interface (ReadValue()) expects
+  /// that the levels for the next value are already read via NextLevels(). And after
+  /// reading the value it calls NextLevels() to read the levels of the next value.
+  /// Hence, the levels are always read ahead in this case.
+  /// This flag is true if we have read the def and rep levels ahead. In that case
+  /// 'current_row_' points to the row we'll process next, not to the row we have
+  /// already processed.
+  bool levels_readahead_ = false;
+
+  /// END: Members used for page filtering
+  /////////////////////////////////////////
+
   /// Reads the next page header into next_page_header/next_header_size.
   /// If the stream reaches the end before reading a complete page header,
   /// eos is set to true. If peek is false, the stream position is advanced
@@ -422,6 +468,91 @@ class BaseScalarColumnReader : public ParquetColumnReader {
         && slot_desc_ != nullptr && slot_desc_->type().IsVarLenStringType();
   }
 
+  /// Resets structures related to page filtering.
+  void ResetPageFiltering();
+
+  /// Must be invoked when starting a new page. Updates the structures related to page
+  /// filtering and skips the first rows if needed.
+  Status StartPageFiltering();
+
+  /// Returns the index of the row that was processed most recently.
+  int64_t LastProcessedRow() const {
+    if (def_level_ == ParquetLevel::ROW_GROUP_END) return current_row_;
+    return levels_readahead_ ? current_row_ - 1 : current_row_;
+  }
+
+  /// Creates sub-ranges if page filtering is active.
+  void CreateSubRanges(std::vector<io::ScanRange::SubRange>* sub_ranges);
+
+  /// Calculates how many encoded values we need to skip in the page data, then
+  /// invokes SkipEncodedValuesInPage(). The number of encoded values depends on the
+  /// nesting of the data and also on the number of NULL values.
+  /// E.g. if 'num_rows' is 10, and every row contains an array of 10 integers, then
+  /// we need to skip 100 encoded values in the page data.
+  /// And, if 'num_rows' is 10, and every second value is NULL, then we only need to skip
+  /// 5 values in the page data since NULL values are not stored there.
+  /// The number of primitive values can be calculated from the def and rep levels.
+  /// Returns true on success, false otherwise.
+  bool SkipTopLevelRows(int64_t num_rows);
+
+  /// Skip values in the page data. Returns true on success, false otherwise.
+  virtual bool SkipEncodedValuesInPage(int64_t num_values) = 0;
+
+  /// Only valid to call this function when we filter out pages based on the page index.
+  /// Returns the RowGroup-level index of the starting row in the candidate page.
+  int64_t FirstRowIdxInCurrentPage() const {
+    DCHECK(!candidate_data_pages_.empty());
+    return offset_index_.page_locations[
+        candidate_data_pages_[candidate_page_idx_]].first_row_index;
+  }
+
+  /// The number of top-level rows until the end of the current candidate range.
+  /// For simple columns it returns 0 if we have processed the last row in the current
+  /// range. For nested columns, it returns 0 when we are processing values from the last
+  /// row in the current row range.
+  int RowsRemainingInCandidateRange() const {
+    DCHECK(!candidate_data_pages_.empty());
+    return parent_->candidate_ranges_[current_row_range_].last - current_row_;
+  }
+
+  /// Returns true if we are filtering pages.
+  bool DoesPageFiltering() const {
+    return !candidate_data_pages_.empty();
+  }
+
+  bool IsLastCandidateRange() const {
+    return current_row_range_ == parent_->candidate_ranges_.size() - 1;
+  }
+
+  template <bool IN_COLLECTION>
+  bool ConsumedCurrentCandidateRange() {
+    return RowsRemainingInCandidateRange() == 0 &&
+        (!IN_COLLECTION || max_rep_level() == 0 || num_buffered_values_ == 0 ||
+            rep_levels_.PeekLevel() == 0);
+  }
+
+  /// This function fills the position slots until it reaches 'max_values', the end of
+  /// the current candidate range, or the end of the cached repetition levels, whichever
+  /// comes first. It returns the number of values processed. It can also be used without
+  /// a position slot; in that case it only counts the values up to the first limit it
+  /// reaches. It consumes the cached repetition levels.
+  int FillPositionsInCandidateRange(int rows_remaining, int max_values,
+      uint8_t* RESTRICT tuple_mem, int tuple_size);
+
+  /// Advance to the next candidate range that contains 'current_row_'.
+  /// Cannot advance past the last candidate range.
+  void AdvanceCandidateRange();
+
+  /// Returns true if the current candidate row range has some rows in the current page.
+  bool PageHasRemainingCandidateRows() const;
+
+  /// Skip top level rows in current page until current candidate range is reached.
+  bool SkipRowsInPage();
+
+  /// Invoke this when there aren't any more interesting rows in the current page based
+  /// on the page index. It starts reading the next page.
+  bool JumpToNextPage();
+
   /// Slow-path status construction code for def/rep decoding errors. 'level_name' is
   /// either "rep" or "def", 'decoded_level' is the value returned from
   /// ParquetLevelDecoder::ReadLevel() and 'max_level' is the maximum allowed value.
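The header comment on SkipTopLevelRows() above describes how the number of encoded values to skip is derived from the def/rep levels. The following standalone sketch (not part of this patch) illustrates the counting rule for the simplest non-trivial case, a flat nullable column, where only non-NULL rows occupy space in the page data; 'EncodedValuesToSkip' is a hypothetical helper name.

  #include <cstdint>
  #include <iostream>
  #include <vector>

  // Only def levels reaching 'max_def_level' correspond to values that are physically
  // stored in the page data, so those are the ones that must be skipped there.
  int64_t EncodedValuesToSkip(const std::vector<int16_t>& def_levels,
      int16_t max_def_level, int64_t num_rows_to_skip) {
    int64_t num_values_to_skip = 0;
    for (int64_t i = 0; i < num_rows_to_skip; ++i) {
      if (def_levels[i] >= max_def_level) ++num_values_to_skip;  // non-NULL value
    }
    return num_values_to_skip;
  }

  int main() {
    // Every second value is NULL (def level 0), as in the comment's example:
    // skipping 10 rows only skips 5 values in the page data.
    std::vector<int16_t> def_levels = {1, 0, 1, 0, 1, 0, 1, 0, 1, 0};
    std::cout << EncodedValuesToSkip(def_levels, 1, 10) << std::endl;  // prints 5
    return 0;
  }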
diff --git a/be/src/exec/parquet/parquet-column-stats.cc b/be/src/exec/parquet/parquet-column-stats.cc
index 23f0012..7dca501 100644
--- a/be/src/exec/parquet/parquet-column-stats.cc
+++ b/be/src/exec/parquet/parquet-column-stats.cc
@@ -25,6 +25,20 @@
 
 namespace impala {
 
+bool ColumnStatsReader::GetRequiredStatsField(const string& fn_name,
+    StatsField* stats_field) {
+  if (fn_name == "lt" || fn_name == "le") {
+    *stats_field = StatsField::MIN;
+    return true;
+  } else if (fn_name == "gt" || fn_name == "ge") {
+    *stats_field = StatsField::MAX;
+    return true;
+  }
+  DCHECK(false) << "Unsupported function name for statistics evaluation: "
+                << fn_name;
+  return false;
+}
+
 bool ColumnStatsReader::ReadFromThrift(StatsField stats_field, void* slot) const {
   if (!(col_chunk_.__isset.meta_data && col_chunk_.meta_data.__isset.statistics)) {
     return false;
@@ -58,14 +72,19 @@ bool ColumnStatsReader::ReadFromThrift(StatsField stats_field, void* slot) const
   }
   if (stat_value == nullptr) return false;
 
+  return ReadFromString(stats_field, *stat_value, slot);
+}
+
+bool ColumnStatsReader::ReadFromString(StatsField stats_field,
+    const string& encoded_value, void* slot) const {
   switch (col_type_.type) {
     case TYPE_BOOLEAN:
-      return ColumnStats<bool>::DecodePlainValue(*stat_value, slot,
+      return ColumnStats<bool>::DecodePlainValue(encoded_value, slot,
           parquet::Type::BOOLEAN);
     case TYPE_TINYINT: {
       // parquet::Statistics encodes INT_8 values using 4 bytes.
       int32_t col_stats;
-      bool ret = ColumnStats<int32_t>::DecodePlainValue(*stat_value, &col_stats,
+      bool ret = ColumnStats<int32_t>::DecodePlainValue(encoded_value, &col_stats,
           parquet::Type::INT32);
       if (!ret || col_stats < std::numeric_limits<int8_t>::min() ||
           col_stats > std::numeric_limits<int8_t>::max()) {
@@ -77,7 +96,7 @@ bool ColumnStatsReader::ReadFromThrift(StatsField stats_field, void* slot) const
     case TYPE_SMALLINT: {
       // parquet::Statistics encodes INT_16 values using 4 bytes.
       int32_t col_stats;
-      bool ret = ColumnStats<int32_t>::DecodePlainValue(*stat_value, &col_stats,
+      bool ret = ColumnStats<int32_t>::DecodePlainValue(encoded_value, &col_stats,
           parquet::Type::INT32);
       if (!ret || col_stats < std::numeric_limits<int16_t>::min() ||
           col_stats > std::numeric_limits<int16_t>::max()) {
@@ -87,23 +106,24 @@ bool ColumnStatsReader::ReadFromThrift(StatsField stats_field, void* slot) const
       return true;
     }
     case TYPE_INT:
-      return ColumnStats<int32_t>::DecodePlainValue(*stat_value, slot, element_.type);
+      return ColumnStats<int32_t>::DecodePlainValue(encoded_value, slot, element_.type);
     case TYPE_BIGINT:
-      return ColumnStats<int64_t>::DecodePlainValue(*stat_value, slot, element_.type);
+      return ColumnStats<int64_t>::DecodePlainValue(encoded_value, slot, element_.type);
     case TYPE_FLOAT:
       // IMPALA-6527, IMPALA-6538: ignore min/max stats if NaN
-      return ColumnStats<float>::DecodePlainValue(*stat_value, slot, element_.type)
-          && !std::isnan(*reinterpret_cast<float*>(slot));
+      return ColumnStats<float>::DecodePlainValue(encoded_value, slot, element_.type) &&
+          !std::isnan(*reinterpret_cast<float*>(slot));
     case TYPE_DOUBLE:
       // IMPALA-6527, IMPALA-6538: ignore min/max stats if NaN
-      return ColumnStats<double>::DecodePlainValue(*stat_value, slot, element_.type)
-          && !std::isnan(*reinterpret_cast<double*>(slot));
+      return ColumnStats<double>::DecodePlainValue(encoded_value, slot, element_.type) &&
+          !std::isnan(*reinterpret_cast<double*>(slot));
     case TYPE_TIMESTAMP:
-      return DecodeTimestamp(*stat_value, stats_field,
+      return DecodeTimestamp(encoded_value, stats_field,
           static_cast<TimestampValue*>(slot));
     case TYPE_STRING:
     case TYPE_VARCHAR:
-      return ColumnStats<StringValue>::DecodePlainValue(*stat_value, slot, element_.type);
+      return ColumnStats<StringValue>::DecodePlainValue(encoded_value, slot,
+          element_.type);
     case TYPE_CHAR:
       /// We don't read statistics for CHAR columns, since CHAR support is broken in
       /// Impala (IMPALA-1652).
@@ -111,18 +131,18 @@ bool ColumnStatsReader::ReadFromThrift(StatsField stats_field, void* slot) const
     case TYPE_DECIMAL:
       switch (col_type_.GetByteSize()) {
         case 4:
-          return ColumnStats<Decimal4Value>::DecodePlainValue(*stat_value, slot,
+          return ColumnStats<Decimal4Value>::DecodePlainValue(encoded_value, slot,
               element_.type);
         case 8:
-          return ColumnStats<Decimal8Value>::DecodePlainValue(*stat_value, slot,
+          return ColumnStats<Decimal8Value>::DecodePlainValue(encoded_value, slot,
               element_.type);
         case 16:
-          return ColumnStats<Decimal16Value>::DecodePlainValue(*stat_value, slot,
+          return ColumnStats<Decimal16Value>::DecodePlainValue(encoded_value, slot,
               element_.type);
         }
       DCHECK(false) << "Unknown decimal byte size: " << col_type_.GetByteSize();
     case TYPE_DATE:
-      return ColumnStats<DateValue>::DecodePlainValue(*stat_value, slot, element_.type);
+      return ColumnStats<DateValue>::DecodePlainValue(encoded_value, slot, element_.type);
     default:
       DCHECK(false) << col_type_.DebugString();
   }
diff --git a/be/src/exec/parquet/parquet-column-stats.h b/be/src/exec/parquet/parquet-column-stats.h
index edce474..9eb08ff 100644
--- a/be/src/exec/parquet/parquet-column-stats.h
+++ b/be/src/exec/parquet/parquet-column-stats.h
@@ -270,11 +270,23 @@ public:
   /// was successful, false otherwise.
   bool ReadFromThrift(StatsField stats_field, void* slot) const;
 
+  /// Reads a plain-encoded value from the string 'encoded_value' into 'slot'. Returns
+  /// true if the read and conversion were successful, false otherwise.
+  bool ReadFromString(StatsField stats_field, const std::string& encoded_value,
+      void* slot) const;
+
   // Gets the null_count statistics from the column chunk's metadata and returns
   // it via an output parameter.
   // Returns true if the null_count stats were read successfully, false otherwise.
   bool ReadNullCountStat(int64_t* null_count) const;
 
+  /// Returns the required stats field for the given function. 'fn_name' can be 'le',
+  /// 'lt', 'ge', or 'gt' (i.e. binary operators <=, <, >=, >). If we want to check
+  /// whether a column contains a value less than a constant, we need the minimum value of
+  /// the column to answer that question. And, to answer the opposite question we need the
+  /// maximum value. The required stats field (min/max) will be stored in 'stats_field'.
+  /// The function returns true on success, false otherwise.
+  static bool GetRequiredStatsField(const std::string& fn_name, StatsField* stats_field);
+
 private:
   /// Returns true if we support reading statistics stored in the fields 'min_value' and
   /// 'max_value' in parquet::Statistics for the type 'col_type_' and the column order
diff --git a/be/src/exec/parquet/parquet-common-test.cc b/be/src/exec/parquet/parquet-common-test.cc
new file mode 100644
index 0000000..8605845
--- /dev/null
+++ b/be/src/exec/parquet/parquet-common-test.cc
@@ -0,0 +1,122 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+
+#include "exec/parquet/parquet-common.h"
+#include "testutil/gtest-util.h"
+
+#include "common/names.h"
+
+namespace impala {
+
+using RangeVec = vector<RowRange>;
+
+void ValidateRanges(RangeVec skip_ranges, int num_rows, const RangeVec& expected,
+    bool should_succeed = true) {
+  RangeVec result;
+  bool success = ComputeCandidateRanges(num_rows, &skip_ranges, &result);
+  EXPECT_EQ(should_succeed, success);
+  if (success) EXPECT_EQ(expected, result);
+}
+
+void ValidateRangesError(RangeVec skip_ranges, int num_rows, const RangeVec& expected) {
+  ValidateRanges(skip_ranges, num_rows, expected, false);
+}
+
+/// This test exercises the logic of ComputeCandidateRanges() with various
+/// inputs. ComputeCandidateRanges() determines the row ranges we need to scan
+/// in a Parquet file based on the total row count and a set of ranges that we
+/// need to skip.
+TEST(ParquetCommon, ComputeCandidateRanges) {
+  ValidateRanges({}, -1, {});
+  ValidateRanges({{0, 5}}, 10, {{6, 9}});
+  ValidateRanges({{6, 9}}, 10, {{0, 5}});
+  ValidateRanges({{0, 9}}, 10, {});
+  ValidateRanges({{2, 4}, {2, 4}}, 10, {{0, 1}, {5, 9}});
+  ValidateRanges({{2, 4}, {6, 6}}, 10, {{0, 1}, {5, 5}, {7, 9}});
+  ValidateRanges({{2, 4}, {3, 7}}, 10, {{0, 1}, {8, 9}});
+  ValidateRanges({{2, 6}, {1, 8}}, 10, {{0, 0}, {9, 9}});
+  ValidateRanges({{1, 2}, {2, 5}, {7, 8}}, 10, {{0, 0}, {6, 6}, {9, 9}});
+  ValidateRanges({{0, 2}, {6, 8}, {3, 5}}, 10, {{9, 9}});
+  ValidateRanges({{1, 2}, {1, 4}, {1, 8}}, 10, {{0, 0}, {9, 9}});
+  ValidateRanges({{7, 8}, {1, 8}, {3, 8}}, 10, {{0, 0}, {9, 9}});
+  // Error cases:
+  // Range starts at negative number
+  ValidateRangesError({{-1, 1}}, 10, {});
+  // Range is not a subrange of [0..num_rows)
+  ValidateRangesError({{2, 12}}, 10, {});
+  // First > last
+  ValidateRangesError({{6, 3}}, 10, {});
+}
+
+void ValidatePages(const vector<int64_t>& first_row_indexes, const RangeVec& ranges,
+    int64_t num_rows, const vector<int>& expected_page_indexes,
+    bool should_succeed = true) {
+  vector<parquet::PageLocation> page_locations;
+  for (int64_t first_row_index : first_row_indexes) {
+    parquet::PageLocation page_loc;
+    page_loc.first_row_index = first_row_index;
+    page_locations.push_back(page_loc);
+  }
+  vector<int> candidate_pages;
+  bool success = ComputeCandidatePages(page_locations, ranges, num_rows,
+      &candidate_pages);
+  EXPECT_EQ(should_succeed, success);
+  if (success) EXPECT_EQ(expected_page_indexes, candidate_pages);
+}
+
+void ValidatePagesError(const vector<int64_t>& first_row_indexes, const RangeVec& ranges,
+    int64_t num_rows, const vector<int>& expected_page_indexes) {
+  ValidatePages(first_row_indexes, ranges, num_rows, expected_page_indexes, false);
+}
+
+/// This test exercises the logic of ComputeCandidatePages(). It creates fake vectors
+/// of page location objects and candidate ranges then checks whether the right pages
+/// were selected.
+TEST(ParquetCommon, ComputeCandidatePages) {
+  ValidatePages({0}, {}, 10, {});
+  ValidatePages({0}, {{2, 3}}, 10, {0});
+  ValidatePages({0}, {{0, 9}}, 10, {0});
+  ValidatePages({0, 10, 20, 50, 70}, {{0, 9}}, 100, {0});
+  ValidatePages({0, 10, 20, 50, 70}, {{5, 15}}, 100, {0, 1});
+  ValidatePages({0, 10, 20, 50, 70}, {{5, 15}}, 100, {0, 1});
+  ValidatePages({0, 10, 20, 50, 70}, {{5, 15}, {21, 50}}, 100, {0, 1, 2, 3});
+  ValidatePages({0, 10, 20, 50, 70}, {{90, 92}}, 100, {4});
+  ValidatePages({0, 10, 20, 50, 70},
+                {{0, 9}, {10, 19}, {20, 49}, {50, 69}, {70, 99}}, 100, {0, 1, 2, 3, 4});
+  ValidatePages({0, 10, 20, 50, 70}, {{5, 75}}, 100, {0, 1, 2, 3, 4});
+  ValidatePages({0, 10, 20, 50, 70}, {{9, 10}, {69, 70}}, 100, {0, 1, 3, 4});
+  ValidatePages({0, 10, 20, 50, 70}, {{9, 9}, {10, 10}, {50, 50}}, 100, {0, 1, 3});
+  ValidatePages({0}, {{0, 9LL + INT_MAX}}, 100LL + INT_MAX, {0});
+  ValidatePages({0LL, 10LL + INT_MAX, 20LL + INT_MAX, 50LL + INT_MAX, 70LL + INT_MAX},
+      {{0LL, 9LL + INT_MAX}}, 100LL + INT_MAX, {0});
+  ValidatePages({0LL, 10LL + UINT_MAX, 20LL + UINT_MAX, 50LL + UINT_MAX, 70LL + UINT_MAX},
+        {{0LL, 9LL + UINT_MAX}}, 100LL + UINT_MAX, {0});
+  // Error cases:
+  // Negative first row index.
+  ValidatePagesError({-1, 0, 10}, {{0, 10}}, 10, {0});
+  // First row index greater than the number of rows.
+  ValidatePagesError({5, 10, 15}, {{0, 10}}, 10, {0});
+  // First row indexes are not in order.
+  ValidatePagesError({0, 5, 3}, {{0, 10}}, 10, {0});
+  // Row ranges don't overlap with pages.
+  ValidatePagesError({0, 5, 10}, {{15, 20}}, 12, {0});
+}
+
+}
+
+IMPALA_TEST_MAIN();
diff --git a/be/src/exec/parquet/parquet-common.cc b/be/src/exec/parquet/parquet-common.cc
index 769e189..e7ee624 100644
--- a/be/src/exec/parquet/parquet-common.cc
+++ b/be/src/exec/parquet/parquet-common.cc
@@ -58,6 +58,105 @@ parquet::CompressionCodec::type ConvertImpalaToParquetCodec(
   return IMPALA_TO_PARQUET_CODEC[codec];
 }
 
+void GetRowRangeForPage(const parquet::RowGroup& row_group,
+    const parquet::OffsetIndex& offset_index, int page_idx, RowRange* row_range) {
+  const auto& page_locations = offset_index.page_locations;
+  DCHECK_LT(page_idx, page_locations.size());
+  row_range->first = page_locations[page_idx].first_row_index;
+  if (page_idx == page_locations.size() - 1) {
+    row_range->last = row_group.num_rows - 1;
+  } else {
+    row_range->last = page_locations[page_idx + 1].first_row_index - 1;
+  }
+}
+
+static bool ValidateRowRangesData(const vector<RowRange>& skip_ranges,
+    const int64_t num_rows) {
+  for (auto& range : skip_ranges) {
+    if (range.first > range.last || range.first < 0 || range.last >= num_rows) {
+      return false;
+    }
+  }
+  return true;
+}
+
+bool ComputeCandidateRanges(const int64_t num_rows, vector<RowRange>* skip_ranges,
+    vector<RowRange>* candidate_ranges) {
+  if (!ValidateRowRangesData(*skip_ranges, num_rows)) return false;
+  sort(skip_ranges->begin(), skip_ranges->end());
+  candidate_ranges->clear();
+  // 'skip_end' tracks the end of a continuous range of rows that needs to be skipped.
+  // 'skip_ranges' are sorted, so we can start at the beginning.
+  int skip_end = -1;
+  for (auto& skip_range : *skip_ranges) {
+    if (skip_end + 1 >= skip_range.first) {
+      // We can extend 'skip_end' to the end of 'skip_range'.
+      if (skip_end < skip_range.last) skip_end = skip_range.last;
+    } else {
+      // We found a gap in 'skip_ranges', i.e. a row range that is not covered by
+      // 'skip_ranges'.
+      candidate_ranges->push_back({skip_end + 1, skip_range.first - 1});
+      // Let's track the end of the next continuous range that needs to be skipped.
+      skip_end = skip_range.last;
+    }
+  }
+  // If the last skip range ended before the last row, add the remaining rows to
+  // the candidate ranges.
+  if (skip_end < num_rows - 1) {
+    candidate_ranges->push_back({skip_end + 1, num_rows - 1});
+  }
+  return true;
+}
+
+static bool ValidatePageLocations(const vector<parquet::PageLocation>& page_locations,
+    const int64_t num_rows) {
+  for (int i = 0; i < page_locations.size(); ++i) {
+    auto& page_loc = page_locations[i];
+    if (page_loc.first_row_index < 0 || page_loc.first_row_index >= num_rows) {
+      return false;
+    }
+    if (i + 1 < page_locations.size()) {
+      auto& next_page_loc = page_locations[i+1];
+      if (page_loc.first_row_index >= next_page_loc.first_row_index) return false;
+    }
+  }
+  return true;
+}
+
+static bool RangesIntersect(const RowRange& lhs,
+    const RowRange& rhs) {
+  int64_t higher_first = std::max(lhs.first, rhs.first);
+  int64_t lower_last = std::min(lhs.last, rhs.last);
+  return higher_first <= lower_last;
+}
+
+bool ComputeCandidatePages(
+    const vector<parquet::PageLocation>& page_locations,
+    const vector<RowRange>& candidate_ranges,
+    const int64_t num_rows, vector<int>* candidate_pages) {
+  if (!ValidatePageLocations(page_locations, num_rows)) return false;
+
+  int range_idx = 0;
+  for (int i = 0; i < page_locations.size(); ++i) {
+    auto& page_location = page_locations[i];
+    int64_t page_start = page_location.first_row_index;
+    int64_t page_end = i != page_locations.size() - 1 ?
+                       page_locations[i + 1].first_row_index - 1 :
+                       num_rows - 1;
+    while (range_idx < candidate_ranges.size() &&
+        candidate_ranges[range_idx].last < page_start) {
+      ++range_idx;
+    }
+    if (range_idx >= candidate_ranges.size()) break;
+    if (RangesIntersect(candidate_ranges[range_idx], {page_start, page_end})) {
+      candidate_pages->push_back(i);
+    }
+  }
+  // When there are candidate ranges, we should have at least one candidate page.
+  if (!candidate_ranges.empty() && candidate_pages->empty()) return false;
+  return true;
+}
+
 bool ParquetTimestampDecoder::GetTimestampInfoFromSchema(const parquet::SchemaElement& e,
     Precision& precision, bool& needs_conversion) {
   if (e.type == parquet::Type::INT96) {
diff --git a/be/src/exec/parquet/parquet-common.h b/be/src/exec/parquet/parquet-common.h
index 9fad72d..243064e 100644
--- a/be/src/exec/parquet/parquet-common.h
+++ b/be/src/exec/parquet/parquet-common.h
@@ -36,6 +36,20 @@ namespace impala {
 const uint8_t PARQUET_VERSION_NUMBER[4] = {'P', 'A', 'R', '1'};
 const uint32_t PARQUET_CURRENT_VERSION = 1;
 
+/// Struct that specifies an inclusive range of rows.
+struct RowRange {
+  int64_t first;
+  int64_t last;
+};
+
+inline bool operator==(const RowRange& lhs, const RowRange& rhs) {
+  return lhs.first == rhs.first && lhs.last == rhs.last;
+}
+
+inline bool operator<(const RowRange& lhs, const RowRange& rhs) {
+  return std::tie(lhs.first, lhs.last) < std::tie(rhs.first, rhs.last);
+}
+
 /// Return the Impala compression type for the given Parquet codec. The caller must
 /// validate that the codec is a supported one, otherwise this will DCHECK.
 THdfsCompression::type ConvertParquetToImpalaCodec(parquet::CompressionCodec::type codec);
@@ -45,6 +59,29 @@ THdfsCompression::type ConvertParquetToImpalaCodec(parquet::CompressionCodec::ty
 parquet::CompressionCodec::type ConvertImpalaToParquetCodec(
     THdfsCompression::type codec);
 
+/// Returns the row range for the given page idx using information from the row group and
+/// offset index.
+void GetRowRangeForPage(const parquet::RowGroup& row_group,
+    const parquet::OffsetIndex& offset_index, int page_idx, RowRange* row_range);
+
+/// Given a column chunk containing rows in the range [0, 'num_rows'), 'skip_ranges'
+/// contains the row ranges we are not interested in. 'skip_ranges' can be redundant and
+/// can potentially contain ranges that intersect with each other. As a side-effect, this
+/// function sorts 'skip_ranges'.
+/// 'candidate_ranges' will contain the set of row ranges that we want to scan.
+/// Returns false if the input data is invalid.
+bool ComputeCandidateRanges(const int64_t num_rows, std::vector<RowRange>* skip_ranges,
+    std::vector<RowRange>* candidate_ranges);
+
+/// This function computes the pages that intersect with 'candidate_ranges'. I.e. it
+/// determines the pages that we actually need to read from a given column chunk.
+/// 'candidate_pages' will hold the indexes of such pages.
+/// Returns true on success, false otherwise.
+bool ComputeCandidatePages(
+    const std::vector<parquet::PageLocation>& page_locations,
+    const std::vector<RowRange>& candidate_ranges,
+    const int64_t num_rows, std::vector<int>* candidate_pages);
+
 /// The plain encoding does not maintain any state so all these functions
 /// are static helpers.
 /// TODO: we are using templates to provide a generic interface (over the
@@ -227,6 +264,28 @@ class ParquetPlainEncoder {
     return byte_size;
   }
 
+  /// Returns the byte size of the encoded data when PLAIN encoding is used.
+  /// Returns -1 if the encoded data would extend past the end of the buffer.
+  template <parquet::Type::type PARQUET_TYPE>
+  static int64_t EncodedLen(const uint8_t* buffer, const uint8_t* buffer_end,
+      int fixed_len_size, int32_t num_values) {
+    using parquet::Type;
+    int byte_size = 0;
+    switch (PARQUET_TYPE) {
+      case Type::INT32: byte_size = 4; break;
+      case Type::INT64: byte_size = 8; break;
+      case Type::INT96: byte_size = 12; break;
+      case Type::FLOAT: byte_size = 4; break;
+      case Type::DOUBLE: byte_size = 8; break;
+      case Type::FIXED_LEN_BYTE_ARRAY: byte_size = fixed_len_size; break;
+      default:
+        DCHECK(false);
+        return -1;
+    }
+    int64_t encoded_len = byte_size * num_values;
+    return encoded_len > buffer_end - buffer ? -1 : encoded_len;
+  }
+
   /// Batched version of Decode() that tries to decode 'num_values' values from the memory
   /// range [buffer, buffer_end) and writes them to 'v' with a stride of 'stride' bytes.
   /// Returns the number of bytes read from 'buffer' or -1 if there was an error
@@ -298,6 +357,26 @@ inline int ParquetPlainEncoder::ByteSize(const TimestampValue& v) {
   return 12;
 }
 
+/// Returns the byte size of the encoded data when PLAIN encoding is used.
+/// Returns -1 if the encoded data would extend past the end of the buffer.
+template <>
+inline int64_t ParquetPlainEncoder::EncodedLen<parquet::Type::BYTE_ARRAY>(
+    const uint8_t* buffer, const uint8_t* buffer_end, int fixed_len_size,
+    int32_t num_values) {
+  const uint8_t* orig_buffer = buffer;
+  int64_t values_remaining = num_values;
+  while (values_remaining > 0) {
+    if (UNLIKELY(buffer_end - buffer < sizeof(int32_t))) return -1;
+    int32_t str_len;
+    memcpy(&str_len, buffer, sizeof(int32_t));
+    str_len += sizeof(int32_t);
+    if (UNLIKELY(str_len < sizeof(int32_t) || buffer_end - buffer < str_len)) return -1;
+    buffer += str_len;
+    --values_remaining;
+  }
+  return buffer - orig_buffer;
+}
+
 template <typename From, typename To>
 inline int DecodeWithConversion(const uint8_t* buffer, const uint8_t* buffer_end, To* v) {
   int byte_size = sizeof(From);
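A hedged usage sketch of the two helpers declared above; it assumes compilation inside the Impala tree, where parquet-common.h and the generated parquet::PageLocation type are available. The input models a 100-row column chunk with pages starting at rows 0, 10, 20, 50 and 70, where only rows 5-15 and 60-64 survived the min/max filters.

  #include <cstdint>
  #include <vector>
  #include "exec/parquet/parquet-common.h"
  #include "gen-cpp/parquet_types.h"

  void Example() {
    using namespace impala;
    // Rows we can prove uninteresting; redundant/overlapping entries are allowed.
    std::vector<RowRange> skip_ranges = {{0, 4}, {16, 59}, {65, 99}};
    std::vector<RowRange> candidate_ranges;
    ComputeCandidateRanges(100, &skip_ranges, &candidate_ranges);
    // candidate_ranges == {{5, 15}, {60, 64}}

    std::vector<parquet::PageLocation> page_locations(5);
    const int64_t first_rows[] = {0, 10, 20, 50, 70};
    for (int i = 0; i < 5; ++i) page_locations[i].first_row_index = first_rows[i];
    std::vector<int> candidate_pages;
    ComputeCandidatePages(page_locations, candidate_ranges, 100, &candidate_pages);
    // candidate_pages == {0, 1, 3}: pages [0,9], [10,19] and [50,69] intersect the
    // candidate ranges, while pages [20,49] and [70,99] are skipped entirely.
    // (Both calls return true for this input; error handling elided for brevity.)
  }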
diff --git a/be/src/exec/parquet/parquet-level-decoder.h b/be/src/exec/parquet/parquet-level-decoder.h
index 2e0c24e..8626b4d 100644
--- a/be/src/exec/parquet/parquet-level-decoder.h
+++ b/be/src/exec/parquet/parquet-level-decoder.h
@@ -59,6 +59,10 @@ class ParquetLevelDecoder {
   /// as batched methods.
   inline int16_t ReadLevel();
 
+  /// Returns the next level without consuming it, or INVALID_LEVEL if there was an
+  /// error.
+  inline int16_t PeekLevel();
+
   /// If the next value is part of a repeated run and is not cached, return the length
   /// of the repeated run. A max level of 0 is treated as an arbitrarily long run of
   /// zeroes, so this returns numeric_limits<int32_t>::max(). Otherwise return 0.
@@ -84,6 +88,11 @@ class ParquetLevelDecoder {
     DCHECK_LT(cached_level_idx_, num_cached_levels_);
     return cached_levels_[cached_level_idx_++];
   }
+  // Returns the next cached level without consuming it.
+  uint8_t CachePeekNext() {
+    DCHECK_LT(cached_level_idx_, num_cached_levels_);
+    return cached_levels_[cached_level_idx_];
+  }
   void CacheSkipLevels(int num_levels) {
     DCHECK_LE(cached_level_idx_ + num_levels, num_cached_levels_);
     cached_level_idx_ += num_levels;
@@ -97,6 +106,10 @@ class ParquetLevelDecoder {
   /// the cache from pool, if necessary.
   Status InitCache(MemPool* pool, int cache_size);
 
+  // Invokes FillCache() when the cache is empty. Returns true if there are already
+  // values in the cache or if filling the cache was successful; returns false otherwise.
+  inline bool PrepareForRead();
+
   /// Decodes and writes a batch of levels into the cache. Returns true and sets
   /// the number of values written to the cache via *num_cached_levels if no errors
   /// are encountered. *num_cached_levels is < 'batch_size' in this case iff the
@@ -133,19 +146,29 @@ class ParquetLevelDecoder {
   TErrorCode::type decoding_error_code_;
 };
 
-inline int16_t ParquetLevelDecoder::ReadLevel() {
+inline bool ParquetLevelDecoder::PrepareForRead() {
   if (UNLIKELY(!CacheHasNext())) {
     if (UNLIKELY(!FillCache(cache_size_, &num_cached_levels_))) {
-      return ParquetLevel::INVALID_LEVEL;
+      return false;
     }
     DCHECK_GE(num_cached_levels_, 0);
     if (UNLIKELY(num_cached_levels_ == 0)) {
-      return ParquetLevel::INVALID_LEVEL;
+      return false;
     }
   }
+  return true;
+}
+
+inline int16_t ParquetLevelDecoder::ReadLevel() {
+  if (UNLIKELY(!PrepareForRead())) return ParquetLevel::INVALID_LEVEL;
   return CacheGetNext();
 }
 
+inline int16_t ParquetLevelDecoder::PeekLevel() {
+  if (UNLIKELY(!PrepareForRead())) return ParquetLevel::INVALID_LEVEL;
+  return CachePeekNext();
+}
+
 inline int32_t ParquetLevelDecoder::NextRepeatedRunLength() {
   if (CacheHasNext()) return 0;
   // Treat always-zero levels as an infinitely long run of zeroes. Return the maximum
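PeekLevel() and CachePeekNext() exist so the caller can inspect the next repetition level without consuming it, which is how the reader detects the start of the next top-level row while skipping. A standalone sketch (not part of this patch) of that peek-then-consume pattern for a nested column:

  #include <cstddef>
  #include <cstdint>
  #include <iostream>
  #include <vector>

  int main() {
    // Repetition levels of an array column: a level of 0 starts a new top-level row.
    std::vector<int16_t> rep_levels = {0, 1, 1, 0, 1, 0};
    std::size_t idx = 0;                     // stands in for the level cache cursor
    int64_t rows_to_skip = 2, values_skipped = 0;
    while (!(rows_to_skip == 0 && rep_levels[idx] == 0)) {  // peek, don't consume
      if (rep_levels[idx] == 0) --rows_to_skip;             // a new top-level row begins
      ++idx;                                                // now consume the level
      ++values_skipped;
    }
    std::cout << "skipped " << values_skipped << " values" << std::endl;  // prints 5
    return 0;
  }

The peeked level (the 0 at index 5) stays unconsumed, so the regular read path can start the third row from a clean state, matching the loop in SkipTopLevelRows() for nested columns.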
diff --git a/be/src/exec/parquet/parquet-page-index-test.cc b/be/src/exec/parquet/parquet-page-index-test.cc
new file mode 100644
index 0000000..19656b5
--- /dev/null
+++ b/be/src/exec/parquet/parquet-page-index-test.cc
@@ -0,0 +1,108 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "gen-cpp/parquet_types.h"
+#include "exec/parquet/parquet-page-index.h"
+#include "testutil/gtest-util.h"
+
+#include "common/names.h"
+
+namespace impala {
+
+struct PageIndexRanges {
+  int64_t column_index_offset;
+  int64_t column_index_length;
+  int64_t offset_index_offset;
+  int64_t offset_index_length;
+};
+
+using RowGroupRanges = vector<PageIndexRanges>;
+
+/// Creates a parquet::RowGroup object based on data in 'row_group_ranges'. It sets
+/// the offsets and sizes of the column index and offset index members of the row group.
+/// It doesn't set the member if the input value is -1.
+void ConstructFakeRowGroup(const RowGroupRanges& row_group_ranges,
+    parquet::RowGroup* row_group) {
+  for (auto& page_index_ranges : row_group_ranges) {
+    parquet::ColumnChunk col_chunk;
+    if (page_index_ranges.column_index_offset != -1) {
+      col_chunk.__set_column_index_offset(page_index_ranges.column_index_offset);
+    }
+    if (page_index_ranges.column_index_length != -1) {
+      col_chunk.__set_column_index_length(page_index_ranges.column_index_length);
+    }
+    if (page_index_ranges.offset_index_offset != -1) {
+      col_chunk.__set_offset_index_offset(page_index_ranges.offset_index_offset);
+    }
+    if (page_index_ranges.offset_index_length != -1) {
+      col_chunk.__set_offset_index_length(page_index_ranges.offset_index_length);
+    }
+    row_group->columns.push_back(col_chunk);
+  }
+}
+
+/// Validates that 'DeterminePageIndexRangesInRowGroup()' selects the expected file
+/// offsets and sizes or returns false when the row group doesn't have a page index.
+void ValidatePageIndexRange(const RowGroupRanges& row_group_ranges,
+    bool expected_has_page_index, int expected_ci_start, int expected_ci_size,
+    int expected_oi_start, int expected_oi_size) {
+  parquet::RowGroup row_group;
+  ConstructFakeRowGroup(row_group_ranges, &row_group);
+
+  int64_t ci_start;
+  int64_t ci_size;
+  int64_t oi_start;
+  int64_t oi_size;
+  bool has_page_index = ParquetPageIndex::DeterminePageIndexRangesInRowGroup(row_group,
+      &ci_start, &ci_size, &oi_start, &oi_size);
+  ASSERT_EQ(expected_has_page_index, has_page_index);
+  if (has_page_index) {
+    EXPECT_EQ(expected_ci_start, ci_start);
+    EXPECT_EQ(expected_ci_size, ci_size);
+    EXPECT_EQ(expected_oi_start, oi_start);
+    EXPECT_EQ(expected_oi_size, oi_size);
+  }
+}
+
+/// This test constructs a couple of artificial row groups with page index offsets in
+/// them. Then it validates whether ParquetPageIndex::DeterminePageIndexRangesInRowGroup()
+/// properly computes the file ranges that contain the column indexes and offset indexes.
+TEST(ParquetPageIndex, DeterminePageIndexRangesInRowGroup) {
+  // No Column chunks
+  ValidatePageIndexRange({}, false, -1, -1, -1, -1);
+  // No page index at all.
+  ValidatePageIndexRange({{-1, -1, -1, -1}}, false, -1, -1, -1, -1);
+  // Page index for single column chunk.
+  ValidatePageIndexRange({{10, 5, 15, 5}}, true, 10, 5, 15, 5);
+  // Page index for two column chunks.
+  ValidatePageIndexRange({{10, 5, 30, 25}, {15, 15, 50, 20}}, true, 10, 20, 30, 40);
+  // Page index for second column chunk.
+  ValidatePageIndexRange({{-1, -1, -1, -1}, {20, 10, 30, 25}}, true, 20, 10, 30, 25);
+  // Page index for first column chunk.
+  ValidatePageIndexRange({{10, 5, 15, 5}, {-1, -1, -1, -1}}, true, 10, 5, 15, 5);
+  // Missing offset index for first column chunk. Gap in column index.
+  ValidatePageIndexRange({{10, 5, -1, -1}, {20, 10, 30, 25}}, true, 10, 20, 30, 25);
+  // Missing offset index for second column chunk.
+  ValidatePageIndexRange({{10, 5, 25, 5}, {20, 10, -1, -1}}, true, 10, 20, 25, 5);
+  // Four column chunks.
+  ValidatePageIndexRange({{100, 10, 220, 30}, {110, 25, 250, 10}, {140, 30, 260, 40},
+    {200, 10, 300, 100}}, true, 100, 110, 220, 180);
+}
+
+}
+
+IMPALA_TEST_MAIN();
diff --git a/be/src/exec/parquet/parquet-page-index.cc b/be/src/exec/parquet/parquet-page-index.cc
new file mode 100644
index 0000000..296f16c
--- /dev/null
+++ b/be/src/exec/parquet/parquet-page-index.cc
@@ -0,0 +1,147 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "common/logging.h"
+#include "exec/parquet/hdfs-parquet-scanner.h"
+#include "exec/parquet/parquet-page-index.h"
+#include "gutil/strings/substitute.h"
+#include "rpc/thrift-util.h"
+#include "runtime/io/request-context.h"
+#include "runtime/io/request-ranges.h"
+
+#include <memory>
+
+#include "common/names.h"
+
+using namespace parquet;
+using namespace impala::io;
+
+namespace impala {
+
+ParquetPageIndex::ParquetPageIndex(HdfsParquetScanner* scanner) :
+    scanner_(scanner),
+    page_index_buffer_(scanner_->scan_node_->mem_tracker())
+{
+}
+
+bool ParquetPageIndex::DeterminePageIndexRangesInRowGroup(
+    const parquet::RowGroup& row_group, int64_t* column_index_start,
+    int64_t* column_index_size, int64_t* offset_index_start, int64_t* offset_index_size) {
+  int64_t ci_start = numeric_limits<int64_t>::max();
+  int64_t oi_start = numeric_limits<int64_t>::max();
+  int64_t ci_end = -1;
+  int64_t oi_end = -1;
+  for (const ColumnChunk& col_chunk : row_group.columns) {
+    if (col_chunk.__isset.column_index_offset && col_chunk.__isset.column_index_length) {
+      ci_start = min(ci_start, col_chunk.column_index_offset);
+      ci_end = max(ci_end, col_chunk.column_index_offset + col_chunk.column_index_length);
+    }
+    if (col_chunk.__isset.offset_index_offset && col_chunk.__isset.offset_index_length) {
+      oi_start = min(oi_start, col_chunk.offset_index_offset);
+      oi_end = max(oi_end, col_chunk.offset_index_offset + col_chunk.offset_index_length);
+    }
+  }
+  bool has_page_index = oi_end != -1 && ci_end != -1;
+  if (has_page_index) {
+    *column_index_start = ci_start;
+    *column_index_size = ci_end - ci_start;
+    *offset_index_start = oi_start;
+    *offset_index_size = oi_end - oi_start;
+  }
+  return has_page_index;
+}
+
+Status ParquetPageIndex::ReadAll(int row_group_idx) {
+  DCHECK(page_index_buffer_.buffer() == nullptr);
+  bool has_page_index = DeterminePageIndexRangesInRowGroup(
+      scanner_->file_metadata_.row_groups[row_group_idx],
+      &column_index_base_offset_, &column_index_size_,
+      &offset_index_base_offset_, &offset_index_size_);
+
+  // It's not an error if there is no page index.
+  if (!has_page_index) return Status::OK();
+
+  int64_t scan_range_start = column_index_base_offset_;
+  int64_t scan_range_size =
+      offset_index_base_offset_ + offset_index_size_ - column_index_base_offset_;
+  vector<ScanRange::SubRange> sub_ranges;
+  if (column_index_base_offset_ + column_index_size_ <= offset_index_base_offset_) {
+    // The sub-ranges will be merged if they are contiguous.
+    sub_ranges.push_back({column_index_base_offset_, column_index_size_});
+    sub_ranges.push_back({offset_index_base_offset_, offset_index_size_});
+  } else {
+    return Status(Substitute("Found unsupported Parquet page index layout for file '$1'.",
+            scanner_->filename()));
+  }
+  int64_t buffer_size = column_index_size_ + offset_index_size_;
+  if (!page_index_buffer_.TryAllocate(buffer_size)) {
+    return Status(Substitute("Could not allocate buffer of $0 bytes for Parquet "
+        "page index for file '$1'.", buffer_size, scanner_->filename()));
+  }
+  int64_t partition_id = scanner_->context_->partition_descriptor()->id();
+  ScanRange* object_range = scanner_->scan_node_->AllocateScanRange(
+      scanner_->metadata_range_->fs(), scanner_->filename(), scan_range_size,
+      scan_range_start, move(sub_ranges), partition_id,
+      scanner_->metadata_range_->disk_id(), scanner_->metadata_range_->expected_local(),
+      scanner_->metadata_range_->is_erasure_coded(),
+      BufferOpts::ReadInto(page_index_buffer_.buffer(), page_index_buffer_.Size()));
+
+  unique_ptr<BufferDescriptor> io_buffer;
+  bool needs_buffers;
+  RETURN_IF_ERROR(
+      scanner_->scan_node_->reader_context()->StartScanRange(object_range,
+          &needs_buffers));
+  DCHECK(!needs_buffers) << "Already provided a buffer";
+  RETURN_IF_ERROR(object_range->GetNext(&io_buffer));
+  DCHECK_EQ(io_buffer->buffer(), page_index_buffer_.buffer());
+  DCHECK_EQ(io_buffer->len(), page_index_buffer_.Size());
+  DCHECK(io_buffer->eosr());
+  object_range->ReturnBuffer(move(io_buffer));
+
+  return Status::OK();
+}
+
+Status ParquetPageIndex::DeserializeColumnIndex(const ColumnChunk& col_chunk,
+    ColumnIndex* column_index) {
+  if (page_index_buffer_.buffer() == nullptr) {
+    return Status(Substitute("No page index for file $0.", scanner_->filename()));
+  }
+
+  int64_t buffer_offset = col_chunk.column_index_offset - column_index_base_offset_;
+  uint32_t length = col_chunk.column_index_length;
+  DCHECK_GE(buffer_offset, 0);
+  DCHECK_LE(buffer_offset + length, column_index_size_);
+  return DeserializeThriftMsg(page_index_buffer_.buffer() + buffer_offset,
+      &length, true, column_index);
+}
+
+Status ParquetPageIndex::DeserializeOffsetIndex(const ColumnChunk& col_chunk,
+    OffsetIndex* offset_index) {
+  if (page_index_buffer_.buffer() == nullptr) {
+    return Status(Substitute("No page index for file $0.", scanner_->filename()));
+  }
+
+  int64_t buffer_offset = col_chunk.offset_index_offset - offset_index_base_offset_ +
+      column_index_size_;
+  uint32_t length = col_chunk.offset_index_length;
+  DCHECK_GE(buffer_offset, 0);
+  DCHECK_LE(buffer_offset + length, page_index_buffer_.Size());
+  return DeserializeThriftMsg(page_index_buffer_.buffer() + buffer_offset,
+      &length, true, offset_index);
+}
+
+}
diff --git a/be/src/exec/parquet/parquet-page-index.h b/be/src/exec/parquet/parquet-page-index.h
new file mode 100644
index 0000000..36d6ded
--- /dev/null
+++ b/be/src/exec/parquet/parquet-page-index.h
@@ -0,0 +1,83 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#pragma once
+
+#include "runtime/scoped-buffer.h"
+
+namespace parquet {
+class ColumnChunk;
+class ColumnIndex;
+class OffsetIndex;
+}
+
+namespace impala {
+
+class HdfsParquetScanner;
+
+/// Helper class for reading the Parquet page index. It allocates a buffer to hold the
+/// raw bytes of the page index. It provides helper methods to deserialize the relevant
+/// parts.
+class ParquetPageIndex {
+public:
+  ParquetPageIndex(HdfsParquetScanner* scanner);
+
+  /// It reads the raw bytes of a page index belonging to a specific row group. It stores
+  /// the bytes in an internal buffer.
+  /// It expects that the layout of the page index conforms to the specification, i.e.
+  /// column indexes come before offset indexes. Otherwise it returns an error. (Impala
+  /// and Parquet-MR produce conforming page index layouts).
+  /// It needs to be called before the deserialization methods.
+  Status ReadAll(int row_group_idx);
+
+  /// Deserializes a ColumnIndex object for the given column chunk.
+  Status DeserializeColumnIndex(const parquet::ColumnChunk& col_chunk,
+      parquet::ColumnIndex* column_index);
+
+  /// Deserializes an OffsetIndex object for the given column chunk.
+  Status DeserializeOffsetIndex(const parquet::ColumnChunk& col_chunk,
+      parquet::OffsetIndex* offset_index);
+
+  /// Determines the column index and offset index ranges for the given row group.
+  /// Returns true when at least a partial column index and an offset index are found.
+  /// Returns false when the row group has no column index or no offset index at all.
+  static bool DeterminePageIndexRangesInRowGroup(
+      const parquet::RowGroup& row_group, int64_t* column_index_start,
+      int64_t* column_index_size, int64_t* offset_index_start,
+      int64_t* offset_index_size);
+
+  /// Releases resources held by this object.
+  void Release() { page_index_buffer_.Release(); }
+
+  /// Returns true if the page index buffer is empty.
+  bool IsEmpty() { return page_index_buffer_.Size() == 0; }
+private:
+  /// The scanner that created this object.
+  HdfsParquetScanner* scanner_;
+
+  /// Buffer to hold the raw bytes of the page index.
+  ScopedBuffer page_index_buffer_;
+
+  /// File offsets and sizes of the page index.
+  int64_t column_index_base_offset_;
+  int64_t column_index_size_;
+  int64_t offset_index_base_offset_;
+  int64_t offset_index_size_;
+};
+
+}
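ReadAll() relies on the layout the Parquet specification prescribes: all column indexes of a row group come first, followed by all offset indexes, so both regions fit into one contiguous buffer. The standalone sketch below (not part of this patch) shows the offset arithmetic that DeserializeColumnIndex() and DeserializeOffsetIndex() use to locate a column chunk's slice in that buffer; the file offsets in main() are made up for illustration.

  //   file:   [ ... | column indexes | (gap) | offset indexes | ... ]
  //   buffer: [ column indexes ][ offset indexes ]
  #include <cstdint>
  #include <iostream>

  // Mirrors DeserializeColumnIndex(): the column index region starts at buffer offset 0.
  int64_t ColumnIndexBufferOffset(int64_t column_index_offset,
      int64_t column_index_base_offset) {
    return column_index_offset - column_index_base_offset;
  }

  // Mirrors DeserializeOffsetIndex(): the offset index region starts right after the
  // column index region inside the buffer.
  int64_t OffsetIndexBufferOffset(int64_t offset_index_offset,
      int64_t offset_index_base_offset, int64_t column_index_size) {
    return offset_index_offset - offset_index_base_offset + column_index_size;
  }

  int main() {
    // Hypothetical file offsets: column indexes at [1000, 1200), offset indexes at
    // [1300, 1400); one chunk's column index starts at 1050, its offset index at 1310.
    std::cout << ColumnIndexBufferOffset(1050, 1000) << std::endl;       // prints 50
    std::cout << OffsetIndexBufferOffset(1310, 1300, 200) << std::endl;  // prints 210
    return 0;
  }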
diff --git a/be/src/exprs/literal.cc b/be/src/exprs/literal.cc
index 9603e3e..e20c58b 100644
--- a/be/src/exprs/literal.cc
+++ b/be/src/exprs/literal.cc
@@ -314,25 +314,25 @@ string Literal::DebugString() const {
   out << "Literal(value=";
   switch (type_.type) {
     case TYPE_BOOLEAN:
-      out << value_.bool_val;
+      out << std::to_string(value_.bool_val);
       break;
     case TYPE_TINYINT:
-      out << value_.tinyint_val;
+      out << std::to_string(value_.tinyint_val);
       break;
     case TYPE_SMALLINT:
-      out << value_.smallint_val;
+      out << std::to_string(value_.smallint_val);
       break;
     case TYPE_INT:
-      out << value_.int_val;
+      out << std::to_string(value_.int_val);
       break;
     case TYPE_BIGINT:
-      out << value_.bigint_val;
+      out << std::to_string(value_.bigint_val);
       break;
     case TYPE_FLOAT:
-      out << value_.float_val;
+      out << std::to_string(value_.float_val);
       break;
     case TYPE_DOUBLE:
-      out << value_.double_val;
+      out << std::to_string(value_.double_val);
       break;
     case TYPE_STRING:
       out << value_.string_val;
@@ -353,7 +353,7 @@ string Literal::DebugString() const {
       }
       break;
     case TYPE_TIMESTAMP:
-      out << value_.timestamp_val;
+      out << value_.timestamp_val.ToString();
       break;
     case TYPE_DATE:
       out << value_.date_val;
diff --git a/be/src/runtime/scoped-buffer.h b/be/src/runtime/scoped-buffer.h
index 17fb8e2..f307209 100644
--- a/be/src/runtime/scoped-buffer.h
+++ b/be/src/runtime/scoped-buffer.h
@@ -55,7 +55,9 @@ class ScopedBuffer {
     bytes_ = 0;
   }
 
-  inline uint8_t* buffer() const { return buffer_; }
+  uint8_t* buffer() const { return buffer_; }
+
+  int64_t Size() const { return bytes_; }
 
  private:
   MemTracker* mem_tracker_;
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index e199489..9482f23 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -408,8 +408,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
         break;
       }
       case TImpalaQueryOptions::DISABLE_ROW_RUNTIME_FILTERING:
-        query_options->__set_disable_row_runtime_filtering(
-            iequals(value, "true") || iequals(value, "1"));
+        query_options->__set_disable_row_runtime_filtering(IsTrue(value));
         break;
       case TImpalaQueryOptions::MAX_NUM_RUNTIME_FILTERS: {
         StringParser::ParseResult status;
@@ -725,6 +724,10 @@ Status impala::SetQueryOption(const string& key, const string& value,
         query_options->__set_default_file_format(enum_type);
         break;
       }
+      case TImpalaQueryOptions::PARQUET_READ_PAGE_INDEX: {
+        query_options->__set_parquet_read_page_index(IsTrue(value));
+        break;
+      }
       case TImpalaQueryOptions::PARQUET_TIMESTAMP_TYPE: {
         TParquetTimestampType::type enum_type;
         RETURN_IF_ERROR(GetThriftEnum(value, "Parquet timestamp type",
diff --git a/be/src/service/query-options.h b/be/src/service/query-options.h
index bb83931..cf37550 100644
--- a/be/src/service/query-options.h
+++ b/be/src/service/query-options.h
@@ -41,7 +41,7 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
 // the DCHECK.
 #define QUERY_OPTS_TABLE\
   DCHECK_EQ(_TImpalaQueryOptions_VALUES_TO_NAMES.size(),\
-      TImpalaQueryOptions::PARQUET_TIMESTAMP_TYPE + 1);\
+      TImpalaQueryOptions::PARQUET_READ_PAGE_INDEX + 1);\
   REMOVED_QUERY_OPT_FN(abort_on_default_limit_exceeded, ABORT_ON_DEFAULT_LIMIT_EXCEEDED)\
   QUERY_OPT_FN(abort_on_error, ABORT_ON_ERROR, TQueryOptionLevel::REGULAR)\
   REMOVED_QUERY_OPT_FN(allow_unsupported_formats, ALLOW_UNSUPPORTED_FORMATS)\
@@ -155,6 +155,8 @@ typedef std::unordered_map<string, beeswax::TQueryOptionLevel::type>
   QUERY_OPT_FN(default_file_format, DEFAULT_FILE_FORMAT, TQueryOptionLevel::REGULAR)\
   QUERY_OPT_FN(parquet_timestamp_type, PARQUET_TIMESTAMP_TYPE,\
       TQueryOptionLevel::DEVELOPMENT)\
+  QUERY_OPT_FN(parquet_read_page_index, PARQUET_READ_PAGE_INDEX,\
+      TQueryOptionLevel::ADVANCED)\
   ;
 
 /// Enforce practical limits on some query options to avoid undesired query state.
diff --git a/common/thrift/ImpalaInternalService.thrift b/common/thrift/ImpalaInternalService.thrift
index ea05d4a..dfb6099 100644
--- a/common/thrift/ImpalaInternalService.thrift
+++ b/common/thrift/ImpalaInternalService.thrift
@@ -339,6 +339,9 @@ struct TQueryOptions {
   // See comment in ImpalaService.thrift.
   80: optional TParquetTimestampType parquet_timestamp_type =
       TParquetTimestampType.INT96_NANOS;
+
+  // See comment in ImpalaService.thrift.
+  81: optional bool parquet_read_page_index = true;
 }
 
 // Impala currently has two types of sessions: Beeswax and HiveServer2
diff --git a/common/thrift/ImpalaService.thrift b/common/thrift/ImpalaService.thrift
index b5ffaa3..e7dadbb 100644
--- a/common/thrift/ImpalaService.thrift
+++ b/common/thrift/ImpalaService.thrift
@@ -385,6 +385,11 @@ enum TImpalaQueryOptions {
   // Valid values: INT96_NANOS, INT64_MILLIS, INT64_MICROS, INT64_NANOS
   // Default: INT96_NANOS
   PARQUET_TIMESTAMP_TYPE = 79
+
+  // Enable using the Parquet page index during scans. The page index contains min/max
+  // statistics at page-level granularity. It can be used to skip pages and rows during
+  // scanning.
+  PARQUET_READ_PAGE_INDEX = 80
 }
 
 // The summary of a DML statement.
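[Editor's note: as a minimal illustration only (not part of this patch), the new option
added above can be toggled per session with the usual SET syntax; the option name comes
from the enum entry and defaults to true, and skipped pages are reported in the query
profile as NumStatsFilteredPages, as exercised by the .test files below:

  -- Disable page-index filtering for the session, run a selective scan, re-enable it.
  SET PARQUET_READ_PAGE_INDEX=false;
  SELECT count(*) FROM functional_parquet.alltypes WHERE id < 30;
  SET PARQUET_READ_PAGE_INDEX=true;
]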
diff --git a/testdata/data/README b/testdata/data/README
index 4bea311..4e5f92c 100644
--- a/testdata/data/README
+++ b/testdata/data/README
@@ -34,7 +34,7 @@ zero_rows_one_row_group.parquet:
 Generated by hacking Impala's Parquet writer.
 The file metadata indicates zero rows but one row group.
 
-huge_num_rows.parquet
+huge_num_rows.parquet:
 Generated by hacking Impala's Parquet writer.
 The file metadata indicates 2 * MAX_INT32 rows.
 The single row group also has the same number of rows in the metadata.
@@ -266,7 +266,7 @@ precision. Tested separately from the micro/millisecond columns because of the d
 valid range.
 Columns: rawvalue bigint, nanoutc timestamp, nanononutc timestamp
 
-out_of_range_timestamp_hive_211.parquet
+out_of_range_timestamp_hive_211.parquet:
 Hive-generated file with an out-of-range timestamp. Generated with Hive 2.1.1 using
 the following query:
 create table alltypes_hive stored as parquet as
@@ -274,7 +274,7 @@ select * from functional.alltypes
 union all
 select -1, false, 0, 0, 0, 0, 0, 0, '', '', cast('1399-01-01 00:00:00' as timestamp), 0, 0
 
-out_of_range_timestamp2_hive_211.parquet
+out_of_range_timestamp2_hive_211.parquet:
 Hive-generated file with out-of-range timestamps every second value, to exercise code
 paths in Parquet scanner for non-repeated runs. Generated with Hive 2.1.1 using
 the following query:
@@ -288,13 +288,13 @@ select id,
 from functional.alltypes
 sort by id
 
-decimal_rtf_tbl.txt
+decimal_rtf_tbl.txt:
 This was generated using formulas in Google Sheets.  The goal was to create various
 decimal values that covers the 3 storage formats with various precision and scale.
 This is a reasonably large table that is used for testing min-max filters
 with decimal types on Kudu.
 
-decimal_rtf_tiny_tbl.txt
+decimal_rtf_tiny_tbl.txt:
 Small table with specific decimal values picked from decimal_rtf_tbl.txt so that
 min-max filter based pruning can be tested with decimal types on Kudu.
 
@@ -322,3 +322,117 @@ hive2_pre_gregorian.parquet:
 Small table with one DATE column, created by Hive 2.1.1.
 Used to demonstrate parquet interoperability issues between Hive and Impala for dates
 before the introduction of Gregorian calendar in 1582-10-15.
+
+decimals_1_10.parquet:
+Contains two decimal columns, one with precision 1, the other with precision 10.
+I used Hive 2.1.1 with a modified version of Parquet-MR (6901a20) to create tiny,
+misaligned pages in order to test the value-skipping logic in the Parquet column readers.
+The modification in Parquet-MR was to set MIN_SLAB_SIZE to 1. You can find the change
+here: https://github.com/boroknagyz/parquet-mr/tree/tiny_pages
+hive  --hiveconf parquet.page.row.count.limit=5 --hiveconf parquet.page.size=5
+ --hiveconf parquet.enable.dictionary=false --hiveconf parquet.page.size.row.check.min=1
+create table decimals_1_10 (d_1 DECIMAL(1, 0), d_10 DECIMAL(10, 0)) stored as PARQUET
+insert into decimals_1_10 values (1, 1), (2, 2), (3, 3), (4, 4), (5, 5),
+                            (NULL, 1), (2, 2), (3, 3), (4, 4), (5, 5),
+                            (1, 1), (NULL, 2), (3, 3), (4, 4), (5, 5),
+                            (1, 1), (2, 2), (NULL, 3), (4, 4), (5, 5),
+                            (1, 1), (2, 2), (3, 3), (NULL, 4), (5, 5),
+                            (1, 1), (2, 2), (3, 3), (4, 4), (NULL, 5),
+                            (NULL, 1), (NULL, 2), (3, 3), (4, 4), (5, 5),
+                            (1, 1), (NULL, 2), (3, 3), (NULL, 4), (5, 5),
+                            (1, 1), (2, 2), (3, 3), (NULL, 4), (NULL, 5),
+                            (NULL, 1), (2, 2), (NULL, 3), (NULL, 4), (5, 5),
+                            (1, 1), (2, 2), (3, 3), (4, 4), (5, NULL);
+
+nested_decimals.parquet:
+Contains two columns, one is a decimal column, the other is an array of decimals.
+I used Hive 2.1.1 with a modified Parquet-MR, see description at decimals_1_10.parquet.
+hive  --hiveconf parquet.page.row.count.limit=5 --hiveconf parquet.page.size=16
+ --hiveconf parquet.enable.dictionary=false --hiveconf parquet.page.size.row.check.min=1
+create table nested_decimals (d_38 Decimal(38, 0), arr array<Decimal(1, 0)>) stored as parquet;
+insert into nested_decimals select 1, array(cast (1 as decimal(1,0)), cast (1 as decimal(1,0)), cast (1 as decimal(1,0)) ) union all
+                            select 2, array(cast (2 as decimal(1,0)), cast (2 as decimal(1,0)), cast (2 as decimal(1,0)) ) union all
+                            select 3, array(cast (3 as decimal(1,0)), cast (3 as decimal(1,0)), cast (3 as decimal(1,0)) ) union all
+                            select 4, array(cast (4 as decimal(1,0)), cast (4 as decimal(1,0)), cast (4 as decimal(1,0)) ) union all
+                            select 5, array(cast (5 as decimal(1,0)), cast (5 as decimal(1,0)), cast (5 as decimal(1,0)) ) union all
+
+                            select 1, array(cast (1 as decimal(1,0)) ) union all
+                            select 2, array(cast (2 as decimal(1,0)), cast (2 as decimal(1,0)) ) union all
+                            select 3, array(cast (3 as decimal(1,0)), cast (3 as decimal(1,0)), cast (3 as decimal(1,0)) ) union all
+                            select 4, array(cast (4 as decimal(1,0)), cast (4 as decimal(1,0)), cast (4 as decimal(1,0)), cast (4 as decimal(1,0)) ) union all
+                            select 5, array(cast (5 as decimal(1,0)), cast (5 as decimal(1,0)), cast (5 as decimal(1,0)), cast (5 as decimal(1,0)), cast (5 as decimal(1,0)) ) union all
+
+                            select 1, array(cast (NULL as decimal(1, 0)), NULL, NULL) union all
+                            select 2, array(cast (2 as decimal(1,0)), NULL, NULL) union all
+                            select 3, array(cast (3 as decimal(1,0)), NULL, cast (3 as decimal(1,0))) union all
+                            select 4, array(NULL, cast (4 as decimal(1,0)), cast (4 as decimal(1,0)), NULL) union all
+                            select 5, array(NULL, cast (5 as decimal(1,0)), NULL, NULL, cast (5 as decimal(1,0)) ) union all
+
+                            select 6, array(cast (6 as decimal(1,0)), NULL, cast (6 as decimal(1,0)) ) union all
+                            select 7, array(cast (7 as decimal(1,0)), cast (7 as decimal(1,0)), cast (7 as decimal(1,0)), NULL ) union all
+                            select 8, array(NULL, NULL, cast (8 as decimal(1,0)) ) union all
+                            select 7, array(cast (7 as decimal(1,0)), cast (7 as decimal(1,0)), cast (7 as decimal(1,0)) ) union all
+                            select 6, array(NULL, NULL, NULL, cast (6 as decimal(1,0)) );
+
+double_nested_decimals.parquet:
+Contains two columns, one is a decimal column, the other is an array of arrays of
+decimals. I used Hive 2.1.1 with a modified Parquet-MR, see description
+at decimals_1_10.parquet.
+hive  --hiveconf parquet.page.row.count.limit=5 --hiveconf parquet.page.size=16
+  --hiveconf parquet.enable.dictionary=false --hiveconf parquet.page.size.row.check.min=1
+create table double_nested_decimals (d_38 Decimal(38, 0), arr array<array<Decimal(1, 0)>>) stored as parquet;
+insert into double_nested_decimals select 1, array(array(cast (1 as decimal(1,0)), cast (1 as decimal(1,0)) )) union all
+                                   select 2, array(array(cast (2 as decimal(1,0)), cast (2 as decimal(1,0)) )) union all
+                                   select 3, array(array(cast (3 as decimal(1,0)), cast (3 as decimal(1,0)), cast (3 as decimal(1,0)) )) union all
+                                   select 4, array(array(cast (4 as decimal(1,0)), cast (4 as decimal(1,0)), cast (4 as decimal(1,0)) )) union all
+                                   select 5, array(array(cast (5 as decimal(1,0)), cast (5 as decimal(1,0)), cast (5 as decimal(1,0)) )) union all
+
+                                   select 1, array(array(cast (1 as decimal(1,0))), array(cast (1 as decimal(1,0))), array(cast (1 as decimal(1,0))) ) union all
+                                   select 2, array(array(cast (2 as decimal(1,0))), array(cast (2 as decimal(1,0))) ) union all
+                                   select 3, array(array(cast (3 as decimal(1,0))), array(cast (3 as decimal(1,0))), array(cast (3 as decimal(1,0))) ) union all
+                                   select 4, array(array(cast (4 as decimal(1,0))), array(cast (4 as decimal(1,0))) ) union all
+                                   select 5, array(array(cast (5 as decimal(1,0))), array(cast (5 as decimal(1,0))) ) union all
+
+                                   select 1, array(array(cast (1 as decimal(1,0))) ) union all
+                                   select 2, array(array(cast (2 as decimal(1,0))), array(cast (2 as decimal(1,0))) ) union all
+                                   select 3, array(array(cast (3 as decimal(1,0))), array(cast (3 as decimal(1,0))), array(cast (3 as decimal(1,0))) ) union all
+                                   select 4, array(array(cast (4 as decimal(1,0))), array(cast (4 as decimal(1,0))) ) union all
+                                   select 5, array(array(cast (5 as decimal(1,0))) ) union all
+
+                                   select 1, array(array(cast (1 as decimal(1,0))), array(cast (1 as decimal(1,0))), array(cast (1 as decimal(1,0))) ) union all
+                                   select 2, array(array(cast (2 as decimal(1,0))), array(cast (2 as decimal(1,0))) ) union all
+                                   select 3, array(array(cast (3 as decimal(1,0))) ) union all
+                                   select 4, array(array(cast (4 as decimal(1,0))), array(cast (4 as decimal(1,0))) ) union all
+                                   select 5, array(array(cast (5 as decimal(1,0))), array(cast (5 as decimal(1,0))), array(cast (5 as decimal(1,0))) ) union all
+
+                                   select 1, array(array(cast (1 as decimal(1,0))), array(cast (1 as decimal(1,0)), cast (1 as decimal(1,0))) ) union all
+                                   select 2, array(array(cast (2 as decimal(1,0))) ) union all
+                                   select 3, array(array(cast (3 as decimal(1,0)), cast (3 as decimal(1,0))), array(cast (3 as decimal(1,0))) ) union all
+                                   select 4, array(array(cast (4 as decimal(1,0))), array(cast (4 as decimal(1,0)), cast (4 as decimal(1,0))), array(cast (4 as decimal(1,0))) ) union all
+                                   select 5, array(array(cast (5 as decimal(1,0))), array(cast (5 as decimal(1,0))), array(cast (5 as decimal(1,0))) ) union all
+
+                                   select 1, array(array(cast (NULL as decimal(1,0))), array(cast (NULL as decimal(1,0))), array(cast (1 as decimal(1,0))) ) union all
+                                   select 2, array(array(cast (NULL as decimal(1,0))), array(cast (NULL as decimal(1,0))), array(cast (NULL as decimal(1,0))) ) union all
+                                   select 3, array(array(cast (NULL as decimal(1,0))), array(cast (3 as decimal(1,0))), NULL ) union all
+                                   select 4, array(NULL, NULL, array(cast (NULL as decimal(1,0)), NULL, NULL, NULL, NULL) ) union all
+                                   select 5, array(array(NULL, cast (5 as decimal(1,0)), NULL, NULL, NULL) ) union all
+
+                                   select 6, array(array(cast (6 as decimal(1,0)), NULL), array(cast (6 as decimal(1,0))) ) union all
+                                   select 7, array(array(cast (7 as decimal(1,0)), cast (7 as decimal(1,0))), NULL ) union all
+                                   select 8, array(array(NULL, NULL, cast (8 as decimal(1,0))) ) union all
+                                   select 7, array(array(cast (7 as decimal(1,0)), cast (NULL as decimal(1,0))), array(cast (7 as decimal(1,0))) ) union all
+                                   select 6, array(array(NULL, NULL, cast (6 as decimal(1,0))), array(NULL, cast (6 as decimal(1,0))) );
+
+alltypes_tiny_pages.parquet:
+Created from 'functional.alltypes' with small page sizes.
+I used Hive 2.1.1 with a modified Parquet-MR, see description at decimals_1_10.parquet.
+I used the following commands to create the file:
+hive  --hiveconf parquet.page.row.count.limit=90 --hiveconf parquet.page.size=90 --hiveconf parquet.page.size.row.check.min=7
+create table alltypes_tiny_pages stored as parquet as select * from functional_parquet.alltypes
+
+alltypes_tiny_pages_plain.parquet:
+Created from 'functional.alltypes' with small page sizes without dictionary encoding.
+I used Hive 2.1.1 with a modified Parquet-MR, see description at decimals_1_10.parquet.
+I used the following commands to create the file:
+hive  --hiveconf parquet.page.row.count.limit=90 --hiveconf parquet.page.size=90 --hiveconf parquet.enable.dictionary=false  --hiveconf parquet.page.size.row.check.min=7
+create table alltypes_tiny_pages_plain stored as parquet as select * from functional_parquet.alltypes
diff --git a/testdata/data/alltypes_tiny_pages.parquet b/testdata/data/alltypes_tiny_pages.parquet
new file mode 100644
index 0000000..90019d1
Binary files /dev/null and b/testdata/data/alltypes_tiny_pages.parquet differ
diff --git a/testdata/data/alltypes_tiny_pages_plain.parquet b/testdata/data/alltypes_tiny_pages_plain.parquet
new file mode 100644
index 0000000..68d4dcb
Binary files /dev/null and b/testdata/data/alltypes_tiny_pages_plain.parquet differ
diff --git a/testdata/data/decimals_1_10.parquet b/testdata/data/decimals_1_10.parquet
new file mode 100644
index 0000000..30bc55b
Binary files /dev/null and b/testdata/data/decimals_1_10.parquet differ
diff --git a/testdata/data/double_nested_decimals.parquet b/testdata/data/double_nested_decimals.parquet
new file mode 100644
index 0000000..4c67979
Binary files /dev/null and b/testdata/data/double_nested_decimals.parquet differ
diff --git a/testdata/data/nested_decimals.parquet b/testdata/data/nested_decimals.parquet
new file mode 100644
index 0000000..3175c7d
Binary files /dev/null and b/testdata/data/nested_decimals.parquet differ
diff --git a/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test b/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test
new file mode 100644
index 0000000..8c85b57
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/nested-types-parquet-page-index.test
@@ -0,0 +1,704 @@
+# These tests check that the page selection and value-skipping logic works well
+# for nested types. 'nested_decimals' contains an array column of decimals.
+# 'double_nested_decimals' contains an array-of-arrays-of-decimals column. Both files
+# are created in a way to have tiny, misaligned pages.
+# The result sets check that reading and skipping of values went well, and the
+# runtime profile check verifies that the page index was used properly.
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 1
+---- RESULTS
+1,0,1
+1,1,1
+1,2,1
+1,0,1
+1,0,NULL
+1,1,NULL
+1,2,NULL
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 2
+---- RESULTS
+2,0,2
+2,1,2
+2,2,2
+2,0,2
+2,1,2
+2,0,2
+2,1,NULL
+2,2,NULL
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 3
+---- RESULTS
+3,0,3
+3,1,3
+3,2,3
+3,0,3
+3,1,3
+3,2,3
+3,0,3
+3,1,NULL
+3,2,3
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 4
+---- RESULTS
+4,0,4
+4,1,4
+4,2,4
+4,0,4
+4,1,4
+4,2,4
+4,3,4
+4,0,NULL
+4,1,4
+4,2,4
+4,3,NULL
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 5
+---- RESULTS
+5,0,5
+5,1,5
+5,2,5
+5,0,5
+5,1,5
+5,2,5
+5,3,5
+5,4,5
+5,0,NULL
+5,1,5
+5,2,NULL
+5,3,NULL
+5,4,5
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 6
+---- RESULTS
+6,0,6
+6,1,NULL
+6,2,6
+6,0,NULL
+6,1,NULL
+6,2,NULL
+6,3,6
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 21
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 7
+---- RESULTS
+7,0,7
+7,1,7
+7,2,7
+7,3,NULL
+7,0,7
+7,1,7
+7,2,7
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 21
+====
+---- QUERY
+# Test value-skipping logic by selecting a single top-level row from each page.
+# Skipping other rows.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 = 8
+---- RESULTS
+8,0,NULL
+8,1,NULL
+8,2,8
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 22
+====
+---- QUERY
+# Selecting the first rows from the pages.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 < 3
+---- RESULTS
+1,0,1
+1,1,1
+1,2,1
+2,0,2
+2,1,2
+2,2,2
+1,0,1
+2,0,2
+2,1,2
+1,0,NULL
+1,1,NULL
+1,2,NULL
+2,0,2
+2,1,NULL
+2,2,NULL
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 15
+====
+---- QUERY
+# Selecting the last rows from the pages.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 > 2 and d_38 < 6
+---- RESULTS
+3,0,3
+3,1,3
+3,2,3
+4,0,4
+4,1,4
+4,2,4
+5,0,5
+5,1,5
+5,2,5
+3,0,3
+3,1,3
+3,2,3
+4,0,4
+4,1,4
+4,2,4
+4,3,4
+5,0,5
+5,1,5
+5,2,5
+5,3,5
+5,4,5
+3,0,3
+3,1,NULL
+3,2,3
+4,0,NULL
+4,1,4
+4,2,4
+4,3,NULL
+5,0,NULL
+5,1,5
+5,2,NULL
+5,3,NULL
+5,4,5
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 12
+====
+---- QUERY
+# This query selects the first and last rows from a page, so it tests the case
+# when we read, skip, and read values.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 > 5 and d_38 < 8
+---- RESULTS
+6,0,6
+6,1,NULL
+6,2,6
+7,0,7
+7,1,7
+7,2,7
+7,3,NULL
+7,0,7
+7,1,7
+7,2,7
+6,0,NULL
+6,1,NULL
+6,2,NULL
+6,3,6
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 19
+====
+---- QUERY
+# Selecting middle rows from a page.
+select d_38, pos, item from nested_decimals n, n.arr where d_38 > 6
+---- RESULTS
+7,0,7
+7,1,7
+7,2,7
+7,3,NULL
+8,0,NULL
+8,1,NULL
+8,2,8
+7,0,7
+7,1,7
+7,2,7
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 20
+====
+---- QUERY
+# Filtering based on nested value.
+select d_38, pos, item from nested_decimals n, n.arr where item = 1
+---- RESULTS
+1,0,1
+1,1,1
+1,2,1
+1,0,1
+---- TYPES
+DECIMAL, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 12
+====
+---- QUERY
+# Only selecting the nested item and its position. The Parquet scanner reads
+# the values in batches in this case.
+select pos, item from nested_decimals.arr where item < 3
+---- RESULTS
+0,1
+1,1
+2,1
+0,2
+1,2
+2,2
+0,1
+0,2
+1,2
+0,2
+---- TYPES
+BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 1
+====
+---- QUERY
+# Only selecting the nested item and its position. The Parquet scanner reads
+# the values in batches in this case.
+select pos, item from nested_decimals.arr where item < 8 and item > 5
+---- RESULTS
+0,6
+2,6
+0,7
+1,7
+2,7
+0,7
+1,7
+2,7
+3,6
+---- TYPES
+BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 3
+====
+---- QUERY
+# Only selecting the nested item and its position. The Parquet scanner reads
+# the values in batches in this case.
+select pos, item from nested_decimals.arr where item = 8
+---- RESULTS
+2,8
+---- TYPES
+BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 3
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 1
+---- RESULTS
+1,0,0,1
+1,0,1,1
+1,0,0,1
+1,1,0,1
+1,2,0,1
+1,0,0,1
+1,0,0,1
+1,1,0,1
+1,2,0,1
+1,0,0,1
+1,1,0,1
+1,1,1,1
+1,0,0,NULL
+1,1,0,NULL
+1,2,0,1
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 30
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 2
+---- RESULTS
+2,0,0,2
+2,0,1,2
+2,0,0,2
+2,1,0,2
+2,0,0,2
+2,1,0,2
+2,0,0,2
+2,1,0,2
+2,0,0,2
+2,0,0,NULL
+2,1,0,NULL
+2,2,0,NULL
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 30
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 3
+---- RESULTS
+3,0,0,3
+3,0,1,3
+3,0,2,3
+3,0,0,3
+3,1,0,3
+3,2,0,3
+3,0,0,3
+3,1,0,3
+3,2,0,3
+3,0,0,3
+3,0,0,3
+3,0,1,3
+3,1,0,3
+3,0,0,NULL
+3,1,0,3
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 30
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 4
+---- RESULTS
+4,0,0,4
+4,0,1,4
+4,0,2,4
+4,0,0,4
+4,1,0,4
+4,0,0,4
+4,1,0,4
+4,0,0,4
+4,1,0,4
+4,0,0,4
+4,1,0,4
+4,1,1,4
+4,2,0,4
+4,2,0,NULL
+4,2,1,NULL
+4,2,2,NULL
+4,2,3,NULL
+4,2,4,NULL
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 30
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 5
+---- RESULTS
+5,0,0,5
+5,0,1,5
+5,0,2,5
+5,0,0,5
+5,1,0,5
+5,0,0,5
+5,0,0,5
+5,1,0,5
+5,2,0,5
+5,0,0,5
+5,1,0,5
+5,2,0,5
+5,0,0,NULL
+5,0,1,5
+5,0,2,NULL
+5,0,3,NULL
+5,0,4,NULL
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 30
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 6
+---- RESULTS
+6,0,0,6
+6,0,1,NULL
+6,1,0,6
+6,0,0,NULL
+6,0,1,NULL
+6,0,2,6
+6,1,0,NULL
+6,1,1,6
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 39
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 7
+---- RESULTS
+7,0,0,7
+7,0,1,7
+7,0,0,7
+7,0,1,NULL
+7,1,0,7
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 39
+====
+---- QUERY
+# Selecting one top-level row from a table that has a double-nested column.
+# Skipping all other rows.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 = 8
+---- RESULTS
+8,0,0,NULL
+8,0,1,NULL
+8,0,2,8
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 40
+====
+---- QUERY
+# Selecting the first couple of rows from the pages.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 < 3
+---- RESULTS
+1,0,0,1
+1,0,1,1
+2,0,0,2
+2,0,1,2
+1,0,0,1
+1,1,0,1
+1,2,0,1
+2,0,0,2
+2,1,0,2
+1,0,0,1
+2,0,0,2
+2,1,0,2
+1,0,0,1
+1,1,0,1
+1,2,0,1
+2,0,0,2
+2,1,0,2
+1,0,0,1
+1,1,0,1
+1,1,1,1
+2,0,0,2
+1,0,0,NULL
+1,1,0,NULL
+1,2,0,1
+2,0,0,NULL
+2,1,0,NULL
+2,2,0,NULL
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 24
+====
+---- QUERY
+# Selecting the last couple of rows from the pages.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 > 2 and d_38 < 6
+---- RESULTS
+3,0,0,3
+3,0,1,3
+3,0,2,3
+4,0,0,4
+4,0,1,4
+4,0,2,4
+5,0,0,5
+5,0,1,5
+5,0,2,5
+3,0,0,3
+3,1,0,3
+3,2,0,3
+4,0,0,4
+4,1,0,4
+5,0,0,5
+5,1,0,5
+3,0,0,3
+3,1,0,3
+3,2,0,3
+4,0,0,4
+4,1,0,4
+5,0,0,5
+3,0,0,3
+4,0,0,4
+4,1,0,4
+5,0,0,5
+5,1,0,5
+5,2,0,5
+3,0,0,3
+3,0,1,3
+3,1,0,3
+4,0,0,4
+4,1,0,4
+4,1,1,4
+4,2,0,4
+5,0,0,5
+5,1,0,5
+5,2,0,5
+3,0,0,NULL
+3,1,0,3
+4,2,0,NULL
+4,2,1,NULL
+4,2,2,NULL
+4,2,3,NULL
+4,2,4,NULL
+5,0,0,NULL
+5,0,1,5
+5,0,2,NULL
+5,0,3,NULL
+5,0,4,NULL
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 18
+====
+---- QUERY
+# This query selects the first and last rows from a page, so it tests the case
+# when we read, then skip, then read again from a page.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 > 5 and d_38 < 8
+---- RESULTS
+6,0,0,6
+6,0,1,NULL
+6,1,0,6
+7,0,0,7
+7,0,1,7
+7,0,0,7
+7,0,1,NULL
+7,1,0,7
+6,0,0,NULL
+6,0,1,NULL
+6,0,2,6
+6,1,0,NULL
+6,1,1,6
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 37
+====
+---- QUERY
+# Selecting middle rows from a page. So the scanner needs to skip, then read,
+# then skip again.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where d_38 > 6
+---- RESULTS
+7,0,0,7
+7,0,1,7
+8,0,0,NULL
+8,0,1,NULL
+8,0,2,8
+7,0,0,7
+7,0,1,NULL
+7,1,0,7
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 38
+====
+---- QUERY
+# Selecting rows based on the innermost item. Tests whether the page index works for
+# nested columns.
+select d_38, a1.pos, a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where a2.item = 1
+---- RESULTS
+1,0,0,1
+1,0,1,1
+1,0,0,1
+1,1,0,1
+1,2,0,1
+1,0,0,1
+1,0,0,1
+1,1,0,1
+1,2,0,1
+1,0,0,1
+1,1,0,1
+1,1,1,1
+1,2,0,1
+---- TYPES
+DECIMAL, BIGINT, BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 6
+====
+---- QUERY
+# Only selecting the innermost item and its position column. The Parquet scanner reads
+# the values in batches in this case.
+select a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where a2.item = 2
+---- RESULTS
+0,2
+1,2
+0,2
+0,2
+0,2
+0,2
+0,2
+0,2
+0,2
+---- TYPES
+BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 1
+====
+---- QUERY
+# Only selecting the innermost item and its position column. The Parquet scanner reads
+# the values in batches in this case.
+select a2.pos, a2.item from double_nested_decimals d, d.arr a1, a1.item a2
+where a2.item > 5 and a2.item < 7
+---- RESULTS
+0,6
+0,6
+2,6
+1,6
+---- TYPES
+BIGINT, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 6
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test
new file mode 100644
index 0000000..814ee60
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages-plain.test
@@ -0,0 +1,234 @@
+# These tests check that page selection and value-skipping work well for all types
+# of plain-encoded columns. Queries have predicates on different columns and might have
+# multiple predicates joined by AND. This way we can test how page filtering combines
+# these predicates to filter out even more rows.
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain where id < 30
+---- RESULTS
+13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-01 23:13:00.480000000,2009,1
+12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-01 23:12:00.460000000,2009,1
+11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-01 23:11:00.450000000,2009,1
+10,true,0,0,0,0,0,0,'01/02/09','0',2009-01-01 23:10:00.450000000,2009,1
+9,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9',2008-12-31 23:09:00.360000000,2009,1
+8,true,8,8,8,80,8.800000190734863,80.8,'01/01/09','8',2008-12-31 23:08:00.280000000,2009,1
+7,false,7,7,7,70,7.699999809265137,70.7,'01/01/09','7',2008-12-31 23:07:00.210000000,2009,1
+6,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/01/09','6',2008-12-31 23:06:00.150000000,2009,1
+5,false,5,5,5,50,5.5,50.5,'01/01/09','5',2008-12-31 23:05:00.100000000,2009,1
+4,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4',2008-12-31 23:04:00.600000000,2009,1
+3,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3',2008-12-31 23:03:00.300000000,2009,1
+2,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2',2008-12-31 23:02:00.100000000,2009,1
+1,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1',2008-12-31 23:01:00,2009,1
+0,true,0,0,0,0,0,0,'01/01/09','0',2008-12-31 23:00:00,2009,1
+29,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/03/09','9',2009-01-02 23:29:01.260000000,2009,1
+28,true,8,8,8,80,8.800000190734863,80.8,'01/03/09','8',2009-01-02 23:28:01.180000000,2009,1
+27,false,7,7,7,70,7.699999809265137,70.7,'01/03/09','7',2009-01-02 23:27:01.110000000,2009,1
+26,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/03/09','6',2009-01-02 23:26:01.500000000,2009,1
+25,false,5,5,5,50,5.5,50.5,'01/03/09','5',2009-01-02 23:25:01,2009,1
+24,true,4,4,4,40,4.400000095367432,40.4,'01/03/09','4',2009-01-02 23:24:00.960000000,2009,1
+23,false,3,3,3,30,3.299999952316284,30.3,'01/03/09','3',2009-01-02 23:23:00.930000000,2009,1
+22,true,2,2,2,20,2.200000047683716,20.2,'01/03/09','2',2009-01-02 23:22:00.910000000,2009,1
+21,false,1,1,1,10,1.100000023841858,10.1,'01/03/09','1',2009-01-02 23:21:00.900000000,2009,1
+20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-02 23:20:00.900000000,2009,1
+19,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/02/09','9',2009-01-01 23:19:00.810000000,2009,1
+18,true,8,8,8,80,8.800000190734863,80.8,'01/02/09','8',2009-01-01 23:18:00.730000000,2009,1
+17,false,7,7,7,70,7.699999809265137,70.7,'01/02/09','7',2009-01-01 23:17:00.660000000,2009,1
+16,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/02/09','6',2009-01-01 23:16:00.600000000,2009,1
+15,false,5,5,5,50,5.5,50.5,'01/02/09','5',2009-01-01 23:15:00.550000000,2009,1
+14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-01 23:14:00.510000000,2009,1
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5656
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain where id > 7270
+---- RESULTS
+7290,true,0,0,0,0,0,0,'12/31/10','0',2010-12-31 04:00:13.500000000,2010,12
+7291,false,1,1,1,10,1.100000023841858,10.1,'12/31/10','1',2010-12-31 04:01:13.500000000,2010,12
+7292,true,2,2,2,20,2.200000047683716,20.2,'12/31/10','2',2010-12-31 04:02:13.510000000,2010,12
+7293,false,3,3,3,30,3.299999952316284,30.3,'12/31/10','3',2010-12-31 04:03:13.530000000,2010,12
+7294,true,4,4,4,40,4.400000095367432,40.4,'12/31/10','4',2010-12-31 04:04:13.560000000,2010,12
+7295,false,5,5,5,50,5.5,50.5,'12/31/10','5',2010-12-31 04:05:13.600000000,2010,12
+7296,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/31/10','6',2010-12-31 04:06:13.650000000,2010,12
+7297,false,7,7,7,70,7.699999809265137,70.7,'12/31/10','7',2010-12-31 04:07:13.710000000,2010,12
+7298,true,8,8,8,80,8.800000190734863,80.8,'12/31/10','8',2010-12-31 04:08:13.780000000,2010,12
+7299,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/31/10','9',2010-12-31 04:09:13.860000000,2010,12
+7289,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/30/10','9',2010-12-30 03:59:13.410000000,2010,12
+7288,true,8,8,8,80,8.800000190734863,80.8,'12/30/10','8',2010-12-30 03:58:13.330000000,2010,12
+7287,false,7,7,7,70,7.699999809265137,70.7,'12/30/10','7',2010-12-30 03:57:13.260000000,2010,12
+7286,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/30/10','6',2010-12-30 03:56:13.200000000,2010,12
+7285,false,5,5,5,50,5.5,50.5,'12/30/10','5',2010-12-30 03:55:13.150000000,2010,12
+7284,true,4,4,4,40,4.400000095367432,40.4,'12/30/10','4',2010-12-30 03:54:13.110000000,2010,12
+7283,false,3,3,3,30,3.299999952316284,30.3,'12/30/10','3',2010-12-30 03:53:13.800000000,2010,12
+7282,true,2,2,2,20,2.200000047683716,20.2,'12/30/10','2',2010-12-30 03:52:13.600000000,2010,12
+7281,false,1,1,1,10,1.100000023841858,10.1,'12/30/10','1',2010-12-30 03:51:13.500000000,2010,12
+7280,true,0,0,0,0,0,0,'12/30/10','0',2010-12-30 03:50:13.500000000,2010,12
+7279,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/29/10','9',2010-12-29 03:49:12.960000000,2010,12
+7278,true,8,8,8,80,8.800000190734863,80.8,'12/29/10','8',2010-12-29 03:48:12.880000000,2010,12
+7277,false,7,7,7,70,7.699999809265137,70.7,'12/29/10','7',2010-12-29 03:47:12.810000000,2010,12
+7276,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/29/10','6',2010-12-29 03:46:12.750000000,2010,12
+7275,false,5,5,5,50,5.5,50.5,'12/29/10','5',2010-12-29 03:45:12.700000000,2010,12
+7274,true,4,4,4,40,4.400000095367432,40.4,'12/29/10','4',2010-12-29 03:44:12.660000000,2010,12
+7273,false,3,3,3,30,3.299999952316284,30.3,'12/29/10','3',2010-12-29 03:43:12.630000000,2010,12
+7272,true,2,2,2,20,2.200000047683716,20.2,'12/29/10','2',2010-12-29 03:42:12.610000000,2010,12
+7271,false,1,1,1,10,1.100000023841858,10.1,'12/29/10','1',2010-12-29 03:41:12.600000000,2010,12
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5656
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain where id > 2300 and id < 2310
+---- RESULTS
+2309,false,9,9,9,90,9.899999618530273,90.89999999999999,'08/19/09','9',2009-08-19 01:09:08.460000000,2009,8
+2308,true,8,8,8,80,8.800000190734863,80.8,'08/19/09','8',2009-08-19 01:08:08.380000000,2009,8
+2307,false,7,7,7,70,7.699999809265137,70.7,'08/19/09','7',2009-08-19 01:07:08.310000000,2009,8
+2306,true,6,6,6,60,6.599999904632568,60.59999999999999,'08/19/09','6',2009-08-19 01:06:08.250000000,2009,8
+2305,false,5,5,5,50,5.5,50.5,'08/19/09','5',2009-08-19 01:05:08.200000000,2009,8
+2304,true,4,4,4,40,4.400000095367432,40.4,'08/19/09','4',2009-08-19 01:04:08.160000000,2009,8
+2303,false,3,3,3,30,3.299999952316284,30.3,'08/19/09','3',2009-08-19 01:03:08.130000000,2009,8
+2302,true,2,2,2,20,2.200000047683716,20.2,'08/19/09','2',2009-08-19 01:02:08.110000000,2009,8
+2301,false,1,1,1,10,1.100000023841858,10.1,'08/19/09','1',2009-08-19 01:01:08.100000000,2009,8
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5560
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain where bigint_col = 0 and month = 7 and year = 2010
+---- RESULTS
+5540,true,0,0,0,0,0,0,'07/09/10','0',2010-07-08 23:20:03.600000000,2010,7
+5530,true,0,0,0,0,0,0,'07/08/10','0',2010-07-07 23:10:03.150000000,2010,7
+5520,true,0,0,0,0,0,0,'07/07/10','0',2010-07-06 23:00:02.700000000,2010,7
+5560,true,0,0,0,0,0,0,'07/11/10','0',2010-07-10 23:40:04.500000000,2010,7
+5550,true,0,0,0,0,0,0,'07/10/10','0',2010-07-09 23:30:04.500000000,2010,7
+5490,true,0,0,0,0,0,0,'07/04/10','0',2010-07-03 22:30:01.350000000,2010,7
+5480,true,0,0,0,0,0,0,'07/03/10','0',2010-07-02 22:20:00.900000000,2010,7
+5470,true,0,0,0,0,0,0,'07/02/10','0',2010-07-01 22:10:00.450000000,2010,7
+5510,true,0,0,0,0,0,0,'07/06/10','0',2010-07-05 22:50:02.250000000,2010,7
+5500,true,0,0,0,0,0,0,'07/05/10','0',2010-07-04 22:40:01.800000000,2010,7
+5640,true,0,0,0,0,0,0,'07/19/10','0',2010-07-19 01:00:08.100000000,2010,7
+5630,true,0,0,0,0,0,0,'07/18/10','0',2010-07-18 00:50:07.650000000,2010,7
+5670,true,0,0,0,0,0,0,'07/22/10','0',2010-07-22 01:30:09.450000000,2010,7
+5660,true,0,0,0,0,0,0,'07/21/10','0',2010-07-21 01:20:09,2010,7
+5650,true,0,0,0,0,0,0,'07/20/10','0',2010-07-20 01:10:08.550000000,2010,7
+5590,true,0,0,0,0,0,0,'07/14/10','0',2010-07-14 00:10:05.850000000,2010,7
+5580,true,0,0,0,0,0,0,'07/13/10','0',2010-07-13 00:00:05.400000000,2010,7
+5570,true,0,0,0,0,0,0,'07/12/10','0',2010-07-11 23:50:04.950000000,2010,7
+5620,true,0,0,0,0,0,0,'07/17/10','0',2010-07-17 00:40:07.200000000,2010,7
+5610,true,0,0,0,0,0,0,'07/16/10','0',2010-07-16 00:30:06.750000000,2010,7
+5600,true,0,0,0,0,0,0,'07/15/10','0',2010-07-15 00:20:06.300000000,2010,7
+5740,true,0,0,0,0,0,0,'07/29/10','0',2010-07-29 02:40:12.600000000,2010,7
+5730,true,0,0,0,0,0,0,'07/28/10','0',2010-07-28 02:30:12.150000000,2010,7
+5720,true,0,0,0,0,0,0,'07/27/10','0',2010-07-27 02:20:11.700000000,2010,7
+5760,true,0,0,0,0,0,0,'07/31/10','0',2010-07-31 03:00:13.500000000,2010,7
+5750,true,0,0,0,0,0,0,'07/30/10','0',2010-07-30 02:50:13.500000000,2010,7
+5690,true,0,0,0,0,0,0,'07/24/10','0',2010-07-24 01:50:10.350000000,2010,7
+5680,true,0,0,0,0,0,0,'07/23/10','0',2010-07-23 01:40:09.900000000,2010,7
+5460,true,0,0,0,0,0,0,'07/01/10','0',2010-06-30 22:00:00,2010,7
+5710,true,0,0,0,0,0,0,'07/26/10','0',2010-07-26 02:10:11.250000000,2010,7
+5700,true,0,0,0,0,0,0,'07/25/10','0',2010-07-25 02:00:10.800000000,2010,7
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5380
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain where date_string_col = '02/02/09'
+---- RESULTS
+320,true,0,0,0,0,0,0,'02/02/09','0',2009-02-01 23:10:00.450000000,2009,2
+321,false,1,1,1,10,1.100000023841858,10.1,'02/02/09','1',2009-02-01 23:11:00.450000000,2009,2
+322,true,2,2,2,20,2.200000047683716,20.2,'02/02/09','2',2009-02-01 23:12:00.460000000,2009,2
+323,false,3,3,3,30,3.299999952316284,30.3,'02/02/09','3',2009-02-01 23:13:00.480000000,2009,2
+324,true,4,4,4,40,4.400000095367432,40.4,'02/02/09','4',2009-02-01 23:14:00.510000000,2009,2
+325,false,5,5,5,50,5.5,50.5,'02/02/09','5',2009-02-01 23:15:00.550000000,2009,2
+326,true,6,6,6,60,6.599999904632568,60.59999999999999,'02/02/09','6',2009-02-01 23:16:00.600000000,2009,2
+327,false,7,7,7,70,7.699999809265137,70.7,'02/02/09','7',2009-02-01 23:17:00.660000000,2009,2
+328,true,8,8,8,80,8.800000190734863,80.8,'02/02/09','8',2009-02-01 23:18:00.730000000,2009,2
+329,false,9,9,9,90,9.899999618530273,90.89999999999999,'02/02/09','9',2009-02-01 23:19:00.810000000,2009,2
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5624
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain
+where month = 1 and smallint_col = 9 and timestamp_col < '2009-12-08 00:19:03'
+---- RESULTS
+129,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/13/09','9',2009-01-13 01:09:05.760000000,2009,1
+139,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/14/09','9',2009-01-14 01:19:06.210000000,2009,1
+99,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/10/09','9',2009-01-10 00:39:04.410000000,2009,1
+109,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/11/09','9',2009-01-11 00:49:04.860000000,2009,1
+119,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/12/09','9',2009-01-12 00:59:05.310000000,2009,1
+179,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/18/09','9',2009-01-18 01:59:08.100000000,2009,1
+189,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/19/09','9',2009-01-19 02:09:08.460000000,2009,1
+149,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/15/09','9',2009-01-15 01:29:06.660000000,2009,1
+159,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/16/09','9',2009-01-16 01:39:07.110000000,2009,1
+169,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/17/09','9',2009-01-17 01:49:07.560000000,2009,1
+59,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/06/09','9',2009-01-05 23:59:02.610000000,2009,1
+49,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/05/09','9',2009-01-04 23:49:02.160000000,2009,1
+39,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/04/09','9',2009-01-03 23:39:01.710000000,2009,1
+79,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/08/09','9',2009-01-08 00:19:03.510000000,2009,1
+69,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/07/09','9',2009-01-07 00:09:03.600000000,2009,1
+9,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9',2008-12-31 23:09:00.360000000,2009,1
+89,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/09/09','9',2009-01-09 00:29:03.960000000,2009,1
+29,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/03/09','9',2009-01-02 23:29:01.260000000,2009,1
+19,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/02/09','9',2009-01-01 23:19:00.810000000,2009,1
+279,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/28/09','9',2009-01-28 03:39:12.510000000,2009,1
+269,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/27/09','9',2009-01-27 03:29:12.600000000,2009,1
+259,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/26/09','9',2009-01-26 03:19:11.610000000,2009,1
+309,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/31/09','9',2009-01-31 04:09:13.860000000,2009,1
+299,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/30/09','9',2009-01-30 03:59:13.410000000,2009,1
+289,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/29/09','9',2009-01-29 03:49:12.960000000,2009,1
+219,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/22/09','9',2009-01-22 02:39:09.810000000,2009,1
+209,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/21/09','9',2009-01-21 02:29:09.360000000,2009,1
+199,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/20/09','9',2009-01-20 02:19:08.910000000,2009,1
+249,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/25/09','9',2009-01-25 03:09:11.160000000,2009,1
+239,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/24/09','9',2009-01-24 02:59:10.710000000,2009,1
+229,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/23/09','9',2009-01-23 02:49:10.260000000,2009,1
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5199
+====
+---- QUERY
+select * from alltypes_tiny_pages_plain
+where double_col > 70 and double_col < 71 and month = 8 and year = 2010
+---- RESULTS
+6037,false,7,7,7,70,7.699999809265137,70.7,'08/27/10','7',2010-08-27 02:27:11.910000000,2010,8
+6027,false,7,7,7,70,7.699999809265137,70.7,'08/26/10','7',2010-08-26 02:17:11.460000000,2010,8
+6067,false,7,7,7,70,7.699999809265137,70.7,'08/30/10','7',2010-08-30 02:57:13.260000000,2010,8
+6057,false,7,7,7,70,7.699999809265137,70.7,'08/29/10','7',2010-08-29 02:47:12.810000000,2010,8
+6047,false,7,7,7,70,7.699999809265137,70.7,'08/28/10','7',2010-08-28 02:37:12.360000000,2010,8
+5987,false,7,7,7,70,7.699999809265137,70.7,'08/22/10','7',2010-08-22 01:37:09.660000000,2010,8
+5977,false,7,7,7,70,7.699999809265137,70.7,'08/21/10','7',2010-08-21 01:27:09.210000000,2010,8
+6017,false,7,7,7,70,7.699999809265137,70.7,'08/25/10','7',2010-08-25 02:07:11.100000000,2010,8
+6007,false,7,7,7,70,7.699999809265137,70.7,'08/24/10','7',2010-08-24 01:57:10.560000000,2010,8
+5997,false,7,7,7,70,7.699999809265137,70.7,'08/23/10','7',2010-08-23 01:47:10.110000000,2010,8
+5837,false,7,7,7,70,7.699999809265137,70.7,'08/07/10','7',2010-08-06 23:07:02.910000000,2010,8
+5827,false,7,7,7,70,7.699999809265137,70.7,'08/06/10','7',2010-08-05 22:57:02.460000000,2010,8
+5867,false,7,7,7,70,7.699999809265137,70.7,'08/10/10','7',2010-08-09 23:37:04.260000000,2010,8
+5857,false,7,7,7,70,7.699999809265137,70.7,'08/09/10','7',2010-08-08 23:27:03.810000000,2010,8
+5847,false,7,7,7,70,7.699999809265137,70.7,'08/08/10','7',2010-08-07 23:17:03.360000000,2010,8
+5787,false,7,7,7,70,7.699999809265137,70.7,'08/02/10','7',2010-08-01 22:17:00.660000000,2010,8
+5777,false,7,7,7,70,7.699999809265137,70.7,'08/01/10','7',2010-07-31 22:07:00.210000000,2010,8
+5817,false,7,7,7,70,7.699999809265137,70.7,'08/05/10','7',2010-08-04 22:47:02.100000000,2010,8
+5807,false,7,7,7,70,7.699999809265137,70.7,'08/04/10','7',2010-08-03 22:37:01.560000000,2010,8
+5797,false,7,7,7,70,7.699999809265137,70.7,'08/03/10','7',2010-08-02 22:27:01.110000000,2010,8
+5937,false,7,7,7,70,7.699999809265137,70.7,'08/17/10','7',2010-08-17 00:47:07.410000000,2010,8
+5927,false,7,7,7,70,7.699999809265137,70.7,'08/16/10','7',2010-08-16 00:37:06.960000000,2010,8
+5967,false,7,7,7,70,7.699999809265137,70.7,'08/20/10','7',2010-08-20 01:17:08.760000000,2010,8
+5957,false,7,7,7,70,7.699999809265137,70.7,'08/19/10','7',2010-08-19 01:07:08.310000000,2010,8
+5947,false,7,7,7,70,7.699999809265137,70.7,'08/18/10','7',2010-08-18 00:57:07.860000000,2010,8
+5887,false,7,7,7,70,7.699999809265137,70.7,'08/12/10','7',2010-08-11 23:57:05.160000000,2010,8
+5877,false,7,7,7,70,7.699999809265137,70.7,'08/11/10','7',2010-08-10 23:47:04.710000000,2010,8
+5917,false,7,7,7,70,7.699999809265137,70.7,'08/15/10','7',2010-08-15 00:27:06.510000000,2010,8
+5907,false,7,7,7,70,7.699999809265137,70.7,'08/14/10','7',2010-08-14 00:17:06.600000000,2010,8
+5897,false,7,7,7,70,7.699999809265137,70.7,'08/13/10','7',2010-08-13 00:07:05.610000000,2010,8
+6077,false,7,7,7,70,7.699999809265137,70.7,'08/31/10','7',2010-08-31 03:07:13.710000000,2010,8
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5381
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test
new file mode 100644
index 0000000..3b9db6f
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-alltypes-tiny-pages.test
@@ -0,0 +1,234 @@
+# These tests check that page selection and value-skipping work well for many columns.
+# Queries have predicates on different columns and might have multiple predicates joined
+# by AND. This way we can test how page filtering combines these predicates to filter
+# out even more rows.
+====
+---- QUERY
+select * from alltypes_tiny_pages where id < 30
+---- RESULTS
+13,false,3,3,3,30,3.299999952316284,30.3,'01/02/09','3',2009-01-01 23:13:00.480000000,2009,1
+12,true,2,2,2,20,2.200000047683716,20.2,'01/02/09','2',2009-01-01 23:12:00.460000000,2009,1
+11,false,1,1,1,10,1.100000023841858,10.1,'01/02/09','1',2009-01-01 23:11:00.450000000,2009,1
+10,true,0,0,0,0,0,0,'01/02/09','0',2009-01-01 23:10:00.450000000,2009,1
+9,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9',2008-12-31 23:09:00.360000000,2009,1
+8,true,8,8,8,80,8.800000190734863,80.8,'01/01/09','8',2008-12-31 23:08:00.280000000,2009,1
+7,false,7,7,7,70,7.699999809265137,70.7,'01/01/09','7',2008-12-31 23:07:00.210000000,2009,1
+6,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/01/09','6',2008-12-31 23:06:00.150000000,2009,1
+5,false,5,5,5,50,5.5,50.5,'01/01/09','5',2008-12-31 23:05:00.100000000,2009,1
+4,true,4,4,4,40,4.400000095367432,40.4,'01/01/09','4',2008-12-31 23:04:00.600000000,2009,1
+3,false,3,3,3,30,3.299999952316284,30.3,'01/01/09','3',2008-12-31 23:03:00.300000000,2009,1
+2,true,2,2,2,20,2.200000047683716,20.2,'01/01/09','2',2008-12-31 23:02:00.100000000,2009,1
+1,false,1,1,1,10,1.100000023841858,10.1,'01/01/09','1',2008-12-31 23:01:00,2009,1
+0,true,0,0,0,0,0,0,'01/01/09','0',2008-12-31 23:00:00,2009,1
+29,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/03/09','9',2009-01-02 23:29:01.260000000,2009,1
+28,true,8,8,8,80,8.800000190734863,80.8,'01/03/09','8',2009-01-02 23:28:01.180000000,2009,1
+27,false,7,7,7,70,7.699999809265137,70.7,'01/03/09','7',2009-01-02 23:27:01.110000000,2009,1
+26,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/03/09','6',2009-01-02 23:26:01.500000000,2009,1
+25,false,5,5,5,50,5.5,50.5,'01/03/09','5',2009-01-02 23:25:01,2009,1
+24,true,4,4,4,40,4.400000095367432,40.4,'01/03/09','4',2009-01-02 23:24:00.960000000,2009,1
+23,false,3,3,3,30,3.299999952316284,30.3,'01/03/09','3',2009-01-02 23:23:00.930000000,2009,1
+22,true,2,2,2,20,2.200000047683716,20.2,'01/03/09','2',2009-01-02 23:22:00.910000000,2009,1
+21,false,1,1,1,10,1.100000023841858,10.1,'01/03/09','1',2009-01-02 23:21:00.900000000,2009,1
+20,true,0,0,0,0,0,0,'01/03/09','0',2009-01-02 23:20:00.900000000,2009,1
+19,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/02/09','9',2009-01-01 23:19:00.810000000,2009,1
+18,true,8,8,8,80,8.800000190734863,80.8,'01/02/09','8',2009-01-01 23:18:00.730000000,2009,1
+17,false,7,7,7,70,7.699999809265137,70.7,'01/02/09','7',2009-01-01 23:17:00.660000000,2009,1
+16,true,6,6,6,60,6.599999904632568,60.59999999999999,'01/02/09','6',2009-01-01 23:16:00.600000000,2009,1
+15,false,5,5,5,50,5.5,50.5,'01/02/09','5',2009-01-01 23:15:00.550000000,2009,1
+14,true,4,4,4,40,4.400000095367432,40.4,'01/02/09','4',2009-01-01 23:14:00.510000000,2009,1
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5736
+====
+---- QUERY
+select * from alltypes_tiny_pages where id > 7270
+---- RESULTS
+7290,true,0,0,0,0,0,0,'12/31/10','0',2010-12-31 04:00:13.500000000,2010,12
+7291,false,1,1,1,10,1.100000023841858,10.1,'12/31/10','1',2010-12-31 04:01:13.500000000,2010,12
+7292,true,2,2,2,20,2.200000047683716,20.2,'12/31/10','2',2010-12-31 04:02:13.510000000,2010,12
+7293,false,3,3,3,30,3.299999952316284,30.3,'12/31/10','3',2010-12-31 04:03:13.530000000,2010,12
+7294,true,4,4,4,40,4.400000095367432,40.4,'12/31/10','4',2010-12-31 04:04:13.560000000,2010,12
+7295,false,5,5,5,50,5.5,50.5,'12/31/10','5',2010-12-31 04:05:13.600000000,2010,12
+7296,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/31/10','6',2010-12-31 04:06:13.650000000,2010,12
+7297,false,7,7,7,70,7.699999809265137,70.7,'12/31/10','7',2010-12-31 04:07:13.710000000,2010,12
+7298,true,8,8,8,80,8.800000190734863,80.8,'12/31/10','8',2010-12-31 04:08:13.780000000,2010,12
+7299,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/31/10','9',2010-12-31 04:09:13.860000000,2010,12
+7289,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/30/10','9',2010-12-30 03:59:13.410000000,2010,12
+7288,true,8,8,8,80,8.800000190734863,80.8,'12/30/10','8',2010-12-30 03:58:13.330000000,2010,12
+7287,false,7,7,7,70,7.699999809265137,70.7,'12/30/10','7',2010-12-30 03:57:13.260000000,2010,12
+7286,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/30/10','6',2010-12-30 03:56:13.200000000,2010,12
+7285,false,5,5,5,50,5.5,50.5,'12/30/10','5',2010-12-30 03:55:13.150000000,2010,12
+7284,true,4,4,4,40,4.400000095367432,40.4,'12/30/10','4',2010-12-30 03:54:13.110000000,2010,12
+7283,false,3,3,3,30,3.299999952316284,30.3,'12/30/10','3',2010-12-30 03:53:13.800000000,2010,12
+7282,true,2,2,2,20,2.200000047683716,20.2,'12/30/10','2',2010-12-30 03:52:13.600000000,2010,12
+7281,false,1,1,1,10,1.100000023841858,10.1,'12/30/10','1',2010-12-30 03:51:13.500000000,2010,12
+7280,true,0,0,0,0,0,0,'12/30/10','0',2010-12-30 03:50:13.500000000,2010,12
+7279,false,9,9,9,90,9.899999618530273,90.89999999999999,'12/29/10','9',2010-12-29 03:49:12.960000000,2010,12
+7278,true,8,8,8,80,8.800000190734863,80.8,'12/29/10','8',2010-12-29 03:48:12.880000000,2010,12
+7277,false,7,7,7,70,7.699999809265137,70.7,'12/29/10','7',2010-12-29 03:47:12.810000000,2010,12
+7276,true,6,6,6,60,6.599999904632568,60.59999999999999,'12/29/10','6',2010-12-29 03:46:12.750000000,2010,12
+7275,false,5,5,5,50,5.5,50.5,'12/29/10','5',2010-12-29 03:45:12.700000000,2010,12
+7274,true,4,4,4,40,4.400000095367432,40.4,'12/29/10','4',2010-12-29 03:44:12.660000000,2010,12
+7273,false,3,3,3,30,3.299999952316284,30.3,'12/29/10','3',2010-12-29 03:43:12.630000000,2010,12
+7272,true,2,2,2,20,2.200000047683716,20.2,'12/29/10','2',2010-12-29 03:42:12.610000000,2010,12
+7271,false,1,1,1,10,1.100000023841858,10.1,'12/29/10','1',2010-12-29 03:41:12.600000000,2010,12
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5736
+====
+---- QUERY
+select * from alltypes_tiny_pages where id > 2300 and id < 2310
+---- RESULTS
+2309,false,9,9,9,90,9.899999618530273,90.89999999999999,'08/19/09','9',2009-08-19 01:09:08.460000000,2009,8
+2308,true,8,8,8,80,8.800000190734863,80.8,'08/19/09','8',2009-08-19 01:08:08.380000000,2009,8
+2307,false,7,7,7,70,7.699999809265137,70.7,'08/19/09','7',2009-08-19 01:07:08.310000000,2009,8
+2306,true,6,6,6,60,6.599999904632568,60.59999999999999,'08/19/09','6',2009-08-19 01:06:08.250000000,2009,8
+2305,false,5,5,5,50,5.5,50.5,'08/19/09','5',2009-08-19 01:05:08.200000000,2009,8
+2304,true,4,4,4,40,4.400000095367432,40.4,'08/19/09','4',2009-08-19 01:04:08.160000000,2009,8
+2303,false,3,3,3,30,3.299999952316284,30.3,'08/19/09','3',2009-08-19 01:03:08.130000000,2009,8
+2302,true,2,2,2,20,2.200000047683716,20.2,'08/19/09','2',2009-08-19 01:02:08.110000000,2009,8
+2301,false,1,1,1,10,1.100000023841858,10.1,'08/19/09','1',2009-08-19 01:01:08.100000000,2009,8
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5639
+====
+---- QUERY
+select * from alltypes_tiny_pages where bigint_col = 0 and month = 7 and year = 2010
+---- RESULTS
+5540,true,0,0,0,0,0,0,'07/09/10','0',2010-07-08 23:20:03.600000000,2010,7
+5530,true,0,0,0,0,0,0,'07/08/10','0',2010-07-07 23:10:03.150000000,2010,7
+5520,true,0,0,0,0,0,0,'07/07/10','0',2010-07-06 23:00:02.700000000,2010,7
+5560,true,0,0,0,0,0,0,'07/11/10','0',2010-07-10 23:40:04.500000000,2010,7
+5550,true,0,0,0,0,0,0,'07/10/10','0',2010-07-09 23:30:04.500000000,2010,7
+5490,true,0,0,0,0,0,0,'07/04/10','0',2010-07-03 22:30:01.350000000,2010,7
+5480,true,0,0,0,0,0,0,'07/03/10','0',2010-07-02 22:20:00.900000000,2010,7
+5470,true,0,0,0,0,0,0,'07/02/10','0',2010-07-01 22:10:00.450000000,2010,7
+5510,true,0,0,0,0,0,0,'07/06/10','0',2010-07-05 22:50:02.250000000,2010,7
+5500,true,0,0,0,0,0,0,'07/05/10','0',2010-07-04 22:40:01.800000000,2010,7
+5640,true,0,0,0,0,0,0,'07/19/10','0',2010-07-19 01:00:08.100000000,2010,7
+5630,true,0,0,0,0,0,0,'07/18/10','0',2010-07-18 00:50:07.650000000,2010,7
+5670,true,0,0,0,0,0,0,'07/22/10','0',2010-07-22 01:30:09.450000000,2010,7
+5660,true,0,0,0,0,0,0,'07/21/10','0',2010-07-21 01:20:09,2010,7
+5650,true,0,0,0,0,0,0,'07/20/10','0',2010-07-20 01:10:08.550000000,2010,7
+5590,true,0,0,0,0,0,0,'07/14/10','0',2010-07-14 00:10:05.850000000,2010,7
+5580,true,0,0,0,0,0,0,'07/13/10','0',2010-07-13 00:00:05.400000000,2010,7
+5570,true,0,0,0,0,0,0,'07/12/10','0',2010-07-11 23:50:04.950000000,2010,7
+5620,true,0,0,0,0,0,0,'07/17/10','0',2010-07-17 00:40:07.200000000,2010,7
+5610,true,0,0,0,0,0,0,'07/16/10','0',2010-07-16 00:30:06.750000000,2010,7
+5600,true,0,0,0,0,0,0,'07/15/10','0',2010-07-15 00:20:06.300000000,2010,7
+5740,true,0,0,0,0,0,0,'07/29/10','0',2010-07-29 02:40:12.600000000,2010,7
+5730,true,0,0,0,0,0,0,'07/28/10','0',2010-07-28 02:30:12.150000000,2010,7
+5720,true,0,0,0,0,0,0,'07/27/10','0',2010-07-27 02:20:11.700000000,2010,7
+5760,true,0,0,0,0,0,0,'07/31/10','0',2010-07-31 03:00:13.500000000,2010,7
+5750,true,0,0,0,0,0,0,'07/30/10','0',2010-07-30 02:50:13.500000000,2010,7
+5690,true,0,0,0,0,0,0,'07/24/10','0',2010-07-24 01:50:10.350000000,2010,7
+5680,true,0,0,0,0,0,0,'07/23/10','0',2010-07-23 01:40:09.900000000,2010,7
+5460,true,0,0,0,0,0,0,'07/01/10','0',2010-06-30 22:00:00,2010,7
+5710,true,0,0,0,0,0,0,'07/26/10','0',2010-07-26 02:10:11.250000000,2010,7
+5700,true,0,0,0,0,0,0,'07/25/10','0',2010-07-25 02:00:10.800000000,2010,7
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5455
+====
+---- QUERY
+select * from alltypes_tiny_pages where date_string_col = '02/02/09'
+---- RESULTS
+320,true,0,0,0,0,0,0,'02/02/09','0',2009-02-01 23:10:00.450000000,2009,2
+321,false,1,1,1,10,1.100000023841858,10.1,'02/02/09','1',2009-02-01 23:11:00.450000000,2009,2
+322,true,2,2,2,20,2.200000047683716,20.2,'02/02/09','2',2009-02-01 23:12:00.460000000,2009,2
+323,false,3,3,3,30,3.299999952316284,30.3,'02/02/09','3',2009-02-01 23:13:00.480000000,2009,2
+324,true,4,4,4,40,4.400000095367432,40.4,'02/02/09','4',2009-02-01 23:14:00.510000000,2009,2
+325,false,5,5,5,50,5.5,50.5,'02/02/09','5',2009-02-01 23:15:00.550000000,2009,2
+326,true,6,6,6,60,6.599999904632568,60.59999999999999,'02/02/09','6',2009-02-01 23:16:00.600000000,2009,2
+327,false,7,7,7,70,7.699999809265137,70.7,'02/02/09','7',2009-02-01 23:17:00.660000000,2009,2
+328,true,8,8,8,80,8.800000190734863,80.8,'02/02/09','8',2009-02-01 23:18:00.730000000,2009,2
+329,false,9,9,9,90,9.899999618530273,90.89999999999999,'02/02/09','9',2009-02-01 23:19:00.810000000,2009,2
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5703
+====
+---- QUERY
+select * from alltypes_tiny_pages
+where month = 1 and smallint_col = 9 and timestamp_col < '2009-12-08 00:19:03'
+---- RESULTS
+129,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/13/09','9',2009-01-13 01:09:05.760000000,2009,1
+139,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/14/09','9',2009-01-14 01:19:06.210000000,2009,1
+99,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/10/09','9',2009-01-10 00:39:04.410000000,2009,1
+109,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/11/09','9',2009-01-11 00:49:04.860000000,2009,1
+119,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/12/09','9',2009-01-12 00:59:05.310000000,2009,1
+179,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/18/09','9',2009-01-18 01:59:08.100000000,2009,1
+189,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/19/09','9',2009-01-19 02:09:08.460000000,2009,1
+149,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/15/09','9',2009-01-15 01:29:06.660000000,2009,1
+159,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/16/09','9',2009-01-16 01:39:07.110000000,2009,1
+169,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/17/09','9',2009-01-17 01:49:07.560000000,2009,1
+59,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/06/09','9',2009-01-05 23:59:02.610000000,2009,1
+49,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/05/09','9',2009-01-04 23:49:02.160000000,2009,1
+39,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/04/09','9',2009-01-03 23:39:01.710000000,2009,1
+79,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/08/09','9',2009-01-08 00:19:03.510000000,2009,1
+69,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/07/09','9',2009-01-07 00:09:03.600000000,2009,1
+9,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/01/09','9',2008-12-31 23:09:00.360000000,2009,1
+89,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/09/09','9',2009-01-09 00:29:03.960000000,2009,1
+29,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/03/09','9',2009-01-02 23:29:01.260000000,2009,1
+19,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/02/09','9',2009-01-01 23:19:00.810000000,2009,1
+279,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/28/09','9',2009-01-28 03:39:12.510000000,2009,1
+269,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/27/09','9',2009-01-27 03:29:12.600000000,2009,1
+259,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/26/09','9',2009-01-26 03:19:11.610000000,2009,1
+309,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/31/09','9',2009-01-31 04:09:13.860000000,2009,1
+299,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/30/09','9',2009-01-30 03:59:13.410000000,2009,1
+289,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/29/09','9',2009-01-29 03:49:12.960000000,2009,1
+219,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/22/09','9',2009-01-22 02:39:09.810000000,2009,1
+209,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/21/09','9',2009-01-21 02:29:09.360000000,2009,1
+199,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/20/09','9',2009-01-20 02:19:08.910000000,2009,1
+249,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/25/09','9',2009-01-25 03:09:11.160000000,2009,1
+239,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/24/09','9',2009-01-24 02:59:10.710000000,2009,1
+229,false,9,9,9,90,9.899999618530273,90.89999999999999,'01/23/09','9',2009-01-23 02:49:10.260000000,2009,1
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5273
+====
+---- QUERY
+select * from alltypes_tiny_pages
+where double_col > 70 and double_col < 71 and month = 8 and year = 2010
+---- RESULTS
+6037,false,7,7,7,70,7.699999809265137,70.7,'08/27/10','7',2010-08-27 02:27:11.910000000,2010,8
+6027,false,7,7,7,70,7.699999809265137,70.7,'08/26/10','7',2010-08-26 02:17:11.460000000,2010,8
+6067,false,7,7,7,70,7.699999809265137,70.7,'08/30/10','7',2010-08-30 02:57:13.260000000,2010,8
+6057,false,7,7,7,70,7.699999809265137,70.7,'08/29/10','7',2010-08-29 02:47:12.810000000,2010,8
+6047,false,7,7,7,70,7.699999809265137,70.7,'08/28/10','7',2010-08-28 02:37:12.360000000,2010,8
+5987,false,7,7,7,70,7.699999809265137,70.7,'08/22/10','7',2010-08-22 01:37:09.660000000,2010,8
+5977,false,7,7,7,70,7.699999809265137,70.7,'08/21/10','7',2010-08-21 01:27:09.210000000,2010,8
+6017,false,7,7,7,70,7.699999809265137,70.7,'08/25/10','7',2010-08-25 02:07:11.100000000,2010,8
+6007,false,7,7,7,70,7.699999809265137,70.7,'08/24/10','7',2010-08-24 01:57:10.560000000,2010,8
+5997,false,7,7,7,70,7.699999809265137,70.7,'08/23/10','7',2010-08-23 01:47:10.110000000,2010,8
+5837,false,7,7,7,70,7.699999809265137,70.7,'08/07/10','7',2010-08-06 23:07:02.910000000,2010,8
+5827,false,7,7,7,70,7.699999809265137,70.7,'08/06/10','7',2010-08-05 22:57:02.460000000,2010,8
+5867,false,7,7,7,70,7.699999809265137,70.7,'08/10/10','7',2010-08-09 23:37:04.260000000,2010,8
+5857,false,7,7,7,70,7.699999809265137,70.7,'08/09/10','7',2010-08-08 23:27:03.810000000,2010,8
+5847,false,7,7,7,70,7.699999809265137,70.7,'08/08/10','7',2010-08-07 23:17:03.360000000,2010,8
+5787,false,7,7,7,70,7.699999809265137,70.7,'08/02/10','7',2010-08-01 22:17:00.660000000,2010,8
+5777,false,7,7,7,70,7.699999809265137,70.7,'08/01/10','7',2010-07-31 22:07:00.210000000,2010,8
+5817,false,7,7,7,70,7.699999809265137,70.7,'08/05/10','7',2010-08-04 22:47:02.100000000,2010,8
+5807,false,7,7,7,70,7.699999809265137,70.7,'08/04/10','7',2010-08-03 22:37:01.560000000,2010,8
+5797,false,7,7,7,70,7.699999809265137,70.7,'08/03/10','7',2010-08-02 22:27:01.110000000,2010,8
+5937,false,7,7,7,70,7.699999809265137,70.7,'08/17/10','7',2010-08-17 00:47:07.410000000,2010,8
+5927,false,7,7,7,70,7.699999809265137,70.7,'08/16/10','7',2010-08-16 00:37:06.960000000,2010,8
+5967,false,7,7,7,70,7.699999809265137,70.7,'08/20/10','7',2010-08-20 01:17:08.760000000,2010,8
+5957,false,7,7,7,70,7.699999809265137,70.7,'08/19/10','7',2010-08-19 01:07:08.310000000,2010,8
+5947,false,7,7,7,70,7.699999809265137,70.7,'08/18/10','7',2010-08-18 00:57:07.860000000,2010,8
+5887,false,7,7,7,70,7.699999809265137,70.7,'08/12/10','7',2010-08-11 23:57:05.160000000,2010,8
+5877,false,7,7,7,70,7.699999809265137,70.7,'08/11/10','7',2010-08-10 23:47:04.710000000,2010,8
+5917,false,7,7,7,70,7.699999809265137,70.7,'08/15/10','7',2010-08-15 00:27:06.510000000,2010,8
+5907,false,7,7,7,70,7.699999809265137,70.7,'08/14/10','7',2010-08-14 00:17:06.600000000,2010,8
+5897,false,7,7,7,70,7.699999809265137,70.7,'08/13/10','7',2010-08-13 00:07:05.610000000,2010,8
+6077,false,7,7,7,70,7.699999809265137,70.7,'08/31/10','7',2010-08-31 03:07:13.710000000,2010,8
+---- TYPES
+INT, BOOLEAN, TINYINT, SMALLINT, INT, BIGINT, FLOAT, DOUBLE, STRING, STRING, TIMESTAMP, INT, INT
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 5457
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test
new file mode 100644
index 0000000..5230fe9
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index-large.test
@@ -0,0 +1,357 @@
+# Test page selection and value skipping logic on a large table such as 'lineitem'.
+# Queries have predicates on different columns and might have multiple predicates
+# joined by AND. This way we can test how page filtering combines these predicates
+# to filter out even more rows.
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_orderkey < 50
+---- RESULTS
+1,155190,7706,1,17.00,21168.23,0.04,0.02,'N','O','1996-03-13','1996-02-12','1996-03-22','DELIVER IN PERSON','TRUCK','egular courts above the'
+1,67310,7311,2,36.00,45983.16,0.09,0.06,'N','O','1996-04-12','1996-02-28','1996-04-20','TAKE BACK RETURN','MAIL','ly final dependencies: slyly bold '
+1,63700,3701,3,8.00,13309.60,0.10,0.02,'N','O','1996-01-29','1996-03-05','1996-01-31','TAKE BACK RETURN','REG AIR','riously. regular, express dep'
+1,2132,4633,4,28.00,28955.64,0.09,0.06,'N','O','1996-04-21','1996-03-30','1996-05-16','NONE','AIR','lites. fluffily even de'
+1,24027,1534,5,24.00,22824.48,0.10,0.04,'N','O','1996-03-30','1996-03-14','1996-04-01','NONE','FOB',' pending foxes. slyly re'
+1,15635,638,6,32.00,49620.16,0.07,0.02,'N','O','1996-01-30','1996-02-07','1996-02-03','DELIVER IN PERSON','MAIL','arefully slyly ex'
+2,106170,1191,1,38.00,44694.46,0.00,0.05,'N','O','1997-01-28','1997-01-14','1997-02-02','TAKE BACK RETURN','RAIL','ven requests. deposits breach a'
+3,4297,1798,1,45.00,54058.05,0.06,0.00,'R','F','1994-02-02','1994-01-04','1994-02-23','NONE','AIR','ongside of the furiously brave acco'
+3,19036,6540,2,49.00,46796.47,0.10,0.00,'R','F','1993-11-09','1993-12-20','1993-11-24','TAKE BACK RETURN','RAIL',' unusual accounts. eve'
+3,128449,3474,3,27.00,39890.88,0.06,0.07,'A','F','1994-01-16','1993-11-22','1994-01-23','DELIVER IN PERSON','SHIP','nal foxes wake. '
+3,29380,1883,4,2.00,2618.76,0.01,0.06,'A','F','1993-12-04','1994-01-07','1994-01-01','NONE','TRUCK','y. fluffily pending d'
+3,183095,650,5,28.00,32986.52,0.04,0.00,'R','F','1993-12-14','1994-01-10','1994-01-01','TAKE BACK RETURN','FOB','ages nag slyly pending'
+3,62143,9662,6,26.00,28733.64,0.10,0.02,'A','F','1993-10-29','1993-12-18','1993-11-04','TAKE BACK RETURN','RAIL','ges sleep after the caref'
+4,88035,5560,1,30.00,30690.90,0.03,0.08,'N','O','1996-01-10','1995-12-14','1996-01-18','DELIVER IN PERSON','REG AIR','- quickly regular packages sleep. idly'
+5,108570,8571,1,15.00,23678.55,0.02,0.04,'R','F','1994-10-31','1994-08-31','1994-11-20','NONE','AIR','ts wake furiously '
+5,123927,3928,2,26.00,50723.92,0.07,0.08,'R','F','1994-10-16','1994-09-25','1994-10-19','NONE','FOB','sts use slyly quickly special instruc'
+5,37531,35,3,50.00,73426.50,0.08,0.03,'A','F','1994-08-08','1994-10-13','1994-08-26','DELIVER IN PERSON','AIR','eodolites. fluffily unusual'
+6,139636,2150,1,37.00,61998.31,0.08,0.03,'A','F','1992-04-27','1992-05-15','1992-05-02','TAKE BACK RETURN','TRUCK','p furiously special foxes'
+7,182052,9607,1,12.00,13608.60,0.07,0.03,'N','O','1996-05-07','1996-03-13','1996-06-03','TAKE BACK RETURN','FOB','ss pinto beans wake against th'
+7,145243,7758,2,9.00,11594.16,0.08,0.08,'N','O','1996-02-01','1996-03-02','1996-02-19','TAKE BACK RETURN','SHIP','es. instructions'
+7,94780,9799,3,46.00,81639.88,0.10,0.07,'N','O','1996-01-15','1996-03-27','1996-02-03','COLLECT COD','MAIL',' unusual reques'
+7,163073,3074,4,28.00,31809.96,0.03,0.04,'N','O','1996-03-21','1996-04-08','1996-04-20','NONE','FOB','. slyly special requests haggl'
+7,151894,9440,5,38.00,73943.82,0.08,0.01,'N','O','1996-02-11','1996-02-24','1996-02-18','DELIVER IN PERSON','TRUCK','ns haggle carefully ironic deposits. bl'
+7,79251,1759,6,35.00,43058.75,0.06,0.03,'N','O','1996-01-16','1996-02-23','1996-01-22','TAKE BACK RETURN','FOB','jole. excuses wake carefully alongside of '
+7,157238,2269,7,5.00,6476.15,0.04,0.02,'N','O','1996-02-10','1996-03-26','1996-02-13','NONE','FOB','ithely regula'
+32,82704,7721,1,28.00,47227.60,0.05,0.08,'N','O','1995-10-23','1995-08-27','1995-10-26','TAKE BACK RETURN','TRUCK','sleep quickly. req'
+32,197921,441,2,32.00,64605.44,0.02,0.00,'N','O','1995-08-14','1995-10-07','1995-08-27','COLLECT COD','AIR','lithely regular deposits. fluffily '
+32,44161,6666,3,2.00,2210.32,0.09,0.02,'N','O','1995-08-07','1995-10-07','1995-08-23','DELIVER IN PERSON','AIR',' express accounts wake according to the'
+32,2743,7744,4,4.00,6582.96,0.09,0.03,'N','O','1995-08-04','1995-10-01','1995-09-03','NONE','REG AIR','e slyly final pac'
+32,85811,8320,5,44.00,79059.64,0.05,0.06,'N','O','1995-08-28','1995-08-20','1995-09-14','DELIVER IN PERSON','AIR','symptotes nag according to the ironic depo'
+32,11615,4117,6,6.00,9159.66,0.04,0.03,'N','O','1995-07-21','1995-09-23','1995-07-25','COLLECT COD','RAIL',' gifts cajole carefully.'
+33,61336,8855,1,31.00,40217.23,0.09,0.04,'A','F','1993-10-29','1993-12-19','1993-11-08','COLLECT COD','TRUCK','ng to the furiously ironic package'
+33,60519,5532,2,32.00,47344.32,0.02,0.05,'A','F','1993-12-09','1994-01-04','1993-12-28','COLLECT COD','MAIL','gular theodolites'
+33,137469,9983,3,5.00,7532.30,0.05,0.03,'A','F','1993-12-09','1993-12-25','1993-12-23','TAKE BACK RETURN','AIR','. stealthily bold exc'
+33,33918,3919,4,41.00,75928.31,0.09,0.00,'R','F','1993-11-09','1994-01-24','1993-11-11','TAKE BACK RETURN','MAIL','unusual packages doubt caref'
+34,88362,871,1,13.00,17554.68,0.00,0.07,'N','O','1998-10-23','1998-09-14','1998-11-06','NONE','REG AIR','nic accounts. deposits are alon'
+34,89414,1923,2,22.00,30875.02,0.08,0.06,'N','O','1998-10-09','1998-10-16','1998-10-12','NONE','FOB','thely slyly p'
+34,169544,4577,3,6.00,9681.24,0.02,0.06,'N','O','1998-10-30','1998-09-20','1998-11-05','NONE','FOB','ar foxes sleep '
+35,450,2951,1,24.00,32410.80,0.02,0.00,'N','O','1996-02-21','1996-01-03','1996-03-18','TAKE BACK RETURN','FOB',', regular tithe'
+35,161940,4457,2,34.00,68065.96,0.06,0.08,'N','O','1996-01-22','1996-01-06','1996-01-27','DELIVER IN PERSON','RAIL','s are carefully against the f'
+35,120896,8433,3,7.00,13418.23,0.06,0.04,'N','O','1996-01-19','1995-12-22','1996-01-29','NONE','MAIL',' the carefully regular '
+35,85175,7684,4,25.00,29004.25,0.06,0.05,'N','O','1995-11-26','1995-12-25','1995-12-21','DELIVER IN PERSON','SHIP',' quickly unti'
+35,119917,4940,5,34.00,65854.94,0.08,0.06,'N','O','1995-11-08','1996-01-15','1995-11-26','COLLECT COD','MAIL','. silent, unusual deposits boost'
+35,30762,3266,6,28.00,47397.28,0.03,0.02,'N','O','1996-02-01','1995-12-24','1996-02-28','COLLECT COD','RAIL','ly alongside of '
+36,119767,9768,1,42.00,75043.92,0.09,0.00,'N','O','1996-02-03','1996-01-21','1996-02-23','COLLECT COD','SHIP',' careful courts. special '
+37,22630,5133,1,40.00,62105.20,0.09,0.03,'A','F','1992-07-21','1992-08-01','1992-08-15','NONE','REG AIR','luffily regular requests. slyly final acco'
+37,126782,1807,2,39.00,70542.42,0.05,0.02,'A','F','1992-07-02','1992-08-18','1992-07-28','TAKE BACK RETURN','RAIL','the final requests. ca'
+37,12903,5405,3,43.00,78083.70,0.05,0.08,'A','F','1992-07-10','1992-07-06','1992-08-02','DELIVER IN PERSON','TRUCK','iously ste'
+38,175839,874,1,44.00,84252.52,0.04,0.02,'N','O','1996-09-29','1996-11-17','1996-09-30','COLLECT COD','MAIL','s. blithely unusual theodolites am'
+39,2320,9821,1,44.00,53782.08,0.09,0.06,'N','O','1996-11-14','1996-12-15','1996-12-12','COLLECT COD','RAIL','eodolites. careful'
+39,186582,4137,2,26.00,43383.08,0.08,0.04,'N','O','1996-11-04','1996-10-20','1996-11-20','NONE','FOB','ckages across the slyly silent'
+39,67831,5350,3,46.00,82746.18,0.06,0.08,'N','O','1996-09-26','1996-12-19','1996-10-26','DELIVER IN PERSON','AIR','he carefully e'
+39,20590,3093,4,32.00,48338.88,0.07,0.05,'N','O','1996-10-02','1996-12-19','1996-10-14','COLLECT COD','MAIL','heodolites sleep silently pending foxes. ac'
+39,54519,9530,5,43.00,63360.93,0.01,0.01,'N','O','1996-10-17','1996-11-14','1996-10-26','COLLECT COD','MAIL','yly regular i'
+39,94368,6878,6,40.00,54494.40,0.06,0.05,'N','O','1996-12-08','1996-10-22','1997-01-01','COLLECT COD','AIR','quickly ironic fox'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_orderkey > 5999950
+---- RESULTS
+5999968,143161,8190,1,13.00,15654.08,0.04,0.06,'A','F','1993-01-03','1993-03-13','1993-01-05','DELIVER IN PERSON','RAIL','ickly bold foxes. blithely f'
+5999968,94639,4640,2,41.00,66978.83,0.08,0.07,'A','F','1993-01-15','1993-03-04','1993-02-08','NONE','TRUCK','yly express ideas. r'
+5999968,183011,3012,3,48.00,52512.48,0.01,0.00,'A','F','1993-04-16','1993-02-10','1993-04-29','COLLECT COD','MAIL','tions after the blithely express instru'
+5999968,58218,5734,4,48.00,56458.08,0.02,0.07,'A','F','1993-04-08','1993-03-24','1993-04-10','NONE','TRUCK','nd the carefully bold pinto beans. flu'
+5999968,55929,5930,5,38.00,71626.96,0.10,0.00,'A','F','1993-01-14','1993-02-12','1993-01-31','DELIVER IN PERSON','REG AIR','ions. slyl'
+5999968,182919,474,6,41.00,82078.31,0.03,0.02,'A','F','1993-03-24','1993-02-16','1993-04-01','DELIVER IN PERSON','FOB','ctions are furiously unusual hockey playe'
+5999968,103669,3670,7,10.00,16726.60,0.08,0.03,'A','F','1993-03-12','1993-02-19','1993-04-09','TAKE BACK RETURN','REG AIR','slyly final instructio'
+5999969,157685,7686,1,15.00,26140.20,0.00,0.02,'N','O','1996-09-11','1996-07-22','1996-10-07','NONE','SHIP','ep slyly brave in'
+5999970,51044,8560,1,22.00,21890.88,0.00,0.08,'N','O','1995-11-25','1996-02-09','1995-12-04','DELIVER IN PERSON','RAIL','theodolites. express, ironic fox'
+5999970,82081,2082,2,49.00,52090.92,0.04,0.08,'N','O','1995-12-23','1996-02-16','1996-01-04','TAKE BACK RETURN','SHIP','at quickly furiously expres'
+5999970,123008,545,3,32.00,32992.00,0.05,0.07,'N','O','1995-12-11','1996-01-06','1995-12-27','NONE','FOB','iously along '
+5999970,184826,4827,4,6.00,11464.92,0.09,0.01,'N','O','1996-02-08','1996-02-09','1996-02-15','COLLECT COD','FOB','slyly regular requests sleep '
+5999970,167769,2802,5,38.00,69796.88,0.03,0.04,'N','O','1996-02-28','1996-01-12','1996-03-06','DELIVER IN PERSON','FOB','ve the clos'
+5999971,30247,2751,1,40.00,47089.60,0.09,0.01,'N','O','1996-10-02','1996-08-23','1996-10-09','COLLECT COD','REG AIR','lly according '
+5999971,144823,4824,2,38.00,70977.16,0.08,0.03,'N','O','1996-11-13','1996-09-04','1996-11-20','DELIVER IN PERSON','FOB','e blithely after the carefully pending '
+5999971,33994,9001,3,49.00,94471.51,0.03,0.02,'N','O','1996-08-30','1996-08-27','1996-09-16','NONE','MAIL',' promise for the blithely r'
+5999971,132903,2904,4,19.00,36782.10,0.09,0.01,'N','O','1996-11-02','1996-09-02','1996-11-25','TAKE BACK RETURN','RAIL','ckly above the boldly '
+5999971,97922,7923,5,28.00,53757.76,0.10,0.06,'N','O','1996-09-20','1996-08-29','1996-10-05','TAKE BACK RETURN','REG AIR','place of the slyly quick pla'
+5999971,161882,6915,6,15.00,29158.20,0.03,0.00,'N','O','1996-10-19','1996-08-24','1996-10-28','DELIVER IN PERSON','RAIL','luffy theodolites nag boldly bli'
+5999972,102942,473,1,33.00,64183.02,0.06,0.05,'N','O','1996-05-26','1996-06-28','1996-05-29','NONE','MAIL','s maintain carefully among the'
+5999972,133109,8136,2,44.00,50252.40,0.08,0.00,'N','O','1996-05-24','1996-07-22','1996-05-27','COLLECT COD','RAIL',' the furiously express pearls. furi'
+5999972,152761,2762,3,3.00,5441.28,0.04,0.01,'N','O','1996-08-31','1996-06-02','1996-09-22','DELIVER IN PERSON','MAIL','sual accounts al'
+5999973,176345,1380,1,50.00,71067.00,0.04,0.01,'N','O','1997-07-27','1997-09-07','1997-08-10','TAKE BACK RETURN','FOB','gular excuses. '
+5999974,25360,5361,1,24.00,30848.64,0.02,0.03,'R','F','1993-08-15','1993-10-07','1993-09-01','COLLECT COD','MAIL','express dependencies. express, pendi'
+5999974,10463,5466,2,46.00,63179.16,0.08,0.06,'R','F','1993-09-16','1993-09-21','1993-10-02','COLLECT COD','RAIL','dolites wake'
+5999975,7272,2273,1,32.00,37736.64,0.07,0.01,'R','F','1993-10-07','1993-09-30','1993-10-21','COLLECT COD','REG AIR','tructions. excu'
+5999975,6452,1453,2,7.00,9509.15,0.04,0.00,'A','F','1993-11-02','1993-09-23','1993-11-19','DELIVER IN PERSON','SHIP','lar pinto beans aft'
+5999975,37131,2138,3,18.00,19226.34,0.04,0.01,'A','F','1993-11-17','1993-08-28','1993-12-08','DELIVER IN PERSON','FOB',', quick deposits. ironic, unusual deposi'
+6000000,32255,2256,1,5.00,5936.25,0.04,0.03,'N','O','1996-11-02','1996-11-19','1996-12-01','TAKE BACK RETURN','MAIL','carefully '
+6000000,96127,6128,2,28.00,31447.36,0.01,0.02,'N','O','1996-09-22','1996-10-01','1996-10-21','NONE','AIR','ooze furiously about the pe'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_orderkey > 3999950 and l_orderkey < 4000000
+---- RESULTS
+3999968,135583,5584,1,43.00,69598.94,0.02,0.07,'A','F','1993-10-03','1993-10-15','1993-10-22','TAKE BACK RETURN','MAIL','losely final dolphins alongside o'
+3999968,126704,9217,2,11.00,19037.70,0.04,0.07,'R','F','1993-11-27','1993-11-27','1993-12-16','NONE','RAIL','n ideas; enticing, expres'
+3999968,106138,6139,3,26.00,29747.38,0.02,0.01,'A','F','1993-09-29','1993-11-06','1993-10-17','COLLECT COD','SHIP','ng excuses are carefully'
+3999968,47967,2976,4,42.00,80428.32,0.03,0.05,'A','F','1993-09-27','1993-10-11','1993-10-08','COLLECT COD','RAIL','es use carefully with the silent '
+3999969,48082,3091,1,33.00,33992.64,0.06,0.02,'R','F','1993-10-13','1993-09-07','1993-11-12','NONE','SHIP','ronic accounts'
+3999970,50866,8382,1,36.00,65406.96,0.04,0.03,'N','O','1995-08-25','1995-09-16','1995-09-10','TAKE BACK RETURN','MAIL',' hockey players. fluffily even accounts'
+3999970,95116,135,2,12.00,13333.32,0.01,0.00,'N','O','1995-10-23','1995-11-07','1995-11-09','NONE','FOB','to the furiously regular deposits. s'
+3999971,64290,1809,1,8.00,10034.32,0.09,0.02,'R','F','1993-08-29','1993-07-17','1993-09-02','DELIVER IN PERSON','MAIL','lithely ironic reques'
+3999971,110171,172,2,29.00,34253.93,0.04,0.02,'R','F','1993-07-30','1993-08-08','1993-08-17','TAKE BACK RETURN','AIR','xpress requests. unusual accounts cajol'
+3999971,43521,8530,3,14.00,20503.28,0.06,0.06,'A','F','1993-07-11','1993-08-09','1993-07-24','NONE','RAIL','arefully pending theodolites. carefu'
+3999971,50396,2902,4,36.00,48470.04,0.00,0.01,'R','F','1993-06-02','1993-07-20','1993-06-21','DELIVER IN PERSON','RAIL','ironic, ironic deposits. ir'
+3999971,196039,6040,5,26.00,29510.78,0.07,0.02,'R','F','1993-08-08','1993-08-23','1993-08-09','COLLECT COD','SHIP','lithely blithely regular foxes. carefully'
+3999971,25817,5818,6,18.00,31370.58,0.02,0.06,'R','F','1993-06-15','1993-08-28','1993-07-12','COLLECT COD','AIR','dependencies wake c'
+3999972,118670,1182,1,7.00,11820.69,0.03,0.02,'N','O','1998-07-19','1998-06-28','1998-08-14','NONE','REG AIR','bove the furiously regular ideas haggle '
+3999972,86248,8757,2,9.00,11108.16,0.06,0.02,'N','O','1998-08-17','1998-06-09','1998-08-19','DELIVER IN PERSON','RAIL','ly final deposits wake fluffily ac'
+3999972,155174,5175,3,38.00,46708.46,0.08,0.01,'N','O','1998-05-13','1998-06-20','1998-06-04','COLLECT COD','RAIL','ans wake furiously. bl'
+3999973,184250,9287,1,30.00,40027.50,0.10,0.03,'R','F','1992-06-10','1992-07-05','1992-06-26','TAKE BACK RETURN','TRUCK',' sentiments sleep quickly after the blithel'
+3999973,75652,667,2,41.00,66733.65,0.02,0.08,'A','F','1992-07-25','1992-08-11','1992-08-21','TAKE BACK RETURN','RAIL','ithely ironic instructions. bold '
+3999973,81373,6390,3,36.00,48757.32,0.00,0.01,'R','F','1992-06-08','1992-06-28','1992-07-02','TAKE BACK RETURN','FOB','thely around the'
+3999973,33236,5740,4,13.00,15199.99,0.02,0.03,'R','F','1992-06-07','1992-07-03','1992-06-21','TAKE BACK RETURN','TRUCK','nal accounts. express requests snoo'
+3999973,192614,5134,5,36.00,61437.96,0.10,0.04,'R','F','1992-05-24','1992-07-08','1992-06-06','TAKE BACK RETURN','FOB','ly express platelets haggle'
+3999974,37722,2729,1,29.00,48131.88,0.01,0.05,'A','F','1994-12-18','1994-10-01','1994-12-23','DELIVER IN PERSON','FOB','pending packages u'
+3999974,191924,1925,2,14.00,28222.88,0.02,0.06,'R','F','1994-10-27','1994-10-18','1994-11-20','TAKE BACK RETURN','MAIL','the ironic '
+3999974,55547,558,3,33.00,49583.82,0.01,0.06,'R','F','1994-09-02','1994-10-09','1994-09-13','COLLECT COD','AIR','ar dependencies wake alongsid'
+3999974,165089,122,4,30.00,34622.40,0.02,0.07,'A','F','1994-11-04','1994-11-06','1994-11-25','TAKE BACK RETURN','SHIP','ding excuses. regular reque'
+3999975,11692,6695,1,42.00,67354.98,0.06,0.08,'N','O','1996-06-29','1996-07-23','1996-07-12','COLLECT COD','SHIP','carefully ironic deposits sl'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_partkey < 25 and l_partkey > 23
+---- RESULTS
+1096387,24,7525,3,15.00,13860.30,0.00,0.04,'A','F','1995-02-06','1995-02-19','1995-02-28','NONE','RAIL','r the furiously ironic packages; spec'
+1128579,24,2525,2,23.00,21252.46,0.00,0.02,'R','F','1994-07-03','1994-06-01','1994-07-08','TAKE BACK RETURN','TRUCK','lly ironic packages detect'
+1150310,24,25,4,29.00,26796.58,0.01,0.07,'N','O','1996-05-03','1996-04-16','1996-05-20','COLLECT COD','MAIL','eans sublate quickly'
+3245350,24,2525,3,38.00,35112.76,0.04,0.04,'N','O','1998-05-03','1998-03-17','1998-05-23','NONE','REG AIR','e furiously unusual ac'
+3301445,24,7525,5,7.00,6468.14,0.09,0.06,'N','O','1996-06-03','1996-07-02','1996-06-18','NONE','SHIP','detect carefully ironic, ironic d'
+3331584,24,5025,4,33.00,30492.66,0.06,0.07,'N','O','1997-10-03','1997-08-23','1997-10-31','TAKE BACK RETURN','RAIL','gular pinto beans cajole! quickl'
+2372164,24,2525,1,43.00,39732.86,0.04,0.06,'N','O','1997-12-09','1997-11-27','1998-01-07','NONE','RAIL','ake ruthlessly a'
+4450017,24,2525,1,11.00,10164.22,0.03,0.00,'N','O','1996-04-16','1996-06-12','1996-05-08','COLLECT COD','SHIP','ly even requests wake blithely'
+3490022,24,7525,6,32.00,29568.64,0.03,0.08,'N','O','1996-07-24','1996-07-19','1996-07-27','COLLECT COD','MAIL','ounts. final, special theodolites'
+2474144,24,25,2,40.00,36960.80,0.00,0.08,'A','F','1994-06-26','1994-07-05','1994-07-06','NONE','FOB','ages use blithel'
+5718278,24,2525,6,6.00,5544.12,0.07,0.05,'N','O','1996-08-12','1996-08-23','1996-08-28','TAKE BACK RETURN','REG AIR','o beans sleep. slyly express packages a'
+5745697,24,7525,2,30.00,27720.60,0.02,0.05,'N','O','1998-06-09','1998-05-23','1998-06-17','TAKE BACK RETURN','SHIP','r the regular, final excuses. fluffily bol'
+4663810,24,5025,2,19.00,17556.38,0.08,0.08,'N','O','1996-05-07','1996-02-23','1996-06-01','DELIVER IN PERSON','REG AIR','al pinto beans sublate permanently '
+1502112,24,25,1,22.00,20328.44,0.10,0.07,'R','F','1992-10-29','1992-11-11','1992-11-03','DELIVER IN PERSON','FOB','ideas affix slyly. enticing pinto beans ca'
+324771,24,25,3,21.00,19404.42,0.09,0.00,'N','O','1997-03-07','1997-04-06','1997-03-20','NONE','SHIP','uriously acro'
+3580387,24,25,1,8.00,7392.16,0.09,0.03,'A','F','1993-08-16','1993-09-16','1993-08-22','COLLECT COD','MAIL','into beans eat. furious'
+442278,24,5025,2,12.00,11088.24,0.08,0.05,'N','O','1998-05-28','1998-05-15','1998-06-18','TAKE BACK RETURN','SHIP','requests sleep blithely f'
+5949380,24,25,1,4.00,3696.08,0.01,0.02,'N','O','1998-07-30','1998-06-21','1998-07-31','TAKE BACK RETURN','SHIP','ding instruct'
+2796742,24,7525,1,35.00,32340.70,0.01,0.03,'R','F','1994-04-12','1994-06-07','1994-05-10','COLLECT COD','REG AIR','en, final instructions. '
+2842339,24,7525,5,47.00,43428.94,0.00,0.08,'N','O','1997-10-22','1997-08-25','1997-11-15','NONE','AIR','ickly express excuses. fluf'
+4908453,24,7525,5,10.00,9240.20,0.10,0.06,'A','F','1994-04-04','1994-05-03','1994-04-11','NONE','REG AIR','xpress theodolit'
+3845317,24,5025,7,14.00,12936.28,0.03,0.06,'N','O','1995-11-27','1995-12-01','1995-12-20','NONE','SHIP','tly bold packages sleep against the '
+3870983,24,7525,1,50.00,46201.00,0.08,0.01,'R','F','1993-12-31','1994-01-28','1994-01-07','NONE','REG AIR','nts sublate. quic'
+5040484,24,2525,1,8.00,7392.16,0.04,0.07,'R','F','1995-01-07','1994-12-10','1995-01-26','DELIVER IN PERSON','AIR','iously. regu'
+1853765,24,7525,2,4.00,3696.08,0.00,0.03,'N','O','1996-05-22','1996-05-11','1996-05-30','COLLECT COD','RAIL','totes affix slyly. unusua'
+3135236,24,2525,1,30.00,27720.60,0.04,0.05,'A','F','1992-07-18','1992-08-18','1992-08-16','COLLECT COD','AIR','Tiresias haggle packages. d'
+3954851,24,7525,1,29.00,26796.58,0.10,0.00,'N','O','1997-03-28','1997-05-25','1997-04-26','NONE','FOB',' bold accounts according to the'
+776295,24,25,2,46.00,42504.92,0.08,0.05,'A','F','1994-05-29','1994-06-19','1994-06-04','COLLECT COD','FOB','carefully alongside of t'
+4023489,24,25,1,38.00,35112.76,0.07,0.07,'N','O','1997-07-16','1997-06-06','1997-08-10','TAKE BACK RETURN','AIR','elets. furiously '
+5161861,24,5025,3,45.00,41580.90,0.04,0.08,'N','O','1996-06-04','1996-08-04','1996-06-29','TAKE BACK RETURN','RAIL','sual ideas integrate after the fluffily '
+1996481,24,5025,5,23.00,21252.46,0.04,0.02,'A','F','1995-05-21','1995-04-05','1995-06-07','COLLECT COD','TRUCK','ual requests ar'
+5239172,24,2525,7,37.00,34188.74,0.06,0.03,'N','O','1998-02-22','1998-03-23','1998-03-24','TAKE BACK RETURN','SHIP','encies. furiously regular reques'
+857093,24,2525,2,17.00,15708.34,0.04,0.03,'A','F','1993-01-08','1993-02-04','1993-01-31','NONE','RAIL','ely bold theodolites. special packages '
+908449,24,7525,1,43.00,39732.86,0.09,0.08,'R','F','1993-12-10','1993-11-24','1994-01-06','NONE','MAIL','es affix carefully'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_extendedprice > 104000 and l_extendedprice < 104100
+---- RESULTS
+4326275,195986,8506,3,50.00,104099.00,0.06,0.05,'R','F','1994-06-12','1994-07-15','1994-06-24','NONE','FOB','thely final ideas. foxes'
+4366983,185996,3551,6,50.00,104099.50,0.01,0.05,'R','F','1993-01-16','1993-02-26','1993-01-25','COLLECT COD','RAIL','ructions. evenly bold theodolit'
+4392259,198983,8984,4,50.00,104099.00,0.03,0.01,'N','O','1998-08-13','1998-08-14','1998-09-12','TAKE BACK RETURN','FOB','ependencies wake carefully alongside of '
+1191105,185995,5996,6,50.00,104049.50,0.04,0.02,'R','F','1994-10-12','1994-11-07','1994-10-24','DELIVER IN PERSON','TRUCK','iously packages. unusual, e'
+2482885,190991,3511,2,50.00,104099.50,0.04,0.04,'N','O','1998-02-02','1997-12-15','1998-02-19','NONE','TRUCK','ainst the slyly final '
+338118,192989,2990,2,50.00,104099.00,0.10,0.08,'N','O','1996-01-28','1995-12-21','1996-02-14','NONE','REG AIR','ajole fluffily qui'
+1567104,193987,3988,4,50.00,104049.00,0.03,0.02,'R','F','1992-03-28','1992-04-17','1992-04-11','NONE','MAIL',' regular pains. furiously daring Tiresia'
+1587110,187993,3030,4,50.00,104049.50,0.08,0.05,'N','O','1996-01-27','1996-01-27','1996-02-09','NONE','TRUCK','p blithely silent dugouts. '
+3890689,184997,2552,3,50.00,104099.50,0.07,0.02,'N','O','1997-04-17','1997-03-29','1997-04-28','TAKE BACK RETURN','AIR',' carefully bold deposits nag qu'
+3912326,194986,7506,2,50.00,104049.00,0.01,0.01,'N','F','1995-06-16','1995-05-03','1995-06-25','DELIVER IN PERSON','SHIP','furiously '
+699526,198982,6540,4,50.00,104049.00,0.01,0.05,'N','O','1998-07-26','1998-07-04','1998-08-06','NONE','AIR',' excuses nag blithely against th'
+1756229,197983,7984,3,50.00,104049.00,0.08,0.02,'N','O','1998-06-10','1998-05-03','1998-07-10','DELIVER IN PERSON','AIR','tes alongside of'
+5926503,192989,547,6,50.00,104099.00,0.05,0.00,'N','O','1995-08-24','1995-09-09','1995-08-29','NONE','FOB','jole enticingly careful'
+5014279,194987,2545,3,50.00,104099.00,0.04,0.03,'A','F','1994-02-24','1994-04-17','1994-03-09','TAKE BACK RETURN','TRUCK','its. even, p'
+3997415,187993,3030,1,50.00,104049.50,0.09,0.05,'N','O','1997-03-23','1997-03-09','1997-04-01','NONE','AIR','nal ideas ac'
+910432,182998,8035,3,50.00,104049.50,0.05,0.05,'N','F','1995-06-07','1995-06-16','1995-07-03','NONE','AIR','ross the carefully pending foxes. blithely '
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_comment > 'zzle' and l_quantity < 5
+---- RESULTS
+15046,37324,9828,2,3.00,3783.96,0.03,0.07,'N','O','1997-12-27','1997-11-15','1998-01-19','NONE','MAIL','zzle furiously iron'
+3313831,33790,8797,1,2.00,3447.58,0.02,0.03,'N','O','1997-08-18','1997-10-08','1997-09-03','DELIVER IN PERSON','SHIP','zzle fluffily regular din'
+3345767,116221,3755,1,4.00,4948.88,0.02,0.08,'A','F','1992-07-11','1992-06-19','1992-08-01','DELIVER IN PERSON','REG AIR','zzle furiously pending requests. slyly fina'
+4396290,149387,4416,6,4.00,5745.52,0.04,0.07,'A','F','1993-01-19','1992-12-08','1993-01-26','DELIVER IN PERSON','REG AIR','zzle slyly slyly final excuses. expres'
+4434754,179382,9383,5,3.00,4384.14,0.05,0.07,'A','F','1995-03-02','1995-02-22','1995-03-21','COLLECT COD','MAIL','zzle quickly above the e'
+3384549,33300,8307,6,2.00,2466.60,0.01,0.04,'N','O','1996-12-12','1996-10-08','1996-12-15','TAKE BACK RETURN','TRUCK','zzle enticingly ruthless accounts. pending,'
+3418016,43648,1161,5,4.00,6366.56,0.09,0.04,'N','O','1996-10-08','1996-09-08','1996-11-07','TAKE BACK RETURN','TRUCK','zzle against the theodo'
+222145,56727,4243,1,2.00,3367.44,0.00,0.06,'A','F','1994-09-06','1994-10-10','1994-09-23','TAKE BACK RETURN','MAIL','zzle slyly even ideas. unus'
+3480997,155559,590,2,4.00,6458.20,0.02,0.00,'A','F','1992-05-25','1992-06-26','1992-06-10','DELIVER IN PERSON','RAIL','zzle quickly even requests. ironic ins'
+3492800,143471,8500,5,4.00,6057.88,0.10,0.08,'N','O','1995-06-23','1995-05-30','1995-07-12','NONE','FOB','zzle carefully at t'
+4473636,4016,4017,2,3.00,2760.03,0.00,0.03,'A','F','1994-06-18','1994-07-29','1994-07-01','COLLECT COD','RAIL','zzle blithely. silently bold exc'
+357157,95395,7905,5,4.00,5561.56,0.08,0.07,'N','O','1998-08-13','1998-10-14','1998-08-19','DELIVER IN PERSON','AIR','zzle evenly re'
+3625573,188624,3661,1,1.00,1712.62,0.03,0.02,'A','F','1994-03-29','1994-01-07','1994-04-07','DELIVER IN PERSON','MAIL','zzle after the regular foxes; final'
+2356514,153502,6018,2,3.00,4666.50,0.07,0.05,'N','O','1998-07-13','1998-07-26','1998-07-27','NONE','TRUCK','zzle quickly. requests caj'
+2407269,98969,8970,1,4.00,7871.84,0.00,0.05,'A','F','1993-03-11','1993-01-24','1993-04-06','DELIVER IN PERSON','AIR','zzle. regular, special theodolites wake. bl'
+5611046,158172,3203,6,3.00,3690.51,0.03,0.02,'A','F','1992-07-09','1992-05-10','1992-08-01','TAKE BACK RETURN','TRUCK','zzle after the slyly express req'
+5637252,98651,3670,3,2.00,3299.30,0.01,0.07,'A','F','1992-09-25','1992-07-19','1992-10-02','DELIVER IN PERSON','REG AIR','zzle quickly slyly ironic'
+3649697,170341,2859,5,4.00,5645.36,0.01,0.02,'A','F','1992-05-24','1992-05-17','1992-06-03','COLLECT COD','AIR','zzle. foxes cajole quickly according'
+453126,89720,4737,2,4.00,6838.88,0.04,0.00,'R','F','1995-04-11','1995-05-26','1995-05-11','DELIVER IN PERSON','REG AIR','zzle according to the blithely ev'
+3732899,122611,148,2,2.00,3267.22,0.10,0.01,'R','F','1994-07-07','1994-08-25','1994-07-17','DELIVER IN PERSON','MAIL','zzle slyly bol'
+4863458,77848,7849,2,1.00,1825.84,0.10,0.02,'N','O','1996-12-03','1996-11-11','1996-12-07','DELIVER IN PERSON','RAIL','zzle blithely. ruthlessly b'
+4887207,108221,8222,2,1.00,1229.22,0.09,0.00,'R','F','1995-02-27','1995-04-08','1995-03-06','COLLECT COD','MAIL','zzle silent pinto bea'
+4898439,149853,4882,5,4.00,7611.40,0.07,0.08,'N','O','1998-04-30','1998-05-20','1998-05-08','COLLECT COD','FOB','zzle slyly after the carefully'
+1764517,192822,380,4,2.00,3829.64,0.09,0.06,'A','F','1994-09-03','1994-07-26','1994-10-03','NONE','FOB','zzle about the ideas. special packages'
+4038850,191403,6442,3,3.00,4483.20,0.09,0.01,'A','F','1993-05-30','1993-05-09','1993-06-18','TAKE BACK RETURN','MAIL','zzle slyly about the'
+5791713,8767,1268,4,3.00,5027.28,0.01,0.00,'N','O','1997-06-25','1997-07-30','1997-07-05','TAKE BACK RETURN','RAIL','zzle furiously above the ironic, re'
+2689120,87143,2160,3,1.00,1130.14,0.05,0.06,'A','F','1994-12-10','1994-10-25','1994-12-19','COLLECT COD','AIR','zzle above the even foxe'
+5908418,155066,2612,2,1.00,1121.06,0.01,0.05,'N','O','1997-03-20','1997-02-04','1997-04-09','COLLECT COD','TRUCK','zzle express ideas. slyly i'
+876549,96206,6207,6,2.00,2404.40,0.01,0.05,'R','F','1994-01-15','1994-02-14','1994-01-16','NONE','REG AIR','zzle slyly blithely express epitaphs. req'
+898788,62837,2838,2,2.00,3599.66,0.08,0.01,'N','O','1998-07-26','1998-05-29','1998-08-11','NONE','REG AIR','zzle blithely regular foxes. sp'
+903459,182031,4550,1,3.00,3339.09,0.02,0.04,'R','F','1995-04-18','1995-05-20','1995-05-08','NONE','REG AIR','zzle blithely bravely unusual instr'
+928898,106264,8775,6,1.00,1270.26,0.10,0.00,'R','F','1994-10-18','1994-10-03','1994-11-12','DELIVER IN PERSON','RAIL','zzle fluffily furiou'
+4193893,83860,8877,2,2.00,3687.72,0.06,0.06,'R','F','1993-09-30','1993-11-06','1993-10-11','COLLECT COD','FOB','zzle carefully'
+5937603,197827,7828,3,2.00,3849.64,0.03,0.00,'A','F','1994-07-01','1994-07-30','1994-07-31','TAKE BACK RETURN','RAIL','zzle quickly regular foxes. silent pains '
+2776164,19867,2369,1,4.00,7147.44,0.03,0.03,'A','F','1993-01-16','1992-12-01','1993-01-21','DELIVER IN PERSON','FOB','zzle after the accounts. blithely bold'
+2860996,15745,5746,1,3.00,4982.22,0.02,0.07,'N','O','1996-05-02','1996-03-29','1996-05-16','TAKE BACK RETURN','RAIL','zzle blithely regular reque'
+4211265,54718,4719,7,4.00,6690.84,0.06,0.05,'R','F','1993-08-16','1993-09-16','1993-09-05','NONE','MAIL','zzle carefully alongside of the pendin'
+1013090,96470,6471,1,2.00,2932.94,0.08,0.04,'N','O','1997-08-14','1997-09-08','1997-08-25','TAKE BACK RETURN','FOB','zzle slyly alongside '
+1936391,87326,2343,2,2.00,2626.64,0.04,0.04,'A','F','1993-09-12','1993-08-20','1993-09-14','NONE','AIR','zzle carefully against the '
+5179426,158940,1456,5,2.00,3997.88,0.10,0.05,'A','F','1992-12-23','1992-11-12','1992-12-29','COLLECT COD','MAIL','zzle blithely requests. regular, regular'
+5190949,30001,5008,4,4.00,3724.00,0.07,0.04,'R','F','1992-06-20','1992-06-07','1992-06-28','TAKE BACK RETURN','AIR','zzle along the even, pending '
+2014983,32060,9570,1,2.00,1984.12,0.04,0.04,'N','O','1998-07-16','1998-09-02','1998-08-06','NONE','REG AIR','zzle instructions. carefully regular '
+5330853,22283,7288,1,4.00,4821.12,0.09,0.00,'R','F','1992-08-03','1992-08-21','1992-08-13','DELIVER IN PERSON','RAIL','zzle fluffily a'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_extendedprice < 910
+---- RESULTS
+5400866,7001,9502,1,1.00,908.00,0.10,0.03,'N','O','1998-01-24','1998-03-26','1998-02-01','COLLECT COD','TRUCK',' accounts. slyly special platelets nag fl'
+2243878,3004,8005,5,1.00,907.00,0.01,0.06,'N','O','1998-06-11','1998-05-14','1998-06-14','DELIVER IN PERSON','REG AIR',' slyly bold packages wake quickly regu'
+3322496,3003,5504,6,1.00,906.00,0.07,0.00,'R','F','1994-09-25','1994-08-11','1994-09-27','TAKE BACK RETURN','TRUCK','ss, regular accounts wake furiously on the '
+201607,4003,4004,6,1.00,907.00,0.02,0.02,'R','F','1993-02-11','1993-02-12','1993-03-10','COLLECT COD','AIR','y unusual packages. blithely regular dino'
+4549287,2006,4507,1,1.00,908.00,0.06,0.02,'N','O','1995-08-22','1995-10-17','1995-09-18','TAKE BACK RETURN','RAIL','e slyly fi'
+5696710,7001,9502,3,1.00,908.00,0.02,0.07,'N','O','1995-09-17','1995-09-29','1995-10-16','COLLECT COD','TRUCK','ove the slyly'
+2506752,4000,6501,4,1.00,904.00,0.10,0.02,'A','F','1992-10-06','1992-11-12','1992-10-19','DELIVER IN PERSON','REG AIR','play. ironic asymptotes '
+309573,4000,9001,2,1.00,904.00,0.09,0.04,'R','F','1993-11-05','1993-08-12','1993-11-28','TAKE BACK RETURN','MAIL','furiously pending req'
+407206,3002,8003,1,1.00,905.00,0.08,0.08,'N','O','1996-07-01','1996-05-24','1996-07-04','TAKE BACK RETURN','REG AIR','arefully even requests '
+1409604,2005,2006,3,1.00,907.00,0.10,0.05,'R','F','1992-09-13','1992-09-08','1992-09-22','COLLECT COD','FOB','ously ironic attain'
+4685351,2002,4503,2,1.00,904.00,0.10,0.03,'N','O','1996-08-30','1996-06-18','1996-09-05','COLLECT COD','AIR','ironic platelet'
+5911524,7,2508,3,1.00,907.00,0.00,0.06,'A','F','1995-01-23','1994-12-14','1995-02-18','TAKE BACK RETURN','AIR',' even asymptotes. ironic de'
+3672610,5001,7502,4,1.00,906.00,0.07,0.07,'R','F','1992-09-23','1992-10-13','1992-10-07','TAKE BACK RETURN','AIR','cajole according to the careful'
+505280,8,7509,4,1.00,908.00,0.06,0.05,'N','O','1996-01-07','1996-01-16','1996-02-06','NONE','MAIL',' regular pi'
+529024,2002,2003,4,1.00,904.00,0.09,0.02,'N','O','1998-03-29','1998-04-26','1998-04-25','COLLECT COD','AIR','press grouc'
+2888709,6,5007,3,1.00,906.00,0.10,0.06,'N','O','1996-10-21','1996-08-12','1996-11-18','TAKE BACK RETURN','RAIL','arefully even theodolites. deposits use'
+2891650,2003,4504,1,1.00,905.00,0.04,0.04,'N','O','1997-07-16','1997-09-21','1997-08-11','COLLECT COD','MAIL','e unusual foxes detect stealthy'
+1607271,8001,3002,1,1.00,909.00,0.03,0.05,'A','F','1992-02-06','1992-03-04','1992-02-10','TAKE BACK RETURN','AIR','y unusual pinto beans boost pending pint'
+599361,1,5002,7,1.00,901.00,0.05,0.01,'N','O','1998-04-28','1998-05-23','1998-05-26','DELIVER IN PERSON','AIR','lithely bold packages sleep fluffily. f'
+3889155,5001,2,4,1.00,906.00,0.05,0.07,'R','F','1992-03-09','1992-05-27','1992-03-22','COLLECT COD','FOB','thely across the final requests. quic'
+2993222,6001,6002,5,1.00,907.00,0.07,0.07,'N','O','1998-05-10','1998-03-19','1998-05-11','DELIVER IN PERSON','RAIL','ckly! slyly'
+3013575,4,5,5,1.00,904.00,0.08,0.06,'N','O','1998-05-10','1998-06-07','1998-05-24','COLLECT COD','RAIL','ress foxes. carefully special dependencies'
+1669568,2002,7003,1,1.00,904.00,0.07,0.00,'N','O','1995-08-27','1995-09-18','1995-09-17','DELIVER IN PERSON','MAIL','ously final requests breach slyly sile'
+1769571,9000,1501,3,1.00,909.00,0.05,0.02,'R','F','1992-08-08','1992-06-20','1992-08-10','DELIVER IN PERSON','AIR','ideas cajole slyly bold pain'
+1796167,1006,3507,6,1.00,907.00,0.07,0.08,'A','F','1994-09-09','1994-10-16','1994-09-23','DELIVER IN PERSON','FOB','es are fluffil'
+1803334,1004,1005,7,1.00,905.00,0.10,0.08,'A','F','1994-05-31','1994-04-19','1994-06-12','NONE','TRUCK','tithes. eve'
+5071588,3000,501,2,1.00,903.00,0.05,0.08,'N','O','1997-07-02','1997-08-26','1997-07-16','COLLECT COD','TRUCK',' blithely after the blithely regu'
+5109957,5002,5003,2,1.00,907.00,0.04,0.07,'N','O','1998-04-18','1998-04-01','1998-05-03','TAKE BACK RETURN','MAIL','equests haggle '
+856807,2005,7006,4,1.00,907.00,0.03,0.02,'N','O','1997-12-03','1997-12-26','1997-12-07','COLLECT COD','REG AIR','ironic, reg'
+943879,3002,8003,2,1.00,905.00,0.06,0.04,'N','O','1995-09-21','1995-09-05','1995-10-06','NONE','MAIL','d the dolphins. furiously final deposits '
+5143585,4,7505,4,1.00,904.00,0.07,0.02,'A','F','1992-02-05','1992-02-24','1992-02-25','TAKE BACK RETURN','FOB','y special accounts. ironic'
+5229153,3005,5506,6,1.00,908.00,0.08,0.00,'A','F','1993-11-05','1993-11-06','1993-11-09','DELIVER IN PERSON','AIR','nts. even instructions are ironi'
+5286247,5003,5004,3,1.00,908.00,0.04,0.02,'N','O','1995-10-18','1995-11-19','1995-10-25','NONE','TRUCK','y even frays. final requests wak'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem
+where l_shipdate < '1994' and l_shipdate > '1993-11-31' and
+      l_extendedprice < 988 and l_extendedprice > 986 and l_partkey < 88
+---- RESULTS
+5347525,86,5087,3,1.00,986.08,0.08,0.07,'R','F','1993-12-14','1994-02-07','1994-01-01','TAKE BACK RETURN','TRUCK','nments haggle carefully quickly spe'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_receiptdate = '1998-12-30'
+---- RESULTS
+2357827,110874,8408,2,12.00,22618.44,0.08,0.05,'N','O','1998-12-01','1998-10-22','1998-12-30','COLLECT COD','RAIL','. fluffily quiet theodolites above the exp'
+3797283,3820,6321,1,43.00,74124.26,0.07,0.08,'N','O','1998-11-30','1998-10-07','1998-12-30','NONE','FOB','ake carefully. quick'
+2949666,73956,6464,2,25.00,48248.75,0.03,0.08,'N','O','1998-12-01','1998-10-30','1998-12-30','NONE','MAIL','. blithely ironic'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem where l_commitdate = '1992-01-31'
+---- RESULTS
+4396228,184608,7127,1,24.00,40622.40,0.10,0.05,'A','F','1992-04-08','1992-01-31','1992-04-19','COLLECT COD','MAIL','st the slyly final platelets. regular de'
+1203046,182171,4690,3,36.00,45114.12,0.01,0.02,'R','F','1992-04-08','1992-01-31','1992-05-02','DELIVER IN PERSON','REG AIR','ead of the final p'
+287971,100055,56,1,34.00,35871.70,0.08,0.08,'A','F','1992-04-24','1992-01-31','1992-05-14','NONE','RAIL',' furiously ironic, final dependencies. sl'
+2153317,126942,1967,2,21.00,41347.74,0.01,0.01,'R','F','1992-04-17','1992-01-31','1992-05-05','TAKE BACK RETURN','AIR','ly pending packages. quickly unusual pa'
+5364544,188754,3791,2,37.00,68181.75,0.07,0.02,'A','F','1992-03-11','1992-01-31','1992-03-12','TAKE BACK RETURN','REG AIR','de of the ironic depe'
+5497253,73137,659,3,35.00,38854.55,0.06,0.08,'A','F','1992-02-20','1992-01-31','1992-03-04','TAKE BACK RETURN','RAIL',' final foxes are fluff'
+1249954,105591,612,4,40.00,63863.60,0.01,0.06,'R','F','1992-02-19','1992-01-31','1992-03-09','DELIVER IN PERSON','SHIP','ts boost furiously. ironic instr'
+1304070,45028,7533,1,7.00,6811.14,0.02,0.02,'R','F','1992-04-25','1992-01-31','1992-04-28','DELIVER IN PERSON','AIR','ual ideas. slyly final instr'
+3557312,41542,9055,3,44.00,65275.76,0.06,0.00,'A','F','1992-02-29','1992-01-31','1992-03-09','COLLECT COD','TRUCK','osits. fluffily express account'
+343335,156215,1246,6,3.00,3813.63,0.02,0.02,'R','F','1992-03-16','1992-01-31','1992-03-26','COLLECT COD','RAIL','kages wake slyly expre'
+433446,135091,7605,4,6.00,6756.54,0.06,0.01,'A','F','1992-02-08','1992-01-31','1992-02-16','NONE','SHIP','e ironic deposits wake f'
+2335747,99615,9616,6,13.00,20989.93,0.10,0.01,'A','F','1992-04-07','1992-01-31','1992-04-23','DELIVER IN PERSON','AIR','al ideas must have to boost furiously thin'
+470693,45925,3438,4,42.00,78578.64,0.03,0.06,'R','F','1992-03-19','1992-01-31','1992-04-02','DELIVER IN PERSON','AIR','s wake slyly final waters. '
+3741124,89320,9321,3,31.00,40588.92,0.01,0.04,'R','F','1992-03-26','1992-01-31','1992-04-15','DELIVER IN PERSON','RAIL','. unusual, silent sentiments cajole qui'
+571586,76488,1503,2,21.00,30754.08,0.08,0.00,'R','F','1992-02-28','1992-01-31','1992-03-26','COLLECT COD','TRUCK','ongside of the '
+5670564,56165,6166,4,49.00,54936.84,0.04,0.05,'R','F','1992-03-15','1992-01-31','1992-03-30','TAKE BACK RETURN','MAIL','efully pending deposits. furiously '
+2559431,191885,4405,5,36.00,71167.68,0.04,0.05,'A','F','1992-04-19','1992-01-31','1992-05-14','DELIVER IN PERSON','AIR','requests sleep carefully'
+2559431,7343,2344,6,33.00,41261.22,0.02,0.04,'R','F','1992-01-05','1992-01-31','1992-01-31','DELIVER IN PERSON','FOB','s nag quick'
+1545379,24148,1655,1,3.00,3216.42,0.03,0.00,'A','F','1992-03-08','1992-01-31','1992-03-15','TAKE BACK RETURN','REG AIR',' wake across the requests. carefully pend'
+1545379,52219,4725,2,8.00,9369.68,0.00,0.01,'A','F','1992-04-28','1992-01-31','1992-04-30','COLLECT COD','SHIP','riously final foxes. blithely final '
+1586021,9636,9637,1,9.00,13910.67,0.02,0.00,'A','F','1992-02-10','1992-01-31','1992-02-12','TAKE BACK RETURN','TRUCK','pearls cajole ac'
+4812032,161931,4448,5,22.00,43844.46,0.05,0.00,'R','F','1992-04-18','1992-01-31','1992-05-08','NONE','FOB','ctions cajole about '
+575495,94198,9217,4,14.00,16690.66,0.00,0.04,'R','F','1992-04-09','1992-01-31','1992-04-13','COLLECT COD','RAIL','tructions. unus'
+3804260,94311,9330,3,44.00,57433.64,0.09,0.05,'R','F','1992-03-29','1992-01-31','1992-04-03','DELIVER IN PERSON','AIR',' forges. slyly final instructions hagg'
+2653089,50040,2546,3,46.00,45541.84,0.01,0.07,'A','F','1992-01-06','1992-01-31','1992-02-05','TAKE BACK RETURN','MAIL',' asymptotes.'
+5854464,74338,6846,5,31.00,40682.23,0.05,0.04,'R','F','1992-01-13','1992-01-31','1992-02-03','DELIVER IN PERSON','TRUCK','theodolites sl'
+4859008,24294,9299,5,2.00,2436.58,0.02,0.08,'R','F','1992-03-20','1992-01-31','1992-04-05','DELIVER IN PERSON','TRUCK','d deposits cajole slyly accounts. ex'
+717701,153080,626,1,4.00,4532.32,0.09,0.03,'R','F','1992-01-13','1992-01-31','1992-01-24','TAKE BACK RETURN','AIR','ithely special theodolite'
+3958784,44413,6918,2,31.00,42079.71,0.06,0.04,'A','F','1992-01-13','1992-01-31','1992-02-12','TAKE BACK RETURN','REG AIR','sly above the blit'
+4010469,55374,5375,6,22.00,29246.14,0.06,0.05,'R','F','1992-03-31','1992-01-31','1992-04-30','DELIVER IN PERSON','AIR','oys. carefully final pinto beans are s'
+3001924,110252,7786,3,28.00,35343.00,0.06,0.04,'A','F','1992-03-05','1992-01-31','1992-03-23','DELIVER IN PERSON','SHIP','e quickly even depos'
+3142627,191818,1819,1,48.00,91670.88,0.00,0.03,'R','F','1992-01-15','1992-01-31','1992-02-13','COLLECT COD','FOB','final pinto beans. furiously unusu'
+874625,166022,1055,3,11.00,11968.22,0.03,0.04,'A','F','1992-04-24','1992-01-31','1992-04-26','COLLECT COD','TRUCK',' players along the carefully bold package'
+4227367,66843,9350,3,48.00,86872.32,0.06,0.02,'R','F','1992-02-02','1992-01-31','1992-02-13','TAKE BACK RETURN','RAIL','aggle blithely according to the regular '
+1049856,83433,8450,4,45.00,63739.35,0.04,0.02,'R','F','1992-04-26','1992-01-31','1992-05-25','COLLECT COD','FOB','uriously regular deposits cajole quick'
+4262469,73143,665,2,43.00,47994.02,0.10,0.04,'R','F','1992-01-25','1992-01-31','1992-02-10','TAKE BACK RETURN','AIR','nic pinto beans. requests across the fluf'
+2069187,112268,2269,2,39.00,49930.14,0.01,0.03,'R','F','1992-02-08','1992-01-31','1992-02-12','DELIVER IN PERSON','MAIL','blithely? blithely pending ideas use. pen'
+2138215,151143,3659,1,48.00,57318.72,0.02,0.01,'A','F','1992-04-04','1992-01-31','1992-05-02','NONE','FOB','kly according to the thin i'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+select * from tpch_parquet.lineitem
+where l_commitdate = '1992-01-31' and l_orderkey > 5000000
+---- RESULTS
+5364544,188754,3791,2,37.00,68181.75,0.07,0.02,'A','F','1992-03-11','1992-01-31','1992-03-12','TAKE BACK RETURN','REG AIR','de of the ironic depe'
+5497253,73137,659,3,35.00,38854.55,0.06,0.08,'A','F','1992-02-20','1992-01-31','1992-03-04','TAKE BACK RETURN','RAIL',' final foxes are fluff'
+5670564,56165,6166,4,49.00,54936.84,0.04,0.05,'R','F','1992-03-15','1992-01-31','1992-03-30','TAKE BACK RETURN','MAIL','efully pending deposits. furiously '
+5854464,74338,6846,5,31.00,40682.23,0.05,0.04,'R','F','1992-01-13','1992-01-31','1992-02-03','DELIVER IN PERSON','TRUCK','theodolites sl'
+---- TYPES
+BIGINT, BIGINT, BIGINT, INT, DECIMAL, DECIMAL, DECIMAL, DECIMAL, STRING, STRING, STRING, STRING, STRING, STRING, STRING, STRING
+====
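
The header comment of parquet-page-index-large.test above says that predicates on
different columns are joined by AND so that page filtering can combine them. Below is
a rough Python sketch of that idea, not Impala's actual C++ implementation; the page
layouts, row counts, column names and predicates are made up for illustration. Each
column's page index (per-page min, max, first row index) yields candidate row ranges,
and the ranges of AND'ed predicates are intersected, so together they prune more rows
than either predicate alone.

    # Illustrative sketch only: assumed page-index entries are (min, max, first_row).
    def candidate_ranges(pages, num_rows, pred):
        """Return [start_row, end_row) ranges of pages that may satisfy 'pred'."""
        ranges = []
        for i, (lo, hi, first_row) in enumerate(pages):
            last_row = pages[i + 1][2] if i + 1 < len(pages) else num_rows
            if pred(lo, hi):  # page cannot be ruled out by its min/max stats
                ranges.append((first_row, last_row))
        return ranges

    def intersect(a, b):
        """Intersect two sorted lists of half-open row ranges."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            lo, hi = max(a[i][0], b[j][0]), min(a[i][1], b[j][1])
            if lo < hi:
                out.append((lo, hi))
            if a[i][1] <= b[j][1]:
                i += 1
            else:
                j += 1
        return out

    # One row group of 20 rows; two columns with differently sized pages.
    orderkey_pages = [(1, 30, 0), (31, 60, 10)]                        # 10-row pages
    partkey_pages  = [(5, 9, 0), (10, 14, 5), (15, 19, 10), (20, 24, 15)]  # 5-row pages

    r1 = candidate_ranges(orderkey_pages, 20, lambda lo, hi: hi > 35)  # orderkey > 35
    r2 = candidate_ranges(partkey_pages, 20, lambda lo, hi: lo < 18)   # partkey < 18
    print(intersect(r1, r2))   # [(10, 15)] -> only rows 10..14 need to be scanned

In this toy example only rows [10, 15) of the row group survive both predicates,
which mirrors how the AND'ed predicates in the queries above let most pages be
skipped entirely.
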
diff --git a/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
new file mode 100644
index 0000000..47c4eaa
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/parquet-page-index.test
@@ -0,0 +1,219 @@
+# These tests check that the page selection and value-skipping logic works well when using
+# the page index of the Parquet file. 'decimals_1_10' contains tiny, misaligned pages and
+# some NULL values. Column 'd_10' has one value per page, while column 'd_1' has five
+# values per page. Thus, by putting predicates on column 'd_10' we can craft different
+# test cases for value skipping in 'd_1'.
+====
+---- QUERY
+# 'd_10 = 1' selects the first row from each page. Therefore in the pages of 'd_1' we
+# read the first value, then skip all the rest.
+select * from decimals_1_10 where d_10 = 1
+---- RESULTS
+1,1
+NULL,1
+1,1
+1,1
+1,1
+1,1
+NULL,1
+1,1
+1,1
+NULL,1
+1,1
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 50
+====
+---- QUERY
+# Selecting the second rows of the pages of 'd_1', skipping values before and after.
+select * from decimals_1_10 where d_10 = 2
+---- RESULTS
+2,2
+2,2
+NULL,2
+2,2
+2,2
+2,2
+NULL,2
+NULL,2
+2,2
+2,2
+2,2
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 50
+====
+---- QUERY
+# Selecting the third rows of the pages of 'd_1', skipping values before and after.
+select * from decimals_1_10 where d_10 = 3
+---- RESULTS
+3,3
+3,3
+3,3
+NULL,3
+3,3
+3,3
+3,3
+3,3
+3,3
+NULL,3
+3,3
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 50
+====
+---- QUERY
+# Selecting the fourth rows of the pages of 'd_1', skipping values before and after.
+select * from decimals_1_10 where d_10 = 4
+---- RESULTS
+4,4
+4,4
+4,4
+4,4
+NULL,4
+4,4
+4,4
+NULL,4
+NULL,4
+NULL,4
+4,4
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 50
+====
+---- QUERY
+# 'd_10 = 5' selects the last row from each page. Therefore in the pages of 'd_1' we
+# skip the first four values, then read the last.
+select * from decimals_1_10 where d_10 = 5
+---- RESULTS
+5,5
+5,5
+5,5
+5,5
+5,5
+NULL,5
+5,5
+5,5
+NULL,5
+5,5
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 52
+====
+---- QUERY
+# Selecting the first couple of rows from each page of 'd_1', skipping the last rows.
+select * from decimals_1_10 where d_10 < 3
+---- RESULTS
+1,1
+2,2
+NULL,1
+2,2
+1,1
+NULL,2
+1,1
+2,2
+1,1
+2,2
+1,1
+2,2
+NULL,1
+NULL,2
+1,1
+NULL,2
+1,1
+2,2
+NULL,1
+2,2
+1,1
+2,2
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 39
+====
+---- QUERY
+# Selecting the last couple of rows from each page of 'd_1', skipping the first rows.
+select * from decimals_1_10 where d_10 > 2
+---- RESULTS
+3,3
+4,4
+5,5
+3,3
+4,4
+5,5
+3,3
+4,4
+5,5
+NULL,3
+4,4
+5,5
+3,3
+NULL,4
+5,5
+3,3
+4,4
+NULL,5
+3,3
+4,4
+5,5
+3,3
+NULL,4
+5,5
+3,3
+NULL,4
+NULL,5
+NULL,3
+NULL,4
+5,5
+7,7
+8,8
+9,9
+8,8
+7,7
+3,3
+4,4
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 23
+====
+---- QUERY
+# Skipping the middle row in a page.
+select * from decimals_1_10 where d_10 > 5 and d_10 < 9
+---- RESULTS
+7,7
+8,8
+8,8
+7,7
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 67
+====
+---- QUERY
+# Only reading the middle rows of a page.
+select * from decimals_1_10 where d_10 > 7
+---- RESULTS
+8,8
+9,9
+8,8
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredPages): 68
+====
+---- QUERY
+# The row-group-level minimum is 1 and the maximum is 9, but there is a gap between the
+# pages, so with page-level statistics we can filter out the whole row group.
+select * from decimals_1_10 where d_10 = 6
+---- RESULTS
+---- TYPES
+DECIMAL, DECIMAL
+---- RUNTIME_PROFILE
+aggregation(SUM, NumStatsFilteredRowGroups): 1
+====
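The queries above rely on the per-page min/max statistics of the Parquet page index (the
ColumnIndex) to decide which pages of a column can possibly match a predicate. Below is a
minimal Python sketch of that page-selection idea, not Impala's C++ implementation; the
helper name and the page statistics are invented for illustration and simply mirror a
layout where each page of 'd_10' holds a single value and the value 6 is missing.

# Minimal sketch (not Impala's implementation) of page selection from per-page
# min/max statistics; the statistics below are hypothetical.
def candidate_pages(page_mins, page_maxs, value):
    """Return indices of pages whose [min, max] range may contain 'value'."""
    return [i for i, (lo, hi) in enumerate(zip(page_mins, page_maxs))
            if lo <= value <= hi]

d_10_mins = [1, 2, 3, 4, 5, 7, 8, 9]
d_10_maxs = [1, 2, 3, 4, 5, 7, 8, 9]

# 'd_10 = 3' keeps only the single page whose range can contain 3; the rest are skipped.
assert candidate_pages(d_10_mins, d_10_maxs, 3) == [2]

# 'd_10 = 6' falls into the gap between pages, so no page survives and the whole
# row group can be skipped, as in the last query of the test file above.
assert candidate_pages(d_10_mins, d_10_maxs, 6) == []

Surviving pages are then translated into row ranges so that the corresponding values can
be skipped in the other columns as well (e.g. in 'd_1' above); the NumStatsFilteredPages
counter in the runtime profiles counts the pages skipped this way.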
diff --git a/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test b/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
index c784899..1ae0ea7 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/stats-extrapolation.test
@@ -34,16 +34,16 @@ YEAR, MONTH, #ROWS, EXTRAP #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION,
 ---- RESULTS
 '2009','1',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=1'
 '2009','2',-1,289,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=2'
-'2009','3',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=3'
+'2009','3',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=3'
 '2009','4',-1,302,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=4'
-'2009','5',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=5'
+'2009','5',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=5'
 '2009','6',-1,302,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=6'
-'2009','7',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=7'
-'2009','8',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=8'
+'2009','7',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=7'
+'2009','8',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=8'
 '2009','9',-1,302,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=9'
-'2009','10',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=10'
+'2009','10',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=10'
 '2009','11',-1,302,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=11'
-'2009','12',-1,308,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=12'
+'2009','12',-1,307,1,regex:.*B,'NOT CACHED','NOT CACHED','PARQUET','false','$NAMENODE/test-warehouse/$DATABASE.db/alltypes/year=2009/month=12'
 'Total','',3650,3650,12,regex:.*B,'0B','','','',''
 ---- TYPES
 STRING,STRING,BIGINT,BIGINT,BIGINT,STRING,STRING,STRING,STRING,STRING,STRING
diff --git a/tests/query_test/test_parquet_stats.py b/tests/query_test/test_parquet_stats.py
index cb35653..a64cb88 100644
--- a/tests/query_test/test_parquet_stats.py
+++ b/tests/query_test/test_parquet_stats.py
@@ -73,3 +73,27 @@ class TestParquetStats(ImpalaTestSuite):
     NaNs, therefore we need to ignore them"""
     create_table_from_parquet(self.client, unique_database, 'min_max_is_nan')
     self.run_test_case('QueryTest/parquet-invalid-minmax-stats', vector, unique_database)
+
+  def test_page_index(self, vector, unique_database):
+    """Test that using the Parquet page index works well. The various test files
+    contain queries that exercise the page selection and value-skipping logic against
+    columns with different types and encodings."""
+    create_table_from_parquet(self.client, unique_database, 'decimals_1_10')
+    create_table_from_parquet(self.client, unique_database, 'nested_decimals')
+    create_table_from_parquet(self.client, unique_database, 'double_nested_decimals')
+    create_table_from_parquet(self.client, unique_database, 'alltypes_tiny_pages')
+    create_table_from_parquet(self.client, unique_database, 'alltypes_tiny_pages_plain')
+
+    for batch_size in [0, 1]:
+      vector.get_value('exec_option')['batch_size'] = batch_size
+      self.run_test_case('QueryTest/parquet-page-index', vector, unique_database)
+      self.run_test_case('QueryTest/nested-types-parquet-page-index', vector,
+                         unique_database)
+      self.run_test_case('QueryTest/parquet-page-index-alltypes-tiny-pages', vector,
+                         unique_database)
+      self.run_test_case('QueryTest/parquet-page-index-alltypes-tiny-pages-plain', vector,
+                         unique_database)
+
+    for batch_size in [0, 32]:
+      vector.get_value('exec_option')['batch_size'] = batch_size
+      self.run_test_case('QueryTest/parquet-page-index-large', vector, unique_database)


[impala] 01/02: Drop statestore update frequency during data loading

Posted by bo...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

boroknagyz pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 9075099c27f68e4a9fd35c6db76d36dae3301643
Author: Todd Lipcon <to...@apache.org>
AuthorDate: Tue May 7 00:33:21 2019 -0700

    Drop statestore update frequency during data loading
    
    The statestore update frequency is the limiting factor in most DDL
    statements. This improved the speed of an incremental data load of the
    functional dataset by 5-10x or so on my machine in the case where data
    had previously been loaded.
    
    Change-Id: I8931a88aa04e0b4e8ef26a92bfe50a539a3c2505
    Reviewed-on: http://gerrit.cloudera.org:8080/13260
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
 testdata/bin/create-load-data.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index 9796d66..c2122d0 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -142,7 +142,8 @@ echo "REMOTE_LOAD=${REMOTE_LOAD:-}"
 
 function start-impala {
   : ${START_CLUSTER_ARGS=""}
-  START_CLUSTER_ARGS_INT=""
+  # Use a short statestore update interval so that DDL operations run faster.
+  START_CLUSTER_ARGS_INT="--state_store_args=--statestore_update_frequency_ms=50"
   if [[ "${TARGET_FILESYSTEM}" == "local" ]]; then
     START_CLUSTER_ARGS_INT+=("--impalad_args=--abort_on_config_error=false -s 1")
   else