Posted to commits@impala.apache.org by ar...@apache.org on 2018/07/18 21:35:17 UTC
impala git commit: IMPALA-7304: Don't write floating column index until PARQUET-1222 is resolved.

Repository: impala
Updated Branches:
  refs/heads/2.x 329979d6f -> 07c704aef


IMPALA-7304: Don't write floating column index until PARQUET-1222 is resolved.

The Impala master branch can already write the Parquet
page index. However, we still don't have a well-defined
ordering for floating-point numbers in Parquet; see
PARQUET-1222.

Currently Impala writes the page index with
fmax()/fmin() semantics, but this might contradict the
future semantics that will be defined once PARQUET-1222
is resolved.
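
As a rough illustration of why fmax()/fmin() semantics leave the ordering
underspecified, here is a minimal Python sketch. It is not Impala code: the
helpers fmin, fmax and page_min_max are hypothetical and only mimic the C
library rule that a NaN operand is ignored when the other operand is a number.

```python
import math


def fmin(a, b):
    """Mimic C fmin(): if one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


def fmax(a, b):
    """Mimic C fmax(): if one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b


def page_min_max(values):
    """Fold fmin/fmax over the values of one (hypothetical) page."""
    lo = hi = values[0]
    for v in values[1:]:
        lo = fmin(lo, v)
        hi = fmax(hi, v)
    return lo, hi


# A page that contains a NaN: the statistics silently drop it, so a reader
# cannot tell from min/max alone whether the page holds any NaN values.
lo, hi = page_min_max([1.0, float("nan"), 3.0])

# Signed zeros: in this sketch the sign of the reported bound depends on the
# order in which the values are seen, because -0.0 == 0.0 compares equal.
zero_a = page_min_max([0.0, -0.0])
zero_b = page_min_max([-0.0, 0.0])
signs_differ = math.copysign(1.0, zero_a[0]) != math.copysign(1.0, zero_b[0])
```

A different, equally plausible ordering (for example one that sorts NaN after
every number) would produce different statistics for the same page, which is
the kind of conflict the patch avoids by not writing the index at all.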

With this patch, Impala won't write the column index
for floating-point columns until PARQUET-1222 is
resolved and implemented.
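
The shape of the change can be sketched in Python as follows. This is an
illustrative model only, not the C++ writer: the class and attribute names are
hypothetical stand-ins, and the assumption that the base-class reset re-enables
the flag is inferred from the patch setting it in both the constructor and
Reset(), not confirmed by the code shown here.

```python
FLOATING_TYPES = {"FLOAT", "DOUBLE"}


class BaseColumnWriter:
    """Hypothetical stand-in for the base writer."""

    def __init__(self, type_name):
        self.type_name = type_name
        self.valid_column_index = True

    def reset(self):
        # Assumption: resetting for a new row group re-enables the index.
        self.valid_column_index = True


class ColumnWriter(BaseColumnWriter):
    """Disables the column index for floating-point columns."""

    def __init__(self, type_name):
        super().__init__(type_name)
        self._disable_index_for_floats()

    def reset(self):
        super().reset()
        # Re-apply the restriction, since the base reset cleared it.
        self._disable_index_for_floats()

    def _disable_index_for_floats(self):
        if self.type_name in FLOATING_TYPES:
            self.valid_column_index = False


double_writer = ColumnWriter("DOUBLE")
int_writer = ColumnWriter("INT32")
double_writer.reset()
int_writer.reset()
```

Applying the check in both places keeps the flag off across resets for
floating-point columns while leaving every other column type untouched.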

I updated the Python test accordingly.

Change-Id: I50aa2e6607de6a8943eb068b8162b0506763078b
Reviewed-on: http://gerrit.cloudera.org:8080/10951
Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
(cherry picked from commit 041197444d2a73bc3e3da4c6dbfdf1d63c236fbf)
Reviewed-on: http://gerrit.cloudera.org:8080/10960
Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
Tested-by: Zoltan Borok-Nagy <bo...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/07c704ae
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/07c704ae
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/07c704ae

Branch: refs/heads/2.x
Commit: 07c704aef3e8806198334bbf2f530293d717813f
Parents: 329979d
Author: Zoltan Borok-Nagy <bo...@cloudera.com>
Authored: Mon Jul 16 14:24:45 2018 +0200
Committer: Zoltan Borok-Nagy <bo...@cloudera.com>
Committed: Wed Jul 18 10:31:47 2018 +0000

----------------------------------------------------------------------
 be/src/exec/hdfs-parquet-table-writer.cc    | 6 ++++++
 tests/query_test/test_parquet_page_index.py | 5 +++++
 2 files changed, 11 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/07c704ae/be/src/exec/hdfs-parquet-table-writer.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-parquet-table-writer.cc b/be/src/exec/hdfs-parquet-table-writer.cc
index 91a2084..8aa4f7a 100644
--- a/be/src/exec/hdfs-parquet-table-writer.cc
+++ b/be/src/exec/hdfs-parquet-table-writer.cc
@@ -338,10 +338,16 @@ class HdfsParquetTableWriter::ColumnWriter :
       plain_encoded_value_size_(
           ParquetPlainEncoder::EncodedByteSize(eval->root().type())) {
     DCHECK_NE(eval->root().type().type, TYPE_BOOLEAN);
+    // IMPALA-7304: Don't write column index for floating-point columns until
+    // PARQUET-1222 is resolved.
+    if (std::is_floating_point<T>::value) valid_column_index_ = false;
   }
 
   virtual void Reset() {
     BaseColumnWriter::Reset();
+    // IMPALA-7304: Don't write column index for floating-point columns until
+    // PARQUET-1222 is resolved.
+    if (std::is_floating_point<T>::value) valid_column_index_ = false;
     // Default to dictionary encoding.  If the cardinality ends up being too high,
     // it will fall back to plain.
     current_encoding_ = parquet::Encoding::PLAIN_DICTIONARY;

http://git-wip-us.apache.org/repos/asf/impala/blob/07c704ae/tests/query_test/test_parquet_page_index.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_parquet_page_index.py b/tests/query_test/test_parquet_page_index.py
index 0ee5d37..6235819 100644
--- a/tests/query_test/test_parquet_page_index.py
+++ b/tests/query_test/test_parquet_page_index.py
@@ -226,6 +226,11 @@ class TestHdfsParquetTableIndexWriter(ImpalaTestSuite):
           index_size = len(column_info.offset_index.page_locations)
           assert index_size > 0
           self._validate_page_locations(column_info.offset_index.page_locations)
+          # IMPALA-7304: Impala doesn't write column index for floating-point columns
+          # until PARQUET-1222 is resolved.
+          if column_info.schema.type in [4, 5]:
+            assert column_info.column_index is None
+            continue
           self._validate_null_stats(index_size, column_info)
           self._validate_min_max_values(index_size, column_info)
           self._validate_boundary_order(column_info)
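
The literals 4 and 5 in the test hunk above are the values of FLOAT and DOUBLE
in the Parquet format's physical Type enum. A small sketch of how the same
check could be spelled with names instead of raw integers is given below. The
ParquetType class and is_floating_point helper are hypothetical and are not
part of the patch or of the Impala test framework.

```python
from enum import IntEnum


class ParquetType(IntEnum):
    """Physical types as numbered in the Parquet format's Type enum."""
    BOOLEAN = 0
    INT32 = 1
    INT64 = 2
    INT96 = 3
    FLOAT = 4
    DOUBLE = 5
    BYTE_ARRAY = 6
    FIXED_LEN_BYTE_ARRAY = 7


FLOATING_POINT_TYPES = frozenset({ParquetType.FLOAT, ParquetType.DOUBLE})


def is_floating_point(parquet_type):
    """Return True for FLOAT and DOUBLE; accepts an int or a ParquetType."""
    return ParquetType(parquet_type) in FLOATING_POINT_TYPES


# The raw integers used in the test map to the same answer.
float_checks = [is_floating_point(4), is_floating_point(5)]
other_checks = [is_floating_point(t) for t in (0, 1, 2, 3, 6, 7)]
```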