Posted to commits@impala.apache.org by st...@apache.org on 2023/03/27 23:01:54 UTC

[impala] branch branch-4.1.2 created (now d6f90efd6)

This is an automated email from the ASF dual-hosted git repository.

stigahuang pushed a change to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git


      at d6f90efd6 Update version to 4.1.2-RELEASE

This branch includes the following new commits:

     new ec7a834b3 IMPALA-11355: Add STRING overloads for hour/minute/second/millisecond
     new 4f284b0f1 IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse
     new d9cbfa873 IMPALA-11753: CatalogD OOMkilled due to natively allocated memory
     new b793f4197 IMPALA-11696: Fix incorrect warnings of ignoring delimiters on text/sequence tables
     new 9bf8607ce IMPALA-11744: Table mask view should preserve the original column order in Hive
     new dff569b7e IMPALA-11779: Fix crash in TopNNode due to slots in null type
     new 4037f43f2 IMPALA-11843: Fix IndexOutOfBoundsException in analytic limit pushdown
     new 588719d32 IMPALA-11857: Connect join build fragment to join in graphical plan
     new 555370e0c IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking
     new faae4a513 IMPALA-11845: (Addendum) Don't specify db name in the new struct tests
     new d85d34fc3 IMPALA-11914: Fix broken verbose explain on MT_DOP > 0
     new c6223b2ae IMPALA-11953: Declare num_trues and num_falses in TIntermediateColumnStats as optional
     new 794eb1ba4 IMPALA-11081: Fix incorrect results in partition key scan
     new f3f0293df IMPALA-11751: Template tuple of Avro header should be transferred to ScanRangeSharedState
     new 43b01859b IMPALA-11751: (Addendum) fix test for Ozone
     new 5a96ffdb4 IMPALA-11795: Ignore high/low values stats for timestamp columns
     new d6f90efd6 Update version to 4.1.2-RELEASE

The 17 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.

[impala] 03/17: IMPALA-11753: CatalogD OOMkilled due to natively allocated memory

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d9cbfa8733f25af0f6b41f650eac8d7feb2a52e1
Author: Zoltan Borok-Nagy <bo...@cloudera.com>
AuthorDate: Mon Nov 28 18:19:10 2022 +0100

    IMPALA-11753: CatalogD OOMkilled due to natively allocated memory
    
    CatalogD can be OOMKilled due to too much natively allocated memory.
    The bug is due to a misuse of a Java compression API:
    https://bugs.openjdk.org/browse/JDK-8257032
    
    The problem is that we create our own Deflater object and pass it
    to the constructor of DeflaterOutputStream:
    https://github.com/apache/impala/blob/84fa6d210d3966e5ece8b4ac84ff8bd8780dec4e/fe/src/main/java/org/apache/impala/util/CompressionUtil.java#L47
    
    This means that Java's DeflaterOutputStream won't assume ownership of
    the Deflater, and won't invoke its end() method:
    
    * https://github.com/openjdk/jdk/blob/a249a52501f3cd7d4fbe5293d14ac8d0d6ffcc69/src/java.base/share/classes/java/util/zip/DeflaterOutputStream.java#L144
    * https://github.com/openjdk/jdk/blob/a249a52501f3cd7d4fbe5293d14ac8d0d6ffcc69/src/java.base/share/classes/java/util/zip/DeflaterOutputStream.java#L246-L247
    
    The Deflater's methods are implemented in C and allocate native memory.
    Until the GC destroys the unreachable Deflater objects, they can hold
    a large amount of native memory. In some scenarios this can even
    result in OOMKills by the kernel.
    
    The fix is to override the DeflaterOutputStream's close() method so
    it invokes end() on the Deflater object.
    
    Change-Id: I663a21f60871e32d2d0100ea03d92fd8ab460691
    Reviewed-on: http://gerrit.cloudera.org:8080/19282
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../main/java/org/apache/impala/util/CompressionUtil.java   | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/fe/src/main/java/org/apache/impala/util/CompressionUtil.java b/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
index f6e11a110..cfa39c914 100644
--- a/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
+++ b/fe/src/main/java/org/apache/impala/util/CompressionUtil.java
@@ -44,7 +44,18 @@ public class CompressionUtil {
     // Deflater with 'BEST_SPEED' level provided reasonable compression ratios at much
     // faster speeds compared to other modes like BEST_COMPRESSION/DEFAULT_COMPRESSION.
     DeflaterOutputStream stream =
-        new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_SPEED));
+        new DeflaterOutputStream(bos, new Deflater(Deflater.BEST_SPEED)) {
+          // IMPALA-11753: to avoid CatalogD OOM we invoke def.end() which frees
+          // the natively allocated memory by the Deflater. See details in Jira.
+          @Override
+          public void close() throws IOException {
+            try {
+              super.close();
+            } finally {
+              def.end();
+            }
+          }
+        };
     try {
       stream.write(input);
       stream.close();
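
The ownership rule described in the commit message above can be sketched in
isolation. This is a minimal illustration, not the Impala code: the class and
method names (DeflaterEndSketch, deflate, inflate) are hypothetical, and it
uses try-with-resources plus an explicit end() rather than the anonymous
subclass from the patch. Only java.util.zip calls from the JDK are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical class name; not part of the Impala code base.
class DeflaterEndSketch {
  // Compresses 'input' and frees the Deflater's native memory deterministically.
  static byte[] deflate(byte[] input) {
    ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
    // The caller creates the Deflater, so DeflaterOutputStream.close() will
    // not call end() on it. The caller has to do that itself.
    Deflater deflater = new Deflater(Deflater.BEST_SPEED);
    try (DeflaterOutputStream stream = new DeflaterOutputStream(bos, deflater)) {
      stream.write(input);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } finally {
      // Runs after the stream has been closed (and the output finished).
      deflater.end();
    }
    return bos.toByteArray();
  }

  // Decompresses data produced by deflate(). The stream creates its own
  // default Inflater here, so close() releases it without extra code.
  static byte[] inflate(byte[] compressed) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    try (InflaterInputStream in =
        new InflaterInputStream(new ByteArrayInputStream(compressed))) {
      byte[] buf = new byte[4096];
      int n;
      while ((n = in.read(buf)) != -1) out.write(buf, 0, n);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return out.toByteArray();
  }
}
```

Either shape (overriding close() as in the patch, or calling end() in a
finally block as here) releases the native buffers as soon as compression is
done instead of waiting for the GC.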

[impala] 02/17: IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4f284b0f15ef4db3abc24027fe3c09e6eaf870c3
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Mon Nov 7 20:44:52 2022 +0800

    IMPALA-11707: Fix global runtime IN-list filter of numeric types are AlwaysFalse
    
    Global runtime filters are published to the coordinator and then
    distributed to all executors that need them. The filter is serialized and
    deserialized using protobuf. While deserializing a global runtime filter
    of numeric type from protobuf, the InsertBatch() method did not update
    the total_entries_ counter. The filter was then treated as an empty
    list, which rejects all files/rows.
    
    This patch adds the missing update of total_entries_. Some DCHECKs are
    added to make sure total_entries_ is consistent with the actual size of
    the value set. This patch also fixes a type error (long_val -> int_val)
    in ToProtobuf() of Date type IN-list filter.
    
    Tests:
    - Added BE tests to verify the filter cloned from protobuf has the same
      behavior as the original one.
    - Added e2e regression tests
    - Run TestInListFilters 200 times.
    
    Change-Id: Ie90b2bce5e5ec6f6906ce9d2090b0ab19d48cc78
    Reviewed-on: http://gerrit.cloudera.org:8080/19220
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Qifan Chen <qf...@hotmail.com>
---
 be/src/runtime/runtime-filter-bank.h               |  20 +--
 be/src/util/in-list-filter-ir.cc                   |   6 +-
 be/src/util/in-list-filter-test.cc                 | 141 +++++++++++++++------
 be/src/util/in-list-filter.cc                      |  37 +++---
 be/src/util/in-list-filter.h                       |  25 +++-
 .../queries/QueryTest/in_list_filters.test         |  95 +++++++++++++-
 6 files changed, 248 insertions(+), 76 deletions(-)

diff --git a/be/src/runtime/runtime-filter-bank.h b/be/src/runtime/runtime-filter-bank.h
index 7a66ec6c9..f97d43dd1 100644
--- a/be/src/runtime/runtime-filter-bank.h
+++ b/be/src/runtime/runtime-filter-bank.h
@@ -74,10 +74,11 @@ struct FilterRegistration {
 ///
 /// All producers and consumers of filters must register via RegisterProducer() and
 /// RegisterConsumer(). Local plan fragments update the filters by calling
-/// UpdateFilterFromLocal(), with either a bloom filter or a min-max filter, depending
-/// on the filter's type. The 'bloom_filter' or 'min_max_filter' that is passed into
-/// UpdateFilterFromLocal() must have been allocated by AllocateScratch*Filter(); this
-/// allows RuntimeFilterBank to manage all memory associated with filters.
+/// UpdateFilterFromLocal(), with either a bloom filter, a min-max filter, or an in-list
+/// filter, depending on the filter's type. The 'bloom_filter', 'min_max_filter' or
+/// 'in_list_filter' that is passed into UpdateFilterFromLocal() must have been allocated
+/// by AllocateScratch*Filter(); this allows RuntimeFilterBank to manage all memory
+/// associated with filters.
 ///
 /// Filters are aggregated, first locally in this RuntimeFilterBank, if there are multiple
 /// producers, and then made available to consumers after PublishGlobalFilter() has been
@@ -85,10 +86,11 @@ struct FilterRegistration {
 /// of time so that RuntimeFilterBank knows when the filter is complete.
 ///
 /// After PublishGlobalFilter() has been called (at most once per filter_id), the
-/// RuntimeFilter object associated with filter_id will have a valid bloom_filter or
-/// min_max_filter, and may be used for filter evaluation. This operation occurs
-/// without synchronisation, and neither the thread that calls PublishGlobalFilter()
-/// nor the thread that may call RuntimeFilter::Eval() need to coordinate in any way.
+/// RuntimeFilter object associated with filter_id will have a valid bloom_filter,
+/// min_max_filter or in_list_filter, and may be used for filter evaluation. This
+/// operation occurs without synchronisation, and neither the thread that calls
+/// PublishGlobalFilter() nor the thread that may call RuntimeFilter::Eval() need to
+/// coordinate in any way.
 class RuntimeFilterBank {
  public:
   /// 'filters': contains an entry for every filter produced or consumed on this backend.
@@ -127,7 +129,7 @@ class RuntimeFilterBank {
   void UpdateFilterFromLocal(int32_t filter_id, BloomFilter* bloom_filter,
       MinMaxFilter* min_max_filter, InListFilter* in_list_filter);
 
-  /// Makes a bloom_filter (aggregated globally from all producer fragments) available for
+  /// Makes a filter (aggregated globally from all producer fragments) available for
   /// consumption by operators that wish to use it for filtering.
   void PublishGlobalFilter(
       const PublishFilterParamsPB& params, kudu::rpc::RpcContext* context);
diff --git a/be/src/util/in-list-filter-ir.cc b/be/src/util/in-list-filter-ir.cc
index 9e532b418..b166829e3 100644
--- a/be/src/util/in-list-filter-ir.cc
+++ b/be/src/util/in-list-filter-ir.cc
@@ -40,6 +40,7 @@ int32_t InListFilterImpl<int32_t, TYPE_DATE>::GetValue(const void* val) {
         Reset();                                                                      \
       }                                                                               \
     }                                                                                 \
+    DCHECK_EQ(total_entries_, values_.size());                                        \
   }                                                                                   \
                                                                                       \
   template<>                                                                          \
@@ -72,14 +73,15 @@ StringValue InListFilterImpl<StringValue, TYPE_CHAR>::GetValue(const void* val,
     }                                                                                   \
     StringValue s = GetValue(val, type_len_);                                           \
     if (!values_.find(s)) {                                                             \
-      bool res = newly_inserted_values_.insert(s);                                      \
-      if (res) {                                                                        \
+      const auto& res = newly_inserted_values_.insert(s);                               \
+      if (res.second) {                                                                 \
         ++total_entries_;                                                               \
         uint32_t str_total_len = values_.total_len + newly_inserted_values_.total_len;  \
         if (UNLIKELY(total_entries_ > entry_limit_                                      \
             || str_total_len >= STRING_SET_MAX_TOTAL_LENGTH)) {                         \
           Reset();                                                                      \
         }                                                                               \
+        DCHECK_EQ(total_entries_, values_.size() + newly_inserted_values_.size());      \
       }                                                                                 \
     }                                                                                   \
   }                                                                                     \
diff --git a/be/src/util/in-list-filter-test.cc b/be/src/util/in-list-filter-test.cc
index 9db92486a..e274c16d1 100644
--- a/be/src/util/in-list-filter-test.cc
+++ b/be/src/util/in-list-filter-test.cc
@@ -23,6 +23,32 @@
 
 using namespace impala;
 
+template<typename T>
+void VerifyItems(InListFilter* f, ColumnType col_type, T min_value, T max_value,
+    bool contains_null) {
+  int num_items = max_value - min_value + 1;
+  if (contains_null) ++num_items;
+  EXPECT_EQ(num_items, f->NumItems());
+  EXPECT_EQ(contains_null, f->ContainsNull());
+  for (T v = min_value; v <= max_value; ++v) {
+    EXPECT_TRUE(f->Find(&v, col_type)) << v << " not found in " << f->DebugString();
+  }
+  T i = min_value - 1;
+  EXPECT_FALSE(f->Find(&i, col_type));
+  i = max_value + 1;
+  EXPECT_FALSE(f->Find(&i, col_type));
+
+  EXPECT_FALSE(f->AlwaysFalse());
+  EXPECT_FALSE(f->AlwaysTrue());
+}
+
+InListFilter* CloneFromProtobuf(InListFilter* filter, ColumnType col_type,
+    uint32_t entry_limit, ObjectPool* pool, MemTracker* mem_tracker) {
+  InListFilterPB pb;
+  InListFilter::ToProtobuf(filter, &pb);
+  return InListFilter::Create(pb, col_type, entry_limit, pool, mem_tracker);
+}
+
 template<typename T, PrimitiveType SLOT_TYPE>
 void TestNumericInListFilter() {
   MemTracker mem_tracker;
@@ -32,35 +58,40 @@ void TestNumericInListFilter() {
   EXPECT_TRUE(filter->AlwaysFalse());
   EXPECT_FALSE(filter->AlwaysTrue());
 
+  // Insert 20 values
   for (T v = -10; v < 10; ++v) {
     filter->Insert(&v);
   }
-  // Insert duplicated values again
+  // Insert some duplicated values again
   for (T v = 9; v >= 0; --v) {
     filter->Insert(&v);
   }
-  EXPECT_EQ(20, filter->NumItems());
-  EXPECT_FALSE(filter->ContainsNull());
-  filter->Insert(nullptr);
-  EXPECT_TRUE(filter->ContainsNull());
-  EXPECT_EQ(21, filter->NumItems());
+  VerifyItems<T>(filter, col_type, -10, 9, false);
 
-  for (T v = -10; v < 10; ++v) {
-    EXPECT_TRUE(filter->Find(&v, col_type));
-  }
-  T i = -11;
-  EXPECT_FALSE(filter->Find(&i, col_type));
-  i = 10;
-  EXPECT_FALSE(filter->Find(&i, col_type));
+  // Copy the filter through protobuf for testing InsertBatch()
+  InListFilter* filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool,
+      &mem_tracker);
+  VerifyItems<T>(filter2, col_type, -10, 9, false);
 
-  EXPECT_FALSE(filter->AlwaysFalse());
-  EXPECT_FALSE(filter->AlwaysTrue());
+  // Insert NULL
+  filter->Insert(nullptr);
+  VerifyItems<T>(filter, col_type, -10, 9, true);
+
+  filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool, &mem_tracker);
+  VerifyItems<T>(filter2, col_type, -10, 9, true);
 
   // Test falling back to an always_true filter when #items exceeds the limit
-  filter->Insert(&i);
-  EXPECT_FALSE(filter->AlwaysFalse());
-  EXPECT_TRUE(filter->AlwaysTrue());
-  EXPECT_EQ(0, filter->NumItems());
+  T value = 10;
+  filter->Insert(&value);
+  filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool, &mem_tracker);
+
+  int i = 0;
+  for (InListFilter* f : {filter, filter2}) {
+    EXPECT_FALSE(f->AlwaysFalse());
+    EXPECT_TRUE(f->AlwaysTrue());
+    EXPECT_EQ(0, f->NumItems()) << "Error in filter " << i;
+    ++i;
+  }
 }
 
 TEST(InListFilterTest, TestTinyint) {
@@ -101,15 +132,23 @@ TEST(InListFilterTest, TestDate) {
   }
   EXPECT_EQ(5, filter->NumItems());
   EXPECT_FALSE(filter->ContainsNull());
+  InListFilter* filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool,
+      &mem_tracker);
+  EXPECT_EQ(5, filter2->NumItems());
+  EXPECT_FALSE(filter2->ContainsNull());
+
   filter->Insert(nullptr);
-  EXPECT_TRUE(filter->ContainsNull());
-  EXPECT_EQ(6, filter->NumItems());
+  filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool, &mem_tracker);
 
-  for (const auto& v : values) {
-    EXPECT_TRUE(filter->Find(&v, col_type));
+  for (InListFilter* f : {filter, filter2}) {
+    EXPECT_TRUE(f->ContainsNull());
+    EXPECT_EQ(6, f->NumItems());
+    for (const auto& v : values) {
+      EXPECT_TRUE(f->Find(&v, col_type));
+    }
+    DateValue d(60000);
+    EXPECT_FALSE(f->Find(&d, col_type));
   }
-  DateValue d(60000);
-  EXPECT_FALSE(filter->Find(&d, col_type));
 }
 
 void TestStringInListFilter(const ColumnType& col_type) {
@@ -135,21 +174,30 @@ void TestStringInListFilter(const ColumnType& col_type) {
     filter->Insert(&s);
   }
   filter->MaterializeValues();
-
   EXPECT_EQ(5, filter->NumItems());
   EXPECT_FALSE(filter->ContainsNull());
+
+  InListFilter* filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool,
+      &mem_tracker);
+  EXPECT_EQ(5, filter2->NumItems());
+  EXPECT_FALSE(filter2->ContainsNull());
+  filter2->Close();
+
   filter->Insert(nullptr);
-  EXPECT_TRUE(filter->ContainsNull());
-  EXPECT_EQ(6, filter->NumItems());
+  filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool, &mem_tracker);
 
   // Merge ss2 to ss1
   ss1.insert(ss1.end(), ss2.begin(), ss2.end());
-  for (const StringValue& s : ss1) {
-    EXPECT_TRUE(filter->Find(&s, col_type));
+  for (InListFilter* f : {filter, filter2}) {
+    EXPECT_TRUE(f->ContainsNull());
+    EXPECT_EQ(6, f->NumItems());
+    for (const StringValue& s : ss1) {
+      EXPECT_TRUE(f->Find(&s, col_type));
+    }
+    StringValue d("d");
+    EXPECT_FALSE(f->Find(&d, col_type));
+    f->Close();
   }
-  StringValue d("d");
-  EXPECT_FALSE(filter->Find(&d, col_type));
-  filter->Close();
 }
 
 TEST(InListFilterTest, TestString) {
@@ -188,16 +236,25 @@ TEST(InListFilterTest, TestChar) {
 
   EXPECT_EQ(5, filter->NumItems());
   EXPECT_FALSE(filter->ContainsNull());
+  InListFilter* filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool,
+      &mem_tracker);
+  EXPECT_EQ(5, filter2->NumItems());
+  EXPECT_FALSE(filter2->ContainsNull());
+  filter2->Close();
+
   filter->Insert(nullptr);
-  EXPECT_TRUE(filter->ContainsNull());
-  EXPECT_EQ(6, filter->NumItems());
+  filter2 = CloneFromProtobuf(filter, col_type, 20, &obj_pool, &mem_tracker);
+  for (InListFilter* f : {filter, filter2}) {
+    EXPECT_TRUE(f->ContainsNull());
+    EXPECT_EQ(6, f->NumItems());
 
-  ptr = str_buffer;
-  for (int i = 0; i < 5; ++i) {
-    EXPECT_TRUE(filter->Find(ptr, col_type));
-    ptr += 2;
+    ptr = str_buffer;
+    for (int i = 0; i < 5; ++i) {
+      EXPECT_TRUE(f->Find(ptr, col_type));
+      ptr += 2;
+    }
+    ptr = "gg";
+    EXPECT_FALSE(f->Find(ptr, col_type));
+    f->Close();
   }
-  ptr = "gg";
-  EXPECT_FALSE(filter->Find(ptr, col_type));
-  filter->Close();
 }
\ No newline at end of file
diff --git a/be/src/util/in-list-filter.cc b/be/src/util/in-list-filter.cc
index 8749deda2..1c1a914a0 100644
--- a/be/src/util/in-list-filter.cc
+++ b/be/src/util/in-list-filter.cc
@@ -132,10 +132,7 @@ void InListFilterImpl<StringValue, SLOT_TYPE>::MaterializeValues() {
     VLOG_QUERY << "Not enough memory in materializing string IN-list filters. "
         << "Fallback to always true. New string batch size: "
         << newly_inserted_values_.total_len << "\n" << mem_pool_.DebugString();
-    always_true_ = true;
-    values_.clear();
-    newly_inserted_values_.clear();
-    total_entries_ = 0;
+    Reset();
     return;
   }
   // Transfer values to the finial set. Don't need to update total_entries_ since it's
@@ -148,23 +145,31 @@ void InListFilterImpl<StringValue, SLOT_TYPE>::MaterializeValues() {
   newly_inserted_values_.clear();
 }
 
-#define IN_LIST_FILTER_INSERT_BATCH(TYPE, SLOT_TYPE, PB_VAL_METHOD)                      \
+#define IN_LIST_FILTER_INSERT_BATCH(TYPE, SLOT_TYPE, PB_VAL_METHOD, SET_VAR)             \
   template<>                                                                             \
   void InListFilterImpl<TYPE, SLOT_TYPE>::InsertBatch(const ColumnValueBatchPB& batch) { \
     for (const ColumnValuePB& v : batch) {                                               \
-      DCHECK(v.has_##PB_VAL_METHOD());                                                   \
-      values_.insert(v.PB_VAL_METHOD());                                                 \
+      DCHECK(v.has_##PB_VAL_METHOD()) << v.ShortDebugString();                           \
+      const auto& res = SET_VAR.insert(v.PB_VAL_METHOD());                               \
+      if (res.second) {                                                                  \
+        ++total_entries_;                                                                \
+        if (UNLIKELY(total_entries_ > entry_limit_)) {                                   \
+          Reset();                                                                       \
+          break;                                                                         \
+        }                                                                                \
+      }                                                                                  \
     }                                                                                    \
+    DCHECK_EQ(total_entries_, SET_VAR.size());                                           \
   }
 
-IN_LIST_FILTER_INSERT_BATCH(int8_t, TYPE_TINYINT, byte_val)
-IN_LIST_FILTER_INSERT_BATCH(int16_t, TYPE_SMALLINT, short_val)
-IN_LIST_FILTER_INSERT_BATCH(int32_t, TYPE_INT, int_val)
-IN_LIST_FILTER_INSERT_BATCH(int64_t, TYPE_BIGINT, long_val)
-IN_LIST_FILTER_INSERT_BATCH(int32_t, TYPE_DATE, int_val)
-IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_STRING, string_val)
-IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_VARCHAR, string_val)
-IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_CHAR, string_val)
+IN_LIST_FILTER_INSERT_BATCH(int8_t, TYPE_TINYINT, byte_val, values_)
+IN_LIST_FILTER_INSERT_BATCH(int16_t, TYPE_SMALLINT, short_val, values_)
+IN_LIST_FILTER_INSERT_BATCH(int32_t, TYPE_INT, int_val, values_)
+IN_LIST_FILTER_INSERT_BATCH(int64_t, TYPE_BIGINT, long_val, values_)
+IN_LIST_FILTER_INSERT_BATCH(int32_t, TYPE_DATE, int_val, values_)
+IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_STRING, string_val, newly_inserted_values_)
+IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_VARCHAR, string_val, newly_inserted_values_)
+IN_LIST_FILTER_INSERT_BATCH(StringValue, TYPE_CHAR, string_val, newly_inserted_values_)
 
 #define NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(TYPE, SLOT_TYPE, PB_VAL_METHOD)             \
   template<>                                                                           \
@@ -182,7 +187,7 @@ NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int8_t, TYPE_TINYINT, byte_val)
 NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int16_t, TYPE_SMALLINT, short_val)
 NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int32_t, TYPE_INT, int_val)
 NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int64_t, TYPE_BIGINT, long_val)
-NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int32_t, TYPE_DATE, long_val)
+NUMERIC_IN_LIST_FILTER_TO_PROTOBUF(int32_t, TYPE_DATE, int_val)
 
 #define STRING_IN_LIST_FILTER_TO_PROTOBUF(SLOT_TYPE)                                   \
   template<>                                                                           \
diff --git a/be/src/util/in-list-filter.h b/be/src/util/in-list-filter.h
index cb2e7c6b2..c2e36cab9 100644
--- a/be/src/util/in-list-filter.h
+++ b/be/src/util/in-list-filter.h
@@ -136,20 +136,29 @@ class InListFilterImpl : public InListFilter {
   std::unordered_set<T> values_;
 };
 
+/// String set that wraps a boost::unordered_set<StringValue> and tracks the total length
+/// of strings in the set. Exposes the same methods of boost::unordered_set that are used
+/// in InListFilters.
 struct StringSetWithTotalLen {
   boost::unordered_set<StringValue> values;
   uint32_t total_len = 0;
 
-  inline bool insert(StringValue v) {
-    const auto& res = values.insert(v);
+  typedef typename boost::unordered_set<StringValue>::iterator iterator;
+
+  /// Inserts a new StringValue. Returns a pair consisting of an iterator to the element
+  /// in the set, and a bool denoting whether the insertion took place (true if insertion
+  /// happened, false if it did not, i.e. already exists).
+  inline pair<iterator, bool> insert(StringValue v) {
+    const auto& res = values.emplace(v);
     total_len += (res.second? v.len : 0);
-    return res.second;
+    return res;
   }
 
-  inline bool insert(const string& s) {
+  /// Same as the above one but inserts a value of std::string
+  inline pair<iterator, bool> insert(const string& s) {
     const auto& res = values.emplace(s);
     total_len += (res.second ? s.length() : 0);
-    return res.second;
+    return res;
   }
 
   inline bool find(StringValue v) const {
@@ -160,6 +169,10 @@ struct StringSetWithTotalLen {
     values.clear();
     total_len = 0;
   }
+
+  inline size_t size() const {
+    return values.size();
+  }
 };
 
 template<PrimitiveType SLOT_TYPE>
@@ -196,7 +209,7 @@ class InListFilterImpl<StringValue, SLOT_TYPE> : public InListFilter {
   MemPool mem_pool_;
   StringSetWithTotalLen values_;
   /// Temp set used to insert new values. They will be transferred to values_ in
-  /// MaterializeValues().
+  /// MaterializeValues(). Values should always be inserted into this set first.
   StringSetWithTotalLen newly_inserted_values_;
   /// Type len for CHAR type.
   int type_len_;
diff --git a/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test b/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
index cfe71f19a..919c2d850 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/in_list_filters.test
@@ -170,4 +170,97 @@ select STRAIGHT_JOIN count(*) from alltypes a
 row_regex: .*Filter 0 arrival with 0 items.*
 row_regex: .*RowsRead: 2.43K \(2433\).*
 ====
-
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+# Final filter table shown below. Filter 0 is the global filter
+# generated by the build side of scanning b, and is applied to
+# the scan node that scans a.
+#
+# ID  Src. Node  Tgt. Node(s)  Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size  Est fpp  Min value  Max value     In-list size
+#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+#  1          3             0        LOCAL             false               0 (3)            N/A        N/A     true     IN_LIST                                  PartialUpdates
+#  0          4             1       REMOTE             false               0 (3)      431.952ms  431.953ms     true     IN_LIST                                               1
+select count(*) from alltypes t, alltypestiny a, alltypestiny b
+where t.id = a.id and a.tinyint_col = b.tinyint_col and b.id = 0;
+---- RESULTS
+4
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+select count(*) from alltypes t, alltypestiny a, alltypestiny b
+where t.id = a.id and a.smallint_col = b.smallint_col and b.id = 0;
+---- RESULTS
+4
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+select count(*) from alltypes t, alltypestiny a, alltypestiny b
+where t.id = a.id and a.int_col = b.int_col and b.id = 0;
+---- RESULTS
+4
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+select count(*) from alltypes t, alltypestiny a, alltypestiny b
+where t.id = a.id and a.bigint_col = b.bigint_col and b.id = 0;
+---- RESULTS
+4
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+select count(*) from alltypes t, alltypestiny a, alltypestiny b
+where t.id = a.id and a.string_col = b.string_col and b.id = 0;
+---- RESULTS
+4
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter on DATE type
+# Final filter table:
+# ID  Src. Node  Tgt. Node(s)  Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size  Est fpp  Min value  Max value     In-list size
+#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+#  1          3             0        LOCAL             false               0 (3)            N/A        N/A     true     IN_LIST                                  PartialUpdates
+#  0          4             1       REMOTE             false               0 (3)      427.938ms  427.947ms     true     IN_LIST                                               5
+select STRAIGHT_JOIN count(*)
+from date_tbl t
+join [BROADCAST] date_tbl a on t.id_col = a.id_col
+join [BROADCAST] date_tbl b on a.date_col = b.date_col
+where b.id_col < 5;
+---- RESULTS
+7
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+5
+row_regex: .*Filter 0 arrival with 5 items.*
+====
+---- QUERY
+# IMPALA-11707: Regression test on global IN-list filter
+# Final filter table:
+# ID  Src. Node  Tgt. Node(s)  Target type  Partition filter  Pending (Expected)  First arrived  Completed  Enabled  Bloom Size  Est fpp  Min value  Max value     In-list size
+#------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
+#  1          3             0        LOCAL             false               0 (1)            N/A        N/A     true     IN_LIST                                  PartialUpdates
+#  0          4             1       REMOTE             false               0 (1)       87.270ms   87.271ms     true     IN_LIST                                               1
+select count(*)
+from tpch_orc_def.supplier, tpch_orc_def.nation, tpch_orc_def.region
+where s_nationkey = n_nationkey
+  and n_regionkey = r_regionkey
+  and r_name = 'EUROPE';
+---- RESULTS
+1987
+---- RUNTIME_PROFILE
+row_regex: .*0\s+4\s+1\s+REMOTE\s+false.*IN_LIST\s+1
+row_regex: .*Filter 0 arrival with 1 items.*
+====
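
The counter bug fixed by this commit can be illustrated with a small Java
analogue of the C++ logic. This is a sketch under stated assumptions, not the
Impala implementation: the class InListCounterSketch and all its members are
hypothetical, NULL handling is left out, and the rule that an empty counter
means "always false" is assumed from the commit message. The key point is that
the cached count is bumped only when the set insert actually adds a value, and
that exceeding the limit resets the filter to always-true.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical Java analogue of a numeric IN-list filter; names are illustrative.
class InListCounterSketch {
  private final Set<Long> values = new HashSet<>();
  private final int entryLimit;
  private int totalEntries = 0;      // cached count; must mirror values.size()
  private boolean alwaysTrue = false;

  InListCounterSketch(int entryLimit) { this.entryLimit = entryLimit; }

  // Inserts a deserialized batch. Without the ++totalEntries below, the
  // filter would still look empty after the batch is loaded.
  void insertBatch(List<Long> batch) {
    if (alwaysTrue) return;          // guard added for this sketch only
    for (Long v : batch) {
      if (values.add(v)) {           // true only for newly inserted values
        ++totalEntries;
        if (totalEntries > entryLimit) {
          reset();
          break;
        }
      }
    }
  }

  // Falls back to a filter that accepts everything.
  private void reset() {
    alwaysTrue = true;
    values.clear();
    totalEntries = 0;
  }

  // Assumption: an empty, non-always-true filter rejects every row.
  boolean isAlwaysFalse() { return !alwaysTrue && totalEntries == 0; }
  boolean isAlwaysTrue() { return alwaysTrue; }
  int numItems() { return totalEntries; }
  boolean find(long v) { return alwaysTrue || values.contains(v); }
}
```

Keeping the counter next to the set avoids recomputing the size, at the cost
of an invariant (counter equals set size) that every insert path has to
maintain; the DCHECK_EQ calls added by the patch guard that invariant.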

[impala] 04/17: IMPALA-11696: Fix incorrect warnings of ignoring delimiters on text/sequence tables

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b793f4197630eab665cbecc9cf5920fc8212d9c1
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Wed Nov 2 11:02:00 2022 +0800

    IMPALA-11696: Fix incorrect warnings of ignoring delimiters on text/sequence tables
    
    IMPALA-9822 adds a warning when the customized row format delimiters in
    the CreateTable statement are ignored on non-TEXT and non-SEQUENCE
    tables. However, the warning also shows up for TEXT/SEQUENCE tables. The
    cause is an incorrect check on the file format: the condition is true for
    every format.
    
    This patch fixes the condition and adds tests to verify that no warnings
    show up in such cases. Currently the test methods (e.g. AnalyzesOk) only
    check expected warning messages when provided. If the provided expected
    message is null, they just skip checking the warnings. This patch adds
    methods like AnalyzesOkWithoutWarnings to ensure no warnings are
    generated.
    
    Tests
     - Run FE tests
    
    Change-Id: I0871b94dcd2290723699c21227a576e8a6a09b5a
    Reviewed-on: http://gerrit.cloudera.org:8080/19186
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
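
The condition fix described in the commit message above can be illustrated in isolation. The following is a minimal sketch, not the Impala code: the FileFormat enum is a hypothetical stand-in for THdfsFileFormat, and the class and method names are made up for illustration. It shows why a check of the form "x != A || x != B" holds for every value, while "x != A && x != B" holds only for values that are neither A nor B.

```java
public class DelimiterWarningCheck {
  // Hypothetical stand-in for THdfsFileFormat; only a few formats are listed.
  enum FileFormat { TEXT, SEQUENCE_FILE, PARQUET, ORC }

  // Old condition: true for every format, because no single value can be
  // equal to both TEXT and SEQUENCE_FILE at the same time.
  static boolean delimitersIgnoredBuggy(FileFormat f) {
    return f != FileFormat.TEXT || f != FileFormat.SEQUENCE_FILE;
  }

  // Fixed condition: true only when the format is neither TEXT nor
  // SEQUENCE_FILE.
  static boolean delimitersIgnoredFixed(FileFormat f) {
    return f != FileFormat.TEXT && f != FileFormat.SEQUENCE_FILE;
  }

  public static void main(String[] args) {
    // Print both results for each format to compare the two conditions.
    for (FileFormat f : FileFormat.values()) {
      System.out.printf("%-13s buggy=%b fixed=%b%n",
          f, delimitersIgnoredBuggy(f), delimitersIgnoredFixed(f));
    }
  }
}
```

With the old condition a warning would be raised for TEXT and SEQUENCE_FILE as well, which matches the symptom reported in the commit message.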
---
 .../apache/impala/analysis/CreateTableStmt.java    |  2 +-
 .../org/apache/impala/analysis/AnalyzeDDLTest.java | 37 +++++++++++++---------
 .../apache/impala/analysis/AnalyzeStmtsTest.java   |  6 +++-
 .../org/apache/impala/common/FrontendFixture.java  | 10 ++++--
 .../org/apache/impala/common/FrontendTestBase.java | 24 ++++++++++++--
 5 files changed, 56 insertions(+), 23 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java b/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
index 114bbdb04..344f9c2c6 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
@@ -273,7 +273,7 @@ public class CreateTableStmt extends StatementBase {
       String lineDelimiter = getRowFormat().getLineDelimiter();
       String escapeChar = getRowFormat().getEscapeChar();
       if (getFileFormat() != THdfsFileFormat.TEXT
-          || getFileFormat() != THdfsFileFormat.SEQUENCE_FILE) {
+          && getFileFormat() != THdfsFileFormat.SEQUENCE_FILE) {
         if (fieldDelimiter != null) {
           analyzer.addWarning("'ROW FORMAT DELIMITED FIELDS TERMINATED BY '"
               + fieldDelimiter + "'' is ignored.");
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
index a473a2011..f2416a5a4 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
@@ -49,7 +49,6 @@ import org.apache.impala.common.FileSystemUtil;
 import org.apache.impala.common.FrontendTestBase;
 import org.apache.impala.common.Pair;
 import org.apache.impala.common.PrintUtils;
-import org.apache.impala.common.RuntimeEnv;
 import org.apache.impala.service.BackendConfig;
 import org.apache.impala.testutil.TestUtils;
 import org.apache.impala.thrift.TBackendGflags;
@@ -2314,16 +2313,20 @@ public class AnalyzeDDLTest extends FrontendTestBase {
         "explicitly specify the column type for column 'new_col'.");
 
     // IMPALA-9822 Row Format Delimited is valid only for Text Files
-    String[] fileFormats = {"PARQUET", "ICEBERG"};
-    for (String format : fileFormats) {
+    String[] fileFormats = {"TEXTFILE", "PARQUET", "ICEBERG"};
+    for (int i = 0; i < fileFormats.length; ++i) {
+      String format = fileFormats[i];
       for (String rowFormat : ImmutableList.of(
                "FIELDS TERMINATED BY ','", "LINES TERMINATED BY ','", "ESCAPED BY ','")) {
-        AnalyzesOk(
-            String.format(
-                "create table new_table row format delimited %s stored as %s as select *"
-                    + " from functional.child_table",
-                rowFormat, format),
-            "'ROW FORMAT DELIMITED " + rowFormat + "' is ignored.");
+        String stmt = String.format(
+            "create table new_table row format delimited %s stored as %s as select *"
+                + " from functional.child_table", rowFormat, format);
+        if (i == 0) {
+          // No warnings for TEXT tables
+          AnalyzesOkWithoutWarnings(stmt);
+        } else {
+          AnalyzesOk(stmt, "'ROW FORMAT DELIMITED " + rowFormat + "' is ignored.");
+        }
       }
     }
   }
@@ -2583,14 +2586,18 @@ public class AnalyzeDDLTest extends FrontendTestBase {
       formatIndx++;
     }
 
-    for (formatIndx = 2; formatIndx < fileFormats.length; formatIndx++) {
+    for (formatIndx = 0; formatIndx < fileFormats.length; formatIndx++) {
       for (String rowFormat : ImmutableList.of(
                "FIELDS TERMINATED BY ','", "LINES TERMINATED BY ','", "ESCAPED BY ','")) {
-        AnalyzesOk(
-            String.format(
-                "create table new_table (i int) row format delimited %s stored as %s",
-                rowFormat, fileFormats[formatIndx]),
-            "'ROW FORMAT DELIMITED " + rowFormat + "' is ignored");
+        String stmt = String.format(
+            "create table new_table (i int) row format delimited %s stored as %s",
+            rowFormat, fileFormats[formatIndx]);
+        if (formatIndx < 2) {
+          // No warnings for TEXT and SEQUENCE tables
+          AnalyzesOkWithoutWarnings(stmt);
+        } else {
+          AnalyzesOk(stmt, "'ROW FORMAT DELIMITED " + rowFormat + "' is ignored");
+        }
       }
     }
 
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
index be7cf7776..25fc885c2 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
@@ -1989,7 +1989,11 @@ public class AnalyzeStmtsTest extends AnalyzerTest {
   */
   private boolean hasStraightJoin(String stmt, String expectedWarning){
     AnalysisContext ctx = createAnalysisCtx();
-    AnalyzesOk(stmt,ctx, expectedWarning);
+    if (expectedWarning == null) {
+      AnalyzesOkWithoutWarnings(stmt, ctx);
+    } else {
+      AnalyzesOk(stmt, ctx, expectedWarning);
+    }
     return ctx.getAnalyzer().isStraightJoin();
   }
 
diff --git a/fe/src/test/java/org/apache/impala/common/FrontendFixture.java b/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
index 862b782e1..17e9d6665 100644
--- a/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
+++ b/fe/src/test/java/org/apache/impala/common/FrontendFixture.java
@@ -373,13 +373,14 @@ public class FrontendFixture {
   /**
    * Analyze 'stmt', expecting it to pass. Asserts in case of analysis error.
    * If 'expectedWarning' is not null, asserts that a warning is produced.
+   * Otherwise, asserts no warnings if 'assertNoWarnings' is true.
    */
   public ParseNode analyzeStmt(String stmt, AnalysisContext ctx,
-      String expectedWarning) {
+      String expectedWarning, boolean assertNoWarnings) {
     try {
       AnalysisResult analysisResult = parseAndAnalyze(stmt, ctx);
+      List<String> actualWarnings = analysisResult.getAnalyzer().getWarnings();
       if (expectedWarning != null) {
-        List<String> actualWarnings = analysisResult.getAnalyzer().getWarnings();
         boolean matchedWarning = false;
         for (String actualWarning: actualWarnings) {
           if (actualWarning.startsWith(expectedWarning)) {
@@ -392,6 +393,9 @@ public class FrontendFixture {
                   + "Expected warning:\n%s.\nActual warnings:\n%s\nsql:\n%s",
               expectedWarning, Joiner.on("\n").join(actualWarnings), stmt));
         }
+      } else if (assertNoWarnings && !actualWarnings.isEmpty()) {
+        fail(String.format("Should not produce any warnings. Got:\n%s\nsql:\n%s",
+            Joiner.on("\n").join(actualWarnings), stmt));
       }
       Preconditions.checkNotNull(analysisResult.getStmt());
       return analysisResult.getStmt();
@@ -407,6 +411,6 @@ public class FrontendFixture {
    * Uses default options; use {@link QueryFixture} for greater control.
    */
   public ParseNode analyzeStmt(String stmt) {
-    return analyzeStmt(stmt, createAnalysisCtx(), null);
+    return analyzeStmt(stmt, createAnalysisCtx(), null, false);
   }
 }
diff --git a/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java b/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
index bdf08155a..817b9395f 100644
--- a/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
+++ b/fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
@@ -194,9 +194,17 @@ public class FrontendTestBase extends AbstractFrontendTest {
   /**
    * Analyze 'stmt', expecting it to pass. Asserts in case of analysis error.
    * If 'expectedWarning' is not null, asserts that a warning is produced.
+   * Otherwise, asserts no warnings.
    */
   public ParseNode AnalyzesOk(String stmt, String expectedWarning) {
-    return AnalyzesOk(stmt, createAnalysisCtx(), expectedWarning);
+    return AnalyzesOk(stmt, createAnalysisCtx(), expectedWarning, true);
+  }
+
+  /**
+   * Analyze 'stmt', expecting it to pass. Asserts in case of analysis error or warnings.
+   */
+  public ParseNode AnalyzesOkWithoutWarnings(String stmt) {
+    return AnalyzesOk(stmt, createAnalysisCtx(), null, true);
   }
 
   protected AnalysisContext createAnalysisCtx() {
@@ -241,13 +249,23 @@ public class FrontendTestBase extends AbstractFrontendTest {
   /**
    * Analyze 'stmt', expecting it to pass. Asserts in case of analysis error.
    * If 'expectedWarning' is not null, asserts that a warning is produced.
+   * Otherwise, asserts no warnings if 'assertNoWarnings' is true.
    */
-  public ParseNode AnalyzesOk(String stmt, AnalysisContext ctx, String expectedWarning) {
+  public ParseNode AnalyzesOk(String stmt, AnalysisContext ctx, String expectedWarning,
+      boolean assertNoWarnings) {
     try (FrontendProfile.Scope scope = FrontendProfile.createNewWithScope()) {
-      return feFixture_.analyzeStmt(stmt, ctx, expectedWarning);
+      return feFixture_.analyzeStmt(stmt, ctx, expectedWarning, assertNoWarnings);
     }
   }
 
+  public ParseNode AnalyzesOk(String stmt, AnalysisContext ctx, String expectedWarning) {
+    return AnalyzesOk(stmt, ctx, expectedWarning, false);
+  }
+
+  public ParseNode AnalyzesOkWithoutWarnings(String stmt, AnalysisContext ctx) {
+    return AnalyzesOk(stmt, ctx, null, true);
+  }
+
   /**
    * Analyzes the given statement without performing rewrites or authorization.
    */

[impala] 16/17: IMPALA-11795: Ignore high/low values stats for timestamp columns

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 5a96ffdb4cfef135ac55e24275f86414fab7e3b7
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Mon Feb 27 14:28:35 2023 +0100

    IMPALA-11795: Ignore high/low values stats for timestamp columns
    
    Timestamp column stats are handled as LongColumnStatsData, similarly to
    integer types, but high/low value handling is not yet implemented for
    timestamps. If for some reason HMS returned high/low values for
    timestamp columns, a Precondition ("Unsupported type encountered in
    setLowAndHighValue()") was hit in Catalogd, causing the table load to
    fail.
    
    Impala does not write high/low values for timestamp columns, so I don't
    know what led to this state in HMS and could only reproduce the issue
    by manipulating TAB_COL_STATS in the backing db of HMS.
    
    Testing:
    - only tested manually by manipulating TAB_COL_STATS in HMS's db
    
    Change-Id: If585d2543d49978140dcb7b8d49d6ea50e4a8544
    Reviewed-on: http://gerrit.cloudera.org:8080/19548
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
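
The guard described in the commit message above can be sketched on its own. This is a simplified illustration under stated assumptions, not the ColumnStats implementation: PrimitiveType here is a hypothetical three-value enum, LongStats is a made-up holder for the low/high values reported by the metastore, and an IllegalStateException stands in for the Precondition failure. The point is only that the caller skips low/high handling for TIMESTAMP instead of reaching the unsupported-type branch.

```java
public class TimestampStatsGuard {
  // Hypothetical subset of column types, for illustration only.
  enum PrimitiveType { INT, BIGINT, TIMESTAMP }

  // Hypothetical holder for low/high values reported by the metastore.
  static final class LongStats {
    final Long low;
    final Long high;
    LongStats(Long low, Long high) { this.low = low; this.high = high; }
  }

  static final class ColumnStatsSketch {
    Long lowValue;
    Long highValue;

    // Caller-side guard: low/high values are ignored for TIMESTAMP columns.
    void update(PrimitiveType type, LongStats stats) {
      if (type != PrimitiveType.TIMESTAMP) {
        setLowAndHighValue(type, stats);
      }
    }

    // Only integer types are handled; anything else is treated as a bug.
    private void setLowAndHighValue(PrimitiveType type, LongStats stats) {
      switch (type) {
        case INT:
        case BIGINT:
          lowValue = stats.low;
          highValue = stats.high;
          break;
        default:
          throw new IllegalStateException(
              "Unsupported type encountered in setLowAndHighValue(): " + type);
      }
    }
  }

  public static void main(String[] args) {
    ColumnStatsSketch ts = new ColumnStatsSketch();
    // Does not throw: the stray low/high values are simply ignored.
    ts.update(PrimitiveType.TIMESTAMP, new LongStats(1L, 2L));
    System.out.println("timestamp low=" + ts.lowValue + " high=" + ts.highValue);
  }
}
```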
---
 .../main/java/org/apache/impala/catalog/ColumnStats.java  | 15 +++++++++++++--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java b/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
index fcb52f83c..9603f3f1b 100644
--- a/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
+++ b/fe/src/main/java/org/apache/impala/catalog/ColumnStats.java
@@ -417,7 +417,7 @@ public class ColumnStats {
 
   /*
    * From the source 'longStats', set the low and high value for 'type' (one of the
-   * integer types).
+   * integer types). Does not handle TIMESTAMP columns.
    */
   protected void setLowAndHighValue(PrimitiveType type, LongColumnStatsData longStats) {
     if (!longStats.isSetLowValue()) {
@@ -438,6 +438,10 @@ public class ColumnStats {
         case BIGINT:
           lowValue_.setLong_val(value.longValue());
           break;
+        case TIMESTAMP:
+          Preconditions.checkState(
+              false, "TIMESTAMP columns are not supported by setLowAndHighValue()");
+          break;
         default:
           Preconditions.checkState(
               false, "Unsupported type encountered in setLowAndHighValue()");
@@ -462,6 +466,10 @@ public class ColumnStats {
         case BIGINT:
           highValue_.setLong_val(value.longValue());
           break;
+        case TIMESTAMP:
+          Preconditions.checkState(
+              false, "TIMESTAMP columns are not supported by setLowAndHighValue()");
+          break;
         default:
           Preconditions.checkState(
               false, "Unsupported type encountered in setLowAndHighValue()");
@@ -575,7 +583,10 @@ public class ColumnStats {
           LongColumnStatsData longStats = statsData.getLongStats();
           numDistinctValues_ = longStats.getNumDVs();
           numNulls_ = longStats.getNumNulls();
-          setLowAndHighValue(colType.getPrimitiveType(), longStats);
+          if (colType.getPrimitiveType() != PrimitiveType.TIMESTAMP) {
+            // Low/high value handling is not yet implemented for timestamps.
+            setLowAndHighValue(colType.getPrimitiveType(), longStats);
+          }
         }
         break;
       case DATE:

[impala] 01/17: IMPALA-11355: Add STRING overloads for hour/minute/second/millisecond

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit ec7a834b378847251a37f98a1dfcfb18c004f1b4
Author: Csaba Ringhofer <cs...@cloudera.com>
AuthorDate: Mon Jul 11 19:07:55 2022 +0200

    IMPALA-11355: Add STRING overloads for hour/minute/second/millisecond
    
    IMPALA-9531 dropped support for "dateless timestamps",
    e.g. cast("12:05:05" as timestamp) now returns NULL.
    
    This led to breaking functions like minute("12:05:05"), as minute()
    expects a timestamp, and Impala adds an implicit cast, so what actually
    happens is minute(cast("12:05:05" as timestamp)), which returns NULL.
    
    This change adds overloads for similar functions that take a STRING
    parameter instead of a TIMESTAMP parameter. The same functions already
    take a STRING parameter in Hive and MySQL.
    
    The changes in the parser mainly restore code removed in IMPALA-9531.
    
    Note that these functions could be potentially optimized by returning
    parts of the parse result without converting them to boost time first,
    but this is not done here to make the change minimal.
    
    Testing:
    - restored related tests in expr-test and added some new ones for
      malformed time-of-day strings
    - added benchmarks for the new overloads and fixed the ones for the
      old functions (they tested NULL)
    
    Change-Id: I6cc1c851ee71ab4fcc58105c7e9931155a483679
    Reviewed-on: http://gerrit.cloudera.org:8080/18718
    Reviewed-by: Riza Suminto <ri...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
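
The behavior described in the commit message above can be mimicked outside Impala. The sketch below is an analogy built on java.time, not the backend implementation: the class and method names are hypothetical, the accepted string formats are those of java.time rather than Impala's parser, and a null return stands in for SQL NULL. It contrasts a path that first requires a full date-time (and so rejects a dateless string) with a path that parses the time-of-day string directly.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

public class TimeOfDayOverloads {
  private static final DateTimeFormatter TIMESTAMP_FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

  // Analogue of minute(cast(s as timestamp)): the string must contain a date,
  // so a dateless value such as "12:05:05" yields null.
  static Integer minuteViaTimestampCast(String s) {
    try {
      return LocalDateTime.parse(s, TIMESTAMP_FORMAT).getMinute();
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  // Analogue of a STRING overload: the time-of-day string is parsed directly.
  // Malformed input yields null.
  static Integer minuteOfString(String s) {
    try {
      return LocalTime.parse(s).getMinute();
    } catch (DateTimeParseException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(minuteViaTimestampCast("12:05:05"));
    System.out.println(minuteOfString("12:05:05"));
    System.out.println(minuteOfString("not a time"));
  }
}
```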
---
 be/src/benchmarks/expr-benchmark.cc                | 154 ++++++++++++---------
 be/src/exprs/expr-test.cc                          |  19 +++
 be/src/exprs/timestamp-functions-ir.cc             |  41 ++++++
 be/src/exprs/timestamp-functions.h                 |   4 +
 be/src/runtime/date-parse-util.cc                  |   2 +-
 .../runtime/datetime-simple-date-format-parser.cc  |  34 ++++-
 .../runtime/datetime-simple-date-format-parser.h   |  10 +-
 be/src/runtime/timestamp-parse-util.cc             |   6 +-
 be/src/runtime/timestamp-parse-util.h              |   6 +-
 common/function-registry/impala_functions.py       |   4 +
 10 files changed, 202 insertions(+), 78 deletions(-)

diff --git a/be/src/benchmarks/expr-benchmark.cc b/be/src/benchmarks/expr-benchmark.cc
index d850a5771..52130e292 100644
--- a/be/src/benchmarks/expr-benchmark.cc
+++ b/be/src/benchmarks/expr-benchmark.cc
@@ -770,77 +770,88 @@ Benchmark* BenchmarkMathFunctions(bool codegen) {
   return suite;
 }
 
+// Machine Info: Intel(R) Core(TM) i5-6600 CPU @ 3.30GHz
 // TimestampFn:               Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
 //                                                                          (relative) (relative) (relative)
 // ---------------------------------------------------------------------------------------------------------
-//                             literal               33.8       34     34.1         1X         1X         1X
-//                           to_string               13.3     13.4     13.5     0.395X     0.395X     0.395X
-//                            add_year               16.5     16.7     16.8     0.488X      0.49X     0.492X
-//                           sub_month               16.3     16.3     16.4     0.482X      0.48X     0.481X
-//                           add_weeks               20.4     20.4     20.5     0.603X     0.599X     0.601X
-//                            sub_days               19.4     19.4     19.5     0.573X     0.569X     0.572X
-//                                 add               20.6     20.6     20.7     0.608X     0.605X     0.608X
-//                           sub_hours               18.7     18.7     18.9     0.553X      0.55X     0.553X
-//                         add_minutes               19.4     19.5     19.6     0.573X     0.575X     0.575X
-//                         sub_seconds               19.1     19.4     19.4     0.564X     0.569X     0.569X
-//                           add_milli               18.5     18.5     18.7     0.548X     0.545X     0.546X
-//                           sub_micro               17.9     17.9     18.1     0.529X     0.526X     0.529X
-//                            add_nano               18.3     18.3     18.5     0.542X      0.54X     0.541X
-//                     unix_timestamp1               37.9     37.9     38.1      1.12X      1.11X      1.12X
-//                     unix_timestamp2               51.9     52.2     52.5      1.54X      1.54X      1.54X
-//                          from_unix1               30.4     30.7     30.9     0.899X     0.902X     0.904X
-//                          from_unix2               43.3     43.5       44      1.28X      1.28X      1.29X
-//                          from_unix3               30.7     31.1     31.3      0.91X     0.916X     0.917X
-//                                year               39.3     39.4     39.7      1.16X      1.16X      1.16X
-//                               month               39.5     40.1     40.3      1.17X      1.18X      1.18X
-//                        day of month               38.2     38.4     38.5      1.13X      1.13X      1.13X
-//                         day of year               35.4     35.6     35.8      1.05X      1.05X      1.05X
-//                        week of year               34.6     34.6     34.8      1.02X      1.02X      1.02X
-//                                hour               81.5     81.9     82.5      2.41X      2.41X      2.42X
-//                              minute                 80     80.5     81.5      2.37X      2.37X      2.39X
-//                              second               81.2     82.2     82.9       2.4X      2.42X      2.43X
-//                             to date               19.4     19.4     19.5     0.573X     0.569X      0.57X
-//                           date diff               17.5     17.5     17.6     0.518X     0.515X     0.514X
-//                            from utc               21.9     21.9     22.2     0.649X     0.646X     0.649X
-//                              to utc               19.4     19.4     19.4     0.573X     0.569X     0.569X
-//                                 now                286      287      290      8.45X      8.45X       8.5X
-//                      unix_timestamp                207      208      211      6.13X      6.13X      6.17X
+//                             literal               10.8     11.6       12         1X         1X         1X
+//                           to_string               4.34     4.91     5.02     0.402X     0.424X     0.419X
+//                            add_year               6.17     7.11     7.29     0.572X     0.614X     0.609X
+//                           sub_month               6.13     6.93     7.12     0.568X     0.598X     0.595X
+//                           add_weeks               6.97     8.71     9.02     0.646X     0.752X     0.754X
+//                            sub_days               6.73     8.34     8.57     0.624X     0.721X     0.716X
+//                                 add               7.19     8.84     9.02     0.666X     0.763X     0.754X
+//                           sub_hours               7.37     8.07     8.24     0.683X     0.697X     0.688X
+//                         add_minutes               7.28     7.93     8.21     0.674X     0.685X     0.687X
+//                         sub_seconds               7.63     7.89     8.14     0.707X     0.682X      0.68X
+//                           add_milli               6.85      7.5     7.73     0.635X     0.648X     0.646X
+//                           sub_micro               6.97     7.49     7.64     0.646X     0.647X     0.639X
+//                            add_nano               6.79      7.4     7.62     0.629X     0.639X     0.637X
+//                     unix_timestamp1               12.9       14     14.5      1.19X      1.21X      1.21X
+//                     unix_timestamp2               18.1     19.9     20.4      1.67X      1.72X       1.7X
+//                          from_unix1               9.54     10.7       11     0.884X     0.924X     0.918X
+//                          from_unix2               14.4     16.1     16.7      1.34X      1.39X       1.4X
+//                          from_unix3               9.79     10.9     11.3     0.907X     0.945X     0.948X
+//                                year               12.4     14.2     14.5      1.15X      1.23X      1.21X
+//                               month                 13     14.4     14.6       1.2X      1.24X      1.22X
+//                        day of month               13.2     14.5     14.9      1.22X      1.25X      1.25X
+//                         day of year               11.7     12.7     12.9      1.08X      1.09X      1.08X
+//                        week of year               11.8     12.8     13.1      1.09X      1.11X       1.1X
+//                     hour(timestamp)               6.71     7.26     7.41     0.622X     0.627X     0.619X
+//                   minute(timestamp)               6.48     7.25     7.41     0.601X     0.626X     0.619X
+//                   second(timestamp)               6.55     7.24     7.41     0.607X     0.626X     0.619X
+//              millisecond(timestamp)               6.55     7.28     7.41     0.606X     0.629X     0.619X
+//                        hour(string)                 10       11     11.2     0.927X     0.946X     0.933X
+//                      minute(string)               9.91       11     11.2     0.918X     0.947X     0.933X
+//                      second(string)                9.9     10.9     11.2     0.917X     0.942X     0.933X
+//                 millisecond(string)               9.88     10.6       11     0.915X     0.916X     0.918X
+//                             to date               6.07     6.98     7.25     0.563X     0.603X     0.606X
+//                           date diff               5.75     6.39     6.55     0.533X     0.551X     0.547X
+//                            from utc               7.56     8.25     8.55       0.7X     0.712X     0.714X
+//                              to utc                6.6     7.28     7.49     0.612X     0.629X     0.626X
+//                                 now                103      109      112       9.5X      9.43X       9.4X
+//                      unix_timestamp               80.2     86.4     88.8      7.43X      7.46X      7.42X
 //
 // TimestampFnCodegen:        Function  iters/ms   10%ile   50%ile   90%ile     10%ile     50%ile     90%ile
 //                                                                          (relative) (relative) (relative)
 // ---------------------------------------------------------------------------------------------------------
-//                             literal               38.2     38.4     38.6         1X         1X         1X
-//                           to_string                 15     15.3     15.3     0.392X     0.398X     0.396X
-//                            add_year               18.3     18.3     18.5     0.479X     0.477X     0.478X
-//                           sub_month               17.9     18.1     18.1     0.469X      0.47X      0.47X
-//                           add_weeks               23.8     23.8     23.9     0.622X     0.619X     0.619X
-//                            sub_days               22.6     22.6     22.7      0.59X     0.588X     0.589X
-//                                 add               23.4     23.4     23.5     0.613X      0.61X     0.609X
-//                           sub_hours               21.8     21.8     21.9     0.569X     0.566X     0.566X
-//                         add_minutes               21.9     21.9       22     0.574X     0.571X      0.57X
-//                         sub_seconds               21.9     21.9       22     0.574X     0.571X      0.57X
-//                           add_milli               20.9     20.9     21.1     0.547X     0.545X     0.547X
-//                           sub_micro               20.1     20.1     20.2     0.525X     0.523X     0.523X
-//                            add_nano               21.1     21.1     21.3     0.552X     0.549X     0.551X
-//                     unix_timestamp1                 47     47.2     47.5      1.23X      1.23X      1.23X
-//                     unix_timestamp2               61.3     61.5     61.8       1.6X       1.6X       1.6X
-//                          from_unix1               34.8       35     35.3      0.91X     0.911X     0.914X
-//                          from_unix2               51.5     51.5       52      1.35X      1.34X      1.35X
-//                          from_unix3               34.8       35     35.3      0.91X     0.911X     0.914X
-//                                year               57.4     57.8     58.5       1.5X       1.5X      1.51X
-//                               month               58.4     58.6     59.1      1.53X      1.53X      1.53X
-//                        day of month               58.9     59.1     59.4      1.54X      1.54X      1.54X
-//                         day of year               53.2     53.7     54.3      1.39X       1.4X      1.41X
-//                        week of year               50.9     51.1     51.4      1.33X      1.33X      1.33X
-//                                hour                125      132      134      3.26X      3.43X      3.48X
-//                              minute                132      133      134      3.46X      3.46X      3.48X
-//                              second                132      133      135      3.46X      3.47X      3.49X
-//                             to date               24.2     24.3     24.6     0.632X     0.631X     0.637X
-//                           date diff               22.6     22.6     22.7     0.591X     0.588X     0.589X
-//                            from utc               38.1     38.5     38.8     0.995X         1X         1X
-//                              to utc               22.1     22.1     22.4     0.579X     0.576X      0.58X
-//                                 now                517      520      524      13.5X      13.5X      13.6X
-//                      unix_timestamp                399      403      406      10.4X      10.5X      10.5X
+//                             literal               11.4     11.8     12.1         1X         1X         1X
+//                           to_string               4.83        5     5.09     0.425X     0.424X      0.42X
+//                            add_year               6.98     7.18     7.36     0.615X     0.609X     0.607X
+//                           sub_month               6.69     6.93     7.05     0.589X     0.588X     0.581X
+//                           add_weeks               8.56     8.86     9.02     0.754X     0.752X     0.743X
+//                            sub_days                8.1     8.39     8.58     0.714X     0.712X     0.707X
+//                                 add               8.56     8.86     9.02     0.754X     0.752X     0.743X
+//                           sub_hours               7.87     8.09     8.34     0.693X     0.687X     0.688X
+//                         add_minutes               7.76        8     8.18     0.683X     0.679X     0.674X
+//                         sub_seconds               7.75     7.95      8.1     0.683X     0.674X     0.668X
+//                           add_milli                7.3     7.46      7.6     0.643X     0.633X     0.626X
+//                           sub_micro               7.33     7.59     7.79     0.645X     0.644X     0.642X
+//                            add_nano                7.2     7.46     7.59     0.634X     0.633X     0.625X
+//                     unix_timestamp1               13.5       14     14.3      1.19X      1.19X      1.18X
+//                     unix_timestamp2               19.3       20     20.4       1.7X       1.7X      1.68X
+//                          from_unix1               10.3     10.6       11     0.908X     0.901X     0.905X
+//                          from_unix2               15.8     16.4     16.9       1.4X      1.39X      1.39X
+//                          from_unix3               10.5     11.1     11.4     0.922X     0.945X     0.937X
+//                                year               14.3     14.8     15.1      1.26X      1.26X      1.24X
+//                               month               14.3     14.8     15.1      1.26X      1.26X      1.24X
+//                        day of month               14.3     14.6     14.9      1.26X      1.24X      1.23X
+//                         day of year               12.6     13.2     13.4      1.11X      1.12X      1.11X
+//                        week of year               12.2       13     13.4      1.08X       1.1X       1.1X
+//                     hour(timestamp)               7.28     7.41      7.6     0.641X     0.629X     0.626X
+//                   minute(timestamp)               7.16     7.41     7.55      0.63X     0.629X     0.622X
+//                   second(timestamp)               7.16      7.4      7.6      0.63X     0.628X     0.626X
+//              millisecond(timestamp)                7.2     7.41     7.59     0.634X     0.629X     0.625X
+//                        hour(string)               10.5       11     11.3     0.928X      0.93X     0.929X
+//                      minute(string)               10.5       11     11.2     0.921X      0.93X      0.92X
+//                      second(string)               10.6       11     11.3     0.933X      0.93X      0.93X
+//                 millisecond(string)               10.6       11     11.4     0.933X      0.93X     0.937X
+//                             to date               6.43     7.11     7.23     0.566X     0.603X     0.596X
+//                           date diff                6.3     6.43     6.55     0.554X     0.545X     0.539X
+//                            from utc               8.14     8.57     8.73     0.716X     0.727X     0.719X
+//                              to utc               7.13     7.41     7.55     0.628X     0.629X     0.622X
+//                                 now                107      110      113      9.45X      9.36X      9.34X
+//                      unix_timestamp               84.4     86.4     88.7      7.43X      7.33X      7.31X
 Benchmark* BenchmarkTimestampFunctions(bool codegen) {
   Benchmark* suite = new Benchmark(BenchmarkName("TimestampFn", codegen));
   BENCHMARK("literal", "cast('2012-01-01 09:10:11.123456789' as timestamp)");
@@ -880,9 +891,18 @@ Benchmark* BenchmarkTimestampFunctions(bool codegen) {
   BENCHMARK("day of month", "dayofmonth(cast('2011-12-22' as timestamp))");
   BENCHMARK("day of year", "dayofyear(cast('2011-12-22' as timestamp))");
   BENCHMARK("week of year", "weekofyear(cast('2011-12-22' as timestamp))");
-  BENCHMARK("hour", "hour(cast('09:10:11.000000' as timestamp))");
-  BENCHMARK("minute", "minute(cast('09:10:11.000000' as timestamp))");
-  BENCHMARK("second", "second(cast('09:10:11.000000' as timestamp))");
+  BENCHMARK("hour(timestamp)",
+      "hour(cast('1970-01-01 09:10:11.130000' as timestamp))");
+  BENCHMARK("minute(timestamp)",
+      "minute(cast('1970-01-01 09:10:11.130000' as timestamp))");
+  BENCHMARK(
+      "second(timestamp)", "second(cast('1970-01-01 09:10:11.130000' as timestamp))");
+  BENCHMARK(
+      "millisecond(timestamp)", "millisecond(cast('1970-01-01 09:10:11.130000' as timestamp))");
+  BENCHMARK("hour(string)", "hour('09:10:11.130000')");
+  BENCHMARK("minute(string)", "minute('09:10:11.130000')");
+  BENCHMARK("second(string)", "second('09:10:11.130000')");
+  BENCHMARK("millisecond(string)", "millisecond('09:10:11.130000')");
   BENCHMARK("to date",
       "to_date(cast('2011-12-22 09:10:11.12345678' as timestamp))");
   BENCHMARK("date diff", "datediff(cast('2011-12-22 09:10:11.12345678' as timestamp), "
diff --git a/be/src/exprs/expr-test.cc b/be/src/exprs/expr-test.cc
index d8a29cb47..688f8f74e 100644
--- a/be/src/exprs/expr-test.cc
+++ b/be/src/exprs/expr-test.cc
@@ -6954,6 +6954,25 @@ TEST_P(ExprTest, TimestampFunctions) {
   TestStringValue(
       "to_date(cast('2011-12-22 09:10:11.12345678' as timestamp))", "2011-12-22");
 
+  // These expressions directly extract hour/minute/second/millis from STRING type
+  // to support these functions for timestamp strings without a date part (IMPALA-11355).
+  TestValue("hour('09:10:11.000000')", TYPE_INT, 9);
+  TestValue("minute('09:10:11.000000')", TYPE_INT, 10);
+  TestValue("second('09:10:11.000000')", TYPE_INT, 11);
+  TestValue("millisecond('09:10:11.123456')", TYPE_INT, 123);
+  TestValue("millisecond('09:10:11')", TYPE_INT, 0);
+  // Test the functions above with invalid inputs.
+  TestIsNull("hour('09:10:1')", TYPE_INT);
+  TestIsNull("hour('838:59:59')", TYPE_INT);
+  TestIsNull("minute('09-10-11')", TYPE_INT);
+  TestIsNull("second('09:aa:11.000000')", TYPE_INT);
+  TestIsNull("second('09:10:11pm')", TYPE_INT);
+  TestIsNull("millisecond('24:11:11.123')", TYPE_INT);
+  TestIsNull("millisecond('09:61:11.123')", TYPE_INT);
+  TestIsNull("millisecond('09:10:61.123')", TYPE_INT);
+  TestIsNull("millisecond('09:10:11.123aaa')", TYPE_INT);
+  TestIsNull("millisecond('')", TYPE_INT);
+
   // Check that timeofday() does not crash or return incorrect results
   TestIsNotNull("timeofday()", TYPE_STRING);
 
diff --git a/be/src/exprs/timestamp-functions-ir.cc b/be/src/exprs/timestamp-functions-ir.cc
index 1cc06e184..79a0d59f1 100644
--- a/be/src/exprs/timestamp-functions-ir.cc
+++ b/be/src/exprs/timestamp-functions-ir.cc
@@ -276,6 +276,47 @@ IntVal TimestampFunctions::Millisecond(FunctionContext* context,
   return IntVal(time.total_milliseconds() - time.total_seconds() * 1000);
 }
 
+bool StringToTimeOfDay(
+    const StringVal& str_val, boost::posix_time::time_duration* time) {
+  if (str_val.is_null) return false;
+  boost::gregorian::date dummy_date;
+  return TimestampParser::ParseSimpleDateFormat(
+      reinterpret_cast<char*>(str_val.ptr), str_val.len, &dummy_date, time, true);
+}
+
+IntVal TimestampFunctions::Hour(FunctionContext* context, const StringVal& str_val) {
+  boost::posix_time::time_duration time;
+  if (!StringToTimeOfDay(str_val, &time)) {
+    return IntVal::null();
+  }
+  return IntVal(time.hours());
+}
+
+IntVal TimestampFunctions::Minute(FunctionContext* context, const StringVal& str_val) {
+  boost::posix_time::time_duration time;
+  if (!StringToTimeOfDay(str_val, &time)) {
+    return IntVal::null();
+  }
+  return IntVal(time.minutes());
+}
+
+IntVal TimestampFunctions::Second(FunctionContext* context, const StringVal& str_val) {
+  boost::posix_time::time_duration time;
+  if (!StringToTimeOfDay(str_val, &time)) {
+    return IntVal::null();
+  }
+  return IntVal(time.seconds());
+}
+
+IntVal TimestampFunctions::Millisecond(
+    FunctionContext* context, const StringVal& str_val) {
+  boost::posix_time::time_duration time;
+  if (!StringToTimeOfDay(str_val, &time)) {
+    return IntVal::null();
+  }
+  return IntVal(time.total_milliseconds() - time.total_seconds() * 1000);
+}
+
 TimestampVal TimestampFunctions::Now(FunctionContext* context) {
   const TimestampValue* now = context->impl()->state()->now();
   TimestampVal return_val;
diff --git a/be/src/exprs/timestamp-functions.h b/be/src/exprs/timestamp-functions.h
index 47a2d8643..68995a2fd 100644
--- a/be/src/exprs/timestamp-functions.h
+++ b/be/src/exprs/timestamp-functions.h
@@ -168,9 +168,13 @@ class TimestampFunctions {
   static IntVal DayOfYear(FunctionContext* context, const TimestampVal& ts_val);
   static IntVal WeekOfYear(FunctionContext* context, const TimestampVal& ts_val);
   static IntVal Hour(FunctionContext* context, const TimestampVal& ts_val);
+  static IntVal Hour(FunctionContext* context, const StringVal& str_val);
   static IntVal Minute(FunctionContext* context, const TimestampVal& ts_val);
+  static IntVal Minute(FunctionContext* context, const StringVal& str_val);
   static IntVal Second(FunctionContext* context, const TimestampVal& ts_val);
+  static IntVal Second(FunctionContext* context, const StringVal& str_val);
   static IntVal Millisecond(FunctionContext* context, const TimestampVal& ts_val);
+  static IntVal Millisecond(FunctionContext* context, const StringVal& str_val);
 
   /// Date/time functions.
   static TimestampVal Now(FunctionContext* context);
diff --git a/be/src/runtime/date-parse-util.cc b/be/src/runtime/date-parse-util.cc
index 8db8def80..ea8698d82 100644
--- a/be/src/runtime/date-parse-util.cc
+++ b/be/src/runtime/date-parse-util.cc
@@ -100,7 +100,7 @@ bool DateParser::ParseSimpleDateFormat(const char* str, int len, bool accept_tim
 
   const DateTimeFormatContext* dt_ctx =
       SimpleDateFormatTokenizer::GetDefaultFormatContext(str, trimmed_len,
-          accept_time_toks);
+          accept_time_toks, false);
   if (dt_ctx != nullptr) return ParseSimpleDateFormat(str, trimmed_len, *dt_ctx, date);
 
   // Generating context lazily as a fall back if default formats fail.
diff --git a/be/src/runtime/datetime-simple-date-format-parser.cc b/be/src/runtime/datetime-simple-date-format-parser.cc
index dac71d81b..5a937540e 100644
--- a/be/src/runtime/datetime-simple-date-format-parser.cc
+++ b/be/src/runtime/datetime-simple-date-format-parser.cc
@@ -34,6 +34,8 @@ namespace datetime_parse_util {
 bool SimpleDateFormatTokenizer::initialized = false;
 
 const int SimpleDateFormatTokenizer::DEFAULT_DATE_FMT_LEN = 10;
+const int SimpleDateFormatTokenizer::DEFAULT_TIME_FMT_LEN = 8;
+const int SimpleDateFormatTokenizer::DEFAULT_TIME_FRAC_FMT_LEN = 18;
 const int SimpleDateFormatTokenizer::DEFAULT_SHORT_DATE_TIME_FMT_LEN = 19;
 const int SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_FMT_LEN = 29;
 const int SimpleDateFormatTokenizer::FRACTIONAL_MAX_LEN = 9;
@@ -41,8 +43,10 @@ const int SimpleDateFormatTokenizer::FRACTIONAL_MAX_LEN = 9;
 DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_SHORT_DATE_TIME_CTX;
 DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_SHORT_ISO_DATE_TIME_CTX;
 DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_DATE_CTX;
+DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_TIME_CTX;
 DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_CTX[10];
 DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_ISO_DATE_TIME_CTX[10];
+DateTimeFormatContext SimpleDateFormatTokenizer::DEFAULT_TIME_FRAC_CTX[10];
 
 void SimpleDateFormatTokenizer::InitCtx() {
   if (initialized) return;
@@ -74,6 +78,17 @@ void SimpleDateFormatTokenizer::InitCtx() {
   DEFAULT_DATE_CTX.Reset("yyyy-MM-dd");
   Tokenize(&DEFAULT_DATE_CTX, PARSE);
 
+  // Setup the default short time context HH:mm:ss
+  DEFAULT_TIME_CTX.Reset("HH:mm:ss");
+  Tokenize(&DEFAULT_TIME_CTX, PARSE, true, true);
+
+  // Setup the default short time context with fractional seconds HH:mm:ss.SSSSSSSSS
+  for (int i = FRACTIONAL_MAX_LEN; i >= 0; --i) {
+    DEFAULT_TIME_FRAC_CTX[i].Reset(DATE_TIME_CTX_FMT + 11,
+        DEFAULT_TIME_FRAC_FMT_LEN - (FRACTIONAL_MAX_LEN - i));
+    Tokenize(&DEFAULT_TIME_FRAC_CTX[i], PARSE, true, true);
+  }
+
   // Flag that the parser is ready.
   initialized = true;
 }
@@ -97,7 +112,8 @@ bool SimpleDateFormatTokenizer::IsValidTZOffset(const char* str_begin,
 }
 
 bool SimpleDateFormatTokenizer::Tokenize(
-    DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool accept_time_toks) {
+    DateTimeFormatContext* dt_ctx, CastDirection cast_mode, bool accept_time_toks,
+    bool accept_time_toks_only) {
   DCHECK(dt_ctx != NULL);
   DCHECK(dt_ctx->fmt != NULL);
   DCHECK(dt_ctx->fmt_len > 0);
@@ -180,8 +196,8 @@ bool SimpleDateFormatTokenizer::Tokenize(
     }
     dt_ctx->toks.push_back(tok);
   }
-  if (cast_mode == PARSE) return (dt_ctx->has_date_toks);
-  return (dt_ctx->has_date_toks || dt_ctx->has_time_toks);
+  if (cast_mode == PARSE && !accept_time_toks_only) return dt_ctx->has_date_toks;
+  return dt_ctx->has_date_toks || dt_ctx->has_time_toks;
 }
 
 const char* SimpleDateFormatTokenizer::ParseDigitToken(const char* str,
@@ -340,12 +356,13 @@ bool SimpleDateFormatTokenizer::TokenizeByStr( DateTimeFormatContext* dt_ctx,
 }
 
 const DateTimeFormatContext* SimpleDateFormatTokenizer::GetDefaultFormatContext(
-    const char* str, int len, bool accept_time_toks) {
+    const char* str, int len, bool accept_time_toks, bool accept_time_toks_only) {
   DCHECK(initialized);
   DCHECK(str != nullptr);
   DCHECK(len > 0);
+  DCHECK(!accept_time_toks_only || accept_time_toks);
 
-  if (LIKELY(len >= DEFAULT_DATE_FMT_LEN)) {
+  if (LIKELY(len >= DEFAULT_TIME_FMT_LEN)) {
     // Check if this string starts with a date component
     if (str[4] == '-' && str[7] == '-') {
       // Do we have a date component only?
@@ -398,6 +415,13 @@ const DateTimeFormatContext* SimpleDateFormatTokenizer::GetDefaultFormatContext(
           break;
         }
       }
+    } else if (accept_time_toks_only && str[2] == ':' && str[5] == ':') {
+      if (len == DEFAULT_TIME_FMT_LEN) return &DEFAULT_TIME_CTX;
+      // There is only time component.
+      len = min(len, DEFAULT_TIME_FRAC_FMT_LEN);
+      if (len > DEFAULT_TIME_FMT_LEN && str[8] == '.') {
+        return &DEFAULT_TIME_FRAC_CTX[len - DEFAULT_TIME_FMT_LEN - 1];
+      }
     }
   }
   return nullptr;
diff --git a/be/src/runtime/datetime-simple-date-format-parser.h b/be/src/runtime/datetime-simple-date-format-parser.h
index 30b0812ff..2e0ce98bb 100644
--- a/be/src/runtime/datetime-simple-date-format-parser.h
+++ b/be/src/runtime/datetime-simple-date-format-parser.h
@@ -74,6 +74,8 @@ class SimpleDateFormatTokenizer {
 public:
   /// Constants to hold default format lengths.
   static const int DEFAULT_DATE_FMT_LEN;
+  static const int DEFAULT_TIME_FMT_LEN;
+  static const int DEFAULT_TIME_FRAC_FMT_LEN;
   static const int DEFAULT_SHORT_DATE_TIME_FMT_LEN;
   static const int DEFAULT_DATE_TIME_FMT_LEN;
   static const int FRACTIONAL_MAX_LEN;
@@ -85,7 +87,7 @@ public:
   /// cast_mode -- indicates if it is a 'datetime to string' or 'string to datetime' cast
   /// Return true if the parse was successful.
   static bool Tokenize(DateTimeFormatContext* dt_ctx, CastDirection cast_mode,
-      bool accept_time_toks = true);
+      bool accept_time_toks = true, bool accept_time_toks_only = false);
 
   /// Parse the date/time string to generate the DateTimeFormatToken required by
   /// DateTimeFormatContext. Similar to Tokenize() this function will take the string
@@ -106,10 +108,12 @@ public:
   /// len -- length of the string to parse (must be > 0)
   /// accept_time_toks -- if true, time tokens are accepted. Otherwise time tokens are
   /// rejected.
+  /// accept_time_toks_only -- if true, time tokens without date tokens are accepted.
+  /// Otherwise, they are rejected.
   /// Return the corresponding default format context if parsing succeeded, or nullptr
   /// otherwise.
   static const DateTimeFormatContext* GetDefaultFormatContext(const char* str, int len,
-      bool accept_time_toks);
+      bool accept_time_toks, bool accept_time_toks_only);
 
   /// Return default date/time format context for a timestamp parsing.
   /// If 'time' has a fractional seconds, context with pattern
@@ -134,8 +138,10 @@ private:
   static DateTimeFormatContext DEFAULT_SHORT_DATE_TIME_CTX;
   static DateTimeFormatContext DEFAULT_SHORT_ISO_DATE_TIME_CTX;
   static DateTimeFormatContext DEFAULT_DATE_CTX;
+  static DateTimeFormatContext DEFAULT_TIME_CTX;
   static DateTimeFormatContext DEFAULT_DATE_TIME_CTX[10];
   static DateTimeFormatContext DEFAULT_ISO_DATE_TIME_CTX[10];
+  static DateTimeFormatContext DEFAULT_TIME_FRAC_CTX[10];
 
   /// Checks if str_begin point to the beginning of a valid timezone offset.
   static bool IsValidTZOffset(const char* str_begin, const char* str_end);
diff --git a/be/src/runtime/timestamp-parse-util.cc b/be/src/runtime/timestamp-parse-util.cc
index 506b69f30..98339dba7 100644
--- a/be/src/runtime/timestamp-parse-util.cc
+++ b/be/src/runtime/timestamp-parse-util.cc
@@ -63,7 +63,8 @@ static bool IndicateTimestampParseFailure(date* d, time_duration* t) {
 }
 
 bool TimestampParser::ParseSimpleDateFormat(const char* str, int len,
-    boost::gregorian::date* d, boost::posix_time::time_duration* t) {
+    boost::gregorian::date* d, boost::posix_time::time_duration* t,
+    bool accept_time_toks_only) {
   DCHECK(d != nullptr);
   DCHECK(t != nullptr);
   if (UNLIKELY(str == nullptr)) return IndicateTimestampParseFailure(d, t);
@@ -100,7 +101,8 @@ bool TimestampParser::ParseSimpleDateFormat(const char* str, int len,
       SimpleDateFormatTokenizer::DEFAULT_DATE_TIME_FMT_LEN);
   // Determine the default formatting context that's required for parsing.
   const DateTimeFormatContext* dt_ctx =
-      SimpleDateFormatTokenizer::GetDefaultFormatContext(str, default_fmt_len, true);
+      SimpleDateFormatTokenizer::GetDefaultFormatContext(
+          str, default_fmt_len, true, accept_time_toks_only);
   if (dt_ctx != nullptr) {
     return ParseSimpleDateFormat(str, default_fmt_len, *dt_ctx, d, t);
   }
diff --git a/be/src/runtime/timestamp-parse-util.h b/be/src/runtime/timestamp-parse-util.h
index ad61a81c3..60eb8888a 100644
--- a/be/src/runtime/timestamp-parse-util.h
+++ b/be/src/runtime/timestamp-parse-util.h
@@ -39,13 +39,17 @@ class TimestampParser {
   /// date may be specified. All components are required in either the
   /// date or time except for the fractional seconds following the period. In the case
   /// of just a date, the time will be set to 00:00:00.
+  /// In case accept_time_toks_only=true, HH:mm:ss.SSSSSSSSS is also accepted and if
+  /// there is no date part in the string, the output date is set to invalid.
   /// str -- valid pointer to the string to parse
   /// len -- length of the string to parse (must be > 0)
   /// d -- the date value where the results of the parsing will be placed
   /// t -- the time value where the results of the parsing will be placed
+  /// accept_time_toks_only -- also accepts time of the day string without date part
   /// Returns true if the date/time was successfully parsed.
   static bool ParseSimpleDateFormat(const char* str, int len, boost::gregorian::date* d,
-      boost::posix_time::time_duration* t) WARN_UNUSED_RESULT;
+      boost::posix_time::time_duration* t,
+      bool accept_time_toks_only = false) WARN_UNUSED_RESULT;
 
   /// Parse a date/time string. The data must adhere to SimpleDateFormat, otherwise it
   /// will be rejected i.e. no missing tokens. In the case of just a date, the time will
diff --git a/common/function-registry/impala_functions.py b/common/function-registry/impala_functions.py
index dd81a692d..d6956aa60 100644
--- a/common/function-registry/impala_functions.py
+++ b/common/function-registry/impala_functions.py
@@ -133,9 +133,13 @@ visible_functions = [
   [['dayofyear'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions9DayOfYearEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
   [['week', 'weekofyear'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions10WeekOfYearEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
   [['hour'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions4HourEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+  [['hour'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions4HourEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['minute'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6MinuteEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+  [['minute'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions6MinuteEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['second'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6SecondEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+  [['second'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions6SecondEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['millisecond'], 'INT', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions11MillisecondEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
+  [['millisecond'], 'INT', ['STRING'], '_ZN6impala18TimestampFunctions11MillisecondEPN10impala_udf15FunctionContextERKNS1_9StringValE'],
   [['to_date'], 'STRING', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions6ToDateEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
   [['dayname'], 'STRING', ['TIMESTAMP'], '_ZN6impala18TimestampFunctions11LongDayNameEPN10impala_udf15FunctionContextERKNS1_12TimestampValE'],
   [['date_trunc'], 'TIMESTAMP', ['STRING', 'TIMESTAMP'],
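
As a rough illustration of the time-only branch added to GetDefaultFormatContext() above, here is a minimal Python sketch. It is not Impala's implementation: parse_time_of_day and TimeOfDay are hypothetical names, and rejecting a bare trailing "." or more than nine fractional digits is an assumption of this sketch rather than something the patch states.

```python
from typing import NamedTuple, Optional

TIME_LEN = 8          # len("HH:mm:ss")
MAX_FRAC_DIGITS = 9   # up to nanosecond precision, as in HH:mm:ss.SSSSSSSSS


class TimeOfDay(NamedTuple):
    """Hypothetical result type; not part of Impala."""
    hour: int
    minute: int
    second: int
    nanos: int

    @property
    def millisecond(self) -> int:
        return self.nanos // 1_000_000


def parse_time_of_day(s: Optional[str]) -> Optional[TimeOfDay]:
    """Return the parsed time, or None (standing in for SQL NULL) if invalid."""
    if s is None or len(s) < TIME_LEN:
        return None
    # The separators must sit at fixed offsets, as in the HH:mm:ss layout.
    if s[2] != ":" or s[5] != ":":
        return None
    fields = (s[0:2], s[3:5], s[6:8])
    if not all(f.isascii() and f.isdigit() for f in fields):
        return None
    hour, minute, second = (int(f) for f in fields)
    if hour > 23 or minute > 59 or second > 59:
        return None
    nanos = 0
    if len(s) > TIME_LEN:
        frac = s[TIME_LEN + 1:]
        # Assumption of this sketch: require ".d" through ".ddddddddd".
        if s[TIME_LEN] != "." or not (1 <= len(frac) <= MAX_FRAC_DIGITS):
            return None
        if not (frac.isascii() and frac.isdigit()):
            return None
        # Right-pad to nine digits so ".13" means 130 ms, not 13 ns.
        nanos = int(frac.ljust(MAX_FRAC_DIGITS, "0"))
    return TimeOfDay(hour, minute, second, nanos)
```

With these assumptions, parse_time_of_day("09:10:11.123456").millisecond is 123, while inputs such as "09:10:1", "838:59:59", "09:10:11pm" and "24:11:11.123" yield None, in line with the cases exercised in expr-test.cc.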

[impala] 09/17: IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 555370e0caf89cf9c78fb962b0532161603a55c9
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Jan 17 16:27:44 2023 +0800

    IMPALA-11845: Fix incorrect check of struct STAR path in resolvePathWithMasking
    
    resolvePathWithMasking() is a wrapper around resolvePath() that further
    resolves nested columns inside the table masking view. When it was
    added, complex types in the select list were not yet supported, so
    the table masking view can't expose complex type columns directly in
    the select list. Any path into a nested type is further resolved inside
    the table masking view by resolvePathWithMasking().
    
    Take the following query as an example:
      select id, nested_struct.* from complextypestbl;
    If Ranger column-masking/row-filter policies are applied on the table,
    the query is rewritten as
      select id, nested_struct.* from (
        select mask(id) from complextypestbl
        where row-filtering-condition
      ) t;
    Table masking view "t" can't expose the nested column "nested_struct".
    So we further resolve "nested_struct" inside the inlineView to use the
    masked table "complextypestbl". The underlying TableRef is expected to
    be a BaseTableRef.
    
    Paths that don't reference nested columns should be resolved and
    returned directly (just like the original resolvePath() does). E.g.
      select v.* from masked_view v
    is rewritten to
      select v.* from (
        select mask(c1), mask(c2), ..., mask(cn)
        from masked_view
        where row-filtering-condition
      ) v;
    
    The STAR path "v.*" should be resolved directly. However, it is
    unexpectedly treated as a nested column. The code then tries to resolve
    it inside the table "masked_view", finds that "masked_view" is not a
    table, and throws an IllegalStateException.
    
    These are the current conditions for identifying nested STAR paths:
     - The destType is STRUCT
     - And the resolved path is rooted at a valid tuple descriptor
    
    These conditions don't actually single out nested struct columns,
    because STAR paths on a table/view also match them. When the STAR path
    is an expansion of a catalog table/view, the root tuple descriptor is
    exactly the output tuple of the table/view, and the destType is the type
    of that tuple descriptor, which is always a StructType.
    
    Note that STAR paths on other nested types, i.e. array/map, are invalid.
    So the first condition matches for all valid cases. The second condition
    also matches all valid cases since both the table/view and struct STAR
    expansion have the path rooted at a valid tuple descriptor.
    
    This patch fixes the check for nested struct STAR path by checking
    the matched types instead. Note that if "v.*" is a table/view expansion,
    the matched type list is empty. If "v.*" is a struct column expansion,
    the matched type list contains the STRUCT column type.
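
The difference between the old and new checks can be modelled with a small Python sketch. This is only an illustration under stated assumptions: ResolvedStarPath and both helper functions are hypothetical stand-ins, not the Java classes touched by this patch.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ResolvedStarPath:
    """Hypothetical stand-in for a resolved STAR path such as "v.*"."""
    dest_is_struct: bool
    rooted_at_tuple: bool
    matched_types: List[str] = field(default_factory=list)


def is_nested_star_old(path: ResolvedStarPath) -> bool:
    # Old check: a struct destType plus a path rooted at a tuple descriptor.
    # Table/view expansions satisfy both conditions as well.
    return path.dest_is_struct and path.rooted_at_tuple


def is_nested_star_new(path: ResolvedStarPath) -> bool:
    # New check: only a struct column expansion has matched types.
    return len(path.matched_types) > 0


# "v.*" where v is a table/view: no matched types.
view_star = ResolvedStarPath(True, True, [])
# "nested_struct.*": the matched type list holds the STRUCT column type.
struct_star = ResolvedStarPath(True, True, ["STRUCT"])
```

In this model the old check flags both paths as nested, which corresponds to the misclassification described above, while the new check flags only struct_star.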
    
    Tests:
     - Add missing coverage on STAR paths (v.*) on masked views.
    
    Backport Note for 4.1.2:
     - Removed the test on EXPAND_COMPLEX_TYPES which is not supported.
    
    Change-Id: I8f1e78e325baafbe23101909d47e82bf140a2d77
    Reviewed-on: http://gerrit.cloudera.org:8080/19429
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../java/org/apache/impala/analysis/Analyzer.java  |   8 +-
 .../main/java/org/apache/impala/analysis/Path.java |   5 +-
 .../queries/QueryTest/ranger_column_masking.test   |  46 +++++++++
 .../ranger_column_masking_complex_types.test       | 105 ++++++++++++++++++++-
 4 files changed, 159 insertions(+), 5 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
index cfb5f2a05..a76c2e8be 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
@@ -1165,9 +1165,11 @@ public class Analyzer {
         return resolvedPath;
       }
     } else if (pathType == PathType.STAR) {
-      if (!resolvedPath.destType().isStructType() || !resolvedPath.isRootedAtTuple()) {
-        return resolvedPath;
-      }
+      // For "v.*", return directly if "v" is a table/view.
+      // If it's a struct column, we need to further resolve it if it's now resolved on a
+      // table masking view.
+      if (resolvedPath.getMatchedTypes().isEmpty()) return resolvedPath;
+      Preconditions.checkState(resolvedPath.destType().isStructType());
     }
     // In this case, resolvedPath is resolved on a nested column. Check if it's resolved
     // on a table masking view. The root TableRef(table/view) could be at a parent query
diff --git a/fe/src/main/java/org/apache/impala/analysis/Path.java b/fe/src/main/java/org/apache/impala/analysis/Path.java
index 6e3fd2d3b..53a2bc939 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Path.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Path.java
@@ -131,6 +131,7 @@ public class Path {
 
   // Registered table alias that this path is rooted at, if any.
   // Null if the path is rooted at a catalog table/view.
+  // Note that this is also set for STAR path of "v.*" when "v" is a catalog table/view.
   private final TupleDescriptor rootDesc_;
 
   // Catalog table that this resolved path is rooted at, if any.
@@ -142,7 +143,9 @@ public class Path {
   private final Path rootPath_;
 
   // List of matched types and field positions set during resolution. The matched
-  // types/positions describe the physical path through the schema tree.
+  // types/positions describe the physical path through the schema tree of a catalog
+  // table/view. Empty if the path corresponds to a catalog table/view, e.g. when it's a
+  // TABLE_REF or when it's a STAR on a table/view.
   private final List<Type> matchedTypes_ = new ArrayList<>();
   private final List<Integer> matchedPositions_ = new ArrayList<>();
 
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
index 1f0dac113..0c2671892 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
@@ -446,6 +446,52 @@ order by id limit 10
 STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
 ====
 ---- QUERY
+# Test on star select with table ref
+select v.* from functional.alltypes_view v
+order by id limit 10
+---- RESULTS
+0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1
+100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1
+200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1
+300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1
+400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1
+500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1
+600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1
+700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1
+800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1
+900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+select v.* from functional.alltypes_view v, functional.alltypestiny t where v.id = t.id
+---- RESULTS
+0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1
+100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1
+200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1
+300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1
+400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1
+500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1
+600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1
+700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+select v.id, v.*, t.id from functional.alltypes_view v, functional.alltypestiny t where v.id = t.id
+---- RESULTS
+0,0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,0
+100,100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,100
+200,200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,200
+300,300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,300
+400,400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,400
+500,500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,500
+600,600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,600
+700,700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,700
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Test on local view (CTE). Correctly ignore masking on local view names so the result
 # won't be 100 (affected by policy id => id * 100).
 use functional;
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
index 3f9fd0606..e088aeb12 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
@@ -45,7 +45,7 @@ select id, nested_struct.a from complextypestbl
 BIGINT,INT
 ====
 ---- QUERY
-# Test resolving nested columns in expending star expression.
+# Test resolving nested columns in expanding star expression.
 select id, nested_struct.* from complextypestbl
 ---- RESULTS
 100,1
@@ -60,6 +60,109 @@ select id, nested_struct.* from complextypestbl
 BIGINT,INT
 ====
 ---- QUERY
+# Test resolving nested columns in expanding star expression.
+select nested_struct.* from complextypestbl
+---- RESULTS
+1
+NULL
+NULL
+NULL
+NULL
+NULL
+7
+-1
+---- TYPES
+INT
+====
+---- QUERY
+# Test resolving explicit STAR path on a nested struct column inside array
+select id, nested_arr.item.*
+from functional_parquet.complextypestbl t,
+  t.nested_struct.c.d arr,
+  arr.item nested_arr;
+---- RESULTS
+100,10,'aaa'
+100,-10,'bbb'
+100,11,'c'
+200,NULL,'NULL'
+200,10,'aaa'
+200,NULL,'NULL'
+200,-10,'bbb'
+200,NULL,'NULL'
+200,11,'c'
+200,NULL,'NULL'
+700,NULL,'NULL'
+800,-1,'nonnullable'
+---- TYPES
+BIGINT,INT,STRING
+====
+---- QUERY
+# Test resolving explicit STAR path on a nested struct column inside array
+select nested_arr.item.*
+from functional_parquet.complextypestbl t,
+  t.nested_struct.c.d arr,
+  arr.item nested_arr;
+---- RESULTS
+10,'aaa'
+-10,'bbb'
+11,'c'
+NULL,'NULL'
+10,'aaa'
+NULL,'NULL'
+-10,'bbb'
+NULL,'NULL'
+11,'c'
+NULL,'NULL'
+NULL,'NULL'
+-1,'nonnullable'
+---- TYPES
+INT,STRING
+====
+---- QUERY
+# Test resolving implicit STAR path on a nested struct column inside array
+select id, nested_arr.*
+from functional_parquet.complextypestbl t,
+  t.nested_struct.c.d arr,
+  arr.item nested_arr;
+---- RESULTS
+100,10,'aaa'
+100,-10,'bbb'
+100,11,'c'
+200,NULL,'NULL'
+200,10,'aaa'
+200,NULL,'NULL'
+200,-10,'bbb'
+200,NULL,'NULL'
+200,11,'c'
+200,NULL,'NULL'
+700,NULL,'NULL'
+800,-1,'nonnullable'
+---- TYPES
+BIGINT,INT,STRING
+====
+---- QUERY
+# Test resolving implicit STAR path on a nested struct column inside array
+select nested_arr.*
+from functional_parquet.complextypestbl t,
+  t.nested_struct.c.d arr,
+  arr.item nested_arr;
+---- RESULTS
+10,'aaa'
+-10,'bbb'
+11,'c'
+NULL,'NULL'
+10,'aaa'
+NULL,'NULL'
+-10,'bbb'
+NULL,'NULL'
+11,'c'
+NULL,'NULL'
+NULL,'NULL'
+-1,'nonnullable'
+---- TYPES
+INT,STRING
+====
+---- QUERY
 # Test resolving nested column in function.
 select count(id), count(nested_struct.a) from complextypestbl
 ---- RESULTS

[impala] 14/17: IMPALA-11751: Template tuple of Avro header should be transferred to ScanRangeSharedState

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit f3f0293df4c67bea7fdc136469d6835729ddee66
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Nov 29 18:53:17 2022 +0800

    IMPALA-11751: Template tuple of Avro header should be transferred to ScanRangeSharedState
    
    Sequence container based file formats (SequenceFile, RCFile, Avro) have
    a file header in each file that describes the metadata of the file, e.g.
    codec, default values, etc. The header should be decoded before reading
    the file content. The initial scanners will read the header and then
    issue follow-up scan ranges for the file content. The decoded header
    will be referenced by follow-up scanners.
    
    Since IMPALA-9655, when MT_DOP > 1, the issued scan ranges could be
    scheduled to other scan node instances. So the header resource should
    live until all scan node instances close. Header objects are owned by
    the object pool of the RuntimeState, which meets the requirement.
    
    AvroFileHeader differs from the other headers in that it references a
    template tuple which contains the partition values and default values
    for missing fields. The template tuple is initially owned by the header
    scanner, then transferred to the scan node before the scanner closes.
    However, when the scan node instance closes, the template tuple is
    freed. Scanners of other scan node instances might still depend on it.
    This could cause wrong results or crash the impalad.
    
    When partition columns are used in the query, or when the underlying
    avro files have missing fields and the table schema has default values
    for them, the AvroFileHeader will have a non-null template tuple, which
    could hit this bug when MT_DOP>1.
    
    This patch fixes the bug by transferring the template tuple to
    ScanRangeSharedState directly. The scan_node_pool of HdfsScanNodeBase is
    also removed since it's only used to hold the template tuple (and
    related buffers) of the avro header. There is also no need to override
    TransferToScanNodePool in HdfsScanNode, since its original purpose was to
    protect the pool with a lock, and the new method in ScanRangeSharedState
    already takes a lock.
    
    Tests
     - Add missing test coverage for compute stats on avro tables. Note that
       MT_DOP=4 is set by default for compute stats.
     - Add the MT_DOP dimension for TestScannersAllTableFormats. Also add
       some queries that can reveal the bug in scanners.test. Without this fix,
       the ASAN build can easily crash with a heap-use-after-free error.
     - Ran exhaustive tests.
    
    Backport Notes:
     - Trivial conflicts in hdfs-scan-node-base.h and hdfs-scan-node-base.cc
       due to missing iceberg_partition_filtering_pool_ and
       HasVirtualColumnInTemplateTuple().
    
    Change-Id: Iafa43fce7c2ffdc867004d11e5873327c3d8cb42
    Reviewed-on: http://gerrit.cloudera.org:8080/19289
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/base-sequence-scanner.cc               |   2 +-
 be/src/exec/grouping-aggregator.cc                 |   1 -
 be/src/exec/hdfs-avro-scanner.cc                   |   4 +-
 be/src/exec/hdfs-avro-scanner.h                    |   5 +-
 be/src/exec/hdfs-scan-node-base.cc                 |  12 +-
 be/src/exec/hdfs-scan-node-base.h                  |  11 +-
 be/src/exec/hdfs-scan-node.cc                      |   5 -
 be/src/exec/hdfs-scan-node.h                       |   3 -
 be/src/exec/streaming-aggregation-node.h           |   4 +-
 .../queries/QueryTest/compute-stats-avro.test      | 151 +++++++++++++++++++++
 .../queries/QueryTest/scanners.test                |  56 ++++++++
 tests/query_test/test_scanners.py                  |   3 +
 12 files changed, 231 insertions(+), 26 deletions(-)

diff --git a/be/src/exec/base-sequence-scanner.cc b/be/src/exec/base-sequence-scanner.cc
index ae66d0e01..9bb94929e 100644
--- a/be/src/exec/base-sequence-scanner.cc
+++ b/be/src/exec/base-sequence-scanner.cc
@@ -169,7 +169,7 @@ Status BaseSequenceScanner::GetNextInternal(RowBatch* row_batch) {
       return Status::OK();
     }
     // Header is parsed, set the metadata in the scan node and issue more ranges.
-    static_cast<HdfsScanNodeBase*>(scan_node_)->SetFileMetadata(
+    scan_node_->SetFileMetadata(
         context_->partition_descriptor()->id(), stream_->filename(), header_);
     const HdfsFileDesc* desc = scan_node_->GetFileDesc(
         context_->partition_descriptor()->id(), stream_->filename());
diff --git a/be/src/exec/grouping-aggregator.cc b/be/src/exec/grouping-aggregator.cc
index 40471892e..75425d599 100644
--- a/be/src/exec/grouping-aggregator.cc
+++ b/be/src/exec/grouping-aggregator.cc
@@ -475,7 +475,6 @@ Status GroupingAggregator::AddBatch(RuntimeState* state, RowBatch* batch) {
 
 Status GroupingAggregator::AddBatchStreaming(
     RuntimeState* state, RowBatch* out_batch, RowBatch* child_batch, bool* eos) {
-  RETURN_IF_ERROR(QueryMaintenance(state));
   SCOPED_TIMER(streaming_timer_);
   RETURN_IF_ERROR(QueryMaintenance(state));
   num_input_rows_ += child_batch->num_rows();
diff --git a/be/src/exec/hdfs-avro-scanner.cc b/be/src/exec/hdfs-avro-scanner.cc
index 3cd20ba6e..850a88633 100644
--- a/be/src/exec/hdfs-avro-scanner.cc
+++ b/be/src/exec/hdfs-avro-scanner.cc
@@ -120,7 +120,7 @@ Status HdfsAvroScanner::ReadFileHeader() {
 
   // Transfer ownership so the memory remains valid for subsequent scanners that process
   // the data portions of the file.
-  scan_node_->TransferToScanNodePool(template_tuple_pool_.get());
+  scan_node_->TransferToSharedStatePool(template_tuple_pool_.get());
   return Status::OK();
 }
 
@@ -289,6 +289,8 @@ Status HdfsAvroScanner::ResolveSchemas(const AvroSchemaElement& table_root,
 
 Status HdfsAvroScanner::WriteDefaultValue(
     SlotDescriptor* slot_desc, avro_datum_t default_value, const char* field_name) {
+  // avro_header could have null template_tuple here if no partition columns are
+  // materialized and no default values are set yet.
   if (avro_header_->template_tuple == nullptr) {
     if (template_tuple_ != nullptr) {
       avro_header_->template_tuple = template_tuple_;
diff --git a/be/src/exec/hdfs-avro-scanner.h b/be/src/exec/hdfs-avro-scanner.h
index 01220eee0..ee5b96579 100644
--- a/be/src/exec/hdfs-avro-scanner.h
+++ b/be/src/exec/hdfs-avro-scanner.h
@@ -121,8 +121,9 @@ class HdfsAvroScanner : public BaseSequenceScanner {
     /// Set to nullptr if there are no materialized partition keys and no default values
     /// are necessary (i.e., all materialized fields are present in the file schema).
     /// This tuple is created by the scanner processing the initial scan range with
-    /// the header. The ownership of memory is transferred to the scan-node pool,
-    /// such that it remains live when subsequent scanners process data ranges.
+    /// the header. The ownership of memory is transferred to the template pool of
+    /// ScanRangeSharedState, such that it remains live when subsequent scanners process
+    /// data ranges.
     Tuple* template_tuple;
 
     /// True if this file can use the codegen'd version of DecodeAvroData() (i.e. its
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 019467732..be60f33b5 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -465,7 +465,6 @@ Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
   }
 
   // One-time initialization of state that is constant across scan ranges
-  scan_node_pool_.reset(new MemPool(mem_tracker()));
   runtime_profile()->AddInfoString("Table Name", hdfs_table_->fully_qualified_name());
 
   if (HasRowBatchQueue()) {
@@ -627,8 +626,6 @@ void HdfsScanNodeBase::Close(RuntimeState* state) {
   // There should be no active hdfs read threads.
   DCHECK_EQ(active_hdfs_read_thread_counter_.value(), 0);
 
-  if (scan_node_pool_.get() != NULL) scan_node_pool_->FreeAll();
-
   // Close collection conjuncts
   for (auto& tid_conjunct_eval : conjunct_evals_map_) {
     // ExecNode::conjunct_evals_ are already closed in ExecNode::Close()
@@ -988,8 +985,8 @@ void HdfsScanPlanNode::ComputeSlotMaterializationOrder(
   }
 }
 
-void HdfsScanNodeBase::TransferToScanNodePool(MemPool* pool) {
-  scan_node_pool_->AcquireData(pool, false);
+void HdfsScanNodeBase::TransferToSharedStatePool(MemPool* pool) {
+  shared_state_->TransferToSharedStatePool(pool);
 }
 
 void HdfsScanNodeBase::UpdateHdfsSplitStats(
@@ -1193,6 +1190,11 @@ Tuple* ScanRangeSharedState::GetTemplateTupleForPartitionId(int64_t partition_id
   return partition_template_tuple_map_[partition_id];
 }
 
+void ScanRangeSharedState::TransferToSharedStatePool(MemPool* pool) {
+  unique_lock<mutex> l(metadata_lock_);
+  template_pool_->AcquireData(pool, false);
+}
+
 void ScanRangeSharedState::UpdateRemainingScanRangeSubmissions(int32_t delta) {
   int new_val = remaining_scan_range_submissions_.Add(delta);
   DCHECK_GE(new_val, 0);
diff --git a/be/src/exec/hdfs-scan-node-base.h b/be/src/exec/hdfs-scan-node-base.h
index 7f5643756..539eb073a 100644
--- a/be/src/exec/hdfs-scan-node-base.h
+++ b/be/src/exec/hdfs-scan-node-base.h
@@ -191,6 +191,9 @@ class ScanRangeSharedState {
   /// cancellation. Must be called before adding or removing scan ranges to the queue.
   void AddCancellationHook(RuntimeState* state);
 
+  /// Transfers all memory from 'pool' to 'template_pool_'.
+  void TransferToSharedStatePool(MemPool* pool);
+
  private:
   friend class HdfsScanPlanNode;
 
@@ -602,8 +605,8 @@ class HdfsScanNodeBase : public ScanNode {
         && (IsZeroSlotTableScan() || optimize_count_star());
   }
 
-  /// Transfers all memory from 'pool' to 'scan_node_pool_'.
-  virtual void TransferToScanNodePool(MemPool* pool);
+  /// Transfers all memory from 'pool' to shared state of all scanners.
+  void TransferToSharedStatePool(MemPool* pool);
 
   /// map from volume id to <number of split, per volume split lengths>
   typedef boost::unordered_map<int32_t, std::pair<int, int64_t>> PerVolumeStats;
@@ -780,10 +783,6 @@ class HdfsScanNodeBase : public ScanNode {
   /// taken where there are i concurrent hdfs read thread running. Created in Open().
   std::vector<RuntimeProfile::Counter*>* hdfs_read_thread_concurrency_bucket_ = nullptr;
 
-  /// Pool for allocating some amounts of memory that is shared between scanners.
-  /// e.g. partition key tuple and their string buffers
-  boost::scoped_ptr<MemPool> scan_node_pool_;
-
   /// Status of failed operations.  This is set in the ScannerThreads
   /// Returned in GetNext() if an error occurred.  An non-ok status triggers cleanup
   /// scanner threads.
diff --git a/be/src/exec/hdfs-scan-node.cc b/be/src/exec/hdfs-scan-node.cc
index d9d7985f2..055d1080a 100644
--- a/be/src/exec/hdfs-scan-node.cc
+++ b/be/src/exec/hdfs-scan-node.cc
@@ -197,11 +197,6 @@ void HdfsScanNode::RangeComplete(const THdfsFileFormat::type& file_type,
   HdfsScanNodeBase::RangeComplete(file_type, compression_type, skipped);
 }
 
-void HdfsScanNode::TransferToScanNodePool(MemPool* pool) {
-  unique_lock<timed_mutex> l(lock_);
-  HdfsScanNodeBase::TransferToScanNodePool(pool);
-}
-
 void HdfsScanNode::AddMaterializedRowBatch(unique_ptr<RowBatch> row_batch) {
   InitNullCollectionValues(row_batch.get());
   thread_state_.EnqueueBatch(move(row_batch));
diff --git a/be/src/exec/hdfs-scan-node.h b/be/src/exec/hdfs-scan-node.h
index def5d8b95..db6a8f663 100644
--- a/be/src/exec/hdfs-scan-node.h
+++ b/be/src/exec/hdfs-scan-node.h
@@ -104,9 +104,6 @@ class HdfsScanNode : public HdfsScanNodeBase {
       const std::vector<THdfsCompression::type>& compression_type, bool skipped = false)
       override;
 
-  /// Transfers all memory from 'pool' to 'scan_node_pool_'.
-  virtual void TransferToScanNodePool(MemPool* pool) override;
-
   virtual ExecutionModel getExecutionModel() const override {
     return NON_TASK_BASED_SYNC;
   }
diff --git a/be/src/exec/streaming-aggregation-node.h b/be/src/exec/streaming-aggregation-node.h
index 6af70b075..e1427719b 100644
--- a/be/src/exec/streaming-aggregation-node.h
+++ b/be/src/exec/streaming-aggregation-node.h
@@ -34,11 +34,11 @@ class RuntimeState;
 /// aggregate the rows into its hash table, but if there is not enough memory available or
 /// if the reduction from the aggregation is not very good, it will 'stream' the rows
 /// through and return them without aggregating them instead of spilling. After all of the
-/// input as been processed from child(0), subsequent calls to GetNext() will return any
+/// input has been processed from child(0), subsequent calls to GetNext() will return any
 /// rows that were aggregated in the Aggregator's hash table.
 ///
 /// Since the rows returned by GetNext() may be only partially aggregated if there are
-/// memory contraints, this is a preliminary aggregation step that functions as an
+/// memory constraints, this is a preliminary aggregation step that functions as an
 /// optimization and will always be followed in the plan by an AggregationNode that does
 /// the final aggregation.
 ///
diff --git a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
index 2d6d19018..387fd9bed 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
@@ -53,6 +53,157 @@ COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE, #TRUES, #FALSES
 STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE, BIGINT, BIGINT
 ====
 ---- QUERY
+# Non-empty Avro table with matching column definitions and Avro schema
+create external table avro_hive_alltypes_ext
+like functional_avro_snap.alltypes;
+alter table avro_hive_alltypes_ext
+set location '/test-warehouse/alltypes_avro_snap';
+alter table avro_hive_alltypes_ext recover partitions;
+compute stats avro_hive_alltypes_ext;
+---- RESULTS
+'Updated 24 partition(s) and 11 column(s).'
+---- TYPES
+STRING
+====
+---- QUERY
+show table stats avro_hive_alltypes_ext
+---- LABELS
+YEAR, MONTH, #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION, FORMAT, INCREMENTAL STATS, LOCATION
+---- RESULTS
+'2009','1',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','2',280,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','3',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','4',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','5',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','6',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','7',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','8',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','9',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','10',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','11',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','12',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','1',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','2',280,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','3',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','4',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','5',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','6',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','7',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','8',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','9',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','10',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','11',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','12',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'Total','',7300,24,regex:.*,'0B','','','',''
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+show column stats avro_hive_alltypes_ext
+---- LABELS
+COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE, #TRUES, #FALSES
+---- RESULTS
+'id','INT',7300,0,4,4,-1,-1
+'bool_col','BOOLEAN',2,0,1,1,3650,3650
+'tinyint_col','INT',10,0,4,4,-1,-1
+'smallint_col','INT',10,0,4,4,-1,-1
+'int_col','INT',10,0,4,4,-1,-1
+'bigint_col','BIGINT',10,0,8,8,-1,-1
+'float_col','FLOAT',10,0,4,4,-1,-1
+'double_col','DOUBLE',10,0,8,8,-1,-1
+'date_string_col','STRING',736,0,8,8,-1,-1
+'string_col','STRING',10,0,1,1,-1,-1
+'timestamp_col','STRING',7224,0,22,21.66438293457031,-1,-1
+'year','INT',2,0,4,4,-1,-1
+'month','INT',12,0,4,4,-1,-1
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE, BIGINT, BIGINT
+====
+---- QUERY
+# Non-empty Avro table with matching column definitions and Avro schema, but with
+# different partition schema. Note that we use INT for tinyint_col and smallint_col,
+# and STRING for timestamp_col. See HIVE_TO_AVRO_TYPE_MAP in
+# testdata/bin/generate-schema-statements.py
+create external table avro_hive_alltypes_str_part (
+  id int,
+  bool_col boolean,
+  tinyint_col int,
+  smallint_col int,
+  int_col int,
+  bigint_col bigint,
+  float_col float,
+  double_col double,
+  date_string_col string,
+  string_col string,
+  timestamp_col string
+) partitioned by (
+  year string,
+  month string
+)
+stored as avro
+location '/test-warehouse/alltypes_avro_snap';
+alter table avro_hive_alltypes_str_part recover partitions;
+compute stats avro_hive_alltypes_str_part;
+---- RESULTS
+'Updated 24 partition(s) and 11 column(s).'
+---- TYPES
+STRING
+====
+---- QUERY
+show table stats avro_hive_alltypes_str_part
+---- LABELS
+YEAR, MONTH, #ROWS, #FILES, SIZE, BYTES CACHED, CACHE REPLICATION, FORMAT, INCREMENTAL STATS, LOCATION
+---- RESULTS
+'2009','1',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','2',280,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','3',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','4',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','5',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','6',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','7',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','8',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','9',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','10',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','11',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2009','12',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','1',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','2',280,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','3',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','4',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','5',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','6',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','7',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','8',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','9',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','10',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','11',300,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'2010','12',310,1,regex:.*,'NOT CACHED','NOT CACHED','AVRO','false',regex:.*
+'Total','',7300,24,regex:.*,'0B','','','',''
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, STRING, STRING, STRING, STRING, STRING, STRING
+====
+---- QUERY
+show column stats avro_hive_alltypes_str_part
+---- LABELS
+COLUMN, TYPE, #DISTINCT VALUES, #NULLS, MAX SIZE, AVG SIZE, #TRUES, #FALSES
+---- RESULTS
+'id','INT',7300,0,4,4,-1,-1
+'bool_col','BOOLEAN',2,0,1,1,3650,3650
+'tinyint_col','INT',10,0,4,4,-1,-1
+'smallint_col','INT',10,0,4,4,-1,-1
+'int_col','INT',10,0,4,4,-1,-1
+'bigint_col','BIGINT',10,0,8,8,-1,-1
+'float_col','FLOAT',10,0,4,4,-1,-1
+'double_col','DOUBLE',10,0,8,8,-1,-1
+'date_string_col','STRING',736,0,8,8,-1,-1
+'string_col','STRING',10,0,1,1,-1,-1
+'timestamp_col','STRING',7224,0,22,21.66438293457031,-1,-1
+'year','STRING',2,0,-1,-1,-1,-1
+'month','STRING',12,0,-1,-1,-1,-1
+---- TYPES
+STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE, BIGINT, BIGINT
+====
+---- QUERY
 # Avro table with an extra column definition.
 compute stats avro_hive_alltypes_extra_coldef
 ---- RESULTS
diff --git a/testdata/workloads/functional-query/queries/QueryTest/scanners.test b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
index 002d5d0d9..8dd741f67 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/scanners.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/scanners.test
@@ -242,3 +242,59 @@ aggregation(SUM, RowsRead): 100
 aggregation(SUM, RowsRead): 0
 aggregation(SUM, RowsReturned): 200
 ====
+---- QUERY
+select year, count(*) from alltypes group by year
+---- RESULTS
+2009,3650
+2010,3650
+---- TYPES
+INT, BIGINT
+====
+---- QUERY
+select month, count(*) from alltypes group by month
+---- RESULTS
+1,620
+2,560
+3,620
+4,600
+5,620
+6,600
+7,620
+8,620
+9,600
+10,620
+11,600
+12,620
+---- TYPES
+INT, BIGINT
+====
+---- QUERY
+select year, month, count(*) from alltypes group by year, month
+---- RESULTS
+2009,1,310
+2009,2,280
+2009,3,310
+2009,4,300
+2009,5,310
+2009,6,300
+2009,7,310
+2009,8,310
+2009,9,300
+2009,10,310
+2009,11,300
+2009,12,310
+2010,1,310
+2010,2,280
+2010,3,310
+2010,4,300
+2010,5,310
+2010,6,300
+2010,7,310
+2010,8,310
+2010,9,300
+2010,10,310
+2010,11,300
+2010,12,310
+---- TYPES
+INT, INT, BIGINT
+====
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index ca15d43a2..6aea011a5 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -91,12 +91,14 @@ class TestScannersAllTableFormats(ImpalaTestSuite):
         ImpalaTestDimension('batch_size', *TestScannersAllTableFormats.BATCH_SIZES))
     cls.ImpalaTestMatrix.add_dimension(
         ImpalaTestDimension('debug_action', *DEBUG_ACTION_DIMS))
+    cls.ImpalaTestMatrix.add_dimension(ImpalaTestDimension('mt_dop', *MT_DOP_VALUES))
 
   def test_scanners(self, vector):
     new_vector = deepcopy(vector)
     # Copy over test dimensions to the matching query options.
     new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size')
     new_vector.get_value('exec_option')['debug_action'] = vector.get_value('debug_action')
+    new_vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')
     self.run_test_case('QueryTest/scanners', new_vector)
 
   def test_many_nulls(self, vector):
@@ -107,6 +109,7 @@ class TestScannersAllTableFormats(ImpalaTestSuite):
     new_vector = deepcopy(vector)
     new_vector.get_value('exec_option')['batch_size'] = vector.get_value('batch_size')
     new_vector.get_value('exec_option')['debug_action'] = vector.get_value('debug_action')
+    new_vector.get_value('exec_option')['mt_dop'] = vector.get_value('mt_dop')
     self.run_test_case('QueryTest/scanners-many-nulls', new_vector)
 
   def test_hdfs_scanner_profile(self, vector):
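
The ownership change described in the IMPALA-11751 commit message can be pictured with a small toy model. This is only a sketch in Python, not Impala's C++ code: the classes MemPool, SharedState and ScanNodeInstance below are hypothetical stand-ins for MemPool, ScanRangeSharedState and a scan node instance, and the "template tuple" is just a dict. It assumes the shared state outlives every scan node instance, as the commit message states.

```python
import threading


class MemPool:
    """Toy stand-in for a memory pool: it simply owns a list of buffers."""

    def __init__(self):
        self.buffers = []

    def acquire_data(self, src):
        # Take ownership of everything in 'src', leaving 'src' empty.
        self.buffers.extend(src.buffers)
        src.buffers = []


class SharedState:
    """Hypothetical analogue of the state shared by all scan node instances."""

    def __init__(self):
        self._lock = threading.Lock()
        self.template_pool = MemPool()

    def transfer_to_shared_state_pool(self, pool):
        # Several instances may transfer concurrently, so guard the pool.
        with self._lock:
            self.template_pool.acquire_data(pool)


class ScanNodeInstance:
    """Hypothetical scan node instance that holds no template memory itself."""

    def __init__(self, shared_state):
        self.shared_state = shared_state

    def read_header(self, template_tuple):
        # The header scanner allocates the tuple in its own pool first,
        # then hands the memory to the shared state before it closes.
        scanner_pool = MemPool()
        scanner_pool.buffers.append(template_tuple)
        self.shared_state.transfer_to_shared_state_pool(scanner_pool)

    def close(self):
        # Closing an instance no longer frees any template tuple memory.
        pass


shared = SharedState()
first = ScanNodeInstance(shared)
second = ScanNodeInstance(shared)

first.read_header({"year": 2010, "month": 12})
first.close()

# A follow-up scanner on another instance can still see the tuple.
seen_by_second = second.shared_state.template_pool.buffers[0]
```

In this model the tuple survives first.close() because it is owned by the shared state rather than by the instance that read the header, which is the property the patch relies on.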

[impala] 13/17: IMPALA-11081: Fix incorrect results in partition key scan

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 794eb1ba4a6d459379dee91c4274be3f40bd16ac
Author: zhangyifan27 <ch...@163.com>
AuthorDate: Fri Feb 3 16:43:08 2023 +0800

    IMPALA-11081: Fix incorrect results in partition key scan
    
    This patch fixes incorrect results caused by short-circuit partition
    key scan in the case where a Parquet/ORC file contains multiple
    blocks.
    
    IMPALA-8834 introduced an optimization that generates only one scan
    range per file, corresponding to the first block. Backends only issue
    footer ranges for Parquet/ORC files for file-metadata-only queries (see
    HdfsScanner::IssueFooterRanges()), which leads to incorrect results if
    the first block doesn't include the file footer. This bug is fixed by
    returning a scan range corresponding to the last block for Parquet/ORC
    files, to make sure it contains the file footer.
    
    Testing:
    - Added e2e tests to verify the fix.
    
    Backport Notes:
    - Trivial conflicts in HdfsScanNode.java and test_partition_metadata.py
    
    Change-Id: I17331ed6c26a747e0509dcbaf427cd52808943b1
    Reviewed-on: http://gerrit.cloudera.org:8080/19471
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../org/apache/impala/planner/HdfsScanNode.java    | 10 ++++-
 tests/common/test_dimensions.py                    | 15 ++++++++
 tests/metadata/test_partition_metadata.py          |  9 +----
 tests/query_test/test_queries.py                   | 45 +++++++++++++++++++++-
 4 files changed, 70 insertions(+), 9 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
index 78577769d..265ed00a8 100644
--- a/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/HdfsScanNode.java
@@ -1324,7 +1324,15 @@ public class HdfsScanNode extends ScanNode {
     Preconditions.checkArgument(fileDesc.getNumFileBlocks() > 0);
     boolean fileDescMissingDiskIds = false;
     long fileMaxScanRangeBytes = 0;
-    for (int i = 0; i < fileDesc.getNumFileBlocks(); ++i) {
+    int i = 0;
+    if (isPartitionKeyScan_ && (partition.getFileFormat().isParquetBased()
+        || partition.getFileFormat() == HdfsFileFormat.ORC)) {
+      // IMPALA-8834 introduced the optimization for partition key scan by generating
+      // one scan range for each HDFS file. With Parquet and ORC, we start with the last
+      // block to get a scan range that contains a file footer for short-circuiting.
+      i = fileDesc.getNumFileBlocks() - 1;
+    }
+    for (; i < fileDesc.getNumFileBlocks(); ++i) {
       FbFileBlock block = fileDesc.getFbFileBlock(i);
       int replicaHostCount = FileBlock.getNumReplicaHosts(block);
       if (replicaHostCount == 0) {
diff --git a/tests/common/test_dimensions.py b/tests/common/test_dimensions.py
index 5a5788aac..48af54227 100644
--- a/tests/common/test_dimensions.py
+++ b/tests/common/test_dimensions.py
@@ -27,6 +27,21 @@ from tests.util.filesystem_utils import (
 
 WORKLOAD_DIR = os.environ['IMPALA_WORKLOAD_DIR']
 
+# Map from the test dimension file_format string to the SQL "STORED AS" or "STORED BY"
+# argument.
+FILE_FORMAT_TO_STORED_AS_MAP = {
+  'text': 'TEXTFILE',
+  'seq': 'SEQUENCEFILE',
+  'rc': 'RCFILE',
+  'orc': 'ORC',
+  'parquet': 'PARQUET',
+  'hudiparquet': 'HUDIPARQUET',
+  'avro': 'AVRO',
+  'hbase': "'org.apache.hadoop.hive.hbase.HBaseStorageHandler'",
+  'kudu': "KUDU",
+  'iceberg': "ICEBERG"
+}
+
 # Describes the configuration used to execute a single tests. Contains both the details
 # of what specific table format to target along with the exec options (num_nodes, etc)
 # to use when running the query.
diff --git a/tests/metadata/test_partition_metadata.py b/tests/metadata/test_partition_metadata.py
index 363014761..e86751084 100644
--- a/tests/metadata/test_partition_metadata.py
+++ b/tests/metadata/test_partition_metadata.py
@@ -20,14 +20,9 @@ from tests.common.impala_test_suite import ImpalaTestSuite
 from tests.common.skip import (SkipIfS3, SkipIfABFS, SkipIfADLS, SkipIfIsilon,
                                SkipIfGCS, SkipIfCOS, SkipIfLocal)
 from tests.common.test_dimensions import (create_single_exec_option_dimension,
-    create_uncompressed_text_dimension)
+    create_uncompressed_text_dimension, FILE_FORMAT_TO_STORED_AS_MAP)
 from tests.util.filesystem_utils import get_fs_path, WAREHOUSE, FILESYSTEM_PREFIX
 
-# Map from the test dimension file_format string to the SQL "STORED AS"
-# argument.
-STORED_AS_ARGS = { 'text': 'textfile', 'parquet': 'parquet', 'avro': 'avro',
-    'seq': 'sequencefile' }
-
 # Tests specific to partition metadata.
 # TODO: Split up the DDL tests and move some of the partition-specific tests
 # here.
@@ -60,7 +55,7 @@ class TestPartitionMetadata(ImpalaTestSuite):
     # Create the table
     self.client.execute(
         "create table %s (i int) partitioned by(j int) stored as %s location '%s'"
-        % (FQ_TBL_NAME, STORED_AS_ARGS[file_format], TBL_LOCATION))
+        % (FQ_TBL_NAME, FILE_FORMAT_TO_STORED_AS_MAP[file_format], TBL_LOCATION))
 
     # Point both partitions to the same location.
     self.client.execute("alter table %s add partition (j=1) location '%s/p'"
diff --git a/tests/query_test/test_queries.py b/tests/query_test/test_queries.py
index afd7354e9..0a907046f 100644
--- a/tests/query_test/test_queries.py
+++ b/tests/query_test/test_queries.py
@@ -26,7 +26,9 @@ from tests.common.skip import SkipIfEC, SkipIfCatalogV2, SkipIfNotHdfsMinicluste
 from tests.common.test_dimensions import (
    create_uncompressed_text_dimension, create_exec_option_dimension_from_dict,
    create_client_protocol_dimension, hs2_parquet_constraint,
-   extend_exec_option_dimension)
+   extend_exec_option_dimension, FILE_FORMAT_TO_STORED_AS_MAP)
+from tests.util.filesystem_utils import get_fs_path
+from subprocess import check_call
 
 class TestQueries(ImpalaTestSuite):
 
@@ -329,6 +331,47 @@ class TestPartitionKeyScans(ImpalaTestSuite):
   def test_partition_key_scans_with_joins(self, vector):
     self.run_test_case('QueryTest/partition-key-scans-with-joins', vector)
 
+
+class TestPartitionKeyScansWithMultipleBlocks(ImpalaTestSuite):
+  """Tests for queries that exercise partition key scan optimisation with data files
+  that contain multiple blocks."""
+  @classmethod
+  def add_test_dimensions(cls):
+    super(TestPartitionKeyScansWithMultipleBlocks, cls).add_test_dimensions()
+    cls.ImpalaTestMatrix.add_constraint(lambda v:
+        v.get_value('table_format').file_format not in ('kudu', 'hbase'))
+
+  @classmethod
+  def get_workload(cls):
+    return 'functional-query'
+
+  def _build_alltypes_multiblocks_table(self, vector, unique_database):
+    file_format = vector.get_value('table_format').file_format
+    db_suffix = vector.get_value('table_format').db_suffix()
+    src_tbl_name = 'functional' + db_suffix + '.alltypes'
+    src_tbl_loc = self._get_table_location(src_tbl_name, vector)
+    source_file = src_tbl_loc + '/year=2010/month=12/*'
+    tbl_loc = get_fs_path("/test-warehouse/%s.db/alltypes_multiblocks"
+        % (unique_database))
+    file_path = tbl_loc + "/year=2010/month=12"
+
+    check_call(['hdfs', 'dfs', '-mkdir', '-p', file_path])
+    self.client.execute("""create table if not exists %s.alltypes_multiblocks
+        like functional.alltypes stored as %s location '%s';"""
+        % (unique_database, FILE_FORMAT_TO_STORED_AS_MAP[file_format], tbl_loc))
+
+    # set block size to 1024 so the target file occupies multiple blocks
+    check_call(['hdfs', 'dfs', '-Ddfs.block.size=1024', '-cp', '-f', '-d',
+        source_file, file_path])
+    self.client.execute("alter table %s.alltypes_multiblocks recover partitions"
+        % (unique_database))
+
+  def test_partition_key_scans_with_multiple_blocks_table(self, vector, unique_database):
+    self._build_alltypes_multiblocks_table(vector, unique_database)
+    result = self.execute_query_expect_success(self.client,
+          "SELECT max(year) FROM %s.alltypes_multiblocks" % (unique_database))
+    assert int(result.get_data()) == 2010
+
 class TestTopNReclaimQuery(ImpalaTestSuite):
   """Test class to validate that TopN periodically reclaims tuple pool memory
    and runs with a lower memory footprint."""
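
Note on the block size used in the new test above: copying with dfs.block.size=1024 means any data file larger than 1 KiB spans several blocks. The arithmetic can be sketched as follows. The helper name block_count and the sample file size are hypothetical and are not part of the patch.

```python
def block_count(file_size_bytes, block_size_bytes=1024):
    """Return how many fixed-size blocks are needed to hold a file.

    Illustrative only: block_count is a hypothetical helper, not an
    Impala or HDFS API. The default of 1024 matches the
    -Ddfs.block.size=1024 argument used in the test.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Ceiling division using integers only, so it behaves the same on
    # Python 2 and Python 3.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


if __name__ == "__main__":
    # 20000 bytes is an assumed size, chosen only for the example.
    print(block_count(20000))
```

A file of exactly 1024 bytes or fewer still fits in one block, so the test relies on the source partition files being larger than that.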

[impala] 11/17: IMPALA-11914: Fix broken verbose explain on MT_DOP > 0

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d85d34fc31d77cd9147a58f34f05f514af35180d
Author: Riza Suminto <ri...@cloudera.com>
AuthorDate: Fri Feb 10 15:13:13 2023 -0800

    IMPALA-11914: Fix broken verbose explain on MT_DOP > 0
    
    When running with MT_DOP>0, EXPLAIN_LEVEL=VERBOSE produces a broken
    explain string for any fragment that consumes input from a join build
    fragment. It continues printing the explain plan for that child
    fragment (the join build fragment) rather than skipping it. This patch
    fixes the issue by skipping such child fragments.
    
    Testing:
    - Add PlannerTest.testExplainVerboseMtDop
    
    Backport Note for 4.1.2:
    Resolved trivial conflicts in PlannerTest.java
    
    Change-Id: Iad082074933204daaba0c675abb34c144e12c3f7
    Reviewed-on: http://gerrit.cloudera.org:8080/19491
    Reviewed-by: Wenzhe Zhou <wz...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../java/org/apache/impala/planner/PlanNode.java   |    1 +
 .../org/apache/impala/planner/PlannerTest.java     |    9 +
 .../PlannerTest/explain-verbose-mt_dop.test        | 2454 ++++++++++++++++++++
 3 files changed, 2464 insertions(+)

diff --git a/fe/src/main/java/org/apache/impala/planner/PlanNode.java b/fe/src/main/java/org/apache/impala/planner/PlanNode.java
index b936d5a5c..ae19ce278 100644
--- a/fe/src/main/java/org/apache/impala/planner/PlanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/PlanNode.java
@@ -389,6 +389,7 @@ abstract public class PlanNode extends TreeNode<PlanNode> {
       for (int i = children_.size() - 1; i >= 1; --i) {
         PlanNode child = getChild(i);
         if (fragment_ != child.fragment_) {
+          if (detailLevel == TExplainLevel.VERBOSE) continue;
           // we're crossing a fragment boundary
           expBuilder.append(
               child.fragment_.getExplainString(
diff --git a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
index 7a7e60ebe..2d5382d50 100644
--- a/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
+++ b/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
@@ -1286,4 +1286,13 @@ public class PlannerTest extends PlannerTestBase {
   public void testOrcStatsAgg() {
     runPlannerTestFile("orc-stats-agg");
   }
+
+  /**
+   * Test EXPLAIN_LEVEL=VERBOSE is displayed properly with MT_DOP>0
+   */
+  @Test
+  public void testExplainVerboseMtDop() {
+    runPlannerTestFile("explain-verbose-mt_dop", "tpcds_parquet",
+        ImmutableSet.of(PlannerTestOption.INCLUDE_RESOURCE_HEADER));
+  }
 }
diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test b/testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
new file mode 100644
index 000000000..ee2d37997
--- /dev/null
+++ b/testdata/workloads/functional-planner/queries/PlannerTest/explain-verbose-mt_dop.test
@@ -0,0 +1,2454 @@
+# TPCDS-Q14a first of two
+with  cross_items as
+ (select i_item_sk ss_item_sk
+ from item,
+ (select iss.i_brand_id brand_id
+     ,iss.i_class_id class_id
+     ,iss.i_category_id category_id
+ from store_sales
+     ,item iss
+     ,date_dim d1
+ where ss_item_sk = iss.i_item_sk
+   and ss_sold_date_sk = d1.d_date_sk
+   and d1.d_year between 1999 AND 1999 + 2
+ intersect
+ select ics.i_brand_id
+     ,ics.i_class_id
+     ,ics.i_category_id
+ from catalog_sales
+     ,item ics
+     ,date_dim d2
+ where cs_item_sk = ics.i_item_sk
+   and cs_sold_date_sk = d2.d_date_sk
+   and d2.d_year between 1999 AND 1999 + 2
+ intersect
+ select iws.i_brand_id
+     ,iws.i_class_id
+     ,iws.i_category_id
+ from web_sales
+     ,item iws
+     ,date_dim d3
+ where ws_item_sk = iws.i_item_sk
+   and ws_sold_date_sk = d3.d_date_sk
+   and d3.d_year between 1999 AND 1999 + 2) t1
+ where i_brand_id = brand_id
+      and i_class_id = class_id
+      and i_category_id = category_id
+),
+ avg_sales as
+ (select avg(quantity*list_price) average_sales
+  from (select ss_quantity quantity
+             ,ss_list_price list_price
+       from store_sales
+           ,date_dim
+       where ss_sold_date_sk = d_date_sk
+         and d_year between 1999 and 1999 + 2
+       union all
+       select cs_quantity quantity
+             ,cs_list_price list_price
+       from catalog_sales
+           ,date_dim
+       where cs_sold_date_sk = d_date_sk
+         and d_year between 1999 and 1999 + 2
+       union all
+       select ws_quantity quantity
+             ,ws_list_price list_price
+       from web_sales
+           ,date_dim
+       where ws_sold_date_sk = d_date_sk
+         and d_year between 1999 and 1999 + 2) x)
+ select channel, i_brand_id,i_class_id,i_category_id,sum(sales), sum(number_sales)
+ from(
+       select 'store' channel, i_brand_id,i_class_id
+             ,i_category_id,sum(ss_quantity*ss_list_price) sales
+             , count(*) number_sales
+       from store_sales
+           ,item
+           ,date_dim
+       where ss_item_sk in (select ss_item_sk from cross_items)
+         and ss_item_sk = i_item_sk
+         and ss_sold_date_sk = d_date_sk
+         and d_year = 1999+2
+         and d_moy = 11
+       group by i_brand_id,i_class_id,i_category_id
+       having sum(ss_quantity*ss_list_price) > (select average_sales from avg_sales)
+       union all
+       select 'catalog' channel, i_brand_id,i_class_id,i_category_id, sum(cs_quantity*cs_list_price) sales, count(*) number_sales
+       from catalog_sales
+           ,item
+           ,date_dim
+       where cs_item_sk in (select ss_item_sk from cross_items)
+         and cs_item_sk = i_item_sk
+         and cs_sold_date_sk = d_date_sk
+         and d_year = 1999+2
+         and d_moy = 11
+       group by i_brand_id,i_class_id,i_category_id
+       having sum(cs_quantity*cs_list_price) > (select average_sales from avg_sales)
+       union all
+       select 'web' channel, i_brand_id,i_class_id,i_category_id, sum(ws_quantity*ws_list_price) sales , count(*) number_sales
+       from web_sales
+           ,item
+           ,date_dim
+       where ws_item_sk in (select ss_item_sk from cross_items)
+         and ws_item_sk = i_item_sk
+         and ws_sold_date_sk = d_date_sk
+         and d_year = 1999+2
+         and d_moy = 11
+       group by i_brand_id,i_class_id,i_category_id
+       having sum(ws_quantity*ws_list_price) > (select average_sales from avg_sales)
+ ) y
+ group by rollup (channel, i_brand_id,i_class_id,i_category_id)
+ order by channel,i_brand_id,i_class_id,i_category_id
+LIMIT 100
+---- QUERYOPTIONS
+mt_dop=4
+explain_level=3
+---- PARALLELPLANS
+Max Per-Host Resource Reservation: Memory=988.00MB Threads=189
+Per-Host Resource Estimates: Memory=3.37GB
+F80:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=4.06MB mem-reservation=4.00MB thread-reservation=1
+  PLAN-ROOT SINK
+  |  output exprs: CASE valid_tid(104,105,106,107,108) WHEN 104 THEN channel WHEN 105 THEN channel WHEN 106 THEN channel WHEN 107 THEN channel WHEN 108 THEN NULL END, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_brand_id WHEN 105 THEN i_brand_id WHEN 106 THEN i_brand_id WHEN 107 THEN NULL WHEN 108 THEN NULL END, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_class_id WHEN 105 THEN i_class_id WHEN 106 THEN NULL WHEN 107 THEN NULL WHEN 108 THEN NULL END, CASE valid_tid(104,105, [...]
+  |  mem-estimate=4.00MB mem-reservation=4.00MB spill-buffer=2.00MB thread-reservation=0
+  |
+  216:MERGING-EXCHANGE [UNPARTITIONED]
+     order by: CASE valid_tid(104,105,106,107,108) WHEN 104 THEN channel WHEN 105 THEN channel WHEN 106 THEN channel WHEN 107 THEN channel WHEN 108 THEN NULL END ASC, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_brand_id WHEN 105 THEN i_brand_id WHEN 106 THEN i_brand_id WHEN 107 THEN NULL WHEN 108 THEN NULL END ASC, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_class_id WHEN 105 THEN i_class_id WHEN 106 THEN NULL WHEN 107 THEN NULL WHEN 108 THEN NULL END ASC, CASE valid_tid( [...]
+     limit: 100
+     mem-estimate=62.50KB mem-reservation=0B thread-reservation=0
+     tuple-ids=110 row-size=48B cardinality=100
+     in pipelines: 129(GETNEXT)
+
+F79:PLAN FRAGMENT [HASH(CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(channel) WHEN 105 THEN murmur_hash(channel) WHEN 106 THEN murmur_hash(channel) WHEN 107 THEN murmur_hash(channel) WHEN 108 THEN murmur_hash(NULL) END,CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(i_brand_id) WHEN 105 THEN murmur_hash(i_brand_id) WHEN 106 THEN murmur_hash(i_brand_id) WHEN 107 THEN murmur_hash(NULL) WHEN 108 THEN murmur_hash(NULL) END,CASE valid_tid(104,105,106,107,108) WH [...]
+Per-Instance Resources: mem-estimate=77.05MB mem-reservation=47.38MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F80, EXCHANGE=216, UNPARTITIONED]
+  |  mem-estimate=208.00KB mem-reservation=0B thread-reservation=0
+  129:TOP-N [LIMIT=100]
+  |  order by: CASE valid_tid(104,105,106,107,108) WHEN 104 THEN channel WHEN 105 THEN channel WHEN 106 THEN channel WHEN 107 THEN channel WHEN 108 THEN NULL END ASC, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_brand_id WHEN 105 THEN i_brand_id WHEN 106 THEN i_brand_id WHEN 107 THEN NULL WHEN 108 THEN NULL END ASC, CASE valid_tid(104,105,106,107,108) WHEN 104 THEN i_class_id WHEN 105 THEN i_class_id WHEN 106 THEN NULL WHEN 107 THEN NULL WHEN 108 THEN NULL END ASC, CASE valid_tid( [...]
+  |  mem-estimate=4.69KB mem-reservation=0B thread-reservation=0
+  |  tuple-ids=110 row-size=48B cardinality=100
+  |  in pipelines: 129(GETNEXT), 128(OPEN)
+  |
+  128:AGGREGATE [FINALIZE]
+  |  output: aggif(valid_tid(104,105,106,107,108) IN (CAST(104 AS INT), CAST(105 AS INT), CAST(106 AS INT), CAST(107 AS INT), CAST(108 AS INT)), CASE valid_tid(104,105,106,107,108) WHEN CAST(104 AS INT) THEN sum(sales) WHEN CAST(105 AS INT) THEN sum(sales) WHEN CAST(106 AS INT) THEN sum(sales) WHEN CAST(107 AS INT) THEN sum(sales) WHEN CAST(108 AS INT) THEN sum(sales) END), aggif(valid_tid(104,105,106,107,108) IN (CAST(104 AS INT), CAST(105 AS INT), CAST(106 AS INT), CAST(107 AS INT), CA [...]
+  |  group by: CASE valid_tid(104,105,106,107,108) WHEN CAST(104 AS INT) THEN channel WHEN CAST(105 AS INT) THEN channel WHEN CAST(106 AS INT) THEN channel WHEN CAST(107 AS INT) THEN channel WHEN CAST(108 AS INT) THEN NULL END, CASE valid_tid(104,105,106,107,108) WHEN CAST(104 AS INT) THEN i_brand_id WHEN CAST(105 AS INT) THEN i_brand_id WHEN CAST(106 AS INT) THEN i_brand_id WHEN CAST(107 AS INT) THEN NULL WHEN CAST(108 AS INT) THEN NULL END, CASE valid_tid(104,105,106,107,108) WHEN CAST [...]
+  |  mem-estimate=10.00MB mem-reservation=4.75MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=109 row-size=52B cardinality=562.30K
+  |  in pipelines: 128(GETNEXT), 215(OPEN)
+  |
+  215:AGGREGATE [FINALIZE]
+  |  Class 0
+  |    output: sum:merge(sales), sum:merge(number_sales)
+  |    group by: channel, i_brand_id, i_class_id, i_category_id
+  |  Class 1
+  |    output: sum:merge(sales), sum:merge(number_sales)
+  |    group by: channel, i_brand_id, i_class_id, NULL
+  |  Class 2
+  |    output: sum:merge(sales), sum:merge(number_sales)
+  |    group by: channel, i_brand_id, NULL, NULL
+  |  Class 3
+  |    output: sum:merge(sales), sum:merge(number_sales)
+  |    group by: channel, NULL, NULL, NULL
+  |  Class 4
+  |    output: sum:merge(sales), sum:merge(number_sales)
+  |    group by: NULL, NULL, NULL, NULL
+  |  mem-estimate=64.00MB mem-reservation=42.62MB thread-reservation=0
+  |  tuple-ids=104N,105N,106N,107N,108N row-size=240B cardinality=562.30K
+  |  in pipelines: 215(GETNEXT), 151(OPEN), 179(OPEN), 207(OPEN)
+  |
+  214:EXCHANGE [HASH(CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(channel) WHEN 105 THEN murmur_hash(channel) WHEN 106 THEN murmur_hash(channel) WHEN 107 THEN murmur_hash(channel) WHEN 108 THEN murmur_hash(NULL) END,CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(i_brand_id) WHEN 105 THEN murmur_hash(i_brand_id) WHEN 106 THEN murmur_hash(i_brand_id) WHEN 107 THEN murmur_hash(NULL) WHEN 108 THEN murmur_hash(NULL) END,CASE valid_tid(104,105,106,107,108) WHEN  [...]
+     mem-estimate=13.05MB mem-reservation=0B thread-reservation=0
+     tuple-ids=104N,105N,106N,107N,108N row-size=240B cardinality=562.30K
+     in pipelines: 151(GETNEXT), 179(GETNEXT), 207(GETNEXT)
+
+F78:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=74.36MB mem-reservation=28.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F79, EXCHANGE=214, HASH(CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(channel) WHEN 105 THEN murmur_hash(channel) WHEN 106 THEN murmur_hash(channel) WHEN 107 THEN murmur_hash(channel) WHEN 108 THEN murmur_hash(NULL) END,CASE valid_tid(104,105,106,107,108) WHEN 104 THEN murmur_hash(i_brand_id) WHEN 105 THEN murmur_hash(i_brand_id) WHEN 106 THEN murmur_hash(i_brand_id) WHEN 107 THEN murmur_hash(NULL) WHEN 108 THEN murmur_hash(NULL) END,CASE valid [...]
+  |  mem-estimate=12.19MB mem-reservation=0B thread-reservation=0
+  127:AGGREGATE [STREAMING]
+  |  Class 0
+  |    output: sum(sales), sum(number_sales)
+  |    group by: channel, i_brand_id, i_class_id, i_category_id
+  |  Class 1
+  |    output: sum(sales), sum(number_sales)
+  |    group by: channel, i_brand_id, i_class_id, NULL
+  |  Class 2
+  |    output: sum(sales), sum(number_sales)
+  |    group by: channel, i_brand_id, NULL, NULL
+  |  Class 3
+  |    output: sum(sales), sum(number_sales)
+  |    group by: channel, NULL, NULL, NULL
+  |  Class 4
+  |    output: sum(sales), sum(number_sales)
+  |    group by: NULL, NULL, NULL, NULL
+  |  mem-estimate=50.00MB mem-reservation=27.00MB thread-reservation=0
+  |  tuple-ids=104N,105N,106N,107N,108N row-size=240B cardinality=562.30K
+  |  in pipelines: 151(GETNEXT), 179(GETNEXT), 207(GETNEXT)
+  |
+  00:UNION
+  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  tuple-ids=102 row-size=48B cardinality=276.96K
+  |  in pipelines: 151(GETNEXT), 179(GETNEXT), 207(GETNEXT)
+  |
+  |--126:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
+  |  |  join table id: 08
+  |  |  predicates: sum(ws_quantity * ws_list_price) > avg(quantity * list_price)
+  |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  |  tuple-ids=88,99 row-size=52B cardinality=42.85K
+  |  |  in pipelines: 207(GETNEXT), 212(OPEN)
+  |  |
+  |  207:AGGREGATE [FINALIZE]
+  |  |  output: sum:merge(ws_quantity * ws_list_price), count:merge(*)
+  |  |  group by: i_brand_id, i_class_id, i_category_id
+  |  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=88 row-size=36B cardinality=42.85K
+  |  |  in pipelines: 207(GETNEXT), 85(OPEN)
+  |  |
+  |  206:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+  |     mem-estimate=833.29KB mem-reservation=0B thread-reservation=0
+  |     tuple-ids=88 row-size=36B cardinality=42.85K
+  |     in pipelines: 85(GETNEXT)
+  |
+  |--84:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
+  |  |  join table id: 04
+  |  |  predicates: sum(cs_quantity * cs_list_price) > avg(quantity * list_price)
+  |  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  |  tuple-ids=54,65 row-size=52B cardinality=85.31K
+  |  |  in pipelines: 179(GETNEXT), 184(OPEN)
+  |  |
+  |  179:AGGREGATE [FINALIZE]
+  |  |  output: sum:merge(cs_quantity * cs_list_price), count:merge(*)
+  |  |  group by: i_brand_id, i_class_id, i_category_id
+  |  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=54 row-size=36B cardinality=85.31K
+  |  |  in pipelines: 179(GETNEXT), 43(OPEN)
+  |  |
+  |  178:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+  |     mem-estimate=1.09MB mem-reservation=0B thread-reservation=0
+  |     tuple-ids=54 row-size=36B cardinality=85.31K
+  |     in pipelines: 43(GETNEXT)
+  |
+  42:NESTED LOOP JOIN [INNER JOIN, BROADCAST]
+  |  join table id: 00
+  |  predicates: sum(ss_quantity * ss_list_price) > avg(quantity * list_price)
+  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  tuple-ids=20,31 row-size=52B cardinality=148.80K
+  |  in pipelines: 151(GETNEXT), 156(OPEN)
+  |
+  151:AGGREGATE [FINALIZE]
+  |  output: sum:merge(ss_quantity * ss_list_price), count:merge(*)
+  |  group by: i_brand_id, i_class_id, i_category_id
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=20 row-size=36B cardinality=148.80K
+  |  in pipelines: 151(GETNEXT), 01(OPEN)
+  |
+  150:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+     mem-estimate=2.17MB mem-reservation=0B thread-reservation=0
+     tuple-ids=20 row-size=36B cardinality=148.80K
+     in pipelines: 01(GETNEXT)
+
+F00:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Host Shared Resources: mem-estimate=3.00MB mem-reservation=3.00MB thread-reservation=0 runtime-filters-memory=3.00MB
+Per-Instance Resources: mem-estimate=27.88MB mem-reservation=3.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F78, EXCHANGE=150, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.88MB mem-reservation=0B thread-reservation=0
+  30:AGGREGATE [STREAMING]
+  |  output: sum(CAST(ss_quantity AS DECIMAL(10,0)) * ss_list_price), count(*)
+  |  group by: i_brand_id, i_class_id, i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=20 row-size=36B cardinality=148.80K
+  |  in pipelines: 01(GETNEXT)
+  |
+  29:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
+  |  hash-table-id=12
+  |  hash predicates: ss_item_sk = tpcds_parquet.item.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=0,2,1 row-size=52B cardinality=170.55K
+  |  in pipelines: 01(GETNEXT), 148(OPEN)
+  |
+  27:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=22
+  |  hash predicates: ss_item_sk = i_item_sk
+  |  fk/pk conjuncts: ss_item_sk = i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=0,2,1 row-size=52B cardinality=170.55K
+  |  in pipelines: 01(GETNEXT), 02(OPEN)
+  |
+  26:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=23
+  |  hash predicates: ss_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=0,2 row-size=32B cardinality=170.55K
+  |  in pipelines: 01(GETNEXT), 03(OPEN)
+  |
+  01:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF005[min_max] -> ss_sold_date_sk, RF004[bloom] -> ss_sold_date_sk, RF000[bloom] -> ss_item_sk, RF002[bloom] -> ss_item_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=1.00MB thread-reservation=0
+     tuple-ids=0 row-size=20B cardinality=2.88M
+     in pipelines: 01(GETNEXT)
+
+F93:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=9.03MB mem-reservation=8.75MB thread-reservation=1 runtime-filters-memory=1.00MB
+  JOIN BUILD
+  |  join-table-id=12 plan-id=13 cohort-id=01
+  |  build expressions: tpcds_parquet.item.i_item_sk
+  |  runtime filters: RF000[bloom] <- tpcds_parquet.item.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  149:EXCHANGE [BROADCAST]
+     mem-estimate=284.43KB mem-reservation=0B thread-reservation=0
+     tuple-ids=126 row-size=8B cardinality=17.98K
+     in pipelines: 148(GETNEXT)
+
+F16:PLAN FRAGMENT [HASH(tpcds_parquet.item.i_item_sk)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=10.19MB mem-reservation=1.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F93, EXCHANGE=149, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  148:AGGREGATE [FINALIZE]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=126 row-size=8B cardinality=17.98K
+  |  in pipelines: 148(GETNEXT), 135(OPEN)
+  |
+  147:EXCHANGE [HASH(tpcds_parquet.item.i_item_sk)]
+     mem-estimate=190.81KB mem-reservation=0B thread-reservation=0
+     tuple-ids=126 row-size=8B cardinality=17.98K
+     in pipelines: 135(GETNEXT)
+
+F06:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=20.76MB mem-reservation=3.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F16, EXCHANGE=147, HASH(tpcds_parquet.item.i_item_sk)]
+  |  mem-estimate=576.00KB mem-reservation=0B thread-reservation=0
+  28:AGGREGATE [STREAMING]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=126 row-size=8B cardinality=17.98K
+  |  in pipelines: 135(GETNEXT)
+  |
+  25:HASH JOIN [INNER JOIN, PARTITIONED]
+  |  hash-table-id=13
+  |  hash predicates: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  fk/pk conjuncts: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=7,3 row-size=32B cardinality=148.80K
+  |  in pipelines: 135(GETNEXT), 04(OPEN)
+  |
+  24:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=14
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM iws.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM iws.i_category_id, iss.i_class_id IS NOT DISTINCT FROM iws.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=7 row-size=12B cardinality=148.80K
+  |  in pipelines: 135(GETNEXT), 144(OPEN)
+  |
+  22:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=17
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM ics.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM ics.i_category_id, iss.i_class_id IS NOT DISTINCT FROM ics.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=7 row-size=12B cardinality=148.80K
+  |  in pipelines: 135(GETNEXT), 139(OPEN)
+  |
+  135:AGGREGATE [FINALIZE]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=7 row-size=12B cardinality=148.80K
+  |  in pipelines: 135(GETNEXT), 05(OPEN)
+  |
+  134:EXCHANGE [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+     mem-estimate=773.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=7 row-size=12B cardinality=148.80K
+     in pipelines: 05(GETNEXT)
+
+F03:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=26.75MB mem-reservation=5.50MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F06, EXCHANGE=134, HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  10:AGGREGATE [STREAMING]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=7 row-size=12B cardinality=148.80K
+  |  in pipelines: 05(GETNEXT)
+  |
+  09:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=20
+  |  hash predicates: ss_sold_date_sk = d1.d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d1.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=4,5,6 row-size=40B cardinality=2.88M
+  |  in pipelines: 05(GETNEXT), 07(OPEN)
+  |
+  08:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=21
+  |  hash predicates: ss_item_sk = iss.i_item_sk
+  |  fk/pk conjuncts: ss_item_sk = iss.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=4,5 row-size=32B cardinality=2.88M
+  |  in pipelines: 05(GETNEXT), 06(OPEN)
+  |
+  05:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF019[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=4 row-size=12B cardinality=2.88M
+     in pipelines: 05(GETNEXT)
+
+F101:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=20 plan-id=21 cohort-id=05
+  |  build expressions: d1.d_date_sk
+  |  runtime filters: RF019[min_max] <- d1.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  133:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=6 row-size=8B cardinality=7.30K
+     in pipelines: 07(GETNEXT)
+
+F05:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F101, EXCHANGE=133, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  07:SCAN HDFS [tpcds_parquet.date_dim d1, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=6 row-size=8B cardinality=7.30K
+     in pipelines: 07(GETNEXT)
+
+F102:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=21 plan-id=22 cohort-id=05
+  |  build expressions: iss.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  132:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=5 row-size=20B cardinality=18.00K
+     in pipelines: 06(GETNEXT)
+
+F04:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Host Shared Resources: mem-estimate=5.00MB mem-reservation=5.00MB thread-reservation=0 runtime-filters-memory=5.00MB
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F102, EXCHANGE=132, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  06:SCAN HDFS [tpcds_parquet.item iss, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     runtime filters: RF006[bloom] -> iss.i_brand_id, RF007[bloom] -> iss.i_category_id, RF008[bloom] -> iss.i_class_id, RF012[bloom] -> iss.i_brand_id, RF013[bloom] -> iss.i_category_id
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=5 row-size=20B cardinality=18.00K
+     in pipelines: 06(GETNEXT)
+
+F94:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=5.30MB mem-reservation=4.94MB thread-reservation=1 runtime-filters-memory=3.00MB
+  JOIN BUILD
+  |  join-table-id=13 plan-id=14 cohort-id=05
+  |  build expressions: i_brand_id, i_category_id, i_class_id
+  |  runtime filters: RF006[bloom] <- i_brand_id, RF007[bloom] <- i_category_id, RF008[bloom] <- i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  146:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=3 row-size=20B cardinality=18.00K
+     in pipelines: 04(GETNEXT)
+
+F15:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=17.12MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F94, EXCHANGE=146, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.12MB mem-reservation=0B thread-reservation=0
+  04:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=3 row-size=20B cardinality=18.00K
+     in pipelines: 04(GETNEXT)
+
+F95:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=4.82MB mem-reservation=3.94MB thread-reservation=1 runtime-filters-memory=2.00MB
+  JOIN BUILD
+  |  join-table-id=14 plan-id=15 cohort-id=05
+  |  build expressions: iws.i_brand_id, iws.i_category_id, iws.i_class_id
+  |  runtime filters: RF012[bloom] <- iws.i_brand_id, RF013[bloom] <- iws.i_category_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  145:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=125 row-size=12B cardinality=148.80K
+     in pipelines: 144(GETNEXT)
+
+F14:PLAN FRAGMENT [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=10.88MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F95, EXCHANGE=145, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  144:AGGREGATE [FINALIZE]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=125 row-size=12B cardinality=148.80K
+  |  in pipelines: 144(GETNEXT), 16(OPEN)
+  |
+  143:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=125 row-size=12B cardinality=148.80K
+     in pipelines: 16(GETNEXT)
+
+F11:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=42.12MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F14, EXCHANGE=143, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=128.00KB mem-reservation=0B thread-reservation=0
+  23:AGGREGATE [STREAMING]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=125 row-size=12B cardinality=148.80K
+  |  in pipelines: 16(GETNEXT)
+  |
+  20:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=15
+  |  hash predicates: ws_sold_date_sk = d3.d_date_sk
+  |  fk/pk conjuncts: ws_sold_date_sk = d3.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=13,14,15 row-size=40B cardinality=719.38K
+  |  in pipelines: 16(GETNEXT), 18(OPEN)
+  |
+  19:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=16
+  |  hash predicates: ws_item_sk = iws.i_item_sk
+  |  fk/pk conjuncts: ws_item_sk = iws.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=13,14 row-size=32B cardinality=719.38K
+  |  in pipelines: 16(GETNEXT), 17(OPEN)
+  |
+  16:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+     HDFS partitions=1/1 files=2 size=45.09MB
+     stored statistics:
+       table: rows=719.38K size=45.09MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=644.77K
+     file formats: [PARQUET]
+     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=13 row-size=12B cardinality=719.38K
+     in pipelines: 16(GETNEXT)
+
+F96:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=15 plan-id=16 cohort-id=06
+  |  build expressions: d3.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  142:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=15 row-size=8B cardinality=7.30K
+     in pipelines: 18(GETNEXT)
+
+F13:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F96, EXCHANGE=142, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  18:SCAN HDFS [tpcds_parquet.date_dim d3, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=15 row-size=8B cardinality=7.30K
+     in pipelines: 18(GETNEXT)
+
+F97:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=16 plan-id=17 cohort-id=06
+  |  build expressions: iws.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  141:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=14 row-size=20B cardinality=18.00K
+     in pipelines: 17(GETNEXT)
+
+F12:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F97, EXCHANGE=141, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  17:SCAN HDFS [tpcds_parquet.item iws, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=14 row-size=20B cardinality=18.00K
+     in pipelines: 17(GETNEXT)
+
+F98:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.55MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=17 plan-id=18 cohort-id=05
+  |  build expressions: ics.i_brand_id, ics.i_category_id, ics.i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  140:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=124 row-size=12B cardinality=148.80K
+     in pipelines: 139(GETNEXT)
+
+F10:PLAN FRAGMENT [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=10.75MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F98, EXCHANGE=140, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  139:AGGREGATE [FINALIZE]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=124 row-size=12B cardinality=148.80K
+  |  in pipelines: 139(GETNEXT), 11(OPEN)
+  |
+  138:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=124 row-size=12B cardinality=148.80K
+     in pipelines: 11(GETNEXT)
+
+F07:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=58.19MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F10, EXCHANGE=138, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=192.00KB mem-reservation=0B thread-reservation=0
+  21:AGGREGATE [STREAMING]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=124 row-size=12B cardinality=148.80K
+  |  in pipelines: 11(GETNEXT)
+  |
+  15:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=18
+  |  hash predicates: cs_sold_date_sk = d2.d_date_sk
+  |  fk/pk conjuncts: cs_sold_date_sk = d2.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=9,10,11 row-size=40B cardinality=1.44M
+  |  in pipelines: 11(GETNEXT), 13(OPEN)
+  |
+  14:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=19
+  |  hash predicates: cs_item_sk = ics.i_item_sk
+  |  fk/pk conjuncts: cs_item_sk = ics.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=9,10 row-size=32B cardinality=1.44M
+  |  in pipelines: 11(GETNEXT), 12(OPEN)
+  |
+  11:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+     HDFS partitions=1/1 files=3 size=96.62MB
+     stored statistics:
+       table: rows=1.44M size=96.62MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=650.14K
+     file formats: [PARQUET]
+     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=9 row-size=12B cardinality=1.44M
+     in pipelines: 11(GETNEXT)
+
+F99:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=18 plan-id=19 cohort-id=07
+  |  build expressions: d2.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  137:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=11 row-size=8B cardinality=7.30K
+     in pipelines: 13(GETNEXT)
+
+F09:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F99, EXCHANGE=137, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  13:SCAN HDFS [tpcds_parquet.date_dim d2, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=11 row-size=8B cardinality=7.30K
+     in pipelines: 13(GETNEXT)
+
+F100:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=19 plan-id=20 cohort-id=07
+  |  build expressions: ics.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  136:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=10 row-size=20B cardinality=18.00K
+     in pipelines: 12(GETNEXT)
+
+F08:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F100, EXCHANGE=136, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  12:SCAN HDFS [tpcds_parquet.item ics, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=10 row-size=20B cardinality=18.00K
+     in pipelines: 12(GETNEXT)
+
+F103:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=9.12MB mem-reservation=8.75MB thread-reservation=1 runtime-filters-memory=1.00MB
+  JOIN BUILD
+  |  join-table-id=22 plan-id=23 cohort-id=01
+  |  build expressions: i_item_sk
+  |  runtime filters: RF002[bloom] <- i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  131:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=1 row-size=20B cardinality=18.00K
+     in pipelines: 02(GETNEXT)
+
+F02:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F103, EXCHANGE=131, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  02:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     runtime filters: RF000[bloom] -> tpcds_parquet.item.i_item_sk
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=1 row-size=20B cardinality=18.00K
+     in pipelines: 02(GETNEXT)
+
+F104:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.77MB mem-reservation=8.75MB thread-reservation=1 runtime-filters-memory=1.00MB
+  JOIN BUILD
+  |  join-table-id=23 plan-id=24 cohort-id=01
+  |  build expressions: d_date_sk
+  |  runtime filters: RF004[bloom] <- d_date_sk, RF005[min_max] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  130:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=2 row-size=12B cardinality=108
+     in pipelines: 03(GETNEXT)
+
+F01:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.06MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F104, EXCHANGE=130, BROADCAST]
+  |  mem-estimate=64.00KB mem-reservation=0B thread-reservation=0
+  03:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     parquet dictionary predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=2 row-size=12B cardinality=108
+     in pipelines: 03(GETNEXT)
+
+F26:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
+Per-Instance Resources: mem-estimate=59.88MB mem-reservation=6.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F78, EXCHANGE=178, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.88MB mem-reservation=0B thread-reservation=0
+  72:AGGREGATE [STREAMING]
+  |  output: sum(CAST(cs_quantity AS DECIMAL(10,0)) * cs_list_price), count(*)
+  |  group by: i_brand_id, i_class_id, i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=54 row-size=36B cardinality=85.31K
+  |  in pipelines: 43(GETNEXT)
+  |
+  71:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
+  |  hash-table-id=24
+  |  hash predicates: cs_item_sk = tpcds_parquet.item.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=34,36,35 row-size=52B cardinality=85.31K
+  |  in pipelines: 43(GETNEXT), 176(OPEN)
+  |
+  69:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=34
+  |  hash predicates: cs_item_sk = i_item_sk
+  |  fk/pk conjuncts: cs_item_sk = i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=34,36,35 row-size=52B cardinality=85.31K
+  |  in pipelines: 43(GETNEXT), 44(OPEN)
+  |
+  68:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=35
+  |  hash predicates: cs_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: cs_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=34,36 row-size=32B cardinality=85.31K
+  |  in pipelines: 43(GETNEXT), 45(OPEN)
+  |
+  43:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+     HDFS partitions=1/1 files=3 size=96.62MB
+     runtime filters: RF040[bloom] -> cs_sold_date_sk
+     stored statistics:
+       table: rows=1.44M size=96.62MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=650.14K
+     file formats: [PARQUET]
+     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=34 row-size=20B cardinality=1.44M
+     in pipelines: 43(GETNEXT)
+
+F105:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.03MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=24 plan-id=25 cohort-id=01
+  |  build expressions: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  177:EXCHANGE [BROADCAST]
+     mem-estimate=284.43KB mem-reservation=0B thread-reservation=0
+     tuple-ids=148 row-size=8B cardinality=17.98K
+     in pipelines: 176(GETNEXT)
+
+F42:PLAN FRAGMENT [HASH(tpcds_parquet.item.i_item_sk)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=10.19MB mem-reservation=1.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F105, EXCHANGE=177, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  176:AGGREGATE [FINALIZE]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=148 row-size=8B cardinality=17.98K
+  |  in pipelines: 176(GETNEXT), 163(OPEN)
+  |
+  175:EXCHANGE [HASH(tpcds_parquet.item.i_item_sk)]
+     mem-estimate=190.81KB mem-reservation=0B thread-reservation=0
+     tuple-ids=148 row-size=8B cardinality=17.98K
+     in pipelines: 163(GETNEXT)
+
+F32:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=20.76MB mem-reservation=3.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F42, EXCHANGE=175, HASH(tpcds_parquet.item.i_item_sk)]
+  |  mem-estimate=576.00KB mem-reservation=0B thread-reservation=0
+  70:AGGREGATE [STREAMING]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=148 row-size=8B cardinality=17.98K
+  |  in pipelines: 163(GETNEXT)
+  |
+  67:HASH JOIN [INNER JOIN, PARTITIONED]
+  |  hash-table-id=25
+  |  hash predicates: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  fk/pk conjuncts: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=41,37 row-size=32B cardinality=148.80K
+  |  in pipelines: 163(GETNEXT), 46(OPEN)
+  |
+  66:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=26
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM iws.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM iws.i_category_id, iss.i_class_id IS NOT DISTINCT FROM iws.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=41 row-size=12B cardinality=148.80K
+  |  in pipelines: 163(GETNEXT), 172(OPEN)
+  |
+  64:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=29
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM ics.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM ics.i_category_id, iss.i_class_id IS NOT DISTINCT FROM ics.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=41 row-size=12B cardinality=148.80K
+  |  in pipelines: 163(GETNEXT), 167(OPEN)
+  |
+  163:AGGREGATE [FINALIZE]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=41 row-size=12B cardinality=148.80K
+  |  in pipelines: 163(GETNEXT), 47(OPEN)
+  |
+  162:EXCHANGE [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+     mem-estimate=773.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=41 row-size=12B cardinality=148.80K
+     in pipelines: 47(GETNEXT)
+
+F29:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=26.75MB mem-reservation=5.50MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F32, EXCHANGE=162, HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  52:AGGREGATE [STREAMING]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=41 row-size=12B cardinality=148.80K
+  |  in pipelines: 47(GETNEXT)
+  |
+  51:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=32
+  |  hash predicates: ss_sold_date_sk = d1.d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d1.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=38,39,40 row-size=40B cardinality=2.88M
+  |  in pipelines: 47(GETNEXT), 49(OPEN)
+  |
+  50:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=33
+  |  hash predicates: ss_item_sk = iss.i_item_sk
+  |  fk/pk conjuncts: ss_item_sk = iss.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=38,39 row-size=32B cardinality=2.88M
+  |  in pipelines: 47(GETNEXT), 48(OPEN)
+  |
+  47:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF055[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=38 row-size=12B cardinality=2.88M
+     in pipelines: 47(GETNEXT)
+
+F113:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=32 plan-id=33 cohort-id=08
+  |  build expressions: d1.d_date_sk
+  |  runtime filters: RF055[min_max] <- d1.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  161:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=40 row-size=8B cardinality=7.30K
+     in pipelines: 49(GETNEXT)
+
+F31:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F113, EXCHANGE=161, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  49:SCAN HDFS [tpcds_parquet.date_dim d1, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=40 row-size=8B cardinality=7.30K
+     in pipelines: 49(GETNEXT)
+
+F114:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=33 plan-id=34 cohort-id=08
+  |  build expressions: iss.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  160:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=39 row-size=20B cardinality=18.00K
+     in pipelines: 48(GETNEXT)
+
+F30:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F114, EXCHANGE=160, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  48:SCAN HDFS [tpcds_parquet.item iss, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=39 row-size=20B cardinality=18.00K
+     in pipelines: 48(GETNEXT)
+
+F106:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.30MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=25 plan-id=26 cohort-id=08
+  |  build expressions: i_brand_id, i_category_id, i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  174:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=37 row-size=20B cardinality=18.00K
+     in pipelines: 46(GETNEXT)
+
+F41:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=17.12MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F106, EXCHANGE=174, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.12MB mem-reservation=0B thread-reservation=0
+  46:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=37 row-size=20B cardinality=18.00K
+     in pipelines: 46(GETNEXT)
+
+F107:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.82MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=26 plan-id=27 cohort-id=08
+  |  build expressions: iws.i_brand_id, iws.i_category_id, iws.i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  173:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=147 row-size=12B cardinality=148.80K
+     in pipelines: 172(GETNEXT)
+
+F40:PLAN FRAGMENT [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=10.88MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F107, EXCHANGE=173, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  172:AGGREGATE [FINALIZE]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=147 row-size=12B cardinality=148.80K
+  |  in pipelines: 172(GETNEXT), 58(OPEN)
+  |
+  171:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=147 row-size=12B cardinality=148.80K
+     in pipelines: 58(GETNEXT)
+
+F37:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=42.12MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F40, EXCHANGE=171, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=128.00KB mem-reservation=0B thread-reservation=0
+  65:AGGREGATE [STREAMING]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=147 row-size=12B cardinality=148.80K
+  |  in pipelines: 58(GETNEXT)
+  |
+  62:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=27
+  |  hash predicates: ws_sold_date_sk = d3.d_date_sk
+  |  fk/pk conjuncts: ws_sold_date_sk = d3.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=47,48,49 row-size=40B cardinality=719.38K
+  |  in pipelines: 58(GETNEXT), 60(OPEN)
+  |
+  61:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=28
+  |  hash predicates: ws_item_sk = iws.i_item_sk
+  |  fk/pk conjuncts: ws_item_sk = iws.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=47,48 row-size=32B cardinality=719.38K
+  |  in pipelines: 58(GETNEXT), 59(OPEN)
+  |
+  58:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+     HDFS partitions=1/1 files=2 size=45.09MB
+     stored statistics:
+       table: rows=719.38K size=45.09MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=644.77K
+     file formats: [PARQUET]
+     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=47 row-size=12B cardinality=719.38K
+     in pipelines: 58(GETNEXT)
+
+F108:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=27 plan-id=28 cohort-id=09
+  |  build expressions: d3.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  170:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=49 row-size=8B cardinality=7.30K
+     in pipelines: 60(GETNEXT)
+
+F39:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F108, EXCHANGE=170, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  60:SCAN HDFS [tpcds_parquet.date_dim d3, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=49 row-size=8B cardinality=7.30K
+     in pipelines: 60(GETNEXT)
+
+F109:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=28 plan-id=29 cohort-id=09
+  |  build expressions: iws.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  169:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=48 row-size=20B cardinality=18.00K
+     in pipelines: 59(GETNEXT)
+
+F38:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F109, EXCHANGE=169, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  59:SCAN HDFS [tpcds_parquet.item iws, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=48 row-size=20B cardinality=18.00K
+     in pipelines: 59(GETNEXT)
+
+F110:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.55MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=29 plan-id=30 cohort-id=08
+  |  build expressions: ics.i_brand_id, ics.i_category_id, ics.i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  168:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=146 row-size=12B cardinality=148.80K
+     in pipelines: 167(GETNEXT)
+
+F36:PLAN FRAGMENT [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=10.75MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F110, EXCHANGE=168, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  167:AGGREGATE [FINALIZE]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=146 row-size=12B cardinality=148.80K
+  |  in pipelines: 167(GETNEXT), 53(OPEN)
+  |
+  166:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=146 row-size=12B cardinality=148.80K
+     in pipelines: 53(GETNEXT)
+
+F33:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=58.19MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F36, EXCHANGE=166, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=192.00KB mem-reservation=0B thread-reservation=0
+  63:AGGREGATE [STREAMING]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=146 row-size=12B cardinality=148.80K
+  |  in pipelines: 53(GETNEXT)
+  |
+  57:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=30
+  |  hash predicates: cs_sold_date_sk = d2.d_date_sk
+  |  fk/pk conjuncts: cs_sold_date_sk = d2.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=43,44,45 row-size=40B cardinality=1.44M
+  |  in pipelines: 53(GETNEXT), 55(OPEN)
+  |
+  56:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=31
+  |  hash predicates: cs_item_sk = ics.i_item_sk
+  |  fk/pk conjuncts: cs_item_sk = ics.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=43,44 row-size=32B cardinality=1.44M
+  |  in pipelines: 53(GETNEXT), 54(OPEN)
+  |
+  53:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+     HDFS partitions=1/1 files=3 size=96.62MB
+     stored statistics:
+       table: rows=1.44M size=96.62MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=650.14K
+     file formats: [PARQUET]
+     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=43 row-size=12B cardinality=1.44M
+     in pipelines: 53(GETNEXT)
+
+F111:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=30 plan-id=31 cohort-id=10
+  |  build expressions: d2.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  165:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=45 row-size=8B cardinality=7.30K
+     in pipelines: 55(GETNEXT)
+
+F35:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F111, EXCHANGE=165, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  55:SCAN HDFS [tpcds_parquet.date_dim d2, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=45 row-size=8B cardinality=7.30K
+     in pipelines: 55(GETNEXT)
+
+F112:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=31 plan-id=32 cohort-id=10
+  |  build expressions: ics.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  164:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=44 row-size=20B cardinality=18.00K
+     in pipelines: 54(GETNEXT)
+
+F34:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F112, EXCHANGE=164, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  54:SCAN HDFS [tpcds_parquet.item ics, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=44 row-size=20B cardinality=18.00K
+     in pipelines: 54(GETNEXT)
+
+F115:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=34 plan-id=35 cohort-id=01
+  |  build expressions: i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  159:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=35 row-size=20B cardinality=18.00K
+     in pipelines: 44(GETNEXT)
+
+F28:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F115, EXCHANGE=159, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  44:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=35 row-size=20B cardinality=18.00K
+     in pipelines: 44(GETNEXT)
+
+F116:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.77MB mem-reservation=8.75MB thread-reservation=1 runtime-filters-memory=1.00MB
+  JOIN BUILD
+  |  join-table-id=35 plan-id=36 cohort-id=01
+  |  build expressions: d_date_sk
+  |  runtime filters: RF040[bloom] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  158:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=36 row-size=12B cardinality=108
+     in pipelines: 45(GETNEXT)
+
+F27:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.06MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F116, EXCHANGE=158, BROADCAST]
+  |  mem-estimate=64.00KB mem-reservation=0B thread-reservation=0
+  45:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     parquet dictionary predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=36 row-size=12B cardinality=108
+     in pipelines: 45(GETNEXT)
+
+F52:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Host Shared Resources: mem-estimate=1.00MB mem-reservation=1.00MB thread-reservation=0 runtime-filters-memory=1.00MB
+Per-Instance Resources: mem-estimate=43.88MB mem-reservation=6.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F78, EXCHANGE=206, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.88MB mem-reservation=0B thread-reservation=0
+  114:AGGREGATE [STREAMING]
+  |  output: sum(CAST(ws_quantity AS DECIMAL(10,0)) * ws_list_price), count(*)
+  |  group by: i_brand_id, i_class_id, i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=88 row-size=36B cardinality=42.85K
+  |  in pipelines: 85(GETNEXT)
+  |
+  113:HASH JOIN [LEFT SEMI JOIN, BROADCAST]
+  |  hash-table-id=36
+  |  hash predicates: ws_item_sk = tpcds_parquet.item.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=68,70,69 row-size=52B cardinality=42.85K
+  |  in pipelines: 85(GETNEXT), 204(OPEN)
+  |
+  111:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=46
+  |  hash predicates: ws_item_sk = i_item_sk
+  |  fk/pk conjuncts: ws_item_sk = i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=68,70,69 row-size=52B cardinality=42.85K
+  |  in pipelines: 85(GETNEXT), 86(OPEN)
+  |
+  110:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=47
+  |  hash predicates: ws_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: ws_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=68,70 row-size=32B cardinality=42.85K
+  |  in pipelines: 85(GETNEXT), 87(OPEN)
+  |
+  85:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+     HDFS partitions=1/1 files=2 size=45.09MB
+     runtime filters: RF076[bloom] -> ws_sold_date_sk
+     stored statistics:
+       table: rows=719.38K size=45.09MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=644.77K
+     file formats: [PARQUET]
+     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=68 row-size=20B cardinality=719.38K
+     in pipelines: 85(GETNEXT)
+
+F117:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.03MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=36 plan-id=37 cohort-id=01
+  |  build expressions: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  205:EXCHANGE [BROADCAST]
+     mem-estimate=284.43KB mem-reservation=0B thread-reservation=0
+     tuple-ids=170 row-size=8B cardinality=17.98K
+     in pipelines: 204(GETNEXT)
+
+F68:PLAN FRAGMENT [HASH(tpcds_parquet.item.i_item_sk)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=10.19MB mem-reservation=1.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F117, EXCHANGE=205, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  204:AGGREGATE [FINALIZE]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=170 row-size=8B cardinality=17.98K
+  |  in pipelines: 204(GETNEXT), 191(OPEN)
+  |
+  203:EXCHANGE [HASH(tpcds_parquet.item.i_item_sk)]
+     mem-estimate=190.81KB mem-reservation=0B thread-reservation=0
+     tuple-ids=170 row-size=8B cardinality=17.98K
+     in pipelines: 191(GETNEXT)
+
+F58:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=20.76MB mem-reservation=3.94MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F68, EXCHANGE=203, HASH(tpcds_parquet.item.i_item_sk)]
+  |  mem-estimate=576.00KB mem-reservation=0B thread-reservation=0
+  112:AGGREGATE [STREAMING]
+  |  group by: tpcds_parquet.item.i_item_sk
+  |  mem-estimate=10.00MB mem-reservation=2.00MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=170 row-size=8B cardinality=17.98K
+  |  in pipelines: 191(GETNEXT)
+  |
+  109:HASH JOIN [INNER JOIN, PARTITIONED]
+  |  hash-table-id=37
+  |  hash predicates: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  fk/pk conjuncts: iss.i_brand_id = i_brand_id, iss.i_category_id = i_category_id, iss.i_class_id = i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=75,71 row-size=32B cardinality=148.80K
+  |  in pipelines: 191(GETNEXT), 88(OPEN)
+  |
+  108:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=38
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM iws.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM iws.i_category_id, iss.i_class_id IS NOT DISTINCT FROM iws.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=75 row-size=12B cardinality=148.80K
+  |  in pipelines: 191(GETNEXT), 200(OPEN)
+  |
+  106:HASH JOIN [LEFT SEMI JOIN, PARTITIONED]
+  |  hash-table-id=41
+  |  hash predicates: iss.i_brand_id IS NOT DISTINCT FROM ics.i_brand_id, iss.i_category_id IS NOT DISTINCT FROM ics.i_category_id, iss.i_class_id IS NOT DISTINCT FROM ics.i_class_id
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=75 row-size=12B cardinality=148.80K
+  |  in pipelines: 191(GETNEXT), 195(OPEN)
+  |
+  191:AGGREGATE [FINALIZE]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=75 row-size=12B cardinality=148.80K
+  |  in pipelines: 191(GETNEXT), 89(OPEN)
+  |
+  190:EXCHANGE [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+     mem-estimate=773.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=75 row-size=12B cardinality=148.80K
+     in pipelines: 89(GETNEXT)
+
+F55:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=26.75MB mem-reservation=5.50MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F58, EXCHANGE=190, HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  94:AGGREGATE [STREAMING]
+  |  group by: iss.i_brand_id, iss.i_class_id, iss.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=75 row-size=12B cardinality=148.80K
+  |  in pipelines: 89(GETNEXT)
+  |
+  93:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=44
+  |  hash predicates: ss_sold_date_sk = d1.d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d1.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=72,73,74 row-size=40B cardinality=2.88M
+  |  in pipelines: 89(GETNEXT), 91(OPEN)
+  |
+  92:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=45
+  |  hash predicates: ss_item_sk = iss.i_item_sk
+  |  fk/pk conjuncts: ss_item_sk = iss.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=72,73 row-size=32B cardinality=2.88M
+  |  in pipelines: 89(GETNEXT), 90(OPEN)
+  |
+  89:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF091[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=72 row-size=12B cardinality=2.88M
+     in pipelines: 89(GETNEXT)
+
+F125:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=44 plan-id=45 cohort-id=11
+  |  build expressions: d1.d_date_sk
+  |  runtime filters: RF091[min_max] <- d1.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  189:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=74 row-size=8B cardinality=7.30K
+     in pipelines: 91(GETNEXT)
+
+F57:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F125, EXCHANGE=189, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  91:SCAN HDFS [tpcds_parquet.date_dim d1, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d1.d_year <= CAST(2001 AS INT), d1.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=74 row-size=8B cardinality=7.30K
+     in pipelines: 91(GETNEXT)
+
+F126:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=45 plan-id=46 cohort-id=11
+  |  build expressions: iss.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  188:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=73 row-size=20B cardinality=18.00K
+     in pipelines: 90(GETNEXT)
+
+F56:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F126, EXCHANGE=188, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  90:SCAN HDFS [tpcds_parquet.item iss, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=73 row-size=20B cardinality=18.00K
+     in pipelines: 90(GETNEXT)
+
+F118:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.30MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=37 plan-id=38 cohort-id=11
+  |  build expressions: i_brand_id, i_category_id, i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  202:EXCHANGE [HASH(i_brand_id,i_class_id,i_category_id)]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=71 row-size=20B cardinality=18.00K
+     in pipelines: 88(GETNEXT)
+
+F67:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=17.12MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F118, EXCHANGE=202, HASH(i_brand_id,i_class_id,i_category_id)]
+  |  mem-estimate=1.12MB mem-reservation=0B thread-reservation=0
+  88:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=71 row-size=20B cardinality=18.00K
+     in pipelines: 88(GETNEXT)
+
+F119:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.82MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=38 plan-id=39 cohort-id=11
+  |  build expressions: iws.i_brand_id, iws.i_category_id, iws.i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  201:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=169 row-size=12B cardinality=148.80K
+     in pipelines: 200(GETNEXT)
+
+F66:PLAN FRAGMENT [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=10.88MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F119, EXCHANGE=201, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  200:AGGREGATE [FINALIZE]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=169 row-size=12B cardinality=148.80K
+  |  in pipelines: 200(GETNEXT), 100(OPEN)
+  |
+  199:EXCHANGE [HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+     mem-estimate=903.88KB mem-reservation=0B thread-reservation=0
+     tuple-ids=169 row-size=12B cardinality=148.80K
+     in pipelines: 100(GETNEXT)
+
+F63:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=42.12MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F66, EXCHANGE=199, HASH(iws.i_brand_id,iws.i_class_id,iws.i_category_id)]
+  |  mem-estimate=128.00KB mem-reservation=0B thread-reservation=0
+  107:AGGREGATE [STREAMING]
+  |  group by: iws.i_brand_id, iws.i_class_id, iws.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=169 row-size=12B cardinality=148.80K
+  |  in pipelines: 100(GETNEXT)
+  |
+  104:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=39
+  |  hash predicates: ws_sold_date_sk = d3.d_date_sk
+  |  fk/pk conjuncts: ws_sold_date_sk = d3.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=81,82,83 row-size=40B cardinality=719.38K
+  |  in pipelines: 100(GETNEXT), 102(OPEN)
+  |
+  103:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=40
+  |  hash predicates: ws_item_sk = iws.i_item_sk
+  |  fk/pk conjuncts: ws_item_sk = iws.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=81,82 row-size=32B cardinality=719.38K
+  |  in pipelines: 100(GETNEXT), 101(OPEN)
+  |
+  100:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+     HDFS partitions=1/1 files=2 size=45.09MB
+     stored statistics:
+       table: rows=719.38K size=45.09MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=644.77K
+     file formats: [PARQUET]
+     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=81 row-size=12B cardinality=719.38K
+     in pipelines: 100(GETNEXT)
+
+F120:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=39 plan-id=40 cohort-id=12
+  |  build expressions: d3.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  198:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=83 row-size=8B cardinality=7.30K
+     in pipelines: 102(GETNEXT)
+
+F65:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F120, EXCHANGE=198, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  102:SCAN HDFS [tpcds_parquet.date_dim d3, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d3.d_year <= CAST(2001 AS INT), d3.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=83 row-size=8B cardinality=7.30K
+     in pipelines: 102(GETNEXT)
+
+F121:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=40 plan-id=41 cohort-id=12
+  |  build expressions: iws.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  197:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=82 row-size=20B cardinality=18.00K
+     in pipelines: 101(GETNEXT)
+
+F64:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F121, EXCHANGE=197, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  101:SCAN HDFS [tpcds_parquet.item iws, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=82 row-size=20B cardinality=18.00K
+     in pipelines: 101(GETNEXT)
+
+F122:PLAN FRAGMENT [HASH(iss.i_brand_id,iss.i_class_id,iss.i_category_id)] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=2.55MB mem-reservation=1.94MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=41 plan-id=42 cohort-id=11
+  |  build expressions: ics.i_brand_id, ics.i_category_id, ics.i_class_id
+  |  mem-estimate=1.94MB mem-reservation=1.94MB spill-buffer=64.00KB thread-reservation=0
+  |
+  196:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=168 row-size=12B cardinality=148.80K
+     in pipelines: 195(GETNEXT)
+
+F62:PLAN FRAGMENT [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=10.75MB mem-reservation=2.88MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F122, EXCHANGE=196, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=768.00KB mem-reservation=0B thread-reservation=0
+  195:AGGREGATE [FINALIZE]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=2.88MB spill-buffer=128.00KB thread-reservation=0
+  |  tuple-ids=168 row-size=12B cardinality=148.80K
+  |  in pipelines: 195(GETNEXT), 95(OPEN)
+  |
+  194:EXCHANGE [HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+     mem-estimate=629.25KB mem-reservation=0B thread-reservation=0
+     tuple-ids=168 row-size=12B cardinality=148.80K
+     in pipelines: 95(GETNEXT)
+
+F59:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=58.19MB mem-reservation=9.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F62, EXCHANGE=194, HASH(ics.i_brand_id,ics.i_class_id,ics.i_category_id)]
+  |  mem-estimate=192.00KB mem-reservation=0B thread-reservation=0
+  105:AGGREGATE [STREAMING]
+  |  group by: ics.i_brand_id, ics.i_class_id, ics.i_category_id
+  |  mem-estimate=10.00MB mem-reservation=5.00MB spill-buffer=256.00KB thread-reservation=0
+  |  tuple-ids=168 row-size=12B cardinality=148.80K
+  |  in pipelines: 95(GETNEXT)
+  |
+  99:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=42
+  |  hash predicates: cs_sold_date_sk = d2.d_date_sk
+  |  fk/pk conjuncts: cs_sold_date_sk = d2.d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=77,78,79 row-size=40B cardinality=1.44M
+  |  in pipelines: 95(GETNEXT), 97(OPEN)
+  |
+  98:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=43
+  |  hash predicates: cs_item_sk = ics.i_item_sk
+  |  fk/pk conjuncts: cs_item_sk = ics.i_item_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=77,78 row-size=32B cardinality=1.44M
+  |  in pipelines: 95(GETNEXT), 96(OPEN)
+  |
+  95:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+     HDFS partitions=1/1 files=3 size=96.62MB
+     stored statistics:
+       table: rows=1.44M size=96.62MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=650.14K
+     file formats: [PARQUET]
+     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+     tuple-ids=77 row-size=12B cardinality=1.44M
+     in pipelines: 95(GETNEXT)
+
+F123:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=42 plan-id=43 cohort-id=13
+  |  build expressions: d2.d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  193:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=79 row-size=8B cardinality=7.30K
+     in pipelines: 97(GETNEXT)
+
+F61:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F123, EXCHANGE=193, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  97:SCAN HDFS [tpcds_parquet.date_dim d2, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d2.d_year <= CAST(2001 AS INT), d2.d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=79 row-size=8B cardinality=7.30K
+     in pipelines: 97(GETNEXT)
+
+F124:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=43 plan-id=44 cohort-id=13
+  |  build expressions: ics.i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  192:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=78 row-size=20B cardinality=18.00K
+     in pipelines: 96(GETNEXT)
+
+F60:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F124, EXCHANGE=192, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  96:SCAN HDFS [tpcds_parquet.item ics, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=78 row-size=20B cardinality=18.00K
+     in pipelines: 96(GETNEXT)
+
+F127:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.12MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=46 plan-id=47 cohort-id=01
+  |  build expressions: i_item_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  187:EXCHANGE [BROADCAST]
+     mem-estimate=375.56KB mem-reservation=0B thread-reservation=0
+     tuple-ids=69 row-size=20B cardinality=18.00K
+     in pipelines: 86(GETNEXT)
+
+F54:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.09MB mem-reservation=256.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F127, EXCHANGE=187, BROADCAST]
+  |  mem-estimate=96.00KB mem-reservation=0B thread-reservation=0
+  86:SCAN HDFS [tpcds_parquet.item, RANDOM]
+     HDFS partitions=1/1 files=1 size=1.73MB
+     stored statistics:
+       table: rows=18.00K size=1.73MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=18.00K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=256.00KB thread-reservation=0
+     tuple-ids=69 row-size=20B cardinality=18.00K
+     in pipelines: 86(GETNEXT)
+
+F128:PLAN FRAGMENT [RANDOM] hosts=2 instances=2
+Per-Instance Resources: mem-estimate=8.77MB mem-reservation=8.75MB thread-reservation=1 runtime-filters-memory=1.00MB
+  JOIN BUILD
+  |  join-table-id=47 plan-id=48 cohort-id=01
+  |  build expressions: d_date_sk
+  |  runtime filters: RF076[bloom] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  186:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=70 row-size=12B cardinality=108
+     in pipelines: 87(GETNEXT)
+
+F53:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.06MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F128, EXCHANGE=186, BROADCAST]
+  |  mem-estimate=64.00KB mem-reservation=0B thread-reservation=0
+  87:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     parquet dictionary predicates: d_year = CAST(2001 AS INT), d_moy = CAST(11 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=70 row-size=12B cardinality=108
+     in pipelines: 87(GETNEXT)
+
+F81:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=16.02KB mem-reservation=0B thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=00 plan-id=01 cohort-id=01
+  |  mem-estimate=16B mem-reservation=0B thread-reservation=0
+  |
+  157:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=31 row-size=16B cardinality=1
+     in pipelines: 156(GETNEXT)
+
+F25:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=96.00KB mem-reservation=0B thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F81, EXCHANGE=157, BROADCAST]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  156:AGGREGATE [FINALIZE]
+  |  output: avg:merge(quantity * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=31 row-size=16B cardinality=1
+  |  in pipelines: 156(GETNEXT), 41(OPEN)
+  |
+  155:EXCHANGE [UNPARTITIONED]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=30 row-size=16B cardinality=1
+     in pipelines: 41(GETNEXT)
+
+F24:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=48.02MB mem-reservation=4.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F25, EXCHANGE=155, UNPARTITIONED]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  41:AGGREGATE
+  |  output: avg(CAST(quantity AS DECIMAL(10,0)) * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=30 row-size=16B cardinality=1
+  |  in pipelines: 41(GETNEXT), 32(OPEN), 35(OPEN), 38(OPEN)
+  |
+  31:UNION
+  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  tuple-ids=28 row-size=8B cardinality=5.04M
+  |  in pipelines: 32(GETNEXT), 35(GETNEXT), 38(GETNEXT)
+  |
+  |--40:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=03
+  |  |  hash predicates: ws_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: ws_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=26,27 row-size=20B cardinality=719.38K
+  |  |  in pipelines: 38(GETNEXT), 39(OPEN)
+  |  |
+  |  38:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+  |     HDFS partitions=1/1 files=2 size=45.09MB
+  |     stored statistics:
+  |       table: rows=719.38K size=45.09MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=644.77K
+  |     file formats: [PARQUET]
+  |     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=26 row-size=12B cardinality=719.38K
+  |     in pipelines: 38(GETNEXT)
+  |
+  |--37:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=02
+  |  |  hash predicates: cs_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: cs_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=24,25 row-size=20B cardinality=1.44M
+  |  |  in pipelines: 35(GETNEXT), 36(OPEN)
+  |  |
+  |  35:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+  |     HDFS partitions=1/1 files=3 size=96.62MB
+  |     stored statistics:
+  |       table: rows=1.44M size=96.62MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=650.14K
+  |     file formats: [PARQUET]
+  |     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=24 row-size=12B cardinality=1.44M
+  |     in pipelines: 35(GETNEXT)
+  |
+  34:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=01
+  |  hash predicates: ss_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=22,23 row-size=20B cardinality=2.88M
+  |  in pipelines: 32(GETNEXT), 33(OPEN)
+  |
+  32:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF031[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=22 row-size=12B cardinality=2.88M
+     in pipelines: 32(GETNEXT)
+
+F82:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=01 plan-id=02 cohort-id=02
+  |  build expressions: d_date_sk
+  |  runtime filters: RF031[min_max] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  152:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=23 row-size=8B cardinality=7.30K
+     in pipelines: 33(GETNEXT)
+
+F19:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F82, EXCHANGE=152, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  33:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=23 row-size=8B cardinality=7.30K
+     in pipelines: 33(GETNEXT)
+
+F83:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=02 plan-id=03 cohort-id=02
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  153:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=25 row-size=8B cardinality=7.30K
+     in pipelines: 36(GETNEXT)
+
+F21:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F83, EXCHANGE=153, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  36:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=25 row-size=8B cardinality=7.30K
+     in pipelines: 36(GETNEXT)
+
+F84:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=03 plan-id=04 cohort-id=02
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  154:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=27 row-size=8B cardinality=7.30K
+     in pipelines: 39(GETNEXT)
+
+F23:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F84, EXCHANGE=154, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  39:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=27 row-size=8B cardinality=7.30K
+     in pipelines: 39(GETNEXT)
+
+F85:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=16.02KB mem-reservation=0B thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=04 plan-id=05 cohort-id=01
+  |  mem-estimate=16B mem-reservation=0B thread-reservation=0
+  |
+  185:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=65 row-size=16B cardinality=1
+     in pipelines: 184(GETNEXT)
+
+F51:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=96.00KB mem-reservation=0B thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F85, EXCHANGE=185, BROADCAST]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  184:AGGREGATE [FINALIZE]
+  |  output: avg:merge(quantity * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=65 row-size=16B cardinality=1
+  |  in pipelines: 184(GETNEXT), 83(OPEN)
+  |
+  183:EXCHANGE [UNPARTITIONED]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=64 row-size=16B cardinality=1
+     in pipelines: 83(GETNEXT)
+
+F50:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=48.02MB mem-reservation=4.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F51, EXCHANGE=183, UNPARTITIONED]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  83:AGGREGATE
+  |  output: avg(CAST(quantity AS DECIMAL(10,0)) * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=64 row-size=16B cardinality=1
+  |  in pipelines: 83(GETNEXT), 74(OPEN), 77(OPEN), 80(OPEN)
+  |
+  73:UNION
+  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  tuple-ids=62 row-size=8B cardinality=5.04M
+  |  in pipelines: 74(GETNEXT), 77(GETNEXT), 80(GETNEXT)
+  |
+  |--82:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=07
+  |  |  hash predicates: ws_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: ws_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=60,61 row-size=20B cardinality=719.38K
+  |  |  in pipelines: 80(GETNEXT), 81(OPEN)
+  |  |
+  |  80:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+  |     HDFS partitions=1/1 files=2 size=45.09MB
+  |     stored statistics:
+  |       table: rows=719.38K size=45.09MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=644.77K
+  |     file formats: [PARQUET]
+  |     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=60 row-size=12B cardinality=719.38K
+  |     in pipelines: 80(GETNEXT)
+  |
+  |--79:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=06
+  |  |  hash predicates: cs_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: cs_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=58,59 row-size=20B cardinality=1.44M
+  |  |  in pipelines: 77(GETNEXT), 78(OPEN)
+  |  |
+  |  77:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+  |     HDFS partitions=1/1 files=3 size=96.62MB
+  |     stored statistics:
+  |       table: rows=1.44M size=96.62MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=650.14K
+  |     file formats: [PARQUET]
+  |     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=58 row-size=12B cardinality=1.44M
+  |     in pipelines: 77(GETNEXT)
+  |
+  76:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=05
+  |  hash predicates: ss_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=56,57 row-size=20B cardinality=2.88M
+  |  in pipelines: 74(GETNEXT), 75(OPEN)
+  |
+  74:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF067[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=56 row-size=12B cardinality=2.88M
+     in pipelines: 74(GETNEXT)
+
+F86:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=05 plan-id=06 cohort-id=03
+  |  build expressions: d_date_sk
+  |  runtime filters: RF067[min_max] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  180:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=57 row-size=8B cardinality=7.30K
+     in pipelines: 75(GETNEXT)
+
+F45:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F86, EXCHANGE=180, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  75:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=57 row-size=8B cardinality=7.30K
+     in pipelines: 75(GETNEXT)
+
+F87:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=06 plan-id=07 cohort-id=03
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  181:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=59 row-size=8B cardinality=7.30K
+     in pipelines: 78(GETNEXT)
+
+F47:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F87, EXCHANGE=181, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  78:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=59 row-size=8B cardinality=7.30K
+     in pipelines: 78(GETNEXT)
+
+F88:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=07 plan-id=08 cohort-id=03
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  182:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=61 row-size=8B cardinality=7.30K
+     in pipelines: 81(GETNEXT)
+
+F49:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F88, EXCHANGE=182, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  81:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=61 row-size=8B cardinality=7.30K
+     in pipelines: 81(GETNEXT)
+
+F89:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=16.02KB mem-reservation=0B thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=08 plan-id=09 cohort-id=01
+  |  mem-estimate=16B mem-reservation=0B thread-reservation=0
+  |
+  213:EXCHANGE [BROADCAST]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=99 row-size=16B cardinality=1
+     in pipelines: 212(GETNEXT)
+
+F77:PLAN FRAGMENT [UNPARTITIONED] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=96.00KB mem-reservation=0B thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F89, EXCHANGE=213, BROADCAST]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  212:AGGREGATE [FINALIZE]
+  |  output: avg:merge(quantity * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=99 row-size=16B cardinality=1
+  |  in pipelines: 212(GETNEXT), 125(OPEN)
+  |
+  211:EXCHANGE [UNPARTITIONED]
+     mem-estimate=16.00KB mem-reservation=0B thread-reservation=0
+     tuple-ids=98 row-size=16B cardinality=1
+     in pipelines: 125(GETNEXT)
+
+F76:PLAN FRAGMENT [RANDOM] hosts=3 instances=12
+Per-Instance Resources: mem-estimate=48.02MB mem-reservation=4.00MB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F77, EXCHANGE=211, UNPARTITIONED]
+  |  mem-estimate=80.00KB mem-reservation=0B thread-reservation=0
+  125:AGGREGATE
+  |  output: avg(CAST(quantity AS DECIMAL(10,0)) * list_price)
+  |  mem-estimate=16.00KB mem-reservation=0B spill-buffer=2.00MB thread-reservation=0
+  |  tuple-ids=98 row-size=16B cardinality=1
+  |  in pipelines: 125(GETNEXT), 116(OPEN), 119(OPEN), 122(OPEN)
+  |
+  115:UNION
+  |  mem-estimate=0B mem-reservation=0B thread-reservation=0
+  |  tuple-ids=96 row-size=8B cardinality=5.04M
+  |  in pipelines: 116(GETNEXT), 119(GETNEXT), 122(GETNEXT)
+  |
+  |--124:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=11
+  |  |  hash predicates: ws_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: ws_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=94,95 row-size=20B cardinality=719.38K
+  |  |  in pipelines: 122(GETNEXT), 123(OPEN)
+  |  |
+  |  122:SCAN HDFS [tpcds_parquet.web_sales, RANDOM]
+  |     HDFS partitions=1/1 files=2 size=45.09MB
+  |     stored statistics:
+  |       table: rows=719.38K size=45.09MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=644.77K
+  |     file formats: [PARQUET]
+  |     mem-estimate=32.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=94 row-size=12B cardinality=719.38K
+  |     in pipelines: 122(GETNEXT)
+  |
+  |--121:HASH JOIN [INNER JOIN, BROADCAST]
+  |  |  hash-table-id=10
+  |  |  hash predicates: cs_sold_date_sk = d_date_sk
+  |  |  fk/pk conjuncts: cs_sold_date_sk = d_date_sk
+  |  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  |  tuple-ids=92,93 row-size=20B cardinality=1.44M
+  |  |  in pipelines: 119(GETNEXT), 120(OPEN)
+  |  |
+  |  119:SCAN HDFS [tpcds_parquet.catalog_sales, RANDOM]
+  |     HDFS partitions=1/1 files=3 size=96.62MB
+  |     stored statistics:
+  |       table: rows=1.44M size=96.62MB
+  |       columns: all
+  |     extrapolated-rows=disabled max-scan-range-rows=650.14K
+  |     file formats: [PARQUET]
+  |     mem-estimate=48.00MB mem-reservation=4.00MB thread-reservation=0
+  |     tuple-ids=92 row-size=12B cardinality=1.44M
+  |     in pipelines: 119(GETNEXT)
+  |
+  118:HASH JOIN [INNER JOIN, BROADCAST]
+  |  hash-table-id=09
+  |  hash predicates: ss_sold_date_sk = d_date_sk
+  |  fk/pk conjuncts: ss_sold_date_sk = d_date_sk
+  |  mem-estimate=0B mem-reservation=0B spill-buffer=64.00KB thread-reservation=0
+  |  tuple-ids=90,91 row-size=20B cardinality=2.88M
+  |  in pipelines: 116(GETNEXT), 117(OPEN)
+  |
+  116:SCAN HDFS [tpcds_parquet.store_sales, RANDOM]
+     HDFS partitions=1824/1824 files=1824 size=199.44MB
+     runtime filters: RF103[min_max] -> ss_sold_date_sk
+     stored statistics:
+       table: rows=2.88M size=199.44MB
+       partitions: 1824/1824 rows=2.88M
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=130.09K
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=90 row-size=12B cardinality=2.88M
+     in pipelines: 116(GETNEXT)
+
+F90:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=09 plan-id=10 cohort-id=04
+  |  build expressions: d_date_sk
+  |  runtime filters: RF103[min_max] <- d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  208:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=91 row-size=8B cardinality=7.30K
+     in pipelines: 117(GETNEXT)
+
+F71:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F90, EXCHANGE=208, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  117:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=91 row-size=8B cardinality=7.30K
+     in pipelines: 117(GETNEXT)
+
+F91:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=10 plan-id=11 cohort-id=04
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  209:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=93 row-size=8B cardinality=7.30K
+     in pipelines: 120(GETNEXT)
+
+F73:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F91, EXCHANGE=209, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  120:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=93 row-size=8B cardinality=7.30K
+     in pipelines: 120(GETNEXT)
+
+F92:PLAN FRAGMENT [RANDOM] hosts=3 instances=3
+Per-Instance Resources: mem-estimate=7.82MB mem-reservation=7.75MB thread-reservation=1
+  JOIN BUILD
+  |  join-table-id=11 plan-id=12 cohort-id=04
+  |  build expressions: d_date_sk
+  |  mem-estimate=7.75MB mem-reservation=7.75MB spill-buffer=64.00KB thread-reservation=0
+  |
+  210:EXCHANGE [BROADCAST]
+     mem-estimate=69.07KB mem-reservation=0B thread-reservation=0
+     tuple-ids=95 row-size=8B cardinality=7.30K
+     in pipelines: 123(GETNEXT)
+
+F75:PLAN FRAGMENT [RANDOM] hosts=1 instances=1
+Per-Instance Resources: mem-estimate=16.05MB mem-reservation=512.00KB thread-reservation=1
+  DATASTREAM SINK [FRAGMENT=F92, EXCHANGE=210, BROADCAST]
+  |  mem-estimate=48.00KB mem-reservation=0B thread-reservation=0
+  123:SCAN HDFS [tpcds_parquet.date_dim, RANDOM]
+     HDFS partitions=1/1 files=1 size=2.15MB
+     predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     stored statistics:
+       table: rows=73.05K size=2.15MB
+       columns: all
+     extrapolated-rows=disabled max-scan-range-rows=73.05K
+     parquet statistics predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     parquet dictionary predicates: d_year <= CAST(2001 AS INT), d_year >= CAST(1999 AS INT)
+     file formats: [PARQUET]
+     mem-estimate=16.00MB mem-reservation=512.00KB thread-reservation=0
+     tuple-ids=95 row-size=8B cardinality=7.30K
+     in pipelines: 123(GETNEXT)
+====

[impala] 05/17: IMPALA-11744: Table mask view should preserve the original column order in Hive

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 9bf8607ce58a1a2573c8c2b0ebdf9179a1840429
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Fri Nov 25 09:22:30 2022 +0800

    IMPALA-11744: Table mask view should preserve the original column order in Hive
    
    Ranger provides column masking and row filtering policies to mask
    sensitive data for specific users/groups. When a table should be masked
    in a query, Impala replaces it with a table mask view that exposes the
    columns with masked expressions.
    
    After IMPALA-9661, only selected columns are exposed in the table mask
    view. However, the columns of the view are exposed in the order that
    they are registered. If the registering order differs from the column
    order in the table, STAR expansions will mismatch the columns.
    
    To be specific, let's say table 'tbl' with 3 columns a, b, c should be
    masked in the following query:
      select b, * from tbl;
    Ideally Impala should replace the TableRef of 'tbl' with a table mask
    view as:
      select b, * from (
        select mask(a) a, mask(b) b, mask(c) c from tbl
      ) t;
    
    Currently, the rewritten query is
      select b, * from (
        select mask(b) b, mask(a) a, mask(c) c from tbl
      ) t;
    This incorrectly expands the STAR as "b, a, c" in the re-analyze phase.
    
    The cause is that column 'b' is registered earlier than all other
    columns. This patch fixes it by sorting the selected columns based on
    their original order in the table.
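
The reordering described above can be illustrated with a minimal, self-contained sketch. This is not the code from the patch: the class `HiveOrderSketch` and the method `selectedInTableOrder` are hypothetical names, and plain strings stand in for Impala's `Column` objects. It assumes every registered column appears in the table's column list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class HiveOrderSketch {
  /**
   * Returns a copy of the registered columns sorted by their position in
   * tableColumns. Hypothetical helper: assumes each registered column is
   * present in tableColumns.
   */
  static List<String> selectedInTableOrder(
      List<String> tableColumns, List<String> registered) {
    List<String> result = new ArrayList<>(registered);
    // Sort by the index of each column in the table's own column list.
    result.sort(Comparator.comparingInt(tableColumns::indexOf));
    return result;
  }

  public static void main(String[] args) {
    // Table 'tbl' has columns a, b, c; "select b, * from tbl" registers b first.
    List<String> tableColumns = Arrays.asList("a", "b", "c");
    List<String> registered = Arrays.asList("b", "a", "c");
    // Prints [a, b, c], the order the table mask view should expose.
    System.out.println(selectedInTableOrder(tableColumns, registered));
  }
}
```

With the columns restored to table order, the STAR in the outer query expands as "a, b, c" rather than "b, a, c".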
    
    Tests:
     - Add tests for selecting STAR with normal columns on table and view.
    
    Backport Note for 4.1.2:
    Keep the import of Optional in Analyzer.java.
    Removed some tests because the virtual column input__file__name is not supported.
    
    Change-Id: Ic83d78312b19fa2c5ab88ac4f359bfabaeaabce6
    Reviewed-on: http://gerrit.cloudera.org:8080/19279
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../java/org/apache/impala/analysis/Analyzer.java  |   9 +-
 .../org/apache/impala/analysis/InlineViewRef.java  |   5 +
 .../java/org/apache/impala/analysis/TableRef.java  |  33 ++-
 .../queries/QueryTest/ranger_column_masking.test   | 231 +++++++++++++++++++++
 .../ranger_column_masking_and_row_filtering.test   |  88 ++++++++
 .../queries/QueryTest/ranger_row_filtering.test    |  98 +++++++++
 6 files changed, 459 insertions(+), 5 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
index c9d23db28..cfb5f2a05 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
@@ -934,7 +934,10 @@ public class Analyzer {
       dbName = resolvedTableRef.getTable().getDb().getName();
       tblName = resolvedTableRef.getTable().getName();
     }
-    List<Column> columns = resolvedTableRef.getColumns();
+    // The selected columns should be in the same relative order as they are in the
+    // corresponding Hive table so that the order of the SelectListItem's in the
+    // table mask view (if needs masking or filtering) would be correct.
+    List<Column> columns = resolvedTableRef.getSelectedColumnsInHiveOrder();
     TableMask tableMask = new TableMask(authChecker, dbName, tblName, columns, user_);
     try {
       if (resolvedTableRef instanceof CollectionTableRef) {
@@ -1701,7 +1704,9 @@ public class Analyzer {
   }
 
   /**
-   * Register scalar columns. Used in resolving column mask.
+   * Register columns for resolving column mask. The order in which columns are registered
+   * is not necessarily the same as the relative order of those columns in the
+   * corresponding Hive table.
    */
   public void registerColumnForMasking(SlotDescriptor slotDesc) {
     Preconditions.checkNotNull(slotDesc.getPath());
diff --git a/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java b/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
index a91070e85..40aed0864 100644
--- a/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/InlineViewRef.java
@@ -523,6 +523,11 @@ public class InlineViewRef extends TableRef {
     return queryStmt_.getColLabels();
   }
 
+  @Override
+  public List<Column> getColumnsInHiveOrder() {
+    return view_.getColumnsInHiveOrder();
+  }
+
   public FeView getView() { return view_; }
 
   public boolean isTableMaskingView() { return isTableMaskingView_; }
diff --git a/fe/src/main/java/org/apache/impala/analysis/TableRef.java b/fe/src/main/java/org/apache/impala/analysis/TableRef.java
index ed3d813df..ad6df20c0 100644
--- a/fe/src/main/java/org/apache/impala/analysis/TableRef.java
+++ b/fe/src/main/java/org/apache/impala/analysis/TableRef.java
@@ -21,6 +21,7 @@ import static org.apache.impala.analysis.ToSqlOptions.DEFAULT;
 
 import java.util.ArrayList;
 import java.util.Collections;
+import java.util.HashMap;
 import java.util.HashSet;
 import java.util.LinkedHashMap;
 import java.util.List;
@@ -156,7 +157,7 @@ public class TableRef extends StmtNode {
   protected boolean exposeNestedColumnsByTableMaskView_ = false;
 
   // Columns referenced in the query. Used in resolving column mask.
-  protected Map<String, Column> columns_ = new LinkedHashMap<>();
+  protected Map<String, Column> columns_ = new HashMap<>();
 
   // Time travel spec of this table ref. It contains information specified in the
   // FOR SYSTEM_TIME AS OF <timestamp> or FOR SYSTEM_TIME AS OF <version> clause.
@@ -778,8 +779,34 @@ public class TableRef extends StmtNode {
     columns_.put(column.getName(), column);
   }
 
-  public List<Column> getColumns() {
-    return new ArrayList<>(columns_.values());
+  /**
+   * @return an unmodifiable list of all columns, but with partition columns at the end of
+   * the list rather than the beginning. This is equivalent to the order in which Hive
+   * enumerates columns.
+   */
+  public List<Column> getColumnsInHiveOrder() {
+    return getTable().getColumnsInHiveOrder();
+  }
+
+  public List<Column> getSelectedColumnsInHiveOrder() {
+    // Map from column name to the Column object (null if not selected).
+    // Use LinkedHashMap to preserve the order.
+    Map<String, Column> colSelection = new LinkedHashMap<>();
+    for (Column c : getColumnsInHiveOrder()) {
+      colSelection.put(c.getName(), null);
+    }
+    // Update 'colSelection' with selected columns. Virtual columns will also be added.
+    for (String colName : columns_.keySet()) {
+      colSelection.put(colName, columns_.get(colName));
+    }
+    List<Column> res = new ArrayList<>();
+    for (Column c : colSelection.values()) {
+      if (c != null) res.add(c);
+    }
+    // Make sure not missing any columns
+    Preconditions.checkState(res.size() == columns_.size(),
+        "missing columns: " + res.size() + " != " + columns_.size());
+    return res;
   }
 
   void migratePropertiesTo(TableRef other) {
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
index c0b4a3853..1f0dac113 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking.test
@@ -75,6 +75,111 @@ select * from functional.alltypestiny t
 INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
 ====
 ---- QUERY
+# Test on star select item with other columns
+select int_col, * from functional.alltypestiny
+---- RESULTS
+0,0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1
+1,100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1
+0,200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2
+1,300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2
+0,400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3
+1,500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3
+0,600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4
+1,700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+select string_col, * from functional.alltypestiny
+---- RESULTS
+'0aaa',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1
+'1aaa',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1
+'0aaa',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2
+'1aaa',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2
+'0aaa',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3
+'1aaa',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3
+'0aaa',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4
+'1aaa',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+select string_col, month, * from functional.alltypestiny
+---- RESULTS
+'0aaa',1,0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1
+'1aaa',1,100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1
+'0aaa',2,200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2
+'1aaa',2,300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2
+'0aaa',3,400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3
+'1aaa',3,500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3
+'0aaa',4,600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4
+'1aaa',4,700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4
+---- TYPES
+STRING,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+select *, int_col from functional.alltypestiny
+---- RESULTS
+0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,0
+100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,1
+200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,0
+300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,1
+400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,0
+500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,1
+600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,0
+700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,1
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+select *, string_col from functional.alltypestiny
+---- RESULTS
+0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,'0aaa'
+100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,'1aaa'
+200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,'0aaa'
+300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,'1aaa'
+400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,'0aaa'
+500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,'1aaa'
+600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,'0aaa'
+700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,'1aaa'
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,STRING
+====
+---- QUERY
+# Test on star select item with other columns
+select string_col, *, int_col from functional.alltypestiny
+---- RESULTS
+'0aaa',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,0
+'1aaa',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,1
+'0aaa',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,0
+'1aaa',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,1
+'0aaa',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,0
+'1aaa',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,1
+'0aaa',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,0
+'1aaa',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,1
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+select string_col, *, month, *, int_col from functional.alltypestiny
+---- RESULTS
+'0aaa',0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,1,0,NULL,0,0,0,0,0,0,'01/01/09','0aaa',2009-01-01 00:00:00,2009,1,0
+'1aaa',100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,1,100,NULL,1,1,1,10,1.100000023841858,10.1,'01/01/09','1aaa',2009-01-01 00:01:00,2009,1,1
+'0aaa',200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,2,200,NULL,0,0,0,0,0,0,'02/01/09','0aaa',2009-02-01 00:00:00,2009,2,0
+'1aaa',300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,2,300,NULL,1,1,1,10,1.100000023841858,10.1,'02/01/09','1aaa',2009-02-01 00:01:00,2009,2,1
+'0aaa',400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,3,400,NULL,0,0,0,0,0,0,'03/01/09','0aaa',2009-03-01 00:00:00,2009,3,0
+'1aaa',500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,3,500,NULL,1,1,1,10,1.100000023841858,10.1,'03/01/09','1aaa',2009-03-01 00:01:00,2009,3,1
+'0aaa',600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,4,600,NULL,0,0,0,0,0,0,'04/01/09','0aaa',2009-04-01 00:00:00,2009,4,0
+'1aaa',700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,4,700,NULL,1,1,1,10,1.100000023841858,10.1,'04/01/09','1aaa',2009-04-01 00:01:00,2009,4,1
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Test on predicate. Should evaluate on masked values.
 select * from functional.alltypestiny where id = 1
 ---- RESULTS
@@ -215,6 +320,132 @@ order by id limit 10
 INT,BOOLEAN,INT,STRING
 ====
 ---- QUERY
+# Test on star select item with other columns on view
+select int_col, * from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+0,0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1
+1,100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1
+2,200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1
+3,300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1
+4,400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1
+5,500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1
+6,600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1
+7,700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1
+8,800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1
+9,900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns on view
+select string_col, * from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+'vvv0ttt',0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1
+'vvv1ttt',100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1
+'vvv2ttt',200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1
+'vvv3ttt',300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1
+'vvv4ttt',400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1
+'vvv5ttt',500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1
+'vvv6ttt',600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1
+'vvv7ttt',700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1
+'vvv8ttt',800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1
+'vvv9ttt',900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns on view
+select string_col, month, * from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+'vvv0ttt',1,0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1
+'vvv1ttt',1,100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1
+'vvv2ttt',1,200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1
+'vvv3ttt',1,300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1
+'vvv4ttt',1,400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1
+'vvv5ttt',1,500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1
+'vvv6ttt',1,600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1
+'vvv7ttt',1,700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1
+'vvv8ttt',1,800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1
+'vvv9ttt',1,900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1
+---- TYPES
+STRING,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns on view
+select *, int_col from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,0
+100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,1
+200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,2
+300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,3
+400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,4
+500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,5
+600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,6
+700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,7
+800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1,8
+900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1,9
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns on view
+select *, string_col from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,'vvv0ttt'
+100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,'vvv1ttt'
+200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,'vvv2ttt'
+300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,'vvv3ttt'
+400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,'vvv4ttt'
+500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,'vvv5ttt'
+600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,'vvv6ttt'
+700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,'vvv7ttt'
+800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1,'vvv8ttt'
+900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1,'vvv9ttt'
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,STRING
+====
+---- QUERY
+# Test on star select item with other columns on view
+select string_col, *, int_col from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+'vvv0ttt',0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,0
+'vvv1ttt',100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,1
+'vvv2ttt',200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,2
+'vvv3ttt',300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,3
+'vvv4ttt',400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,4
+'vvv5ttt',500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,5
+'vvv6ttt',600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,6
+'vvv7ttt',700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,7
+'vvv8ttt',800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1,8
+'vvv9ttt',900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1,9
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns on view
+select string_col, *, month, *, int_col from functional.alltypes_view
+order by id limit 10
+---- RESULTS
+'vvv0ttt',0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,1,0,true,0,0,0,0,0.0,0.0,'01/01/09','vvv0ttt',2009-01-01 00:00:00,2009,1,0
+'vvv1ttt',100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,1,100,false,1,1,1,10,1.10000002384,10.1,'01/01/09','vvv1ttt',2009-01-01 00:01:00,2009,1,1
+'vvv2ttt',200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,1,200,true,2,2,2,20,2.20000004768,20.2,'01/01/09','vvv2ttt',2009-01-01 00:02:00.100000000,2009,1,2
+'vvv3ttt',300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,1,300,false,3,3,3,30,3.29999995232,30.3,'01/01/09','vvv3ttt',2009-01-01 00:03:00.300000000,2009,1,3
+'vvv4ttt',400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,1,400,true,4,4,4,40,4.40000009537,40.4,'01/01/09','vvv4ttt',2009-01-01 00:04:00.600000000,2009,1,4
+'vvv5ttt',500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,1,500,false,5,5,5,50,5.5,50.5,'01/01/09','vvv5ttt',2009-01-01 00:05:00.100000000,2009,1,5
+'vvv6ttt',600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,1,600,true,6,6,6,60,6.59999990463,60.6,'01/01/09','vvv6ttt',2009-01-01 00:06:00.150000000,2009,1,6
+'vvv7ttt',700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,1,700,false,7,7,7,70,7.69999980927,70.7,'01/01/09','vvv7ttt',2009-01-01 00:07:00.210000000,2009,1,7
+'vvv8ttt',800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1,1,800,true,8,8,8,80,8.80000019073,80.8,'01/01/09','vvv8ttt',2009-01-01 00:08:00.280000000,2009,1,8
+'vvv9ttt',900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1,1,900,false,9,9,9,90,9.89999961853,90.9,'01/01/09','vvv9ttt',2009-01-01 00:09:00.360000000,2009,1,9
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Test on local view (CTE). Correctly ignore masking on local view names so the result
 # won't be 100 (affected by policy id => id * 100).
 use functional;
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
index 848ad4fd2..0b8eec575 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_and_row_filtering.test
@@ -14,6 +14,46 @@ from functional.alltypestiny
 INT,BOOLEAN,INT,STRING,STRING,INT,INT
 ====
 ---- QUERY
+# Test on star select item
+select * from functional.alltypestiny
+---- RESULTS
+100,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-01-01 00:00:00,2009,1
+103,false,1,1,1,10,1.100000023841858,10.1,'nn/nn/nn','NULL',2009-02-01 00:01:00,2009,2
+106,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-04-01 00:00:00,2009,4
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select int_col, * from functional.alltypestiny
+---- RESULTS
+0,100,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-01-01 00:00:00,2009,1
+1,103,false,1,1,1,10,1.100000023841858,10.1,'nn/nn/nn','NULL',2009-02-01 00:01:00,2009,2
+0,106,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-04-01 00:00:00,2009,4
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, * from functional.alltypestiny
+---- RESULTS
+'NULL',100,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-01-01 00:00:00,2009,1
+'NULL',103,false,1,1,1,10,1.100000023841858,10.1,'nn/nn/nn','NULL',2009-02-01 00:01:00,2009,2
+'NULL',106,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-04-01 00:00:00,2009,4
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, *, int_col from functional.alltypestiny
+---- RESULTS
+'NULL',100,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-01-01 00:00:00,2009,1,0
+'NULL',103,false,1,1,1,10,1.100000023841858,10.1,'nn/nn/nn','NULL',2009-02-01 00:01:00,2009,2,1
+'NULL',106,true,0,0,0,0,0,0,'nn/nn/nn','NULL',2009-04-01 00:00:00,2009,4,0
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Column-masking policies of functional.alltypes mask "id" to "-id" and redact column
 # "date_string_col". Row-filtering policy of functional.alltypes_view keeps rows with
 # "id >= -8 and date_string_col = 'nn/nn/nn'". functional.alltypes_view is a view based
@@ -34,6 +74,54 @@ select id, bool_col, date_string_col, year, month from functional.alltypes_view
 INT,BOOLEAN,STRING,INT,INT
 ====
 ---- QUERY
+# Test on star select item with normal columns
+select int_col, * from functional.alltypes_view
+---- RESULTS
+0,0,true,0,0,0,0,0.0,0.0,'nn/nn/nn','0',2009-01-01 00:00:00,2009,1
+1,-1,false,1,1,1,10,1.10000002384,10.1,'nn/nn/nn','1',2009-01-01 00:01:00,2009,1
+2,-2,true,2,2,2,20,2.20000004768,20.2,'nn/nn/nn','2',2009-01-01 00:02:00.100000000,2009,1
+3,-3,false,3,3,3,30,3.29999995232,30.3,'nn/nn/nn','3',2009-01-01 00:03:00.300000000,2009,1
+4,-4,true,4,4,4,40,4.40000009537,40.4,'nn/nn/nn','4',2009-01-01 00:04:00.600000000,2009,1
+5,-5,false,5,5,5,50,5.5,50.5,'nn/nn/nn','5',2009-01-01 00:05:00.100000000,2009,1
+6,-6,true,6,6,6,60,6.59999990463,60.6,'nn/nn/nn','6',2009-01-01 00:06:00.150000000,2009,1
+7,-7,false,7,7,7,70,7.69999980927,70.7,'nn/nn/nn','7',2009-01-01 00:07:00.210000000,2009,1
+8,-8,true,8,8,8,80,8.80000019073,80.8,'nn/nn/nn','8',2009-01-01 00:08:00.280000000,2009,1
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, * from functional.alltypes_view
+---- RESULTS
+'0',0,true,0,0,0,0,0.0,0.0,'nn/nn/nn','0',2009-01-01 00:00:00,2009,1
+'1',-1,false,1,1,1,10,1.10000002384,10.1,'nn/nn/nn','1',2009-01-01 00:01:00,2009,1
+'2',-2,true,2,2,2,20,2.20000004768,20.2,'nn/nn/nn','2',2009-01-01 00:02:00.100000000,2009,1
+'3',-3,false,3,3,3,30,3.29999995232,30.3,'nn/nn/nn','3',2009-01-01 00:03:00.300000000,2009,1
+'4',-4,true,4,4,4,40,4.40000009537,40.4,'nn/nn/nn','4',2009-01-01 00:04:00.600000000,2009,1
+'5',-5,false,5,5,5,50,5.5,50.5,'nn/nn/nn','5',2009-01-01 00:05:00.100000000,2009,1
+'6',-6,true,6,6,6,60,6.59999990463,60.6,'nn/nn/nn','6',2009-01-01 00:06:00.150000000,2009,1
+'7',-7,false,7,7,7,70,7.69999980927,70.7,'nn/nn/nn','7',2009-01-01 00:07:00.210000000,2009,1
+'8',-8,true,8,8,8,80,8.80000019073,80.8,'nn/nn/nn','8',2009-01-01 00:08:00.280000000,2009,1
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, *, int_col from functional.alltypes_view
+---- RESULTS
+'0',0,true,0,0,0,0,0.0,0.0,'nn/nn/nn','0',2009-01-01 00:00:00,2009,1,0
+'1',-1,false,1,1,1,10,1.10000002384,10.1,'nn/nn/nn','1',2009-01-01 00:01:00,2009,1,1
+'2',-2,true,2,2,2,20,2.20000004768,20.2,'nn/nn/nn','2',2009-01-01 00:02:00.100000000,2009,1,2
+'3',-3,false,3,3,3,30,3.29999995232,30.3,'nn/nn/nn','3',2009-01-01 00:03:00.300000000,2009,1,3
+'4',-4,true,4,4,4,40,4.40000009537,40.4,'nn/nn/nn','4',2009-01-01 00:04:00.600000000,2009,1,4
+'5',-5,false,5,5,5,50,5.5,50.5,'nn/nn/nn','5',2009-01-01 00:05:00.100000000,2009,1,5
+'6',-6,true,6,6,6,60,6.59999990463,60.6,'nn/nn/nn','6',2009-01-01 00:06:00.150000000,2009,1,6
+'7',-7,false,7,7,7,70,7.69999980927,70.7,'nn/nn/nn','7',2009-01-01 00:07:00.210000000,2009,1,7
+'8',-8,true,8,8,8,80,8.80000019073,80.8,'nn/nn/nn','8',2009-01-01 00:08:00.280000000,2009,1,8
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Test with expr rewrite rules.
 select id, bool_col, string_col,
   if (id <=> id, 0, 1),
diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
index c1ef6b99a..c303ec362 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_row_filtering.test
@@ -11,6 +11,54 @@ select * from functional.alltypestiny
 INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
 ====
 ---- QUERY
+# Test on star select item with other columns
+# Row-filtering policy keeps rows with "id % 2 = 0"
+select int_col, * from functional.alltypestiny
+---- RESULTS
+0,0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+0,2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2
+0,4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3
+0,6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+# Row-filtering policy keeps rows with "id % 2 = 0"
+select string_col, * from functional.alltypestiny
+---- RESULTS
+'0',0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+'0',2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2
+'0',4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3
+'0',6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+# Row-filtering policy keeps rows with "id % 2 = 0"
+select string_col, *, int_col from functional.alltypestiny
+---- RESULTS
+'0',0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1,0
+'0',2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2,0
+'0',4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3,0
+'0',6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4,0
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
+# Test on star select item with other columns
+# Row-filtering policy keeps rows with "id % 2 = 0"
+select string_col, *, month, *, int_col from functional.alltypestiny
+---- RESULTS
+'0',0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1,1,0,true,0,0,0,0,0,0,'01/01/09','0',2009-01-01 00:00:00,2009,1,0
+'0',2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2,2,2,true,0,0,0,0,0,0,'02/01/09','0',2009-02-01 00:00:00,2009,2,0
+'0',4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3,3,4,true,0,0,0,0,0,0,'03/01/09','0',2009-03-01 00:00:00,2009,3,0
+'0',6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4,4,6,true,0,0,0,0,0,0,'04/01/09','0',2009-04-01 00:00:00,2009,4,0
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # Row-filtering policy keeps rows with
 # "(string_col = '0' and id <= 0) or (string_col = '1' and bool_col = true and id > 90)"
 select id, string_col, bool_col, year, month from functional.alltypessmall
@@ -232,6 +280,56 @@ from functional.alltypes_view where id % 2 = 0
 INT,BOOLEAN,INT,STRING,STRING,INT,INT
 ====
 ---- QUERY
+# Test on star select item
+# Row-filtering policy on the view keeps rows with "id < 5". Row-filtering policy on the
+# underlying table 'alltypes' keeps rows with "year = 2009 and month = 1".
+select * from functional.alltypes_view
+---- RESULTS
+0,true,0,0,0,0,0.0,0.0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+1,false,1,1,1,10,1.10000002384,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1
+2,true,2,2,2,20,2.20000004768,20.2,'01/01/09','2',2009-01-01 00:02:00.100000000,2009,1
+3,false,3,3,3,30,3.29999995232,30.3,'01/01/09','3',2009-01-01 00:03:00.300000000,2009,1
+4,true,4,4,4,40,4.40000009537,40.4,'01/01/09','4',2009-01-01 00:04:00.600000000,2009,1
+---- TYPES
+INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select int_col, * from functional.alltypes_view
+---- RESULTS
+0,0,true,0,0,0,0,0.0,0.0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+1,1,false,1,1,1,10,1.10000002384,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1
+2,2,true,2,2,2,20,2.20000004768,20.2,'01/01/09','2',2009-01-01 00:02:00.100000000,2009,1
+3,3,false,3,3,3,30,3.29999995232,30.3,'01/01/09','3',2009-01-01 00:03:00.300000000,2009,1
+4,4,true,4,4,4,40,4.40000009537,40.4,'01/01/09','4',2009-01-01 00:04:00.600000000,2009,1
+---- TYPES
+INT,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, * from functional.alltypes_view
+---- RESULTS
+'0',0,true,0,0,0,0,0.0,0.0,'01/01/09','0',2009-01-01 00:00:00,2009,1
+'1',1,false,1,1,1,10,1.10000002384,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1
+'2',2,true,2,2,2,20,2.20000004768,20.2,'01/01/09','2',2009-01-01 00:02:00.100000000,2009,1
+'3',3,false,3,3,3,30,3.29999995232,30.3,'01/01/09','3',2009-01-01 00:03:00.300000000,2009,1
+'4',4,true,4,4,4,40,4.40000009537,40.4,'01/01/09','4',2009-01-01 00:04:00.600000000,2009,1
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT
+====
+---- QUERY
+# Test on star select item with normal columns
+select string_col, *, int_col from functional.alltypes_view
+---- RESULTS
+'0',0,true,0,0,0,0,0.0,0.0,'01/01/09','0',2009-01-01 00:00:00,2009,1,0
+'1',1,false,1,1,1,10,1.10000002384,10.1,'01/01/09','1',2009-01-01 00:01:00,2009,1,1
+'2',2,true,2,2,2,20,2.20000004768,20.2,'01/01/09','2',2009-01-01 00:02:00.100000000,2009,1,2
+'3',3,false,3,3,3,30,3.29999995232,30.3,'01/01/09','3',2009-01-01 00:03:00.300000000,2009,1,3
+'4',4,true,4,4,4,40,4.40000009537,40.4,'01/01/09','4',2009-01-01 00:04:00.600000000,2009,1,4
+---- TYPES
+STRING,INT,BOOLEAN,TINYINT,SMALLINT,INT,BIGINT,FLOAT,DOUBLE,STRING,STRING,TIMESTAMP,INT,INT,INT
+====
+---- QUERY
 # The query has no results since the where-clause is the opposite of the row-filter expr.
 select * from functional.alltypes_view where id >= 5
 ---- RESULTS

[impala] 12/17: IMPALA-11953: Declare num_trues and num_falses in TIntermediateColumnStats as optional

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit c6223b2aeb8ae23a094551aa2abc8fab75e13165
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Feb 28 08:48:50 2023 +0800

    IMPALA-11953: Declare num_trues and num_falses in TIntermediateColumnStats as optional
    
    TIntermediateColumnStats is the representation of incremental stats
    which are stored in HMS partition properties using keys like
    "impala_intermediate_stats_chunk0", "impala_intermediate_stats_chunk1",
    "impala_intermediate_stats_chunk2", etc.
    
    Fields in TIntermediateColumnStats should be optional to ensure
    backward compatibility. IMPALA-8205 added two required fields,
    num_trues and num_falses, to TIntermediateColumnStats. This breaks
    incremental stats loading in newer versions of Impala when the stats
    were generated by older Impala versions (< 4.0), whose serialized stats
    do not contain these fields. This patch makes the two fields optional.
    
    Tests:
     - Verified that incremental stats generated by a CDH Impala cluster can
       be loaded by a CDP Impala cluster with this fix.
    
    Change-Id: I4f74d5d0676e7ce9eb4ea8061a15610846db3ca5
    Reviewed-on: http://gerrit.cloudera.org:8080/19555
    Reviewed-by: Riza Suminto <ri...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 common/thrift/CatalogObjects.thrift | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)
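
The compatibility problem described in the commit message can be illustrated with a minimal sketch. This is not Thrift-generated code and does not use the Thrift library: `SerializedFields`, `IntermediateStats`, `GetField`, `DecodeOptional` and `DecodeRequired` are hypothetical names, and the map of field ids stands in for a payload written by an older Impala that never emitted fields 7 and 8.

```cpp
#include <cstdint>
#include <initializer_list>
#include <map>
#include <optional>
#include <stdexcept>

// Hypothetical stand-in for a serialized struct: field id -> value.
using SerializedFields = std::map<int, std::int64_t>;

// Simplified view of three of the i64 fields in the struct.
struct IntermediateStats {
  std::optional<std::int64_t> num_rows;    // field 6
  std::optional<std::int64_t> num_trues;   // field 7
  std::optional<std::int64_t> num_falses;  // field 8
};

// Returns the value of field 'id', or std::nullopt if it was never written.
inline std::optional<std::int64_t> GetField(const SerializedFields& in, int id) {
  auto it = in.find(id);
  if (it == in.end()) return std::nullopt;
  return it->second;
}

// Reader that tolerates missing fields (the "optional" behaviour).
inline IntermediateStats DecodeOptional(const SerializedFields& in) {
  IntermediateStats out;
  out.num_rows = GetField(in, 6);
  out.num_trues = GetField(in, 7);
  out.num_falses = GetField(in, 8);
  return out;
}

// Reader that rejects payloads lacking fields 7 and 8 (the "required" behaviour).
inline IntermediateStats DecodeRequired(const SerializedFields& in) {
  for (int id : {7, 8}) {
    if (in.count(id) == 0) throw std::runtime_error("required field is unset");
  }
  return DecodeOptional(in);
}
```

With a payload that only carries field 6, `DecodeOptional` succeeds and leaves the two counters unset, while `DecodeRequired` throws. That is the shape of the failure this patch avoids.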

diff --git a/common/thrift/CatalogObjects.thrift b/common/thrift/CatalogObjects.thrift
index 8aa1116c6..7f095fbc1 100644
--- a/common/thrift/CatalogObjects.thrift
+++ b/common/thrift/CatalogObjects.thrift
@@ -201,6 +201,8 @@ struct TColumnStats {
 
 // Intermediate state for the computation of per-column stats. Impala can aggregate these
 // structures together to produce final stats for a column.
+// Fields should be optional for backward compatibility since this is stored in HMS
+// partition properties.
 struct TIntermediateColumnStats {
   // One byte for each bucket of the NDV HLL computation
   1: optional binary intermediate_ndv
@@ -221,8 +223,8 @@ struct TIntermediateColumnStats {
   6: optional i64 num_rows
 
   // The number of true and false value, of the column
-  7: required i64 num_trues
-  8: required i64 num_falses
+  7: optional i64 num_trues
+  8: optional i64 num_falses
 
   // The low and the high value
   9: optional Data.TColumnValue low_value

[impala] 10/17: IMPALA-11845: (Addendum) Don't specify db name in the new struct tests

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit faae4a513c504b28c83b0417b368f2eff8f4c91c
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Fri Feb 3 09:05:36 2023 +0800

    IMPALA-11845: (Addendum) Don't specify db name in the new struct tests
    
    Some new tests were added for STAR expansion on struct types when the
    table is masked by Ranger masking policies. They are run on both
    Parquet and ORC tables. However, some tests explicitly use
    'functional_parquet' as the db name, which loses the coverage on ORC
    tables. This patch removes the explicit db names.
    
    Change-Id: I8efea5cc2e10d8ae50ee6c1201e325932cb27fbf
    Reviewed-on: http://gerrit.cloudera.org:8080/19470
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../queries/QueryTest/ranger_column_masking_complex_types.test    | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
index e088aeb12..92027f1f0 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/ranger_column_masking_complex_types.test
@@ -77,7 +77,7 @@ INT
 ---- QUERY
 # Test resolving explicit STAR path on a nested struct column inside array
 select id, nested_arr.item.*
-from functional_parquet.complextypestbl t,
+from complextypestbl t,
   t.nested_struct.c.d arr,
   arr.item nested_arr;
 ---- RESULTS
@@ -99,7 +99,7 @@ BIGINT,INT,STRING
 ---- QUERY
 # Test resolving explicit STAR path on a nested struct column inside array
 select nested_arr.item.*
-from functional_parquet.complextypestbl t,
+from complextypestbl t,
   t.nested_struct.c.d arr,
   arr.item nested_arr;
 ---- RESULTS
@@ -121,7 +121,7 @@ INT,STRING
 ---- QUERY
 # Test resolving implicit STAR path on a nested struct column inside array
 select id, nested_arr.*
-from functional_parquet.complextypestbl t,
+from complextypestbl t,
   t.nested_struct.c.d arr,
   arr.item nested_arr;
 ---- RESULTS
@@ -143,7 +143,7 @@ BIGINT,INT,STRING
 ---- QUERY
 # Test resolving explicit STAR path on a nested struct column inside array
 select nested_arr.*
-from functional_parquet.complextypestbl t,
+from complextypestbl t,
   t.nested_struct.c.d arr,
   arr.item nested_arr;
 ---- RESULTS

[impala] 15/17: IMPALA-11751: (Addendum) fix test for Ozone

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 43b01859bd127d1453c88a3bbb76217a106cfca1
Author: Michael Smith <mi...@cloudera.com>
AuthorDate: Thu Dec 8 09:57:25 2022 -0800

    IMPALA-11751: (Addendum) fix test for Ozone
    
    Use FILESYSTEM_PREFIX to make test relocatable, as required by the Ozone
    test environment.
    
    Testing: ran test with Ozone.
    
    Change-Id: Ic16322d90bd4039ec5ce2a54be79c748ee822978
    Reviewed-on: http://gerrit.cloudera.org:8080/19330
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../functional-query/queries/QueryTest/compute-stats-avro.test        | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
index 387fd9bed..f6f453205 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/compute-stats-avro.test
@@ -57,7 +57,7 @@ STRING, STRING, BIGINT, BIGINT, BIGINT, DOUBLE, BIGINT, BIGINT
 create external table avro_hive_alltypes_ext
 like functional_avro_snap.alltypes;
 alter table avro_hive_alltypes_ext
-set location '/test-warehouse/alltypes_avro_snap';
+set location '$FILESYSTEM_PREFIX/test-warehouse/alltypes_avro_snap';
 alter table avro_hive_alltypes_ext recover partitions;
 compute stats avro_hive_alltypes_ext;
 ---- RESULTS
@@ -141,7 +141,7 @@ create external table avro_hive_alltypes_str_part (
   month string
 )
 stored as avro
-location '/test-warehouse/alltypes_avro_snap';
+location '$FILESYSTEM_PREFIX/test-warehouse/alltypes_avro_snap';
 alter table avro_hive_alltypes_str_part recover partitions;
 compute stats avro_hive_alltypes_str_part;
 ---- RESULTS

[impala] 06/17: IMPALA-11779: Fix crash in TopNNode due to slots in null type

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit dff569b7e388b0ece08e04d9cc13d2a8a96d629e
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Sat Dec 10 17:10:38 2022 +0800

    IMPALA-11779: Fix crash in TopNNode due to slots in null type
    
    BE can't codegen or evaluate exprs of NULL type. So when FE transfers
    exprs to BE (via thrift), it converts exprs of NULL type into
    NullLiterals of Boolean type, e.g. see the code in Expr#treeToThrift().
    The type doesn't matter since ScalarExprEvaluator::GetValue() in BE
    returns nullptr for null values of all types, and nullptr is treated as
    a null value.
    
    Most of the exprs in BE are generated from thrift TExprs transferred
    from FE, which guarantees they are not NULL type exprs. However, in
    TopNPlanNode::Init(), we create SlotRefs directly based on the sort
    tuple descriptor. If there are NULL type slots in the tuple descriptor,
    we get SlotRefs of NULL type, which crash codegen or evaluation (if
    codegen is disabled).
    
    This patch adds a type-safe create method for SlotRef which uses
    TYPE_BOOLEAN for TYPE_NULL. BE code that creates SlotRefs directly from
    SlotDescriptors now calls this create method, which guarantees that no
    TYPE_NULL exprs are used in the corresponding evaluators.
    
    Tests:
     - Added new tests in partitioned-top-n.test
     - Ran exhaustive tests
    
    Change-Id: I6aaf80c5129eaf788c70c8f041021eaf73087f94
    Reviewed-on: http://gerrit.cloudera.org:8080/19336
    Reviewed-by: Zoltan Borok-Nagy <bo...@cloudera.com>
    Tested-by: Quanlong Huang <hu...@gmail.com>
---
 be/src/exec/grouping-aggregator.cc                 |  6 ++--
 be/src/exec/topn-node.cc                           |  3 +-
 be/src/exprs/slot-ref.cc                           | 10 ++++++
 be/src/exprs/slot-ref.h                            |  4 +++
 .../queries/QueryTest/partitioned-top-n.test       | 40 ++++++++++++++++++++++
 5 files changed, 57 insertions(+), 6 deletions(-)
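
The factory approach described in the commit message can be sketched in isolation as follows. The types below are simplified stand-ins and not the real Impala classes: `PrimitiveType`, `SlotDescriptor` and this `SlotRef` are assumptions made for illustration, and the sketch returns a `std::unique_ptr` whereas the patch returns a raw pointer that the caller adds to an object pool.

```cpp
#include <memory>

// Simplified stand-in for the backend's primitive type enum.
enum class PrimitiveType { kNull, kBoolean, kInt };

// Simplified stand-in for a slot descriptor: only carries a type.
struct SlotDescriptor {
  PrimitiveType type;
};

class SlotRef {
 public:
  // Uses the descriptor's own type.
  explicit SlotRef(const SlotDescriptor* desc) : desc_(desc), type_(desc->type) {}
  // Overrides the descriptor's type with 'type'.
  SlotRef(const SlotDescriptor* desc, PrimitiveType type) : desc_(desc), type_(type) {}

  // Single entry point that never yields a NULL-typed SlotRef: a NULL-typed
  // slot is given an arbitrary concrete type (boolean here).
  static std::unique_ptr<SlotRef> TypeSafeCreate(const SlotDescriptor* desc) {
    if (desc->type == PrimitiveType::kNull) {
      return std::make_unique<SlotRef>(desc, PrimitiveType::kBoolean);
    }
    return std::make_unique<SlotRef>(desc);
  }

  PrimitiveType type() const { return type_; }
  const SlotDescriptor* desc() const { return desc_; }

 private:
  const SlotDescriptor* desc_;
  PrimitiveType type_;
};
```

Routing every call site through one factory keeps the NULL-type substitution in a single place instead of repeating the conditional at each construction site, which is what the grouping aggregator did before this change.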

diff --git a/be/src/exec/grouping-aggregator.cc b/be/src/exec/grouping-aggregator.cc
index 1b8ebd104..40471892e 100644
--- a/be/src/exec/grouping-aggregator.cc
+++ b/be/src/exec/grouping-aggregator.cc
@@ -103,10 +103,8 @@ Status GroupingAggregatorConfig::Init(
   for (int i = 0; i < grouping_exprs_.size(); ++i) {
     SlotDescriptor* desc = intermediate_tuple_desc_->slots()[i];
     DCHECK(desc->type().type == TYPE_NULL || desc->type() == grouping_exprs_[i]->type());
-    // Hack to avoid TYPE_NULL SlotRefs.
-    SlotRef* build_expr = state->obj_pool()->Add(desc->type().type != TYPE_NULL ?
-            new SlotRef(desc) :
-            new SlotRef(desc, ColumnType(TYPE_BOOLEAN)));
+    // Use SlotRef::TypeSafeCreate() to avoid TYPE_NULL SlotRefs.
+    SlotRef* build_expr = state->obj_pool()->Add(SlotRef::TypeSafeCreate(desc));
     build_exprs_.push_back(build_expr);
     // Not an entry point because all hash table callers support codegen.
     RETURN_IF_ERROR(
diff --git a/be/src/exec/topn-node.cc b/be/src/exec/topn-node.cc
index 7d2ff11b1..3de1b68d1 100644
--- a/be/src/exec/topn-node.cc
+++ b/be/src/exec/topn-node.cc
@@ -95,8 +95,7 @@ Status TopNPlanNode::Init(const TPlanNode& tnode, FragmentState* state) {
 
     // Construct SlotRefs that simply copy the output tuple to itself.
     for (const SlotDescriptor* slot_desc : output_tuple_desc_->slots()) {
-      SlotRef* slot_ref =
-          state->obj_pool()->Add(new SlotRef(slot_desc, slot_desc->type()));
+      SlotRef* slot_ref = state->obj_pool()->Add(SlotRef::TypeSafeCreate(slot_desc));
       noop_tuple_exprs_.push_back(slot_ref);
       RETURN_IF_ERROR(slot_ref->Init(*row_descriptor_, true, state));
     }
diff --git a/be/src/exprs/slot-ref.cc b/be/src/exprs/slot-ref.cc
index 32d4e254c..d883f68db 100644
--- a/be/src/exprs/slot-ref.cc
+++ b/be/src/exprs/slot-ref.cc
@@ -72,6 +72,16 @@ SlotRef::SlotRef(const ColumnType& type, int offset, const bool nullable /* = fa
     null_indicator_offset_(0, nullable ? offset : -1),
     slot_id_(-1) {}
 
+SlotRef* SlotRef::TypeSafeCreate(const SlotDescriptor* desc) {
+  if (desc->type().type == TYPE_NULL) {
+    // ScalarExprEvaluator requires a non-null type for the expr. It returns nullptr for
+    // null values of all types. So replacing TYPE_NULL to an arbitrary type is ok.
+    // Here we use TYPE_BOOLEAN for consistency with other places.
+    return new SlotRef(desc, ColumnType(TYPE_BOOLEAN));
+  }
+  return new SlotRef(desc);
+}
+
 Status SlotRef::Init(
     const RowDescriptor& row_desc, bool is_entry_point, FragmentState* state) {
   DCHECK(type_.IsStructType() || children_.size() == 0);
diff --git a/be/src/exprs/slot-ref.h b/be/src/exprs/slot-ref.h
index dfb299825..c0b1bf31a 100644
--- a/be/src/exprs/slot-ref.h
+++ b/be/src/exprs/slot-ref.h
@@ -45,6 +45,10 @@ class SlotRef : public ScalarExpr {
   /// does not generate the appropriate exprs).
   SlotRef(const SlotDescriptor* desc, const ColumnType& type);
 
+  /// Create a SlotRef based on the given SlotDescriptor 'desc' and make sure the type is
+  /// not TYPE_NULL (if so, replaced it with TYPE_BOOLEAN).
+  static SlotRef* TypeSafeCreate(const SlotDescriptor* desc);
+
   /// Used for testing.  GetValue will return tuple + offset interpreted as 'type'
   SlotRef(const ColumnType& type, int offset, const bool nullable = false);
 
diff --git a/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test b/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
index 2c29b7389..7cfff846f 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
@@ -80,3 +80,43 @@ NULL,0,1
 ---- TYPES
 TINYINT, INT, BIGINT
 ====
+---- QUERY
+# IMPALA-11779: test null slots in the sort tuple
+with v1 as (
+  select '0' as a1, '' as b1 from alltypestiny
+), v2 as (
+  select '' as a2, null as b2
+), v3 as (
+  select b1 as b
+  from v1 left join v2 on a1 = a2
+)
+select 1 from (
+  select row_number() over (partition by b order by b) rnk
+  from v3
+) v
+where rnk = 1
+---- RESULTS
+1
+---- TYPES
+TINYINT
+====
+---- QUERY
+# IMPALA-11779: test null slots in the sort tuple
+with v1 as (
+  select '0' as a1, '' as b1 from alltypes
+), v2 as (
+  select '' as a2, null as b2
+), v3 as (
+  select b1 as b
+  from v1 left join v2 on a1 = a2
+)
+select count(*) from (
+  select row_number() over (partition by b order by b) rnk
+  from v3
+) v
+where rnk < 10
+---- RESULTS
+9
+---- TYPES
+BIGINT
+====

[impala] 17/17: Update version to 4.1.2-RELEASE

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit d6f90efd65ebbf4911bbfec751b74083f25d4c6a
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Tue Mar 28 06:51:34 2023 +0800

    Update version to 4.1.2-RELEASE
    
    Change-Id: I4646f0304e1ba38a847da92263e25a8a4cf7adbc
---
 bin/impala-config.sh                 | 2 +-
 fe/pom.xml                           | 2 +-
 java/TableFlattener/pom.xml          | 2 +-
 java/datagenerator/pom.xml           | 2 +-
 java/executor-deps/pom.xml           | 2 +-
 java/ext-data-source/api/pom.xml     | 2 +-
 java/ext-data-source/pom.xml         | 2 +-
 java/ext-data-source/sample/pom.xml  | 2 +-
 java/ext-data-source/test/pom.xml    | 2 +-
 java/pom.xml                         | 2 +-
 java/query-event-hook-api/pom.xml    | 2 +-
 java/shaded-deps/hive-exec/pom.xml   | 2 +-
 java/shaded-deps/s3a-aws-sdk/pom.xml | 2 +-
 java/test-corrupt-hive-udfs/pom.xml  | 2 +-
 java/test-hive-udfs/pom.xml          | 2 +-
 java/yarn-extras/pom.xml             | 2 +-
 16 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/bin/impala-config.sh b/bin/impala-config.sh
index 6a3e8786f..ca13472ed 100755
--- a/bin/impala-config.sh
+++ b/bin/impala-config.sh
@@ -70,7 +70,7 @@ fi
 # WARNING: If changing this value, also run these commands:
 # cd ${IMPALA_HOME}/java
 # mvn versions:set -DnewVersion=YOUR_NEW_VERSION
-export IMPALA_VERSION=4.1.1-RELEASE
+export IMPALA_VERSION=4.1.2-RELEASE
 
 # The unique build id of the toolchain to use if bootstrapping. This is generated by the
 # native-toolchain build when publishing its build artifacts. This should be changed when
diff --git a/fe/pom.xml b/fe/pom.xml
index 0b71cbb5b..72516726e 100644
--- a/fe/pom.xml
+++ b/fe/pom.xml
@@ -23,7 +23,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
     <relativePath>../java/pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/TableFlattener/pom.xml b/java/TableFlattener/pom.xml
index 800c23b70..b6b423419 100644
--- a/java/TableFlattener/pom.xml
+++ b/java/TableFlattener/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>nested-table-flattener</artifactId>
diff --git a/java/datagenerator/pom.xml b/java/datagenerator/pom.xml
index 386bf365e..bd23a533a 100644
--- a/java/datagenerator/pom.xml
+++ b/java/datagenerator/pom.xml
@@ -23,7 +23,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
diff --git a/java/executor-deps/pom.xml b/java/executor-deps/pom.xml
index 0783ff06e..bd8a0e99f 100644
--- a/java/executor-deps/pom.xml
+++ b/java/executor-deps/pom.xml
@@ -34,7 +34,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <groupId>org.apache.impala</groupId>
diff --git a/java/ext-data-source/api/pom.xml b/java/ext-data-source/api/pom.xml
index d93658a8b..5e74d9a72 100644
--- a/java/ext-data-source/api/pom.xml
+++ b/java/ext-data-source/api/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <artifactId>impala-data-source-api</artifactId>
   <name>Apache Impala External Data Source API</name>
diff --git a/java/ext-data-source/pom.xml b/java/ext-data-source/pom.xml
index 6d52a2d5e..a5acceab1 100644
--- a/java/ext-data-source/pom.xml
+++ b/java/ext-data-source/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>impala-data-source</artifactId>
diff --git a/java/ext-data-source/sample/pom.xml b/java/ext-data-source/sample/pom.xml
index 40553a45c..69753a80e 100644
--- a/java/ext-data-source/sample/pom.xml
+++ b/java/ext-data-source/sample/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <artifactId>impala-data-source-sample</artifactId>
   <name>Apache Impala External Data Source Sample</name>
diff --git a/java/ext-data-source/test/pom.xml b/java/ext-data-source/test/pom.xml
index 9140800b3..4a4fc0243 100644
--- a/java/ext-data-source/test/pom.xml
+++ b/java/ext-data-source/test/pom.xml
@@ -23,7 +23,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-data-source</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <artifactId>impala-data-source-test</artifactId>
   <name>Apache Impala External Data Source Test Library</name>
diff --git a/java/pom.xml b/java/pom.xml
index 187212813..6b08f43e2 100644
--- a/java/pom.xml
+++ b/java/pom.xml
@@ -21,7 +21,7 @@ under the License.
   <modelVersion>4.0.0</modelVersion>
   <groupId>org.apache.impala</groupId>
   <artifactId>impala-parent</artifactId>
-  <version>4.1.1-RELEASE</version>
+  <version>4.1.2-RELEASE</version>
   <packaging>pom</packaging>
   <name>Apache Impala Parent POM</name>
 
diff --git a/java/query-event-hook-api/pom.xml b/java/query-event-hook-api/pom.xml
index 5608d9f93..18b4d4c76 100644
--- a/java/query-event-hook-api/pom.xml
+++ b/java/query-event-hook-api/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>query-event-hook-api</artifactId>
diff --git a/java/shaded-deps/hive-exec/pom.xml b/java/shaded-deps/hive-exec/pom.xml
index 3f2f39294..83c6e2a80 100644
--- a/java/shaded-deps/hive-exec/pom.xml
+++ b/java/shaded-deps/hive-exec/pom.xml
@@ -27,7 +27,7 @@ the same dependencies
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/shaded-deps/s3a-aws-sdk/pom.xml b/java/shaded-deps/s3a-aws-sdk/pom.xml
index 70ea94bfb..a5bfc40c2 100644
--- a/java/shaded-deps/s3a-aws-sdk/pom.xml
+++ b/java/shaded-deps/s3a-aws-sdk/pom.xml
@@ -25,7 +25,7 @@ though some of them might not be necessary. The exclusions are sorted alphabetic
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
     <relativePath>../../pom.xml</relativePath>
   </parent>
   <modelVersion>4.0.0</modelVersion>
diff --git a/java/test-corrupt-hive-udfs/pom.xml b/java/test-corrupt-hive-udfs/pom.xml
index 92ba79bc0..8381b8290 100644
--- a/java/test-corrupt-hive-udfs/pom.xml
+++ b/java/test-corrupt-hive-udfs/pom.xml
@@ -23,7 +23,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
diff --git a/java/test-hive-udfs/pom.xml b/java/test-hive-udfs/pom.xml
index 68d264219..4ce835dc9 100644
--- a/java/test-hive-udfs/pom.xml
+++ b/java/test-hive-udfs/pom.xml
@@ -22,7 +22,7 @@ under the License.
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
 
diff --git a/java/yarn-extras/pom.xml b/java/yarn-extras/pom.xml
index 9b117dd6f..3007ea187 100644
--- a/java/yarn-extras/pom.xml
+++ b/java/yarn-extras/pom.xml
@@ -22,7 +22,7 @@
   <parent>
     <groupId>org.apache.impala</groupId>
     <artifactId>impala-parent</artifactId>
-    <version>4.1.1-RELEASE</version>
+    <version>4.1.2-RELEASE</version>
   </parent>
   <modelVersion>4.0.0</modelVersion>
   <artifactId>yarn-extras</artifactId>

[impala] 07/17: IMPALA-11843: Fix IndexOutOfBoundsException in analytic limit pushdown

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4037f43f2e6a33706e2c5a18e7fdba95b69f8601
Author: stiga-huang <hu...@gmail.com>
AuthorDate: Mon Jan 16 19:34:13 2023 +0800

    IMPALA-11843: Fix IndexOutOfBoundsException in analytic limit pushdown
    
    When finding analytic conjuncts for analytic limit pushdown, the
    following conditions are checked:
     - The conjunct should be a binary predicate
     - Left hand side is a SlotRef referencing the analytic expression, e.g.
       "rn" of "row_number() as rn"
     - The underlying analytic function is rank(), dense_rank() or row_number()
     - The window frame is UNBOUNDED PRECEDING to CURRENT ROW
     - Right hand side is a valid numeric limit
     - The op is =, <, or <=
    See more details in AnalyticPlanner.inferPartitionLimits().
    
    While checking the 2nd and 3rd conditions, we get the source exprs of
    the SlotRef. The source exprs could be empty if the SlotRef is actually
    referencing a column of the table, i.e. a column materialized by the
    scan node. Currently, we check the first source expr directly regardless
    of whether the list is empty, which causes the IndexOutOfBoundsException.
    
    This patch fixes it by augmenting the check to consider an empty list.
    It also fixes similar code in AnalyticEvalNode.
    
    Tests:
     - Add FE and e2e regression tests
    
    Change-Id: I26d6bd58be58d09a29b8b81972e76665f41cf103
    Reviewed-on: http://gerrit.cloudera.org:8080/19422
    Reviewed-by: Aman Sinha <am...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../apache/impala/planner/AnalyticEvalNode.java    |  2 +-
 .../org/apache/impala/planner/AnalyticPlanner.java |  2 +-
 .../limit-pushdown-partitioned-top-n.test          | 32 ++++++++++++++++++++++
 .../queries/QueryTest/partitioned-top-n.test       | 12 ++++++++
 4 files changed, 46 insertions(+), 2 deletions(-)
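
The difference between the old and new guard can be reproduced outside the planner. The sketch below is a C++ analogue of the Java check, not the planner code itself: `SourceExpr`, `IsAnalyticSlotOld` and `IsAnalyticSlotFixed` are hypothetical names, and `std::vector::at()` is used because it throws `std::out_of_range` on a bad index, much as `List.get(0)` throws in Java.

```cpp
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a source expr; only records whether it is analytic.
struct SourceExpr {
  bool is_analytic;
};

// Old guard: "size() > 1" lets an empty list through, so at(0) throws.
inline bool IsAnalyticSlotOld(const std::vector<SourceExpr>& source_exprs) {
  if (source_exprs.size() > 1 || !source_exprs.at(0).is_analytic) return false;
  return true;
}

// Fixed guard: "size() != 1" short-circuits on an empty list before at(0) runs.
inline bool IsAnalyticSlotFixed(const std::vector<SourceExpr>& source_exprs) {
  if (source_exprs.size() != 1 || !source_exprs.at(0).is_analytic) return false;
  return true;
}
```

For an empty list (a slot materialized by the scan node), the old guard throws while the fixed guard returns false. For lists of one or more elements the two behave identically.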

diff --git a/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java b/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
index b0dd6987d..d4fb6abf4 100644
--- a/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/AnalyticEvalNode.java
@@ -587,7 +587,7 @@ public class AnalyticEvalNode extends PlanNode {
       return falseStatus;
     }
     List<Expr> lhsSourceExprs = ((SlotRef) lhs).getDesc().getSourceExprs();
-    if (lhsSourceExprs.size() > 1 ||
+    if (lhsSourceExprs.size() != 1 ||
           !(lhsSourceExprs.get(0) instanceof AnalyticExpr)) {
       return falseStatus;
     }
diff --git a/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java b/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
index c070ea69f..0fc8b333f 100644
--- a/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/AnalyticPlanner.java
@@ -910,7 +910,7 @@ public class AnalyticPlanner {
       if (!(lhs instanceof SlotRef)) continue;
 
       List<Expr> lhsSourceExprs = ((SlotRef) lhs).getDesc().getSourceExprs();
-      if (lhsSourceExprs.size() > 1 ||
+      if (lhsSourceExprs.size() != 1 ||
             !(lhsSourceExprs.get(0) instanceof AnalyticExpr)) {
         continue;
       }
diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test b/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test
index 24c59d02f..3b9c65263 100644
--- a/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test
+++ b/testdata/workloads/functional-planner/queries/PlannerTest/limit-pushdown-partitioned-top-n.test
@@ -891,3 +891,35 @@ PLAN-ROOT SINK
    HDFS partitions=1/1 files=1 size=38B
    row-size=24B cardinality=unavailable
 ====
+# Regression test for IMPALA-11843. Several conjuncts to consider pushing down.
+select id, rn from (
+  select id,
+    row_number() over (order by id desc) rn,
+    max(id) over () max_id
+  from functional.alltypesagg) t
+where id = max_id and rn < 10
+---- PLAN
+PLAN-ROOT SINK
+|
+04:SELECT
+|  predicates: id = max(id), id = max(id), row_number() < 10
+|  row-size=16B cardinality=1.10K
+|
+03:ANALYTIC
+|  functions: row_number()
+|  order by: id DESC
+|  window: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
+|  row-size=16B cardinality=11.00K
+|
+02:ANALYTIC
+|  functions: max(id)
+|  row-size=8B cardinality=11.00K
+|
+01:SORT
+|  order by: id DESC
+|  row-size=4B cardinality=11.00K
+|
+00:SCAN HDFS [functional.alltypesagg]
+   HDFS partitions=11/11 files=11 size=814.73KB
+   row-size=4B cardinality=11.00K
+====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test b/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
index 7cfff846f..538d289f8 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/partitioned-top-n.test
@@ -120,3 +120,15 @@ where rnk < 10
 ---- TYPES
 BIGINT
 ====
+---- QUERY
+select id, rn from (
+  select id,
+    row_number() over (order by id desc) rn,
+    max(id) over () max_id
+  from functional.alltypesagg) t
+where id = max_id and rn < 10
+---- RESULTS
+9999,1
+---- TYPES
+INT,BIGINT
+====

[impala] 08/17: IMPALA-11857: Connect join build fragment to join in graphical plan

Posted by st...@apache.org.

stigahuang pushed a commit to branch branch-4.1.2
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 588719d321bda8bc54f50fcbf0234986a0e28c4f
Author: Kurt Deschler <kd...@cloudera.com>
AuthorDate: Mon Jan 23 18:57:37 2023 -0500

    IMPALA-11857: Connect join build fragment to join in graphical plan
    
    When support was added for join build sink, the plan JSON and plan
    rendering logic were not updated to handle the new sink type. As a
    result, the linkage from the join build fragment to its corresponding
    join node was not expressed in the JSON and build fragment nodes were
    rendered as orphaned.
    
    This change adds a new JSON join_build_target field to join build
    fragments and connects the build fragment to the join with a green dashed
    line similar to the red dashed line used for data sender fragments.
    
    Also changed the SVG fill type to 'none' for exchange edges to avoid
    rendering a black triangle if the right child was an exchange as in the
    join build case.
    
    Testing:
    - Manual testing with TPCH queries and reviewing rendered plan diagrams
    
    Change-Id: I80af977e5c5e869268d3ee68fafe541adadc239d
    Reviewed-on: http://gerrit.cloudera.org:8080/19437
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/service/impala-http-handler.cc |  4 ++++
 www/query_plan.tmpl                   | 13 ++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)
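
Note (illustration only, not part of the commit): the sketch below shows how a
renderer might turn a fragment node of the plan JSON into one inter-fragment
edge, following the description above. The field names data_stream_target,
join_build_target, label and output_card and the two colours come from the diff
that follows; the helper name interFragmentEdge and the sample node values are
hypothetical.

```javascript
// Hypothetical helper: returns the inter-fragment edge for one fragment node
// of the plan JSON, or null when the node has no inter-fragment target.
function interFragmentEdge(node) {
  let target;
  let colour;
  if (node["data_stream_target"]) {
    // Streaming data boundary: red dashed line.
    target = node["data_stream_target"];
    colour = "#c00000";
  } else if (node["join_build_target"]) {
    // Join build boundary: green dashed line.
    target = node["join_build_target"];
    colour = "#00c000";
  } else {
    return null;
  }
  return {
    start: node["label"],
    end: target,
    style: {
      label: "" + node["output_card"].toLocaleString(),
      // fill: none keeps the SVG path from being drawn as a filled shape.
      style: "fill: none; stroke: " + colour + "; stroke-dasharray: 5, 5;",
    },
  };
}

// Made-up sample node standing in for a join build fragment.
const buildFragment = {
  label: "F01",
  output_card: 25,
  join_build_target: "02:HASH JOIN",
};
const edge = interFragmentEdge(buildFragment);
console.log(edge.start + " -> " + edge.end);
```

A node that carries neither target field yields null here, so a caller would
simply skip adding an edge for it.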

diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc
index caffd7b9b..fabb8f4b5 100644
--- a/be/src/service/impala-http-handler.cc
+++ b/be/src/service/impala-http-handler.cc
@@ -820,6 +820,10 @@ void PlanToJson(const vector<TPlanFragment>& fragments, const TExecSummary& summ
         Value target(label_map[sink.stream_sink.dest_node_id].c_str(),
             document->GetAllocator());
         plan_fragment.AddMember("data_stream_target", target, document->GetAllocator());
+      } else if (sink.__isset.join_build_sink) {
+        Value target(label_map[sink.join_build_sink.dest_node_id].c_str(),
+            document->GetAllocator());
+        plan_fragment.AddMember("join_build_target", target, document->GetAllocator());
       }
     }
     nodes.PushBack(plan_fragment, document->GetAllocator());
diff --git a/www/query_plan.tmpl b/www/query_plan.tmpl
index 744eb69d9..ed2373fcc 100644
--- a/www/query_plan.tmpl
+++ b/www/query_plan.tmpl
@@ -115,13 +115,20 @@ function build(node, parent, edges, states, colour_idx, max_node_time) {
     edges.push({ start: node["label"], end: parent,
                  style: { label: label_val }});
   }
-  // Add an inter-fragment edge. We use a red dashed line to show that rows are crossing
-  // the fragment boundary.
+  // Add inter-fragment edges.
   if (node["data_stream_target"]) {
+    // Use a red dashed line to show a streaming data boundary
     edges.push({ "start": node["label"],
                  "end": node["data_stream_target"],
                  "style": { label: "" + node["output_card"].toLocaleString(),
-                            style: "stroke: #f66; stroke-dasharray: 5, 5;"}});
+                            style: "fill:none; stroke: #c00000; stroke-dasharray: 5, 5;"}});
+  } else if (node["join_build_target"]) {
+    // Use a green dashed line to show a join build boundary
+    edges.push({ "start": node["label"],
+                 "end": node["join_build_target"],
+                 "style": { label: "" + node["output_card"].toLocaleString(),
+                            style: "fill: none; stroke: #00c000; stroke-dasharray: 5, 5;"}
+});
   }
   max_node_time = Math.max(node["max_time_val"], max_node_time)
   for (var i = 0; i < node["children"].length; ++i) {