You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by wz...@apache.org on 2021/12/16 15:05:27 UTC

[impala] branch master updated (b25e250 -> 1ed48a5)

This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from b25e250  IMPALA-10926: Improve catalogd consistency and self events detection
     new b0c37b6  IMPALA-11054: Support resource pool polling for frontend
     new 4b33107  IMPALA-11035: Make x-forwarded-for http header case-insensitive
     new 763acff  IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
     new 1ed48a5  IMPALA-11005 (part 3): Repalce random number generator with mt19937_64

The 4 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/exec/union-node.cc                          |  11 ++
 be/src/exec/union-node.h                           |  13 +++
 be/src/exprs/aggregate-functions-ir.cc             |   9 +-
 be/src/exprs/scalar-expr.cc                        |  13 ++-
 be/src/exprs/scalar-expr.h                         |   6 +-
 be/src/runtime/fragment-state.h                    |   3 +
 be/src/scheduling/request-pool-service.cc          |  36 +++----
 be/src/scheduling/request-pool-service.h           |   4 +-
 be/src/transport/THttpServer.cpp                   |   2 +-
 common/thrift/Exprs.thrift                         |   3 +
 .../main/java/org/apache/impala/analysis/Expr.java |  11 ++
 .../org/apache/impala/analysis/ValuesStmt.java     |   8 ++
 .../java/org/apache/impala/planner/UnionNode.java  |   7 +-
 .../apache/impala/util/JniRequestPoolService.java  | 112 +++++++++++++++++++++
 .../org/apache/impala/util/RequestPoolService.java |  91 +++++++++--------
 .../apache/impala/analysis/ExprRewriterTest.java   |   4 +-
 .../apache/impala/customcluster/LdapHS2Test.java   |  27 +++--
 .../apache/impala/util/TestRequestPoolService.java |   4 +-
 .../queries/PlannerTest/values.test                |  32 +++---
 .../queries/QueryTest/alloc-fail-init.test         |   4 +-
 .../QueryTest/union-const-scalar-expr-codegen.test |  80 +++++++++++++++
 tests/query_test/test_codegen.py                   |   7 ++
 22 files changed, 382 insertions(+), 105 deletions(-)
 create mode 100644 fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test

[impala] 03/04: IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements

Posted by wz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 763acffb74ec2770a9402ba23c145ea928021f8d
Author: Abhishek Rawat <ar...@cloudera.com>
AuthorDate: Tue Jun 11 12:56:22 2019 -0700

    IMPALA-6590: Disable expr rewrites and codegen for VALUES() statements
    
    Expression rewrites for VALUES() could result in performance regression
    since there is virtually no benefit of rewrite, if the expression will
    only ever be evaluated once. The overhead of rewrites in some cases
    could be huge, especially if there are several constant expressions.
    The regression also seems to non-linearly increase as number of columns
    increases. Similarly, there is no value in doing codegen for such const
    expressions.
    
    The rewriteExprs() for ValuesStmt class was overridden with an empty
    function body. As a result rewrites for VALUES() is a no-op.
    
    Codegen was disabled for const expressions within a UNION node, if
    the UNION node is not within a subplan. This applies to all UNION nodes
    with const expressions (and not just limited to UNION nodes associated
    with a VALUES clause).
    
    The decision for whether or not to enable codegen for const expressions
    in a UNION is made in the planner when a UnionNode is initialized. A new
    member 'is_codegen_disabled' was added to the thrift struct TExprNode
    for communicating this decision to backend. The Optimizer should take
    decisions it can and so it seemed like the right place to disable/enable
    codegen. The infrastructure is generic and could be extended in future
    to selectively disable codegen for any given expression, if needed.
    
    Testing:
    - Added a new e2e test case in tests/query_test/test_codegen.py, which
      tests the different scenarios involving UNION with const expressions.
    - Passed exhaustive unit-tests.
    - Ran manual tests to validate that the non-linear regression in VALUES
      clause when involving increasing number of columns is no longer seen.
      Results below.
    
    for i in 256 512 1024 2048 4096 8192 16384 32768;
    do (echo 'VALUES ('; for x in $(seq $i);
    do echo  "cast($x as string),"; done;
    echo "NULL); profile;") |
    time impala-shell.sh -f /dev/stdin |& grep Analysis; done
    
    Base:
           - Analysis finished: 20.137ms (19.215ms)
           - Analysis finished: 46.275ms (44.597ms)
           - Analysis finished: 119.642ms (116.663ms)
           - Analysis finished: 361.195ms (355.856ms)
           - Analysis finished: 1s277ms (1s266ms)
           - Analysis finished: 5s664ms (5s640ms)
           - Analysis finished: 29s689ms (29s646ms)
           - Analysis finished: 2m (2m)
    
    Test:
           - Analysis finished: 1.868ms (986.520us)
           - Analysis finished: 3.195ms (1.856ms)
           - Analysis finished: 7.332ms (3.484ms)
           - Analysis finished: 13.896ms (8.071ms)
           - Analysis finished: 31.015ms (18.963ms)
           - Analysis finished: 60.157ms (38.125ms)
           - Analysis finished: 113.694ms (67.642ms)
           - Analysis finished: 253.044ms (163.180ms)
    
    Change-Id: I229d67b821968321abd8f97f7c89cf2617000d8d
    Reviewed-on: http://gerrit.cloudera.org:8080/13645
    Reviewed-by: Joe McDonnell <jo...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/union-node.cc                          | 11 +++
 be/src/exec/union-node.h                           | 13 ++++
 be/src/exprs/scalar-expr.cc                        | 13 ++--
 be/src/exprs/scalar-expr.h                         |  6 +-
 be/src/runtime/fragment-state.h                    |  3 +
 common/thrift/Exprs.thrift                         |  3 +
 .../main/java/org/apache/impala/analysis/Expr.java | 11 +++
 .../org/apache/impala/analysis/ValuesStmt.java     |  8 +++
 .../java/org/apache/impala/planner/UnionNode.java  |  7 +-
 .../apache/impala/analysis/ExprRewriterTest.java   |  4 +-
 .../queries/PlannerTest/values.test                | 32 ++++-----
 .../QueryTest/union-const-scalar-expr-codegen.test | 80 ++++++++++++++++++++++
 tests/query_test/test_codegen.py                   |  7 ++
 13 files changed, 173 insertions(+), 25 deletions(-)

diff --git a/be/src/exec/union-node.cc b/be/src/exec/union-node.cc
index aac6874..b454cdf 100644
--- a/be/src/exec/union-node.cc
+++ b/be/src/exec/union-node.cc
@@ -41,6 +41,8 @@ Status UnionPlanNode::Init(const TPlanNode& tnode, FragmentState* state) {
   DCHECK(tuple_desc_ != nullptr);
   first_materialized_child_idx_ = tnode_->union_node.first_materialized_child_idx;
   DCHECK_GT(first_materialized_child_idx_, -1);
+  const int64_t num_nonconst_scalar_expr_to_be_codegened =
+      state->NumScalarExprNeedsCodegen();
   // Create const_exprs_lists_ from thrift exprs.
   const vector<vector<TExpr>>& const_texpr_lists = tnode_->union_node.const_expr_lists;
   for (const vector<TExpr>& texprs : const_texpr_lists) {
@@ -49,6 +51,8 @@ Status UnionPlanNode::Init(const TPlanNode& tnode, FragmentState* state) {
     DCHECK_EQ(const_exprs.size(), tuple_desc_->slots().size());
     const_exprs_lists_.push_back(const_exprs);
   }
+  num_const_scalar_expr_to_be_codegened_ =
+      state->NumScalarExprNeedsCodegen() - num_nonconst_scalar_expr_to_be_codegened;
   // Create child_exprs_lists_ from thrift exprs.
   const vector<vector<TExpr>>& thrift_result_exprs = tnode_->union_node.result_expr_lists;
   for (int i = 0; i < thrift_result_exprs.size(); ++i) {
@@ -85,6 +89,8 @@ UnionNode::UnionNode(
   : ExecNode(pool, pnode, descs),
     tuple_desc_(pnode.tuple_desc_),
     first_materialized_child_idx_(pnode.first_materialized_child_idx_),
+    num_const_scalar_expr_to_be_codegened_(pnode.num_const_scalar_expr_to_be_codegened_),
+    is_codegen_status_added_(pnode.is_codegen_status_added_),
     const_exprs_lists_(pnode.const_exprs_lists_),
     child_exprs_lists_(pnode.child_exprs_lists_),
     codegend_union_materialize_batch_fns_(pnode.codegend_union_materialize_batch_fns_),
@@ -158,6 +164,7 @@ void UnionPlanNode::Codegen(FragmentState* state){
         &(codegend_union_materialize_batch_fns_.data()[i]));
   }
   AddCodegenStatus(codegen_status, codegen_message.str());
+  is_codegen_status_added_ = true;
 }
 
 Status UnionNode::Open(RuntimeState* state) {
@@ -177,6 +184,10 @@ Status UnionNode::Open(RuntimeState* state) {
   // succeeded.
   if (!children_.empty()) RETURN_IF_ERROR(child(child_idx_)->Open(state));
 
+  if (is_codegen_status_added_ && num_const_scalar_expr_to_be_codegened_ == 0
+      && !const_exprs_lists_.empty()) {
+    runtime_profile_->AppendExecOption("Codegen Disabled for const scalar expressions");
+  }
   return Status::OK();
 }
 
diff --git a/be/src/exec/union-node.h b/be/src/exec/union-node.h
index 59be242..f9574b3 100644
--- a/be/src/exec/union-node.h
+++ b/be/src/exec/union-node.h
@@ -62,6 +62,13 @@ class UnionPlanNode : public PlanNode {
   /// materialized.
   int first_materialized_child_idx_ = -1;
 
+  /// Number of const scalar expressions which will be codegened.
+  /// This is only used for observability.
+  int64_t num_const_scalar_expr_to_be_codegened_ = 0;
+
+  /// Set as TRUE if codegen status is added.
+  bool is_codegen_status_added_ = false;
+
   typedef void (*UnionMaterializeBatchFn)(UnionNode*, RowBatch*, uint8_t**);
   /// Vector of pointers to codegen'ed MaterializeBatch functions. The vector contains one
   /// function for each child. The size of the vector should be equal to the number of
@@ -105,6 +112,12 @@ class UnionNode : public ExecNode {
   /// materialized.
   const int first_materialized_child_idx_;
 
+  /// Number of const scalar expressions which will be codegened.
+  const int64_t num_const_scalar_expr_to_be_codegened_;
+
+  /// Reference to UnionPlanNode::is_codegen_status_added_.
+  const bool& is_codegen_status_added_;
+
   /// Const exprs materialized by this node. These exprs don't refer to any children.
   /// Only materialized by the first fragment instance to avoid duplication.
   const std::vector<std::vector<ScalarExpr*>>& const_exprs_lists_;
diff --git a/be/src/exprs/scalar-expr.cc b/be/src/exprs/scalar-expr.cc
index 0bf2069..cf5a287 100644
--- a/be/src/exprs/scalar-expr.cc
+++ b/be/src/exprs/scalar-expr.cc
@@ -65,14 +65,15 @@ namespace impala {
 
 const char* ScalarExpr::LLVM_CLASS_NAME = "class.impala::ScalarExpr";
 
-ScalarExpr::ScalarExpr(const ColumnType& type, bool is_constant)
+ScalarExpr::ScalarExpr(const ColumnType& type, bool is_constant, bool is_codegen_disabled)
   : Expr(type),
-    is_constant_(is_constant) {
-}
+    is_constant_(is_constant),
+    is_codegen_disabled_(is_codegen_disabled) {}
 
 ScalarExpr::ScalarExpr(const TExprNode& node)
   : Expr(node),
-    is_constant_(node.is_constant) {
+    is_constant_(node.is_constant),
+    is_codegen_disabled_(node.is_codegen_disabled) {
   if (node.__isset.fn) fn_ = node.fn;
 }
 
@@ -328,8 +329,10 @@ bool ScalarExpr::ShouldCodegen(const FragmentState* state) const {
   //    key expression in a descriptor table.
   // 2. codegen is disabled by query option.
   // 3. there is an optimization hint to disable codegen and the expr can be interpreted.
+  // 4. Optimizer decided to disable codegen. Example: const expressions in VALUES()
+  //    which are evaluated only once.
   return state != nullptr && !state->CodegenDisabledByQueryOption()
-      && !(state->CodegenHasDisableHint() && IsInterpretable());
+      && !((state->CodegenHasDisableHint() || is_codegen_disabled_) && IsInterpretable());
 }
 
 int ScalarExpr::GetSlotIds(vector<SlotId>* slot_ids) const {
diff --git a/be/src/exprs/scalar-expr.h b/be/src/exprs/scalar-expr.h
index a148cc5..582763e 100644
--- a/be/src/exprs/scalar-expr.h
+++ b/be/src/exprs/scalar-expr.h
@@ -256,7 +256,7 @@ class ScalarExpr : public Expr {
   static Status CreateNode(const TExprNode& texpr_node, ObjectPool* pool,
       ScalarExpr** expr) WARN_UNUSED_RESULT;
 
-  ScalarExpr(const ColumnType& type, bool is_constant);
+  ScalarExpr(const ColumnType& type, bool is_constant, bool is_codegen_disabled = false);
   ScalarExpr(const TExprNode& node);
 
   /// Implementation of GetCodegendComputeFn() to be overridden by each subclass of
@@ -394,6 +394,10 @@ class ScalarExpr : public Expr {
   /// Set in GetCodegendComputeFn().
   bool added_to_jit_ = false;
 
+  /// True if codegen should be disabled for this scalar expression. Typical use case
+  /// is const expressions in VALUES clause, which are evaluated only once.
+  const bool is_codegen_disabled_;
+
   /// Static wrappers which call the compute function of the given ScalarExpr, passing
   /// it the ScalarExprEvaluator and TupleRow. These are cross-compiled and used by
   /// GetStaticGetValWrapper.
diff --git a/be/src/runtime/fragment-state.h b/be/src/runtime/fragment-state.h
index 6bda279..fa6a66b 100644
--- a/be/src/runtime/fragment-state.h
+++ b/be/src/runtime/fragment-state.h
@@ -115,6 +115,9 @@ class FragmentState {
   /// created, init'ed in which all expressions' Prepare() are invoked.
   bool ScalarExprNeedsCodegen() const { return !scalar_exprs_to_codegen_.empty(); }
 
+  /// Returns the number of scalar expressions to be codegen'd.
+  int64_t NumScalarExprNeedsCodegen() const { return scalar_exprs_to_codegen_.size(); }
+
   /// Check if codegen was disabled and if so, add a message to the runtime profile.
   /// Call this only after expressions have been have been created.
   void CheckAndAddCodegenDisabledMessage(std::vector<std::string>& codegen_status_msgs) {
diff --git a/common/thrift/Exprs.thrift b/common/thrift/Exprs.thrift
index 7ed771a..fb02a29 100644
--- a/common/thrift/Exprs.thrift
+++ b/common/thrift/Exprs.thrift
@@ -177,6 +177,9 @@ struct TExprNode {
   20: optional TTimestampLiteral timestamp_literal
   21: optional TKuduPartitionExpr kudu_partition_expr
   22: optional TCastExpr cast_expr
+
+  // If codegen is disabled for this Expr
+  23: optional bool is_codegen_disabled
 }
 
 // A flattened representation of a tree of Expr nodes, obtained by depth-first
diff --git a/fe/src/main/java/org/apache/impala/analysis/Expr.java b/fe/src/main/java/org/apache/impala/analysis/Expr.java
index 07c27c1..fe11c2b 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Expr.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Expr.java
@@ -413,6 +413,9 @@ abstract public class Expr extends TreeNode<Expr> implements ParseNode, Cloneabl
   // For exprs of type Predicate, this keeps track of predicate hints
   protected List<PlanHint> predicateHints_;
 
+  // Is codegen disabled for this expression ?
+  private boolean isCodegenDisabled_ = false;
+
   protected Expr() {
     type_ = Type.INVALID;
     selectivity_ = -1.0;
@@ -441,6 +444,7 @@ abstract public class Expr extends TreeNode<Expr> implements ParseNode, Cloneabl
       predicateHints_ = new ArrayList<>();
       predicateHints_.addAll(other.predicateHints_);
     }
+    isCodegenDisabled_ = other.isCodegenDisabled_;
   }
 
   public boolean isAnalyzed() { return isAnalyzed_; }
@@ -462,6 +466,12 @@ abstract public class Expr extends TreeNode<Expr> implements ParseNode, Cloneabl
   public boolean isAuxExpr() { return isAuxExpr_; }
   public void setIsAuxExpr() { isAuxExpr_ = true; }
   public Function getFn() { return fn_; }
+  public void disableCodegen() {
+    isCodegenDisabled_ = true;
+    for (Expr child : children_) {
+      child.disableCodegen();
+    }
+  }
 
   /**
    * Perform semantic analysis of node and all of its children.
@@ -856,6 +866,7 @@ abstract public class Expr extends TreeNode<Expr> implements ParseNode, Cloneabl
     msg.type = type_.toThrift();
     msg.is_constant = isConstant_;
     msg.num_children = children_.size();
+    msg.setIs_codegen_disabled(isCodegenDisabled_);
     if (fn_ != null) {
       TFunction thriftFn = fn_.toThrift();
       thriftFn.setLast_modified_time(fn_.getLastModifiedTime());
diff --git a/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java b/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
index 642cac7..349c491 100644
--- a/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/ValuesStmt.java
@@ -22,6 +22,8 @@ import java.util.List;
 import com.google.common.base.Preconditions;
 
 import static org.apache.impala.analysis.ToSqlOptions.DEFAULT;
+import org.apache.impala.common.AnalysisException;
+import org.apache.impala.rewrite.ExprRewriter;
 
 /**
  * Representation of a values() statement with a list of constant-expression lists.
@@ -81,4 +83,10 @@ public class ValuesStmt extends UnionStmt {
 
   @Override
   public ValuesStmt clone() { return new ValuesStmt(this); }
+
+  /**
+   * Intentionally left empty to disable expression rewrite for values clause.
+   */
+  @Override
+  public void rewriteExprs(ExprRewriter rewriter) {}
 }
diff --git a/fe/src/main/java/org/apache/impala/planner/UnionNode.java b/fe/src/main/java/org/apache/impala/planner/UnionNode.java
index 72da0e6..06dc4e1 100644
--- a/fe/src/main/java/org/apache/impala/planner/UnionNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/UnionNode.java
@@ -295,7 +295,12 @@ public class UnionNode extends PlanNode {
       Preconditions.checkState(exprList.size() == slots.size());
       List<Expr> newExprList = new ArrayList<>();
       for (int i = 0; i < exprList.size(); ++i) {
-        if (slots.get(i).isMaterialized()) newExprList.add(exprList.get(i));
+        if (slots.get(i).isMaterialized()) {
+          Expr constExpr = exprList.get(i);
+          // Disable codegen for const expressions which will only ever be evaluated once.
+          if (!isInSubplan_) constExpr.disableCodegen();
+          newExprList.add(constExpr);
+        }
       }
       materializedConstExprLists_.add(newExprList);
     }
diff --git a/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java b/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
index 71a5f4d..9fa7fc6 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ExprRewriterTest.java
@@ -135,8 +135,8 @@ public class ExprRewriterTest extends AnalyzerTest {
         stmt_, stmt_), 47, 23);
     // Constant select.
     RewritesOk("select 1, 2, 3, 4", 4, 4);
-    // Values stmt.
-    RewritesOk("values(1, '2', 3, 4.1), (1, '2', 3, 4.1)", 8, 8);
+    // Values stmt - expression rewrites are disabled.
+    RewritesOk("values(1, '2', 3, 4.1), (1, '2', 3, 4.1)", 0, 0);
     // Test WHERE-clause subqueries.
     RewritesOk("select id, int_col from functional.alltypes a " +
         "where exists (select 1 from functional.alltypes " +
diff --git a/testdata/workloads/functional-planner/queries/PlannerTest/values.test b/testdata/workloads/functional-planner/queries/PlannerTest/values.test
index ffa5632..36b0fb4 100644
--- a/testdata/workloads/functional-planner/queries/PlannerTest/values.test
+++ b/testdata/workloads/functional-planner/queries/PlannerTest/values.test
@@ -4,35 +4,35 @@ PLAN-ROOT SINK
 |
 00:UNION
    constant-operands=1
-   row-size=18B cardinality=1
+   row-size=19B cardinality=1
 ---- DISTRIBUTEDPLAN
 PLAN-ROOT SINK
 |
 00:UNION
    constant-operands=1
-   row-size=18B cardinality=1
+   row-size=19B cardinality=1
 ====
 values(1+1, 2, 5.0, 'a') order by 1 limit 10
 ---- PLAN
 PLAN-ROOT SINK
 |
 01:TOP-N [LIMIT=10]
-|  order by: 2 ASC
-|  row-size=18B cardinality=1
+|  order by: 1 + 1 ASC
+|  row-size=19B cardinality=1
 |
 00:UNION
    constant-operands=1
-   row-size=18B cardinality=1
+   row-size=19B cardinality=1
 ---- DISTRIBUTEDPLAN
 PLAN-ROOT SINK
 |
 01:TOP-N [LIMIT=10]
-|  order by: 2 ASC
-|  row-size=18B cardinality=1
+|  order by: 1 + 1 ASC
+|  row-size=19B cardinality=1
 |
 00:UNION
    constant-operands=1
-   row-size=18B cardinality=1
+   row-size=19B cardinality=1
 ====
 values((1+1, 2, 5.0, 'a'), (2, 3, 6.0, 'b'), (3, 4, 7.0, 'c'))
 ---- PLAN
@@ -40,33 +40,33 @@ PLAN-ROOT SINK
 |
 00:UNION
    constant-operands=3
-   row-size=18B cardinality=3
+   row-size=19B cardinality=3
 ---- DISTRIBUTEDPLAN
 PLAN-ROOT SINK
 |
 00:UNION
    constant-operands=3
-   row-size=18B cardinality=3
+   row-size=19B cardinality=3
 ====
 values((1+1, 2, 5.0, 'a'), (2, 3, 6.0, 'b'), (3, 4, 7.0, 'c')) order by 1 limit 10
 ---- PLAN
 PLAN-ROOT SINK
 |
 01:TOP-N [LIMIT=10]
-|  order by: 2 ASC
-|  row-size=18B cardinality=3
+|  order by: 1 + 1 ASC
+|  row-size=19B cardinality=3
 |
 00:UNION
    constant-operands=3
-   row-size=18B cardinality=3
+   row-size=19B cardinality=3
 ---- DISTRIBUTEDPLAN
 PLAN-ROOT SINK
 |
 01:TOP-N [LIMIT=10]
-|  order by: 2 ASC
-|  row-size=18B cardinality=3
+|  order by: 1 + 1 ASC
+|  row-size=19B cardinality=3
 |
 00:UNION
    constant-operands=3
-   row-size=18B cardinality=3
+   row-size=19B cardinality=3
 ====
diff --git a/testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test b/testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
new file mode 100644
index 0000000..e8398eb
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/union-const-scalar-expr-codegen.test
@@ -0,0 +1,80 @@
+====
+---- QUERY
+# Test union with multiple legs each having const expressions.
+# Expect codegen to be disabled for const expressions.
+set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
+select 1,2,3 union all select 4,5,6 union all select 7,8,9 order by 1;
+---- TYPES
+tinyint,tinyint,tinyint
+---- RESULTS
+1,2,3
+4,5,6
+7,8,9
+---- RUNTIME_PROFILE
+00:UNION
+   constant-operands=3
+#SORT_NODE
+ExecOption: Codegen Enabled
+#UNION_NODE
+ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
+====
+---- QUERY
+# Test insert statement with values (translated into UNION with const expressions).
+# Expect codegen to be disabled for const expressions.
+set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
+drop table if exists test_values_codegen;
+create table test_values_codegen (c1 int, c2 timestamp, c3 string);
+insert into test_values_codegen(c1) values (CAST(1+ceil(2.5)*3 as tinyint));
+---- RUNTIME_PROFILE
+00:UNION
+   constant-operands=1
+#UNION_NODE
+ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
+====
+---- QUERY
+# Test insert statement with values having const scalar expressions.
+# Expect codegen to be disabled for const expressions.
+set DISABLE_CODEGEN_ROWS_THRESHOLD=1;
+insert into test_values_codegen values
+  (1+1, '2015-04-09 14:07:46.580465000', base64encode('hello world')),
+  (CAST(1*2+2-5 as INT), CAST(1428421382 as timestamp),
+   regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0));
+---- RUNTIME_PROFILE
+00:UNION
+   constant-operands=2
+#UNION_NODE
+ExecOption: Codegen Enabled, Codegen Disabled for const scalar expressions
+====
+---- QUERY
+# Test the result of above inserts with codegen disabled.
+select * from test_values_codegen order by c1;
+---- TYPES
+int, timestamp, string
+---- RESULTS
+-1,2015-04-07 15:43:02,'abcdef123ghi456'
+2,2015-04-09 14:07:46.580465000,'aGVsbG8gd29ybGQ='
+10,NULL,'NULL'
+====
+---- QUERY
+# Test union with const expressions in a subplan.
+# Expect codegen enabled.
+select count(c.c_custkey), count(v.tot_price)
+from tpch_nested_parquet.customer c, (
+  select sum(o_totalprice) tot_price from c.c_orders
+  union
+  select 9.99 tot_price) v;
+---- TYPES
+BIGINT, BIGINT
+---- RESULTS
+300000,249996
+---- RUNTIME_PROFILE
+01:SUBPLAN
+|  03:UNION
+|  |  constant-operands=1
+#AGGREGATION_NODE (id=6)
+ExecOption: Codegen Enabled
+#UNION_NODE (id=3)
+ExecOption: Codegen Enabled
+#AGGREGATION_NODE (id=5)
+ExecOption: Codegen Enabled
+====
diff --git a/tests/query_test/test_codegen.py b/tests/query_test/test_codegen.py
index 0622d7d..18597aa 100644
--- a/tests/query_test/test_codegen.py
+++ b/tests/query_test/test_codegen.py
@@ -92,3 +92,10 @@ class TestCodegen(ImpalaTestSuite):
     profile_str = str(result.runtime_profile)
     assert "Probe Side Codegen Enabled" in profile_str, profile_str
     assert "Build Side Codegen Enabled" in profile_str, profile_str
+
+  def test_const_scalar_expr_in_union(self, vector, unique_database):
+    """Test that codegen is disabled for const scalar expressions in a UNION node.
+    if, however the UNION node is under a subplan then codegen is not disabled for
+    const expressions."""
+    self.run_test_case('QueryTest/union-const-scalar-expr-codegen', vector,
+        use_db=unique_database)

[impala] 01/04: IMPALA-11054: Support resource pool polling for frontend

Posted by wz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b0c37b66004090e59a895849feabe2bb35097d91
Author: wzhou-code <wz...@cloudera.com>
AuthorDate: Wed Dec 8 14:24:39 2021 -0800

    IMPALA-11054: Support resource pool polling for frontend
    
    This patch splits the Java class RequestPoolService as two classes -
    JniRequestPoolService and RequestPoolService, makes RequestPoolService
    as singleton class and provides an API for frontend to access
    RequestPoolService instance.
    
    Testing:
      - Manually verified that Planner could access RequestPoolService
        instance with RequestPoolService.getInstance().
      - Passed exhaustive tests.
    
    Change-Id: Ia78b1a0574f6b8ad4df5bb0fc9533f218b486e6b
    Reviewed-on: http://gerrit.cloudera.org:8080/18078
    Reviewed-by: Bikramjeet Vig <bi...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/scheduling/request-pool-service.cc          |  36 +++----
 be/src/scheduling/request-pool-service.h           |   4 +-
 .../apache/impala/util/JniRequestPoolService.java  | 112 +++++++++++++++++++++
 .../org/apache/impala/util/RequestPoolService.java |  91 +++++++++--------
 .../apache/impala/util/TestRequestPoolService.java |   4 +-
 5 files changed, 180 insertions(+), 67 deletions(-)

diff --git a/be/src/scheduling/request-pool-service.cc b/be/src/scheduling/request-pool-service.cc
index cd78793..90e48d3 100644
--- a/be/src/scheduling/request-pool-service.cc
+++ b/be/src/scheduling/request-pool-service.cc
@@ -31,6 +31,7 @@
 #include "util/collection-metrics.h"
 #include "util/mem-info.h"
 #include "util/parse-util.h"
+#include "util/test-info.h"
 #include "util/time.h"
 
 #include "common/names.h"
@@ -115,21 +116,21 @@ RequestPoolService::RequestPoolService(MetricGroup* metrics) :
   }
   default_pool_only_ = false;
 
-  jmethodID start_id; // RequestPoolService.start(), only called in this method.
+  jmethodID start_id; // JniRequestPoolService.start(), only called in this method.
   JniMethodDescriptor methods[] = {
-    {"<init>", "(Ljava/lang/String;Ljava/lang/String;)V", &ctor_},
-    {"start", "()V", &start_id},
-    {"resolveRequestPool", "([B)[B", &resolve_request_pool_id_},
-    {"getPoolConfig", "([B)[B", &get_pool_config_id_}};
+      {"<init>", "(Ljava/lang/String;Ljava/lang/String;Z)V", &ctor_},
+      {"start", "()V", &start_id},
+      {"resolveRequestPool", "([B)[B", &resolve_request_pool_id_},
+      {"getPoolConfig", "([B)[B", &get_pool_config_id_}};
 
   JNIEnv* jni_env = JniUtil::GetJNIEnv();
-  request_pool_service_class_ =
-    jni_env->FindClass("org/apache/impala/util/RequestPoolService");
+  jni_request_pool_service_class_ =
+      jni_env->FindClass("org/apache/impala/util/JniRequestPoolService");
   ABORT_IF_EXC(jni_env);
   uint32_t num_methods = sizeof(methods) / sizeof(methods[0]);
   for (int i = 0; i < num_methods; ++i) {
-    ABORT_IF_ERROR(JniUtil::LoadJniMethod(jni_env, request_pool_service_class_,
-        &(methods[i])));
+    ABORT_IF_ERROR(
+        JniUtil::LoadJniMethod(jni_env, jni_request_pool_service_class_, &(methods[i])));
   }
 
   jstring fair_scheduler_config_path =
@@ -139,12 +140,13 @@ RequestPoolService::RequestPoolService(MetricGroup* metrics) :
       jni_env->NewStringUTF(FLAGS_llama_site_path.c_str());
   ABORT_IF_EXC(jni_env);
 
-  jobject request_pool_service = jni_env->NewObject(request_pool_service_class_, ctor_,
-      fair_scheduler_config_path, llama_site_path);
+  jboolean is_be_test = TestInfo::is_be_test();
+  jobject jni_request_pool_service = jni_env->NewObject(jni_request_pool_service_class_,
+      ctor_, fair_scheduler_config_path, llama_site_path, is_be_test);
   ABORT_IF_EXC(jni_env);
-  ABORT_IF_ERROR(JniUtil::LocalToGlobalRef(jni_env, request_pool_service,
-      &request_pool_service_));
-  jni_env->CallObjectMethod(request_pool_service_, start_id);
+  ABORT_IF_ERROR(JniUtil::LocalToGlobalRef(
+      jni_env, jni_request_pool_service, &jni_request_pool_service_));
+  jni_env->CallObjectMethod(jni_request_pool_service_, start_id);
   ABORT_IF_EXC(jni_env);
 }
 
@@ -168,8 +170,8 @@ Status RequestPoolService::ResolveRequestPool(const TQueryCtx& ctx,
   params.__set_requested_pool(requested_pool);
   TResolveRequestPoolResult result;
   int64_t start_time = MonotonicMillis();
-  Status status = JniUtil::CallJniMethod(request_pool_service_, resolve_request_pool_id_,
-      params, &result);
+  Status status = JniUtil::CallJniMethod(
+      jni_request_pool_service_, resolve_request_pool_id_, params, &result);
   resolve_pool_ms_metric_->Update(MonotonicMillis() - start_time);
 
   if (result.status.status_code != TErrorCode::OK) {
@@ -205,7 +207,7 @@ Status RequestPoolService::GetPoolConfig(const string& pool_name,
   TPoolConfigParams params;
   params.__set_pool(pool_name);
   RETURN_IF_ERROR(JniUtil::CallJniMethod(
-        request_pool_service_, get_pool_config_id_, params, pool_config));
+      jni_request_pool_service_, get_pool_config_id_, params, pool_config));
   if (FLAGS_disable_pool_max_requests) pool_config->__set_max_requests(-1);
   if (FLAGS_disable_pool_mem_limits) pool_config->__set_max_mem_resources(-1);
   return Status::OK();
diff --git a/be/src/scheduling/request-pool-service.h b/be/src/scheduling/request-pool-service.h
index ad38900..5fa0979 100644
--- a/be/src/scheduling/request-pool-service.h
+++ b/be/src/scheduling/request-pool-service.h
@@ -69,9 +69,9 @@ class RequestPoolService {
 
   /// The following members are not initialized if default_pool_only_ is true.
   /// Descriptor of Java RequestPoolService class itself, used to create a new instance.
-  jclass request_pool_service_class_;
+  jclass jni_request_pool_service_class_;
   /// Instance of org.apache.impala.util.RequestPoolService
-  jobject request_pool_service_;
+  jobject jni_request_pool_service_;
   jmethodID resolve_request_pool_id_;  // RequestPoolService.resolveRequestPool()
   jmethodID get_pool_config_id_;  // RequestPoolService.getPoolConfig()
   jmethodID ctor_;
diff --git a/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java b/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java
new file mode 100644
index 0000000..8a5c3cf
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/util/JniRequestPoolService.java
@@ -0,0 +1,112 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.util;
+
+import com.google.common.base.Preconditions;
+
+import org.apache.impala.common.ImpalaException;
+import org.apache.impala.common.InternalException;
+import org.apache.impala.common.JniUtil;
+import org.apache.impala.thrift.TErrorCode;
+import org.apache.impala.thrift.TPoolConfigParams;
+import org.apache.impala.thrift.TPoolConfig;
+import org.apache.impala.thrift.TResolveRequestPoolParams;
+import org.apache.impala.thrift.TResolveRequestPoolResult;
+import org.apache.impala.thrift.TStatus;
+
+import org.apache.thrift.TException;
+import org.apache.thrift.TSerializer;
+import org.apache.thrift.protocol.TBinaryProtocol;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * JNI interface for RequestPoolService.
+ */
+public class JniRequestPoolService {
+  final static Logger LOG = LoggerFactory.getLogger(JniRequestPoolService.class);
+
+  private final static TBinaryProtocol.Factory protocolFactory_ =
+      new TBinaryProtocol.Factory();
+
+  // A single instance is created by the backend and lasts the duration of the process.
+  private final RequestPoolService requestPoolService_;
+
+  /**
+   * Creates a RequestPoolService instance with a configuration containing the specified
+   * fair-scheduler.xml and llama-site.xml.
+   *
+   * @param fsAllocationPath path to the fair scheduler allocation file.
+   * @param sitePath path to the configuration file.
+   */
+  JniRequestPoolService(
+      final String fsAllocationPath, final String sitePath, boolean isBackendTest) {
+    Preconditions.checkNotNull(fsAllocationPath);
+    requestPoolService_ =
+        RequestPoolService.getInstance(fsAllocationPath, sitePath, isBackendTest);
+  }
+
+  /**
+   * Starts the RequestPoolService instance. It does the initial loading of the
+   * configuration and starts the automatic reloading.
+   */
+  @SuppressWarnings("unused") // called from C++
+  public void start() {
+    requestPoolService_.start();
+  }
+
+  /**
+   * Resolves a user and pool to the pool specified by the allocation placement policy
+   * and checks if the user is authorized to submit requests.
+   *
+   * @param thriftResolvePoolParams Serialized {@link TResolveRequestPoolParams}
+   * @return serialized {@link TResolveRequestPoolResult}
+   */
+  @SuppressWarnings("unused") // called from C++
+  public byte[] resolveRequestPool(byte[] thriftResolvePoolParams)
+      throws ImpalaException {
+    TResolveRequestPoolParams resolvePoolParams = new TResolveRequestPoolParams();
+    JniUtil.deserializeThrift(
+        protocolFactory_, resolvePoolParams, thriftResolvePoolParams);
+    TResolveRequestPoolResult result =
+        requestPoolService_.resolveRequestPool(resolvePoolParams);
+    try {
+      return new TSerializer(protocolFactory_).serialize(result);
+    } catch (TException e) {
+      throw new InternalException(e.getMessage());
+    }
+  }
+
+  /**
+   * Gets the pool configuration values for the specified pool.
+   *
+   * @param thriftPoolConfigParams Serialized {@link TPoolConfigParams}
+   * @return serialized {@link TPoolConfig}
+   */
+  @SuppressWarnings("unused") // called from C++
+  public byte[] getPoolConfig(byte[] thriftPoolConfigParams) throws ImpalaException {
+    TPoolConfigParams poolConfigParams = new TPoolConfigParams();
+    JniUtil.deserializeThrift(protocolFactory_, poolConfigParams, thriftPoolConfigParams);
+    TPoolConfig result = requestPoolService_.getPoolConfig(poolConfigParams.getPool());
+    try {
+      return new TSerializer(protocolFactory_).serialize(result);
+    } catch (TException e) {
+      throw new InternalException(e.getMessage());
+    }
+  }
+}
diff --git a/fe/src/main/java/org/apache/impala/util/RequestPoolService.java b/fe/src/main/java/org/apache/impala/util/RequestPoolService.java
index 58fe6d5..828fe37 100644
--- a/fe/src/main/java/org/apache/impala/util/RequestPoolService.java
+++ b/fe/src/main/java/org/apache/impala/util/RequestPoolService.java
@@ -28,9 +28,6 @@ import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.security.UserGroupInformation;
 import org.apache.hadoop.yarn.api.records.QueueACL;
 import org.apache.hadoop.yarn.conf.YarnConfiguration;
-import org.apache.thrift.TException;
-import org.apache.thrift.TSerializer;
-import org.apache.thrift.protocol.TBinaryProtocol;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -80,8 +77,6 @@ import com.google.common.collect.Lists;
 public class RequestPoolService {
   final static Logger LOG = LoggerFactory.getLogger(RequestPoolService.class);
 
-  private final static TBinaryProtocol.Factory protocolFactory_ =
-      new TBinaryProtocol.Factory();
   // Used to ensure start() has been called before any other methods can be used.
   private final AtomicBoolean running_;
 
@@ -152,6 +147,9 @@ public class RequestPoolService {
   // URL of the configuration file.
   private final URL confUrl_;
 
+  // Reference of single instance of RequestPoolService.
+  private static RequestPoolService single_instance_ = null;
+
   /**
    * Updates the configuration when the file changes. The file is confUrl_
    * and it will exist when this is created (or RequestPoolService will not start). If
@@ -170,13 +168,42 @@ public class RequestPoolService {
   }
 
   /**
+   * Static method to create singleton instance of RequestPoolService class.
+   * This API is called by backend code through JNI, or called by unit-test code.
+   */
+  public static RequestPoolService getInstance(
+      final String fsAllocationPath, final String sitePath, boolean isTest) {
+    // For frontend and backend tests, different request pools could be created with
+    // different configurations in one process so we have to allow multiple instances
+    // to be created for frontend and backend tests.
+    if (isTest) return (new RequestPoolService(fsAllocationPath, sitePath));
+
+    if (single_instance_ == null) {
+      single_instance_ = new RequestPoolService(fsAllocationPath, sitePath);
+    }
+    return single_instance_;
+  }
+
+  /**
+   * Static method to return singleton instance of RequestPoolService class.
+   * This API is called by frontend Java code. An instance should be already created
+   * by backend before this API is called except only default pool is used.
+   */
+  public static RequestPoolService getInstance() {
+    if (single_instance_ == null) {
+      LOG.info("Default pool only, scheduler allocation is not specified.");
+    }
+    return single_instance_;
+  }
+
+  /**
    * Creates a RequestPoolService instance with a configuration containing the specified
    * fair-scheduler.xml and llama-site.xml.
    *
    * @param fsAllocationPath path to the fair scheduler allocation file.
    * @param sitePath path to the configuration file.
    */
-  RequestPoolService(final String fsAllocationPath, final String sitePath) {
+  private RequestPoolService(final String fsAllocationPath, final String sitePath) {
     Preconditions.checkNotNull(fsAllocationPath);
     running_ = new AtomicBoolean(false);
     allocationConf_ = new AtomicReference<>();
@@ -274,31 +301,12 @@ public class RequestPoolService {
    * Resolves a user and pool to the pool specified by the allocation placement policy
    * and checks if the user is authorized to submit requests.
    *
-   * @param thriftResolvePoolParams Serialized {@link TResolveRequestPoolParams}
-   * @return serialized {@link TResolveRequestPoolResult}
+   * @param resolvePoolParams {@link TResolveRequestPoolParams}
+   * @return {@link TResolveRequestPoolResult}
    */
-  @SuppressWarnings("unused") // called from C++
-  public byte[] resolveRequestPool(byte[] thriftResolvePoolParams)
-      throws ImpalaException {
-    TResolveRequestPoolParams resolvePoolParams = new TResolveRequestPoolParams();
-    JniUtil.deserializeThrift(protocolFactory_, resolvePoolParams,
-        thriftResolvePoolParams);
-    TResolveRequestPoolResult result = resolveRequestPool(resolvePoolParams);
-    if (LOG.isTraceEnabled()) {
-      LOG.trace("resolveRequestPool(pool={}, user={}): resolved_pool={}, has_access={}",
-          resolvePoolParams.getRequested_pool(), resolvePoolParams.getUser(),
-          result.resolved_pool, result.has_access);
-    }
-    try {
-      return new TSerializer(protocolFactory_).serialize(result);
-    } catch (TException e) {
-      throw new InternalException(e.getMessage());
-    }
-  }
-
-  @VisibleForTesting
-  TResolveRequestPoolResult resolveRequestPool(
+  public TResolveRequestPoolResult resolveRequestPool(
       TResolveRequestPoolParams resolvePoolParams) throws InternalException {
+    Preconditions.checkState(running_.get());
     String requestedPool = resolvePoolParams.getRequested_pool();
     String user = resolvePoolParams.getUser();
     TResolveRequestPoolResult result = new TResolveRequestPoolResult();
@@ -335,31 +343,22 @@ public class RequestPoolService {
       result.setHas_access(hasAccess(pool, user));
       result.setStatus(new TStatus(TErrorCode.OK, Lists.newArrayList()));
     }
+    if (LOG.isTraceEnabled()) {
+      LOG.trace("resolveRequestPool(pool={}, user={}): resolved_pool={}, has_access={}",
+          resolvePoolParams.getRequested_pool(), resolvePoolParams.getUser(),
+          result.resolved_pool, result.has_access);
+    }
     return result;
   }
 
   /**
    * Gets the pool configuration values for the specified pool.
    *
-   * @param thriftPoolConfigParams Serialized {@link TPoolConfigParams}
-   * @return serialized {@link TPoolConfig}
+   * @param pool name.
+   * @return {@link TPoolConfig}
    */
-  @SuppressWarnings("unused") // called from C++
-  public byte[] getPoolConfig(byte[] thriftPoolConfigParams) throws ImpalaException {
+  public TPoolConfig getPoolConfig(String pool) {
     Preconditions.checkState(running_.get());
-    TPoolConfigParams poolConfigParams = new TPoolConfigParams();
-    JniUtil.deserializeThrift(protocolFactory_, poolConfigParams,
-        thriftPoolConfigParams);
-    TPoolConfig result = getPoolConfig(poolConfigParams.getPool());
-    try {
-      return new TSerializer(protocolFactory_).serialize(result);
-    } catch (TException e) {
-      throw new InternalException(e.getMessage());
-    }
-  }
-
-  @VisibleForTesting
-  TPoolConfig getPoolConfig(String pool) {
     TPoolConfig result = new TPoolConfig();
     long maxMemoryMb = allocationConf_.get().getMaxResources(pool).getMemory();
     result.setMax_mem_resources(
diff --git a/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java b/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
index 436984c..efc9412 100644
--- a/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
+++ b/fe/src/test/java/org/apache/impala/util/TestRequestPoolService.java
@@ -105,8 +105,8 @@ public class TestRequestPoolService {
       Files.copy(getClasspathFile(llamaConfFile), llamaConfFile_);
       llamaConfPath = llamaConfFile_.getAbsolutePath();
     }
-    poolService_ = new RequestPoolService(allocationConfFile_.getAbsolutePath(),
-        llamaConfPath);
+    poolService_ = RequestPoolService.getInstance(
+        allocationConfFile_.getAbsolutePath(), llamaConfPath, /* isTest */ true);
 
     // Lower the wait times on the AllocationFileLoaderService and RequestPoolService so
     // the test doesn't have to wait very long to test that file changes are reloaded.

[impala] 02/04: IMPALA-11035: Make x-forwarded-for http header case-insensitive

Posted by wz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4b33107d15c32ffe7dfe6be99755830299ff8d61
Author: Abhishek Rawat <ar...@cloudera.com>
AuthorDate: Mon Nov 22 20:06:50 2021 -0800

    IMPALA-11035: Make x-forwarded-for http header case-insensitive
    
    Updated THttpServer::parseHeader to treat 'X-Forwarded-For' header name
    as case-insensitive.
    
    Added fe test (LdapHS2Test) to use mixed case 'x-forwarded-for' header.
    
    Change-Id: Id9c4070a4a2d5ad9decb186a9219957d8c26a7d7
    Reviewed-on: http://gerrit.cloudera.org:8080/18048
    Reviewed-by: Abhishek Rawat <ar...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/transport/THttpServer.cpp                   |  2 +-
 .../apache/impala/customcluster/LdapHS2Test.java   | 27 +++++++++++++++++-----
 2 files changed, 22 insertions(+), 7 deletions(-)

diff --git a/be/src/transport/THttpServer.cpp b/be/src/transport/THttpServer.cpp
index 6e11572..8189d16 100644
--- a/be/src/transport/THttpServer.cpp
+++ b/be/src/transport/THttpServer.cpp
@@ -142,7 +142,7 @@ void THttpServer::parseHeader(char* header) {
   } else if (THRIFT_strncasecmp(header, "Content-length", sz) == 0) {
     chunked_ = false;
     contentLength_ = atoi(value);
-  } else if (strncmp(header, "X-Forwarded-For", sz) == 0) {
+  } else if (THRIFT_strncasecmp(header, "X-Forwarded-For", sz) == 0) {
     origin_ = value;
   } else if ((has_ldap_ || has_kerberos_ || has_saml_ || has_jwt_)
       && THRIFT_strncasecmp(header, "Authorization", sz) == 0) {
diff --git a/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java b/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
index 07cc7a7..5641109 100644
--- a/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
+++ b/fe/src/test/java/org/apache/impala/customcluster/LdapHS2Test.java
@@ -376,7 +376,7 @@ public class LdapHS2Test {
     verifyMetrics(0, 0);
     verifyTrustedDomainMetrics(3);
 
-    // Case 2: Authenticate as 'Test1Ldap' without password
+    // Case 2a: Authenticate as 'Test1Ldap' without password
     headers.put("Authorization", "Basic VGVzdDFMZGFwOg==");
     headers.put("X-Forwarded-For", "127.0.0.1");
     transport.setCustomHeaders(headers);
@@ -388,6 +388,21 @@ public class LdapHS2Test {
     verifyMetrics(0, 0);
     verifyTrustedDomainMetrics(6);
 
+    // Case 2b: Authenticate as 'Test1Ldap' without password. Tests that XFF header name
+    // is case-insensitive.
+    headers.put("Authorization", "Basic VGVzdDFMZGFwOg==");
+    headers.remove("X-Forwarded-For");
+    headers.put("x-Forwarded-for", "127.0.0.1");
+    transport.setCustomHeaders(headers);
+    openResp = client.OpenSession(openReq);
+    verifyMetrics(0, 0);
+    verifyTrustedDomainMetrics(7);
+    operationHandle = execAndFetch(client, openResp.getSessionHandle(),
+        "select logged_in_user()", "Test1Ldap");
+    verifyMetrics(0, 0);
+    verifyTrustedDomainMetrics(9);
+    headers.remove("x-Forwarded-for");
+
     // Case 3: Case 1: Authenticate as 'Test1Ldap' with the right password
     // '12345' but with a non trusted address in X-Forwarded-For header
     headers.put("Authorization", "Basic VGVzdDFMZGFwOjEyMzQ1");
@@ -395,11 +410,11 @@ public class LdapHS2Test {
     transport.setCustomHeaders(headers);
     openResp = client.OpenSession(openReq);
     verifyMetrics(1, 0);
-    verifyTrustedDomainMetrics(6);
+    verifyTrustedDomainMetrics(9);
     operationHandle = execAndFetch(client, openResp.getSessionHandle(),
         "select logged_in_user()", "Test1Ldap");
     verifyMetrics(3, 0);
-    verifyTrustedDomainMetrics(6);
+    verifyTrustedDomainMetrics(9);
 
     // Case 4: No auth header, does not work
     headers.remove("Authorization");
@@ -409,7 +424,7 @@ public class LdapHS2Test {
       openResp = client.OpenSession(openReq);
       fail("Exception exception.");
     } catch (Exception e) {
-      verifyTrustedDomainMetrics(6);
+      verifyTrustedDomainMetrics(9);
       assertEquals(e.getMessage(), "HTTP Response code: 401");
     }
 
@@ -423,7 +438,7 @@ public class LdapHS2Test {
       fail("Exception exception.");
     } catch (Exception e) {
       verifyMetrics(3, 1);
-      verifyTrustedDomainMetrics(6);
+      verifyTrustedDomainMetrics(9);
       assertEquals(e.getMessage(), "HTTP Response code: 401");
     }
 
@@ -435,7 +450,7 @@ public class LdapHS2Test {
     openResp = client.OpenSession(openReq);
     // Account for 1 successful basic auth increment.
     verifyMetrics(4, 1);
-    verifyTrustedDomainMetrics(6);
+    verifyTrustedDomainMetrics(9);
   }
 
   /**

[impala] 04/04: IMPALA-11005 (part 3): Repalce random number generator with mt19937_64

Posted by wz...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

wzhou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 1ed48a542beeadc544193fbd4e74c2d1ac68daf5
Author: wzhou-code <wz...@cloudera.com>
AuthorDate: Thu Nov 11 15:12:53 2021 -0800

    IMPALA-11005 (part 3): Repalce random number generator with mt19937_64
    
    Previous patch upgraded boost library. This patch changes 64-bit random
    number generator from ranlux64_3 to mt19937_64 since mt19937_64 has
    better performance according to boost benchmark at https://www.boost.org
    /doc/libs/1_74_0/doc/html/boost_random/performance.html.
    Also fixs an unit-test which is affected by the change of random number
    generator.
    
    Testing:
     - Passed exhaustive tests.
    
    Change-Id: Iade226fc17442f4d7b9b14e4a9e80a30a3856226
    Reviewed-on: http://gerrit.cloudera.org:8080/18022
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exprs/aggregate-functions-ir.cc                           | 9 +++++----
 .../functional-query/queries/QueryTest/alloc-fail-init.test      | 4 ++--
 2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/be/src/exprs/aggregate-functions-ir.cc b/be/src/exprs/aggregate-functions-ir.cc
index a658e39..fe5b324 100644
--- a/be/src/exprs/aggregate-functions-ir.cc
+++ b/be/src/exprs/aggregate-functions-ir.cc
@@ -23,7 +23,7 @@
 #include <utility>
 #include <cmath>
 
-#include <boost/random/ranlux.hpp>
+#include <boost/random/mersenne_twister.hpp>
 #include <boost/random/uniform_int.hpp>
 
 #include "codegen/impala-ir.h"
@@ -53,7 +53,7 @@
 #include "common/names.h"
 
 using boost::uniform_int;
-using boost::ranlux64_3;
+using boost::mt19937_64;
 using std::make_pair;
 using std::map;
 using std::min_element;
@@ -1277,8 +1277,9 @@ class ReservoirSampleState {
   int64_t source_size_;
 
   // Random number generator for generating 64-bit integers
-  // TODO: Replace with mt19937_64 when upgrading boost
-  ranlux64_3 rng_;
+  // Replace ranlux64_3 with mt19937_64 for better performance. See boost benchmark at
+  // https://www.boost.org/doc/libs/1_74_0/doc/html/boost_random/performance.html
+  mt19937_64 rng_;
 
   // True if the array of samples is in the same memory allocation as this object. If
   // false, this object is responsible for freeing the memory.
diff --git a/testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test b/testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
index c6d17b5..c6d77d6 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/alloc-fail-init.test
@@ -13,7 +13,7 @@ FunctionContext::Allocate() failed to allocate 1 bytes.
 ---- QUERY
 select sample(timestamp_col) from functional.alltypes
 ---- CATCH
-FunctionContext::Allocate() failed to allocate 248 bytes.
+FunctionContext::Allocate() failed to allocate 2536 bytes.
 ====
 ---- QUERY
 select group_concat(string_col) from functional.alltypes
@@ -58,7 +58,7 @@ FunctionContextImpl::AllocateForResults() failed to allocate 120 bytes.
 ---- QUERY
 select appx_median(int_col) from functional.alltypes
 ---- CATCH
-FunctionContext::Allocate() failed to allocate 248 bytes.
+FunctionContext::Allocate() failed to allocate 2536 bytes.
 ====
 ---- QUERY
 select to_date(now())