Posted to commits@impala.apache.org by ta...@apache.org on 2019/02/16 01:20:50 UTC

[impala] branch 2.x updated (22fb381 -> e660f11)

This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a change to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git.


    from 22fb381  IMPALA-6035: Add query options to limit thread reservation
     new 4fe01b2  IMPALA-2195: Improper handling of comments in queries
     new 3cda4c2  IMPALA-7165: [DOCS] Correct example for dynamic partition pruning
     new 1a2ea50  IMPALA-7016: Implement ALTER DATABASE SET OWNER
     new 6bfa720  [DOCS] Wording changes in DPP examples for clarity
     new b8e995e  IMPALA-5552: Add support for authorized proxy groups
     new 56c98e9  IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()
     new 85906dd  IMPALA-7128 (part 2): add an interface for data sources
     new b1f43e4  IMPALA-6942: Reword error message to say "Failed" rather than "Cancelled"
     new 47f135d  IMPALA-6917: Implement COMMENT ON TABLE/VIEW
     new fb8332c  IMPALA-5604: document DISABLE_CODEGEN_ROWS_THRESHOLD
     new e660f11  IMPALA-7111: avoid use of boost::split in CheckPluginEnabled

The 11 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 be/src/codegen/gen_ir_descriptions.py              |   8 +-
 be/src/codegen/impala-ir.cc                        |   1 +
 be/src/exec/data-sink.cc                           |   4 +
 be/src/exec/data-sink.h                            |   4 +
 be/src/exec/exchange-node.cc                       |   2 +-
 be/src/exec/exec-node.cc                           |   8 -
 be/src/exec/exec-node.h                            |   3 -
 be/src/exec/hdfs-plugin-text-scanner.cc            |   9 +-
 be/src/exec/hdfs-scan-node-base.cc                 |   2 +-
 be/src/exec/partial-sort-node.cc                   |   2 +-
 be/src/exec/partitioned-aggregation-node.cc        |   2 +-
 be/src/exec/partitioned-hash-join-builder.cc       |   6 +-
 be/src/exec/partitioned-hash-join-builder.h        |   2 +-
 be/src/exec/partitioned-hash-join-node.cc          |   2 +-
 be/src/exec/sort-node.cc                           |   2 +-
 be/src/exec/topn-node.cc                           |   2 +-
 be/src/runtime/CMakeLists.txt                      |   1 +
 be/src/runtime/fragment-instance-state.cc          |   2 +
 be/src/runtime/krpc-data-stream-sender-ir.cc       |  49 +++++
 be/src/runtime/krpc-data-stream-sender.cc          | 237 ++++++++++++++++++---
 be/src/runtime/krpc-data-stream-sender.h           |  71 ++++--
 be/src/runtime/raw-value-ir.cc                     |  28 +++
 be/src/runtime/raw-value.cc                        |  27 ---
 be/src/runtime/runtime-state.h                     |   9 +
 be/src/service/CMakeLists.txt                      |   1 +
 be/src/service/frontend.cc                         |  16 ++
 be/src/service/frontend.h                          |   5 +
 be/src/service/impala-server-test.cc               |  74 +++++++
 be/src/service/impala-server.cc                    | 119 ++++++++---
 be/src/service/impala-server.h                     |  23 +-
 be/src/util/backend-gflag-util.cc                  |   3 +
 be/src/util/runtime-profile.h                      |   3 +-
 be/src/util/string-util-test.cc                    |  31 +++
 be/src/util/string-util.cc                         |  11 +
 be/src/util/string-util.h                          |   3 +
 common/thrift/BackendGflags.thrift                 |   2 +
 common/thrift/CatalogService.thrift                |   3 +
 common/thrift/Frontend.thrift                      |  10 +
 common/thrift/JniCatalog.thrift                    |  42 +++-
 docs/impala.ditamap                                |   1 +
 docs/impala_keydefs.ditamap                        |   1 +
 docs/shared/impala_common.xml                      |  71 +++---
 .../impala_disable_codegen_rows_threshold.xml      |  97 +++++++++
 .../impala_exec_single_node_rows_threshold.xml     |  11 +-
 docs/topics/impala_partitioning.xml                |  77 +++++--
 docs/topics/impala_runtime_filtering.xml           |  16 +-
 fe/src/main/cup/sql-parser.cup                     |  38 ++++
 .../impala/analysis/AlterDbSetOwnerStmt.java       |  60 ++++++
 .../{CommentOnDbStmt.java => AlterDbStmt.java}     |  23 +-
 .../org/apache/impala/analysis/AlterTableStmt.java |   3 +-
 .../apache/impala/analysis/AnalysisContext.java    |   9 +-
 .../java/org/apache/impala/analysis/Analyzer.java  |   3 +-
 .../apache/impala/analysis/CommentOnDbStmt.java    |   2 +
 .../org/apache/impala/analysis/CommentOnStmt.java  |  13 +-
 ...OnDbStmt.java => CommentOnTableOrViewStmt.java} |  35 ++-
 .../{ParseNode.java => CommentOnTableStmt.java}    |  25 ++-
 .../{ParseNode.java => CommentOnViewStmt.java}     |  25 ++-
 .../impala/analysis/CreateTableDataSrcStmt.java    |   4 +-
 .../Owner.java}                                    |  23 +-
 .../org/apache/impala/analysis/PrivilegeSpec.java  |   4 +-
 .../java/org/apache/impala/catalog/DataSource.java |   8 +-
 .../org/apache/impala/catalog/DataSourceTable.java |  18 +-
 .../java/org/apache/impala/catalog/FeCatalog.java  |   5 +-
 .../Visitor.java => catalog/FeDataSource.java}     |  16 +-
 ...oadingException.java => FeDataSourceTable.java} |  30 +--
 .../apache/impala/planner/DataSourceScanNode.java  |   6 +-
 .../apache/impala/planner/SingleNodePlanner.java   |   4 +-
 .../org/apache/impala/service/BackendConfig.java   |   4 +
 .../apache/impala/service/CatalogOpExecutor.java   |  82 ++++++-
 .../java/org/apache/impala/service/Frontend.java   |  21 +-
 .../org/apache/impala/service/JniFrontend.java     |  72 ++++++-
 .../java/org/apache/impala/util/MetaStoreUtil.java |   4 +
 .../org/apache/impala/analysis/AnalyzeDDLTest.java |  73 ++++++-
 .../apache/impala/analysis/AuthorizationTest.java  |  32 +++
 .../org/apache/impala/analysis/ParserTest.java     |  38 ++++
 .../org/apache/impala/service/JniFrontendTest.java |  58 +++++
 shell/impala_shell.py                              |  82 ++++++-
 .../QueryTest/datastream-sender-codegen.test       |  41 ++++
 tests/authorization/test_authorization.py          |  40 +++-
 tests/metadata/test_ddl.py                         |  63 ++++++
 tests/metadata/test_ddl_base.py                    |  21 +-
 tests/query_test/test_codegen.py                   |   8 +-
 tests/shell/test_shell_interactive.py              | 103 ++++++++-
 83 files changed, 1861 insertions(+), 338 deletions(-)
 create mode 100644 be/src/runtime/krpc-data-stream-sender-ir.cc
 create mode 100644 be/src/service/impala-server-test.cc
 create mode 100644 docs/topics/impala_disable_codegen_rows_threshold.xml
 create mode 100644 fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java
 copy fe/src/main/java/org/apache/impala/analysis/{CommentOnDbStmt.java => AlterDbStmt.java} (77%)
 copy fe/src/main/java/org/apache/impala/analysis/{CommentOnDbStmt.java => CommentOnTableOrViewStmt.java} (55%)
 copy fe/src/main/java/org/apache/impala/analysis/{ParseNode.java => CommentOnTableStmt.java} (64%)
 copy fe/src/main/java/org/apache/impala/analysis/{ParseNode.java => CommentOnViewStmt.java} (64%)
 copy fe/src/main/java/org/apache/impala/{common/ColumnAliasGenerator.java => analysis/Owner.java} (63%)
 copy fe/src/main/java/org/apache/impala/{util/Visitor.java => catalog/FeDataSource.java} (78%)
 copy fe/src/main/java/org/apache/impala/catalog/{TableLoadingException.java => FeDataSourceTable.java} (63%)
 create mode 100644 fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
 create mode 100644 testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test


[impala] 06/11: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 56c98e9900fa4898c1997ab834ca4c9ee81f3688
Author: Michael Ho <kw...@cloudera.com>
AuthorDate: Tue Mar 6 18:31:18 2018 -0800

    IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()
    
    This change codegens the hash partitioning logic of
    KrpcDataStreamSender::Send() when the partitioning strategy
    is HASH_PARTITIONED. It does so by unrolling the loop that
    evaluates each row against the partitioning expressions and
    hashes the result. It also replaces the sender's channel
    count with a constant at runtime.
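
    A rough sketch of why the constant substitution pays off: once the
    channel count is a literal, LLVM can fold the GetNumChannels() call
    away and strength-reduce the per-row modulo. The helper names below
    are hypothetical; only the technique mirrors this commit.

        #include <cstdint>

        // Generic case: '%' compiles to a hardware divide when the
        // channel count is known only at runtime.
        static inline int PickChannel(uint64_t hash_val, int num_channels) {
          return static_cast<int>(hash_val % num_channels);
        }

        // If codegen substitutes a literal (say, 32 channels), the
        // optimizer can reduce the same expression to a bit mask:
        static inline int PickChannel32(uint64_t hash_val) {
          return static_cast<int>(hash_val & 31);  // == hash_val % 32
        }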
    
    With this change, we see a reasonable speedup on several benchmarks:
    
    +------------+-----------------------+---------+------------+------------+----------------+
    | Workload   | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
    +------------+-----------------------+---------+------------+------------+----------------+
    | TPCH(_300) | parquet / none / none | 20.03   | -6.44%     | 13.56      | -7.15%         |
    +------------+-----------------------+---------+------------+------------+----------------+
    
    +---------------------+-----------------------+---------+------------+------------+----------------+
    | Workload            | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
    +---------------------+-----------------------+---------+------------+------------+----------------+
    | TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56%     | 12.28      | -5.30%         |
    +---------------------+-----------------------+---------+------------+------------+----------------+
    
    +-------------------------+-----------------------+---------+------------+------------+----------------+
    | Workload                | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
    +-------------------------+-----------------------+---------+------------+------------+----------------+
    | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10%     | 7.16       | -4.33%         |
    +-------------------------+-----------------------+---------+------------+------------+----------------+
    
    +-------------------+-----------------------+---------+------------+------------+----------------+
    | Workload          | File Format           | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) |
    +-------------------+-----------------------+---------+------------+------------+----------------+
    | TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02%     | 17.46      | -4.71%         |
    +-------------------+-----------------------+---------+------------+------------+----------------+
    
    Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
    Reviewed-on: http://gerrit.cloudera.org:8080/10421
    Reviewed-by: Michael Ho <kw...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/codegen/gen_ir_descriptions.py              |   8 +-
 be/src/codegen/impala-ir.cc                        |   1 +
 be/src/exec/data-sink.cc                           |   4 +
 be/src/exec/data-sink.h                            |   4 +
 be/src/exec/exchange-node.cc                       |   2 +-
 be/src/exec/exec-node.cc                           |   8 -
 be/src/exec/exec-node.h                            |   3 -
 be/src/exec/hdfs-scan-node-base.cc                 |   2 +-
 be/src/exec/partial-sort-node.cc                   |   2 +-
 be/src/exec/partitioned-aggregation-node.cc        |   2 +-
 be/src/exec/partitioned-hash-join-builder.cc       |   6 +-
 be/src/exec/partitioned-hash-join-builder.h        |   2 +-
 be/src/exec/partitioned-hash-join-node.cc          |   2 +-
 be/src/exec/sort-node.cc                           |   2 +-
 be/src/exec/topn-node.cc                           |   2 +-
 be/src/runtime/CMakeLists.txt                      |   1 +
 be/src/runtime/fragment-instance-state.cc          |   2 +
 be/src/runtime/krpc-data-stream-sender-ir.cc       |  49 +++++
 be/src/runtime/krpc-data-stream-sender.cc          | 237 ++++++++++++++++++---
 be/src/runtime/krpc-data-stream-sender.h           |  71 ++++--
 be/src/runtime/raw-value-ir.cc                     |  28 +++
 be/src/runtime/raw-value.cc                        |  27 ---
 be/src/runtime/runtime-state.h                     |   9 +
 be/src/util/runtime-profile.h                      |   3 +-
 .../QueryTest/datastream-sender-codegen.test       |  41 ++++
 tests/query_test/test_codegen.py                   |   8 +-
 26 files changed, 425 insertions(+), 101 deletions(-)

diff --git a/be/src/codegen/gen_ir_descriptions.py b/be/src/codegen/gen_ir_descriptions.py
index 26a8ad7..dd2df9e 100755
--- a/be/src/codegen/gen_ir_descriptions.py
+++ b/be/src/codegen/gen_ir_descriptions.py
@@ -196,6 +196,8 @@ ir_functions = [
    "_ZN6impala8RawValue7CompareEPKvS2_RKNS_10ColumnTypeE"],
   ["RAW_VALUE_GET_HASH_VALUE",
    "_ZN6impala8RawValue12GetHashValueEPKvRKNS_10ColumnTypeEj"],
+  ["RAW_VALUE_GET_HASH_VALUE_FAST_HASH",
+   "_ZN6impala8RawValue20GetHashValueFastHashEPKvRKNS_10ColumnTypeEm"],
   ["TOPN_NODE_INSERT_BATCH",
    "_ZN6impala8TopNNode11InsertBatchEPNS_8RowBatchE"],
   ["MEMPOOL_ALLOCATE",
@@ -219,7 +221,11 @@ ir_functions = [
   ["FLOAT_MIN_MAX_FILTER_INSERT", "_ZN6impala17FloatMinMaxFilter6InsertEPv"],
   ["DOUBLE_MIN_MAX_FILTER_INSERT", "_ZN6impala18DoubleMinMaxFilter6InsertEPv"],
   ["STRING_MIN_MAX_FILTER_INSERT", "_ZN6impala18StringMinMaxFilter6InsertEPv"],
-  ["TIMESTAMP_MIN_MAX_FILTER_INSERT", "_ZN6impala21TimestampMinMaxFilter6InsertEPv"]
+  ["TIMESTAMP_MIN_MAX_FILTER_INSERT", "_ZN6impala21TimestampMinMaxFilter6InsertEPv"],
+  ["KRPC_DSS_GET_PART_EXPR_EVAL",
+  "_ZN6impala20KrpcDataStreamSender25GetPartitionExprEvaluatorEi"],
+  ["KRPC_DSS_HASH_AND_ADD_ROWS",
+  "_ZN6impala20KrpcDataStreamSender14HashAndAddRowsEPNS_8RowBatchE"]
 ]
 
 enums_preamble = '\
diff --git a/be/src/codegen/impala-ir.cc b/be/src/codegen/impala-ir.cc
index 4b79e8b..9c5b3eb 100644
--- a/be/src/codegen/impala-ir.cc
+++ b/be/src/codegen/impala-ir.cc
@@ -54,6 +54,7 @@
 #include "exprs/timestamp-functions-ir.cc"
 #include "exprs/udf-builtins-ir.cc"
 #include "exprs/utility-functions-ir.cc"
+#include "runtime/krpc-data-stream-sender-ir.cc"
 #include "runtime/mem-pool.h"
 #include "runtime/raw-value-ir.cc"
 #include "runtime/runtime-filter-ir.cc"
diff --git a/be/src/exec/data-sink.cc b/be/src/exec/data-sink.cc
index 9140b3e..2ca0019 100644
--- a/be/src/exec/data-sink.cc
+++ b/be/src/exec/data-sink.cc
@@ -134,6 +134,10 @@ Status DataSink::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) {
   return Status::OK();
 }
 
+void DataSink::Codegen(LlvmCodeGen* codegen) {
+  return;
+}
+
 Status DataSink::Open(RuntimeState* state) {
   DCHECK_EQ(output_exprs_.size(), output_expr_evals_.size());
   return ScalarExprEvaluator::Open(output_expr_evals_, state);
diff --git a/be/src/exec/data-sink.h b/be/src/exec/data-sink.h
index 605b46d..d4f2040 100644
--- a/be/src/exec/data-sink.h
+++ b/be/src/exec/data-sink.h
@@ -63,6 +63,10 @@ class DataSink {
   /// initializes their evaluators. Subclasses must call DataSink::Prepare().
   virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker);
 
+  /// Codegens expressions in the sink. Overridden by sink types that support codegen.
+  /// No-op by default.
+  virtual void Codegen(LlvmCodeGen* codegen);
+
   /// Call before Send() to open the sink and initialize output expression evaluators.
   virtual Status Open(RuntimeState* state);
 
diff --git a/be/src/exec/exchange-node.cc b/be/src/exec/exchange-node.cc
index 297a805..8884f30 100644
--- a/be/src/exec/exchange-node.cc
+++ b/be/src/exec/exchange-node.cc
@@ -94,7 +94,7 @@ Status ExchangeNode::Prepare(RuntimeState* state) {
   if (is_merging_) {
     less_than_.reset(
         new TupleRowComparator(ordering_exprs_, is_asc_order_, nulls_first_));
-    AddCodegenDisabledMessage(state);
+    state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   }
   return Status::OK();
 }
diff --git a/be/src/exec/exec-node.cc b/be/src/exec/exec-node.cc
index 3a99f18..85d965b 100644
--- a/be/src/exec/exec-node.cc
+++ b/be/src/exec/exec-node.cc
@@ -501,14 +501,6 @@ Status ExecNode::QueryMaintenance(RuntimeState* state) {
   return state->CheckQueryState();
 }
 
-void ExecNode::AddCodegenDisabledMessage(RuntimeState* state) {
-  if (state->CodegenDisabledByQueryOption()) {
-    runtime_profile()->AddCodegenMsg(false, "disabled by query option DISABLE_CODEGEN");
-  } else if (state->CodegenDisabledByHint()) {
-    runtime_profile()->AddCodegenMsg(false, "disabled due to optimization hints");
-  }
-}
-
 bool ExecNode::IsNodeCodegenDisabled() const {
   return disable_codegen_;
 }
diff --git a/be/src/exec/exec-node.h b/be/src/exec/exec-node.h
index 80241b1..fac8312 100644
--- a/be/src/exec/exec-node.h
+++ b/be/src/exec/exec-node.h
@@ -216,9 +216,6 @@ class ExecNode {
   /// check to see if codegen was enabled for the enclosing fragment.
   bool IsNodeCodegenDisabled() const;
 
-  /// Add codegen disabled message if codegen is disabled for this ExecNode.
-  void AddCodegenDisabledMessage(RuntimeState* state);
-
   /// Extract node id from p->name().
   static int GetNodeIdFromProfile(RuntimeProfile* p);
 
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index a163e63..67f07cd 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -264,7 +264,7 @@ Status HdfsScanNodeBase::Prepare(RuntimeState* state) {
   UpdateHdfsSplitStats(*scan_range_params_, &per_volume_stats);
   PrintHdfsSplitStats(per_volume_stats, &str);
   runtime_profile()->AddInfoString(HDFS_SPLIT_STATS_DESC, str.str());
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   return Status::OK();
 }
 
diff --git a/be/src/exec/partial-sort-node.cc b/be/src/exec/partial-sort-node.cc
index 911f014..9fd653c 100644
--- a/be/src/exec/partial-sort-node.cc
+++ b/be/src/exec/partial-sort-node.cc
@@ -62,7 +62,7 @@ Status PartialSortNode::Prepare(RuntimeState* state) {
       resource_profile_.spillable_buffer_size, runtime_profile(), state, id(), false));
   RETURN_IF_ERROR(sorter_->Prepare(pool_));
   DCHECK_GE(resource_profile_.min_reservation, sorter_->ComputeMinReservation());
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   input_batch_.reset(
       new RowBatch(child(0)->row_desc(), state->batch_size(), mem_tracker()));
   return Status::OK();
diff --git a/be/src/exec/partitioned-aggregation-node.cc b/be/src/exec/partitioned-aggregation-node.cc
index d7b8c0a..3a90f14 100644
--- a/be/src/exec/partitioned-aggregation-node.cc
+++ b/be/src/exec/partitioned-aggregation-node.cc
@@ -225,7 +225,7 @@ Status PartitionedAggregationNode::Prepare(RuntimeState* state) {
         state->fragment_hash_seed(), MAX_PARTITION_DEPTH, 1, expr_perm_pool(),
         expr_results_pool(), expr_results_pool(), &ht_ctx_));
   }
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   return Status::OK();
 }
 
diff --git a/be/src/exec/partitioned-hash-join-builder.cc b/be/src/exec/partitioned-hash-join-builder.cc
index 194bb92..c627b98 100644
--- a/be/src/exec/partitioned-hash-join-builder.cc
+++ b/be/src/exec/partitioned-hash-join-builder.cc
@@ -137,11 +137,7 @@ Status PhjBuilder::Prepare(RuntimeState* state, MemTracker* parent_mem_tracker)
   partition_build_rows_timer_ = ADD_TIMER(profile(), "BuildRowsPartitionTime");
   build_hash_table_timer_ = ADD_TIMER(profile(), "HashTablesBuildTime");
   repartition_timer_ = ADD_TIMER(profile(), "RepartitionTime");
-  if (state->CodegenDisabledByQueryOption()) {
-    profile()->AddCodegenMsg(false, "disabled by query option DISABLE_CODEGEN");
-  } else if (state->CodegenDisabledByHint()) {
-    profile()->AddCodegenMsg(false, "disabled due to optimization hints");
-  }
+  state->CheckAndAddCodegenDisabledMessage(profile());
   return Status::OK();
 }
 
diff --git a/be/src/exec/partitioned-hash-join-builder.h b/be/src/exec/partitioned-hash-join-builder.h
index e075beb..f6105fc 100644
--- a/be/src/exec/partitioned-hash-join-builder.h
+++ b/be/src/exec/partitioned-hash-join-builder.h
@@ -91,7 +91,7 @@ class PhjBuilder : public DataSink {
   /// Does all codegen for the builder (if codegen is enabled).
   /// Updates the builder's runtime profile with info about whether any errors
   /// occurred during codegen.
-  void Codegen(LlvmCodeGen* codegen);
+  virtual void Codegen(LlvmCodeGen* codegen) override;
 
   /////////////////////////////////////////
   // The following functions are used only by PartitionedHashJoinNode.
diff --git a/be/src/exec/partitioned-hash-join-node.cc b/be/src/exec/partitioned-hash-join-node.cc
index 99e07eb..909a427 100644
--- a/be/src/exec/partitioned-hash-join-node.cc
+++ b/be/src/exec/partitioned-hash-join-node.cc
@@ -136,7 +136,7 @@ Status PartitionedHashJoinNode::Prepare(RuntimeState* state) {
       ADD_COUNTER(runtime_profile(), "ProbeRowsPartitioned", TUnit::UNIT);
   num_hash_table_builds_skipped_ =
       ADD_COUNTER(runtime_profile(), "NumHashTableBuildsSkipped", TUnit::UNIT);
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   return Status::OK();
 }
 
diff --git a/be/src/exec/sort-node.cc b/be/src/exec/sort-node.cc
index 9ab1435..546f3ba 100644
--- a/be/src/exec/sort-node.cc
+++ b/be/src/exec/sort-node.cc
@@ -58,7 +58,7 @@ Status SortNode::Prepare(RuntimeState* state) {
           resource_profile_.spillable_buffer_size, runtime_profile(), state, id(), true));
   RETURN_IF_ERROR(sorter_->Prepare(pool_));
   DCHECK_GE(resource_profile_.min_reservation, sorter_->ComputeMinReservation());
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   return Status::OK();
 }
 
diff --git a/be/src/exec/topn-node.cc b/be/src/exec/topn-node.cc
index 31bf58b..97c05f9 100644
--- a/be/src/exec/topn-node.cc
+++ b/be/src/exec/topn-node.cc
@@ -79,7 +79,7 @@ Status TopNNode::Prepare(RuntimeState* state) {
       new TupleRowComparator(ordering_exprs_, is_asc_order_, nulls_first_));
   output_tuple_desc_ = row_descriptor_.tuple_descriptors()[0];
   insert_batch_timer_ = ADD_TIMER(runtime_profile(), "InsertBatchTime");
-  AddCodegenDisabledMessage(state);
+  state->CheckAndAddCodegenDisabledMessage(runtime_profile());
   tuple_pool_reclaim_counter_ = ADD_COUNTER(runtime_profile(), "TuplePoolReclamations",
       TUnit::UNIT);
   return Status::OK();
diff --git a/be/src/runtime/CMakeLists.txt b/be/src/runtime/CMakeLists.txt
index 89cbbb9..7dcf45c 100644
--- a/be/src/runtime/CMakeLists.txt
+++ b/be/src/runtime/CMakeLists.txt
@@ -47,6 +47,7 @@ add_library(Runtime
   krpc-data-stream-mgr.cc
   krpc-data-stream-recvr.cc
   krpc-data-stream-sender.cc
+  krpc-data-stream-sender-ir.cc
   lib-cache.cc
   mem-tracker.cc
   mem-pool.cc
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index 629fb96..e6fa489 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -247,6 +247,8 @@ Status FragmentInstanceState::Open() {
       SCOPED_THREAD_COUNTER_MEASUREMENT(
           runtime_state_->codegen()->llvm_thread_counters());
       exec_tree_->Codegen(runtime_state_);
+      sink_->Codegen(runtime_state_->codegen());
+
       // It shouldn't be fatal to fail codegen. However, until IMPALA-4233 is fixed,
       // ScalarFnCall has no fallback to interpretation when codegen fails, so it propagates
       // the error status for now.
diff --git a/be/src/runtime/krpc-data-stream-sender-ir.cc b/be/src/runtime/krpc-data-stream-sender-ir.cc
new file mode 100644
index 0000000..335e16f
--- /dev/null
+++ b/be/src/runtime/krpc-data-stream-sender-ir.cc
@@ -0,0 +1,49 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include "exprs/scalar-expr-evaluator.h"
+#include "runtime/krpc-data-stream-sender.h"
+#include "runtime/raw-value.h"
+#include "runtime/row-batch.h"
+
+namespace impala {
+
+ScalarExprEvaluator* KrpcDataStreamSender::GetPartitionExprEvaluator(int i) {
+  return partition_expr_evals_[i];
+}
+
+Status KrpcDataStreamSender::HashAndAddRows(RowBatch* batch) {
+  const int num_rows = batch->num_rows();
+  const int num_channels = GetNumChannels();
+  int channel_ids[RowBatch::HASH_BATCH_SIZE];
+  int row_idx = 0;
+  while (row_idx < num_rows) {
+    int row_count = 0;
+    FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, row_batch_iter) {
+      TupleRow* row = row_batch_iter.Get();
+      channel_ids[row_count++] = HashRow(row) % num_channels;
+    }
+    row_count = 0;
+    FOREACH_ROW_LIMIT(batch, row_idx, RowBatch::HASH_BATCH_SIZE, row_batch_iter) {
+      RETURN_IF_ERROR(AddRowToChannel(channel_ids[row_count++], row_batch_iter.Get()));
+    }
+    row_idx += row_count;
+  }
+  return Status::OK();
+}
+
+}
diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc
index cd30f06..74eb129 100644
--- a/be/src/runtime/krpc-data-stream-sender.cc
+++ b/be/src/runtime/krpc-data-stream-sender.cc
@@ -25,6 +25,8 @@
 #include <thrift/protocol/TDebugProtocol.h>
 
 #include "common/logging.h"
+#include "codegen/codegen-anyval.h"
+#include "codegen/llvm-codegen.h"
 #include "exec/kudu-util.h"
 #include "exprs/scalar-expr.h"
 #include "exprs/scalar-expr-evaluator.h"
@@ -61,6 +63,10 @@ DECLARE_int32(rpc_retry_interval_ms);
 
 namespace impala {
 
+const char* KrpcDataStreamSender::HASH_ROW_SYMBOL =
+    "KrpcDataStreamSender7HashRowEPNS_8TupleRowE";
+const char* KrpcDataStreamSender::LLVM_CLASS_NAME = "class.impala::KrpcDataStreamSender";
+
 // A datastream sender may send row batches to multiple destinations. There is one
 // channel for each destination.
 //
@@ -645,6 +651,7 @@ Status KrpcDataStreamSender::Prepare(
   for (int i = 0; i < channels_.size(); ++i) {
     RETURN_IF_ERROR(channels_[i]->Init(state));
   }
+  state->CheckAndAddCodegenDisabledMessage(profile());
   return Status::OK();
 }
 
@@ -653,6 +660,198 @@ Status KrpcDataStreamSender::Open(RuntimeState* state) {
   return ScalarExprEvaluator::Open(partition_expr_evals_, state);
 }
 
+//
+// An example of the generated code for an int-typed partition expression.
+//
+// define i64 @KrpcDataStreamSenderHashRow(%"class.impala::KrpcDataStreamSender"* %this,
+//                                         %"class.impala::TupleRow"* %row) #46 {
+// entry:
+//   %0 = alloca i32
+//   %1 = call %"class.impala::ScalarExprEvaluator"*
+//       @_ZN6impala20KrpcDataStreamSender25GetPartitionExprEvaluatorEi(
+//           %"class.impala::KrpcDataStreamSender"* %this, i32 0)
+//   %partition_val = call i64 @GetSlotRef(
+//       %"class.impala::ScalarExprEvaluator"* %1, %"class.impala::TupleRow"* %row)
+//   %is_null = trunc i64 %partition_val to i1
+//   br i1 %is_null, label %is_null_block, label %not_null_block
+//
+// is_null_block:                                ; preds = %entry
+//   br label %hash_val_block
+//
+// not_null_block:                               ; preds = %entry
+//   %2 = ashr i64 %partition_val, 32
+//   %3 = trunc i64 %2 to i32
+//   store i32 %3, i32* %0
+//   %native_ptr = bitcast i32* %0 to i8*
+//   br label %hash_val_block
+//
+// hash_val_block:                               ; preds = %not_null_block, %is_null_block
+//   %val_ptr_phi = phi i8* [ %native_ptr, %not_null_block ], [ null, %is_null_block ]
+//   %hash_val = call i64
+//       @_ZN6impala8RawValue20GetHashValueFastHashEPKvRKNS_10ColumnTypeEm(
+//           i8* %val_ptr_phi, %"struct.impala::ColumnType"* @expr_type_arg,
+//               i64 7403188670037225271)
+//   ret i64 %hash_val
+// }
+Status KrpcDataStreamSender::CodegenHashRow(LlvmCodeGen* codegen, llvm::Function** fn) {
+  llvm::LLVMContext& context = codegen->context();
+  LlvmBuilder builder(context);
+
+  LlvmCodeGen::FnPrototype prototype(
+      codegen, "KrpcDataStreamSenderHashRow", codegen->i64_type());
+  prototype.AddArgument(
+      LlvmCodeGen::NamedVariable("this", codegen->GetNamedPtrType(LLVM_CLASS_NAME)));
+  prototype.AddArgument(
+      LlvmCodeGen::NamedVariable("row", codegen->GetStructPtrType<TupleRow>()));
+
+  llvm::Value* args[2];
+  llvm::Function* hash_row_fn = prototype.GeneratePrototype(&builder, args);
+  llvm::Value* this_arg = args[0];
+  llvm::Value* row_arg = args[1];
+
+  // Store the initial seed to hash_val
+  llvm::Value* hash_val = codegen->GetI64Constant(EXCHANGE_HASH_SEED);
+
+  // Unroll the loop and codegen each of the partition expressions
+  for (int i = 0; i < partition_exprs_.size(); ++i) {
+    llvm::Function* compute_fn;
+    RETURN_IF_ERROR(partition_exprs_[i]->GetCodegendComputeFn(codegen, &compute_fn));
+
+    // Load the expression evaluator for the i-th partition expression
+    llvm::Function* get_expr_eval_fn =
+        codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false);
+    DCHECK(get_expr_eval_fn != nullptr);
+    llvm::Value* expr_eval_arg =
+        builder.CreateCall(get_expr_eval_fn, {this_arg, codegen->GetI32Constant(i)});
+
+    // Compute the value against the i-th partition expression
+    llvm::Value* compute_fn_args[] = {expr_eval_arg, row_arg};
+    CodegenAnyVal partition_val = CodegenAnyVal::CreateCallWrapped(codegen, &builder,
+        partition_exprs_[i]->type(), compute_fn, compute_fn_args, "partition_val");
+
+    llvm::BasicBlock* is_null_block =
+        llvm::BasicBlock::Create(context, "is_null_block", hash_row_fn);
+    llvm::BasicBlock* not_null_block =
+        llvm::BasicBlock::Create(context, "not_null_block", hash_row_fn);
+    llvm::BasicBlock* hash_val_block =
+        llvm::BasicBlock::Create(context, "hash_val_block", hash_row_fn);
+
+    // Check if 'partition_val' is NULL
+    llvm::Value* val_is_null = partition_val.GetIsNull();
+    builder.CreateCondBr(val_is_null, is_null_block, not_null_block);
+
+    // Set the pointer to NULL in case 'partition_val' evaluates to NULL
+    builder.SetInsertPoint(is_null_block);
+    llvm::Value* null_ptr = codegen->null_ptr_value();
+    builder.CreateBr(hash_val_block);
+
+    // Saves 'partition_val' on the stack and passes a pointer to it to the hash function
+    builder.SetInsertPoint(not_null_block);
+    llvm::Value* native_ptr = partition_val.ToNativePtr();
+    native_ptr = builder.CreatePointerCast(native_ptr, codegen->ptr_type(), "native_ptr");
+    builder.CreateBr(hash_val_block);
+
+    // Picks the input value to hash function
+    builder.SetInsertPoint(hash_val_block);
+    llvm::PHINode* val_ptr_phi = builder.CreatePHI(codegen->ptr_type(), 2, "val_ptr_phi");
+    val_ptr_phi->addIncoming(native_ptr, not_null_block);
+    val_ptr_phi->addIncoming(null_ptr, is_null_block);
+
+    // Creates a global constant of the partition expression's ColumnType. It has to be a
+    // constant for constant propagation and dead code elimination in 'get_hash_value_fn'
+    llvm::Type* col_type = codegen->GetStructType<ColumnType>();
+    llvm::Constant* expr_type_arg = codegen->ConstantToGVPtr(
+        col_type, partition_exprs_[i]->type().ToIR(codegen), "expr_type_arg");
+
+    // Update 'hash_val' with the new 'partition-val'
+    llvm::Value* get_hash_value_args[] = {val_ptr_phi, expr_type_arg, hash_val};
+    llvm::Function* get_hash_value_fn =
+        codegen->GetFunction(IRFunction::RAW_VALUE_GET_HASH_VALUE_FAST_HASH, false);
+    DCHECK(get_hash_value_fn != nullptr);
+    hash_val = builder.CreateCall(get_hash_value_fn, get_hash_value_args, "hash_val");
+  }
+
+  builder.CreateRet(hash_val);
+  *fn = codegen->FinalizeFunction(hash_row_fn);
+  if (*fn == nullptr) {
+    return Status("Codegen'd KrpcDataStreamSenderHashRow() fails verification. See log");
+  }
+
+  return Status::OK();
+}
+
+string KrpcDataStreamSender::PartitionTypeName() const {
+  switch (partition_type_) {
+  case TPartitionType::UNPARTITIONED:
+    return "Unpartitioned";
+  case TPartitionType::HASH_PARTITIONED:
+    return "Hash Partitioned";
+  case TPartitionType::RANDOM:
+    return "Random Partitioned";
+  case TPartitionType::KUDU:
+    return "Kudu Partitioned";
+  default:
+    DCHECK(false) << partition_type_;
+    return "";
+  }
+}
+
+void KrpcDataStreamSender::Codegen(LlvmCodeGen* codegen) {
+  const string sender_name = PartitionTypeName() + " Sender";
+  if (partition_type_ != TPartitionType::HASH_PARTITIONED) {
+    const string& msg = Substitute("not $0",
+        partition_type_ == TPartitionType::KUDU ? "supported" : "needed");
+    profile()->AddCodegenMsg(false, msg, sender_name);
+    return;
+  }
+
+  llvm::Function* hash_row_fn;
+  Status codegen_status = CodegenHashRow(codegen, &hash_row_fn);
+  if (codegen_status.ok()) {
+    llvm::Function* hash_and_add_rows_fn =
+        codegen->GetFunction(IRFunction::KRPC_DSS_HASH_AND_ADD_ROWS, true);
+    DCHECK(hash_and_add_rows_fn != nullptr);
+
+    int num_replaced;
+    // Replace GetNumChannels() with a constant.
+    num_replaced = codegen->ReplaceCallSitesWithValue(hash_and_add_rows_fn,
+        codegen->GetI32Constant(GetNumChannels()), "GetNumChannels");
+    DCHECK_EQ(num_replaced, 1);
+
+    // Replace HashRow() with the handcrafted IR function.
+    num_replaced = codegen->ReplaceCallSites(hash_and_add_rows_fn,
+        hash_row_fn, HASH_ROW_SYMBOL);
+    DCHECK_EQ(num_replaced, 1);
+
+    hash_and_add_rows_fn = codegen->FinalizeFunction(hash_and_add_rows_fn);
+    if (hash_and_add_rows_fn == nullptr) {
+      codegen_status =
+          Status("Codegen'd HashAndAddRows() failed verification. See log");
+    } else {
+      codegen->AddFunctionToJit(hash_and_add_rows_fn,
+          reinterpret_cast<void**>(&hash_and_add_rows_fn_));
+    }
+  }
+  profile()->AddCodegenMsg(codegen_status.ok(), codegen_status, sender_name);
+}
+
+Status KrpcDataStreamSender::AddRowToChannel(const int channel_id, TupleRow* row) {
+  return channels_[channel_id]->AddRow(row);
+}
+
+uint64_t KrpcDataStreamSender::HashRow(TupleRow* row) {
+  uint64_t hash_val = EXCHANGE_HASH_SEED;
+  for (ScalarExprEvaluator* eval : partition_expr_evals_) {
+    void* partition_val = eval->GetValue(row);
+    // We can't use the crc hash function here because it does not result in
+    // uncorrelated hashes with different seeds. Instead we use FastHash.
+    // TODO: fix crc hash/GetHashValue()
+    hash_val = RawValue::GetHashValueFastHash(
+        partition_val, eval->root().type(), hash_val);
+  }
+  return hash_val;
+}
+
 Status KrpcDataStreamSender::Send(RuntimeState* state, RowBatch* batch) {
   SCOPED_TIMER(profile()->total_time_counter());
   DCHECK(!closed_);
@@ -703,40 +902,10 @@ Status KrpcDataStreamSender::Send(RuntimeState* state, RowBatch* batch) {
     }
   } else {
     DCHECK_EQ(partition_type_, TPartitionType::HASH_PARTITIONED);
-    // hash-partition batch's rows across channels
-    // TODO: encapsulate this in an Expr as we've done for Kudu above and remove this case
-    // once we have codegen here.
-    int num_channels = channels_.size();
-    const int num_partition_exprs = partition_exprs_.size();
-    const int num_rows = batch->num_rows();
-    const int hash_batch_size = RowBatch::HASH_BATCH_SIZE;
-    int channel_ids[hash_batch_size];
-
-    // Break the loop into two parts break the data dependency between computing
-    // the hash and calling AddRow()
-    // To keep stack allocation small a RowBatch::HASH_BATCH is used
-    for (int batch_start = 0; batch_start < num_rows; batch_start += hash_batch_size) {
-      int batch_window_size = min(num_rows - batch_start, hash_batch_size);
-      for (int i = 0; i < batch_window_size; ++i) {
-        TupleRow* row = batch->GetRow(i + batch_start);
-        uint64_t hash_val = EXCHANGE_HASH_SEED;
-        for (int j = 0; j < num_partition_exprs; ++j) {
-          ScalarExprEvaluator* eval = partition_expr_evals_[j];
-          void* partition_val = eval->GetValue(row);
-          // We can't use the crc hash function here because it does not result in
-          // uncorrelated hashes with different seeds. Instead we use FastHash.
-          // TODO: fix crc hash/GetHashValue()
-          DCHECK(&(eval->root()) == partition_exprs_[j]);
-          hash_val = RawValue::GetHashValueFastHash(
-              partition_val, partition_exprs_[j]->type(), hash_val);
-        }
-        channel_ids[i] = hash_val % num_channels;
-      }
-
-      for (int i = 0; i < batch_window_size; ++i) {
-        TupleRow* row = batch->GetRow(i + batch_start);
-        RETURN_IF_ERROR(channels_[channel_ids[i]]->AddRow(row));
-      }
+    if (hash_and_add_rows_fn_ != nullptr) {
+      RETURN_IF_ERROR(hash_and_add_rows_fn_(this, batch));
+    } else {
+      RETURN_IF_ERROR(HashAndAddRows(batch));
     }
   }
   COUNTER_ADD(total_sent_rows_counter_, batch->num_rows());
diff --git a/be/src/runtime/krpc-data-stream-sender.h b/be/src/runtime/krpc-data-stream-sender.h
index e6c6ccf..6757c2a 100644
--- a/be/src/runtime/krpc-data-stream-sender.h
+++ b/be/src/runtime/krpc-data-stream-sender.h
@@ -23,9 +23,11 @@
 #include <string>
 
 #include "exec/data-sink.h"
+#include "codegen/impala-ir.h"
 #include "common/global-types.h"
 #include "common/object-pool.h"
 #include "common/status.h"
+#include "exprs/scalar-expr.h"
 #include "runtime/row-batch.h"
 #include "util/runtime-profile.h"
 
@@ -48,7 +50,7 @@ class TPlanFragmentDestination;
 /// TODO: create a PlanNode equivalent class for DataSink.
 class KrpcDataStreamSender : public DataSink {
  public:
-  /// Construct a sender according to the output specification (tsink), sending to the
+  /// Constructs a sender according to the output specification (tsink), sending to the
   /// given destinations:
   /// 'sender_id' identifies this sender instance, and is unique within a fragment.
   /// 'row_desc' is the descriptor of the tuple row. It must out-live the sink.
@@ -65,40 +67,44 @@ class KrpcDataStreamSender : public DataSink {
 
   virtual ~KrpcDataStreamSender();
 
-  /// Initialize the sender by initializing all the channels and allocates all
+  /// Initializes the sender by initializing all the channels and allocating all
   /// the stat counters. Return error status if any channels failed to initialize.
-  virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker);
+  virtual Status Prepare(RuntimeState* state, MemTracker* parent_mem_tracker) override;
 
-  /// Initialize the evaluator of the partitioning expressions. Return error status
+  /// Codegens HashAndAddRows() if the partitioning type is HASH_PARTITIONED.
+  /// Replaces HashRow() and GetNumChannels() based on runtime information.
+  virtual void Codegen(LlvmCodeGen* codegen) override;
+
+  /// Initializes the evaluator of the partitioning expressions. Return error status
   /// if initialization failed.
-  virtual Status Open(RuntimeState* state);
+  virtual Status Open(RuntimeState* state) override;
 
-  /// Flush all buffered data and close all existing channels to destination hosts.
+  /// Flushes all buffered data and closes all existing channels to destination hosts.
   /// Further Send() calls are illegal after calling FlushFinal(). It is legal to call
   /// FlushFinal() no more than once. Return error status if Send() failed or the end
   /// of stream call failed.
-  virtual Status FlushFinal(RuntimeState* state);
+  virtual Status FlushFinal(RuntimeState* state) override;
 
-  /// Send data in 'batch' to destination nodes according to partitioning
+  /// Sends data in 'batch' to destination nodes according to partitioning
   /// specification provided in c'tor.
   /// Blocks until all rows in batch are placed in their appropriate outgoing
   /// buffers (ie, blocks if there are still in-flight rpcs from the last
   /// Send() call).
-  virtual Status Send(RuntimeState* state, RowBatch* batch);
+  virtual Status Send(RuntimeState* state, RowBatch* batch) override;
 
   /// Shutdown all existing channels to destination hosts. Further FlushFinal() calls are
   /// illegal after calling Close().
-  virtual void Close(RuntimeState* state);
+  virtual void Close(RuntimeState* state) override;
 
  protected:
   friend class DataStreamTest;
 
-  /// Initialize any partitioning expressions based on 'thrift_output_exprs' and stores
+  /// Initializes any partitioning expressions based on 'thrift_output_exprs' and stores
   /// them in 'partition_exprs_'. Returns error status if the initialization failed.
   virtual Status Init(const std::vector<TExpr>& thrift_output_exprs,
-      const TDataSink& tsink, RuntimeState* state);
+      const TDataSink& tsink, RuntimeState* state) override;
 
-  /// Return total number of bytes sent. If batches are broadcast to multiple receivers,
+  /// Returns total number of bytes sent. If batches are broadcast to multiple receivers,
   /// they are counted once per receiver.
   int64_t GetNumDataBytesSent() const;
 
@@ -111,6 +117,35 @@ class KrpcDataStreamSender : public DataSink {
   /// updating the stat counters.
   Status SerializeBatch(RowBatch* src, OutboundRowBatch* dest, int num_receivers = 1);
 
+  /// Returns 'partition_expr_evals_[i]'. Used by the codegen'd HashRow() IR function.
+  ScalarExprEvaluator* GetPartitionExprEvaluator(int i);
+
+  /// Returns the number of channels in this data stream sender. Not inlined for the
+  /// cross-compiled code as it's to be replaced with a constant during codegen.
+  int IR_NO_INLINE GetNumChannels() const { return channels_.size(); }
+
+  /// Evaluates the input row against partition expressions and hashes the expression
+  /// values. Returns the final hash value.
+  uint64_t HashRow(TupleRow* row);
+
+  /// Used when 'partition_type_' is HASH_PARTITIONED. Calls HashRow() against each row
+  /// in the input batch and adds it to the corresponding channel based on the hash value.
+  /// Cross-compiled to be patched by Codegen() at runtime. Returns error status if
+  /// insertion into the channel fails. Returns OK status otherwise.
+  Status HashAndAddRows(RowBatch* batch);
+
+  /// Adds the given row to 'channels_[channel_id]'.
+  Status AddRowToChannel(const int channel_id, TupleRow* row);
+
+  /// Codegens the HashRow() function and returns the codegen'd function in 'fn'.
+  /// This involves unrolling the loop in HashRow(), codegenning each of the partition
+  /// expressions, and replacing the column type argument to the hash function with
+  /// constants to eliminate some branches. Returns error status on failure.
+  Status CodegenHashRow(LlvmCodeGen* codegen, llvm::Function** fn);
+
+  /// Returns the name of the partitioning type of this data stream sender.
+  string PartitionTypeName() const;
+
   /// Sender instance id, unique within a fragment.
   const int sender_id_;
 
@@ -187,8 +222,18 @@ class KrpcDataStreamSender : public DataSink {
   /// or when errors are encountered.
   int next_unknown_partition_;
 
+  /// Type and pointer for the codegen'd HashAndAddRows() function.
+  /// NULL if codegen is disabled or failed.
+  typedef Status (*HashAndAddRowsFn)(KrpcDataStreamSender*, RowBatch* row);
+  HashAndAddRowsFn hash_and_add_rows_fn_ = nullptr;
+
+  /// KrpcDataStreamSender::HashRow() symbol. Used for call-site replacement.
+  static const char* HASH_ROW_SYMBOL;
+
   /// An arbitrary hash seed used for exchanges.
   static constexpr uint64_t EXCHANGE_HASH_SEED = 0x66bd68df22c3ef37;
+
+  static const char* LLVM_CLASS_NAME;
 };
 
 } // namespace impala
diff --git a/be/src/runtime/raw-value-ir.cc b/be/src/runtime/raw-value-ir.cc
index 93b77f2..2e44d6a 100644
--- a/be/src/runtime/raw-value-ir.cc
+++ b/be/src/runtime/raw-value-ir.cc
@@ -23,6 +23,7 @@
 #include "runtime/raw-value.inline.h"
 #include "runtime/string-value.inline.h"
 #include "runtime/timestamp-value.h"
+#include "util/hash-util.h"
 
 using namespace impala;
 
@@ -162,3 +163,30 @@ uint32_t IR_ALWAYS_INLINE RawValue::GetHashValue(
       return 0;
   }
 }
+
+uint64_t IR_ALWAYS_INLINE RawValue::GetHashValueFastHash(const void* v,
+    const ColumnType& type, uint64_t seed) {
+  // Hash with an arbitrary constant to ensure we don't return seed.
+  if (v == nullptr) {
+    return HashUtil::FastHash64(&HASH_VAL_NULL, sizeof(HASH_VAL_NULL), seed);
+  }
+  switch (type.type) {
+    case TYPE_STRING:
+    case TYPE_VARCHAR: {
+      const StringValue* string_value = reinterpret_cast<const StringValue*>(v);
+      return HashUtil::FastHash64(string_value->ptr,
+          static_cast<size_t>(string_value->len), seed);
+    }
+    case TYPE_BOOLEAN: return HashUtil::FastHash64(v, 1, seed);
+    case TYPE_TINYINT: return HashUtil::FastHash64(v, 1, seed);
+    case TYPE_SMALLINT: return HashUtil::FastHash64(v, 2, seed);
+    case TYPE_INT: return HashUtil::FastHash64(v, 4, seed);
+    case TYPE_BIGINT: return HashUtil::FastHash64(v, 8, seed);
+    case TYPE_FLOAT: return HashUtil::FastHash64(v, 4, seed);
+    case TYPE_DOUBLE: return HashUtil::FastHash64(v, 8, seed);
+    case TYPE_TIMESTAMP: return HashUtil::FastHash64(v, 12, seed);
+    case TYPE_CHAR: return HashUtil::FastHash64(v, type.len, seed);
+    case TYPE_DECIMAL: return HashUtil::FastHash64(v, type.GetByteSize(), seed);
+    default: DCHECK(false); return 0;
+  }
+}
diff --git a/be/src/runtime/raw-value.cc b/be/src/runtime/raw-value.cc
index 1db9741..d030460 100644
--- a/be/src/runtime/raw-value.cc
+++ b/be/src/runtime/raw-value.cc
@@ -193,33 +193,6 @@ void RawValue::Write(const void* value, Tuple* tuple, const SlotDescriptor* slot
   }
 }
 
-uint64_t RawValue::GetHashValueFastHash(const void* v, const ColumnType& type,
-    uint64_t seed) {
-  // Hash with an arbitrary constant to ensure we don't return seed.
-  if (v == nullptr) {
-    return HashUtil::FastHash64(&HASH_VAL_NULL, sizeof(HASH_VAL_NULL), seed);
-  }
-  switch (type.type) {
-    case TYPE_STRING:
-    case TYPE_VARCHAR: {
-      const StringValue* string_value = reinterpret_cast<const StringValue*>(v);
-      return HashUtil::FastHash64(string_value->ptr,
-          static_cast<size_t>(string_value->len), seed);
-    }
-    case TYPE_BOOLEAN: return HashUtil::FastHash64(v, 1, seed);
-    case TYPE_TINYINT: return HashUtil::FastHash64(v, 1, seed);
-    case TYPE_SMALLINT: return HashUtil::FastHash64(v, 2, seed);
-    case TYPE_INT: return HashUtil::FastHash64(v, 4, seed);
-    case TYPE_BIGINT: return HashUtil::FastHash64(v, 8, seed);
-    case TYPE_FLOAT: return HashUtil::FastHash64(v, 4, seed);
-    case TYPE_DOUBLE: return HashUtil::FastHash64(v, 8, seed);
-    case TYPE_TIMESTAMP: return HashUtil::FastHash64(v, 12, seed);
-    case TYPE_CHAR: return HashUtil::FastHash64(v, type.len, seed);
-    case TYPE_DECIMAL: return HashUtil::FastHash64(v, type.GetByteSize(), seed);
-    default: DCHECK(false); return 0;
-  }
-}
-
 void RawValue::PrintValue(
     const void* value, const ColumnType& type, int scale, std::stringstream* stream) {
   if (value == NULL) {
diff --git a/be/src/runtime/runtime-state.h b/be/src/runtime/runtime-state.h
index 47eaf7a..8901d7e 100644
--- a/be/src/runtime/runtime-state.h
+++ b/be/src/runtime/runtime-state.h
@@ -152,6 +152,15 @@ class RuntimeState {
   /// expressions' Prepare() are invoked.
   bool ScalarFnNeedsCodegen() const { return !scalar_fns_to_codegen_.empty(); }
 
+  /// Checks if codegen was disabled and, if so, adds a message to the runtime profile.
+  void CheckAndAddCodegenDisabledMessage(RuntimeProfile* profile) {
+    if (CodegenDisabledByQueryOption()) {
+      profile->AddCodegenMsg(false, "disabled by query option DISABLE_CODEGEN");
+    } else if (CodegenDisabledByHint()) {
+      profile->AddCodegenMsg(false, "disabled due to optimization hints");
+    }
+  }
+
   /// Returns true if there is a hint to disable codegen. This can be true for single node
   /// optimization or expression evaluation request from FE to BE (see fe-support.cc).
   /// Note that this internal flag is advisory and it may be ignored if the fragment has
diff --git a/be/src/util/runtime-profile.h b/be/src/util/runtime-profile.h
index d0e296f..cae2462 100644
--- a/be/src/util/runtime-profile.h
+++ b/be/src/util/runtime-profile.h
@@ -234,7 +234,8 @@ class RuntimeProfile { // NOLINT: This struct is not packed, but there are not s
   /// then no error occurred).
   void AddCodegenMsg(bool codegen_enabled, const Status& codegen_status,
       const std::string& extra_label = "") {
-    AddCodegenMsg(codegen_enabled, codegen_status.GetDetail(), extra_label);
+    const string& err_msg = codegen_status.ok() ? "" : codegen_status.msg().msg();
+    AddCodegenMsg(codegen_enabled, err_msg, extra_label);
   }
 
   /// Creates and returns a new EventSequence (owned by the runtime
diff --git a/testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test b/testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
new file mode 100644
index 0000000..ad396ad
--- /dev/null
+++ b/testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
@@ -0,0 +1,41 @@
+====
+---- QUERY
+set disable_codegen_rows_threshold=0;
+select count(*) from alltypes t1
+  join /* +SHUFFLE */ alltypes t2
+    on t1.int_col= t2.int_col and
+       t1.string_col = t2.string_col
+---- RESULTS
+5329000
+---- TYPES
+bigint
+---- RUNTIME_PROFILE
+# Verify that codegen was enabled
+row_regex: .*Hash Partitioned Sender Codegen Enabled.*
+====
+---- QUERY
+set disable_codegen_rows_threshold=0;
+select count(*) from alltypes t1
+  join /* +BROADCAST */ alltypes t2
+    on t1.int_col= t2.int_col and
+       t1.string_col = t2.string_col
+---- RESULTS
+5329000
+---- TYPES
+bigint
+---- RUNTIME_PROFILE
+# Verify that codegen was enabled
+row_regex: .*Unpartitioned Sender Codegen Disabled: not needed.*
+====
+---- QUERY
+set disable_codegen_rows_threshold=0;
+select count(*) from chars_tiny t1
+  join /* +SHUFFLE */ chars_tiny t2 on t1.cs=t2.cs;
+---- RESULTS
+10
+---- TYPES
+bigint
+---- RUNTIME_PROFILE
+# Verify that codegen was disabled
+row_regex: .*Hash Partitioned Sender Codegen Disabled: Codegen for Char not supported.*
+====
diff --git a/tests/query_test/test_codegen.py b/tests/query_test/test_codegen.py
index ccd85e1..a4c02f5 100644
--- a/tests/query_test/test_codegen.py
+++ b/tests/query_test/test_codegen.py
@@ -18,6 +18,7 @@
 # Tests end-to-end codegen behaviour.
 
 from tests.common.impala_test_suite import ImpalaTestSuite
+from tests.common.skip import SkipIf
 from tests.common.test_dimensions import create_exec_option_dimension_from_dict
 from tests.common.test_result_verifier import get_node_exec_options,\
     assert_codegen_enabled
@@ -49,4 +50,9 @@ class TestCodegen(ImpalaTestSuite):
     exec_options = get_node_exec_options(result.runtime_profile, 1)
     # Make sure test fails if there are no exec options in the profile for the node
     assert len(exec_options) > 0
-    assert_codegen_enabled(result.runtime_profile, [1])
\ No newline at end of file
+    assert_codegen_enabled(result.runtime_profile, [1])
+
+  @SkipIf.not_krpc
+  def test_datastream_sender_codegen(self, vector):
+    """Test the KrpcDataStreamSender's codegen logic"""
+    self.run_test_case('QueryTest/datastream-sender-codegen', vector)


[impala] 03/11: IMPALA-7016: Implement ALTER DATABASE SET OWNER

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 1a2ea50fe83c8fc14e87f948ecf1997123941804
Author: Fredy Wijaya <fw...@cloudera.com>
AuthorDate: Mon May 21 10:53:29 2018 -0700

    IMPALA-7016: Implement ALTER DATABASE SET OWNER
    
    Alters the database owner to either a user or a role.
    
    On database creation, the database owner is set to the current user
    and can be viewed via the DESCRIBE DATABASE db command. Having owner
    information makes it possible to implement a feature where an owner is
    automatically granted certain privileges upon database creation. See
    IMPALA-7075. ALTER DATABASE SET OWNER is a useful command for
    transferring ownership (a set of owner privileges) from the current
    owner to another owner.
    
    Syntax:
    ALTER DATABASE db SET OWNER USER user
    ALTER DATABASE db SET OWNER ROLE role
    
    Testing:
    - Added new front-end tests
    - Added new end-to-end tests
    
    Change-Id: Ie3b923021ebce5192d2d64784e7ddb952ba82bc3
    Reviewed-on: http://gerrit.cloudera.org:8080/10471
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 common/thrift/CatalogService.thrift                |  3 ++
 common/thrift/JniCatalog.thrift                    | 33 +++++++++++-
 fe/src/main/cup/sql-parser.cup                     | 34 ++++++++++++
 .../impala/analysis/AlterDbSetOwnerStmt.java       | 60 ++++++++++++++++++++++
 .../org/apache/impala/analysis/AlterDbStmt.java    | 46 +++++++++++++++++
 .../apache/impala/analysis/AnalysisContext.java    |  9 +++-
 .../java/org/apache/impala/analysis/Owner.java     | 37 +++++++++++++
 .../apache/impala/service/CatalogOpExecutor.java   | 43 ++++++++++++++++
 .../java/org/apache/impala/service/Frontend.java   | 11 ++++
 .../java/org/apache/impala/util/MetaStoreUtil.java |  4 ++
 .../org/apache/impala/analysis/AnalyzeDDLTest.java | 21 ++++++++
 .../apache/impala/analysis/AuthorizationTest.java  | 16 ++++++
 .../org/apache/impala/analysis/ParserTest.java     | 23 +++++++++
 tests/metadata/test_ddl.py                         | 13 +++++
 tests/metadata/test_ddl_base.py                    | 15 ++++--
 15 files changed, 361 insertions(+), 7 deletions(-)

diff --git a/common/thrift/CatalogService.thrift b/common/thrift/CatalogService.thrift
index 6c02722..aaabd31 100644
--- a/common/thrift/CatalogService.thrift
+++ b/common/thrift/CatalogService.thrift
@@ -132,6 +132,9 @@ struct TDdlExecRequest {
 
   // Parameters for COMMENT ON
   23: optional JniCatalog.TCommentOnParams comment_on_params
+
+  // Parameters for ALTER DATABASE
+  24: optional JniCatalog.TAlterDbParams alter_db_params
 }
 
 // Response from executing a TDdlExecRequest
diff --git a/common/thrift/JniCatalog.thrift b/common/thrift/JniCatalog.thrift
index d3bd7a0..87722a4 100644
--- a/common/thrift/JniCatalog.thrift
+++ b/common/thrift/JniCatalog.thrift
@@ -52,7 +52,38 @@ enum TDdlType {
   GRANT_PRIVILEGE,
   REVOKE_PRIVILEGE,
   TRUNCATE_TABLE,
-  COMMENT_ON
+  COMMENT_ON,
+  ALTER_DATABASE
+}
+
+enum TOwnerType {
+  USER,
+  ROLE
+}
+
+// Types of ALTER DATABASE commands supported.
+enum TAlterDbType {
+  SET_OWNER
+}
+
+// Parameters for ALTER DATABASE SET OWNER commands.
+struct TAlterDbSetOwnerParams {
+  // The owner type.
+  1: required TOwnerType owner_type
+
+  // The owner name.
+  2: required string owner_name
+}
+
+struct TAlterDbParams {
+  // The type of ALTER DATABASE command.
+  1: required TAlterDbType alter_type
+
+  // Name of the database to alter.
+  2: required string db
+
+  // Parameters for ALTER DATABASE SET OWNER commands.
+  3: optional TAlterDbSetOwnerParams set_owner_params
 }
 
 // Types of ALTER TABLE commands supported.
diff --git a/fe/src/main/cup/sql-parser.cup b/fe/src/main/cup/sql-parser.cup
index 82fdf72..352f185 100644
--- a/fe/src/main/cup/sql-parser.cup
+++ b/fe/src/main/cup/sql-parser.cup
@@ -50,6 +50,7 @@ import org.apache.impala.thrift.TCatalogObjectType;
 import org.apache.impala.thrift.TDescribeOutputStyle;
 import org.apache.impala.thrift.TFunctionCategory;
 import org.apache.impala.thrift.THdfsFileFormat;
+import org.apache.impala.thrift.TOwnerType;
 import org.apache.impala.thrift.TPrivilegeLevel;
 import org.apache.impala.thrift.TShowStatsOp;
 import org.apache.impala.thrift.TTablePropertyType;
@@ -244,6 +245,15 @@ parser code {:
 
     return result.toString();
   }
+
+  /**
+   * This method checks whether a given ident matches the given keyword.
+   */
+  public void checkIdentKeyword(String keyword, String ident) throws Exception {
+    if (!keyword.equals(ident.toUpperCase())) {
+      parseError("identifier", SqlParserSymbols.IDENT, keyword);
+    }
+  }
 :};
 
 // List of keywords. Please keep them sorted alphabetically.
@@ -407,6 +417,9 @@ nonterminal PartitionKeyValue partition_key_value;
 nonterminal PartitionKeyValue static_partition_key_value;
 nonterminal Qualifier union_op;
 
+// For ALTER DATABASE.
+nonterminal AlterDbStmt alter_db_stmt;
+
 nonterminal PartitionDef partition_def;
 nonterminal List<PartitionDef> partition_def_list;
 nonterminal CommentOnStmt comment_on_stmt;
@@ -595,6 +608,8 @@ stmt ::=
   {: RESULT = describe; :}
   | describe_table_stmt:describe
   {: RESULT = describe; :}
+  | alter_db_stmt:alter_db
+  {: RESULT = alter_db; :}
   | alter_tbl_stmt:alter_tbl
   {: RESULT = alter_tbl; :}
   | alter_view_stmt:alter_view
@@ -1017,6 +1032,25 @@ comment_on_stmt ::=
   {: RESULT = new CommentOnDbStmt(db_name, comment); :}
   ;
 
+// Introducing OWNER and USER keywords has the potential to be a breaking change,
+// such that any names that use OWNER or USER would need to be escaped. By using the
+// IDENT token we can make OWNER and USER keywords only in these statements.
+alter_db_stmt ::=
+  KW_ALTER KW_DATABASE ident_or_default:db KW_SET IDENT:owner_id IDENT:user_id
+  ident_or_default:user
+  {:
+    parser.checkIdentKeyword("OWNER", owner_id);
+    parser.checkIdentKeyword("USER", user_id);
+    RESULT = new AlterDbSetOwnerStmt(db, new Owner(TOwnerType.USER, user));
+  :}
+  | KW_ALTER KW_DATABASE ident_or_default:db KW_SET IDENT:owner_id KW_ROLE
+    ident_or_default:role
+  {:
+    parser.checkIdentKeyword("OWNER", owner_id);
+    RESULT = new AlterDbSetOwnerStmt(db, new Owner(TOwnerType.ROLE, role));
+  :}
+  ;
+
 alter_tbl_stmt ::=
   KW_ALTER KW_TABLE table_name:table replace_existing_cols_val:replace KW_COLUMNS
   LPAREN column_def_list:col_defs RPAREN
diff --git a/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java b/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java
new file mode 100644
index 0000000..379a806
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/analysis/AlterDbSetOwnerStmt.java
@@ -0,0 +1,60 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.analysis;
+
+import com.google.common.base.Preconditions;
+import org.apache.impala.common.AnalysisException;
+import org.apache.impala.thrift.TAlterDbParams;
+import org.apache.impala.thrift.TAlterDbSetOwnerParams;
+import org.apache.impala.thrift.TAlterDbType;
+import org.apache.impala.util.MetaStoreUtil;
+
+/**
+ * Represents an ALTER DATABASE db SET OWNER [USER|ROLE] owner statement.
+ */
+public class AlterDbSetOwnerStmt extends AlterDbStmt {
+  private final Owner owner_;
+
+  public AlterDbSetOwnerStmt(String dbName, Owner owner) {
+    super(dbName);
+    Preconditions.checkNotNull(owner);
+    owner_ = owner;
+  }
+
+  @Override
+  public void analyze(Analyzer analyzer) throws AnalysisException {
+    super.analyze(analyzer);
+    String ownerName = owner_.getOwnerName();
+    if (ownerName.length() > MetaStoreUtil.MAX_OWNER_LENGTH) {
+      throw new AnalysisException(String.format("Owner name exceeds maximum length of " +
+          "%d characters. The given owner name has %d characters.",
+          MetaStoreUtil.MAX_OWNER_LENGTH, ownerName.length()));
+    }
+  }
+
+  @Override
+  public TAlterDbParams toThrift() {
+    TAlterDbParams params = super.toThrift();
+    params.setAlter_type(TAlterDbType.SET_OWNER);
+    TAlterDbSetOwnerParams setOwnerParams = new TAlterDbSetOwnerParams();
+    setOwnerParams.setOwner_type(owner_.getOwnerType());
+    setOwnerParams.setOwner_name(owner_.getOwnerName());
+    params.setSet_owner_params(setOwnerParams);
+    return params;
+  }
+}
diff --git a/fe/src/main/java/org/apache/impala/analysis/AlterDbStmt.java b/fe/src/main/java/org/apache/impala/analysis/AlterDbStmt.java
new file mode 100644
index 0000000..b69cc49
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/analysis/AlterDbStmt.java
@@ -0,0 +1,46 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.analysis;
+
+import com.google.common.base.Preconditions;
+import org.apache.impala.authorization.Privilege;
+import org.apache.impala.common.AnalysisException;
+import org.apache.impala.thrift.TAlterDbParams;
+
+/**
+ * Base class for all ALTER DATABASE statements.
+ */
+public abstract class AlterDbStmt extends StatementBase {
+  protected final String dbName_;
+
+  public AlterDbStmt(String dbName) {
+    Preconditions.checkNotNull(dbName);
+    dbName_ = dbName;
+  }
+
+  public TAlterDbParams toThrift() {
+    TAlterDbParams params = new TAlterDbParams();
+    params.setDb(dbName_);
+    return params;
+  }
+
+  @Override
+  public void analyze(Analyzer analyzer) throws AnalysisException {
+    analyzer.getDb(dbName_, Privilege.ALTER);
+  }
+}
diff --git a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
index aa49c03..5ed791e 100644
--- a/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
+++ b/fe/src/main/java/org/apache/impala/analysis/AnalysisContext.java
@@ -151,6 +151,8 @@ public class AnalysisContext {
     public DeleteStmt getDeleteStmt() { return (DeleteStmt) stmt_; }
     public boolean isCommentOnStmt() { return stmt_ instanceof CommentOnStmt; }
 
+    public boolean isAlterDbStmt() { return stmt_ instanceof AlterDbStmt; }
+
     public boolean isCatalogOp() {
       return isUseStmt() || isViewMetadataStmt() || isDdlStmt();
     }
@@ -163,7 +165,7 @@ public class AnalysisContext {
           isCreateUdaStmt() || isDropFunctionStmt() || isCreateTableAsSelectStmt() ||
           isCreateDataSrcStmt() || isDropDataSrcStmt() || isDropStatsStmt() ||
           isCreateDropRoleStmt() || isGrantRevokeStmt() || isTruncateStmt() ||
-          isCommentOnStmt();
+          isCommentOnStmt() || isAlterDbStmt();
     }
 
     private boolean isViewMetadataStmt() {
@@ -356,6 +358,11 @@ public class AnalysisContext {
       return (CommentOnStmt) stmt_;
     }
 
+    public AlterDbStmt getAlterDbStmt() {
+      Preconditions.checkState(isAlterDbStmt());
+      return (AlterDbStmt) stmt_;
+    }
+
     public StatementBase getStmt() { return stmt_; }
     public Analyzer getAnalyzer() { return analyzer_; }
     public Set<TAccessEvent> getAccessEvents() { return analyzer_.getAccessEvents(); }
diff --git a/fe/src/main/java/org/apache/impala/analysis/Owner.java b/fe/src/main/java/org/apache/impala/analysis/Owner.java
new file mode 100644
index 0000000..4b39caf
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/analysis/Owner.java
@@ -0,0 +1,37 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.analysis;
+
+import com.google.common.base.Preconditions;
+import org.apache.impala.thrift.TOwnerType;
+
+public class Owner {
+  private final TOwnerType ownerType_;
+  private final String ownerName_;
+
+  public Owner(TOwnerType ownerType, String ownerName) {
+    Preconditions.checkNotNull(ownerType);
+    Preconditions.checkNotNull(ownerName);
+    ownerType_ = ownerType;
+    ownerName_ = ownerName;
+  }
+
+  public TOwnerType getOwnerType() { return ownerType_; }
+
+  public String getOwnerName() { return ownerName_; }
+}
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index e062eeb..80ebd19 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -92,6 +92,8 @@ import org.apache.impala.common.Reference;
 import org.apache.impala.compat.MetastoreShim;
 import org.apache.impala.thrift.ImpalaInternalServiceConstants;
 import org.apache.impala.thrift.JniCatalogConstants;
+import org.apache.impala.thrift.TAlterDbParams;
+import org.apache.impala.thrift.TAlterDbSetOwnerParams;
 import org.apache.impala.thrift.TAlterTableAddDropRangePartitionParams;
 import org.apache.impala.thrift.TAlterTableAddPartitionParams;
 import org.apache.impala.thrift.TAlterTableAddReplaceColsParams;
@@ -333,6 +335,9 @@ public class CatalogOpExecutor {
       case COMMENT_ON:
         alterCommentOn(ddlRequest.getComment_on_params(), response);
         break;
+      case ALTER_DATABASE:
+        alterDatabase(ddlRequest.getAlter_db_params(), response);
+        break;
       default: throw new IllegalStateException("Unexpected DDL exec request type: " +
           ddlRequest.ddl_type);
     }
@@ -3520,6 +3525,44 @@ public class CatalogOpExecutor {
     addSummary(response, "Updated database.");
   }
 
+  private void alterDatabase(TAlterDbParams params, TDdlExecResponse response)
+      throws CatalogException, ImpalaRuntimeException {
+    switch (params.getAlter_type()) {
+      case SET_OWNER:
+        alterDatabaseSetOwner(params.getDb(), params.getSet_owner_params(), response);
+        break;
+      default:
+        throw new UnsupportedOperationException(
+            "Unknown ALTER DATABASE operation type: " + params.getAlter_type());
+    }
+  }
+
+  private void alterDatabaseSetOwner(String dbName, TAlterDbSetOwnerParams params,
+      TDdlExecResponse response) throws CatalogException, ImpalaRuntimeException {
+    Db db = catalog_.getDb(dbName);
+    if (db == null) {
+      throw new CatalogException("Database: " + dbName + " does not exist.");
+    }
+    Preconditions.checkNotNull(params.owner_name);
+    Preconditions.checkNotNull(params.owner_type);
+    synchronized (metastoreDdlLock_) {
+      Database msDb = db.getMetaStoreDb();
+      String originalOwnerName = msDb.getOwnerName();
+      PrincipalType originalOwnerType = msDb.getOwnerType();
+      msDb.setOwnerName(params.owner_name);
+      msDb.setOwnerType(PrincipalType.valueOf(params.owner_type.name()));
+      try {
+        applyAlterDatabase(db);
+      } catch (ImpalaRuntimeException e) {
+        msDb.setOwnerName(originalOwnerName);
+        msDb.setOwnerType(originalOwnerType);
+        throw e;
+      }
+    }
+    addDbToCatalogUpdate(db, response.result);
+    addSummary(response, "Updated database");
+  }
+
   private void addDbToCatalogUpdate(Db db, TCatalogUpdateResult result) {
     Preconditions.checkNotNull(db);
     // Updating the new catalog version and setting it to the DB catalog version while
diff --git a/fe/src/main/java/org/apache/impala/service/Frontend.java b/fe/src/main/java/org/apache/impala/service/Frontend.java
index beab12e..54a9473 100644
--- a/fe/src/main/java/org/apache/impala/service/Frontend.java
+++ b/fe/src/main/java/org/apache/impala/service/Frontend.java
@@ -35,6 +35,7 @@ import java.util.concurrent.atomic.AtomicReference;
 
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
+import org.apache.impala.analysis.AlterDbStmt;
 import org.apache.impala.analysis.AnalysisContext;
 import org.apache.impala.analysis.AnalysisContext.AnalysisResult;
 import org.apache.impala.analysis.CommentOnStmt;
@@ -97,6 +98,7 @@ import org.apache.impala.planner.HdfsScanNode;
 import org.apache.impala.planner.PlanFragment;
 import org.apache.impala.planner.Planner;
 import org.apache.impala.planner.ScanNode;
+import org.apache.impala.thrift.TAlterDbParams;
 import org.apache.impala.thrift.TCatalogOpRequest;
 import org.apache.impala.thrift.TCatalogOpType;
 import org.apache.impala.thrift.TCatalogServiceRequestHeader;
@@ -105,6 +107,7 @@ import org.apache.impala.thrift.TColumnValue;
 import org.apache.impala.thrift.TCommentOnParams;
 import org.apache.impala.thrift.TCreateDropRoleParams;
 import org.apache.impala.thrift.TDdlExecRequest;
+import org.apache.impala.thrift.TDdlExecResponse;
 import org.apache.impala.thrift.TDdlType;
 import org.apache.impala.thrift.TDescribeOutputStyle;
 import org.apache.impala.thrift.TDescribeResult;
@@ -510,6 +513,14 @@ public class Frontend {
       req.setComment_on_params(params);
       ddl.op_type = TCatalogOpType.DDL;
       ddl.setDdl_params(req);
+    } else if (analysis.isAlterDbStmt()) {
+      AlterDbStmt alterDbStmt = analysis.getAlterDbStmt();
+      TAlterDbParams params = alterDbStmt.toThrift();
+      TDdlExecRequest req = new TDdlExecRequest();
+      req.setDdl_type(TDdlType.ALTER_DATABASE);
+      req.setAlter_db_params(params);
+      ddl.op_type = TCatalogOpType.DDL;
+      ddl.setDdl_params(req);
     } else {
       throw new IllegalStateException("Unexpected CatalogOp statement type.");
     }
diff --git a/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java b/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
index 4cc9057..34082f4 100644
--- a/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
+++ b/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
@@ -58,6 +58,10 @@ public class MetaStoreUtil {
   // The longest strings Hive accepts for [serde] property values.
   public static final int MAX_PROPERTY_VALUE_LENGTH = 4000;
 
+  // Maximum owner length. The owner can be a user or a role.
+  // https://github.com/apache/hive/blob/13fbae57321f3525cabb326df702430d61c242f9/standalone-metastore/src/main/resources/package.jdo#L63
+  public static final int MAX_OWNER_LENGTH = 128;
+
   // The default maximum number of partitions to fetch from the Hive metastore in one
   // RPC.
   private static final short DEFAULT_MAX_PARTITIONS_PER_RPC = 1000;
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
index c8fdbee..5ab533c 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
@@ -3919,4 +3919,25 @@ public class AnalyzeDDLTest extends FrontendTestBase {
     AnalysisError("comment on database doesntexist is 'comment'",
         "Database does not exist: doesntexist");
   }
+
+  @Test
+  public void TestAlterDatabaseSetOwner() {
+    String[] ownerTypes = new String[]{"user", "role"};
+    for (String ownerType : ownerTypes) {
+      AnalyzesOk(String.format("alter database functional set owner %s foo", ownerType));
+      AnalysisError(String.format("alter database doesntexist set owner %s foo",
+          ownerType), "Database does not exist: doesntexist");
+      AnalysisError(String.format("alter database functional set owner %s %s",
+          ownerType, buildLongOwnerName()), "Owner name exceeds maximum length of 128 " +
+          "characters. The given owner name has 133 characters.");
+    }
+  }
+
+  private static String buildLongOwnerName() {
+    StringBuilder ownerName = new StringBuilder();
+    for (int i = 0; i < MetaStoreUtil.MAX_OWNER_LENGTH + 5; i++) {
+      ownerName.append("a");
+    }
+    return ownerName.toString();
+  }
 }
diff --git a/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java b/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
index 58f73fa..ed6f0e4 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
@@ -3017,6 +3017,22 @@ public class AuthorizationTest extends FrontendTestBase {
         "User '%s' does not have privileges to execute 'ALTER' on: doesntexist");
   }
 
+  @Test
+  public void TestAlterDatabaseSetOwner() throws ImpalaException {
+    // User has ALTER privilege on functional_text_lzo database.
+    AuthzOk("alter database functional_text_lzo set owner user foo_user");
+    AuthzOk("alter database functional_text_lzo set owner role foo_role");
+    // User does not have ALTER privilege on functional database.
+    AuthzError("alter database functional set owner user foo_user",
+        "User '%s' does not have privileges to execute 'ALTER' on: functional");
+    AuthzError("alter database functional set owner role foo_role",
+        "User '%s' does not have privileges to execute 'ALTER' on: functional");
+    AuthzError("alter database doesntexist set owner user foo_user",
+        "User '%s' does not have privileges to execute 'ALTER' on: doesntexist");
+    AuthzError("alter database doesntexist set owner role foo_role",
+        "User '%s' does not have privileges to execute 'ALTER' on: doesntexist");
+  }
+
   private void TestWithIncorrectConfig(AuthorizationConfig authzConfig, User user)
       throws ImpalaException {
     Frontend fe = new Frontend(authzConfig, ctx_.catalog);
diff --git a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
index 81c79e5..ba01564 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
@@ -3747,4 +3747,27 @@ public class ParserTest extends FrontendTestBase {
     ParserError("COMMENT ON DATABASE IS 'comment'");
     ParserError("COMMENT ON DATABASE db IS");
   }
+
+  @Test
+  public void TestAlterDatabaseSetOwner() {
+    for (String valid : new String[]{"foo", "user", "owner"}) {
+      ParsesOk(String.format("ALTER DATABASE %s SET OWNER USER %s", valid, valid));
+      ParsesOk(String.format("ALTER DATABASE %s SET OWNER ROLE %s", valid, valid));
+    }
+
+    for (String invalid : new String[]{"'foo'", "''", "NULL"}) {
+      ParserError(String.format("ALTER DATABASE %s SET OWNER ROLE %s", invalid, invalid));
+      ParserError(String.format("ALTER DATABASE %s SET OWNER USER %s", invalid, invalid));
+    }
+
+    ParserError("ALTER DATABASE db SET ABC USER foo");
+    ParserError("ALTER DATABASE db SET ABC ROLE foo");
+    ParserError("ALTER DATABASE db SET OWNER ABC foo");
+    ParserError("ALTER DATABASE db SET OWNER USER");
+    ParserError("ALTER DATABASE SET OWNER foo");
+    ParserError("ALTER DATABASE SET OWNER USER foo");
+    ParserError("ALTER DATABASE db SET OWNER ROLE");
+    ParserError("ALTER DATABASE SET OWNER ROLE foo");
+    ParserError("ALTER DATABASE SET OWNER");
+  }
 }
diff --git a/tests/metadata/test_ddl.py b/tests/metadata/test_ddl.py
index c2f34a9..e11fb09 100644
--- a/tests/metadata/test_ddl.py
+++ b/tests/metadata/test_ddl.py
@@ -210,6 +210,19 @@ class TestDdlStatements(TestDdlBase):
     comment = self._get_db_comment(unique_database)
     assert '' == comment
 
+  def test_alter_database_set_owner(self, vector, unique_database):
+    self.client.execute("alter database {0} set owner user foo_user".format(
+      unique_database))
+    properties = self._get_db_owner_properties(unique_database)
+    assert len(properties) == 1
+    assert {'foo_user': 'USER'} == properties
+
+    self.client.execute("alter database {0} set owner role foo_role".format(
+      unique_database))
+    properties = self._get_db_owner_properties(unique_database)
+    assert len(properties) == 1
+    assert {'foo_role': 'ROLE'} == properties
+
   # There is a query in QueryTest/create-table that references nested types, which is not
   # supported if old joins and aggs are enabled. Since we do not get any meaningful
   # additional coverage by running a DDL test under the old aggs and joins, it can be
diff --git a/tests/metadata/test_ddl_base.py b/tests/metadata/test_ddl_base.py
index 0b0968d..cc0e0c2 100644
--- a/tests/metadata/test_ddl_base.py
+++ b/tests/metadata/test_ddl_base.py
@@ -73,17 +73,22 @@ class TestDdlBase(ImpalaTestSuite):
     """Extracts the serde properties mapping from the output of DESCRIBE FORMATTED"""
     return self._get_properties('Storage Desc Params:', table_name)
 
-  def _get_properties(self, section_name, table_name):
-    """Extracts the table properties mapping from the output of DESCRIBE FORMATTED"""
-    result = self.client.execute("describe formatted " + table_name)
+  def _get_db_owner_properties(self, db_name):
+    """Extracts the DB properties mapping from the output of DESCRIBE FORMATTED"""
+    return self._get_properties("Owner:", db_name, True)
+
+  def _get_properties(self, section_name, name, is_db=False):
+    """Extracts the db/table properties mapping from the output of DESCRIBE FORMATTED"""
+    result = self.client.execute("describe {0} formatted {1}".format(
+      "database" if is_db else "", name))
     match = False
-    properties = dict();
+    properties = dict()
     for row in result.data:
       if section_name in row:
         match = True
       elif match:
         row = row.split('\t')
-        if (row[1] == 'NULL'):
+        if row[1] == 'NULL':
           break
         properties[row[1].rstrip()] = row[2].rstrip()
     return properties


[impala] 09/11: IMPALA-6917: Implement COMMENT ON TABLE/VIEW

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 47f135dfcf6f0a157161ab4e9a0b2929abc3b0eb
Author: Fredy Wijaya <fw...@cloudera.com>
AuthorDate: Tue May 22 16:40:47 2018 -0500

    IMPALA-6917: Implement COMMENT ON TABLE/VIEW
    
    This patch implements updating the comment on a table or view.
    
    Syntax:
    COMMENT ON TABLE t IS 'comment'
    COMMENT ON VIEW v IS 'comment'
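    
    Setting the comment to NULL removes it. For example, a minimal
    illustrative sketch (the table and view names here are hypothetical):
    
    COMMENT ON TABLE sales_db.orders IS 'fact table'
    COMMENT ON VIEW sales_db.orders_v IS NULL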
    
    Testing:
    - Added new front-end tests
    - Ran all front-end tests
    - Added new end-to-end tests
    - Ran end-to-end DDL tests
    
    Change-Id: I497c17342f79ff7c99931fd8a0ddec0c79303dbf
    Reviewed-on: http://gerrit.cloudera.org:8080/10478
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 common/thrift/JniCatalog.thrift                    |  9 +++-
 fe/src/main/cup/sql-parser.cup                     |  4 ++
 .../apache/impala/analysis/CommentOnDbStmt.java    |  2 +
 .../org/apache/impala/analysis/CommentOnStmt.java  | 13 +++++-
 ...OnDbStmt.java => CommentOnTableOrViewStmt.java} | 35 +++++++++++----
 ...{CommentOnStmt.java => CommentOnTableStmt.java} | 22 ++++-----
 .../{CommentOnStmt.java => CommentOnViewStmt.java} | 22 ++++-----
 .../apache/impala/service/CatalogOpExecutor.java   | 37 ++++++++++++++-
 .../org/apache/impala/analysis/AnalyzeDDLTest.java | 52 +++++++++++++++++++++-
 .../apache/impala/analysis/AuthorizationTest.java  | 16 +++++++
 .../org/apache/impala/analysis/ParserTest.java     | 15 +++++++
 tests/metadata/test_ddl.py                         | 50 +++++++++++++++++++++
 tests/metadata/test_ddl_base.py                    |  6 ++-
 13 files changed, 248 insertions(+), 35 deletions(-)

diff --git a/common/thrift/JniCatalog.thrift b/common/thrift/JniCatalog.thrift
index 87722a4..7052faf 100644
--- a/common/thrift/JniCatalog.thrift
+++ b/common/thrift/JniCatalog.thrift
@@ -663,9 +663,16 @@ struct TGetCatalogUsageResponse{
 }
 
 struct TCommentOnParams {
-  // Name of comment to alter. When this field is not set, the comment will be removed.
+  // Contents of comment to alter. When this field is not set, the comment will be removed.
   1: optional string comment
 
+  //--------------------------------------
+  // Only one of these fields can be set.
+  //--------------------------------------
+
   // Name of database to alter.
   2: optional string db
+
+  // Name of table/view to alter.
+  3: optional CatalogObjects.TTableName table_name
 }
diff --git a/fe/src/main/cup/sql-parser.cup b/fe/src/main/cup/sql-parser.cup
index 352f185..db69aae 100644
--- a/fe/src/main/cup/sql-parser.cup
+++ b/fe/src/main/cup/sql-parser.cup
@@ -1030,6 +1030,10 @@ partition_def_list ::=
 comment_on_stmt ::=
   KW_COMMENT KW_ON KW_DATABASE ident_or_default:db_name KW_IS nullable_comment_val:comment
   {: RESULT = new CommentOnDbStmt(db_name, comment); :}
+  | KW_COMMENT KW_ON KW_TABLE table_name:table KW_IS nullable_comment_val:comment
+  {: RESULT = new CommentOnTableStmt(table, comment); :}
+  | KW_COMMENT KW_ON KW_VIEW table_name:table KW_IS nullable_comment_val:comment
+  {: RESULT = new CommentOnViewStmt(table, comment); :}
   ;
 
 // Introducing OWNER and USER keywords has a potential to be a breaking change,
diff --git a/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java b/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
index 8f622fd..a216fd3 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
@@ -36,9 +36,11 @@ public class CommentOnDbStmt extends CommentOnStmt {
 
   @Override
   public void analyze(Analyzer analyzer) throws AnalysisException {
+    super.analyze(analyzer);
     analyzer.getDb(dbName_, Privilege.ALTER);
   }
 
+  @Override
   public TCommentOnParams toThrift() {
     TCommentOnParams params = super.toThrift();
     params.setDb(dbName_);
diff --git a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java b/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
index dffc08d..667659a 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
@@ -17,18 +17,29 @@
 
 package org.apache.impala.analysis;
 
+import org.apache.impala.common.AnalysisException;
 import org.apache.impala.thrift.TCommentOnParams;
+import org.apache.impala.util.MetaStoreUtil;
 
 /**
  * A base class for COMMENT ON statement.
  */
-public class CommentOnStmt extends StatementBase {
+public abstract class CommentOnStmt extends StatementBase {
   protected final String comment_;
 
   public CommentOnStmt(String comment) {
     comment_ = comment;
   }
 
+  @Override
+  public void analyze(Analyzer analyzer) throws AnalysisException {
+    if (comment_ != null && comment_.length() > MetaStoreUtil.CREATE_MAX_COMMENT_LENGTH) {
+      throw new AnalysisException(String.format("Comment exceeds maximum length of %d " +
+          "characters. The given comment has %d characters.",
+          MetaStoreUtil.CREATE_MAX_COMMENT_LENGTH, comment_.length()));
+    }
+  }
+
   public TCommentOnParams toThrift() {
     TCommentOnParams params = new TCommentOnParams();
     params.setComment(comment_);
diff --git a/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java b/fe/src/main/java/org/apache/impala/analysis/CommentOnTableOrViewStmt.java
similarity index 55%
copy from fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
copy to fe/src/main/java/org/apache/impala/analysis/CommentOnTableOrViewStmt.java
index 8f622fd..36325a6 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CommentOnDbStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CommentOnTableOrViewStmt.java
@@ -22,26 +22,45 @@ import org.apache.impala.authorization.Privilege;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.thrift.TCommentOnParams;
 
+import java.util.List;
+
 /**
- * Represents a COMMENT ON DATABASE db IS 'comment' statement.
+ * A base class for COMMENT ON TABLE/VIEW.
  */
-public class CommentOnDbStmt extends CommentOnStmt {
-  private final String dbName_;
+public abstract class CommentOnTableOrViewStmt extends CommentOnStmt {
+  protected TableName tableName_;
 
-  public CommentOnDbStmt(String dbName, String comment) {
+  public CommentOnTableOrViewStmt(TableName tableName, String comment) {
     super(comment);
-    Preconditions.checkNotNull(dbName);
-    dbName_ = dbName;
+    Preconditions.checkArgument(tableName != null && !tableName.isEmpty());
+    tableName_ = tableName;
+  }
+
+  @Override
+  public void collectTableRefs(List<TableRef> tblRefs) {
+    tblRefs.add(new TableRef(tableName_.toPath(), null));
   }
 
   @Override
   public void analyze(Analyzer analyzer) throws AnalysisException {
-    analyzer.getDb(dbName_, Privilege.ALTER);
+    super.analyze(analyzer);
+    tableName_ = analyzer.getFqTableName(tableName_);
+    TableRef tableRef = new TableRef(tableName_.toPath(), null, Privilege.ALTER);
+    tableRef = analyzer.resolveTableRef(tableRef);
+    Preconditions.checkNotNull(tableRef);
+    tableRef.analyze(analyzer);
+    validateType(tableRef);
   }
 
+  /**
+   * Validates the type of the given TableRef.
+   */
+  protected abstract void validateType(TableRef tableRef) throws AnalysisException;
+
+  @Override
   public TCommentOnParams toThrift() {
     TCommentOnParams params = super.toThrift();
-    params.setDb(dbName_);
+    params.setTable_name(tableName_.toThrift());
     return params;
   }
 }
diff --git a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java b/fe/src/main/java/org/apache/impala/analysis/CommentOnTableStmt.java
similarity index 61%
copy from fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
copy to fe/src/main/java/org/apache/impala/analysis/CommentOnTableStmt.java
index dffc08d..d7daa40 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CommentOnTableStmt.java
@@ -17,21 +17,21 @@
 
 package org.apache.impala.analysis;
 
-import org.apache.impala.thrift.TCommentOnParams;
+import org.apache.impala.common.AnalysisException;
 
 /**
- * A base class for COMMENT ON statement.
+ * Represents a COMMENT ON TABLE tbl IS 'comment' statement.
  */
-public class CommentOnStmt extends StatementBase {
-  protected final String comment_;
-
-  public CommentOnStmt(String comment) {
-    comment_ = comment;
+public class CommentOnTableStmt extends CommentOnTableOrViewStmt {
+  public CommentOnTableStmt(TableName tableName, String comment) {
+    super(tableName, comment);
   }
 
-  public TCommentOnParams toThrift() {
-    TCommentOnParams params = new TCommentOnParams();
-    params.setComment(comment_);
-    return params;
+  @Override
+  protected void validateType(TableRef tableRef) throws AnalysisException {
+    if (tableRef instanceof InlineViewRef) {
+      throw new AnalysisException(String.format(
+          "COMMENT ON TABLE not allowed on a view: %s", tableName_));
+    }
   }
 }
diff --git a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java b/fe/src/main/java/org/apache/impala/analysis/CommentOnViewStmt.java
similarity index 61%
copy from fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
copy to fe/src/main/java/org/apache/impala/analysis/CommentOnViewStmt.java
index dffc08d..cbd3f6b 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CommentOnStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CommentOnViewStmt.java
@@ -17,21 +17,21 @@
 
 package org.apache.impala.analysis;
 
-import org.apache.impala.thrift.TCommentOnParams;
+import org.apache.impala.common.AnalysisException;
 
 /**
- * A base class for COMMENT ON statement.
+ * Represents a COMMENT ON VIEW v IS 'comment' statement.
  */
-public class CommentOnStmt extends StatementBase {
-  protected final String comment_;
-
-  public CommentOnStmt(String comment) {
-    comment_ = comment;
+public class CommentOnViewStmt extends CommentOnTableOrViewStmt {
+  public CommentOnViewStmt(TableName tableName, String comment) {
+    super(tableName, comment);
   }
 
-  public TCommentOnParams toThrift() {
-    TCommentOnParams params = new TCommentOnParams();
-    params.setComment(comment_);
-    return params;
+  @Override
+  protected void validateType(TableRef tableRef) throws AnalysisException {
+    if (!(tableRef instanceof InlineViewRef)) {
+      throw new AnalysisException(String.format(
+          "COMMENT ON VIEW not allowed on a table: %s", tableName_));
+    }
   }
 }
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index 2d83bab..f846e60 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -3502,12 +3502,19 @@ public class CatalogOpExecutor {
   private void alterCommentOn(TCommentOnParams params, TDdlExecResponse response)
       throws ImpalaRuntimeException, CatalogException, InternalException {
     if (params.getDb() != null) {
+      Preconditions.checkArgument(!params.isSetTable_name());
       alterCommentOnDb(params.getDb(), params.getComment(), response);
+    } else if (params.getTable_name() != null) {
+      Preconditions.checkArgument(!params.isSetDb());
+      alterCommentOnTableOrView(TableName.fromThrift(params.getTable_name()),
+          params.getComment(), response);
+    } else {
+      throw new UnsupportedOperationException("Unsupported COMMENT ON operation");
     }
   }
 
   private void alterCommentOnDb(String dbName, String comment, TDdlExecResponse response)
-      throws ImpalaRuntimeException, CatalogException, InternalException {
+      throws ImpalaRuntimeException, CatalogException {
     Db db = catalog_.getDb(dbName);
     if (db == null) {
       throw new CatalogException("Database: " + db.getName() + " does not exist.");
@@ -3582,4 +3589,32 @@ public class CatalogOpExecutor {
       catalog_.getLock().writeLock().unlock();
     }
   }
+
+  private void alterCommentOnTableOrView(TableName tableName, String comment,
+      TDdlExecResponse response) throws CatalogException, InternalException,
+      ImpalaRuntimeException {
+    Table tbl = getExistingTable(tableName.getDb(), tableName.getTbl());
+    if (!catalog_.tryLockTable(tbl)) {
+      throw new InternalException(String.format("Error altering table/view %s due to " +
+          "lock contention.", tbl.getFullName()));
+    }
+    try {
+      long newCatalogVersion = catalog_.incrementAndGetCatalogVersion();
+      catalog_.getLock().writeLock().unlock();
+      org.apache.hadoop.hive.metastore.api.Table msTbl = tbl.getMetaStoreTable().deepCopy();
+      boolean isView = msTbl.getTableType().equalsIgnoreCase(
+          TableType.VIRTUAL_VIEW.toString());
+      if (comment == null) {
+        msTbl.getParameters().remove("comment");
+      } else {
+        msTbl.getParameters().put("comment", comment);
+      }
+      applyAlterTable(msTbl, true);
+      loadTableMetadata(tbl, newCatalogVersion, false, false, null);
+      addTableToCatalogUpdate(tbl, response.result);
+      addSummary(response, String.format("Updated %s.", (isView) ? "view" : "table"));
+    } finally {
+      tbl.getLock().unlock();
+    }
+  }
 }
diff --git a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
index 5ab533c..634fb87 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
@@ -46,6 +46,7 @@ import org.apache.impala.catalog.Type;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.common.FileSystemUtil;
 import org.apache.impala.common.FrontendTestBase;
+import org.apache.impala.common.Pair;
 import org.apache.impala.common.RuntimeEnv;
 import org.apache.impala.service.BackendConfig;
 import org.apache.impala.testutil.TestUtils;
@@ -3912,12 +3913,61 @@ public class AnalyzeDDLTest extends FrontendTestBase {
   }
 
   @Test
-  public void TestCommentOn() {
+  public void TestCommentOnDatabase() {
     AnalyzesOk("comment on database functional is 'comment'");
     AnalyzesOk("comment on database functional is ''");
     AnalyzesOk("comment on database functional is null");
     AnalysisError("comment on database doesntexist is 'comment'",
         "Database does not exist: doesntexist");
+    AnalysisError(String.format("comment on database functional is '%s'",
+        buildLongComment()), "Comment exceeds maximum length of 256 characters. " +
+        "The given comment has 261 characters.");
+  }
+
+  @Test
+  public void TestCommentOnTable() {
+    for (Pair<String, AnalysisContext> pair : new Pair[]{
+        new Pair<>("functional.alltypes", createAnalysisCtx()),
+        new Pair<>("alltypes", createAnalysisCtx("functional"))}) {
+      AnalyzesOk(String.format("comment on table %s is 'comment'", pair.first),
+          pair.second);
+      AnalyzesOk(String.format("comment on table %s is ''", pair.first), pair.second);
+      AnalyzesOk(String.format("comment on table %s is null", pair.first), pair.second);
+    }
+    AnalysisError("comment on table doesntexist is 'comment'",
+        "Could not resolve table reference: 'default.doesntexist'");
+    AnalysisError("comment on table functional.alltypes_view is 'comment'",
+        "COMMENT ON TABLE not allowed on a view: functional.alltypes_view");
+    AnalysisError(String.format("comment on table functional.alltypes is '%s'",
+        buildLongComment()), "Comment exceeds maximum length of 256 characters. " +
+        "The given comment has 261 characters.");
+  }
+
+  @Test
+  public void TestCommentOnView() {
+    for (Pair<String, AnalysisContext> pair : new Pair[]{
+        new Pair<>("functional.alltypes_view", createAnalysisCtx()),
+        new Pair<>("alltypes_view", createAnalysisCtx("functional"))}) {
+      AnalyzesOk(String.format("comment on view %s is 'comment'", pair.first),
+          pair.second);
+      AnalyzesOk(String.format("comment on view %s is ''", pair.first), pair.second);
+      AnalyzesOk(String.format("comment on view %s is null", pair.first), pair.second);
+    }
+    AnalysisError("comment on view doesntexist is 'comment'",
+        "Could not resolve table reference: 'default.doesntexist'");
+    AnalysisError("comment on view functional.alltypes is 'comment'",
+        "COMMENT ON VIEW not allowed on a table: functional.alltypes");
+    AnalysisError(String.format("comment on table functional.alltypes_view is '%s'",
+        buildLongComment()), "Comment exceeds maximum length of 256 characters. " +
+        "The given comment has 261 characters.");
+  }
+
+  private static String buildLongComment() {
+    StringBuilder comment = new StringBuilder();
+    for (int i = 0; i < MetaStoreUtil.CREATE_MAX_COMMENT_LENGTH + 5; i++) {
+      comment.append("a");
+    }
+    return comment.toString();
   }
 
   @Test
diff --git a/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java b/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
index ed6f0e4..36bcaa3 100644
--- a/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/AuthorizationTest.java
@@ -3015,6 +3015,22 @@ public class AuthorizationTest extends FrontendTestBase {
         "User '%s' does not have privileges to execute 'ALTER' on: functional");
     AuthzError("comment on database doesntexist is 'comment'",
         "User '%s' does not have privileges to execute 'ALTER' on: doesntexist");
+
+    // User has ALTER privilege on functional_text_lzo.alltypes table.
+    AuthzOk("comment on table functional_text_lzo.alltypes is 'comment'");
+    // User does not have ALTER privilege on functional.alltypes table.
+    AuthzError("comment on table functional.alltypes is 'comment'",
+        "User '%s' does not have privileges to execute 'ALTER' on: functional");
+    AuthzError("comment on table doesntexist is 'comment'",
+        "User '%s' does not have privileges to execute 'ALTER' on: default.doesntexist");
+
+    // User has ALTER privilege on functional.alltypes_view view.
+    AuthzOk("comment on view functional.alltypes_view is 'comment'");
+    // User does not have ALTER privilege on functional.view_view view.
+    AuthzError("comment on view functional.view_view is 'comment'",
+        "User '%s' does not have privileges to execute 'ALTER' on: functional");
+    AuthzError("comment on view doesntexist is 'comment'",
+        "User '%s' does not have privileges to execute 'ALTER' on: default.doesntexist");
   }
 
   @Test
diff --git a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
index ba01564..93c2e2b 100644
--- a/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
+++ b/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
@@ -3746,6 +3746,21 @@ public class ParserTest extends FrontendTestBase {
     ParsesOk("COMMENT ON DATABASE db IS NULL");
     ParserError("COMMENT ON DATABASE IS 'comment'");
     ParserError("COMMENT ON DATABASE db IS");
+
+    for (String tbl : new String[]{"db.t", "t"}) {
+      ParsesOk(String.format("COMMENT ON TABLE %s IS 'comment'", tbl));
+      ParsesOk(String.format("COMMENT ON TABLE %s IS ''", tbl));
+      ParsesOk(String.format("COMMENT ON TABLE %s IS NULL", tbl));
+
+      ParsesOk(String.format("COMMENT ON VIEW %s IS 'comment'", tbl));
+      ParsesOk(String.format("COMMENT ON VIEW %s IS ''", tbl));
+      ParsesOk(String.format("COMMENT ON VIEW %s IS NULL", tbl));
+    }
+    ParserError("COMMENT ON TABLE IS 'comment'");
+    ParserError("COMMENT ON TABLE tbl IS");
+
+    ParserError("COMMENT ON VIEW IS 'comment'");
+    ParserError("COMMENT ON VIEW tbl IS");
   }
 
   @Test
diff --git a/tests/metadata/test_ddl.py b/tests/metadata/test_ddl.py
index e11fb09..bcafd8e 100644
--- a/tests/metadata/test_ddl.py
+++ b/tests/metadata/test_ddl.py
@@ -202,6 +202,10 @@ class TestDdlStatements(TestDdlBase):
     comment = self._get_db_comment(unique_database)
     assert 'comment' == comment
 
+    self.client.execute("comment on database {0} is '\\'comment\\''".format(unique_database))
+    comment = self._get_db_comment(unique_database)
+    assert "\\'comment\\'" == comment
+
     self.client.execute("comment on database {0} is ''".format(unique_database))
     comment = self._get_db_comment(unique_database)
     assert '' == comment
@@ -258,6 +262,52 @@ class TestDdlStatements(TestDdlBase):
     self.run_test_case('QueryTest/kudu_create', vector, use_db=unique_database,
         multiple_impalad=self._use_multiple_impalad(vector))
 
+  def test_comment_on_table(self, vector, unique_database):
+    table = '{0}.comment_table'.format(unique_database)
+    self.client.execute("create table {0} (i int)".format(table))
+
+    comment = self._get_table_or_view_comment(table)
+    assert comment is None
+
+    self.client.execute("comment on table {0} is 'comment'".format(table))
+    comment = self._get_table_or_view_comment(table)
+    assert "comment" == comment
+
+    self.client.execute("comment on table {0} is '\\'comment\\''".format(table))
+    comment = self._get_table_or_view_comment(table)
+    assert "\\\\'comment\\\\'" == comment
+
+    self.client.execute("comment on table {0} is ''".format(table))
+    comment = self._get_table_or_view_comment(table)
+    assert "" == comment
+
+    self.client.execute("comment on table {0} is null".format(table))
+    comment = self._get_table_or_view_comment(table)
+    assert comment is None
+
+  def test_comment_on_view(self, vector, unique_database):
+    view = '{0}.comment_view'.format(unique_database)
+    self.client.execute("create view {0} as select 1".format(view))
+
+    comment = self._get_table_or_view_comment(view)
+    assert comment is None
+
+    self.client.execute("comment on view {0} is 'comment'".format(view))
+    comment = self._get_table_or_view_comment(view)
+    assert "comment" == comment
+
+    self.client.execute("comment on view {0} is '\\'comment\\''".format(view))
+    comment = self._get_table_or_view_comment(view)
+    assert "\\\\'comment\\\\'" == comment
+
+    self.client.execute("comment on view {0} is ''".format(view))
+    comment = self._get_table_or_view_comment(view)
+    assert "" == comment
+
+    self.client.execute("comment on view {0} is null".format(view))
+    comment = self._get_table_or_view_comment(view)
+    assert comment is None
+
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_sync_ddl_drop(self, vector, unique_database):
     """Verifies the catalog gets updated properly when dropping objects with sync_ddl
diff --git a/tests/metadata/test_ddl_base.py b/tests/metadata/test_ddl_base.py
index cc0e0c2..c1388fc 100644
--- a/tests/metadata/test_ddl_base.py
+++ b/tests/metadata/test_ddl_base.py
@@ -96,4 +96,8 @@ class TestDdlBase(ImpalaTestSuite):
   def _get_db_comment(self, db_name):
     """Extracts the DB comment from the output of DESCRIBE DATABASE"""
     result = self.client.execute("describe database {0}".format(db_name))
-    return result.data[0].split('\t')[2]
\ No newline at end of file
+    return result.data[0].split('\t')[2]
+
+  def _get_table_or_view_comment(self, table_name):
+    props = self._get_tbl_properties(table_name)
+    return props["comment"] if "comment" in props else None
\ No newline at end of file


[impala] 05/11: IMPALA-5552: Add support for authorized proxy groups

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b8e995e042619c6062cfa970b21924508e16677d
Author: Fredy Wijaya <fw...@cloudera.com>
AuthorDate: Wed May 23 20:16:00 2018 -0700

    IMPALA-5552: Add support for authorized proxy groups
    
    The patch adds support for mapping users to a list of proxy groups.
    
    The following flags are added in impalad:
    - authorized_proxy_group_config
    - authorized_proxy_group_config_delimiter
    
    Example:
    --authorized_proxy_group_config=hue=group1,group2;user1=*
    
    This feature is not supported on Shell-based Hadoop groups mapping
    providers.
    
    The authorized_proxy_user/group_config parser now strips leading
    and trailing whitespace from the user/group names.
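    
    For example, a hypothetical flag combination using a custom delimiter;
    whitespace around names is tolerated, per the parsing change above:
    
    --authorized_proxy_group_config_delimiter=@
    --authorized_proxy_group_config=hue=group1@group2; admin = *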
    
    Testing:
    - Added FE unit test to check for groups mapping provider
    - Added BE unit test for the parsing logic
    - Added a new test in test_authorization.py
    - Ran all end-to-end test_authorization.py
    
    Change-Id: I6953f89c293b06b72f523b11802232133d9d6cbb
    Reviewed-on: http://gerrit.cloudera.org:8080/10510
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Tim Armstrong <ta...@cloudera.com>
---
 be/src/service/CMakeLists.txt                      |   1 +
 be/src/service/frontend.cc                         |  16 +++
 be/src/service/frontend.h                          |   5 +
 be/src/service/impala-server-test.cc               |  74 +++++++++++++
 be/src/service/impala-server.cc                    | 117 ++++++++++++++++-----
 be/src/service/impala-server.h                     |  23 +++-
 be/src/util/backend-gflag-util.cc                  |   3 +
 common/thrift/BackendGflags.thrift                 |   2 +
 common/thrift/Frontend.thrift                      |  10 ++
 .../org/apache/impala/service/BackendConfig.java   |   4 +
 .../org/apache/impala/service/JniFrontend.java     |  67 +++++++++++-
 .../org/apache/impala/service/JniFrontendTest.java |  58 ++++++++++
 tests/authorization/test_authorization.py          |  40 ++++++-
 13 files changed, 384 insertions(+), 36 deletions(-)

diff --git a/be/src/service/CMakeLists.txt b/be/src/service/CMakeLists.txt
index c78116c..66cf55d 100644
--- a/be/src/service/CMakeLists.txt
+++ b/be/src/service/CMakeLists.txt
@@ -77,3 +77,4 @@ target_link_libraries(impalad
 ADD_BE_TEST(session-expiry-test session-expiry-test.cc)
 ADD_BE_TEST(hs2-util-test hs2-util-test.cc)
 ADD_BE_TEST(query-options-test query-options-test.cc)
+ADD_BE_TEST(impala-server-test impala-server-test.cc)
diff --git a/be/src/service/frontend.cc b/be/src/service/frontend.cc
index ecfae0a..0108f63 100644
--- a/be/src/service/frontend.cc
+++ b/be/src/service/frontend.cc
@@ -57,6 +57,16 @@ DEFINE_string(authorized_proxy_user_config, "",
     "users. For example: hue=user1,user2;admin=*");
 DEFINE_string(authorized_proxy_user_config_delimiter, ",",
     "Specifies the delimiter used in authorized_proxy_user_config. ");
+DEFINE_string(authorized_proxy_group_config, "",
+    "Specifies the set of authorized proxy groups (users who can delegate to other "
+    "users belonging to the specified groups during authorization) and whom they are "
+    "allowed to delegate. Input is a semicolon-separated list of key=value pairs of "
+    "authorized proxy users to the group(s) they can delegate to. These groups are "
+    "specified as a list of groups separated by a delimiter (which defaults to comma and "
+    "may be changed via --authorized_proxy_group_config_delimiter), or '*' to indicate "
+    "all users. For example: hue=group1,group2;admin=*");
+DEFINE_string(authorized_proxy_group_config_delimiter, ",",
+    "Specifies the delimiter used in authorized_proxy_group_config. ");
 DEFINE_string(kudu_master_hosts, "", "Specifies the default Kudu master(s). The given "
     "value should be a comma separated list of hostnames or IP addresses; ports are "
     "optional.");
@@ -68,6 +78,7 @@ Frontend::Frontend() {
     {"getExplainPlan", "([B)Ljava/lang/String;", &get_explain_plan_id_},
     {"getHadoopConfig", "([B)[B", &get_hadoop_config_id_},
     {"getAllHadoopConfigs", "()[B", &get_hadoop_configs_id_},
+    {"getHadoopGroups", "([B)[B", &get_hadoop_groups_id_},
     {"checkConfiguration", "()Ljava/lang/String;", &check_config_id_},
     {"updateCatalogCache", "([B)[B", &update_catalog_cache_id_},
     {"updateMembership", "([B)V", &update_membership_id_},
@@ -242,6 +253,11 @@ Status Frontend::GetHadoopConfig(const TGetHadoopConfigRequest& request,
   return JniUtil::CallJniMethod(fe_, get_hadoop_config_id_, request, response);
 }
 
+Status Frontend::GetHadoopGroups(const TGetHadoopGroupsRequest& request,
+    TGetHadoopGroupsResponse* response) {
+  return JniUtil::CallJniMethod(fe_, get_hadoop_groups_id_, request, response);
+}
+
 Status Frontend::LoadData(const TLoadDataReq& request, TLoadDataResp* response) {
   return JniUtil::CallJniMethod(fe_, load_table_data_id_, request, response);
 }
diff --git a/be/src/service/frontend.h b/be/src/service/frontend.h
index 9198b83..08123b3 100644
--- a/be/src/service/frontend.h
+++ b/be/src/service/frontend.h
@@ -153,6 +153,10 @@ class Frontend {
   Status GetHadoopConfig(const TGetHadoopConfigRequest& request,
       TGetHadoopConfigResponse* response);
 
+  /// Returns (in the output parameter) the list of groups for the given user.
+  Status GetHadoopGroups(const TGetHadoopGroupsRequest& request,
+      TGetHadoopGroupsResponse* response);
+
   /// Loads a single file or set of files into a table or partition. Saves the RPC
   /// response in the TLoadDataResp output parameter. Returns OK if the operation
   /// completed successfully.
@@ -186,6 +190,7 @@ class Frontend {
   jmethodID get_explain_plan_id_;  // JniFrontend.getExplainPlan()
   jmethodID get_hadoop_config_id_;  // JniFrontend.getHadoopConfig(byte[])
   jmethodID get_hadoop_configs_id_;  // JniFrontend.getAllHadoopConfigs()
+  jmethodID get_hadoop_groups_id_;  // JniFrontend.getHadoopGroups()
   jmethodID check_config_id_; // JniFrontend.checkConfiguration()
   jmethodID update_catalog_cache_id_; // JniFrontend.updateCatalogCache(byte[][])
   jmethodID update_membership_id_; // JniFrontend.updateMembership()
diff --git a/be/src/service/impala-server-test.cc b/be/src/service/impala-server-test.cc
new file mode 100644
index 0000000..772e70f
--- /dev/null
+++ b/be/src/service/impala-server-test.cc
@@ -0,0 +1,74 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+#include <gutil/strings/substitute.h>
+#include <sstream>
+#include <vector>
+
+#include "service/impala-server.h"
+#include "testutil/gtest-util.h"
+
+using namespace impala;
+using namespace std;
+using namespace strings;
+
+namespace impala {
+
+using AuthorizedProxyMap =
+  boost::unordered_map<std::string, boost::unordered_set<std::string>>;
+
+class ImpalaServerTest : public testing::Test {
+public:
+  static Status PopulateAuthorizedProxyConfig(
+      const string& authorized_proxy_config,
+      const string& authorized_proxy_config_delimiter,
+      AuthorizedProxyMap* authorized_proxy_map) {
+    return ImpalaServer::PopulateAuthorizedProxyConfig(authorized_proxy_config,
+        authorized_proxy_config_delimiter, authorized_proxy_map);
+  }
+};
+
+}
+
+TEST(ImpalaServerTest, PopulateAuthorizedProxyConfig) {
+  vector<string> delimiters{",", "@", " "};
+  for (auto& delimiter : delimiters) {
+    AuthorizedProxyMap proxy_map;
+    Status status = ImpalaServerTest::PopulateAuthorizedProxyConfig(
+        Substitute("hue=user1$0user2;impala = user3 ;hive=* ", delimiter), delimiter,
+        &proxy_map);
+    EXPECT_TRUE(status.ok());
+    EXPECT_EQ(3ul, proxy_map.size());
+
+    auto proxies = proxy_map["hue"];
+    EXPECT_EQ(2ul, proxies.size());
+    EXPECT_EQ("user1", *proxies.find("user1"));
+    EXPECT_EQ("user2", *proxies.find("user2"));
+
+    proxies = proxy_map["impala"];
+    EXPECT_EQ(1ul, proxies.size());
+    EXPECT_EQ("user3", *proxies.find("user3"));
+
+    proxies = proxy_map["hive"];
+    EXPECT_EQ(1ul, proxies.size());
+    EXPECT_EQ("*", *proxies.find("*"));
+
+    EXPECT_EQ(proxy_map.end(), proxy_map.find("doesnotexist"));
+  }
+}
+
+IMPALA_TEST_MAIN();
diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc
index 2259791..771b0eb 100644
--- a/be/src/service/impala-server.cc
+++ b/be/src/service/impala-server.cc
@@ -26,6 +26,7 @@
 #include <boost/unordered_set.hpp>
 #include <boost/bind.hpp>
 #include <boost/algorithm/string.hpp>
+#include <boost/algorithm/string/trim.hpp>
 #include <boost/lexical_cast.hpp>
 #include <gperftools/malloc_extension.h>
 #include <gutil/strings/substitute.h>
@@ -113,6 +114,8 @@ DECLARE_string(nn);
 DECLARE_int32(nn_port);
 DECLARE_string(authorized_proxy_user_config);
 DECLARE_string(authorized_proxy_user_config_delimiter);
+DECLARE_string(authorized_proxy_group_config);
+DECLARE_string(authorized_proxy_group_config_delimiter);
 DECLARE_bool(abort_on_config_error);
 DECLARE_bool(disk_spill_encryption);
 DECLARE_bool(use_krpc);
@@ -316,29 +319,22 @@ ImpalaServer::ImpalaServer(ExecEnv* exec_env)
   }
 
   if (!FLAGS_authorized_proxy_user_config.empty()) {
-    // Parse the proxy user configuration using the format:
-    // <proxy user>=<comma separated list of users they are allowed to delegate>
-    // See FLAGS_authorized_proxy_user_config for more details.
-    vector<string> proxy_user_config;
-    split(proxy_user_config, FLAGS_authorized_proxy_user_config, is_any_of(";"),
-        token_compress_on);
-    if (proxy_user_config.size() > 0) {
-      for (const string& config: proxy_user_config) {
-        size_t pos = config.find("=");
-        if (pos == string::npos) {
-          CLEAN_EXIT_WITH_ERROR(Substitute("Invalid proxy user configuration. No "
-              "mapping value specified for the proxy user. For more information review "
-              "usage of the --authorized_proxy_user_config flag: $0", config));
-        }
-        string proxy_user = config.substr(0, pos);
-        string config_str = config.substr(pos + 1);
-        vector<string> parsed_allowed_users;
-        split(parsed_allowed_users, config_str,
-            is_any_of(FLAGS_authorized_proxy_user_config_delimiter), token_compress_on);
-        unordered_set<string> allowed_users(parsed_allowed_users.begin(),
-            parsed_allowed_users.end());
-        authorized_proxy_user_config_.insert(make_pair(proxy_user, allowed_users));
-      }
+    Status status = PopulateAuthorizedProxyConfig(FLAGS_authorized_proxy_user_config,
+        FLAGS_authorized_proxy_user_config_delimiter, &authorized_proxy_user_config_);
+    if (!status.ok()) {
+      CLEAN_EXIT_WITH_ERROR(Substitute("Invalid proxy user configuration."
+          "No mapping value specified for the proxy user. For more information review "
+          "usage of the --authorized_proxy_user_config flag: $0", status.GetDetail()));
+    }
+  }
+
+  if (!FLAGS_authorized_proxy_group_config.empty()) {
+    Status status = PopulateAuthorizedProxyConfig(FLAGS_authorized_proxy_group_config,
+        FLAGS_authorized_proxy_group_config_delimiter, &authorized_proxy_group_config_);
+    if (!status.ok()) {
+      CLEAN_EXIT_WITH_ERROR(Substitute("Invalid proxy group configuration. "
+          "No mapping value specified for the proxy group. For more information review "
+          "usage of the --authorized_proxy_group_config flag: $0", status.GetDetail()));
     }
   }
 
@@ -403,6 +399,38 @@ ImpalaServer::ImpalaServer(ExecEnv* exec_env)
   exec_env_->SetImpalaServer(this);
 }
 
+Status ImpalaServer::PopulateAuthorizedProxyConfig(
+    const string& authorized_proxy_config,
+    const string& authorized_proxy_config_delimiter,
+    AuthorizedProxyMap* authorized_proxy_config_map) {
+  // Parse the proxy user configuration using the format:
+  // <proxy user>=<comma separated list of users/groups they are allowed to delegate>
+  // See FLAGS_authorized_proxy_user_config or FLAGS_authorized_proxy_group_config
+  // for more details.
+  vector<string> proxy_config;
+  split(proxy_config, authorized_proxy_config, is_any_of(";"),
+      token_compress_on);
+  if (proxy_config.size() > 0) {
+    for (const string& config: proxy_config) {
+      size_t pos = config.find("=");
+      if (pos == string::npos) {
+        return Status(config);
+      }
+      string proxy_user = config.substr(0, pos);
+      boost::trim(proxy_user);
+      string config_str = config.substr(pos + 1);
+      boost::trim(config_str);
+      vector<string> parsed_allowed_users_or_groups;
+      split(parsed_allowed_users_or_groups, config_str,
+          is_any_of(authorized_proxy_config_delimiter), token_compress_on);
+      unordered_set<string> allowed_users_or_groups(
+          parsed_allowed_users_or_groups.begin(), parsed_allowed_users_or_groups.end());
+      authorized_proxy_config_map->insert({proxy_user, allowed_users_or_groups});
+    }
+  }
+  return Status::OK();
+}
+
 Status ImpalaServer::LogLineageRecord(const ClientRequestState& client_request_state) {
   const TExecRequest& request = client_request_state.exec_request();
   if (!request.__isset.query_exec_request && !request.__isset.catalog_op_request) {
@@ -1341,8 +1369,9 @@ Status ImpalaServer::AuthorizeProxyUser(const string& user, const string& do_as_
   stringstream error_msg;
   error_msg << "User '" << user << "' is not authorized to delegate to '"
             << do_as_user << "'.";
-  if (authorized_proxy_user_config_.size() == 0) {
-    error_msg << " User delegation is disabled.";
+  if (authorized_proxy_user_config_.size() == 0 &&
+      authorized_proxy_group_config_.size() == 0) {
+    error_msg << " User/group delegation is disabled.";
     VLOG(1) << error_msg;
     return Status::Expected(error_msg.str());
   }
@@ -1357,13 +1386,45 @@ Status ImpalaServer::AuthorizeProxyUser(const string& user, const string& do_as_
 
   // Check if the proxy user exists. If so, check whether they are allowed
   // to delegate to the do_as_user.
-  ProxyUserMap::const_iterator proxy_user =
+  AuthorizedProxyMap::const_iterator proxy_user =
       authorized_proxy_user_config_.find(short_user);
   if (proxy_user != authorized_proxy_user_config_.end()) {
-    for (const string& user: proxy_user->second) {
-      if (user == "*" || user == do_as_user) return Status::OK();
+    boost::unordered_set<string> users = proxy_user->second;
+    if (users.find("*") != users.end() ||
+        users.find(do_as_user) != users.end()) {
+      return Status::OK();
     }
   }
+
+  if (authorized_proxy_group_config_.size() > 0) {
+    // Check if the groups of do_as_user are in the authorized proxy groups.
+    AuthorizedProxyMap::const_iterator proxy_group =
+        authorized_proxy_group_config_.find(short_user);
+    if (proxy_group != authorized_proxy_group_config_.end()) {
+      boost::unordered_set<string> groups = proxy_group->second;
+      if (groups.find("*") != groups.end()) return Status::OK();
+
+      TGetHadoopGroupsRequest req;
+      req.__set_user(do_as_user);
+      TGetHadoopGroupsResponse res;
+      int64_t start = MonotonicMillis();
+      Status status = exec_env_->frontend()->GetHadoopGroups(req, &res);
+      VLOG_QUERY << "Getting Hadoop groups for user: " << short_user << " took " <<
+          (PrettyPrinter::Print(MonotonicMillis() - start, TUnit::TIME_MS));
+      if (!status.ok()) {
+        LOG(ERROR) << "Error getting Hadoop groups for user: " << short_user << ": "
+            << status.GetDetail();
+        return status;
+      }
+
+      for (const string& do_as_group : res.groups) {
+        if (groups.find(do_as_group) != groups.end()) {
+          return Status::OK();
+        }
+      }
+    }
+  }
+
   VLOG(1) << error_msg;
   return Status::Expected(error_msg.str());
 }
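
Condensing the control flow of AuthorizeProxyUser above: the user map is
consulted first, then the group map, with '*' acting as a wildcard in both.
A minimal Python sketch (get_groups stands in for the GetHadoopGroups frontend
call and is hypothetical):

    def authorize_proxy_user(short_user, do_as_user, user_map, group_map, get_groups):
        users = user_map.get(short_user, set())
        if '*' in users or do_as_user in users:
            return True
        groups = group_map.get(short_user, set())
        if '*' in groups:
            return True
        # Otherwise compare against the Hadoop groups of the delegated user;
        # only fetch them if this proxy user has a group mapping at all.
        return bool(groups) and not groups.isdisjoint(get_groups(do_as_user))
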
diff --git a/be/src/service/impala-server.h b/be/src/service/impala-server.h
index 26e438f..deca000 100644
--- a/be/src/service/impala-server.h
+++ b/be/src/service/impala-server.h
@@ -466,6 +466,7 @@ class ImpalaServer : public ImpalaServiceIf,
   friend class ChildQuery;
   friend class ImpalaHttpHandler;
   friend struct SessionState;
+  friend class ImpalaServerTest;
 
   boost::scoped_ptr<ImpalaHttpHandler> http_handler_;
 
@@ -813,6 +814,18 @@ class ImpalaServer : public ImpalaServiceIf,
   /// Expire 'crs' and cancel it with status 'status'.
   void ExpireQuery(ClientRequestState* crs, const Status& status);
 
+  typedef boost::unordered_map<std::string, boost::unordered_set<std::string>>
+      AuthorizedProxyMap;
+  /// Populates authorized proxy config into the given map.
+  /// For example:
+  /// - authorized_proxy_config: foo=abc,def;bar=ghi
+  /// - authorized_proxy_config_delimiter: ,
+  /// - authorized_proxy_map: {foo:[abc, def], bar:[ghi]}
+  static Status PopulateAuthorizedProxyConfig(
+      const std::string& authorized_proxy_config,
+      const std::string& authorized_proxy_config_delimiter,
+      AuthorizedProxyMap* authorized_proxy_map);
+
   /// Guards query_log_ and query_log_index_
   boost::mutex query_log_lock_;
 
@@ -1013,12 +1026,14 @@ class ImpalaServer : public ImpalaServiceIf,
   /// update. Updated with each catalog topic heartbeat from the statestore.
   int64_t min_subscriber_catalog_topic_version_;
 
-  /// Map of short usernames of authorized proxy users to the set of user(s) they are
+  /// Map of short usernames of authorized proxy users to the set of users they are
   /// allowed to delegate to. Populated by parsing the --authorized_proxy_user_config
   /// flag.
-  typedef boost::unordered_map<std::string, boost::unordered_set<std::string>>
-      ProxyUserMap;
-  ProxyUserMap authorized_proxy_user_config_;
+  AuthorizedProxyMap authorized_proxy_user_config_;
+  /// Map of short usernames of authorized proxy users to the set of groups they are
+  /// allowed to delegate to. Populated by parsing the --authorized_proxy_group_config
+  /// flag.
+  AuthorizedProxyMap authorized_proxy_group_config_;
 
   /// Guards queries_by_timestamp_. See "Locking" in the class comment for lock
   /// acquisition order.
diff --git a/be/src/util/backend-gflag-util.cc b/be/src/util/backend-gflag-util.cc
index 02e1ed8..acbdc6f 100644
--- a/be/src/util/backend-gflag-util.cc
+++ b/be/src/util/backend-gflag-util.cc
@@ -45,6 +45,8 @@ DECLARE_string(authorization_policy_file);
 DECLARE_string(authorization_policy_provider_class);
 DECLARE_string(authorized_proxy_user_config);
 DECLARE_string(authorized_proxy_user_config_delimiter);
+DECLARE_string(authorized_proxy_group_config);
+DECLARE_string(authorized_proxy_group_config_delimiter);
 DECLARE_string(kudu_master_hosts);
 DECLARE_string(reserved_words_version);
 DECLARE_string(sentry_config);
@@ -90,6 +92,7 @@ Status GetThriftBackendGflags(JNIEnv* jni_env, jbyteArray* cfg_bytes) {
   }
   cfg.__set_max_filter_error_rate(FLAGS_max_filter_error_rate);
   cfg.__set_min_buffer_size(FLAGS_min_buffer_size);
+  cfg.__set_authorized_proxy_group_config(FLAGS_authorized_proxy_group_config);
   RETURN_IF_ERROR(SerializeThriftMsg(jni_env, &cfg, cfg_bytes));
   return Status::OK();
 }
diff --git a/common/thrift/BackendGflags.thrift b/common/thrift/BackendGflags.thrift
index c98f50a..f645e3c 100644
--- a/common/thrift/BackendGflags.thrift
+++ b/common/thrift/BackendGflags.thrift
@@ -75,4 +75,6 @@ struct TBackendGflags {
   24: required i64 min_buffer_size
 
   25: required bool enable_orc_scanner
+
+  26: required string authorized_proxy_group_config
 }
diff --git a/common/thrift/Frontend.thrift b/common/thrift/Frontend.thrift
index f8ab05e..8d32b53 100644
--- a/common/thrift/Frontend.thrift
+++ b/common/thrift/Frontend.thrift
@@ -820,6 +820,16 @@ struct TGetAllHadoopConfigsResponse {
   1: optional map<string, string> configs;
 }
 
+struct TGetHadoopGroupsRequest {
+  // The user name to get the groups for.
+  1: required string user
+}
+
+struct TGetHadoopGroupsResponse {
+  // The list of groups that the user belongs to.
+  1: required list<string> groups
+}
+
 // For creating a test descriptor table. The tuples and their memory layout are computed
 // in the FE.
 struct TBuildTestDescriptorTableParams {
diff --git a/fe/src/main/java/org/apache/impala/service/BackendConfig.java b/fe/src/main/java/org/apache/impala/service/BackendConfig.java
index a94f46e..9dd3f94 100644
--- a/fe/src/main/java/org/apache/impala/service/BackendConfig.java
+++ b/fe/src/main/java/org/apache/impala/service/BackendConfig.java
@@ -83,6 +83,10 @@ public class BackendConfig {
 
   public long getMinBufferSize() { return backendCfg_.min_buffer_size; }
 
+  public boolean isAuthorizedProxyGroupEnabled() {
+    return !Strings.isNullOrEmpty(backendCfg_.authorized_proxy_group_config);
+  }
+
   // Inits the auth_to_local configuration in the static KerberosName class.
   private static void initAuthToLocal() {
     // If auth_to_local is enabled, we read the configuration hadoop.security.auth_to_local
diff --git a/fe/src/main/java/org/apache/impala/service/JniFrontend.java b/fe/src/main/java/org/apache/impala/service/JniFrontend.java
index ea3b358..a343182 100644
--- a/fe/src/main/java/org/apache/impala/service/JniFrontend.java
+++ b/fe/src/main/java/org/apache/impala/service/JniFrontend.java
@@ -19,14 +19,15 @@ package org.apache.impala.service;
 
 import java.io.File;
 import java.io.IOException;
-import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Enumeration;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
+import com.google.common.annotations.VisibleForTesting;
 import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
 import org.apache.hadoop.fs.CommonConfigurationKeysPublic;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
@@ -34,6 +35,11 @@ import org.apache.hadoop.fs.s3a.S3AFileSystem;
 import org.apache.hadoop.fs.adl.AdlFileSystem;
 import org.apache.hadoop.hdfs.DFSConfigKeys;
 import org.apache.hadoop.hdfs.DistributedFileSystem;
+import org.apache.hadoop.security.Groups;
+import org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback;
+import org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMappingWithFallback;
+import org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
+import org.apache.hadoop.security.ShellBasedUnixGroupsNetgroupMapping;
 import org.apache.impala.analysis.DescriptorTable;
 import org.apache.impala.analysis.ToSqlUtils;
 import org.apache.impala.authorization.AuthorizationConfig;
@@ -69,6 +75,8 @@ import org.apache.impala.thrift.TGetFunctionsParams;
 import org.apache.impala.thrift.TGetFunctionsResult;
 import org.apache.impala.thrift.TGetHadoopConfigRequest;
 import org.apache.impala.thrift.TGetHadoopConfigResponse;
+import org.apache.impala.thrift.TGetHadoopGroupsRequest;
+import org.apache.impala.thrift.TGetHadoopGroupsResponse;
 import org.apache.impala.thrift.TGetTablesParams;
 import org.apache.impala.thrift.TGetTablesResult;
 import org.apache.impala.thrift.TLoadDataReq;
@@ -569,6 +577,7 @@ public class JniFrontend {
 
   // Caching this saves ~50ms per call to getHadoopConfigAsHtml
   private static final Configuration CONF = new Configuration();
+  private static final Groups GROUPS = Groups.getUserToGroupsMappingService(CONF);
 
   /**
    * Returns a string of all loaded Hadoop configuration parameters as a table of keys
@@ -608,14 +617,68 @@ public class JniFrontend {
   }
 
   /**
+   * Returns the list of Hadoop groups for the given user name.
+   */
+  public byte[] getHadoopGroups(byte[] serializedRequest) throws ImpalaException {
+    TGetHadoopGroupsRequest request = new TGetHadoopGroupsRequest();
+    JniUtil.deserializeThrift(protocolFactory_, request, serializedRequest);
+    TGetHadoopGroupsResponse result = new TGetHadoopGroupsResponse();
+    try {
+      result.setGroups(GROUPS.getGroups(request.getUser()));
+    } catch (IOException e) {
+      // HACK: https://issues.apache.org/jira/browse/HADOOP-15505
+      // There is no easy way to know whether a user simply has no groups
+      // other than by reading the exception message.
+      if (e.getMessage().startsWith("No groups found for user")) {
+        result.setGroups(Collections.<String>emptyList());
+      } else {
+        LOG.error("Error getting Hadoop groups for user: " + request.getUser(), e);
+        throw new InternalException(e.getMessage());
+      }
+    }
+    TSerializer serializer = new TSerializer(protocolFactory_);
+    try {
+      return serializer.serialize(result);
+    } catch (TException e) {
+      throw new InternalException(e.getMessage());
+    }
+  }
+
+  /**
+   * Returns an error string describing a configuration issue with the groups mapping
+   * provider implementation.
+   */
+  @VisibleForTesting
+  protected static String checkGroupsMappingProvider(Configuration conf) {
+    String provider = conf.get(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING);
+    // Shell-based groups mapping providers fork a new process for each call.
+    // This can cause issues such as zombie processes, running out of file descriptors,
+    // etc.
+    if (ShellBasedUnixGroupsNetgroupMapping.class.getName().equals(provider)) {
+      return String.format("Hadoop groups mapping provider: %s is " +
+          "known to be problematic. Consider using: %s instead.",
+          provider, JniBasedUnixGroupsNetgroupMappingWithFallback.class.getName());
+    }
+    if (ShellBasedUnixGroupsMapping.class.getName().equals(provider)) {
+      return String.format("Hadoop groups mapping provider: %s is " +
+          "known to be problematic. Consider using: %s instead.",
+          provider, JniBasedUnixGroupsMappingWithFallback.class.getName());
+    }
+    return "";
+  }
+
+  /**
    * Returns an error string describing all configuration issues. If no config issues are
    * found, returns an empty string.
    */
-  public String checkConfiguration() {
+  public String checkConfiguration() throws ImpalaException {
     StringBuilder output = new StringBuilder();
     output.append(checkLogFilePermission());
     output.append(checkFileSystem(CONF));
     output.append(checkShortCircuitRead(CONF));
+    if (BackendConfig.INSTANCE.isAuthorizedProxyGroupEnabled()) {
+      output.append(checkGroupsMappingProvider(CONF));
+    }
     return output.toString();
   }
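
The IOException handling in getHadoopGroups above hinges on matching the
exception text (the HADOOP-15505 workaround). The same fallback pattern,
reduced to a Python sketch with a hypothetical groups_service:

    def get_hadoop_groups(groups_service, user):
        try:
            return groups_service.get_groups(user)
        except IOError as e:
            # HADOOP-15505: 'user has no groups' is only distinguishable
            # from a genuine failure by inspecting the message text.
            if str(e).startswith('No groups found for user'):
                return []
            raise
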
 
diff --git a/fe/src/test/java/org/apache/impala/service/JniFrontendTest.java b/fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
new file mode 100644
index 0000000..b4f8dd8
--- /dev/null
+++ b/fe/src/test/java/org/apache/impala/service/JniFrontendTest.java
@@ -0,0 +1,58 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.service;
+
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.fs.CommonConfigurationKeys;
+import org.apache.hadoop.security.JniBasedUnixGroupsMappingWithFallback;
+import org.apache.hadoop.security.JniBasedUnixGroupsNetgroupMappingWithFallback;
+import org.apache.hadoop.security.ShellBasedUnixGroupsMapping;
+import org.apache.hadoop.security.ShellBasedUnixGroupsNetgroupMapping;
+import org.apache.impala.common.ImpalaException;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class JniFrontendTest {
+  @Test
+  public void testCheckGroupsMappingProvider() throws ImpalaException {
+    Configuration conf = new Configuration();
+    conf.set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
+        JniBasedUnixGroupsMappingWithFallback.class.getName());
+    assertTrue(JniFrontend.checkGroupsMappingProvider(conf).isEmpty());
+
+    conf = new Configuration();
+    conf.set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
+        ShellBasedUnixGroupsMapping.class.getName());
+    assertEquals(JniFrontend.checkGroupsMappingProvider(conf),
+        String.format("Hadoop groups mapping provider: %s is known to be problematic. " +
+            "Consider using: %s instead.",
+            ShellBasedUnixGroupsMapping.class.getName(),
+            JniBasedUnixGroupsMappingWithFallback.class.getName()));
+
+    conf = new Configuration();
+    conf.set(CommonConfigurationKeys.HADOOP_SECURITY_GROUP_MAPPING,
+        ShellBasedUnixGroupsNetgroupMapping.class.getName());
+    assertEquals(JniFrontend.checkGroupsMappingProvider(conf),
+        String.format("Hadoop groups mapping provider: %s is known to be problematic. " +
+            "Consider using: %s instead.",
+            ShellBasedUnixGroupsNetgroupMapping.class.getName(),
+            JniBasedUnixGroupsNetgroupMappingWithFallback.class.getName()));
+  }
+}
\ No newline at end of file
diff --git a/tests/authorization/test_authorization.py b/tests/authorization/test_authorization.py
index 0ae0c76..5b4bccd 100644
--- a/tests/authorization/test_authorization.py
+++ b/tests/authorization/test_authorization.py
@@ -22,6 +22,7 @@ import pytest
 import shutil
 import tempfile
 import json
+import grp
 from time import sleep, time
 from getpass import getuser
 from ImpalaService import ImpalaHiveServer2Service
@@ -35,6 +36,9 @@ from tests.util.filesystem_utils import WAREHOUSE
 
 AUTH_POLICY_FILE = "%s/authz-policy.ini" % WAREHOUSE
 
+def get_groups():
+  return [grp.getgrgid(group).gr_name for group in os.getgroups()]
+
 class TestAuthorization(CustomClusterTestSuite):
   AUDIT_LOG_DIR = tempfile.mkdtemp(dir=os.getenv('LOG_DIR'))
 
@@ -174,10 +178,42 @@ class TestAuthorization(CustomClusterTestSuite):
   @pytest.mark.execute_serially
   @CustomClusterTestSuite.with_args("--server_name=server1\
       --authorization_policy_file=%s\
-      --authorized_proxy_user_config=hue=%s\
+      --authorized_proxy_user_config=foo=bar;hue=%s\
       --abort_on_failed_audit_event=false\
       --audit_event_log_dir=%s" % (AUTH_POLICY_FILE, getuser(), AUDIT_LOG_DIR))
-  def test_impersonation(self):
+  def test_user_impersonation(self):
+    """End-to-end user impersonation + authorization test"""
+    self.__test_impersonation()
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args("--server_name=server1\
+        --authorization_policy_file=%s\
+        --authorized_proxy_user_config=hue=bar\
+        --authorized_proxy_group_config=foo=bar;hue=%s\
+        --abort_on_failed_audit_event=false\
+        --audit_event_log_dir=%s" % (AUTH_POLICY_FILE,
+                                     ','.join(get_groups()),
+                                     AUDIT_LOG_DIR))
+  def test_group_impersonation(self):
+    """End-to-end group impersonation + authorization test"""
+    self.__test_impersonation()
+
+  @pytest.mark.execute_serially
+  @CustomClusterTestSuite.with_args("--server_name=server1\
+        --authorization_policy_file=%s\
+        --authorized_proxy_user_config=foo=bar\
+        --authorized_proxy_group_config=foo=bar\
+        --abort_on_failed_audit_event=false\
+        --audit_event_log_dir=%s" % (AUTH_POLICY_FILE, AUDIT_LOG_DIR))
+  def test_no_matching_user_and_group_impersonation(self):
+    open_session_req = TCLIService.TOpenSessionReq()
+    open_session_req.username = 'hue'
+    open_session_req.configuration = dict()
+    open_session_req.configuration['impala.doas.user'] = 'abc'
+    resp = self.hs2_client.OpenSession(open_session_req)
+    assert 'User \'hue\' is not authorized to delegate to \'abc\'' in str(resp)
+
+  def __test_impersonation(self):
     """End-to-end impersonation + authorization test. Expects authorization to be
     configured before running this test"""
     # TODO: To reuse the HS2 utility code from the TestHS2 test suite we need to import
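
For a quick end-to-end feel of the feature these tests exercise, here is a
usage example built on the two Python sketches shown after the earlier diffs
(all names illustrative):

    user_map = populate_authorized_proxy_config('hue=alice;oozie=*', ',')
    group_map = populate_authorized_proxy_config('hue=analysts', ',')
    groups_of = lambda user: ['analysts'] if user == 'carol' else []

    assert authorize_proxy_user('hue', 'alice', user_map, group_map, groups_of)
    assert authorize_proxy_user('hue', 'carol', user_map, group_map, groups_of)
    assert not authorize_proxy_user('hue', 'mallory', user_map, group_map, groups_of)
    assert authorize_proxy_user('oozie', 'anyone', user_map, group_map, groups_of)
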


[impala] 07/11: IMPALA-7128 (part 2): add an interface for data sources

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 85906dd6ac141c1015d1abebef1228c9b8b66d90
Author: Todd Lipcon <to...@cloudera.com>
AuthorDate: Wed Jun 6 10:30:34 2018 -0700

    IMPALA-7128 (part 2): add an interface for data sources
    
    This changes most of the usage of DataSource and DataSourceTable to use
    interfaces instead of implementation classes. There are still various
    usages of the implementation for functionality like creating and
    dropping data sources. We'll address those as part of IMPALA-7131 at a
    later date.
    
    Change-Id: Ibe704197dc2ad7c09b8340865f17567096aa630e
    Reviewed-on: http://gerrit.cloudera.org:8080/10626
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 .../org/apache/impala/analysis/AlterTableStmt.java |  3 +-
 .../java/org/apache/impala/analysis/Analyzer.java  |  3 +-
 .../impala/analysis/CreateTableDataSrcStmt.java    |  4 +--
 .../org/apache/impala/analysis/PrivilegeSpec.java  |  4 +--
 .../java/org/apache/impala/catalog/DataSource.java |  8 ++---
 .../org/apache/impala/catalog/DataSourceTable.java | 18 ++++++----
 .../java/org/apache/impala/catalog/FeCatalog.java  |  5 ++-
 .../org/apache/impala/catalog/FeDataSource.java    | 31 +++++++++++++++++
 .../apache/impala/catalog/FeDataSourceTable.java   | 39 ++++++++++++++++++++++
 .../apache/impala/planner/DataSourceScanNode.java  |  6 ++--
 .../apache/impala/planner/SingleNodePlanner.java   |  4 +--
 .../apache/impala/service/CatalogOpExecutor.java   |  2 ++
 .../java/org/apache/impala/service/Frontend.java   | 10 +++---
 .../org/apache/impala/service/JniFrontend.java     |  5 +--
 14 files changed, 110 insertions(+), 32 deletions(-)

diff --git a/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java b/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
index eda260f..a173975 100644
--- a/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/AlterTableStmt.java
@@ -21,6 +21,7 @@ import java.util.List;
 
 import org.apache.impala.authorization.Privilege;
 import org.apache.impala.catalog.DataSourceTable;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.catalog.FeTable;
 import org.apache.impala.common.AnalysisException;
 import org.apache.impala.thrift.TAlterTableParams;
@@ -90,7 +91,7 @@ public abstract class AlterTableStmt extends StatementBase {
     }
     Preconditions.checkState(tableRef instanceof BaseTableRef);
     table_ = tableRef.getTable();
-    if (table_ instanceof DataSourceTable
+    if (table_ instanceof FeDataSourceTable
         && !(this instanceof AlterTableSetColumnStats)) {
       throw new AnalysisException(String.format(
           "ALTER TABLE not allowed on a table produced by a data source: %s",
diff --git a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
index f0f7334..67b9f59 100644
--- a/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
+++ b/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
@@ -40,6 +40,7 @@ import org.apache.impala.catalog.Column;
 import org.apache.impala.catalog.DataSourceTable;
 import org.apache.impala.catalog.DatabaseNotFoundException;
 import org.apache.impala.catalog.FeCatalog;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.catalog.FeDb;
 import org.apache.impala.catalog.FeFsTable;
 import org.apache.impala.catalog.FeTable;
@@ -589,7 +590,7 @@ public class Analyzer {
       Preconditions.checkState(table instanceof FeFsTable ||
           table instanceof KuduTable ||
           table instanceof HBaseTable ||
-          table instanceof DataSourceTable);
+          table instanceof FeDataSourceTable);
       return new BaseTableRef(tableRef, resolvedPath);
     } else {
       return new CollectionTableRef(tableRef, resolvedPath);
diff --git a/fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java b/fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
index 1df8280..00fd4b6 100644
--- a/fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
+++ b/fe/src/main/java/org/apache/impala/analysis/CreateTableDataSrcStmt.java
@@ -24,8 +24,8 @@ import static org.apache.impala.catalog.DataSourceTable.TBL_PROP_INIT_STRING;
 import static org.apache.impala.catalog.DataSourceTable.TBL_PROP_LOCATION;
 
 import org.apache.impala.authorization.Privilege;
-import org.apache.impala.catalog.DataSource;
 import org.apache.impala.catalog.DataSourceTable;
+import org.apache.impala.catalog.FeDataSource;
 import org.apache.impala.common.AnalysisException;
 import com.google.common.base.Preconditions;
 import com.google.common.base.Strings;
@@ -51,7 +51,7 @@ public class CreateTableDataSrcStmt extends CreateTableStmt {
   public void analyze(Analyzer analyzer) throws AnalysisException {
     super.analyze(analyzer);
     String dataSourceName = getTblProperties().get(TBL_PROP_DATA_SRC_NAME);
-    DataSource dataSource = analyzer.getCatalog().getDataSource(dataSourceName);
+    FeDataSource dataSource = analyzer.getCatalog().getDataSource(dataSourceName);
     if (dataSource == null) {
       throw new AnalysisException("Data source does not exist: " + dataSourceName);
     }
diff --git a/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java b/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
index 610ffa9..fb75398 100644
--- a/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
+++ b/fe/src/main/java/org/apache/impala/analysis/PrivilegeSpec.java
@@ -20,7 +20,7 @@ package org.apache.impala.analysis;
 import java.util.List;
 
 import org.apache.impala.authorization.Privilege;
-import org.apache.impala.catalog.DataSourceTable;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.catalog.FeTable;
 import org.apache.impala.catalog.FeView;
 import org.apache.impala.catalog.RolePrivilege;
@@ -242,7 +242,7 @@ public class PrivilegeSpec implements ParseNode {
       throw new AnalysisException("Column-level privileges on views are not " +
           "supported.");
     }
-    if (table instanceof DataSourceTable) {
+    if (table instanceof FeDataSourceTable) {
       throw new AnalysisException("Column-level privileges on external data " +
           "source tables are not supported.");
     }
diff --git a/fe/src/main/java/org/apache/impala/catalog/DataSource.java b/fe/src/main/java/org/apache/impala/catalog/DataSource.java
index f59f3be..527b187 100644
--- a/fe/src/main/java/org/apache/impala/catalog/DataSource.java
+++ b/fe/src/main/java/org/apache/impala/catalog/DataSource.java
@@ -17,8 +17,6 @@
 
 package org.apache.impala.catalog;
 
-import org.apache.hadoop.fs.Path;
-
 import org.apache.impala.thrift.TCatalogObject;
 import org.apache.impala.thrift.TCatalogObjectType;
 import org.apache.impala.thrift.TDataSource;
@@ -28,7 +26,7 @@ import com.google.common.base.Objects;
  * Represents a data source in the catalog. Contains the data source name and all
  * information needed to locate and load the data source.
  */
-public class DataSource extends CatalogObjectImpl {
+public class DataSource extends CatalogObjectImpl implements FeDataSource {
   private final String dataSrcName_;
   private final String className_;
   private final String apiVersionString_;
@@ -57,9 +55,11 @@ public class DataSource extends CatalogObjectImpl {
   public String getName() { return dataSrcName_; }
   @Override
   public String getUniqueName() { return "DATA_SOURCE:" + dataSrcName_.toLowerCase(); }
-
+  @Override // FeDataSource
   public String getLocation() { return location_; }
+  @Override // FeDataSource
   public String getClassName() { return className_; }
+  @Override // FeDataSource
   public String getApiVersion() { return apiVersionString_; }
 
   public TDataSource toThrift() {
diff --git a/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java b/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
index e9e26b6..3ed5b54 100644
--- a/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
+++ b/fe/src/main/java/org/apache/impala/catalog/DataSourceTable.java
@@ -39,14 +39,14 @@ import org.apache.impala.util.TResultRowBuilder;
 import com.google.common.base.Preconditions;
 
 /**
- * Represents a table backed by an external data source. All data source properties are
- * stored as table properties (persisted in the metastore) because the DataSource catalog
- * object is not persisted so the DataSource catalog object will not exist if the catalog
- * server is restarted, but the table does not need the DataSource catalog object in
- * order to scan the table. Tables that contain the TBL_PROP_DATA_SRC_NAME table
- * parameter are assumed to be backed by an external data source.
+ * All data source properties are stored as table properties (persisted in the
+ * metastore) because the DataSource catalog object is not persisted so the
+ * DataSource catalog object will not exist if the catalog server is restarted,
+ * but the table does not need the DataSource catalog object in order to scan
+ * the table. Tables that contain the TBL_PROP_DATA_SRC_NAME table parameter are
+ * assumed to be backed by an external data source.
  */
-public class DataSourceTable extends Table {
+public class DataSourceTable extends Table implements FeDataSourceTable {
   private final static Logger LOG = LoggerFactory.getLogger(DataSourceTable.class);
 
   /**
@@ -85,13 +85,16 @@ public class DataSourceTable extends Table {
   /**
   * Gets the data source.
    */
+  @Override // FeDataSourceTable
   public TDataSource getDataSource() { return dataSource_; }
 
   /**
    * Gets the table init string passed to the data source.
    */
+  @Override // FeDataSourceTable
   public String getInitString() { return initString_; }
 
+  @Override // FeDataSourceTable
   public int getNumNodes() { return 1; }
 
   @Override
@@ -210,6 +213,7 @@ public class DataSourceTable extends Table {
    * SHOW TABLE STATS statement. The schema of the returned TResultSet is set
    * inside this method.
    */
+  @Override // FeDataSourceTable
   public TResultSet getTableStats() {
     TResultSet result = new TResultSet();
     TResultSetMetadata resultSchema = new TResultSetMetadata();
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java b/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java
index 57aa6f2..c98f6fc 100644
--- a/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java
+++ b/fe/src/main/java/org/apache/impala/catalog/FeCatalog.java
@@ -56,11 +56,10 @@ public interface FeCatalog {
       List<TPartitionKeyValue> partition_spec) throws CatalogException;
 
   /** @see Catalog#getDataSources(PatternMatcher) */
-  List<DataSource> getDataSources(PatternMatcher createHivePatternMatcher);
+  List<? extends FeDataSource> getDataSources(PatternMatcher createHivePatternMatcher);
 
   /** @see Catalog#getDataSource(String) */
-  // TODO(todd): introduce FeDataSource
-  public DataSource getDataSource(String dataSourceName);
+  public FeDataSource getDataSource(String dataSourceName);
 
   /** @see Catalog#getFunction(Function, Function.CompareMode) */
   // TODO(todd): introduce FeFunction
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeDataSource.java b/fe/src/main/java/org/apache/impala/catalog/FeDataSource.java
new file mode 100644
index 0000000..6c30cca
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/catalog/FeDataSource.java
@@ -0,0 +1,31 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.catalog;
+
+/**
+ * Frontend interface for interacting with data sources.
+ */
+public interface FeDataSource {
+  String getName();
+
+  String getLocation();
+
+  String getClassName();
+
+  String getApiVersion();
+}
diff --git a/fe/src/main/java/org/apache/impala/catalog/FeDataSourceTable.java b/fe/src/main/java/org/apache/impala/catalog/FeDataSourceTable.java
new file mode 100644
index 0000000..a76f38d
--- /dev/null
+++ b/fe/src/main/java/org/apache/impala/catalog/FeDataSourceTable.java
@@ -0,0 +1,39 @@
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements.  See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership.  The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License.  You may obtain a copy of the License at
+//
+//   http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied.  See the License for the
+// specific language governing permissions and limitations
+// under the License.
+
+package org.apache.impala.catalog;
+
+import org.apache.impala.thrift.TDataSource;
+import org.apache.impala.thrift.TResultSet;
+
+/**
+ * Represents a table backed by an external data source.
+ */
+public interface FeDataSourceTable extends FeTable {
+
+  TDataSource getDataSource();
+
+  String getInitString();
+
+  int getNumNodes();
+
+  // TODO(todd): it seems like all FeTables implement this, perhaps
+  // this should just be a method on FeTable and simplify the code
+  // in Frontend.getTableStats?
+  TResultSet getTableStats();
+
+}
diff --git a/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java b/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
index 9175fbb..a41630b 100644
--- a/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
+++ b/fe/src/main/java/org/apache/impala/planner/DataSourceScanNode.java
@@ -33,7 +33,7 @@ import org.apache.impala.analysis.SlotRef;
 import org.apache.impala.analysis.StringLiteral;
 import org.apache.impala.analysis.TupleDescriptor;
 import org.apache.impala.catalog.DataSource;
-import org.apache.impala.catalog.DataSourceTable;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.common.ImpalaException;
 import org.apache.impala.common.InternalException;
 import org.apache.impala.extdatasource.ExternalDataSourceExecutor;
@@ -69,7 +69,7 @@ import com.google.common.collect.Lists;
 public class DataSourceScanNode extends ScanNode {
   private final static Logger LOG = LoggerFactory.getLogger(DataSourceScanNode.class);
   private final TupleDescriptor desc_;
-  private final DataSourceTable table_;
+  private final FeDataSourceTable table_;
 
   // The converted conjuncts_ that were accepted by the data source. A conjunct can
   // be converted if it contains only disjunctive predicates of the form
@@ -87,7 +87,7 @@ public class DataSourceScanNode extends ScanNode {
   public DataSourceScanNode(PlanNodeId id, TupleDescriptor desc, List<Expr> conjuncts) {
     super(id, desc, "SCAN DATA SOURCE");
     desc_ = desc;
-    table_ = (DataSourceTable) desc_.getTable();
+    table_ = (FeDataSourceTable) desc_.getTable();
     conjuncts_ = conjuncts;
     acceptedPredicates_ = null;
     acceptedConjuncts_ = null;
diff --git a/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java b/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
index 3a1c956..331c910 100644
--- a/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
+++ b/fe/src/main/java/org/apache/impala/planner/SingleNodePlanner.java
@@ -53,7 +53,7 @@ import org.apache.impala.analysis.TupleIsNullPredicate;
 import org.apache.impala.analysis.UnionStmt;
 import org.apache.impala.analysis.UnionStmt.UnionOperand;
 import org.apache.impala.catalog.ColumnStats;
-import org.apache.impala.catalog.DataSourceTable;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.catalog.FeFsPartition;
 import org.apache.impala.catalog.FeFsTable;
 import org.apache.impala.catalog.FeTable;
@@ -1305,7 +1305,7 @@ public class SingleNodePlanner {
     FeTable table = tblRef.getTable();
     if (table instanceof FeFsTable) {
       return createHdfsScanPlan(tblRef, aggInfo, conjuncts, analyzer);
-    } else if (table instanceof DataSourceTable) {
+    } else if (table instanceof FeDataSourceTable) {
       scanNode = new DataSourceScanNode(ctx_.getNextNodeId(), tblRef.getDesc(),
           conjuncts);
       scanNode.init(analyzer);
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index 80ebd19..2d83bab 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -1113,6 +1113,7 @@ public class CatalogOpExecutor {
 
   private void createDataSource(TCreateDataSourceParams params, TDdlExecResponse resp)
       throws ImpalaException {
+    // TODO(IMPALA-7131): support data sources with LocalCatalog.
     if (LOG.isTraceEnabled()) { LOG.trace("Adding DATA SOURCE: " + params.toString()); }
     DataSource dataSource = DataSource.fromThrift(params.getData_source());
     DataSource existingDataSource = catalog_.getDataSource(dataSource.getName());
@@ -1134,6 +1135,7 @@ public class CatalogOpExecutor {
 
   private void dropDataSource(TDropDataSourceParams params, TDdlExecResponse resp)
       throws ImpalaException {
+    // TODO(IMPALA-7131): support data sources with LocalCatalog.
     if (LOG.isTraceEnabled()) LOG.trace("Drop DATA SOURCE: " + params.toString());
     DataSource dataSource = catalog_.removeDataSource(params.getData_source());
     if (dataSource == null) {
diff --git a/fe/src/main/java/org/apache/impala/service/Frontend.java b/fe/src/main/java/org/apache/impala/service/Frontend.java
index 54a9473..ad51286 100644
--- a/fe/src/main/java/org/apache/impala/service/Frontend.java
+++ b/fe/src/main/java/org/apache/impala/service/Frontend.java
@@ -75,10 +75,10 @@ import org.apache.impala.authorization.User;
 import org.apache.impala.catalog.Catalog;
 import org.apache.impala.catalog.CatalogException;
 import org.apache.impala.catalog.Column;
-import org.apache.impala.catalog.DataSource;
-import org.apache.impala.catalog.DataSourceTable;
 import org.apache.impala.catalog.DatabaseNotFoundException;
 import org.apache.impala.catalog.Db;
+import org.apache.impala.catalog.FeDataSource;
+import org.apache.impala.catalog.FeDataSourceTable;
 import org.apache.impala.catalog.FeDb;
 import org.apache.impala.catalog.FeFsTable;
 import org.apache.impala.catalog.Function;
@@ -691,7 +691,7 @@ public class Frontend {
    * Returns all data sources that match the pattern. If pattern is null,
    * matches all data sources.
    */
-  public List<DataSource> getDataSrcs(String pattern) {
+  public List<? extends FeDataSource> getDataSrcs(String pattern) {
     return impaladCatalog_.get().getDataSources(
         PatternMatcher.createHivePatternMatcher(pattern));
   }
@@ -734,8 +734,8 @@ public class Frontend {
       return ((FeFsTable) table).getTableStats();
     } else if (table instanceof HBaseTable) {
       return ((HBaseTable) table).getTableStats();
-    } else if (table instanceof DataSourceTable) {
-      return ((DataSourceTable) table).getTableStats();
+    } else if (table instanceof FeDataSourceTable) {
+      return ((FeDataSourceTable) table).getTableStats();
     } else if (table instanceof KuduTable) {
       if (op == TShowStatsOp.RANGE_PARTITIONS) {
         return ((KuduTable) table).getRangePartitions();
diff --git a/fe/src/main/java/org/apache/impala/service/JniFrontend.java b/fe/src/main/java/org/apache/impala/service/JniFrontend.java
index a343182..d8ef912 100644
--- a/fe/src/main/java/org/apache/impala/service/JniFrontend.java
+++ b/fe/src/main/java/org/apache/impala/service/JniFrontend.java
@@ -46,6 +46,7 @@ import org.apache.impala.authorization.AuthorizationConfig;
 import org.apache.impala.authorization.ImpalaInternalAdminUser;
 import org.apache.impala.authorization.User;
 import org.apache.impala.catalog.DataSource;
+import org.apache.impala.catalog.FeDataSource;
 import org.apache.impala.catalog.FeDb;
 import org.apache.impala.catalog.Function;
 import org.apache.impala.catalog.Role;
@@ -319,12 +320,12 @@ public class JniFrontend {
     JniUtil.deserializeThrift(protocolFactory_, params, thriftParams);
 
     TGetDataSrcsResult result = new TGetDataSrcsResult();
-    List<DataSource> dataSources = frontend_.getDataSrcs(params.pattern);
+    List<? extends FeDataSource> dataSources = frontend_.getDataSrcs(params.pattern);
     result.setData_src_names(Lists.<String>newArrayListWithCapacity(dataSources.size()));
     result.setLocations(Lists.<String>newArrayListWithCapacity(dataSources.size()));
     result.setClass_names(Lists.<String>newArrayListWithCapacity(dataSources.size()));
     result.setApi_versions(Lists.<String>newArrayListWithCapacity(dataSources.size()));
-    for (DataSource dataSource: dataSources) {
+    for (FeDataSource dataSource: dataSources) {
       result.addToData_src_names(dataSource.getName());
       result.addToLocations(dataSource.getLocation());
       result.addToClass_names(dataSource.getClassName());
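
The shape of this refactor — call sites narrowed from a concrete catalog class
to a read-only interface so that a second implementation can slot in later
(IMPALA-7131's LocalCatalog) — in a minimal Python analogue (illustrative, not
Impala code):

    from abc import ABC, abstractmethod

    class FeDataSource(ABC):          # the view planners and analysis depend on
        @abstractmethod
        def get_name(self): ...
        @abstractmethod
        def get_location(self): ...

    class DataSource(FeDataSource):   # catalogd-backed implementation
        def __init__(self, name, location):
            self._name, self._location = name, location
        def get_name(self): return self._name
        def get_location(self): return self._location

    def describe(ds):                 # accepts any FeDataSource
        return '%s @ %s' % (ds.get_name(), ds.get_location())
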


[impala] 10/11: IMPALA-5604: document DISABLE_CODEGEN_ROWS_THRESHOLD

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit fb8332ccd0fe6b44f0876df4fb78f0bea0322357
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Thu Jun 14 16:13:58 2018 -0700

    IMPALA-5604: document DISABLE_CODEGEN_ROWS_THRESHOLD
    
    Also fix a couple of nits in EXEC_SINGLE_NODE_ROWS_THRESHOLD.
    
    Change-Id: I709cd55e3869888feb645f85e61a99901d41d479
    Reviewed-on: http://gerrit.cloudera.org:8080/10727
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/impala.ditamap                                |  1 +
 docs/impala_keydefs.ditamap                        |  1 +
 .../impala_disable_codegen_rows_threshold.xml      | 97 ++++++++++++++++++++++
 .../impala_exec_single_node_rows_threshold.xml     | 11 +--
 4 files changed, 105 insertions(+), 5 deletions(-)

diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index b91ad74..a730559 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -185,6 +185,7 @@ under the License.
           <topicref rev="2.10.0 IMPALA-3200" href="topics/impala_default_spillable_buffer_size.xml"/>
           <topicref audience="hidden" href="topics/impala_disable_cached_reads.xml"/>
           <topicref href="topics/impala_disable_codegen.xml"/>
+          <topicref rev="2.10.0 IMPALA-5483" href="topics/impala_disable_codegen_rows_threshold.xml"/>
           <topicref audience="hidden" href="topics/impala_disable_outermost_topn.xml"/>
           <topicref rev="2.5.0" href="topics/impala_disable_row_runtime_filtering.xml"/>
           <topicref rev="2.5.0" href="topics/impala_disable_streaming_preaggregations.xml"/>
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 200bf79..8340382 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10784,6 +10784,7 @@ under the License.
   <keydef rev="2.10.0 IMPALA-3200" href="topics/impala_default_spillable_buffer_size.xml" keys="default_spillable_buffer_size"/>
   <keydef href="topics/impala_disable_cached_reads.xml" keys="disable_cached_reads"/>
   <keydef href="topics/impala_disable_codegen.xml" keys="disable_codegen"/>
+  <keydef href="topics/impala_disable_codegen_rows_threshold.xml" keys="disable_codegen_rows_threshold"/>
   <keydef href="topics/impala_disable_outermost_topn.xml" keys="disable_outermost_topn"/>
   <keydef href="topics/impala_disable_row_runtime_filtering.xml" keys="disable_row_runtime_filtering"/>
   <keydef href="topics/impala_disable_streaming_preaggregations.xml" keys="disable_streaming_preaggregations"/>
diff --git a/docs/topics/impala_disable_codegen_rows_threshold.xml b/docs/topics/impala_disable_codegen_rows_threshold.xml
new file mode 100644
index 0000000..b16a691
--- /dev/null
+++ b/docs/topics/impala_disable_codegen_rows_threshold.xml
@@ -0,0 +1,97 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept rev="2.0.0" id="exec_single_node_rows_threshold">
+
+  <title>DISABLE_CODEGEN_ROWS_THRESHOLD Query Option (<keyword keyref="impala210_full"/> or higher only)</title>
+  <titlealts audience="PDF"><navtitle>DISABLE_CODEGEN_ROWS_THRESHOLD</navtitle></titlealts>
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+      <data name="Category" value="Scalability"/>
+      <data name="Category" value="Performance"/>
+      <data name="Category" value="Developers"/>
+      <data name="Category" value="Data Analysts"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p rev="2.0.0">
+      <indexterm audience="hidden">DISABLE_CODEGEN_ROWS_THRESHOLD query option</indexterm>
+      This setting controls the cutoff point (in terms of number of rows processed per Impala daemon) below which
+      Impala disables native code generation for the whole query.
+
+      Native code generation is very beneficial for queries that process many rows because
+      it reduces the time taken to process each row. However, generating the native code
+      adds latency to query startup. Therefore, automatically disabling codegen for
+      queries that process relatively small amounts of data can improve query response time.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
+
+<codeblock>SET DISABLE_CODEGEN_ROWS_THRESHOLD=<varname>number_of_rows</varname></codeblock>
+
+    <p>
+      <b>Type:</b> numeric
+    </p>
+
+    <p>
+      <b>Default:</b> 50000
+    </p>
+
+    <p>
+      <b>Usage notes:</b> Typically, you increase the default value to make this optimization apply to more queries.
+      If incorrect or corrupted table and column statistics cause Impala to apply this optimization incorrectly to
+      queries that actually involve substantial work, you might see the queries being slower as a result of codegen
+      being disabled. In that case, recompute statistics with the <codeph>COMPUTE STATS</codeph> or
+      <codeph>COMPUTE INCREMENTAL STATS</codeph> statement. If there is a problem collecting accurate statistics,
+      you can turn this feature off by setting the value to 0.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/internals_blurb"/>
+
+    <p>
+      This setting applies to queries where the number of rows processed can be accurately
+      determined, either through table and column statistics, or by the presence of a
+      <codeph>LIMIT</codeph> clause. If Impala cannot accurately estimate the number of rows,
+      then this setting does not apply.
+    </p>
+
+    <p rev="2.3.0">
+      If a query uses the complex data types <codeph>STRUCT</codeph>, <codeph>ARRAY</codeph>,
+      or <codeph>MAP</codeph>, then codegen is never automatically disabled regardless of the
+      <codeph>DISABLE_CODEGEN_ROWS_THRESHOLD</codeph> setting.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/added_in_2100"/>
+
+<!-- Don't have any other places that tie into this particular optimization technique yet.
+Potentially: conceptual topics about code generation, distributed queries
+
+<p conref="../shared/impala_common.xml#common/related_info"/>
+<p>
+</p>
+-->
+
+  </conbody>
+
+</concept>
diff --git a/docs/topics/impala_exec_single_node_rows_threshold.xml b/docs/topics/impala_exec_single_node_rows_threshold.xml
index 4822712..62f7988 100644
--- a/docs/topics/impala_exec_single_node_rows_threshold.xml
+++ b/docs/topics/impala_exec_single_node_rows_threshold.xml
@@ -41,8 +41,8 @@ under the License.
       as a <q>small</q> query, turning off optimizations such as parallel execution and native code generation. The
       overhead for these optimizations is applicable for queries involving substantial amounts of data, but it
       makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries
-      allows Impala to complete them more quickly, keeping YARN resources, admission control slots, and so on
-      available for data-intensive queries.
+      allows Impala to complete them more quickly, keeping admission control slots, CPU, memory, and so on
+      available for resource-intensive queries.
     </p>
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
@@ -69,9 +69,10 @@ under the License.
     <p conref="../shared/impala_common.xml#common/internals_blurb"/>
 
     <p>
-      This setting applies to query fragments where the amount of data to scan can be accurately determined, either
-      through table and column statistics, or by the presence of a <codeph>LIMIT</codeph> clause. If Impala cannot
-      accurately estimate the size of the input data, this setting does not apply.
+      This setting applies to queries where the number of rows processed can be accurately
+      determined, either through table and column statistics, or by the presence of a
+      <codeph>LIMIT</codeph> clause. If Impala cannot accurately estimate the number of rows,
+      then this setting does not apply.
     </p>
 
     <p rev="2.3.0">


[impala] 01/11: IMPALA-2195: Improper handling of comments in queries

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 4fe01b280e1dcf0b6e99578ddc581f230444d1ad
Author: Fredy Wijaya <fw...@cloudera.com>
AuthorDate: Wed Apr 4 19:09:14 2018 -0700

    IMPALA-2195: Improper handling of comments in queries
    
    This patch fixes an issue where parseline is unable to deduce the
    correct command when a statement has a leading comment.
    
    Before:
    > -- comment
    > insert into table t values(100);
    Fetched 1 row(s) in 0.01s
    
    After:
    > -- comment
    > insert into table t values(100);
    Modified 1 row(s) in 0.01s
    
    Before (FE syntax error):
    > /*comment*/ help;
    
    After (show help correctly):
    > /*comment*/ help;
    
    Testing:
    - Added shell tests
    - Ran end-to-end shell tests on Python 2.6 and Python 2.7
    
    Change-Id: I7ac7cb5a30e6dda73ebe761d9f0eb9ba038e14a7
    Reviewed-on: http://gerrit.cloudera.org:8080/9933
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
    Reviewed-by: Bharath Vissapragada <bh...@cloudera.com>
---
 shell/impala_shell.py                 |  82 ++++++++++++++++++++++++---
 tests/shell/test_shell_interactive.py | 103 ++++++++++++++++++++++++++++++++--
 2 files changed, 173 insertions(+), 12 deletions(-)
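
The patch below threads the stripped leading comment through parseline via a
sqlparse token filter. As a rough, self-contained sketch of the core idea
(regex-based, and much less robust than the token-level filter in the diff):

    import re

    _LEADING_COMMENT = re.compile(
        r'\A(?:\s*(?:--[^\n]*(?:\n|\Z)|/\*.*?\*/))+', re.DOTALL)

    def strip_leading_comment(sql):
        """Return (leading comment or None, statement without it)."""
        m = _LEADING_COMMENT.match(sql)
        if not m:
            return None, sql
        return sql[:m.end()].strip(), sql[m.end():].lstrip()

    # strip_leading_comment('/*comment*/ insert into t values(100)')
    # -> ('/*comment*/', 'insert into t values(100)')
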

diff --git a/shell/impala_shell.py b/shell/impala_shell.py
index 8cb1442..841457e 100755
--- a/shell/impala_shell.py
+++ b/shell/impala_shell.py
@@ -312,11 +312,15 @@ class ImpalaShell(object, cmd.Cmd):
     command = self.orig_cmd
     self.orig_cmd = None
     if not command:
-      print_to_stderr("Unexpected error: Failed to execute query due to command "\
+      print_to_stderr("Unexpected error: Failed to execute query due to command "
                       "is missing")
       sys.exit(1)
-    return self.imp_client.create_beeswax_query("%s %s" % (command, args),
-                                                 self.set_query_options)
+    # In order to deduce the correct cmd, parseline stripped the leading comment.
+    # To preserve the original query, the leading comment (if any) is
+    # prepended when constructing the query sent to the Impala front-end.
+    return self.imp_client.create_beeswax_query("%s%s %s" % (self.leading_comment or "",
+                                                command, args),
+                                                self.set_query_options)
 
   def do_shell(self, args):
     """Run a command on the shell
@@ -582,7 +586,7 @@ class ImpalaShell(object, cmd.Cmd):
       # is necessary to find the proper function, and this is the right
       # place, because lowering the command before the lookup avoids a
       # side effect.
-      command, arg, line = self.parseline(line)
+      command, arg, line, leading_comment = self.parseline(line)
       if not line:
         return self.emptyline()
       if command is None:
@@ -596,6 +600,7 @@ class ImpalaShell(object, cmd.Cmd):
         try:
           func = getattr(self, 'do_' + command.lower())
           self.orig_cmd = command
+          self.leading_comment = leading_comment
         except AttributeError:
           return self.default(line)
         return func(arg)
@@ -1281,17 +1286,78 @@ class ImpalaShell(object, cmd.Cmd):
 
   def parseline(self, line):
     """Parse the line into a command name and a string containing
-    the arguments.  Returns a tuple containing (command, args, line).
+    the arguments.  Returns a tuple containing (command, args, line, leading comment).
     'command' and 'args' may be None if the line couldn't be parsed.
     'line' in return tuple is the rewritten original line, with leading
     and trailing space removed and special characters transformed into
-    their aliases.
+    their aliases. If the line contains a leading comment, the leading
+    comment will be removed in order to deduce a 'command' correctly.
+    The 'command' is used to determine which 'do_<command>' function to invoke.
+    The 'do_<command>' implementation can decide whether to retain or ignore the
+    leading comment.
+
+    Examples:
+
+    > /*comment*/ help connect;
+    line: help connect
+    args: connect
+    command: help
+    leading comment: /*comment*/
+
+    > /*comment*/ ? connect;
+    line: help connect
+    args: connect
+    command: help
+    leading comment: /*comment*/
+
+    > /*first comment*/ select /*second comment*/ 1;
+    line: select /*second comment*/ 1
+    args: /*second comment*/ 1
+    command: select
+    leading comment: /*first comment*/
     """
-    line = line.strip()
+    leading_comment, line = ImpalaShell.strip_leading_comment(line.strip())
+    line = line.encode('utf-8')
     if line and line[0] == '@':
       line = 'rerun ' + line[1:]
-    return super(ImpalaShell, self).parseline(line)
+    return super(ImpalaShell, self).parseline(line) + (leading_comment,)
 
+  @staticmethod
+  def strip_leading_comment(sql):
+    """
+    Strip the leading comment, if any, from the SQL statement. This function
+    returns a tuple containing (leading comment, line without the leading comment).
+    """
+    class StripLeadingCommentFilter:
+      def __init__(self):
+        self.comment = None
+
+      def _process(self, tlist):
+        token = tlist.token_first()
+        if self._is_comment(token):
+          self.comment = token.value
+          tidx = tlist.token_index(token)
+          tlist.tokens.pop(tidx)
+
+      def _is_comment(self, token):
+        if isinstance(token, sqlparse.sql.Comment):
+          return True
+        for comment in sqlparse.tokens.Comment:
+          if token.ttype == comment:
+            return True
+        return False
+
+      def process(self, stack, stmt):
+        [self.process(stack, sgroup) for sgroup in stmt.get_sublists()]
+        self._process(stmt)
+
+    stack = sqlparse.engine.FilterStack()
+    stack.enable_grouping()
+    strip_leading_comment_filter = StripLeadingCommentFilter()
+    stack.stmtprocess.append(strip_leading_comment_filter)
+    stack.postprocess.append(sqlparse.filters.SerializerUnicode())
+    stripped_line = ''.join(stack.run(sql, 'utf-8'))
+    return strip_leading_comment_filter.comment, stripped_line
 
   def _replace_history_delimiters(self, src_delim, tgt_delim):
     """Replaces source_delim with target_delim for all items in history.
diff --git a/tests/shell/test_shell_interactive.py b/tests/shell/test_shell_interactive.py
index fe41e36..d0308d8 100755
--- a/tests/shell/test_shell_interactive.py
+++ b/tests/shell/test_shell_interactive.py
@@ -367,12 +367,9 @@ class TestImpalaShellInteractive(object):
   @pytest.mark.execute_serially
   def test_zero_row_fetch(self):
     # IMPALA-4418: DROP and USE are generally exceptional statements where
-    # the client does not fetch. However, when preceded by a comment, the
-    # Impala shell treats them like any other statement and will try to
-    # fetch - receiving 0 rows. For statements returning 0 rows we do not
+    # the client does not fetch. For statements returning 0 rows we do not
     # want an empty line in stdout.
     result = run_impala_shell_interactive("-- foo \n use default;")
-    assert "Fetched 0 row(s)" in result.stderr
     assert re.search('> \[', result.stdout)
     result = run_impala_shell_interactive("select * from functional.alltypes limit 0;")
     assert "Fetched 0 row(s)" in result.stderr
@@ -459,6 +456,55 @@ class TestImpalaShellInteractive(object):
       os.chdir(cwd)
 
   @pytest.mark.execute_serially
+  def test_line_with_leading_comment(self):
+    # IMPALA-2195: A line with a leading comment used to produce an incorrect command.
+    try:
+      run_impala_shell_interactive('drop table if exists leading_comment;')
+      run_impala_shell_interactive('create table leading_comment (i int);')
+      result = run_impala_shell_interactive('-- comment\n'
+                                            'insert into leading_comment values(1);')
+      assert 'Modified 1 row(s)' in result.stderr
+      result = run_impala_shell_interactive('-- comment\n'
+                                            'select * from leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* comment */\n'
+                                            'select * from leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* comment1 */\n'
+                                            '-- comment2\n'
+                                            'select * from leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* comment1\n'
+                                            'comment2 */ select * from leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* select * from leading_comment */ '
+                                            'select * from leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* comment */ help use')
+      assert 'Executes a USE... query' in result.stdout
+
+      result = run_impala_shell_interactive('-- comment\n'
+                                            ' help use;')
+      assert 'Executes a USE... query' in result.stdout
+
+      result = run_impala_shell_interactive('/* comment1 */\n'
+                                            '-- comment2\n'
+                                            'desc leading_comment;')
+      assert 'Fetched 1 row(s)' in result.stderr
+
+      result = run_impala_shell_interactive('/* comment1 */\n'
+                                            '-- comment2\n'
+                                            'help use;')
+      assert 'Executes a USE... query' in result.stdout
+    finally:
+      run_impala_shell_interactive('drop table if exists leading_comment;')
+
+  @pytest.mark.execute_serially
   def test_line_ends_with_comment(self):
     # IMPALA-5269: Test lines that end with a comment.
     queries = ['select 1 + 1; --comment',
@@ -558,6 +604,55 @@ class TestImpalaShellInteractive(object):
     self._expect_with_cmd(proc, "use functional", (), 'functional')
     self._expect_with_cmd(proc, "use foo", (), 'functional')
 
+  def test_strip_leading_comment(self):
+    """Test stripping leading comments from SQL statements"""
+    assert ('--delete\n', 'select 1') == \
+        ImpalaShellClass.strip_leading_comment('--delete\nselect 1')
+    assert ('--delete\n', 'select --do not delete\n1') == \
+        ImpalaShellClass.strip_leading_comment('--delete\nselect --do not delete\n1')
+    assert (None, 'select --do not delete\n1') == \
+        ImpalaShellClass.strip_leading_comment('select --do not delete\n1')
+
+    assert ('/*delete*/\n', 'select 1') == \
+        ImpalaShellClass.strip_leading_comment('/*delete*/\nselect 1')
+    assert ('/*delete\nme*/\n', 'select 1') == \
+        ImpalaShellClass.strip_leading_comment('/*delete\nme*/\nselect 1')
+    assert ('/*delete*/', 'select 1') == \
+        ImpalaShellClass.strip_leading_comment('/*delete*/select 1')
+    assert ('/*delete*/ ', 'select /*do not delete*/ 1') == \
+        ImpalaShellClass.strip_leading_comment('/*delete*/ select /*do not delete*/ 1')
+    assert ('/*delete1*/ \n/*delete2*/ \n--delete3 \n', 'select /*do not delete*/ 1') == \
+        ImpalaShellClass.strip_leading_comment('/*delete1*/ \n'
+                                               '/*delete2*/ \n'
+                                               '--delete3 \n'
+                                               'select /*do not delete*/ 1')
+    assert (None, 'select /*do not delete*/ 1') == \
+        ImpalaShellClass.strip_leading_comment('select /*do not delete*/ 1')
+    assert ('/*delete*/\n', 'select c1 from\n'
+                            'a\n'
+                            'join -- +SHUFFLE\n'
+                            'b') == \
+        ImpalaShellClass.strip_leading_comment('/*delete*/\n'
+                                               'select c1 from\n'
+                                               'a\n'
+                                               'join -- +SHUFFLE\n'
+                                               'b')
+    assert ('/*delete*/\n', 'select c1 from\n'
+                            'a\n'
+                            'join /* +SHUFFLE */\n'
+                            'b') == \
+        ImpalaShellClass.strip_leading_comment('/*delete*/\n'
+                                               'select c1 from\n'
+                                               'a\n'
+                                               'join /* +SHUFFLE */\n'
+                                               'b')
+
+    assert (None, 'select 1') == \
+        ImpalaShellClass.strip_leading_comment('select 1')
+
+
 def run_impala_shell_interactive(input_lines, shell_args=None):
   """Runs a command in the Impala shell interactively."""
   # if argument "input_lines" is a string, makes it into a list

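The strip_leading_comment helper added above builds on the sqlparse library.
For readers unfamiliar with it, here is a standalone sketch of the same
first-token inspection; this is an illustration only, and unlike the filter
stack above it detects the leading comment without rewriting the statement.

    import sqlparse

    def leading_comment_of(sql):
      # Return the leading comment of 'sql', or None. Simplified sketch that
      # inspects only the first non-whitespace token of the first statement.
      stmt = sqlparse.parse(sql)[0]
      tok = stmt.token_first(skip_ws=True, skip_cm=False)
      if tok is None:
        return None
      if isinstance(tok, sqlparse.sql.Comment) or tok.ttype in sqlparse.tokens.Comment:
        return tok.value
      return None

    print(leading_comment_of('-- note\nselect 1'))  # '-- note\n'
    print(leading_comment_of('select 1'))           # None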

[impala] 02/11: IMPALA-7165: [DOCS] Correct example for dynamic partition pruning

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 3cda4c2ad0b43b1a784bfc3d39977acd17540e6f
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Tue Jun 12 16:18:20 2018 -0700

    IMPALA-7165: [DOCS] Correct example for dynamic partition pruning
    
    Change-Id: I44d1054f55d3dc7947ccf4c2ef440e506c41f963
    Reviewed-on: http://gerrit.cloudera.org:8080/10703
    Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/shared/impala_common.xml            | 72 +++++++++++++++---------------
 docs/topics/impala_partitioning.xml      | 75 +++++++++++++++++++++++---------
 docs/topics/impala_runtime_filtering.xml | 17 +++++---
 3 files changed, 101 insertions(+), 63 deletions(-)

diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 7bb187a..38a7b96 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1798,42 +1798,44 @@ from length_demo;
       </p>
 
 <codeblock id="simple_dpp_example">
-create table yy (s string) partitioned by (year int) stored as parquet;
-insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
-  ('2001', 2001), ('2010',2010);
-compute stats yy;
-
-create table yy2 (s string) partitioned by (year int) stored as parquet;
-insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
-  ('2001', 2001);
-compute stats yy2;
-
--- The query reads an unknown number of partitions, whose key values are only
--- known at run time. The 'runtime filters' lines show how the information about
--- the partitions is calculated in query fragment 02, and then used in query
--- fragment 00 to decide which partitions to skip.
-explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
-+----------------------------------------------------------+
-| Explain String                                           |
-+----------------------------------------------------------+
-| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
-|                                                          |
-| 04:EXCHANGE [UNPARTITIONED]                              |
-| |                                                        |
-| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                 |
-| |  hash predicates: year = year                          |
-| |  <b>runtime filters: RF000 &lt;- year</b>                        |
-| |                                                        |
-| |--03:EXCHANGE [BROADCAST]                               |
-| |  |                                                     |
-| |  01:SCAN HDFS [dpp.yy]                                 |
-| |     partitions=2/4 files=2 size=468B                   |
-| |                                                        |
-| 00:SCAN HDFS [dpp.yy2]                                   |
-|    partitions=2/3 files=2 size=468B                      |
-|    <b>runtime filters: RF000 -> year</b>                        |
-+----------------------------------------------------------+
+CREATE TABLE yy (s STRING) PARTITIONED BY (year INT);
+INSERT INTO yy PARTITION (year) VALUES ('1999', 1999), ('2000', 2000),
+  ('2001', 2001), ('2010', 2010), ('2018', 2018);
+COMPUTE STATS yy;
+
+CREATE TABLE yy2 (s STRING, year INT);
+INSERT INTO yy2 VALUES ('1999', 1999), ('2000', 2000), ('2001', 2001);
+COMPUTE STATS yy2;
+
+-- The following query reads an unknown number of partitions, whose key values
+-- are only known at run time. The 'runtime filters' lines show how the
+-- information about the partitions is calculated in query fragment 02, and then
+-- used in query fragment 00 to decide which partitions to skip.
+
+EXPLAIN SELECT s FROM yy WHERE year IN (SELECT year FROM yy2);
++--------------------------------------------------------------------------+
+| PLAN-ROOT SINK                                                           |
+| |                                                                        |
+| 04:EXCHANGE [UNPARTITIONED]                                              |
+| |                                                                        |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                                 |
+| |  hash predicates: year = year                                          |
+| |  <b>runtime filters: RF000 &lt;- year</b>                              |
+| |                                                                        |
+| |--03:EXCHANGE [BROADCAST]                                               |
+| |  |                                                                     |
+| |  01:SCAN HDFS [default.yy2]                                            |
+| |     partitions=1/1 files=1 size=620B                                   |
+| |                                                                        |
+| 00:SCAN HDFS [default.yy]                                                |
+|    <b>partitions=5/5</b> files=5 size=1.71KB                             |
+|    runtime filters: RF000 -> year                                        |
++--------------------------------------------------------------------------+
+
+SELECT s FROM yy WHERE year IN (SELECT year FROM yy2); -- Returns 3 rows from yy
+PROFILE;
 </codeblock>
+
       <p id="order_by_scratch_dir">
         By default, intermediate files used during large sort, join, aggregation, or analytic function operations
         are stored in the directory <filepath>/tmp/impala-scratch</filepath> . These files are removed when the
diff --git a/docs/topics/impala_partitioning.xml b/docs/topics/impala_partitioning.xml
index c2e36ed..011ff88 100644
--- a/docs/topics/impala_partitioning.xml
+++ b/docs/topics/impala_partitioning.xml
@@ -143,19 +143,27 @@ under the License.
         </li>
 
         <li>
-          <codeph><xref href="impala_alter_table.xml#alter_table">ALTER TABLE</xref></codeph>: you can add or drop partitions, to work with
-          different portions of a huge data set. You can designate the HDFS directory that holds the data files for a specific partition.
-          With data partitioned by date values, you might <q>age out</q> data that is no longer relevant.
-          <note conref="../shared/impala_common.xml#common/add_partition_set_location"/>
+          <codeph><xref href="impala_alter_table.xml#alter_table">ALTER
+              TABLE</xref></codeph>: you can add or drop partitions, to work
+          with different portions of a huge data set. You can designate the HDFS
+          directory that holds the data files for a specific partition. With
+          data partitioned by date values, you might <q>age out</q> data that is
+          no longer relevant. <note
+            conref="../shared/impala_common.xml#common/add_partition_set_location"
+          />
         </li>
 
         <li>
-          <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph>: When you insert data into a partitioned table, you identify
-          the partitioning columns. One or more values from each inserted row are not stored in data files, but instead determine the
-          directory where that row value is stored. You can also specify which partition to load a set of data into, with <codeph>INSERT
-          OVERWRITE</codeph> statements; you can replace the contents of a specific partition but you cannot append data to a specific
-          partition.
-          <p rev="1.3.1" conref="../shared/impala_common.xml#common/insert_inherit_permissions"/>
+          <codeph><xref href="impala_insert.xml#insert">INSERT</xref></codeph>:
+          When you insert data into a partitioned table, you identify the
+          partitioning columns. One or more values from each inserted row are
+          not stored in data files, but instead determine the directory where
+          that row value is stored. You can also specify which partition to load
+          a set of data into, with <codeph>INSERT OVERWRITE</codeph> statements;
+          you can replace the contents of a specific partition but you cannot
+          append data to a specific partition. <p rev="1.3.1"
+            conref="../shared/impala_common.xml#common/insert_inherit_permissions"
+          />
         </li>
 
         <li>
@@ -242,7 +250,8 @@ insert into weather <b>partition (year=2014, month=04, day)</b> select 'sunny',2
 
     <conbody>
 
-      <p rev="1.3.1" conref="../shared/impala_common.xml#common/insert_inherit_permissions"/>
+      <p rev="1.3.1"
+        conref="../shared/impala_common.xml#common/insert_inherit_permissions"/>
 
     </conbody>
 
@@ -377,7 +386,8 @@ insert into weather <b>partition (year=2014, month=04, day)</b> select 'sunny',2
 
         <p conref="../shared/impala_common.xml#common/partitions_and_views"/>
 
-        <p conref="../shared/impala_common.xml#common/analytic_partition_pruning_caveat"/>
+        <p
+          conref="../shared/impala_common.xml#common/analytic_partition_pruning_caveat"/>
 
       </conbody>
 
@@ -408,19 +418,38 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
 </codeblock>
 
         <p>
-          Dynamic partition pruning involves using information only available at run time, such as the result of a subquery:
+          Dynamic partition pruning involves using information only available
+          at run time, such as the result of a subquery. The following example
+          shows a simple case of dynamic partition pruning.
         </p>
 
 <codeblock conref="../shared/impala_common.xml#common/simple_dpp_example"/>
 
-<!-- Former example. Not sure it really would trigger DPP. SELECT COUNT(*) FROM sales_table WHERE year = (SELECT MAX(year) FROM some_other_table); -->
+        <p>
+          In the above example, Impala evaluates the subquery, sends the
+          subquery results to all Impala nodes participating in the query, and
+          then each <cmdname>impalad</cmdname> daemon uses the dynamic partition
+          pruning optimization to read only the partitions with the relevant key
+          values.
+        </p>
 
         <p>
-          In this case, Impala evaluates the subquery, sends the subquery results to all Impala nodes participating in the query, and then
-          each <cmdname>impalad</cmdname> daemon uses the dynamic partition pruning optimization to read only the partitions with the
-          relevant key values.
+          The output query plan from the <codeph>EXPLAIN</codeph> statement
+          shows that runtime filters are enabled. The plan also shows that it
+          expects to read all 5 partitions of the <codeph>yy</codeph> table,
+          indicating that static partition pruning will not happen.
         </p>
 
+        <p>The Filter summary in the <codeph>PROFILE</codeph> output shows that
+          the scan node filtered out files based on the runtime filter used for
+          dynamic partition pruning. </p>
+
+<codeblock>Filter 0 (1.00 MB):
+ - Files processed: 3
+ - <b>Files rejected: 1 (1)</b>
+ - Files total: 3 (3)
+</codeblock>
+
         <p>
           Dynamic partition pruning is especially effective for queries involving joins of several large partitioned tables. Evaluating the
           <codeph>ON</codeph> clauses of the join predicates might normally require reading data from all partitions of certain tables. If
@@ -429,7 +458,8 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
           and the amount of intermediate data stored and transmitted across the network during the query.
         </p>
 
-        <p conref="../shared/impala_common.xml#common/spill_to_disk_vs_dynamic_partition_pruning"/>
+        <p
+          conref="../shared/impala_common.xml#common/spill_to_disk_vs_dynamic_partition_pruning"/>
 
         <p>
           Dynamic partition pruning is part of the runtime filtering feature, which applies to other kinds of queries in addition to queries
@@ -479,7 +509,8 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
         </li>
 
         <li>
-          <p conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
+          <p
+            conref="../shared/impala_common.xml#common/complex_types_partitioning"/>
         </li>
 
         <li>
@@ -559,7 +590,8 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
         formats.
       </p>
 
-      <note conref="../shared/impala_common.xml#common/add_partition_set_location"/>
+      <note
+        conref="../shared/impala_common.xml#common/add_partition_set_location"/>
 
       <p>
         What happens to the data files when a partition is dropped depends on whether the partitioned table is designated as internal or
@@ -615,7 +647,8 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
 
       <note type="important">
         <p conref="../shared/impala_common.xml#common/cs_or_cis"/>
-        <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+        <p
+          conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
         <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
       </note>
 
diff --git a/docs/topics/impala_runtime_filtering.xml b/docs/topics/impala_runtime_filtering.xml
index 05c3743..cce9155 100644
--- a/docs/topics/impala_runtime_filtering.xml
+++ b/docs/topics/impala_runtime_filtering.xml
@@ -344,18 +344,21 @@ under the License.
       </p>
 
       <p>
-        The following example shows a query that uses a single runtime filter (labelled <codeph>RF00</codeph>)
-        to prune the partitions that are scanned in one stage of the query, based on evaluating the
-        result set of a subquery:
+        The following example shows a query that uses a single runtime filter,
+        labeled <codeph>RF000</codeph>, to prune the partitions that are scanned
+        in one stage of the query, based on evaluating the result set of a
+        subquery:
       </p>
 
 <codeblock conref="../shared/impala_common.xml#common/simple_dpp_example"/>
 
       <p>
-        The query profile (displayed by the <codeph>PROFILE</codeph> command in <cmdname>impala-shell</cmdname>)
-        contains both the <codeph>EXPLAIN</codeph> plan and more detailed information about the internal
-        workings of the query. The profile output includes a section labelled the <q>filter routing table</q>,
-        with information about each filter based on its ID.
+        The query profile (displayed by the <codeph>PROFILE</codeph> command
+        in <cmdname>impala-shell</cmdname>) contains both the
+          <codeph>EXPLAIN</codeph> plan and more detailed information about the
+        internal workings of the query. The profile output includes the
+          <codeph>Filter routing table</codeph> section with information about
+        each filter based on its ID.
       </p>
     </conbody>
   </concept>

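To tie the corrected example together, a hedged test-style sketch (not part of
the patch) that checks the plan for the runtime filter. It assumes the yy and
yy2 tables from the example exist in the current database, reuses
run_impala_shell_interactive from tests/shell/test_shell_interactive.py (see
the 01/11 commit above), and assumes plan text appears on stdout like other
query results.

    # Hedged sketch: the plan should carry RF000 from the join (fragment 02)
    # to the scan of yy (fragment 00).
    result = run_impala_shell_interactive(
        'explain select s from yy where year in (select year from yy2);')
    assert 'runtime filters: RF000 <- year' in result.stdout
    assert 'runtime filters: RF000 -> year' in result.stdout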

[impala] 11/11: IMPALA-7111: avoid use of boost::split in CheckPluginEnabled

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit e660f1187fb48316a6cadcb3777786ebc72a1e43
Author: Tim Armstrong <ta...@cloudera.com>
AuthorDate: Wed Jun 13 13:48:48 2018 -0700

    IMPALA-7111: avoid use of boost::split in CheckPluginEnabled
    
    This is an attempt to either avoid the bug or make it easier to diagnose
    if it reoccurs. My suspicion is that somehow boost::split() is
    accessing the input string in a non-thread-safe manner, but the
    implementation is opaque enough that it's not obvious how.
    
    Change-Id: I17b7f083731a33b832035f24900e351e2cb2feb8
    Reviewed-on: http://gerrit.cloudera.org:8080/10709
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/exec/hdfs-plugin-text-scanner.cc |  9 +++------
 be/src/util/string-util-test.cc         | 31 +++++++++++++++++++++++++++++++
 be/src/util/string-util.cc              | 11 +++++++++++
 be/src/util/string-util.h               |  3 +++
 4 files changed, 48 insertions(+), 6 deletions(-)

diff --git a/be/src/exec/hdfs-plugin-text-scanner.cc b/be/src/exec/hdfs-plugin-text-scanner.cc
index d28fab5..b4f7292 100644
--- a/be/src/exec/hdfs-plugin-text-scanner.cc
+++ b/be/src/exec/hdfs-plugin-text-scanner.cc
@@ -27,8 +27,9 @@
 #include "runtime/runtime-state.h"
 #include "runtime/hdfs-fs-cache.h"
 #include "util/debug-util.h"
-#include "util/hdfs-util.h"
 #include "util/dynamic-util.h"
+#include "util/hdfs-util.h"
+#include "util/string-util.h"
 
 #include "common/names.h"
 
@@ -105,11 +106,7 @@ Status HdfsPluginTextScanner::IssueInitialRanges(HdfsScanNodeBase* scan_node,
 }
 
 Status HdfsPluginTextScanner::CheckPluginEnabled(const string& plugin_name) {
-  vector<string> enabled_plugins;
-  boost::split(enabled_plugins, FLAGS_enabled_hdfs_text_scanner_plugins,
-      boost::is_any_of(","));
-  if (find(enabled_plugins.begin(), enabled_plugins.end(), plugin_name)
-      == enabled_plugins.end()) {
+  if (!CommaSeparatedContains(FLAGS_enabled_hdfs_text_scanner_plugins, plugin_name)) {
     return Status(Substitute("Scanner plugin '$0' is not one of the enabled plugins: '$1'",
           plugin_name, FLAGS_enabled_hdfs_text_scanner_plugins));
   }
diff --git a/be/src/util/string-util-test.cc b/be/src/util/string-util-test.cc
index 979eb9f..2c268dd 100644
--- a/be/src/util/string-util-test.cc
+++ b/be/src/util/string-util-test.cc
@@ -79,6 +79,37 @@ TEST(TruncateUpTest, Basic) {
   EvalTruncation(a, b, 2, UP);
 }
 
+TEST(CommaSeparatedContainsTest, Basic) {
+  // Basic tests with string present.
+  EXPECT_TRUE(CommaSeparatedContains("LZO", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains("foo,LZO", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains("LZO,bar", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains("foo,LZO,bar", "LZO"));
+
+  // Handles zero-length entries.
+  EXPECT_FALSE(CommaSeparatedContains("", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains(",", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains(",,", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains("foo,LZO,", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains(",foo,LZO,", "LZO"));
+  EXPECT_TRUE(CommaSeparatedContains(",foo,,LZO,", "LZO"));
+
+  // Basic tests with string absent.
+  EXPECT_FALSE(CommaSeparatedContains("foo,bar", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains("foo", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains("foo,", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains("foo,bar,baz", "LZO"));
+  EXPECT_FALSE(CommaSeparatedContains(",foo,LzO,", "LZO"));
+
+  // Pattern is longer than token.
+  EXPECT_FALSE(CommaSeparatedContains(",foo,LzO,", "ZZZZZ"));
+  // Pattern is longer than string.
+  EXPECT_FALSE(CommaSeparatedContains("foo", "ZZZZZ"));
+
+  // Whitespace is not trimmed; it remains part of each token.
+  EXPECT_FALSE(CommaSeparatedContains("foo , foo, foo,\nfoo,\tfoo", "foo"));
+}
+
 }
 
 IMPALA_TEST_MAIN();
diff --git a/be/src/util/string-util.cc b/be/src/util/string-util.cc
index b6c8fb7..f8e284f 100644
--- a/be/src/util/string-util.cc
+++ b/be/src/util/string-util.cc
@@ -54,4 +54,15 @@ Status TruncateUp(const string& str, int32_t max_length, string* result) {
   return Status::OK();
 }
 
+bool CommaSeparatedContains(const std::string& cs_list, const std::string& item) {
+  size_t pos = 0;
+  while (pos < cs_list.size()) {
+    size_t comma_pos = cs_list.find(",", pos);
+    if (comma_pos == string::npos) return cs_list.compare(pos, string::npos, item) == 0;
+    if (cs_list.compare(pos, comma_pos - pos, item) == 0) return true;
+    pos = comma_pos + 1;
+  }
+  return false;
+}
+
 }
diff --git a/be/src/util/string-util.h b/be/src/util/string-util.h
index 7e7ab12..2811a3c 100644
--- a/be/src/util/string-util.h
+++ b/be/src/util/string-util.h
@@ -37,6 +37,9 @@ Status TruncateDown(const std::string& str, int32_t max_length, std::string* res
 WARN_UNUSED_RESULT
 Status TruncateUp(const std::string& str, int32_t max_length, std::string* result);
 
+/// Return true if the comma-separated string 'cs_list' contains 'item' as one of
+/// the comma-separated values.
+bool CommaSeparatedContains(const std::string& cs_list, const std::string& item);
 }
 
 #endif

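For comparison, the matching rule implemented by CommaSeparatedContains above
fits in a line of Python. This is an illustration only; for non-empty items it
matches the C++ behavior, including the fact that entries are not trimmed, so
surrounding whitespace prevents a match.

    def comma_separated_contains(cs_list, item):
      # Split on every comma, keep entries verbatim (no trimming), and test
      # for an exact match, mirroring the C++ loop above.
      return item in cs_list.split(',')

    assert comma_separated_contains('foo,LZO,bar', 'LZO')
    assert not comma_separated_contains('foo , foo', 'foo')  # whitespace kept
    assert not comma_separated_contains(',,', 'LZO')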

[impala] 04/11: [DOCS] Wording changes in DPP examples for clarity

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit 6bfa7202c224ed4360a371a820ae2429ef0b2f0d
Author: Alex Rodoni <ar...@cloudera.com>
AuthorDate: Thu Jun 14 13:13:10 2018 -0700

    [DOCS] Wording changes in DPP examples for clarity
    
    Change-Id: If786fc3d3064b26b213afb685a2f310cebe904fe
    Reviewed-on: http://gerrit.cloudera.org:8080/10718
    Reviewed-by: Alex Rodoni <ar...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 docs/shared/impala_common.xml            | 9 ++++-----
 docs/topics/impala_partitioning.xml      | 6 ++++--
 docs/topics/impala_runtime_filtering.xml | 5 ++---
 3 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 38a7b96..613f5ee 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1808,9 +1808,8 @@ INSERT INTO yy2 VALUES ('1999', 1999), ('2000', 2000), ('2001', 2001);
 COMPUTE STATS yy2;
 
 -- The following query reads an unknown number of partitions, whose key values
--- are only known at run time. The 'runtime filters' lines show how the
--- information about the partitions is calculated in query fragment 02, and then
--- used in query fragment 00 to decide which partitions to skip.
+-- are only known at run time. The <b>runtime filters</b> lines show the
+-- filter built in query fragment 02 and used in query fragment 00 to decide
+-- which partitions to skip.
 
 EXPLAIN SELECT s FROM yy WHERE year IN (SELECT year FROM yy2);
 +--------------------------------------------------------------------------+
@@ -1820,7 +1819,7 @@ EXPLAIN SELECT s FROM yy WHERE year IN (SELECT year FROM yy2);
 | |                                                                        |
 | 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST]                                 |
 | |  hash predicates: year = year                                          |
-| |  <b>runtime filters: RF000 &lt;- year</b>                              |
+| |  <b>runtime filters: RF000 &lt;- year</b>                                   |
 | |                                                                        |
 | |--03:EXCHANGE [BROADCAST]                                               |
 | |  |                                                                     |
@@ -1828,7 +1827,7 @@ EXPLAIN SELECT s FROM yy WHERE year IN (SELECT year FROM yy2);
 | |     partitions=1/1 files=1 size=620B                                   |
 | |                                                                        |
 | 00:SCAN HDFS [default.yy]                                                |
-|    <b>partitions=5/5</b> files=5 size=1.71KB                             |
+|    <b>partitions=5/5</b> files=5 size=1.71KB                               |
 |    runtime filters: RF000 -> year                                        |
 +--------------------------------------------------------------------------+
 
diff --git a/docs/topics/impala_partitioning.xml b/docs/topics/impala_partitioning.xml
index 011ff88..5a986f6 100644
--- a/docs/topics/impala_partitioning.xml
+++ b/docs/topics/impala_partitioning.xml
@@ -440,9 +440,11 @@ SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
           indicating that static partition pruning will not happen.
         </p>
 
-        <p>The Filter summary in the <codeph>PROFILE</codeph> output shows that
+        <p>
+          The Filter summary in the <codeph>PROFILE</codeph> output shows that
           the scan node filtered out files based on the runtime filter used for
-          dynamic partition pruning. </p>
+          dynamic partition pruning.
+        </p>
 
 <codeblock>Filter 0 (1.00 MB):
  - Files processed: 3
diff --git a/docs/topics/impala_runtime_filtering.xml b/docs/topics/impala_runtime_filtering.xml
index cce9155..d496098 100644
--- a/docs/topics/impala_runtime_filtering.xml
+++ b/docs/topics/impala_runtime_filtering.xml
@@ -345,9 +345,8 @@ under the License.
 
       <p>
         The following example shows a query that uses a single runtime filter,
-        labeled <codeph>RF000</codeph>, to prune the partitions that are scanned
-        in one stage of the query, based on evaluating the result set of a
-        subquery:
+        labeled <codeph>RF000</codeph>, to prune the partitions based on
+        evaluating the result set of a subquery at runtime:
       </p>
 
 <codeblock conref="../shared/impala_common.xml#common/simple_dpp_example"/>


[impala] 08/11: IMPALA-6942: Reword error message to say "Failed" rather than "Cancelled"

Posted by ta...@apache.org.
This is an automated email from the ASF dual-hosted git repository.

tarmstrong pushed a commit to branch 2.x
in repository https://gitbox.apache.org/repos/asf/impala.git

commit b1f43e41d9a9f59fe9f9bdef0c736f69d2ee5805
Author: Dan Hecht <dh...@cloudera.com>
AuthorDate: Thu Jun 14 13:05:49 2018 -0700

    IMPALA-6942: Reword error message to say "Failed" rather than "Cancelled"
    
    In this case, the query is failing. It happens to use the cancellation
    path to clean up, but from a user's perspective this is a query failure,
    not a cancellation. Reword the message to reflect that.
    
    Change-Id: I4d8e755aef196e5c25205094af9c8486eb899344
    Reviewed-on: http://gerrit.cloudera.org:8080/10717
    Reviewed-by: Impala Public Jenkins <im...@cloudera.com>
    Tested-by: Impala Public Jenkins <im...@cloudera.com>
---
 be/src/service/impala-server.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc
index 771b0eb..2026842 100644
--- a/be/src/service/impala-server.cc
+++ b/be/src/service/impala-server.cc
@@ -1699,7 +1699,7 @@ void ImpalaServer::MembershipCallback(
           cancellation_entry != queries_to_cancel.end();
           ++cancellation_entry) {
         stringstream cause_msg;
-        cause_msg << "Cancelled due to unreachable impalad(s): ";
+        cause_msg << "Failed due to unreachable impalad(s): ";
         for (int i = 0; i < cancellation_entry->second.size(); ++i) {
           cause_msg << TNetworkAddressToString(cancellation_entry->second[i]);
           if (i + 1 != cancellation_entry->second.size()) cause_msg << ", ";