Posted to commits@impala.apache.org by sa...@apache.org on 2018/04/23 17:39:19 UTC

[1/9] impala git commit: IMPALA-6860: [DOCS] Upgrade considerations for Impala 3.0

Repository: impala
Updated Branches:
  refs/heads/master b68e06997 -> b9271ccf0


IMPALA-6860: [DOCS] Upgrade considerations for Impala 3.0

Change-Id: If8416ac0abb7ea1b918ba53b9533af27182fbe89
Cherry-picks: not for 2.x.
Reviewed-on: http://gerrit.cloudera.org:8080/10080
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/9117ce07
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/9117ce07
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/9117ce07

Branch: refs/heads/master
Commit: 9117ce073b4f5fc4bacc760a7bd82698447d115c
Parents: b68e069
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Mon Apr 16 14:23:43 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Apr 20 00:04:12 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_upgrading.xml | 160 +++++++++++++++++++++++++++++++++-
 1 file changed, 158 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/9117ce07/docs/topics/impala_upgrading.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_upgrading.xml b/docs/topics/impala_upgrading.xml
index 75ca7dd..7bfc0a4 100644
--- a/docs/topics/impala_upgrading.xml
+++ b/docs/topics/impala_upgrading.xml
@@ -136,8 +136,156 @@ $ ps ax | grep [i]mpalad
       </note>
     </conbody>
   </concept>
+
   <concept id="concept_a2p_szq_jdb">
     <title>Impala Upgrade Considerations</title>
+    <concept id="IMPALA-3916">
+      <title>List of Reserved Words Updated in <keyword keyref="impala30_full"
+        /></title>
+      <conbody>
+        <p>
+          The list of <keyword keyref="reserved_words">reserved
+            words</keyword> in Impala was updated in <keyword
+            keyref="impala30_full"/>. If you need to use a reserved word as an
+          identifier, such as a table name, enclose the word in back-ticks.
+        </p>
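+
+        <p>
+          For example, to create and query a table whose name collides with a
+          reserved word (a hypothetical name), quote the identifier with
+          back-ticks:
+<codeblock>CREATE TABLE `year` (ts TIMESTAMP);
+SELECT ts FROM `year`;</codeblock>
+        </p>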
+
+        <p>
+          If you need to continue using the reserved word list from previous
+          versions of Impala, set the following <codeph>impalad</codeph> and
+          <codeph>catalogd</codeph> startup flag. Note that this startup option
+          will be deprecated in a future release.
+<codeblock>--reserved_words_version=2.11.0</codeblock>
+        </p>
+      </conbody>
+    </concept>
+
+    <concept id="IMPALA-4924">
+      <title>Decimal V2 Used by Default in <keyword keyref="impala30_full"/></title>
+      <conbody>
+        <p>
+          Impala supports two different implementations of the
+          <codeph>DECIMAL</codeph> type. Starting in <keyword keyref="impala30_full"/>,
+            <codeph>DECIMAL</codeph> V2 is used by default. See <keyword
+            keyref="decimal">DECIMAL Type</keyword> for detailed information.
+        </p>
+
+        <p>
+          If you need to continue using the first version of the
+            <codeph>DECIMAL</codeph> type for backward compatibility of your
+          queries, set the <codeph>DECIMAL_V2</codeph> query option to
+            <codeph>FALSE</codeph>:
+<codeblock>SET DECIMAL_V2=FALSE;</codeblock>
+        </p>
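+
+        <p>
+          The two versions differ in several behaviors. An illustrative sketch
+          of one such difference, rounding when a value is cast to a smaller
+          scale (verify the details in the <codeph>DECIMAL</codeph> type topic):
+<codeblock>SET DECIMAL_V2=FALSE;
+SELECT CAST(1.239 AS DECIMAL(3,2));  -- 1.23: DECIMAL V1 truncates the extra digit.
+SET DECIMAL_V2=TRUE;
+SELECT CAST(1.239 AS DECIMAL(3,2));  -- 1.24: DECIMAL V2 rounds.</codeblock>
+        </p>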
+      </conbody>
+    </concept>
+    <concept id="IMPALA-5191">
+      <title>Behavior of Column Aliases Changed in <keyword
+          keyref="impala30_full"/></title>
+      <conbody>
+        <p>
+          To conform to the SQL standard, Impala no longer performs alias
+          substitution in the subexpressions of <codeph>GROUP BY</codeph>,
+            <codeph>HAVING</codeph>, and <codeph>ORDER BY</codeph>. See <keyword
+            keyref="aliases"/> for examples of supported and unsupported aliases
+          syntax.
+        </p>
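+
+        <p>
+          For example, with a hypothetical table <codeph>t1</codeph> and column
+          <codeph>int_col</codeph>:
+<codeblock>SELECT int_col / 100 AS x FROM t1 ORDER BY x;      -- Still allowed: the alias is referenced directly.
+SELECT int_col / 100 AS x FROM t1 ORDER BY x + 1;  -- No longer allowed: the alias appears inside a subexpression.</codeblock>
+        </p>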
+      </conbody>
+    </concept>
+
+    <concept id="IMPALA-5037">
+      <title>Default PARQUET_ARRAY_RESOLUTION Changed in <keyword
+          keyref="impala30_full"/></title>
+      <conbody>
+        <p>
+          The default value of the <codeph>PARQUET_ARRAY_RESOLUTION</codeph>
+          query option was changed to <codeph>THREE_LEVEL</codeph> in <keyword
+            keyref="impala30_full"/>, to match the Parquet standard 3-level
+          encoding.
+        </p>
+
+        <p>
+          See <keyword keyref="parquet_array_resolution"/> for the information
+          about the query option.
+        </p>
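+
+        <p>
+          For example, to read Parquet files written with the older 2-level
+          array encoding, the option can be set back for a session (an
+          illustrative sketch; see the query option documentation for the
+          accepted values):
+<codeblock>SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;</codeblock>
+        </p>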
+      </conbody>
+    </concept>
+    <concept id="IMPALA-5293">
+      <title>Enable Clustering Hint for Inserts</title>
+      <conbody>
+        <p>
+          In <keyword keyref="impala30_full"/>, the <keyword keyref="hints"
+            >clustered</keyword> hint is enabled by default. The hint adds a
+          local sort by the partitioning columns to a query plan. </p>
+        <p> The <codeph>clustered</codeph> hint is only effective for HDFS and
+          Kudu tables.
+        </p>
+
+        <p>
+          As in previous versions, the <codeph>noclustered</codeph> hint
+          prevents clustering. If a table has ordering columns defined, the
+            <codeph>noclustered</codeph> hint is ignored with a warning.
+        </p>
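+
+        <p>
+          For example, to disable the sort for a single statement (hypothetical
+          table names):
+<codeblock>INSERT INTO sales PARTITION (year, month) /* +noclustered */
+SELECT * FROM staging_sales;</codeblock>
+        </p>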
+      </conbody>
+    </concept>
+
+    <concept id="IMPALA-4319">
+      <title>Deprecated Query Options Removed in <keyword keyref="impala30_full"
+        /></title>
+      <conbody>
+        <p> The following query options were deprecated in earlier releases and
+          have been removed in <keyword keyref="impala30_full"/>: <ul>
+            <li><codeph>DEFAULT_ORDER_BY_LIMIT</codeph></li>
+            <li><codeph>ABORT_ON_DEFAULT_LIMIT_EXCEEDED</codeph></li>
+            <li><codeph>V_CPU_CORES</codeph></li>
+            <li><codeph>RESERVATION_REQUEST_TIMEOUT</codeph></li>
+            <li><codeph>RM_INITIAL_MEM</codeph></li>
+            <li><codeph>SCAN_NODE_CODEGEN_THRESHOLD</codeph></li>
+            <li><codeph>MAX_IO_BUFFERS</codeph></li>
+            <li><codeph>DISABLE_CACHED_READS</codeph></li>
+          </ul>
+        </p>
+      </conbody>
+    </concept>
+
+    <concept id="impala-6648">
+      <title>Fine-grained Privileges Added in <keyword keyref="impala30_full"
+        /></title>
+      <conbody>
+        <p>
+          Starting in <keyword keyref="impala30_full"/>, finer-grained
+          privileges are enforced, such as the <codeph>REFRESH</codeph>,
+            <codeph>CREATE</codeph>, <codeph>DROP</codeph>, and
+            <codeph>ALTER</codeph> privileges. In particular, running
+            <codeph>REFRESH</codeph> or <codeph>INVALIDATE METADATA</codeph> now
+          requires the new <codeph>REFRESH</codeph> privilege. Users who do not
+          have the <codeph>ALL</codeph> privilege can no longer run those
+          statements after an upgrade until they are granted either the
+            <codeph>REFRESH</codeph> or the <codeph>ALL</codeph> privilege.
+        </p>
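+
+        <p>
+          For example, a user who only needs to refresh metadata can be granted
+          the new privilege instead of <codeph>ALL</codeph> (hypothetical role
+          and database names):
+<codeblock>GRANT REFRESH ON DATABASE db1 TO ROLE metadata_admins;</codeblock>
+        </p>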
+
+        <p>
+          See <keyword keyref="grant"/> for the new privileges, the scope, and
+          other information about the new privileges.
+        </p>
+      </conbody>
+    </concept>
+
+    <concept id="IMPALA-3998">
+      <title>refresh_after_connect Impala Shell Option Removed in <keyword
+          keyref="impala30_full"/></title>
+      <conbody>
+        <p>
+          The deprecated <codeph>refresh_after_connect</codeph> option was
+          removed from Impala Shell in <keyword keyref="impala30_full"/>.
+        </p>
+      </conbody>
+    </concept>
+
     <concept id="concept_mkn_ygr_jdb">
       <title>Default Setting Changes</title>
       <conbody>
@@ -149,14 +297,22 @@ $ ps ax | grep [i]mpalad
           </sthead>
           <strow>
             <stentry><keyword keyref="impala212_full"/></stentry>
-            <stentry><codeph>compact_catalog_topic</codeph></stentry>
+            <stentry><codeph>compact_catalog_topic</codeph>
+              <codeph>impalad</codeph> flag</stentry>
             <stentry><codeph>true</codeph></stentry>
           </strow>
           <strow>
             <stentry><keyword keyref="impala212_full"/></stentry>
-            <stentry><codeph>max_cached_file_handle</codeph></stentry>
+            <stentry><codeph>max_cached_file_handles</codeph>
+              <codeph>impalad</codeph> flag</stentry>
             <stentry><codeph>20000</codeph></stentry>
           </strow>
+          <strow>
+            <stentry><keyword keyref="impala30_full"/></stentry>
+            <stentry><codeph>PARQUET_ARRAY_RESOLUTION</codeph> query
+              option</stentry>
+            <stentry><codeph>THREE_LEVEL</codeph></stentry>
+          </strow>
         </simpletable>
       </conbody>
     </concept>


[3/9] impala git commit: IMPALA-5690: Part 1: Rename ostream operators for thrift types

Posted by sa...@apache.org.
IMPALA-5690: Part 1: Rename ostream operators for thrift types

Thrift 0.9.3 implements "ostream& operator<<(ostream&, T)" for Thrift
data types, while Impala had defined the same operator for enums and
special types such as TNetworkAddress and TUniqueId. To prepare for the
upgrade to Thrift 0.9.3, this patch renames these Impala-defined
functions. In the absence of operator<<, assertion macros like DCHECK_EQ
can no longer be used on non-enum Thrift-defined types.

Change-Id: I9c303997411237e988ef960157f781776f6fcb60
Reviewed-on: http://gerrit.cloudera.org:8080/9168
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/8e86678d
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/8e86678d
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/8e86678d

Branch: refs/heads/master
Commit: 8e86678d65be7635039154f6edecd15cc08b1e0b
Parents: 9117ce0
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Mon Feb 12 15:59:02 2018 -0800
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Apr 20 10:28:12 2018 +0000

----------------------------------------------------------------------
 be/src/benchmarks/scheduler-benchmark.cc        |  2 +-
 be/src/codegen/llvm-codegen.cc                  |  4 +-
 be/src/exec/exchange-node.cc                    |  2 +-
 be/src/exec/exec-node.cc                        |  8 +-
 be/src/exec/hdfs-scan-node-base.cc              | 13 +--
 be/src/exec/hdfs-scan-node.cc                   |  2 +-
 be/src/exec/parquet-column-readers.cc           |  2 +-
 be/src/exec/parquet-metadata-utils.cc           |  2 +-
 be/src/rpc/rpc-trace.cc                         |  4 +-
 be/src/rpc/thrift-client.cc                     | 16 ++--
 be/src/rpc/thrift-util.cc                       |  3 +-
 be/src/rpc/thrift-util.h                        |  2 +-
 be/src/runtime/client-cache.cc                  | 21 +++--
 be/src/runtime/coordinator-backend-state.cc     | 27 +++---
 be/src/runtime/coordinator.cc                   | 34 ++++----
 be/src/runtime/data-stream-mgr.cc               | 22 ++---
 be/src/runtime/data-stream-recvr.cc             |  7 +-
 be/src/runtime/data-stream-sender.cc            | 17 ++--
 be/src/runtime/fragment-instance-state.cc       |  6 +-
 be/src/runtime/krpc-data-stream-mgr.cc          |  8 +-
 be/src/runtime/krpc-data-stream-recvr.cc        |  7 +-
 be/src/runtime/krpc-data-stream-sender.cc       | 12 +--
 be/src/runtime/mem-tracker.cc                   |  4 +-
 be/src/runtime/query-exec-mgr.cc                |  6 +-
 be/src/runtime/query-state.cc                   |  2 +-
 be/src/runtime/runtime-filter-bank.cc           |  3 +-
 be/src/runtime/runtime-state.cc                 |  4 +-
 be/src/scheduling/admission-controller.cc       | 14 ++--
 be/src/scheduling/scheduler.cc                  | 14 ++--
 be/src/service/child-query.cc                   |  3 +-
 be/src/service/client-request-state.cc          | 18 ++--
 be/src/service/impala-beeswax-server.cc         |  2 +-
 be/src/service/impala-hs2-server.cc             |  4 +-
 be/src/service/impala-http-handler.cc           | 11 ++-
 be/src/service/impala-internal-service.cc       |  7 +-
 be/src/service/impala-server.cc                 | 48 ++++++-----
 be/src/service/query-options-test.cc            |  2 +-
 be/src/service/query-options.cc                 | 25 ++++--
 be/src/service/query-result-set.cc              |  4 +-
 be/src/statestore/statestore.cc                 |  4 +-
 be/src/util/collection-metrics.h                |  2 +-
 be/src/util/debug-util.cc                       | 88 ++++++--------------
 be/src/util/debug-util.h                        | 46 +++++-----
 be/src/util/histogram-metric.h                  |  4 +-
 be/src/util/metrics.h                           |  5 +-
 be/src/util/network-util.cc                     |  7 +-
 be/src/util/network-util.h                      |  3 -
 be/src/util/webserver.cc                        |  7 +-
 .../functional-query/queries/QueryTest/set.test |  6 +-
 tests/shell/test_shell_commandline.py           |  4 +-
 50 files changed, 278 insertions(+), 290 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/benchmarks/scheduler-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/scheduler-benchmark.cc b/be/src/benchmarks/scheduler-benchmark.cc
index 7149dde..ef7fb9e 100644
--- a/be/src/benchmarks/scheduler-benchmark.cc
+++ b/be/src/benchmarks/scheduler-benchmark.cc
@@ -131,7 +131,7 @@ void BenchmarkFunction(int num_iterations, void* data) {
 /// blocks. Scheduling will be done according to the parameter 'replica_preference'.
 void RunClusterSizeBenchmark(TReplicaPreference::type replica_preference) {
   string suite_name = strings::Substitute(
-      "Cluster Size, $0", PrintTReplicaPreference(replica_preference));
+      "Cluster Size, $0", PrintThriftEnum(replica_preference));
   Benchmark suite(suite_name, false /* micro_heuristics */);
   vector<TestCtx> test_ctx(CLUSTER_SIZES.size());
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/codegen/llvm-codegen.cc
----------------------------------------------------------------------
diff --git a/be/src/codegen/llvm-codegen.cc b/be/src/codegen/llvm-codegen.cc
index 5d5ed15..7fe4ec1 100644
--- a/be/src/codegen/llvm-codegen.cc
+++ b/be/src/codegen/llvm-codegen.cc
@@ -1088,7 +1088,7 @@ Status LlvmCodeGen::FinalizeModule() {
   }
   string non_finalized_fns_str = ss.str();
   if (!non_finalized_fns_str.empty()) {
-    LOG(INFO) << "For query " << state_->query_id()
+    LOG(INFO) << "For query " << PrintId(state_->query_id())
               << " the following functions were not finalized and have been removed from "
                  "the module:\n"
               << non_finalized_fns_str;
@@ -1710,7 +1710,7 @@ void LlvmCodeGen::DiagnosticHandler::DiagnosticHandlerFn(
     info.print(diagnostic_printer);
     error_msg.flush();
     if (codegen->state_) {
-      LOG(INFO) << "Query " << codegen->state_->query_id() << " encountered a "
+      LOG(INFO) << "Query " << PrintId(codegen->state_->query_id()) << " encountered a "
           << codegen->diagnostic_handler_.error_str_;
     }
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/exchange-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exchange-node.cc b/be/src/exec/exchange-node.cc
index 2dc662b..297a805 100644
--- a/be/src/exec/exchange-node.cc
+++ b/be/src/exec/exchange-node.cc
@@ -150,7 +150,7 @@ Status ExchangeNode::FillInputRowBatch(RuntimeState* state) {
   VLOG_FILE << "exch: has batch=" << (input_batch_ == NULL ? "false" : "true")
             << " #rows=" << (input_batch_ != NULL ? input_batch_->num_rows() : 0)
             << " is_cancelled=" << (ret_status.IsCancelled() ? "true" : "false")
-            << " instance_id=" << state->fragment_instance_id();
+            << " instance_id=" << PrintId(state->fragment_instance_id());
   return ret_status;
 }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/exec-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exec-node.cc b/be/src/exec/exec-node.cc
index 294e755..bf08256 100644
--- a/be/src/exec/exec-node.cc
+++ b/be/src/exec/exec-node.cc
@@ -121,7 +121,7 @@ ExecNode::ExecNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl
     limit_(tnode.limit),
     num_rows_returned_(0),
     runtime_profile_(RuntimeProfile::Create(pool_,
-        Substitute("$0 (id=$1)", PrintPlanNodeType(tnode.node_type), id_))),
+        Substitute("$0 (id=$1)", PrintThriftEnum(tnode.node_type), id_))),
     rows_returned_counter_(NULL),
     rows_returned_rate_(NULL),
     containing_subplan_(NULL),
@@ -207,8 +207,8 @@ void ExecNode::Close(RuntimeState* state) {
   if (expr_mem_tracker_ != nullptr) expr_mem_tracker_->Close();
   if (mem_tracker_ != nullptr) {
     if (mem_tracker()->consumption() != 0) {
-      LOG(WARNING) << "Query " << state->query_id() << " may have leaked memory." << endl
-          << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
+      LOG(WARNING) << "Query " << PrintId(state->query_id()) << " may have leaked memory."
+          << endl << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
       DCHECK_EQ(mem_tracker()->consumption(), 0)
           << "Leaked memory." << endl
           << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
@@ -230,7 +230,7 @@ Status ExecNode::ClaimBufferReservation(RuntimeState* state) {
   }
 
   RETURN_IF_ERROR(buffer_pool->RegisterClient(
-      Substitute("$0 id=$1 ptr=$2", PrintPlanNodeType(type_), id_, this),
+      Substitute("$0 id=$1 ptr=$2", PrintThriftEnum(type_), id_, this),
       state->query_state()->file_group(), state->instance_buffer_reservation(),
       mem_tracker(), resource_profile_.max_reservation, runtime_profile(),
       &buffer_pool_client_));

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/hdfs-scan-node-base.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 21a6357..4467da3 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -793,23 +793,26 @@ void HdfsScanNodeBase::StopAndFinalizeCounters() {
           if (file_format == THdfsFileFormat::PARQUET) {
             // If a scan range stored as parquet is skipped, its compression type
             // cannot be figured out without reading the data.
-            ss << file_format << "/" << "Unknown" << "(Skipped):" << file_cnt << " ";
+            ss << PrintThriftEnum(file_format) << "/" << "Unknown" << "(Skipped):"
+               << file_cnt << " ";
           } else {
-            ss << file_format << "/" << compressions_set.GetFirstType() << "(Skipped):"
+            ss << PrintThriftEnum(file_format) << "/"
+               << PrintThriftEnum(compressions_set.GetFirstType()) << "(Skipped):"
                << file_cnt << " ";
           }
         } else if (compressions_set.Size() == 1) {
-          ss << file_format << "/" << compressions_set.GetFirstType() << ":" << file_cnt
+          ss << PrintThriftEnum(file_format) << "/"
+             << PrintThriftEnum(compressions_set.GetFirstType()) << ":" << file_cnt
              << " ";
         } else {
-          ss << file_format << "/" << "(";
+          ss << PrintThriftEnum(file_format) << "/" << "(";
           bool first = true;
           for (auto& elem : _THdfsCompression_VALUES_TO_NAMES) {
             THdfsCompression::type type = static_cast<THdfsCompression::type>(
                 elem.first);
             if (!compressions_set.HasType(type)) continue;
             if (!first) ss << ",";
-            ss << type;
+            ss << PrintThriftEnum(type);
             first = false;
           }
           ss << "):" << file_cnt << " ";

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/hdfs-scan-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scan-node.cc b/be/src/exec/hdfs-scan-node.cc
index 710a8af..7c64338 100644
--- a/be/src/exec/hdfs-scan-node.cc
+++ b/be/src/exec/hdfs-scan-node.cc
@@ -158,7 +158,7 @@ Status HdfsScanNode::Init(const TPlanNode& tnode, RuntimeState* state) {
     max_materialized_row_batches_ = default_max_row_batches;
   }
   VLOG_QUERY << "Max row batch queue size for scan node '" << id_
-      << "' in fragment instance '" << state->fragment_instance_id()
+      << "' in fragment instance '" << PrintId(state->fragment_instance_id())
       << "': " << max_materialized_row_batches_;
   materialized_row_batches_.reset(new RowBatchQueue(max_materialized_row_batches_));
   return HdfsScanNodeBase::Init(tnode, state);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/parquet-column-readers.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/parquet-column-readers.cc b/be/src/exec/parquet-column-readers.cc
index 5c74e5b..685b030 100644
--- a/be/src/exec/parquet-column-readers.cc
+++ b/be/src/exec/parquet-column-readers.cc
@@ -1211,7 +1211,7 @@ bool BaseScalarColumnReader::NextLevels() {
 Status BaseScalarColumnReader::GetUnsupportedDecodingError() {
   return Status(Substitute(
       "File '$0' is corrupt: unexpected encoding: $1 for data page of column '$2'.",
-      filename(), PrintEncoding(page_encoding_), schema_element().name));
+      filename(), PrintThriftEnum(page_encoding_), schema_element().name));
 }
 
 bool BaseScalarColumnReader::NextPage() {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/exec/parquet-metadata-utils.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/parquet-metadata-utils.cc b/be/src/exec/parquet-metadata-utils.cc
index 3fc6b3e..3d05fe6 100644
--- a/be/src/exec/parquet-metadata-utils.cc
+++ b/be/src/exec/parquet-metadata-utils.cc
@@ -154,7 +154,7 @@ Status ParquetMetadataUtils::ValidateRowGroupColumn(
   for (int i = 0; i < encodings.size(); ++i) {
     if (!IsEncodingSupported(encodings[i])) {
       return Status(Substitute("File '$0' uses an unsupported encoding: $1 for column "
-          "'$2'.", filename, PrintEncoding(encodings[i]), schema_element.name));
+          "'$2'.", filename, PrintThriftEnum(encodings[i]), schema_element.name));
     }
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/rpc/rpc-trace.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/rpc-trace.cc b/be/src/rpc/rpc-trace.cc
index 028f397..425312f 100644
--- a/be/src/rpc/rpc-trace.cc
+++ b/be/src/rpc/rpc-trace.cc
@@ -197,7 +197,7 @@ void* RpcEventHandler::getContext(const char* fn_name, void* server_context) {
   InvocationContext* ctxt_ptr =
       new InvocationContext(MonotonicMillis(), cnxn_ctx, it->second);
   VLOG_RPC << "RPC call: " << string(fn_name) << "(from "
-           << ctxt_ptr->cnxn_ctx->network_address << ")";
+           << TNetworkAddressToString(ctxt_ptr->cnxn_ctx->network_address) << ")";
   return reinterpret_cast<void*>(ctxt_ptr);
 }
 
@@ -207,7 +207,7 @@ void RpcEventHandler::postWrite(void* ctx, const char* fn_name, uint32_t bytes)
   const string& call_name = string(fn_name);
   // TODO: bytes is always 0, how come?
   VLOG_RPC << "RPC call: " << server_name_ << ":" << call_name << " from "
-           << rpc_ctx->cnxn_ctx->network_address << " took "
+           << TNetworkAddressToString(rpc_ctx->cnxn_ctx->network_address) << " took "
            << PrettyPrinter::Print(elapsed_time * 1000L * 1000L, TUnit::TIME_NS);
   MethodDescriptor* descriptor = rpc_ctx->method_descriptor;
   delete rpc_ctx;

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/rpc/thrift-client.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-client.cc b/be/src/rpc/thrift-client.cc
index 1f8d99e..e0852eb 100644
--- a/be/src/rpc/thrift-client.cc
+++ b/be/src/rpc/thrift-client.cc
@@ -67,14 +67,14 @@ Status ThriftClientImpl::Open() {
     try {
       transport_->close();
     } catch (const TException& e) {
-      VLOG(1) << "Error closing socket to: " << address_ << ", ignoring (" << e.what()
-                << ")";
+      VLOG(1) << "Error closing socket to: " << TNetworkAddressToString(address_)
+              << ", ignoring (" << e.what() << ")";
     }
     // In certain cases in which the remote host is overloaded, this failure can
     // happen quite frequently. Let's print this error message without the stack
     // trace as there aren't many callers of this function.
     const string& err_msg = Substitute("Couldn't open transport for $0 ($1)",
-        lexical_cast<string>(address_), e.what());
+        TNetworkAddressToString(address_), e.what());
     VLOG(1) << err_msg;
     return Status::Expected(err_msg);
   }
@@ -91,7 +91,7 @@ Status ThriftClientImpl::OpenWithRetry(uint32_t num_tries, uint64_t wait_ms) {
     Status status = Open();
     if (status.ok()) return status;
 
-    LOG(INFO) << "Unable to connect to " << address_;
+    LOG(INFO) << "Unable to connect to " << TNetworkAddressToString(address_);
     if (num_tries == 0) {
       LOG(INFO) << "(Attempt " << try_count << ", will retry indefinitely)";
     } else {
@@ -109,15 +109,15 @@ void ThriftClientImpl::Close() {
   try {
     if (transport_.get() != NULL && transport_->isOpen()) transport_->close();
   } catch (const TException& e) {
-    LOG(INFO) << "Error closing connection to: " << address_ << ", ignoring (" << e.what()
-              << ")";
+    LOG(INFO) << "Error closing connection to: " << TNetworkAddressToString(address_)
+              << ", ignoring (" << e.what() << ")";
     // Forcibly close the socket (since the transport may have failed to get that far
     // during close())
     try {
       if (socket_.get() != NULL) socket_->close();
     } catch (const TException& e) {
-      LOG(INFO) << "Error closing socket to: " << address_ << ", ignoring (" << e.what()
-                << ")";
+      LOG(INFO) << "Error closing socket to: " << TNetworkAddressToString(address_)
+                << ", ignoring (" << e.what() << ")";
     }
   }
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/rpc/thrift-util.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-util.cc b/be/src/rpc/thrift-util.cc
index b8326f5..c0ba537 100644
--- a/be/src/rpc/thrift-util.cc
+++ b/be/src/rpc/thrift-util.cc
@@ -156,7 +156,7 @@ Status WaitForServer(const string& host, int port, int num_retries,
   return Status("Server did not come up");
 }
 
-std::ostream& operator<<(std::ostream& out, const TColumnValue& colval) {
+void PrintTColumnValue(std::ostream& out, const TColumnValue& colval) {
   if (colval.__isset.bool_val) {
     out << ((colval.bool_val) ? "true" : "false");
   } else if (colval.__isset.double_val) {
@@ -176,7 +176,6 @@ std::ostream& operator<<(std::ostream& out, const TColumnValue& colval) {
   } else {
     out << "NULL";
   }
-  return out;
 }
 
 bool TNetworkAddressComparator(const TNetworkAddress& a, const TNetworkAddress& b) {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/rpc/thrift-util.h
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-util.h b/be/src/rpc/thrift-util.h
index 24b0b6f..ed95a71 100644
--- a/be/src/rpc/thrift-util.h
+++ b/be/src/rpc/thrift-util.h
@@ -139,7 +139,7 @@ Status WaitForServer(const std::string& host, int port, int num_retries,
    int retry_interval_ms);
 
 /// Print a TColumnValue. If null, print "NULL".
-std::ostream& operator<<(std::ostream& out, const TColumnValue& colval);
+void PrintTColumnValue(std::ostream& out, const TColumnValue& colval);
 
 /// Compares two TNetworkAddresses alphanumerically by their host:port
 /// string representation

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/client-cache.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/client-cache.cc b/be/src/runtime/client-cache.cc
index af530f7..d066d6b 100644
--- a/be/src/runtime/client-cache.cc
+++ b/be/src/runtime/client-cache.cc
@@ -43,7 +43,7 @@ Status ClientCacheHelper::GetClient(const TNetworkAddress& address,
   shared_ptr<PerHostCache> host_cache;
   {
     lock_guard<mutex> lock(cache_lock_);
-    VLOG(2) << "GetClient(" << address << ")";
+    VLOG(2) << "GetClient(" << TNetworkAddressToString(address) << ")";
     shared_ptr<PerHostCache>* ptr = &per_host_caches_[address];
     if (ptr->get() == NULL) ptr->reset(new PerHostCache());
     host_cache = *ptr;
@@ -53,7 +53,8 @@ Status ClientCacheHelper::GetClient(const TNetworkAddress& address,
     lock_guard<mutex> lock(host_cache->lock);
     if (!host_cache->clients.empty()) {
       *client_key = host_cache->clients.front();
-      VLOG(2) << "GetClient(): returning cached client for " << address;
+      VLOG(2) << "GetClient(): returning cached client for " <<
+          TNetworkAddressToString(address);
       host_cache->clients.pop_front();
       if (metrics_enabled_) clients_in_use_metric_->Increment(1);
       return Status::OK();
@@ -78,7 +79,8 @@ Status ClientCacheHelper::ReopenClient(ClientFactory factory_method,
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(1) << "ReopenClient(): re-creating client for " << client_impl->address();
+  VLOG(1) << "ReopenClient(): re-creating client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   client_impl->Close();
 
@@ -109,7 +111,8 @@ Status ClientCacheHelper::ReopenClient(ClientFactory factory_method,
 Status ClientCacheHelper::CreateClient(const TNetworkAddress& address,
     ClientFactory factory_method, ClientKey* client_key) {
   shared_ptr<ThriftClientImpl> client_impl(factory_method(address, client_key));
-  VLOG(2) << "CreateClient(): creating new client for " << client_impl->address();
+  VLOG(2) << "CreateClient(): creating new client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   if (!client_impl->init_status().ok()) {
     *client_key = nullptr;
@@ -145,7 +148,8 @@ void ClientCacheHelper::ReleaseClient(ClientKey* client_key) {
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(2) << "Releasing client for " << client_impl->address() << " back to cache";
+  VLOG(2) << "Releasing client for " << TNetworkAddressToString(client_impl->address())
+      << " back to cache";
   {
     lock_guard<mutex> lock(cache_lock_);
     PerHostCacheMap::iterator cache = per_host_caches_.find(client_impl->address());
@@ -167,7 +171,8 @@ void ClientCacheHelper::DestroyClient(ClientKey* client_key) {
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(1) << "Broken Connection, destroy client for " << client_impl->address();
+  VLOG(1) << "Broken Connection, destroy client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   client_impl->Close();
   if (metrics_enabled_) total_clients_metric_->Increment(-1);
@@ -188,7 +193,7 @@ void ClientCacheHelper::CloseConnections(const TNetworkAddress& address) {
 
   {
     VLOG(2) << "Invalidating all " << cache->clients.size() << " clients for: "
-            << address;
+            << TNetworkAddressToString(address);
     lock_guard<mutex> entry_lock(cache->lock);
     lock_guard<mutex> map_lock(client_map_lock_);
     for (ClientKey client_key: cache->clients) {
@@ -208,7 +213,7 @@ string ClientCacheHelper::DebugString() {
   for (const PerHostCacheMap::value_type& cache: per_host_caches_) {
     lock_guard<mutex> host_cache_lock(cache.second->lock);
     if (!first) out << " ";
-    out << cache.first << ":" << cache.second->clients.size();
+    out << TNetworkAddressToString(cache.first) << ":" << cache.second->clients.size();
     first = false;
   }
   out << "])";

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/coordinator-backend-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index e8db00e..c701f8e 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -58,7 +58,7 @@ void Coordinator::BackendState::Init(
   int prev_fragment_idx = -1;
   for (const FInstanceExecParams* instance_params:
        backend_exec_params_->instance_params) {
-    DCHECK_EQ(host_, instance_params->host);  // all hosts must be the same
+    DCHECK(host_ == instance_params->host);  // all hosts must be the same
     int fragment_idx = instance_params->fragment().idx;
     DCHECK_LT(fragment_idx, fragment_stats.size());
     if (prev_fragment_idx != -1 && fragment_idx != prev_fragment_idx) {
@@ -157,7 +157,8 @@ void Coordinator::BackendState::Exec(
   rpc_params.__set_query_ctx(query_ctx);
   SetRpcParams(debug_options, filter_routing_table, &rpc_params);
   VLOG_FILE << "making rpc: ExecQueryFInstances"
-      << " host=" << impalad_address() << " query_id=" << PrintId(query_id_);
+      << " host=" << TNetworkAddressToString(impalad_address()) << " query_id="
+      << PrintId(query_id_);
 
   // guard against concurrent UpdateBackendExecStatus() that may arrive after RPC returns
   lock_guard<mutex> l(lock_);
@@ -223,8 +224,9 @@ void Coordinator::BackendState::LogFirstInProgress(
   for (Coordinator::BackendState* backend_state : backend_states) {
     lock_guard<mutex> l(backend_state->lock_);
     if (!backend_state->IsDone()) {
-      VLOG_QUERY << "query_id=" << backend_state->query_id_
-                 << ": first in-progress backend: " << backend_state->impalad_address();
+      VLOG_QUERY << "query_id=" << PrintId(backend_state->query_id_)
+                 << ": first in-progress backend: "
+                 << TNetworkAddressToString(backend_state->impalad_address());
       break;
     }
   }
@@ -249,7 +251,7 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     int instance_idx = GetInstanceIdx(instance_exec_status.fragment_instance_id);
     DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
     InstanceStats* instance_stats = instance_stats_map_[instance_idx];
-    DCHECK_EQ(instance_stats->exec_params_.instance_id,
+    DCHECK(instance_stats->exec_params_.instance_id ==
         instance_exec_status.fragment_instance_id);
     // Ignore duplicate or out-of-order messages.
     if (instance_stats->done_) continue;
@@ -304,7 +306,8 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     // Append the log messages from each update with the global state of the query
     // execution
     MergeErrorMaps(backend_exec_status.error_log, &error_log_);
-    VLOG_FILE << "host=" << host_ << " error log: " << PrintErrorMapToString(error_log_);
+    VLOG_FILE << "host=" << TNetworkAddressToString(host_) << " error log: " <<
+        PrintErrorMapToString(error_log_);
   }
 
   // TODO: keep backend-wide stopwatch?
@@ -349,8 +352,8 @@ bool Coordinator::BackendState::Cancel() {
   params.protocol_version = ImpalaInternalServiceVersion::V1;
   params.__set_query_id(query_id_);
   TCancelQueryFInstancesResult dummy;
-  VLOG_QUERY << "sending CancelQueryFInstances rpc for query_id="
-             << query_id_ << " backend=" << TNetworkAddressToString(impalad_address());
+  VLOG_QUERY << "sending CancelQueryFInstances rpc for query_id=" << PrintId(query_id_) <<
+      " backend=" << TNetworkAddressToString(impalad_address());
 
   Status rpc_status;
   Status client_status;
@@ -370,14 +373,14 @@ bool Coordinator::BackendState::Cancel() {
   }
   if (!client_status.ok()) {
     status_.MergeStatus(client_status);
-    VLOG_QUERY << "CancelQueryFInstances query_id= " << query_id_
+    VLOG_QUERY << "CancelQueryFInstances query_id= " << PrintId(query_id_)
                << " failed to connect to " << TNetworkAddressToString(impalad_address())
                << " :" << client_status.msg().msg();
     return true;
   }
   if (!rpc_status.ok()) {
     status_.MergeStatus(rpc_status);
-    VLOG_QUERY << "CancelQueryFInstances query_id= " << query_id_
+    VLOG_QUERY << "CancelQueryFInstances query_id= " << PrintId(query_id_)
                << " rpc to " << TNetworkAddressToString(impalad_address())
                << " failed: " << rpc_status.msg().msg();
     return true;
@@ -386,7 +389,7 @@ bool Coordinator::BackendState::Cancel() {
 }
 
 void Coordinator::BackendState::PublishFilter(const TPublishFilterParams& rpc_params) {
-  DCHECK_EQ(rpc_params.dst_query_id, query_id_);
+  DCHECK(rpc_params.dst_query_id == query_id_);
   {
     // If the backend is already done, it's not waiting for this filter, so we skip
     // sending it in this case.
@@ -412,7 +415,7 @@ Coordinator::BackendState::InstanceStats::InstanceStats(
   : exec_params_(exec_params),
     profile_(nullptr) {
   const string& profile_name = Substitute("Instance $0 (host=$1)",
-      PrintId(exec_params.instance_id), lexical_cast<string>(exec_params.host));
+      PrintId(exec_params.instance_id), TNetworkAddressToString(exec_params.host));
   profile_ = RuntimeProfile::Create(obj_pool, profile_name);
   fragment_stats->root_profile()->AddChild(profile_);
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/coordinator.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc
index bdd02eb..1cac467 100644
--- a/be/src/runtime/coordinator.cc
+++ b/be/src/runtime/coordinator.cc
@@ -90,7 +90,7 @@ Status Coordinator::Exec() {
   const TQueryExecRequest& request = schedule_.request();
   DCHECK(request.plan_exec_info.size() > 0);
 
-  VLOG_QUERY << "Exec() query_id=" << query_id()
+  VLOG_QUERY << "Exec() query_id=" << PrintId(query_id())
              << " stmt=" << request.query_ctx.client_request.stmt;
   stmt_type_ = request.stmt_type;
 
@@ -347,7 +347,7 @@ void Coordinator::StartBackendExec() {
   DebugOptions debug_options(schedule_.query_options());
 
   VLOG_QUERY << "starting execution on " << num_backends << " backends for query_id="
-             << query_id();
+             << PrintId(query_id());
   query_events_->MarkEvent(Substitute("Ready to start on $0 backends", num_backends));
 
   for (BackendState* backend_state: backend_states_) {
@@ -360,7 +360,7 @@ void Coordinator::StartBackendExec() {
 
   exec_complete_barrier_->Wait();
   VLOG_QUERY << "started execution on " << num_backends << " backends for query_id="
-             << query_id();
+             << PrintId(query_id());
   query_events_->MarkEvent(
       Substitute("All $0 execution backends ($1 fragment instances) started",
         num_backends, schedule_.GetNumFragmentInstances()));
@@ -472,10 +472,11 @@ Status Coordinator::UpdateStatus(const Status& status, const string& backend_hos
 
   if (is_fragment_failure) {
     // Log the id of the fragment that first failed so we can track it down more easily.
-    VLOG_QUERY << "query_id=" << query_id() << " failed because fragment_instance_id="
-               << instance_id << " on host=" << backend_hostname << " failed.";
+    VLOG_QUERY << "query_id=" << PrintId(query_id())
+               << " failed because fragment_instance_id=" << PrintId(instance_id)
+               << " on host=" << backend_hostname << " failed.";
   } else {
-    VLOG_QUERY << "query_id=" << query_id() << " failed due to error on host="
+    VLOG_QUERY << "query_id=" << PrintId(query_id()) << " failed due to error on host="
                << backend_hostname;
   }
   return query_status_;
@@ -488,7 +489,7 @@ Status Coordinator::FinalizeHdfsInsert() {
   DCHECK(has_called_wait_);
   DCHECK(finalize_params() != nullptr);
 
-  VLOG_QUERY << "Finalizing query: " << query_id();
+  VLOG_QUERY << "Finalizing query: " << PrintId(query_id());
   SCOPED_TIMER(finalization_timer_);
   Status return_status = GetStatus();
   if (return_status.ok()) {
@@ -520,14 +521,15 @@ Status Coordinator::WaitForBackendCompletion() {
   unique_lock<mutex> l(lock_);
   while (num_remaining_backends_ > 0 && query_status_.ok()) {
     VLOG_QUERY << "Coordinator waiting for backends to finish, "
-               << num_remaining_backends_ << " remaining. query_id=" << query_id();
+               << num_remaining_backends_ << " remaining. query_id="
+               << PrintId(query_id());
     backend_completion_cv_.Wait(l);
   }
   if (query_status_.ok()) {
-    VLOG_QUERY << "All backends finished successfully. query_id=" << query_id();
+    VLOG_QUERY << "All backends finished successfully. query_id=" << PrintId(query_id());
   } else {
     VLOG_QUERY << "All backends finished due to one or more errors. query_id="
-               << query_id() << ". " << query_status_.GetDetail();
+               << PrintId(query_id()) << ". " << query_status_.GetDetail();
   }
 
   return query_status_;
@@ -572,7 +574,7 @@ Status Coordinator::Wait() {
 }
 
 Status Coordinator::GetNext(QueryResultSet* results, int max_rows, bool* eos) {
-  VLOG_ROW << "GetNext() query_id=" << query_id();
+  VLOG_ROW << "GetNext() query_id=" << PrintId(query_id());
   DCHECK(has_called_wait_);
   SCOPED_TIMER(query_profile_->total_time_counter());
 
@@ -626,7 +628,7 @@ void Coordinator::Cancel(const Status* cause) {
 }
 
 void Coordinator::CancelInternal() {
-  VLOG_QUERY << "Cancel() query_id=" << query_id();
+  VLOG_QUERY << "Cancel() query_id=" << PrintId(query_id());
   // TODO: remove when restructuring cancellation, which should happen automatically
   // as soon as the coordinator knows that the query is finished
   DCHECK(!query_status_.ok());
@@ -687,9 +689,9 @@ Status Coordinator::UpdateBackendExecStatus(const TReportExecStatusParams& param
     DCHECK_GT(num_remaining_backends_, 0);
     if (VLOG_QUERY_IS_ON && num_remaining_backends_ > 1) {
       VLOG_QUERY << "Backend completed: "
-          << " host=" << backend_state->impalad_address()
+          << " host=" << TNetworkAddressToString(backend_state->impalad_address())
           << " remaining=" << num_remaining_backends_ - 1
-          << " query_id=" << query_id();
+          << " query_id=" << PrintId(query_id());
       BackendState::LogFirstInProgress(backend_states_);
     }
     if (--num_remaining_backends_ == 0 || !status.ok()) {
@@ -729,7 +731,7 @@ void Coordinator::ComputeQuerySummary() {
 
   stringstream info;
   for (BackendState* backend_state: backend_states_) {
-    info << backend_state->impalad_address() << "("
+    info << TNetworkAddressToString(backend_state->impalad_address()) << "("
          << PrettyPrinter::Print(backend_state->GetPeakConsumption(), TUnit::BYTES)
          << ") ";
   }
@@ -893,7 +895,7 @@ void Coordinator::FilterState::ApplyUpdate(const TUpdateFilterParams& params,
       if (!coord->filter_mem_tracker_->TryConsume(heap_space)) {
         VLOG_QUERY << "Not enough memory to allocate filter: "
                    << PrettyPrinter::Print(heap_space, TUnit::BYTES)
-                   << " (query_id=" << coord->query_id() << ")";
+                   << " (query_id=" << PrintId(coord->query_id()) << ")";
         // Disable, as one missing update means a correct filter cannot be produced.
         Disable(coord->filter_mem_tracker_);
       } else {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/data-stream-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-mgr.cc b/be/src/runtime/data-stream-mgr.cc
index ed1e29e..8f55f0a 100644
--- a/be/src/runtime/data-stream-mgr.cc
+++ b/be/src/runtime/data-stream-mgr.cc
@@ -82,7 +82,7 @@ shared_ptr<DataStreamRecvrBase> DataStreamMgr::CreateRecvr(const RowDescriptor*
   DCHECK(profile != nullptr);
   DCHECK(parent_tracker != nullptr);
   VLOG_FILE << "creating receiver for fragment_instance_id="
-            << fragment_instance_id << ", node=" << dest_node_id;
+            << PrintId(fragment_instance_id) << ", node=" << dest_node_id;
   shared_ptr<DataStreamRecvr> recvr(new DataStreamRecvr(this, parent_tracker, row_desc,
       fragment_instance_id, dest_node_id, num_senders, is_merging, buffer_size, profile));
   size_t hash_value = GetHashValue(fragment_instance_id, dest_node_id);
@@ -127,9 +127,9 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvrOrWait(
   const string& time_taken = PrettyPrinter::Print(sw.ElapsedTime(), TUnit::TIME_NS);
   if (timed_out) {
     LOG(INFO) << "Datastream sender timed-out waiting for recvr for fragment_instance_id="
-              << fragment_instance_id << " (time-out was: " << time_taken << "). "
-              << "Increase --datastream_sender_timeout_ms if you see this message "
-              << "frequently.";
+              << PrintId(fragment_instance_id) << " (time-out was: " << time_taken <<
+              "). Increase --datastream_sender_timeout_ms if you see this message "
+              "frequently.";
   } else {
     VLOG_RPC << "Datastream sender waited for " << time_taken
              << ", and did not time-out.";
@@ -148,7 +148,7 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvrOrWait(
 
 shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvr(
     const TUniqueId& fragment_instance_id, PlanNodeId node_id, bool acquire_lock) {
-  VLOG_ROW << "looking up fragment_instance_id=" << fragment_instance_id
+  VLOG_ROW << "looking up fragment_instance_id=" << PrintId(fragment_instance_id)
            << ", node=" << node_id;
   size_t hash_value = GetHashValue(fragment_instance_id, node_id);
   if (acquire_lock) lock_.lock();
@@ -169,7 +169,7 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvr(
 
 Status DataStreamMgr::AddData(const TUniqueId& fragment_instance_id,
     PlanNodeId dest_node_id, const TRowBatch& thrift_batch, int sender_id) {
-  VLOG_ROW << "AddData(): fragment_instance_id=" << fragment_instance_id
+  VLOG_ROW << "AddData(): fragment_instance_id=" << PrintId(fragment_instance_id)
            << " node=" << dest_node_id
            << " size=" << RowBatch::GetDeserializedSize(thrift_batch);
   bool already_unregistered;
@@ -197,7 +197,7 @@ Status DataStreamMgr::AddData(const TUniqueId& fragment_instance_id,
 
 Status DataStreamMgr::CloseSender(const TUniqueId& fragment_instance_id,
     PlanNodeId dest_node_id, int sender_id) {
-  VLOG_FILE << "CloseSender(): fragment_instance_id=" << fragment_instance_id
+  VLOG_FILE << "CloseSender(): fragment_instance_id=" << PrintId(fragment_instance_id)
             << ", node=" << dest_node_id;
   Status status;
   bool already_unregistered;
@@ -243,7 +243,7 @@ Status DataStreamMgr::CloseSender(const TUniqueId& fragment_instance_id,
 
 Status DataStreamMgr::DeregisterRecvr(
     const TUniqueId& fragment_instance_id, PlanNodeId node_id) {
-  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << fragment_instance_id
+  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << PrintId(fragment_instance_id)
              << ", node=" << node_id;
   size_t hash_value = GetHashValue(fragment_instance_id, node_id);
   lock_guard<mutex> l(lock_);
@@ -268,7 +268,7 @@ Status DataStreamMgr::DeregisterRecvr(
   }
 
   stringstream err;
-  err << "unknown row receiver id: fragment_instance_id=" << fragment_instance_id
+  err << "unknown row receiver id: fragment_instance_id=" << PrintId(fragment_instance_id)
       << " node_id=" << node_id;
   LOG(ERROR) << err.str();
   return Status(err.str());
@@ -276,7 +276,7 @@ Status DataStreamMgr::DeregisterRecvr(
 
 void DataStreamMgr::Cancel(const TUniqueId& fragment_instance_id) {
   VLOG_QUERY << "cancelling all streams for fragment_instance_id="
-             << fragment_instance_id;
+             << PrintId(fragment_instance_id);
   lock_guard<mutex> l(lock_);
   FragmentRecvrSet::iterator i =
       fragment_recvr_set_.lower_bound(make_pair(fragment_instance_id, 0));
@@ -285,7 +285,7 @@ void DataStreamMgr::Cancel(const TUniqueId& fragment_instance_id) {
     if (recvr.get() == NULL) {
       // keep going but at least log it
       stringstream err;
-      err << "Cancel(): missing in stream_map: fragment_instance_id=" << i->first
+      err << "Cancel(): missing in stream_map: fragment_instance_id=" << PrintId(i->first)
           << " node=" << i->second;
       LOG(ERROR) << err.str();
     } else {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/data-stream-recvr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-recvr.cc b/be/src/runtime/data-stream-recvr.cc
index cdea4a0..8d9047f 100644
--- a/be/src/runtime/data-stream-recvr.cc
+++ b/be/src/runtime/data-stream-recvr.cc
@@ -112,7 +112,8 @@ Status DataStreamRecvr::SenderQueue::GetBatch(RowBatch** next_batch) {
   unique_lock<mutex> l(lock_);
   // wait until something shows up or we know we're done
   while (!is_cancelled_ && batch_queue_.empty() && num_remaining_senders_ > 0) {
-    VLOG_ROW << "wait arrival fragment_instance_id=" << recvr_->fragment_instance_id()
+    VLOG_ROW << "wait arrival fragment_instance_id="
+             << PrintId(recvr_->fragment_instance_id())
              << " node=" << recvr_->dest_node_id();
     // Don't count time spent waiting on the sender as active time.
     CANCEL_SAFE_SCOPED_TIMER(recvr_->data_arrival_timer_, &is_cancelled_);
@@ -221,7 +222,7 @@ void DataStreamRecvr::SenderQueue::DecrementSenders() {
   DCHECK_GT(num_remaining_senders_, 0);
   num_remaining_senders_ = max(0, num_remaining_senders_ - 1);
   VLOG_FILE << "decremented senders: fragment_instance_id="
-            << recvr_->fragment_instance_id()
+            << PrintId(recvr_->fragment_instance_id())
             << " node_id=" << recvr_->dest_node_id()
             << " #senders=" << num_remaining_senders_;
   if (num_remaining_senders_ == 0) data_arrival_cv_.NotifyOne();
@@ -233,7 +234,7 @@ void DataStreamRecvr::SenderQueue::Cancel() {
     if (is_cancelled_) return;
     is_cancelled_ = true;
     VLOG_QUERY << "cancelled stream: fragment_instance_id_="
-               << recvr_->fragment_instance_id()
+               << PrintId(recvr_->fragment_instance_id())
                << " node_id=" << recvr_->dest_node_id();
   }
   // Wake up all threads waiting to produce/consume batches.  They will all

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/data-stream-sender.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-sender.cc b/be/src/runtime/data-stream-sender.cc
index f68788e..7d0766d 100644
--- a/be/src/runtime/data-stream-sender.cc
+++ b/be/src/runtime/data-stream-sender.cc
@@ -165,8 +165,9 @@ Status DataStreamSender::Channel::Init(RuntimeState* state) {
 }
 
 Status DataStreamSender::Channel::SendBatch(TRowBatch* batch) {
-  VLOG_ROW << "Channel::SendBatch() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_ << " #rows=" << batch->num_rows;
+  VLOG_ROW << "Channel::SendBatch() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
+           << " #rows=" << batch->num_rows;
   // return if the previous batch saw an error
   RETURN_IF_ERROR(GetSendStatus());
   {
@@ -193,8 +194,8 @@ void DataStreamSender::Channel::TransmitData(int thread_id, const TRowBatch* bat
 
 void DataStreamSender::Channel::TransmitDataHelper(const TRowBatch* batch) {
   DCHECK(batch != NULL);
-  VLOG_ROW << "Channel::TransmitData() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_ROW << "Channel::TransmitData() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows=" << batch->num_rows;
   TTransmitDataParams params;
   params.protocol_version = ImpalaInternalServiceVersion::V1;
@@ -276,15 +277,15 @@ Status DataStreamSender::Channel::GetSendStatus() {
   WaitForRpc();
   if (!rpc_status_.ok()) {
     LOG(ERROR) << "channel send to " << TNetworkAddressToString(address_) << " failed "
-               << "(fragment_instance_id=" << fragment_instance_id_ << "): "
+               << "(fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
   }
   return rpc_status_;
 }
 
 Status DataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
-  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows= " << batch_->num_rows();
 
   // We can return an error here and not go on to send the EOS RPC because the error that
@@ -314,7 +315,7 @@ Status DataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
   rpc_status_ = DoTransmitDataRpc(&client, params, &res);
   if (!rpc_status_.ok()) {
     LOG(ERROR) << "Failed to send EOS to " << TNetworkAddressToString(address_)
-               << " (fragment_instance_id=" << fragment_instance_id_ << "): "
+               << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
     return rpc_status_;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/fragment-instance-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index 1a0d452..a6ae1ff 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -361,7 +361,7 @@ void FragmentInstanceState::ReportProfileThread() {
     SendReport(false, Status::OK());
   }
 
-  VLOG_FILE << "exiting reporting thread: instance_id=" << instance_id();
+  VLOG_FILE << "exiting reporting thread: instance_id=" << PrintId(instance_id());
 }
 
 void FragmentInstanceState::SendReport(bool done, const Status& status) {
@@ -370,7 +370,7 @@ void FragmentInstanceState::SendReport(bool done, const Status& status) {
 
   if (VLOG_FILE_IS_ON) {
     VLOG_FILE << "Reporting " << (done ? "final " : "") << "profile for instance "
-        << runtime_state_->fragment_instance_id();
+        << PrintId(runtime_state_->fragment_instance_id());
     stringstream ss;
     profile()->PrettyPrint(&ss);
     VLOG_FILE << ss.str();
@@ -554,5 +554,5 @@ void FragmentInstanceState::PrintVolumeIds() {
   profile()->AddInfoString(HdfsScanNodeBase::HDFS_SPLIT_STATS_DESC, str.str());
   VLOG_FILE
       << "Hdfs split stats (<volume id>:<# splits>/<split lengths>) for query="
-      << query_id() << ":\n" << str.str();
+      << PrintId(query_id()) << ":\n" << str.str();
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/krpc-data-stream-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-mgr.cc b/be/src/runtime/krpc-data-stream-mgr.cc
index cd8d90b..2aca6c8 100644
--- a/be/src/runtime/krpc-data-stream-mgr.cc
+++ b/be/src/runtime/krpc-data-stream-mgr.cc
@@ -104,7 +104,7 @@ shared_ptr<DataStreamRecvrBase> KrpcDataStreamMgr::CreateRecvr(
   DCHECK(profile != nullptr);
   DCHECK(parent_tracker != nullptr);
   DCHECK(client != nullptr);
-  VLOG_FILE << "creating receiver for fragment_instance_id="<< finst_id
+  VLOG_FILE << "creating receiver for fragment_instance_id="<< PrintId(finst_id)
             << ", node=" << dest_node_id;
   shared_ptr<KrpcDataStreamRecvr> recvr(new KrpcDataStreamRecvr(
       this, parent_tracker, row_desc, finst_id, dest_node_id, num_senders, is_merging,
@@ -149,7 +149,7 @@ shared_ptr<DataStreamRecvrBase> KrpcDataStreamMgr::CreateRecvr(
 
 shared_ptr<KrpcDataStreamRecvr> KrpcDataStreamMgr::FindRecvr(
     const TUniqueId& finst_id, PlanNodeId dest_node_id, bool* already_unregistered) {
-  VLOG_ROW << "looking up fragment_instance_id=" << finst_id
+  VLOG_ROW << "looking up fragment_instance_id=" << PrintId(finst_id)
            << ", node=" << dest_node_id;
   *already_unregistered = false;
   uint32_t hash_value = GetHashValue(finst_id, dest_node_id);
@@ -290,7 +290,7 @@ void KrpcDataStreamMgr::CloseSender(const EndDataStreamRequestPB* request,
 
 Status KrpcDataStreamMgr::DeregisterRecvr(
     const TUniqueId& finst_id, PlanNodeId dest_node_id) {
-  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << finst_id
+  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << PrintId(finst_id)
              << ", node=" << dest_node_id;
   uint32_t hash_value = GetHashValue(finst_id, dest_node_id);
   lock_guard<mutex> l(lock_);
@@ -321,7 +321,7 @@ Status KrpcDataStreamMgr::DeregisterRecvr(
 }
 
 void KrpcDataStreamMgr::Cancel(const TUniqueId& finst_id) {
-  VLOG_QUERY << "cancelling all streams for fragment_instance_id=" << finst_id;
+  VLOG_QUERY << "cancelling all streams for fragment_instance_id=" << PrintId(finst_id);
   lock_guard<mutex> l(lock_);
   FragmentRecvrSet::iterator iter =
       fragment_recvr_set_.lower_bound(make_pair(finst_id, 0));

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/krpc-data-stream-recvr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-recvr.cc b/be/src/runtime/krpc-data-stream-recvr.cc
index 6e47bd6..be51f32 100644
--- a/be/src/runtime/krpc-data-stream-recvr.cc
+++ b/be/src/runtime/krpc-data-stream-recvr.cc
@@ -232,7 +232,8 @@ Status KrpcDataStreamRecvr::SenderQueue::GetBatch(RowBatch** next_batch) {
       // is pending insertion so this thread is guaranteed to wake up at some point.
       DCHECK(deferred_rpcs_.empty() ||
           (num_deserialize_tasks_pending_ + num_pending_enqueue_) > 0);
-      VLOG_ROW << "wait arrival fragment_instance_id=" << recvr_->fragment_instance_id()
+      VLOG_ROW << "wait arrival fragment_instance_id="
+               << PrintId(recvr_->fragment_instance_id())
                << " node=" << recvr_->dest_node_id();
       // Don't count time spent waiting on the sender as active time.
       CANCEL_SAFE_SCOPED_TIMER(recvr_->data_wait_timer_, &is_cancelled_);
@@ -534,7 +535,7 @@ void KrpcDataStreamRecvr::SenderQueue::DecrementSenders() {
   DCHECK_GT(num_remaining_senders_, 0);
   num_remaining_senders_ = max(0, num_remaining_senders_ - 1);
   VLOG_FILE << "decremented senders: fragment_instance_id="
-            << recvr_->fragment_instance_id()
+            << PrintId(recvr_->fragment_instance_id())
             << " node_id=" << recvr_->dest_node_id()
             << " #senders=" << num_remaining_senders_;
   if (num_remaining_senders_ == 0) data_arrival_cv_.notify_one();
@@ -555,7 +556,7 @@ void KrpcDataStreamRecvr::SenderQueue::Cancel() {
     }
   }
   VLOG_QUERY << "cancelled stream: fragment_instance_id="
-             << recvr_->fragment_instance_id()
+             << PrintId(recvr_->fragment_instance_id())
              << " node_id=" << recvr_->dest_node_id();
   // Wake up all threads waiting to produce/consume batches. They will all
   // notice that the stream is cancelled and handle it.

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/krpc-data-stream-sender.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc
index 0f11dec..cd30f06 100644
--- a/be/src/runtime/krpc-data-stream-sender.cc
+++ b/be/src/runtime/krpc-data-stream-sender.cc
@@ -333,7 +333,7 @@ Status KrpcDataStreamSender::Channel::WaitForRpc(std::unique_lock<SpinLock>* loc
   DCHECK(!rpc_in_flight_);
   if (UNLIKELY(!rpc_status_.ok())) {
     LOG(ERROR) << "channel send to " << TNetworkAddressToString(address_) << " failed: "
-               << "(fragment_instance_id=" << fragment_instance_id_ << "): "
+               << "(fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
     return rpc_status_;
   }
@@ -449,8 +449,8 @@ Status KrpcDataStreamSender::Channel::DoTransmitDataRpc() {
 
 Status KrpcDataStreamSender::Channel::TransmitData(
     const OutboundRowBatch* outbound_batch) {
-  VLOG_ROW << "Channel::TransmitData() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_ROW << "Channel::TransmitData() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows=" << outbound_batch->header()->num_rows();
   std::unique_lock<SpinLock> l(lock_);
   RETURN_IF_ERROR(WaitForRpc(&l));
@@ -529,8 +529,8 @@ Status KrpcDataStreamSender::Channel::DoEndDataStreamRpc() {
 }
 
 Status KrpcDataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
-  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows= " << batch_->num_rows();
 
   // We can return an error here and not go on to send the EOS RPC because the error that
@@ -544,7 +544,7 @@ Status KrpcDataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
     DCHECK(rpc_status_.ok());
     if (UNLIKELY(remote_recvr_closed_)) return Status::OK();
     VLOG_RPC << "calling EndDataStream() to terminate channel. fragment_instance_id="
-             << fragment_instance_id_;
+             << PrintId(fragment_instance_id_);
     rpc_in_flight_ = true;
     COUNTER_ADD(parent_->eos_sent_counter_, 1);
     RETURN_IF_ERROR(DoEndDataStreamRpc());

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/mem-tracker.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-tracker.cc b/be/src/runtime/mem-tracker.cc
index e5aa290..96c02a2 100644
--- a/be/src/runtime/mem-tracker.cc
+++ b/be/src/runtime/mem-tracker.cc
@@ -196,7 +196,7 @@ MemTracker* MemTracker::CreateQueryMemTracker(const TUniqueId& id,
       ExecEnv::GetInstance()->pool_mem_trackers()->GetRequestPoolMemTracker(
           pool_name, true);
   MemTracker* tracker = obj_pool->Add(new MemTracker(
-      byte_limit, Substitute("Query($0)", lexical_cast<string>(id)), pool_tracker));
+      byte_limit, Substitute("Query($0)", PrintId(id)), pool_tracker));
   tracker->is_query_mem_tracker_ = true;
   tracker->query_id_ = id;
   return tracker;
@@ -370,7 +370,7 @@ Status MemTracker::MemLimitExceeded(RuntimeState* state, const std::string& deta
        << " without exceeding limit." << endl;
   }
   ss << "Error occurred on backend " << GetBackendString();
-  if (state != nullptr) ss << " by fragment " << state->fragment_instance_id();
+  if (state != nullptr) ss << " by fragment " << PrintId(state->fragment_instance_id());
   ss << endl;
   ExecEnv* exec_env = ExecEnv::GetInstance();
   MemTracker* process_tracker = exec_env->process_mem_tracker();

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/query-exec-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-exec-mgr.cc b/be/src/runtime/query-exec-mgr.cc
index 967dc4b..2d66f57 100644
--- a/be/src/runtime/query-exec-mgr.cc
+++ b/be/src/runtime/query-exec-mgr.cc
@@ -44,7 +44,7 @@ DEFINE_int32(log_mem_usage_interval, 0, "If non-zero, impalad will output memory
 Status QueryExecMgr::StartQuery(const TExecQueryFInstancesParams& params) {
   TUniqueId query_id = params.query_ctx.query_id;
   VLOG_QUERY << "StartQueryFInstances() query_id=" << PrintId(query_id)
-             << " coord=" << params.query_ctx.coord_address;
+             << " coord=" << TNetworkAddressToString(params.query_ctx.coord_address);
 
   bool dummy;
   QueryState* qs = GetOrCreateQueryState(params.query_ctx, &dummy);
@@ -92,7 +92,7 @@ QueryState* QueryExecMgr::GetQueryState(const TUniqueId& query_id) {
     refcnt = qs->refcnt_.Add(1);
   }
   DCHECK(qs != nullptr && refcnt > 0);
-  VLOG_QUERY << "QueryState: query_id=" << query_id << " refcnt=" << refcnt;
+  VLOG_QUERY << "QueryState: query_id=" << PrintId(query_id) << " refcnt=" << refcnt;
   return qs;
 }
 
@@ -167,7 +167,7 @@ void QueryExecMgr::ReleaseQueryState(QueryState* qs) {
     // someone else might have gc'd the entry
     if (it == map_ref->end()) return;
     qs_from_map = it->second;
-    DCHECK_EQ(qs_from_map->query_ctx().query_id, query_id);
+    DCHECK(qs_from_map->query_ctx().query_id == query_id);
     int32_t cnt = qs_from_map->refcnt_.Load();
     DCHECK_GE(cnt, 0);
     // someone else might have increased the refcnt in the meantime
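A note on the DCHECK_EQ -> DCHECK change in this hunk (an inference, not stated in the commit message): glog's *_EQ check macros stream both operands into their failure message, so they require operator<< for the operand type. With the ostream operator for TUniqueId removed by this patch, only the plain comparison form still compiles; the same reasoning would apply to the DCHECK_EQ on TNetworkAddress in scheduler.cc further down. In short:

    DCHECK_EQ(qs_from_map->query_ctx().query_id, query_id);  // needs operator<<(ostream&, const TUniqueId&) for the failure message
    DCHECK(qs_from_map->query_ctx().query_id == query_id);   // only needs operator==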

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/query-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-state.cc b/be/src/runtime/query-state.cc
index 04a4283..22616ed 100644
--- a/be/src/runtime/query-state.cc
+++ b/be/src/runtime/query-state.cc
@@ -413,7 +413,7 @@ void QueryState::ExecFInstance(FragmentInstanceState* fis) {
 }
 
 void QueryState::Cancel() {
-  VLOG_QUERY << "Cancel: query_id=" << query_id();
+  VLOG_QUERY << "Cancel: query_id=" << PrintId(query_id());
   (void) instances_prepared_promise_.Get();
   if (!is_cancelled_.CompareAndSwap(0, 1)) return;
   for (auto entry: fis_map_) entry.second->Cancel();

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/runtime-filter-bank.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/runtime-filter-bank.cc b/be/src/runtime/runtime-filter-bank.cc
index 4e23a42..64638a6 100644
--- a/be/src/runtime/runtime-filter-bank.cc
+++ b/be/src/runtime/runtime-filter-bank.cc
@@ -263,7 +263,8 @@ void RuntimeFilterBank::Close() {
   obj_pool_.Clear();
   mem_pool_.FreeAll();
   if (buffer_pool_client_.is_registered()) {
-    VLOG_FILE << "RuntimeFilterBank (Fragment Id: " << state_->fragment_instance_id()
+    VLOG_FILE << "RuntimeFilterBank (Fragment Id: "
+              << PrintId(state_->fragment_instance_id())
               << ") returning reservation " << total_bloom_filter_mem_required_;
     state_->query_state()->initial_reservations()->Return(
         &buffer_pool_client_, total_bloom_filter_mem_required_);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/runtime/runtime-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/runtime-state.cc b/be/src/runtime/runtime-state.cc
index 4b39ec8..29ea737 100644
--- a/be/src/runtime/runtime-state.cc
+++ b/be/src/runtime/runtime-state.cc
@@ -167,7 +167,7 @@ bool RuntimeState::LogError(const ErrorMsg& message, int vlog_level) {
   // All errors go to the log, unreported_error_count_ is counted independently of the
   // size of the error_log to account for errors that were already reported to the
   // coordinator
-  VLOG(vlog_level) << "Error from query " << query_id() << ": " << message.msg();
+  VLOG(vlog_level) << "Error from query " << PrintId(query_id()) << ": " << message.msg();
   if (ErrorCount(error_log_) < query_options().max_errors) {
     AppendError(&error_log_, message);
     return true;
@@ -239,7 +239,7 @@ void RuntimeState::ReleaseResources() {
 
   // No more memory should be tracked for this instance at this point.
   if (instance_mem_tracker_->consumption() != 0) {
-    LOG(WARNING) << "Query " << query_id() << " may have leaked memory." << endl
+    LOG(WARNING) << "Query " << PrintId(query_id()) << " may have leaked memory." << endl
                  << instance_mem_tracker_->LogUsage(MemTracker::UNLIMITED_DEPTH);
   }
   instance_mem_tracker_->Close();

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/scheduling/admission-controller.cc
----------------------------------------------------------------------
diff --git a/be/src/scheduling/admission-controller.cc b/be/src/scheduling/admission-controller.cc
index 640a6af..7cdcd02 100644
--- a/be/src/scheduling/admission-controller.cc
+++ b/be/src/scheduling/admission-controller.cc
@@ -505,7 +505,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     pool_config_map_[pool_name] = pool_cfg;
     PoolStats* stats = GetPoolStats(pool_name);
     stats->UpdateConfigMetrics(pool_cfg);
-    VLOG_QUERY << "Schedule for id=" << schedule->query_id() << " in pool_name="
+    VLOG_QUERY << "Schedule for id=" << PrintId(schedule->query_id()) << " in pool_name="
                << pool_name << " cluster_mem_needed="
                << PrintBytes(schedule->GetClusterMemoryEstimate())
                << " PoolConfig: max_requests=" << max_requests << " max_queued="
@@ -526,7 +526,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
 
     if (CanAdmitRequest(*schedule, pool_cfg, false, &not_admitted_reason)) {
       DCHECK_EQ(stats->local_stats().num_queued, 0);
-      VLOG_QUERY << "Admitted query id=" << schedule->query_id();
+      VLOG_QUERY << "Admitted query id=" << PrintId(schedule->query_id());
       stats->Admit(*schedule);
       UpdateHostMemAdmitted(*schedule, schedule->GetPerHostMemoryEstimate());
       schedule->set_is_admitted(true);
@@ -537,7 +537,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     }
 
     // We cannot immediately admit but do not need to reject, so queue the request
-    VLOG_QUERY << "Queuing, query id=" << schedule->query_id();
+    VLOG_QUERY << "Queuing, query id=" << PrintId(schedule->query_id());
     stats->Queue(*schedule);
     queue->Enqueue(&queue_node);
   }
@@ -600,7 +600,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     schedule->set_is_admitted(true);
     schedule->summary_profile()->AddInfoString(PROFILE_INFO_KEY_ADMISSION_RESULT,
         PROFILE_INFO_VAL_ADMIT_QUEUED);
-    VLOG_QUERY << "Admitted queued query id=" << schedule->query_id();
+    VLOG_QUERY << "Admitted queued query id=" << PrintId(schedule->query_id());
     VLOG_RPC << "Final: " << stats->DebugString();
     return Status::OK();
   }
@@ -615,7 +615,7 @@ void AdmissionController::ReleaseQuery(const QuerySchedule& schedule) {
     stats->Release(schedule);
     UpdateHostMemAdmitted(schedule, -schedule.GetPerHostMemoryEstimate());
     pools_for_updates_.insert(pool_name);
-    VLOG_RPC << "Released query id=" << schedule.query_id() << " "
+    VLOG_RPC << "Released query id=" << PrintId(schedule.query_id()) << " "
              << stats->DebugString();
   }
   dequeue_cv_.NotifyOne();
@@ -875,11 +875,11 @@ void AdmissionController::DequeueLoop() {
         // TODO: Requests further in the queue may be blocked unnecessarily. Consider a
         // better policy once we have better test scenarios.
         if (!CanAdmitRequest(schedule, pool_config, true, &not_admitted_reason)) {
-          VLOG_RPC << "Could not dequeue query id=" << schedule.query_id()
+          VLOG_RPC << "Could not dequeue query id=" << PrintId(schedule.query_id())
                    << " reason: " << not_admitted_reason;
           break;
         }
-        VLOG_RPC << "Dequeuing query=" << schedule.query_id();
+        VLOG_RPC << "Dequeuing query=" << PrintId(schedule.query_id());
         queue.Dequeue();
         stats->Dequeue(schedule, false);
         stats->Admit(schedule);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/scheduling/scheduler.cc
----------------------------------------------------------------------
diff --git a/be/src/scheduling/scheduler.cc b/be/src/scheduling/scheduler.cc
index b091415..d8aa88b 100644
--- a/be/src/scheduling/scheduler.cc
+++ b/be/src/scheduling/scheduler.cc
@@ -172,7 +172,7 @@ void Scheduler::UpdateMembership(
       // adds the IP address to local_backend_descriptor_. If it is empty, then either
       // that code has been changed, or someone else is sending malformed packets.
       VLOG(1) << "Ignoring subscription request with empty IP address from subscriber: "
-              << be_desc.address;
+              << TNetworkAddressToString(be_desc.address);
       continue;
     }
     if (item.key == local_backend_id_
@@ -181,9 +181,8 @@ void Scheduler::UpdateMembership(
       // will try to re-register (i.e. overwrite their subscription), but there is
       // likely a configuration problem.
       LOG_EVERY_N(WARNING, 30) << "Duplicate subscriber registration from address: "
-                               << be_desc.address
-                               << " (we are: " << local_backend_descriptor_.address
-                               << ")";
+           << TNetworkAddressToString(be_desc.address) << " (we are: "
+           << TNetworkAddressToString(local_backend_descriptor_.address) << ")";
       continue;
     }
     if (be_desc.is_executor) {
@@ -216,7 +215,7 @@ const TBackendDescriptor& Scheduler::LookUpBackendDesc(
   const TBackendDescriptor* desc = executor_config.LookUpBackendDesc(host);
   if (desc == nullptr) {
     // Local host may not be in executor_config if it's a dedicated coordinator.
-    DCHECK_EQ(host, local_backend_descriptor_.address);
+    DCHECK(host == local_backend_descriptor_.address);
     DCHECK(!local_backend_descriptor_.is_executor);
     desc = &local_backend_descriptor_;
   }
@@ -724,7 +723,7 @@ void Scheduler::ComputeBackendExecParams(QuerySchedule* schedule) {
 
   stringstream min_reservation_ss;
   for (const auto& e: per_backend_params) {
-    min_reservation_ss << e.first << "("
+    min_reservation_ss << TNetworkAddressToString(e.first) << "("
          << PrettyPrinter::Print(e.second.min_reservation_bytes, TUnit::BYTES)
          << ") ";
   }
@@ -900,7 +899,8 @@ void Scheduler::AssignmentCtx::RecordScanRangeAssignment(
   scan_range_params_list->push_back(scan_range_params);
 
   if (VLOG_FILE_IS_ON) {
-    VLOG_FILE << "Scheduler assignment to executor: " << executor.address << "("
+    VLOG_FILE << "Scheduler assignment to executor: "
+              << TNetworkAddressToString(executor.address) << "("
               << (remote_read ? "remote" : "local") << " selection)";
   }
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/child-query.cc
----------------------------------------------------------------------
diff --git a/be/src/service/child-query.cc b/be/src/service/child-query.cc
index ac4c6be..2c5b316 100644
--- a/be/src/service/child-query.cc
+++ b/be/src/service/child-query.cc
@@ -136,7 +136,8 @@ void ChildQuery::Cancel() {
   Status status = ImpalaServer::THandleIdentifierToTUniqueId(hs2_handle_.operationId,
       &session_id, &secret_unused);
   if (status.ok()) {
-    VLOG_QUERY << "Cancelling and closing child query with operation id: " << session_id;
+    VLOG_QUERY << "Cancelling and closing child query with operation id: " <<
+        PrintId(session_id);
   } else {
     VLOG_QUERY << "Cancelling and closing child query. Failed to get query id: " <<
         status;

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/client-request-state.cc
----------------------------------------------------------------------
diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index f58eeb8..5186c6f 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -91,7 +91,7 @@ ClientRequestState::ClientRequestState(
 
   profile_->set_name("Query (id=" + PrintId(query_id()) + ")");
   summary_profile_->AddInfoString("Session ID", PrintId(session_id()));
-  summary_profile_->AddInfoString("Session Type", PrintTSessionType(session_type()));
+  summary_profile_->AddInfoString("Session Type", PrintThriftEnum(session_type()));
   if (session_type() == TSessionType::HIVESERVER2) {
     summary_profile_->AddInfoString("HiveServer2 Protocol Version",
         Substitute("V$0", 1 + session->hs2_version));
@@ -102,14 +102,14 @@ ClientRequestState::ClientRequestState(
       TimePrecision::Nanosecond));
   summary_profile_->AddInfoString("End Time", "");
   summary_profile_->AddInfoString("Query Type", "N/A");
-  summary_profile_->AddInfoString("Query State", PrintQueryState(BeeswaxQueryState()));
+  summary_profile_->AddInfoString("Query State", PrintThriftEnum(BeeswaxQueryState()));
   summary_profile_->AddInfoString("Query Status", "OK");
   summary_profile_->AddInfoString("Impala Version", GetVersionString(/* compact */ true));
   summary_profile_->AddInfoString("User", effective_user());
   summary_profile_->AddInfoString("Connected User", connected_user());
   summary_profile_->AddInfoString("Delegated User", do_as_user());
   summary_profile_->AddInfoString("Network Address",
-      lexical_cast<string>(session_->network_address));
+      TNetworkAddressToString(session_->network_address));
   summary_profile_->AddInfoString("Default Db", default_db());
   summary_profile_->AddInfoStringRedacted(
       "Sql Statement", query_ctx_.client_request.stmt);
@@ -140,7 +140,7 @@ Status ClientRequestState::Exec(TExecRequest* exec_request) {
   exec_request_ = *exec_request;
 
   profile_->AddChild(server_profile_);
-  summary_profile_->AddInfoString("Query Type", PrintTStmtType(stmt_type()));
+  summary_profile_->AddInfoString("Query Type", PrintThriftEnum(stmt_type()));
   summary_profile_->AddInfoString("Query Options (set by configuration)",
       DebugQueryOptions(query_ctx_.client_request.query_options));
   summary_profile_->AddInfoString("Query Options (set by configuration and planner)",
@@ -489,7 +489,7 @@ Status ClientRequestState::ExecQueryOrDmlRequest(
 
 Status ClientRequestState::ExecDdlRequest() {
   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
-      PrintTDdlType(ddl_type()) : PrintTCatalogOpType(catalog_op_type());
+      PrintThriftEnum(ddl_type()) : PrintThriftEnum(catalog_op_type());
   summary_profile_->AddInfoString("DDL Type", op_type);
 
   if (catalog_op_type() != TCatalogOpType::DDL &&
@@ -575,7 +575,7 @@ void ClientRequestState::Done() {
     uint64_t latest_kudu_ts =
         coord_->dml_exec_state()->GetKuduLatestObservedTimestamp();
     if (latest_kudu_ts > 0) {
-      VLOG_RPC << "Updating session (id=" << session_id()  << ") with latest "
+      VLOG_RPC << "Updating session (id=" << PrintId(session_id())  << ") with latest "
                << "observed Kudu timestamp: " << latest_kudu_ts;
       lock_guard<mutex> session_lock(session_->lock);
       session_->kudu_latest_observed_ts = std::max<uint64_t>(
@@ -599,7 +599,7 @@ void ClientRequestState::Done() {
 Status ClientRequestState::Exec(const TMetadataOpRequest& exec_request) {
   TResultSet metadata_op_result;
   // Like the other Exec(), fill out as much profile information as we're able to.
-  summary_profile_->AddInfoString("Query Type", PrintTStmtType(TStmtType::DDL));
+  summary_profile_->AddInfoString("Query Type", PrintThriftEnum(TStmtType::DDL));
   RETURN_IF_ERROR(frontend_->ExecHiveServer2MetadataOp(exec_request,
       &metadata_op_result));
   result_metadata_ = metadata_op_result.schema;
@@ -926,7 +926,7 @@ Status ClientRequestState::UpdateCatalog() {
     catalog_update.header.__set_requesting_user(effective_user());
     if (!coord()->dml_exec_state()->PrepareCatalogUpdate(&catalog_update)) {
       VLOG_QUERY << "No partitions altered, not updating metastore (query id: "
-                 << query_id() << ")";
+                 << PrintId(query_id()) << ")";
     } else {
       // TODO: We track partitions written to, not created, which means
       // that we do more work than is necessary, because written-to
@@ -1110,7 +1110,7 @@ void ClientRequestState::ClearResultCache() {
 void ClientRequestState::UpdateOperationState(
     TOperationState::type operation_state) {
   operation_state_ = operation_state;
-  summary_profile_->AddInfoString("Query State", PrintQueryState(BeeswaxQueryState()));
+  summary_profile_->AddInfoString("Query State", PrintThriftEnum(BeeswaxQueryState()));
 }
 
 beeswax::QueryState::type ClientRequestState::BeeswaxQueryState() const {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/impala-beeswax-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-beeswax-server.cc b/be/src/service/impala-beeswax-server.cc
index 4875adb..1096677 100644
--- a/be/src/service/impala-beeswax-server.cc
+++ b/be/src/service/impala-beeswax-server.cc
@@ -286,7 +286,7 @@ void ImpalaServer::get_log(string& log, const LogContextId& context) {
   shared_ptr<ClientRequestState> request_state = GetClientRequestState(query_id);
   if (request_state.get() == nullptr) {
     stringstream str;
-    str << "unknown query id: " << query_id;
+    str << "unknown query id: " << PrintId(query_id);
     LOG(ERROR) << str.str();
     return;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/impala-hs2-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-hs2-server.cc b/be/src/service/impala-hs2-server.cc
index 765fccf..36c7169 100644
--- a/be/src/service/impala-hs2-server.cc
+++ b/be/src/service/impala-hs2-server.cc
@@ -338,8 +338,8 @@ void ImpalaServer::OpenSession(TOpenSessionResp& return_val,
   TQueryOptionsToMap(state->QueryOptions(), &return_val.configuration);
 
   // OpenSession() should return the coordinator's HTTP server address.
-  const string& http_addr = lexical_cast<string>(
-      MakeNetworkAddress(FLAGS_hostname, FLAGS_webserver_port));
+  const string& http_addr = TNetworkAddressToString(MakeNetworkAddress(
+      FLAGS_hostname, FLAGS_webserver_port));
   return_val.configuration.insert(make_pair("http_addr", http_addr));
 
   // Put the session state in session_state_map_

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/impala-http-handler.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc
index 9b8d597..7be4370 100644
--- a/be/src/service/impala-http-handler.cc
+++ b/be/src/service/impala-http-handler.cc
@@ -198,7 +198,7 @@ void ImpalaHttpHandler::CloseSessionHandler(const Webserver::ArgumentMap& args,
     return;
   }
   stringstream ss;
-  ss << "Session " << unique_id << " closed successfully";
+  ss << "Session " << PrintId(unique_id) << " closed successfully";
   Value message(ss.str().c_str(), document->GetAllocator());
   document->AddMember("contents", message, document->GetAllocator());
 }
@@ -250,7 +250,7 @@ void ImpalaHttpHandler::InflightQueryIdsHandler(const Webserver::ArgumentMap& ar
   stringstream ss;
   server_->client_request_state_map_.DoFuncForAllEntries(
       [&](const std::shared_ptr<ClientRequestState>& request_state) {
-          ss << request_state->query_id() << "\n";
+          ss << PrintId(request_state->query_id()) << "\n";
       });
   document->AddMember(Webserver::ENABLE_RAW_JSON_KEY, true, document->GetAllocator());
   Value query_ids(ss.str().c_str(), document->GetAllocator());
@@ -419,7 +419,7 @@ void ImpalaHttpHandler::QueryStateHandler(const Webserver::ArgumentMap& args,
     for (const ImpalaServer::QueryLocations::value_type& location:
          server_->query_locations_) {
       Value location_json(kObjectType);
-      Value location_name(lexical_cast<string>(location.first).c_str(),
+      Value location_name(TNetworkAddressToString(location.first).c_str(),
           document->GetAllocator());
       location_json.AddMember("location", location_name, document->GetAllocator());
       location_json.AddMember("count", static_cast<uint64_t>(location.second.size()),
@@ -440,8 +440,7 @@ void ImpalaHttpHandler::SessionsHandler(const Webserver::ArgumentMap& args,
            server_->session_state_map_) {
     shared_ptr<ImpalaServer::SessionState> state = session.second;
     Value session_json(kObjectType);
-    Value type(PrintTSessionType(state->session_type).c_str(),
-        document->GetAllocator());
+    Value type(PrintThriftEnum(state->session_type).c_str(), document->GetAllocator());
     session_json.AddMember("type", type, document->GetAllocator());
 
     session_json.AddMember("inflight_queries",
@@ -459,7 +458,7 @@ void ImpalaHttpHandler::SessionsHandler(const Webserver::ArgumentMap& args,
     Value session_id(PrintId(session.first).c_str(), document->GetAllocator());
     session_json.AddMember("session_id", session_id, document->GetAllocator());
 
-    Value network_address(lexical_cast<string>(state->network_address).c_str(),
+    Value network_address(TNetworkAddressToString(state->network_address).c_str(),
         document->GetAllocator());
     session_json.AddMember("network_address", network_address, document->GetAllocator());
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/impala-internal-service.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-internal-service.cc b/be/src/service/impala-internal-service.cc
index 5be8765..53a62da 100644
--- a/be/src/service/impala-internal-service.cc
+++ b/be/src/service/impala-internal-service.cc
@@ -41,7 +41,8 @@ ImpalaInternalService::ImpalaInternalService() {
 
 void ImpalaInternalService::ExecQueryFInstances(TExecQueryFInstancesResult& return_val,
     const TExecQueryFInstancesParams& params) {
-  VLOG_QUERY << "ExecQueryFInstances():" << " query_id=" << params.query_ctx.query_id;
+  VLOG_QUERY << "ExecQueryFInstances():" << " query_id=" <<
+      PrintId(params.query_ctx.query_id);
   FAULT_INJECTION_RPC_DELAY(RPC_EXECQUERYFINSTANCES);
   DCHECK(params.__isset.coord_state_idx);
   DCHECK(params.__isset.query_ctx);
@@ -53,14 +54,14 @@ void ImpalaInternalService::ExecQueryFInstances(TExecQueryFInstancesResult& retu
 template <typename T> void SetUnknownIdError(
     const string& id_type, const TUniqueId& id, T* status_container) {
   Status status(ErrorMsg(TErrorCode::INTERNAL_ERROR,
-      Substitute("Unknown $0 id: $1", id_type, lexical_cast<string>(id))));
+      Substitute("Unknown $0 id: $1", id_type, PrintId(id))));
   status.SetTStatus(status_container);
 }
 
 void ImpalaInternalService::CancelQueryFInstances(
     TCancelQueryFInstancesResult& return_val,
     const TCancelQueryFInstancesParams& params) {
-  VLOG_QUERY << "CancelQueryFInstances(): query_id=" << params.query_id;
+  VLOG_QUERY << "CancelQueryFInstances(): query_id=" << PrintId(params.query_id);
   FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES);
   DCHECK(params.__isset.query_id);
   QueryState::ScopedRef qs(params.query_id);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/impala-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc
index f6cd6e5..a15f7e7 100644
--- a/be/src/service/impala-server.cc
+++ b/be/src/service/impala-server.cc
@@ -487,16 +487,16 @@ Status ImpalaServer::LogAuditRecord(const ClientRequestState& request_state,
   if (request.stmt_type == TStmtType::DDL) {
     if (request.catalog_op_request.op_type == TCatalogOpType::DDL) {
       writer.String(
-          PrintTDdlType(request.catalog_op_request.ddl_params.ddl_type).c_str());
+          PrintThriftEnum(request.catalog_op_request.ddl_params.ddl_type).c_str());
     } else {
-      writer.String(PrintTCatalogOpType(request.catalog_op_request.op_type).c_str());
+      writer.String(PrintThriftEnum(request.catalog_op_request.op_type).c_str());
     }
   } else {
-    writer.String(PrintTStmtType(request.stmt_type).c_str());
+    writer.String(PrintThriftEnum(request.stmt_type).c_str());
   }
   writer.String("network_address");
-  writer.String(
-      lexical_cast<string>(request_state.session()->network_address).c_str());
+  writer.String(TNetworkAddressToString(
+      request_state.session()->network_address).c_str());
   writer.String("sql_statement");
   string stmt = replace_all_copy(request_state.sql_stmt(), "\n", " ");
   Redact(&stmt);
@@ -508,7 +508,7 @@ Status ImpalaServer::LogAuditRecord(const ClientRequestState& request_state,
     writer.String("name");
     writer.String(event.name.c_str());
     writer.String("object_type");
-    writer.String(PrintTCatalogObjectType(event.object_type).c_str());
+    writer.String(PrintThriftEnum(event.object_type).c_str());
     writer.String("privilege");
     writer.String(event.privilege.c_str());
     writer.EndObject();
@@ -753,7 +753,7 @@ void ImpalaServer::ArchiveQuery(const ClientRequestState& query) {
   // FLAGS_log_query_to_file will have been set to false
   if (FLAGS_log_query_to_file) {
     stringstream ss;
-    ss << UnixMillis() << " " << query.query_id() << " " << encoded_profile_str;
+    ss << UnixMillis() << " " << PrintId(query.query_id()) << " " << encoded_profile_str;
     status = profile_logger_->AppendEntry(ss.str());
     if (!status.ok()) {
       LOG_EVERY_N(WARNING, 1000) << "Could not write to profile log file file ("
@@ -792,7 +792,7 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx,
   Status status = exec_env_->request_pool_service()->ResolveRequestPool(*ctx,
       &resolved_pool);
   if (!status.ok()) {
-    VLOG_RPC << "Not adding pool query options for query=" << ctx->query_id
+    VLOG_RPC << "Not adding pool query options for query=" << PrintId(ctx->query_id)
              << " ResolveRequestPool status: " << status.GetDetail();
     return;
   }
@@ -801,7 +801,7 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx,
   TPoolConfig config;
   status = exec_env_->request_pool_service()->GetPoolConfig(resolved_pool, &config);
   if (!status.ok()) {
-    VLOG_RPC << "Not adding pool query options for query=" << ctx->query_id
+    VLOG_RPC << "Not adding pool query options for query=" << PrintId(ctx->query_id)
              << " GetConfigPool status: " << status.GetDetail();
     return;
   }
@@ -1007,7 +1007,7 @@ Status ImpalaServer::SetQueryInflight(shared_ptr<SessionState> session_state,
 
 Status ImpalaServer::UnregisterQuery(const TUniqueId& query_id, bool check_inflight,
     const Status* cause) {
-  VLOG_QUERY << "UnregisterQuery(): query_id=" << query_id;
+  VLOG_QUERY << "UnregisterQuery(): query_id=" << PrintId(query_id);
 
   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause));
 
@@ -1201,7 +1201,7 @@ void ImpalaServer::ReportExecStatus(
 
 void ImpalaServer::TransmitData(
     TTransmitDataResult& return_val, const TTransmitDataParams& params) {
-  VLOG_ROW << "TransmitData(): instance_id=" << params.dest_fragment_instance_id
+  VLOG_ROW << "TransmitData(): instance_id=" << PrintId(params.dest_fragment_instance_id)
            << " node_id=" << params.dest_node_id
            << " #rows=" << params.row_batch.num_rows
            << " sender_id=" << params.sender_id
@@ -1301,14 +1301,14 @@ void ImpalaServer::CancelFromThreadPool(uint32_t thread_id,
     Status status = UnregisterQuery(cancellation_work.query_id(), true,
         &cancellation_work.cause());
     if (!status.ok()) {
-      VLOG_QUERY << "Query de-registration (" << cancellation_work.query_id()
+      VLOG_QUERY << "Query de-registration (" << PrintId(cancellation_work.query_id())
                  << ") failed";
     }
   } else {
     Status status = CancelInternal(cancellation_work.query_id(), true,
         &cancellation_work.cause());
     if (!status.ok()) {
-      VLOG_QUERY << "Query cancellation (" << cancellation_work.query_id()
+      VLOG_QUERY << "Query cancellation (" << PrintId(cancellation_work.query_id())
                  << ") did not succeed: " << status.GetDetail();
     }
   }
@@ -1623,7 +1623,7 @@ void ImpalaServer::MembershipCallback(
         stringstream cause_msg;
         cause_msg << "Cancelled due to unreachable impalad(s): ";
         for (int i = 0; i < cancellation_entry->second.size(); ++i) {
-          cause_msg << cancellation_entry->second[i];
+          cause_msg << TNetworkAddressToString(cancellation_entry->second[i]);
           if (i + 1 != cancellation_entry->second.size()) cause_msg << ", ";
         }
         string cause_str = cause_msg.str();
@@ -1788,13 +1788,15 @@ void ImpalaServer::ConnectionEnd(
     connection_to_sessions_map_.erase(it);
   }
 
-  LOG(INFO) << "Connection from client " << connection_context.network_address
-            << " closed, closing " << sessions_to_close.size() << " associated session(s)";
+  LOG(INFO) << "Connection from client "
+            << TNetworkAddressToString(connection_context.network_address)
+            << " closed, closing " << sessions_to_close.size()
+            << " associated session(s)";
 
   for (const TUniqueId& session_id: sessions_to_close) {
     Status status = CloseSessionInternal(session_id, true);
     if (!status.ok()) {
-      LOG(WARNING) << "Error closing session " << session_id << ": "
+      LOG(WARNING) << "Error closing session " << PrintId(session_id) << ": "
                    << status.GetDetail();
     }
   }
@@ -1848,7 +1850,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
         int64_t last_accessed_ms = session_state.second->last_accessed_ms;
         int64_t session_timeout_ms = session_state.second->session_timeout * 1000;
         if (now - last_accessed_ms <= session_timeout_ms) continue;
-        LOG(INFO) << "Expiring session: " << session_state.first << ", user:"
+        LOG(INFO) << "Expiring session: " << PrintId(session_state.first) << ", user:"
                   << session_state.second->connected_user << ", last active: "
                   << ToStringFromUnixMillis(last_accessed_ms);
         session_state.second->expired = true;
@@ -1903,7 +1905,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
         // If the query time limit expired, we must cancel the query.
         if (expiration_event->kind == ExpirationKind::EXEC_TIME_LIMIT) {
           int32_t exec_time_limit_s = query_state->query_options().exec_time_limit_s;
-          VLOG_QUERY << "Expiring query " << expiration_event->query_id
+          VLOG_QUERY << "Expiring query " << PrintId(expiration_event->query_id)
                      << " due to execution time limit of " << exec_time_limit_s << "s.";
           const string& err_msg = Substitute(
               "Query $0 expired due to execution time limit of $1",
@@ -1946,7 +1948,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
           // Otherwise time to expire this query
           VLOG_QUERY
               << "Expiring query due to client inactivity: "
-              << expiration_event->query_id << ", last activity was at: "
+              << PrintId(expiration_event->query_id) << ", last activity was at: "
               << ToStringFromUnixMillis(query_state->last_active_ms());
           const string& err_msg = Substitute(
               "Query $0 expired due to client inactivity (timeout is $1)",
@@ -2021,7 +2023,7 @@ Status ImpalaServer::Start(int32_t thrift_be_port, int32_t beeswax_port,
 
   if (!FLAGS_is_coordinator) {
     LOG(INFO) << "Initialized executor Impala server on "
-              << ExecEnv::GetInstance()->backend_address();
+              << TNetworkAddressToString(ExecEnv::GetInstance()->backend_address());
   } else {
     // Initialize the client servers.
     boost::shared_ptr<ImpalaServer> handler = shared_from_this();
@@ -2079,7 +2081,7 @@ Status ImpalaServer::Start(int32_t thrift_be_port, int32_t beeswax_port,
     }
   }
   LOG(INFO) << "Initialized coordinator/executor Impala server on "
-      << ExecEnv::GetInstance()->backend_address();
+      << TNetworkAddressToString(ExecEnv::GetInstance()->backend_address());
 
   // Start the RPC services.
   RETURN_IF_ERROR(exec_env_->StartKrpcService());
@@ -2136,7 +2138,7 @@ void ImpalaServer::UpdateFilter(TUpdateFilterResult& result,
   shared_ptr<ClientRequestState> client_request_state =
       GetClientRequestState(params.query_id);
   if (client_request_state.get() == nullptr) {
-    LOG(INFO) << "Could not find client request state: " << params.query_id;
+    LOG(INFO) << "Could not find client request state: " << PrintId(params.query_id);
     return;
   }
   client_request_state->coord()->UpdateFilter(params);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/query-options-test.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options-test.cc b/be/src/service/query-options-test.cc
index a472f2a..7de5f4d 100644
--- a/be/src/service/query-options-test.cc
+++ b/be/src/service/query-options-test.cc
@@ -360,7 +360,7 @@ TEST(QueryOptions, MapOptionalDefaultlessToEmptyString) {
   EXPECT_EQ(map["COMPRESSION_CODEC"], "");
   EXPECT_EQ(map["MT_DOP"], "");
   // Has defaults
-  EXPECT_EQ(map["EXPLAIN_LEVEL"], "1");
+  EXPECT_EQ(map["EXPLAIN_LEVEL"], "STANDARD");
 }
 
 /// Overlay a with b. batch_size is set in both places.


http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/query-options.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 7395e24..9d52ae1 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -70,14 +70,29 @@ void impala::OverlayQueryOptions(const TQueryOptions& src, const QueryOptionsMas
 #undef REMOVED_QUERY_OPT_FN
 }
 
+// Choose different print function based on the type.
+// TODO: In thrift 0.11.0 operator << is implemented for enums and this indirection can be
+// removed.
+template<typename T, typename std::enable_if_t<std::is_enum<T>::value>* = nullptr>
+string PrintQueryOptionValue(const T& option) {
+  return PrintThriftEnum(option);
+}
+
+template<typename T, typename std::enable_if_t<std::is_integral<T>::value>* = nullptr>
+string PrintQueryOptionValue(const T& option)  {
+  return std::to_string(option);
+}
+
+const string& PrintQueryOptionValue(const std::string& option)  {
+  return option;
+}
+
 void impala::TQueryOptionsToMap(const TQueryOptions& query_options,
     map<string, string>* configuration) {
 #define QUERY_OPT_FN(NAME, ENUM, LEVEL)\
   {\
     if (query_options.__isset.NAME) { \
-      stringstream val;\
-      val << query_options.NAME;\
-      (*configuration)[#ENUM] = val.str();\
+      (*configuration)[#ENUM] = PrintQueryOptionValue(query_options.NAME); \
     } else { \
       (*configuration)[#ENUM] = ""; \
     }\
@@ -370,7 +385,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
         if (size < RuntimeFilterBank::MIN_BLOOM_FILTER_SIZE ||
             size > RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE) {
           return Status(Substitute("$0 is not a valid Bloom filter size for $1. "
-                  "Valid sizes are in [$2, $3].", value, PrintTImpalaQueryOptions(
+                  "Valid sizes are in [$2, $3].", value, PrintThriftEnum(
                       static_cast<TImpalaQueryOptions::type>(option)),
                   RuntimeFilterBank::MIN_BLOOM_FILTER_SIZE,
                   RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE));
@@ -382,7 +397,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
             && FLAGS_min_buffer_size <= RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE) {
           return Status(Substitute("$0 should not be less than $1 which is the minimum "
               "buffer size that can be allocated by the buffer pool",
-              PrintTImpalaQueryOptions(static_cast<TImpalaQueryOptions::type>(option)),
+              PrintThriftEnum(static_cast<TImpalaQueryOptions::type>(option)),
               FLAGS_min_buffer_size));
         }
         if (option == TImpalaQueryOptions::RUNTIME_BLOOM_FILTER_SIZE) {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/service/query-result-set.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-result-set.cc b/be/src/service/query-result-set.cc
index 8d00af5..aacd849 100644
--- a/be/src/service/query-result-set.cc
+++ b/be/src/service/query-result-set.cc
@@ -182,8 +182,8 @@ Status AsciiQueryResultSet::AddOneRow(const TResultRow& row) {
   out_stream.precision(ASCII_PRECISION);
   for (int i = 0; i < num_col; ++i) {
     // ODBC-187 - ODBC can only take "\t" as the delimiter
-    out_stream << (i > 0 ? "\t" : "");
-    out_stream << row.colVals[i];
+    if (i > 0) out_stream << '\t';
+    PrintTColumnValue(out_stream, row.colVals[i]);
   }
   result_set_->push_back(out_stream.str());
   return Status::OK();

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/statestore/statestore.cc
----------------------------------------------------------------------
diff --git a/be/src/statestore/statestore.cc b/be/src/statestore/statestore.cc
index 02363fe..5c1952d 100644
--- a/be/src/statestore/statestore.cc
+++ b/be/src/statestore/statestore.cc
@@ -460,7 +460,7 @@ void Statestore::SubscribersHandler(const Webserver::ArgumentMap& args,
     Value subscriber_id(subscriber.second->id().c_str(), document->GetAllocator());
     sub_json.AddMember("id", subscriber_id, document->GetAllocator());
 
-    Value address(lexical_cast<string>(subscriber.second->network_address()).c_str(),
+    Value address(TNetworkAddressToString(subscriber.second->network_address()).c_str(),
         document->GetAllocator());
     sub_json.AddMember("address", address, document->GetAllocator());
 
@@ -875,7 +875,7 @@ void Statestore::DoSubscriberUpdate(UpdateKind update_kind, int thread_id,
         // TODO: Consider if a metric to track the number of failures would be useful.
         LOG(INFO) << "Subscriber '" << subscriber->id() << "' has failed, disconnected "
                   << "or re-registered (last known registration ID: "
-                  << update.registration_id << ")";
+                  << PrintId(update.registration_id) << ")";
         UnregisterSubscriber(subscriber.get());
       } else {
         LOG(INFO) << "Failure was already detected for subscriber '" << subscriber->id()

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/collection-metrics.h
----------------------------------------------------------------------
diff --git a/be/src/util/collection-metrics.h b/be/src/util/collection-metrics.h
index 1081c33..79ae072 100644
--- a/be/src/util/collection-metrics.h
+++ b/be/src/util/collection-metrics.h
@@ -160,7 +160,7 @@ class StatsMetric : public Metric {
     boost::lock_guard<boost::mutex> l(lock_);
     rapidjson::Value container(rapidjson::kObjectType);
     AddStandardFields(document, &container);
-    rapidjson::Value units(PrintTUnit(unit_).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit_).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
 
     if (StatsSelection & StatsType::COUNT) {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/debug-util.cc
----------------------------------------------------------------------
diff --git a/be/src/util/debug-util.cc b/be/src/util/debug-util.cc
index 1cb61e4..edf1749 100644
--- a/be/src/util/debug-util.cc
+++ b/be/src/util/debug-util.cc
@@ -50,62 +50,33 @@ DECLARE_string(hostname);
 
 namespace impala {
 
-#define THRIFT_ENUM_OUTPUT_FN_IMPL(E, MAP) \
-  ostream& operator<<(ostream& os, const E::type& e) {\
-    map<int, const char*>::const_iterator i;\
-    i = MAP.find(e);\
-    if (i != MAP.end()) {\
-      os << i->second;\
-    }\
-    return os;\
+#define PRINT_THRIFT_ENUM_IMPL(T) \
+  string PrintThriftEnum(const T::type& value) { \
+    map<int, const char*>::const_iterator it = _##T##_VALUES_TO_NAMES.find(value); \
+    return it == _##T##_VALUES_TO_NAMES.end() ? std::to_string(value) : it->second; \
   }
 
-// Macro to stamp out operator<< for thrift enums.  Why doesn't thrift do this?
-#define THRIFT_ENUM_OUTPUT_FN(E) THRIFT_ENUM_OUTPUT_FN_IMPL(E , _##E##_VALUES_TO_NAMES)
-
-// Macro to implement Print function that returns string for thrift enums. Make sure you
-// define a corresponding THRIFT_ENUM_OUTPUT_FN.
-#define THRIFT_ENUM_PRINT_FN(E) \
-  string Print##E(const E::type& e) {\
-    stringstream ss;\
-    ss << e;\
-    return ss.str();\
-  }
-
-THRIFT_ENUM_OUTPUT_FN(TFunctionBinaryType);
-THRIFT_ENUM_OUTPUT_FN(TCatalogObjectType);
-THRIFT_ENUM_OUTPUT_FN(TDdlType);
-THRIFT_ENUM_OUTPUT_FN(TCatalogOpType);
-THRIFT_ENUM_OUTPUT_FN(THdfsFileFormat);
-THRIFT_ENUM_OUTPUT_FN(THdfsCompression);
-THRIFT_ENUM_OUTPUT_FN(TReplicaPreference);
-THRIFT_ENUM_OUTPUT_FN(TSessionType);
-THRIFT_ENUM_OUTPUT_FN(TStmtType);
-THRIFT_ENUM_OUTPUT_FN(QueryState);
-THRIFT_ENUM_OUTPUT_FN(Encoding);
-THRIFT_ENUM_OUTPUT_FN(CompressionCodec);
-THRIFT_ENUM_OUTPUT_FN(Type);
-THRIFT_ENUM_OUTPUT_FN(TMetricKind);
-THRIFT_ENUM_OUTPUT_FN(TUnit);
-THRIFT_ENUM_OUTPUT_FN(TImpalaQueryOptions);
-
-THRIFT_ENUM_PRINT_FN(TCatalogObjectType);
-THRIFT_ENUM_PRINT_FN(TDdlType);
-THRIFT_ENUM_PRINT_FN(TCatalogOpType);
-THRIFT_ENUM_PRINT_FN(TReplicaPreference);
-THRIFT_ENUM_PRINT_FN(TSessionType);
-THRIFT_ENUM_PRINT_FN(TStmtType);
-THRIFT_ENUM_PRINT_FN(QueryState);
-THRIFT_ENUM_PRINT_FN(Encoding);
-THRIFT_ENUM_PRINT_FN(TMetricKind);
-THRIFT_ENUM_PRINT_FN(TUnit);
-THRIFT_ENUM_PRINT_FN(TImpalaQueryOptions);
-
-
-ostream& operator<<(ostream& os, const TUniqueId& id) {
-  os << PrintId(id);
-  return os;
-}
+PRINT_THRIFT_ENUM_IMPL(QueryState)
+PRINT_THRIFT_ENUM_IMPL(Encoding)
+PRINT_THRIFT_ENUM_IMPL(TCatalogObjectType)
+PRINT_THRIFT_ENUM_IMPL(TCatalogOpType)
+PRINT_THRIFT_ENUM_IMPL(TDdlType)
+PRINT_THRIFT_ENUM_IMPL(TExplainLevel)
+PRINT_THRIFT_ENUM_IMPL(THdfsCompression)
+PRINT_THRIFT_ENUM_IMPL(THdfsFileFormat)
+PRINT_THRIFT_ENUM_IMPL(THdfsSeqCompressionMode)
+PRINT_THRIFT_ENUM_IMPL(TImpalaQueryOptions)
+PRINT_THRIFT_ENUM_IMPL(TJoinDistributionMode)
+PRINT_THRIFT_ENUM_IMPL(TMetricKind)
+PRINT_THRIFT_ENUM_IMPL(TParquetArrayResolution)
+PRINT_THRIFT_ENUM_IMPL(TParquetFallbackSchemaResolution)
+PRINT_THRIFT_ENUM_IMPL(TPlanNodeType)
+PRINT_THRIFT_ENUM_IMPL(TPrefetchMode)
+PRINT_THRIFT_ENUM_IMPL(TReplicaPreference)
+PRINT_THRIFT_ENUM_IMPL(TRuntimeFilterMode)
+PRINT_THRIFT_ENUM_IMPL(TSessionType)
+PRINT_THRIFT_ENUM_IMPL(TStmtType)
+PRINT_THRIFT_ENUM_IMPL(TUnit)
 
 string PrintId(const TUniqueId& id, const string& separator) {
   stringstream out;
@@ -158,15 +129,6 @@ bool ParseId(const string& s, TUniqueId* id) {
   return valid;
 }
 
-string PrintPlanNodeType(const TPlanNodeType::type& type) {
-  map<int, const char*>::const_iterator i;
-  i = _TPlanNodeType_VALUES_TO_NAMES.find(type);
-  if (i != _TPlanNodeType_VALUES_TO_NAMES.end()) {
-    return i->second;
-  }
-  return "Invalid plan node type";
-}
-
 string PrintTuple(const Tuple* t, const TupleDescriptor& d) {
   if (t == NULL) return "null";
   stringstream out;
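For a single enum, the PRINT_THRIFT_ENUM_IMPL macro added above expands to roughly the function below. This is a self-contained sketch, not Impala code; TFoo and _TFoo_VALUES_TO_NAMES are hypothetical stand-ins for a Thrift-generated enum and its name map.

#include <iostream>
#include <map>
#include <string>

struct TFoo { enum type { BAR = 0, BAZ = 1, QUX = 2 }; };
// QUX is deliberately left out of the name map to show the fallback path.
const std::map<int, const char*> _TFoo_VALUES_TO_NAMES = {{0, "BAR"}, {1, "BAZ"}};

std::string PrintThriftEnum(const TFoo::type& value) {
  // Unknown values fall back to their numeric form; the removed operator<<
  // printed nothing at all for values missing from the map.
  std::map<int, const char*>::const_iterator it = _TFoo_VALUES_TO_NAMES.find(value);
  return it == _TFoo_VALUES_TO_NAMES.end() ? std::to_string(value) : it->second;
}

int main() {
  std::cout << PrintThriftEnum(TFoo::BAZ) << "\n";  // BAZ
  std::cout << PrintThriftEnum(TFoo::QUX) << "\n";  // 2
  return 0;
}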

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/debug-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/debug-util.h b/be/src/util/debug-util.h
index 27d6cee..c5a5697 100644
--- a/be/src/util/debug-util.h
+++ b/be/src/util/debug-util.h
@@ -44,34 +44,34 @@ class Tuple;
 class TupleRow;
 class RowBatch;
 
-std::ostream& operator<<(std::ostream& os, const TFunctionBinaryType::type& op);
-std::ostream& operator<<(std::ostream& os, const TUniqueId& id);
-std::ostream& operator<<(std::ostream& os, const THdfsFileFormat::type& type);
-std::ostream& operator<<(std::ostream& os, const THdfsCompression::type& type);
-std::ostream& operator<<(std::ostream& os, const TStmtType::type& type);
-std::ostream& operator<<(std::ostream& os, const TUnit::type& type);
-std::ostream& operator<<(std::ostream& os, const TMetricKind::type& type);
-std::ostream& operator<<(std::ostream& os, const beeswax::QueryState::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::Encoding::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::CompressionCodec::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::Type::type& type);
+// TODO: remove these functions and use operator << after upgrading to Thrift 0.11.0 or
+// higher.
+std::string PrintThriftEnum(const beeswax::QueryState::type& value);
+std::string PrintThriftEnum(const parquet::Encoding::type& value);
+std::string PrintThriftEnum(const TCatalogObjectType::type& value);
+std::string PrintThriftEnum(const TCatalogOpType::type& value);
+std::string PrintThriftEnum(const TDdlType::type& value);
+std::string PrintThriftEnum(const TExplainLevel::type& value);
+std::string PrintThriftEnum(const THdfsCompression::type& value);
+std::string PrintThriftEnum(const THdfsFileFormat::type& value);
+std::string PrintThriftEnum(const THdfsSeqCompressionMode::type& value);
+std::string PrintThriftEnum(const TImpalaQueryOptions::type& value);
+std::string PrintThriftEnum(const TJoinDistributionMode::type& value);
+std::string PrintThriftEnum(const TMetricKind::type& value);
+std::string PrintThriftEnum(const TParquetArrayResolution::type& value);
+std::string PrintThriftEnum(const TParquetFallbackSchemaResolution::type& value);
+std::string PrintThriftEnum(const TPlanNodeType::type& value);
+std::string PrintThriftEnum(const TPrefetchMode::type& value);
+std::string PrintThriftEnum(const TReplicaPreference::type& value);
+std::string PrintThriftEnum(const TRuntimeFilterMode::type& value);
+std::string PrintThriftEnum(const TSessionType::type& value);
+std::string PrintThriftEnum(const TStmtType::type& value);
+std::string PrintThriftEnum(const TUnit::type& value);
 
 std::string PrintTuple(const Tuple* t, const TupleDescriptor& d);
 std::string PrintRow(TupleRow* row, const RowDescriptor& d);
 std::string PrintBatch(RowBatch* batch);
 std::string PrintId(const TUniqueId& id, const std::string& separator = ":");
-std::string PrintPlanNodeType(const TPlanNodeType::type& type);
-std::string PrintTCatalogObjectType(const TCatalogObjectType::type& type);
-std::string PrintTDdlType(const TDdlType::type& type);
-std::string PrintTCatalogOpType(const TCatalogOpType::type& type);
-std::string PrintTReplicaPreference(const TReplicaPreference::type& type);
-std::string PrintTSessionType(const TSessionType::type& type);
-std::string PrintTStmtType(const TStmtType::type& type);
-std::string PrintQueryState(const beeswax::QueryState::type& type);
-std::string PrintEncoding(const parquet::Encoding::type& type);
-std::string PrintTMetricKind(const TMetricKind::type& type);
-std::string PrintTUnit(const TUnit::type& type);
-std::string PrintTImpalaQueryOptions(const TImpalaQueryOptions::type& type);
 
 /// Returns the fully qualified path, e.g. "database.table.array_col.item.field"
 std::string PrintPath(const TableDescriptor& tbl_desc, const SchemaPath& path);

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/histogram-metric.h
----------------------------------------------------------------------
diff --git a/be/src/util/histogram-metric.h b/be/src/util/histogram-metric.h
index d4e09e4..43d4eaf 100644
--- a/be/src/util/histogram-metric.h
+++ b/be/src/util/histogram-metric.h
@@ -62,10 +62,10 @@ class HistogramMetric : public Metric {
       container.AddMember("min", histogram_->MinValue(), document->GetAllocator());
       container.AddMember("count", histogram_->TotalCount(), document->GetAllocator());
     }
-    rapidjson::Value type_value(PrintTMetricKind(TMetricKind::HISTOGRAM).c_str(),
+    rapidjson::Value type_value(PrintThriftEnum(TMetricKind::HISTOGRAM).c_str(),
         document->GetAllocator());
     container.AddMember("kind", type_value, document->GetAllocator());
-    rapidjson::Value units(PrintTUnit(unit()).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit()).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
 
     *value = container;

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/metrics.h
----------------------------------------------------------------------
diff --git a/be/src/util/metrics.h b/be/src/util/metrics.h
index b513c1e..99115c4 100644
--- a/be/src/util/metrics.h
+++ b/be/src/util/metrics.h
@@ -152,10 +152,9 @@ class ScalarMetric: public Metric {
     ToJsonValue(GetValue(), TUnit::NONE, document, &metric_value);
     container.AddMember("value", metric_value, document->GetAllocator());
 
-    rapidjson::Value type_value(PrintTMetricKind(kind()).c_str(),
-        document->GetAllocator());
+    rapidjson::Value type_value(PrintThriftEnum(kind()).c_str(), document->GetAllocator());
     container.AddMember("kind", type_value, document->GetAllocator());
-    rapidjson::Value units(PrintTUnit(unit()).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit()).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
     *val = container;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/network-util.cc
----------------------------------------------------------------------
diff --git a/be/src/util/network-util.cc b/be/src/util/network-util.cc
index 7a10965..49c96b1 100644
--- a/be/src/util/network-util.cc
+++ b/be/src/util/network-util.cc
@@ -174,15 +174,10 @@ bool IsWildcardAddress(const string& ipaddress) {
 
 string TNetworkAddressToString(const TNetworkAddress& address) {
   stringstream ss;
-  ss << address;
+  ss << address.hostname << ":" << dec << address.port;
   return ss.str();
 }
 
-ostream& operator<<(ostream& out, const TNetworkAddress& hostport) {
-  out << hostport.hostname << ":" << dec << hostport.port;
-  return out;
-}
-
 /// Pick a random port in the range of ephemeral ports
 /// https://tools.ietf.org/html/rfc6335
 int FindUnusedEphemeralPort(vector<int>* used_ports) {

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/network-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/network-util.h b/be/src/util/network-util.h
index 5b108dc..ef270ee 100644
--- a/be/src/util/network-util.h
+++ b/be/src/util/network-util.h
@@ -73,9 +73,6 @@ std::string TNetworkAddressToString(const TNetworkAddress& address);
 Status TNetworkAddressToSockaddr(const TNetworkAddress& address,
     kudu::Sockaddr* sockaddr);
 
-/// Prints a hostport as ipaddress:port
-std::ostream& operator<<(std::ostream& out, const TNetworkAddress& hostport);
-
 /// Returns a ephemeral port that is currently unused. Returns -1 on an error or if
 /// a free ephemeral port can't be found after 100 tries. If 'used_ports' is non-NULL,
 /// does not select those ports and adds the selected port to 'used_ports'.

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/be/src/util/webserver.cc
----------------------------------------------------------------------
diff --git a/be/src/util/webserver.cc b/be/src/util/webserver.cc
index ea0a6e9..a77c6da 100644
--- a/be/src/util/webserver.cc
+++ b/be/src/util/webserver.cc
@@ -220,10 +220,10 @@ string Webserver::Url() {
 }
 
 Status Webserver::Start() {
-  LOG(INFO) << "Starting webserver on " << http_address_;
+  LOG(INFO) << "Starting webserver on " << TNetworkAddressToString(http_address_);
 
   stringstream listening_spec;
-  listening_spec << http_address_;
+  listening_spec << TNetworkAddressToString(http_address_);
 
   if (IsSecure()) {
     LOG(INFO) << "Webserver: Enabling HTTPS support";
@@ -320,7 +320,8 @@ Status Webserver::Start() {
 
   if (context_ == nullptr) {
     stringstream error_msg;
-    error_msg << "Webserver: Could not start on address " << http_address_;
+    error_msg << "Webserver: Could not start on address "
+              << TNetworkAddressToString(http_address_);
     return Status(error_msg.str());
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/testdata/workloads/functional-query/queries/QueryTest/set.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/set.test b/testdata/workloads/functional-query/queries/QueryTest/set.test
index 57c5131..2779d55 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/set.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/set.test
@@ -14,7 +14,7 @@ set all;
 'DEBUG_ACTION','','DEVELOPMENT'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','1','REGULAR'
+'EXPLAIN_LEVEL','STANDARD','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'
@@ -40,7 +40,7 @@ set all;
 'DEBUG_ACTION','','DEVELOPMENT'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','3','REGULAR'
+'EXPLAIN_LEVEL','VERBOSE','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'
@@ -66,7 +66,7 @@ set all;
 'DEBUG_ACTION','','DEVELOPMENT'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','0','REGULAR'
+'EXPLAIN_LEVEL','MINIMAL','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'

http://git-wip-us.apache.org/repos/asf/impala/blob/8e86678d/tests/shell/test_shell_commandline.py
----------------------------------------------------------------------
diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index 6aa05f6..4db71f7 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -247,8 +247,8 @@ class TestImpalaShell(ImpalaTestSuite):
     args = '-q "set"'
     result_set = run_impala_shell_cmd(args)
     assert 'MEM_LIMIT: [0]' in result_set.stdout
-    # test to check that explain_level is 1
-    assert 'EXPLAIN_LEVEL: [1]' in result_set.stdout
+    # test to check that explain_level is STANDARD
+    assert 'EXPLAIN_LEVEL: [STANDARD]' in result_set.stdout
     # test to check that configs without defaults show up as []
     assert 'COMPRESSION_CODEC: []' in result_set.stdout
     # test values displayed after setting value


[4/9] impala git commit: IMPALA-6896: NullPointerException in DESCRIBE FORMATTED on views

Posted by sa...@apache.org.
IMPALA-6896: NullPointerException in DESCRIBE FORMATTED on views

This patch fixes an issue where ALTER VIEW created a new storage
descriptor instead of reusing the view's existing one. As a result,
some HMS attributes could be left null, causing a NullPointerException
when the altered view was later described.

The patch also differentiates between updating view attributes for
CREATE VIEW and ALTER VIEW.
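
For illustration, a condensed sketch of the two paths (abbreviated and
lightly paraphrased from the patch below; comment handling and several
fields set only by the create path are omitted):

  // ALTER VIEW (new in this patch): reuse the existing StorageDescriptor
  // so other HMS attributes, such as the SerdeInfo, stay intact.
  private void setAlterViewAttributes(TCreateOrAlterViewParams params,
      org.apache.hadoop.hive.metastore.api.Table view) {
    view.setViewOriginalText(params.getOriginal_view_def());
    view.setViewExpandedText(params.getExpanded_view_def());
    view.getSd().setCols(buildFieldSchemaList(params.getColumns()));
  }

  // CREATE VIEW: no descriptor exists yet, so build a fresh one and give
  // it a dummy SerdeInfo for Hive.
  private void setCreateViewAttributes(TCreateOrAlterViewParams params,
      org.apache.hadoop.hive.metastore.api.Table view) {
    view.setTableType(TableType.VIRTUAL_VIEW.toString());
    StorageDescriptor sd = new StorageDescriptor();
    sd.setCols(buildFieldSchemaList(params.getColumns()));
    sd.setSerdeInfo(new SerDeInfo());
    view.setSd(sd);
  }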

Testing:
- Ran all front-end tests
- Added a new end-to-end test
- Ran all the end-to-end metadata tests

Change-Id: Ica2fb0c4f4b09cdf36eeb4911a1cbe7e98381d9e
Reviewed-on: http://gerrit.cloudera.org:8080/10132
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/34b2f218
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/34b2f218
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/34b2f218

Branch: refs/heads/master
Commit: 34b2f218411155de076d3e5463376c585bcff4f3
Parents: 8e86678
Author: Fredy Wijaya <fw...@cloudera.com>
Authored: Thu Apr 19 23:51:56 2018 -0500
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Apr 20 11:01:44 2018 +0000

----------------------------------------------------------------------
 .../impala/service/CatalogOpExecutor.java       | 29 +++++++++++++-----
 tests/metadata/test_ddl.py                      | 32 ++++++++++++++++++++
 2 files changed, 54 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/34b2f218/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index 87513aa..fdee124 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -702,7 +702,7 @@ public class CatalogOpExecutor {
       }
 
       // Set the altered view attributes and update the metastore.
-      setViewAttributes(params, msTbl);
+      setAlterViewAttributes(params, msTbl);
       if (LOG.isTraceEnabled()) {
         LOG.trace(String.format("Altering view %s", tableName));
       }
@@ -1802,7 +1802,7 @@ public class CatalogOpExecutor {
     // Create new view.
     org.apache.hadoop.hive.metastore.api.Table view =
         new org.apache.hadoop.hive.metastore.api.Table();
-    setViewAttributes(params, view);
+    setCreateViewAttributes(params, view);
     LOG.trace(String.format("Creating view %s", tableName));
     if (!createTable(view, params.if_not_exists, null, response)) {
       addSummary(response, "View already exists.");
@@ -1901,9 +1901,10 @@ public class CatalogOpExecutor {
   }
 
   /**
-   * Sets the given params in the metastore table as appropriate for a view.
+   * Sets the given params in the metastore table as appropriate for a
+   * create view operation.
    */
-  private void setViewAttributes(TCreateOrAlterViewParams params,
+  private void setCreateViewAttributes(TCreateOrAlterViewParams params,
       org.apache.hadoop.hive.metastore.api.Table view) {
     view.setTableType(TableType.VIRTUAL_VIEW.toString());
     view.setViewOriginalText(params.getOriginal_view_def());
@@ -1912,12 +1913,11 @@ public class CatalogOpExecutor {
     view.setTableName(params.getView_name().getTable_name());
     view.setOwner(params.getOwner());
     if (view.getParameters() == null) view.setParameters(new HashMap<String, String>());
-    if (params.isSetComment() &&  params.getComment() != null) {
+    if (params.isSetComment() && params.getComment() != null) {
       view.getParameters().put("comment", params.getComment());
     }
-
-    // Add all the columns to a new storage descriptor.
     StorageDescriptor sd = new StorageDescriptor();
+    // Add all the columns to a new storage descriptor.
     sd.setCols(buildFieldSchemaList(params.getColumns()));
     // Set a dummy SerdeInfo for Hive.
     sd.setSerdeInfo(new SerDeInfo());
@@ -1925,6 +1925,21 @@ public class CatalogOpExecutor {
   }
 
   /**
+   * Sets the given params in the metastore table as appropriate for an
+   * alter view operation.
+   */
+  private void setAlterViewAttributes(TCreateOrAlterViewParams params,
+      org.apache.hadoop.hive.metastore.api.Table view) {
+    view.setViewOriginalText(params.getOriginal_view_def());
+    view.setViewExpandedText(params.getExpanded_view_def());
+    if (params.isSetComment() && params.getComment() != null) {
+      view.getParameters().put("comment", params.getComment());
+    }
+    // Add all the columns to a new storage descriptor.
+    view.getSd().setCols(buildFieldSchemaList(params.getColumns()));
+  }
+
+  /**
    * Appends one or more columns to the given table, optionally replacing all existing
    * columns.
    */

http://git-wip-us.apache.org/repos/asf/impala/blob/34b2f218/tests/metadata/test_ddl.py
----------------------------------------------------------------------
diff --git a/tests/metadata/test_ddl.py b/tests/metadata/test_ddl.py
index 27748bd..0830060 100644
--- a/tests/metadata/test_ddl.py
+++ b/tests/metadata/test_ddl.py
@@ -26,6 +26,7 @@ from tests.common.parametrize import UniqueDatabase
 from tests.common.skip import SkipIf, SkipIfADLS, SkipIfLocal
 from tests.common.test_dimensions import create_single_exec_option_dimension
 from tests.util.filesystem_utils import WAREHOUSE, IS_HDFS, IS_S3, IS_ADLS
+from tests.common.impala_cluster import ImpalaCluster
 
 # Validates DDL statements (create, drop)
 class TestDdlStatements(TestDdlBase):
@@ -361,6 +362,37 @@ class TestDdlStatements(TestDdlBase):
 |  01:SCAN HDFS [functional.alltypes b]
 00:SCAN HDFS [functional.alltypestiny a]""" in '\n'.join(plan.data)
 
+  def test_views_describe(self, vector, unique_database):
+    # IMPALA-6896: Tests that altered views can be described by all impalads.
+    impala_cluster = ImpalaCluster()
+    impalads = impala_cluster.impalads
+    first_client = impalads[0].service.create_beeswax_client()
+    try:
+      self.execute_query_expect_success(first_client,
+                                        "create view {0}.test_describe_view as "
+                                        "select * from functional.alltypes"
+                                        .format(unique_database), {'sync_ddl': 1})
+      self.execute_query_expect_success(first_client,
+                                        "alter view {0}.test_describe_view as "
+                                        "select * from functional.alltypesagg"
+                                        .format(unique_database))
+    finally:
+      first_client.close()
+
+    for impalad in impalads:
+      client = impalad.service.create_beeswax_client()
+      try:
+        while True:
+          result = self.execute_query_expect_success(
+              client, "describe formatted {0}.test_describe_view"
+              .format(unique_database))
+          if any("select * from functional.alltypesagg" in s.lower()
+                 for s in result.data):
+            break
+          time.sleep(1)
+      finally:
+        client.close()
+
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_functions_ddl(self, vector, unique_database):
     self.run_test_case('QueryTest/functions-ddl', vector, use_db=unique_database,


[6/9] impala git commit: IMPALA-5893: Remove old kinit code for Impala 3

Posted by sa...@apache.org.
IMPALA-5893: Remove old kinit code for Impala 3

We've gone through a couple of releases with Kudu's kinit as the
default way to use Kerberos and have not come across any major issues.

Since we're going to have a major release soon, it's time to get rid of
the old kinit code, which has been largely unused for a while now.
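
The remaining Kerberos path in SaslAuthProvider::Start() reduces to the
following (condensed from the be/src/rpc/authentication.cc hunk in the
patch below):

  if (needs_kinit_) {
    DCHECK(is_internal_);
    DCHECK(!principal_.empty());
    // Kudu's security library starts a thread that performs the kinit and
    // periodically renews the ticket for the lifetime of the process.
    KUDU_RETURN_IF_ERROR(kudu::security::InitKerberosForServer(
        principal_, keytab_file_, KRB5CCNAME_PATH, false),
        "Could not init kerberos");
    LOG(INFO) << "Kerberos ticket granted to " << principal_;
  }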

Testing: Made sure that our current Kerberos tests continue to work
without the old code.

Cherry-picks: not for 2.x

Change-Id: Ic78de10f3fb9ec36537de7a090916e4be123234b
Reviewed-on: http://gerrit.cloudera.org:8080/9941
Reviewed-by: Sailesh Mukil <sa...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/4dc3d340
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/4dc3d340
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/4dc3d340

Branch: refs/heads/master
Commit: 4dc3d340d866ef9a7ef1f654cbfff402e0bc69cc
Parents: 7134d81
Author: Sailesh Mukil <sa...@cloudera.com>
Authored: Thu Apr 5 18:19:17 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Apr 20 21:40:13 2018 +0000

----------------------------------------------------------------------
 be/src/common/global-flags.cc         |  2 +
 be/src/rpc/auth-provider.h            | 21 +-------
 be/src/rpc/authentication.cc          | 82 ++----------------------------
 be/src/rpc/rpc-mgr-kerberized-test.cc |  6 +--
 be/src/rpc/thrift-server-test.cc      |  5 +-
 be/src/testutil/mini-kdc-wrapper.h    |  5 +-
 6 files changed, 11 insertions(+), 110 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/common/global-flags.cc
----------------------------------------------------------------------
diff --git a/be/src/common/global-flags.cc b/be/src/common/global-flags.cc
index a15e91b..ea88b28 100644
--- a/be/src/common/global-flags.cc
+++ b/be/src/common/global-flags.cc
@@ -226,6 +226,7 @@ REMOVED_FLAG(enable_partitioned_aggregation);
 REMOVED_FLAG(enable_partitioned_hash_join);
 REMOVED_FLAG(enable_phj_probe_side_filtering);
 REMOVED_FLAG(enable_rm);
+REMOVED_FLAG(kerberos_reinit_interval);
 REMOVED_FLAG(llama_addresses);
 REMOVED_FLAG(llama_callback_port);
 REMOVED_FLAG(llama_host);
@@ -246,4 +247,5 @@ REMOVED_FLAG(rpc_cnxn_retry_interval_ms);
 REMOVED_FLAG(staging_cgroup);
 REMOVED_FLAG(suppress_unknown_disk_id_warnings);
 REMOVED_FLAG(use_statestore);
+REMOVED_FLAG(use_kudu_kinit);
 REMOVED_FLAG(disable_admission_control);

http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/rpc/auth-provider.h
----------------------------------------------------------------------
diff --git a/be/src/rpc/auth-provider.h b/be/src/rpc/auth-provider.h
index 3e5517f..f3180d5 100644
--- a/be/src/rpc/auth-provider.h
+++ b/be/src/rpc/auth-provider.h
@@ -24,14 +24,11 @@
 
 #include "common/status.h"
 #include "util/promise.h"
-#include "util/thread.h"
 
 namespace sasl { class TSasl; }
 
 namespace impala {
 
-class Thread;
-
 /// An AuthProvider creates Thrift transports that are set up to authenticate themselves
 /// using a protocol such as Kerberos or PLAIN/SASL. Both server and client transports are
 /// provided by this class, using slightly different mechanisms (servers use a factory,
@@ -70,9 +67,8 @@ class SaslAuthProvider : public AuthProvider {
   SaslAuthProvider(bool is_internal) : has_ldap_(false), is_internal_(is_internal),
       needs_kinit_(false) {}
 
-  /// Performs initialization of external state.  If we're using kerberos and
-  /// need to kinit, start that thread.  If we're using ldap, set up appropriate
-  /// certificate usage.
+  /// Performs initialization of external state. Kinit if configured to use kerberos.
+  /// If we're using ldap, set up appropriate certificate usage.
   virtual Status Start();
 
   /// Wrap the client transport with a new TSaslClientTransport.  This is only for
@@ -143,19 +139,6 @@ class SaslAuthProvider : public AuthProvider {
   /// function as a client.
   bool needs_kinit_;
 
-  /// Runs "RunKinit" below if needs_kinit_ is true and FLAGS_use_kudu_kinit is false
-  /// and FLAGS_use_krpc is false. Once started, this thread lives as long as the process
-  /// does and periodically forks impalad and execs the 'kinit' process.
-  std::unique_ptr<Thread> kinit_thread_;
-
-  /// Periodically (roughly once every FLAGS_kerberos_reinit_interval minutes) calls kinit
-  /// to get a ticket granting ticket from the kerberos server for principal_, which is
-  /// kept in the kerberos cache associated with this process. This ensures that we have
-  /// valid kerberos credentials when operating as a client. Once the first attempt to
-  /// obtain a ticket has completed, first_kinit is Set() with the status of the operation.
-  /// Additionally, if the first attempt fails, this method will return.
-  void RunKinit(Promise<Status>* first_kinit);
-
   /// One-time kerberos-specific environment variable setup.  Called by InitKerberos().
   Status InitKerberosEnv() WARN_UNUSED_RESULT;
 };

http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/rpc/authentication.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/authentication.cc b/be/src/rpc/authentication.cc
index 4c3df50..3eb77bb 100644
--- a/be/src/rpc/authentication.cc
+++ b/be/src/rpc/authentication.cc
@@ -20,7 +20,6 @@
 #include <stdio.h>
 #include <signal.h>
 #include <boost/algorithm/string.hpp>
-#include <boost/thread/thread.hpp>
 #include <boost/scoped_ptr.hpp>
 #include <boost/random/mersenne_twister.hpp>
 #include <boost/random/uniform_int.hpp>
@@ -48,7 +47,6 @@
 #include "util/network-util.h"
 #include "util/os-util.h"
 #include "util/promise.h"
-#include "util/thread.h"
 #include "util/time.h"
 
 #include <sys/types.h>    // for stat system call
@@ -74,11 +72,6 @@ DECLARE_string(be_principal);
 DECLARE_string(krb5_conf);
 DECLARE_string(krb5_debug_file);
 
-// TODO: Remove this flag in a compatibility-breaking release. (IMPALA-5893)
-DEFINE_int32(kerberos_reinit_interval, 60,
-    "Interval, in minutes, between kerberos ticket renewals. "
-    "Only used when FLAGS_use_krpc is false");
-
 DEFINE_string(sasl_path, "", "Colon separated list of paths to look for SASL "
     "security library plugins.");
 DEFINE_bool(enable_ldap_auth, false,
@@ -108,13 +101,6 @@ DEFINE_string(internal_principals_whitelist, "hdfs", "(Advanced) Comma-separated
     "'hdfs' which is the system user that in certain deployments must access "
     "catalog server APIs.");
 
-// TODO: Remove this flag and the old kerberos code once we remove 'use_krpc' flag.
-// (IMPALA-5893)
-DEFINE_bool(use_kudu_kinit, true, "If true, Impala will programatically perform kinit "
-    "by calling into the libkrb5 library using the provided APIs. If false, it will fork "
-    "off a kinit process. If use_krpc=true, this flag is treated as true regardless of "
-    "what it's set to.");
-
 namespace impala {
 
 // Sasl callbacks.  Why are these here?  Well, Sasl isn't that bright, and
@@ -496,51 +482,6 @@ static int SaslGetPath(void* context, const char** path) {
   return SASL_OK;
 }
 
-// When operating as a Kerberos client (internal connections only), we need to
-// 'kinit' as the principal.  A thread is created and calls this function for
-// that purpose, and to periodically renew the ticket as well.
-//
-// first_kinit: Used to communicate success/failure of the initial kinit call to
-//              the parent thread
-// Return: Only if the first call to 'kinit' fails
-void SaslAuthProvider::RunKinit(Promise<Status>* first_kinit) {
-
-  // Pass the path to the key file and the principal.
-  const string kinit_cmd = Substitute("kinit -k -t $0 $1 2>&1",
-      keytab_file_, principal_);
-
-  bool first_time = true;
-  std::random_device rd;
-  mt19937 generator(rd());
-  uniform_int<> dist(0, 300);
-
-  while (true) {
-    LOG(INFO) << "Registering " << principal_ << ", keytab file " << keytab_file_;
-    string kinit_output;
-    bool success = RunShellProcess(kinit_cmd, &kinit_output);
-
-    if (!success) {
-      const string& err_msg = Substitute(
-          "Failed to obtain Kerberos ticket for principal: $0. $1", principal_,
-          kinit_output);
-      if (first_time) {
-        first_kinit->Set(Status(err_msg));
-        return;
-      } else {
-        LOG(ERROR) << err_msg;
-      }
-    } else if (first_time) {
-      first_time = false;
-      first_kinit->Set(Status::OK());
-    }
-
-    // Sleep for the renewal interval, minus a random time between 0-5 minutes to help
-    // avoid a storm at the KDC. Additionally, never sleep less than a minute to
-    // reduce KDC stress due to frequent renewals.
-    SleepForMs(1000 * max((60 * FLAGS_kerberos_reinit_interval) - dist(generator), 60));
-  }
-}
-
 namespace {
 
 // SASL requires mutexes for thread safety, but doesn't implement
@@ -842,25 +783,10 @@ Status SaslAuthProvider::Start() {
   if (needs_kinit_) {
     DCHECK(is_internal_);
     DCHECK(!principal_.empty());
-    if (FLAGS_use_kudu_kinit || FLAGS_use_krpc) {
-      // With KRPC enabled, we always rely on the Kudu library to carry out the Kerberos
-      // authentication during connection negotiation.
-      if (!FLAGS_use_kudu_kinit) {
-        LOG(INFO) << "Ignoring --use_kudu_kinit=false as KRPC and Kerberos are enabled";
-      }
-      // Starts a thread that periodically does a 'kinit'. The thread lives as long as the
-      // process does.
-      KUDU_RETURN_IF_ERROR(kudu::security::InitKerberosForServer(principal_, keytab_file_,
-          KRB5CCNAME_PATH, false), "Could not init kerberos");
-    } else {
-      Promise<Status> first_kinit;
-      stringstream thread_name;
-      thread_name << "kinit-" << principal_;
-      RETURN_IF_ERROR(Thread::Create("authentication", thread_name.str(),
-          &SaslAuthProvider::RunKinit, this, &first_kinit, &kinit_thread_));
-      LOG(INFO) << "Waiting for Kerberos ticket for principal: " << principal_;
-      RETURN_IF_ERROR(first_kinit.Get());
-    }
+    // Starts a thread that periodically does a 'kinit'. The thread lives as long as the
+    // process does.
+    KUDU_RETURN_IF_ERROR(kudu::security::InitKerberosForServer(principal_, keytab_file_,
+        KRB5CCNAME_PATH, false), "Could not init kerberos");
     LOG(INFO) << "Kerberos ticket granted to " << principal_;
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/rpc/rpc-mgr-kerberized-test.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/rpc-mgr-kerberized-test.cc b/be/src/rpc/rpc-mgr-kerberized-test.cc
index 141f359..0407308 100644
--- a/be/src/rpc/rpc-mgr-kerberized-test.cc
+++ b/be/src/rpc/rpc-mgr-kerberized-test.cc
@@ -18,7 +18,6 @@
 #include "rpc/rpc-mgr-test-base.h"
 #include "service/fe-support.h"
 
-DECLARE_bool(use_kudu_kinit);
 DECLARE_bool(use_krpc);
 
 DECLARE_string(be_principal);
@@ -39,9 +38,7 @@ class RpcMgrKerberizedTest :
     public RpcMgrTestBase<testing::TestWithParam<KerberosSwitch> > {
 
   virtual void SetUp() override {
-    KerberosSwitch k = GetParam();
     FLAGS_use_krpc = true;
-    FLAGS_use_kudu_kinit = k == USE_KRPC_KUDU_KERBEROS;
     FLAGS_principal = "dummy-service/host@realm";
     FLAGS_be_principal = strings::Substitute("$0@$1", kdc_principal, kdc_realm);
     ASSERT_OK(InitAuth(CURRENT_EXECUTABLE_PATH));
@@ -56,8 +53,7 @@ class RpcMgrKerberizedTest :
 
 INSTANTIATE_TEST_CASE_P(KerberosOnAndOff,
                         RpcMgrKerberizedTest,
-                        ::testing::Values(USE_KRPC_IMPALA_KERBEROS,
-                                          USE_KRPC_KUDU_KERBEROS));
+                        ::testing::Values(KERBEROS_ON));
 
 TEST_P(RpcMgrKerberizedTest, MultipleServicesTls) {
   // TODO: We're starting a seperate RpcMgr here instead of configuring

http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/rpc/thrift-server-test.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-server-test.cc b/be/src/rpc/thrift-server-test.cc
index f0a0bc5..03bd295 100644
--- a/be/src/rpc/thrift-server-test.cc
+++ b/be/src/rpc/thrift-server-test.cc
@@ -35,7 +35,6 @@ using namespace strings;
 using namespace apache::thrift;
 using apache::thrift::transport::SSLProtocol;
 
-DECLARE_bool(use_kudu_kinit);
 DECLARE_bool(use_krpc);
 
 DECLARE_string(principal);
@@ -110,7 +109,6 @@ class ThriftKerberizedParamsTest :
       FLAGS_principal.clear();
       FLAGS_be_principal.clear();
     } else {
-      FLAGS_use_kudu_kinit = k == USE_THRIFT_KUDU_KERBEROS;
       FLAGS_principal = "dummy-service/host@realm";
       FLAGS_be_principal = strings::Substitute("$0@$1", kdc_principal, kdc_realm);
     }
@@ -127,8 +125,7 @@ class ThriftKerberizedParamsTest :
 INSTANTIATE_TEST_CASE_P(KerberosOnAndOff,
                         ThriftKerberizedParamsTest,
                         ::testing::Values(KERBEROS_OFF,
-                                          USE_THRIFT_KUDU_KERBEROS,
-                                          USE_THRIFT_IMPALA_KERBEROS));
+                                          KERBEROS_ON));
 
 TEST(ThriftTestBase, Connectivity) {
   int port = GetServerPort();

http://git-wip-us.apache.org/repos/asf/impala/blob/4dc3d340/be/src/testutil/mini-kdc-wrapper.h
----------------------------------------------------------------------
diff --git a/be/src/testutil/mini-kdc-wrapper.h b/be/src/testutil/mini-kdc-wrapper.h
index 17c174a..602c15b 100644
--- a/be/src/testutil/mini-kdc-wrapper.h
+++ b/be/src/testutil/mini-kdc-wrapper.h
@@ -29,10 +29,7 @@ namespace impala {
 
 enum KerberosSwitch {
   KERBEROS_OFF,
-  USE_KRPC_KUDU_KERBEROS,    // FLAGS_use_kudu_kinit = true,  FLAGS_use_krpc = true
-  USE_KRPC_IMPALA_KERBEROS,  // FLAGS_use_kudu_kinit = false, FLAGS_use_krpc = true
-  USE_THRIFT_KUDU_KERBEROS,  // FLAGS_use_kudu_kinit = true,  FLAGS_use_krpc = false
-  USE_THRIFT_IMPALA_KERBEROS // FLAGS_use_kudu_kinit = false, FLAGS_use_krpc = false
+  KERBEROS_ON
 };
 
 /// This class allows tests to easily start and stop a KDC and configure Impala's auth


[7/9] impala git commit: IMPALA-6898: Avoid duplicate Kudu load during full dataload

Posted by sa...@apache.org.
IMPALA-6898: Avoid duplicate Kudu load during full dataload

testdata/bin/create-load-data.sh runs bin/load-data.py for
functional/exhaustive, tpch/core, and tpcds/core in a first phase,
then loads functional and tpch for Kudu in a second phase. For a full
dataload, this second phase is not necessary, because
functional/exhaustive and tpch/core already include Kudu.

This avoids the second phase when doing a full dataload.
The second phase is still necessary when loading from
a snapshot, and this does not change that behavior.

This saves a couple of minutes on a full dataload.
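
In shell terms, the change turns the unconditional Kudu reload into a
guard, roughly as below (condensed from the one-line diff that follows;
assuming, as the message above implies, that SKIP_METADATA_LOAD=1 marks
a snapshot-based load):

  # Reload Kudu data separately only for snapshot-based loads; a full
  # dataload already loads Kudu via functional/exhaustive and tpch/core.
  if [[ $SKIP_METADATA_LOAD -eq 1 && $KUDU_IS_SUPPORTED ]]; then
    run-step-backgroundable "Loading Kudu functional" load-kudu.log \
          load-data "functional-query" "core" "kudu/none/none" force
    # (plus the corresponding Kudu TPC-H load)
  fi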

Change-Id: Ic023d230f99126ed37795106c38faae5f0cb608e
Reviewed-on: http://gerrit.cloudera.org:8080/10128
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/5bc5279b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/5bc5279b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/5bc5279b

Branch: refs/heads/master
Commit: 5bc5279b07451f8c6fb8af29ef83127dc7785440
Parents: 4dc3d34
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Thu Apr 19 16:14:03 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Sat Apr 21 01:08:50 2018 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/5bc5279b/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index e50515b..51ba449 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -540,8 +540,10 @@ elif [ "${TARGET_FILESYSTEM}" = "hdfs" ];  then
       load-data "functional-query" "core" "hbase/none"
 fi
 
-if $KUDU_IS_SUPPORTED; then
+if [[ $SKIP_METADATA_LOAD -eq 1 && $KUDU_IS_SUPPORTED ]]; then
   # Tests depend on the kudu data being clean, so load the data from scratch.
+  # This is only necessary if this is not a full dataload, because a full dataload
+  # already loads Kudu functional and TPC-H tables from scratch.
   run-step-backgroundable "Loading Kudu functional" load-kudu.log \
         load-data "functional-query" "core" "kudu/none/none" force
   run-step-backgroundable "Loading Kudu TPCH" load-kudu-tpch.log \


[8/9] impala git commit: IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

Posted by sa...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/b9271ccf/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a09188e..47e0c5c 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -38,22 +38,26 @@ under the License.
   <conbody>
 
     <p>
-      The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
-      most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
-      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
-      whether a fix is in the pipeline.
+      The following sections describe known issues and workarounds in Impala, as of the current
+      production release. This page summarizes the most serious or frequently encountered issues
+      in the current release, to help you make planning decisions about installing and
+      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
+      site, where you can see the diagnosis and whether a fix is in the pipeline.
     </p>
 
     <note>
-      The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
-      you are experiencing has already been reported, or which release an issue is fixed in, search on the
-      <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org JIRA tracker</xref>.
+      The online issue tracking system for Impala contains comprehensive information and is
+      updated in real time. To verify whether an issue you are experiencing has already been
+      reported, or which release an issue is fixed in, search on the
+      <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org
+      JIRA tracker</xref>.
     </note>
 
     <p outputclass="toc inpage"/>
 
     <p>
-      For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
+      For issues fixed in various Impala releases, see
+      <xref href="impala_fixed_issues.xml#fixed_issues"/>.
     </p>
 
 <!-- Use as a template for new issues.
@@ -73,62 +77,6 @@ under the License.
 
   </conbody>
 
-<!-- New known issues for Impala 2.3.
-
-Title: Server-to-server SSL and Kerberos do not work together
-Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2598
-Severity: Medium.  Server-to-server SSL is practically unusable but this is a new feature.
-Workaround: No known workaround.
-
-Title: Queries may hang on server-to-server exchange errors
-Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2592
-Severity: Low.  This does not occur frequently.
-Workaround: No known workaround.
-
-Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
-Description: Incremental stats use up about 400 bytes per partition X column.  So for a table with 20K partitions and 100 columns this is about 800 MB.  When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
-Severity: Low.  This does not occur frequently.
-Workaround:  Reduce the number of partitions.
-
-More from the JIRA report of blocker/critical issues:
-
-IMPALA-2093
-Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
-IMPALA-1652
-Incorrect results with basic predicate on CHAR typed column.
-IMPALA-1459
-Incorrect assignment of predicates through an outer join in an inline view.
-IMPALA-2665
-Incorrect assignment of On-clause predicate inside inline view with an outer join.
-IMPALA-2603
-Crash: impala::Coordinator::ValidateCollectionSlots
-IMPALA-2375
-Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
-IMPALA-1862
-Invalid bool value not reported as a scanner error
-IMPALA-1792
-ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
-IMPALA-1578
-Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
-IMPALA-2643
-Duplicated column in inline view causes dropping null slots during scan
-IMPALA-2005
-A failed CTAS does not drop the table if the insert fails.
-IMPALA-1821
-Casting scenarios with invalid/inconsistent results
-
-Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
-
-https://issues.apache.org/jira/browse/IMPALA-2665 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2643 - Already have
-https://issues.apache.org/jira/browse/IMPALA-1459 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
-
--->
-
   <concept id="known_issues_startup">
 
     <title>Impala Known Issues: Startup</title>
@@ -136,42 +84,60 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues can prevent one or more Impala-related daemons
-        from starting properly.
+        These issues can prevent one or more Impala-related daemons from starting properly.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4978">
+
       <title id="IMPALA-5253">Problem retrieving FQDN causes startup problem on kerberized clusters</title>
+
       <conbody>
+
         <p>
           The method Impala uses to retrieve the host name while constructing the Kerberos
-          principal is the <codeph>gethostname()</codeph> system call. This function might
-          not always return the fully qualified domain name, depending on the network
-          configuration. If the daemons cannot determine the FQDN, Impala does not start
-          on a kerberized cluster.
+          principal is the <codeph>gethostname()</codeph> system call. This function might not
+          always return the fully qualified domain name, depending on the network configuration.
+          If the daemons cannot determine the FQDN, Impala does not start on a kerberized
+          cluster.
         </p>
+
         <p>
           This problem might occur immediately after an upgrade of a CDH cluster, due to changes
-          in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically to
-          the Impala-related daemons. (See the issue <q>hostname parameter is not passed to Impala catalog role</q>
-          at <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the Cloudera Manager Known Issues page</xref>.)
+          in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically
+          to the Impala-related daemons. (See the issue <q>hostname parameter is not passed to
+          Impala catalog role</q> at
+          <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the
+          Cloudera Manager Known Issues page</xref>.)
         </p>
-        <p><b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> The issue is expected to occur less frequently on systems
-          with fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref>,
-          or both. Even on systems with fixes for both of these issues, the workaround might still
-          be required in some cases.
+
+        <p>
+          <b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
+          <xref keyref="IMPALA-5253">IMPALA-5253</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
         </p>
-        <p><b>Workaround:</b> Test if a host is affected by checking whether the output of the
-          <cmdname>hostname</cmdname> command includes the FQDN. On hosts where <cmdname>hostname</cmdname>
-          only returns the short name, pass the command-line flag
-          <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph>
-          in the startup options of all Impala-related daemons.
+
+        <p>
+          <b>Resolution:</b> The issue is expected to occur less frequently on systems with
+          fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
+          <xref keyref="IMPALA-5253">IMPALA-5253</xref>, or both. Even on systems with fixes for
+          both of these issues, the workaround might still be required in some cases.
         </p>
+
+        <p>
+          <b>Workaround:</b> Test if a host is affected by checking whether the output of the
+          <cmdname>hostname</cmdname> command includes the FQDN. On hosts where
+          <cmdname>hostname</cmdname> only returns the short name, pass the command-line flag
+          <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph> in the
+          startup options of all Impala-related daemons.
+        </p>
+
       </conbody>
+
     </concept>
 
   </concept>
@@ -188,23 +154,100 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
     </conbody>
 
+    <concept id="impala-6841">
+
+      <title>Unable to view large catalog objects in catalogd Web UI</title>
+
+      <conbody>
+
+        <p>
+          In <codeph>catalogd</codeph> Web UI, you can list metadata objects and view their
+          details. These details are accessed via a link and printed to a string formatted using
+          thrift's <codeph>DebugProtocol</codeph>. Printing large objects (> 1 GB) in Web UI can
+          crash <codeph>catalogd</codeph>.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-6841">IMPALA-6841</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="impala-6389">
+
+      <title><b>Crash when querying tables with "\0" as a row delimiter</b></title>
+
+      <conbody>
+
+        <p>
+          When querying a textfile-based Impala table that uses <codeph>\0</codeph> as a new
+          line separator, Impala crashes.
+        </p>
+
+        <p>
+          The following sequence causes <codeph>impalad</codeph> to crash:
+        </p>
+
+<pre>create table tab_separated(id bigint, s string, n int, t timestamp, b boolean)
+  row format delimited
+  fields terminated by '\t' escaped by '\\' lines terminated by '\000'
+  stored as textfile;
+select * from tab_separated; -- Done. 0 results.
+insert into tab_separated (id, s) values (100, ''); -- Success.
+select * from tab_separated; -- 20 second delay before getting "Cancelled due to unreachable impalad(s): xxxx:22000"</pre>
+
+        <p>
+          <b>Bug:</b>
+          <xref keyref="IMPALA-6389" scope="external" format="html"
+            >IMPALA-6389</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use an alternative delimiter, e.g. <codeph>\001</codeph>.
+        </p>
+
+      </conbody>
+
+    </concept>
+
     <concept id="IMPALA-4828">
+
       <title>Altering Kudu table schema outside of Impala may result in crash on read</title>
+
       <conbody>
+
         <p>
-          Creating a table in Impala, changing the column schema outside of Impala,
-          and then reading again in Impala may result in a crash. Neither Impala nor
-          the Kudu client validates the schema immediately before reading, so Impala may attempt to
-          dereference pointers that aren't there. This happens if a string column is dropped
-          and then a new, non-string column is added with the old string column's name.
+          Creating a table in Impala, changing the column schema outside of Impala, and then
+          reading again in Impala may result in a crash. Neither Impala nor the Kudu client
+          validates the schema immediately before reading, so Impala may attempt to dereference
+          pointers that aren't there. This happens if a string column is dropped and then a new,
+          non-string column is added with the old string column's name.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> Run the statement <codeph>REFRESH <varname>table_name</varname></codeph>
-          after any occasion when the table structure, such as the number, names, and data types
-          of columns, are modified outside of Impala using the Kudu API.
+
+        <p>
+          <b>Bug:</b>
+          <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Run the statement <codeph>REFRESH
+          <varname>table_name</varname></codeph> after any occasion when the table structure,
+          such as the number, names, and data types of columns, are modified outside of Impala
+          using the Kudu API.
         </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-1972" rev="IMPALA-1972">
@@ -214,10 +257,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Trying to get the details of a query through the debug web page
-          while the query is planning will block new queries that had not
-          started when the web page was requested. The web UI becomes
-          unresponsive until the planning phase is finished.
+          Trying to get the details of a query through the debug web page while the query is
+          planning will block new queries that had not started when the web page was requested.
+          The web UI becomes unresponsive until the planning phase is finished.
         </p>
 
         <p>
@@ -228,22 +270,44 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4595">
+
       <title>Linking IR UDF module to main module crashes Impala</title>
+
       <conbody>
+
         <p>
-          A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash
-          when executed.
+          A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash when
+          executed.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead of a
-          <codeph>.ll</codeph> IR module.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead
+          of a <codeph>.ll</codeph> IR module.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3069" rev="IMPALA-3069">
@@ -253,8 +317,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
-          columns, could cause a memory allocation of more than 2 GB resulting in a crash.
+          Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option,
+          together with wide rows or large string values in columns, could cause a memory
+          allocation of more than 2 GB resulting in a crash.
         </p>
 
         <p>
@@ -265,7 +330,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
+        </p>
 
       </conbody>
 
@@ -278,7 +345,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
+          Malformed Avro data, such as out-of-bounds integers or values in the wrong format,
+          could cause a crash when queried.
         </p>
 
         <p>
@@ -289,7 +357,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala262"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and
+          <keyword keyref="impala262"/>.
+        </p>
 
       </conbody>
 
@@ -302,8 +373,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the channel on an error. This causes the node on
-          the other side of the channel to wait indefinitely, causing a hang.
+          The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the
+          channel on an error. This causes the node on the other side of the channel to wait
+          indefinitely, causing a hang.
         </p>
 
         <p>
@@ -325,15 +397,18 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
-          issued, the <cmdname>impalad</cmdname> daemon crashes.
+          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala
+          <codeph>CREATE FUNCTION</codeph> statement is issued, the <cmdname>impalad</cmdname>
+          daemon crashes.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2365">IMPALA-2365</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -353,30 +428,94 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
     </conbody>
 
+    <concept id="impala-6671">
+
+      <title>Metadata operations block read-only operations on unrelated tables</title>
+
+      <conbody>
+
+        <p>
+          Metadata operations that change the state of a table, like <codeph>COMPUTE
+          STATS</codeph> or <codeph>ALTER RECOVER PARTITIONS</codeph>, may delay metadata
+          propagation of unrelated unloaded tables triggered by statements like
+          <codeph>DESCRIBE</codeph> or <codeph>SELECT</codeph> queries.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-6671">IMPALA-6671</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="impala-5200">
+
+      <title>Profile timers not updated during long-running sort</title>
+
+      <conbody>
+
+        <p>
+          If you have a query plan with a long-running sort operation, e.g. minutes, the profile
+          timers are not updated to reflect the time spent in the sort until the sort starts
+          returning rows.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5200">IMPALA-5200</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Slow sorts can be identified by looking at "Peak Mem" in the
+          summary or "PeakMemoryUsage" in the profile. If a sort is consuming multiple GB of
+          memory per host, it will likely spend a significant amount of time sorting the data.
+        </p>
+
+      </conbody>
+
+    </concept>
+
     <concept id="IMPALA-3316">
+
       <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
+
       <conbody>
+
         <p>
-          The configuration setting <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph>
-          uses an underlying function that can be a bottleneck on high volume, highly concurrent
-          queries due to the use of a global lock while loading time zone information. This bottleneck
-          can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount
-          of slowdown depends on factors such as the number of cores and number of threads involved in the query.
+          The configuration setting
+          <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph> uses an underlying
+          function that can be a bottleneck on high volume, highly concurrent queries due to the
+          use of a global lock while loading time zone information. This bottleneck can cause
+          slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
+          slowdown depends on factors such as the number of cores and number of threads involved
+          in the query.
         </p>
+
         <note>
           <p>
-            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within Parquet files that
-            were generated by Hive, and therefore require the on-the-fly timezone conversion processing.
+            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within
+            Parquet files that were generated by Hive, and therefore require the on-the-fly
+            timezone conversion processing.
           </p>
         </note>
-        <p><b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table represent dates only,
-          with no time portion, consider storing them as strings in <codeph>yyyy-MM-dd</codeph> format.
-          Impala implicitly converts such string values to <codeph>TIMESTAMP</codeph> in calls to date/time
-          functions.
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
         </p>
+
+        <p>
+          <b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table
+          represent dates only, with no time portion, consider storing them as strings in
+          <codeph>yyyy-MM-dd</codeph> format. Impala implicitly converts such string values to
+          <codeph>TIMESTAMP</codeph> in calls to date/time functions.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-1480" rev="IMPALA-1480">
@@ -399,31 +538,37 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
     </concept>
 
     <concept id="ki_file_handle_cache">
+
       <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
+
       <conbody>
+
         <p>
-          If a data file used by Impala is being continuously appended or
-          overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
-            -appendToFile</cmdname>, interaction with the file handle caching
-          feature in <keyword keyref="impala210_full"/> and higher could cause
-          short-circuit reads to sometimes be disabled on some DataNodes. When a
-          mismatch is detected between the cached file handle and a data block
-          that was rewritten because of an append, short-circuit reads are
-          turned off on the affected host for a 10-minute period.
+          If a data file used by Impala is being continuously appended or overwritten in place
+          by an HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction
+          with the file handle caching feature in <keyword keyref="impala210_full"/> and higher
+          could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
+          mismatch is detected between the cached file handle and a data block that was
+          rewritten because of an append, short-circuit reads are turned off on the affected
+          host for a 10-minute period.
         </p>
+
         <p>
-          The possibility of encountering such an issue is the reason why the
-          file handle caching feature is currently turned off by default. See
-            <xref keyref="scalability_file_handle_cache"/> for information about
-          this feature and how to enable it.
+          The possibility of encountering such an issue is the reason why the file handle
+          caching feature is currently turned off by default. See
+          <xref keyref="scalability_file_handle_cache"/> for information about this feature and
+          how to enable it.
         </p>
+
         <p>
           <b>Bug:</b>
           <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
@@ -434,31 +579,29 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
-          this issue before enabling the file handle caching feature. You can
-          set the <cmdname>impalad</cmdname> configuration option
-            <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+        <p>
+          <b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before
+          enabling the file handle caching feature. You can set the <cmdname>impalad</cmdname>
+          configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
           that is shorter than the HDFS setting
-            <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
-          (Keep in mind that the HDFS setting is in milliseconds while the
-          Impala setting is in seconds.)
+          <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind
+          that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
         </p>
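+
+        <p>
+          As an illustration only, if
+          <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph> were set to
+          <codeph>300000</codeph> (300 seconds), an <cmdname>impalad</cmdname> startup flag
+          such as the following would keep the Impala timeout shorter:
+        </p>
+
+<codeblock>--unused_file_handle_timeout_sec=270</codeblock>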
 
         <p>
-          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
-          parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
-          to specify the amount of time that short circuit reads are disabled on
-          encountering an error. The default value is 10 minutes
-            (<codeph>600</codeph> seconds). It is recommended that you set
-            <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
-          small value, such as <codeph>1</codeph> second, when using the file
-          handle cache. Setting <codeph>
-            dfs.domain.socket.disable.interval.seconds</codeph> to
-            <codeph>0</codeph> is not recommended as a non-zero interval
-          protects the system if there is a persistent problem with short
-          circuit reads.
+          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to specify the amount of
+          time that short circuit reads are disabled on encountering an error. The default value
+          is 10 minutes (<codeph>600</codeph> seconds). It is recommended that you set
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a small value, such as
+          <codeph>1</codeph> second, when using the file handle cache. Setting
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to <codeph>0</codeph> is not
+          recommended as a non-zero interval protects the system if there is a persistent
+          problem with short circuit reads.
         </p>
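+
+        <p>
+          For example, the recommended small interval could be expressed in
+          <filepath>hdfs-site.xml</filepath> with an entry along these lines (values shown for
+          illustration only):
+        </p>
+
+<codeblock><![CDATA[
+<property>
+  <name>dfs.domain.socket.disable.interval.seconds</name>
+  <value>1</value>
+</property>
+]]>
+</codeblock>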
+
       </conbody>
+
     </concept>
 
   </concept>
@@ -470,24 +613,41 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+        These issues affect the convenience of interacting directly with Impala, typically
+        through the Impala shell or Hue.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4570">
+
       <title>Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string</title>
+
       <conbody>
+
         <p>
           For example, this issue could occur on a system using setuptools version 20.7.0.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Change to a setuptools version that does not have <codeph>0.7</codeph> as
-          a substring.
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Change to a setuptools version that does not have
+          <codeph>0.7</codeph> as a substring.
         </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3133" rev="IMPALA-3133">
@@ -497,9 +657,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
-          sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
-          not represent a security issue for other statements.
+          Due to a timing condition in updating cached policy data from Sentry, the
+          <codeph>SHOW</codeph> statements for Sentry roles could sometimes display out-of-date
+          role settings. Because Impala rechecks authorization for each SQL statement, this
+          discrepancy does not represent a security issue for other statements.
         </p>
 
         <p>
@@ -511,11 +672,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
         </p>
 
         <p>
-          <b>Resolution:</b> Fixes have been issued for some but not all Impala releases. Check the JIRA for details of fix releases.
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and
+          <keyword keyref="impala251"/>.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and <keyword keyref="impala251"/>.</p>
-
       </conbody>
 
     </concept>
@@ -527,7 +687,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+          Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they
+          have already completed.
         </p>
 
         <p>
@@ -547,8 +708,11 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
         <p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
 
         <p>
-          <b>Bug:</b>
-          <xref keyref="IMPALA-3123">IMPALA-3123</xref>
+          <b>Bug:</b> <xref keyref="IMPALA-3123">IMPALA-3123</xref>
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
         </p>
 
       </conbody>
@@ -564,8 +728,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
-        in languages such as Java or C++.
+        These issues affect applications that use the JDBC or ODBC APIs, such as business
+        intelligence tools or custom-written applications in languages such as Java or C++.
       </p>
 
     </conbody>
@@ -579,8 +743,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the
-          columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
+          If the ODBC <codeph>SQLGetData</codeph> function is called on a series of columns,
+          the calls must follow the same order as the columns. For example, if data is fetched from
+          column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
           <codeph>NULL</codeph>.
         </p>
 
@@ -605,31 +770,78 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
-        redaction.
+        These issues relate to security features, such as Kerberos authentication, Sentry
+        authorization, encryption, auditing, and redaction.
       </p>
 
     </conbody>
 
+    <concept id="impala-4712">
+
+      <title>Transient Kerberos authentication error during table loading</title>
+
+      <conbody>
+
+        <p>
+          A transient Kerberos error can cause a table to get into a bad state with an error:
+          <codeph>Failed to load metadata for table</codeph>.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4712">IMPALA-4712</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Resolve the Kerberos authentication problem and run
+          <codeph>INVALIDATE METADATA</codeph> on the affected table.
+        </p>
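+
+        <p>
+          For example, for a hypothetical table named <codeph>db1.t1</codeph> reported in the
+          error message:
+        </p>
+
+<codeblock>INVALIDATE METADATA db1.t1;</codeblock>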
+
+      </conbody>
+
+    </concept>
+
     <concept id="IMPALA-5638">
+
       <title>Malicious user can gain unauthorized access to Kudu table data via Impala</title>
+
       <conbody>
+
         <p>
-          A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access any
-          other Kudu table data by altering the table properties to make it <q>external</q>
-          and then changing the underlying table mapping to point to other Kudu tables.
-          This violates and works around the authorization requirement that creating a
-          Kudu external table via Impala requires an <codeph>ALL</codeph> privilege at the server scope.
-          This privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely avoid
-          this scenario where a malicious user can change the underlying Kudu table
-          mapping. The fix is to enforce the same privilege requirement for <codeph>ALTER</codeph>
-          commands that would make existing non-external Kudu tables external.
+          A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access
+          any other Kudu table data by altering the table properties to make it <q>external</q>
+          and then changing the underlying table mapping to point to other Kudu tables. This
+          violates and works around the authorization requirement that creating a Kudu external
+          table via Impala requires an <codeph>ALL</codeph> privilege at the server scope. This
+          privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely
+          avoid this scenario where a malicious user can change the underlying Kudu table
+          mapping. The fix is to enforce the same privilege requirement for
+          <codeph>ALTER</codeph> commands that would make existing non-external Kudu tables
+          external.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph> permissions on Impala tables.</p>
-        <p><b>Resolution:</b> Upgrade to an Impala version containing the fix for <xref keyref="IMPALA-5638">IMPALA-5638</xref>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph>
+          permissions on Impala tables.
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala2100"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="renewable_kerberos_tickets">
@@ -641,12 +853,13 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
+          In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if
+          Kerberos tickets are not renewable.
         </p>
 
         <p>
-          <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
-          renewable tickets.
+          <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure
+          <filepath>krb5.conf</filepath> to request renewable tickets.
         </p>
 
       </conbody>
@@ -685,22 +898,38 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
   </concept>
 
-<!--
-  <concept id="known_issues_supportability">
+  <concept id="impala-6726">
 
-    <title id="ki_supportability">Impala Known Issues: Supportability</title>
+    <title>Catalog server's Kerberos ticket gets deleted after 'ticket_lifetime' on SLES11</title>
 
     <conbody>
 
       <p>
-        These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
-        shown in monitoring applications.
+        On SLES11, the Kerberos ticket is deleted by the Java krb5 library after the
+        'ticket_lifetime' period expires.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref keyref="IMPALA-6726"/>
+      </p>
+
+      <p>
+        <b>Severity:</b> High
+      </p>
+
+      <p>
+        <b>Workaround:</b> In Impala 2.11.0, set <codeph>--use_kudu_kinit=false</codeph> as an
+        Impala startup flag.
+      </p>
+
+      <p>
+        In Impala 2.12.0, set <codeph>--use_kudu_kinit=false</codeph> and
+        <codeph>--use_krpc=false</codeph> as Impala startup flags.
       </p>
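+
+      <p>
+        For example, on Impala 2.12.0 the workaround amounts to adding both flags to the
+        startup options of the Impala daemons (shown for illustration):
+      </p>
+
+<codeblock>--use_kudu_kinit=false
+--use_krpc=false</codeblock>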
 
     </conbody>
 
   </concept>
--->
 
   <concept id="known_issues_resources">
 
@@ -709,92 +938,156 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
-        features.
+        These issues involve memory or disk usage, including out-of-memory conditions, the
+        spill-to-disk feature, and resource management features.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-5605">
+
       <title>Configuration to prevent crashes caused by thread resource limits</title>
+
       <conbody>
+
         <p>
-          Impala could encounter a serious error due to resource usage under very high concurrency.
-          The error message is similar to:
+          Impala could encounter a serious error due to resource usage under very high
+          concurrency. The error message is similar to:
         </p>
+
 <codeblock><![CDATA[
 F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
 terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
 ]]>
 </codeblock>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b>
-          To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
-          daemon with the following settings:
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref>
         </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> To prevent such errors, configure each host running an
+          <cmdname>impalad</cmdname> daemon with the following settings:
+        </p>
+
 <codeblock>
 echo 2000000 > /proc/sys/kernel/threads-max
 echo 2000000 > /proc/sys/kernel/pid_max
 echo 8000000 > /proc/sys/vm/max_map_count
 </codeblock>
+
         <p>
-        Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
+          Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
         </p>
+
 <codeblock>
 impala soft nproc 262144
 impala hard nproc 262144
 </codeblock>
+
       </conbody>
+
     </concept>
 
     <concept id="flatbuffers_mem_usage">
+
       <title>Memory usage when compact_catalog_topic flag enabled</title>
+
       <conbody>
+
         <p>
-          The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref>
-          can cause an increase in size of the updates to Impala catalog metadata
-          that are broadcast to the <cmdname>impalad</cmdname> daemons
-          by the <cmdname>statestored</cmdname> daemon.
-          The increase in catalog update topic size results in higher CPU and network
+          The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref> can
+          cause an increase in size of the updates to Impala catalog metadata that are broadcast
+          to the <cmdname>impalad</cmdname> daemons by the <cmdname>statestored</cmdname>
+          daemon. The increase in catalog update topic size results in higher CPU and network
           utilization. By default, the increase in topic size is about 5-7%. If the
-          <codeph>compact_catalog_topic</codeph> flag is used, the
-          size increase is more substantial, with a topic size approximately twice as
-          large as in previous versions.
+          <codeph>compact_catalog_topic</codeph> flag is used, the size increase is more
+          substantial, with a topic size approximately twice as large as in previous versions.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref></p>
-        <p><b>Severity:</b> Medium</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> Medium
+        </p>
+
         <p>
-          <b>Workaround:</b> Consider setting the
-            <codeph>compact_catalog_topic</codeph> configuration setting to
-            <codeph>false</codeph> until this issue is resolved. </p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.</p>
+          <b>Workaround:</b> Consider setting the <codeph>compact_catalog_topic</codeph>
+          configuration setting to <codeph>false</codeph> until this issue is resolved.
+        </p>
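+
+        <p>
+          For example, the setting is typically applied as a daemon startup flag (shown for
+          illustration):
+        </p>
+
+<codeblock>--compact_catalog_topic=false</codeblock>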
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-2294">
+
       <title>Kerberos initialization errors due to high memory usage</title>
+
       <conbody>
+
         <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
-        <p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b></p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala211"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b>
+        </p>
+
         <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="../shared/impala_common.xml#common/vm_overcommit_memory_end"/>
+
       </conbody>
+
     </concept>
 
     <concept id="drop_table_purge_s3a">
+
       <title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
+
       <conbody>
+
         <p>
-          A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data files
-          behind, if the table directory and the data files were created with a combination of
-          <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
+          A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data
+          files behind, if the table directory and the data files were created with a
+          combination of <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> The underlying issue with the S3A connector depends on the resolution of <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> The underlying issue with the S3A connector depends on the
+          resolution of
+          <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="catalogd_heap">
@@ -804,27 +1097,30 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher:
+          The default heap size for Impala <cmdname>catalogd</cmdname> has changed in
+          <keyword keyref="impala25_full"/> and higher:
         </p>
 
         <ul>
           <li>
             <p>
-              Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default heap size, which is the smaller of 1/4th of the
-              physical memory or 32 GB.
+              Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default
+              heap size, which is the smaller of 1/4th of the physical memory or 32 GB.
             </p>
           </li>
 
           <li>
             <p>
-              Starting with <keyword keyref="impala250"/>, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
+              Starting with <keyword keyref="impala250"/>, the default
+              <cmdname>catalogd</cmdname> heap size is 4 GB.
             </p>
           </li>
         </ul>
 
         <p>
-          For example, on a host with 128GB physical memory this will result in catalogd heap decreasing from 32GB to 4GB. This can result
-          in out-of-memory errors in catalogd and leading to query failures.
+          For example, on a host with 128 GB of physical memory, this change decreases the
+          catalogd heap from 32 GB to 4 GB, which can cause out-of-memory errors in catalogd
+          and lead to query failures.
         </p>
 
         <p>
@@ -833,9 +1129,6 @@ impala hard nproc 262144
 
         <p>
           <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
-<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
-<!-- Including full details here via conref, for benefit of PDF readers or anyone else
-             who might have trouble seeing or following the link. -->
         </p>
 
         <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
@@ -851,8 +1144,9 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
-          minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
+          The size of the breakpad minidump files grows linearly with the number of threads. By
+          default, each thread adds 8 KB to the minidump size. Minidump files could consume
+          significant disk space when the daemons have a high number of threads.
         </p>
 
         <p>
@@ -864,11 +1158,13 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
-          size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
-          from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
-          file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
-          than 20 MB.
+          <b>Workaround:</b> Add
+          <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft
+          upper limit on the size of each minidump file. If the minidump file would exceed that
+          limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB.
+          (Full thread information is captured for the first 20 threads, then 2 KB per thread
+          after that.) The minidump file can still grow larger than the <q>hinted</q> size. For
+          example, if you have 10,000 threads, the minidump file can be more than 20 MB.
         </p>
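+
+        <p>
+          For example, a hypothetical soft limit of 2 MB per minidump file would be expressed
+          as:
+        </p>
+
+<codeblock>--minidump_size_limit_hint_kb=2048</codeblock>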
 
       </conbody>
@@ -882,14 +1178,16 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading
-          Parquet files.
+          The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak
+          memory usage than in previous releases while reading Parquet files.
         </p>
 
         <p>
-          <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
-          may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
-          materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
+          <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the
+          efficiency of Parquet scans by up to 2x. The faster scans may result in a higher peak
+          memory consumption compared to earlier versions of Impala due to the new column-wise
+          row materialization strategy. You are likely to experience higher memory consumption
+          in any of the following scenarios:
           <ul>
             <li>
               <p>
@@ -899,14 +1197,15 @@ impala hard nproc 262144
 
             <li>
               <p>
-                Very large rows due to big column values, for example, long strings or nested collections with many items.
+                Very large rows due to big column values, for example, long strings or nested
+                collections with many items.
               </p>
             </li>
 
             <li>
               <p>
-                Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
-                plan nodes.
+                Producer/consumer speed imbalances, leading to more rows being buffered between
+                a scan (producer) and downstream (consumer) plan nodes.
               </p>
             </li>
           </ul>
@@ -921,10 +1220,16 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> The following query options might help to reduce memory consumption
+          in the Parquet scanner:
           <ul>
             <li>
-              Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
+              Reduce the number of scanner threads, for example:
+              <codeph>set num_scanner_threads=30</codeph>
             </li>
 
             <li>
@@ -950,8 +1255,8 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
-          <cmdname>impalad</cmdname> daemon.
+          Some memory allocated by the JVM used internally by Impala is not counted against the
+          memory limit for the <cmdname>impalad</cmdname> daemon.
         </p>
 
         <p>
@@ -959,8 +1264,9 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
-          Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
+          <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname>
+          command, or add the memory figures in the Impala web UI <uicontrol>/memz</uicontrol>
+          tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
         </p>
 
       </conbody>
@@ -982,10 +1288,13 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
+          <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation
+          mechanism if practical.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1000,88 +1309,145 @@ impala hard nproc 262144
     <conbody>
 
       <p>
-        These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
+        These issues can cause incorrect or unexpected results from queries. They typically only
+        arise in very specific circumstances.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4539">
+
       <title>Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it</title>
+
 <!-- TSB-225 title: Possibly incorrect results when scanning uncompressed Parquet files with Impala. -->
+
       <conbody>
+
         <p>
-          Impala queries may return incorrect results when scanning plain-encoded string
-          columns in uncompressed Parquet files. I/O buffers holding the string data are
-          prematurely freed, leading to invalid memory reads and possibly
-          non-deterministic results. This does not affect Parquet files that use a
-          compression codec such as Snappy. Snappy is both strongly recommended generally
-          and the default choice for Impala-written Parquet files.
+          Impala queries may return incorrect results when scanning plain-encoded string columns
+          in uncompressed Parquet files. I/O buffers holding the string data are prematurely
+          freed, leading to invalid memory reads and possibly non-deterministic results. This
+          does not affect Parquet files that use a compression codec such as Snappy. Snappy is
+          both strongly recommended generally and the default choice for Impala-written Parquet
+          files.
         </p>
+
         <p>
           How to determine whether a query might be affected:
         </p>
+
         <ul>
           <li>
             The query must reference <codeph>STRING</codeph> columns from a Parquet table.
           </li>
+
           <li>
             A selective filter on the Parquet table makes this issue more likely.
           </li>
+
           <li>
-            Identify any uncompressed Parquet files processed by the query.
-            Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
-            suspected table. Use a query that performs a full table scan, and materializes the column
-            values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM <varname>tablename</varname></codeph>.)
-            Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
+            Identify any uncompressed Parquet files processed by the query. Examine the
+            <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the suspected
+            table. Use a query that performs a full table scan and materializes the column
+            values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM
+            <varname>tablename</varname></codeph>.) Look for <q>File Formats</q>. A value
+            containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
           </li>
+
           <li>
-            Identify any plain-encoded string columns in the associated table. Pay special attention to tables
-            containing Parquet files generated through Hive, Spark, or other mechanisms outside of Impala,
-            because Impala uses Snappy compression by default for Parquet files. Use <codeph>parquet-tools</codeph>
-            to dump the file metadata. Note that a column could have several encodings within the same file (the column
-            data is stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output of
-            <codeph>parquet-tools</codeph>, which means the values are plain encoded.
+            Identify any plain-encoded string columns in the associated table. Pay special
+            attention to tables containing Parquet files generated through Hive, Spark, or other
+            mechanisms outside of Impala, because Impala uses Snappy compression by default for
+            Parquet files. Use <codeph>parquet-tools</codeph> to dump the file metadata, as
+            shown in the example after this list. Note
+            that a column could have several encodings within the same file (the column data is
+            stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output
+            of <codeph>parquet-tools</codeph>, which means the values are plain encoded.
           </li>
         </ul>
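+
+        <p>
+          The following invocation is a sketch only; it assumes the
+          <codeph>parquet-tools</codeph> utility is available and that the path points to a
+          data file from the suspected table:
+        </p>
+
+<codeblock>parquet-tools meta /path/to/datafile.parquet</codeblock>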
-        <p><b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Upgrade to a version of Impala containing the fix for <xref keyref="IMPALA-4539">IMPALA-4539</xref>.</p>
-        <p><b>Workaround:</b> Use Snappy or another compression codec for Parquet files.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use Snappy or another compression codec for Parquet files.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4513">
+
       <title>ABS(n) where n is the lowest bound for the int types returns negative values</title>
+
       <conbody>
+
         <p>
-          If the <codeph>abs()</codeph> function evaluates a number that is right at the lower bound for
-          an integer data type, the positive result cannot be represented in the same type, and the
-          result is returned as a negative number. For example, <codeph>abs(-128)</codeph> returns -128
-          because the argument is interpreted as a <codeph>TINYINT</codeph> and the return value is also
-          a <codeph>TINYINT</codeph>.
+          If the <codeph>abs()</codeph> function evaluates a number that is right at the lower
+          bound for an integer data type, the positive result cannot be represented in the same
+          type, and the result is returned as a negative number. For example,
+          <codeph>abs(-128)</codeph> returns -128 because the argument is interpreted as a
+          <codeph>TINYINT</codeph> and the return value is also a <codeph>TINYINT</codeph>.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
-          <codeph>abs(<varname>tinyint_col</varname>)</codeph> as <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
+          <codeph>abs(<varname>tinyint_col</varname>)</codeph> as
+          <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4266">
+
       <title>Java UDF expression returning string in GROUP BY can give incorrect results</title>
+
       <conbody>
+
         <p>
-          If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a string value,
-          the UDF could return an incorrect result.
+          If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a
+          string value, the UDF could return an incorrect result.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF with an
-          empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF
+          with an empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
           <codeph>concat(my_hive_udf(), '')</codeph>.
         </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3084" rev="IMPALA-3084">
@@ -1091,8 +1457,9 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
-          collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
+          A query could return wrong results (too many or too few <codeph>NULL</codeph> values)
+          if it referenced an outer-joined nested collection and also contained a null-checking
+          predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
           <codeph>&lt;=&gt;</codeph> operator) in the <codeph>WHERE</codeph> clause.
         </p>
 
@@ -1104,7 +1471,9 @@ impala hard nproc 262144
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
+        </p>
 
       </conbody>
 
@@ -1117,8 +1486,8 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
-          another join clause. For example:
+          An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a
+          constant such as <codeph>FALSE</codeph> in another join clause. For example:
         </p>
 
 <codeblock><![CDATA[
@@ -1144,10 +1513,6 @@ explain SELECT 1 FROM alltypestiny a1
         </p>
 
         <p>
-          <b>Resolution:</b>
-        </p>
-
-        <p>
           <b>Workaround:</b>
         </p>
 
@@ -1174,8 +1539,8 @@ explain SELECT 1 FROM alltypestiny a1
 
           <li>
             <p>
-              The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
-              preceding OUTER JOINs.
+              The INNER JOIN has an On-clause with a predicate that references at least two
+              tables that are on the nullable side of the preceding OUTER JOINs.
             </p>
           </li>
         </ul>
@@ -1258,13 +1623,19 @@ on b.int_col = c.int_col;
         </p>
 
         <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
           <b>Severity:</b> High
         </p>
 
         <p>
-          For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the
-          <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if
-          the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above:
+          For some queries, this problem can be worked around by placing the problematic
+          <codeph>ON</codeph> clause predicate in the <codeph>WHERE</codeph> clause instead, or
+          changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s
+          (if the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s).
+          For example, to fix the problematic query above:
         </p>
 
 <codeblock><![CDATA[
@@ -1340,7 +1711,8 @@ where b.int_col = c.int_col
       <conbody>
 
         <p>
-          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
+          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first.
+          The Parquet standard says it is MSB first.
         </p>
 
         <p>
@@ -1348,8 +1720,8 @@ where b.int_col = c.int_col
         </p>
 
         <p>
-          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
-          in Parquet 2.0.
+          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used,
+          is not written by Impala, and is deprecated in Parquet 2.0.
         </p>
 
       </conbody>
@@ -1363,10 +1735,11 @@ where b.int_col = c.int_col
       <conbody>
 
         <p>
-          The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
-          Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
-          third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
-          as:
+          The calculation of start and end times for the BST (British Summer Time) time zone
+          could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
+          at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+          third) and fourth Sunday in October. For example, both function calls should return
+          13, but actually return 12, in a query such as:
         </p>
 
 <codeblock>
@@ -1394,15 +1767,18 @@ select
       <conbody>
 
         <p>
-          If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
-          the hostname field.
+          If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph>
+          function could return an incorrect value for the hostname field.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-1170"></xref>IMPALA-1170
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
+          <keyword keyref="impala234"/>.
+        </p>
 
       </conbody>
 
@@ -1415,8 +1791,9 @@ select
       <conbody>
 
         <p>
-          If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
-          does not match a <codeph>%</codeph> final character of the LHS argument.
+          If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an
+          escaped <codeph>\%</codeph> character, it does not match a <codeph>%</codeph> final
+          character of the LHS argument.
         </p>
 
         <p>
@@ -1434,8 +1811,9 @@ select
       <conbody>
 
         <p>
-          Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
-          involving a call to <codeph>rand()</codeph> does not actually randomize the results.
+          Because the value for <codeph>rand()</codeph> is computed early in a query, using an
+          <codeph>ORDER BY</codeph> expression involving a call to <codeph>rand()</codeph> does
+          not actually randomize the results.
         </p>
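+
+        <p>
+          For example, a query of the following form (table and column names are hypothetical)
+          does not return its rows in a random order:
+        </p>
+
+<codeblock>SELECT x FROM t1 ORDER BY rand();</codeblock>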
 
         <p>
@@ -1453,8 +1831,9 @@ select
       <conbody>
 
         <p>
-          If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
-          result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
+          If the same column is queried twice within a view, <codeph>NULL</codeph> values for
+          that column are omitted. For example, the result of <codeph>COUNT(*)</codeph> on the
+          view could be less than expected.
         </p>
 
         <p>
@@ -1465,7 +1844,10 @@ select
           <b>Workaround:</b> Avoid selecting the same column twice within an inline view.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.
+        </p>
 
       </conbody>
 
@@ -1480,15 +1862,19 @@ select
       <conbody>
 
         <p>
-          A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
-          from the <codeph>ON</codeph> clause incorrectly.
+          A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table
+          references is an inline view might apply predicates from the <codeph>ON</codeph>
+          clause incorrectly.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-1459">IMPALA-1459</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
+        </p>
 
       </conbody>
 
@@ -1501,8 +1887,8 @@ select
       <conbody>
 
         <p>
-          A query could encounter a serious error if includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
-          subqueries.
+          A query could encounter a serious error if includes multiple nested levels of
+          A query could encounter a serious error if it includes multiple nested levels of
+          <codeph>INNER JOIN</codeph> clauses involving subqueries.
 
         <p>
@@ -1520,7 +1906,8 @@ select
       <conbody>
 
         <p>
-          A query might return incorrect results due to wrong predicate assignment in the following scenario:
+          A query might return incorrect results due to wrong predicate assignment in the
+          following scenario:
         </p>
 
         <ol>
@@ -1533,8 +1920,8 @@ select
           </li>
 
           <li>
-            That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
-            the inline view
+            That join has an On-clause containing a predicate that only references columns
+            originating from the outer-joined tables inside the inline view
           </li>
         </ol>
 
@@ -1542,7 +1929,10 @@ select
           <b>Bug:</b> <xref keyref="IMPALA-2665">IMPALA-2665</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
+        </p>
 
       </conbody>
 
@@ -1555,15 +1945,18 @@ select
       <conbody>
 
         <p>
-          In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
-          clause might be applied at the wrong stage of query processing, leading to incorrect results.
+          In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the
+          comparison from the <codeph>HAVING</codeph> clause might be applied at the wrong stage
+          of query processing, leading to incorrect results.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2144">IMPALA-2144</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1576,15 +1969,18 @@ select
       <conbody>
 
         <p>
-          A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
-          SUM(...))</codeph>, could return incorrect results.
+          A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function,
+          such as <codeph>NOT IN (SELECT SUM(...))</codeph>, could return incorrect results.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2093">IMPALA-2093</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
+          <keyword keyref="impala234"/>.
+        </p>
 
       </conbody>
 
@@ -1599,8 +1995,9 @@ select
     <conbody>
 
       <p>
-        These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
-        STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
+        These issues affect how Impala interacts with metadata. They cover areas such as the
+        metastore database, the <codeph>COMPUTE STATS</codeph> statement, and the Impala
+        <cmdname>catalogd</cmdname> daemon.
       </p>
 
     </conbody>
@@ -1612,9 +2009,11 @@ select
       <conbody>
 
         <p>
-          Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
-          columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
-          this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash.
+          Incremental stats use up about 400 bytes per partition for each column. For example,
+          for a table with 20K partitions and 100 columns, the memory overhead from incremental
+          statistics is about 800 MB. When serialized for transmission across the network, this
+          metadata exceeds the 2 GB Java array size limit and leads to a
+          <codeph>catalogd</codeph> crash.
         </p>
 
         <p>
@@ -1624,8 +2023,9 @@ select
         </p>
 
         <p>
-          <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
-          scalability of incremental stats computation is a continuing work item.
+          <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing
+          incremental stats for that table. The scalability of incremental stats computation is
+          a continuing work item.
         </p>
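+
+        <p>
+          For example, the periodic full computation is a plain <codeph>COMPUTE STATS</codeph>
+          statement (table name hypothetical):
+        </p>
+
+<codeblock>COMPUTE STATS db1.wide_partitioned_table;</codeblock>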
 
       </conbody>
@@ -1647,17 +2047,21 @@ select
         </p>
 
         <p>
-          <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also
-          enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
-          set both properties with a single <codeph>ALTER TABLE</codeph> statement:
+          <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics
+          manually by setting the <codeph>numRows</codeph>, you must also enable the Boolean
+          property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement
+          like the following to set both properties with a single <codeph>ALTER TABLE</codeph>
+          statement:
         </p>
 
 <codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
 
         <p>
           <b>Resolution:</b> The underlying cause is the issue
-          <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
-          metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into release of <keyword keyref="distro"/>.
+          <xref
+            href="https://issues.apache.org/jira/browse/HIVE-8648"
+            scope="external" format="html">HIVE-8648</xref>
+          that affects the metastore in Hive 0.13.
         </p>
 
       </conbody>
@@ -1673,8 +2077,8 @@ select
     <conbody>
 
       <p>
-        These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
-        and file formats.
+        These issues affect the ability to interchange data between Impala and other database
+        systems. They cover areas such as data types and file formats.
       </p>
 
     </conbody>
@@ -1688,26 +2092,32 @@ select
       <conbody>
 
         <p>
-          This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
-          adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
-          FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
+          This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
+          changing the Avro schema file by adding or removing columns. Columns added to the
+          schema file will not show up in the output of the <codeph>DESCRIBE FORMATTED</codeph>
+          command. Removing columns from the schema file will trigger a
+          <codeph>NullPointerException</codeph>.
         </p>
 
         <p>
-          As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate
-          the Hive metastore database with the correct column definitions.
+          As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop
+          and recreate the table. This will populate the Hive metastore database with the
+          correct column definitions.
         </p>
 
         <note type="warning">
-          <p>Only use this for external tables, or Impala will remove the data
-            files. In case of an internal table, set it to external first:
+          <p>
+            Only use this for external tables, or Impala will remove the data files. In case of
+            an internal table, set it to external first:
 <codeblock>
 ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
 </codeblock>
-          (The part in parentheses is case sensitive.) Make sure to pick the
-          right choice between internal and external when recreating the table.
-          See <xref href="impala_tables.xml#tables"/> for the differences
-          between internal and external tables. </p></note>
+            (The part in parentheses is case sensitive.) Make sure to pick the right choice
+            between internal and external when recreating the table. See
+            <xref href="impala_tables.xml#tables"/> for the differences between internal and
+            external tables.
+          </p>
+        </note>
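+
+        <p>
+          The following is a minimal sketch of this workaround, assuming a hypothetical
+          external Avro table named <codeph>avro_tbl</codeph>; the exact
+          <codeph>CREATE TABLE</codeph> statement to re-run comes from your own
+          <codeph>SHOW CREATE TABLE</codeph> output:
+<codeblock>SHOW CREATE TABLE avro_tbl;
+-- For an internal table only: mark it external first so the data files are preserved.
+ALTER TABLE avro_tbl SET TBLPROPERTIES('EXTERNAL'='TRUE');
+DROP TABLE avro_tbl;
+-- Recreate the table by running the statement printed by SHOW CREATE TABLE.</codeblock>
+        </p>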
 
         <p>
           <b>Severity:</b> High
@@ -1746,8 +2156,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Impala behavior differs from Hive with respect to out of range float/double values. Out of range values are returned as maximum
-          allowed value of type (Hive returns NULL).
+          Impala behavior differs from Hive with respect to out-of-range float/double values.
+          Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).
         </p>
 
         <p>
@@ -1767,14 +2177,16 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
-          <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
-          must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
-          Impala or Hive.
+          For compatibility with Impala, the value for the Flume HDFS Sink
+          <codeph>hdfs.writeFormat</codeph> must be set to <codeph>Text</codeph>, rather than
+          its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph>
+          setting must be changed to <codeph>Text</codeph> before creating data files with
+          Flume; otherwise, those files cannot be read by either Impala or Hive.
         </p>
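+
+        <p>
+          A minimal sketch of the relevant property in the Flume agent configuration
+          file, assuming a hypothetical agent named <codeph>agent1</codeph> with an
+          HDFS sink named <codeph>k1</codeph>:
+<codeblock>agent1.sinks.k1.hdfs.writeFormat = Text</codeblock>
+        </p>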
 
         <p>
-          <b>Resolution:</b> This information has been requested to be added to the upstream Flume documentation.
+          <b>Resolution:</b> This information has been requested to be added to the upstream
+          Flume documentation.
         </p>
 
       </conbody>
@@ -1790,7 +2202,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+          Querying certain Avro tables could cause a crash or return no rows, even though Impala
+          could <codeph>DESCRIBE</codeph> the table.
         </p>
 
         <p>
@@ -1798,13 +2211,14 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
         </p>
 
         <p>
-          <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
-          instead of <codeph>["string", "null"]</codeph>.
+          <b>Workaround:</b> Swap the order of the fields in the schema specification. For
+          example, <codeph>["null", "string"]</codeph> instead of <codeph>["string",
+          "null"]</codeph>.
         </p>
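+
+        <p>
+          A minimal sketch showing where the field order matters, assuming a
+          hypothetical table defined through the <codeph>avro.schema.literal</codeph>
+          table property:
+<codeblock>CREATE TABLE avro_union_demo
+STORED AS AVRO
+TBLPROPERTIES ('avro.schema.literal'='{
+  "name": "demo_record",
+  "type": "record",
+  "fields": [ {"name": "c1", "type": ["null", "string"]} ]
+}');</codeblock>
+        </p>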
 
         <p>
-          <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
-          crashing issue is resolved.
+          <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it
+          may still cause an error even when the crashing issue is resolved.
         </p>
 
       </conbody>
@@ -1820,7 +2234,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+          If an Avro table has a schema definition with a trailing semicolon, Impala encounters
+          an error when the table is queried.
         </p>
 
         <p>
@@ -1844,8 +2259,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
-          streams, the Impala query only processes the data from the first stream.
+          Currently, Impala can only read gzipped files containing a single stream. If a gzipped
+          file contains multiple concatenated streams, the Impala query only processes the data
+          from the first stream.
         </p>
 
         <p>
@@ -1856,7 +2272,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
           <b>Workaround:</b> Use a different gzip tool to compress file to a single stream file.
         </p>
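+
+        <p>
+          A sketch of one way to recompress such a file into a single stream,
+          assuming GNU gzip and a hypothetical file name:
+<codeblock>$ gunzip -c multi_stream.gz | gzip -c > single_stream.gz</codeblock>
+        </p>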
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1871,8 +2289,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
-          the row following the <codeph>\n\r</codeph> pair twice.
+          If a carriage return / newline pair of characters in a text table is split between
+          HDFS data blocks, Impala incorrectly processes the row following the
+          <codeph>\n\r</codeph> pair twice.
         </p>
 
         <p>
@@ -1883,7 +2302,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
           <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
+        </p>
 
       </conbody>
 
@@ -1898,30 +2319,33 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
-          The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
-          overlooking the presence of in

<TRUNCATED>

[9/9] impala git commit: IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

Posted by sa...@apache.org.
IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

- Updated the fixed versions for the issues fixed in 2.12 or earlier.
- Added new known issues open in 2.12.

Change-Id: I4638be7e488546287e3555945bb691a588ec6f09
Reviewed-on: http://gerrit.cloudera.org:8080/10101
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/b9271ccf
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/b9271ccf
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/b9271ccf

Branch: refs/heads/master
Commit: b9271ccf0e2e2e8cfbe7c6538aca0109c68acbef
Parents: 5bc5279
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Wed Apr 18 12:49:10 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Sat Apr 21 04:02:34 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 1335 ++++++++++++++++++++----------
 1 file changed, 894 insertions(+), 441 deletions(-)
----------------------------------------------------------------------



[5/9] impala git commit: IMPALA-6459: [DOCS] Part 2: Stats extrapolation and sampling.

Posted by sa...@apache.org.
IMPALA-6459: [DOCS] Part 2: Stats extrapolation and sampling.

Adds new materials under COMPUTE STATS describing
the experimental stats extrapolation and sampling
features.

More cleanup and examples are needed. This patch provides
a reasonable starting point which we can extend.

Change-Id: Idae7a377b5873701e91f60afa62dde2bd8aacd1b
Reviewed-on: http://gerrit.cloudera.org:8080/10112
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/7134d812
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/7134d812
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/7134d812

Branch: refs/heads/master
Commit: 7134d812b57a734c194cc94d32f1e212ed0f17cd
Parents: 34b2f21
Author: Alex Behm <al...@cloudera.com>
Authored: Tue Apr 17 17:12:17 2018 -0700
Committer: Impala Public Jenkins <im...@cloudera.com>
Committed: Fri Apr 20 19:45:46 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_perf_stats.xml | 135 +++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/7134d812/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index f503a68..dab2eb8 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -392,6 +392,12 @@ show column stats year_month_day;
               This feature is available since Impala 2.8.
             </p>
           </li>
+          <li>
+            <p>
+              Consider the experimental extrapolation and sampling features (see below)
+              to further increase the efficiency of computing stats.
+            </p>
+          </li>
           </ul>
         </p>
 
@@ -415,6 +421,135 @@ show column stats year_month_day;
 
       </conbody>
 
+      <concept id="experimental_stats_features">
+        <title>Experimental: Extrapolation and Sampling</title>
+        <conbody>
+          <p>
+            Impala 2.12 and higher includes two experimental features to alleviate
+            common issues with computing and maintaining statistics on very large tables.
+            These features address the following shortcomings:
+            <ul>
+            <li>
+              <p>
+                Newly added partitions do not have row count statistics. Table scans
+                that only access those new partitions are treated as not having stats.
+                Similarly, table scans that access both new and old partitions estimate
+                the scan cardinality based on those old partitions that have stats, and
+                the new partitions without stats are treated as having 0 rows.
+              </p>
+            </li>
+            <li>
+              <p>
+                The row counts of existing partitions become stale when data is added
+                or dropped.
+              </p>
+            </li>
+            <li>
+              <p>
+                Computing stats for tables with 100,000 or more partitions might fail
+                or be very slow due to the high cost of updating the partition metadata
+                in the Hive Metastore.
+              </p>
+            </li>
+            <li>
+              <p>
+                With transient compute resources it is important to minimize the time
+                from starting a new cluster to successfully running queries.
+                Since the cluster might be relatively short-lived, users might prefer to
+                quickly collect stats that are "good enough" as opposed to spending
+                a lot of time and resources on computing full-fidelity stats.
+              </p>
+            </li>
+            </ul>
+            For very large tables, it is often wasteful or impractical to frequently run
+            a full COMPUTE STATS to address the scenarios above.
+          </p>
+          <p>
+            The sampling feature makes COMPUTE STATS more efficient by processing a
+            fraction of the table data, and the extrapolation feature aims to reduce
+            the frequency at which COMPUTE STATS needs to be re-run by estimating
+            the row count of new and modified partitions.
+          </p>
+          <p>
+            The sampling and extrapolation features are disabled by default.
+            They can be enabled globally or for specific tables, as follows.
+            To enable the features globally, set the
+            <codeph>--enable_stats_extrapolation</codeph> startup flag for
+            <codeph>impalad</codeph>. To enable them only for a specific table,
+            set the <codeph>impala.enable.stats.extrapolation</codeph> table
+            property to <codeph>true</codeph> for that table. The table-level
+            property overrides the global setting, so it is also possible to
+            enable sampling and extrapolation globally, but disable them for
+            specific tables by setting the table property to
+            <codeph>false</codeph>. For example:
+<codeblock>ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true");</codeblock>
+          </p>
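+          <p>
+            To enable the features globally instead, a sketch of the corresponding
+            <codeph>impalad</codeph> startup flag (how flags are passed depends on
+            your deployment):
+<codeblock>--enable_stats_extrapolation=true</codeblock>
+          </p>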
+          <note>
+            Why are these features experimental? Due to their probabilistic nature,
+            it is possible that these features perform pathologically poorly on tables
+            with extreme data/file/size distributions. Since it is not feasible for us
+            to test all possible scenarios, we only cautiously advertise these new
+            capabilities. That said, the features have been thoroughly tested and
+            are considered functionally stable. If you decide to give these features
+            a try, please tell us about your experience at user@impala.apache.org!
+            We rely on user feedback to guide future improvements in statistics
+            collection.
+          </note>
+        </conbody>
+
+        <concept id="experimental_stats_extrapolation">
+          <title>Stats Extrapolation</title>
+          <conbody>
+            <p>
+              The main idea of stats extrapolation is to estimate the row count of new
+              and modified partitions based on the result of the last COMPUTE STATS.
+              Enabling stats extrapolation changes the behavior of COMPUTE STATS,
+              as well as the cardinality estimation of table scans. COMPUTE STATS no
+              longer computes and stores per-partition row counts, and instead, only
+              computes a table-level row count together with the total number of file
+              bytes in the table at that time. No partition metadata is modified. The
+              input cardinality of a table scan is estimated by converting the data
+              volume of relevant partitions to a row count, based on the table-level
+              row count and file bytes statistics. It is assumed that within the same
+              table, different sets of files with the same data volume correspond
+              to a similar number of rows on average. With extrapolation enabled,
+              the scan cardinality estimation ignores per-partition row counts. It
+              only relies on the table-level statistics and the scanned data volume.
+            </p>
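+            <p>
+              As a hypothetical illustration of the arithmetic: if the last
+              COMPUTE STATS recorded a table-level count of 1,000,000 rows over
+              100 GB of file bytes, a scan covering partitions that total 10 GB
+              of data is extrapolated to roughly 1,000,000 * (10 / 100) = 100,000
+              rows, regardless of whether those partitions existed when the
+              statistics were computed.
+            </p>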
+            <p>
+              The SHOW TABLE STATS and EXPLAIN commands distinguish between row counts
+              stored in the Hive Metastore, and the row counts extrapolated based on the
+              above process. Consult the SHOW TABLE STATS and EXPLAIN documentation
+              for more details.
+            </p>
+          </conbody>
+        </concept>
+
+        <concept id="experimental_stats_sampling">
+          <title>Sampling</title>
+          <conbody>
+            <p>
+              A TABLESAMPLE clause may be added to COMPUTE STATS to limit the
+              percentage of data to be processed. The final statistics are obtained
+              by extrapolating the statistics from the data sample over the entire table.
+              The extrapolated statistics are stored in the Hive Metastore, just as if no
+              sampling was used. The following example runs COMPUTE STATS over a 10 percent
+              data sample:
+<codeblock>COMPUTE STATS test_table TABLESAMPLE SYSTEM(10);</codeblock>
+            </p>
+            <p>
+            We have found that a 10 percent sampling rate typically offers a good
+            tradeoff between statistics accuracy and execution cost. A sampling rate
+            well below 10 percent has shown poor results and is not recommended.
+            </p>
+            <note type="important">
+              Sampling-based techniques sacrifice result accuracy for execution
+              efficiency, so your mileage may vary for different tables and columns
+              depending on their data distribution. The extrapolation procedure Impala
+              uses for estimating the number of distinct values per column is inherently
+              non-deterministic, so your results may vary between runs of
+              COMPUTE STATS TABLESAMPLE, even if no data has changed.
+            </note>
+          </conbody>
+        </concept>
+      </concept>
     </concept>
 
     <concept id="concept_bmk_pfl_mdb">