Posted to commits@impala.apache.org by sa...@apache.org on 2018/04/23 17:38:49 UTC

[01/20] impala git commit: IMPALA-6459: [DOCS] Part 1 Stats doc reorg

Repository: impala
Updated Branches:
  refs/heads/2.x 2be6814e6 -> 8dd465f69


IMPALA-6459: [DOCS] Part 1 Stats doc reorg

Change-Id: I3b061c39bb847690760311fd7280121eda2db626
Reviewed-on: http://gerrit.cloudera.org:8080/10067
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/dff44e4b
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/dff44e4b
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/dff44e4b

Branch: refs/heads/2.x
Commit: dff44e4b488163e8934a89762adc38637e5c5312
Parents: 27d6f5e
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Fri Apr 13 14:41:27 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:20 2018 +0000

----------------------------------------------------------------------
 docs/impala_keydefs.ditamap       |   1 -
 docs/shared/impala_common.xml     |  10 +-
 docs/topics/impala_perf_stats.xml | 893 ++++++++++++++++-----------------
 3 files changed, 448 insertions(+), 456 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/dff44e4b/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 27f0515..9703698 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10943,7 +10943,6 @@ under the License.
   <keydef href="topics/impala_perf_joins.xml#straight_join" keys="straight_join"/>
   <keydef href="topics/impala_perf_joins.xml#perf_joins_examples" keys="perf_joins_examples"/>
   <keydef href="topics/impala_perf_stats.xml" keys="perf_stats"/>
-  <keydef href="topics/impala_perf_stats.xml#perf_column_stats_manual" keys="perf_column_stats_manual"/>
 
   <keydef href="topics/impala_perf_benchmarking.xml" keys="perf_benchmarks"/>
   <keydef href="topics/impala_perf_resources.xml" keys="mem_limits"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/dff44e4b/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 4ffd211..e651337 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1354,11 +1354,11 @@ drop database temp;
 
       <p id="cs_or_cis">
         For a particular table, use either <codeph>COMPUTE STATS</codeph> or
-        <codeph>COMPUTE INCREMENTAL STATS</codeph>, but never combine the two or alternate
-        between them. If you switch from <codeph>COMPUTE STATS</codeph> to
-        <codeph>COMPUTE INCREMENTAL STATS</codeph> during the lifetime of a table, or vice
-        versa, drop all statistics (by running both <codeph>DROP STATS</codeph> and
-        <codeph>DROP INCREMENTAL STATS</codeph>) before making the switch.
+        <codeph>COMPUTE INCREMENTAL STATS</codeph>, but never combine the two or
+        alternate between them. If you switch from <codeph>COMPUTE STATS</codeph> to
+        <codeph>COMPUTE INCREMENTAL STATS</codeph> during the lifetime of a table, or
+        vice versa, drop all statistics by running <codeph>DROP STATS</codeph> before
+        making the switch.
       </p>
 
       <p id="incremental_stats_after_full">

http://git-wip-us.apache.org/repos/asf/impala/blob/dff44e4b/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index ac771be..f503a68 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -21,6 +21,7 @@ under the License.
 <concept id="perf_stats">
 
   <title>Table and Column Statistics</title>
+
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
@@ -35,57 +36,48 @@ under the License.
   <conbody>
 
     <p>
-      Impala can do better optimization for complex or multi-table queries when it has access to statistics about
-      the volume of data and how the values are distributed. Impala uses this information to help parallelize and
-      distribute the work for a query. For example, optimizing join queries requires a way of determining if one
-      table is <q>bigger</q> than another, which is a function of the number of rows and the average row size
-      for each table. The following sections describe the categories of statistics Impala can work
-      with, and how to produce them and keep them up to date.
+      Impala can do better optimization for complex or multi-table queries when it has access to
+      statistics about the volume of data and how the values are distributed. Impala uses this
+      information to help parallelize and distribute the work for a query. For example,
+      optimizing join queries requires a way of determining if one table is <q>bigger</q> than
+      another, which is a function of the number of rows and the average row size for each
+      table. The following sections describe the categories of statistics Impala can work with,
+      and how to produce them and keep them up to date.
     </p>
 
-    <note>
-      <p rev="1.2.2">
-        Originally, Impala relied on the Hive mechanism for collecting statistics, through the Hive <codeph>ANALYZE
-        TABLE</codeph> statement which initiates a MapReduce job. For better user-friendliness and reliability,
-        Impala implements its own <codeph>COMPUTE STATS</codeph> statement in Impala 1.2.2 and higher, along with the
-        <codeph>DROP STATS</codeph>, <codeph>SHOW TABLE STATS</codeph>, and <codeph>SHOW COLUMN STATS</codeph>
-        statements.
-      </p>
-    </note>
+    <p outputclass="toc inpage all"/>
 
-    <p outputclass="toc inpage"/>
   </conbody>
 
   <concept id="perf_table_stats">
 
     <title id="table_stats">Overview of Table Statistics</title>
-  <prolog>
-    <metadata>
-      <data name="Category" value="Concepts"/>
-    </metadata>
-  </prolog>
 
-    <conbody>
+    <prolog>
+      <metadata>
+        <data name="Category" value="Concepts"/>
+      </metadata>
+    </prolog>
 
-<!-- Hive background info: https://cwiki.apache.org/Hive/statsdev.html -->
+    <conbody>
 
       <p>
         The Impala query planner can make use of statistics about entire tables and partitions.
-        This information includes physical characteristics such as the number of rows, number of data files,
-        the total size of the data files, and the file format. For partitioned tables, the numbers
-        are calculated per partition, and as totals for the whole table.
-        This metadata is stored in the metastore database, and can be updated by either Impala or Hive.
-        If a number is not available, the value -1 is used as a placeholder.
-        Some numbers, such as number and total sizes of data files, are always kept up to date because
-        they can be calculated cheaply, as part of gathering HDFS block metadata.
+        This information includes physical characteristics such as the number of rows, number of
+        data files, the total size of the data files, and the file format. For partitioned
+        tables, the numbers are calculated per partition, and as totals for the whole table.
+        This metadata is stored in the metastore database, and can be updated by either Impala
+        or Hive. If a number is not available, the value -1 is used as a placeholder. Some
+        numbers, such as number and total sizes of data files, are always kept up to date
+        because they can be calculated cheaply, as part of gathering HDFS block metadata.
       </p>
 
       <p>
-        The following example shows table stats for an unpartitioned Parquet table.
-        The values for the number and sizes of files are always available.
-        Initially, the number of rows is not known, because it requires a potentially expensive
-        scan through the entire table, and so that value is displayed as -1.
-        The <codeph>COMPUTE STATS</codeph> statement fills in any unknown table stats values.
+        The following example shows table stats for an unpartitioned Parquet table. The values
+        for the number and sizes of files are always available. Initially, the number of rows is
+        not known, because it requires a potentially expensive scan through the entire table,
+        and so that value is displayed as -1. The <codeph>COMPUTE STATS</codeph> statement fills
+        in any unknown table stats values.
       </p>
 
 <codeblock>
@@ -113,26 +105,29 @@ show table stats parquet_snappy;
 </codeblock>
 
       <p>
-        Impala performs some optimizations using this metadata on its own, and other optimizations by
-        using a combination of table and column statistics.
+        Impala performs some optimizations using this metadata on its own, and other
+        optimizations by using a combination of table and column statistics.
       </p>
 
       <p rev="1.2.1">
-        To check that table statistics are available for a table, and see the details of those statistics, use the
-        statement <codeph>SHOW TABLE STATS <varname>table_name</varname></codeph>. See
-        <xref href="impala_show.xml#show"/> for details.
+        To check that table statistics are available for a table, and see the details of those
+        statistics, use the statement <codeph>SHOW TABLE STATS
+        <varname>table_name</varname></codeph>. See <xref href="impala_show.xml#show"/> for
+        details.
       </p>
 
       <p>
         If you use the Hive-based methods of gathering statistics, see
         <xref href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" scope="external" format="html">the
-        Hive wiki</xref> for information about the required configuration on the Hive side. Where practical,
-        use the Impala <codeph>COMPUTE STATS</codeph> statement to avoid potential configuration and scalability
-        issues with the statistics-gathering process.
+        Hive wiki</xref> for information about the required configuration on the Hive side.
+        Where practical, use the Impala <codeph>COMPUTE STATS</codeph> statement to avoid
+        potential configuration and scalability issues with the statistics-gathering process.
       </p>
 
       <p conref="../shared/impala_common.xml#common/hive_column_stats_caveat"/>
+
     </conbody>
+
   </concept>
 
   <concept id="perf_column_stats">
@@ -142,23 +137,24 @@ show table stats parquet_snappy;
     <conbody>
 
       <p>
-        The Impala query planner can make use of statistics about individual columns when that metadata is
-        available in the metastore database. This technique is most valuable for columns compared across tables in
-        <xref href="impala_perf_joins.xml#perf_joins">join queries</xref>, to help estimate how many rows the query
-        will retrieve from each table. <ph rev="2.0.0"> These statistics are also important for correlated
-        subqueries using the <codeph>EXISTS()</codeph> or <codeph>IN()</codeph> operators, which are processed
+        The Impala query planner can make use of statistics about individual columns when that
+        metadata is available in the metastore database. This technique is most valuable for
+        columns compared across tables in <xref href="impala_perf_joins.xml#perf_joins">join
+        queries</xref>, to help estimate how many rows the query will retrieve from each table.
+        <ph rev="2.0.0"> These statistics are also important for correlated subqueries using the
+        <codeph>EXISTS()</codeph> or <codeph>IN()</codeph> operators, which are processed
         internally the same way as join queries.</ph>
       </p>
 
       <p>
-        The following example shows column stats for an unpartitioned Parquet table.
-        The values for the maximum and average sizes of some types are always available,
-        because those figures are constant for numeric and other fixed-size types.
-        Initially, the number of distinct values is not known, because it requires a potentially expensive
-        scan through the entire table, and so that value is displayed as -1.
-        The same applies to maximum and average sizes of variable-sized types, such as <codeph>STRING</codeph>.
-        The <codeph>COMPUTE STATS</codeph> statement fills in most unknown column stats values.
-        (It does not record the number of <codeph>NULL</codeph> values, because currently Impala
+        The following example shows column stats for an unpartitioned Parquet table. The values
+        for the maximum and average sizes of some types are always available, because those
+        figures are constant for numeric and other fixed-size types. Initially, the number of
+        distinct values is not known, because it requires a potentially expensive scan through
+        the entire table, and so that value is displayed as -1. The same applies to maximum and
+        average sizes of variable-sized types, such as <codeph>STRING</codeph>. The
+        <codeph>COMPUTE STATS</codeph> statement fills in most unknown column stats values. (It
+        does not record the number of <codeph>NULL</codeph> values, because currently Impala
         does not use that figure for query optimization.)
       </p>
 
@@ -197,88 +193,56 @@ show column stats parquet_snappy;
 
       <note>
         <p>
-          For column statistics to be effective in Impala, you also need to have table statistics for the
-          applicable tables, as described in <xref href="impala_perf_stats.xml#perf_table_stats"/>. When you use
-          the Impala <codeph>COMPUTE STATS</codeph> statement, both table and column statistics are automatically
-          gathered at the same time, for all columns in the table.
+          For column statistics to be effective in Impala, you also need to have table
+          statistics for the applicable tables, as described in
+          <xref href="impala_perf_stats.xml#perf_table_stats"/>. When you use the Impala
+          <codeph>COMPUTE STATS</codeph> statement, both table and column statistics are
+          automatically gathered at the same time, for all columns in the table.
         </p>
       </note>
 
       <note conref="../shared/impala_common.xml#common/compute_stats_nulls"/>
 
-<!-- Hive-based instructions are considered obsolete since the introduction of the Impala COMPUTE STATS statement.
-      <p>
-        Add settings like the following to the <filepath>hive-site.xml</filepath>
-        configuration file, in the Hive configuration directory, on every node where you run
-        <codeph>ANALYZE TABLE</codeph> statements through the
-        <codeph>hive</codeph> shell. The
-        <codeph>hive.stats.ndv.error</codeph> setting represents the standard error when
-        estimating the number of distinct values for a column. The value of 5.0 is recommended as a tradeoff between the
-        accuracy of the gathered statistics and the resource usage of the stats-gathering process.
-      </p>
-
-<codeblock><![CDATA[<property>
- <name>hive.stats.ndv.error</name>
- <value>5.0</value>
-</property>]]></codeblock>
-
-      <p>
-        5.0 is a relatively low value that devotes substantial computational resources to the statistics-gathering
-        process. To reduce the resource usage, you could increase this value; to make the statistics even more precise,
-        you could lower it.
-      </p>
-
-      <p>
-        The syntax for gathering column statistics uses the <codeph>ANALYZE TABLE ...
-        COMPUTE STATISTICS</codeph> clause, with an additional <codeph>FOR
-        COLUMNS</codeph> clause. For partitioned tables, you can gather statistics for specific partitions by including
-        a clause <codeph>PARTITION
-        (<varname>col1=val1</varname>,<varname>col2=val2</varname>,
-        ...)</codeph>; but you cannot include the partitioning columns in the
-        <codeph>FOR COLUMNS</codeph> clause. Also, you cannot use fully qualified table
-        names, so issue a <codeph>USE</codeph> command first to switch to the
-        appropriate database. For example:
-      </p>
-
-<codeblock>USE <varname>database_name</varname>;
-ANALYZE TABLE <varname>table_name</varname> COMPUTE STATISTICS FOR COLUMNS <varname>column_list</varname>;
-ANALYZE TABLE <varname>table_name</varname> PARTITION (<varname>partition_specs</varname>) COMPUTE STATISTICS FOR COLUMNS <varname>column_list</varname>;</codeblock>
--->
-
       <p rev="1.2.1">
-        To check whether column statistics are available for a particular set of columns, use the <codeph>SHOW
-        COLUMN STATS <varname>table_name</varname></codeph> statement, or check the extended
-        <codeph>EXPLAIN</codeph> output for a query against that table that refers to those columns. See
-        <xref href="impala_show.xml#show"/> and <xref href="impala_explain.xml#explain"/> for details.
+        To check whether column statistics are available for a particular set of columns, use
+        the <codeph>SHOW COLUMN STATS <varname>table_name</varname></codeph> statement, or check
+        the extended <codeph>EXPLAIN</codeph> output for a query against that table that refers
+        to those columns. See <xref href="impala_show.xml#show"/> and
+        <xref href="impala_explain.xml#explain"/> for details.
       </p>
 
       <p conref="../shared/impala_common.xml#common/hive_column_stats_caveat"/>
+
     </conbody>
+
   </concept>
 
   <concept id="perf_stats_partitions">
+
     <title id="stats_partitions">How Table and Column Statistics Work for Partitioned Tables</title>
+
     <conbody>
 
       <p>
-        When you use Impala for <q>big data</q>, you are highly likely to use partitioning
-        for your biggest tables, the ones representing data that can be logically divided
-        based on dates, geographic regions, or similar criteria. The table and column statistics
-        are especially useful for optimizing queries on such tables. For example, a query involving
-        one year might involve substantially more or less data than a query involving a different year,
-        or a range of several years. Each query might be optimized differently as a result.
+        When you use Impala for <q>big data</q>, you are highly likely to use partitioning for
+        your biggest tables, the ones representing data that can be logically divided based on
+        dates, geographic regions, or similar criteria. The table and column statistics are
+        especially useful for optimizing queries on such tables. For example, a query involving
+        one year might involve substantially more or less data than a query involving a
+        different year, or a range of several years. Each query might be optimized differently
+        as a result.
       </p>
 
       <p>
         The following examples show how table and column stats work with a partitioned table.
-        The table for this example is partitioned by year, month, and day.
-        For simplicity, the sample data consists of 5 partitions, all from the same year and month.
-        Table stats are collected independently for each partition. (In fact, the
-        <codeph>SHOW PARTITIONS</codeph> statement displays exactly the same information as
-        <codeph>SHOW TABLE STATS</codeph> for a partitioned table.) Column stats apply to
-        the entire table, not to individual partitions. Because the partition key column values
-        are represented as HDFS directories, their characteristics are typically known in advance,
-        even when the values for non-key columns are shown as -1.
+        The table for this example is partitioned by year, month, and day. For simplicity, the
+        sample data consists of 5 partitions, all from the same year and month. Table stats are
+        collected independently for each partition. (In fact, the <codeph>SHOW
+        PARTITIONS</codeph> statement displays exactly the same information as <codeph>SHOW
+        TABLE STATS</codeph> for a partitioned table.) Column stats apply to the entire table,
+        not to individual partitions. Because the partition key column values are represented as
+        HDFS directories, their characteristics are typically known in advance, even when the
+        values for non-key columns are shown as -1.
       </p>
 
 <codeblock>
@@ -355,207 +319,257 @@ show column stats year_month_day;
 </codeblock>
 
       <p conref="../shared/impala_common.xml#common/hive_column_stats_caveat"/>
+
     </conbody>
+
   </concept>
 
-  <concept rev="2.1.0" id="perf_stats_incremental">
+  <concept id="perf_generating_stats">
 
-    <title id="incremental_stats">Overview of Incremental Statistics</title>
+    <title>Generating Table and Column Statistics</title>
 
     <conbody>
 
       <p>
-        In Impala 2.1.0 and higher, you can use the syntax <codeph>COMPUTE INCREMENTAL STATS</codeph> and
-        <codeph>DROP INCREMENTAL STATS</codeph>. The <codeph>INCREMENTAL</codeph> clauses work with incremental
-        statistics, a specialized feature for partitioned tables that are large or frequently updated with new
-        partitions.
-      </p>
-
-      <p>
-        When you compute incremental statistics for a partitioned table, by default Impala only processes those
-        partitions that do not yet have incremental statistics. By processing only newly added partitions, you can
-        keep statistics up to date for large partitioned tables, without incurring the overhead of reprocessing the
-        entire table each time.
+        Use the <codeph>COMPUTE STATS</codeph> family of commands to collect table and
+        column statistics. The <codeph>COMPUTE STATS</codeph> variants offer
+        different tradeoffs among computation cost, staleness, and maintenance
+        workflows, which are explained below.
       </p>
 
       <note type="important">
         <p conref="../shared/impala_common.xml#common/cs_or_cis"/>
-        <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
-        <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
       </note>
 
+      <!-- TODO: Commented out because it is inaccurate and confusing. Leaving this
+      material for future refactoring into a Hive-compatibility section.
       <p>
-        You can also compute or drop statistics for a single partition by including a <codeph>PARTITION</codeph>
-        clause in the <codeph>COMPUTE INCREMENTAL STATS</codeph> or <codeph>DROP INCREMENTAL STATS</codeph>
-        statement.
+        If you use Hive as part of your ETL workflow, you can also use Hive to generate table
+        and column statistics. You might need to do extra configuration within Hive itself, the
+        metastore, or even set up a separate database to hold Hive-generated statistics. You
+        might need to run multiple statements to generate all the necessary statistics.
+        Therefore, prefer the Impala <codeph>COMPUTE STATS</codeph> statement where that
+        technique is practical. For details about collecting statistics through Hive, see
+        <xref href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" scope="external" format="html">the
+        Hive wiki</xref>.
       </p>
+      -->
 
-      <p>
-        The metadata for incremental statistics is handled differently from the original style of statistics:
-      </p>
+    </conbody>
 
-      <ul>
-        <li>
-          <p>
-            If you have a partitioned table for which you have already run a regular <codeph>COMPUTE STATS</codeph>
-            statement, issuing <codeph>COMPUTE INCREMENTAL STATS</codeph> without a partition clause causes Impala
-            to rescan the entire table. Once the incremental statistics are computed, any future <codeph>COMPUTE INCREMENTAL
-            STATS</codeph> statements only scan any new partitions and any partitions where you performed
-            <codeph>DROP INCREMENTAL STATS</codeph>.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            The <codeph>SHOW TABLE STATS</codeph> and <codeph>SHOW PARTITIONS</codeph> statements now include an
-            additional column showing whether incremental statistics are available for each column. A partition
-            could already be covered by the original type of statistics based on a prior <codeph>COMPUTE
-            STATS</codeph> statement, as indicated by a value other than <codeph>-1</codeph> under the
-            <codeph>#Rows</codeph> column. Impala query planning uses either kind of statistics when available.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            <codeph>COMPUTE INCREMENTAL STATS</codeph> takes more time than <codeph>COMPUTE STATS</codeph> for the
-            same volume of data. Therefore it is most suitable for tables with large data volume where new
-            partitions are added frequently, making it impractical to run a full <codeph>COMPUTE STATS</codeph>
-            operation for each new partition. For unpartitioned tables, or partitioned tables that are loaded once
-            and not updated with new partitions, use the original <codeph>COMPUTE STATS</codeph> syntax.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            <codeph>COMPUTE INCREMENTAL STATS</codeph> uses some memory in the <cmdname>catalogd</cmdname> process,
-            proportional to the number of partitions and number of columns in the applicable table. The memory
-            overhead is approximately 400 bytes for each column in each partition. This memory is reserved in the
-            <cmdname>catalogd</cmdname> daemon, the <cmdname>statestored</cmdname> daemon, and in each instance of
-            the <cmdname>impalad</cmdname> daemon.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            In cases where new files are added to an existing partition, issue a <codeph>REFRESH</codeph> statement
-            for the table, followed by a <codeph>DROP INCREMENTAL STATS</codeph> and <codeph>COMPUTE INCREMENTAL
-            STATS</codeph> sequence for the changed partition.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            The <codeph>DROP INCREMENTAL STATS</codeph> statement operates only on a single partition at a time. To
-            remove statistics (whether incremental or not) from all partitions of a table, issue a <codeph>DROP
-            STATS</codeph> statement with no <codeph>INCREMENTAL</codeph> or <codeph>PARTITION</codeph> clauses.
-          </p>
-        </li>
-      </ul>
+    <concept id="concept_y2f_nfl_mdb">
 
-      <p>
-        The following considerations apply to incremental statistics when the structure of an existing table is
-        changed (known as <term>schema evolution</term>):
-      </p>
+      <title>COMPUTE STATS</title>
 
-      <ul>
-        <li>
-          <p>
-            If you use an <codeph>ALTER TABLE</codeph> statement to drop a column, the existing statistics remain
-            valid and <codeph>COMPUTE INCREMENTAL STATS</codeph> does not rescan any partitions.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            If you use an <codeph>ALTER TABLE</codeph> statement to add a column, Impala rescans all partitions and
-            fills in the appropriate column-level values the next time you run <codeph>COMPUTE INCREMENTAL
-            STATS</codeph>.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            If you use an <codeph>ALTER TABLE</codeph> statement to change the data type of a column, Impala
-            rescans all partitions and fills in the appropriate column-level values the next time you run
-            <codeph>COMPUTE INCREMENTAL STATS</codeph>.
-          </p>
-        </li>
-
-        <li>
-          <p>
-            If you use an <codeph>ALTER TABLE</codeph> statement to change the file format of a table, the existing
-            statistics remain valid and a subsequent <codeph>COMPUTE INCREMENTAL STATS</codeph> does not rescan any
-            partitions.
-          </p>
-        </li>
-      </ul>
+      <conbody>
 
-      <p>
-        See <xref href="impala_compute_stats.xml#compute_stats"/> and
-        <xref href="impala_drop_stats.xml#drop_stats"/> for syntax details.
-      </p>
-    </conbody>
-  </concept>
+        <p>
+          The <codeph>COMPUTE STATS</codeph> command collects and sets the table-level
+          and partition-level row counts as well as all column statistics for a given
+          table. The collection process is CPU-intensive and can take a long time to
+          complete for very large tables.
+        </p>
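+
+        <p>
+          For example, for a hypothetical <codeph>sales_data</codeph> table (the table
+          name is illustrative only), a typical sequence is to collect the statistics
+          and then verify them:
+        </p>
+
+<codeblock>
+-- Collect both table and column statistics in one operation.
+compute stats sales_data;
+
+-- Verify the table-level and column-level statistics.
+show table stats sales_data;
+show column stats sales_data;
+</codeblock>
+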
+        <p>
+          To speed up <codeph>COMPUTE STATS</codeph>, consider the following options,
+          which can be combined.
+          <ul>
+          <li>
+            <p>
+              Limit the number of columns for which statistics are collected to increase
+              the efficiency of <codeph>COMPUTE STATS</codeph>. Queries benefit from
+              statistics for the columns involved in filters, join conditions, and
+              <codeph>GROUP BY</codeph> or <codeph>PARTITION BY</codeph> clauses. Other
+              columns are good candidates to exclude from <codeph>COMPUTE STATS</codeph>.
+              This feature is available in Impala 2.12 and higher. See the example
+              following this list.
+            </p>
+          </li>
+          <li>
+            <p>
+              Set the <codeph>MT_DOP</codeph> query option to use more threads within each
+              participating <cmdname>impalad</cmdname> to compute the statistics faster,
+              though not more efficiently. Note that computing stats on a large table with
+              a high <codeph>MT_DOP</codeph> value can negatively affect other queries
+              running at the same time if <codeph>COMPUTE STATS</codeph> claims most of the
+              CPU cycles. This feature is available in Impala 2.8 and higher.
+            </p>
+          </li>
+          </ul>
+        </p>
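+
+        <p>
+          The following sketch, again using a hypothetical <codeph>sales_data</codeph>
+          table and column names, shows how the two options above can be combined:
+        </p>
+
+<codeblock>
+-- Collect statistics only for the columns used in filters, join conditions,
+-- and GROUP BY or PARTITION BY clauses (Impala 2.12 and higher).
+compute stats sales_data (customer_id, store_id, sale_date);
+
+-- Use more threads in each participating impalad for the statistics
+-- computation (Impala 2.8 and higher).
+set mt_dop=4;
+compute stats sales_data (customer_id, store_id, sale_date);
+</codeblock>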
 
-  <concept id="perf_stats_computing">
-    <title>Generating Table and Column Statistics (COMPUTE STATS Statement)</title>
-    <conbody>
+        <p>
+          <codeph>COMPUTE STATS</codeph> is intended to be run periodically,
+          e.g. weekly, or on-demand when the contents of a table have changed
+          significantly. Due to the high resource utilization and long response
+          time of <codeph>COMPUTE STATS</codeph>, it is most practical to run it
+          in a scheduled maintenance window where the Impala cluster is idle
+          enough to accommodate the expensive operation. The degree of change that
+          qualifies as <q>significant</q> depends on the query workload, but typically,
+          if 30% of the rows have changed, it is recommended to recompute
+          statistics.
+        </p>
 
-      <p>
-        To gather table statistics after loading data into a table or partition, you typically use the
-        <codeph>COMPUTE STATS</codeph> statement. This statement is available in Impala 1.2.2 and higher.
-        It gathers both table statistics and column statistics for all columns in a single operation.
-        For large partitioned tables, where you frequently need to update statistics and it is impractical
-        to scan the entire table each time, use the syntax <codeph>COMPUTE INCREMENTAL STATS</codeph>,
-        which is available in <keyword keyref="impala21_full"/> and higher.
-      </p>
+        <p>
+          If you reload a complete new set of data for a table, but the number of rows and
+          number of distinct values for each column is relatively unchanged from before, you
+          do not need to recompute stats for the table.
+        </p>
 
-      <p>
-        If you use Hive as part of your ETL workflow, you can also use Hive to generate table and
-        column statistics. You might need to do extra configuration within Hive itself, the metastore,
-        or even set up a separate database to hold Hive-generated statistics. You might need to run
-        multiple statements to generate all the necessary statistics. Therefore, prefer the
-        Impala <codeph>COMPUTE STATS</codeph> statement where that technique is practical.
-        For details about collecting statistics through Hive, see
-        <xref href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" scope="external" format="html">the Hive wiki</xref>.
-      </p>
+      </conbody>
 
-      <p conref="../shared/impala_common.xml#common/hive_column_stats_caveat"/>
+    </concept>
 
-<!-- Commenting out over-detailed Hive instructions as part of stats reorg.
-        <li>
-          Issue an <codeph>ANALYZE TABLE</codeph> statement in Hive, for the entire table or a specific partition.
-<codeblock>ANALYZE TABLE <varname>tablename</varname> [PARTITION(<varname>partcol1</varname>[=<varname>val1</varname>], <varname>partcol2</varname>[=<varname>val2</varname>], ...)] COMPUTE STATISTICS [NOSCAN];</codeblock>
-          For example, to gather statistics for a non-partitioned table:
-<codeblock>ANALYZE TABLE customer COMPUTE STATISTICS;</codeblock>
-          To gather statistics for a <codeph>store</codeph> table partitioned by state and city, and both of its
-          partitions:
-<codeblock>ANALYZE TABLE store PARTITION(s_state, s_county) COMPUTE STATISTICS;</codeblock>
-          To gather statistics for the <codeph>store</codeph> table and only the partitions for California:
-<codeblock>ANALYZE TABLE store PARTITION(s_state='CA', s_county) COMPUTE STATISTICS;</codeblock>
-        </li>
-
-        <li>
-          Load the data through the <codeph>INSERT OVERWRITE</codeph> statement in Hive, while the Hive setting
-          <b>hive.stats.autogather</b> is enabled.
-        </li>
-
-      </ul>
--->
+    <concept id="concept_bmk_pfl_mdb">
 
-      <p rev="2.0.1">
-<!-- Additional info as a result of IMPALA-1420 -->
-<!-- Keep checking if https://issues.apache.org/jira/browse/HIVE-8648 ever gets fixed and when that fix makes it into an Impala release. -->
-        For your very largest tables, you might find that <codeph>COMPUTE STATS</codeph> or even <codeph>COMPUTE INCREMENTAL STATS</codeph>
-        take so long to scan the data that it is impractical to use them regularly. In such a case, after adding a partition or inserting new data,
-        you can update just the number of rows property through an <codeph>ALTER TABLE</codeph> statement.
-        See <xref href="impala_perf_stats.xml#perf_table_stats_manual"/> for details.
-        Because the column statistics might be left in a stale state, do not use this technique as a replacement
-        for <codeph>COMPUTE STATS</codeph>. Only use this technique if all other means of collecting statistics are impractical, or as a
-        low-overhead operation that you run in between periodic <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph> operations.
-      </p>
+      <title>COMPUTE INCREMENTAL STATS</title>
+
+      <conbody>
+
+        <p>
+          In Impala 2.1.0 and higher, you can use the
+          <codeph>COMPUTE INCREMENTAL STATS</codeph> and
+          <codeph>DROP INCREMENTAL STATS</codeph> commands.
+          The <codeph>INCREMENTAL</codeph> clauses work with incremental statistics,
+          a specialized feature for partitioned tables.
+        </p>
+
+        <p>
+          When you compute incremental statistics for a partitioned table, by default Impala only
+          processes those partitions that do not yet have incremental statistics. By processing
+          only newly added partitions, you can keep statistics up to date without incurring the
+          overhead of reprocessing the entire table each time.
+        </p>
+
+        <p>
+          You can also compute or drop statistics for a specified subset of partitions by
+          including a <codeph>PARTITION</codeph> clause in the
+          <codeph>COMPUTE INCREMENTAL STATS</codeph> or <codeph>DROP INCREMENTAL STATS</codeph>
+          statement.
+        </p>
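+
+        <p>
+          For example, for a hypothetical table partitioned by <codeph>year</codeph> and
+          <codeph>month</codeph>, the following statements illustrate both forms:
+        </p>
+
+<codeblock>
+-- Compute incremental stats for all partitions that do not have them yet.
+compute incremental stats sales_data;
+
+-- Compute incremental stats only for one newly added partition.
+compute incremental stats sales_data partition (year=2018, month=4);
+
+-- Remove the incremental stats for a single partition.
+drop incremental stats sales_data partition (year=2018, month=4);
+</codeblock>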
+
+        <note type="important">
+          <p conref="../shared/impala_common.xml#common/incremental_stats_caveats"/>
+          <p conref="../shared/impala_common.xml#common/incremental_stats_after_full"/>
+        </note>
+
+        <p>
+          The metadata for incremental statistics is handled differently from the original style
+          of statistics:
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              Issuing <codeph>COMPUTE INCREMENTAL STATS</codeph> without a partition
+              clause causes Impala to compute incremental stats for all partitions that
+              do not already have incremental stats. This might be the entire table when
+              running the command for the first time, but subsequent runs should only
+              update new partitions. You can force updating a partition that already has
+              incremental stats by issuing a <codeph>DROP INCREMENTAL STATS</codeph>
+              before running <codeph>COMPUTE INCREMENTAL STATS</codeph>.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              The <codeph>SHOW TABLE STATS</codeph> and <codeph>SHOW PARTITIONS</codeph>
+              statements now include an additional column showing whether incremental statistics
+              are available for each column. A partition could already be covered by the original
+              type of statistics based on a prior <codeph>COMPUTE STATS</codeph> statement, as
+              indicated by a value other than <codeph>-1</codeph> under the <codeph>#Rows</codeph>
+              column. Impala query planning uses either kind of statistics when available.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              <codeph>COMPUTE INCREMENTAL STATS</codeph> takes more time than <codeph>COMPUTE
+              STATS</codeph> for the same volume of data. Therefore it is most suitable for tables
+              with large data volume where new partitions are added frequently, making it
+              impractical to run a full <codeph>COMPUTE STATS</codeph> operation for each new
+              partition. For unpartitioned tables, or partitioned tables that are loaded once and
+              not updated with new partitions, use the original <codeph>COMPUTE STATS</codeph>
+              syntax.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              <codeph>COMPUTE INCREMENTAL STATS</codeph> uses some memory in the
+              <cmdname>catalogd</cmdname> process, proportional to the number of partitions and
+              number of columns in the applicable table. The memory overhead is approximately 400
+              bytes for each column in each partition. This memory is reserved in the
+              <cmdname>catalogd</cmdname> daemon, the <cmdname>statestored</cmdname> daemon, and
+              in each instance of the <cmdname>impalad</cmdname> daemon.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              In cases where new files are added to an existing partition, issue a
+              <codeph>REFRESH</codeph> statement for the table, followed by a <codeph>DROP
+              INCREMENTAL STATS</codeph> and <codeph>COMPUTE INCREMENTAL STATS</codeph> sequence
+              for the changed partition, as shown in the example after this list.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              The <codeph>DROP INCREMENTAL STATS</codeph> statement operates only on a single
+              partition at a time. To remove statistics (whether incremental or not) from all
+              partitions of a table, issue a <codeph>DROP STATS</codeph> statement with no
+              <codeph>INCREMENTAL</codeph> or <codeph>PARTITION</codeph> clauses.
+            </p>
+          </li>
+        </ul>
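+
+        <p>
+          For example, the following sketch (using a hypothetical table and partition)
+          shows the sequence to run after new files are added to an existing partition:
+        </p>
+
+<codeblock>
+-- Make Impala aware of the new data files in the table.
+refresh sales_data;
+
+-- Discard the stale incremental stats for the changed partition,
+-- then recompute them.
+drop incremental stats sales_data partition (year=2018, month=4);
+compute incremental stats sales_data partition (year=2018, month=4);
+</codeblock>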
+
+        <p>
+          The following considerations apply to incremental statistics when the structure of an
+          existing table is changed (known as <term>schema evolution</term>):
+        </p>
+
+        <ul>
+          <li>
+            <p>
+              If you use an <codeph>ALTER TABLE</codeph> statement to drop a column, the existing
+              statistics remain valid and <codeph>COMPUTE INCREMENTAL STATS</codeph> does not
+              rescan any partitions.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              If you use an <codeph>ALTER TABLE</codeph> statement to add a column, Impala rescans
+              all partitions and fills in the appropriate column-level values the next time you
+              run <codeph>COMPUTE INCREMENTAL STATS</codeph>, as in the example after
+              this list.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              If you use an <codeph>ALTER TABLE</codeph> statement to change the data type of a
+              column, Impala rescans all partitions and fills in the appropriate column-level
+              values the next time you run <codeph>COMPUTE INCREMENTAL STATS</codeph>.
+            </p>
+          </li>
+
+          <li>
+            <p>
+              If you use an <codeph>ALTER TABLE</codeph> statement to change the file format of a
+              table, the existing statistics remain valid and a subsequent <codeph>COMPUTE
+              INCREMENTAL STATS</codeph> does not rescan any partitions.
+            </p>
+          </li>
+        </ul>
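+
+        <p>
+          For example, after adding a column to a hypothetical partitioned table, the
+          next <codeph>COMPUTE INCREMENTAL STATS</codeph> rescans all partitions to fill
+          in statistics for the new column:
+        </p>
+
+<codeblock>
+-- Schema evolution: add a new column.
+alter table sales_data add columns (discount decimal(5,2));
+
+-- The next incremental stats operation rescans all partitions
+-- to gather statistics for the new column.
+compute incremental stats sales_data;
+</codeblock>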
+
+        <p>
+          See <xref href="impala_compute_stats.xml#compute_stats"/> and
+          <xref
+            href="impala_drop_stats.xml#drop_stats"/> for syntax details.
+        </p>
+
+      </conbody>
+
+    </concept>
 
-    </conbody>
   </concept>
 
   <concept rev="2.1.0" id="perf_stats_checking">
@@ -565,17 +579,18 @@ show column stats year_month_day;
     <conbody>
 
       <p>
-        You can check whether a specific table has statistics using the <codeph>SHOW TABLE STATS</codeph> statement
-        (for any table) or the <codeph>SHOW PARTITIONS</codeph> statement (for a partitioned table). Both
-        statements display the same information. If a table or a partition does not have any statistics, the
-        <codeph>#Rows</codeph> field contains <codeph>-1</codeph>. Once you compute statistics for the table or
-        partition, the <codeph>#Rows</codeph> field changes to an accurate value.
+        You can check whether a specific table has statistics using the <codeph>SHOW TABLE
+        STATS</codeph> statement (for any table) or the <codeph>SHOW PARTITIONS</codeph>
+        statement (for a partitioned table). Both statements display the same information. If a
+        table or a partition does not have any statistics, the <codeph>#Rows</codeph> field
+        contains <codeph>-1</codeph>. Once you compute statistics for the table or partition,
+        the <codeph>#Rows</codeph> field changes to an accurate value.
       </p>
 
       <p>
-        The following example shows a table that initially does not have any statistics. The <codeph>SHOW TABLE
-        STATS</codeph> statement displays different values for <codeph>#Rows</codeph> before and after the
-        <codeph>COMPUTE STATS</codeph> operation.
+        The following example shows a table that initially does not have any statistics. The
+        <codeph>SHOW TABLE STATS</codeph> statement displays different values for
+        <codeph>#Rows</codeph> before and after the <codeph>COMPUTE STATS</codeph> operation.
       </p>
 
 <codeblock>[localhost:21000] &gt; create table no_stats (x int);
@@ -601,9 +616,10 @@ show column stats year_month_day;
 
       <p>
         The following example shows a similar progression with a partitioned table. Initially,
-        <codeph>#Rows</codeph> is <codeph>-1</codeph>. After a <codeph>COMPUTE STATS</codeph> operation,
-        <codeph>#Rows</codeph> changes to an accurate value. Any newly added partition starts with no statistics,
-        meaning that you must collect statistics after adding a new partition.
+        <codeph>#Rows</codeph> is <codeph>-1</codeph>. After a <codeph>COMPUTE STATS</codeph>
+        operation, <codeph>#Rows</codeph> changes to an accurate value. Any newly added
+        partition starts with no statistics, meaning that you must collect statistics after
+        adding a new partition.
       </p>
 
 <codeblock>[localhost:21000] &gt; create table no_stats_partitioned (x int) partitioned by (year smallint);
@@ -638,20 +654,22 @@ show column stats year_month_day;
 </codeblock>
 
       <note>
-        Because the default <codeph>COMPUTE STATS</codeph> statement creates and updates statistics for all
-        partitions in a table, if you expect to frequently add new partitions, use the <codeph>COMPUTE INCREMENTAL
-        STATS</codeph> syntax instead, which lets you compute stats for a single specified partition, or only for
-        those partitions that do not already have incremental stats.
+        Because the default <codeph>COMPUTE STATS</codeph> statement creates and updates
+        statistics for all partitions in a table, if you expect to frequently add new
+        partitions, use the <codeph>COMPUTE INCREMENTAL STATS</codeph> syntax instead, which
+        lets you compute stats for a single specified partition, or only for those partitions
+        that do not already have incremental stats.
       </note>
 
       <p>
-        If checking each individual table is impractical, due to a large number of tables or views that hide the
-        underlying base tables, you can also check for missing statistics for a particular query. Use the
-        <codeph>EXPLAIN</codeph> statement to preview query efficiency before actually running the query. Use the
-        query profile output available through the <codeph>PROFILE</codeph> command in
-        <cmdname>impala-shell</cmdname> or the web UI to verify query execution and timing after running the query.
-        Both the <codeph>EXPLAIN</codeph> plan and the <codeph>PROFILE</codeph> output display a warning if any
-        tables or partitions involved in the query do not have statistics.
+        If checking each individual table is impractical, due to a large number of tables or
+        views that hide the underlying base tables, you can also check for missing statistics
+        for a particular query. Use the <codeph>EXPLAIN</codeph> statement to preview query
+        efficiency before actually running the query. Use the query profile output available
+        through the <codeph>PROFILE</codeph> command in <cmdname>impala-shell</cmdname> or the
+        web UI to verify query execution and timing after running the query. Both the
+        <codeph>EXPLAIN</codeph> plan and the <codeph>PROFILE</codeph> output display a warning
+        if any tables or partitions involved in the query do not have statistics.
       </p>
 
 <codeblock>[localhost:21000] &gt; create table no_stats (x int);
@@ -677,10 +695,11 @@ show column stats year_month_day;
 </codeblock>
 
       <p>
-        Because Impala uses the <term>partition pruning</term> technique when possible to only evaluate certain
-        partitions, if you have a partitioned table with statistics for some partitions and not others, whether or
-        not the <codeph>EXPLAIN</codeph> statement shows the warning depends on the actual partitions used by the
-        query. For example, you might see warnings or not for different queries against the same table:
+        Because Impala uses the <term>partition pruning</term> technique when possible to only
+        evaluate certain partitions, if you have a partitioned table with statistics for some
+        partitions and not others, whether or not the <codeph>EXPLAIN</codeph> statement shows
+        the warning depends on the actual partitions used by the query. For example, you might
+        see warnings or not for different queries against the same table:
       </p>
 
 <codeblock>-- No warning because all the partitions for the year 2012 have stats.
@@ -692,89 +711,36 @@ EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
 </codeblock>
 
       <p>
-        To confirm if any partitions at all in the table are missing statistics, you might explain a query that
-        scans the entire table, such as <codeph>SELECT COUNT(*) FROM <varname>table_name</varname></codeph>.
-      </p>
-    </conbody>
-  </concept>
-
-  <concept rev="2.1.0" id="perf_stats_collecting">
-
-    <title>Keeping Statistics Up to Date</title>
-
-    <conbody>
-
-      <p>
-        When the contents of a table or partition change significantly, recompute the stats for the relevant table
-        or partition. The degree of change that qualifies as <q>significant</q> varies, depending on the absolute
-        and relative sizes of the tables. Typically, if you add more than 30% more data to a table, it is
-        worthwhile to recompute stats, because the differences in number of rows and number of distinct values
-        might cause Impala to choose a different join order when that table is used in join queries. This guideline
-        is most important for the largest tables. For example, adding 30% new data to a table containing 1 TB has a
-        greater effect on join order than adding 30% to a table containing only a few megabytes, and the larger
-        table has a greater effect on query performance if Impala chooses a suboptimal join order as a result of
-        outdated statistics.
-      </p>
-
-      <p>
-        If you reload a complete new set of data for a table, but the number of rows and number of distinct values
-        for each column is relatively unchanged from before, you do not need to recompute stats for the table.
+        To confirm if any partitions at all in the table are missing statistics, you might
+        explain a query that scans the entire table, such as <codeph>SELECT COUNT(*) FROM
+        <varname>table_name</varname></codeph>.
       </p>
 
-      <p>
-        If the statistics for a table are out of date, and the table's large size makes it impractical to recompute
-        new stats immediately, you can use the <codeph>DROP STATS</codeph> statement to remove the obsolete
-        statistics, making it easier to identify tables that need a new <codeph>COMPUTE STATS</codeph> operation.
-      </p>
-
-      <p>
-        For a large partitioned table, consider using the incremental stats feature available in Impala 2.1.0 and
-        higher, as explained in <xref href="impala_perf_stats.xml#perf_stats_incremental"/>. If you add a new
-        partition to a table, it is worthwhile to recompute incremental stats, because the operation only scans the
-        data for that one new partition.
-      </p>
     </conbody>
-  </concept>
-
-<!-- Might deserve its own conceptual topic at some point. -->
-
-  <concept audience="hidden" rev="1.2.2" id="perf_stats_joins">
 
-    <title>How Statistics Are Used in Join Queries</title>
-
-    <conbody>
-
-      <p></p>
-    </conbody>
   </concept>
 
-<!-- Might deserve its own conceptual topic at some point. -->
+  <concept id="concept_s3c_4gl_mdb">
 
-  <concept audience="hidden" rev="1.2.2" id="perf_stats_inserts">
+    <title>Manually Setting Table and Column Statistics with ALTER TABLE</title>
 
-    <title>How Statistics Are Used in INSERT Operations</title>
+    <concept id="concept_wpt_pgl_mdb">
 
-    <conbody>
+      <title>Setting Table Statistics</title>
 
-      <p conref="../shared/impala_common.xml#common/insert_hints"/>
-    </conbody>
-  </concept>
-
-  <concept rev="1.2.2" id="perf_table_stats_manual">
-
-    <title>Setting the NUMROWS Value Manually through ALTER TABLE</title>
-
-    <conbody>
+      <conbody>
 
-      <p>
-        The most crucial piece of data in all the statistics is the number of rows in the table (for an
-        unpartitioned or partitioned table) and for each partition (for a partitioned table). The <codeph>COMPUTE STATS</codeph>
-        statement always gathers statistics about all columns, as well as overall table statistics. If it is not
-        practical to do a full <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph>
-        operation after adding a partition or inserting data, or if you can see that Impala would produce a more
-        efficient plan if the number of rows was different, you can manually set the number of rows through an
-        <codeph>ALTER TABLE</codeph> statement:
-      </p>
+        <p>
+          The most crucial piece of data in all the statistics is the number of rows in the
+          table (for an unpartitioned or partitioned table) and for each partition (for a
+          partitioned table). The <codeph>COMPUTE STATS</codeph> statement always gathers
+          statistics about all columns, as well as overall table statistics. If it is not
+          practical to do a full <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL
+          STATS</codeph> operation after adding a partition or inserting data, or if you can see
+          that Impala would produce a more efficient plan if the number of rows was different,
+          you can manually set the number of rows through an <codeph>ALTER TABLE</codeph>
+          statement:
+        </p>
 
 <codeblock>
 -- Set total number of rows. Applies to both unpartitioned and partitioned tables.
@@ -785,49 +751,66 @@ alter table <varname>table_name</varname> set tblproperties('numRows'='<varname>
 alter table <varname>table_name</varname> partition (<varname>keycol1</varname>=<varname>val1</varname>,<varname>keycol2</varname>=<varname>val2</varname>...) set tblproperties('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
 </codeblock>
 
-      <p>
-        This statement avoids re-scanning any data files. (The requirement to include the <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph> property is relatively new, as a
-        result of the issue <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref>
-        for the Hive metastore.)
-      </p>
+        <p>
+          This statement avoids re-scanning any data files. (The requirement to include the
+          <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph> property is relatively new, as a
+          result of the issue
+          <xref
+            href="https://issues.apache.org/jira/browse/HIVE-8648"
+            scope="external" format="html">HIVE-8648</xref>
+          for the Hive metastore.)
+        </p>
 
 <codeblock conref="../shared/impala_common.xml#common/set_numrows_example"/>
 
-      <p>
-        For a partitioned table, update both the per-partition number of rows and the number of rows for the whole
-        table:
-      </p>
+        <p>
+          For a partitioned table, update both the per-partition number of rows and the number
+          of rows for the whole table:
+        </p>
 
 <codeblock conref="../shared/impala_common.xml#common/set_numrows_partitioned_example"/>
 
-      <p>
-        In practice, the <codeph>COMPUTE STATS</codeph> statement, or <codeph>COMPUTE INCREMENTAL STATS</codeph>
-        for a partitioned table, should be fast and convenient enough that this technique is only useful for the very
-        largest partitioned tables.
-        <!--
+        <p>
+          In practice, the <codeph>COMPUTE STATS</codeph> statement, or <codeph>COMPUTE
+          INCREMENTAL STATS</codeph> for a partitioned table, should be fast and convenient
+          enough that this technique is only useful for the very largest partitioned tables.
+<!--
         It is most useful as a workaround for in case of performance issues where you might adjust the <codeph>numRows</codeph> value higher
         or lower to produce the ideal join order.
         -->
-        <!-- Following wording is duplicated from earlier. Consider conref'ing. -->
-        Because the column statistics might be left in a stale state, do not use this technique as a replacement
-        for <codeph>COMPUTE STATS</codeph>. Only use this technique if all other means of collecting statistics are impractical, or as a
-        low-overhead operation that you run in between periodic <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph> operations.
-      </p>
-    </conbody>
-  </concept>
+<!-- Following wording is duplicated from earlier. Consider conref'ing. -->
+          Because the column statistics might be left in a stale state, do not use this
+          technique as a replacement for <codeph>COMPUTE STATS</codeph>. Only use this technique
+          if all other means of collecting statistics are impractical, or as a low-overhead
+          operation that you run in between periodic <codeph>COMPUTE STATS</codeph> or
+          <codeph>COMPUTE INCREMENTAL STATS</codeph> operations.
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="concept_asb_vgl_mdb">
+
+      <title>Setting Column Statistics</title>
+
+      <conbody>
+
+        <p>
+          In <keyword keyref="impala26_full"/> and higher, you can also use the <codeph>SET
+          COLUMN STATS</codeph> clause of <codeph>ALTER TABLE</codeph> to manually set or change
+          column statistics. Only use this technique in cases where it is impractical to run
+          <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph>
+          frequently enough to keep up with data changes for a huge table.
+        </p>
+
+        <p conref="../shared/impala_common.xml#common/set_column_stats_example"
+        />
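For readers without access to the conref'd example, a minimal sketch of the clause with a hypothetical table and invented values (the statistic key names numDVs and numNulls are assumptions not shown in this excerpt):

-- Manually record the number of distinct values and nulls for one column.
alter table sales_fact set column stats customer_id ('numDVs'='1500000', 'numNulls'='0');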
+
+      </conbody>
+
+    </concept>
 
-  <concept id="perf_column_stats_manual" rev="2.6.0 IMPALA-3369">
-    <title>Setting Column Stats Manually through ALTER TABLE</title>
-    <conbody>
-      <p>
-        In <keyword keyref="impala26_full"/> and higher, you can also use the <codeph>SET COLUMN STATS</codeph>
-        clause of <codeph>ALTER TABLE</codeph> to manually set or change column statistics.
-        Only use this technique in cases where it is impractical to run
-        <codeph>COMPUTE STATS</codeph> or <codeph>COMPUTE INCREMENTAL STATS</codeph>
-        frequently enough to keep up with data changes for a huge table.
-      </p>
-      <p conref="../shared/impala_common.xml#common/set_column_stats_example"/>
-    </conbody>
   </concept>
 
   <concept rev="1.2.2" id="perf_stats_examples">
@@ -837,20 +820,23 @@ alter table <varname>table_name</varname> partition (<varname>keycol1</varname>=
     <conbody>
 
       <p>
-        The following examples walk through a sequence of <codeph>SHOW TABLE STATS</codeph>, <codeph>SHOW COLUMN
-        STATS</codeph>, <codeph>ALTER TABLE</codeph>, and <codeph>SELECT</codeph> and <codeph>INSERT</codeph>
-        statements to illustrate various aspects of how Impala uses statistics to help optimize queries.
+        The following examples walk through a sequence of <codeph>SHOW TABLE STATS</codeph>,
+        <codeph>SHOW COLUMN STATS</codeph>, <codeph>ALTER TABLE</codeph>, and
+        <codeph>SELECT</codeph> and <codeph>INSERT</codeph> statements to illustrate various
+        aspects of how Impala uses statistics to help optimize queries.
       </p>
 
       <p>
-        This example shows table and column statistics for the <codeph>STORE</codeph> column used in the
-        <xref href="http://www.tpc.org/tpcds/" scope="external" format="html">TPC-DS benchmarks for decision
-        support</xref> systems. It is a tiny table holding data for 12 stores. Initially, before any statistics are
-        gathered by a <codeph>COMPUTE STATS</codeph> statement, most of the numeric fields show placeholder values
-        of -1, indicating that the figures are unknown. The figures that are filled in are values that are easily
-        countable or deducible at the physical level, such as the number of files, total data size of the files,
-        and the maximum and average sizes for data types that have a constant size such as <codeph>INT</codeph>,
-        <codeph>FLOAT</codeph>, and <codeph>TIMESTAMP</codeph>.
+        This example shows table and column statistics for the <codeph>STORE</codeph> table
+        used in the <xref href="http://www.tpc.org/tpcds/" scope="external" format="html">TPC-DS
+        benchmarks for decision support</xref> systems. It is a tiny table holding data for 12
+        stores. Initially, before any statistics are gathered by a <codeph>COMPUTE
+        STATS</codeph> statement, most of the numeric fields show placeholder values of -1,
+        indicating that the figures are unknown. The figures that are filled in are values that
+        are easily countable or deducible at the physical level, such as the number of files,
+        total data size of the files, and the maximum and average sizes for data types that have
+        a constant size such as <codeph>INT</codeph>, <codeph>FLOAT</codeph>, and
+        <codeph>TIMESTAMP</codeph>.
       </p>
 
 <codeblock>[localhost:21000] &gt; show table stats store;
@@ -897,11 +883,13 @@ Returned 1 row(s) in 0.03s
 Returned 29 row(s) in 0.04s</codeblock>
 
       <p>
-        With the Hive <codeph>ANALYZE TABLE</codeph> statement for column statistics, you had to specify each
-        column for which to gather statistics. The Impala <codeph>COMPUTE STATS</codeph> statement automatically
-        gathers statistics for all columns, because it reads through the entire table relatively quickly and can
-        efficiently compute the values for all the columns. This example shows how after running the
-        <codeph>COMPUTE STATS</codeph> statement, statistics are filled in for both the table and all its columns:
+        With the Hive <codeph>ANALYZE TABLE</codeph> statement for column statistics, you had to
+        specify each column for which to gather statistics. The Impala <codeph>COMPUTE
+        STATS</codeph> statement automatically gathers statistics for all columns, because it
+        reads through the entire table relatively quickly and can efficiently compute the values
+        for all the columns. This example shows how after running the <codeph>COMPUTE
+        STATS</codeph> statement, statistics are filled in for both the table and all its
+        columns:
       </p>
 
 <codeblock>[localhost:21000] &gt; compute stats store;
@@ -955,11 +943,12 @@ Returned 1 row(s) in 0.02s
 Returned 29 row(s) in 0.04s</codeblock>
 
       <p>
-        The following example shows how statistics are represented for a partitioned table. In this case, we have
-        set up a table to hold the world's most trivial census data, a single <codeph>STRING</codeph> field,
-        partitioned by a <codeph>YEAR</codeph> column. The table statistics include a separate entry for each
-        partition, plus final totals for the numeric fields. The column statistics include some easily deducible
-        facts for the partitioning column, such as the number of distinct values (the number of partition
+        The following example shows how statistics are represented for a partitioned table. In
+        this case, we have set up a table to hold the world's most trivial census data, a single
+        <codeph>STRING</codeph> field, partitioned by a <codeph>YEAR</codeph> column. The table
+        statistics include a separate entry for each partition, plus final totals for the
+        numeric fields. The column statistics include some easily deducible facts for the
+        partitioning column, such as the number of distinct values (the number of partition
         subdirectories).
 <!-- and the number of <codeph>NULL</codeph> values (none in this case). -->
       </p>
@@ -996,8 +985,8 @@ Returned 8 row(s) in 0.02s
 Returned 2 row(s) in 0.02s</codeblock>
 
       <p>
-        The following example shows how the statistics are filled in by a <codeph>COMPUTE STATS</codeph> statement
-        in Impala.
+        The following example shows how the statistics are filled in by a <codeph>COMPUTE
+        STATS</codeph> statement in Impala.
       </p>
 
 <codeblock>[localhost:21000] &gt; compute stats census;
@@ -1031,13 +1020,17 @@ Returned 8 row(s) in 0.02s
 Returned 2 row(s) in 0.02s</codeblock>
 
       <p rev="1.4.0">
-        For examples showing how some queries work differently when statistics are available, see
-        <xref href="impala_perf_joins.xml#perf_joins_examples"/>. You can see how Impala executes a query
-        differently in each case by observing the <codeph>EXPLAIN</codeph> output before and after collecting
-        statistics. Measure the before and after query times, and examine the throughput numbers in before and
-        after <codeph>SUMMARY</codeph> or <codeph>PROFILE</codeph> output, to verify how much the improved plan
-        speeds up performance.
+        For examples showing how some queries work differently when statistics are available,
+        see <xref href="impala_perf_joins.xml#perf_joins_examples"/>. You can see how Impala
+        executes a query differently in each case by observing the <codeph>EXPLAIN</codeph>
+        output before and after collecting statistics. Measure the before and after query times,
+        and examine the throughput numbers in before and after <codeph>SUMMARY</codeph> or
+        <codeph>PROFILE</codeph> output, to verify how much the improved plan speeds up
+        performance.
       </p>
+
     </conbody>
+
   </concept>
+
 </concept>


[18/20] impala git commit: IMPALA-6898: Avoid duplicate Kudu load during full dataload

Posted by sa...@apache.org.
IMPALA-6898: Avoid duplicate Kudu load during full dataload

testdata/bin/create-load-data.sh runs bin/load-data.py for
functional/exhaustive, tpch/core, and tpcds/core in a
first phase, then it loads functional and tpch for Kudu
in a second phase. For a full dataload, this second phase
is not necessary. functional/exhaustive and tpch/core
already include Kudu.

This avoids the second phase when doing a full dataload.
The second phase is still necessary when loading from
a snapshot, and this does not change that behavior.

This saves a couple of minutes off a full dataload.

Change-Id: Ic023d230f99126ed37795106c38faae5f0cb608e
Reviewed-on: http://gerrit.cloudera.org:8080/10128
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/b6a553ef
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/b6a553ef
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/b6a553ef

Branch: refs/heads/2.x
Commit: b6a553efa3afe177a68823331585705ee1ee1d17
Parents: d42f8d7
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Thu Apr 19 16:14:03 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Sat Apr 21 01:10:17 2018 +0000

----------------------------------------------------------------------
 testdata/bin/create-load-data.sh | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/b6a553ef/testdata/bin/create-load-data.sh
----------------------------------------------------------------------
diff --git a/testdata/bin/create-load-data.sh b/testdata/bin/create-load-data.sh
index e50515b..51ba449 100755
--- a/testdata/bin/create-load-data.sh
+++ b/testdata/bin/create-load-data.sh
@@ -540,8 +540,10 @@ elif [ "${TARGET_FILESYSTEM}" = "hdfs" ];  then
       load-data "functional-query" "core" "hbase/none"
 fi
 
-if $KUDU_IS_SUPPORTED; then
+if [[ $SKIP_METADATA_LOAD -eq 1 && $KUDU_IS_SUPPORTED ]]; then
   # Tests depend on the kudu data being clean, so load the data from scratch.
+  # This is only necessary if this is not a full dataload, because a full dataload
+  # already loads Kudu functional and TPC-H tables from scratch.
   run-step-backgroundable "Loading Kudu functional" load-kudu.log \
         load-data "functional-query" "core" "kudu/none/none" force
   run-step-backgroundable "Loading Kudu TPCH" load-kudu-tpch.log \


[07/20] impala git commit: IMPALA-6878: SentryServicePinger should not print stacktrace at every retry

Posted by sa...@apache.org.
IMPALA-6878: SentryServicePinger should not print stacktrace at every retry

With this patch, SentryServicePinger only prints the stacktrace at the
very end when Sentry Service is unable to start.

Testing:
- Started Sentry in a normal way to make sure no stack trace was printed
- Injected failure in Sentry to see the stack trace at the end

Change-Id: I26f9a141c89692443cb3cdcb6bf62581a93b5ba0
Reviewed-on: http://gerrit.cloudera.org:8080/10108
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/423ab360
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/423ab360
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/423ab360

Branch: refs/heads/2.x
Commit: 423ab36010b40eedb4f5657023f5b168db658cb2
Parents: dfc17b8
Author: Fredy wijaya <fw...@cloudera.com>
Authored: Wed Apr 18 14:02:59 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 .../impala/testutil/SentryServicePinger.java    | 22 ++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/423ab360/fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
----------------------------------------------------------------------
diff --git a/fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java b/fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
index 96a849b..705f58a 100644
--- a/fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
+++ b/fe/src/test/java/org/apache/impala/testutil/SentryServicePinger.java
@@ -21,12 +21,12 @@ import org.apache.commons.cli.BasicParser;
 import org.apache.commons.cli.CommandLine;
 import org.apache.commons.cli.OptionBuilder;
 import org.apache.commons.cli.Options;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import org.apache.impala.authorization.User;
 import org.apache.impala.authorization.SentryConfig;
+import org.apache.impala.authorization.User;
 import org.apache.impala.util.SentryPolicyService;
+import org.apache.log4j.Level;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * Simple class that issues a read-only RPC to the Sentry Service to check if it
@@ -41,6 +41,11 @@ public class SentryServicePinger {
   // Suppress warnings from OptionBuilder.
   @SuppressWarnings("static-access")
   public static void main(String[] args) throws Exception {
+    // Programmatically disable Sentry Thrift logging since Sentry error logging can be
+    // pretty noisy and verbose.
+    org.apache.log4j.Logger logger4j = org.apache.log4j.Logger.getLogger("sentry");
+    logger4j.setLevel(Level.OFF);
+
     // Parse command line options to get config file path.
     Options options = new Options();
     options.addOption(OptionBuilder.withLongOpt("config_file")
@@ -69,6 +74,7 @@ public class SentryServicePinger {
     int sleepSecs = Integer.parseInt(cmdArgs.getOptionValue("sleep_secs"));
 
     sentryConfig.loadConfig();
+    Exception exception = null;
     while (numPings > 0) {
       SentryPolicyService policyService = new SentryPolicyService(sentryConfig);
       try {
@@ -76,12 +82,16 @@ public class SentryServicePinger {
         LOG.info("Sentry Service ping succeeded.");
         System.exit(0);
       } catch (Exception e) {
-        LOG.error(String.format("Error issuing RPC to Sentry Service (attempt %d/%d): ",
-            maxPings - numPings + 1, maxPings), e);
+        exception = e;
+        LOG.error(String.format("Error issuing RPC to Sentry Service (attempt %d/%d)",
+            maxPings - numPings + 1, maxPings));
         Thread.sleep(sleepSecs * 1000);
       }
       --numPings;
     }
+    if (exception != null) {
+      LOG.error("Error starting Sentry Service: ", exception);
+    }
     System.exit(1);
   }
 }


[11/20] impala git commit: IMPALA-6884: Skip test_misaligned_orc_stripes on invalid platforms

Posted by sa...@apache.org.
IMPALA-6884: Skip test_misaligned_orc_stripes on invalid platforms

test_misaligned_orc_stripes relies on generating files
with a specific number of HDFS blocks. This is not
possible on S3, ADLS, or Isilon, so the test needs to
be skipped on those platforms. Additionally, it relies
on running multiple impalads, so it needs to be skipped
on local filesystem tests.

Change-Id: I8395b4eb06847a35c630b3ef43e289960d78382a
Reviewed-on: http://gerrit.cloudera.org:8080/10117
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d253c06f
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d253c06f
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d253c06f

Branch: refs/heads/2.x
Commit: d253c06f2b814ac188f04fefdfc95b3cde6203ee
Parents: c7c9433
Author: Joe McDonnell <jo...@cloudera.com>
Authored: Thu Apr 19 10:30:46 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:57 2018 +0000

----------------------------------------------------------------------
 tests/query_test/test_scanners.py | 4 ++++
 1 file changed, 4 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/d253c06f/tests/query_test/test_scanners.py
----------------------------------------------------------------------
diff --git a/tests/query_test/test_scanners.py b/tests/query_test/test_scanners.py
index f17561d..191db24 100644
--- a/tests/query_test/test_scanners.py
+++ b/tests/query_test/test_scanners.py
@@ -912,6 +912,10 @@ class TestOrc(ImpalaTestSuite):
     cls.ImpalaTestMatrix.add_constraint(
       lambda v: v.get_value('table_format').file_format == 'orc')
 
+  @SkipIfS3.hdfs_block_size
+  @SkipIfADLS.hdfs_block_size
+  @SkipIfIsilon.hdfs_block_size
+  @SkipIfLocal.multiple_impalad
   def test_misaligned_orc_stripes(self, vector, unique_database):
     self._build_lineitem_table_helper(unique_database, 'lineitem_threeblocks',
         'lineitem_threeblocks.orc')


[17/20] impala git commit: IMPALA-6896: NullPointerException in DESCRIBE FORMATTED on views

Posted by sa...@apache.org.
IMPALA-6896: NullPointerException in DESCRIBE FORMATTED on views

This patch fixes an issue where ALTER VIEW created the storage
descriptor as a new instance instead of reusing the existing
one. As a result, some HMS attributes could be left null,
causing a NullPointerException.

The patch also differentiates between updating view attributes for
CREATE VIEW and ALTER VIEW.

Testing:
- Ran all front-end tests
- Added a new end-to-end test
- Ran all the end-to-end metadata tests

Change-Id: Ica2fb0c4f4b09cdf36eeb4911a1cbe7e98381d9e
Reviewed-on: http://gerrit.cloudera.org:8080/10132
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/62885d8d
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/62885d8d
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/62885d8d

Branch: refs/heads/2.x
Commit: 62885d8dedf9994707ba3d30c6b7d2d640551702
Parents: e114778
Author: Fredy Wijaya <fw...@cloudera.com>
Authored: Thu Apr 19 23:51:56 2018 -0500
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:58 2018 +0000

----------------------------------------------------------------------
 .../impala/service/CatalogOpExecutor.java       | 29 +++++++++++++-----
 tests/metadata/test_ddl.py                      | 32 ++++++++++++++++++++
 2 files changed, 54 insertions(+), 7 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/62885d8d/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
----------------------------------------------------------------------
diff --git a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
index 87513aa..fdee124 100644
--- a/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
+++ b/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
@@ -702,7 +702,7 @@ public class CatalogOpExecutor {
       }
 
       // Set the altered view attributes and update the metastore.
-      setViewAttributes(params, msTbl);
+      setAlterViewAttributes(params, msTbl);
       if (LOG.isTraceEnabled()) {
         LOG.trace(String.format("Altering view %s", tableName));
       }
@@ -1802,7 +1802,7 @@ public class CatalogOpExecutor {
     // Create new view.
     org.apache.hadoop.hive.metastore.api.Table view =
         new org.apache.hadoop.hive.metastore.api.Table();
-    setViewAttributes(params, view);
+    setCreateViewAttributes(params, view);
     LOG.trace(String.format("Creating view %s", tableName));
     if (!createTable(view, params.if_not_exists, null, response)) {
       addSummary(response, "View already exists.");
@@ -1901,9 +1901,10 @@ public class CatalogOpExecutor {
   }
 
   /**
-   * Sets the given params in the metastore table as appropriate for a view.
+   * Sets the given params in the metastore table as appropriate for a
+   * create view operation.
    */
-  private void setViewAttributes(TCreateOrAlterViewParams params,
+  private void setCreateViewAttributes(TCreateOrAlterViewParams params,
       org.apache.hadoop.hive.metastore.api.Table view) {
     view.setTableType(TableType.VIRTUAL_VIEW.toString());
     view.setViewOriginalText(params.getOriginal_view_def());
@@ -1912,12 +1913,11 @@ public class CatalogOpExecutor {
     view.setTableName(params.getView_name().getTable_name());
     view.setOwner(params.getOwner());
     if (view.getParameters() == null) view.setParameters(new HashMap<String, String>());
-    if (params.isSetComment() &&  params.getComment() != null) {
+    if (params.isSetComment() && params.getComment() != null) {
       view.getParameters().put("comment", params.getComment());
     }
-
-    // Add all the columns to a new storage descriptor.
     StorageDescriptor sd = new StorageDescriptor();
+    // Add all the columns to a new storage descriptor.
     sd.setCols(buildFieldSchemaList(params.getColumns()));
     // Set a dummy SerdeInfo for Hive.
     sd.setSerdeInfo(new SerDeInfo());
@@ -1925,6 +1925,21 @@ public class CatalogOpExecutor {
   }
 
   /**
+   * Sets the given params in the metastore table as appropriate for an
+   * alter view operation.
+   */
+  private void setAlterViewAttributes(TCreateOrAlterViewParams params,
+      org.apache.hadoop.hive.metastore.api.Table view) {
+    view.setViewOriginalText(params.getOriginal_view_def());
+    view.setViewExpandedText(params.getExpanded_view_def());
+    if (params.isSetComment() && params.getComment() != null) {
+      view.getParameters().put("comment", params.getComment());
+    }
+    // Add all the columns to a new storage descriptor.
+    view.getSd().setCols(buildFieldSchemaList(params.getColumns()));
+  }
+
+  /**
    * Appends one or more columns to the given table, optionally replacing all existing
    * columns.
    */

http://git-wip-us.apache.org/repos/asf/impala/blob/62885d8d/tests/metadata/test_ddl.py
----------------------------------------------------------------------
diff --git a/tests/metadata/test_ddl.py b/tests/metadata/test_ddl.py
index 27748bd..0830060 100644
--- a/tests/metadata/test_ddl.py
+++ b/tests/metadata/test_ddl.py
@@ -26,6 +26,7 @@ from tests.common.parametrize import UniqueDatabase
 from tests.common.skip import SkipIf, SkipIfADLS, SkipIfLocal
 from tests.common.test_dimensions import create_single_exec_option_dimension
 from tests.util.filesystem_utils import WAREHOUSE, IS_HDFS, IS_S3, IS_ADLS
+from tests.common.impala_cluster import ImpalaCluster
 
 # Validates DDL statements (create, drop)
 class TestDdlStatements(TestDdlBase):
@@ -361,6 +362,37 @@ class TestDdlStatements(TestDdlBase):
 |  01:SCAN HDFS [functional.alltypes b]
 00:SCAN HDFS [functional.alltypestiny a]""" in '\n'.join(plan.data)
 
+  def test_views_describe(self, vector, unique_database):
+    # IMPALA-6896: Tests that altered views can be described by all impalads.
+    impala_cluster = ImpalaCluster()
+    impalads = impala_cluster.impalads
+    first_client = impalads[0].service.create_beeswax_client()
+    try:
+      self.execute_query_expect_success(first_client,
+                                        "create view {0}.test_describe_view as "
+                                        "select * from functional.alltypes"
+                                        .format(unique_database), {'sync_ddl': 1})
+      self.execute_query_expect_success(first_client,
+                                        "alter view {0}.test_describe_view as "
+                                        "select * from functional.alltypesagg"
+                                        .format(unique_database))
+    finally:
+      first_client.close()
+
+    for impalad in impalads:
+      client = impalad.service.create_beeswax_client()
+      try:
+        while True:
+          result = self.execute_query_expect_success(
+              client, "describe formatted {0}.test_describe_view"
+              .format(unique_database))
+          if any("select * from functional.alltypesagg" in s.lower()
+                 for s in result.data):
+            break
+          time.sleep(1)
+      finally:
+        client.close()
+
   @UniqueDatabase.parametrize(sync_ddl=True)
   def test_functions_ddl(self, vector, unique_database):
     self.run_test_case('QueryTest/functions-ddl', vector, use_db=unique_database,


[15/20] impala git commit: IMPALA-5690: Part 1: Rename ostream operators for thrift types

Posted by sa...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/query-options.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options.cc b/be/src/service/query-options.cc
index 91378ab..a0b7227 100644
--- a/be/src/service/query-options.cc
+++ b/be/src/service/query-options.cc
@@ -68,14 +68,29 @@ void impala::OverlayQueryOptions(const TQueryOptions& src, const QueryOptionsMas
 #undef QUERY_OPT_FN
 }
 
+// Choose different print function based on the type.
+// TODO: In thrift 0.11.0 operator << is implemented for enums and this indirection can be
+// removed.
+template<typename T, typename std::enable_if_t<std::is_enum<T>::value>* = nullptr>
+string PrintQueryOptionValue(const T& option) {
+  return PrintThriftEnum(option);
+}
+
+template<typename T, typename std::enable_if_t<std::is_integral<T>::value>* = nullptr>
+string PrintQueryOptionValue(const T& option)  {
+  return std::to_string(option);
+}
+
+const string& PrintQueryOptionValue(const std::string& option)  {
+  return option;
+}
+
 void impala::TQueryOptionsToMap(const TQueryOptions& query_options,
     map<string, string>* configuration) {
 #define QUERY_OPT_FN(NAME, ENUM, LEVEL)\
   {\
     if (query_options.__isset.NAME) { \
-      stringstream val;\
-      val << query_options.NAME;\
-      (*configuration)[#ENUM] = val.str();\
+      (*configuration)[#ENUM] = PrintQueryOptionValue(query_options.NAME); \
     } else { \
       (*configuration)[#ENUM] = ""; \
     }\
@@ -384,7 +399,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
         if (size < RuntimeFilterBank::MIN_BLOOM_FILTER_SIZE ||
             size > RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE) {
           return Status(Substitute("$0 is not a valid Bloom filter size for $1. "
-                  "Valid sizes are in [$2, $3].", value, PrintTImpalaQueryOptions(
+                  "Valid sizes are in [$2, $3].", value, PrintThriftEnum(
                       static_cast<TImpalaQueryOptions::type>(option)),
                   RuntimeFilterBank::MIN_BLOOM_FILTER_SIZE,
                   RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE));
@@ -396,7 +411,7 @@ Status impala::SetQueryOption(const string& key, const string& value,
             && FLAGS_min_buffer_size <= RuntimeFilterBank::MAX_BLOOM_FILTER_SIZE) {
           return Status(Substitute("$0 should not be less than $1 which is the minimum "
               "buffer size that can be allocated by the buffer pool",
-              PrintTImpalaQueryOptions(static_cast<TImpalaQueryOptions::type>(option)),
+              PrintThriftEnum(static_cast<TImpalaQueryOptions::type>(option)),
               FLAGS_min_buffer_size));
         }
         if (option == TImpalaQueryOptions::RUNTIME_BLOOM_FILTER_SIZE) {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/query-result-set.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-result-set.cc b/be/src/service/query-result-set.cc
index 8d00af5..aacd849 100644
--- a/be/src/service/query-result-set.cc
+++ b/be/src/service/query-result-set.cc
@@ -182,8 +182,8 @@ Status AsciiQueryResultSet::AddOneRow(const TResultRow& row) {
   out_stream.precision(ASCII_PRECISION);
   for (int i = 0; i < num_col; ++i) {
     // ODBC-187 - ODBC can only take "\t" as the delimiter
-    out_stream << (i > 0 ? "\t" : "");
-    out_stream << row.colVals[i];
+    if (i > 0) out_stream << '\t';
+    PrintTColumnValue(out_stream, row.colVals[i]);
   }
   result_set_->push_back(out_stream.str());
   return Status::OK();

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/statestore/statestore.cc
----------------------------------------------------------------------
diff --git a/be/src/statestore/statestore.cc b/be/src/statestore/statestore.cc
index 02363fe..5c1952d 100644
--- a/be/src/statestore/statestore.cc
+++ b/be/src/statestore/statestore.cc
@@ -460,7 +460,7 @@ void Statestore::SubscribersHandler(const Webserver::ArgumentMap& args,
     Value subscriber_id(subscriber.second->id().c_str(), document->GetAllocator());
     sub_json.AddMember("id", subscriber_id, document->GetAllocator());
 
-    Value address(lexical_cast<string>(subscriber.second->network_address()).c_str(),
+    Value address(TNetworkAddressToString(subscriber.second->network_address()).c_str(),
         document->GetAllocator());
     sub_json.AddMember("address", address, document->GetAllocator());
 
@@ -875,7 +875,7 @@ void Statestore::DoSubscriberUpdate(UpdateKind update_kind, int thread_id,
         // TODO: Consider if a metric to track the number of failures would be useful.
         LOG(INFO) << "Subscriber '" << subscriber->id() << "' has failed, disconnected "
                   << "or re-registered (last known registration ID: "
-                  << update.registration_id << ")";
+                  << PrintId(update.registration_id) << ")";
         UnregisterSubscriber(subscriber.get());
       } else {
         LOG(INFO) << "Failure was already detected for subscriber '" << subscriber->id()

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/collection-metrics.h
----------------------------------------------------------------------
diff --git a/be/src/util/collection-metrics.h b/be/src/util/collection-metrics.h
index 1081c33..79ae072 100644
--- a/be/src/util/collection-metrics.h
+++ b/be/src/util/collection-metrics.h
@@ -160,7 +160,7 @@ class StatsMetric : public Metric {
     boost::lock_guard<boost::mutex> l(lock_);
     rapidjson::Value container(rapidjson::kObjectType);
     AddStandardFields(document, &container);
-    rapidjson::Value units(PrintTUnit(unit_).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit_).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
 
     if (StatsSelection & StatsType::COUNT) {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/debug-util.cc
----------------------------------------------------------------------
diff --git a/be/src/util/debug-util.cc b/be/src/util/debug-util.cc
index 1cb61e4..edf1749 100644
--- a/be/src/util/debug-util.cc
+++ b/be/src/util/debug-util.cc
@@ -50,62 +50,33 @@ DECLARE_string(hostname);
 
 namespace impala {
 
-#define THRIFT_ENUM_OUTPUT_FN_IMPL(E, MAP) \
-  ostream& operator<<(ostream& os, const E::type& e) {\
-    map<int, const char*>::const_iterator i;\
-    i = MAP.find(e);\
-    if (i != MAP.end()) {\
-      os << i->second;\
-    }\
-    return os;\
+#define PRINT_THRIFT_ENUM_IMPL(T) \
+  string PrintThriftEnum(const T::type& value) { \
+    map<int, const char*>::const_iterator it = _##T##_VALUES_TO_NAMES.find(value); \
+    return it == _##T##_VALUES_TO_NAMES.end() ? std::to_string(value) : it->second; \
   }
 
-// Macro to stamp out operator<< for thrift enums.  Why doesn't thrift do this?
-#define THRIFT_ENUM_OUTPUT_FN(E) THRIFT_ENUM_OUTPUT_FN_IMPL(E , _##E##_VALUES_TO_NAMES)
-
-// Macro to implement Print function that returns string for thrift enums. Make sure you
-// define a corresponding THRIFT_ENUM_OUTPUT_FN.
-#define THRIFT_ENUM_PRINT_FN(E) \
-  string Print##E(const E::type& e) {\
-    stringstream ss;\
-    ss << e;\
-    return ss.str();\
-  }
-
-THRIFT_ENUM_OUTPUT_FN(TFunctionBinaryType);
-THRIFT_ENUM_OUTPUT_FN(TCatalogObjectType);
-THRIFT_ENUM_OUTPUT_FN(TDdlType);
-THRIFT_ENUM_OUTPUT_FN(TCatalogOpType);
-THRIFT_ENUM_OUTPUT_FN(THdfsFileFormat);
-THRIFT_ENUM_OUTPUT_FN(THdfsCompression);
-THRIFT_ENUM_OUTPUT_FN(TReplicaPreference);
-THRIFT_ENUM_OUTPUT_FN(TSessionType);
-THRIFT_ENUM_OUTPUT_FN(TStmtType);
-THRIFT_ENUM_OUTPUT_FN(QueryState);
-THRIFT_ENUM_OUTPUT_FN(Encoding);
-THRIFT_ENUM_OUTPUT_FN(CompressionCodec);
-THRIFT_ENUM_OUTPUT_FN(Type);
-THRIFT_ENUM_OUTPUT_FN(TMetricKind);
-THRIFT_ENUM_OUTPUT_FN(TUnit);
-THRIFT_ENUM_OUTPUT_FN(TImpalaQueryOptions);
-
-THRIFT_ENUM_PRINT_FN(TCatalogObjectType);
-THRIFT_ENUM_PRINT_FN(TDdlType);
-THRIFT_ENUM_PRINT_FN(TCatalogOpType);
-THRIFT_ENUM_PRINT_FN(TReplicaPreference);
-THRIFT_ENUM_PRINT_FN(TSessionType);
-THRIFT_ENUM_PRINT_FN(TStmtType);
-THRIFT_ENUM_PRINT_FN(QueryState);
-THRIFT_ENUM_PRINT_FN(Encoding);
-THRIFT_ENUM_PRINT_FN(TMetricKind);
-THRIFT_ENUM_PRINT_FN(TUnit);
-THRIFT_ENUM_PRINT_FN(TImpalaQueryOptions);
-
-
-ostream& operator<<(ostream& os, const TUniqueId& id) {
-  os << PrintId(id);
-  return os;
-}
+PRINT_THRIFT_ENUM_IMPL(QueryState)
+PRINT_THRIFT_ENUM_IMPL(Encoding)
+PRINT_THRIFT_ENUM_IMPL(TCatalogObjectType)
+PRINT_THRIFT_ENUM_IMPL(TCatalogOpType)
+PRINT_THRIFT_ENUM_IMPL(TDdlType)
+PRINT_THRIFT_ENUM_IMPL(TExplainLevel)
+PRINT_THRIFT_ENUM_IMPL(THdfsCompression)
+PRINT_THRIFT_ENUM_IMPL(THdfsFileFormat)
+PRINT_THRIFT_ENUM_IMPL(THdfsSeqCompressionMode)
+PRINT_THRIFT_ENUM_IMPL(TImpalaQueryOptions)
+PRINT_THRIFT_ENUM_IMPL(TJoinDistributionMode)
+PRINT_THRIFT_ENUM_IMPL(TMetricKind)
+PRINT_THRIFT_ENUM_IMPL(TParquetArrayResolution)
+PRINT_THRIFT_ENUM_IMPL(TParquetFallbackSchemaResolution)
+PRINT_THRIFT_ENUM_IMPL(TPlanNodeType)
+PRINT_THRIFT_ENUM_IMPL(TPrefetchMode)
+PRINT_THRIFT_ENUM_IMPL(TReplicaPreference)
+PRINT_THRIFT_ENUM_IMPL(TRuntimeFilterMode)
+PRINT_THRIFT_ENUM_IMPL(TSessionType)
+PRINT_THRIFT_ENUM_IMPL(TStmtType)
+PRINT_THRIFT_ENUM_IMPL(TUnit)
 
 string PrintId(const TUniqueId& id, const string& separator) {
   stringstream out;
@@ -158,15 +129,6 @@ bool ParseId(const string& s, TUniqueId* id) {
   return valid;
 }
 
-string PrintPlanNodeType(const TPlanNodeType::type& type) {
-  map<int, const char*>::const_iterator i;
-  i = _TPlanNodeType_VALUES_TO_NAMES.find(type);
-  if (i != _TPlanNodeType_VALUES_TO_NAMES.end()) {
-    return i->second;
-  }
-  return "Invalid plan node type";
-}
-
 string PrintTuple(const Tuple* t, const TupleDescriptor& d) {
   if (t == NULL) return "null";
   stringstream out;

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/debug-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/debug-util.h b/be/src/util/debug-util.h
index 27d6cee..c5a5697 100644
--- a/be/src/util/debug-util.h
+++ b/be/src/util/debug-util.h
@@ -44,34 +44,34 @@ class Tuple;
 class TupleRow;
 class RowBatch;
 
-std::ostream& operator<<(std::ostream& os, const TFunctionBinaryType::type& op);
-std::ostream& operator<<(std::ostream& os, const TUniqueId& id);
-std::ostream& operator<<(std::ostream& os, const THdfsFileFormat::type& type);
-std::ostream& operator<<(std::ostream& os, const THdfsCompression::type& type);
-std::ostream& operator<<(std::ostream& os, const TStmtType::type& type);
-std::ostream& operator<<(std::ostream& os, const TUnit::type& type);
-std::ostream& operator<<(std::ostream& os, const TMetricKind::type& type);
-std::ostream& operator<<(std::ostream& os, const beeswax::QueryState::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::Encoding::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::CompressionCodec::type& type);
-std::ostream& operator<<(std::ostream& os, const parquet::Type::type& type);
+// TODO: remove these functions and use operator << after upgrading to Thrift 0.11.0 or
+// higher.
+std::string PrintThriftEnum(const beeswax::QueryState::type& value);
+std::string PrintThriftEnum(const parquet::Encoding::type& value);
+std::string PrintThriftEnum(const TCatalogObjectType::type& value);
+std::string PrintThriftEnum(const TCatalogOpType::type& value);
+std::string PrintThriftEnum(const TDdlType::type& value);
+std::string PrintThriftEnum(const TExplainLevel::type& value);
+std::string PrintThriftEnum(const THdfsCompression::type& value);
+std::string PrintThriftEnum(const THdfsFileFormat::type& value);
+std::string PrintThriftEnum(const THdfsSeqCompressionMode::type& value);
+std::string PrintThriftEnum(const TImpalaQueryOptions::type& value);
+std::string PrintThriftEnum(const TJoinDistributionMode::type& value);
+std::string PrintThriftEnum(const TMetricKind::type& value);
+std::string PrintThriftEnum(const TParquetArrayResolution::type& value);
+std::string PrintThriftEnum(const TParquetFallbackSchemaResolution::type& value);
+std::string PrintThriftEnum(const TPlanNodeType::type& value);
+std::string PrintThriftEnum(const TPrefetchMode::type& value);
+std::string PrintThriftEnum(const TReplicaPreference::type& value);
+std::string PrintThriftEnum(const TRuntimeFilterMode::type& value);
+std::string PrintThriftEnum(const TSessionType::type& value);
+std::string PrintThriftEnum(const TStmtType::type& value);
+std::string PrintThriftEnum(const TUnit::type& value);
 
 std::string PrintTuple(const Tuple* t, const TupleDescriptor& d);
 std::string PrintRow(TupleRow* row, const RowDescriptor& d);
 std::string PrintBatch(RowBatch* batch);
 std::string PrintId(const TUniqueId& id, const std::string& separator = ":");
-std::string PrintPlanNodeType(const TPlanNodeType::type& type);
-std::string PrintTCatalogObjectType(const TCatalogObjectType::type& type);
-std::string PrintTDdlType(const TDdlType::type& type);
-std::string PrintTCatalogOpType(const TCatalogOpType::type& type);
-std::string PrintTReplicaPreference(const TReplicaPreference::type& type);
-std::string PrintTSessionType(const TSessionType::type& type);
-std::string PrintTStmtType(const TStmtType::type& type);
-std::string PrintQueryState(const beeswax::QueryState::type& type);
-std::string PrintEncoding(const parquet::Encoding::type& type);
-std::string PrintTMetricKind(const TMetricKind::type& type);
-std::string PrintTUnit(const TUnit::type& type);
-std::string PrintTImpalaQueryOptions(const TImpalaQueryOptions::type& type);
 
 /// Returns the fully qualified path, e.g. "database.table.array_col.item.field"
 std::string PrintPath(const TableDescriptor& tbl_desc, const SchemaPath& path);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/histogram-metric.h
----------------------------------------------------------------------
diff --git a/be/src/util/histogram-metric.h b/be/src/util/histogram-metric.h
index d4e09e4..43d4eaf 100644
--- a/be/src/util/histogram-metric.h
+++ b/be/src/util/histogram-metric.h
@@ -62,10 +62,10 @@ class HistogramMetric : public Metric {
       container.AddMember("min", histogram_->MinValue(), document->GetAllocator());
       container.AddMember("count", histogram_->TotalCount(), document->GetAllocator());
     }
-    rapidjson::Value type_value(PrintTMetricKind(TMetricKind::HISTOGRAM).c_str(),
+    rapidjson::Value type_value(PrintThriftEnum(TMetricKind::HISTOGRAM).c_str(),
         document->GetAllocator());
     container.AddMember("kind", type_value, document->GetAllocator());
-    rapidjson::Value units(PrintTUnit(unit()).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit()).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
 
     *value = container;

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/metrics.h
----------------------------------------------------------------------
diff --git a/be/src/util/metrics.h b/be/src/util/metrics.h
index b513c1e..99115c4 100644
--- a/be/src/util/metrics.h
+++ b/be/src/util/metrics.h
@@ -152,10 +152,9 @@ class ScalarMetric: public Metric {
     ToJsonValue(GetValue(), TUnit::NONE, document, &metric_value);
     container.AddMember("value", metric_value, document->GetAllocator());
 
-    rapidjson::Value type_value(PrintTMetricKind(kind()).c_str(),
-        document->GetAllocator());
+    rapidjson::Value type_value(PrintThriftEnum(kind()).c_str(), document->GetAllocator());
     container.AddMember("kind", type_value, document->GetAllocator());
-    rapidjson::Value units(PrintTUnit(unit()).c_str(), document->GetAllocator());
+    rapidjson::Value units(PrintThriftEnum(unit()).c_str(), document->GetAllocator());
     container.AddMember("units", units, document->GetAllocator());
     *val = container;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/network-util.cc
----------------------------------------------------------------------
diff --git a/be/src/util/network-util.cc b/be/src/util/network-util.cc
index 7a10965..49c96b1 100644
--- a/be/src/util/network-util.cc
+++ b/be/src/util/network-util.cc
@@ -174,15 +174,10 @@ bool IsWildcardAddress(const string& ipaddress) {
 
 string TNetworkAddressToString(const TNetworkAddress& address) {
   stringstream ss;
-  ss << address;
+  ss << address.hostname << ":" << dec << address.port;
   return ss.str();
 }
 
-ostream& operator<<(ostream& out, const TNetworkAddress& hostport) {
-  out << hostport.hostname << ":" << dec << hostport.port;
-  return out;
-}
-
 /// Pick a random port in the range of ephemeral ports
 /// https://tools.ietf.org/html/rfc6335
 int FindUnusedEphemeralPort(vector<int>* used_ports) {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/network-util.h
----------------------------------------------------------------------
diff --git a/be/src/util/network-util.h b/be/src/util/network-util.h
index 5b108dc..ef270ee 100644
--- a/be/src/util/network-util.h
+++ b/be/src/util/network-util.h
@@ -73,9 +73,6 @@ std::string TNetworkAddressToString(const TNetworkAddress& address);
 Status TNetworkAddressToSockaddr(const TNetworkAddress& address,
     kudu::Sockaddr* sockaddr);
 
-/// Prints a hostport as ipaddress:port
-std::ostream& operator<<(std::ostream& out, const TNetworkAddress& hostport);
-
 /// Returns a ephemeral port that is currently unused. Returns -1 on an error or if
 /// a free ephemeral port can't be found after 100 tries. If 'used_ports' is non-NULL,
 /// does not select those ports and adds the selected port to 'used_ports'.

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/util/webserver.cc
----------------------------------------------------------------------
diff --git a/be/src/util/webserver.cc b/be/src/util/webserver.cc
index ea0a6e9..a77c6da 100644
--- a/be/src/util/webserver.cc
+++ b/be/src/util/webserver.cc
@@ -220,10 +220,10 @@ string Webserver::Url() {
 }
 
 Status Webserver::Start() {
-  LOG(INFO) << "Starting webserver on " << http_address_;
+  LOG(INFO) << "Starting webserver on " << TNetworkAddressToString(http_address_);
 
   stringstream listening_spec;
-  listening_spec << http_address_;
+  listening_spec << TNetworkAddressToString(http_address_);
 
   if (IsSecure()) {
     LOG(INFO) << "Webserver: Enabling HTTPS support";
@@ -320,7 +320,8 @@ Status Webserver::Start() {
 
   if (context_ == nullptr) {
     stringstream error_msg;
-    error_msg << "Webserver: Could not start on address " << http_address_;
+    error_msg << "Webserver: Could not start on address "
+              << TNetworkAddressToString(http_address_);
     return Status(error_msg.str());
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/testdata/workloads/functional-query/queries/QueryTest/set.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/set.test b/testdata/workloads/functional-query/queries/QueryTest/set.test
index 32ad938..c1db9f5 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/set.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/set.test
@@ -17,7 +17,7 @@ set all;
 'DISABLE_CACHED_READS','0','DEPRECATED'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','1','REGULAR'
+'EXPLAIN_LEVEL','STANDARD','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'
@@ -50,7 +50,7 @@ set all;
 'DISABLE_CACHED_READS','0','DEPRECATED'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','3','REGULAR'
+'EXPLAIN_LEVEL','VERBOSE','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'
@@ -83,7 +83,7 @@ set all;
 'DISABLE_CACHED_READS','0','DEPRECATED'
 'DISABLE_CODEGEN','0','REGULAR'
 'DISABLE_OUTERMOST_TOPN','0','DEVELOPMENT'
-'EXPLAIN_LEVEL','0','REGULAR'
+'EXPLAIN_LEVEL','MINIMAL','REGULAR'
 'HBASE_CACHE_BLOCKS','0','ADVANCED'
 'HBASE_CACHING','0','ADVANCED'
 'MAX_ERRORS','100','ADVANCED'

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/tests/shell/test_shell_commandline.py
----------------------------------------------------------------------
diff --git a/tests/shell/test_shell_commandline.py b/tests/shell/test_shell_commandline.py
index 10513b6..97f4a30 100644
--- a/tests/shell/test_shell_commandline.py
+++ b/tests/shell/test_shell_commandline.py
@@ -248,8 +248,8 @@ class TestImpalaShell(ImpalaTestSuite):
     args = '-q "set"'
     result_set = run_impala_shell_cmd(args)
     assert 'MEM_LIMIT: [0]' in result_set.stdout
-    # test to check that explain_level is 1
-    assert 'EXPLAIN_LEVEL: [1]' in result_set.stdout
+    # test to check that explain_level is STANDARD
+    assert 'EXPLAIN_LEVEL: [STANDARD]' in result_set.stdout
     # test to check that configs without defaults show up as []
     assert 'COMPRESSION_CODEC: []' in result_set.stdout
     # test values displayed after setting value


[05/20] impala git commit: IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS

Posted by sa...@apache.org.
IMPALA-5310: [DOCS] Document TABLESAMPLE clause for COMPUTE STATS

Change-Id: I214b63db391bd35562f5ea9091508005f83b2fcc
Reviewed-on: http://gerrit.cloudera.org:8080/8975
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/0ec3cd71
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/0ec3cd71
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/0ec3cd71

Branch: refs/heads/2.x
Commit: 0ec3cd71071ce623dcf9eb919dfca639f91a5bc7
Parents: 5f4d89f
Author: John Russell <jr...@cloudera.com>
Authored: Mon Jan 8 14:41:16 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_compute_stats.xml | 30 ++++++++++++++++++++++--------
 docs/topics/impala_tablesample.xml   |  6 ++++++
 2 files changed, 28 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_compute_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_compute_stats.xml b/docs/topics/impala_compute_stats.xml
index b62972c..95343f4 100644
--- a/docs/topics/impala_compute_stats.xml
+++ b/docs/topics/impala_compute_stats.xml
@@ -39,18 +39,20 @@ under the License.
   <conbody>
 
     <p>
-      <indexterm audience="hidden">COMPUTE STATS statement</indexterm>
-      Gathers information about volume and distribution of data in a table and all associated columns and
-      partitions. The information is stored in the metastore database, and used by Impala to help optimize queries.
-      For example, if Impala can determine that a table is large or small, or has many or few distinct values it
-      can organize parallelize the work appropriately for a join query or insert operation. For details about the
-      kinds of information gathered by this statement, see <xref href="impala_perf_stats.xml#perf_stats"/>.
+      <indexterm audience="hidden">COMPUTE STATS statement</indexterm> The
+      COMPUTE STATS statement gathers information about volume and distribution
+      of data in a table and all associated columns and partitions. The
+      information is stored in the metastore database, and used by Impala to
+      help optimize queries. For example, if Impala can determine that a table
+      is large or small, or has many or few distinct values, it can organize and
+      parallelize the work appropriately for a join query or insert operation.
+      For details about the kinds of information gathered by this statement, see
+        <xref href="impala_perf_stats.xml#perf_stats"/>.
     </p>
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
 
-<codeblock rev="impala-3562">COMPUTE STATS
-  [<varname>db_name</varname>.]<varname>table_name</varname> [ ( <varname>column_list</varname> ) ]
+<codeblock rev="2.1.0"><ph rev="2.12.0 IMPALA-5310">COMPUTE STATS [<varname>db_name</varname>.]<varname>table_name</varname>  [ ( <varname>column_list</varname> ) ] [TABLESAMPLE SYSTEM(<varname>percentage</varname>) [REPEATABLE(<varname>seed</varname>)]]</ph>
 
 <varname>column_list</varname> ::= <varname>column_name</varname> [ , <varname>column_name</varname>, ... ]
 
@@ -104,6 +106,18 @@ COMPUTE INCREMENTAL STATS [<varname>db_name</varname>.]<varname>table_name</varn
         STATS</codeph>.
     </p>
 
+    <p rev="2.12.0 IMPALA-5310">
+      In <keyword keyref="impala212_full"/> and
+      higher, an optional <codeph>TABLESAMPLE</codeph> clause immediately after
+      a table reference specifies that the <codeph>COMPUTE STATS</codeph>
+      operation only processes a specified percentage of the table data. For
+      tables that are so large that a full <codeph>COMPUTE STATS</codeph>
+      operation is impractical, you can use <codeph>COMPUTE STATS</codeph> with
+      a <codeph>TABLESAMPLE</codeph> clause to extrapolate statistics from a
+      sample of the table data. See <keyword keyref="perf_stats"/> for information
+      about the experimental stats extrapolation and sampling features.
+    </p>
+
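A short sketch of what such a sampled run could look like (the table name, percentage, and seed are illustrative; as the text above notes, sampling is tied to the experimental stats extrapolation feature):

-- Scan roughly 10 percent of the data and extrapolate table and column statistics.
compute stats huge_logs tablesample system(10) repeatable(55);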
     <p rev="2.1.0">
       The <codeph>COMPUTE INCREMENTAL STATS</codeph> variation is a shortcut for partitioned tables that works on a
       subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables

http://git-wip-us.apache.org/repos/asf/impala/blob/0ec3cd71/docs/topics/impala_tablesample.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_tablesample.xml b/docs/topics/impala_tablesample.xml
index f60c5be..e5123cb 100644
--- a/docs/topics/impala_tablesample.xml
+++ b/docs/topics/impala_tablesample.xml
@@ -81,6 +81,12 @@ under the License.
 
     <p conref="../shared/impala_common.xml#common/added_in_290"/>
 
+    <p rev="2.12.0 IMPALA-5310">
+      See <keyword keyref="compute_stats"/> for the
+        <codeph>TABLESAMPLE</codeph> clause used in the <codeph>COMPUTE
+        STATS</codeph> statement.
+    </p>
+
     <p conref="../shared/impala_common.xml#common/usage_notes_blurb"/>
 
     <p>


[20/20] impala git commit: IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

Posted by sa...@apache.org.
IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

- Updated the fixed versions for the issues fixed in 2.12 or earlier.
- Added new known issues open in 2.12.

Change-Id: I4638be7e488546287e3555945bb691a588ec6f09
Reviewed-on: http://gerrit.cloudera.org:8080/10101
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/8dd465f6
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/8dd465f6
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/8dd465f6

Branch: refs/heads/2.x
Commit: 8dd465f69e0e0614aa14f05ef4f32d9b0f1997fe
Parents: b6a553e
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Wed Apr 18 12:49:10 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Sat Apr 21 04:48:45 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_known_issues.xml | 1335 ++++++++++++++++++++----------
 1 file changed, 894 insertions(+), 441 deletions(-)
----------------------------------------------------------------------



[06/20] impala git commit: IMPALA-6748: [DOCS] Separators when casting STRING to TIMESTAMP

Posted by sa...@apache.org.
IMPALA-6748: [DOCS] Separators when casting STRING to TIMESTAMP

Change-Id: Ib82884d5f56c520712c4391b53b799d518d6a54f
Reviewed-on: http://gerrit.cloudera.org:8080/10052
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/5f4d89f8
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/5f4d89f8
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/5f4d89f8

Branch: refs/heads/2.x
Commit: 5f4d89f86cdf79bc4f718ac654ced17c67b93ce3
Parents: dff44e4
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Apr 12 15:52:29 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 docs/shared/impala_common.xml    |  44 +++++-
 docs/topics/impala_timestamp.xml | 245 ++++++++++++++++++++--------------
 2 files changed, 182 insertions(+), 107 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/5f4d89f8/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index e651337..1d6ef1f 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1328,12 +1328,44 @@ drop database temp;
       </p>
 
       <p id="timestamp_conversions">
-        Impala automatically converts <codeph>STRING</codeph> literals of the correct format into
-        <codeph>TIMESTAMP</codeph> values. Timestamp values are accepted in the format
-        <codeph>"yyyy-MM-dd HH:mm:ss.SSSSSS"</codeph>, and can consist of just the date, or just the time, with or
-        without the fractional second portion. For example, you can specify <codeph>TIMESTAMP</codeph> values such as
-        <codeph>'1966-07-30'</codeph>, <codeph>'08:30:00'</codeph>, or <codeph>'1985-09-25 17:45:30.005'</codeph>.
-        <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
+        Impala automatically converts <codeph>STRING</codeph> literals of the
+        correct format into <codeph>TIMESTAMP</codeph> values. Timestamp values
+        are accepted in the format <codeph>"yyyy-MM-dd HH:mm:ss.SSSSSS"</codeph>,
+        and can consist of just the date, or just the time, with or without the
+        fractional second portion. For example, you can specify <codeph>TIMESTAMP</codeph>
+        values such as <codeph>'1966-07-30'</codeph>, <codeph>'08:30:00'</codeph>,
+        or <codeph>'1985-09-25 17:45:30.005'</codeph>.
+      </p>
+      <p>Leading zeroes are not required in the numbers representing the date
+        component, such as month and day, or the time component, such as
+        hour, minute, and second. For example, Impala accepts both
+        <codeph>"2018-1-1 01:02:03"</codeph> and
+          <codeph>"2018-01-01 1:2:3"</codeph> as valid.</p>
+
+      <p id="cast_string_to_timestamp">
+        When you convert or cast a <codeph>STRING</codeph> literal to <codeph>TIMESTAMP</codeph>,
+        you can use the following separators between the date part and the time part:
+        <ul>
+          <li>
+            <p>
+              One or more space characters
+            </p>
+
+            <p>
+              Example: <codeph>CAST ('2001-01-09   01:05:01' AS TIMESTAMP)</codeph>
+            </p>
+          </li>
+
+          <li>
+            <p>
+              The character “T”
+            </p>
+
+            <p>
+              Example: <codeph>CAST ('2001-01-09T01:05:01' AS TIMESTAMP)</codeph>
+            </p>
+          </li>
+        </ul>
       </p>
 
       <p>
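
The separator and leading-zero rules described above can be exercised directly
in impala-shell. The following statements are an illustrative sketch based on
the examples in this commit (they are not part of the commit itself); each cast
should succeed and return a non-NULL TIMESTAMP:

  -- One or more space characters between the date part and the time part.
  SELECT CAST('2001-01-09 01:05:01' AS TIMESTAMP);
  SELECT CAST('2001-01-09   01:05:01' AS TIMESTAMP);
  -- The character "T" between the date part and the time part.
  SELECT CAST('2001-01-09T01:05:01' AS TIMESTAMP);
  -- Leading zeroes are optional in the date and time fields.
  SELECT CAST('2018-1-1 01:02:03' AS TIMESTAMP);
  SELECT CAST('2018-01-01 1:2:3' AS TIMESTAMP);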

http://git-wip-us.apache.org/repos/asf/impala/blob/5f4d89f8/docs/topics/impala_timestamp.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_timestamp.xml b/docs/topics/impala_timestamp.xml
index 320f99d..d032e33 100644
--- a/docs/topics/impala_timestamp.xml
+++ b/docs/topics/impala_timestamp.xml
@@ -21,7 +21,13 @@ under the License.
 <concept id="timestamp">
 
   <title>TIMESTAMP Data Type</title>
-  <titlealts audience="PDF"><navtitle>TIMESTAMP</navtitle></titlealts>
+
+  <titlealts audience="PDF">
+
+    <navtitle>TIMESTAMP</navtitle>
+
+  </titlealts>
+
   <prolog>
     <metadata>
       <data name="Category" value="Impala"/>
@@ -36,8 +42,8 @@ under the License.
   <conbody>
 
     <p>
-      A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph> statements, representing a
-      point in time.
+      A data type used in <codeph>CREATE TABLE</codeph> and <codeph>ALTER TABLE</codeph>
+      statements, representing a point in time.
     </p>
 
     <p conref="../shared/impala_common.xml#common/syntax_blurb"/>
@@ -49,9 +55,9 @@ under the License.
 <codeblock><varname>column_name</varname> TIMESTAMP</codeblock>
 
     <p>
-      <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range is different from the Hive
-      <codeph>TIMESTAMP</codeph> type. Internally, the resolution of the time portion of a
-      <codeph>TIMESTAMP</codeph> value is in nanoseconds.
+      <b>Range:</b> Allowed date values range from 1400-01-01 to 9999-12-31; this range is
+      different from the Hive <codeph>TIMESTAMP</codeph> type. Internally, the resolution of the
+      time portion of a <codeph>TIMESTAMP</codeph> value is in nanoseconds.
     </p>
 
     <p>
@@ -59,16 +65,18 @@ under the License.
     </p>
 
     <p>
-      You can perform date arithmetic by adding or subtracting a specified number of time units, using the
-      <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph> and <codeph>-</codeph> operators or
-      <codeph>date_add()</codeph> and <codeph>date_sub()</codeph> functions. You can specify units as
-      <codeph>YEAR[S]</codeph>, <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>, <codeph>DAY[S]</codeph>,
+      You can perform date arithmetic by adding or subtracting a specified number of time units,
+      using the <codeph>INTERVAL</codeph> keyword and the <codeph>+</codeph> and
+      <codeph>-</codeph> operators or <codeph>date_add()</codeph> and
+      <codeph>date_sub()</codeph> functions. You can specify units as <codeph>YEAR[S]</codeph>,
+      <codeph>MONTH[S]</codeph>, <codeph>WEEK[S]</codeph>, <codeph>DAY[S]</codeph>,
       <codeph>HOUR[S]</codeph>, <codeph>MINUTE[S]</codeph>, <codeph>SECOND[S]</codeph>,
-      <codeph>MILLISECOND[S]</codeph>, <codeph>MICROSECOND[S]</codeph>, and <codeph>NANOSECOND[S]</codeph>. You can
-      only specify one time unit in each interval expression, for example <codeph>INTERVAL 3 DAYS</codeph> or
-      <codeph>INTERVAL 25 HOURS</codeph>, but you can produce any granularity by adding together successive
-      <codeph>INTERVAL</codeph> values, such as <codeph><varname>timestamp_value</varname> + INTERVAL 3 WEEKS -
-      INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>.
+      <codeph>MILLISECOND[S]</codeph>, <codeph>MICROSECOND[S]</codeph>, and
+      <codeph>NANOSECOND[S]</codeph>. You can only specify one time unit in each interval
+      expression, for example <codeph>INTERVAL 3 DAYS</codeph> or <codeph>INTERVAL 25
+      HOURS</codeph>, but you can produce any granularity by adding together successive
+      <codeph>INTERVAL</codeph> values, such as <codeph><varname>timestamp_value</varname> +
+      INTERVAL 3 WEEKS - INTERVAL 1 DAY + INTERVAL 10 MICROSECONDS</codeph>.
     </p>
 
     <p>
@@ -86,34 +94,39 @@ insert into auction_details
     </p>
 
     <p>
-      By default, Impala does not store timestamps using the local timezone, to avoid undesired results from
-      unexpected time zone issues. Timestamps are stored and interpreted relative to UTC, both when written to or
-      read from data files, or when converted to or from Unix time values through functions such as
-      <codeph>from_unixtime()</codeph> or <codeph>unix_timestamp()</codeph>. To convert such a
-      <codeph>TIMESTAMP</codeph> value to one that represents the date and time in a specific time zone, convert
-      the original value with the <codeph>from_utc_timestamp()</codeph> function.
+      By default, Impala does not store timestamps using the local timezone, to avoid undesired
+      results from unexpected time zone issues. Timestamps are stored and interpreted relative
+      to UTC, both when written to or read from data files, or when converted to or from Unix
+      time values through functions such as <codeph>from_unixtime()</codeph> or
+      <codeph>unix_timestamp()</codeph>. To convert such a <codeph>TIMESTAMP</codeph> value to
+      one that represents the date and time in a specific time zone, convert the original value
+      with the <codeph>from_utc_timestamp()</codeph> function.
     </p>
 
     <p>
-      Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any particular time zone, you
-      must be conscious of the time zone aspects of data that you query, insert, or convert.
+      Because Impala does not assume that <codeph>TIMESTAMP</codeph> values are in any
+      particular time zone, you must be conscious of the time zone aspects of data that you
+      query, insert, or convert.
     </p>
 
     <p>
-      For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by the <codeph>now()</codeph>
-      function represents the local time in the system time zone, rather than in UTC. To store values relative to
-      the current time in a portable way, convert any <codeph>now()</codeph> return values using the
-      <codeph>to_utc_timestamp()</codeph> function first. For example, the following example shows that the current
-      time in California (where this Impala cluster is located) is shortly after 2 PM. If that value was written to a data
-      file, and shipped off to a distant server to be analyzed alongside other data from far-flung locations, the
-      dates and times would not match up precisely because of time zone differences. Therefore, the
-      <codeph>to_utc_timestamp()</codeph> function converts it using a common reference point, the UTC time zone
-      (descended from the old Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that the
-      original value is from the Pacific time zone with Daylight Saving Time in effect. When servers in all
-      geographic locations run the same transformation on any local date and time values (with the appropriate time
-      zone argument), the stored data uses a consistent representation. Impala queries can use functions such as
-      <codeph>EXTRACT()</codeph>, <codeph>MIN()</codeph>, <codeph>AVG()</codeph>, and so on to do time-series
-      analysis on those timestamps.
+      For consistency with Unix system calls, the <codeph>TIMESTAMP</codeph> returned by the
+      <codeph>now()</codeph> function represents the local time in the system time zone, rather
+      than in UTC. To store values relative to the current time in a portable way, convert any
+      <codeph>now()</codeph> return values using the <codeph>to_utc_timestamp()</codeph>
+      function first. For example, the following example shows that the current time in
+      California (where this Impala cluster is located) is shortly after 2 PM. If that value was
+      written to a data file, and shipped off to a distant server to be analyzed alongside other
+      data from far-flung locations, the dates and times would not match up precisely because of
+      time zone differences. Therefore, the <codeph>to_utc_timestamp()</codeph> function
+      converts it using a common reference point, the UTC time zone (descended from the old
+      Greenwich Mean Time standard). The <codeph>'PDT'</codeph> argument indicates that the
+      original value is from the Pacific time zone with Daylight Saving Time in effect. When
+      servers in all geographic locations run the same transformation on any local date and time
+      values (with the appropriate time zone argument), the stored data uses a consistent
+      representation. Impala queries can use functions such as <codeph>EXTRACT()</codeph>,
+      <codeph>MIN()</codeph>, <codeph>AVG()</codeph>, and so on to do time-series analysis on
+      those timestamps.
     </p>
 
 <codeblock>[localhost:21000] > select now();
@@ -131,12 +144,14 @@ insert into auction_details
 </codeblock>
 
     <p>
-      The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored <codeph>TIMESTAMP</codeph>
-      data or calculated results and convert back to local date and time for processing on the application side.
-      The following example shows how you might represent some future date (such as the ending date and time of an
-      auction) in UTC, and then convert back to local time when convenient for reporting or other processing. The
-      final query in the example tests whether this arbitrary UTC date and time has passed yet, by converting it
-      back to the local time zone and comparing it against the current date and time.
+      The converse function, <codeph>from_utc_timestamp()</codeph>, lets you take stored
+      <codeph>TIMESTAMP</codeph> data or calculated results and convert back to local date and
+      time for processing on the application side. The following example shows how you might
+      represent some future date (such as the ending date and time of an auction) in UTC, and
+      then convert back to local time when convenient for reporting or other processing. The
+      final query in the example tests whether this arbitrary UTC date and time has passed yet,
+      by converting it back to the local time zone and comparing it against the current date and
+      time.
     </p>
 
 <codeblock>[localhost:21000] > select to_utc_timestamp(now() + interval 2 weeks, 'PDT');
@@ -160,35 +175,42 @@ insert into auction_details
 </codeblock>
 
     <p rev="2.2.0">
-      If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values represent the local timezone
-      of the host where the data was written, potentially leading to inconsistent results when processed by Impala.
-      To avoid compatibility problems or having to code workarounds, you can specify one or both of these
-      <cmdname>impalad</cmdname> startup flags: <codeph>--use_local_tz_for_unix_timestamp_conversions=true</codeph>
+      If you have data files written by Hive, those <codeph>TIMESTAMP</codeph> values represent
+      the local timezone of the host where the data was written, potentially leading to
+      inconsistent results when processed by Impala. To avoid compatibility problems or having
+      to code workarounds, you can specify one or both of these <cmdname>impalad</cmdname>
+      startup flags: <codeph>--use_local_tz_for_unix_timestamp_conversions=true</codeph>
       <codeph>-convert_legacy_hive_parquet_utc_timestamps=true</codeph>. Although
-      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by default to avoid performance overhead, where practical
-      turn it on when processing <codeph>TIMESTAMP</codeph> columns in Parquet files written by Hive, to avoid unexpected behavior.
+      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> is turned off by default to
+      avoid performance overhead, where practical turn it on when processing
+      <codeph>TIMESTAMP</codeph> columns in Parquet files written by Hive, to avoid unexpected
+      behavior.
     </p>
 
     <p rev="2.2.0">
-      The <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting affects conversions from
-      <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from <codeph>BIGINT</codeph>
-      to <codeph>TIMESTAMP</codeph>. By default, Impala treats all <codeph>TIMESTAMP</codeph> values as UTC,
-      to simplify analysis of time-series data from different geographic regions. When you enable the
+      The <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting affects
+      conversions from <codeph>TIMESTAMP</codeph> to <codeph>BIGINT</codeph>, or from
+      <codeph>BIGINT</codeph> to <codeph>TIMESTAMP</codeph>. By default, Impala treats all
+      <codeph>TIMESTAMP</codeph> values as UTC, to simplify analysis of time-series data from
+      different geographic regions. When you enable the
       <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting, these operations
-      treat the input values as if they are in the local tie zone of the host doing the processing.
-      See <xref href="impala_datetime_functions.xml#datetime_functions"/> for the list of functions
-      affected by the <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting.
+      treat the input values as if they are in the local time zone of the host doing the
+      processing. See <xref
+        href="impala_datetime_functions.xml#datetime_functions"/>
+      for the list of functions affected by the
+      <codeph>--use_local_tz_for_unix_timestamp_conversions</codeph> setting.
     </p>
 
     <p>
-      The following sequence of examples shows how the interpretation of <codeph>TIMESTAMP</codeph> values in
-      Parquet tables is affected by the setting of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph>
-      setting.
+      The following sequence of examples shows how the interpretation of
+      <codeph>TIMESTAMP</codeph> values in Parquet tables is affected by the setting of the
+      <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting.
     </p>
 
     <p>
       Regardless of the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting,
-      <codeph>TIMESTAMP</codeph> columns in text tables can be written and read interchangeably by Impala and Hive:
+      <codeph>TIMESTAMP</codeph> columns in text tables can be written and read interchangeably
+      by Impala and Hive:
     </p>
 
 <codeblock>Impala DDL and queries for text table:
@@ -220,11 +242,12 @@ Time taken: 1.245 seconds, Fetched: 2 row(s)
 </codeblock>
 
     <p>
-      When the table uses Parquet format, Impala expects any time zone adjustment to be applied prior to writing,
-      while <codeph>TIMESTAMP</codeph> values written by Hive are adjusted to be in the UTC time zone. When Hive
-      queries Parquet data files that it wrote, it adjusts the <codeph>TIMESTAMP</codeph> values back to the local
-      time zone, while Impala does no conversion. Hive does no time zone conversion when it queries Impala-written
-      Parquet files.
+      When the table uses Parquet format, Impala expects any time zone adjustment to be applied
+      prior to writing, while <codeph>TIMESTAMP</codeph> values written by Hive are adjusted to
+      be in the UTC time zone. When Hive queries Parquet data files that it wrote, it adjusts
+      the <codeph>TIMESTAMP</codeph> values back to the local time zone, while Impala does no
+      conversion. Hive does no time zone conversion when it queries Impala-written Parquet
+      files.
     </p>
 
 <codeblock>Impala DDL and queries for Parquet table:
@@ -264,10 +287,11 @@ Time taken: 0.197 seconds, Fetched: 2 row(s)
 </codeblock>
 
     <p>
-      The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying values in the
-      <codeph>TIMESTAMP</codeph> column are different from the ones written by Impala, even though they were copied
-      from one table to another by an <codeph>INSERT ... SELECT</codeph> statement in Hive. Hive did an implicit
-      conversion from the local time zone to UTC as it wrote the values to Parquet.
+      The discrepancy arises when Impala queries the Hive-created Parquet table. The underlying
+      values in the <codeph>TIMESTAMP</codeph> column are different from the ones written by
+      Impala, even though they were copied from one table to another by an <codeph>INSERT ...
+      SELECT</codeph> statement in Hive. Hive did an implicit conversion from the local time
+      zone to UTC as it wrote the values to Parquet.
     </p>
 
 <codeblock>Impala query for TIMESTAMP values from Impala-written and Hive-written data:
@@ -310,11 +334,12 @@ Fetched 2 row(s) in 0.20s
 </codeblock>
 
     <p>
-      When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is enabled, Impala recognizes
-      the Parquet data files written by Hive, and applies the same UTC-to-local-timezone conversion logic during
-      the query as Hive uses, making the contents of the Impala-written <codeph>P1</codeph> table and the
-      Hive-written <codeph>H1</codeph> table appear identical, whether represented as <codeph>TIMESTAMP</codeph>
-      values or the underlying <codeph>BIGINT</codeph> integers:
+      When the <codeph>-convert_legacy_hive_parquet_utc_timestamps</codeph> setting is enabled,
+      Impala recognizes the Parquet data files written by Hive, and applies the same
+      UTC-to-local-timezone conversion logic during the query as Hive uses, making the contents
+      of the Impala-written <codeph>P1</codeph> table and the Hive-written <codeph>H1</codeph>
+      table appear identical, whether represented as <codeph>TIMESTAMP</codeph> values or the
+      underlying <codeph>BIGINT</codeph> integers:
     </p>
 
 <codeblock>[localhost:21000] > select x from p1;
@@ -355,14 +380,23 @@ Fetched 2 row(s) in 0.22s
       <b>Conversions:</b>
     </p>
 
-    <p conref="../shared/impala_common.xml#common/timestamp_conversions"/>
+    <p conref="../shared/impala_common.xml#common/timestamp_conversions"
+      conrefend="../shared/impala_common.xml#common/cast_string_to_timestamp"/>
 
     <p>
-      In Impala 1.3 and higher, the <codeph>FROM_UNIXTIME()</codeph> and <codeph>UNIX_TIMESTAMP()</codeph>
-      functions allow a wider range of format strings, with more flexibility in element order, repetition of letter
-      placeholders, and separator characters. In <keyword keyref="impala23_full"/> and higher, the <codeph>UNIX_TIMESTAMP()</codeph>
-      function also allows a numeric timezone offset to be specified as part of the input string.
-      See <xref href="impala_datetime_functions.xml#datetime_functions"/> for details.
+      <ph conref="../shared/impala_common.xml#common/cast_int_to_timestamp"/>
+    </p>
+
+    <p>
+      In Impala 1.3 and higher, the <codeph>FROM_UNIXTIME()</codeph> and
+      <codeph>UNIX_TIMESTAMP()</codeph> functions allow a wider range of format strings, with
+      more flexibility in element order, repetition of letter placeholders, and separator
+      characters. In <keyword
+        keyref="impala23_full"/> and higher, the
+      <codeph>UNIX_TIMESTAMP()</codeph> function also allows a numeric timezone offset to be
+      specified as part of the input string. See
+      <xref
+        href="impala_datetime_functions.xml#datetime_functions"/> for details.
     </p>
 
     <p conref="../shared/impala_common.xml#common/y2k38"/>
@@ -372,11 +406,13 @@ Fetched 2 row(s) in 0.22s
     </p>
 
     <p>
-      Although you cannot use a <codeph>TIMESTAMP</codeph> column as a partition key, you can extract the
-      individual years, months, days, hours, and so on and partition based on those columns. Because the partition
-      key column values are represented in HDFS directory names, rather than as fields in the data files
-      themselves, you can also keep the original <codeph>TIMESTAMP</codeph> values if desired, without duplicating
-      data or wasting storage space. See <xref href="impala_partitioning.xml#partition_key_columns"/> for more
+      Although you cannot use a <codeph>TIMESTAMP</codeph> column as a partition key, you can
+      extract the individual years, months, days, hours, and so on and partition based on those
+      columns. Because the partition key column values are represented in HDFS directory names,
+      rather than as fields in the data files themselves, you can also keep the original
+      <codeph>TIMESTAMP</codeph> values if desired, without duplicating data or wasting storage
+      space. See <xref
+        href="impala_partitioning.xml#partition_key_columns"/> for more
       details on partitioning with date and time values.
     </p>
 
@@ -409,21 +445,23 @@ ERROR: AnalysisException: Type 'TIMESTAMP' is not supported as partition-column
     <p conref="../shared/impala_common.xml#common/restrictions_blurb"/>
 
     <p>
-      If you cast a <codeph>STRING</codeph> with an unrecognized format to a <codeph>TIMESTAMP</codeph>, the result
-      is <codeph>NULL</codeph> rather than an error. Make sure to test your data pipeline to be sure any textual
-      date and time values are in a format that Impala <codeph>TIMESTAMP</codeph> can recognize.
+      If you cast a <codeph>STRING</codeph> with an unrecognized format to a
+      <codeph>TIMESTAMP</codeph>, the result is <codeph>NULL</codeph> rather than an error. Make
+      sure to test your data pipeline to be sure any textual date and time values are in a
+      format that Impala <codeph>TIMESTAMP</codeph> can recognize.
     </p>
 
     <p conref="../shared/impala_common.xml#common/avro_no_timestamp"/>
 
     <p conref="../shared/impala_common.xml#common/kudu_blurb"/>
+
     <p conref="../shared/impala_common.xml#common/kudu_timestamp_details"/>
 
     <p conref="../shared/impala_common.xml#common/example_blurb"/>
 
     <p>
-      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values
-      with built-in functions:
+      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values with built-in
+      functions:
     </p>
 
 <codeblock>select cast('1966-07-30' as timestamp);
@@ -441,8 +479,8 @@ select now();                               -- Returns current date and time in
 </codeblock>
 
     <p>
-      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values
-      with HDFS-backed tables:
+      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values with
+      HDFS-backed tables:
     </p>
 
 <codeblock>create table dates_and_times (t timestamp);
@@ -451,8 +489,8 @@ insert into dates_and_times values
 </codeblock>
 
     <p rev="IMPALA-5137">
-      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values
-      with Kudu tables:
+      The following examples demonstrate using <codeph>TIMESTAMP</codeph> values with Kudu
+      tables:
     </p>
 
 <codeblock rev="IMPALA-5137">create table timestamp_t (x int primary key, s string, t timestamp, b bigint)
@@ -495,16 +533,21 @@ select s, t, b from timestamp_t order by t;
       </li>
 
       <li>
-        To convert to or from different date formats, or perform date arithmetic, use the date and time functions
-        described in <xref href="impala_datetime_functions.xml#datetime_functions"/>. In particular, the
-        <codeph>from_unixtime()</codeph> function requires a case-sensitive format string such as
-        <codeph>"yyyy-MM-dd HH:mm:ss.SSSS"</codeph>, matching one of the allowed variations of a
-        <codeph>TIMESTAMP</codeph> value (date plus time, only date, only time, optional fractional seconds).
+        To convert to or from different date formats, or perform date arithmetic, use the date
+        and time functions described in
+        <xref
+          href="impala_datetime_functions.xml#datetime_functions"/>. In
+        particular, the <codeph>from_unixtime()</codeph> function requires a case-sensitive
+        format string such as <codeph>"yyyy-MM-dd HH:mm:ss.SSSS"</codeph>, matching one of the
+        allowed variations of a <codeph>TIMESTAMP</codeph> value (date plus time, only date,
+        only time, optional fractional seconds).
       </li>
 
       <li>
-        See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"/> for details about differences in
-        <codeph>TIMESTAMP</codeph> handling between Impala and Hive.
+        See <xref href="impala_langref_unsupported.xml#langref_hiveql_delta"
+        /> for
+        details about differences in <codeph>TIMESTAMP</codeph> handling between Impala and
+        Hive.
       </li>
     </ul>
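
As a usage sketch of the partitioning workaround and the NULL-on-unrecognized-format
restriction documented above (the table and column names events, events_staging,
event_time, and detail are hypothetical, not part of the commit):

  -- Partition on integer columns derived from the TIMESTAMP value, while
  -- keeping the original TIMESTAMP column in the data files.
  CREATE TABLE events (event_time TIMESTAMP, detail STRING)
    PARTITIONED BY (event_year INT, event_month INT, event_day INT);

  -- Dynamic partition insert: the partition key expressions come last.
  INSERT INTO events PARTITION (event_year, event_month, event_day)
    SELECT event_time, detail,
           year(event_time), month(event_time), day(event_time)
    FROM events_staging;

  -- A STRING in a format Impala does not recognize casts to NULL rather
  -- than raising an error.
  SELECT CAST('07/30/1966' AS TIMESTAMP);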
 


[13/20] impala git commit: IMPALA-6880: disable flaky bloom filter test

Posted by sa...@apache.org.
IMPALA-6880: disable flaky bloom filter test

This test is made flaky by IMPALA-6338. While that is being worked on,
temporarily disable this test.

Change-Id: I595645b0f2875614294adc7abb4572aec1be8ad5
Reviewed-on: http://gerrit.cloudera.org:8080/10122
Reviewed-by: Vuk Ercegovac <ve...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/f51dce02
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/f51dce02
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/f51dce02

Branch: refs/heads/2.x
Commit: f51dce02dd42287ae07460979466034a9461e87d
Parents: d253c06
Author: Thomas Tauber-Marshall <tm...@cloudera.com>
Authored: Thu Apr 19 18:47:08 2018 +0000
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:58 2018 +0000

----------------------------------------------------------------------
 .../functional-query/queries/QueryTest/bloom_filters.test         | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/f51dce02/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
----------------------------------------------------------------------
diff --git a/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test b/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
index 4908e20..b8d896f 100644
--- a/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
+++ b/testdata/workloads/functional-query/queries/QueryTest/bloom_filters.test
@@ -15,7 +15,8 @@ select STRAIGHT_JOIN count(*) from (select * from tpch.lineitem a LIMIT 1) a
 ---- RESULTS
 0
 ---- RUNTIME_PROFILE
-row_regex: .*0 of 1 Runtime Filter Published, 1 Disabled.*
+# TODO: reenable once IMPALA-6338 is fixed
+#row_regex: .*0 of 1 Runtime Filter Published, 1 Disabled.*
 ====
 
 


[12/20] impala git commit: IMPALA-6917: Limit impalad mem-limit to 12GB.

Posted by sa...@apache.org.
IMPALA-6917: Limit impalad mem-limit to 12GB.

This changes the memlimit choosing code in start-impala-cluster to have
a maximum of 12GB. On 68GB machines, this has the effect of reducing the
memlimit from ~15.8GB to 12GB. On machines with less than 51.4 GB of
RAM, this has no effect.

I'm tinkering with this threshold because ASAN builds sometimes
die at the hands of the OOM killer on m2.4xlarge machines (68GB).
My theory for why it's only sometimes is that during the parallel
tests, our memory usage could vary widely depending on which tests
and queries execute.

End-users don't see this code; this is only used by our minicluster
tests to test Impala.

I have run the ASAN build with this change successfully, though I've
found this particular OOM seems to come and go.

Change-Id: I8024414c5c23bb42cce912d8f34cd0b787e0e39a
Reviewed-on: http://gerrit.cloudera.org:8080/10051
Reviewed-by: Philip Zeyliger <ph...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/b018c5bf
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/b018c5bf
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/b018c5bf

Branch: refs/heads/2.x
Commit: b018c5bfa0bd471fb3fdf3661b201e38e2f43720
Parents: a8c3e2b
Author: Philip Zeyliger <ph...@cloudera.com>
Authored: Sat Apr 7 20:43:03 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:57 2018 +0000

----------------------------------------------------------------------
 bin/start-impala-cluster.py | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/b018c5bf/bin/start-impala-cluster.py
----------------------------------------------------------------------
diff --git a/bin/start-impala-cluster.py b/bin/start-impala-cluster.py
index bfe71d5..e870852 100755
--- a/bin/start-impala-cluster.py
+++ b/bin/start-impala-cluster.py
@@ -216,15 +216,21 @@ def start_impalad_instances(cluster_size, num_coordinators, use_exclusive_coordi
     # No impalad instances should be started.
     return
 
+  # Set mem_limit of each impalad to the smaller of 12GB or
+  # 1/cluster_size (typically 1/3) of 70% of system memory.
+  #
   # The default memory limit for an impalad is 80% of the total system memory. On a
   # mini-cluster with 3 impalads that means 240%. Since having an impalad be OOM killed
   # is very annoying, the mem limit will be reduced. This can be overridden using the
   # --impalad_args flag. virtual_memory().total returns the total physical memory.
   # The exact ratio to use is somewhat arbitrary. Peak memory usage during
   # tests depends on the concurrency of parallel tests as well as their ordering.
-  # At a ratio of 0.8, on 8-core, 68GB machines, ASAN builds can trigger the OOM
-  # killer, so this ratio is currently set to 0.7.
+  # On the other hand, to avoid using too much memory, we limit the
+  # memory choice here to max out at 12GB. This should be sufficient for tests.
+  #
+  # Beware that ASAN builds use more memory than regular builds.
   mem_limit = int(0.7 * psutil.virtual_memory().total / cluster_size)
+  mem_limit = min(12 * 1024 * 1024 * 1024, mem_limit)
 
   delay_list = []
   if options.catalog_init_delays != "":


[02/20] impala git commit: IMPALA-6867: [DOCS] SHUFFLE_DISTINCT_EXPRS Query option added

Posted by sa...@apache.org.
IMPALA-6867: [DOCS] SHUFFLE_DISTINCT_EXPRS Query option added

Change-Id: I42cb22a13b26bdc3d22585e4fc23934dfc91722a
Reviewed-on: http://gerrit.cloudera.org:8080/10094
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/27d6f5ee
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/27d6f5ee
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/27d6f5ee

Branch: refs/heads/2.x
Commit: 27d6f5ee612d8abf84956529efffaaab183fd18b
Parents: a71f50a
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Tue Apr 17 20:04:39 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:20 2018 +0000

----------------------------------------------------------------------
 docs/impala.ditamap                           |  1 +
 docs/impala_keydefs.ditamap                   |  1 +
 docs/topics/impala_shuffle_distinct_exprs.xml | 59 ++++++++++++++++++++++
 3 files changed, 61 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/27d6f5ee/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 8dd22f7..7ef7d47 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -226,6 +226,7 @@ under the License.
           <topicref rev="2.8.0 IMPALA-3671" href="topics/impala_scratch_limit.xml"/>
           <!-- This option is for internal use only and might go away without ever being documented. -->
           <!-- <topicref href="topics/impala_seq_compression_mode.xml"/> -->
+          <topicref href="topics/impala_shuffle_distinct_exprs.xml"/>
           <topicref href="topics/impala_support_start_over.xml"/>
           <topicref href="topics/impala_sync_ddl.xml"/>
           <topicref href="topics/impala_v_cpu_cores.xml"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/27d6f5ee/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index ae6b2f8..27f0515 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10824,6 +10824,7 @@ under the License.
   <keydef href="topics/impala_s3_skip_insert_staging.xml" keys="s3_skip_insert_staging"/>
   <keydef href="topics/impala_scan_node_codegen_threshold.xml" keys="scan_node_codegen_threshold"/>
   <keydef href="topics/impala_scratch_limit.xml" keys="scratch_limit"/>
+  <keydef href="topics/impala_shuffle_distinct_exprs.xml" keys="shuffle_distinct_exprs"/>
   <keydef href="topics/impala_support_start_over.xml" keys="support_start_over"/>
   <keydef href="topics/impala_sync_ddl.xml" keys="sync_ddl"/>
   <keydef href="topics/impala_v_cpu_cores.xml" keys="v_cpu_cores"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/27d6f5ee/docs/topics/impala_shuffle_distinct_exprs.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_shuffle_distinct_exprs.xml b/docs/topics/impala_shuffle_distinct_exprs.xml
new file mode 100644
index 0000000..804ca4f
--- /dev/null
+++ b/docs/topics/impala_shuffle_distinct_exprs.xml
@@ -0,0 +1,59 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
+<concept id="shuffle_distinct_exprs">
+
+  <title>SHUFFLE_DISTINCT_EXPRS Query Option</title>
+
+  <titlealts audience="PDF">
+
+    <navtitle>SHUFFLE_DISTINCT_EXPRS</navtitle>
+
+  </titlealts>
+
+  <prolog>
+    <metadata>
+      <data name="Category" value="Impala"/>
+      <data name="Category" value="Impala Query Options"/>
+    </metadata>
+  </prolog>
+
+  <conbody>
+
+    <p>
+      The <codeph>SHUFFLE_DISTINCT_EXPRS</codeph> query option controls the
+      shuffling behavior when a query has both grouping and distinct expressions.
+      Impala can optionally include the distinct expressions in the hash exchange
+      to spread the data among more nodes. However, this plan requires one more
+      hash exchange phase.
+    </p>
+
+    <p>
+      It is recommended that you turn off this option if the NDVs of the grouping
+      expressions are high.
+    </p>
+
+    <p conref="../shared/impala_common.xml#common/type_boolean"/>
+
+    <p conref="../shared/impala_common.xml#common/default_false"/>
+
+  </conbody>
+
+</concept>
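
A brief usage sketch for this query option (the table and column names sales,
region, and customer_id are hypothetical, not part of the commit). The option
is set per session in impala-shell and only matters for queries that combine
grouping with a distinct aggregate:

  -- Include the distinct expressions in the hash exchange to spread the
  -- distinct aggregation across more nodes.
  SET SHUFFLE_DISTINCT_EXPRS=true;
  SELECT region, COUNT(DISTINCT customer_id)
  FROM sales
  GROUP BY region;

  -- Revert to the default when the grouping expressions already have high NDVs.
  SET SHUFFLE_DISTINCT_EXPRS=false;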


[03/20] impala git commit: IMPALA-6793: Fix empty metadata after statestore restarts

Posted by sa...@apache.org.
IMPALA-6793: Fix empty metadata after statestore restarts

IMPALA-5990 introduced a bug where restarting the statestore
deterministically clears the metadata, which never comes back. The
cause of the bug is an incorrect condition used by the catalog to
detect a statestore restart.

A custom cluster regression test is added. The process-restarting
utility function in the custom cluster tests is changed to use
shell=True in Popen.

Change-Id: I332a60e172af84b93b3544373fe363cdced5e8d0
Reviewed-on: http://gerrit.cloudera.org:8080/9921
Tested-by: Impala Public Jenkins <im...@cloudera.com>
Reviewed-by: Tianyi Wang <tw...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a71f50a3
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a71f50a3
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a71f50a3

Branch: refs/heads/2.x
Commit: a71f50a39ff3e5043c6408f0a5dc987c645dae2a
Parents: 2be6814
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Fri Apr 6 13:06:35 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:20 2018 +0000

----------------------------------------------------------------------
 be/src/catalog/catalog-server.cc              | 13 ++---
 be/src/catalog/catalog-server.h               |  5 --
 tests/common/impala_cluster.py                |  2 +-
 tests/custom_cluster/test_restart_services.py | 59 ++++++++++++++++++++++
 4 files changed, 67 insertions(+), 12 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/a71f50a3/be/src/catalog/catalog-server.cc
----------------------------------------------------------------------
diff --git a/be/src/catalog/catalog-server.cc b/be/src/catalog/catalog-server.cc
index 8a91c25..e645204 100644
--- a/be/src/catalog/catalog-server.cc
+++ b/be/src/catalog/catalog-server.cc
@@ -157,7 +157,7 @@ CatalogServer::CatalogServer(MetricGroup* metrics)
   : thrift_iface_(new CatalogServiceThriftIf(this)),
     thrift_serializer_(FLAGS_compact_catalog_topic), metrics_(metrics),
     topic_updates_ready_(false), last_sent_catalog_version_(0L),
-    catalog_objects_min_version_(0L), catalog_objects_max_version_(0L) {
+    catalog_objects_max_version_(0L) {
   topic_processing_time_metric_ = StatsMetric<double>::CreateAndRegister(metrics,
       CATALOG_SERVER_TOPIC_PROCESSING_TIMES);
 }
@@ -228,10 +228,12 @@ void CatalogServer::UpdateCatalogTopicCallback(
 
   const TTopicDelta& delta = topic->second;
 
-  // If not generating a delta update and 'pending_topic_updates_' doesn't already contain
-  // the full catalog (beginning with version 0), then force GatherCatalogUpdatesThread()
-  // to reload the full catalog.
-  if (delta.from_version == 0 && catalog_objects_min_version_ != 0) {
+  // If the statestore restarts, both from_version and to_version would be 0. If the
+  // catalog has sent a non-empty topic update, pending_topic_updates_ won't be from
+  // version 0 and it should be re-collected.
+  if (delta.from_version == 0 && delta.to_version == 0 &&
+      last_sent_catalog_version_ != 0) {
+    LOG(INFO) << "Statestore restart detected. Collecting a non-delta catalog update.";
     last_sent_catalog_version_ = 0L;
   } else if (!pending_topic_updates_.empty()) {
     // Process the pending topic update.
@@ -284,7 +286,6 @@ void CatalogServer::UpdateCatalogTopicCallback(
       if (!status.ok()) {
         LOG(ERROR) << status.GetDetail();
       } else {
-        catalog_objects_min_version_ = last_sent_catalog_version_;
         catalog_objects_max_version_ = resp.max_catalog_version;
       }
     }

http://git-wip-us.apache.org/repos/asf/impala/blob/a71f50a3/be/src/catalog/catalog-server.h
----------------------------------------------------------------------
diff --git a/be/src/catalog/catalog-server.h b/be/src/catalog/catalog-server.h
index 2fa8ce7..1df83a3 100644
--- a/be/src/catalog/catalog-server.h
+++ b/be/src/catalog/catalog-server.h
@@ -119,11 +119,6 @@ class CatalogServer {
   /// Set in UpdateCatalogTopicCallback() and protected by the catalog_lock_.
   int64_t last_sent_catalog_version_;
 
-  /// The minimum catalog object version in pending_topic_updates_. All items in
-  /// pending_topic_updates_ will be greater than this version. Set by the
-  /// catalog_update_gathering_thread_ and protected by catalog_lock_.
-  int64_t catalog_objects_min_version_;
-
   /// The max catalog version in pending_topic_updates_. Set by the
   /// catalog_update_gathering_thread_ and protected by catalog_lock_.
   int64_t catalog_objects_max_version_;

http://git-wip-us.apache.org/repos/asf/impala/blob/a71f50a3/tests/common/impala_cluster.py
----------------------------------------------------------------------
diff --git a/tests/common/impala_cluster.py b/tests/common/impala_cluster.py
index 3fbcacf..276c02b 100644
--- a/tests/common/impala_cluster.py
+++ b/tests/common/impala_cluster.py
@@ -29,7 +29,7 @@ from tests.common.impala_service import (
     CatalogdService,
     ImpaladService,
     StateStoredService)
-from tests.util.shell_util import exec_process_async, exec_process
+from tests.util.shell_util import exec_process, exec_process_async
 
 logging.basicConfig(level=logging.ERROR, format='%(threadName)s: %(message)s')
 LOG = logging.getLogger('impala_cluster')

http://git-wip-us.apache.org/repos/asf/impala/blob/a71f50a3/tests/custom_cluster/test_restart_services.py
----------------------------------------------------------------------
diff --git a/tests/custom_cluster/test_restart_services.py b/tests/custom_cluster/test_restart_services.py
new file mode 100644
index 0000000..bcfe19d
--- /dev/null
+++ b/tests/custom_cluster/test_restart_services.py
@@ -0,0 +1,59 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+import pytest
+
+from impala.error import HiveServer2Error
+from tests.common.environ import specific_build_type_timeout
+from time import sleep
+
+from tests.common.custom_cluster_test_suite import CustomClusterTestSuite
+
+
+
+class TestRestart(CustomClusterTestSuite):
+  @classmethod
+  def get_workload(cls):
+    return 'functional-query'
+
+  @pytest.mark.execute_serially
+  def test_restart_statestore(self, cursor):
+    """ Regression test of IMPALA-6973. After the statestore restarts, the metadata should
+        eventually recover after being cleared by the new statestore.
+    """
+    try:
+      self.cluster.statestored.restart()
+      # We need to wait for the impalad to register to the new statestored and for a
+      # non-empty catalog update from the new statestored. It cannot be expressed with the
+      # existing metrics yet so we wait for some time here.
+      wait_time_s = specific_build_type_timeout(60, slow_build_timeout=100)
+      sleep(wait_time_s)
+      for retry in xrange(wait_time_s):
+        try:
+          cursor.execute("describe database functional")
+          return
+        except HiveServer2Error, e:
+          assert "AnalysisException: Database does not exist: functional" in e.message,\
+              "Unexpected exception: " + e.message
+          sleep(1)
+      assert False, "Coordinator never received non-empty metadata from the restarted " \
+          "statestore after {0} seconds".format(wait_time_s)
+    finally:
+      # Workaround for IMPALA-5695. Restarted process has to be manually killed or it will
+      # block start-impala-cluster.py from killing impala daemons.
+      self.cluster.statestored.kill()
+      self.cluster.statestored.wait()


[14/20] impala git commit: IMPALA-6459: [DOCS] Part 2: Stats extrapolation and sampling.

Posted by sa...@apache.org.
IMPALA-6459: [DOCS] Part 2: Stats extrapolation and sampling.

Adds new materials under COMPUTE STATS describing
the experimental stats extrapolation and sampling
features.

More cleanup and examples are needed. This patch provides
a reasonable starting point which we can extend.

Change-Id: Idae7a377b5873701e91f60afa62dde2bd8aacd1b
Reviewed-on: http://gerrit.cloudera.org:8080/10112
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/d42f8d7c
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/d42f8d7c
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/d42f8d7c

Branch: refs/heads/2.x
Commit: d42f8d7c61bc819047b36564f4be1c0a544a4f0d
Parents: 62885d8
Author: Alex Behm <al...@cloudera.com>
Authored: Tue Apr 17 17:12:17 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:58 2018 +0000

----------------------------------------------------------------------
 docs/topics/impala_perf_stats.xml | 135 +++++++++++++++++++++++++++++++++
 1 file changed, 135 insertions(+)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/d42f8d7c/docs/topics/impala_perf_stats.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_perf_stats.xml b/docs/topics/impala_perf_stats.xml
index f503a68..dab2eb8 100644
--- a/docs/topics/impala_perf_stats.xml
+++ b/docs/topics/impala_perf_stats.xml
@@ -392,6 +392,12 @@ show column stats year_month_day;
               This feature is available since Impala 2.8.
             </p>
           </li>
+          <li>
+            <p>
+              Consider the experimental extrapolation and sampling features (see below)
+              to further increase the efficiency of computing stats.
+            </p>
+          </li>
           </ul>
         </p>
 
@@ -415,6 +421,135 @@ show column stats year_month_day;
 
       </conbody>
 
+      <concept id="experimental_stats_features">
+        <title>Experimental: Extrapolation and Sampling</title>
+        <conbody>
+          <p>
+            Impala 2.12 and higher includes two experimental features to alleviate
+            common issues for computing and maintaining statistics on very large tables.
+            The following shortcomings are improved upon:
+            <ul>
+            <li>
+              <p>
+                Newly added partitions do not have row count statistics. Table scans
+                that only access those new partitions are treated as not having stats.
+                Similarly, table scans that access both new and old partitions estimate
+                the scan cardinality based on those old partitions that have stats, and
+                the new partitions without stats are treated as having 0 rows.
+              </p>
+            </li>
+            <li>
+              <p>
+                The row counts of existing partitions become stale when data is added
+                or dropped.
+              </p>
+            </li>
+            <li>
+              <p>
+                Computing stats for tables with a 100,000 or more partitions might fail
+                or be very slow due to the high cost of updating the partition metadata
+                in the Hive Metastore.
+              </p>
+            </li>
+            <li>
+              <p>
+                With transient compute resources it is important to minimize the time
+                from starting a new cluster to successfully running queries.
+                Since the cluster might be relatively short-lived, users might prefer to
+                quickly collect stats that are "good enough" as opposed to spending
+                a lot of time and resources on computing full-fidelity stats.
+              </p>
+            </li>
+            </ul>
+            For very large tables, it is often wasteful or impractical to run a full
+            COMPUTE STATS to address the scenarios above on a frequent basis.
+          </p>
+          <p>
+            The sampling feature makes COMPUTE STATS more efficient by processing a
+            fraction of the table data, and the extrapolation feature aims to reduce
+            the frequency at which COMPUTE STATS needs to be re-run by estimating
+            the row count of new and modified partitions.
+          </p>
+          <p>
+            The sampling and extrapolation features are disabled by default.
+            They can be enabled globally or for specific tables, as follows.
+            Set the impalad start-up configuration "--enable_stats_extrapolation" to
+            enable the features globally. To enable them only for a specific table, set
+            the "impala.enable.stats.extrapolation" table property to "true" for the
+            desired table. The table-level property overrides the global setting, so
+            it is also possible to enable sampling and extrapolation globally, but
+            disable them for specific tables by setting the table property to "false".
+            Example:
+            ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true")
+          </p>
+          <note>
+            Why are these features experimental? Due to their probabilistic nature
+            it is possible that these features perform pathologically poorly on tables
+            with extreme data/file/size distributions. Since it is not feasible for us
+            to test all possible scenarios we only cautiously advertise these new
+            capabilities. That said, the features have been thoroughly tested and
+            are considered functionally stable. If you decide to give these features
+            a try, please tell us about your experience at user@impala.apache.org!
+            We rely on user feedback to guide future improvements in statistics
+            collection.
+          </note>
+        </conbody>
+
+        <concept id="experimental_stats_extrapolation">
+          <title>Stats Extrapolation</title>
+          <conbody>
+            <p>
+              The main idea of stats extrapolation is to estimate the row count of new
+              and modified partitions based on the result of the last COMPUTE STATS.
+              Enabling stats extrapolation changes the behavior of COMPUTE STATS,
+              as well as the cardinality estimation of table scans. COMPUTE STATS no
+              longer computes and stores per-partition row counts, and instead, only
+              computes a table-level row count together with the total number of file
+              bytes in the table at that time. No partition metadata is modified. The
+              input cardinality of a table scan is estimated by converting the data
+              volume of relevant partitions to a row count, based on the table-level
+              row count and file bytes statistics. It is assumed that within the same
+              table, different sets of files with the same data volume correspond
+              to a similar number of rows on average. With extrapolation enabled,
+              the scan cardinality estimation ignores per-partition row counts. It
+              only relies on the table-level statistics and the scanned data volume.
+            </p>
+            <p>
+              The SHOW TABLE STATS and EXPLAIN commands distinguish between row counts
+              stored in the Hive Metastore, and the row counts extrapolated based on the
+              above process. Consult the SHOW TABLE STATS and EXPLAIN documentation
+              for more details.
+            </p>
+          </conbody>
+        </concept>
+
+        <concept id="experimental_stats_sampling">
+          <title>Sampling</title>
+          <conbody>
+            <p>
+              A TABLESAMPLE clause may be added to COMPUTE STATS to limit the
+              percentage of data to be processed. The final statistics are obtained
+              by extrapolating the statistics from the data sample over the entire table.
+              The extrapolated statistics are stored in the Hive Metastore, just as if no
+              sampling was used. The following example runs COMPUTE STATS over a 10 percent
+              data sample: COMPUTE STATS test_table TABLESAMPLE SYSTEM(10)
+            </p>
+            <p>
+            We have found that a 10 percent sampling rate typically offers a good
+            tradeoff between statistics accuracy and execution cost. A sampling rate
+            well below 10 percent has shown poor results and is not recommended.
+            </p>
+            <note type="important">
+              Sampling-based techniques sacrifice result accuracy for execution
+              efficiency, so your mileage may vary for different tables and columns
+              depending on their data distribution. The extrapolation procedure Impala
+              uses for estimating the number of distinct values per column is inherently
+              non-deterministic, so your results may even vary between runs of
+              COMPUTE STATS TABLESAMPLE, even if no data has changed.
+            </note>
+          </conbody>
+        </concept>
+      </concept>
     </concept>
 
     <concept id="concept_bmk_pfl_mdb">
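
The extrapolation and sampling workflow described above can be sketched as
follows (the table name sales_fact is hypothetical, not part of the commit):

  -- Enable extrapolation for a single table through the table property.
  ALTER TABLE sales_fact
    SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true");

  -- Compute statistics from roughly a 10 percent sample and extrapolate
  -- the results over the whole table.
  COMPUTE STATS sales_fact TABLESAMPLE SYSTEM(10);

  -- Inspect stored vs. extrapolated row counts.
  SHOW TABLE STATS sales_fact;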


[04/20] impala git commit: IMPALA-6886: [DOCS] Removed impala_cluster_sizing.xml

Posted by sa...@apache.org.
IMPALA-6886: [DOCS] Removed impala_cluster_sizing.xml

Change-Id: I03d605d33ed6ced809074b1fc96def30ad0887fd
Reviewed-on: http://gerrit.cloudera.org:8080/10109
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/dfc17b86
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/dfc17b86
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/dfc17b86

Branch: refs/heads/2.x
Commit: dfc17b86a2df2e6a4675bce2d64676a29ec59231
Parents: 0ec3cd7
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Wed Apr 18 16:49:51 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Thu Apr 19 22:10:21 2018 +0000

----------------------------------------------------------------------
 docs/impala.ditamap                   |   2 +-
 docs/topics/impala_cluster_sizing.xml | 371 -----------------------------
 2 files changed, 1 insertion(+), 372 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/dfc17b86/docs/impala.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala.ditamap b/docs/impala.ditamap
index 7ef7d47..ec6d313 100644
--- a/docs/impala.ditamap
+++ b/docs/impala.ditamap
@@ -47,7 +47,7 @@ under the License.
   </topicref>
   <topicref href="topics/impala_planning.xml">
     <topicref href="topics/impala_prereqs.xml#prereqs"/>
-    <topicref href="topics/impala_cluster_sizing.xml"/>
+    <!-- Removed per Alan Choi's request on 4/18/2018 <topicref href="topics/impala_cluster_sizing.xml"/> -->
     <topicref href="topics/impala_schema_design.xml"/>
   </topicref>
   <topicref audience="standalone" href="topics/impala_install.xml#install">

http://git-wip-us.apache.org/repos/asf/impala/blob/dfc17b86/docs/topics/impala_cluster_sizing.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_cluster_sizing.xml b/docs/topics/impala_cluster_sizing.xml
deleted file mode 100644
index 7b395c5..0000000
--- a/docs/topics/impala_cluster_sizing.xml
+++ /dev/null
@@ -1,371 +0,0 @@
-<?xml version="1.0" encoding="UTF-8"?>
-<!--
-Licensed to the Apache Software Foundation (ASF) under one
-or more contributor license agreements.  See the NOTICE file
-distributed with this work for additional information
-regarding copyright ownership.  The ASF licenses this file
-to you under the Apache License, Version 2.0 (the
-"License"); you may not use this file except in compliance
-with the License.  You may obtain a copy of the License at
-
-  http://www.apache.org/licenses/LICENSE-2.0
-
-Unless required by applicable law or agreed to in writing,
-software distributed under the License is distributed on an
-"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-KIND, either express or implied.  See the License for the
-specific language governing permissions and limitations
-under the License.
--->
-<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
-<concept id="cluster_sizing">
-
-  <title>Cluster Sizing Guidelines for Impala</title>
-  <titlealts audience="PDF"><navtitle>Cluster Sizing</navtitle></titlealts>
-  <prolog>
-    <metadata>
-      <data name="Category" value="Impala"/>
-      <data name="Category" value="Clusters"/>
-      <data name="Category" value="Planning"/>
-      <data name="Category" value="Sizing"/>
-      <data name="Category" value="Deploying"/>
-      <!-- Hoist by my own petard. Memory is an important theme of this topic but that's in a <section> title. -->
-      <data name="Category" value="Sectionated Pages"/>
-      <data name="Category" value="Memory"/>
-      <data name="Category" value="Scalability"/>
-      <data name="Category" value="Proof of Concept"/>
-      <data name="Category" value="Requirements"/>
-      <data name="Category" value="Guidelines"/>
-      <data name="Category" value="Best Practices"/>
-      <data name="Category" value="Administrators"/>
-    </metadata>
-  </prolog>
-
-  <conbody>
-
-    <p>
-      <indexterm audience="hidden">cluster sizing</indexterm>
-      This document provides a very rough guideline to estimate the size of a cluster needed for a specific
-      customer application. You can use this information when planning how much and what type of hardware to
-      acquire for a new cluster, or when adding Impala workloads to an existing cluster.
-    </p>
-
-    <note>
-      Before making purchase or deployment decisions, consult organizations with relevant experience
-      to verify the conclusions about hardware requirements based on your data volume and workload.
-    </note>
-
-<!--    <p outputclass="toc inpage"/> -->
-
-    <p>
-      Always use hosts with identical specifications and capacities for all the nodes in the cluster. Currently,
-      Impala divides the work evenly between cluster nodes, regardless of their exact hardware configuration.
-      Because work can be distributed in different ways for different queries, if some hosts are overloaded
-      compared to others in terms of CPU, memory, I/O, or network, you might experience inconsistent performance
-      and overall slowness.
-    </p>
-
-    <p>
-      For analytic workloads with star/snowflake schemas, and using consistent hardware for all nodes (64 GB RAM,
-      12 2 TB hard drives, 2x E5-2630L 12 cores total, 10 GB network), the following table estimates the number of
-      DataNodes needed in the cluster based on data size and the number of concurrent queries, for workloads
-      similar to TPC-DS benchmark queries:
-    </p>
-
-    <table>
-      <title>Cluster size estimation based on the number of concurrent queries and data size with a 20 second average query response time</title>
-      <tgroup cols="6">
-        <colspec colnum="1" colname="col1"/>
-        <colspec colnum="2" colname="col2"/>
-        <colspec colnum="3" colname="col3"/>
-        <colspec colnum="4" colname="col4"/>
-        <colspec colnum="5" colname="col5"/>
-        <colspec colnum="6" colname="col6"/>
-        <thead>
-          <row>
-            <entry>
-              Data Size
-            </entry>
-            <entry>
-              1 query
-            </entry>
-            <entry>
-              10 queries
-            </entry>
-            <entry>
-              100 queries
-            </entry>
-            <entry>
-              1000 queries
-            </entry>
-            <entry>
-              2000 queries
-            </entry>
-          </row>
-        </thead>
-        <tbody>
-          <row>
-            <entry>
-              <b>250 GB</b>
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              5
-            </entry>
-            <entry>
-              35
-            </entry>
-            <entry>
-              70
-            </entry>
-          </row>
-          <row>
-            <entry>
-              <b>500 GB</b>
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              10
-            </entry>
-            <entry>
-              70
-            </entry>
-            <entry>
-              135
-            </entry>
-          </row>
-          <row>
-            <entry>
-              <b>1 TB</b>
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              15
-            </entry>
-            <entry>
-              135
-            </entry>
-            <entry>
-              270
-            </entry>
-          </row>
-          <row>
-            <entry>
-              <b>15 TB</b>
-            </entry>
-            <entry>
-              2
-            </entry>
-            <entry>
-              20
-            </entry>
-            <entry>
-              200
-            </entry>
-            <entry>
-              N/A
-            </entry>
-            <entry>
-              N/A
-            </entry>
-          </row>
-          <row>
-            <entry>
-              <b>30 TB</b>
-            </entry>
-            <entry>
-              4
-            </entry>
-            <entry>
-              40
-            </entry>
-            <entry>
-              400
-            </entry>
-            <entry>
-              N/A
-            </entry>
-            <entry>
-              N/A
-            </entry>
-          </row>
-          <row>
-            <entry>
-              <b>60 TB</b>
-            </entry>
-            <entry>
-              8
-            </entry>
-            <entry>
-              80
-            </entry>
-            <entry>
-              800
-            </entry>
-            <entry>
-              N/A
-            </entry>
-            <entry>
-              N/A
-            </entry>
-          </row>
-        </tbody>
-      </tgroup>
-    </table>
-
-    <section id="sizing_factors">
-
-      <title>Factors Affecting Scalability</title>
-
-      <p>
-        A typical analytic workload (TPC-DS style queries) using recommended hardware is usually CPU-bound. Each
-        node can process roughly 1.6 GB/sec. Both CPU-bound and disk-bound workloads can scale almost linearly with
-        cluster size. However, for some workloads, the scalability might be bounded by the network, or even by
-        memory.
-      </p>
-
-      <p>
-        If the workload is already network bound (on a 10 GB network), increasing the cluster size won’t reduce
-        the network load; in fact, a larger cluster could increase network traffic because some queries involve
-        <q>broadcast</q> operations to all DataNodes. Therefore, boosting the cluster size does not improve query
-        throughput in a network-constrained environment.
-      </p>
-
-      <p>
-        Let’s look at a memory-bound workload. A workload is memory-bound if Impala cannot run any additional
-        concurrent queries because all memory allocated has already been consumed, but neither CPU, disk, nor
-        network is saturated yet. This can happen because currently Impala uses only a single core per node to
-        process join and aggregation queries. For a node with 128 GB of RAM, if a join node takes 50 GB, the system
-        cannot run more than 2 such queries at the same time.
-      </p>
-
-      <p>
-        Therefore, at most 2 cores are used. Throughput can still scale almost linearly even for a memory-bound
-        workload. It’s just that the CPU will not be saturated. Per-node throughput will be lower than 1.6
-        GB/sec. Consider increasing the memory per node.
-      </p>
-
-      <p>
-        As long as the workload is not network- or memory-bound, we can use the 1.6 GB/second per node as the
-        throughput estimate.
-      </p>
-    </section>
-
-    <section id="sizing_details">
-
-      <title>A More Precise Approach</title>
-
-      <p>
-        A more precise sizing estimate would require not only queries per minute (QPM), but also an average data
-        size scanned per query (D). With the proper partitioning strategy, D is usually a fraction of the total
-        data size. The following equation can be used as a rough guide to estimate the number of nodes (N) needed:
-      </p>
-
-<codeblock>Eq 1: N &gt; QPM * D / 100 GB
-</codeblock>
-
-      <p>
-        Here is an example. Suppose, on average, a query scans 50 GB of data and the average response time is
-        required to be 15 seconds or less when there are 100 concurrent queries. The QPM is 100/15*60 = 400. We can
-        estimate the number of nodes using our equation above.
-      </p>
-
-<codeblock>N &gt; QPM * D / 100GB
-N &gt; 400 * 50GB / 100GB
-N &gt; 200
-</codeblock>
-
-      <p>
-        Because this figure is a rough estimate, the corresponding number of nodes could be between 100 and 500.
-      </p>
-
-      <p>
-        Depending on the complexity of the query, the processing rate might change. If the query has more
-        joins, aggregation functions, or CPU-intensive functions such as string processing or complex UDFs, the
-        processing rate will be lower than 1.6 GB/second per node. On the other hand, if the query only does scan and
-        filtering on numbers, the processing rate can be higher.
-      </p>
-    </section>
-
-    <section id="sizing_mem_estimate">
-
-      <title>Estimating Memory Requirements</title>
-      <!--
-  <prolog>
-    <metadata>
-      <data name="Category" value="Memory"/>
-    </metadata>
-  </prolog>
-      -->
-
-      <p>
-        Impala can handle joins between multiple large tables. Make sure that statistics are collected for all the
-        joined tables, using the <codeph><xref href="impala_compute_stats.xml#compute_stats">COMPUTE
-        STATS</xref></codeph> statement. However, joining big tables does consume more memory. Follow the steps
-        below to calculate the minimum memory requirement.
-      </p>
-
-      <p>
-        Suppose you are running the following join:
-      </p>
-
-<codeblock>select a.*, b.col_1, b.col_2, … b.col_n
-from a, b
-where a.key = b.key
-and b.col_1 in (1,2,4...)
-and b.col_4 in (....);
-</codeblock>
-
-      <p>
-        And suppose table <codeph>B</codeph> is smaller than table <codeph>A</codeph> (but still a large table).
-      </p>
-
-      <p>
-        The memory requirement for the query is that the size of the right-hand table (<codeph>B</codeph>), after
-        decompression, filtering (<codeph>b.col_n in ...</codeph>), and projection (only using certain columns),
-        must be less than the total memory of the entire cluster.
-      </p>
-
-<codeblock>Cluster Total Memory Requirement  = Size of the smaller table *
-  selectivity factor from the predicate *
-  projection factor * compression ratio
-</codeblock>
-
-      <p>
-        In this case, assume that table <codeph>B</codeph> is 100 TB in Parquet format with 200 columns. The
-        predicate on <codeph>B</codeph> (<codeph>b.col_1 in ...and b.col_4 in ...</codeph>) will select only 10% of
-        the rows from <codeph>B</codeph> and for projection, we are only projecting 5 columns out of 200 columns.
-        Usually, Snappy compression gives us 3 times compression, so we estimate a 3x compression factor.
-      </p>
-
-<codeblock>Cluster Total Memory Requirement  = Size of the smaller table *
-  selectivity factor from the predicate *
-  projection factor * compression ratio
-  = 100TB * 10% * 5/200 * 3
-  = 0.75TB
-  = 750GB
-</codeblock>
-
-      <p>
-        So, if you have a 10-node cluster, each node has 128 GB of RAM and you give 80% to Impala, then you have 1
-        TB of usable memory for Impala, which is more than 750GB. Therefore, your cluster can handle join queries
-        of this magnitude.
-      </p>
-    </section>
-  </conbody>
-</concept>


[10/20] impala git commit: IMPALA-6886: [DOCS] Removed the missed keyref for impala_cluster_sizing

Posted by sa...@apache.org.
IMPALA-6886: [DOCS] Removed the missed keyref for impala_cluster_sizing

Change-Id: Ieab3eac8027c57a9a1a30a77ac0e07686a92f2b0
Reviewed-on: http://gerrit.cloudera.org:8080/10118
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/a8c3e2b1
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/a8c3e2b1
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/a8c3e2b1

Branch: refs/heads/2.x
Commit: a8c3e2b152d777427fc907d55326fafc3bc20a4f
Parents: db15294
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Apr 19 10:37:26 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:57 2018 +0000

----------------------------------------------------------------------
 docs/impala_keydefs.ditamap         | 2 +-
 docs/topics/impala_new_features.xml | 4 +---
 2 files changed, 2 insertions(+), 4 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/a8c3e2b1/docs/impala_keydefs.ditamap
----------------------------------------------------------------------
diff --git a/docs/impala_keydefs.ditamap b/docs/impala_keydefs.ditamap
index 9703698..d9eb62d 100644
--- a/docs/impala_keydefs.ditamap
+++ b/docs/impala_keydefs.ditamap
@@ -10618,7 +10618,7 @@ under the License.
   <keydef href="topics/impala_hadoop.xml" keys="intro_hadoop"/>
   <keydef href="topics/impala_planning.xml" keys="planning"/>
   <keydef href="topics/impala_prereqs.xml" keys="prereqs"/>
-  <keydef href="topics/impala_cluster_sizing.xml" keys="cluster_sizing"/>
+  <!-- Removed <keydef href="topics/impala_cluster_sizing.xml" keys="cluster_sizing"/> -->
   <keydef href="topics/impala_schema_design.xml" keys="schema_design"/>
   <keydef href="topics/impala_install.xml" keys="install"/>
   <keydef href="topics/impala_config.xml" keys="config"/>

http://git-wip-us.apache.org/repos/asf/impala/blob/a8c3e2b1/docs/topics/impala_new_features.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_new_features.xml b/docs/topics/impala_new_features.xml
index deb15e0..c5109e3 100644
--- a/docs/topics/impala_new_features.xml
+++ b/docs/topics/impala_new_features.xml
@@ -2648,9 +2648,7 @@ under the License.
         <li>
           <p>
             The documentation provides additional guidance for planning tasks. <ph audience="PDF">See
-            <xref href="impala_planning.xml#planning"/>.</ph> <ph audience="PDF">In particular, see
-            <xref href="impala_cluster_sizing.xml#cluster_sizing"/> before you purchase or repurpose hardware for a
-            cluster to run Impala.</ph>
+            <xref href="impala_planning.xml#planning"/>.</ph>
           </p>
         </li>
 


[16/20] impala git commit: IMPALA-5690: Part 1: Rename ostream operators for thrift types

Posted by sa...@apache.org.
IMPALA-5690: Part 1: Rename ostream operators for thrift types

Thrift 0.9.3 implements "ostream& operator<<(ostream&, T)" for thrift
data types, while Impala defined the same operator for enums and special
types including TNetworkAddress and TUniqueId. To prepare for the upgrade
to Thrift 0.9.3, this patch renames those Impala-defined functions. In the
absence of operator<<, assertion macros like DCHECK_EQ can no longer be
used on non-enum thrift-defined types.

Change-Id: I9c303997411237e988ef960157f781776f6fcb60
Reviewed-on: http://gerrit.cloudera.org:8080/9168
Reviewed-by: Tianyi Wang <tw...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>
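
As an illustrative aside (not part of the patch), here is a minimal sketch of
the renaming pattern described above, using a simplified stand-in for the
thrift-generated TNetworkAddress type: a free function with a distinct name
replaces operator<<, so it cannot collide with the operator<< that Thrift
0.9.3 now generates, and logging call sites invoke the named helper directly.

    // Sketch only: simplified stand-in for a thrift-generated struct.
    #include <iostream>
    #include <sstream>
    #include <string>

    struct TNetworkAddress {
      std::string hostname;
      int port;
    };

    // Old style (clashes with the operator<< that Thrift 0.9.3 generates):
    //   std::ostream& operator<<(std::ostream& out, const TNetworkAddress& a);
    //
    // New style: an explicitly named helper, called at each logging site.
    std::string TNetworkAddressToString(const TNetworkAddress& a) {
      std::ostringstream out;
      out << a.hostname << ":" << a.port;
      return out.str();
    }

    int main() {
      TNetworkAddress addr{"impalad-host", 22000};
      // Call sites change from "<< addr" to "<< TNetworkAddressToString(addr)".
      std::cout << "Unable to connect to " << TNetworkAddressToString(addr)
                << std::endl;
      // DCHECK_EQ(a, b) relied on operator<< to print its operands; equality
      // checks on thrift structs therefore become DCHECK(a == b) instead.
      return 0;
    }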


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/e114778e
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/e114778e
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/e114778e

Branch: refs/heads/2.x
Commit: e114778ef32b6e5cfc230854cc50e79eedf2d0bf
Parents: f51dce0
Author: Tianyi Wang <tw...@cloudera.com>
Authored: Mon Feb 12 15:59:02 2018 -0800
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:58 2018 +0000

----------------------------------------------------------------------
 be/src/benchmarks/scheduler-benchmark.cc        |  2 +-
 be/src/codegen/llvm-codegen.cc                  |  4 +-
 be/src/exec/exchange-node.cc                    |  2 +-
 be/src/exec/exec-node.cc                        |  8 +-
 be/src/exec/hdfs-scan-node-base.cc              | 13 +--
 be/src/exec/hdfs-scan-node.cc                   |  2 +-
 be/src/exec/parquet-column-readers.cc           |  2 +-
 be/src/exec/parquet-metadata-utils.cc           |  2 +-
 be/src/rpc/rpc-trace.cc                         |  4 +-
 be/src/rpc/thrift-client.cc                     | 16 ++--
 be/src/rpc/thrift-util.cc                       |  3 +-
 be/src/rpc/thrift-util.h                        |  2 +-
 be/src/runtime/client-cache.cc                  | 21 +++--
 be/src/runtime/coordinator-backend-state.cc     | 27 +++---
 be/src/runtime/coordinator.cc                   | 34 ++++----
 be/src/runtime/data-stream-mgr.cc               | 22 ++---
 be/src/runtime/data-stream-recvr.cc             |  7 +-
 be/src/runtime/data-stream-sender.cc            | 17 ++--
 be/src/runtime/fragment-instance-state.cc       |  6 +-
 be/src/runtime/krpc-data-stream-mgr.cc          |  8 +-
 be/src/runtime/krpc-data-stream-recvr.cc        |  7 +-
 be/src/runtime/krpc-data-stream-sender.cc       | 12 +--
 be/src/runtime/mem-tracker.cc                   |  4 +-
 be/src/runtime/query-exec-mgr.cc                |  6 +-
 be/src/runtime/query-state.cc                   |  2 +-
 be/src/runtime/runtime-filter-bank.cc           |  3 +-
 be/src/runtime/runtime-state.cc                 |  4 +-
 be/src/scheduling/admission-controller.cc       | 14 ++--
 be/src/scheduling/scheduler.cc                  | 14 ++--
 be/src/service/child-query.cc                   |  3 +-
 be/src/service/client-request-state.cc          | 18 ++--
 be/src/service/impala-beeswax-server.cc         |  2 +-
 be/src/service/impala-hs2-server.cc             |  4 +-
 be/src/service/impala-http-handler.cc           | 11 ++-
 be/src/service/impala-internal-service.cc       |  7 +-
 be/src/service/impala-server.cc                 | 48 ++++++-----
 be/src/service/query-options-test.cc            |  2 +-
 be/src/service/query-options.cc                 | 25 ++++--
 be/src/service/query-result-set.cc              |  4 +-
 be/src/statestore/statestore.cc                 |  4 +-
 be/src/util/collection-metrics.h                |  2 +-
 be/src/util/debug-util.cc                       | 88 ++++++--------------
 be/src/util/debug-util.h                        | 46 +++++-----
 be/src/util/histogram-metric.h                  |  4 +-
 be/src/util/metrics.h                           |  5 +-
 be/src/util/network-util.cc                     |  7 +-
 be/src/util/network-util.h                      |  3 -
 be/src/util/webserver.cc                        |  7 +-
 .../functional-query/queries/QueryTest/set.test |  6 +-
 tests/shell/test_shell_commandline.py           |  4 +-
 50 files changed, 278 insertions(+), 290 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/benchmarks/scheduler-benchmark.cc
----------------------------------------------------------------------
diff --git a/be/src/benchmarks/scheduler-benchmark.cc b/be/src/benchmarks/scheduler-benchmark.cc
index 7149dde..ef7fb9e 100644
--- a/be/src/benchmarks/scheduler-benchmark.cc
+++ b/be/src/benchmarks/scheduler-benchmark.cc
@@ -131,7 +131,7 @@ void BenchmarkFunction(int num_iterations, void* data) {
 /// blocks. Scheduling will be done according to the parameter 'replica_preference'.
 void RunClusterSizeBenchmark(TReplicaPreference::type replica_preference) {
   string suite_name = strings::Substitute(
-      "Cluster Size, $0", PrintTReplicaPreference(replica_preference));
+      "Cluster Size, $0", PrintThriftEnum(replica_preference));
   Benchmark suite(suite_name, false /* micro_heuristics */);
   vector<TestCtx> test_ctx(CLUSTER_SIZES.size());
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/codegen/llvm-codegen.cc
----------------------------------------------------------------------
diff --git a/be/src/codegen/llvm-codegen.cc b/be/src/codegen/llvm-codegen.cc
index 5d5ed15..7fe4ec1 100644
--- a/be/src/codegen/llvm-codegen.cc
+++ b/be/src/codegen/llvm-codegen.cc
@@ -1088,7 +1088,7 @@ Status LlvmCodeGen::FinalizeModule() {
   }
   string non_finalized_fns_str = ss.str();
   if (!non_finalized_fns_str.empty()) {
-    LOG(INFO) << "For query " << state_->query_id()
+    LOG(INFO) << "For query " << PrintId(state_->query_id())
               << " the following functions were not finalized and have been removed from "
                  "the module:\n"
               << non_finalized_fns_str;
@@ -1710,7 +1710,7 @@ void LlvmCodeGen::DiagnosticHandler::DiagnosticHandlerFn(
     info.print(diagnostic_printer);
     error_msg.flush();
     if (codegen->state_) {
-      LOG(INFO) << "Query " << codegen->state_->query_id() << " encountered a "
+      LOG(INFO) << "Query " << PrintId(codegen->state_->query_id()) << " encountered a "
           << codegen->diagnostic_handler_.error_str_;
     }
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/exchange-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exchange-node.cc b/be/src/exec/exchange-node.cc
index 2dc662b..297a805 100644
--- a/be/src/exec/exchange-node.cc
+++ b/be/src/exec/exchange-node.cc
@@ -150,7 +150,7 @@ Status ExchangeNode::FillInputRowBatch(RuntimeState* state) {
   VLOG_FILE << "exch: has batch=" << (input_batch_ == NULL ? "false" : "true")
             << " #rows=" << (input_batch_ != NULL ? input_batch_->num_rows() : 0)
             << " is_cancelled=" << (ret_status.IsCancelled() ? "true" : "false")
-            << " instance_id=" << state->fragment_instance_id();
+            << " instance_id=" << PrintId(state->fragment_instance_id());
   return ret_status;
 }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/exec-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/exec-node.cc b/be/src/exec/exec-node.cc
index 5f9f777..c94f38c 100644
--- a/be/src/exec/exec-node.cc
+++ b/be/src/exec/exec-node.cc
@@ -123,7 +123,7 @@ ExecNode::ExecNode(ObjectPool* pool, const TPlanNode& tnode, const DescriptorTbl
     limit_(tnode.limit),
     num_rows_returned_(0),
     runtime_profile_(RuntimeProfile::Create(pool_,
-        Substitute("$0 (id=$1)", PrintPlanNodeType(tnode.node_type), id_))),
+        Substitute("$0 (id=$1)", PrintThriftEnum(tnode.node_type), id_))),
     rows_returned_counter_(NULL),
     rows_returned_rate_(NULL),
     containing_subplan_(NULL),
@@ -209,8 +209,8 @@ void ExecNode::Close(RuntimeState* state) {
   if (expr_mem_tracker_ != nullptr) expr_mem_tracker_->Close();
   if (mem_tracker_ != nullptr) {
     if (mem_tracker()->consumption() != 0) {
-      LOG(WARNING) << "Query " << state->query_id() << " may have leaked memory." << endl
-          << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
+      LOG(WARNING) << "Query " << PrintId(state->query_id()) << " may have leaked memory."
+          << endl << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
       DCHECK_EQ(mem_tracker()->consumption(), 0)
           << "Leaked memory." << endl
           << state->instance_mem_tracker()->LogUsage(MemTracker::UNLIMITED_DEPTH);
@@ -232,7 +232,7 @@ Status ExecNode::ClaimBufferReservation(RuntimeState* state) {
   }
 
   RETURN_IF_ERROR(buffer_pool->RegisterClient(
-      Substitute("$0 id=$1 ptr=$2", PrintPlanNodeType(type_), id_, this),
+      Substitute("$0 id=$1 ptr=$2", PrintThriftEnum(type_), id_, this),
       state->query_state()->file_group(), state->instance_buffer_reservation(),
       mem_tracker(), resource_profile_.max_reservation, runtime_profile(),
       &buffer_pool_client_));

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/hdfs-scan-node-base.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scan-node-base.cc b/be/src/exec/hdfs-scan-node-base.cc
index 81642bf..b2c3779 100644
--- a/be/src/exec/hdfs-scan-node-base.cc
+++ b/be/src/exec/hdfs-scan-node-base.cc
@@ -799,23 +799,26 @@ void HdfsScanNodeBase::StopAndFinalizeCounters() {
           if (file_format == THdfsFileFormat::PARQUET) {
             // If a scan range stored as parquet is skipped, its compression type
             // cannot be figured out without reading the data.
-            ss << file_format << "/" << "Unknown" << "(Skipped):" << file_cnt << " ";
+            ss << PrintThriftEnum(file_format) << "/" << "Unknown" << "(Skipped):"
+               << file_cnt << " ";
           } else {
-            ss << file_format << "/" << compressions_set.GetFirstType() << "(Skipped):"
+            ss << PrintThriftEnum(file_format) << "/"
+               << PrintThriftEnum(compressions_set.GetFirstType()) << "(Skipped):"
                << file_cnt << " ";
           }
         } else if (compressions_set.Size() == 1) {
-          ss << file_format << "/" << compressions_set.GetFirstType() << ":" << file_cnt
+          ss << PrintThriftEnum(file_format) << "/"
+             << PrintThriftEnum(compressions_set.GetFirstType()) << ":" << file_cnt
              << " ";
         } else {
-          ss << file_format << "/" << "(";
+          ss << PrintThriftEnum(file_format) << "/" << "(";
           bool first = true;
           for (auto& elem : _THdfsCompression_VALUES_TO_NAMES) {
             THdfsCompression::type type = static_cast<THdfsCompression::type>(
                 elem.first);
             if (!compressions_set.HasType(type)) continue;
             if (!first) ss << ",";
-            ss << type;
+            ss << PrintThriftEnum(type);
             first = false;
           }
           ss << "):" << file_cnt << " ";

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/hdfs-scan-node.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/hdfs-scan-node.cc b/be/src/exec/hdfs-scan-node.cc
index 710a8af..7c64338 100644
--- a/be/src/exec/hdfs-scan-node.cc
+++ b/be/src/exec/hdfs-scan-node.cc
@@ -158,7 +158,7 @@ Status HdfsScanNode::Init(const TPlanNode& tnode, RuntimeState* state) {
     max_materialized_row_batches_ = default_max_row_batches;
   }
   VLOG_QUERY << "Max row batch queue size for scan node '" << id_
-      << "' in fragment instance '" << state->fragment_instance_id()
+      << "' in fragment instance '" << PrintId(state->fragment_instance_id())
       << "': " << max_materialized_row_batches_;
   materialized_row_batches_.reset(new RowBatchQueue(max_materialized_row_batches_));
   return HdfsScanNodeBase::Init(tnode, state);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/parquet-column-readers.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/parquet-column-readers.cc b/be/src/exec/parquet-column-readers.cc
index 1406919..6d7c52f 100644
--- a/be/src/exec/parquet-column-readers.cc
+++ b/be/src/exec/parquet-column-readers.cc
@@ -1224,7 +1224,7 @@ bool BaseScalarColumnReader::NextLevels() {
 Status BaseScalarColumnReader::GetUnsupportedDecodingError() {
   return Status(Substitute(
       "File '$0' is corrupt: unexpected encoding: $1 for data page of column '$2'.",
-      filename(), PrintEncoding(page_encoding_), schema_element().name));
+      filename(), PrintThriftEnum(page_encoding_), schema_element().name));
 }
 
 bool BaseScalarColumnReader::NextPage() {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/exec/parquet-metadata-utils.cc
----------------------------------------------------------------------
diff --git a/be/src/exec/parquet-metadata-utils.cc b/be/src/exec/parquet-metadata-utils.cc
index 3fc6b3e..3d05fe6 100644
--- a/be/src/exec/parquet-metadata-utils.cc
+++ b/be/src/exec/parquet-metadata-utils.cc
@@ -154,7 +154,7 @@ Status ParquetMetadataUtils::ValidateRowGroupColumn(
   for (int i = 0; i < encodings.size(); ++i) {
     if (!IsEncodingSupported(encodings[i])) {
       return Status(Substitute("File '$0' uses an unsupported encoding: $1 for column "
-          "'$2'.", filename, PrintEncoding(encodings[i]), schema_element.name));
+          "'$2'.", filename, PrintThriftEnum(encodings[i]), schema_element.name));
     }
   }
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/rpc/rpc-trace.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/rpc-trace.cc b/be/src/rpc/rpc-trace.cc
index 028f397..425312f 100644
--- a/be/src/rpc/rpc-trace.cc
+++ b/be/src/rpc/rpc-trace.cc
@@ -197,7 +197,7 @@ void* RpcEventHandler::getContext(const char* fn_name, void* server_context) {
   InvocationContext* ctxt_ptr =
       new InvocationContext(MonotonicMillis(), cnxn_ctx, it->second);
   VLOG_RPC << "RPC call: " << string(fn_name) << "(from "
-           << ctxt_ptr->cnxn_ctx->network_address << ")";
+           << TNetworkAddressToString(ctxt_ptr->cnxn_ctx->network_address) << ")";
   return reinterpret_cast<void*>(ctxt_ptr);
 }
 
@@ -207,7 +207,7 @@ void RpcEventHandler::postWrite(void* ctx, const char* fn_name, uint32_t bytes)
   const string& call_name = string(fn_name);
   // TODO: bytes is always 0, how come?
   VLOG_RPC << "RPC call: " << server_name_ << ":" << call_name << " from "
-           << rpc_ctx->cnxn_ctx->network_address << " took "
+           << TNetworkAddressToString(rpc_ctx->cnxn_ctx->network_address) << " took "
            << PrettyPrinter::Print(elapsed_time * 1000L * 1000L, TUnit::TIME_NS);
   MethodDescriptor* descriptor = rpc_ctx->method_descriptor;
   delete rpc_ctx;

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/rpc/thrift-client.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-client.cc b/be/src/rpc/thrift-client.cc
index 1f8d99e..e0852eb 100644
--- a/be/src/rpc/thrift-client.cc
+++ b/be/src/rpc/thrift-client.cc
@@ -67,14 +67,14 @@ Status ThriftClientImpl::Open() {
     try {
       transport_->close();
     } catch (const TException& e) {
-      VLOG(1) << "Error closing socket to: " << address_ << ", ignoring (" << e.what()
-                << ")";
+      VLOG(1) << "Error closing socket to: " << TNetworkAddressToString(address_)
+              << ", ignoring (" << e.what() << ")";
     }
     // In certain cases in which the remote host is overloaded, this failure can
     // happen quite frequently. Let's print this error message without the stack
     // trace as there aren't many callers of this function.
     const string& err_msg = Substitute("Couldn't open transport for $0 ($1)",
-        lexical_cast<string>(address_), e.what());
+        TNetworkAddressToString(address_), e.what());
     VLOG(1) << err_msg;
     return Status::Expected(err_msg);
   }
@@ -91,7 +91,7 @@ Status ThriftClientImpl::OpenWithRetry(uint32_t num_tries, uint64_t wait_ms) {
     Status status = Open();
     if (status.ok()) return status;
 
-    LOG(INFO) << "Unable to connect to " << address_;
+    LOG(INFO) << "Unable to connect to " << TNetworkAddressToString(address_);
     if (num_tries == 0) {
       LOG(INFO) << "(Attempt " << try_count << ", will retry indefinitely)";
     } else {
@@ -109,15 +109,15 @@ void ThriftClientImpl::Close() {
   try {
     if (transport_.get() != NULL && transport_->isOpen()) transport_->close();
   } catch (const TException& e) {
-    LOG(INFO) << "Error closing connection to: " << address_ << ", ignoring (" << e.what()
-              << ")";
+    LOG(INFO) << "Error closing connection to: " << TNetworkAddressToString(address_)
+              << ", ignoring (" << e.what() << ")";
     // Forcibly close the socket (since the transport may have failed to get that far
     // during close())
     try {
       if (socket_.get() != NULL) socket_->close();
     } catch (const TException& e) {
-      LOG(INFO) << "Error closing socket to: " << address_ << ", ignoring (" << e.what()
-                << ")";
+      LOG(INFO) << "Error closing socket to: " << TNetworkAddressToString(address_)
+                << ", ignoring (" << e.what() << ")";
     }
   }
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/rpc/thrift-util.cc
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-util.cc b/be/src/rpc/thrift-util.cc
index b8326f5..c0ba537 100644
--- a/be/src/rpc/thrift-util.cc
+++ b/be/src/rpc/thrift-util.cc
@@ -156,7 +156,7 @@ Status WaitForServer(const string& host, int port, int num_retries,
   return Status("Server did not come up");
 }
 
-std::ostream& operator<<(std::ostream& out, const TColumnValue& colval) {
+void PrintTColumnValue(std::ostream& out, const TColumnValue& colval) {
   if (colval.__isset.bool_val) {
     out << ((colval.bool_val) ? "true" : "false");
   } else if (colval.__isset.double_val) {
@@ -176,7 +176,6 @@ std::ostream& operator<<(std::ostream& out, const TColumnValue& colval) {
   } else {
     out << "NULL";
   }
-  return out;
 }
 
 bool TNetworkAddressComparator(const TNetworkAddress& a, const TNetworkAddress& b) {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/rpc/thrift-util.h
----------------------------------------------------------------------
diff --git a/be/src/rpc/thrift-util.h b/be/src/rpc/thrift-util.h
index 24b0b6f..ed95a71 100644
--- a/be/src/rpc/thrift-util.h
+++ b/be/src/rpc/thrift-util.h
@@ -139,7 +139,7 @@ Status WaitForServer(const std::string& host, int port, int num_retries,
    int retry_interval_ms);
 
 /// Print a TColumnValue. If null, print "NULL".
-std::ostream& operator<<(std::ostream& out, const TColumnValue& colval);
+void PrintTColumnValue(std::ostream& out, const TColumnValue& colval);
 
 /// Compares two TNetworkAddresses alphanumerically by their host:port
 /// string representation

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/client-cache.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/client-cache.cc b/be/src/runtime/client-cache.cc
index af530f7..d066d6b 100644
--- a/be/src/runtime/client-cache.cc
+++ b/be/src/runtime/client-cache.cc
@@ -43,7 +43,7 @@ Status ClientCacheHelper::GetClient(const TNetworkAddress& address,
   shared_ptr<PerHostCache> host_cache;
   {
     lock_guard<mutex> lock(cache_lock_);
-    VLOG(2) << "GetClient(" << address << ")";
+    VLOG(2) << "GetClient(" << TNetworkAddressToString(address) << ")";
     shared_ptr<PerHostCache>* ptr = &per_host_caches_[address];
     if (ptr->get() == NULL) ptr->reset(new PerHostCache());
     host_cache = *ptr;
@@ -53,7 +53,8 @@ Status ClientCacheHelper::GetClient(const TNetworkAddress& address,
     lock_guard<mutex> lock(host_cache->lock);
     if (!host_cache->clients.empty()) {
       *client_key = host_cache->clients.front();
-      VLOG(2) << "GetClient(): returning cached client for " << address;
+      VLOG(2) << "GetClient(): returning cached client for " <<
+          TNetworkAddressToString(address);
       host_cache->clients.pop_front();
       if (metrics_enabled_) clients_in_use_metric_->Increment(1);
       return Status::OK();
@@ -78,7 +79,8 @@ Status ClientCacheHelper::ReopenClient(ClientFactory factory_method,
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(1) << "ReopenClient(): re-creating client for " << client_impl->address();
+  VLOG(1) << "ReopenClient(): re-creating client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   client_impl->Close();
 
@@ -109,7 +111,8 @@ Status ClientCacheHelper::ReopenClient(ClientFactory factory_method,
 Status ClientCacheHelper::CreateClient(const TNetworkAddress& address,
     ClientFactory factory_method, ClientKey* client_key) {
   shared_ptr<ThriftClientImpl> client_impl(factory_method(address, client_key));
-  VLOG(2) << "CreateClient(): creating new client for " << client_impl->address();
+  VLOG(2) << "CreateClient(): creating new client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   if (!client_impl->init_status().ok()) {
     *client_key = nullptr;
@@ -145,7 +148,8 @@ void ClientCacheHelper::ReleaseClient(ClientKey* client_key) {
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(2) << "Releasing client for " << client_impl->address() << " back to cache";
+  VLOG(2) << "Releasing client for " << TNetworkAddressToString(client_impl->address())
+      << " back to cache";
   {
     lock_guard<mutex> lock(cache_lock_);
     PerHostCacheMap::iterator cache = per_host_caches_.find(client_impl->address());
@@ -167,7 +171,8 @@ void ClientCacheHelper::DestroyClient(ClientKey* client_key) {
     DCHECK(client != client_map_.end());
     client_impl = client->second;
   }
-  VLOG(1) << "Broken Connection, destroy client for " << client_impl->address();
+  VLOG(1) << "Broken Connection, destroy client for " <<
+      TNetworkAddressToString(client_impl->address());
 
   client_impl->Close();
   if (metrics_enabled_) total_clients_metric_->Increment(-1);
@@ -188,7 +193,7 @@ void ClientCacheHelper::CloseConnections(const TNetworkAddress& address) {
 
   {
     VLOG(2) << "Invalidating all " << cache->clients.size() << " clients for: "
-            << address;
+            << TNetworkAddressToString(address);
     lock_guard<mutex> entry_lock(cache->lock);
     lock_guard<mutex> map_lock(client_map_lock_);
     for (ClientKey client_key: cache->clients) {
@@ -208,7 +213,7 @@ string ClientCacheHelper::DebugString() {
   for (const PerHostCacheMap::value_type& cache: per_host_caches_) {
     lock_guard<mutex> host_cache_lock(cache.second->lock);
     if (!first) out << " ";
-    out << cache.first << ":" << cache.second->clients.size();
+    out << TNetworkAddressToString(cache.first) << ":" << cache.second->clients.size();
     first = false;
   }
   out << "])";

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/coordinator-backend-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator-backend-state.cc b/be/src/runtime/coordinator-backend-state.cc
index e8db00e..c701f8e 100644
--- a/be/src/runtime/coordinator-backend-state.cc
+++ b/be/src/runtime/coordinator-backend-state.cc
@@ -58,7 +58,7 @@ void Coordinator::BackendState::Init(
   int prev_fragment_idx = -1;
   for (const FInstanceExecParams* instance_params:
        backend_exec_params_->instance_params) {
-    DCHECK_EQ(host_, instance_params->host);  // all hosts must be the same
+    DCHECK(host_ == instance_params->host);  // all hosts must be the same
     int fragment_idx = instance_params->fragment().idx;
     DCHECK_LT(fragment_idx, fragment_stats.size());
     if (prev_fragment_idx != -1 && fragment_idx != prev_fragment_idx) {
@@ -157,7 +157,8 @@ void Coordinator::BackendState::Exec(
   rpc_params.__set_query_ctx(query_ctx);
   SetRpcParams(debug_options, filter_routing_table, &rpc_params);
   VLOG_FILE << "making rpc: ExecQueryFInstances"
-      << " host=" << impalad_address() << " query_id=" << PrintId(query_id_);
+      << " host=" << TNetworkAddressToString(impalad_address()) << " query_id="
+      << PrintId(query_id_);
 
   // guard against concurrent UpdateBackendExecStatus() that may arrive after RPC returns
   lock_guard<mutex> l(lock_);
@@ -223,8 +224,9 @@ void Coordinator::BackendState::LogFirstInProgress(
   for (Coordinator::BackendState* backend_state : backend_states) {
     lock_guard<mutex> l(backend_state->lock_);
     if (!backend_state->IsDone()) {
-      VLOG_QUERY << "query_id=" << backend_state->query_id_
-                 << ": first in-progress backend: " << backend_state->impalad_address();
+      VLOG_QUERY << "query_id=" << PrintId(backend_state->query_id_)
+                 << ": first in-progress backend: "
+                 << TNetworkAddressToString(backend_state->impalad_address());
       break;
     }
   }
@@ -249,7 +251,7 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     int instance_idx = GetInstanceIdx(instance_exec_status.fragment_instance_id);
     DCHECK_EQ(instance_stats_map_.count(instance_idx), 1);
     InstanceStats* instance_stats = instance_stats_map_[instance_idx];
-    DCHECK_EQ(instance_stats->exec_params_.instance_id,
+    DCHECK(instance_stats->exec_params_.instance_id ==
         instance_exec_status.fragment_instance_id);
     // Ignore duplicate or out-of-order messages.
     if (instance_stats->done_) continue;
@@ -304,7 +306,8 @@ bool Coordinator::BackendState::ApplyExecStatusReport(
     // Append the log messages from each update with the global state of the query
     // execution
     MergeErrorMaps(backend_exec_status.error_log, &error_log_);
-    VLOG_FILE << "host=" << host_ << " error log: " << PrintErrorMapToString(error_log_);
+    VLOG_FILE << "host=" << TNetworkAddressToString(host_) << " error log: " <<
+        PrintErrorMapToString(error_log_);
   }
 
   // TODO: keep backend-wide stopwatch?
@@ -349,8 +352,8 @@ bool Coordinator::BackendState::Cancel() {
   params.protocol_version = ImpalaInternalServiceVersion::V1;
   params.__set_query_id(query_id_);
   TCancelQueryFInstancesResult dummy;
-  VLOG_QUERY << "sending CancelQueryFInstances rpc for query_id="
-             << query_id_ << " backend=" << TNetworkAddressToString(impalad_address());
+  VLOG_QUERY << "sending CancelQueryFInstances rpc for query_id=" << PrintId(query_id_) <<
+      " backend=" << TNetworkAddressToString(impalad_address());
 
   Status rpc_status;
   Status client_status;
@@ -370,14 +373,14 @@ bool Coordinator::BackendState::Cancel() {
   }
   if (!client_status.ok()) {
     status_.MergeStatus(client_status);
-    VLOG_QUERY << "CancelQueryFInstances query_id= " << query_id_
+    VLOG_QUERY << "CancelQueryFInstances query_id= " << PrintId(query_id_)
                << " failed to connect to " << TNetworkAddressToString(impalad_address())
                << " :" << client_status.msg().msg();
     return true;
   }
   if (!rpc_status.ok()) {
     status_.MergeStatus(rpc_status);
-    VLOG_QUERY << "CancelQueryFInstances query_id= " << query_id_
+    VLOG_QUERY << "CancelQueryFInstances query_id= " << PrintId(query_id_)
                << " rpc to " << TNetworkAddressToString(impalad_address())
                << " failed: " << rpc_status.msg().msg();
     return true;
@@ -386,7 +389,7 @@ bool Coordinator::BackendState::Cancel() {
 }
 
 void Coordinator::BackendState::PublishFilter(const TPublishFilterParams& rpc_params) {
-  DCHECK_EQ(rpc_params.dst_query_id, query_id_);
+  DCHECK(rpc_params.dst_query_id == query_id_);
   {
     // If the backend is already done, it's not waiting for this filter, so we skip
     // sending it in this case.
@@ -412,7 +415,7 @@ Coordinator::BackendState::InstanceStats::InstanceStats(
   : exec_params_(exec_params),
     profile_(nullptr) {
   const string& profile_name = Substitute("Instance $0 (host=$1)",
-      PrintId(exec_params.instance_id), lexical_cast<string>(exec_params.host));
+      PrintId(exec_params.instance_id), TNetworkAddressToString(exec_params.host));
   profile_ = RuntimeProfile::Create(obj_pool, profile_name);
   fragment_stats->root_profile()->AddChild(profile_);
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/coordinator.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/coordinator.cc b/be/src/runtime/coordinator.cc
index 5a3de5d..d6a70e7 100644
--- a/be/src/runtime/coordinator.cc
+++ b/be/src/runtime/coordinator.cc
@@ -90,7 +90,7 @@ Status Coordinator::Exec() {
   const TQueryExecRequest& request = schedule_.request();
   DCHECK(request.plan_exec_info.size() > 0);
 
-  VLOG_QUERY << "Exec() query_id=" << query_id()
+  VLOG_QUERY << "Exec() query_id=" << PrintId(query_id())
              << " stmt=" << request.query_ctx.client_request.stmt;
   stmt_type_ = request.stmt_type;
 
@@ -347,7 +347,7 @@ void Coordinator::StartBackendExec() {
   DebugOptions debug_options(schedule_.query_options());
 
   VLOG_QUERY << "starting execution on " << num_backends << " backends for query_id="
-             << query_id();
+             << PrintId(query_id());
   query_events_->MarkEvent(Substitute("Ready to start on $0 backends", num_backends));
 
   for (BackendState* backend_state: backend_states_) {
@@ -360,7 +360,7 @@ void Coordinator::StartBackendExec() {
 
   exec_complete_barrier_->Wait();
   VLOG_QUERY << "started execution on " << num_backends << " backends for query_id="
-             << query_id();
+             << PrintId(query_id());
   query_events_->MarkEvent(
       Substitute("All $0 execution backends ($1 fragment instances) started",
         num_backends, schedule_.GetNumFragmentInstances()));
@@ -472,10 +472,11 @@ Status Coordinator::UpdateStatus(const Status& status, const string& backend_hos
 
   if (is_fragment_failure) {
     // Log the id of the fragment that first failed so we can track it down more easily.
-    VLOG_QUERY << "query_id=" << query_id() << " failed because fragment_instance_id="
-               << instance_id << " on host=" << backend_hostname << " failed.";
+    VLOG_QUERY << "query_id=" << PrintId(query_id())
+               << " failed because fragment_instance_id=" << PrintId(instance_id)
+               << " on host=" << backend_hostname << " failed.";
   } else {
-    VLOG_QUERY << "query_id=" << query_id() << " failed due to error on host="
+    VLOG_QUERY << "query_id=" << PrintId(query_id()) << " failed due to error on host="
                << backend_hostname;
   }
   return query_status_;
@@ -488,7 +489,7 @@ Status Coordinator::FinalizeHdfsInsert() {
   DCHECK(has_called_wait_);
   DCHECK(finalize_params() != nullptr);
 
-  VLOG_QUERY << "Finalizing query: " << query_id();
+  VLOG_QUERY << "Finalizing query: " << PrintId(query_id());
   SCOPED_TIMER(finalization_timer_);
   Status return_status = GetStatus();
   if (return_status.ok()) {
@@ -520,14 +521,15 @@ Status Coordinator::WaitForBackendCompletion() {
   unique_lock<mutex> l(lock_);
   while (num_remaining_backends_ > 0 && query_status_.ok()) {
     VLOG_QUERY << "Coordinator waiting for backends to finish, "
-               << num_remaining_backends_ << " remaining. query_id=" << query_id();
+               << num_remaining_backends_ << " remaining. query_id="
+               << PrintId(query_id());
     backend_completion_cv_.Wait(l);
   }
   if (query_status_.ok()) {
-    VLOG_QUERY << "All backends finished successfully. query_id=" << query_id();
+    VLOG_QUERY << "All backends finished successfully. query_id=" << PrintId(query_id());
   } else {
     VLOG_QUERY << "All backends finished due to one or more errors. query_id="
-               << query_id() << ". " << query_status_.GetDetail();
+               << PrintId(query_id()) << ". " << query_status_.GetDetail();
   }
 
   return query_status_;
@@ -572,7 +574,7 @@ Status Coordinator::Wait() {
 }
 
 Status Coordinator::GetNext(QueryResultSet* results, int max_rows, bool* eos) {
-  VLOG_ROW << "GetNext() query_id=" << query_id();
+  VLOG_ROW << "GetNext() query_id=" << PrintId(query_id());
   DCHECK(has_called_wait_);
   SCOPED_TIMER(query_profile_->total_time_counter());
 
@@ -626,7 +628,7 @@ void Coordinator::Cancel(const Status* cause) {
 }
 
 void Coordinator::CancelInternal() {
-  VLOG_QUERY << "Cancel() query_id=" << query_id();
+  VLOG_QUERY << "Cancel() query_id=" << PrintId(query_id());
   // TODO: remove when restructuring cancellation, which should happen automatically
   // as soon as the coordinator knows that the query is finished
   DCHECK(!query_status_.ok());
@@ -687,9 +689,9 @@ Status Coordinator::UpdateBackendExecStatus(const TReportExecStatusParams& param
     DCHECK_GT(num_remaining_backends_, 0);
     if (VLOG_QUERY_IS_ON && num_remaining_backends_ > 1) {
       VLOG_QUERY << "Backend completed: "
-          << " host=" << backend_state->impalad_address()
+          << " host=" << TNetworkAddressToString(backend_state->impalad_address())
           << " remaining=" << num_remaining_backends_ - 1
-          << " query_id=" << query_id();
+          << " query_id=" << PrintId(query_id());
       BackendState::LogFirstInProgress(backend_states_);
     }
     if (--num_remaining_backends_ == 0 || !status.ok()) {
@@ -729,7 +731,7 @@ void Coordinator::ComputeQuerySummary() {
 
   stringstream info;
   for (BackendState* backend_state: backend_states_) {
-    info << backend_state->impalad_address() << "("
+    info << TNetworkAddressToString(backend_state->impalad_address()) << "("
          << PrettyPrinter::Print(backend_state->GetPeakConsumption(), TUnit::BYTES)
          << ") ";
   }
@@ -892,7 +894,7 @@ void Coordinator::FilterState::ApplyUpdate(const TUpdateFilterParams& params,
       if (!coord->filter_mem_tracker_->TryConsume(heap_space)) {
         VLOG_QUERY << "Not enough memory to allocate filter: "
                    << PrettyPrinter::Print(heap_space, TUnit::BYTES)
-                   << " (query_id=" << coord->query_id() << ")";
+                   << " (query_id=" << PrintId(coord->query_id()) << ")";
         // Disable, as one missing update means a correct filter cannot be produced.
         Disable(coord->filter_mem_tracker_);
       } else {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/data-stream-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-mgr.cc b/be/src/runtime/data-stream-mgr.cc
index ed1e29e..8f55f0a 100644
--- a/be/src/runtime/data-stream-mgr.cc
+++ b/be/src/runtime/data-stream-mgr.cc
@@ -82,7 +82,7 @@ shared_ptr<DataStreamRecvrBase> DataStreamMgr::CreateRecvr(const RowDescriptor*
   DCHECK(profile != nullptr);
   DCHECK(parent_tracker != nullptr);
   VLOG_FILE << "creating receiver for fragment_instance_id="
-            << fragment_instance_id << ", node=" << dest_node_id;
+            << PrintId(fragment_instance_id) << ", node=" << dest_node_id;
   shared_ptr<DataStreamRecvr> recvr(new DataStreamRecvr(this, parent_tracker, row_desc,
       fragment_instance_id, dest_node_id, num_senders, is_merging, buffer_size, profile));
   size_t hash_value = GetHashValue(fragment_instance_id, dest_node_id);
@@ -127,9 +127,9 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvrOrWait(
   const string& time_taken = PrettyPrinter::Print(sw.ElapsedTime(), TUnit::TIME_NS);
   if (timed_out) {
     LOG(INFO) << "Datastream sender timed-out waiting for recvr for fragment_instance_id="
-              << fragment_instance_id << " (time-out was: " << time_taken << "). "
-              << "Increase --datastream_sender_timeout_ms if you see this message "
-              << "frequently.";
+              << PrintId(fragment_instance_id) << " (time-out was: " << time_taken <<
+              "). Increase --datastream_sender_timeout_ms if you see this message "
+              "frequently.";
   } else {
     VLOG_RPC << "Datastream sender waited for " << time_taken
              << ", and did not time-out.";
@@ -148,7 +148,7 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvrOrWait(
 
 shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvr(
     const TUniqueId& fragment_instance_id, PlanNodeId node_id, bool acquire_lock) {
-  VLOG_ROW << "looking up fragment_instance_id=" << fragment_instance_id
+  VLOG_ROW << "looking up fragment_instance_id=" << PrintId(fragment_instance_id)
            << ", node=" << node_id;
   size_t hash_value = GetHashValue(fragment_instance_id, node_id);
   if (acquire_lock) lock_.lock();
@@ -169,7 +169,7 @@ shared_ptr<DataStreamRecvr> DataStreamMgr::FindRecvr(
 
 Status DataStreamMgr::AddData(const TUniqueId& fragment_instance_id,
     PlanNodeId dest_node_id, const TRowBatch& thrift_batch, int sender_id) {
-  VLOG_ROW << "AddData(): fragment_instance_id=" << fragment_instance_id
+  VLOG_ROW << "AddData(): fragment_instance_id=" << PrintId(fragment_instance_id)
            << " node=" << dest_node_id
            << " size=" << RowBatch::GetDeserializedSize(thrift_batch);
   bool already_unregistered;
@@ -197,7 +197,7 @@ Status DataStreamMgr::AddData(const TUniqueId& fragment_instance_id,
 
 Status DataStreamMgr::CloseSender(const TUniqueId& fragment_instance_id,
     PlanNodeId dest_node_id, int sender_id) {
-  VLOG_FILE << "CloseSender(): fragment_instance_id=" << fragment_instance_id
+  VLOG_FILE << "CloseSender(): fragment_instance_id=" << PrintId(fragment_instance_id)
             << ", node=" << dest_node_id;
   Status status;
   bool already_unregistered;
@@ -243,7 +243,7 @@ Status DataStreamMgr::CloseSender(const TUniqueId& fragment_instance_id,
 
 Status DataStreamMgr::DeregisterRecvr(
     const TUniqueId& fragment_instance_id, PlanNodeId node_id) {
-  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << fragment_instance_id
+  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << PrintId(fragment_instance_id)
              << ", node=" << node_id;
   size_t hash_value = GetHashValue(fragment_instance_id, node_id);
   lock_guard<mutex> l(lock_);
@@ -268,7 +268,7 @@ Status DataStreamMgr::DeregisterRecvr(
   }
 
   stringstream err;
-  err << "unknown row receiver id: fragment_instance_id=" << fragment_instance_id
+  err << "unknown row receiver id: fragment_instance_id=" << PrintId(fragment_instance_id)
       << " node_id=" << node_id;
   LOG(ERROR) << err.str();
   return Status(err.str());
@@ -276,7 +276,7 @@ Status DataStreamMgr::DeregisterRecvr(
 
 void DataStreamMgr::Cancel(const TUniqueId& fragment_instance_id) {
   VLOG_QUERY << "cancelling all streams for fragment_instance_id="
-             << fragment_instance_id;
+             << PrintId(fragment_instance_id);
   lock_guard<mutex> l(lock_);
   FragmentRecvrSet::iterator i =
       fragment_recvr_set_.lower_bound(make_pair(fragment_instance_id, 0));
@@ -285,7 +285,7 @@ void DataStreamMgr::Cancel(const TUniqueId& fragment_instance_id) {
     if (recvr.get() == NULL) {
       // keep going but at least log it
       stringstream err;
-      err << "Cancel(): missing in stream_map: fragment_instance_id=" << i->first
+      err << "Cancel(): missing in stream_map: fragment_instance_id=" << PrintId(i->first)
           << " node=" << i->second;
       LOG(ERROR) << err.str();
     } else {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/data-stream-recvr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-recvr.cc b/be/src/runtime/data-stream-recvr.cc
index cdea4a0..8d9047f 100644
--- a/be/src/runtime/data-stream-recvr.cc
+++ b/be/src/runtime/data-stream-recvr.cc
@@ -112,7 +112,8 @@ Status DataStreamRecvr::SenderQueue::GetBatch(RowBatch** next_batch) {
   unique_lock<mutex> l(lock_);
   // wait until something shows up or we know we're done
   while (!is_cancelled_ && batch_queue_.empty() && num_remaining_senders_ > 0) {
-    VLOG_ROW << "wait arrival fragment_instance_id=" << recvr_->fragment_instance_id()
+    VLOG_ROW << "wait arrival fragment_instance_id="
+             << PrintId(recvr_->fragment_instance_id())
              << " node=" << recvr_->dest_node_id();
     // Don't count time spent waiting on the sender as active time.
     CANCEL_SAFE_SCOPED_TIMER(recvr_->data_arrival_timer_, &is_cancelled_);
@@ -221,7 +222,7 @@ void DataStreamRecvr::SenderQueue::DecrementSenders() {
   DCHECK_GT(num_remaining_senders_, 0);
   num_remaining_senders_ = max(0, num_remaining_senders_ - 1);
   VLOG_FILE << "decremented senders: fragment_instance_id="
-            << recvr_->fragment_instance_id()
+            << PrintId(recvr_->fragment_instance_id())
             << " node_id=" << recvr_->dest_node_id()
             << " #senders=" << num_remaining_senders_;
   if (num_remaining_senders_ == 0) data_arrival_cv_.NotifyOne();
@@ -233,7 +234,7 @@ void DataStreamRecvr::SenderQueue::Cancel() {
     if (is_cancelled_) return;
     is_cancelled_ = true;
     VLOG_QUERY << "cancelled stream: fragment_instance_id_="
-               << recvr_->fragment_instance_id()
+               << PrintId(recvr_->fragment_instance_id())
                << " node_id=" << recvr_->dest_node_id();
   }
   // Wake up all threads waiting to produce/consume batches.  They will all

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/data-stream-sender.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/data-stream-sender.cc b/be/src/runtime/data-stream-sender.cc
index f68788e..7d0766d 100644
--- a/be/src/runtime/data-stream-sender.cc
+++ b/be/src/runtime/data-stream-sender.cc
@@ -165,8 +165,9 @@ Status DataStreamSender::Channel::Init(RuntimeState* state) {
 }
 
 Status DataStreamSender::Channel::SendBatch(TRowBatch* batch) {
-  VLOG_ROW << "Channel::SendBatch() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_ << " #rows=" << batch->num_rows;
+  VLOG_ROW << "Channel::SendBatch() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
+           << " #rows=" << batch->num_rows;
   // return if the previous batch saw an error
   RETURN_IF_ERROR(GetSendStatus());
   {
@@ -193,8 +194,8 @@ void DataStreamSender::Channel::TransmitData(int thread_id, const TRowBatch* bat
 
 void DataStreamSender::Channel::TransmitDataHelper(const TRowBatch* batch) {
   DCHECK(batch != NULL);
-  VLOG_ROW << "Channel::TransmitData() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_ROW << "Channel::TransmitData() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows=" << batch->num_rows;
   TTransmitDataParams params;
   params.protocol_version = ImpalaInternalServiceVersion::V1;
@@ -276,15 +277,15 @@ Status DataStreamSender::Channel::GetSendStatus() {
   WaitForRpc();
   if (!rpc_status_.ok()) {
     LOG(ERROR) << "channel send to " << TNetworkAddressToString(address_) << " failed "
-               << "(fragment_instance_id=" << fragment_instance_id_ << "): "
+               << "(fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
   }
   return rpc_status_;
 }
 
 Status DataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
-  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows= " << batch_->num_rows();
 
   // We can return an error here and not go on to send the EOS RPC because the error that
@@ -314,7 +315,7 @@ Status DataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
   rpc_status_ = DoTransmitDataRpc(&client, params, &res);
   if (!rpc_status_.ok()) {
     LOG(ERROR) << "Failed to send EOS to " << TNetworkAddressToString(address_)
-               << " (fragment_instance_id=" << fragment_instance_id_ << "): "
+               << " (fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
     return rpc_status_;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/fragment-instance-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/fragment-instance-state.cc b/be/src/runtime/fragment-instance-state.cc
index 1a0d452..a6ae1ff 100644
--- a/be/src/runtime/fragment-instance-state.cc
+++ b/be/src/runtime/fragment-instance-state.cc
@@ -361,7 +361,7 @@ void FragmentInstanceState::ReportProfileThread() {
     SendReport(false, Status::OK());
   }
 
-  VLOG_FILE << "exiting reporting thread: instance_id=" << instance_id();
+  VLOG_FILE << "exiting reporting thread: instance_id=" << PrintId(instance_id());
 }
 
 void FragmentInstanceState::SendReport(bool done, const Status& status) {
@@ -370,7 +370,7 @@ void FragmentInstanceState::SendReport(bool done, const Status& status) {
 
   if (VLOG_FILE_IS_ON) {
     VLOG_FILE << "Reporting " << (done ? "final " : "") << "profile for instance "
-        << runtime_state_->fragment_instance_id();
+        << PrintId(runtime_state_->fragment_instance_id());
     stringstream ss;
     profile()->PrettyPrint(&ss);
     VLOG_FILE << ss.str();
@@ -554,5 +554,5 @@ void FragmentInstanceState::PrintVolumeIds() {
   profile()->AddInfoString(HdfsScanNodeBase::HDFS_SPLIT_STATS_DESC, str.str());
   VLOG_FILE
       << "Hdfs split stats (<volume id>:<# splits>/<split lengths>) for query="
-      << query_id() << ":\n" << str.str();
+      << PrintId(query_id()) << ":\n" << str.str();
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/krpc-data-stream-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-mgr.cc b/be/src/runtime/krpc-data-stream-mgr.cc
index cd8d90b..2aca6c8 100644
--- a/be/src/runtime/krpc-data-stream-mgr.cc
+++ b/be/src/runtime/krpc-data-stream-mgr.cc
@@ -104,7 +104,7 @@ shared_ptr<DataStreamRecvrBase> KrpcDataStreamMgr::CreateRecvr(
   DCHECK(profile != nullptr);
   DCHECK(parent_tracker != nullptr);
   DCHECK(client != nullptr);
-  VLOG_FILE << "creating receiver for fragment_instance_id="<< finst_id
+  VLOG_FILE << "creating receiver for fragment_instance_id="<< PrintId(finst_id)
             << ", node=" << dest_node_id;
   shared_ptr<KrpcDataStreamRecvr> recvr(new KrpcDataStreamRecvr(
       this, parent_tracker, row_desc, finst_id, dest_node_id, num_senders, is_merging,
@@ -149,7 +149,7 @@ shared_ptr<DataStreamRecvrBase> KrpcDataStreamMgr::CreateRecvr(
 
 shared_ptr<KrpcDataStreamRecvr> KrpcDataStreamMgr::FindRecvr(
     const TUniqueId& finst_id, PlanNodeId dest_node_id, bool* already_unregistered) {
-  VLOG_ROW << "looking up fragment_instance_id=" << finst_id
+  VLOG_ROW << "looking up fragment_instance_id=" << PrintId(finst_id)
            << ", node=" << dest_node_id;
   *already_unregistered = false;
   uint32_t hash_value = GetHashValue(finst_id, dest_node_id);
@@ -290,7 +290,7 @@ void KrpcDataStreamMgr::CloseSender(const EndDataStreamRequestPB* request,
 
 Status KrpcDataStreamMgr::DeregisterRecvr(
     const TUniqueId& finst_id, PlanNodeId dest_node_id) {
-  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << finst_id
+  VLOG_QUERY << "DeregisterRecvr(): fragment_instance_id=" << PrintId(finst_id)
              << ", node=" << dest_node_id;
   uint32_t hash_value = GetHashValue(finst_id, dest_node_id);
   lock_guard<mutex> l(lock_);
@@ -321,7 +321,7 @@ Status KrpcDataStreamMgr::DeregisterRecvr(
 }
 
 void KrpcDataStreamMgr::Cancel(const TUniqueId& finst_id) {
-  VLOG_QUERY << "cancelling all streams for fragment_instance_id=" << finst_id;
+  VLOG_QUERY << "cancelling all streams for fragment_instance_id=" << PrintId(finst_id);
   lock_guard<mutex> l(lock_);
   FragmentRecvrSet::iterator iter =
       fragment_recvr_set_.lower_bound(make_pair(finst_id, 0));

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/krpc-data-stream-recvr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-recvr.cc b/be/src/runtime/krpc-data-stream-recvr.cc
index 6e47bd6..be51f32 100644
--- a/be/src/runtime/krpc-data-stream-recvr.cc
+++ b/be/src/runtime/krpc-data-stream-recvr.cc
@@ -232,7 +232,8 @@ Status KrpcDataStreamRecvr::SenderQueue::GetBatch(RowBatch** next_batch) {
       // is pending insertion so this thread is guaranteed to wake up at some point.
       DCHECK(deferred_rpcs_.empty() ||
           (num_deserialize_tasks_pending_ + num_pending_enqueue_) > 0);
-      VLOG_ROW << "wait arrival fragment_instance_id=" << recvr_->fragment_instance_id()
+      VLOG_ROW << "wait arrival fragment_instance_id="
+               << PrintId(recvr_->fragment_instance_id())
                << " node=" << recvr_->dest_node_id();
       // Don't count time spent waiting on the sender as active time.
       CANCEL_SAFE_SCOPED_TIMER(recvr_->data_wait_timer_, &is_cancelled_);
@@ -534,7 +535,7 @@ void KrpcDataStreamRecvr::SenderQueue::DecrementSenders() {
   DCHECK_GT(num_remaining_senders_, 0);
   num_remaining_senders_ = max(0, num_remaining_senders_ - 1);
   VLOG_FILE << "decremented senders: fragment_instance_id="
-            << recvr_->fragment_instance_id()
+            << PrintId(recvr_->fragment_instance_id())
             << " node_id=" << recvr_->dest_node_id()
             << " #senders=" << num_remaining_senders_;
   if (num_remaining_senders_ == 0) data_arrival_cv_.notify_one();
@@ -555,7 +556,7 @@ void KrpcDataStreamRecvr::SenderQueue::Cancel() {
     }
   }
   VLOG_QUERY << "cancelled stream: fragment_instance_id="
-             << recvr_->fragment_instance_id()
+             << PrintId(recvr_->fragment_instance_id())
              << " node_id=" << recvr_->dest_node_id();
   // Wake up all threads waiting to produce/consume batches. They will all
   // notice that the stream is cancelled and handle it.

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/krpc-data-stream-sender.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/krpc-data-stream-sender.cc b/be/src/runtime/krpc-data-stream-sender.cc
index 0f11dec..cd30f06 100644
--- a/be/src/runtime/krpc-data-stream-sender.cc
+++ b/be/src/runtime/krpc-data-stream-sender.cc
@@ -333,7 +333,7 @@ Status KrpcDataStreamSender::Channel::WaitForRpc(std::unique_lock<SpinLock>* loc
   DCHECK(!rpc_in_flight_);
   if (UNLIKELY(!rpc_status_.ok())) {
     LOG(ERROR) << "channel send to " << TNetworkAddressToString(address_) << " failed: "
-               << "(fragment_instance_id=" << fragment_instance_id_ << "): "
+               << "(fragment_instance_id=" << PrintId(fragment_instance_id_) << "): "
                << rpc_status_.GetDetail();
     return rpc_status_;
   }
@@ -449,8 +449,8 @@ Status KrpcDataStreamSender::Channel::DoTransmitDataRpc() {
 
 Status KrpcDataStreamSender::Channel::TransmitData(
     const OutboundRowBatch* outbound_batch) {
-  VLOG_ROW << "Channel::TransmitData() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_ROW << "Channel::TransmitData() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows=" << outbound_batch->header()->num_rows();
   std::unique_lock<SpinLock> l(lock_);
   RETURN_IF_ERROR(WaitForRpc(&l));
@@ -529,8 +529,8 @@ Status KrpcDataStreamSender::Channel::DoEndDataStreamRpc() {
 }
 
 Status KrpcDataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
-  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id=" << fragment_instance_id_
-           << " dest_node=" << dest_node_id_
+  VLOG_RPC << "Channel::FlushAndSendEos() fragment_instance_id="
+           << PrintId(fragment_instance_id_) << " dest_node=" << dest_node_id_
            << " #rows= " << batch_->num_rows();
 
   // We can return an error here and not go on to send the EOS RPC because the error that
@@ -544,7 +544,7 @@ Status KrpcDataStreamSender::Channel::FlushAndSendEos(RuntimeState* state) {
     DCHECK(rpc_status_.ok());
     if (UNLIKELY(remote_recvr_closed_)) return Status::OK();
     VLOG_RPC << "calling EndDataStream() to terminate channel. fragment_instance_id="
-             << fragment_instance_id_;
+             << PrintId(fragment_instance_id_);
     rpc_in_flight_ = true;
     COUNTER_ADD(parent_->eos_sent_counter_, 1);
     RETURN_IF_ERROR(DoEndDataStreamRpc());

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/mem-tracker.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/mem-tracker.cc b/be/src/runtime/mem-tracker.cc
index e5aa290..96c02a2 100644
--- a/be/src/runtime/mem-tracker.cc
+++ b/be/src/runtime/mem-tracker.cc
@@ -196,7 +196,7 @@ MemTracker* MemTracker::CreateQueryMemTracker(const TUniqueId& id,
       ExecEnv::GetInstance()->pool_mem_trackers()->GetRequestPoolMemTracker(
           pool_name, true);
   MemTracker* tracker = obj_pool->Add(new MemTracker(
-      byte_limit, Substitute("Query($0)", lexical_cast<string>(id)), pool_tracker));
+      byte_limit, Substitute("Query($0)", PrintId(id)), pool_tracker));
   tracker->is_query_mem_tracker_ = true;
   tracker->query_id_ = id;
   return tracker;
@@ -370,7 +370,7 @@ Status MemTracker::MemLimitExceeded(RuntimeState* state, const std::string& deta
        << " without exceeding limit." << endl;
   }
   ss << "Error occurred on backend " << GetBackendString();
-  if (state != nullptr) ss << " by fragment " << state->fragment_instance_id();
+  if (state != nullptr) ss << " by fragment " << PrintId(state->fragment_instance_id());
   ss << endl;
   ExecEnv* exec_env = ExecEnv::GetInstance();
   MemTracker* process_tracker = exec_env->process_mem_tracker();

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/query-exec-mgr.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-exec-mgr.cc b/be/src/runtime/query-exec-mgr.cc
index 967dc4b..2d66f57 100644
--- a/be/src/runtime/query-exec-mgr.cc
+++ b/be/src/runtime/query-exec-mgr.cc
@@ -44,7 +44,7 @@ DEFINE_int32(log_mem_usage_interval, 0, "If non-zero, impalad will output memory
 Status QueryExecMgr::StartQuery(const TExecQueryFInstancesParams& params) {
   TUniqueId query_id = params.query_ctx.query_id;
   VLOG_QUERY << "StartQueryFInstances() query_id=" << PrintId(query_id)
-             << " coord=" << params.query_ctx.coord_address;
+             << " coord=" << TNetworkAddressToString(params.query_ctx.coord_address);
 
   bool dummy;
   QueryState* qs = GetOrCreateQueryState(params.query_ctx, &dummy);
@@ -92,7 +92,7 @@ QueryState* QueryExecMgr::GetQueryState(const TUniqueId& query_id) {
     refcnt = qs->refcnt_.Add(1);
   }
   DCHECK(qs != nullptr && refcnt > 0);
-  VLOG_QUERY << "QueryState: query_id=" << query_id << " refcnt=" << refcnt;
+  VLOG_QUERY << "QueryState: query_id=" << PrintId(query_id) << " refcnt=" << refcnt;
   return qs;
 }
 
@@ -167,7 +167,7 @@ void QueryExecMgr::ReleaseQueryState(QueryState* qs) {
     // someone else might have gc'd the entry
     if (it == map_ref->end()) return;
     qs_from_map = it->second;
-    DCHECK_EQ(qs_from_map->query_ctx().query_id, query_id);
+    DCHECK(qs_from_map->query_ctx().query_id == query_id);
     int32_t cnt = qs_from_map->refcnt_.Load();
     DCHECK_GE(cnt, 0);
     // someone else might have increased the refcnt in the meantime

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/query-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/query-state.cc b/be/src/runtime/query-state.cc
index 04a4283..22616ed 100644
--- a/be/src/runtime/query-state.cc
+++ b/be/src/runtime/query-state.cc
@@ -413,7 +413,7 @@ void QueryState::ExecFInstance(FragmentInstanceState* fis) {
 }
 
 void QueryState::Cancel() {
-  VLOG_QUERY << "Cancel: query_id=" << query_id();
+  VLOG_QUERY << "Cancel: query_id=" << PrintId(query_id());
   (void) instances_prepared_promise_.Get();
   if (!is_cancelled_.CompareAndSwap(0, 1)) return;
   for (auto entry: fis_map_) entry.second->Cancel();

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/runtime-filter-bank.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/runtime-filter-bank.cc b/be/src/runtime/runtime-filter-bank.cc
index 4e23a42..64638a6 100644
--- a/be/src/runtime/runtime-filter-bank.cc
+++ b/be/src/runtime/runtime-filter-bank.cc
@@ -263,7 +263,8 @@ void RuntimeFilterBank::Close() {
   obj_pool_.Clear();
   mem_pool_.FreeAll();
   if (buffer_pool_client_.is_registered()) {
-    VLOG_FILE << "RuntimeFilterBank (Fragment Id: " << state_->fragment_instance_id()
+    VLOG_FILE << "RuntimeFilterBank (Fragment Id: "
+              << PrintId(state_->fragment_instance_id())
               << ") returning reservation " << total_bloom_filter_mem_required_;
     state_->query_state()->initial_reservations()->Return(
         &buffer_pool_client_, total_bloom_filter_mem_required_);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/runtime/runtime-state.cc
----------------------------------------------------------------------
diff --git a/be/src/runtime/runtime-state.cc b/be/src/runtime/runtime-state.cc
index 4b39ec8..29ea737 100644
--- a/be/src/runtime/runtime-state.cc
+++ b/be/src/runtime/runtime-state.cc
@@ -167,7 +167,7 @@ bool RuntimeState::LogError(const ErrorMsg& message, int vlog_level) {
   // All errors go to the log, unreported_error_count_ is counted independently of the
   // size of the error_log to account for errors that were already reported to the
   // coordinator
-  VLOG(vlog_level) << "Error from query " << query_id() << ": " << message.msg();
+  VLOG(vlog_level) << "Error from query " << PrintId(query_id()) << ": " << message.msg();
   if (ErrorCount(error_log_) < query_options().max_errors) {
     AppendError(&error_log_, message);
     return true;
@@ -239,7 +239,7 @@ void RuntimeState::ReleaseResources() {
 
   // No more memory should be tracked for this instance at this point.
   if (instance_mem_tracker_->consumption() != 0) {
-    LOG(WARNING) << "Query " << query_id() << " may have leaked memory." << endl
+    LOG(WARNING) << "Query " << PrintId(query_id()) << " may have leaked memory." << endl
                  << instance_mem_tracker_->LogUsage(MemTracker::UNLIMITED_DEPTH);
   }
   instance_mem_tracker_->Close();

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/scheduling/admission-controller.cc
----------------------------------------------------------------------
diff --git a/be/src/scheduling/admission-controller.cc b/be/src/scheduling/admission-controller.cc
index 640a6af..7cdcd02 100644
--- a/be/src/scheduling/admission-controller.cc
+++ b/be/src/scheduling/admission-controller.cc
@@ -505,7 +505,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     pool_config_map_[pool_name] = pool_cfg;
     PoolStats* stats = GetPoolStats(pool_name);
     stats->UpdateConfigMetrics(pool_cfg);
-    VLOG_QUERY << "Schedule for id=" << schedule->query_id() << " in pool_name="
+    VLOG_QUERY << "Schedule for id=" << PrintId(schedule->query_id()) << " in pool_name="
                << pool_name << " cluster_mem_needed="
                << PrintBytes(schedule->GetClusterMemoryEstimate())
                << " PoolConfig: max_requests=" << max_requests << " max_queued="
@@ -526,7 +526,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
 
     if (CanAdmitRequest(*schedule, pool_cfg, false, &not_admitted_reason)) {
       DCHECK_EQ(stats->local_stats().num_queued, 0);
-      VLOG_QUERY << "Admitted query id=" << schedule->query_id();
+      VLOG_QUERY << "Admitted query id=" << PrintId(schedule->query_id());
       stats->Admit(*schedule);
       UpdateHostMemAdmitted(*schedule, schedule->GetPerHostMemoryEstimate());
       schedule->set_is_admitted(true);
@@ -537,7 +537,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     }
 
     // We cannot immediately admit but do not need to reject, so queue the request
-    VLOG_QUERY << "Queuing, query id=" << schedule->query_id();
+    VLOG_QUERY << "Queuing, query id=" << PrintId(schedule->query_id());
     stats->Queue(*schedule);
     queue->Enqueue(&queue_node);
   }
@@ -600,7 +600,7 @@ Status AdmissionController::AdmitQuery(QuerySchedule* schedule) {
     schedule->set_is_admitted(true);
     schedule->summary_profile()->AddInfoString(PROFILE_INFO_KEY_ADMISSION_RESULT,
         PROFILE_INFO_VAL_ADMIT_QUEUED);
-    VLOG_QUERY << "Admitted queued query id=" << schedule->query_id();
+    VLOG_QUERY << "Admitted queued query id=" << PrintId(schedule->query_id());
     VLOG_RPC << "Final: " << stats->DebugString();
     return Status::OK();
   }
@@ -615,7 +615,7 @@ void AdmissionController::ReleaseQuery(const QuerySchedule& schedule) {
     stats->Release(schedule);
     UpdateHostMemAdmitted(schedule, -schedule.GetPerHostMemoryEstimate());
     pools_for_updates_.insert(pool_name);
-    VLOG_RPC << "Released query id=" << schedule.query_id() << " "
+    VLOG_RPC << "Released query id=" << PrintId(schedule.query_id()) << " "
              << stats->DebugString();
   }
   dequeue_cv_.NotifyOne();
@@ -875,11 +875,11 @@ void AdmissionController::DequeueLoop() {
         // TODO: Requests further in the queue may be blocked unnecessarily. Consider a
         // better policy once we have better test scenarios.
         if (!CanAdmitRequest(schedule, pool_config, true, &not_admitted_reason)) {
-          VLOG_RPC << "Could not dequeue query id=" << schedule.query_id()
+          VLOG_RPC << "Could not dequeue query id=" << PrintId(schedule.query_id())
                    << " reason: " << not_admitted_reason;
           break;
         }
-        VLOG_RPC << "Dequeuing query=" << schedule.query_id();
+        VLOG_RPC << "Dequeuing query=" << PrintId(schedule.query_id());
         queue.Dequeue();
         stats->Dequeue(schedule, false);
         stats->Admit(schedule);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/scheduling/scheduler.cc
----------------------------------------------------------------------
diff --git a/be/src/scheduling/scheduler.cc b/be/src/scheduling/scheduler.cc
index 60302d3..a26d175 100644
--- a/be/src/scheduling/scheduler.cc
+++ b/be/src/scheduling/scheduler.cc
@@ -172,7 +172,7 @@ void Scheduler::UpdateMembership(
       // adds the IP address to local_backend_descriptor_. If it is empty, then either
       // that code has been changed, or someone else is sending malformed packets.
       VLOG(1) << "Ignoring subscription request with empty IP address from subscriber: "
-              << be_desc.address;
+              << TNetworkAddressToString(be_desc.address);
       continue;
     }
     if (item.key == local_backend_id_
@@ -181,9 +181,8 @@ void Scheduler::UpdateMembership(
       // will try to re-register (i.e. overwrite their subscription), but there is
       // likely a configuration problem.
       LOG_EVERY_N(WARNING, 30) << "Duplicate subscriber registration from address: "
-                               << be_desc.address
-                               << " (we are: " << local_backend_descriptor_.address
-                               << ")";
+           << TNetworkAddressToString(be_desc.address) << " (we are: "
+           << TNetworkAddressToString(local_backend_descriptor_.address) << ")";
       continue;
     }
     if (be_desc.is_executor) {
@@ -216,7 +215,7 @@ const TBackendDescriptor& Scheduler::LookUpBackendDesc(
   const TBackendDescriptor* desc = executor_config.LookUpBackendDesc(host);
   if (desc == nullptr) {
     // Local host may not be in executor_config if it's a dedicated coordinator.
-    DCHECK_EQ(host, local_backend_descriptor_.address);
+    DCHECK(host == local_backend_descriptor_.address);
     DCHECK(!local_backend_descriptor_.is_executor);
     desc = &local_backend_descriptor_;
   }
@@ -731,7 +730,7 @@ void Scheduler::ComputeBackendExecParams(QuerySchedule* schedule) {
 
   stringstream min_reservation_ss;
   for (const auto& e: per_backend_params) {
-    min_reservation_ss << e.first << "("
+    min_reservation_ss << TNetworkAddressToString(e.first) << "("
          << PrettyPrinter::Print(e.second.min_reservation_bytes, TUnit::BYTES)
          << ") ";
   }
@@ -907,7 +906,8 @@ void Scheduler::AssignmentCtx::RecordScanRangeAssignment(
   scan_range_params_list->push_back(scan_range_params);
 
   if (VLOG_FILE_IS_ON) {
-    VLOG_FILE << "Scheduler assignment to executor: " << executor.address << "("
+    VLOG_FILE << "Scheduler assignment to executor: "
+              << TNetworkAddressToString(executor.address) << "("
               << (remote_read ? "remote" : "local") << " selection)";
   }
 }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/child-query.cc
----------------------------------------------------------------------
diff --git a/be/src/service/child-query.cc b/be/src/service/child-query.cc
index 520a834..94609bf 100644
--- a/be/src/service/child-query.cc
+++ b/be/src/service/child-query.cc
@@ -134,7 +134,8 @@ void ChildQuery::Cancel() {
   Status status = ImpalaServer::THandleIdentifierToTUniqueId(hs2_handle_.operationId,
       &session_id, &secret_unused);
   if (status.ok()) {
-    VLOG_QUERY << "Cancelling and closing child query with operation id: " << session_id;
+    VLOG_QUERY << "Cancelling and closing child query with operation id: " <<
+        PrintId(session_id);
   } else {
     VLOG_QUERY << "Cancelling and closing child query. Failed to get query id: " <<
         status;

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/client-request-state.cc
----------------------------------------------------------------------
diff --git a/be/src/service/client-request-state.cc b/be/src/service/client-request-state.cc
index 6d38521..12b9b78 100644
--- a/be/src/service/client-request-state.cc
+++ b/be/src/service/client-request-state.cc
@@ -91,7 +91,7 @@ ClientRequestState::ClientRequestState(
 
   profile_->set_name("Query (id=" + PrintId(query_id()) + ")");
   summary_profile_->AddInfoString("Session ID", PrintId(session_id()));
-  summary_profile_->AddInfoString("Session Type", PrintTSessionType(session_type()));
+  summary_profile_->AddInfoString("Session Type", PrintThriftEnum(session_type()));
   if (session_type() == TSessionType::HIVESERVER2) {
     summary_profile_->AddInfoString("HiveServer2 Protocol Version",
         Substitute("V$0", 1 + session->hs2_version));
@@ -102,14 +102,14 @@ ClientRequestState::ClientRequestState(
       TimePrecision::Nanosecond));
   summary_profile_->AddInfoString("End Time", "");
   summary_profile_->AddInfoString("Query Type", "N/A");
-  summary_profile_->AddInfoString("Query State", PrintQueryState(BeeswaxQueryState()));
+  summary_profile_->AddInfoString("Query State", PrintThriftEnum(BeeswaxQueryState()));
   summary_profile_->AddInfoString("Query Status", "OK");
   summary_profile_->AddInfoString("Impala Version", GetVersionString(/* compact */ true));
   summary_profile_->AddInfoString("User", effective_user());
   summary_profile_->AddInfoString("Connected User", connected_user());
   summary_profile_->AddInfoString("Delegated User", do_as_user());
   summary_profile_->AddInfoString("Network Address",
-      lexical_cast<string>(session_->network_address));
+      TNetworkAddressToString(session_->network_address));
   summary_profile_->AddInfoString("Default Db", default_db());
   summary_profile_->AddInfoStringRedacted(
       "Sql Statement", query_ctx_.client_request.stmt);
@@ -140,7 +140,7 @@ Status ClientRequestState::Exec(TExecRequest* exec_request) {
   exec_request_ = *exec_request;
 
   profile_->AddChild(server_profile_);
-  summary_profile_->AddInfoString("Query Type", PrintTStmtType(stmt_type()));
+  summary_profile_->AddInfoString("Query Type", PrintThriftEnum(stmt_type()));
   summary_profile_->AddInfoString("Query Options (set by configuration)",
       DebugQueryOptions(query_ctx_.client_request.query_options));
   summary_profile_->AddInfoString("Query Options (set by configuration and planner)",
@@ -489,7 +489,7 @@ Status ClientRequestState::ExecQueryOrDmlRequest(
 
 Status ClientRequestState::ExecDdlRequest() {
   string op_type = catalog_op_type() == TCatalogOpType::DDL ?
-      PrintTDdlType(ddl_type()) : PrintTCatalogOpType(catalog_op_type());
+      PrintThriftEnum(ddl_type()) : PrintThriftEnum(catalog_op_type());
   summary_profile_->AddInfoString("DDL Type", op_type);
 
   if (catalog_op_type() != TCatalogOpType::DDL &&
@@ -575,7 +575,7 @@ void ClientRequestState::Done() {
     uint64_t latest_kudu_ts =
         coord_->dml_exec_state()->GetKuduLatestObservedTimestamp();
     if (latest_kudu_ts > 0) {
-      VLOG_RPC << "Updating session (id=" << session_id()  << ") with latest "
+      VLOG_RPC << "Updating session (id=" << PrintId(session_id())  << ") with latest "
                << "observed Kudu timestamp: " << latest_kudu_ts;
       lock_guard<mutex> session_lock(session_->lock);
       session_->kudu_latest_observed_ts = std::max<uint64_t>(
@@ -599,7 +599,7 @@ void ClientRequestState::Done() {
 Status ClientRequestState::Exec(const TMetadataOpRequest& exec_request) {
   TResultSet metadata_op_result;
   // Like the other Exec(), fill out as much profile information as we're able to.
-  summary_profile_->AddInfoString("Query Type", PrintTStmtType(TStmtType::DDL));
+  summary_profile_->AddInfoString("Query Type", PrintThriftEnum(TStmtType::DDL));
   RETURN_IF_ERROR(frontend_->ExecHiveServer2MetadataOp(exec_request,
       &metadata_op_result));
   result_metadata_ = metadata_op_result.schema;
@@ -926,7 +926,7 @@ Status ClientRequestState::UpdateCatalog() {
     catalog_update.header.__set_requesting_user(effective_user());
     if (!coord()->dml_exec_state()->PrepareCatalogUpdate(&catalog_update)) {
       VLOG_QUERY << "No partitions altered, not updating metastore (query id: "
-                 << query_id() << ")";
+                 << PrintId(query_id()) << ")";
     } else {
       // TODO: We track partitions written to, not created, which means
       // that we do more work than is necessary, because written-to
@@ -1110,7 +1110,7 @@ void ClientRequestState::ClearResultCache() {
 void ClientRequestState::UpdateOperationState(
     TOperationState::type operation_state) {
   operation_state_ = operation_state;
-  summary_profile_->AddInfoString("Query State", PrintQueryState(BeeswaxQueryState()));
+  summary_profile_->AddInfoString("Query State", PrintThriftEnum(BeeswaxQueryState()));
 }
 
 beeswax::QueryState::type ClientRequestState::BeeswaxQueryState() const {

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/impala-beeswax-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-beeswax-server.cc b/be/src/service/impala-beeswax-server.cc
index 4875adb..1096677 100644
--- a/be/src/service/impala-beeswax-server.cc
+++ b/be/src/service/impala-beeswax-server.cc
@@ -286,7 +286,7 @@ void ImpalaServer::get_log(string& log, const LogContextId& context) {
   shared_ptr<ClientRequestState> request_state = GetClientRequestState(query_id);
   if (request_state.get() == nullptr) {
     stringstream str;
-    str << "unknown query id: " << query_id;
+    str << "unknown query id: " << PrintId(query_id);
     LOG(ERROR) << str.str();
     return;
   }

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/impala-hs2-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-hs2-server.cc b/be/src/service/impala-hs2-server.cc
index 765fccf..36c7169 100644
--- a/be/src/service/impala-hs2-server.cc
+++ b/be/src/service/impala-hs2-server.cc
@@ -338,8 +338,8 @@ void ImpalaServer::OpenSession(TOpenSessionResp& return_val,
   TQueryOptionsToMap(state->QueryOptions(), &return_val.configuration);
 
   // OpenSession() should return the coordinator's HTTP server address.
-  const string& http_addr = lexical_cast<string>(
-      MakeNetworkAddress(FLAGS_hostname, FLAGS_webserver_port));
+  const string& http_addr = TNetworkAddressToString(MakeNetworkAddress(
+      FLAGS_hostname, FLAGS_webserver_port));
   return_val.configuration.insert(make_pair("http_addr", http_addr));
 
   // Put the session state in session_state_map_

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/impala-http-handler.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-http-handler.cc b/be/src/service/impala-http-handler.cc
index 9b8d597..7be4370 100644
--- a/be/src/service/impala-http-handler.cc
+++ b/be/src/service/impala-http-handler.cc
@@ -198,7 +198,7 @@ void ImpalaHttpHandler::CloseSessionHandler(const Webserver::ArgumentMap& args,
     return;
   }
   stringstream ss;
-  ss << "Session " << unique_id << " closed successfully";
+  ss << "Session " << PrintId(unique_id) << " closed successfully";
   Value message(ss.str().c_str(), document->GetAllocator());
   document->AddMember("contents", message, document->GetAllocator());
 }
@@ -250,7 +250,7 @@ void ImpalaHttpHandler::InflightQueryIdsHandler(const Webserver::ArgumentMap& ar
   stringstream ss;
   server_->client_request_state_map_.DoFuncForAllEntries(
       [&](const std::shared_ptr<ClientRequestState>& request_state) {
-          ss << request_state->query_id() << "\n";
+          ss << PrintId(request_state->query_id()) << "\n";
       });
   document->AddMember(Webserver::ENABLE_RAW_JSON_KEY, true, document->GetAllocator());
   Value query_ids(ss.str().c_str(), document->GetAllocator());
@@ -419,7 +419,7 @@ void ImpalaHttpHandler::QueryStateHandler(const Webserver::ArgumentMap& args,
     for (const ImpalaServer::QueryLocations::value_type& location:
          server_->query_locations_) {
       Value location_json(kObjectType);
-      Value location_name(lexical_cast<string>(location.first).c_str(),
+      Value location_name(TNetworkAddressToString(location.first).c_str(),
           document->GetAllocator());
       location_json.AddMember("location", location_name, document->GetAllocator());
       location_json.AddMember("count", static_cast<uint64_t>(location.second.size()),
@@ -440,8 +440,7 @@ void ImpalaHttpHandler::SessionsHandler(const Webserver::ArgumentMap& args,
            server_->session_state_map_) {
     shared_ptr<ImpalaServer::SessionState> state = session.second;
     Value session_json(kObjectType);
-    Value type(PrintTSessionType(state->session_type).c_str(),
-        document->GetAllocator());
+    Value type(PrintThriftEnum(state->session_type).c_str(), document->GetAllocator());
     session_json.AddMember("type", type, document->GetAllocator());
 
     session_json.AddMember("inflight_queries",
@@ -459,7 +458,7 @@ void ImpalaHttpHandler::SessionsHandler(const Webserver::ArgumentMap& args,
     Value session_id(PrintId(session.first).c_str(), document->GetAllocator());
     session_json.AddMember("session_id", session_id, document->GetAllocator());
 
-    Value network_address(lexical_cast<string>(state->network_address).c_str(),
+    Value network_address(TNetworkAddressToString(state->network_address).c_str(),
         document->GetAllocator());
     session_json.AddMember("network_address", network_address, document->GetAllocator());
 

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/impala-internal-service.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-internal-service.cc b/be/src/service/impala-internal-service.cc
index 5be8765..53a62da 100644
--- a/be/src/service/impala-internal-service.cc
+++ b/be/src/service/impala-internal-service.cc
@@ -41,7 +41,8 @@ ImpalaInternalService::ImpalaInternalService() {
 
 void ImpalaInternalService::ExecQueryFInstances(TExecQueryFInstancesResult& return_val,
     const TExecQueryFInstancesParams& params) {
-  VLOG_QUERY << "ExecQueryFInstances():" << " query_id=" << params.query_ctx.query_id;
+  VLOG_QUERY << "ExecQueryFInstances():" << " query_id=" <<
+      PrintId(params.query_ctx.query_id);
   FAULT_INJECTION_RPC_DELAY(RPC_EXECQUERYFINSTANCES);
   DCHECK(params.__isset.coord_state_idx);
   DCHECK(params.__isset.query_ctx);
@@ -53,14 +54,14 @@ void ImpalaInternalService::ExecQueryFInstances(TExecQueryFInstancesResult& retu
 template <typename T> void SetUnknownIdError(
     const string& id_type, const TUniqueId& id, T* status_container) {
   Status status(ErrorMsg(TErrorCode::INTERNAL_ERROR,
-      Substitute("Unknown $0 id: $1", id_type, lexical_cast<string>(id))));
+      Substitute("Unknown $0 id: $1", id_type, PrintId(id))));
   status.SetTStatus(status_container);
 }
 
 void ImpalaInternalService::CancelQueryFInstances(
     TCancelQueryFInstancesResult& return_val,
     const TCancelQueryFInstancesParams& params) {
-  VLOG_QUERY << "CancelQueryFInstances(): query_id=" << params.query_id;
+  VLOG_QUERY << "CancelQueryFInstances(): query_id=" << PrintId(params.query_id);
   FAULT_INJECTION_RPC_DELAY(RPC_CANCELQUERYFINSTANCES);
   DCHECK(params.__isset.query_id);
   QueryState::ScopedRef qs(params.query_id);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/impala-server.cc
----------------------------------------------------------------------
diff --git a/be/src/service/impala-server.cc b/be/src/service/impala-server.cc
index 829ac79..811abab 100644
--- a/be/src/service/impala-server.cc
+++ b/be/src/service/impala-server.cc
@@ -492,16 +492,16 @@ Status ImpalaServer::LogAuditRecord(const ClientRequestState& request_state,
   if (request.stmt_type == TStmtType::DDL) {
     if (request.catalog_op_request.op_type == TCatalogOpType::DDL) {
       writer.String(
-          PrintTDdlType(request.catalog_op_request.ddl_params.ddl_type).c_str());
+          PrintThriftEnum(request.catalog_op_request.ddl_params.ddl_type).c_str());
     } else {
-      writer.String(PrintTCatalogOpType(request.catalog_op_request.op_type).c_str());
+      writer.String(PrintThriftEnum(request.catalog_op_request.op_type).c_str());
     }
   } else {
-    writer.String(PrintTStmtType(request.stmt_type).c_str());
+    writer.String(PrintThriftEnum(request.stmt_type).c_str());
   }
   writer.String("network_address");
-  writer.String(
-      lexical_cast<string>(request_state.session()->network_address).c_str());
+  writer.String(TNetworkAddressToString(
+      request_state.session()->network_address).c_str());
   writer.String("sql_statement");
   string stmt = replace_all_copy(request_state.sql_stmt(), "\n", " ");
   Redact(&stmt);
@@ -513,7 +513,7 @@ Status ImpalaServer::LogAuditRecord(const ClientRequestState& request_state,
     writer.String("name");
     writer.String(event.name.c_str());
     writer.String("object_type");
-    writer.String(PrintTCatalogObjectType(event.object_type).c_str());
+    writer.String(PrintThriftEnum(event.object_type).c_str());
     writer.String("privilege");
     writer.String(event.privilege.c_str());
     writer.EndObject();
@@ -758,7 +758,7 @@ void ImpalaServer::ArchiveQuery(const ClientRequestState& query) {
   // FLAGS_log_query_to_file will have been set to false
   if (FLAGS_log_query_to_file) {
     stringstream ss;
-    ss << UnixMillis() << " " << query.query_id() << " " << encoded_profile_str;
+    ss << UnixMillis() << " " << PrintId(query.query_id()) << " " << encoded_profile_str;
     status = profile_logger_->AppendEntry(ss.str());
     if (!status.ok()) {
       LOG_EVERY_N(WARNING, 1000) << "Could not write to profile log file file ("
@@ -797,7 +797,7 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx,
   Status status = exec_env_->request_pool_service()->ResolveRequestPool(*ctx,
       &resolved_pool);
   if (!status.ok()) {
-    VLOG_RPC << "Not adding pool query options for query=" << ctx->query_id
+    VLOG_RPC << "Not adding pool query options for query=" << PrintId(ctx->query_id)
              << " ResolveRequestPool status: " << status.GetDetail();
     return;
   }
@@ -806,7 +806,7 @@ void ImpalaServer::AddPoolConfiguration(TQueryCtx* ctx,
   TPoolConfig config;
   status = exec_env_->request_pool_service()->GetPoolConfig(resolved_pool, &config);
   if (!status.ok()) {
-    VLOG_RPC << "Not adding pool query options for query=" << ctx->query_id
+    VLOG_RPC << "Not adding pool query options for query=" << PrintId(ctx->query_id)
              << " GetConfigPool status: " << status.GetDetail();
     return;
   }
@@ -1012,7 +1012,7 @@ Status ImpalaServer::SetQueryInflight(shared_ptr<SessionState> session_state,
 
 Status ImpalaServer::UnregisterQuery(const TUniqueId& query_id, bool check_inflight,
     const Status* cause) {
-  VLOG_QUERY << "UnregisterQuery(): query_id=" << query_id;
+  VLOG_QUERY << "UnregisterQuery(): query_id=" << PrintId(query_id);
 
   RETURN_IF_ERROR(CancelInternal(query_id, check_inflight, cause));
 
@@ -1206,7 +1206,7 @@ void ImpalaServer::ReportExecStatus(
 
 void ImpalaServer::TransmitData(
     TTransmitDataResult& return_val, const TTransmitDataParams& params) {
-  VLOG_ROW << "TransmitData(): instance_id=" << params.dest_fragment_instance_id
+  VLOG_ROW << "TransmitData(): instance_id=" << PrintId(params.dest_fragment_instance_id)
            << " node_id=" << params.dest_node_id
            << " #rows=" << params.row_batch.num_rows
            << " sender_id=" << params.sender_id
@@ -1306,14 +1306,14 @@ void ImpalaServer::CancelFromThreadPool(uint32_t thread_id,
     Status status = UnregisterQuery(cancellation_work.query_id(), true,
         &cancellation_work.cause());
     if (!status.ok()) {
-      VLOG_QUERY << "Query de-registration (" << cancellation_work.query_id()
+      VLOG_QUERY << "Query de-registration (" << PrintId(cancellation_work.query_id())
                  << ") failed";
     }
   } else {
     Status status = CancelInternal(cancellation_work.query_id(), true,
         &cancellation_work.cause());
     if (!status.ok()) {
-      VLOG_QUERY << "Query cancellation (" << cancellation_work.query_id()
+      VLOG_QUERY << "Query cancellation (" << PrintId(cancellation_work.query_id())
                  << ") did not succeed: " << status.GetDetail();
     }
   }
@@ -1628,7 +1628,7 @@ void ImpalaServer::MembershipCallback(
         stringstream cause_msg;
         cause_msg << "Cancelled due to unreachable impalad(s): ";
         for (int i = 0; i < cancellation_entry->second.size(); ++i) {
-          cause_msg << cancellation_entry->second[i];
+          cause_msg << TNetworkAddressToString(cancellation_entry->second[i]);
           if (i + 1 != cancellation_entry->second.size()) cause_msg << ", ";
         }
         string cause_str = cause_msg.str();
@@ -1793,13 +1793,15 @@ void ImpalaServer::ConnectionEnd(
     connection_to_sessions_map_.erase(it);
   }
 
-  LOG(INFO) << "Connection from client " << connection_context.network_address
-            << " closed, closing " << sessions_to_close.size() << " associated session(s)";
+  LOG(INFO) << "Connection from client "
+            << TNetworkAddressToString(connection_context.network_address)
+            << " closed, closing " << sessions_to_close.size()
+            << " associated session(s)";
 
   for (const TUniqueId& session_id: sessions_to_close) {
     Status status = CloseSessionInternal(session_id, true);
     if (!status.ok()) {
-      LOG(WARNING) << "Error closing session " << session_id << ": "
+      LOG(WARNING) << "Error closing session " << PrintId(session_id) << ": "
                    << status.GetDetail();
     }
   }
@@ -1853,7 +1855,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
         int64_t last_accessed_ms = session_state.second->last_accessed_ms;
         int64_t session_timeout_ms = session_state.second->session_timeout * 1000;
         if (now - last_accessed_ms <= session_timeout_ms) continue;
-        LOG(INFO) << "Expiring session: " << session_state.first << ", user:"
+        LOG(INFO) << "Expiring session: " << PrintId(session_state.first) << ", user:"
                   << session_state.second->connected_user << ", last active: "
                   << ToStringFromUnixMillis(last_accessed_ms);
         session_state.second->expired = true;
@@ -1908,7 +1910,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
         // If the query time limit expired, we must cancel the query.
         if (expiration_event->kind == ExpirationKind::EXEC_TIME_LIMIT) {
           int32_t exec_time_limit_s = query_state->query_options().exec_time_limit_s;
-          VLOG_QUERY << "Expiring query " << expiration_event->query_id
+          VLOG_QUERY << "Expiring query " << PrintId(expiration_event->query_id)
                      << " due to execution time limit of " << exec_time_limit_s << "s.";
           const string& err_msg = Substitute(
               "Query $0 expired due to execution time limit of $1",
@@ -1951,7 +1953,7 @@ void ImpalaServer::UnregisterSessionTimeout(int32_t session_timeout) {
           // Otherwise time to expire this query
           VLOG_QUERY
               << "Expiring query due to client inactivity: "
-              << expiration_event->query_id << ", last activity was at: "
+              << PrintId(expiration_event->query_id) << ", last activity was at: "
               << ToStringFromUnixMillis(query_state->last_active_ms());
           const string& err_msg = Substitute(
               "Query $0 expired due to client inactivity (timeout is $1)",
@@ -2026,7 +2028,7 @@ Status ImpalaServer::Start(int32_t thrift_be_port, int32_t beeswax_port,
 
   if (!FLAGS_is_coordinator) {
     LOG(INFO) << "Initialized executor Impala server on "
-              << ExecEnv::GetInstance()->backend_address();
+              << TNetworkAddressToString(ExecEnv::GetInstance()->backend_address());
   } else {
     // Initialize the client servers.
     boost::shared_ptr<ImpalaServer> handler = shared_from_this();
@@ -2084,7 +2086,7 @@ Status ImpalaServer::Start(int32_t thrift_be_port, int32_t beeswax_port,
     }
   }
   LOG(INFO) << "Initialized coordinator/executor Impala server on "
-      << ExecEnv::GetInstance()->backend_address();
+      << TNetworkAddressToString(ExecEnv::GetInstance()->backend_address());
 
   // Start the RPC services.
   RETURN_IF_ERROR(exec_env_->StartKrpcService());
@@ -2141,7 +2143,7 @@ void ImpalaServer::UpdateFilter(TUpdateFilterResult& result,
   shared_ptr<ClientRequestState> client_request_state =
       GetClientRequestState(params.query_id);
   if (client_request_state.get() == nullptr) {
-    LOG(INFO) << "Could not find client request state: " << params.query_id;
+    LOG(INFO) << "Could not find client request state: " << PrintId(params.query_id);
     return;
   }
   client_request_state->coord()->UpdateFilter(params);

http://git-wip-us.apache.org/repos/asf/impala/blob/e114778e/be/src/service/query-options-test.cc
----------------------------------------------------------------------
diff --git a/be/src/service/query-options-test.cc b/be/src/service/query-options-test.cc
index c684b4a..ed8d986 100644
--- a/be/src/service/query-options-test.cc
+++ b/be/src/service/query-options-test.cc
@@ -366,7 +366,7 @@ TEST(QueryOptions, MapOptionalDefaultlessToEmptyString) {
   EXPECT_EQ(map["COMPRESSION_CODEC"], "");
   EXPECT_EQ(map["MT_DOP"], "");
   // Has defaults
-  EXPECT_EQ(map["EXPLAIN_LEVEL"], "1");
+  EXPECT_EQ(map["EXPLAIN_LEVEL"], "STANDARD");
 }
 
 /// Overlay a with b. batch_size is set in both places.

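For context, the recurring change in the hunks above replaces direct streaming of
Thrift types such as TUniqueId and TNetworkAddress into log statements with the
explicit PrintId() and TNetworkAddressToString() helpers (and PrintThriftEnum() for
enum values). The following is a minimal, self-contained sketch of that logging
pattern; the TUniqueId struct and PrintId() here are simplified stand-ins, not the
actual Thrift-generated type or the be/src/util/debug-util.h signatures, which may
differ.

  // Simplified stand-ins for the Thrift-generated TUniqueId and the
  // debug-util PrintId() helper; the real Impala definitions differ.
  #include <cinttypes>
  #include <cstdint>
  #include <cstdio>
  #include <iostream>
  #include <string>

  struct TUniqueId {
    int64_t hi;
    int64_t lo;
  };

  // Render the two 64-bit halves as "hi:lo" in hex, the form seen in Impala logs.
  std::string PrintId(const TUniqueId& id) {
    char buf[64];
    std::snprintf(buf, sizeof(buf), "%016" PRIx64 ":%016" PRIx64,
                  static_cast<uint64_t>(id.hi), static_cast<uint64_t>(id.lo));
    return std::string(buf);
  }

  int main() {
    TUniqueId query_id{0x1234abcd5678ef00LL, 0x0011223344556677LL};
    // Before the change, "<< query_id" relied on an ostream operator<< overload
    // for the Thrift type; the explicit helper keeps the formatting in one place.
    // Dropping the reliance on operator<< is presumably also why the DCHECK_EQ
    // comparisons on these types were rewritten as DCHECK(a == b) above, since
    // DCHECK_EQ streams its operands when the check fails.
    std::cout << "query_id=" << PrintId(query_id) << std::endl;
    return 0;
  }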

[08/20] impala git commit: Ignore "IMPALA-6887: Typo in authz-policy.ini.template"

Posted by sa...@apache.org.
Ignore "IMPALA-6887: Typo in authz-policy.ini.template"

Change-Id: Ic0fc3598a72b515d1dee6323639e15d3c40cb4a3
Reviewed-on: http://gerrit.cloudera.org:8080/10137
Reviewed-by: Alex Behm <al...@cloudera.com>
Tested-by: Alex Behm <al...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/db152944
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/db152944
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/db152944

Branch: refs/heads/2.x
Commit: db152944783a1a7e63e8d60ca3011c1dd8bafd02
Parents: 423ab36
Author: Fredy wijaya <fw...@cloudera.com>
Authored: Fri Apr 20 09:07:02 2018 -0700
Committer: Alex Behm <al...@cloudera.com>
Committed: Fri Apr 20 16:25:30 2018 +0000

----------------------------------------------------------------------
 bin/ignored_commits.json | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/db152944/bin/ignored_commits.json
----------------------------------------------------------------------
diff --git a/bin/ignored_commits.json b/bin/ignored_commits.json
index dcf30e3..30d7b5a 100644
--- a/bin/ignored_commits.json
+++ b/bin/ignored_commits.json
@@ -16,7 +16,10 @@
       { "hash": "6b18b00310fbdae64bc3d8aec0b58385884fe76f",
         "comment": "Not applicable to 2.x; new HBase splitter code there.  IMPALA-6776: Increase region move timeout." },
       { "hash": "5417e712f33dfb963b4646723ec173a5f3c4f49b",
-        "comment": "Only applies to master; Existing privileges available at server level." }
+        "comment": "Only applies to master; Existing privileges available at server level." },
+      { "hash": "08a1a7a967f19e41e8523a9020320bbad5cdac77",
+        "comment": "Not applicable to 2.x: Fix typo in authz-policy.ini.template." }
+
     ]
   }
 ]


[19/20] impala git commit: IMPALA-6869: [DOCS] Update Known Issues doc for 2.12

Posted by sa...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/8dd465f6/docs/topics/impala_known_issues.xml
----------------------------------------------------------------------
diff --git a/docs/topics/impala_known_issues.xml b/docs/topics/impala_known_issues.xml
index a09188e..47e0c5c 100644
--- a/docs/topics/impala_known_issues.xml
+++ b/docs/topics/impala_known_issues.xml
@@ -38,22 +38,26 @@ under the License.
   <conbody>
 
     <p>
-      The following sections describe known issues and workarounds in Impala, as of the current production release. This page summarizes the
-      most serious or frequently encountered issues in the current release, to help you make planning decisions about installing and
-      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues site, where you can see the diagnosis and
-      whether a fix is in the pipeline.
+      The following sections describe known issues and workarounds in Impala, as of the current
+      production release. This page summarizes the most serious or frequently encountered issues
+      in the current release, to help you make planning decisions about installing and
+      upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
+      site, where you can see the diagnosis and whether a fix is in the pipeline.
     </p>
 
     <note>
-      The online issue tracking system for Impala contains comprehensive information and is updated in real time. To verify whether an issue
-      you are experiencing has already been reported, or which release an issue is fixed in, search on the
-      <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org JIRA tracker</xref>.
+      The online issue tracking system for Impala contains comprehensive information and is
+      updated in real time. To verify whether an issue you are experiencing has already been
+      reported, or which release an issue is fixed in, search on the
+      <xref href="https://issues.apache.org/jira/" scope="external" format="html">issues.apache.org
+      JIRA tracker</xref>.
     </note>
 
     <p outputclass="toc inpage"/>
 
     <p>
-      For issues fixed in various Impala releases, see <xref href="impala_fixed_issues.xml#fixed_issues"/>.
+      For issues fixed in various Impala releases, see
+      <xref href="impala_fixed_issues.xml#fixed_issues"/>.
     </p>
 
 <!-- Use as a template for new issues.
@@ -73,62 +77,6 @@ under the License.
 
   </conbody>
 
-<!-- New known issues for Impala 2.3.
-
-Title: Server-to-server SSL and Kerberos do not work together
-Description: If server<->server SSL is enabled (with ssl_client_ca_certificate), and Kerberos auth is used between servers, the cluster will fail to start.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2598
-Severity: Medium.  Server-to-server SSL is practically unusable but this is a new feature.
-Workaround: No known workaround.
-
-Title: Queries may hang on server-to-server exchange errors
-Description: The DataStreamSender::Channel::CloseInternal() does not close the channel on an error. This will cause the node on the other side of the channel to wait indefinitely causing a hang.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2592
-Severity: Low.  This does not occur frequently.
-Workaround: No known workaround.
-
-Title: Catalogd may crash when loading metadata for tables with many partitions, many columns and with incremental stats
-Description: Incremental stats use up about 400 bytes per partition X column.  So for a table with 20K partitions and 100 columns this is about 800 MB.  When serialized this goes past the 2 GB Java array size limit and leads to a catalog crash.
-Upstream & Internal JIRAs: https://issues.apache.org/jira/browse/IMPALA-2648, IMPALA-2647, IMPALA-2649.
-Severity: Low.  This does not occur frequently.
-Workaround:  Reduce the number of partitions.
-
-More from the JIRA report of blocker/critical issues:
-
-IMPALA-2093
-Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate
-IMPALA-1652
-Incorrect results with basic predicate on CHAR typed column.
-IMPALA-1459
-Incorrect assignment of predicates through an outer join in an inline view.
-IMPALA-2665
-Incorrect assignment of On-clause predicate inside inline view with an outer join.
-IMPALA-2603
-Crash: impala::Coordinator::ValidateCollectionSlots
-IMPALA-2375
-Fix issues with the legacy join and agg nodes using enable_partitioned_hash_join=false and enable_partitioned_aggregation=false
-IMPALA-1862
-Invalid bool value not reported as a scanner error
-IMPALA-1792
-ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)
-IMPALA-1578
-Impala incorrectly handles text data when the new line character \n\r is split between different HDFS block
-IMPALA-2643
-Duplicated column in inline view causes dropping null slots during scan
-IMPALA-2005
-A failed CTAS does not drop the table if the insert fails.
-IMPALA-1821
-Casting scenarios with invalid/inconsistent results
-
-Another list from Alex, of correctness problems with predicates; might overlap with ones I already have:
-
-https://issues.apache.org/jira/browse/IMPALA-2665 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2643 - Already have
-https://issues.apache.org/jira/browse/IMPALA-1459 - Already have
-https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
-
--->
-
   <concept id="known_issues_startup">
 
     <title>Impala Known Issues: Startup</title>
@@ -136,42 +84,60 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues can prevent one or more Impala-related daemons
-        from starting properly.
+        These issues can prevent one or more Impala-related daemons from starting properly.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4978">
+
       <title id="IMPALA-5253">Problem retrieving FQDN causes startup problem on kerberized clusters</title>
+
       <conbody>
+
         <p>
           The method Impala uses to retrieve the host name while constructing the Kerberos
-          principal is the <codeph>gethostname()</codeph> system call. This function might
-          not always return the fully qualified domain name, depending on the network
-          configuration. If the daemons cannot determine the FQDN, Impala does not start
-          on a kerberized cluster.
+          principal is the <codeph>gethostname()</codeph> system call. This function might not
+          always return the fully qualified domain name, depending on the network configuration.
+          If the daemons cannot determine the FQDN, Impala does not start on a kerberized
+          cluster.
         </p>
+
         <p>
           This problem might occur immediately after an upgrade of a CDH cluster, due to changes
-          in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically to
-          the Impala-related daemons. (See the issue <q>hostname parameter is not passed to Impala catalog role</q>
-          at <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the Cloudera Manager Known Issues page</xref>.)
+          in Cloudera Manager that supplies the <codeph>--hostname</codeph> flag automatically
+          to the Impala-related daemons. (See the issue <q>hostname parameter is not passed to
+          Impala catalog role</q> at
+          <xref href="https://www.cloudera.com/documentation/enterprise/release-notes/topics/cm_rn_known_issues.html" scope="external" format="html">the
+          Cloudera Manager Known Issues page</xref>.)
         </p>
-        <p><b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> The issue is expected to occur less frequently on systems
-          with fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>, <xref keyref="IMPALA-5253">IMPALA-5253</xref>,
-          or both. Even on systems with fixes for both of these issues, the workaround might still
-          be required in some cases.
+
+        <p>
+          <b>Bugs:</b> <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
+          <xref keyref="IMPALA-5253">IMPALA-5253</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
         </p>
-        <p><b>Workaround:</b> Test if a host is affected by checking whether the output of the
-          <cmdname>hostname</cmdname> command includes the FQDN. On hosts where <cmdname>hostname</cmdname>
-          only returns the short name, pass the command-line flag
-          <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph>
-          in the startup options of all Impala-related daemons.
+
+        <p>
+          <b>Resolution:</b> The issue is expected to occur less frequently on systems with
+          fixes for <xref keyref="IMPALA-4978">IMPALA-4978</xref>,
+          <xref keyref="IMPALA-5253">IMPALA-5253</xref>, or both. Even on systems with fixes for
+          both of these issues, the workaround might still be required in some cases.
         </p>
+
+        <p>
+          <b>Workaround:</b> Test if a host is affected by checking whether the output of the
+          <cmdname>hostname</cmdname> command includes the FQDN. On hosts where
+          <cmdname>hostname</cmdname> only returns the short name, pass the command-line flag
+          <codeph>--hostname=<varname>fully_qualified_domain_name</varname></codeph> in the
+          startup options of all Impala-related daemons.
+        </p>
+
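+        <p>
+          For example, a quick way to check a host and apply the workaround; the host and
+          domain names below are hypothetical:
+        </p>
+
+<codeblock>hostname --fqdn
+
+# If plain "hostname" returns only the short name, add a flag such as the following
+# (domain name is hypothetical) to the startup options of all Impala-related daemons:
+--hostname=host1.example.com</codeblock>
+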
       </conbody>
+
     </concept>
 
   </concept>
@@ -188,23 +154,100 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
     </conbody>
 
+    <concept id="impala-6841">
+
+      <title>Unable to view large catalog objects in catalogd Web UI</title>
+
+      <conbody>
+
+        <p>
+          In <codeph>catalogd</codeph> Web UI, you can list metadata objects and view their
+          details. These details are accessed via a link and printed to a string formatted using
+          thrift's <codeph>DebugProtocol</codeph>. Printing large objects (> 1 GB) in Web UI can
+          crash <codeph>catalogd</codeph>.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-6841">IMPALA-6841</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="impala-6389">
+
+      <title>Crash when querying tables with "\0" as a row delimiter</title>
+
+      <conbody>
+
+        <p>
+          When querying a textfile-based Impala table that uses <codeph>\0</codeph> as a new
+          line separator, Impala crashes.
+        </p>
+
+        <p>
+          The following sequence causes <codeph>impalad</codeph> to crash:
+        </p>
+
+<pre>create table tab_separated(id bigint, s string, n int, t timestamp, b boolean)
+  row format delimited
+  fields terminated by '\t' escaped by '\\' lines terminated by '\000'
+  stored as textfile;
+select * from tab_separated; -- Done. 0 results.
+insert into tab_separated (id, s) values (100, ''); -- Success.
+select * from tab_separated; -- 20 second delay before getting "Cancelled due to unreachable impalad(s): xxxx:22000"</pre>
+
+        <p>
+          <b>Bug:</b>
+          <xref keyref="IMPALA-6389" scope="external" format="html"
+            >IMPALA-6389</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use an alternative delimiter, for example <codeph>\001</codeph>.
+        </p>
+
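+        <p>
+          For example, the table from the crashing sequence above could be declared with
+          <codeph>\001</codeph> as the line terminator instead:
+        </p>
+
+<codeblock>create table tab_separated(id bigint, s string, n int, t timestamp, b boolean)
+  row format delimited
+  fields terminated by '\t' escaped by '\\' lines terminated by '\001'
+  stored as textfile;</codeblock>
+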
+      </conbody>
+
+    </concept>
+
     <concept id="IMPALA-4828">
+
       <title>Altering Kudu table schema outside of Impala may result in crash on read</title>
+
       <conbody>
+
         <p>
-          Creating a table in Impala, changing the column schema outside of Impala,
-          and then reading again in Impala may result in a crash. Neither Impala nor
-          the Kudu client validates the schema immediately before reading, so Impala may attempt to
-          dereference pointers that aren't there. This happens if a string column is dropped
-          and then a new, non-string column is added with the old string column's name.
+          Creating a table in Impala, changing the column schema outside of Impala, and then
+          reading again in Impala may result in a crash. Neither Impala nor the Kudu client
+          validates the schema immediately before reading, so Impala may attempt to dereference
+          pointers that aren't there. This happens if a string column is dropped and then a new,
+          non-string column is added with the old string column's name.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> Run the statement <codeph>REFRESH <varname>table_name</varname></codeph>
-          after any occasion when the table structure, such as the number, names, and data types
-          of columns, are modified outside of Impala using the Kudu API.
+
+        <p>
+          <b>Bug:</b>
+          <xref keyref="IMPALA-4828" scope="external" format="html">IMPALA-4828</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Run the statement <codeph>REFRESH
+          <varname>table_name</varname></codeph> whenever the table structure, such as the
+          number, names, or data types of columns, is modified outside of Impala using the
+          Kudu API.
         </p>
+
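+        <p>
+          For example, after modifying the schema of a hypothetical table named
+          <codeph>kudu_tbl</codeph> through the Kudu API:
+        </p>
+
+<codeblock>REFRESH kudu_tbl;</codeblock>
+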
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-1972" rev="IMPALA-1972">
@@ -214,10 +257,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Trying to get the details of a query through the debug web page
-          while the query is planning will block new queries that had not
-          started when the web page was requested. The web UI becomes
-          unresponsive until the planning phase is finished.
+          Trying to get the details of a query through the debug web page while the query is
+          planning will block new queries that had not started when the web page was requested.
+          The web UI becomes unresponsive until the planning phase is finished.
         </p>
 
         <p>
@@ -228,22 +270,44 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala290"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4595">
+
       <title>Linking IR UDF module to main module crashes Impala</title>
+
       <conbody>
+
         <p>
-          A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash
-          when executed.
+          A UDF compiled as an LLVM module (<codeph>.ll</codeph>) could cause a crash when
+          executed.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead of a
-          <codeph>.ll</codeph> IR module.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4595">IMPALA-4595</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Compile the external UDFs to a <codeph>.so</codeph> library instead
+          of a <codeph>.ll</codeph> IR module.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3069" rev="IMPALA-3069">
@@ -253,8 +317,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option, together with wide rows or large string values in
-          columns, could cause a memory allocation of more than 2 GB resulting in a crash.
+          Using a value in the millions for the <codeph>BATCH_SIZE</codeph> query option,
+          together with wide rows or large string values in columns, could cause a memory
+          allocation of more than 2 GB resulting in a crash.
         </p>
 
         <p>
@@ -265,7 +330,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
+        </p>
 
       </conbody>
 
@@ -278,7 +345,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Malformed Avro data, such as out-of-bounds integers or values in the wrong format, could cause a crash when queried.
+          Malformed Avro data, such as out-of-bounds integers or values in the wrong format,
+          could cause a crash when queried.
         </p>
 
         <p>
@@ -289,7 +357,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and <keyword keyref="impala262"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/> and
+          <keyword keyref="impala262"/>.
+        </p>
 
       </conbody>
 
@@ -302,8 +373,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the channel on an error. This causes the node on
-          the other side of the channel to wait indefinitely, causing a hang.
+          The <codeph>DataStreamSender::Channel::CloseInternal()</codeph> does not close the
+          channel on an error. This causes the node on the other side of the channel to wait
+          indefinitely, causing a hang.
         </p>
 
         <p>
@@ -325,15 +397,18 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala <codeph>CREATE FUNCTION</codeph> statement is
-          issued, the <cmdname>impalad</cmdname> daemon crashes.
+          If the JAR file corresponding to a Java UDF is removed from HDFS after the Impala
+          <codeph>CREATE FUNCTION</codeph> statement is issued, the <cmdname>impalad</cmdname>
+          daemon crashes.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2365">IMPALA-2365</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -353,30 +428,94 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
     </conbody>
 
+    <concept id="impala-6671">
+
+      <title>Metadata operations block read-only operations on unrelated tables</title>
+
+      <conbody>
+
+        <p>
+          Metadata operations that change the state of a table, such as <codeph>COMPUTE
+          STATS</codeph> or <codeph>ALTER TABLE ... RECOVER PARTITIONS</codeph>, can delay the
+          metadata loading of unrelated, not-yet-loaded tables that is triggered by statements
+          such as <codeph>DESCRIBE</codeph> or <codeph>SELECT</codeph> queries.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-6671">IMPALA-6671</xref>
+        </p>
+
+      </conbody>
+
+    </concept>
+
+    <concept id="impala-5200">
+
+      <title>Profile timers not updated during long-running sort</title>
+
+      <conbody>
+
+        <p>
+          If a query plan includes a long-running sort operation, for example one that runs for
+          minutes, the profile timers are not updated to reflect the time spent in the sort
+          until the sort starts returning rows.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5200">IMPALA-5200</xref>
+        </p>
+
+        <p>
+          <b>Workaround:</b> Slow sorts can be identified by looking at <codeph>Peak Mem</codeph>
+          in the summary or <codeph>PeakMemoryUsage</codeph> in the profile. If a sort is
+          consuming multiple GB of memory per host, it will likely spend a significant amount
+          of time sorting the data.
+        </p>
+
+      </conbody>
+
+    </concept>
+
     <concept id="IMPALA-3316">
+
       <title>Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</title>
+
       <conbody>
+
         <p>
-          The configuration setting <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph>
-          uses an underlying function that can be a bottleneck on high volume, highly concurrent
-          queries due to the use of a global lock while loading time zone information. This bottleneck
-          can cause slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount
-          of slowdown depends on factors such as the number of cores and number of threads involved in the query.
+          The configuration setting
+          <codeph>convert_legacy_hive_parquet_utc_timestamps=true</codeph> uses an underlying
+          function that can be a bottleneck on high volume, highly concurrent queries due to the
+          use of a global lock while loading time zone information. This bottleneck can cause
+          slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
+          slowdown depends on factors such as the number of cores and number of threads involved
+          in the query.
         </p>
+
         <note>
           <p>
-            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within Parquet files that
-            were generated by Hive, and therefore require the on-the-fly timezone conversion processing.
+            The slowdown only occurs when accessing <codeph>TIMESTAMP</codeph> columns within
+            Parquet files that were generated by Hive, and therefore require the on-the-fly
+            timezone conversion processing.
           </p>
         </note>
-        <p><b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table represent dates only,
-          with no time portion, consider storing them as strings in <codeph>yyyy-MM-dd</codeph> format.
-          Impala implicitly converts such string values to <codeph>TIMESTAMP</codeph> in calls to date/time
-          functions.
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-3316">IMPALA-3316</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
         </p>
+
+        <p>
+          <b>Workaround:</b> If the <codeph>TIMESTAMP</codeph> values stored in the table
+          represent dates only, with no time portion, consider storing them as strings in
+          <codeph>yyyy-MM-dd</codeph> format. Impala implicitly converts such string values to
+          <codeph>TIMESTAMP</codeph> in calls to date/time functions.
+        </p>
+
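+        <p>
+          A minimal sketch of this approach, using a hypothetical table and column:
+        </p>
+
+<codeblock>-- Store the date portion as a STRING in yyyy-MM-dd format instead of a TIMESTAMP.
+CREATE TABLE events (id BIGINT, event_date STRING) STORED AS PARQUET;
+
+-- The string values are implicitly converted to TIMESTAMP in date/time functions.
+SELECT count(*) FROM events WHERE year(event_date) = 2017;</codeblock>
+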
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-1480" rev="IMPALA-1480">
@@ -399,31 +538,37 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Workaround:</b> Run the DDL statement in Hive if the slowness is an issue.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
     </concept>
 
     <concept id="ki_file_handle_cache">
+
       <title>Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</title>
+
       <conbody>
+
         <p>
-          If a data file used by Impala is being continuously appended or
-          overwritten in place by an HDFS mechanism, such as <cmdname>hdfs dfs
-            -appendToFile</cmdname>, interaction with the file handle caching
-          feature in <keyword keyref="impala210_full"/> and higher could cause
-          short-circuit reads to sometimes be disabled on some DataNodes. When a
-          mismatch is detected between the cached file handle and a data block
-          that was rewritten because of an append, short-circuit reads are
-          turned off on the affected host for a 10-minute period.
+          If a data file used by Impala is being continuously appended or overwritten in place
+          by an HDFS mechanism, such as <cmdname>hdfs dfs -appendToFile</cmdname>, interaction
+          with the file handle caching feature in <keyword keyref="impala210_full"/> and higher
+          could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
+          mismatch is detected between the cached file handle and a data block that was
+          rewritten because of an append, short-circuit reads are turned off on the affected
+          host for a 10-minute period.
         </p>
+
         <p>
-          The possibility of encountering such an issue is the reason why the
-          file handle caching feature is currently turned off by default. See
-            <xref keyref="scalability_file_handle_cache"/> for information about
-          this feature and how to enable it.
+          The possibility of encountering such an issue is the reason why the file handle
+          caching feature is currently turned off by default. See
+          <xref keyref="scalability_file_handle_cache"/> for information about this feature and
+          how to enable it.
         </p>
+
         <p>
           <b>Bug:</b>
           <xref href="https://issues.apache.org/jira/browse/HDFS-12528"
@@ -434,31 +579,29 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
           <b>Severity:</b> High
         </p>
 
-        <p><b>Workaround:</b> Verify whether your ETL process is susceptible to
-          this issue before enabling the file handle caching feature. You can
-          set the <cmdname>impalad</cmdname> configuration option
-            <codeph>unused_file_handle_timeout_sec</codeph> to a time period
+        <p>
+          <b>Workaround:</b> Verify whether your ETL process is susceptible to this issue before
+          enabling the file handle caching feature. You can set the <cmdname>impalad</cmdname>
+          configuration option <codeph>unused_file_handle_timeout_sec</codeph> to a time period
           that is shorter than the HDFS setting
-            <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>.
-          (Keep in mind that the HDFS setting is in milliseconds while the
-          Impala setting is in seconds.)
+          <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph>. (Keep in mind
+          that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
         </p>
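+
+        <p>
+          For example (the values are illustrative only), if
+          <codeph>dfs.client.read.shortcircuit.streams.cache.expiry.ms</codeph> is set to
+          300000 milliseconds (5 minutes), you might start the <cmdname>impalad</cmdname>
+          daemons with a shorter handle timeout:
+        </p>
+
+<codeblock>--unused_file_handle_timeout_sec=270</codeblock>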
 
         <p>
-          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS
-          parameter <codeph>dfs.domain.socket.disable.interval.seconds</codeph>
-          to specify the amount of time that short circuit reads are disabled on
-          encountering an error. The default value is 10 minutes
-            (<codeph>600</codeph> seconds). It is recommended that you set
-            <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a
-          small value, such as <codeph>1</codeph> second, when using the file
-          handle cache. Setting <codeph>
-            dfs.domain.socket.disable.interval.seconds</codeph> to
-            <codeph>0</codeph> is not recommended as a non-zero interval
-          protects the system if there is a persistent problem with short
-          circuit reads.
+          <b>Resolution:</b> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to specify the amount of
+          time that short circuit reads are disabled on encountering an error. The default value
+          is 10 minutes (<codeph>600</codeph> seconds). It is recommended that you set
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to a small value, such as
+          <codeph>1</codeph> second, when using the file handle cache. Setting
+          <codeph>dfs.domain.socket.disable.interval.seconds</codeph> to <codeph>0</codeph> is not
+          recommended as a non-zero interval protects the system if there is a persistent
+          problem with short circuit reads.
         </p>
+
       </conbody>
+
     </concept>
 
   </concept>
@@ -470,24 +613,41 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues affect the convenience of interacting directly with Impala, typically through the Impala shell or Hue.
+        These issues affect the convenience of interacting directly with Impala, typically
+        through the Impala shell or Hue.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4570">
+
       <title>Impala shell tarball is not usable on systems with setuptools versions where '0.7' is a substring of the full version string</title>
+
       <conbody>
+
         <p>
           For example, this issue could occur on a system using setuptools version 20.7.0.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Change to a setuptools version that does not have <codeph>0.7</codeph> as
-          a substring.
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4570">IMPALA-4570</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Change to a setuptools version that does not have
+          <codeph>0.7</codeph> as a substring.
         </p>
+
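+        <p>
+          For example, with <cmdname>pip</cmdname>; the target version is illustrative, and any
+          version without <codeph>0.7</codeph> as a substring works:
+        </p>
+
+<codeblock>pip install 'setuptools==36.6.0'</codeblock>
+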
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3133" rev="IMPALA-3133">
@@ -497,9 +657,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Due to a timing condition in updating cached policy data from Sentry, the <codeph>SHOW</codeph> statements for Sentry roles could
-          sometimes display out-of-date role settings. Because Impala rechecks authorization for each SQL statement, this discrepancy does
-          not represent a security issue for other statements.
+          Due to a timing condition in updating cached policy data from Sentry, the
+          <codeph>SHOW</codeph> statements for Sentry roles could sometimes display out-of-date
+          role settings. Because Impala rechecks authorization for each SQL statement, this
+          discrepancy does not represent a security issue for other statements.
         </p>
 
         <p>
@@ -511,11 +672,10 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
         </p>
 
         <p>
-          <b>Resolution:</b> Fixes have been issued for some but not all Impala releases. Check the JIRA for details of fix releases.
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and
+          <keyword keyref="impala251"/>.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/> and <keyword keyref="impala251"/>.</p>
-
       </conbody>
 
     </concept>
@@ -527,7 +687,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they are already completed.
+          Simple <codeph>SELECT</codeph> queries show less than 100% progress even though they
+          are already completed.
         </p>
 
         <p>
@@ -547,8 +708,11 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
         <p conref="../shared/impala_common.xml#common/int_overflow_behavior" />
 
         <p>
-          <b>Bug:</b>
-          <xref keyref="IMPALA-3123">IMPALA-3123</xref>
+          <b>Bug:</b> <xref keyref="IMPALA-3123">IMPALA-3123</xref>
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
         </p>
 
       </conbody>
@@ -564,8 +728,8 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues affect applications that use the JDBC or ODBC APIs, such as business intelligence tools or custom-written applications
-        in languages such as Java or C++.
+        These issues affect applications that use the JDBC or ODBC APIs, such as business
+        intelligence tools or custom-written applications in languages such as Java or C++.
       </p>
 
     </conbody>
@@ -579,8 +743,9 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function calls must follow the same order as the
-          columns. For example, if data is fetched from column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
+          If the ODBC <codeph>SQLGetData</codeph> is called on a series of columns, the function
+          calls must follow the same order as the columns. For example, if data is fetched from
+          column 2 then column 1, the <codeph>SQLGetData</codeph> call for column 1 returns
           <codeph>NULL</codeph>.
         </p>
 
@@ -605,31 +770,78 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues relate to security features, such as Kerberos authentication, Sentry authorization, encryption, auditing, and
-        redaction.
+        These issues relate to security features, such as Kerberos authentication, Sentry
+        authorization, encryption, auditing, and redaction.
       </p>
 
     </conbody>
 
+    <concept id="impala-4712">
+
+      <title>Transient Kerberos authentication error during table loading</title>
+
+      <conbody>
+
+        <p>
+          A transient Kerberos error can cause a table to get into a bad state with an error:
+          <codeph>Failed to load metadata for table</codeph>.
+        </p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4712">IMPALA-4712</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Resolve the Kerberos authentication problem and run
+          <codeph>INVALIDATE METADATA</codeph> on the affected table.
+        </p>
+
+      </conbody>
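+        <p>
+          For example, for a hypothetical affected table named <codeph>db1.affected_tbl</codeph>:
+        </p>
+
+<codeblock>INVALIDATE METADATA db1.affected_tbl;</codeblock>
+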
+
+    </concept>
+
     <concept id="IMPALA-5638">
+
       <title>Malicious user can gain unauthorized access to Kudu table data via Impala</title>
+
       <conbody>
+
         <p>
-          A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access any
-          other Kudu table data by altering the table properties to make it <q>external</q>
-          and then changing the underlying table mapping to point to other Kudu tables.
-          This violates and works around the authorization requirement that creating a
-          Kudu external table via Impala requires an <codeph>ALL</codeph> privilege at the server scope.
-          This privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely avoid
-          this scenario where a malicious user can change the underlying Kudu table
-          mapping. The fix is to enforce the same privilege requirement for <codeph>ALTER</codeph>
-          commands that would make existing non-external Kudu tables external.
+          A malicious user with <codeph>ALTER</codeph> permissions on an Impala table can access
+          any other Kudu table data by altering the table properties to make it <q>external</q>
+          and then changing the underlying table mapping to point to other Kudu tables. This
+          violates and works around the authorization requirement that creating a Kudu external
+          table via Impala requires an <codeph>ALL</codeph> privilege at the server scope. This
+          privilege requirement for <codeph>CREATE</codeph> commands is enforced to precisely
+          avoid this scenario where a malicious user can change the underlying Kudu table
+          mapping. The fix is to enforce the same privilege requirement for
+          <codeph>ALTER</codeph> commands that would make existing non-external Kudu tables
+          external.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph> permissions on Impala tables.</p>
-        <p><b>Resolution:</b> Upgrade to an Impala version containing the fix for <xref keyref="IMPALA-5638">IMPALA-5638</xref>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5638">IMPALA-5638</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> A temporary workaround is to revoke <codeph>ALTER</codeph>
+          permissions on Impala tables.
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala2100"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="renewable_kerberos_tickets">
@@ -641,12 +853,13 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
       <conbody>
 
         <p>
-          In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if Kerberos tickets are not renewable.
+          In a Kerberos environment, the <cmdname>impalad</cmdname> daemon might not start if
+          Kerberos tickets are not renewable.
         </p>
 
         <p>
-          <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure <filepath>krb5.conf</filepath> to request
-          renewable tickets.
+          <b>Workaround:</b> Configure your KDC to allow tickets to be renewed, and configure
+          <filepath>krb5.conf</filepath> to request renewable tickets.
         </p>
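+
+        <p>
+          A minimal sketch of the client-side portion, assuming the KDC already permits
+          renewable tickets; the lifetime value is illustrative:
+        </p>
+
+<codeblock>[libdefaults]
+  renew_lifetime = 7d</codeblock>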
 
       </conbody>
@@ -685,22 +898,38 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
 
   </concept>
 
-<!--
-  <concept id="known_issues_supportability">
+  <concept id="impala-6726">
 
-    <title id="ki_supportability">Impala Known Issues: Supportability</title>
+    <title>Catalog server's Kerberos ticket gets deleted after ticket_lifetime on SLES11</title>
 
     <conbody>
 
       <p>
-        These issues affect the ability to debug and troubleshoot Impala, such as incorrect output in query profiles or the query state
-        shown in monitoring applications.
+        On SLES11, after the <codeph>ticket_lifetime</codeph> period elapses, the Kerberos
+        ticket is deleted by the Java krb5 library.
+      </p>
+
+      <p>
+        <b>Bug:</b> <xref keyref="IMPALA-6726"/>
+      </p>
+
+      <p>
+        <b>Severity:</b> High
+      </p>
+
+      <p>
+        <b>Workaround:</b> In Impala 2.11.0, set <codeph>--use_kudu_kinit=false</codeph> in the
+        Impala startup flags.
+      </p>
+
+      <p>
+        In Impala 2.12.0, set <codeph>--use_kudu_kinit=false</codeph> and
+        <codeph>--use_krpc=false</codeph> in the Impala startup flags.
       </p>
 
     </conbody>
 
   </concept>
--->
 
   <concept id="known_issues_resources">
 
@@ -709,92 +938,156 @@ https://issues.apache.org/jira/browse/IMPALA-2144 - Don't have
     <conbody>
 
       <p>
-        These issues involve memory or disk usage, including out-of-memory conditions, the spill-to-disk feature, and resource management
-        features.
+        These issues involve memory or disk usage, including out-of-memory conditions, the
+        spill-to-disk feature, and resource management features.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-5605">
+
       <title>Configuration to prevent crashes caused by thread resource limits</title>
+
       <conbody>
+
         <p>
-          Impala could encounter a serious error due to resource usage under very high concurrency.
-          The error message is similar to:
+          Impala could encounter a serious error due to resource usage under very high
+          concurrency. The error message is similar to:
         </p>
+
 <codeblock><![CDATA[
 F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
 terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
 ]]>
 </codeblock>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b>
-          To prevent such errors, configure each host running an <cmdname>impalad</cmdname>
-          daemon with the following settings:
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5605">IMPALA-5605</xref>
         </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> To prevent such errors, configure each host running an
+          <cmdname>impalad</cmdname> daemon with the following settings:
+        </p>
+
 <codeblock>
 echo 2000000 > /proc/sys/kernel/threads-max
 echo 2000000 > /proc/sys/kernel/pid_max
 echo 8000000 > /proc/sys/vm/max_map_count
 </codeblock>
+
         <p>
-        Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
+          Add the following lines in <filepath>/etc/security/limits.conf</filepath>:
         </p>
+
 <codeblock>
 impala soft nproc 262144
 impala hard nproc 262144
 </codeblock>
+
       </conbody>
+
     </concept>
 
     <concept id="flatbuffers_mem_usage">
+
+      <title>Memory usage when the compact_catalog_topic flag is enabled</title>
+
       <conbody>
+
         <p>
-          The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref>
-          can cause an increase in size of the updates to Impala catalog metadata
-          that are broadcast to the <cmdname>impalad</cmdname> daemons
-          by the <cmdname>statestored</cmdname> daemon.
-          The increase in catalog update topic size results in higher CPU and network
+          The efficiency improvement from <xref keyref="IMPALA-4029">IMPALA-4029</xref> can
+          cause an increase in size of the updates to Impala catalog metadata that are broadcast
+          to the <cmdname>impalad</cmdname> daemons by the <cmdname>statestored</cmdname>
+          daemon. The increase in catalog update topic size results in higher CPU and network
           utilization. By default, the increase in topic size is about 5-7%. If the
-          <codeph>compact_catalog_topic</codeph> flag is used, the
-          size increase is more substantial, with a topic size approximately twice as
-          large as in previous versions.
+          <codeph>compact_catalog_topic</codeph> flag is used, the size increase is more
+          substantial, with a topic size approximately twice as large as in previous versions.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref></p>
-        <p><b>Severity:</b> Medium</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-5500">IMPALA-5500</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> Medium
+        </p>
+
         <p>
-          <b>Workaround:</b> Consider setting the
-            <codeph>compact_catalog_topic</codeph> configuration setting to
-            <codeph>false</codeph> until this issue is resolved. </p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.</p>
+          <b>Workaround:</b> Consider setting the <codeph>compact_catalog_topic</codeph>
+          configuration setting to <codeph>false</codeph> until this issue is resolved.
+        </p>
+
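+        <p>
+          A minimal sketch of the corresponding Impala startup flag:
+        </p>
+
+<codeblock>--compact_catalog_topic=false</codeblock>
+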
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala210"/>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-2294">
+
       <title>Kerberos initialization errors due to high memory usage</title>
+
       <conbody>
+
         <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_intro"/>
-        <p><b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b></p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-2294">IMPALA-2294</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala211"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b>
+        </p>
+
         <p conref="../shared/impala_common.xml#common/vm_overcommit_memory_start" conrefend="../shared/impala_common.xml#common/vm_overcommit_memory_end"/>
+
       </conbody>
+
     </concept>
 
     <concept id="drop_table_purge_s3a">
+
       <title>DROP TABLE PURGE on S3A table may not delete externally written files</title>
+
       <conbody>
+
         <p>
-          A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data files
-          behind, if the table directory and the data files were created with a combination of
-          <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
+          A <codeph>DROP TABLE PURGE</codeph> statement against an S3 table could leave the data
+          files behind, if the table directory and the data files were created with a
+          combination of <cmdname>hadoop fs</cmdname> and <cmdname>aws s3</cmdname> commands.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> The underlying issue with the S3A connector depends on the resolution of <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-3558">IMPALA-3558</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> The underlying issue with the S3A connector depends on the
+          resolution of
+          <xref href="https://issues.apache.org/jira/browse/HADOOP-13230" format="html" scope="external">HADOOP-13230</xref>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="catalogd_heap">
@@ -804,27 +1097,30 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The default heap size for Impala <cmdname>catalogd</cmdname> has changed in <keyword keyref="impala25_full"/> and higher:
+          The default heap size for Impala <cmdname>catalogd</cmdname> has changed in
+          <keyword keyref="impala25_full"/> and higher:
         </p>
 
         <ul>
           <li>
             <p>
-              Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default heap size, which is the smaller of 1/4th of the
-              physical memory or 32 GB.
+              Previously, by default <cmdname>catalogd</cmdname> was using the JVM's default
+              heap size, which is the smaller of 1/4th of the physical memory or 32 GB.
             </p>
           </li>
 
           <li>
             <p>
-              Starting with <keyword keyref="impala250"/>, the default <cmdname>catalogd</cmdname> heap size is 4 GB.
+              Starting with <keyword keyref="impala250"/>, the default
+              <cmdname>catalogd</cmdname> heap size is 4 GB.
             </p>
           </li>
         </ul>
 
         <p>
-          For example, on a host with 128GB physical memory this will result in catalogd heap decreasing from 32GB to 4GB. This can result
-          in out-of-memory errors in catalogd and leading to query failures.
+          For example, on a host with 128 GB of physical memory, this change decreases the
+          <cmdname>catalogd</cmdname> heap from 32 GB to 4 GB, which can result in out-of-memory
+          errors in <cmdname>catalogd</cmdname> and lead to query failures.
         </p>
 
         <p>
@@ -833,9 +1129,6 @@ impala hard nproc 262144
 
         <p>
           <b>Workaround:</b> Increase the <cmdname>catalogd</cmdname> memory limit as follows.
-<!-- See <xref href="impala_scalability.xml#scalability_catalog"/> for the procedure. -->
-<!-- Including full details here via conref, for benefit of PDF readers or anyone else
-             who might have trouble seeing or following the link. -->
         </p>
 
         <p conref="../shared/impala_common.xml#common/increase_catalogd_heap_size"/>
@@ -851,8 +1144,9 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The size of the breakpad minidump files grows linearly with the number of threads. By default, each thread adds 8 KB to the
-          minidump size. Minidump files could consume significant disk space when the daemons have a high number of threads.
+          The size of the breakpad minidump files grows linearly with the number of threads. By
+          default, each thread adds 8 KB to the minidump size. Minidump files could consume
+          significant disk space when the daemons have a high number of threads.
         </p>
 
         <p>
@@ -864,11 +1158,13 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> Add <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft upper limit on the
-          size of each minidump file. If the minidump file would exceed that limit, Impala reduces the amount of information for each thread
-          from 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB per thread after that.) The minidump
-          file can still grow larger than the <q>hinted</q> size. For example, if you have 10,000 threads, the minidump file can be more
-          than 20 MB.
+          <b>Workaround:</b> Add
+          <codeph>--minidump_size_limit_hint_kb=<varname>size</varname></codeph> to set a soft
+          upper limit on the size of each minidump file. If the minidump file would exceed that
+          limit, Impala reduces the amount of information for each thread from 8 KB to 2 KB.
+          (Full thread information is captured for the first 20 threads, then 2 KB per thread
+          after that.) The minidump file can still grow larger than the <q>hinted</q> size. For
+          example, if you have 10,000 threads, the minidump file can be more than 20 MB.
         </p>
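+
+        <p>
+          For example, to hint a soft limit of about 20 MB per minidump file; the value is
+          illustrative:
+        </p>
+
+<codeblock>--minidump_size_limit_hint_kb=20480</codeblock>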
 
       </conbody>
@@ -882,14 +1178,16 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak memory usage than in previous releases while reading
-          Parquet files.
+          The initial release of <keyword keyref="impala26_full"/> sometimes has a higher peak
+          memory usage than in previous releases while reading Parquet files.
         </p>
 
         <p>
-          <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the efficiency of Parquet scans by up to 2x. The faster scans
-          may result in a higher peak memory consumption compared to earlier versions of Impala due to the new column-wise row
-          materialization strategy. You are likely to experience higher memory consumption in any of the following scenarios:
+          <keyword keyref="impala26_full"/> addresses the issue IMPALA-2736, which improves the
+          efficiency of Parquet scans by up to 2x. The faster scans may result in a higher peak
+          memory consumption compared to earlier versions of Impala due to the new column-wise
+          row materialization strategy. You are likely to experience higher memory consumption
+          in any of the following scenarios:
           <ul>
             <li>
               <p>
@@ -899,14 +1197,15 @@ impala hard nproc 262144
 
             <li>
               <p>
-                Very large rows due to big column values, for example, long strings or nested collections with many items.
+                Very large rows due to big column values, for example, long strings or nested
+                collections with many items.
               </p>
             </li>
 
             <li>
               <p>
-                Producer/consumer speed imbalances, leading to more rows being buffered between a scan (producer) and downstream (consumer)
-                plan nodes.
+                Producer/consumer speed imbalances, leading to more rows being buffered between
+                a scan (producer) and downstream (consumer) plan nodes.
               </p>
             </li>
           </ul>
@@ -921,10 +1220,16 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> The following query options might help to reduce memory consumption in the Parquet scanner:
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> The following query options might help to reduce memory consumption
+          in the Parquet scanner:
           <ul>
             <li>
-              Reduce the number of scanner threads, for example: <codeph>set num_scanner_threads=30</codeph>
+              Reduce the number of scanner threads, for example: <codeph>set
+              num_scanner_threads=30</codeph>
             </li>
 
             <li>
@@ -950,8 +1255,8 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          Some memory allocated by the JVM used internally by Impala is not counted against the memory limit for the
-          <cmdname>impalad</cmdname> daemon.
+          Some memory allocated by the JVM used internally by Impala is not counted against the
+          memory limit for the <cmdname>impalad</cmdname> daemon.
         </p>
 
         <p>
@@ -959,8 +1264,9 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname> command, or add the memory figures in the
-          Impala web UI <uicontrol>/memz</uicontrol> tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
+          <b>Workaround:</b> To monitor overall memory usage, use the <cmdname>top</cmdname>
+          command, or add the memory figures in the Impala web UI <uicontrol>/memz</uicontrol>
+          tab to JVM memory usage shown on the <uicontrol>/metrics</uicontrol> tab.
         </p>
 
       </conbody>
@@ -982,10 +1288,13 @@ impala hard nproc 262144
         </p>
 
         <p>
-          <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation mechanism if practical.
+          <b>Workaround:</b> Transition away from the <q>old-style</q> join and aggregation
+          mechanism if practical.
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1000,88 +1309,145 @@ impala hard nproc 262144
     <conbody>
 
       <p>
-        These issues can cause incorrect or unexpected results from queries. They typically only arise in very specific circumstances.
+        These issues can cause incorrect or unexpected results from queries. They typically only
+        arise in very specific circumstances.
       </p>
 
     </conbody>
 
     <concept id="IMPALA-4539">
+
       <title>Parquet scanner memory bug: I/O buffer is attached to output batch while scratch batch rows still reference it</title>
+
 <!-- TSB-225 title: Possibly incorrect results when scanning uncompressed Parquet files with Impala. -->
+
       <conbody>
+
         <p>
-          Impala queries may return incorrect results when scanning plain-encoded string
-          columns in uncompressed Parquet files. I/O buffers holding the string data are
-          prematurely freed, leading to invalid memory reads and possibly
-          non-deterministic results. This does not affect Parquet files that use a
-          compression codec such as Snappy. Snappy is both strongly recommended generally
-          and the default choice for Impala-written Parquet files.
+          Impala queries may return incorrect results when scanning plain-encoded string columns
+          in uncompressed Parquet files. I/O buffers holding the string data are prematurely
+          freed, leading to invalid memory reads and possibly non-deterministic results. This
+          does not affect Parquet files that use a compression codec such as Snappy. Snappy is
+          strongly recommended in general and is the default choice for Impala-written Parquet
+          files.
         </p>
+
         <p>
           How to determine whether a query might be affected:
         </p>
+
         <ul>
           <li>
             The query must reference <codeph>STRING</codeph> columns from a Parquet table.
           </li>
+
           <li>
             A selective filter on the Parquet table makes this issue more likely.
           </li>
+
           <li>
-            Identify any uncompressed Parquet files processed by the query.
-            Examine the <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the
-            suspected table. Use a query that performs a full table scan, and materializes the column
-            values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM <varname>tablename</varname></codeph>.)
-            Look for <q>File Formats</q>. A value containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
+            Identify any uncompressed Parquet files processed by the query. Examine the
+            <codeph>HDFS_SCAN_NODE</codeph> portion of a query profile that scans the suspected
+            table. Use a query that performs a full table scan, and materializes the column
+            values. (For example, <codeph>SELECT MIN(<varname>colname</varname>) FROM
+            <varname>tablename</varname></codeph>.) Look for <q>File Formats</q>. A value
+            containing <codeph>PARQUET/NONE</codeph> means uncompressed Parquet.
           </li>
+
           <li>
-            Identify any plain-encoded string columns in the associated table. Pay special attention to tables
-            containing Parquet files generated through Hive, Spark, or other mechanisms outside of Impala,
-            because Impala uses Snappy compression by default for Parquet files. Use <codeph>parquet-tools</codeph>
-            to dump the file metadata. Note that a column could have several encodings within the same file (the column
-            data is stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output of
-            <codeph>parquet-tools</codeph>, which means the values are plain encoded.
+            Identify any plain-encoded string columns in the associated table. Pay special
+            attention to tables containing Parquet files generated through Hive, Spark, or other
+            mechanisms outside of Impala, because Impala uses Snappy compression by default for
+            Parquet files. Use <codeph>parquet-tools</codeph> to dump the file metadata. Note
+            that a column could have several encodings within the same file (the column data is
+            stored in several column chunks). Look for <codeph>VLE:PLAIN</codeph> in the output
+            of <codeph>parquet-tools</codeph>, which means the values are plain encoded.
           </li>
         </ul>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Upgrade to a version of Impala containing the fix for <xref keyref="IMPALA-4539">IMPALA-4539</xref>.</p>
-        <p><b>Workaround:</b> Use Snappy or another compression codec for Parquet files.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4539">IMPALA-4539</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Use Snappy or another compression codec for Parquet files.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4513">
+
       <title>ABS(n) where n is the lowest bound for the int types returns negative values</title>
+
       <conbody>
+
         <p>
-          If the <codeph>abs()</codeph> function evaluates a number that is right at the lower bound for
-          an integer data type, the positive result cannot be represented in the same type, and the
-          result is returned as a negative number. For example, <codeph>abs(-128)</codeph> returns -128
-          because the argument is interpreted as a <codeph>TINYINT</codeph> and the return value is also
-          a <codeph>TINYINT</codeph>.
+          If the <codeph>abs()</codeph> function evaluates a number that is right at the lower
+          bound for an integer data type, the positive result cannot be represented in the same
+          type, and the result is returned as a negative number. For example,
+          <codeph>abs(-128)</codeph> returns -128 because the argument is interpreted as a
+          <codeph>TINYINT</codeph> and the return value is also a <codeph>TINYINT</codeph>.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
-          <codeph>abs(<varname>tinyint_col</varname>)</codeph> as <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.</p>
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4513">IMPALA-4513</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Workaround:</b> Cast the integer value to a larger type. For example, rewrite
+          <codeph>abs(<varname>tinyint_col</varname>)</codeph> as
+          <codeph>abs(cast(<varname>tinyint_col</varname> as smallint))</codeph>.
+        </p>
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-4266">
+
+      <title>Java UDF expression returning string in GROUP BY can give incorrect results</title>
+
       <conbody>
+
         <p>
-          If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a string value,
-          the UDF could return an incorrect result.
+          If the <codeph>GROUP BY</codeph> clause included a call to a Java UDF that returned a
+          string value, the UDF could return an incorrect result.
         </p>
-        <p><b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref></p>
-        <p><b>Severity:</b> High</p>
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.</p>
-        <p><b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF with an
-          empty string call. For example, rewrite <codeph>my_hive_udf()</codeph> as
+
+        <p>
+          <b>Bug:</b> <xref keyref="IMPALA-4266">IMPALA-4266</xref>
+        </p>
+
+        <p>
+          <b>Severity:</b> High
+        </p>
+
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala28_full"/> and higher.
+        </p>
+
+        <p>
+          <b>Workaround:</b> Rewrite the expression to concatenate the results of the Java UDF
+          with an empty string. For example, rewrite <codeph>my_hive_udf()</codeph> as
           <codeph>concat(my_hive_udf(), '')</codeph>.
         </p>
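+
+        <p>
+          For example (the table and column names are illustrative):
+        </p>
+
+<codeblock>
+-- Affected form:
+SELECT my_hive_udf(c1), count(*) FROM t1 GROUP BY my_hive_udf(c1);
+-- Workaround:
+SELECT concat(my_hive_udf(c1), ''), count(*) FROM t1 GROUP BY concat(my_hive_udf(c1), '');
+</codeblock>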
+
       </conbody>
+
     </concept>
 
     <concept id="IMPALA-3084" rev="IMPALA-3084">
@@ -1091,8 +1457,9 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          A query could return wrong results (too many or too few <codeph>NULL</codeph> values) if it referenced an outer-joined nested
-          collection and also contained a null-checking predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
+          A query could return wrong results (too many or too few <codeph>NULL</codeph> values)
+          if it referenced an outer-joined nested collection and also contained a null-checking
+          predicate (<codeph>IS NULL</codeph>, <codeph>IS NOT NULL</codeph>, or the
           <codeph>&lt;=&gt;</codeph> operator) in the <codeph>WHERE</codeph> clause.
         </p>
 
@@ -1104,7 +1471,9 @@ impala hard nproc 262144
           <b>Severity:</b> High
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala270"/>.
+        </p>
 
       </conbody>
 
@@ -1117,8 +1486,8 @@ impala hard nproc 262144
       <conbody>
 
         <p>
-          An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a constant such as <codeph>FALSE</codeph> in
-          another join clause. For example:
+          An <codeph>OUTER JOIN</codeph> query could omit some expected result rows due to a
+          constant such as <codeph>FALSE</codeph> in another join clause. For example:
         </p>
 
 <codeblock><![CDATA[
@@ -1144,10 +1513,6 @@ explain SELECT 1 FROM alltypestiny a1
         </p>
 
         <p>
-          <b>Resolution:</b>
-        </p>
-
-        <p>
           <b>Workaround:</b>
         </p>
 
@@ -1174,8 +1539,8 @@ explain SELECT 1 FROM alltypestiny a1
 
           <li>
             <p>
-              The INNER JOIN has an On-clause with a predicate that references at least two tables that are on the nullable side of the
-              preceding OUTER JOINs.
+              The INNER JOIN has an On-clause with a predicate that references at least two
+              tables that are on the nullable side of the preceding OUTER JOINs.
             </p>
           </li>
         </ul>
@@ -1258,13 +1623,19 @@ on b.int_col = c.int_col;
         </p>
 
         <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala280"/>.
+        </p>
+
+        <p>
           <b>Workaround:</b>
         </p>
 
         <p>
-          For some queries, this problem can be worked around by placing the problematic <codeph>ON</codeph> clause predicate in the
-          <codeph>WHERE</codeph> clause instead, or changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s (if
-          the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s). For example, to fix the problematic query above:
+          For some queries, this problem can be worked around by placing the problematic
+          <codeph>ON</codeph> clause predicate in the <codeph>WHERE</codeph> clause instead, or
+          changing the preceding <codeph>OUTER JOIN</codeph>s to <codeph>INNER JOIN</codeph>s
+          (if the <codeph>ON</codeph> clause predicate would discard <codeph>NULL</codeph>s).
+          For example, to fix the problematic query above:
         </p>
 
 <codeblock><![CDATA[
@@ -1340,7 +1711,8 @@ where b.int_col = c.int_col
       <conbody>
 
         <p>
-          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first. The parquet standard says it is MSB first.
+          Parquet <codeph>BIT_PACKED</codeph> encoding as implemented by Impala is LSB first.
+          The Parquet standard says it is MSB first.
         </p>
 
         <p>
@@ -1348,8 +1720,8 @@ where b.int_col = c.int_col
         </p>
 
         <p>
-          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used, is not written by Impala, and is deprecated
-          in Parquet 2.0.
+          <b>Severity:</b> High, but rare in practice because BIT_PACKED is infrequently used,
+          is not written by Impala, and is deprecated in Parquet 2.0.
         </p>
 
       </conbody>
@@ -1363,10 +1735,11 @@ where b.int_col = c.int_col
       <conbody>
 
         <p>
-          The calculation of start and end times for the BST (British Summer Time) time zone could be incorrect between 1972 and 1995.
-          Between 1972 and 1995, BST began and ended at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
-          third) and fourth Sunday in October. For example, both function calls should return 13, but actually return 12, in a query such
-          as:
+          The calculation of start and end times for the BST (British Summer Time) time zone
+          could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
+          at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+          third) and fourth Sunday in October. For example, both function calls should return
+          13, but actually return 12, in a query such as:
         </p>
 
 <codeblock>
@@ -1394,15 +1767,18 @@ select
       <conbody>
 
         <p>
-          If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph> function could return an incorrect value for
-          the hostname field.
+          If a URL contains an <codeph>@</codeph> character, the <codeph>parse_url()</codeph>
+          function could return an incorrect value for the hostname field.
         </p>
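+
+        <p>
+          For example (the URL is illustrative), a call such as the following should return
+          <codeph>example.com</codeph>, but could return a different value on affected
+          versions:
+        </p>
+
+<codeblock>
+SELECT parse_url('http://user@example.com/index.html', 'HOST');
+</codeblock>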
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-1170"></xref>IMPALA-1170
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
+          <keyword keyref="impala234"/>.
+        </p>
 
       </conbody>
 
@@ -1415,8 +1791,9 @@ select
       <conbody>
 
         <p>
-          If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an escaped <codeph>\%</codeph> character, it
-          does not match a <codeph>%</codeph> final character of the LHS argument.
+          If the final character in the RHS argument of a <codeph>LIKE</codeph> operator is an
+          escaped <codeph>\%</codeph> character, it does not match a final <codeph>%</codeph>
+          character in the LHS argument.
         </p>
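+
+        <p>
+          For example, on affected versions the following comparison returns false, even though
+          the escaped <codeph>\%</codeph> should match the literal trailing <codeph>%</codeph>:
+        </p>
+
+<codeblock>
+SELECT 'abc%' LIKE 'abc\%';
+</codeblock>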
 
         <p>
@@ -1434,8 +1811,9 @@ select
       <conbody>
 
         <p>
-          Because the value for <codeph>rand()</codeph> is computed early in a query, using an <codeph>ORDER BY</codeph> expression
-          involving a call to <codeph>rand()</codeph> does not actually randomize the results.
+          Because the value for <codeph>rand()</codeph> is computed early in a query, using an
+          <codeph>ORDER BY</codeph> expression involving a call to <codeph>rand()</codeph> does
+          not actually randomize the results.
         </p>
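+
+        <p>
+          For example (table and column names are illustrative), a query of this form does not
+          return rows in a random order on affected versions:
+        </p>
+
+<codeblock>
+SELECT c1 FROM t1 ORDER BY rand();
+</codeblock>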
 
         <p>
@@ -1453,8 +1831,9 @@ select
       <conbody>
 
         <p>
-          If the same column is queried twice within a view, <codeph>NULL</codeph> values for that column are omitted. For example, the
-          result of <codeph>COUNT(*)</codeph> on the view could be less than expected.
+          If the same column is queried twice within a view, <codeph>NULL</codeph> values for
+          that column are omitted. For example, the result of <codeph>COUNT(*)</codeph> on the
+          view could be less than expected.
         </p>
 
         <p>
@@ -1465,7 +1844,10 @@ select
           <b>Workaround:</b> Avoid selecting the same column twice within an inline view.
         </p>
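+
+        <p>
+          For example (names are illustrative), a count over an inline view that selects the
+          same column twice could be lower than expected on affected versions:
+        </p>
+
+<codeblock>
+SELECT count(*) FROM (SELECT c1 AS a, c1 AS b FROM t1) v;
+</codeblock>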
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala2210"/>.
+        </p>
 
       </conbody>
 
@@ -1480,15 +1862,19 @@ select
       <conbody>
 
         <p>
-          A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table references is an inline view might apply predicates
-          from the <codeph>ON</codeph> clause incorrectly.
+          A query involving an <codeph>OUTER JOIN</codeph> clause where one of the table
+          references is an inline view might apply predicates from the <codeph>ON</codeph>
+          clause incorrectly.
         </p>
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-1459">IMPALA-1459</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
+        </p>
 
       </conbody>
 
@@ -1501,8 +1887,8 @@ select
       <conbody>
 
         <p>
-          A query could encounter a serious error if includes multiple nested levels of <codeph>INNER JOIN</codeph> clauses involving
-          subqueries.
+          A query could encounter a serious error if it includes multiple nested levels of
+          <codeph>INNER JOIN</codeph> clauses involving subqueries.
         </p>
 
         <p>
@@ -1520,7 +1906,8 @@ select
       <conbody>
 
         <p>
-          A query might return incorrect results due to wrong predicate assignment in the following scenario:
+          A query might return incorrect results due to wrong predicate assignment in the
+          following scenario:
         </p>
 
         <ol>
@@ -1533,8 +1920,8 @@ select
           </li>
 
           <li>
-            That join has an On-clause containing a predicate that only references columns originating from the outer-joined tables inside
-            the inline view
+            That join has an On-clause containing a predicate that only references columns
+            originating from the outer-joined tables inside the inline view
           </li>
         </ol>
 
@@ -1542,7 +1929,10 @@ select
           <b>Bug:</b> <xref keyref="IMPALA-2665">IMPALA-2665</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>, <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>,
+          <keyword keyref="impala232"/>, and <keyword keyref="impala229"/>.
+        </p>
 
       </conbody>
 
@@ -1555,15 +1945,18 @@ select
       <conbody>
 
         <p>
-          In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the comparison from the <codeph>HAVING</codeph>
-          clause might be applied at the wrong stage of query processing, leading to incorrect results.
+          In an <codeph>OUTER JOIN</codeph> query with a <codeph>HAVING</codeph> clause, the
+          comparison from the <codeph>HAVING</codeph> clause might be applied at the wrong stage
+          of query processing, leading to incorrect results.
         </p>
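+
+        <p>
+          For example, a query of this general shape was susceptible (names are illustrative):
+        </p>
+
+<codeblock>
+SELECT t1.id, count(t2.x) AS c
+  FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id
+  GROUP BY t1.id
+  HAVING count(t2.x) = 0;
+</codeblock>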
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2144">IMPALA-2144</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1576,15 +1969,18 @@ select
       <conbody>
 
         <p>
-          A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function, such as <codeph>NOT IN (SELECT
-          SUM(...))</codeph>, could return incorrect results.
+          A <codeph>NOT IN</codeph> operator with a subquery that calls an aggregate function,
+          such as <codeph>NOT IN (SELECT SUM(...))</codeph>, could return incorrect results.
         </p>
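+
+        <p>
+          For example, a query of this general form was susceptible (names are illustrative):
+        </p>
+
+<codeblock>
+SELECT c1 FROM t1
+  WHERE c1 NOT IN (SELECT sum(c2) FROM t2);
+</codeblock>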
 
         <p>
           <b>Bug:</b> <xref keyref="IMPALA-2093">IMPALA-2093</xref>
         </p>
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and <keyword keyref="impala234"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/> and
+          <keyword keyref="impala234"/>.
+        </p>
 
       </conbody>
 
@@ -1599,8 +1995,9 @@ select
     <conbody>
 
       <p>
-        These issues affect how Impala interacts with metadata. They cover areas such as the metastore database, the <codeph>COMPUTE
-        STATS</codeph> statement, and the Impala <cmdname>catalogd</cmdname> daemon.
+        These issues affect how Impala interacts with metadata. They cover areas such as the
+        metastore database, the <codeph>COMPUTE STATS</codeph> statement, and the Impala
+        <cmdname>catalogd</cmdname> daemon.
       </p>
 
     </conbody>
@@ -1612,9 +2009,11 @@ select
       <conbody>
 
         <p>
-          Incremental stats use up about 400 bytes per partition for each column. For example, for a table with 20K partitions and 100
-          columns, the memory overhead from incremental statistics is about 800 MB. When serialized for transmission across the network,
-          this metadata exceeds the 2 GB Java array size limit and leads to a <codeph>catalogd</codeph> crash.
+          Incremental stats use up about 400 bytes per partition for each column. For example,
+          for a table with 20K partitions and 100 columns, the memory overhead from incremental
+          statistics is about 800 MB. When serialized for transmission across the network, this
+          metadata exceeds the 2 GB Java array size limit and leads to a
+          <codeph>catalogd</codeph> crash.
         </p>
 
         <p>
@@ -1624,8 +2023,9 @@ select
         </p>
 
         <p>
-          <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing incremental stats for that table. The
-          scalability of incremental stats computation is a continuing work item.
+          <b>Workaround:</b> If feasible, compute full stats periodically and avoid computing
+          incremental stats for that table. The scalability of incremental stats computation is
+          a continuing work item.
         </p>
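+
+        <p>
+          For example, a periodic job could refresh full statistics for such a table instead
+          (the table name is illustrative):
+        </p>
+
+<codeblock>
+COMPUTE STATS wide_partitioned_table;
+</codeblock>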
 
       </conbody>
@@ -1647,17 +2047,21 @@ select
         </p>
 
         <p>
-          <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics manually by setting the <codeph>numRows</codeph>, you must also
-          enable the Boolean property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement like the following to
-          set both properties with a single <codeph>ALTER TABLE</codeph> statement:
+          <b>Workaround:</b> On <keyword keyref="impala20"/>, when adjusting table statistics
+          manually by setting the <codeph>numRows</codeph>, you must also enable the Boolean
+          property <codeph>STATS_GENERATED_VIA_STATS_TASK</codeph>. For example, use a statement
+          like the following to set both properties with a single <codeph>ALTER TABLE</codeph>
+          statement:
         </p>
 
 <codeblock>ALTER TABLE <varname>table_name</varname> SET TBLPROPERTIES('numRows'='<varname>new_value</varname>', 'STATS_GENERATED_VIA_STATS_TASK' = 'true');</codeblock>
 
         <p>
           <b>Resolution:</b> The underlying cause is the issue
-          <xref href="https://issues.apache.org/jira/browse/HIVE-8648" scope="external" format="html">HIVE-8648</xref> that affects the
-          metastore in Hive 0.13. The workaround is only needed until the fix for this issue is incorporated into release of <keyword keyref="distro"/>.
+          <xref
+            href="https://issues.apache.org/jira/browse/HIVE-8648"
+            scope="external" format="html">HIVE-8648</xref>
+          that affects the metastore in Hive 0.13.
         </p>
 
       </conbody>
@@ -1673,8 +2077,8 @@ select
     <conbody>
 
       <p>
-        These issues affect the ability to interchange data between Impala and other database systems. They cover areas such as data types
-        and file formats.
+        These issues affect the ability to interchange data between Impala and other database
+        systems. They cover areas such as data types and file formats.
       </p>
 
     </conbody>
@@ -1688,26 +2092,32 @@ select
       <conbody>
 
         <p>
-          This issue can occur either on old Avro tables (created prior to Hive 1.1) or when changing the Avro schema file by
-          adding or removing columns. Columns added to the schema file will not show up in the output of the <codeph>DESCRIBE
-          FORMATTED</codeph> command. Removing columns from the schema file will trigger a <codeph>NullPointerException</codeph>.
+          This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
+          changing the Avro schema file by adding or removing columns. Columns added to the
+          schema file will not show up in the output of the <codeph>DESCRIBE FORMATTED</codeph>
+          command. Removing columns from the schema file will trigger a
+          <codeph>NullPointerException</codeph>.
         </p>
 
         <p>
-          As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop and recreate the table. This will populate
-          the Hive metastore database with the correct column definitions.
+          As a workaround, you can use the output of <codeph>SHOW CREATE TABLE</codeph> to drop
+          and recreate the table. This will populate the Hive metastore database with the
+          correct column definitions.
         </p>
 
         <note type="warning">
-          <p>Only use this for external tables, or Impala will remove the data
-            files. In case of an internal table, set it to external first:
+          <p>
+            Only use this for external tables, or Impala will remove the data files. In case of
+            an internal table, set it to external first:
 <codeblock>
 ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
 </codeblock>
-          (The part in parentheses is case sensitive.) Make sure to pick the
-          right choice between internal and external when recreating the table.
-          See <xref href="impala_tables.xml#tables"/> for the differences
-          between internal and external tables. </p></note>
+            (The part in parentheses is case sensitive.) Make sure to pick the right choice
+            between internal and external when recreating the table. See
+            <xref href="impala_tables.xml#tables"/> for the differences between internal and
+            external tables.
+          </p>
+        </note>
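+
+        <p>
+          For example, the sequence of statements looks like the following (the table name is
+          illustrative):
+        </p>
+
+<codeblock>
+SHOW CREATE TABLE avro_tbl;
+-- For an internal table only, preserve the data files first:
+ALTER TABLE avro_tbl SET TBLPROPERTIES('EXTERNAL'='TRUE');
+DROP TABLE avro_tbl;
+-- Re-run the CREATE TABLE statement captured from SHOW CREATE TABLE.
+</codeblock>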
 
         <p>
           <b>Severity:</b> High
@@ -1746,8 +2156,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Impala behavior differs from Hive with respect to out of range float/double values. Out of range values are returned as maximum
-          allowed value of type (Hive returns NULL).
+          Impala behavior differs from Hive with respect to out-of-range float/double values.
+          Out-of-range values are returned as the maximum allowed value of the type (Hive returns NULL).
         </p>
 
         <p>
@@ -1767,14 +2177,16 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          For compatibility with Impala, the value for the Flume HDFS Sink <codeph>hdfs.writeFormat</codeph> must be set to
-          <codeph>Text</codeph>, rather than its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph> setting
-          must be changed to <codeph>Text</codeph> before creating data files with Flume; otherwise, those files cannot be read by either
-          Impala or Hive.
+          For compatibility with Impala, the value for the Flume HDFS Sink
+          <codeph>hdfs.writeFormat</codeph> must be set to <codeph>Text</codeph>, rather than
+          its default value of <codeph>Writable</codeph>. The <codeph>hdfs.writeFormat</codeph>
+          setting must be changed to <codeph>Text</codeph> before creating data files with
+          Flume; otherwise, those files cannot be read by either Impala or Hive.
         </p>
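+
+        <p>
+          For example, in the Flume agent configuration file (the agent and sink names are
+          illustrative):
+        </p>
+
+<codeblock>
+agent1.sinks.hdfs_sink1.type = hdfs
+agent1.sinks.hdfs_sink1.hdfs.writeFormat = Text
+</codeblock>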
 
         <p>
-          <b>Resolution:</b> This information has been requested to be added to the upstream Flume documentation.
+          <b>Resolution:</b> This information has been requested to be added to the upstream
+          Flume documentation.
         </p>
 
       </conbody>
@@ -1790,7 +2202,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Querying certain Avro tables could cause a crash or return no rows, even though Impala could <codeph>DESCRIBE</codeph> the table.
+          Querying certain Avro tables could cause a crash or return no rows, even though Impala
+          could <codeph>DESCRIBE</codeph> the table.
         </p>
 
         <p>
@@ -1798,13 +2211,14 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
         </p>
 
         <p>
-          <b>Workaround:</b> Swap the order of the fields in the schema specification. For example, <codeph>["null", "string"]</codeph>
-          instead of <codeph>["string", "null"]</codeph>.
+          <b>Workaround:</b> Swap the order of the fields in the schema specification. For
+          example, <codeph>["null", "string"]</codeph> instead of <codeph>["string",
+          "null"]</codeph>.
         </p>
 
         <p>
-          <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it may still cause an error even when the
-          crashing issue is resolved.
+          <b>Resolution:</b> Not allowing this syntax agrees with the Avro specification, so it
+          may still cause an error even when the crashing issue is resolved.
         </p>
 
       </conbody>
@@ -1820,7 +2234,8 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          If an Avro table has a schema definition with a trailing semicolon, Impala encounters an error when the table is queried.
+          If an Avro table has a schema definition with a trailing semicolon, Impala encounters
+          an error when the table is queried.
         </p>
 
         <p>
@@ -1844,8 +2259,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          Currently, Impala can only read gzipped files containing a single stream. If a gzipped file contains multiple concatenated
-          streams, the Impala query only processes the data from the first stream.
+          Currently, Impala can only read gzipped files containing a single stream. If a gzipped
+          file contains multiple concatenated streams, the Impala query only processes the data
+          from the first stream.
         </p>
 
         <p>
@@ -1856,7 +2272,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
           <b>Workaround:</b> Use a different gzip tool to recompress the file as a single stream.
         </p>
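+
+        <p>
+          For example, assuming standard <codeph>gzip</codeph> utilities are available, a
+          multi-stream file can be recompressed as a single stream (file names are
+          illustrative):
+        </p>
+
+<codeblock><![CDATA[
+gunzip -c multi_stream.gz | gzip > single_stream.gz
+]]></codeblock>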
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala250"/>.
+        </p>
 
       </conbody>
 
@@ -1871,8 +2289,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          If a carriage return / newline pair of characters in a text table is split between HDFS data blocks, Impala incorrectly processes
-          the row following the <codeph>\n\r</codeph> pair twice.
+          If a carriage return / newline pair of characters in a text table is split between
+          HDFS data blocks, Impala incorrectly processes the row following the
+          <codeph>\n\r</codeph> pair twice.
         </p>
 
         <p>
@@ -1883,7 +2302,9 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
           <b>Workaround:</b> Use the Parquet format for large volumes of data where practical.
         </p>
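+
+        <p>
+          For example, the data in a text table can be rewritten into a Parquet table (table
+          names are illustrative):
+        </p>
+
+<codeblock>
+CREATE TABLE events_parquet STORED AS PARQUET AS SELECT * FROM events_text;
+</codeblock>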
 
-        <p><b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.</p>
+        <p>
+          <b>Resolution:</b> Fixed in <keyword keyref="impala260"/>.
+        </p>
 
       </conbody>
 
@@ -1898,30 +2319,33 @@ ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
       <conbody>
 
         <p>
-          In some cases, an invalid <codeph>BOOLEAN</codeph> value read from a table does not produce a warning message about the bad value.
-          The result is still <codeph>NULL</codeph> as expected. Therefore, this is not a query correctness issue, but it could lead to
-          overlooking the presence of in

<TRUNCATED>

[09/20] impala git commit: [DOCS] A typo fix introduced in impala-6748

Posted by sa...@apache.org.
[DOCS] A typo fix introduced in impala-6748

Change-Id: Idb89c97d2cc470960b88f14f038252d1d46778d0
Reviewed-on: http://gerrit.cloudera.org:8080/10126
Reviewed-by: Alex Rodoni <ar...@cloudera.com>
Tested-by: Impala Public Jenkins <im...@cloudera.com>


Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/c7c94331
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/c7c94331
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/c7c94331

Branch: refs/heads/2.x
Commit: c7c943316f32cdb65c3f3f776ea1296d3d871265
Parents: b018c5b
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Thu Apr 19 14:04:57 2018 -0700
Committer: Impala Public Jenkins <im...@gerrit.cloudera.org>
Committed: Fri Apr 20 20:17:57 2018 +0000

----------------------------------------------------------------------
 docs/shared/impala_common.xml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/impala/blob/c7c94331/docs/shared/impala_common.xml
----------------------------------------------------------------------
diff --git a/docs/shared/impala_common.xml b/docs/shared/impala_common.xml
index 1d6ef1f..26d06df 100644
--- a/docs/shared/impala_common.xml
+++ b/docs/shared/impala_common.xml
@@ -1338,7 +1338,7 @@ drop database temp;
       </p>
       <p>Leading zeroes are not required in the numbers representing the date
         component, such as month and date, or the time component, such as
-        month, date, hour, minute, second. For example, Impala accepts both
+        hour, minute, and second. For example, Impala accepts both
         <codeph>"2018-1-1 01:02:03"</codeph> and
           <codeph>"2018-01-01 1:2:3"</codeph> as valid.</p>