Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/01 21:30:05 UTC

[05/15] impala git commit: [DOCS] Add Impala docs from branch 2.x

http://git-wip-us.apache.org/repos/asf/impala/blob/592fd85d/docs/build/html/topics/impala_perf_stats.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_stats.html b/docs/build/html/topics/impala_perf_stats.html
index eaf369d..ce79b71 100644
--- a/docs/build/html/topics/impala_perf_stats.html
+++ b/docs/build/html/topics/impala_perf_stats.html
@@ -1,60 +1,52 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Table and Column Statistics</h1>
+
   
 
   <div class="body conbody">
 
     <p class="p">
-      Impala can do better optimization for complex or multi-table queries when it has access to statistics about
-      the volume of data and how the values are distributed. Impala uses this information to help parallelize and
-      distribute the work for a query. For example, optimizing join queries requires a way of determining if one
-      table is <span class="q">"bigger"</span> than another, which is a function of the number of rows and the average row size
-      for each table. The following sections describe the categories of statistics Impala can work
-      with, and how to produce them and keep them up to date.
+      Impala can do better optimization for complex or multi-table queries when it has access to
+      statistics about the volume of data and how the values are distributed. Impala uses this
+      information to help parallelize and distribute the work for a query. For example,
+      optimizing join queries requires a way of determining if one table is <span class="q">"bigger"</span> than
+      another, which is a function of the number of rows and the average row size for each
+      table. The following sections describe the categories of statistics Impala can work with,
+      and how to produce them and keep them up to date.
     </p>
 
-    <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
-      <p class="p">
-        Originally, Impala relied on the Hive mechanism for collecting statistics, through the Hive <code class="ph codeph">ANALYZE
-        TABLE</code> statement which initiates a MapReduce job. For better user-friendliness and reliability,
-        Impala implements its own <code class="ph codeph">COMPUTE STATS</code> statement in Impala 1.2.2 and higher, along with the
-        <code class="ph codeph">DROP STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN STATS</code>
-        statements.
-      </p>
-    </div>
+    <p class="p toc inpage all"></p>
 
-    <p class="p toc inpage"></p>
   </div>
 
   <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_stats__perf_table_stats">
 
     <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview of Table Statistics</h2>
-  
 
-    <div class="body conbody">
 
 
+    <div class="body conbody">
 
       <p class="p">
         The Impala query planner can make use of statistics about entire tables and partitions.
-        This information includes physical characteristics such as the number of rows, number of data files,
-        the total size of the data files, and the file format. For partitioned tables, the numbers
-        are calculated per partition, and as totals for the whole table.
-        This metadata is stored in the metastore database, and can be updated by either Impala or Hive.
-        If a number is not available, the value -1 is used as a placeholder.
-        Some numbers, such as number and total sizes of data files, are always kept up to date because
-        they can be calculated cheaply, as part of gathering HDFS block metadata.
+        This information includes physical characteristics such as the number of rows, number of
+        data files, the total size of the data files, and the file format. For partitioned
+        tables, the numbers are calculated per partition, and as totals for the whole table.
+        This metadata is stored in the metastore database, and can be updated by either Impala
+        or Hive. If a number is not available, the value -1 is used as a placeholder. Some
+        numbers, such as number and total sizes of data files, are always kept up to date
+        because they can be calculated cheaply, as part of gathering HDFS block metadata.
       </p>
 
       <p class="p">
-        The following example shows table stats for an unpartitioned Parquet table.
-        The values for the number and sizes of files are always available.
-        Initially, the number of rows is not known, because it requires a potentially expensive
-        scan through the entire table, and so that value is displayed as -1.
-        The <code class="ph codeph">COMPUTE STATS</code> statement fills in any unknown table stats values.
+        The following example shows table stats for an unpartitioned Parquet table. The values
+        for the number and sizes of files are always available. Initially, the number of rows is
+        not known, because it requires a potentially expensive scan through the entire table,
+        and so that value is displayed as -1. The <code class="ph codeph">COMPUTE STATS</code> statement fills
+        in any unknown table stats values.
       </p>
 
 <pre class="pre codeblock"><code>
@@ -82,22 +74,23 @@ show table stats parquet_snappy;
 </code></pre>
 
       <p class="p">
-        Impala performs some optimizations using this metadata on its own, and other optimizations by
-        using a combination of table and column statistics.
+        Impala performs some optimizations using this metadata on its own, and other
+        optimizations by using a combination of table and column statistics.
       </p>
 
       <p class="p">
-        To check that table statistics are available for a table, and see the details of those statistics, use the
-        statement <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code>. See
-        <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+        To check that table statistics are available for a table, and see the details of those
+        statistics, use the statement <code class="ph codeph">SHOW TABLE STATS
+        <var class="keyword varname">table_name</var></code>. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+        details.
       </p>
 
       <p class="p">
         If you use the Hive-based methods of gathering statistics, see
         <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the
-        Hive wiki</a> for information about the required configuration on the Hive side. Where practical,
-        use the Impala <code class="ph codeph">COMPUTE STATS</code> statement to avoid potential configuration and scalability
-        issues with the statistics-gathering process.
+        Hive wiki</a> for information about the required configuration on the Hive side.
+        Where practical, use the Impala <code class="ph codeph">COMPUTE STATS</code> statement to avoid
+        potential configuration and scalability issues with the statistics-gathering process.
       </p>
 
       <p class="p">
@@ -105,7 +98,9 @@ show table stats parquet_snappy;
         Impala can only use the resulting column statistics if the table is unpartitioned.
         Impala cannot use Hive-generated column statistics for a partitioned table.
       </p>
+
     </div>
+
   </article>
 
   <article class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_stats__perf_column_stats">
@@ -115,23 +110,24 @@ show table stats parquet_snappy;
     <div class="body conbody">
 
       <p class="p">
-        The Impala query planner can make use of statistics about individual columns when that metadata is
-        available in the metastore database. This technique is most valuable for columns compared across tables in
-        <a class="xref" href="impala_perf_joins.html#perf_joins">join queries</a>, to help estimate how many rows the query
-        will retrieve from each table. <span class="ph"> These statistics are also important for correlated
-        subqueries using the <code class="ph codeph">EXISTS()</code> or <code class="ph codeph">IN()</code> operators, which are processed
+        The Impala query planner can make use of statistics about individual columns when that
+        metadata is available in the metastore database. This technique is most valuable for
+        columns compared across tables in <a class="xref" href="impala_perf_joins.html#perf_joins">join
+        queries</a>, to help estimate how many rows the query will retrieve from each table.
+        <span class="ph"> These statistics are also important for correlated subqueries using the
+        <code class="ph codeph">EXISTS()</code> or <code class="ph codeph">IN()</code> operators, which are processed
         internally the same way as join queries.</span>
       </p>
 
       <p class="p">
-        The following example shows column stats for an unpartitioned Parquet table.
-        The values for the maximum and average sizes of some types are always available,
-        because those figures are constant for numeric and other fixed-size types.
-        Initially, the number of distinct values is not known, because it requires a potentially expensive
-        scan through the entire table, and so that value is displayed as -1.
-        The same applies to maximum and average sizes of variable-sized types, such as <code class="ph codeph">STRING</code>.
-        The <code class="ph codeph">COMPUTE STATS</code> statement fills in most unknown column stats values.
-        (It does not record the number of <code class="ph codeph">NULL</code> values, because currently Impala
+        The following example shows column stats for an unpartitioned Parquet table. The values
+        for the maximum and average sizes of some types are always available, because those
+        figures are constant for numeric and other fixed-size types. Initially, the number of
+        distinct values is not known, because it requires a potentially expensive scan through
+        the entire table, and so that value is displayed as -1. The same applies to maximum and
+        average sizes of variable-sized types, such as <code class="ph codeph">STRING</code>. The
+        <code class="ph codeph">COMPUTE STATS</code> statement fills in most unknown column stats values. (It
+        does not record the number of <code class="ph codeph">NULL</code> values, because currently Impala
         does not use that figure for query optimization.)
       </p>
 
@@ -170,10 +166,11 @@ show column stats parquet_snappy;
 
       <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
         <p class="p">
-          For column statistics to be effective in Impala, you also need to have table statistics for the
-          applicable tables, as described in <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a>. When you use
-          the Impala <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are automatically
-          gathered at the same time, for all columns in the table.
+          For column statistics to be effective in Impala, you also need to have table
+          statistics for the applicable tables, as described in
+          <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a>. When you use the Impala
+          <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are
+          automatically gathered at the same time, for all columns in the table.
         </p>
       </div>
 
@@ -185,13 +182,12 @@ show column stats parquet_snappy;
         higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
         skipping this <code class="ph codeph">NULL</code> counting. </div>
 
-
-
       <p class="p">
-        To check whether column statistics are available for a particular set of columns, use the <code class="ph codeph">SHOW
-        COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check the extended
-        <code class="ph codeph">EXPLAIN</code> output for a query against that table that refers to those columns. See
-        <a class="xref" href="impala_show.html#show">SHOW Statement</a> and <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details.
+        To check whether column statistics are available for a particular set of columns, use
+        the <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check
+        the extended <code class="ph codeph">EXPLAIN</code> output for a query against that table that refers
+        to those columns. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> and
+        <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details.
       </p>
 
       <p class="p">
@@ -199,32 +195,37 @@ show column stats parquet_snappy;
         Impala can only use the resulting column statistics if the table is unpartitioned.
         Impala cannot use Hive-generated column statistics for a partitioned table.
       </p>
+
     </div>
+
   </article>
 
   <article class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats__perf_stats_partitions">
+
     <h2 class="title topictitle2" id="perf_stats_partitions__stats_partitions">How Table and Column Statistics Work for Partitioned Tables</h2>
+
     <div class="body conbody">
 
       <p class="p">
-        When you use Impala for <span class="q">"big data"</span>, you are highly likely to use partitioning
-        for your biggest tables, the ones representing data that can be logically divided
-        based on dates, geographic regions, or similar criteria. The table and column statistics
-        are especially useful for optimizing queries on such tables. For example, a query involving
-        one year might involve substantially more or less data than a query involving a different year,
-        or a range of several years. Each query might be optimized differently as a result.
+        When you use Impala for <span class="q">"big data"</span>, you are highly likely to use partitioning for
+        your biggest tables, the ones representing data that can be logically divided based on
+        dates, geographic regions, or similar criteria. The table and column statistics are
+        especially useful for optimizing queries on such tables. For example, a query involving
+        one year might involve substantially more or less data than a query involving a
+        different year, or a range of several years. Each query might be optimized differently
+        as a result.
       </p>
 
       <p class="p">
         The following examples show how table and column stats work with a partitioned table.
-        The table for this example is partitioned by year, month, and day.
-        For simplicity, the sample data consists of 5 partitions, all from the same year and month.
-        Table stats are collected independently for each partition. (In fact, the
-        <code class="ph codeph">SHOW PARTITIONS</code> statement displays exactly the same information as
-        <code class="ph codeph">SHOW TABLE STATS</code> for a partitioned table.) Column stats apply to
-        the entire table, not to individual partitions. Because the partition key column values
-        are represented as HDFS directories, their characteristics are typically known in advance,
-        even when the values for non-key columns are shown as -1.
+        The table for this example is partitioned by year, month, and day. For simplicity, the
+        sample data consists of 5 partitions, all from the same year and month. Table stats are
+        collected independently for each partition. (In fact, the <code class="ph codeph">SHOW
+        PARTITIONS</code> statement displays exactly the same information as <code class="ph codeph">SHOW
+        TABLE STATS</code> for a partitioned table.) Column stats apply to the entire table,
+        not to individual partitions. Because the partition key column values are represented as
+        HDFS directories, their characteristics are typically known in advance, even when the
+        values for non-key columns are shown as -1.
       </p>
 
 <pre class="pre codeblock"><code>
@@ -305,231 +306,419 @@ show column stats year_month_day;
         Impala can only use the resulting column statistics if the table is unpartitioned.
         Impala cannot use Hive-generated column statistics for a partitioned table.
       </p>
+
     </div>
+
   </article>
 
-  <article class="topic concept nested1" aria-labelledby="perf_stats_incremental__incremental_stats" id="perf_stats__perf_stats_incremental">
+  <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="perf_stats__perf_generating_stats">
 
-    <h2 class="title topictitle2" id="perf_stats_incremental__incremental_stats">Overview of Incremental Statistics</h2>
+    <h2 class="title topictitle2" id="ariaid-title5">Generating Table and Column Statistics</h2>
 
     <div class="body conbody">
 
       <p class="p">
-        In Impala 2.1.0 and higher, you can use the syntax <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> and
-        <code class="ph codeph">DROP INCREMENTAL STATS</code>. The <code class="ph codeph">INCREMENTAL</code> clauses work with incremental
-        statistics, a specialized feature for partitioned tables that are large or frequently updated with new
-        partitions.
-      </p>
-
-      <p class="p">
-        When you compute incremental statistics for a partitioned table, by default Impala only processes those
-        partitions that do not yet have incremental statistics. By processing only newly added partitions, you can
-        keep statistics up to date for large partitioned tables, without incurring the overhead of reprocessing the
-        entire table each time.
+        Use the <code class="ph codeph">COMPUTE STATS</code> family of commands to collect table and
+        column statistics. The <code class="ph codeph">COMPUTE STATS</code> variants offer
+        different tradeoffs between computation cost, staleness, and maintenance
+        workflows, which are explained below.
       </p>
 
       <div class="note important note_important"><span class="note__title importanttitle">Important:</span> 
         <p class="p">
         For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
-        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or alternate
-        between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
-        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or vice
-        versa, drop all statistics (by running both <code class="ph codeph">DROP STATS</code> and
-        <code class="ph codeph">DROP INCREMENTAL STATS</code>) before making the switch.
-      </p>
-        <p class="p">
-        When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
-        the statistics are computed again from scratch regardless of whether the table already
-        has statistics. Therefore, expect a one-time resource-intensive operation
-        for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
-        for the first time on a given table.
-      </p>
-        <p class="p">
-        For a table with a huge number of partitions and many columns, the approximately 400 bytes
-        of metadata per column per partition can add up to significant memory overhead, as it must
-        be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
-        that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
-        you might experience service downtime.
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+        alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+        <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+        vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+        making the switch.
       </p>
       </div>
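+
+      <p class="p">
+        As a minimal sketch of such a switch, assume a hypothetical partitioned table named
+        <code class="ph codeph">sales_by_day</code> that previously had statistics gathered by
+        <code class="ph codeph">COMPUTE STATS</code>. The sequence might look like this:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- sales_by_day is a hypothetical table name used only for illustration.
+-- Remove all existing table and column statistics first.
+drop stats sales_by_day;
+-- Then gather statistics from scratch with the other variant.
+compute incremental stats sales_by_day;
+</code></pre>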
 
-      <p class="p">
-        You can also compute or drop statistics for a single partition by including a <code class="ph codeph">PARTITION</code>
-        clause in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or <code class="ph codeph">DROP INCREMENTAL STATS</code>
-        statement.
-      </p>
 
-      <p class="p">
-        The metadata for incremental statistics is handled differently from the original style of statistics:
-      </p>
 
-      <ul class="ul">
-        <li class="li">
-          <p class="p">
-            If you have a partitioned table for which you have already run a regular <code class="ph codeph">COMPUTE STATS</code>
-            statement, issuing <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a partition clause causes Impala
-            to rescan the entire table. Once the incremental statistics are computed, any future <code class="ph codeph">COMPUTE INCREMENTAL
-            STATS</code> statements only scan any new partitions and any partitions where you performed
-            <code class="ph codeph">DROP INCREMENTAL STATS</code>.
-          </p>
-        </li>
+    </div>
 
-        <li class="li">
-          <p class="p">
-            The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code> statements now include an
-            additional column showing whether incremental statistics are available for each column. A partition
-            could already be covered by the original type of statistics based on a prior <code class="ph codeph">COMPUTE
-            STATS</code> statement, as indicated by a value other than <code class="ph codeph">-1</code> under the
-            <code class="ph codeph">#Rows</code> column. Impala query planning uses either kind of statistics when available.
-          </p>
-        </li>
+    <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="perf_generating_stats__concept_y2f_nfl_mdb">
 
-        <li class="li">
-          <p class="p">
-            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE STATS</code> for the
-            same volume of data. Therefore it is most suitable for tables with large data volume where new
-            partitions are added frequently, making it impractical to run a full <code class="ph codeph">COMPUTE STATS</code>
-            operation for each new partition. For unpartitioned tables, or partitioned tables that are loaded once
-            and not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code> syntax.
-          </p>
-        </li>
+      <h3 class="title topictitle3" id="ariaid-title6">COMPUTE STATS</h3>
 
-        <li class="li">
-          <p class="p">
-            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the <span class="keyword cmdname">catalogd</span> process,
-            proportional to the number of partitions and number of columns in the applicable table. The memory
-            overhead is approximately 400 bytes for each column in each partition. This memory is reserved in the
-            <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and in each instance of
-            the <span class="keyword cmdname">impalad</span> daemon.
-          </p>
-        </li>
+      <div class="body conbody">
 
-        <li class="li">
-          <p class="p">
-            In cases where new files are added to an existing partition, issue a <code class="ph codeph">REFRESH</code> statement
-            for the table, followed by a <code class="ph codeph">DROP INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL
-            STATS</code> sequence for the changed partition.
-          </p>
-        </li>
+        <p class="p">
+          The <code class="ph codeph">COMPUTE STATS</code> command collects and sets the table-level
+          and partition-level row counts as well as all column statistics for a given
+          table. The collection process is CPU-intensive and can take a long time to
+          complete for very large tables.
+        </p>
+        <div class="p">
+          To speed up <code class="ph codeph">COMPUTE STATS</code> consider the following options
+          which can be combined.
+          <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Limit the number of columns for which statistics are collected to increase
+              the efficiency of COMPUTE STATS. Queries benefit from statistics for those
+              columns involved in filters, join conditions, group by or partition by
+              clauses. Other columns are good candidates to exclude from COMPUTE STATS.
+              This feature is available since Impala 2.12.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Set the MT_DOP query option to use more threads within each participating
+              impalad to compute the statistics faster - but not more efficiently. Note
+              that computing stats on a large table with a high MT_DOP value can
+              negatively affect other queries running at the same time if the
+              COMPUTE STATS claims most CPU cycles.
+              This feature is available since Impala 2.8.
+            </p>
+          </li>
+          <li class="li">
+            <p class="p">
+              Consider the experimental extrapolation and sampling features (see below)
+              to further increase the efficiency of computing stats.
+            </p>
+          </li>
+          </ul>
+        </div>
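+
+        <p class="p">
+          The first two options might be combined as in the following sketch. The table
+          <code class="ph codeph">web_logs</code> and its columns are hypothetical, and the
+          <code class="ph codeph">MT_DOP</code> value of 4 is an arbitrary illustration, not a
+          recommendation.
+        </p>
+
+<pre class="pre codeblock"><code>
+-- web_logs, user_id, and event_date are hypothetical names.
+-- Use more threads within each participating impalad (Impala 2.8 and higher).
+set mt_dop=4;
+-- Collect column stats only for columns used in filters and joins (Impala 2.12 and higher).
+compute stats web_logs (user_id, event_date);
+</code></pre>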
 
-        <li class="li">
-          <p class="p">
-            The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single partition at a time. To
-            remove statistics (whether incremental or not) from all partitions of a table, issue a <code class="ph codeph">DROP
-            STATS</code> statement with no <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
-          </p>
-        </li>
-      </ul>
+        <p class="p">
+          <code class="ph codeph">COMPUTE STATS</code> is intended to be run periodically,
+          e.g. weekly, or on-demand when the contents of a table have changed
+          significantly. Due to the high resource utilization and long response
+          time of <code class="ph codeph">COMPUTE STATS</code>, it is most practical to run it
+          in a scheduled maintenance window where the Impala cluster is idle
+          enough to accommodate the expensive operation. The degree of change that
+          qualifies as <span class="q">"significant"</span> depends on the query workload, but typically,
+          if 30% of the rows have changed then it is recommended to recompute
+          statistics.
+        </p>
 
-      <p class="p">
-        The following considerations apply to incremental statistics when the structure of an existing table is
-        changed (known as <dfn class="term">schema evolution</dfn>):
-      </p>
+        <p class="p">
+          If you reload a complete new set of data for a table, but the number of rows and
+          number of distinct values for each column is relatively unchanged from before, you
+          do not need to recompute stats for the table.
+        </p>
 
-      <ul class="ul">
-        <li class="li">
-          <p class="p">
-            If you use an <code class="ph codeph">ALTER TABLE</code> statement to drop a column, the existing statistics remain
-            valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any partitions.
-          </p>
-        </li>
+      </div>
 
-        <li class="li">
+      <article class="topic concept nested3" aria-labelledby="ariaid-title7" id="concept_y2f_nfl_mdb__experimental_stats_features">
+        <h4 class="title topictitle4" id="ariaid-title7">Experimental: Extrapolation and Sampling</h4>
+        <div class="body conbody">
+          <div class="p">
+            Impala 2.12 and higher includes two experimental features to alleviate
+            common issues for computing and maintaining statistics on very large tables.
+            The following shortcomings are improved upon:
+            <ul class="ul">
+            <li class="li">
+              <p class="p">
+                Newly added partitions do not have row count statistics. Table scans
+                that only access those new partitions are treated as not having stats.
+                Similarly, table scans that access both new and old partitions estimate
+                the scan cardinality based on those old partitions that have stats, and
+                the new partitions without stats are treated as having 0 rows.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                The row counts of existing partitions become stale when data is added
+                or dropped.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                Computing stats for tables with 100,000 or more partitions might fail
+                or be very slow due to the high cost of updating the partition metadata
+                in the Hive Metastore.
+              </p>
+            </li>
+            <li class="li">
+              <p class="p">
+                With transient compute resources it is important to minimize the time
+                from starting a new cluster to successfully running queries.
+                Since the cluster might be relatively short-lived, users might prefer to
+                quickly collect stats that are "good enough" as opposed to spending
+                a lot of time and resources on computing full-fidelity stats.
+              </p>
+            </li>
+            </ul>
+            For very large tables, it is often wasteful or impractical to run a full
+            COMPUTE STATS to address the scenarios above on a frequent basis.
+          </div>
           <p class="p">
-            If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans all partitions and
-            fills in the appropriate column-level values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL
-            STATS</code>.
+            The sampling feature makes COMPUTE STATS more efficient by processing a
+            fraction of the table data, and the extrapolation feature aims to reduce
+            the frequency at which COMPUTE STATS needs to be re-run by estimating
+            the row count of new and modified partitions.
           </p>
-        </li>
-
-        <li class="li">
           <p class="p">
-            If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a column, Impala
-            rescans all partitions and fills in the appropriate column-level values the next time you run
-            <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+            The sampling and extrapolation features are disabled by default.
+            They can be enabled globally or for specific tables, as follows.
+            Set the impalad start-up configuration "--enable_stats_extrapolation" to
+            enable the features globally. To enable them only for a specific table, set
+            the "impala.enable.stats.extrapolation" table property to "true" for the
+            desired table. The table-level property overrides the global setting, so
+            it is also possible to enable sampling and extrapolation globally, but
+            disable it for specific tables by setting the table property to "false".
+            Example:
+            <code class="ph codeph">ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true")</code>
           </p>
-        </li>
+          <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+            Why are these features experimental? Due to their probabilistic nature
+            it is possible that these features perform pathologically poorly on tables
+            with extreme data/file/size distributions. Since it is not feasible for us
+            to test all possible scenarios we only cautiously advertise these new
+            capabilities. That said, the features have been thoroughly tested and
+            are considered functionally stable. If you decide to give these features
+            a try, please tell us about your experience at user@impala.apache.org!
+            We rely on user feedback to guide future improvements in statistics
+            collection.
+          </div>
+        </div>
+
+        <article class="topic concept nested4" aria-labelledby="ariaid-title8" id="experimental_stats_features__experimental_stats_extrapolation">
+          <h5 class="title topictitle5" id="ariaid-title8">Stats Extrapolation</h5>
+          <div class="body conbody">
+            <p class="p">
+              The main idea of stats extrapolation is to estimate the row count of new
+              and modified partitions based on the result of the last COMPUTE STATS.
+              Enabling stats extrapolation changes the behavior of COMPUTE STATS,
+              as well as the cardinality estimation of table scans. COMPUTE STATS no
+              longer computes and stores per-partition row counts, and instead, only
+              computes a table-level row count together with the total number of file
+              bytes in the table at that time. No partition metadata is modified. The
+              input cardinality of a table scan is estimated by converting the data
+              volume of relevant partitions to a row count, based on the table-level
+              row count and file bytes statistics. It is assumed that within the same
+              table, different sets of files with the same data volume correspond
+              to a similar number of rows on average. With extrapolation enabled,
+              the scan cardinality estimation ignores per-partition row counts. It
+              only relies on the table-level statistics and the scanned data volume.
+            </p>
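+            <p class="p">
+              As a rough illustration of this proportional estimate, consider the
+              following worked example. The numbers are hypothetical, and the simple
+              linear scaling shown is an assumption based on the description above;
+              the exact computation Impala performs might differ.
+            </p>
+<pre class="pre codeblock"><code>-- Hypothetical table-level stats from the last COMPUTE STATS:
+--   row count  = 1,000,000,000 rows
+--   file bytes = 100 GB
+-- A query scans partitions whose files total 5 GB.
+-- Estimated scan cardinality = 1,000,000,000 * (5 GB / 100 GB) = 50,000,000 rows
+</code></pre>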
+            <p class="p">
+              The SHOW TABLE STATS and EXPLAIN commands distinguish between row counts
+              stored in the Hive Metastore, and the row counts extrapolated based on the
+              above process. Consult the SHOW TABLE STATS and EXPLAIN documentation
+              for more details.
+            </p>
+          </div>
+        </article>
+
+        <article class="topic concept nested4" aria-labelledby="ariaid-title9" id="experimental_stats_features__experimental_stats_sampling">
+          <h5 class="title topictitle5" id="ariaid-title9">Sampling</h5>
+          <div class="body conbody">
+            <p class="p">
+              A TABLESAMPLE clause may be added to COMPUTE STATS to limit the
+              percentage of data to be processed. The final statistics are obtained
+              by extrapolating the statistics from the data sample over the entire table.
+              The extrapolated statistics are stored in the Hive Metastore, just as if no
+              sampling was used. The following example runs COMPUTE STATS over a 10 percent
+              data sample: <code class="ph codeph">COMPUTE STATS test_table TABLESAMPLE SYSTEM(10)</code>
+            </p>
+            <p class="p">
+            We have found that a 10 percent sampling rate typically offers a good
+            tradeoff between statistics accuracy and execution cost. A sampling rate
+            well below 10 percent has shown poor results and is not recommended.
+            </p>
+            <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+              Sampling-based techniques sacrifice result accuracy for execution
+              efficiency, so your mileage may vary for different tables and columns
+              depending on their data distribution. The extrapolation procedure Impala
+              uses for estimating the number of distinct values per column is inherently
+              non-deterministic, so your results may even vary between runs of
+              COMPUTE STATS TABLESAMPLE, even if no data has changed.
+            </div>
+          </div>
+        </article>
+      </article>
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="perf_generating_stats__concept_bmk_pfl_mdb">
+
+      <h3 class="title topictitle3" id="ariaid-title10">COMPUTE INCREMENTAL STATS</h3>
+
+      <div class="body conbody">
 
-        <li class="li">
-          <p class="p">
-            If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a table, the existing
-            statistics remain valid and a subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not rescan any
-            partitions.
-          </p>
-        </li>
-      </ul>
+        <p class="p">
+          In Impala 2.1.0 and higher, you can use the
+          <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> and
+          <code class="ph codeph">DROP INCREMENTAL STATS</code> commands.
+          The <code class="ph codeph">INCREMENTAL</code> clauses work with incremental statistics,
+          a specialized feature for partitioned tables.
+        </p>
 
-      <p class="p">
-        See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
-        <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details.
-      </p>
-    </div>
-  </article>
+        <p class="p">
+          When you compute incremental statistics for a partitioned table, by default Impala only
+          processes those partitions that do not yet have incremental statistics. By processing
+          only newly added partitions, you can keep statistics up to date without incurring the
+          overhead of reprocessing the entire table each time.
+        </p>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="perf_stats__perf_stats_computing">
-    <h2 class="title topictitle2" id="ariaid-title6">Generating Table and Column Statistics (COMPUTE STATS Statement)</h2>
-    <div class="body conbody">
+        <p class="p">
+          You can also compute or drop statistics for a specified subset of partitions by
+          including a <code class="ph codeph">PARTITION</code> clause in the
+          <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or <code class="ph codeph">DROP INCREMENTAL STATS</code>
+          statement.
+        </p>
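+        <p class="p">
+          The following sketch shows both forms. The table name
+          <code class="ph codeph">sales</code> and its partition key columns
+          <code class="ph codeph">year</code> and <code class="ph codeph">month</code> are
+          hypothetical and serve only as an illustration.
+        </p>
+<pre class="pre codeblock"><code>-- Process only the partitions that do not yet have incremental stats.
+COMPUTE INCREMENTAL STATS sales;
+
+-- Process a single, explicitly named partition.
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2018, month=4);
+
+-- Remove the incremental stats for that one partition.
+DROP INCREMENTAL STATS sales PARTITION (year=2018, month=4);
+</code></pre>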
 
-      <p class="p">
-        To gather table statistics after loading data into a table or partition, you typically use the
-        <code class="ph codeph">COMPUTE STATS</code> statement. This statement is available in Impala 1.2.2 and higher.
-        It gathers both table statistics and column statistics for all columns in a single operation.
-        For large partitioned tables, where you frequently need to update statistics and it is impractical
-        to scan the entire table each time, use the syntax <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>,
-        which is available in <span class="keyword">Impala 2.1</span> and higher.
+        <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+          <p class="p">
+        For a table with a huge number of partitions and many columns, the approximately 400 bytes
+        of metadata per column per partition can add up to significant memory overhead, as it must
+        be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+        that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+        you might experience service downtime.
       </p>
-
-      <p class="p">
-        If you use Hive as part of your ETL workflow, you can also use Hive to generate table and
-        column statistics. You might need to do extra configuration within Hive itself, the metastore,
-        or even set up a separate database to hold Hive-generated statistics. You might need to run
-        multiple statements to generate all the necessary statistics. Therefore, prefer the
-        Impala <code class="ph codeph">COMPUTE STATS</code> statement where that technique is practical.
-        For details about collecting statistics through Hive, see
-        <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the Hive wiki</a>.
+          <p class="p">
+        When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+        the statistics are computed again from scratch regardless of whether the table already
+        has statistics. Therefore, expect a one-time resource-intensive operation
+        for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+        for the first time on a given table.
       </p>
+        </div>
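+        <p class="p">
+          To get a feel for how this overhead scales, the following back-of-the-envelope
+          estimate combines hypothetical table dimensions with the approximate figure of
+          400 bytes per column per partition:
+        </p>
+<pre class="pre codeblock"><code>-- Hypothetical table: 10,000 partitions, 100 columns.
+-- 10,000 partitions * 100 columns * 400 bytes = 400,000,000 bytes (about 400 MB)
+-- Five such tables would reach roughly 2 GB of incremental stats metadata.
+</code></pre>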
 
-      <p class="p">
-        If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
-        Impala can only use the resulting column statistics if the table is unpartitioned.
-        Impala cannot use Hive-generated column statistics for a partitioned table.
-      </p>
+        <p class="p">
+          The metadata for incremental statistics is handled differently from the original style
+          of statistics:
+        </p>
 
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              Issuing a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a partition
+              clause causes Impala to compute incremental stats for all partitions that
+              do not already have incremental stats. This might be the entire table when
+              running the command for the first time, but subsequent runs should only
+              update new partitions. You can force updating a partition that already has
+              incremental stats by issuing a <code class="ph codeph">DROP INCREMENTAL STATS</code>
+              before running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code>
+              statements now include an additional column showing whether incremental statistics
+              are available for each partition. A partition could already be covered by the original
+              type of statistics based on a prior <code class="ph codeph">COMPUTE STATS</code> statement, as
+              indicated by a value other than <code class="ph codeph">-1</code> under the <code class="ph codeph">#Rows</code>
+              column. Impala query planning uses either kind of statistics when available.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE
+              STATS</code> for the same volume of data. Therefore it is most suitable for tables
+              with large data volume where new partitions are added frequently, making it
+              impractical to run a full <code class="ph codeph">COMPUTE STATS</code> operation for each new
+              partition. For unpartitioned tables, or partitioned tables that are loaded once and
+              not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code>
+              syntax.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the
+              <span class="keyword cmdname">catalogd</span> process, proportional to the number of partitions and
+              number of columns in the applicable table. The memory overhead is approximately 400
+              bytes for each column in each partition. This memory is reserved in the
+              <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and
+              in each instance of the <span class="keyword cmdname">impalad</span> daemon.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              In cases where new files are added to an existing partition, issue a
+              <code class="ph codeph">REFRESH</code> statement for the table, followed by a <code class="ph codeph">DROP
+              INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> sequence
+              for the changed partition.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single
+              partition at a time. To remove statistics (whether incremental or not) from all
+              partitions of a table, issue a <code class="ph codeph">DROP STATS</code> statement with no
+              <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
+            </p>
+          </li>
+        </ul>
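+        <p class="p">
+          For instance, the statements for a partition that received new data files might
+          look like the following sketch. The table name <code class="ph codeph">sales</code>
+          and the partition specification are hypothetical.
+        </p>
+<pre class="pre codeblock"><code>-- Make Impala aware of the newly added data files.
+REFRESH sales;
+
+-- Discard the stale incremental stats for the changed partition, then recompute them.
+DROP INCREMENTAL STATS sales PARTITION (year=2018, month=4);
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2018, month=4);
+
+-- Check the per-partition row counts and the incremental stats indicator.
+SHOW TABLE STATS sales;
+
+-- To remove all statistics, incremental or not, from every partition:
+DROP STATS sales;
+</code></pre>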
 
+        <p class="p">
+          The following considerations apply to incremental statistics when the structure of an
+          existing table is changed (known as <dfn class="term">schema evolution</dfn>):
+        </p>
 
-      <p class="p">
+        <ul class="ul">
+          <li class="li">
+            <p class="p">
+              If you use an <code class="ph codeph">ALTER TABLE</code> statement to drop a column, the existing
+              statistics remain valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not
+              rescan any partitions.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans
+              all partitions and fills in the appropriate column-level values the next time you
+              run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a
+              column, Impala rescans all partitions and fills in the appropriate column-level
+              values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+            </p>
+          </li>
+
+          <li class="li">
+            <p class="p">
+              If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a
+              table, the existing statistics remain valid and a subsequent <code class="ph codeph">COMPUTE
+              INCREMENTAL STATS</code> does not rescan any partitions.
+            </p>
+          </li>
+        </ul>
 
+        <p class="p">
+          See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
+          <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details.
+        </p>
 
-        For your very largest tables, you might find that <code class="ph codeph">COMPUTE STATS</code> or even <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
-        take so long to scan the data that it is impractical to use them regularly. In such a case, after adding a partition or inserting new data,
-        you can update just the number of rows property through an <code class="ph codeph">ALTER TABLE</code> statement.
-        See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">Setting the NUMROWS Value Manually through ALTER TABLE</a> for details.
-        Because the column statistics might be left in a stale state, do not use this technique as a replacement
-        for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique if all other means of collecting statistics are impractical, or as a
-        low-overhead operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
-      </p>
+      </div>
+
+    </article>
 
-    </div>
   </article>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="perf_stats__perf_stats_checking">
+  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_checking">
 
-    <h2 class="title topictitle2" id="ariaid-title7">Detecting Missing Statistics</h2>
+    <h2 class="title topictitle2" id="ariaid-title11">Detecting Missing Statistics</h2>
 
     <div class="body conbody">
 
       <p class="p">
-        You can check whether a specific table has statistics using the <code class="ph codeph">SHOW TABLE STATS</code> statement
-        (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code> statement (for a partitioned table). Both
-        statements display the same information. If a table or a partition does not have any statistics, the
-        <code class="ph codeph">#Rows</code> field contains <code class="ph codeph">-1</code>. Once you compute statistics for the table or
-        partition, the <code class="ph codeph">#Rows</code> field changes to an accurate value.
+        You can check whether a specific table has statistics using the <code class="ph codeph">SHOW TABLE
+        STATS</code> statement (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code>
+        statement (for a partitioned table). Both statements display the same information. If a
+        table or a partition does not have any statistics, the <code class="ph codeph">#Rows</code> field
+        contains <code class="ph codeph">-1</code>. Once you compute statistics for the table or partition,
+        the <code class="ph codeph">#Rows</code> field changes to an accurate value.
       </p>
 
       <p class="p">
-        The following example shows a table that initially does not have any statistics. The <code class="ph codeph">SHOW TABLE
-        STATS</code> statement displays different values for <code class="ph codeph">#Rows</code> before and after the
-        <code class="ph codeph">COMPUTE STATS</code> operation.
+        The following example shows a table that initially does not have any statistics. The
+        <code class="ph codeph">SHOW TABLE STATS</code> statement displays different values for
+        <code class="ph codeph">#Rows</code> before and after the <code class="ph codeph">COMPUTE STATS</code> operation.
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats (x int);
@@ -555,9 +744,10 @@ show column stats year_month_day;
 
       <p class="p">
         The following example shows a similar progression with a partitioned table. Initially,
-        <code class="ph codeph">#Rows</code> is <code class="ph codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code> operation,
-        <code class="ph codeph">#Rows</code> changes to an accurate value. Any newly added partition starts with no statistics,
-        meaning that you must collect statistics after adding a new partition.
+        <code class="ph codeph">#Rows</code> is <code class="ph codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code>
+        operation, <code class="ph codeph">#Rows</code> changes to an accurate value. Any newly added
+        partition starts with no statistics, meaning that you must collect statistics after
+        adding a new partition.
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats_partitioned (x int) partitioned by (year smallint);
@@ -592,20 +782,22 @@ show column stats year_month_day;
 </code></pre>
 
       <div class="note note note_note"><span class="note__title notetitle">Note:</span> 
-        Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates statistics for all
-        partitions in a table, if you expect to frequently add new partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL
-        STATS</code> syntax instead, which lets you compute stats for a single specified partition, or only for
-        those partitions that do not already have incremental stats.
+        Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates
+        statistics for all partitions in a table, if you expect to frequently add new
+        partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax instead, which
+        lets you compute stats for a single specified partition, or only for those partitions
+        that do not already have incremental stats.
       </div>
 
       <p class="p">
-        If checking each individual table is impractical, due to a large number of tables or views that hide the
-        underlying base tables, you can also check for missing statistics for a particular query. Use the
-        <code class="ph codeph">EXPLAIN</code> statement to preview query efficiency before actually running the query. Use the
-        query profile output available through the <code class="ph codeph">PROFILE</code> command in
-        <span class="keyword cmdname">impala-shell</span> or the web UI to verify query execution and timing after running the query.
-        Both the <code class="ph codeph">EXPLAIN</code> plan and the <code class="ph codeph">PROFILE</code> output display a warning if any
-        tables or partitions involved in the query do not have statistics.
+        If checking each individual table is impractical, due to a large number of tables or
+        views that hide the underlying base tables, you can also check for missing statistics
+        for a particular query. Use the <code class="ph codeph">EXPLAIN</code> statement to preview query
+        efficiency before actually running the query. Use the query profile output available
+        through the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> or the
+        web UI to verify query execution and timing after running the query. Both the
+        <code class="ph codeph">EXPLAIN</code> plan and the <code class="ph codeph">PROFILE</code> output display a warning
+        if any tables or partitions involved in the query do not have statistics.
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; create table no_stats (x int);
@@ -631,10 +823,11 @@ show column stats year_month_day;
 </code></pre>
 
       <p class="p">
-        Because Impala uses the <dfn class="term">partition pruning</dfn> technique when possible to only evaluate certain
-        partitions, if you have a partitioned table with statistics for some partitions and not others, whether or
-        not the <code class="ph codeph">EXPLAIN</code> statement shows the warning depends on the actual partitions used by the
-        query. For example, you might see warnings or not for different queries against the same table:
+        Because Impala uses the <dfn class="term">partition pruning</dfn> technique when possible to only
+        evaluate certain partitions, if you have a partitioned table with statistics for some
+        partitions and not others, whether or not the <code class="ph codeph">EXPLAIN</code> statement shows
+        the warning depends on the actual partitions used by the query. For example, you might
+        see warnings or not for different queries against the same table:
       </p>
 
 <pre class="pre codeblock"><code>-- No warning because all the partitions for the year 2012 have stats.
@@ -646,73 +839,36 @@ EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
 </code></pre>
 
       <p class="p">
-        To confirm if any partitions at all in the table are missing statistics, you might explain a query that
-        scans the entire table, such as <code class="ph codeph">SELECT COUNT(*) FROM <var class="keyword varname">table_name</var></code>.
-      </p>
-    </div>
-  </article>
-
-  <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="perf_stats__perf_stats_collecting">
-
-    <h2 class="title topictitle2" id="ariaid-title8">Keeping Statistics Up to Date</h2>
-
-    <div class="body conbody">
-
-      <p class="p">
-        When the contents of a table or partition change significantly, recompute the stats for the relevant table
-        or partition. The degree of change that qualifies as <span class="q">"significant"</span> varies, depending on the absolute
-        and relative sizes of the tables. Typically, if you add more than 30% more data to a table, it is
-        worthwhile to recompute stats, because the differences in number of rows and number of distinct values
-        might cause Impala to choose a different join order when that table is used in join queries. This guideline
-        is most important for the largest tables. For example, adding 30% new data to a table containing 1 TB has a
-        greater effect on join order than adding 30% to a table containing only a few megabytes, and the larger
-        table has a greater effect on query performance if Impala chooses a suboptimal join order as a result of
-        outdated statistics.
-      </p>
-
-      <p class="p">
-        If you reload a complete new set of data for a table, but the number of rows and number of distinct values
-        for each column is relatively unchanged from before, you do not need to recompute stats for the table.
+        To confirm if any partitions at all in the table are missing statistics, you might
+        explain a query that scans the entire table, such as <code class="ph codeph">SELECT COUNT(*) FROM
+        <var class="keyword varname">table_name</var></code>.
       </p>
 
-      <p class="p">
-        If the statistics for a table are out of date, and the table's large size makes it impractical to recompute
-        new stats immediately, you can use the <code class="ph codeph">DROP STATS</code> statement to remove the obsolete
-        statistics, making it easier to identify tables that need a new <code class="ph codeph">COMPUTE STATS</code> operation.
-      </p>
-
-      <p class="p">
-        For a large partitioned table, consider using the incremental stats feature available in Impala 2.1.0 and
-        higher, as explained in <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Overview of Incremental Statistics</a>. If you add a new
-        partition to a table, it is worthwhile to recompute incremental stats, because the operation only scans the
-        data for that one new partition.
-      </p>
     </div>
-  </article>
-
-
-
-  
 
+  </article>
 
+  <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="perf_stats__concept_s3c_4gl_mdb">
 
-  
+    <h2 class="title topictitle2" id="ariaid-title12">Manually Setting Table and Column Statistics with ALTER TABLE</h2>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="perf_stats__perf_table_stats_manual">
+    <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="concept_s3c_4gl_mdb__concept_wpt_pgl_mdb">
 
-    <h2 class="title topictitle2" id="ariaid-title9">Setting the NUMROWS Value Manually through ALTER TABLE</h2>
+      <h3 class="title topictitle3" id="ariaid-title13">Setting Table Statistics</h3>
 
-    <div class="body conbody">
+      <div class="body conbody">
 
-      <p class="p">
-        The most crucial piece of data in all the statistics is the number of rows in the table (for an
-        unpartitioned or partitioned table) and for each partition (for a partitioned table). The <code class="ph codeph">COMPUTE STATS</code>
-        statement always gathers statistics about all columns, as well as overall table statistics. If it is not
-        practical to do a full <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
-        operation after adding a partition or inserting data, or if you can see that Impala would produce a more
-        efficient plan if the number of rows was different, you can manually set the number of rows through an
-        <code class="ph codeph">ALTER TABLE</code> statement:
-      </p>
+        <p class="p">
+          The most crucial piece of data in all the statistics is the number of rows in the
+          table (for an unpartitioned or partitioned table) and for each partition (for a
+          partitioned table). The <code class="ph codeph">COMPUTE STATS</code> statement always gathers
+          statistics about all columns, as well as overall table statistics. If it is not
+          practical to do a full <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+          STATS</code> operation after adding a partition or inserting data, or if you can see
+          that Impala would produce a more efficient plan if the number of rows were different,
+          you can manually set the number of rows through an <code class="ph codeph">ALTER TABLE</code>
+          statement:
+        </p>
 
 <pre class="pre codeblock"><code>
 -- Set total number of rows. Applies to both unpartitioned and partitioned tables.
@@ -723,11 +879,13 @@ alter table <var class="keyword varname">table_name</var> set tblproperties('num
 alter table <var class="keyword varname">table_name</var> partition (<var class="keyword varname">keycol1</var>=<var class="keyword varname">val1</var>,<var class="keyword varname">keycol2</var>=<var class="keyword varname">val2</var>...) set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
 </code></pre>
 
-      <p class="p">
-        This statement avoids re-scanning any data files. (The requirement to include the <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> property is relatively new, as a
-        result of the issue <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a>
-        for the Hive metastore.)
-      </p>
+        <p class="p">
+          This statement avoids re-scanning any data files. (The requirement to include the
+          <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> property is relatively new, as a
+          result of the issue
+          <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a>
+          for the Hive metastore.)
+        </p>
 
 <pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
 Inserted 1000000000 rows in 181.98s
@@ -737,40 +895,48 @@ Inserted 1000000 rows in 15.32s
 -- Now there are 1001000000 rows. We can update this single data point in the stats.
 alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
 
-      <p class="p">
-        For a partitioned table, update both the per-partition number of rows and the number of rows for the whole
-        table:
-      </p>
+        <p class="p">
+          For a partitioned table, update both the per-partition number of rows and the number
+          of rows for the whole table:
+        </p>
 
 <pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
 -- change the numRows property for the partition and the overall table.
 alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
 alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
 
-      <p class="p">
-        In practice, the <code class="ph codeph">COMPUTE STATS</code> statement, or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
-        for a partitioned table, should be fast and convenient enough that this technique is only useful for the very
-        largest partitioned tables.
-        
-        
-        Because the column statistics might be left in a stale state, do not use this technique as a replacement
-        for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique if all other means of collecting statistics are impractical, or as a
-        low-overhead operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
-      </p>
-    </div>
-  </article>
+        <p class="p">
+          In practice, the <code class="ph codeph">COMPUTE STATS</code> statement, or <code class="ph codeph">COMPUTE
+          INCREMENTAL STATS</code> for a partitioned table, should be fast and convenient
+          enough that this technique is only useful for the very largest partitioned tables.
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="perf_stats__perf_column_stats_manual">
-    <h2 class="title topictitle2" id="ariaid-title10">Setting Column Stats Manually through ALTER TABLE</h2>
-    <div class="body conbody">
-      <p class="p">
-        In <span class="keyword">Impala 2.6</span> and higher, you can also use the <code class="ph codeph">SET COLUMN STATS</code>
-        clause of <code class="ph codeph">ALTER TABLE</code> to manually set or change column statistics.
-        Only use this technique in cases where it is impractical to run
-        <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
-        frequently enough to keep up with data changes for a huge table.
-      </p>
-      <div class="p">
+
+          Because the column statistics might be left in a stale state, do not use this
+          technique as a replacement for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique
+          if all other means of collecting statistics are impractical, or as a low-overhead
+          operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or
+          <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
+        </p>
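+
+        <p class="p">
+          As a rough sketch of that in-between workflow, the statements below reuse the
+          illustrative <code class="ph codeph">partitioned_data</code> table and partition values from
+          the preceding example. They assume that you first want to confirm the manually set
+          row counts, and later replace them with computed figures when a full scan of the
+          new partition becomes practical.
+        </p>
+
+<pre class="pre codeblock"><code>-- Sketch only: table name and partition values are the illustrative ones used above.
+-- Check that the manually set numRows values show up in the #Rows column.
+show table stats partitioned_data;
+
+-- Column statistics are not touched by the numRows change and might be stale.
+show column stats partitioned_data;
+
+-- Later, when scanning the new partition is practical, replace the manual
+-- figure with computed statistics for that partition.
+compute incremental stats partitioned_data partition (year=2009, month=4);</code></pre>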
+
+      </div>
+
+    </article>
+
+    <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="concept_s3c_4gl_mdb__concept_asb_vgl_mdb">
+
+      <h3 class="title topictitle3" id="ariaid-title14">Setting Column Statistics</h3>
+
+      <div class="body conbody">
+
+        <p class="p">
+          In <span class="keyword">Impala 2.6</span> and higher, you can also use the <code class="ph codeph">SET
+          COLUMN STATS</code> clause of <code class="ph codeph">ALTER TABLE</code> to manually set or change
+          column statistics. Only use this technique in cases where it is impractical to run
+          <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+          frequently enough to keep up with data changes for a huge table.
+        </p>
+
+        <div class="p">
         You specify a case-insensitive symbolic name for the kind of statistics:
         <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
         The key names and values are both quoted. This operation applies to an entire table,
@@ -796,30 +962,37 @@ show column stats t1;
 +--------+--------+------------------+--------+----------+----------+
 </code></pre>
       </div>
-    </div>
+
+      </div>
+
+    </article>
+
   </article>
 
-  <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_examples">
+  <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="perf_stats__perf_stats_examples">
 
-    <h2 class="title topictitle2" id="ariaid-title11">Examples of Using Table and Column Statistics with Impala</h2>
+    <h2 class="title topictitle2" id="ariaid-title15">Examples of Using Table and Column Statistics with Impala</h2>
 
     <div class="body conbody">
 
       <p class="p">
-        The following examples walk through a sequence of <code class="ph codeph">SHOW TABLE STATS</code>, <code class="ph codeph">SHOW COLUMN
-        STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code>
-        statements to illustrate various aspects of how Impala uses statistics to help optimize queries.
+        The following examples walk through a sequence of <code class="ph codeph">SHOW TABLE STATS</code>,
+        <code class="ph codeph">SHOW COLUMN STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and
+        <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements to illustrate various
+        aspects of how Impala uses statistics to help optimize queries.
       </p>
 
       <p class="p">
-        This example shows table and column statistics for the <code class="ph codeph">STORE</code> column used in the
-        <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS benchmarks for decision
-        support</a> systems. It is a tiny table holding data for 12 stores. Initially, before any statistics are
-        gathered by a <code class="ph codeph">COMPUTE STATS</code> statement, most of the numeric fields show placeholder values
-        of -1, indicating that the figures are unknown. The figures that are filled in are values that are easily
-        countable or deducible at the physical level, such as the number of files, total data size of the files,
-        and the maximum and average sizes for data types that have a constant size such as <code class="ph codeph">INT</code>,
-        <code class="ph codeph">FLOAT</code>, and <code class="ph codeph">TIMESTAMP</code>.
+        This example shows table and column statistics for the <code class="ph codeph">STORE</code> table
+        used in the <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS
+        benchmarks for decision support</a> systems. It is a tiny table holding data for 12
+        stores. Initially, before any statistics are gathered by a <code class="ph codeph">COMPUTE
+        STATS</code> statement, most of the numeric fields show placeholder values of -1,
+        indicating that the figures are unknown. The figures that are filled in are values that
+        are easily countable or deducible at the physical level, such as the number of files,
+        total data size of the files, and the maximum and average sizes for data types that have
+        a constant size such as <code class="ph codeph">INT</code>, <code class="ph codeph">FLOAT</code>, and
+        <code class="ph codeph">TIMESTAMP</code>.
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; show table stats store;
@@ -866,11 +1039,13 @@ Returned 1 row(s) in 0.03s
 Returned 29 row(s) in 0.04s</code></pre>
 
       <p class="p">
-        With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement for column statistics, you had to specify each
-        column for which to gather statistics. The Impala <code class="ph codeph">COMPUTE STATS</code> statement automatically
-        gathers statistics for all columns, because it reads through the entire table relatively quickly and can
-        efficiently compute the values for all the columns. This example shows how after running the
-        <code class="ph codeph">COMPUTE STATS</code> statement, statistics are filled in for both the table and all its columns:
+        With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement for column statistics, you had to
+        specify each column for which to gather statistics. The Impala <code class="ph codeph">COMPUTE
+        STATS</code> statement automatically gathers statistics for all columns, because it
+        reads through the entire table relatively quickly and can efficiently compute the values
+        for all the columns. This example shows how after running the <code class="ph codeph">COMPUTE
+        STATS</code> statement, statistics are filled in for both the table and all its
+        columns:
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats store;
@@ -924,11 +1099,12 @@ Returned 1 row(s) in 0.02s
 Returned 29 row(s) in 0.04s</code></pre>
 
       <p class="p">
-        The following example shows how statistics are represented for a partitioned table. In this case, we have
-        set up a table to hold the world's most trivial census data, a single <code class="ph codeph">STRING</code> field,
-        partitioned by a <code class="ph codeph">YEAR</code> column. The table statistics include a separate entry for each
-        partition, plus final totals for the numeric fields. The column statistics include some easily deducible
-        facts for the partitioning column, such as the number of distinct values (the number of partition
+        The following example shows how statistics are represented for a partitioned table. In
+        this case, we have set up a table to hold the world's most trivial census data, a single
+        <code class="ph codeph">STRING</code> field, partitioned by a <code class="ph codeph">YEAR</code> column. The table
+        statistics include a separate entry for each partition, plus final totals for the
+        numeric fields. The column statistics include some easily deducible facts for the
+        partitioning column, such as the number of distinct values (the number of partition
         subdirectories).
 
       </p>
@@ -965,8 +1141,8 @@ Returned 8 row(s) in 0.02s
 Returned 2 row(s) in 0.02s</code></pre>
 
       <p class="p">
-        The following example shows how the statistics are filled in by a <code class="ph codeph">COMPUTE STATS</code> statement
-        in Impala.
+        The following example shows how the statistics are filled in by a <code class="ph codeph">COMPUTE
+        STATS</code> statement in Impala.
       </p>
 
 <pre class="pre codeblock"><code>[localhost:21000] &gt; compute stats census;
@@ -1000,13 +1176,17 @@ Returned 8 row(s) in 0.02s
 Returned 2 row(s) in 0.02s</code></pre>
 
       <p class="p">
-        For examples showing how some queries work differently when statistics are available, see
-        <a class="xref" href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order Optimization</a>. You can see how Impala executes a query
-        differently in each case by observing the <code class="ph codeph">EXPLAIN</code> output before and after collecting
-        statistics. Measure the before and after query times, and examine the throughput numbers in before and
-        after <code class="ph codeph">SUMMARY</code> or <code class="ph codeph">PROFILE</code> output, to verify how much the improved plan
-        speeds up performance.
+        For examples showing how some queries work differently when statistics are available,
+        see <a class="xref" href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order Optimization</a>. You can see how Impala
+        executes a query differently in each case by observing the <code class="ph codeph">EXPLAIN</code>
+        output before and after collecting statistics. Measure the query times before and after,
+        and compare the throughput numbers in the <code class="ph codeph">SUMMARY</code> or
+        <code class="ph codeph">PROFILE</code> output from each run, to verify how much the improved
+        plan speeds up the query.
       </p>
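+
+      <p class="p">
+        One possible shape for such a before-and-after comparison in
+        <code class="ph codeph">impala-shell</code> is sketched below. The query is a hypothetical
+        stand-in for a real workload query, and the sketch assumes a table that has no
+        statistics yet; <code class="ph codeph">store</code> is used only as a familiar name.
+      </p>
+
+<pre class="pre codeblock"><code>-- Sketch only: substitute a representative query from your own workload.
+-- Plan and timing before statistics are collected.
+explain select count(*) from store;
+select count(*) from store;
+summary;
+
+-- Gather table and column statistics.
+compute stats store;
+
+-- Plan and timing after statistics are collected; compare with the earlier output.
+explain select count(*) from store;
+select count(*) from store;
+summary;</code></pre>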
+
     </div>
+
   </article>
-</article></main></body></html>
\ No newline at end of file
+
+</article></main></body></html>

http://git-wip-us.apache.org/repos/asf/impala/blob/592fd85d/docs/build/html/topics/impala_perf_testing.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_perf_testing.html b/docs/build/html/topics/impala_perf_testing.html
index 6d312dc..1ecf66f 100644
--- a/docs/build/html/topics/impala_perf_testing.html
+++ b/docs/build/html/topics/impala_perf_testing.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Testing Impala Performance</h1>
   

http://git-wip-us.apache.org/repos/asf/impala/blob/592fd85d/docs/build/html/topics/impala_performance.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_performance.html b/docs/build/html/topics/impala_performance.html
index a8e7efb..7ff8e5c 100644
--- a/docs/build/html/topics/impala_performance.html
+++ b/docs/build/html/topics/impala_performance.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/im
 pala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/im
 pala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for Performance</h1>
   

http://git-wip-us.apache.org/repos/asf/impala/blob/592fd85d/docs/build/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build/html/topics/impala_planning.html b/docs/build/html/topics/impala_planning.html
index d4fe262..cedbd58 100644
--- a/docs/build/html/topics/impala_planning.html
+++ b/docs/build/html/topics/impala_planning.html
@@ -1,6 +1,6 @@
 <!DOCTYPE html
   SYSTEM "about:legacy-compat">
-<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_cluster_sizing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.8.x"><meta name="version" content="Impala 2.8.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 2.12x"><meta name="version" content="Impala 2.12x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
 
   <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
   
@@ -17,4 +17,4 @@
 
     <p class="p toc"></p>
   </div>
-<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_cluster_sizing.html">Cluster Sizing Guidelines for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
\ No newline at end of file