Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:23 UTC
[14/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_hdfs_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_hdfs_caching.html b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
new file mode 100644
index 0000000..596675d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
@@ -0,0 +1,578 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hdfs_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using HDFS Caching with Impala (Impala 2.1 or higher only)</title></head><body id="hdfs_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using HDFS Caching with Impala (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HDFS caching provides performance and scalability benefits in production environments where Impala queries
+ and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes,
+ making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory.
+ Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when
+ using data from the Linux OS cache.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ On a small or lightly loaded cluster, HDFS caching might not produce any speedup. It might even lead to
+      slower queries if I/O read operations that were previously performed in parallel across the entire cluster are replaced
+      by in-memory operations on a smaller number of hosts. The hosts where the HDFS blocks are cached can become
+ bottlenecks because they experience high CPU load while processing the cached data blocks, while other hosts remain idle.
+ Therefore, always compare performance with and without this feature enabled, using a realistic workload.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, you can spread the CPU load more evenly by specifying the <code class="ph codeph">WITH REPLICATION</code>
+ clause of the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ This clause lets you control the replication factor for
+ HDFS caching for a specific table or partition. By default, each cached block is
+ only present on a single host, which can lead to CPU contention if the same host
+ processes each cached block. Increasing the replication factor lets Impala choose
+ different hosts to process different cached blocks, to better distribute the CPU load.
+ Always use a <code class="ph codeph">WITH REPLICATION</code> setting of at least 3, and adjust upward
+ if necessary to match the replication factor for the underlying HDFS data files.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala automatically randomizes which host processes
+ a cached HDFS block, to avoid CPU hotspots. For tables where HDFS caching is not applied,
+ Impala designates which host to process a data block using an algorithm that estimates
+ the load on each host. If CPU hotspots still arise during queries,
+ you can enable additional randomization for the scheduling algorithm for non-HDFS cached data
+ by setting the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
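+    <p class="p">
+      For example, a minimal sketch of enabling that option for a session in <span class="keyword cmdname">impala-shell</span>:
+    </p>
+<pre class="pre codeblock"><code>set schedule_random_replica=true;
+</code></pre>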
+ </div>
+
+ <p class="p toc inpage"></p>
+
+
+
+ <p class="p">
+ For background information about how to set up and manage HDFS caching for a <span class="keyword"></span> cluster, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="hdfs_caching__hdfs_caching_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of HDFS Caching for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that
+ repeated queries can take advantage of data <span class="q">"pinned"</span> in memory regardless of how much data is
+ processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be
+ pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This
+ technique is suitable for tables or partitions that are frequently accessed and are small enough to fit
+ entirely within the HDFS memory cache. For example, you might designate several dimension tables to be
+ pinned in the cache, to speed up many different join queries that reference them. Or in a partitioned
+ table, you might pin a partition holding data from the most recent time period because that data will be
+ queried intensively; then when the next set of data arrives, you could unpin the previous partition and pin
+ the partition holding the new data.
+ </p>
+
+ <p class="p">
+ Because this Impala performance feature relies on HDFS infrastructure, it only applies to Impala tables
+ that use HDFS data files. HDFS caching for Impala does not apply to HBase tables, S3 tables,
+ Kudu tables,
+ or Isilon tables.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="hdfs_caching__hdfs_caching_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Setting Up HDFS Caching for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To use HDFS caching with Impala, first set up that feature for your <span class="keyword"></span> cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Decide how much memory to devote to the HDFS cache on each host. Remember that the total memory available
+ for cached data is the sum of the cache sizes on all the hosts. By default, any data block is only cached on one
+ host, although you can cache a block across multiple hosts by increasing the replication factor.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p">
+ Issue <span class="keyword cmdname">hdfs cacheadmin</span> commands to set up one or more cache pools, owned by the same
+ user as the <span class="keyword cmdname">impalad</span> daemon (typically <code class="ph codeph">impala</code>). For example:
+<pre class="pre codeblock"><code>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
+</code></pre>
+ For details about the <span class="keyword cmdname">hdfs cacheadmin</span> command, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </div>
+ </li>
+ </ul>
+
+ <p class="p">
+ Once HDFS caching is enabled and one or more pools are available, see
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching_ddl">Enabling HDFS Caching for Impala Tables and Partitions</a> for how to choose which Impala data to load
+ into the HDFS cache. On the Impala side, you specify the cache pool name defined by the <code class="ph codeph">hdfs
+ cacheadmin</code> command in the Impala DDL statements that enable HDFS caching for a table or partition,
+ such as <code class="ph codeph">CREATE TABLE ... CACHED IN <var class="keyword varname">pool</var></code> or <code class="ph codeph">ALTER TABLE ... SET
+ CACHED IN <var class="keyword varname">pool</var></code>.
+ </p>
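+    <p class="p">
+      For example, a sketch of both forms, using the <code class="ph codeph">four_gig_pool</code> pool created above
+      and hypothetical table names:
+    </p>
+<pre class="pre codeblock"><code>-- Cache a new table from the moment it is created.
+create table lookup_dim (id int, name string)
+  cached in 'four_gig_pool';
+
+-- Cache an existing table.
+alter table other_dim set cached in 'four_gig_pool';
+</code></pre>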
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="hdfs_caching__hdfs_caching_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling HDFS Caching for Impala Tables and Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Begin by choosing which tables or partitions to cache. For example, these might be lookup tables that are
+ accessed by many different join queries, or partitions corresponding to the most recent time period that
+ are analyzed by different reports or ad hoc queries.
+ </p>
+
+ <p class="p">
+ In your SQL statements, you specify logical divisions such as tables and partitions to be cached. Impala
+ translates these requests into HDFS-level directives that apply to particular directories and files. For
+ example, given a partitioned table <code class="ph codeph">CENSUS</code> with a partition key column
+ <code class="ph codeph">YEAR</code>, you could choose to cache all or part of the data as follows:
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+      When the cache replication factor for a data block is greater than 1, Impala randomly
+      selects a host that has a cached copy of that block to process it. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
+
+<pre class="pre codeblock"><code>-- Cache the entire table (all partitions).
+alter table census set cached in '<var class="keyword varname">pool_name</var>';
+
+-- Remove the entire table from the cache.
+alter table census set uncached;
+
+-- Cache a portion of the table (a single partition).
+-- If the table is partitioned by multiple columns (such as year, month, day),
+-- the ALTER TABLE command must specify values for all those columns.
+alter table census partition (year=1960) set cached in '<var class="keyword varname">pool_name</var>';
+
+<span class="ph">-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
+-- single host when the same data block is processed multiple times.
+alter table census partition (year=1970)
+ set cached in '<var class="keyword varname">pool_name</var>' with replication = 4;</span>
+
+-- At each stage, check the volume of cached data.
+-- For large tables or partitions, the background loading might take some time,
+-- so you might have to wait and reissue the statement until all the data
+-- has finished being loaded into the cache.
+show table stats census;
++-------+-------+--------+------+--------------+--------+
+| year | #Rows | #Files | Size | Bytes Cached | Format |
++-------+-------+--------+------+--------------+--------+
+| 1900 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1940 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1960 | -1 | 1 | 11B | 11B | TEXT |
+| 1970 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| Total | -1 | 4 | 44B | 11B | |
++-------+-------+--------+------+--------------+--------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE considerations:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature affects the Impala <code class="ph codeph">CREATE TABLE</code> statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can put a <code class="ph codeph">CACHED IN '<var class="keyword varname">pool_name</var>'</code> clause
+ <span class="ph">and optionally a <code class="ph codeph">WITH REPLICATION = <var class="keyword varname">number_of_hosts</var></code> clause</span>
+ at the end of a
+ <code class="ph codeph">CREATE TABLE</code> statement to automatically cache the entire contents of the table,
+ including any partitions added later. The <var class="keyword varname">pool_name</var> is a pool that you previously set
+ up with the <span class="keyword cmdname">hdfs cacheadmin</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Once a table is designated for HDFS caching through the <code class="ph codeph">CREATE TABLE</code> statement, if new
+ partitions are added later through <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements, the data in
+ those new partitions is automatically cached in the same pool.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you want to perform repetitive queries on a subset of data from a large table, and it is not practical
+ to designate the entire table or specific partitions for HDFS caching, you can create a new cached table
+ with just a subset of the data by using <code class="ph codeph">CREATE TABLE ... CACHED IN '<var class="keyword varname">pool_name</var>'
+          AS SELECT ... WHERE ...</code>. When you are finished generating reports from this subset of data,
+          drop the table; both the data files and the data cached in RAM are automatically deleted.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for the full syntax.
+ </p>
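+    <p class="p">
+      For example, the subset-caching technique from the last consideration above might look like the
+      following sketch (the table name, pool name, and filter are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>-- Cache only the rows needed for a series of reports.
+create table census_recent cached in 'pool_name'
+  as select * from census where year >= 1960;
+
+-- When the reports are finished, the data files and the cached data are removed together.
+drop table census_recent;
+</code></pre>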
+
+ <p class="p">
+ <strong class="ph b">Other memory considerations:</strong>
+ </p>
+
+ <p class="p">
+ Certain DDL operations, such as <code class="ph codeph">ALTER TABLE ... SET LOCATION</code>, are blocked while the
+ underlying HDFS directories contain cached files. You must uncache the files first, before changing the
+ location, dropping the table, and so on.
+ </p>
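+    <p class="p">
+      For example, a sketch of the required sequence, with a hypothetical table name and path:
+    </p>
+<pre class="pre codeblock"><code>-- Remove the cache directives first...
+alter table t1 set uncached;
+-- ...then the previously blocked DDL operation succeeds.
+alter table t1 set location '/user/impala/t1_relocated';
+</code></pre>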
+
+ <p class="p">
+ When data is requested to be pinned in memory, that process happens in the background without blocking
+ access to the data while the caching is in progress. Loading the data from disk could take some time.
+ Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
+ been pinned yet. When files are added to a table or partition whose contents are cached, Impala
+ automatically detects those changes and performs a <code class="ph codeph">REFRESH</code> automatically once the relevant
+ data is cached.
+ </p>
+
+ <p class="p">
+ The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota
+ that is enforced by the underlying HDFS service. Before requesting to pin an Impala table or partition in
+ memory, check that its size does not exceed this quota.
+ </p>
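+    <p class="p">
+      As a sketch, you can check the limit and current usage of a cache pool from the command line
+      (the pool name is hypothetical; verify the exact options in the documentation for your Apache Hadoop
+      distribution), and compare them against the size reported by <code class="ph codeph">SHOW TABLE STATS</code>:
+    </p>
+<pre class="pre codeblock"><code>hdfs cacheadmin -listPools -stats four_gig_pool
+</code></pre>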
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the HDFS cache consists of combined memory from all the DataNodes in the cluster, cached tables or
+ partitions can be bigger than the amount of HDFS cache memory on any single host.
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="hdfs_caching__hdfs_caching_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading and Removing Data with HDFS Caching Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When HDFS caching is enabled, extra processing happens in the background when you add or remove data
+ through statements such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Inserting or loading data:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When Impala performs an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> or
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code> statement for a table or
+ partition that is cached, the new data files are automatically cached and Impala recognizes that fact
+ automatically.
+ </li>
+
+ <li class="li">
+ If you perform an <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> through Hive, as always, Impala
+ only recognizes the new data files after a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement in Impala.
+ </li>
+
+ <li class="li">
+ If the cache pool is entirely full, or becomes full before all the requested data can be cached, the
+ Impala DDL statement returns an error. This is to avoid situations where only some of the requested data
+ could be cached.
+ </li>
+
+ <li class="li">
+ When HDFS caching is enabled for a table or partition, new data files are cached automatically when they
+ are added to the appropriate directory in HDFS, without the need for a <code class="ph codeph">REFRESH</code> statement
+ in Impala. Impala automatically performs a <code class="ph codeph">REFRESH</code> once the new data is loaded into the
+ HDFS cache.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Dropping tables, partitions, or cache pools:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE</a></code> and
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... DROP PARTITION</a></code>
+ statements as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you issue a <code class="ph codeph">DROP TABLE</code> for a table that is entirely cached, or has some partitions
+ cached, the <code class="ph codeph">DROP TABLE</code> succeeds and all the cache directives Impala submitted for that
+ table are removed from the HDFS cache system.
+ </li>
+
+ <li class="li">
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code>. The operation succeeds and any cache
+ directives are removed.
+ </li>
+
+ <li class="li">
+ As always, the underlying data files are removed if the dropped table is an internal table, or the
+ dropped partition is in its default location underneath an internal table. The data files are left alone
+ if the dropped table is an external table, or if the dropped partition is in a non-default location.
+ </li>
+
+ <li class="li">
+ If you designated the data files as cached through the <span class="keyword cmdname">hdfs cacheadmin</span> command, and
+ the data files are left behind as described in the previous item, the data files remain cached. Impala
+ only removes the cache directives submitted by Impala through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements. It is OK to have multiple redundant cache directives pertaining
+ to the same files; the directives all have unique IDs and owners so that the system can tell them apart.
+ </li>
+
+ <li class="li">
+ If you drop an HDFS cache pool through the <span class="keyword cmdname">hdfs cacheadmin</span> command, all the Impala
+ data files are preserved, just no longer cached. After a subsequent <code class="ph codeph">REFRESH</code>,
+ <code class="ph codeph">SHOW TABLE STATS</code> reports 0 bytes cached for each associated Impala table or partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Relocating a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... SET LOCATION</a></code>
+ statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you have designated a table or partition as cached through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements, subsequent attempts to relocate the table or partition through
+ an <code class="ph codeph">ALTER TABLE ... SET LOCATION</code> statement will fail. You must issue an <code class="ph codeph">ALTER
+ TABLE ... SET UNCACHED</code> statement for the table or partition first. Otherwise, Impala would lose
+ track of some cached data files and have no way to uncache them later.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="hdfs_caching__hdfs_caching_admin">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Administration for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are the guidelines and steps to check or change the status of HDFS caching for Impala data:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">hdfs cacheadmin command:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you drop a cache pool with the <span class="keyword cmdname">hdfs cacheadmin</span> command, Impala queries against the
+ associated data files will still work, by falling back to reading the files from disk. After performing a
+ <code class="ph codeph">REFRESH</code> on the table, Impala reports the number of bytes cached as 0 for all associated
+ tables and partitions.
+ </li>
+
+ <li class="li">
+        You might use <span class="keyword cmdname">hdfs cacheadmin</span> to get a list of existing cache directives, or detailed
+        information about them, as follows:
+<pre class="pre codeblock"><code>hdfs cacheadmin -listDirectives # Basic info
+Found 122 entries
+ ID POOL REPL EXPIRY PATH
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01
+...
+
+hdfs cacheadmin -listDirectives -stats # More details
+Found 122 entries
+ ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales 0 0 0 0
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15 143169 143169 1 1
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01 112447 112447 1 1
+...
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala SHOW statement:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For each table or partition, the <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code>
+ statement displays the number of bytes currently cached by the HDFS caching feature. If there are no
+ cache directives in place for that table or partition, the result set displays <code class="ph codeph">NOT
+ CACHED</code>. A value of 0, or a smaller number than the overall size of the table or partition,
+ indicates that the cache request has been submitted but the data has not been entirely loaded into memory
+ yet. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala memory limits:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the Impala memory limits as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The maximum size of each HDFS cache pool is specified externally to Impala, through the <span class="keyword cmdname">hdfs
+ cacheadmin</span> command.
+ </li>
+
+ <li class="li">
+ All the memory used for HDFS caching is separate from the <span class="keyword cmdname">impalad</span> daemon address space
+ and does not count towards the limits of the <code class="ph codeph">--mem_limit</code> startup option,
+ <code class="ph codeph">MEM_LIMIT</code> query option, or further limits imposed through YARN resource management or
+ the Linux <code class="ph codeph">cgroups</code> mechanism.
+ </li>
+
+ <li class="li">
+ Because accessing HDFS cached data avoids a memory-to-memory copy operation, queries involving cached
+ data require less memory on the Impala side than the equivalent queries on uncached data. In addition to
+ any performance benefits in a single-user environment, the reduced memory helps to improve scalability
+ under high-concurrency workloads.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="hdfs_caching__hdfs_caching_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Performance Considerations for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 1.4.0 and higher, Impala supports efficient reads from data that is pinned in memory through HDFS
+ caching. Impala takes advantage of the HDFS API and reads the data from memory rather than from disk
+ whether the data files are pinned using Impala DDL statements, or using the command-line mechanism where
+ you specify HDFS paths.
+ </p>
+
+ <p class="p">
+ When you examine the output of the <span class="keyword cmdname">impala-shell</span> <span class="keyword cmdname">SUMMARY</span> command, or
+ look in the metrics report for the <span class="keyword cmdname">impalad</span> daemon, you see how many bytes are read from
+ the HDFS cache. For example, this excerpt from a query profile illustrates that all the data read during a
+ particular phase of the query came from the HDFS cache, because the <code class="ph codeph">BytesRead</code> and
+ <code class="ph codeph">BytesReadDataNodeCache</code> values are identical.
+ </p>
+
+<pre class="pre codeblock"><code>HDFS_SCAN_NODE (id=0):(Total: 11s114ms, non-child: 11s114ms, % non-child: 100.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 32.75
+<strong class="ph b"> - BytesRead: 10.47 GB (11240756479)
+ - BytesReadDataNodeCache: 10.47 GB (11240756479)</strong>
+ - BytesReadLocal: 10.47 GB (11240756479)
+ - BytesReadShortCircuit: 10.47 GB (11240756479)
+ - DecompressionTime: 27s572ms
+</code></pre>
+
+ <p class="p">
+ For queries involving smaller amounts of data, or in single-user workloads, you might not notice a
+ significant difference in query response time with or without HDFS caching. Even with HDFS caching turned
+ off, the data for the query might still be in the Linux OS buffer cache. The benefits become clearer as
+ data volume increases, and especially as the system processes more concurrent queries. HDFS caching
+ improves the scalability of the overall system. That is, it prevents query performance from declining when
+ the workload outstrips the capacity of the Linux OS cache.
+ </p>
+
+ <p class="p">
+ Due to a limitation of HDFS, zero-copy reads are not supported with
+ encryption. Where practical, avoid HDFS caching for Impala data
+ files in encryption zones. The queries fall back to the normal read
+ path during query execution, which might cause some performance overhead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SELECT considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement and query performance as
+ follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala automatically reads from memory any data that has been designated as cached and actually loaded
+ into the HDFS cache. (It could take some time after the initial request to fully populate the cache for a
+        large table or one with many partitions.) The speedup comes from two aspects: reading from RAM instead
+ of disk, and accessing the data straight from the cache area instead of copying from one RAM area to
+ another. This second aspect yields further performance improvement over the standard OS caching
+ mechanism, which still results in memory-to-memory copying of cached data.
+ </li>
+
+ <li class="li">
+ For small amounts of data, the query speedup might not be noticeable in terms of wall clock time. The
+ performance might be roughly the same with HDFS caching turned on or off, due to recently used data being
+ held in the Linux OS cache. The difference is more pronounced with:
+ <ul class="ul">
+ <li class="li">
+ Data volumes (for all queries running concurrently) that exceed the size of the Linux OS cache.
+ </li>
+
+ <li class="li">
+ A busy cluster running many concurrent queries, where the reduction in memory-to-memory copying and
+ overall memory usage during queries results in greater scalability and throughput.
+ </li>
+        </ul>
+        <p class="p">
+          Thus, to really exercise and benchmark this feature in a development environment, you might need to
+          simulate realistic workloads and concurrent queries that match your production environment.
+        </p>
+        <p class="p">
+          One way to simulate a heavy workload on a lightly loaded system is to flush the OS buffer cache (on
+          each DataNode) between iterations of queries against the same tables or partitions:
+        </p>
+<pre class="pre codeblock"><code>$ sync
+$ echo 1 > /proc/sys/vm/drop_caches
+</code></pre>
+ </li>
+
+ <li class="li">
+ Impala queries take advantage of HDFS cached data regardless of whether the cache directive was issued by
+ Impala or externally through the <span class="keyword cmdname">hdfs cacheadmin</span> command, for example for an external
+ table where the cached data files might be accessed by several different Hadoop components.
+ </li>
+
+ <li class="li">
+ If your query returns a large result set, the time reported for the query could be dominated by the time
+ needed to print the results on the screen. To measure the time for the underlying query processing, query
+ the <code class="ph codeph">COUNT()</code> of the big result set, which does all the same processing but only prints a
+        single line to the screen, as in the sketch after this list.
+ </li>
+ </ul>
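+    <p class="p">
+      For example, a hypothetical sketch of that technique, wrapping the original query in a
+      <code class="ph codeph">COUNT()</code>:
+    </p>
+<pre class="pre codeblock"><code>-- Original query: printing the large result set dominates the elapsed time.
+select * from census where year = 1970;
+
+-- Timing-friendly version: the same scan and filtering, but only one line of output.
+select count(*) from (select * from census where year = 1970) t;
+</code></pre>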
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_joins.html b/docs/build3x/html/topics/impala_perf_joins.html
new file mode 100644
index 0000000..7def5b4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_joins.html
@@ -0,0 +1,508 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Performance Considerations for Join Queries</title></head><body id="perf_joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Performance Considerations for Join Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Queries involving join operations often require more tuning than queries that refer to only one table. The
+ maximum size of the result set from a join query is the product of the number of rows in all the joined
+ tables. When joining several tables with millions or billions of rows, any missed opportunity to filter the
+ result set, or other inefficiency in the query, could lead to an operation that does not finish in a
+ practical time and has to be cancelled.
+ </p>
+
+ <p class="p">
+ The simplest technique for tuning an Impala join query is to collect statistics on each table involved in the
+ join using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS</a></code>
+ statement, and then let Impala automatically optimize the query based on the size of each table, number of
+ distinct values of each column, and so on. The <code class="ph codeph">COMPUTE STATS</code> statement and the join
+ optimization are new features introduced in Impala 1.2.2. For accurate statistics about each table, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement after loading the data into that table, and again if the amount of
+ data changes substantially due to an <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, adding a partition,
+ and so on.
+ </p>
+
+ <p class="p">
+ If statistics are not available for all the tables in the join query, or if Impala chooses a join order that
+ is not the most efficient, you can override the automatic join order optimization by specifying the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code>
+ or <code class="ph codeph">ALL</code> keywords. In this case, Impala uses the order the tables appear in the query to guide how the
+ joins are processed.
+ </p>
+
+ <p class="p">
+ When you use the <code class="ph codeph">STRAIGHT_JOIN</code> technique, you must order the tables in the join query
+ manually instead of relying on the Impala optimizer. The optimizer uses sophisticated techniques to estimate
+ the size of the result set at each stage of the join. For manual ordering, use this heuristic approach to
+ start with, and then experiment to fine-tune the order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Specify the largest table first. This table is read from disk by each Impala node and so its size is not
+ significant in terms of memory usage during the query.
+ </li>
+
+ <li class="li">
+ Next, specify the smallest table. The contents of the second, third, and so on tables are all transmitted
+ across the network. You want to minimize the size of the result set from each subsequent stage of the join
+ query. The most likely approach involves joining a small table first, so that the result set remains small
+ even as subsequent larger tables are processed.
+ </li>
+
+ <li class="li">
+ Join the next smallest table, then the next smallest, and so on.
+ </li>
+
+ <li class="li">
+ For example, if you had tables <code class="ph codeph">BIG</code>, <code class="ph codeph">MEDIUM</code>, <code class="ph codeph">SMALL</code>, and
+ <code class="ph codeph">TINY</code>, the logical join order to try would be <code class="ph codeph">BIG</code>, <code class="ph codeph">TINY</code>,
+ <code class="ph codeph">SMALL</code>, <code class="ph codeph">MEDIUM</code>.
+ </li>
+ </ul>
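+    <p class="p">
+      A sketch of that ordering in SQL (the join columns are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>select straight_join count(*)
+  from big
+  join tiny   on (big.id = tiny.id)
+  join small  on (big.id = small.id)
+  join medium on (big.id = medium.id);
+</code></pre>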
+
+ <p class="p">
+      The terms <span class="q">"largest"</span> and <span class="q">"smallest"</span> refer to the size of the intermediate result set based on the
+ number of rows and columns from each table that are part of the result set. For example, if you join one
+ table <code class="ph codeph">sales</code> with another table <code class="ph codeph">customers</code>, a query might find results from
+ 100 different customers who made a total of 5000 purchases. In that case, you would specify <code class="ph codeph">SELECT
+ ... FROM sales JOIN customers ...</code>, putting <code class="ph codeph">customers</code> on the right side because it
+ is smaller in the context of this query.
+ </p>
+
+ <p class="p">
+ The Impala query planner chooses between different techniques for performing join queries, depending on the
+ absolute and relative sizes of the tables. <strong class="ph b">Broadcast joins</strong> are the default, where the right-hand table
+ is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes
+ involved in the query. The alternative technique is known as a <strong class="ph b">partitioned join</strong> (not related to a
+ partitioned table), which is more suitable for large tables of roughly equal size. With this technique,
+ portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in
+ parallel. The choice of broadcast or partitioned join also depends on statistics being available for all
+ tables in the join, gathered by the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, issue an <code class="ph codeph">EXPLAIN</code> statement for
+ the query. If you find that a query uses a broadcast join when you know through benchmarking that a
+ partitioned join would be more efficient, or vice versa, add a hint to the query to specify the precise join
+ mechanism to use. See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </p>
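+    <p class="p">
+      For example, a sketch of adding a hint to force a partitioned join, using hypothetical tables
+      (verify the preferred hint syntax for your Impala version):
+    </p>
+<pre class="pre codeblock"><code>explain select count(*) from big join medium where big.id = medium.id;
+
+-- If benchmarking shows that a partitioned join performs better, add a hint after the JOIN keyword:
+select count(*) from big join /* +shuffle */ medium where big.id = medium.id;
+</code></pre>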
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="perf_joins__joins_no_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Joins Are Processed when Statistics Are Unavailable</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If table or column statistics are not available for some tables in a join, Impala still reorders the tables
+ using the information that is available. Tables with statistics are placed on the left side of the join
+ order, in descending order of cost based on overall size and cardinality. Tables without statistics are
+        treated as zero-size; that is, they are always placed on the right side of the join order.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="perf_joins__straight_join">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overriding Join Reordering with STRAIGHT_JOIN</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Impala join query is inefficient because of outdated statistics or unexpected data distribution, you
+ can keep Impala from reordering the joined tables by using the <code class="ph codeph">STRAIGHT_JOIN</code> keyword
+ immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code>
+ keywords. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword turns off
+ the reordering of join clauses that Impala does internally, and produces a plan that relies on the join
+ clauses being ordered optimally in the query text. In this case, rewrite the query so that the largest
+ table is on the left, followed by the next largest, and so on until the smallest table is on the right.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+ </div>
+
+ <p class="p">
+ In this example, the subselect from the <code class="ph codeph">BIG</code> table produces a very small result set, but
+ the table might still be treated as if it were the biggest and placed first in the join order. Using
+ <code class="ph codeph">STRAIGHT_JOIN</code> for the last join clause prevents the final table from being reordered,
+ keeping it as the rightmost table in the join order.
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;
+
+-- If the query contains [DISTINCT | ALL], the hint goes after those keywords.
+select distinct straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="perf_joins__perf_joins_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Examples of Join Order Optimization</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this
+ case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data
+ from the largest one, for convenience of joining on the unique <code class="ph codeph">ID</code> column. The smallest
+ table only contains a subset of columns from the others.
+ </p>
+
+ <p class="p"></p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table big stored as parquet as select * from raw_data;
++----------------------------+
+| summary |
++----------------------------+
+| Inserted 1000000000 row(s) |
++----------------------------+
+Returned 1 row(s) in 671.56s
+[localhost:21000] > desc big;
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.01s
+[localhost:21000] > create table medium stored as parquet as select * from big limit 200 * floor(1e6);
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 200000000 row(s) |
++---------------------------+
+Returned 1 row(s) in 138.31s
+[localhost:21000] > create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
++-------------------------+
+| summary |
++-------------------------+
+| Inserted 1000000 row(s) |
++-------------------------+
+Returned 1 row(s) in 6.32s</code></pre>
+
+ <p class="p">
+ For any kind of performance experimentation, use the <code class="ph codeph">EXPLAIN</code> statement to see how any
+ expensive query will be performed without actually running it, and enable verbose <code class="ph codeph">EXPLAIN</code>
+      plans containing more performance-oriented detail. In the plan below, the most interesting lines are highlighted in bold,
+ showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of
+ rows involved at each stage of processing, and is likely to stick with the <code class="ph codeph">BROADCAST</code> join
+ mechanism that sends a complete copy of one of the tables to each node.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=verbose;
+EXPLAIN_LEVEL set to verbose
+[localhost:21000] > explain select count(*) from big join medium where big.id = medium.id;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=2.10GB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+<strong class="ph b">| | join op: INNER JOIN (BROADCAST) |</strong>
+| | hash predicates: |
+| | big.id = medium.id |
+<strong class="ph b">| | cardinality: unavailable |</strong>
+| | per-host memory: 2.00GB |
+| | tuple ids: 0 1 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: unavailable |
+| | per-host memory: 0B |
+| | tuple ids: 1 |
+| | |
+| 0:SCAN HDFS |
+<strong class="ph b">| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 1:SCAN HDFS |
+<strong class="ph b">| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
++----------------------------------------------------------+
+Returned 64 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ Gathering statistics for all the tables is straightforward, one <code class="ph codeph">COMPUTE STATS</code> statement
+ per table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats small;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 4.26s
+[localhost:21000] > compute stats medium;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 42.11s
+[localhost:21000] > compute stats big;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 165.44s</code></pre>
+
+ <p class="p">
+ With statistics in place, Impala can choose a more effective join order rather than following the
+ left-to-right sequence of tables in the query, and can choose <code class="ph codeph">BROADCAST</code> or
+ <code class="ph codeph">PARTITIONED</code> join strategies based on the overall sizes and number of rows in the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from medium join big where big.id = medium.id;
+Query: explain select count(*) from medium join big where big.id = medium.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = medium.id |
+| | cardinality: 1443004441 |
+| | per-host memory: 839.23MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 200000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: 200000000 rows total |
+| column stats: all |
+| cardinality: 200000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.04s
+
+[localhost:21000] > explain select count(*) from small join big where big.id = small.id;
+Query: explain select count(*) from small join big where big.id = small.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = small.id |
+| | cardinality: 1000000000 |
+| | per-host memory: 3.15MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 1000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.small #partitions=1/1 size=17.93MB |
+| table stats: 1000000 rows total |
+| column stats: all |
+| cardinality: 1000000 |
+| per-host memory: 32.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.03s</code></pre>
+
+ <p class="p">
+ When queries like these are actually run, the execution times are relatively consistent regardless of the
+ table order in the query text. Here are examples using both the unique <code class="ph codeph">ID</code> column and the
+ <code class="ph codeph">VAL</code> column containing duplicate values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(*) from big join small on (big.id = small.id);
+Query: select count(*) from big join small on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 21.68s
+[localhost:21000] > select count(*) from small join big on (big.id = small.id);
+Query: select count(*) from small join big on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 20.45s
+
+[localhost:21000] > select count(*) from big join small on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 108.85s
+[localhost:21000] > select count(*) from small join big on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 100.76s</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When examining the performance of join queries and the effectiveness of the join order optimization, make
+ sure the query involves enough data and cluster resources to see a difference depending on the query plan.
+ For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed
+ on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much
+ difference in efficiency for the broadcast or partitioned join strategies.
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_resources.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_resources.html b/docs/build3x/html/topics/impala_perf_resources.html
new file mode 100644
index 0000000..2bd7503
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_resources.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limits"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Controlling Impala Resource Usage</title></head><body id="mem_limits"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Controlling Impala Resource Usage</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Sometimes, balancing raw query performance against scalability requires limiting the amount of resources,
+ such as memory or CPU, used by a single query or group of queries. Impala can use several mechanisms that
+ help to smooth out the load during heavy concurrent usage, resulting in faster overall query times and
+ sharing of resources across Impala queries, MapReduce jobs, and other kinds of workloads across a <span class="keyword"></span>
+ cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The Impala admission control feature uses a fast, distributed mechanism to hold back queries that exceed
+ limits on the number of concurrent queries or the amount of memory used. The queries are queued, and
+ executed as other queries finish and resources become available. You can control the concurrency limits,
+ and specify different limits for different groups of users to divide cluster resources according to the
+ priorities of different classes of users. This feature is new in Impala 1.3.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can restrict the amount of memory Impala reserves during query execution by specifying the
+ <code class="ph codeph">-mem_limit</code> option for the <code class="ph codeph">impalad</code> daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details. This limit applies only to the
+ memory that is directly consumed by queries; Impala reserves additional memory at startup, for example to
+          hold cached metadata. A brief sketch of these controls follows this list.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For production deployments, implement resource isolation using your cluster management
+ tool.
+ </p>
+ </li>
+ </ul>
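+    <p class="p">
+      As a brief sketch of the memory limit controls mentioned in the list above (the values are arbitrary placeholders):
+    </p>
+<pre class="pre codeblock"><code>-- The daemon-wide cap is set through the -mem_limit startup option of each impalad.
+-- Within a session, a per-query cap can also be set through the MEM_LIMIT query option:
+set mem_limit=2g;
+</code></pre>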
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_skew.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_skew.html b/docs/build3x/html/topics/impala_perf_skew.html
new file mode 100644
index 0000000..20e5bfc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_skew.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_skew"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Detecting and Correcting HDFS Block Skew Conditions</title></head><body id="perf_skew"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Detecting and Correcting HDFS Block Skew Conditions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For best performance of Impala parallel queries, the work is divided equally across hosts in the cluster, and
+ all hosts take approximately equal time to finish their work. If one host takes substantially longer than
+ others, the extra time needed for the slow host can become the dominant factor in query performance.
+ Therefore, one of the first steps in performance tuning for Impala is to detect and correct such conditions.
+ </p>
+
+ <p class="p">
+ The main cause of uneven performance that you can correct within Impala is <dfn class="term">skew</dfn> in the number of
+ HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others.
+      This condition can occur because of uneven distribution of the data values themselves, for example, causing
+      certain data files or partitions to be large while others are very small. (It is also possible for data values
+      to be unevenly distributed without any imbalance in the distribution of HDFS blocks.) Block skew could also
+ be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and
+ the way that Impala chooses the host to process each data block.
+ </p>
+
+ <p class="p">
+ The most convenient way to detect block skew, or slow-host issues in general, is to examine the <span class="q">"executive
+ summary"</span> information from the query profile after running a query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, issue the <code class="ph codeph">SUMMARY</code> command immediately after the
+ query is complete, to see just the summary information. If you detect issues involving skew, you might
+ switch to issuing the <code class="ph codeph">PROFILE</code> command, which displays the summary information followed
+ by a detailed performance analysis.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In the Impala debug web UI, click on the <span class="ph uicontrol">Profile</span> link associated with the query after it is
+ complete. The executive summary information is displayed early in the profile output.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ For each phase of the query, you see an <span class="ph uicontrol">Avg Time</span> and a <span class="ph uicontrol">Max Time</span>
+ value, along with <span class="ph uicontrol">#Hosts</span> indicating how many hosts are involved in that query phase.
+ For all the phases with <span class="ph uicontrol">#Hosts</span> greater than one, look for cases where the maximum time
+ is substantially greater than the average time. Focus on the phases that took the longest, for example, those
+ taking multiple seconds rather than milliseconds or microseconds.
+ </p>
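+
+    <p class="p">
+      For example, the following condensed excerpt shows the kind of executive summary produced by the
+      <code class="ph codeph">SUMMARY</code> command. The query, table, columns, and figures are illustrative only
+      and vary by workload and release. Here the <code class="ph codeph">00:SCAN HDFS</code> phase runs on 10 hosts,
+      and its <span class="ph uicontrol">Max Time</span> is several times the <span class="ph uicontrol">Avg Time</span>,
+      suggesting that one host is processing a disproportionate share of the data blocks:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from big_table where col1 like '%x%';
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+--------+------------+-----------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  |
++--------------+--------+----------+----------+--------+------------+-----------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1      | 1          | 48.00 KB  |
+| 02:EXCHANGE  | 1      | 52.11us  | 52.11us  | 10     | 10         | 0 B       |
+| 01:AGGREGATE | 10     | 20.24ms  | 23.03ms  | 10     | 10         | 280.00 KB |
+| 00:SCAN HDFS | 10     | 2.15s    | 9.79s    | 12.41M | -1         | 130.00 MB |
++--------------+--------+----------+----------+--------+------------+-----------+
+</code></pre>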
+
+ <p class="p">
+      If you detect that some hosts take longer than others, first rule out non-Impala causes. Some hosts
+      could be slower than the others because they have less capacity, or because they are
+      substantially busier due to unevenly distributed non-Impala workloads:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings
+ from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven
+ performance, and the time spent diagnosing performance issues.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing
+ techniques for the non-Impala components to smooth out the load across the cluster.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to
+ determine which host is taking longer than others for the query phase in question. Examine how many bytes are
+ processed during that phase on that host, how much memory is used, and how many bytes are transmitted across
+ the network.
+ </p>
+
+ <p class="p">
+ The most common symptom is a higher number of bytes read on one host than others, due to one host being
+ requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the
+ number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and
+ the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle
+ while another node processes two blocks, the query could take twice as long as if the data was perfectly
+ distributed.
+ </p>
+
+ <p class="p">
+ Possible solutions in this case include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger
+ data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
+          the uneven distribution is much lower than if some nodes do twice as much work as others. As a
+ guideline, aim for a <span class="q">"sweet spot"</span> where each node reads 2 GB or more from HDFS per query. Queries
+ that process lower volumes than that could experience inconsistent performance that smooths out as
+ queries become more data-intensive.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If the query processes only a few large blocks, so that many nodes sit idle and cannot help to
+ parallelize the query, consider reducing the overall block size. For example, you might adjust the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option before copying or converting data into a Parquet table.
+ Or you might adjust the granularity of data files produced earlier in the ETL pipeline by non-Impala
+          components. In <span class="keyword">Impala 2.0</span> and later, the default Parquet block size is 256 MB, reduced from 1 GB, to
+          improve parallelism for common cluster sizes and data volumes. An example of adjusting the file size
+          appears after this list.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduce the amount of compression applied to the data. For text data files, the highest degree of
+ compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel,
+ and require extra memory during processing to hold the compressed and uncompressed data simultaneously.
+ For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but
+ remember that when queries process relatively few blocks, there is less opportunity for parallel
+ execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
+          the query option <code class="ph codeph">COMPRESSION_CODEC=NONE</code> in effect, the data is still typically compact due
+ to the encoding schemes used by Parquet, independent of the final compression step.
+ </p>
+ </li>
+ </ul>
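+
+    <p class="p">
+      As a sketch of the file size and compression adjustments above, the following
+      <span class="keyword cmdname">impala-shell</span> statements rewrite a Parquet table. The table names are
+      hypothetical and the values are starting points to experiment with, not recommendations:
+    </p>
+
+<pre class="pre codeblock"><code>-- Produce smaller Parquet data files (128 MB here) so that scans of a
+-- modest-sized table can be spread across more hosts.
+set PARQUET_FILE_SIZE=134217728;
+insert overwrite table parquet_table select * from staging_table;
+
+-- Rewrite the data without a final compression codec; Parquet encoding
+-- still keeps the files reasonably compact.
+set COMPRESSION_CODEC=none;
+insert overwrite table parquet_table select * from staging_table;
+</code></pre>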
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>