Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:23 UTC
[14/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_hdfs_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_hdfs_caching.html b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
new file mode 100644
index 0000000..596675d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
@@ -0,0 +1,578 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hdfs_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using HDFS Caching with Impala (Impala 2.1 or higher only)</title></head><body id="hdfs_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using HDFS Caching with Impala (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HDFS caching provides performance and scalability benefits in production environments where Impala queries
+ and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes,
+ making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory.
+ Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when
+ using data from the Linux OS cache.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ On a small or lightly loaded cluster, HDFS caching might not produce any speedup. It might even lead to
+      slower queries if I/O read operations that were previously performed in parallel across the entire cluster are replaced
+      by in-memory operations on a smaller number of hosts. The hosts where the HDFS blocks are cached can become
+ bottlenecks because they experience high CPU load while processing the cached data blocks, while other hosts remain idle.
+ Therefore, always compare performance with and without this feature enabled, using a realistic workload.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, you can spread the CPU load more evenly by specifying the <code class="ph codeph">WITH REPLICATION</code>
+ clause of the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ This clause lets you control the replication factor for
+ HDFS caching for a specific table or partition. By default, each cached block is
+ only present on a single host, which can lead to CPU contention if the same host
+ processes each cached block. Increasing the replication factor lets Impala choose
+ different hosts to process different cached blocks, to better distribute the CPU load.
+ Always use a <code class="ph codeph">WITH REPLICATION</code> setting of at least 3, and adjust upward
+ if necessary to match the replication factor for the underlying HDFS data files.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala automatically randomizes which host processes
+ a cached HDFS block, to avoid CPU hotspots. For tables where HDFS caching is not applied,
+ Impala designates which host to process a data block using an algorithm that estimates
+ the load on each host. If CPU hotspots still arise during queries,
+ you can enable additional randomization for the scheduling algorithm for non-HDFS cached data
+ by setting the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
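+    <p class="p">
+      For example, a minimal sketch of enabling that option for a session in <span class="keyword cmdname">impala-shell</span>:
+    </p>
+<pre class="pre codeblock"><code>set schedule_random_replica=true;
+</code></pre>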
+ </div>
+
+ <p class="p toc inpage"></p>
+
+
+
+ <p class="p">
+ For background information about how to set up and manage HDFS caching for a <span class="keyword"></span> cluster, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="hdfs_caching__hdfs_caching_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of HDFS Caching for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that
+ repeated queries can take advantage of data <span class="q">"pinned"</span> in memory regardless of how much data is
+ processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be
+ pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This
+ technique is suitable for tables or partitions that are frequently accessed and are small enough to fit
+ entirely within the HDFS memory cache. For example, you might designate several dimension tables to be
+ pinned in the cache, to speed up many different join queries that reference them. Or in a partitioned
+ table, you might pin a partition holding data from the most recent time period because that data will be
+ queried intensively; then when the next set of data arrives, you could unpin the previous partition and pin
+ the partition holding the new data.
+ </p>
+
+ <p class="p">
+ Because this Impala performance feature relies on HDFS infrastructure, it only applies to Impala tables
+ that use HDFS data files. HDFS caching for Impala does not apply to HBase tables, S3 tables,
+ Kudu tables,
+ or Isilon tables.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="hdfs_caching__hdfs_caching_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Setting Up HDFS Caching for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To use HDFS caching with Impala, first set up that feature for your <span class="keyword"></span> cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Decide how much memory to devote to the HDFS cache on each host. Remember that the total memory available
+ for cached data is the sum of the cache sizes on all the hosts. By default, any data block is only cached on one
+ host, although you can cache a block across multiple hosts by increasing the replication factor.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p">
+ Issue <span class="keyword cmdname">hdfs cacheadmin</span> commands to set up one or more cache pools, owned by the same
+ user as the <span class="keyword cmdname">impalad</span> daemon (typically <code class="ph codeph">impala</code>). For example:
+<pre class="pre codeblock"><code>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
+</code></pre>
+ For details about the <span class="keyword cmdname">hdfs cacheadmin</span> command, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </div>
+ </li>
+ </ul>
+
+ <p class="p">
+ Once HDFS caching is enabled and one or more pools are available, see
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching_ddl">Enabling HDFS Caching for Impala Tables and Partitions</a> for how to choose which Impala data to load
+ into the HDFS cache. On the Impala side, you specify the cache pool name defined by the <code class="ph codeph">hdfs
+ cacheadmin</code> command in the Impala DDL statements that enable HDFS caching for a table or partition,
+ such as <code class="ph codeph">CREATE TABLE ... CACHED IN <var class="keyword varname">pool</var></code> or <code class="ph codeph">ALTER TABLE ... SET
+ CACHED IN <var class="keyword varname">pool</var></code>.
+ </p>
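+    <p class="p">
+      For example, a sketch of both forms, using the <code class="ph codeph">four_gig_pool</code> pool created above
+      and hypothetical table names:
+    </p>
+<pre class="pre codeblock"><code>-- Cache a new table from the moment it is created.
+create table lookup_dim (id int, name string)
+  cached in 'four_gig_pool';
+
+-- Cache an existing table.
+alter table other_dim set cached in 'four_gig_pool';
+</code></pre>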
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="hdfs_caching__hdfs_caching_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling HDFS Caching for Impala Tables and Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Begin by choosing which tables or partitions to cache. For example, these might be lookup tables that are
+ accessed by many different join queries, or partitions corresponding to the most recent time period that
+ are analyzed by different reports or ad hoc queries.
+ </p>
+
+ <p class="p">
+ In your SQL statements, you specify logical divisions such as tables and partitions to be cached. Impala
+ translates these requests into HDFS-level directives that apply to particular directories and files. For
+ example, given a partitioned table <code class="ph codeph">CENSUS</code> with a partition key column
+ <code class="ph codeph">YEAR</code>, you could choose to cache all or part of the data as follows:
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+      When the cache replication factor for a data block is greater than 1, Impala randomly
+      selects a host that has a cached copy of that block to process it. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
+
+<pre class="pre codeblock"><code>-- Cache the entire table (all partitions).
+alter table census set cached in '<var class="keyword varname">pool_name</var>';
+
+-- Remove the entire table from the cache.
+alter table census set uncached;
+
+-- Cache a portion of the table (a single partition).
+-- If the table is partitioned by multiple columns (such as year, month, day),
+-- the ALTER TABLE command must specify values for all those columns.
+alter table census partition (year=1960) set cached in '<var class="keyword varname">pool_name</var>';
+
+<span class="ph">-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
+-- single host when the same data block is processed multiple times.
+alter table census partition (year=1970)
+ set cached in '<var class="keyword varname">pool_name</var>' with replication = 4;</span>
+
+-- At each stage, check the volume of cached data.
+-- For large tables or partitions, the background loading might take some time,
+-- so you might have to wait and reissue the statement until all the data
+-- has finished being loaded into the cache.
+show table stats census;
++-------+-------+--------+------+--------------+--------+
+| year | #Rows | #Files | Size | Bytes Cached | Format |
++-------+-------+--------+------+--------------+--------+
+| 1900 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1940 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1960 | -1 | 1 | 11B | 11B | TEXT |
+| 1970 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| Total | -1 | 4 | 44B | 11B | |
++-------+-------+--------+------+--------------+--------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE considerations:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature affects the Impala <code class="ph codeph">CREATE TABLE</code> statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can put a <code class="ph codeph">CACHED IN '<var class="keyword varname">pool_name</var>'</code> clause
+ <span class="ph">and optionally a <code class="ph codeph">WITH REPLICATION = <var class="keyword varname">number_of_hosts</var></code> clause</span>
+ at the end of a
+ <code class="ph codeph">CREATE TABLE</code> statement to automatically cache the entire contents of the table,
+ including any partitions added later. The <var class="keyword varname">pool_name</var> is a pool that you previously set
+ up with the <span class="keyword cmdname">hdfs cacheadmin</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Once a table is designated for HDFS caching through the <code class="ph codeph">CREATE TABLE</code> statement, if new
+ partitions are added later through <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements, the data in
+ those new partitions is automatically cached in the same pool.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you want to perform repetitive queries on a subset of data from a large table, and it is not practical
+ to designate the entire table or specific partitions for HDFS caching, you can create a new cached table
+ with just a subset of the data by using <code class="ph codeph">CREATE TABLE ... CACHED IN '<var class="keyword varname">pool_name</var>'
+          AS SELECT ... WHERE ...</code>. When you are finished generating reports from this subset of data,
+          drop the table; both the data files and the data cached in RAM are automatically deleted.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for the full syntax.
+ </p>
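+    <p class="p">
+      For example, the subset-caching technique from the last consideration above might look like the
+      following sketch (the table name, pool name, and filter are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>-- Cache only the rows needed for a series of reports.
+create table census_recent cached in 'pool_name'
+  as select * from census where year >= 1960;
+
+-- When the reports are finished, the data files and the cached data are removed together.
+drop table census_recent;
+</code></pre>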
+
+ <p class="p">
+ <strong class="ph b">Other memory considerations:</strong>
+ </p>
+
+ <p class="p">
+ Certain DDL operations, such as <code class="ph codeph">ALTER TABLE ... SET LOCATION</code>, are blocked while the
+ underlying HDFS directories contain cached files. You must uncache the files first, before changing the
+ location, dropping the table, and so on.
+ </p>
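+    <p class="p">
+      For example, a sketch of the required sequence, with a hypothetical table name and path:
+    </p>
+<pre class="pre codeblock"><code>-- Remove the cache directives first...
+alter table t1 set uncached;
+-- ...then the previously blocked DDL operation succeeds.
+alter table t1 set location '/user/impala/t1_relocated';
+</code></pre>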
+
+ <p class="p">
+ When data is requested to be pinned in memory, that process happens in the background without blocking
+ access to the data while the caching is in progress. Loading the data from disk could take some time.
+ Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
+ been pinned yet. When files are added to a table or partition whose contents are cached, Impala
+ automatically detects those changes and performs a <code class="ph codeph">REFRESH</code> automatically once the relevant
+ data is cached.
+ </p>
+
+ <p class="p">
+ The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota
+ that is enforced by the underlying HDFS service. Before requesting to pin an Impala table or partition in
+ memory, check that its size does not exceed this quota.
+ </p>
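+    <p class="p">
+      As a sketch, you can check the limit and current usage of a cache pool from the command line
+      (the pool name is hypothetical; verify the exact options in the documentation for your Apache Hadoop
+      distribution), and compare them against the size reported by <code class="ph codeph">SHOW TABLE STATS</code>:
+    </p>
+<pre class="pre codeblock"><code>hdfs cacheadmin -listPools -stats four_gig_pool
+</code></pre>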
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the HDFS cache consists of combined memory from all the DataNodes in the cluster, cached tables or
+ partitions can be bigger than the amount of HDFS cache memory on any single host.
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="hdfs_caching__hdfs_caching_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading and Removing Data with HDFS Caching Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When HDFS caching is enabled, extra processing happens in the background when you add or remove data
+ through statements such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Inserting or loading data:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When Impala performs an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> or
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code> statement for a table or
+ partition that is cached, the new data files are automatically cached and Impala recognizes that fact
+ automatically.
+ </li>
+
+ <li class="li">
+ If you perform an <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> through Hive, as always, Impala
+ only recognizes the new data files after a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement in Impala.
+ </li>
+
+ <li class="li">
+ If the cache pool is entirely full, or becomes full before all the requested data can be cached, the
+ Impala DDL statement returns an error. This is to avoid situations where only some of the requested data
+ could be cached.
+ </li>
+
+ <li class="li">
+ When HDFS caching is enabled for a table or partition, new data files are cached automatically when they
+ are added to the appropriate directory in HDFS, without the need for a <code class="ph codeph">REFRESH</code> statement
+ in Impala. Impala automatically performs a <code class="ph codeph">REFRESH</code> once the new data is loaded into the
+ HDFS cache.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Dropping tables, partitions, or cache pools:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE</a></code> and
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... DROP PARTITION</a></code>
+ statements as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you issue a <code class="ph codeph">DROP TABLE</code> for a table that is entirely cached, or has some partitions
+ cached, the <code class="ph codeph">DROP TABLE</code> succeeds and all the cache directives Impala submitted for that
+ table are removed from the HDFS cache system.
+ </li>
+
+ <li class="li">
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code>. The operation succeeds and any cache
+ directives are removed.
+ </li>
+
+ <li class="li">
+ As always, the underlying data files are removed if the dropped table is an internal table, or the
+ dropped partition is in its default location underneath an internal table. The data files are left alone
+ if the dropped table is an external table, or if the dropped partition is in a non-default location.
+ </li>
+
+ <li class="li">
+ If you designated the data files as cached through the <span class="keyword cmdname">hdfs cacheadmin</span> command, and
+ the data files are left behind as described in the previous item, the data files remain cached. Impala
+ only removes the cache directives submitted by Impala through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements. It is OK to have multiple redundant cache directives pertaining
+ to the same files; the directives all have unique IDs and owners so that the system can tell them apart.
+ </li>
+
+ <li class="li">
+ If you drop an HDFS cache pool through the <span class="keyword cmdname">hdfs cacheadmin</span> command, all the Impala
+ data files are preserved, just no longer cached. After a subsequent <code class="ph codeph">REFRESH</code>,
+ <code class="ph codeph">SHOW TABLE STATS</code> reports 0 bytes cached for each associated Impala table or partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Relocating a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... SET LOCATION</a></code>
+ statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you have designated a table or partition as cached through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements, subsequent attempts to relocate the table or partition through
+ an <code class="ph codeph">ALTER TABLE ... SET LOCATION</code> statement will fail. You must issue an <code class="ph codeph">ALTER
+ TABLE ... SET UNCACHED</code> statement for the table or partition first. Otherwise, Impala would lose
+ track of some cached data files and have no way to uncache them later.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="hdfs_caching__hdfs_caching_admin">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Administration for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are the guidelines and steps to check or change the status of HDFS caching for Impala data:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">hdfs cacheadmin command:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you drop a cache pool with the <span class="keyword cmdname">hdfs cacheadmin</span> command, Impala queries against the
+ associated data files will still work, by falling back to reading the files from disk. After performing a
+ <code class="ph codeph">REFRESH</code> on the table, Impala reports the number of bytes cached as 0 for all associated
+ tables and partitions.
+ </li>
+
+ <li class="li">
+        You might use <span class="keyword cmdname">hdfs cacheadmin</span> to get a list of existing cache directives, or detailed
+        information about them, as follows:
+<pre class="pre codeblock"><code>hdfs cacheadmin -listDirectives # Basic info
+Found 122 entries
+ ID POOL REPL EXPIRY PATH
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01
+...
+
+hdfs cacheadmin -listDirectives -stats # More details
+Found 122 entries
+ ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales 0 0 0 0
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15 143169 143169 1 1
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01 112447 112447 1 1
+...
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala SHOW statement:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For each table or partition, the <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code>
+ statement displays the number of bytes currently cached by the HDFS caching feature. If there are no
+ cache directives in place for that table or partition, the result set displays <code class="ph codeph">NOT
+ CACHED</code>. A value of 0, or a smaller number than the overall size of the table or partition,
+ indicates that the cache request has been submitted but the data has not been entirely loaded into memory
+ yet. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala memory limits:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the Impala memory limits as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The maximum size of each HDFS cache pool is specified externally to Impala, through the <span class="keyword cmdname">hdfs
+ cacheadmin</span> command.
+ </li>
+
+ <li class="li">
+ All the memory used for HDFS caching is separate from the <span class="keyword cmdname">impalad</span> daemon address space
+ and does not count towards the limits of the <code class="ph codeph">--mem_limit</code> startup option,
+ <code class="ph codeph">MEM_LIMIT</code> query option, or further limits imposed through YARN resource management or
+ the Linux <code class="ph codeph">cgroups</code> mechanism.
+ </li>
+
+ <li class="li">
+ Because accessing HDFS cached data avoids a memory-to-memory copy operation, queries involving cached
+ data require less memory on the Impala side than the equivalent queries on uncached data. In addition to
+ any performance benefits in a single-user environment, the reduced memory helps to improve scalability
+ under high-concurrency workloads.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="hdfs_caching__hdfs_caching_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Performance Considerations for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 1.4.0 and higher, Impala supports efficient reads from data that is pinned in memory through HDFS
+ caching. Impala takes advantage of the HDFS API and reads the data from memory rather than from disk
+ whether the data files are pinned using Impala DDL statements, or using the command-line mechanism where
+ you specify HDFS paths.
+ </p>
+
+ <p class="p">
+ When you examine the output of the <span class="keyword cmdname">impala-shell</span> <span class="keyword cmdname">SUMMARY</span> command, or
+ look in the metrics report for the <span class="keyword cmdname">impalad</span> daemon, you see how many bytes are read from
+ the HDFS cache. For example, this excerpt from a query profile illustrates that all the data read during a
+ particular phase of the query came from the HDFS cache, because the <code class="ph codeph">BytesRead</code> and
+ <code class="ph codeph">BytesReadDataNodeCache</code> values are identical.
+ </p>
+
+<pre class="pre codeblock"><code>HDFS_SCAN_NODE (id=0):(Total: 11s114ms, non-child: 11s114ms, % non-child: 100.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 32.75
+<strong class="ph b"> - BytesRead: 10.47 GB (11240756479)
+ - BytesReadDataNodeCache: 10.47 GB (11240756479)</strong>
+ - BytesReadLocal: 10.47 GB (11240756479)
+ - BytesReadShortCircuit: 10.47 GB (11240756479)
+ - DecompressionTime: 27s572ms
+</code></pre>
+
+ <p class="p">
+ For queries involving smaller amounts of data, or in single-user workloads, you might not notice a
+ significant difference in query response time with or without HDFS caching. Even with HDFS caching turned
+ off, the data for the query might still be in the Linux OS buffer cache. The benefits become clearer as
+ data volume increases, and especially as the system processes more concurrent queries. HDFS caching
+ improves the scalability of the overall system. That is, it prevents query performance from declining when
+ the workload outstrips the capacity of the Linux OS cache.
+ </p>
+
+ <p class="p">
+ Due to a limitation of HDFS, zero-copy reads are not supported with
+ encryption. Where practical, avoid HDFS caching for Impala data
+ files in encryption zones. The queries fall back to the normal read
+ path during query execution, which might cause some performance overhead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SELECT considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement and query performance as
+ follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala automatically reads from memory any data that has been designated as cached and actually loaded
+ into the HDFS cache. (It could take some time after the initial request to fully populate the cache for a
+        large table or one with many partitions.) The speedup comes from two aspects: reading from RAM instead
+ of disk, and accessing the data straight from the cache area instead of copying from one RAM area to
+ another. This second aspect yields further performance improvement over the standard OS caching
+ mechanism, which still results in memory-to-memory copying of cached data.
+ </li>
+
+ <li class="li">
+ For small amounts of data, the query speedup might not be noticeable in terms of wall clock time. The
+ performance might be roughly the same with HDFS caching turned on or off, due to recently used data being
+ held in the Linux OS cache. The difference is more pronounced with:
+ <ul class="ul">
+ <li class="li">
+ Data volumes (for all queries running concurrently) that exceed the size of the Linux OS cache.
+ </li>
+
+ <li class="li">
+ A busy cluster running many concurrent queries, where the reduction in memory-to-memory copying and
+ overall memory usage during queries results in greater scalability and throughput.
+ </li>
+        </ul>
+        <p class="p">
+          Thus, to really exercise and benchmark this feature in a development environment, you might need to
+          simulate realistic workloads and concurrent queries that match your production environment.
+        </p>
+        <p class="p">
+          One way to simulate a heavy workload on a lightly loaded system is to flush the OS buffer cache (on
+          each DataNode) between iterations of queries against the same tables or partitions:
+        </p>
+<pre class="pre codeblock"><code>$ sync
+$ echo 1 > /proc/sys/vm/drop_caches
+</code></pre>
+ </li>
+
+ <li class="li">
+ Impala queries take advantage of HDFS cached data regardless of whether the cache directive was issued by
+ Impala or externally through the <span class="keyword cmdname">hdfs cacheadmin</span> command, for example for an external
+ table where the cached data files might be accessed by several different Hadoop components.
+ </li>
+
+ <li class="li">
+ If your query returns a large result set, the time reported for the query could be dominated by the time
+ needed to print the results on the screen. To measure the time for the underlying query processing, query
+ the <code class="ph codeph">COUNT()</code> of the big result set, which does all the same processing but only prints a
+        single line to the screen, as in the sketch after this list.
+ </li>
+ </ul>
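+    <p class="p">
+      For example, a hypothetical sketch of that technique, wrapping the original query in a
+      <code class="ph codeph">COUNT()</code>:
+    </p>
+<pre class="pre codeblock"><code>-- Original query: printing the large result set dominates the elapsed time.
+select * from census where year = 1970;
+
+-- Timing-friendly version: the same scan and filtering, but only one line of output.
+select count(*) from (select * from census where year = 1970) t;
+</code></pre>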
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_joins.html b/docs/build3x/html/topics/impala_perf_joins.html
new file mode 100644
index 0000000..7def5b4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_joins.html
@@ -0,0 +1,508 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Performance Considerations for Join Queries</title></head><body id="perf_joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Performance Considerations for Join Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Queries involving join operations often require more tuning than queries that refer to only one table. The
+ maximum size of the result set from a join query is the product of the number of rows in all the joined
+ tables. When joining several tables with millions or billions of rows, any missed opportunity to filter the
+ result set, or other inefficiency in the query, could lead to an operation that does not finish in a
+ practical time and has to be cancelled.
+ </p>
+
+ <p class="p">
+ The simplest technique for tuning an Impala join query is to collect statistics on each table involved in the
+ join using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS</a></code>
+ statement, and then let Impala automatically optimize the query based on the size of each table, number of
+ distinct values of each column, and so on. The <code class="ph codeph">COMPUTE STATS</code> statement and the join
+ optimization are new features introduced in Impala 1.2.2. For accurate statistics about each table, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement after loading the data into that table, and again if the amount of
+ data changes substantially due to an <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, adding a partition,
+ and so on.
+ </p>
+
+ <p class="p">
+ If statistics are not available for all the tables in the join query, or if Impala chooses a join order that
+ is not the most efficient, you can override the automatic join order optimization by specifying the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code>
+ or <code class="ph codeph">ALL</code> keywords. In this case, Impala uses the order the tables appear in the query to guide how the
+ joins are processed.
+ </p>
+
+ <p class="p">
+ When you use the <code class="ph codeph">STRAIGHT_JOIN</code> technique, you must order the tables in the join query
+ manually instead of relying on the Impala optimizer. The optimizer uses sophisticated techniques to estimate
+ the size of the result set at each stage of the join. For manual ordering, use this heuristic approach to
+ start with, and then experiment to fine-tune the order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Specify the largest table first. This table is read from disk by each Impala node and so its size is not
+ significant in terms of memory usage during the query.
+ </li>
+
+ <li class="li">
+ Next, specify the smallest table. The contents of the second, third, and so on tables are all transmitted
+ across the network. You want to minimize the size of the result set from each subsequent stage of the join
+ query. The most likely approach involves joining a small table first, so that the result set remains small
+ even as subsequent larger tables are processed.
+ </li>
+
+ <li class="li">
+ Join the next smallest table, then the next smallest, and so on.
+ </li>
+
+ <li class="li">
+ For example, if you had tables <code class="ph codeph">BIG</code>, <code class="ph codeph">MEDIUM</code>, <code class="ph codeph">SMALL</code>, and
+ <code class="ph codeph">TINY</code>, the logical join order to try would be <code class="ph codeph">BIG</code>, <code class="ph codeph">TINY</code>,
+ <code class="ph codeph">SMALL</code>, <code class="ph codeph">MEDIUM</code>.
+ </li>
+ </ul>
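+    <p class="p">
+      A sketch of that ordering in SQL (the join columns are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>select straight_join count(*)
+  from big
+  join tiny   on (big.id = tiny.id)
+  join small  on (big.id = small.id)
+  join medium on (big.id = medium.id);
+</code></pre>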
+
+ <p class="p">
+      The terms <span class="q">"largest"</span> and <span class="q">"smallest"</span> refer to the size of the intermediate result set based on the
+ number of rows and columns from each table that are part of the result set. For example, if you join one
+ table <code class="ph codeph">sales</code> with another table <code class="ph codeph">customers</code>, a query might find results from
+ 100 different customers who made a total of 5000 purchases. In that case, you would specify <code class="ph codeph">SELECT
+ ... FROM sales JOIN customers ...</code>, putting <code class="ph codeph">customers</code> on the right side because it
+ is smaller in the context of this query.
+ </p>
+
+ <p class="p">
+ The Impala query planner chooses between different techniques for performing join queries, depending on the
+ absolute and relative sizes of the tables. <strong class="ph b">Broadcast joins</strong> are the default, where the right-hand table
+ is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes
+ involved in the query. The alternative technique is known as a <strong class="ph b">partitioned join</strong> (not related to a
+ partitioned table), which is more suitable for large tables of roughly equal size. With this technique,
+ portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in
+ parallel. The choice of broadcast or partitioned join also depends on statistics being available for all
+ tables in the join, gathered by the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, issue an <code class="ph codeph">EXPLAIN</code> statement for
+ the query. If you find that a query uses a broadcast join when you know through benchmarking that a
+ partitioned join would be more efficient, or vice versa, add a hint to the query to specify the precise join
+ mechanism to use. See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </p>
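+    <p class="p">
+      For example, a sketch of adding a hint to force a partitioned join, using hypothetical tables
+      (verify the preferred hint syntax for your Impala version):
+    </p>
+<pre class="pre codeblock"><code>explain select count(*) from big join medium where big.id = medium.id;
+
+-- If benchmarking shows that a partitioned join performs better, add a hint after the JOIN keyword:
+select count(*) from big join /* +shuffle */ medium where big.id = medium.id;
+</code></pre>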
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="perf_joins__joins_no_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Joins Are Processed when Statistics Are Unavailable</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If table or column statistics are not available for some tables in a join, Impala still reorders the tables
+ using the information that is available. Tables with statistics are placed on the left side of the join
+ order, in descending order of cost based on overall size and cardinality. Tables without statistics are
+        treated as zero-size; that is, they are always placed on the right side of the join order.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="perf_joins__straight_join">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overriding Join Reordering with STRAIGHT_JOIN</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Impala join query is inefficient because of outdated statistics or unexpected data distribution, you
+ can keep Impala from reordering the joined tables by using the <code class="ph codeph">STRAIGHT_JOIN</code> keyword
+ immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code>
+ keywords. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword turns off
+ the reordering of join clauses that Impala does internally, and produces a plan that relies on the join
+ clauses being ordered optimally in the query text. In this case, rewrite the query so that the largest
+ table is on the left, followed by the next largest, and so on until the smallest table is on the right.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+ </div>
+
+ <p class="p">
+ In this example, the subselect from the <code class="ph codeph">BIG</code> table produces a very small result set, but
+ the table might still be treated as if it were the biggest and placed first in the join order. Using
+ <code class="ph codeph">STRAIGHT_JOIN</code> for the last join clause prevents the final table from being reordered,
+ keeping it as the rightmost table in the join order.
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;
+
+-- If the query contains [DISTINCT | ALL], the hint goes after those keywords.
+select distinct straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="perf_joins__perf_joins_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Examples of Join Order Optimization</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this
+ case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data
+ from the largest one, for convenience of joining on the unique <code class="ph codeph">ID</code> column. The smallest
+ table only contains a subset of columns from the others.
+ </p>
+
+ <p class="p"></p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table big stored as parquet as select * from raw_data;
++----------------------------+
+| summary |
++----------------------------+
+| Inserted 1000000000 row(s) |
++----------------------------+
+Returned 1 row(s) in 671.56s
+[localhost:21000] > desc big;
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.01s
+[localhost:21000] > create table medium stored as parquet as select * from big limit 200 * floor(1e6);
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 200000000 row(s) |
++---------------------------+
+Returned 1 row(s) in 138.31s
+[localhost:21000] > create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
++-------------------------+
+| summary |
++-------------------------+
+| Inserted 1000000 row(s) |
++-------------------------+
+Returned 1 row(s) in 6.32s</code></pre>
+
+ <p class="p">
+ For any kind of performance experimentation, use the <code class="ph codeph">EXPLAIN</code> statement to see how any
+ expensive query will be performed without actually running it, and enable verbose <code class="ph codeph">EXPLAIN</code>
+      plans containing more performance-oriented detail. In the plan below, the most interesting lines are highlighted in bold,
+ showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of
+ rows involved at each stage of processing, and is likely to stick with the <code class="ph codeph">BROADCAST</code> join
+ mechanism that sends a complete copy of one of the tables to each node.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=verbose;
+EXPLAIN_LEVEL set to verbose
+[localhost:21000] > explain select count(*) from big join medium where big.id = medium.id;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=2.10GB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+<strong class="ph b">| | join op: INNER JOIN (BROADCAST) |</strong>
+| | hash predicates: |
+| | big.id = medium.id |
+<strong class="ph b">| | cardinality: unavailable |</strong>
+| | per-host memory: 2.00GB |
+| | tuple ids: 0 1 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: unavailable |
+| | per-host memory: 0B |
+| | tuple ids: 1 |
+| | |
+| 0:SCAN HDFS |
+<strong class="ph b">| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 1:SCAN HDFS |
+<strong class="ph b">| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
++----------------------------------------------------------+
+Returned 64 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ Gathering statistics for all the tables is straightforward, one <code class="ph codeph">COMPUTE STATS</code> statement
+ per table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats small;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 4.26s
+[localhost:21000] > compute stats medium;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 42.11s
+[localhost:21000] > compute stats big;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 165.44s</code></pre>
+
+ <p class="p">
+ With statistics in place, Impala can choose a more effective join order rather than following the
+ left-to-right sequence of tables in the query, and can choose <code class="ph codeph">BROADCAST</code> or
+ <code class="ph codeph">PARTITIONED</code> join strategies based on the overall sizes and number of rows in the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from medium join big where big.id = medium.id;
+Query: explain select count(*) from medium join big where big.id = medium.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = medium.id |
+| | cardinality: 1443004441 |
+| | per-host memory: 839.23MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 200000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: 200000000 rows total |
+| column stats: all |
+| cardinality: 200000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.04s
+
+[localhost:21000] > explain select count(*) from small join big where big.id = small.id;
+Query: explain select count(*) from small join big where big.id = small.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = small.id |
+| | cardinality: 1000000000 |
+| | per-host memory: 3.15MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 1000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.small #partitions=1/1 size=17.93MB |
+| table stats: 1000000 rows total |
+| column stats: all |
+| cardinality: 1000000 |
+| per-host memory: 32.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.03s</code></pre>
+
+ <p class="p">
+ When queries like these are actually run, the execution times are relatively consistent regardless of the
+ table order in the query text. Here are examples using both the unique <code class="ph codeph">ID</code> column and the
+ <code class="ph codeph">VAL</code> column containing duplicate values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(*) from big join small on (big.id = small.id);
+Query: select count(*) from big join small on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 21.68s
+[localhost:21000] > select count(*) from small join big on (big.id = small.id);
+Query: select count(*) from small join big on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 20.45s
+
+[localhost:21000] > select count(*) from big join small on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 108.85s
+[localhost:21000] > select count(*) from small join big on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 100.76s</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When examining the performance of join queries and the effectiveness of the join order optimization, make
+ sure the query involves enough data and cluster resources to see a difference depending on the query plan.
+ For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed
+ on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much
+ difference in efficiency for the broadcast or partitioned join strategies.
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_resources.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_resources.html b/docs/build3x/html/topics/impala_perf_resources.html
new file mode 100644
index 0000000..2bd7503
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_resources.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limits"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Controlling Impala Resource Usage</title></head><body id="mem_limits"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Controlling Impala Resource Usage</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Sometimes, balancing raw query performance against scalability requires limiting the amount of resources,
+ such as memory or CPU, used by a single query or group of queries. Impala can use several mechanisms that
+ help to smooth out the load during heavy concurrent usage, resulting in faster overall query times and
+ sharing of resources across Impala queries, MapReduce jobs, and other kinds of workloads across a <span class="keyword"></span>
+ cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The Impala admission control feature uses a fast, distributed mechanism to hold back queries that exceed
+ limits on the number of concurrent queries or the amount of memory used. The queries are queued, and
+ executed as other queries finish and resources become available. You can control the concurrency limits,
+ and specify different limits for different groups of users to divide cluster resources according to the
+ priorities of different classes of users. This feature is new in Impala 1.3.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can restrict the amount of memory Impala reserves during query execution by specifying the
+ <code class="ph codeph">-mem_limit</code> option for the <code class="ph codeph">impalad</code> daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details. This limit applies only to the
+ memory that is directly consumed by queries; Impala reserves additional memory at startup, for example to
+          hold cached metadata. A brief sketch of these controls follows this list.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For production deployments, implement resource isolation using your cluster management
+ tool.
+ </p>
+ </li>
+ </ul>
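+    <p class="p">
+      As a brief sketch of the memory limit controls mentioned in the list above (the values are arbitrary placeholders):
+    </p>
+<pre class="pre codeblock"><code>-- The daemon-wide cap is set through the -mem_limit startup option of each impalad.
+-- Within a session, a per-query cap can also be set through the MEM_LIMIT query option:
+set mem_limit=2g;
+</code></pre>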
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_skew.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_skew.html b/docs/build3x/html/topics/impala_perf_skew.html
new file mode 100644
index 0000000..20e5bfc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_skew.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_skew"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Detecting and Correcting HDFS Block Skew Conditions</title></head><body id="perf_skew"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Detecting and Correcting HDFS Block Skew Conditions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For best performance of Impala parallel queries, the work is divided equally across hosts in the cluster, and
+ all hosts take approximately equal time to finish their work. If one host takes substantially longer than
+ others, the extra time needed for the slow host can become the dominant factor in query performance.
+ Therefore, one of the first steps in performance tuning for Impala is to detect and correct such conditions.
+ </p>
+
+ <p class="p">
+ The main cause of uneven performance that you can correct within Impala is <dfn class="term">skew</dfn> in the number of
+ HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others.
+      This condition can occur because of uneven distribution of the data values themselves, for example, causing
+      certain data files or partitions to be large while others are very small. (It is also possible for data values
+      to be unevenly distributed without any imbalance in the distribution of HDFS blocks.) Block skew could also
+ be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and
+ the way that Impala chooses the host to process each data block.
+ </p>
+
+ <p class="p">
+ The most convenient way to detect block skew, or slow-host issues in general, is to examine the <span class="q">"executive
+ summary"</span> information from the query profile after running a query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, issue the <code class="ph codeph">SUMMARY</code> command immediately after the
+ query is complete, to see just the summary information. If you detect issues involving skew, you might
+ switch to issuing the <code class="ph codeph">PROFILE</code> command, which displays the summary information followed
+ by a detailed performance analysis.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In the Impala debug web UI, click on the <span class="ph uicontrol">Profile</span> link associated with the query after it is
+ complete. The executive summary information is displayed early in the profile output.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ For each phase of the query, you see an <span class="ph uicontrol">Avg Time</span> and a <span class="ph uicontrol">Max Time</span>
+ value, along with <span class="ph uicontrol">#Hosts</span> indicating how many hosts are involved in that query phase.
+ For all the phases with <span class="ph uicontrol">#Hosts</span> greater than one, look for cases where the maximum time
+ is substantially greater than the average time. Focus on the phases that took the longest, for example, those
+ taking multiple seconds rather than milliseconds or microseconds.
+ </p>
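+
+    <p class="p">
+      For example, the following condensed excerpt shows the kind of executive summary produced by the
+      <code class="ph codeph">SUMMARY</code> command. The query, table, columns, and figures are illustrative only
+      and vary by workload and release. Here the <code class="ph codeph">00:SCAN HDFS</code> phase runs on 10 hosts,
+      and its <span class="ph uicontrol">Max Time</span> is several times the <span class="ph uicontrol">Avg Time</span>,
+      suggesting that one host is processing a disproportionate share of the data blocks:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from big_table where col1 like '%x%';
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+--------+------------+-----------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows | Peak Mem  |
++--------------+--------+----------+----------+--------+------------+-----------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1      | 1          | 48.00 KB  |
+| 02:EXCHANGE  | 1      | 52.11us  | 52.11us  | 10     | 10         | 0 B       |
+| 01:AGGREGATE | 10     | 20.24ms  | 23.03ms  | 10     | 10         | 280.00 KB |
+| 00:SCAN HDFS | 10     | 2.15s    | 9.79s    | 12.41M | -1         | 130.00 MB |
++--------------+--------+----------+----------+--------+------------+-----------+
+</code></pre>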
+
+ <p class="p">
+      If you detect that some hosts take longer than others, first rule out non-Impala causes. Some hosts
+      could be slower than the others because they have less capacity, or because they are
+      substantially busier due to unevenly distributed non-Impala workloads:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings
+ from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven
+ performance, and the time spent diagnosing performance issues.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing
+ techniques for the non-Impala components to smooth out the load across the cluster.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to
+ determine which host is taking longer than others for the query phase in question. Examine how many bytes are
+ processed during that phase on that host, how much memory is used, and how many bytes are transmitted across
+ the network.
+ </p>
+
+ <p class="p">
+ The most common symptom is a higher number of bytes read on one host than others, due to one host being
+ requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the
+ number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and
+ the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle
+ while another node processes two blocks, the query could take twice as long as if the data was perfectly
+ distributed.
+ </p>
+
+ <p class="p">
+ Possible solutions in this case include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger
+ data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
+          the uneven distribution is much lower than if some nodes do twice as much work as others. As a
+ guideline, aim for a <span class="q">"sweet spot"</span> where each node reads 2 GB or more from HDFS per query. Queries
+ that process lower volumes than that could experience inconsistent performance that smooths out as
+ queries become more data-intensive.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If the query processes only a few large blocks, so that many nodes sit idle and cannot help to
+ parallelize the query, consider reducing the overall block size. For example, you might adjust the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option before copying or converting data into a Parquet table.
+ Or you might adjust the granularity of data files produced earlier in the ETL pipeline by non-Impala
+          components. In <span class="keyword">Impala 2.0</span> and later, the default Parquet block size is 256 MB, reduced from 1 GB, to
+          improve parallelism for common cluster sizes and data volumes. An example of adjusting the file size
+          appears after this list.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduce the amount of compression applied to the data. For text data files, the highest degree of
+ compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel,
+ and require extra memory during processing to hold the compressed and uncompressed data simultaneously.
+ For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but
+ remember that when queries process relatively few blocks, there is less opportunity for parallel
+ execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
+          the query option <code class="ph codeph">COMPRESSION_CODEC=NONE</code> in effect, the data is still typically compact due
+ to the encoding schemes used by Parquet, independent of the final compression step.
+ </p>
+ </li>
+ </ul>
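+
+    <p class="p">
+      As a sketch of the file size and compression adjustments above, the following
+      <span class="keyword cmdname">impala-shell</span> statements rewrite a Parquet table. The table names are
+      hypothetical and the values are starting points to experiment with, not recommendations:
+    </p>
+
+<pre class="pre codeblock"><code>-- Produce smaller Parquet data files (128 MB here) so that scans of a
+-- modest-sized table can be spread across more hosts.
+set PARQUET_FILE_SIZE=134217728;
+insert overwrite table parquet_table select * from staging_table;
+
+-- Rewrite the data without a final compression codec; Parquet encoding
+-- still keeps the files reasonably compact.
+set COMPRESSION_CODEC=none;
+insert overwrite table parquet_table select * from staging_table;
+</code></pre>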
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>