You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:17 UTC
[08/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_resource_management.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_resource_management.html b/docs/build3x/html/topics/impala_resource_management.html
new file mode 100644
index 0000000..cbc116a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_resource_management.html
@@ -0,0 +1,97 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="resource_management"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Resource Management for Impala</title></head><body id="resource_management"><mai
n role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Resource Management for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+ other data management components, you define static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <p class="p">
+ You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters
+ that run jobs from many Hadoop components.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="resource_management__rm_enforcement">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Resource Limits Are Enforced</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Limits on memory usage are enforced by Impala's process memory limit (the <code class="ph codeph">MEM_LIMIT</code>
+ query option setting). The admission control feature checks this setting to decide how many queries
+ can be safely run at the same time. Then the Impala daemon enforces the limit by activating the
+ spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="resource_management__rm_query_options">
+
+ <h2 class="title topictitle2" id="ariaid-title3">impala-shell Query Options for Resource Management</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Before issuing SQL statements through the <span class="keyword cmdname">impala-shell</span> interpreter, you can use the
+ <code class="ph codeph">SET</code> command to configure the following parameters related to resource management:
+ </p>
+
+ <ul class="ul" id="rm_query_options__ul_nzt_twf_jp">
+ <li class="li">
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+ </li>
+
+ </ul>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="resource_management__rm_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Limitations of Resource Management for Impala</h2>
+
+ <div class="body conbody">
+
+
+
+
+
+
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> query option, and the other resource-related query options, are settable
+ through the ODBC or JDBC interfaces in Impala 2.0 and higher. This is a former limitation that is now
+ lifted.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_revoke.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_revoke.html b/docs/build3x/html/topics/impala_revoke.html
new file mode 100644
index 0000000..02cbb59
--- /dev/null
+++ b/docs/build3x/html/topics/impala_revoke.html
@@ -0,0 +1,151 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="revoke"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REVOKE Statement (Impala 2.0 or higher only)</title></head><body id="revoke"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REVOKE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">REVOKE</code> statement revokes roles or
+ privileges on a specified object from groups.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>REVOKE ROLE <var class="keyword varname">role_name</var> FROM GROUP <var class="keyword varname">group_name</var>
+
+REVOKE <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+ FROM [ROLE] <var class="keyword varname">role_name</var>
+
+<span class="ph">
+ privilege ::= ALL | ALTER | CREATE | DROP | INSERT | REFRESH | SELECT | SELECT(<var class="keyword varname">column_name</var>)
+</span>
+<span class="ph">
+ object_type ::= TABLE | DATABASE | SERVER | URI
+</span>
+</code></pre>
+
+ <p class="p">
+ See <a href="impala_grant.html"><span class="keyword">GRANT Statement (Impala 2.0 or higher only)</span></a> for the required privileges and the scope
+ for SQL operations.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ALL</code> privilege is a distinct privilege and not a
+ union of all other privileges. Revoking <code class="ph codeph">SELECT</code>,
+ <code class="ph codeph">INSERT</code>, etc. from a role that only has the
+ <code class="ph codeph">ALL</code> privilege has no effect. To reduce the privileges
+ of that role you must <code class="ph codeph">REVOKE ALL</code> and
+ <code class="ph codeph">GRANT</code> the desired privileges.
+ </p>
+
+ <p class="p">
+ Typically, the object name is an identifier. For URIs, it is a string literal.
+ </p>
+
+ <p class="p">
+ The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+ in <span class="keyword">Impala 2.3</span> and higher. See
+ <span class="xref">the documentation for Apache Sentry</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+ <p class="p">Only Sentry administrative users can revoke the role from a group.</p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">REVOKE</code> statements are available in <span class="keyword">Impala 2.0</span> and higher.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+ use the Sentry service instead of the file-based policy mechanism.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">REVOKE</code> statements do not require the
+ <code class="ph codeph">ROLE</code> keyword to be repeated before each role name,
+ unlike the equivalent Hive statements.
+ </li>
+
+ <li class="li">
+ Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+ revoke a single privilege to or from a single role.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code>- and <code class="ph codeph">INSERT</code>-specific
+ permissions are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
new file mode 100644
index 0000000..7f9466e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
@@ -0,0 +1,104 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_bloom_filter_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_bloom_filter_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_BLOOM_FILTER_SIZE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Size (in bytes) of Bloom filter data structure used by the runtime filtering
+ feature.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, this query option only applies as a fallback, when statistics
+ are not available. By default, Impala estimates the optimal size of the Bloom filter structure
+ regardless of the setting for this option. (This is a change from the original behavior in
+ <span class="keyword">Impala 2.5</span>.)
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, when the value of this query option is used for query planning,
+ it is constrained by the minimum and maximum sizes specified by the
+ <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query options.
+ The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1048576 (1 MB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Maximum:</strong> 16 MB
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This setting affects optimizations for large and complex queries, such
+ as dynamic partition pruning for partitioned tables, and join optimization
+ for queries that join large tables.
+ Larger filters are more effective at handling
+ higher cardinality input sets, but consume more memory per filter.
+
+ </p>
+
+ <p class="p">
+ If your query filters on high-cardinality columns (for example, millions of different values)
+ and you do not get the expected speedup from the runtime filtering mechanism, consider
+ doing some benchmarks with a higher value for <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>.
+ The extra memory devoted to the Bloom filter data structures can help make the filtering
+ more accurate.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because the effectiveness of this setting depends so much on query characteristics and data distribution,
+ you typically only use it for specific queries that need some extra tuning, and the ideal value depends
+ on the query. Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_max_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_max_size.html b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
new file mode 100644
index 0000000..b1cf316
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_max_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_max_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MAX_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the maximum size for a filter,
+ no matter what the estimates produced by the planner are.
+ This value also overrides any lower number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_min_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_min_size.html b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
new file mode 100644
index 0000000..fd70cdb
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_min_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_min_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MIN_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the minimum size for a filter,
+ no matter what the estimates produced by the planner are.
+ This value also overrides any lower number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_mode.html b/docs/build3x/html/topics/impala_runtime_filter_mode.html
new file mode 100644
index 0000000..6ce6b3b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_mode.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MODE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It turns this feature on and off, and controls how
+ extensively the filters are transmitted between hosts.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1, 2)
+ or corresponding mnemonic strings (<code class="ph codeph">OFF</code>, <code class="ph codeph">LOCAL</code>, <code class="ph codeph">GLOBAL</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 2 (equivalent to <code class="ph codeph">GLOBAL</code>); formerly was 1 / <code class="ph codeph">LOCAL</code>, in <span class="keyword">Impala 2.5</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the default is <code class="ph codeph">GLOBAL</code>.
+ This setting is recommended for a wide variety of workloads, to provide best
+ performance with <span class="q">"out of the box"</span> settings.
+ </p>
+
+ <p class="p">
+ The lowest setting of <code class="ph codeph">LOCAL</code> does a similar level of optimization
+ (such as partition pruning) as in earlier Impala releases.
+ This setting was the default in <span class="keyword">Impala 2.5</span>,
+ to allow for a period of post-upgrade testing for existing workloads.
+ This setting is suitable for workloads with non-performance-critical queries,
+ or if the coordinator node is under heavy CPU or memory pressure.
+ </p>
+
+ <p class="p">
+ You might change the setting to <code class="ph codeph">OFF</code> if your workload contains
+ many queries involving partitioned tables or joins that do not experience a performance
+ increase from the runtime filters feature. If the overhead of producing the runtime filters
+ outweighs the performance benefit for queries, you can turn the feature off entirely.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about runtime filtering.
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a>,
+ and
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ for tuning options for runtime filtering.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
new file mode 100644
index 0000000..bcee5c6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
@@ -0,0 +1,51 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_wait_time_ms"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_wait_time_ms"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_WAIT_TIME_MS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It specifies a time in milliseconds that each scan node waits for
+ runtime filters to be produced by other plan fragments.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filtering.html b/docs/build3x/html/topics/impala_runtime_filtering.html
new file mode 100644
index 0000000..1280838
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filtering.html
@@ -0,0 +1,533 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</title></head><body id="runtime_filtering"><main role="main"><article role="article" aria-labelledby="runtime_filtering__runtime_filters">
+
+ <h1 class="title topictitle1" id="runtime_filtering__runtime_filters">Runtime Filtering for Impala Queries (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Runtime filtering</dfn> is a wide-ranging optimization feature available in
+ <span class="keyword">Impala 2.5</span> and higher. When only a fraction of the data in a table is
+ needed for a query against a partitioned table or to evaluate a join condition,
+ Impala determines the appropriate conditions while the query is running, and
+ broadcasts that information to all the <span class="keyword cmdname">impalad</span> nodes that are reading the table
+ so that they can avoid unnecessary I/O to read partition data, and avoid
+ unnecessary network transmission by sending only the subset of rows that match the join keys
+ across the network.
+ </p>
+
+ <p class="p">
+ This feature is primarily used to optimize queries against large partitioned tables
+ (under the name <dfn class="term">dynamic partition pruning</dfn>) and joins of large tables.
+ The information in this section includes concepts, internals, and troubleshooting
+ information for the entire runtime filtering feature.
+ For specific tuning steps for partitioned tables,
+
+ see
+ <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a>.
+
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ When this feature made its debut in <span class="keyword">Impala 2.5</span>,
+ the default setting was <code class="ph codeph">RUNTIME_FILTER_MODE=LOCAL</code>.
+ Now the default is <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code> in <span class="keyword">Impala 2.6</span> and higher,
+ which enables more wide-ranging and ambitious query optimization without requiring you to
+ explicitly set any query options.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="runtime_filtering__runtime_filtering_concepts">
+ <h2 class="title topictitle2" id="ariaid-title2">Background Information for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ To understand how runtime filtering works at a detailed level, you must
+ be familiar with some terminology from the field of distributed database technology:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ What a <dfn class="term">plan fragment</dfn> is.
+ Impala decomposes each query into smaller units of work that are distributed across the cluster.
+ Wherever possible, a data block is read, filtered, and aggregated by plan fragments executing
+ on the same host. For some operations, such as joins and combining intermediate results into
+ a final result set, data is transmitted across the network from one DataNode to another.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ What <code class="ph codeph">SCAN</code> and <code class="ph codeph">HASH JOIN</code> plan nodes are, and their role in computing query results:
+ </p>
+ <p class="p">
+ In the Impala query plan, a <dfn class="term">scan node</dfn> performs the I/O to read from the underlying data files.
+ Although this is an expensive operation from the traditional database perspective, Hadoop clusters and Impala are
+ optimized to do this kind of I/O in a highly parallel fashion. The major potential cost savings come from using
+ the columnar Parquet format (where Impala can avoid reading data for unneeded columns) and partitioned tables
+ (where Impala can avoid reading data for unneeded partitions).
+ </p>
+ <p class="p">
+ Most Impala joins use the
+ <a class="xref" href="https://en.wikipedia.org/wiki/Hash_join" target="_blank"><dfn class="term">hash join</dfn></a>
+ mechanism. (It is only fairly recently that Impala
+ started using the nested-loop join technique, for certain kinds of non-equijoin queries.)
+ In a hash join, when evaluating join conditions from two tables, Impala constructs a hash table in memory with all
+ the different column values from the table on one side of the join.
+ Then, for each row from the table on the other side of the join, Impala tests whether the relevant column values
+ are in this hash table or not.
+ </p>
+ <p class="p">
+ A <dfn class="term">hash join node</dfn> constructs such an in-memory hash table, then performs the comparisons to
+ identify which rows match the relevant join conditions
+ and should be included in the result set (or at least sent on to the subsequent intermediate stage of
+ query processing). Because some of the input for a hash join might be transmitted across the network from another host,
+ it is especially important from a performance perspective to prune out ahead of time any data that is known to be
+ irrelevant.
+ </p>
+ <p class="p">
+ The more distinct values are in the columns used as join keys, the larger the in-memory hash table and
+ thus the more memory required to process the query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The difference between a <dfn class="term">broadcast join</dfn> and a <dfn class="term">shuffle join</dfn>.
+ (The Hadoop notion of a shuffle join is sometimes referred to in Impala as a <dfn class="term">partitioned join</dfn>.)
+ In a broadcast join, the table from one side of the join (typically the smaller table)
+ is sent in its entirety to all the hosts involved in the query. Then each host can compare its
+ portion of the data from the other (larger) table against the full set of possible join keys.
+ In a shuffle join, there is no obvious <span class="q">"smaller"</span> table, and so the contents of both tables
+ are divided up, and corresponding portions of the data are transmitted to each host involved in the query.
+ See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for information about how these different kinds of
+ joins are processed.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The notion of the build phase and probe phase when Impala processes a join query.
+ The <dfn class="term">build phase</dfn> is where the rows containing the join key columns, typically for the smaller table,
+ are transmitted across the network and built into an in-memory hash table data structure on one or
+ more destination nodes.
+ The <dfn class="term">probe phase</dfn> is where data is read locally (typically from the larger table) and the join key columns
+ are compared to the values in the in-memory hash table.
+ The corresponding input sources (tables, subqueries, and so on) for these
+ phases are referred to as the <dfn class="term">build side</dfn> and the <dfn class="term">probe side</dfn>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ How to set Impala query options: interactively within an <span class="keyword cmdname">impala-shell</span> session through
+ the <code class="ph codeph">SET</code> command, for a JDBC or ODBC application through the <code class="ph codeph">SET</code> statement, or
+ globally for all <span class="keyword cmdname">impalad</span> daemons through the <code class="ph codeph">default_query_options</code> configuration
+ setting.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="runtime_filtering__runtime_filtering_internals">
+ <h2 class="title topictitle2" id="ariaid-title3">Runtime Filtering Internals</h2>
+ <div class="body conbody">
+ <p class="p">
+ The <dfn class="term">filter</dfn> that is transmitted between plan fragments is essentially a list
+ of values for join key columns. When this list is values is transmitted in time to a scan node,
+ Impala can filter out non-matching values immediately after reading them, rather than transmitting
+ the raw data to another host to compare against the in-memory hash table on that host.
+ </p>
+ <p class="p">
+ For HDFS-based tables, this data structure is implemented as a <dfn class="term">Bloom filter</dfn>, which uses
+ a probability-based algorithm to determine all possible matching values. (The probability-based aspects
+ means that the filter might include some non-matching values, but if so, that does not cause any inaccuracy
+ in the final results.)
+ </p>
+ <p class="p">
+ Another kind of filter is the <span class="q">"min-max"</span> filter. It currently only applies to Kudu tables. The
+ filter is a data structure representing a minimum and maximum value. These filters are passed to
+ Kudu to reduce the number of rows returned to Impala when scanning the probe side of the join.
+ </p>
+ <p class="p">
+ There are different kinds of filters to match the different kinds of joins (partitioned and broadcast).
+ A broadcast filter reflects the complete list of relevant values and can be immediately evaluated by a scan node.
+ A partitioned filter reflects only the values processed by one host in the
+ cluster; all the partitioned filters must be combined into one (by the coordinator node) before the
+ scan nodes can use the results to accurately filter the data as it is read from storage.
+ </p>
+ <p class="p">
+ Broadcast filters are also classified as local or global. With a local broadcast filter, the information
+ in the filter is used by a subsequent query fragment that is running on the same host that produced the filter.
+ A non-local broadcast filter must be transmitted across the network to a query fragment that is running on a
+ different host. Impala designates 3 hosts to each produce non-local broadcast filters, to guard against the
+ possibility of a single slow host taking too long. Depending on the setting of the <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ (<code class="ph codeph">LOCAL</code> or <code class="ph codeph">GLOBAL</code>), Impala either uses a conservative optimization
+ strategy where filters are only consumed on the same host that produced them, or a more aggressive strategy
+ where filters are eligible to be transmitted across the network.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.6</span> and higher, the default for runtime filtering is the <code class="ph codeph">GLOBAL</code> setting.
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="runtime_filtering__runtime_filtering_file_formats">
+ <h2 class="title topictitle2" id="ariaid-title4">File Format Considerations for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ Parquet tables get the most benefit from
+ the runtime filtering optimizations. Runtime filtering can speed up
+ join queries against partitioned or unpartitioned Parquet tables,
+ and single-table queries against partitioned Parquet tables.
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for information about
+ using Parquet tables with Impala.
+ </p>
+ <p class="p">
+ For other file formats (text, Avro, RCFile, and SequenceFile),
+ runtime filtering speeds up queries against partitioned tables only.
+ Because partitioned tables can use a mixture of formats, Impala produces
+ the filters in all cases, even if they are not ultimately used to
+ optimize the query.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="runtime_filtering__runtime_filtering_timing">
+ <h2 class="title topictitle2" id="ariaid-title5">Wait Intervals for Runtime Filters</h2>
+ <div class="body conbody">
+ <p class="p">
+ Because it takes time to produce runtime filters, especially for
+ partitioned filters that must be combined by the coordinator node,
+ there is a time interval above which it is more efficient for
+ the scan nodes to go ahead and construct their intermediate result sets,
+ even if that intermediate data is larger than optimal. If it only takes
+ a few seconds to produce the filters, it is worth the extra time if pruning
+ the unnecessary data can save minutes in the overall query time.
+ You can specify the maximum wait time in milliseconds using the
+ <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option.
+ </p>
+ <p class="p">
+ By default, each scan node waits for up to 1 second (1000 milliseconds)
+ for filters to arrive. If all filters have not arrived within the
+ specified interval, the scan node proceeds, using whatever filters
+ did arrive to help avoid reading unnecessary data. If a filter arrives
+ after the scan node begins reading data, the scan node applies that
+ filter to the data that is read after the filter arrives, but not to
+ the data that was already read.
+ </p>
+ <p class="p">
+ If the cluster is relatively busy and your workload contains many
+ resource-intensive or long-running queries, consider increasing the wait time
+ so that complicated queries do not miss opportunities for optimization.
+ If the cluster is lightly loaded and your workload contains many small queries
+ taking only a few seconds, consider decreasing the wait time to avoid the
+ 1 second delay for each query.
+ </p>
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="runtime_filtering__runtime_filtering_query_options">
+ <h2 class="title topictitle2" id="ariaid-title6">Query Options for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ See the following sections for information about the query options that control runtime filtering:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first query option adjusts the <span class="q">"sensitivity"</span> of this feature.
+ <span class="ph">By default, it is set to the highest level (<code class="ph codeph">GLOBAL</code>).
+ (This default applies to <span class="keyword">Impala 2.6</span> and higher.
+ In previous releases, the default was <code class="ph codeph">LOCAL</code>.)</span>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The other query options are tuning knobs that you typically only adjust after doing
+ performance testing, and that you might want to change only for the duration of a single
+ expensive query:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>;
+ in <span class="keyword">Impala 2.6</span> and higher, this setting acts as a fallback when
+ statistics are not available, rather than as a directive.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="runtime_filtering__runtime_filtering_explain_plan">
+ <h2 class="title topictitle2" id="ariaid-title7">Runtime Filtering and Query Plans</h2>
+ <div class="body conbody">
+ <p class="p">
+ In the same way the query plan displayed by the
+ <code class="ph codeph">EXPLAIN</code> statement includes information
+ about predicates used by each plan fragment, it also
+ includes annotations showing whether a plan fragment
+ produces or consumes a runtime filter.
+ A plan fragment that produces a filter includes an
+ annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> <- <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>,
+ while a plan fragment that consumes a filter includes an annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> -> <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>.
+ <span class="ph">Setting the query option <code class="ph codeph">EXPLAIN_LEVEL=2</code> adds additional
+ annotations showing the type of the filter, either <code class="ph codeph"><var class="keyword varname">filter_id</var>[bloom]</code>
+ (for HDFS-based tables) or <code class="ph codeph"><var class="keyword varname">filter_id</var>[min_max]</code> (for Kudu tables).</span>
+ </p>
+
+ <p class="p">
+ The following example shows a query that uses a single runtime filter (labelled <code class="ph codeph">RF00</code>)
+ to prune the partitions that are scanned in one stage of the query, based on evaluating the
+ result set of a subquery:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+| |
+| 04:EXCHANGE [UNPARTITIONED] |
+| | |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
+| | hash predicates: year = year |
+| | <strong class="ph b">runtime filters: RF000 <- year</strong> |
+| | |
+| |--03:EXCHANGE [BROADCAST] |
+| | | |
+| | 01:SCAN HDFS [dpp.yy] |
+| | partitions=2/4 files=2 size=468B |
+| | |
+| 00:SCAN HDFS [dpp.yy2] |
+| partitions=2/3 files=2 size=468B |
+| <strong class="ph b">runtime filters: RF000 -> year</strong> |
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The query profile (displayed by the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span>)
+ contains both the <code class="ph codeph">EXPLAIN</code> plan and more detailed information about the internal
+ workings of the query. The profile output includes a section labelled the <span class="q">"filter routing table"</span>,
+ with information about each filter based on its ID.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="runtime_filtering__runtime_filtering_queries">
+ <h2 class="title topictitle2" id="ariaid-title8">Examples of Queries that Benefit from Runtime Filtering</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ In this example, Impala would normally do extra work to interpret the columns
+ <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>, <code class="ph codeph">C3</code>, and <code class="ph codeph">ID</code>
+ for each row in <code class="ph codeph">HUGE_T1</code>, before checking the <code class="ph codeph">ID</code>
+ value against the in-memory hash table constructed from all the <code class="ph codeph">TINY_T2.ID</code>
+ values. By producing a filter containing all the <code class="ph codeph">TINY_T2.ID</code> values
+ even before the query starts scanning the <code class="ph codeph">HUGE_T1</code> table, Impala
+ can skip the unnecessary work to parse the column info as soon as it determines
+ that an <code class="ph codeph">ID</code> value does not match any of the values from the other table.
+ </p>
+
+ <p class="p">
+ The example shows <code class="ph codeph">COMPUTE STATS</code> statements for both the tables (even
+ though that is a one-time operation after loading data into those tables) because
+ Impala relies on up-to-date statistics to
+ determine which one has more distinct <code class="ph codeph">ID</code> values than the other.
+ That information lets Impala make effective decisions about which table to use to
+ construct the in-memory hash table, and which table to read from disk and
+ compare against the entries in the hash table.
+ </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS huge_t1;
+COMPUTE STATS tiny_t2;
+SELECT c1, c2, c3 FROM huge_t1 JOIN tiny_t2 WHERE huge_t1.id = tiny_t2.id;
+</code></pre>
+
+
+
+ <p class="p">
+ In this example, <code class="ph codeph">T1</code> is a table partitioned by year. The subquery
+ on <code class="ph codeph">T2</code> produces multiple values, and transmits those values as a filter to the plan
+ fragments that are reading from <code class="ph codeph">T1</code>. Any non-matching partitions in <code class="ph codeph">T1</code>
+ are skipped.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from t1 where year in (select distinct year from t2);
+</code></pre>
+
+ <p class="p">
+ Now the <code class="ph codeph">WHERE</code> clause contains an additional test that does not apply to
+ the partition key column.
+ A filter on a column that is not a partition key is called a per-row filter.
+ Because per-row filters only apply for Parquet, <code class="ph codeph">T1</code> must be a Parquet table.
+ </p>
+
+ <p class="p">
+ The subqueries result in two filters being transmitted to
+ the scan nodes that read from <code class="ph codeph">T1</code>. The filter on <code class="ph codeph">YEAR</code> helps the query eliminate
+ entire partitions based on non-matching years. The filter on <code class="ph codeph">C2</code> lets Impala discard
+ rows with non-matching <code class="ph codeph">C2</code> values immediately after reading them. Without runtime filtering,
+ Impala would have to keep the non-matching values in memory, assemble <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>,
+ and <code class="ph codeph">C3</code> into rows in the intermediate result set, and transmit all the intermediate rows
+ back to the coordinator node, where they would be eliminated only at the very end of the query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1, c2, c3 from t1
+ where year in (select distinct year from t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a broadcast join.
+ The fact that the <code class="ph codeph">ON</code> clause would
+ return a small number of matching rows (because there
+ are not very many rows in <code class="ph codeph">TINY_T2</code>)
+ means that the corresponding filter is very selective.
+ Therefore, runtime filtering will probably be effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [broadcast] tiny_t2
+ on huge_t1.id = tiny_t2.id
+ where huge_t1.year in (select distinct year from tiny_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a shuffle or partitioned join.
+ Assume that most rows in <code class="ph codeph">HUGE_T1</code>
+ have a corresponding row in <code class="ph codeph">HUGE_T2</code>.
+ The fact that the <code class="ph codeph">ON</code> clause could
+ return a large number of matching rows means that
+ the corresponding filter would not be very selective.
+ Therefore, runtime filtering might be less effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [shuffle] huge_t2
+ on huge_t1.id = huge_t2.id
+ where huge_t1.year in (select distinct year from huge_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="runtime_filtering__runtime_filtering_tuning">
+ <h2 class="title topictitle2" id="ariaid-title9">Tuning and Troubleshooting Queries that Use Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ These tuning and troubleshooting procedures apply to queries that are
+ resource-intensive enough, long-running enough, and frequent enough
+ that you can devote special attention to optimizing them individually.
+ </p>
+
+ <p class="p">
+ Use the <code class="ph codeph">EXPLAIN</code> statement and examine the <code class="ph codeph">runtime filters:</code>
+ lines to determine whether runtime filters are being applied to the <code class="ph codeph">WHERE</code> predicates
+ and join clauses that you expect. For example, runtime filtering does not apply to queries that use
+ the nested loop join mechanism due to non-equijoin operators.
+ </p>
+
+ <p class="p">
+ Make sure statistics are up-to-date for all tables involved in the queries.
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement after loading data into non-partitioned tables,
+ and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> after adding new partitions to partitioned tables.
+ </p>
+
+ <p class="p">
+ If join queries involving large tables use unique columns as the join keys,
+ for example joining a primary key column with a foreign key column, the overhead of
+ producing and transmitting the filter might outweigh the performance benefit because
+ not much data could be pruned during the early stages of the query.
+ For such queries, consider setting the query option <code class="ph codeph">RUNTIME_FILTER_MODE=OFF</code>.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="runtime_filtering__runtime_filtering_limits">
+ <h2 class="title topictitle2" id="ariaid-title10">Limitations and Restrictions for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ The runtime filtering feature is most effective for the Parquet file formats.
+ For other file formats, filtering only applies for partitioned tables.
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering_file_formats">File Format Considerations for Runtime Filtering</a>.
+ For the ways in which runtime filtering works for Kudu tables, see
+ <a class="xref" href="impala_kudu.html#kudu_performance">Impala Query Performance for Kudu Tables</a>.
+ </p>
+
+
+ <p class="p">
+ When the spill-to-disk mechanism is activated on a particular host during a query,
+ that host does not produce any filters while processing that query.
+ This limitation does not affect the correctness of results; it only reduces the
+ amount of optimization that can be applied to the query.
+ </p>
+
+ </div>
+ </article>
+
+
+</article></main></body></html>