Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:30 UTC
[21/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_row_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_row_size.html b/docs/build3x/html/topics/impala_max_row_size.html
new file mode 100644
index 0000000..76c6d69
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_row_size.html
@@ -0,0 +1,221 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_row_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ROW_SIZE Query Option</title></head><body id="max_row_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_ROW_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Ensures that Impala can process rows of at least the specified size. (Larger
+ rows might be successfully processed, but that is not guaranteed.) Applies when
+ constructing intermediate or final rows in the result set. This setting prevents
+ out-of-control memory use when accessing columns containing huge strings.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">524288</code> (512 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value in an unrecognized format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If a query fails because it involves rows with long strings and/or
+ many columns, causing the total row size to exceed <code class="ph codeph">MAX_ROW_SIZE</code>
+ bytes, increase the <code class="ph codeph">MAX_ROW_SIZE</code> setting to accommodate
+ the total bytes stored in the largest row. Examine the error messages for any
+ failed queries to see the size of the row that caused the problem.
+ </p>
+ <p class="p">
+ Impala attempts to handle rows that exceed the <code class="ph codeph">MAX_ROW_SIZE</code>
+ value where practical, so in many cases, queries succeed despite having rows
+ that are larger than this setting.
+ </p>
+ <p class="p">
+ Specifying a value that is substantially higher than actually needed can cause
+ Impala to reserve more memory than is necessary to execute the query.
+ </p>
+ <p class="p">
+ In a Hadoop cluster with highly concurrent workloads and queries that process
+ high volumes of data, traditional SQL tuning advice about minimizing wasted memory
+ is worth remembering. For example, if a table has <code class="ph codeph">STRING</code> columns
+ where a single value might be multiple megabytes, make sure that the
+ <code class="ph codeph">SELECT</code> lists in queries only refer to columns that are actually
+ needed in the result set, instead of using the <code class="ph codeph">SELECT *</code> shorthand.
+ </p>
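+
+ <p class="p">
+ As a minimal sketch of that advice, assuming the <code class="ph codeph">big_strings</code> table
+ created in the examples later in this topic, the two queries below differ only in which columns
+ Impala has to materialize. No output is shown because the outcome depends on the data and on the
+ current <code class="ph codeph">MAX_ROW_SIZE</code> setting.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Refers to every column, so the long S1 and S3 values must be materialized.
+select * from big_strings;
+
+-- Refers only to the column that is actually needed; S1 and S3 are not part of the result row.
+select s2 from big_strings;
+</code></pre>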
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show the kinds of situations where it is necessary to
+ adjust the <code class="ph codeph">MAX_ROW_SIZE</code> setting. First, we create a table
+ containing some very long values in <code class="ph codeph">STRING</code> columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table big_strings (s1 string, s2 string, s3 string) stored as parquet;
+
+-- Turn off compression to more easily reason about data volume by doing SHOW TABLE STATS.
+-- Does not actually affect query success or failure, because MAX_ROW_SIZE applies when
+-- column values are materialized in memory.
+set compression_codec=none;
+set;
+...
+ MAX_ROW_SIZE: [524288]
+...
+
+-- A very small row.
+insert into big_strings values ('one', 'two', 'three');
+-- A row right around the default MAX_ROW_SIZE limit: a 500 KB string and a 30 KB string.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('123',10000));
+-- A row that is too big if the query has to materialize both S1 and S3.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('12345',100000));
+
+</code></pre>
+
+ <p class="p">
+ With the default <code class="ph codeph">MAX_ROW_SIZE</code> setting, different queries succeed
+ or fail based on which column values have to be materialized during query processing:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- All the S1 values can be materialized within the 512 KB MAX_ROW_SIZE buffer.
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 2 |
++--------------------+
+
+-- A row where even the S1 value is too large to materialize within MAX_ROW_SIZE.
+insert into big_strings values (repeat('12345',1000000), 'short', repeat('12345',1000000));
+
+-- The 5,000,000-byte string (reported as 4.77 MB) is too large to materialize. The message
+-- states the size of the result set row the query is attempting to materialize.
+select count(distinct(s1)) from big_strings;
+WARNINGS: Row of size 4.77 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- If more columns are involved, the result set row being materialized is bigger.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- Column S2, containing only short strings, can still be examined.
+select count(distinct(s2)) from big_strings;
++----------------------+
+| count(distinct (s2)) |
++----------------------+
+| 2 |
++----------------------+
+
+-- Queries that do not materialize the big column values are OK.
+select count(*) from big_strings;
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how adjusting <code class="ph codeph">MAX_ROW_SIZE</code> upward
+ allows queries involving the long string columns to succeed:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Boosting MAX_ROW_SIZE moderately allows all S1 values to be materialized.
+set max_row_size=7mb;
+
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 3 |
++--------------------+
+
+-- But the combination of S1 + S3 strings is still too large.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1. Increase the max_row_size query option (currently 7.00 MB) to process larger rows.
+
+-- Boosting MAX_ROW_SIZE to larger than the largest row in the table allows
+-- all queries to complete successfully.
+set max_row_size=12mb;
+
+select count(distinct s1, s2, s3) from big_strings;
++----------------------------+
+| count(distinct s1, s2, s3) |
++----------------------------+
+| 4 |
++----------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how to reason about appropriate values for
+ <code class="ph codeph">MAX_ROW_SIZE</code>, based on the characteristics of the
+ columns containing the long values:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- With a large MAX_ROW_SIZE in place, we can examine the columns to
+-- understand the practical lower limit for MAX_ROW_SIZE based on the
+-- table structure and column values.
+select max(length(s1) + length(s2) + length(s3)) / 1e6 as megabytes from big_strings;
++-----------+
+| megabytes |
++-----------+
+| 10.000005 |
++-----------+
+
+-- We can also examine the 'Max Size' for each column after computing stats.
+compute stats big_strings;
+show column stats big_strings;
++--------+--------+------------------+--------+----------+-----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+-----------+
+| s1 | STRING | 2 | -1 | 5000000 | 2500002.5 |
+| s2 | STRING | 2 | -1 | 10 | 7.5 |
+| s3 | STRING | 2 | -1 | 5000000 | 2500005 |
++--------+--------+------------------+--------+----------+-----------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_scan_range_length.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_scan_range_length.html b/docs/build3x/html/topics/impala_max_scan_range_length.html
new file mode 100644
index 0000000..0eaf110
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_scan_range_length.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_scan_range_length"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_SCAN_RANGE_LENGTH Query Option</title></head><body id="max_scan_range_length"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_SCAN_RANGE_LENGTH Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many
+ CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan
+ range.)
+ </p>
+
+ <p class="p">
+ Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value
+ can limit query performance because each scan range involves extra overhead.
+ </p>
+
+ <p class="p">
+ Only applicable to HDFS tables. Has no effect on Parquet tables. Unspecified or 0 indicates backend default,
+ which is the same as the HDFS block size for each table.
+ </p>
+
+ <p class="p">
+ Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can
+ query tables with huge block sizes without allocating equivalent blocks of memory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, the argument value can include unit specifiers,
+ such as <code class="ph codeph">100m</code> or <code class="ph codeph">100mb</code>. In previous versions,
+ Impala interpreted such formatted values as 0, leading to query failures.
+ </p>
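+
+ <p class="p">
+ For illustration, the following statements sketch both ways of writing the value. The 64 MB
+ figure is an arbitrary assumption chosen for the example, not a recommended setting.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- A plain number of bytes (64 * 1024 * 1024).
+set max_scan_range_length=67108864;
+
+-- A unit specifier, accepted in Impala 2.7 and higher.
+set max_scan_range_length=64mb;
+
+-- Return to the backend default (the HDFS block size for each table).
+set max_scan_range_length=0;
+</code></pre>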
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mem_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mem_limit.html b/docs/build3x/html/topics/impala_mem_limit.html
new file mode 100644
index 0000000..46e1cd3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mem_limit.html
@@ -0,0 +1,206 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node.
+ Therefore, the total memory that can be used by a query is the <code class="ph codeph">MEM_LIMIT</code> times the number of nodes.
+ </p>
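+
+ <p class="p">
+ As a worked illustration of that arithmetic, assume a hypothetical cluster of 10 nodes that all
+ take part in the query. The node count and the 2 GB figure are assumptions made for this sketch only.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Per-node limit for each query in this session.
+set mem_limit=2gb;
+
+-- With 10 participating nodes, the query can allocate up to
+-- 10 * 2 GB = 20 GB across the cluster, but never more than 2 GB on any one node.
+</code></pre>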
+
+ <p class="p">
+ There are two levels of memory limit for Impala.
+ The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process
+ (which handles multiple queries concurrently).
+ That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>.
+ The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span>
+ or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query.
+ The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>,
+ and must always be less than the <span class="keyword cmdname">impalad</span> memory limit.
+ </p>
+
+ <p class="p">
+ If query processing exceeds the specified memory limit on any node, either the per-query limit or the
+ <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically.
+ Memory limits are checked periodically during query processing, so the actual memory in use
+ might briefly exceed the limit without the query being cancelled.
+ </p>
+
+ <p class="p">
+ When resource management is enabled, the mechanism for this option changes. If set, it overrides the
+ automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
+ query does not proceed until that much memory is available. The actual memory used by the query could be
+ lower, since some queries use much less memory than others. With resource management, the
+ <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any
+ node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
+ is being executed. When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is
+ specified, Impala estimates the amount of memory needed on each node for each query, requests that much
+ memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each
+ node to the requested amount of memory during the query. Thus, if the query takes more memory than was
+ originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query
+ itself.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code>
+ for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value in an unrecognized
+ format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (unlimited)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting,
+ or on a cluster with a workload shared between Impala and other data processing components.
+ You can prevent any query from accidentally using much more memory than expected,
+ which could negatively impact other Impala queries.
+ </p>
+
+ <p class="p">
+ Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>
+ to get a report of memory used for each phase of your most heavyweight queries on each node,
+ and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that.
+ See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about
+ the <code class="ph codeph">SUMMARY</code> command.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option
+ using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3000000000;
+MEM_LIMIT set to 3000000000
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3g;
+MEM_LIMIT set to 3g
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3gb;
+MEM_LIMIT set to 3gb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3m;
+MEM_LIMIT set to 3m
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3mb;
+MEM_LIMIT set to 3mb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code>
+ values lead to errors for subsequent queries.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3tb;
+MEM_LIMIT set to 3tb
+[localhost:21000] > select 5;
+ERROR: Failed to parse query memory limit from '3tb'.
+
+[localhost:21000] > set mem_limit=xyz;
+MEM_LIMIT set to xyz
+[localhost:21000] > select 5;
+Query: select 5
+ERROR: Failed to parse query memory limit from 'xyz'.
+</code></pre>
+
+ <p class="p">
+ The following example shows the automatic query cancellation
+ when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded
+ on any host involved in the Impala query. First it runs a
+ successful query and checks the largest amount of memory
+ used on any node for any stage of the query.
+ Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code>
+ setting so that the same query cannot run.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select count(*) from customer;
+Query: select count(*) from customer
++----------+
+| count(*) |
++----------+
+| 150000 |
++----------+
+
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
++------------------------+
+| count(distinct c_name) |
++------------------------+
+| 150000 |
++------------------------+
+
+[localhost:21000] > summary;
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
+| 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
+| 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
+| 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
+<strong class="ph b">| 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</strong>
+| 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+
+[localhost:21000] > set mem_limit=15mb;
+MEM_LIMIT set to 15mb
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
+ERROR:
+Memory limit exceeded
+Query did not have enough memory to get the minimum required buffers in the block manager.
+</code></pre>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min.html b/docs/build3x/html/topics/impala_min.html
new file mode 100644
index 0000000..bfdfd0f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min.html
@@ -0,0 +1,297 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the minimum value from a set of numbers. Opposite of the
+ <code class="ph codeph">MAX</code> function. Its single argument can be a numeric column, or the numeric result of a function
+ or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+ are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+ bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
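+
+ <p class="p">
+ As a sketch of this restriction, assuming the <code class="ph codeph">int_t</code> table used in the
+ analytic examples later in this topic: the first window clause starts at
+ <code class="ph codeph">UNBOUNDED PRECEDING</code>, which is the permitted form. The second, left
+ commented out, starts at a bounded offset, which is the form the restriction rules out. The
+ <code class="ph codeph">running_min</code> alias is a name chosen for this illustration.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Permitted: the window starts at UNBOUNDED PRECEDING.
+select x, min(x) over (order by x rows between unbounded preceding and current row) as running_min
+  from int_t;
+
+-- Not permitted under the restriction above: the window starts at a bounded offset.
+-- select x, min(x) over (order by x rows between 1 preceding and current row) from int_t;
+</code></pre>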
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table.
+select min(c1) from t1;
+-- Find the smallest value for this column from a subset of the table.
+select min(c1) from t1 where month = 'January' and year = '2013';
+-- Find the smallest value from a set of numeric function results.
+select min(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, min(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select min(distinct x) from t1;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">MIN()</code> is reported for each input value, as
+ opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
++----+----------+-----+
+| x | property | min |
++----+----------+-----+
+| 2 | even | 2 |
+| 4 | even | 2 |
+| 6 | even | 2 |
+| 8 | even | 2 |
+| 10 | even | 2 |
+| 1 | odd | 1 |
+| 3 | odd | 1 |
+| 5 | odd | 1 |
+| 7 | odd | 1 |
+| 9 | odd | 1 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MIN()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code>
+result only decreases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>.
+Therefore, all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point'
+ from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running minimum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local minimum |
++---+----------+---------------+
+| 7 | prime | 5 |
+| 5 | prime | 3 |
+| 3 | prime | 2 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 1 |
+| 1 | square | 1 |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+ <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min_spillable_buffer_size.html b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
new file mode 100644
index 0000000..9f3c84e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min_spillable_buffer_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN_SPILLABLE_BUFFER_SIZE Query Option</title></head><body id="min_spillable_buffer_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN_SPILLABLE_BUFFER_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the minimum size for a memory buffer used when the
+ spill-to-disk mechanism is activated, for example for queries against
+ a large table with no statistics, or large join operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">65536</code> (64 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value in an unrecognized format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ This query option sets a lower bound on the size of the internal
+ buffer that can be used during spill-to-disk operations. The
+ actual size of the buffer is chosen by the query planner.
+ </p>
+ <p class="p">
+ If overall query performance is limited by the time needed for spilling,
+ consider increasing the <code class="ph codeph">MIN_SPILLABLE_BUFFER_SIZE</code> setting.
+ Larger buffer sizes result in Impala issuing larger I/O requests to storage
+ devices, which might result in higher throughput, particularly on rotational
+ disks.
+ </p>
+ <p class="p">
+ The tradeoff with a large value for this setting is increased memory usage during
+ spill-to-disk operations. Reducing this value may reduce memory consumption.
+ </p>
+ <p class="p">
+ To determine whether this setting is affecting a particular query by raising the
+ spillable buffer size to the specified minimum, check the buffer size chosen by the
+ query planner: <code class="ph codeph">EXPLAIN</code> the query while the setting
+ <code class="ph codeph">EXPLAIN_LEVEL=2</code> is in effect.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+set min_spillable_buffer_size=128KB;
+
+</code></pre>
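+
+ <p class="p">
+ As a minimal sketch of the check described in the usage notes, the following
+ statements raise the minimum buffer size and then display a detailed plan. The table
+ <code class="ph codeph">t1</code> and column <code class="ph codeph">c1</code> are
+ placeholder names, not objects defined in this topic; substitute a query of your own
+ that is likely to spill.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table and column names; replace with your own query.
+set min_spillable_buffer_size=1m;
+set explain_level=2;
+-- In the plan output, look for the buffer size reported for each operator that can spill.
+explain select c1, count(*) from t1 group by c1;
+</code></pre>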
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_misc_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_misc_functions.html b/docs/build3x/html/topics/impala_misc_functions.html
new file mode 100644
index 0000000..4210a99
--- /dev/null
+++ b/docs/build3x/html/topics/impala_misc_functions.html
@@ -0,0 +1,175 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports the following utility functions that do not operate on a particular column or data type:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="misc_functions__current_database">
+ <code class="ph codeph">current_database()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code>
+ if no database has been selected, or whatever database the session switched to through a
+ <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-d</code> option.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
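+ <p class="p">
+ For illustration, the following sketch shows the value changing after a
+ <code class="ph codeph">USE</code> statement. The database name
+ <code class="ph codeph">sample_db</code> is hypothetical and is assumed to exist already.
+ </p>
+<pre class="pre codeblock"><code>
+-- Before any USE statement, the session is typically in the default database.
+select current_database();
+-- sample_db is a placeholder name for an existing database.
+use sample_db;
+select current_database();
+</code></pre>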
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__effective_user">
+ <code class="ph codeph">effective_user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>,
+ except if delegation is enabled, in which case it returns the ID of the delegated user.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__pid">
+ <code class="ph codeph">pid()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is
+ connected to. You can use it during low-level debugging, to issue Linux commands that trace the
+ <span class="keyword cmdname">impalad</span> process, show its arguments, and so on.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__user">
+ <code class="ph codeph">user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span>
+ daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to
+ understand how authorization settings apply in a security context; once you know the logged-in username,
+ you can check which groups that user belongs to, and from the list of groups you can check which roles
+ are available to those groups through the authorization policy file.
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+ <p class="p">
+ When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
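+ <p class="p">
+ A brief sketch of the single-call usage described above: because the query has no
+ <code class="ph codeph">FROM</code> clause, it returns one row. Comparing the two
+ columns is one way to see whether delegation affects the current session.
+ </p>
+<pre class="pre codeblock"><code>
+-- The two values are typically the same unless delegation is enabled.
+select user(), effective_user();
+</code></pre>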
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__uuid">
+ <code class="ph codeph">uuid()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universally unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used
+ as identifiers within a table, and as join keys across multiple tables.
+ The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not
+ require storing or querying the highest sequence number. For example, you
+ can use it to quickly construct new unique identifiers during a data import job,
+ or to combine data from different tables without the likelihood of ID collisions.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+-- Each call to uuid() produces a new arbitrary value.
+select uuid();
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| c7013e25-1455-457f-bf74-a2046e58caea |
++--------------------------------------+
+
+-- If you get a UUID for each row of a result set, you can use it as a
+-- unique identifier within a table, or even a unique ID across tables.
+select uuid() from four_row_table;
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| 51d3c540-85e5-4cb9-9110-604e53999e2e |
+| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 |
+| 5e9d7c36-9842-4a96-862d-c13cd0457c02 |
+| cae29095-0cc0-4053-a5ea-7fcd3c780861 |
++--------------------------------------+
+</code></pre>
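+ <p class="p">
+ As a sketch of the data import use case mentioned in the usage notes, the following
+ statement assigns a new identifier to each row as it is copied. The tables
+ <code class="ph codeph">staging_table</code> and <code class="ph codeph">imported_table</code>
+ and their columns are hypothetical; the first column of
+ <code class="ph codeph">imported_table</code> is assumed to be a <code class="ph codeph">STRING</code>.
+ </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical tables: imported_table(id, c1, c2) and staging_table(c1, c2).
+-- uuid() is evaluated for each row, so every imported row gets its own ID.
+insert into imported_table select uuid(), c1, c2 from staging_table;
+</code></pre>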
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__version">
+ <code class="ph codeph">version()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the
+ <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you
+ are connected to the expected level of Impala to use a particular feature, or to connect to several nodes
+ and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines)
+ </p>
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mixed_security.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mixed_security.html b/docs/build3x/html/topics/impala_mixed_security.html
new file mode 100644
index 0000000..9cadbf7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mixed_security.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each
+ <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No
+ special actions need to be taken if some users authenticate through Kerberos and some through LDAP.
+ </p>
+
+ <p class="p">
+ Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port
+ depending on the kind of authentication, then configure your network load balancer to forward each kind of
+ request to a DataNode that was set up with the appropriate authentication type. Once the initial request was
+ made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating
+ the work across multiple nodes and transmitting intermediate results back to the coordinator node.
+ </p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mt_dop.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mt_dop.html b/docs/build3x/html/topics/impala_mt_dop.html
new file mode 100644
index 0000000..42d9591
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mt_dop.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Sets the degree of intra-node parallelism used for certain operations that
+ can benefit from multithreaded execution. You can specify values
+ higher than zero to find the ideal balance of response time,
+ memory usage, and CPU usage during statement processing.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala execution engine is being revamped incrementally to add
+ parallelism within a single host for certain statements and
+ kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
+ <span class="q">"old"</span> code path with limited intra-node parallelism.
+ </p>
+
+ <p class="p">
+ Currently, the operations affected by the <code class="ph codeph">MT_DOP</code>
+ query option are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
+ <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Queries with execution plans containing only scan and aggregation operators,
+ or local joins that do not need data exchanges (such as for nested types).
+ Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero
+ value. Therefore, this query option is typically only set for the duration of
+ specific long-running, CPU-intensive queries.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ statements for Parquet tables benefit substantially from extra intra-node
+ parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
+ for Parquet tables.
+ </p>
+ <p class="p">
+ <strong class="ph b">Range:</strong> 0 to 64
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Any timing figures in the following examples are on a small, lightly loaded development cluster.
+ Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
+ partitions within each table.
+ </p>
+ </div>
+
+ <p class="p">
+ The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
+ statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
+ setting:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Explicitly setting MT_DOP to 0 selects the old code path.
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- The analysis for the billion rows is distributed among hosts,
+-- but uses only a single core on each host.
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Using 4 logical processors per host is faster.
+set mt_dop = 4;
+MT_DOP set to 4
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Unsetting the option reverts it to its default,
+-- which for COMPUTE STATS on a Parquet table is 4,
+-- so again it uses the fast path.
+unset MT_DOP;
+Unsetting option MT_DOP
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
+ for a query involving only scan and aggregation operations for a Parquet table:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- COUNT(DISTINCT) for a unique column is CPU-intensive.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 67.20s
+
+set mt_dop = 16;
+MT_DOP set to 16
+
+-- Introducing more intra-node parallelism for the aggregation
+-- speeds things up, and potentially reduces memory overhead by
+-- reducing the number of scanner threads.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 17.19s
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how queries that are not compatible with non-zero
+ <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
+ is set:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop=1;
+MT_DOP set to 1
+
+select * from a1 inner join a2
+ on a1.id = a2.id limit 4;
+ERROR: NotImplementedException: MT_DOP not supported for plans with
+ base table joins or table sinks.
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ndv.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ndv.html b/docs/build3x/html/topics/impala_ndv.html
new file mode 100644
index 0000000..a3f7e2c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ndv.html
@@ -0,0 +1,226 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns an approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT
+ <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the
+ combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>, and uses a constant amount of memory and
+ thus is less memory-intensive for columns with high cardinality.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the
+ number of distinct values in a column.
+ </p>
+
+ <p class="p">
+ Because this number is an estimate, it might not reflect the precise number of different values in the
+ column, especially if the cardinality is very low or very high. If the estimated number is higher than the
+ number of rows in the table, Impala adjusts the value internally during query planning.
+ </p>
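+
+ <p class="p">
+ To gauge how close the estimate is for a particular column, you might compare it with
+ the exact figure, as in the following sketch. The table <code class="ph codeph">t1</code>
+ and column <code class="ph codeph">c1</code> are placeholder names.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table t1 with column c1.
+-- NDV() returns an estimate; COUNT(DISTINCT) returns the exact number
+-- but can take longer and use more memory on high-cardinality columns.
+select ndv(c1) as estimated, count(distinct c1) as exact from t1;
+</code></pre>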
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name        | type                    | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint                |         |
+| r_name      | string                  |         |
+| r_comment   | string                  |         |
+| r_nations   | array<struct<           |         |
+|             |   n_nationkey:smallint, |         |
+|             |   n_name:string,        |         |
+|             |   n_comment:string      |         |
+|             | >>                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name      | item.n_nationkey |
++-------------+------------------+
+| AFRICA      | 0                |
+| AFRICA      | 5                |
+| AFRICA      | 14               |
+| AFRICA      | 15               |
+| AFRICA      | 16               |
+| AMERICA     | 1                |
+| AMERICA     | 2                |
+| AMERICA     | 3                |
+| AMERICA     | 17               |
+| AMERICA     | 24               |
+| ASIA        | 8                |
+| ASIA        | 9                |
+| ASIA        | 12               |
+| ASIA        | 18               |
+| ASIA        | 21               |
+| EUROPE      | 6                |
+| EUROPE      | 7                |
+| EUROPE      | 19               |
+| EUROPE      | 22               |
+| EUROPE      | 23               |
+| MIDDLE EAST | 4                |
+| MIDDLE EAST | 10               |
+| MIDDLE EAST | 11               |
+| MIDDLE EAST | 13               |
+| MIDDLE EAST | 20               |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      | 5     | 50  | 10   | ALGERIA   | MOZAMBIQUE     | 5             |
+| AMERICA     | 5     | 47  | 9.4  | ARGENTINA | UNITED STATES  | 5             |
+| ASIA        | 5     | 68  | 13.6 | CHINA     | VIETNAM        | 5             |
+| EUROPE      | 5     | 77  | 15.4 | FRANCE    | UNITED KINGDOM | 5             |
+| MIDDLE EAST | 5     | 58  | 11.6 | EGYPT     | SAUDI ARABIA   | 5             |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example queries a billion-row table to illustrate the relative performance of
+ <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. <code class="ph codeph">COUNT(DISTINCT)</code>
+ gives a precise answer (100000, in 20.13 seconds) but is inefficient for large-scale data where an approximate result is sufficient.
+ <code class="ph codeph">NDV()</code> gives an approximate result (139017) but finishes in 8.91 seconds, less than half the time.
+ </p>
+
+<pre class="pre codeblock"><code>select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select cast(ndv(col1) as bigint) as col1 from sample_data;
++----------+
+| col1     |
++----------+
+| 139017   |
++----------+
+Fetched 1 row(s) in 8.91s
+</code></pre>
+
+ <p class="p">
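+ The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query to
+ learn which columns have substantially more or fewer distinct values. This technique is faster than
+ running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls: here, the single query with four
+ <code class="ph codeph">NDV()</code> calls takes 34.97 seconds, while the four <code class="ph codeph">COUNT(DISTINCT)</code> queries take 326.29 seconds in total.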
+ </p>
+
+<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2,
+ cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4
+ from sample_data;
++----------+-----------+------------+-----------+
+| col1     | col2      | col3       | col4      |
++----------+-----------+------------+-----------+
+| 139017   | 282       | 46         | 145636240 |
++----------+-----------+------------+-----------+
+Fetched 1 row(s) in 34.97s
+
+select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000              |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select count(distinct col2) from sample_data;
++----------------------+
+| count(distinct col2) |
++----------------------+
+| 278                  |
++----------------------+
+Fetched 1 row(s) in 20.09s
+
+select count(distinct col3) from sample_data;
++-----------------------+
+| count(distinct col3)  |
++-----------------------+
+| 46                    |
++-----------------------+
+Fetched 1 row(s) in 19.12s
+
+select count(distinct col4) from sample_data;
++----------------------+
+| count(distinct col4) |
++----------------------+
+| 147135880            |
++----------------------+
+Fetched 1 row(s) in 266.95s
+</code></pre>
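+
+ <p class="p">
+ The output above shows that the accuracy of <code class="ph codeph">NDV()</code> varies by column: the estimate for
+ <code class="ph codeph">col3</code> is exact (46), while the estimate for <code class="ph codeph">col1</code> is about 39% above the precise count
+ (139017 versus 100000). As a minimal sketch, assuming the same <code class="ph codeph">sample_data</code> table, a query along
+ the following lines could check the estimate for one column before relying on it. The column aliases are
+ illustrative only, and because the query still computes <code class="ph codeph">COUNT(DISTINCT)</code>, it is a one-time
+ check rather than a faster alternative.
+ </p>
+
+<pre class="pre codeblock"><code>-- Compare the NDV() estimate against the precise count for a single column.
+-- The / operator returns a floating-point result, so relative_error is a fraction.
+-- A positive value means NDV() overestimated; a negative value means it underestimated.
+select count(distinct col1) as exact_count,
+  ndv(col1) as approx_count,
+  (ndv(col1) - count(distinct col1)) / count(distinct col1) as relative_error
+  from sample_data;
+</code></pre>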
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>