Posted to commits@impala.apache.org by mi...@apache.org on 2018/05/09 21:10:10 UTC
[01/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0
Repository: impala
Updated Branches:
refs/heads/asf-site 52b8807de -> fae51ec24
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_string_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_string_functions.html b/docs/build3x/html/topics/impala_string_functions.html
new file mode 100644
index 0000000..b623a47
--- /dev/null
+++ b/docs/build3x/html/topics/impala_string_functions.html
@@ -0,0 +1,1719 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala String Functions</title></head><body id="string_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala String Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="p">
+ String functions are classified as those primarily accepting or returning <code class="ph codeph">STRING</code>,
+ <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> data types, for example to measure the length of a string
+ or concatenate two strings together.
+ <ul class="ul">
+ <li class="li">
+ All the functions that accept <code class="ph codeph">STRING</code> arguments also accept the <code class="ph codeph">VARCHAR</code>
+ and <code class="ph codeph">CHAR</code> types introduced in Impala 2.0.
+ </li>
+
+ <li class="li">
+ Whenever <code class="ph codeph">VARCHAR</code> or <code class="ph codeph">CHAR</code> values are passed to a function that returns a
+ string value, the return type is normalized to <code class="ph codeph">STRING</code>. For example, a call to
+ <code class="ph codeph">concat()</code> with a mix of <code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>, and
+ <code class="ph codeph">CHAR</code> arguments produces a <code class="ph codeph">STRING</code> result.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The string functions operate mainly on these data types: <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, and <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following string functions:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="string_functions__ascii">
+ <code class="ph codeph">ascii(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the numeric ASCII code of the first character of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
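+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        Only the first character of the argument is examined; any characters
+        after it are ignored:
+      </p>
+<pre class="pre codeblock"><code>
+select ascii('A'), ascii('Apple');
++------------+----------------+
+| ascii('A') | ascii('Apple') |
++------------+----------------+
+| 65         | 65             |
++------------+----------------+
+</code></pre>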
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__base64decode">
+ <code class="ph codeph">base64decode(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Decodes the argument from its Base64 representation back to the
+          original string value. Acts as the inverse of <code class="ph codeph">base64encode()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ For general information about Base64 encoding, see
+ <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+ </p>
+ <p class="p">
+ The functions <code class="ph codeph">base64encode()</code> and
+ <code class="ph codeph">base64decode()</code> are typically used
+        together, to store string data in an Impala table when that data
+        is problematic to store or transmit directly. For example, you could use
+ these functions to store string data that uses an encoding
+ other than UTF-8, or to transform the values in contexts that
+ require ASCII values, such as for partition key columns.
+ Keep in mind that base64-encoded values produce different results
+ for string functions such as <code class="ph codeph">LENGTH()</code>,
+ <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+ those functions are called with the unencoded string values.
+ </p>
+ <p class="p">
+ The set of characters that can be generated as output
+ from <code class="ph codeph">base64encode()</code>, or specified in
+ the argument string to <code class="ph codeph">base64decode()</code>,
+        consists of the ASCII uppercase and lowercase letters (A-Z, a-z),
+ digits (0-9), and the punctuation characters
+ <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+ </p>
+ <p class="p">
+ All return values produced by <code class="ph codeph">base64encode()</code>
+ are a multiple of 4 bytes in length. All argument values
+ supplied to <code class="ph codeph">base64decode()</code> must also be a
+ multiple of 4 bytes in length. If a base64-encoded value
+ would otherwise have a different length, it can be padded
+ with trailing <code class="ph codeph">=</code> characters to reach a length
+ that is a multiple of 4 bytes.
+ </p>
+ <p class="p">
+ If the argument string to <code class="ph codeph">base64decode()</code> does
+ not represent a valid base64-encoded value, subject to the
+ constraints of the Impala implementation such as the allowed
+ character set, the function returns <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">base64encode()</code>
+ and <code class="ph codeph">base64decode()</code> together to store and retrieve
+ string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+ length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16 |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+ These examples demonstrate incorrect encoded values that
+ produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+ which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+ These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+ encoded string, and back again. This technique is applicable if the original
+ source is in an unknown encoding, or if some intermediate processing stage
+      might cause non-ASCII characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+ base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original | encoded |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+ base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded | decoded |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__base64encode">
+ <code class="ph codeph">base64encode(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Encodes the argument string into its Base64 representation,
+          using only characters that are safe to store and transmit as ASCII text.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ For general information about Base64 encoding, see
+ <a class="xref" href="https://en.wikipedia.org/wiki/Base64" target="_blank">Base64 article on Wikipedia</a>.
+ </p>
+ <p class="p">
+ The functions <code class="ph codeph">base64encode()</code> and
+ <code class="ph codeph">base64decode()</code> are typically used
+        together, to store string data in an Impala table when that data
+        is problematic to store or transmit directly. For example, you could use
+ these functions to store string data that uses an encoding
+ other than UTF-8, or to transform the values in contexts that
+ require ASCII values, such as for partition key columns.
+ Keep in mind that base64-encoded values produce different results
+ for string functions such as <code class="ph codeph">LENGTH()</code>,
+ <code class="ph codeph">MAX()</code>, and <code class="ph codeph">MIN()</code> than when
+ those functions are called with the unencoded string values.
+ </p>
+ <p class="p">
+ The set of characters that can be generated as output
+ from <code class="ph codeph">base64encode()</code>, or specified in
+ the argument string to <code class="ph codeph">base64decode()</code>,
+        consists of the ASCII uppercase and lowercase letters (A-Z, a-z),
+ digits (0-9), and the punctuation characters
+ <code class="ph codeph">+</code>, <code class="ph codeph">/</code>, and <code class="ph codeph">=</code>.
+ </p>
+ <p class="p">
+ All return values produced by <code class="ph codeph">base64encode()</code>
+ are a multiple of 4 bytes in length. All argument values
+ supplied to <code class="ph codeph">base64decode()</code> must also be a
+ multiple of 4 bytes in length. If a base64-encoded value
+ would otherwise have a different length, it can be padded
+ with trailing <code class="ph codeph">=</code> characters to reach a length
+ that is a multiple of 4 bytes.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">base64encode()</code>
+ and <code class="ph codeph">base64decode()</code> together to store and retrieve
+ string values:
+<pre class="pre codeblock"><code>
+-- An arbitrary string can be encoded in base 64.
+-- The length of the output is a multiple of 4 bytes,
+-- padded with trailing = characters if necessary.
+select base64encode('hello world') as encoded,
+ length(base64encode('hello world')) as length;
++------------------+--------+
+| encoded | length |
++------------------+--------+
+| aGVsbG8gd29ybGQ= | 16 |
++------------------+--------+
+
+-- Passing an encoded value to base64decode() produces
+-- the original value.
+select base64decode('aGVsbG8gd29ybGQ=') as decoded;
++-------------+
+| decoded |
++-------------+
+| hello world |
++-------------+
+</code></pre>
+
+ These examples demonstrate incorrect encoded values that
+ produce <code class="ph codeph">NULL</code> return values when decoded:
+
+<pre class="pre codeblock"><code>
+-- The input value to base64decode() must be a multiple of 4 bytes.
+-- In this case, leaving off the trailing = padding character
+-- produces a NULL return value.
+select base64decode('aGVsbG8gd29ybGQ') as decoded;
++---------+
+| decoded |
++---------+
+| NULL |
++---------+
+WARNINGS: UDF WARNING: Invalid base64 string; input length is 15,
+ which is not a multiple of 4.
+
+-- The input to base64decode() can only contain certain characters.
+-- The $ character in this case causes a NULL return value.
+select base64decode('abc$');
++----------------------+
+| base64decode('abc$') |
++----------------------+
+| NULL |
++----------------------+
+WARNINGS: UDF WARNING: Could not base64 decode input in space 4; actual output length 0
+</code></pre>
+
+ These examples demonstrate <span class="q">"round-tripping"</span> of an original string to an
+ encoded string, and back again. This technique is applicable if the original
+ source is in an unknown encoding, or if some intermediate processing stage
+      might cause non-ASCII characters to be misrepresented:
+
+<pre class="pre codeblock"><code>
+select 'circumflex accents: â, ê, î, ô, û' as original,
+ base64encode('circumflex accents: â, ê, î, ô, û') as encoded;
++-----------------------------------+------------------------------------------------------+
+| original | encoded |
++-----------------------------------+------------------------------------------------------+
+| circumflex accents: â, ê, î, ô, û | Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= |
++-----------------------------------+------------------------------------------------------+
+
+select base64encode('circumflex accents: â, ê, î, ô, û') as encoded,
+ base64decode(base64encode('circumflex accents: â, ê, î, ô, û')) as decoded;
++------------------------------------------------------+-----------------------------------+
+| encoded | decoded |
++------------------------------------------------------+-----------------------------------+
+| Y2lyY3VtZmxleCBhY2NlbnRzOiDDoiwgw6osIMOuLCDDtCwgw7s= | circumflex accents: â, ê, î, ô, û |
++------------------------------------------------------+-----------------------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__btrim">
+ <code class="ph codeph">btrim(string a)</code>,
+ <code class="ph codeph">btrim(string a, string chars_to_trim)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Removes all instances of one or more characters
+ from the start and end of a <code class="ph codeph">STRING</code> value.
+ By default, removes only spaces.
+ If a non-<code class="ph codeph">NULL</code> optional second argument is specified, the function removes all
+ occurrences of characters in that second argument from the beginning and
+ end of the string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the default <code class="ph codeph">btrim()</code> behavior,
+ and what changes when you specify the optional second argument.
+ All the examples bracket the output value with <code class="ph codeph">[ ]</code>
+ so that you can see any leading or trailing spaces in the <code class="ph codeph">btrim()</code> result.
+        By default, the function removes any number of both leading and trailing spaces.
+        When the second argument is specified, any number of occurrences of any
+        character in the second argument are removed from the start and end of the
+        input string; in this case, spaces are not removed (unless they are part of the
+        second argument), and occurrences of those characters are left in place when
+        they do not come right at the beginning or end of the string.
+ </p>
+<pre class="pre codeblock"><code>-- Remove multiple spaces before and one space after.
+select concat('[',btrim(' hello '),']');
++---------------------------------------+
+| concat('[', btrim(' hello '), ']') |
++---------------------------------------+
+| [hello] |
++---------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end. Leave spaces alone.
+select concat('[',btrim('xy hello zyzzxx','xyz'),']');
++------------------------------------------------------+
+| concat('[', btrim('xy hello zyzzxx', 'xyz'), ']') |
++------------------------------------------------------+
+| [ hello ] |
++------------------------------------------------------+
+
+-- Remove any instances of x or y or z at beginning or end.
+-- Leave x, y, z alone in the middle of the string.
+select concat('[',btrim('xyhelxyzlozyzzxx','xyz'),']');
++----------------------------------------------------+
+| concat('[', btrim('xyhelxyzlozyzzxx', 'xyz'), ']') |
++----------------------------------------------------+
+| [helxyzlo] |
++----------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__char_length">
+ <code class="ph codeph">char_length(string a), <span class="ph" id="string_functions__character_length">character_length(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string, including any
+ trailing spaces that pad a <code class="ph codeph">CHAR</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When applied to a <code class="ph codeph">STRING</code> value, it returns the
+ same result as the <code class="ph codeph">length()</code> function. When applied
+ to a <code class="ph codeph">CHAR</code> value, it might return a larger value
+ than <code class="ph codeph">length()</code> does, to account for trailing spaces
+ in the <code class="ph codeph">CHAR</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following example demonstrates how <code class="ph codeph">length()</code>
+ and <code class="ph codeph">char_length()</code> sometimes produce the same result,
+ and sometimes produce different results depending on the type of the
+ argument and the presence of trailing spaces for <code class="ph codeph">CHAR</code>
+ values. The <code class="ph codeph">S</code> and <code class="ph codeph">C</code> values are
+ displayed with enclosing quotation marks to show any trailing spaces.
+<pre class="pre codeblock" id="string_functions__d6e2627"><code>create table length_demo (s string, c char(5));
+insert into length_demo values
+ ('a',cast('a' as char(5))),
+ ('abc',cast('abc' as char(5))),
+ ('hello',cast('hello' as char(5)));
+
+select concat('"',s,'"') as s, concat('"',c,'"') as c,
+ length(s), length(c),
+ char_length(s), char_length(c)
+from length_demo;
++---------+---------+-----------+-----------+----------------+----------------+
+| s | c | length(s) | length(c) | char_length(s) | char_length(c) |
++---------+---------+-----------+-----------+----------------+----------------+
+| "a" | "a " | 1 | 1 | 1 | 5 |
+| "abc" | "abc " | 3 | 3 | 3 | 5 |
+| "hello" | "hello" | 5 | 5 | 5 | 5 |
++---------+---------+-----------+-----------+----------------+----------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__chr">
+ <code class="ph codeph">chr(int character_code)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a character specified by a decimal code point value.
+ The interpretation and display of the resulting character depends on your system locale.
+ Because consistent processing of Impala string values is only guaranteed
+ for values within the ASCII range, only use this function for values
+ corresponding to ASCII characters.
+ In particular, parameter values greater than 255 return an empty string.
+ <p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Can be used as the inverse of the <code class="ph codeph">ascii()</code> function, which
+ converts a character to its numeric ASCII code.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>SELECT chr(65);
++---------+
+| chr(65) |
++---------+
+| A |
++---------+
+
+SELECT chr(97);
++---------+
+| chr(97) |
++---------+
+| a |
++---------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat">
+ <code class="ph codeph">concat(string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing all the argument values joined together.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
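+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The argument values are joined in the order given, with no separator
+        between them:
+      </p>
+<pre class="pre codeblock"><code>
+select concat('abc', 'def', 'ghi');
++-----------------------------+
+| concat('abc', 'def', 'ghi') |
++-----------------------------+
+| abcdefghi                   |
++-----------------------------+
+</code></pre>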
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__concat_ws">
+ <code class="ph codeph">concat_ws(string sep, string a, string b...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a single string representing the second and following argument values joined
+ together, delimited by a specified separator.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
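+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The first argument supplies the separator that is placed between each
+        pair of the remaining arguments:
+      </p>
+<pre class="pre codeblock"><code>
+select concat_ws('/', '2018', '05', '09');
++------------------------------------+
+| concat_ws('/', '2018', '05', '09') |
++------------------------------------+
+| 2018/05/09                         |
++------------------------------------+
+</code></pre>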
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__find_in_set">
+ <code class="ph codeph">find_in_set(string str, string strList)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a specified string
+ within a comma-separated string. Returns <code class="ph codeph">NULL</code> if either argument is
+ <code class="ph codeph">NULL</code>, 0 if the search string is not found, or 0 if the search string contains a comma.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
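+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        Positions in the comma-separated list are numbered starting from 1.
+        A search string that is not present, or that itself contains a comma,
+        yields 0:
+      </p>
+<pre class="pre codeblock"><code>
+select find_in_set('b', 'a,b,c') as found,
+  find_in_set('z', 'a,b,c') as not_found,
+  find_in_set('a,b', 'a,b,c') as has_comma;
++-------+-----------+-----------+
+| found | not_found | has_comma |
++-------+-----------+-----------+
+| 2     | 0         | 0         |
++-------+-----------+-----------+
+</code></pre>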
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__group_concat">
+ <code class="ph codeph">group_concat(string s [, string sep])</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Returns a single string containing the argument values from each
+          row of the result set, concatenated together. If the optional separator string is specified, the separator is added between each
+          pair of concatenated values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+ <p class="p">
+ By default, returns a single string covering the whole result set. To include other columns or values
+ in the result set, or to produce multiple concatenated strings for subsets of rows, include a
+ <code class="ph codeph">GROUP BY</code> clause in the query.
+ </p>
+ <p class="p">
+ Strictly speaking, <code class="ph codeph">group_concat()</code> is an aggregate function, not a scalar
+ function like the others in this list.
+ For additional details and examples, see <a class="xref" href="impala_group_concat.html#group_concat">GROUP_CONCAT Function</a>.
+ </p>
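+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The following sketch assumes a hypothetical table <code class="ph codeph">t1</code>
+        with a single <code class="ph codeph">STRING</code> column <code class="ph codeph">s</code>.
+        Because <code class="ph codeph">group_concat()</code> is an aggregate function, the
+        whole result set is collapsed into one concatenated value:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical table for illustration:
+create table t1 (s string);
+insert into t1 values ('a'), ('b'), ('c');
+
+-- Produces one row containing a value such as 'a, b, c'.
+-- The order of the concatenated values is not guaranteed.
+select group_concat(s, ', ') from t1;
+</code></pre>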
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__initcap">
+ <code class="ph codeph">initcap(string str)</code>
+ </dt>
+
+ <dd class="dd">
+
+          <strong class="ph b">Purpose:</strong> Returns the input string with the first letter of each word capitalized and all other letters in lowercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
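+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+<pre class="pre codeblock"><code>
+select initcap('hello world');
++------------------------+
+| initcap('hello world') |
++------------------------+
+| Hello World            |
++------------------------+
+</code></pre>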
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__instr">
+ <code class="ph codeph">instr(string str, string substr <span class="ph">[, bigint position [, bigint occurrence ] ]</span>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If the substring is not present in the string, the function returns 0:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'z');
++------------------------------+
+| instr('foo bar bletch', 'z') |
++------------------------------+
+| 0 |
++------------------------------+
+</code></pre>
+
+ <p class="p">
+ The optional third and fourth arguments let you find instances of the substring
+ other than the first instance starting from the left:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The third argument lets you specify a starting point within the string
+ other than 1:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Restricting the search to positions 7..end,
+-- the first occurrence of 'b' is at position 9.
+select instr('foo bar bletch', 'b', 7);
++---------------------------------+
+| instr('foo bar bletch', 'b', 7) |
++---------------------------------+
+| 9 |
++---------------------------------+
+
+-- If there are no more occurrences after the
+-- specified position, the result is 0.
+select instr('foo bar bletch', 'b', 10);
++----------------------------------+
+| instr('foo bar bletch', 'b', 10) |
++----------------------------------+
+| 0 |
++----------------------------------+
+</code></pre>
+
+ <p class="p">
+ If the third argument is negative, the search works right-to-left
+ starting that many characters from the right. The return value still
+ represents the position starting from the left side of the string.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Scanning right to left, the first occurrence of 'o'
+-- is at position 8. (8th character from the left.)
+select instr('hello world','o',-1);
++-------------------------------+
+| instr('hello world', 'o', -1) |
++-------------------------------+
+| 8 |
++-------------------------------+
+
+-- Scanning right to left, starting from the 6th character
+-- from the right, the first occurrence of 'o' is at
+-- position 5 (5th character from the left).
+select instr('hello world','o',-6);
++-------------------------------+
+| instr('hello world', 'o', -6) |
++-------------------------------+
+| 5 |
++-------------------------------+
+
+-- If there are no more occurrences after the
+-- specified position, the result is 0.
+select instr('hello world','o',-10);
++--------------------------------+
+| instr('hello world', 'o', -10) |
++--------------------------------+
+| 0 |
++--------------------------------+
+</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The fourth argument lets you specify an occurrence other than the first:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 2nd occurrence of 'b' is at position 9.
+select instr('foo bar bletch', 'b', 1, 2);
++------------------------------------+
+| instr('foo bar bletch', 'b', 1, 2) |
++------------------------------------+
+| 9 |
++------------------------------------+
+
+-- Negative position argument means scan right-to-left.
+-- This example finds second instance of 'b' from the right.
+select instr('foo bar bletch', 'b', -1, 2);
++-------------------------------------+
+| instr('foo bar bletch', 'b', -1, 2) |
++-------------------------------------+
+| 5 |
++-------------------------------------+
+</code></pre>
+
+ <p class="p">
+ If the fourth argument is greater than the number of matching occurrences,
+ the function returns 0:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- There is no 3rd occurrence within the string.
+select instr('foo bar bletch', 'b', 1, 3);
++------------------------------------+
+| instr('foo bar bletch', 'b', 1, 3) |
++------------------------------------+
+| 0 |
++------------------------------------+
+
+-- There is not even 1 occurrence when scanning
+-- the string starting at position 10.
+select instr('foo bar bletch', 'b', 10, 1);
++-------------------------------------+
+| instr('foo bar bletch', 'b', 10, 1) |
++-------------------------------------+
+| 0 |
++-------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The fourth argument cannot be negative or zero. A non-positive value for
+ this argument causes an error:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'b', 1, 0);
+ERROR: UDF ERROR: Invalid occurrence parameter to instr function: 0
+
+select instr('aaaaaaaaa','aa', 1, -1);
+ERROR: UDF ERROR: Invalid occurrence parameter to instr function: -1
+</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If either of the optional arguments is <code class="ph codeph">NULL</code>,
+ the function also returns <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+select instr('foo bar bletch', 'b', null);
++------------------------------------+
+| instr('foo bar bletch', 'b', null) |
++------------------------------------+
+| NULL |
++------------------------------------+
+
+select instr('foo bar bletch', 'b', 1, null);
++---------------------------------------+
+| instr('foo bar bletch', 'b', 1, null) |
++---------------------------------------+
+| NULL |
++---------------------------------------+
+</code></pre>
+ </li>
+
+ </ul>
+
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="string_functions__left">
+ <code class="ph codeph">left(string a, int num_chars)</code>
+ </dt>
+ <dd class="dd">
+ See the <code class="ph codeph">strleft</code> function.
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__length">
+ <code class="ph codeph">length(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the length in characters of the argument string,
+ ignoring any trailing spaces in <code class="ph codeph">CHAR</code> values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When applied to a <code class="ph codeph">STRING</code> value, it returns the
+ same result as the <code class="ph codeph">char_length()</code> function. When applied
+ to a <code class="ph codeph">CHAR</code> value, it might return a smaller value
+ than <code class="ph codeph">char_length()</code> does, because <code class="ph codeph">length()</code>
+ ignores any trailing spaces in the <code class="ph codeph">CHAR</code>.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the behavior of <code class="ph codeph">length()</code> with <code class="ph codeph">CHAR</code>
+ values containing trailing spaces is not standardized across the industry,
+ when porting code from other database systems, evaluate the behavior of
+ <code class="ph codeph">length()</code> on the source system and switch to
+ <code class="ph codeph">char_length()</code> for Impala if necessary.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following example demonstrates how <code class="ph codeph">length()</code>
+ and <code class="ph codeph">char_length()</code> sometimes produce the same result,
+ and sometimes produce different results depending on the type of the
+ argument and the presence of trailing spaces for <code class="ph codeph">CHAR</code>
+ values. The <code class="ph codeph">S</code> and <code class="ph codeph">C</code> values are
+ displayed with enclosing quotation marks to show any trailing spaces.
+<pre class="pre codeblock" id="string_functions__d6e2627"><code>create table length_demo (s string, c char(5));
+insert into length_demo values
+ ('a',cast('a' as char(5))),
+ ('abc',cast('abc' as char(5))),
+ ('hello',cast('hello' as char(5)));
+
+select concat('"',s,'"') as s, concat('"',c,'"') as c,
+ length(s), length(c),
+ char_length(s), char_length(c)
+from length_demo;
++---------+---------+-----------+-----------+----------------+----------------+
+| s       | c       | length(s) | length(c) | char_length(s) | char_length(c) |
++---------+---------+-----------+-----------+----------------+----------------+
+| "a"     | "a    " | 1         | 1         | 1              | 5              |
+| "abc"   | "abc  " | 3         | 3         | 3              | 5              |
+| "hello" | "hello" | 5         | 5         | 5              | 5              |
++---------+---------+-----------+-----------+----------------+----------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__locate">
+ <code class="ph codeph">locate(string substr, string str[, int pos])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the position (starting from 1) of the first occurrence of a substring within a
+ longer string, optionally after a particular position.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
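+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results; the return value is 0 when
+ the substring is not found:
+ </p>
+<pre class="pre codeblock"><code>
+select locate('b', 'foo bar bletch');
++-------------------------------+
+| locate('b', 'foo bar bletch') |
++-------------------------------+
+| 5                             |
++-------------------------------+
+
+-- With a starting position, the search skips earlier occurrences.
+select locate('b', 'foo bar bletch', 6);
++----------------------------------+
+| locate('b', 'foo bar bletch', 6) |
++----------------------------------+
+| 9                                |
++----------------------------------+
+
+select locate('z', 'foo bar bletch');
++-------------------------------+
+| locate('z', 'foo bar bletch') |
++-------------------------------+
+| 0                             |
++-------------------------------+
+</code></pre>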
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lower">
+ <code class="ph codeph">lower(string a), <span class="ph" id="string_functions__lcase">lcase(string a)</span> </code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-lowercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__lpad">
+ <code class="ph codeph">lpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the left with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
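+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate the typical padding and truncation behavior:
+ </p>
+<pre class="pre codeblock"><code>
+-- The pad string repeats on the left until the result reaches the length.
+select lpad('hello', 8, 'xy');
++------------------------+
+| lpad('hello', 8, 'xy') |
++------------------------+
+| xyxhello               |
++------------------------+
+
+-- A length shorter than the original string truncates on the right.
+select lpad('hello', 3, 'xy');
++------------------------+
+| lpad('hello', 3, 'xy') |
++------------------------+
+| hel                    |
++------------------------+
+</code></pre>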
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__ltrim">
+ <code class="ph codeph">ltrim(string a [, string chars_to_trim])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with all occurrences
+ of characters specified by the second argument removed from
+ the left side. Removes spaces if the second argument is not specified.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
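+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results. The first value is wrapped in
+ brackets with <code class="ph codeph">concat()</code> to make the remaining trailing
+ spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+-- Only leading spaces are removed.
+select concat('[', ltrim('  hello  '), ']');
++--------------------------------------+
+| concat('[', ltrim('  hello  '), ']') |
++--------------------------------------+
+| [hello  ]                            |
++--------------------------------------+
+
+-- With a second argument, the specified leading characters are removed.
+select ltrim('abcdefg', 'abc');
++-------------------------+
+| ltrim('abcdefg', 'abc') |
++-------------------------+
+| defg                    |
++-------------------------+
+</code></pre>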
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__parse_url">
+ <code class="ph codeph">parse_url(string urlString, string partToExtract [, string keyToExtract])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of a URL corresponding to a specified part. The part argument can be
+ <code class="ph codeph">'PROTOCOL'</code>, <code class="ph codeph">'HOST'</code>, <code class="ph codeph">'PATH'</code>, <code class="ph codeph">'REF'</code>,
+ <code class="ph codeph">'AUTHORITY'</code>, <code class="ph codeph">'FILE'</code>, <code class="ph codeph">'USERINFO'</code>, or
+ <code class="ph codeph">'QUERY'</code>. Uppercase is required for these literal values. When requesting the
+ <code class="ph codeph">QUERY</code> portion of the URL, you can optionally specify a key to retrieve just the
+ associated value from the key-value pairs in the query string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> This function is important for the traditional Hadoop use case of interpreting web
+ logs. For example, if the web traffic data features raw URLs not divided into separate table columns,
+ you can count visitors to a particular page by extracting the <code class="ph codeph">'PATH'</code> or
+ <code class="ph codeph">'FILE'</code> field, or analyze search terms by extracting the corresponding key from the
+ <code class="ph codeph">'QUERY'</code> field.
+ </p>
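+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries, using a made-up URL, illustrate extracting different
+ parts:
+ </p>
+<pre class="pre codeblock"><code>
+select parse_url('http://example.com/docs/intro.html?lang=en#top', 'HOST');
++----------------------------------------------------------------------+
+| parse_url('http://example.com/docs/intro.html?lang=en#top', 'HOST')  |
++----------------------------------------------------------------------+
+| example.com                                                          |
++----------------------------------------------------------------------+
+
+-- For the QUERY part, a key retrieves just the associated value.
+select parse_url('http://example.com/docs/intro.html?lang=en#top', 'QUERY', 'lang');
++-------------------------------------------------------------------------------+
+| parse_url('http://example.com/docs/intro.html?lang=en#top', 'QUERY', 'lang')  |
++-------------------------------------------------------------------------------+
+| en                                                                            |
++-------------------------------------------------------------------------------+
+</code></pre>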
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_escape">
+ <code class="ph codeph">regexp_escape(string source)</code>
+ </dt>
+
+ <dd class="dd">
+ <strong class="ph b">Purpose:</strong> The <code class="ph codeph">regexp_escape</code> function returns
+ a string with the special characters of the RE2 library escaped, so that
+ those characters are interpreted literally rather than as regular
+ expression metacharacters. The following special characters are escaped
+ by the function:
+<pre class="pre codeblock"><code>.\+*?[^]$(){}=!<>|:-</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong>
+ <code class="ph codeph">string</code>
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows escaping one of the special characters in RE2.
+ </p>
+<pre class="pre codeblock"><code>
++------------------------------------------------------+
+| regexp_escape('Hello.world') |
++------------------------------------------------------+
+| Hello\.world |
++------------------------------------------------------+
+</code></pre>
+ <p class="p">
+ This example shows escaping all the special characters in RE2.
+ </p>
+<pre class="pre codeblock"><code>
++------------------------------------------------------------+
+| regexp_escape('a.b\\c+d*e?f[g]h$i(j)k{l}m=n!o<p>q|r:s-t') |
++------------------------------------------------------------+
+| a\.b\\c\+d\*e\?f\[g\]h\$i\(j\)k\{l\}m\=n\!o\<p\>q\|r\:s\-t |
++------------------------------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_extract">
+ <code class="ph codeph">regexp_extract(string subject, string pattern, int index)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified <code class="ph codeph">()</code> group from a string based on a regular expression pattern. Group
+ 0 refers to the entire extracted string, while groups 1, 2, and so on refer to the first, second, and so
+ on <code class="ph codeph">(...)</code> portions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how group 0 matches the full pattern string, including the portion outside any
+ <code class="ph codeph">()</code> group:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',0);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 0) |
++------------------------------------------------------+
+| abcdef123ghi456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ This example shows how group 1 matches just the contents inside the first <code class="ph codeph">()</code> group in
+ the pattern string:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('abcdef123ghi456jkl','.*?(\\d+)',1);
++------------------------------------------------------+
+| regexp_extract('abcdef123ghi456jkl', '.*?(\\d+)', 1) |
++------------------------------------------------------+
+| 456 |
++------------------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Unlike in earlier Impala releases, the regular expression library used in Impala 2.0 and later supports
+ the <code class="ph codeph">.*?</code> idiom for non-greedy matches. This example shows how a pattern string starting
+ with <code class="ph codeph">.*?</code> matches the shortest possible portion of the source string, returning the
+ rightmost set of lowercase letters. A pattern string both starting and ending with <code class="ph codeph">.*?</code>
+ finds two potential matches of equal length, and returns the first one found (the leftmost set of
+ lowercase letters).
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+)',1);
++--------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+)', 1) |
++--------------------------------------------------------+
+| def |
++--------------------------------------------------------+
+[localhost:21000] > select regexp_extract('AbcdBCdefGHI','.*?([[:lower:]]+).*?',1);
++-----------------------------------------------------------+
+| regexp_extract('abcdbcdefghi', '.*?([[:lower:]]+).*?', 1) |
++-----------------------------------------------------------+
+| bcd |
++-----------------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_like">
+ <code class="ph codeph">regexp_like(string source, string pattern[, string options])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">true</code> or <code class="ph codeph">false</code> to indicate
+ whether the source string contains a match for the regular expression given by the pattern
+ anywhere inside it. The optional third argument consists of letter flags that change how the match is performed,
+ such as <code class="ph codeph">i</code> for case-insensitive matching.
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+ <p class="p">
+ The flags that you can include in the optional third argument are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">c</code>: Case-sensitive matching (the default).
+ </li>
+ <li class="li">
+ <code class="ph codeph">i</code>: Case-insensitive matching. If multiple instances of <code class="ph codeph">c</code> and <code class="ph codeph">i</code>
+ are included in the third argument, the last such option takes precedence.
+ </li>
+ <li class="li">
+ <code class="ph codeph">m</code>: Multi-line matching. The <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+ operators match the start or end of any line within the source string, not the
+ start and end of the entire string.
+ </li>
+ <li class="li">
+ <code class="ph codeph">n</code>: Newline matching. The <code class="ph codeph">.</code> operator can match the
+ newline character. A repetition operator such as <code class="ph codeph">.*</code> can
+ match a portion of the source string that spans multiple lines.
+ </li>
+ </ul>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ This example shows how <code class="ph codeph">regexp_like()</code> can test for the existence
+ of various kinds of regular expression patterns within a source string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Matches because the 'f' appears somewhere in 'foo'.
+select regexp_like('foo','f');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| true |
++-------------------------+
+
+-- Does not match because the comparison is case-sensitive by default.
+select regexp_like('foo','F');
++-------------------------+
+| regexp_like('foo', 'f') |
++-------------------------+
+| false |
++-------------------------+
+
+-- The 3rd argument can change the matching logic, such as 'i' meaning case-insensitive.
+select regexp_like('foo','F','i');
++------------------------------+
+| regexp_like('foo', 'f', 'i') |
++------------------------------+
+| true |
++------------------------------+
+
+-- The familiar regular expression notations work, such as ^ and $ anchors...
+select regexp_like('foo','f$');
++--------------------------+
+| regexp_like('foo', 'f$') |
++--------------------------+
+| false |
++--------------------------+
+
+select regexp_like('foo','o$');
++--------------------------+
+| regexp_like('foo', 'o$') |
++--------------------------+
+| true |
++--------------------------+
+
+-- ...and repetition operators such as * and +
+select regexp_like('foooooobar','fo+b');
++-----------------------------------+
+| regexp_like('foooooobar', 'fo+b') |
++-----------------------------------+
+| true |
++-----------------------------------+
+
+select regexp_like('foooooobar','fx*y*o*b');
++---------------------------------------+
+| regexp_like('foooooobar', 'fx*y*o*b') |
++---------------------------------------+
+| true |
++---------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__regexp_replace">
+ <code class="ph codeph">regexp_replace(string initial, string pattern, string replacement)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the initial argument with the regular expression pattern replaced by the final
+ argument string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+ <p class="p">
+ Because the <span class="keyword cmdname">impala-shell</span> interpreter uses the <code class="ph codeph">\</code> character for escaping,
+ use <code class="ph codeph">\\</code> to represent the regular expression escape character in any regular expressions
+ that you submit through <span class="keyword cmdname">impala-shell</span> . You might prefer to use the equivalent character
+ class names, such as <code class="ph codeph">[[:digit:]]</code> instead of <code class="ph codeph">\d</code> which you would have to
+ escape as <code class="ph codeph">\\d</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how you can replace parts of a string matching a pattern with replacement text,
+ which can include backreferences to any <code class="ph codeph">()</code> groups in the pattern string. The
+ backreference numbers start at 1, and any <code class="ph codeph">\</code> characters must be escaped as
+ <code class="ph codeph">\\</code>.
+ </p>
+ <p class="p">
+ Replace a character pattern with new text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','b+','xyz');
++------------------------------------------+
+| regexp_replace('aaabbbaaa', 'b+', 'xyz') |
++------------------------------------------+
+| aaaxyzaaa |
++------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Replace a character pattern with substitution text that includes the original matching text:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('aaabbbaaa','(b+)','<\\1>');
++----------------------------------------------+
+| regexp_replace('aaabbbaaa', '(b+)', '<\\1>') |
++----------------------------------------------+
+| aaa<bbb>aaa |
++----------------------------------------------+
+Returned 1 row(s) in 0.11s</code></pre>
+ <p class="p">
+ Remove all characters that are not digits:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select regexp_replace('123-456-789','[^[:digit:]]','');
++---------------------------------------------------+
+| regexp_replace('123-456-789', '[^[:digit:]]', '') |
++---------------------------------------------------+
+| 123456789 |
++---------------------------------------------------+
+Returned 1 row(s) in 0.12s</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__repeat">
+ <code class="ph codeph">repeat(string str, int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string repeated a specified number of times.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
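+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following query illustrates a typical result:
+ </p>
+<pre class="pre codeblock"><code>
+select repeat('abc', 3);
++------------------+
+| repeat('abc', 3) |
++------------------+
+| abcabcabc        |
++------------------+
+</code></pre>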
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__replace">
+ <code class="ph codeph">replace(string initial, string target, string replacement)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the initial argument with all occurrences of the target string
+ replaced by the replacement string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function does not use any regular expression patterns, it is typically faster
+ than <code class="ph codeph">regexp_replace()</code> for simple string substitutions.
+ </p>
+ <p class="p">
+ If any argument is <code class="ph codeph">NULL</code>, the return value is <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ Matching is case-sensitive.
+ </p>
+ <p class="p">
+ If the replacement string contains another instance of the target
+ string, the expansion is only performed once, instead of
+ applying again to the newly constructed string.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- Replace one string with another.
+select replace('hello world','world','earth');
++------------------------------------------+
+| replace('hello world', 'world', 'earth') |
++------------------------------------------+
+| hello earth |
++------------------------------------------+
+
+-- All occurrences of the target string are replaced.
+select replace('hello world','o','0');
++----------------------------------+
+| replace('hello world', 'o', '0') |
++----------------------------------+
+| hell0 w0rld |
++----------------------------------+
+
+-- If no match is found, the original string is returned unchanged.
+select replace('hello world','xyz','abc');
++--------------------------------------+
+| replace('hello world', 'xyz', 'abc') |
++--------------------------------------+
+| hello world |
++--------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__reverse">
+ <code class="ph codeph">reverse(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with characters in reversed order.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
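+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following query illustrates a typical result:
+ </p>
+<pre class="pre codeblock"><code>
+select reverse('hello');
++------------------+
+| reverse('hello') |
++------------------+
+| olleh            |
++------------------+
+</code></pre>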
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="string_functions__right">
+ <code class="ph codeph">right(string a, int num_chars)</code>
+ </dt>
+ <dd class="dd">
+ See the <code class="ph codeph">strright</code> function.
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rpad">
+ <code class="ph codeph">rpad(string str, int len, string pad)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of a specified length, based on the first argument string. If the
+ specified string is too short, it is padded on the right with a repeating sequence of the characters from
+ the pad string. If the specified string is too long, it is truncated on the right.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
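+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate the typical padding and truncation behavior:
+ </p>
+<pre class="pre codeblock"><code>
+-- The pad string repeats on the right until the result reaches the length.
+select rpad('hello', 8, 'xy');
++------------------------+
+| rpad('hello', 8, 'xy') |
++------------------------+
+| helloxyx               |
++------------------------+
+
+-- A length shorter than the original string truncates on the right.
+select rpad('hello', 3, 'xy');
++------------------------+
+| rpad('hello', 3, 'xy') |
++------------------------+
+| hel                    |
++------------------------+
+</code></pre>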
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__rtrim">
+ <code class="ph codeph">rtrim(string a [, string chars_to_trim])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string with all occurrences
+ of characters specified by the second argument removed from
+ the right side. Removes spaces if the second argument is not specified.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
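+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following queries illustrate typical results. The first value is wrapped in
+ brackets with <code class="ph codeph">concat()</code> to make the remaining leading
+ spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+-- Only trailing spaces are removed.
+select concat('[', rtrim('  hello  '), ']');
++--------------------------------------+
+| concat('[', rtrim('  hello  '), ']') |
++--------------------------------------+
+| [  hello]                            |
++--------------------------------------+
+
+-- With a second argument, the specified trailing characters are removed.
+select rtrim('abcdefg', 'efg');
++-------------------------+
+| rtrim('abcdefg', 'efg') |
++-------------------------+
+| abcd                    |
++-------------------------+
+</code></pre>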
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__space">
+ <code class="ph codeph">space(int n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a concatenated string of the specified number of spaces. Shorthand for
+ <code class="ph codeph">repeat(' ',<var class="keyword varname">n</var>)</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
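+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ In the following query, the result is wrapped in brackets with
+ <code class="ph codeph">concat()</code> to make the spaces visible:
+ </p>
+<pre class="pre codeblock"><code>
+select concat('[', space(3), ']');
++----------------------------+
+| concat('[', space(3), ']') |
++----------------------------+
+| [   ]                      |
++----------------------------+
+</code></pre>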
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__split_part">
+ <code class="ph codeph">split_part(string source, string delimiter, bigint n)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the nth field within a delimited string. The
+ fields are numbered starting from 1. The delimiter can consist of
+ multiple characters, not just a single character. All matching of the
+ delimiter is done exactly, not using any regular expression patterns.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples show how to retrieve the nth field from a delimited
+ string:
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',1);
++-----------------------------+
+| split_part('x,y,z', ',', 1) |
++-----------------------------+
+| x |
++-----------------------------+
+
+select split_part('x,y,z',',',2);
++-----------------------------+
+| split_part('x,y,z', ',', 2) |
++-----------------------------+
+| y |
++-----------------------------+
+
+select split_part('x,y,z',',',3);
++-----------------------------+
+| split_part('x,y,z', ',', 3) |
++-----------------------------+
+| z |
++-----------------------------+
+
+</code></pre>
+ <p class="p">
+ These examples show what happens for out-of-range field positions.
+ Specifying a value less than 1 produces an error. Specifying a value
+ greater than the number of fields returns a zero-length string
+ (which is not the same as <code class="ph codeph">NULL</code>).
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('x,y,z',',',0);
+ERROR: Invalid field position: 0
+
+with t1 as (select split_part('x,y,z',',',4) nonexistent_field)
+  select
+      nonexistent_field
+    , concat('[',nonexistent_field,']')
+    , length(nonexistent_field)
+  from t1;
++-------------------+-------------------------------------+---------------------------+
+| nonexistent_field | concat('[', nonexistent_field, ']') | length(nonexistent_field) |
++-------------------+-------------------------------------+---------------------------+
+|                   | []                                  | 0                         |
++-------------------+-------------------------------------+---------------------------+
+
+</code></pre>
+ <p class="p">
+ These examples show how the delimiter can be a multi-character value:
+ </p>
+<pre class="pre codeblock"><code>
+select split_part('one***two***three','***',2);
++-------------------------------------------+
+| split_part('one***two***three', '***', 2) |
++-------------------------------------------+
+| two |
++-------------------------------------------+
+
+select split_part('one\|/two\|/three','\|/',3);
++-------------------------------------------+
+| split_part('one\|/two\|/three', '\|/', 3) |
++-------------------------------------------+
+| three |
++-------------------------------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strleft">
+ <code class="ph codeph">strleft(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the leftmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+
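+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'hello', the leftmost 5 characters.
+select strleft('hello world', 5);
+</code></pre>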
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__strright">
+ <code class="ph codeph">strright(string a, int num_chars)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the rightmost characters of the string. Shorthand for a call to
+ <code class="ph codeph">substr()</code> with 2 arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
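+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'world', the rightmost 5 characters.
+select strright('hello world', 5);
+</code></pre>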
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__substr">
+ <code class="ph codeph">substr(string a, int start [, int len]), <span class="ph" id="string_functions__substring">substring(string a, int start [, int
+ len])</span></code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the portion of the string starting at a specified point, optionally with a
+ specified maximum length. The characters in the string are indexed starting at 1.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
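+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'world'; the substring starts at position 7 (positions are 1-based).
+select substr('hello world', 7);
+
+-- Returns 'hello'; at most 5 characters starting at position 1.
+select substr('hello world', 1, 5);
+</code></pre>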
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__translate">
+ <code class="ph codeph">translate(string input, string from, string to)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with a set of characters replaced by another set of characters.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
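+ <p class="p">
+ For example, each character in the <code class="ph codeph">from</code>
+ string is mapped to the character at the same position in the
+ <code class="ph codeph">to</code> string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'he001'; 'l' maps to '0' and 'o' maps to '1'.
+select translate('hello', 'lo', '01');
+</code></pre>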
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__trim">
+ <code class="ph codeph">trim(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the input string with both leading and trailing spaces removed. The same as
+ passing the string through both <code class="ph codeph">ltrim()</code> and <code class="ph codeph">rtrim()</code>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Often used during data cleansing operations during the ETL cycle, if input values might still have surrounding spaces.
+ For a more general-purpose function that can remove other leading and trailing characters besides spaces, see <code class="ph codeph">btrim()</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
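+ <p class="p">
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns '[hello]'; the surrounding spaces are removed before concatenation.
+select concat('[', trim('  hello  '), ']');
+</code></pre>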
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="string_functions__upper">
+ <code class="ph codeph">upper(string a), <span class="ph" id="string_functions__ucase">ucase(string a)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the argument string converted to all-uppercase.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can simplify queries that
+ use many <code class="ph codeph">UPPER()</code> and <code class="ph codeph">LOWER()</code> calls
+ to do case-insensitive comparisons, by using the <code class="ph codeph">ILIKE</code>
+ or <code class="ph codeph">IREGEXP</code> operators instead. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#ilike">ILIKE Operator</a> and
+ <a class="xref" href="../shared/../topics/impala_operators.html#iregexp">IREGEXP Operator</a> for details.
+ </p>
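+ <p class="p">
+ For example (the table <code class="ph codeph">t1</code> and column
+ <code class="ph codeph">s</code> are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+-- Returns 'IMPALA'.
+select upper('impala');
+
+-- These two case-insensitive comparisons are equivalent;
+-- the second avoids the upper() call.
+select count(*) from t1 where upper(s) = 'IMPALA';
+select count(*) from t1 where s ilike 'impala';
+</code></pre>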
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_options.html b/docs/build3x/html/topics/impala_shell_options.html
new file mode 100644
index 0000000..4d61196
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_options.html
@@ -0,0 +1,618 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Configuration Options</title></head><body id="shell_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">impala-shell Configuration Options</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify the following options when starting the <code class="ph codeph">impala-shell</code> command to change how
+ shell commands are executed. The table shows the format to use when specifying each option on the command
+ line, or through the <span class="ph filepath">$HOME/.impalarc</span> configuration file.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ These options are different from the configuration options for the <code class="ph codeph">impalad</code> daemon itself.
+ For the <code class="ph codeph">impalad</code> options, see <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_options__shell_option_summary">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Summary of impala-shell Configuration Options</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following table shows the names and allowed arguments for the <span class="keyword cmdname">impala-shell</span>
+ configuration options. You can specify options on the command line, or in a configuration file as described
+ in <a class="xref" href="impala_shell_options.html#shell_config_file">impala-shell Configuration File</a>.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:25%"><col style="width:25%"><col style="width:50%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__1">
+ Command-Line Option
+ </th>
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__2">
+ Configuration File Setting
+ </th>
+ <th class="entry nocellnorowborder" id="shell_option_summary__entry__3">
+ Explanation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -B or --delimited
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ write_delimited=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Causes all query results to be printed in plain format as a delimited text file. Useful for
+ producing data files to be used with other Hadoop components. Also useful for avoiding the
+ performance overhead of pretty-printing all output, especially when running benchmark tests using
+ queries returning large result sets. Specify the delimiter character with the
+ <code class="ph codeph">--output_delimiter</code> option. Store all query results in a file rather than
+ printing to the screen with the <code class="ph codeph">-B</code> option. Added in Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -b or
+ </p>
+ <p class="p">
+ --kerberos_host_fqdn
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ kerberos_host_fqdn=
+ </p>
+ <p class="p">
+ <var class="keyword varname">load-balancer-hostname</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ If set, the setting overrides the expected hostname of the
+ Impala daemon's Kerberos service principal.
+ <span class="keyword cmdname">impala-shell</span> will check that the server's
+ principal matches this hostname. This may be used when
+ <code class="ph codeph">impalad</code> is configured to be accessed via a
+ load-balancer, but it is desired for impala-shell to talk to a
+ specific <code class="ph codeph">impalad</code> directly.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --print_header
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ print_header=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Prints the column names as a header row when query results are
+ displayed in delimited mode with the <code class="ph codeph">-B</code> option.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -o <var class="keyword varname">filename</var> or --output_file <var class="keyword varname">filename</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ output_file=<var class="keyword varname">filename</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Stores all query results in the specified file. Typically used to store the results of a single
+ query issued from the command line with the <code class="ph codeph">-q</code> option. Also works for
+ interactive sessions; you see the messages such as number of rows fetched, but not the actual
+ result set. To suppress these incidental messages when combining the <code class="ph codeph">-q</code> and
+ <code class="ph codeph">-o</code> options, redirect <code class="ph codeph">stderr</code> to <code class="ph codeph">/dev/null</code>.
+ Added in Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --output_delimiter=<var class="keyword varname">character</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ output_delimiter=<var class="keyword varname">character</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Specifies the character to use as a delimiter between fields when query results are printed in
+ plain format by the <code class="ph codeph">-B</code> option. Defaults to tab (<code class="ph codeph">'\t'</code>). If an
+ output value contains the delimiter character, that field is quoted, escaped by doubling quotation marks, or both. Added in
+ Impala 1.0.1.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -p or --show_profiles
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ show_profiles=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays the query execution plan (same output as the <code class="ph codeph">EXPLAIN</code> statement) and a
+ more detailed low-level breakdown of execution steps, for every query executed by the shell.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -h or --help
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ N/A
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays help information.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ N/A
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ history_max=1000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Sets the maximum number of queries to store in the history file.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -i <var class="keyword varname">hostname</var> or
+ --impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ impalad=<var class="keyword varname">hostname</var>[:<var class="keyword varname">portnum</var>]
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Connects to the <code class="ph codeph">impalad</code> daemon on the specified host. The default port of 21000
+ is assumed unless you provide another value. You can connect to any host in your cluster that is
+ running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+ was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, provide that
+ alternative port.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -q <var class="keyword varname">query</var> or --query=<var class="keyword varname">query</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ query=<var class="keyword varname">query</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Passes a query or other <span class="keyword cmdname">impala-shell</span> command from the command line. The
+ <span class="keyword cmdname">impala-shell</span> interpreter immediately exits after processing the statement. It
+ is limited to a single statement, which could be a <code class="ph codeph">SELECT</code>, <code class="ph codeph">CREATE
+ TABLE</code>, <code class="ph codeph">SHOW TABLES</code>, or any other statement recognized in
+ <code class="ph codeph">impala-shell</code>. Because you cannot pass a <code class="ph codeph">USE</code> statement and
+ another query, fully qualify the names for any tables outside the <code class="ph codeph">default</code>
+ database. (Or use the <code class="ph codeph">-f</code> option to pass a file with a <code class="ph codeph">USE</code>
+ statement followed by other queries.)
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -f <var class="keyword varname">query_file</var> or --query_file=<var class="keyword varname">query_file</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ query_file=<var class="keyword varname">path_to_query_file</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Passes a SQL query from a file. Multiple statements must be semicolon (;) delimited.
+ <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify a filename of <code class="ph codeph">-</code>
+ to represent standard input. This feature makes it convenient to use <span class="keyword cmdname">impala-shell</span>
+ as part of a Unix pipeline where SQL statements are generated dynamically by other tools.</span>
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --query_option="<var class="keyword varname">option</var>=<var class="keyword varname">value</var>"
+ or -Q "<var class="keyword varname">option</var>=<var class="keyword varname">value</var>"
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ Header line <code class="ph codeph">[impala.query_options]</code>,
+ followed on subsequent lines by <var class="keyword varname">option</var>=<var class="keyword varname">value</var>, one option per line.
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Sets default query options for an invocation of the <span class="keyword cmdname">impala-shell</span> command.
+ To set multiple query options at once, use more than one instance of this command-line option.
+ The query option names are not case-sensitive.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -k or --kerberos
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ use_kerberos=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Kerberos authentication is used when the shell connects to <code class="ph codeph">impalad</code>. If Kerberos
+ is not enabled on the instance of <code class="ph codeph">impalad</code> to which you are connecting, errors
+ are displayed.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -s <var class="keyword varname">kerberos_service_name</var> or --kerberos_service_name=<var class="keyword varname">name</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ kerberos_service_name=<var class="keyword varname">name</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Instructs <code class="ph codeph">impala-shell</code> to authenticate to a particular <code class="ph codeph">impalad</code>
+ service principal. If a <var class="keyword varname">kerberos_service_name</var> is not specified,
+ <code class="ph codeph">impala</code> is used by default. If this option is used in conjunction with a
+ connection in which Kerberos is not supported, errors are returned.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -V or --verbose
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ verbose=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Enables verbose output.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ --quiet
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ verbose=false
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Disables verbose output.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -v or --version
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ version=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Displays version information.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -c
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ ignore_query_failure=true
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Continues on query failure.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ <p class="p">
+ -d <var class="keyword varname">default_db</var> or --database=<var class="keyword varname">default_db</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ <p class="p">
+ default_db=<var class="keyword varname">default_db</var>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ <p class="p">
+ Specifies the database to be used on startup. Same as running the
+ <code class="ph codeph"><a class="xref" href="impala_use.html#use">USE</a></code> statement after connecting. If not
+ specified, a database named <code class="ph codeph">DEFAULT</code> is used.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -ssl
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ ssl=true
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --ca_cert=<var class="keyword varname">path_to_certificate</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ ca_cert=<var class="keyword varname">path_to_certificate</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ The local pathname pointing to the third-party CA certificate, or to a copy of the server
+ certificate for self-signed server certificates. If <code class="ph codeph">--ca_cert</code> is not set,
+ <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server certificate. This is
+ useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+ certificate is not available (such as when debugging customer installations).
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -l
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ use_ldap=true
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Enables LDAP authentication.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ -u
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ user=<var class="keyword varname">user_name</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Supplies the username, when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+ (Specify the short username, not the full LDAP distinguished name.) The shell then prompts
+ interactively for the password.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --ldap_password_cmd=<var class="keyword varname">command</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ N/A
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Specifies a command to run to retrieve the LDAP password,
+ when LDAP authentication is enabled by the <code class="ph codeph">-l</code> option.
+ If the command includes space-separated arguments, enclose the command and
+ its arguments in quotation marks.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">
+ --config_file=<var class="keyword varname">path_to_config_file</var>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">
+ N/A
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Specifies the path of the file containing <span class="keyword cmdname">impala-shell</span> configuration settings.
+ The default is <span class="ph filepath">$HOME/.impalarc</span>. This setting can only be specified on the
+ command line.
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_progress</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a progress bar showing roughly the percentage complete for each query.
+ The information is updated interactively as the query progresses.
+ See <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--live_summary</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">Prints a detailed report, similar to the <code class="ph codeph">SUMMARY</code> command, showing progress details for each phase of query execution.
+ The information is updated interactively as the query progresses.
+ See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__1 ">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__2 ">N/A</td>
+ <td class="entry nocellnorowborder" headers="shell_option_summary__entry__3 ">
+ Defines a substitution variable that can be used within the <span class="keyword cmdname">impala-shell</span> session.
+ The variable can be substituted into statements processed by the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options,
+ or in an interactive shell session.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ </td>
+ </tr>
+ </tbody></table>
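+ <p class="p">
+ For example, the following commands (illustrative; the hostname, table
+ names, and file path are placeholders) combine several of these options.
+ The first runs a single query in delimited mode and saves the results to
+ a CSV file; the second supplies a substitution variable that is expanded
+ inside the query text:
+ </p>
+<pre class="pre codeblock"><code>
+impala-shell -i impala-host.example.com -B --output_delimiter=',' \
+  -q 'select id, name from default.sample_table' -o /tmp/results.csv
+
+impala-shell -i impala-host.example.com --var=tbl=web_logs \
+  -q 'select count(*) from ${var:tbl}'
+</code></pre>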
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="shell_options__shell_config_file">
+
+ <h2 class="title topictitle2" id="ariaid-title3">impala-shell Configuration File</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can define a set of default options for your <span class="keyword cmdname">impala-shell</span> environment, stored in the
+ file <span class="ph filepath">$HOME/.impalarc</span>. This file consists of key-value pairs, one option per line.
+ Everything after a <code class="ph codeph">#</code> character on a line is treated as a comment and ignored.
+ </p>
+
+ <p class="p">
+ The configuration file must contain a header label <code class="ph codeph">[impala]</code>, followed by the options
+ specific to <span class="keyword cmdname">impala-shell</span>. (This standard convention for configuration files lets you
+ use a single file to hold configuration options for multiple applications.)
+ </p>
+
+ <p class="p">
+ To specify a different filename or path for the configuration file, specify the argument
+ <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code> on the
+ <span class="keyword cmdname">impala-shell</span> command line.
+ </p>
+
+ <p class="p">
+ The names of the options in the configuration file are similar (although not necessarily identical) to the
+ long-form command-line arguments to the <span class="keyword cmdname">impala-shell</span> command. For the names to use, see
+ <a class="xref" href="impala_shell_options.html#shell_option_summary">Summary of impala-shell Configuration Options</a>.
+ </p>
+
+ <p class="p">
+ Any options you specify on the <span class="keyword cmdname">impala-shell</span> command line override any corresponding
+ options within the configuration file.
+ </p>
+
+ <p class="p">
+ The following example shows a configuration file that you might use during benchmarking tests. It sets
+ verbose mode, so that the output from each SQL query is followed by timing information.
+ <span class="keyword cmdname">impala-shell</span> starts inside the database containing the tables with the benchmark data,
+ avoiding the need to issue a <code class="ph codeph">USE</code> statement or use fully qualified table names.
+ </p>
+
+ <p class="p">
+ In this example, the query output is formatted as delimited text rather than enclosed in ASCII art boxes,
+ and is stored in a file rather than printed to the screen. Those options are appropriate for benchmark
+ situations, so that the overhead of <span class="keyword cmdname">impala-shell</span> formatting and printing the result set
+ does not factor into the timing measurements. It also enables the <code class="ph codeph">show_profiles</code> option.
+ That option prints detailed performance information after each query, which might be valuable in
+ understanding the performance of benchmark queries.
+ </p>
+
+<pre class="pre codeblock"><code>[impala]
+verbose=true
+default_db=tpc_benchmarking
+write_delimited=true
+output_delimiter=,
+output_file=/home/tester1/benchmark_results.csv
+show_profiles=true
+</code></pre>
+
+ <p class="p">
+ The following example shows a configuration file that connects to a specific remote Impala node, runs a
+ single query within a particular database, then exits. Any query options predefined under the
+ <code class="ph codeph">[impala.query_options]</code> section in the configuration file take effect during the session.
+ </p>
+
+ <p class="p">
+ You would typically use this kind of single-purpose
+ configuration setting with the <span class="keyword cmdname">impala-shell</span> command-line option
+ <code class="ph codeph">--config_file=<var class="keyword varname">path_to_config_file</var></code>, to easily select between many
+ predefined queries that could be run against different databases, hosts, or even different clusters. To run
+ a sequence of statements instead of a single query, specify the configuration option
+ <code class="ph codeph">query_file=<var class="keyword varname">path_to_query_file</var></code> instead.
+ </p>
+
+<pre class="pre codeblock"><code>[impala]
+impalad=impala-test-node1.example.com
+default_db=site_stats
+# Issue a predefined query and immediately exit.
+query=select count(*) from web_traffic where event_date = trunc(now(),'dd')
+
+<span class="ph">[impala.query_options]
+mem_limit=32g</span>
+</code></pre>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_running_commands.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_running_commands.html b/docs/build3x/html/topics/impala_shell_running_commands.html
new file mode 100644
index 0000000..98c4d24
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_running_commands.html
@@ -0,0 +1,322 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_running_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Running Commands and SQL Statements in impala-shell</title></head><body id="shell_running_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Running Commands and SQL Statements in impala-shell</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For information on available commands, see
+ <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>. You can see the full set of available
+ commands by pressing TAB twice, for example:
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] >
+connect describe explain help history insert quit refresh select set shell show use version
+[impalad-host:21000] ></code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Commands must be terminated by a semicolon. A command can span multiple lines.
+ </div>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select *
+ > from t1
+ > limit 5;
++---------+-----------+
+| s1 | s2 |
++---------+-----------+
+| hello | world |
+| goodbye | cleveland |
++---------+-----------+
+</code></pre>
+
+ <p class="p">
+ A comment is considered part of the statement it precedes, so when you enter a <code class="ph codeph">--</code> or
+ <code class="ph codeph">/* */</code> comment, you get a continuation prompt until you finish entering a statement ending
+ with a semicolon:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > -- This is a test comment
+ > show tables like 't*';
++--------+
+| name |
++--------+
+| t1 |
+| t2 |
+| tab1 |
+| tab2 |
+| tab3 |
+| text_t |
++--------+
+</code></pre>
+
+ <p class="p">
+ Use the up-arrow and down-arrow keys to cycle through and edit previous commands.
+ <span class="keyword cmdname">impala-shell</span> uses the <code class="ph codeph">readline</code> library and so supports a standard set of
+ keyboard shortcuts for editing and cursor movement, such as <code class="ph codeph">Ctrl-A</code> for beginning of line and
+ <code class="ph codeph">Ctrl-E</code> for end of line.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can define substitution variables to be used within SQL statements
+ processed by <span class="keyword cmdname">impala-shell</span>. On the command line, you specify the option
+ <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within an interactive session or a script file processed by the <code class="ph codeph">-f</code> option, you specify
+ a <code class="ph codeph">SET</code> command using the notation <code class="ph codeph">SET VAR:<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ </p>
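The `${var:variable_name}` notation above can be illustrated with a small sketch. This is not impala-shell's actual implementation; it only shows how variables defined through `--var` or `SET VAR:` could be expanded inside a SQL string before execution. The `expand_vars` helper is hypothetical.

```python
import re

def expand_vars(sql, variables):
    """Replace each ${var:name} token in `sql` with its value from
    `variables`. An undefined variable raises KeyError, analogous to
    the 'Unknown variable' error impala-shell reports."""
    def replace(match):
        return variables[match.group(1)]
    return re.sub(r"\$\{var:(\w+)\}", replace, sql)

sql = 'select x from ${var:tname} where x like "%${var:answer}%"'
print(expand_vars(sql, {"tname": "table1", "answer": "b"}))
# select x from table1 where x like "%b%"
```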
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because this feature is part of <span class="keyword cmdname">impala-shell</span> rather than the <span class="keyword cmdname">impalad</span>
+ backend, make sure the client system you are connecting from has the most recent <span class="keyword cmdname">impala-shell</span>.
+ You can use this feature with a new <span class="keyword cmdname">impala-shell</span> connecting to an older <span class="keyword cmdname">impalad</span>,
+ but not the reverse.
+ </div>
+
+ <p class="p">
+ For example, here are some <span class="keyword cmdname">impala-shell</span> commands that define substitution variables and then
+ use them in SQL statements executed through the <code class="ph codeph">-q</code> and <code class="ph codeph">-f</code> options.
+ Notice how the <code class="ph codeph">-q</code> argument strings are single-quoted to prevent shell expansion of the
+ <code class="ph codeph">${var:value}</code> notation, and any string literals within the queries are enclosed by double quotation marks.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --var=tname=table1 --var=colname=x --var=coltype=string -q 'create table ${var:tname} (${var:colname} ${var:coltype}) stored as parquet'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: create table table1 (x string) stored as parquet
+
+$ NEW_STRING="hello world"
+$ impala-shell --var=tname=table1 --var=insert_val="$NEW_STRING" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: insert into table1 values ("hello world")
+Inserted 1 row(s) in 1.40s
+
+$ for VAL in foo bar bletch
+do
+ impala-shell --var=tname=table1 --var=insert_val="$VAL" -q 'insert into ${var:tname} values ("${var:insert_val}")'
+done
+...
+Query: insert into table1 values ("foo")
+Inserted 1 row(s) in 0.22s
+Query: insert into table1 values ("bar")
+Inserted 1 row(s) in 0.11s
+Query: insert into table1 values ("bletch")
+Inserted 1 row(s) in 0.21s
+
+$ echo "Search for what substring?" ; read answer
+Search for what substring?
+b
+$ impala-shell --var=tname=table1 -q 'select x from ${var:tname} where x like "%${var:answer}%"'
+Starting Impala Shell without Kerberos authentication
+Connected to <var class="keyword varname">hostname</var>
+Server version: <var class="keyword varname">impalad_version</var>
+Query: select x from table1 where x like "%b%"
++--------+
+| x |
++--------+
+| bletch |
+| bar |
++--------+
+Fetched 2 row(s) in 0.83s
+</code></pre>
+
+ <p class="p">
+ The following example passes a substitution variable in through the <code class="ph codeph">--var</code> option,
+ then references it in statements issued interactively. The variable is then
+ cleared with the <code class="ph codeph">UNSET</code> command and defined again with the
+ <code class="ph codeph">SET</code> command.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ impala-shell --quiet --var=tname=table1
+Starting Impala Shell without Kerberos authentication
+***********************************************************************************
+<var class="keyword varname">banner_message</var>
+***********************************************************************************
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+[<var class="keyword varname">hostname</var>:21000] > unset var:tname;
+Unsetting variable TNAME
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
+Error: Unknown variable TNAME
+[<var class="keyword varname">hostname</var>:21000] > set var:tname=table1;
+[<var class="keyword varname">hostname</var>:21000] > select count(*) from ${var:tname};
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how the <code class="ph codeph">SOURCE</code> command can execute
+ a series of statements from a file:
+ </p>
+
+<pre class="pre codeblock"><code>
+$ cat commands.sql
+show databases;
+show tables in default;
+show functions in _impala_builtins like '*minute*';
+
+$ impala-shell -i localhost
+...
+[localhost:21000] > source commands.sql;
+Query: show databases
++------------------+----------------------------------------------+
+| name | comment |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default | Default Hive database |
++------------------+----------------------------------------------+
+Fetched 2 row(s) in 0.06s
+Query: show tables in default
++-----------+
+| name |
++-----------+
+| customers |
+| sample_07 |
+| sample_08 |
+| web_logs |
++-----------+
+Fetched 4 row(s) in 0.02s
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT | minute(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.03s
+</code></pre>
+
+ <p class="p">
+ The following example shows how a file that is run by the <code class="ph codeph">SOURCE</code> command,
+ or through the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options of <span class="keyword cmdname">impala-shell</span>,
+ can contain additional <code class="ph codeph">SOURCE</code> commands.
+ The first file, <span class="ph filepath">nested1.sql</span>, runs an <span class="keyword cmdname">impala-shell</span> command
+ and then also runs the commands from <span class="ph filepath">nested2.sql</span>.
+ This ability for scripts to call each other is often useful for code that sets up schemas for applications
+ or test environments.
+ </p>
+
+<pre class="pre codeblock"><code>
+$ cat nested1.sql
+show functions in _impala_builtins like '*minute*';
+source nested2.sql
+$ cat nested2.sql
+show functions in _impala_builtins like '*hour*'
+
+$ impala-shell -i localhost -f nested1.sql
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: show functions in _impala_builtins like '*minute*'
++-------------+--------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+--------------------------------+-------------+---------------+
+| INT | minute(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | minutes_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+--------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+Query: show functions in _impala_builtins like '*hour*'
++-------------+------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT | hour(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | hours_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | hours_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | hours_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | hours_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+------------------------------+-------------+---------------+
+Fetched 5 row(s) in 0.01s
+</code></pre>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="shell_running_commands__rerun">
+ <h2 class="title topictitle2" id="ariaid-title2">Rerunning impala-shell Commands</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, you can use the
+ <code class="ph codeph">rerun</code> command, or its abbreviation <code class="ph codeph">@</code>,
+ to re-execute commands from the history list. The argument can be
+ a positive integer (reflecting the number shown in <code class="ph codeph">history</code>
+ output) or a negative integer (reflecting the Nth-from-last command in the
+ <code class="ph codeph">history</code> output). For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select * from p1 order by t limit 5;
+...
+[localhost:21000] > show table stats p1;
++-----------+--------+--------+------------------------------------------------------------+
+| #Rows | #Files | Size | Location |
++-----------+--------+--------+------------------------------------------------------------+
+| 134217728 | 50 | 4.66MB | hdfs://test.example.com:8020/user/hive/warehouse/jdr.db/p1 |
++-----------+--------+--------+------------------------------------------------------------+
+[localhost:21000] > compute stats p1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] > history;
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] > @-2; <- Rerun the 2nd last command in the history list
+Rerunning compute stats p1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+[localhost:21000] > history; <- History list is not updated by rerunning commands
+ or by repeating the last command, in this case 'history'.
+[1]: use jdr;
+[2]: history;
+[3]: show tables;
+[4]: select * from p1 order by t limit 5;
+[5]: show table stats p1;
+[6]: compute stats p1;
+[7]: history;
+[localhost:21000] > @4; <- Rerun command #4 in the history list using short form '@'.
+Rerunning select * from p1 order by t limit 5;
+...
+[localhost:21000] > rerun 4; <- Rerun command #4 using long form 'rerun'.
+Rerunning select * from p1 order by t limit 5;
+...
+
+</code></pre>
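The index rules shown above can be sketched as follows, assuming history entries are numbered from 1 as in the <code>history</code> output: a positive argument N selects entry N, and a negative argument selects counting back from the end. The `resolve_rerun` helper is hypothetical, not impala-shell's own code.

```python
def resolve_rerun(history, index):
    """Return the history entry selected by a rerun/@ argument."""
    if index == 0 or abs(index) > len(history):
        raise IndexError("rerun index out of range for history of length %d"
                         % len(history))
    if index > 0:
        return history[index - 1]   # positive: 1-based from the start
    return history[index]           # negative: counted from the end

# History list matching the session shown above.
history = [
    "use jdr;",
    "history;",
    "show tables;",
    "select * from p1 order by t limit 5;",
    "show table stats p1;",
    "compute stats p1;",
    "history;",
]
print(resolve_rerun(history, 4))    # select * from p1 order by t limit 5;
print(resolve_rerun(history, -2))   # compute stats p1;
```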
+
+ </div>
+ </article>
+
+</article></main></body></html>
[51/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
[DOCS] Impala doc site update for 3.0
Add Impala docs from branch master.
Commit hash: f20415755401d56103df7f16348cea8ed12fb3d8
Change-Id: Icf5927efa7baa965095a3ff2fd4ec7411313342d
Reviewed-on: http://gerrit.cloudera.org:8080/10322
Reviewed-by: Michael Brown <mi...@cloudera.com>
Tested-by: Alex Rodoni <ar...@cloudera.com>
Project: http://git-wip-us.apache.org/repos/asf/impala/repo
Commit: http://git-wip-us.apache.org/repos/asf/impala/commit/fae51ec2
Tree: http://git-wip-us.apache.org/repos/asf/impala/tree/fae51ec2
Diff: http://git-wip-us.apache.org/repos/asf/impala/diff/fae51ec2
Branch: refs/heads/asf-site
Commit: fae51ec244b5005d21a45e434bc3425cfe08e871
Parents: 52b8807
Author: Alex Rodoni <ar...@cloudera.com>
Authored: Tue May 8 10:56:06 2018 -0700
Committer: Alex Rodoni <ar...@cloudera.com>
Committed: Wed May 9 20:55:22 2018 +0000
----------------------------------------------------------------------
docs/build3x/html/commonltr.css | 555 ++
docs/build3x/html/commonrtl.css | 592 ++
docs/build3x/html/images/impala_arch.jpeg | Bin 0 -> 41900 bytes
docs/build3x/html/index.html | 3 +
.../html/topics/impala_abort_on_error.html | 42 +
docs/build3x/html/topics/impala_adls.html | 638 ++
docs/build3x/html/topics/impala_admin.html | 52 +
docs/build3x/html/topics/impala_admission.html | 822 +++
.../html/topics/impala_aggregate_functions.html | 34 +
docs/build3x/html/topics/impala_aliases.html | 148 +
.../impala_allow_unsupported_formats.html | 24 +
.../build3x/html/topics/impala_alter_table.html | 1117 ++++
docs/build3x/html/topics/impala_alter_view.html | 139 +
.../html/topics/impala_analytic_functions.html | 1785 ++++++
.../html/topics/impala_appx_count_distinct.html | 82 +
.../build3x/html/topics/impala_appx_median.html | 132 +
docs/build3x/html/topics/impala_array.html | 321 +
docs/build3x/html/topics/impala_auditing.html | 232 +
.../html/topics/impala_authentication.html | 37 +
.../html/topics/impala_authorization.html | 1176 ++++
docs/build3x/html/topics/impala_avg.html | 318 +
docs/build3x/html/topics/impala_avro.html | 565 ++
docs/build3x/html/topics/impala_batch_size.html | 34 +
docs/build3x/html/topics/impala_bigint.html | 138 +
.../html/topics/impala_bit_functions.html | 848 +++
docs/build3x/html/topics/impala_boolean.html | 170 +
docs/build3x/html/topics/impala_breakpad.html | 239 +
.../html/topics/impala_buffer_pool_limit.html | 71 +
docs/build3x/html/topics/impala_char.html | 305 +
docs/build3x/html/topics/impala_comments.html | 46 +
.../html/topics/impala_complex_types.html | 2606 ++++++++
docs/build3x/html/topics/impala_components.html | 227 +
.../html/topics/impala_compression_codec.html | 92 +
.../html/topics/impala_compute_stats.html | 637 ++
.../impala_compute_stats_min_sample_size.html | 23 +
docs/build3x/html/topics/impala_concepts.html | 48 +
.../topics/impala_conditional_functions.html | 611 ++
docs/build3x/html/topics/impala_config.html | 48 +
.../html/topics/impala_config_options.html | 389 ++
.../html/topics/impala_config_performance.html | 149 +
docs/build3x/html/topics/impala_connecting.html | 187 +
.../topics/impala_conversion_functions.html | 288 +
docs/build3x/html/topics/impala_count.html | 353 ++
.../html/topics/impala_create_database.html | 209 +
.../html/topics/impala_create_function.html | 502 ++
.../build3x/html/topics/impala_create_role.html | 70 +
.../html/topics/impala_create_table.html | 1346 ++++
.../build3x/html/topics/impala_create_view.html | 194 +
docs/build3x/html/topics/impala_databases.html | 62 +
docs/build3x/html/topics/impala_datatypes.html | 33 +
.../html/topics/impala_datetime_functions.html | 3105 +++++++++
docs/build3x/html/topics/impala_ddl.html | 141 +
.../html/topics/impala_debug_action.html | 24 +
docs/build3x/html/topics/impala_decimal.html | 907 +++
docs/build3x/html/topics/impala_decimal_v2.html | 32 +
.../impala_default_join_distribution_mode.html | 113 +
.../impala_default_spillable_buffer_size.html | 87 +
docs/build3x/html/topics/impala_delegation.html | 70 +
docs/build3x/html/topics/impala_delete.html | 177 +
docs/build3x/html/topics/impala_describe.html | 817 +++
.../build3x/html/topics/impala_development.html | 197 +
.../html/topics/impala_disable_codegen.html | 36 +
.../impala_disable_row_runtime_filtering.html | 90 +
...mpala_disable_streaming_preaggregations.html | 50 +
.../topics/impala_disable_unsafe_spills.html | 50 +
docs/build3x/html/topics/impala_disk_space.html | 133 +
docs/build3x/html/topics/impala_distinct.html | 81 +
docs/build3x/html/topics/impala_dml.html | 82 +
docs/build3x/html/topics/impala_double.html | 157 +
.../html/topics/impala_drop_database.html | 193 +
.../html/topics/impala_drop_function.html | 136 +
docs/build3x/html/topics/impala_drop_role.html | 71 +
docs/build3x/html/topics/impala_drop_stats.html | 285 +
docs/build3x/html/topics/impala_drop_table.html | 192 +
docs/build3x/html/topics/impala_drop_view.html | 80 +
.../impala_exec_single_node_rows_threshold.html | 89 +
.../html/topics/impala_exec_time_limit_s.html | 70 +
docs/build3x/html/topics/impala_explain.html | 296 +
.../html/topics/impala_explain_level.html | 342 +
.../html/topics/impala_explain_plan.html | 592 ++
docs/build3x/html/topics/impala_faq.html | 21 +
.../html/topics/impala_file_formats.html | 236 +
.../html/topics/impala_fixed_issues.html | 5961 ++++++++++++++++++
docs/build3x/html/topics/impala_float.html | 153 +
docs/build3x/html/topics/impala_functions.html | 162 +
.../html/topics/impala_functions_overview.html | 109 +
docs/build3x/html/topics/impala_grant.html | 256 +
docs/build3x/html/topics/impala_group_by.html | 140 +
.../html/topics/impala_group_concat.html | 141 +
docs/build3x/html/topics/impala_hadoop.html | 138 +
docs/build3x/html/topics/impala_having.html | 39 +
docs/build3x/html/topics/impala_hbase.html | 772 +++
.../html/topics/impala_hbase_cache_blocks.html | 36 +
.../html/topics/impala_hbase_caching.html | 36 +
docs/build3x/html/topics/impala_hints.html | 488 ++
.../build3x/html/topics/impala_identifiers.html | 110 +
.../html/topics/impala_impala_shell.html | 87 +
.../topics/impala_incompatible_changes.html | 1526 +++++
docs/build3x/html/topics/impala_insert.html | 911 +++
docs/build3x/html/topics/impala_install.html | 126 +
docs/build3x/html/topics/impala_int.html | 121 +
docs/build3x/html/topics/impala_intro.html | 198 +
.../html/topics/impala_invalidate_metadata.html | 286 +
docs/build3x/html/topics/impala_isilon.html | 89 +
docs/build3x/html/topics/impala_jdbc.html | 340 +
docs/build3x/html/topics/impala_joins.html | 531 ++
docs/build3x/html/topics/impala_kerberos.html | 342 +
.../html/topics/impala_known_issues.html | 1012 +++
docs/build3x/html/topics/impala_kudu.html | 1449 +++++
docs/build3x/html/topics/impala_langref.html | 66 +
.../build3x/html/topics/impala_langref_sql.html | 28 +
.../html/topics/impala_langref_unsupported.html | 337 +
docs/build3x/html/topics/impala_ldap.html | 294 +
docs/build3x/html/topics/impala_limit.html | 168 +
docs/build3x/html/topics/impala_lineage.html | 91 +
docs/build3x/html/topics/impala_literals.html | 424 ++
.../html/topics/impala_live_progress.html | 131 +
.../html/topics/impala_live_summary.html | 177 +
docs/build3x/html/topics/impala_load_data.html | 322 +
docs/build3x/html/topics/impala_logging.html | 423 ++
docs/build3x/html/topics/impala_map.html | 331 +
.../html/topics/impala_math_functions.html | 1711 +++++
docs/build3x/html/topics/impala_max.html | 298 +
docs/build3x/html/topics/impala_max_errors.html | 40 +
.../topics/impala_max_num_runtime_filters.html | 75 +
.../html/topics/impala_max_row_size.html | 221 +
.../topics/impala_max_scan_range_length.html | 47 +
docs/build3x/html/topics/impala_mem_limit.html | 206 +
docs/build3x/html/topics/impala_min.html | 297 +
.../impala_min_spillable_buffer_size.html | 87 +
.../html/topics/impala_misc_functions.html | 175 +
.../html/topics/impala_mixed_security.html | 26 +
docs/build3x/html/topics/impala_mt_dop.html | 190 +
docs/build3x/html/topics/impala_ndv.html | 226 +
.../html/topics/impala_new_features.html | 3806 +++++++++++
docs/build3x/html/topics/impala_num_nodes.html | 61 +
.../html/topics/impala_num_scanner_threads.html | 27 +
docs/build3x/html/topics/impala_odbc.html | 24 +
docs/build3x/html/topics/impala_offset.html | 67 +
docs/build3x/html/topics/impala_operators.html | 2042 ++++++
.../impala_optimize_partition_key_scans.html | 188 +
docs/build3x/html/topics/impala_order_by.html | 398 ++
docs/build3x/html/topics/impala_parquet.html | 1421 +++++
.../impala_parquet_annotate_strings_utf8.html | 54 +
.../topics/impala_parquet_array_resolution.html | 180 +
.../impala_parquet_compression_codec.html | 17 +
...pala_parquet_fallback_schema_resolution.html | 55 +
.../html/topics/impala_parquet_file_size.html | 101 +
.../html/topics/impala_partitioning.html | 801 +++
.../html/topics/impala_perf_benchmarking.html | 27 +
.../html/topics/impala_perf_cookbook.html | 256 +
.../html/topics/impala_perf_hdfs_caching.html | 578 ++
docs/build3x/html/topics/impala_perf_joins.html | 508 ++
.../html/topics/impala_perf_resources.html | 47 +
docs/build3x/html/topics/impala_perf_skew.html | 139 +
docs/build3x/html/topics/impala_perf_stats.html | 1192 ++++
.../html/topics/impala_perf_testing.html | 152 +
.../build3x/html/topics/impala_performance.html | 116 +
docs/build3x/html/topics/impala_planning.html | 20 +
docs/build3x/html/topics/impala_porting.html | 603 ++
docs/build3x/html/topics/impala_ports.html | 421 ++
.../html/topics/impala_prefetch_mode.html | 47 +
docs/build3x/html/topics/impala_prereqs.html | 275 +
docs/build3x/html/topics/impala_processes.html | 115 +
docs/build3x/html/topics/impala_proxy.html | 501 ++
.../html/topics/impala_query_options.html | 55 +
.../html/topics/impala_query_timeout_s.html | 62 +
docs/build3x/html/topics/impala_rcfile.html | 246 +
docs/build3x/html/topics/impala_real.html | 39 +
docs/build3x/html/topics/impala_refresh.html | 408 ++
.../html/topics/impala_release_notes.html | 26 +
docs/build3x/html/topics/impala_relnotes.html | 26 +
.../html/topics/impala_replica_preference.html | 68 +
.../html/topics/impala_request_pool.html | 35 +
.../html/topics/impala_reserved_words.html | 3853 +++++++++++
.../html/topics/impala_resource_management.html | 97 +
docs/build3x/html/topics/impala_revoke.html | 151 +
.../impala_runtime_bloom_filter_size.html | 104 +
.../topics/impala_runtime_filter_max_size.html | 65 +
.../topics/impala_runtime_filter_min_size.html | 65 +
.../html/topics/impala_runtime_filter_mode.html | 75 +
.../impala_runtime_filter_wait_time_ms.html | 51 +
.../html/topics/impala_runtime_filtering.html | 533 ++
docs/build3x/html/topics/impala_s3.html | 775 +++
.../topics/impala_s3_skip_insert_staging.html | 78 +
.../build3x/html/topics/impala_scalability.html | 920 +++
.../topics/impala_schedule_random_replica.html | 83 +
.../html/topics/impala_schema_design.html | 184 +
.../html/topics/impala_schema_objects.html | 48 +
.../html/topics/impala_scratch_limit.html | 77 +
docs/build3x/html/topics/impala_security.html | 99 +
.../html/topics/impala_security_files.html | 58 +
.../html/topics/impala_security_guidelines.html | 99 +
.../html/topics/impala_security_install.html | 17 +
.../html/topics/impala_security_metastore.html | 30 +
.../html/topics/impala_security_webui.html | 57 +
docs/build3x/html/topics/impala_select.html | 236 +
docs/build3x/html/topics/impala_seqfile.html | 240 +
docs/build3x/html/topics/impala_set.html | 280 +
.../html/topics/impala_shell_commands.html | 416 ++
.../html/topics/impala_shell_options.html | 618 ++
.../topics/impala_shell_running_commands.html | 322 +
docs/build3x/html/topics/impala_show.html | 1525 +++++
.../topics/impala_shuffle_distinct_exprs.html | 37 +
docs/build3x/html/topics/impala_smallint.html | 127 +
docs/build3x/html/topics/impala_ssl.html | 180 +
docs/build3x/html/topics/impala_stddev.html | 121 +
docs/build3x/html/topics/impala_string.html | 197 +
.../html/topics/impala_string_functions.html | 1719 +++++
docs/build3x/html/topics/impala_struct.html | 500 ++
docs/build3x/html/topics/impala_subqueries.html | 332 +
docs/build3x/html/topics/impala_sum.html | 333 +
.../html/topics/impala_support_start_over.html | 30 +
docs/build3x/html/topics/impala_sync_ddl.html | 55 +
docs/build3x/html/topics/impala_tables.html | 446 ++
.../build3x/html/topics/impala_tablesample.html | 560 ++
docs/build3x/html/topics/impala_timeouts.html | 182 +
docs/build3x/html/topics/impala_timestamp.html | 656 ++
docs/build3x/html/topics/impala_tinyint.html | 133 +
.../html/topics/impala_troubleshooting.html | 370 ++
.../html/topics/impala_truncate_table.html | 200 +
docs/build3x/html/topics/impala_tutorial.html | 2270 +++++++
docs/build3x/html/topics/impala_txtfile.html | 770 +++
docs/build3x/html/topics/impala_udf.html | 1603 +++++
docs/build3x/html/topics/impala_union.html | 146 +
docs/build3x/html/topics/impala_update.html | 169 +
docs/build3x/html/topics/impala_upgrading.html | 280 +
docs/build3x/html/topics/impala_upsert.html | 113 +
docs/build3x/html/topics/impala_use.html | 84 +
docs/build3x/html/topics/impala_varchar.html | 254 +
docs/build3x/html/topics/impala_variance.html | 132 +
docs/build3x/html/topics/impala_views.html | 300 +
docs/build3x/html/topics/impala_webui.html | 311 +
docs/build3x/html/topics/impala_with.html | 63 +
docs/build3x/impala-3.0.pdf | Bin 0 -> 3886205 bytes
impala-docs.html | 9 +-
236 files changed, 88682 insertions(+), 1 deletion(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/commonltr.css
----------------------------------------------------------------------
diff --git a/docs/build3x/html/commonltr.css b/docs/build3x/html/commonltr.css
new file mode 100644
index 0000000..f0738c6
--- /dev/null
+++ b/docs/build3x/html/commonltr.css
@@ -0,0 +1,555 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+ font-family: monospace;
+}
+
+.codeph {
+ font-family: monospace;
+}
+
+.kwd {
+ font-weight: bold;
+}
+
+.parmname {
+ font-weight: bold;
+}
+
+.var {
+ font-style: italic;
+}
+
+.filepath {
+ font-family: monospace;
+}
+
+div.tasklabel {
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+ font-size: 100%;
+}
+
+.screen {
+ padding: 5px 5px 5px 5px;
+ border: outset;
+ background-color: #CCCCCC;
+ margin-top: 2px;
+ margin-bottom: 2px;
+ white-space: pre;
+}
+
+.wintitle {
+ font-weight: bold;
+}
+
+.numcharref {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: 0;
+ border-left: 0;
+}
+
+.frame-bottom {
+ border-top: 0;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-topbot {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-all {
+ border: solid 1px;
+}
+
+.frame-sides {
+ border-top: 0;
+ border-left: solid 1px;
+ border-right: solid 1px;
+ border-bottom: 0;
+}
+
+.frame-none {
+ border: 0;
+}
+
+.scale-50 {
+ font-size: 50%;
+}
+
+.scale-60 {
+ font-size: 60%;
+}
+
+.scale-70 {
+ font-size: 70%;
+}
+
+.scale-80 {
+ font-size: 80%;
+}
+
+.scale-90 {
+ font-size: 90%;
+}
+
+.scale-100 {
+ font-size: 100%;
+}
+
+.scale-110 {
+ font-size: 110%;
+}
+
+.scale-120 {
+ font-size: 120%;
+}
+
+.scale-140 {
+ font-size: 140%;
+}
+
+.scale-160 {
+ font-size: 160%;
+}
+
+.scale-180 {
+ font-size: 180%;
+}
+
+.scale-200 {
+ font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+ width: 100%;
+}
+
+.fig {
+ /* Default of italics to set apart figure captions */
+ /* Use @frame to create frames on figures */
+}
+.figcap {
+ font-style: italic;
+}
+.figdesc {
+ font-style: normal;
+}
+.figborder {
+ border-color: Silver;
+ border-style: solid;
+ border-width: 2px;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figsides {
+ border-color: Silver;
+ border-left: 2px solid;
+ border-right: 2px solid;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figtop {
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+.figbottom {
+ border-bottom: 2px solid;
+ border-color: Silver;
+}
+.figtopbot {
+ border-bottom: 2px solid;
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+ text-align: left;
+}
+
+div.imagecenter {
+ text-align: center;
+}
+
+div.imageright {
+ text-align: right;
+}
+
+div.imagejustify {
+ text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+ font-size: 1.34em;
+ margin-bottom: 0.1em;
+ margin-top: 0;
+}
+
+.topictitle2 {
+ font-size: 1.17em;
+ margin-bottom: 0.45em;
+ margin-top: 1pc;
+}
+
+.topictitle3 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0.17em;
+ margin-top: 1pc;
+}
+
+.topictitle4 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-top: 0.83em;
+}
+
+.topictitle5 {
+ font-size: 1.17em;
+ font-weight: bold;
+}
+
+.topictitle6 {
+ font-size: 1.17em;
+ font-style: italic;
+}
+
+.sectiontitle {
+ color: #000;
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0;
+ margin-top: 1em;
+}
+
+.section {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.example {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+ list-style-type: none;
+}
+
+.ulchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.olchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.sublinklist {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.relconcepts {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.reltasks {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relref {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relinfo {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.breadcrumb {
+ font-size: smaller;
+ margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+ list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+ font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+ font-weight: bold;
+ margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+ margin-top: 0;
+}
+
+*[compact="no"] > li {
+ margin-top: 0.53em;
+}
+
+.liexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.sliexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.dlexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.ddexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.stepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.substepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+ font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+ font-weight: bold;
+}
+
+.bolditalic {
+ font-style: italic;
+ font-weight: bold;
+}
+
+.italic {
+ font-style: italic;
+}
+
+.underlined {
+ text-decoration: underline;
+}
+
+.uicontrol {
+ font-weight: bold;
+}
+
+.defkwd {
+ font-weight: bold;
+ text-decoration: underline;
+}
+
+.shortcut {
+ text-decoration: underline;
+}
+
+table {
+ border-collapse: collapse;
+}
+
+table .desc {
+ display: block;
+ font-style: italic;
+}
+
+.cellrowborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.row-nocellborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-top: 0;
+}
+
+.cell-norowborder {
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.nocellnorowborder {
+ border: 0;
+}
+
+.firstcol {
+ font-weight: bold;
+}
+
+.table--pgwide-1 {
+ width: 100%;
+}
+
+.align-left {
+ text-align: left;
+}
+
+.align-right {
+ text-align: right;
+}
+
+.align-center {
+ text-align: center;
+}
+
+.align-justify {
+ text-align: justify;
+}
+
+.align-char {
+ text-align: char;
+}
+
+.valign-top {
+ vertical-align: top;
+}
+
+.valign-bottom {
+ vertical-align: bottom;
+}
+
+.valign-middle {
+ vertical-align: middle;
+}
+
+.colsep-0 {
+ border-right: 0;
+}
+
+.colsep-1 {
+ border-right: 1px solid;
+}
+
+.rowsep-0 {
+ border-bottom: 0;
+}
+
+.rowsep-1 {
+ border-bottom: 1px solid;
+}
+
+.stentry {
+ border-right: 1px solid;
+ border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+ border-right: 0;
+}
+
+.strow:last-child .stentry {
+ border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+ margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+ margin-top: 1em;
+}
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/commonrtl.css
----------------------------------------------------------------------
diff --git a/docs/build3x/html/commonrtl.css b/docs/build3x/html/commonrtl.css
new file mode 100644
index 0000000..99acb72
--- /dev/null
+++ b/docs/build3x/html/commonrtl.css
@@ -0,0 +1,592 @@
+/*!
+ * This file is part of the DITA Open Toolkit project. See the accompanying LICENSE.md file for applicable licenses.
+ */
+/*
+ | (c) Copyright IBM Corp. 2004, 2005 All Rights Reserved.
+ */
+.codeblock {
+ font-family: monospace;
+}
+
+.codeph {
+ font-family: monospace;
+}
+
+.kwd {
+ font-weight: bold;
+}
+
+.parmname {
+ font-weight: bold;
+}
+
+.var {
+ font-style: italic;
+}
+
+.filepath {
+ font-family: monospace;
+}
+
+div.tasklabel {
+ margin-top: 1em;
+ margin-bottom: 1em;
+}
+
+h2.tasklabel,
+h3.tasklabel,
+h4.tasklabel,
+h5.tasklabel,
+h6.tasklabel {
+ font-size: 100%;
+}
+
+.screen {
+ padding: 5px 5px 5px 5px;
+ border: outset;
+ background-color: #CCCCCC;
+ margin-top: 2px;
+ margin-bottom: 2px;
+ white-space: pre;
+}
+
+.wintitle {
+ font-weight: bold;
+}
+
+.numcharref {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.parameterentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.textentity {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlatt {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlelement {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlnsname {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.xmlpi {
+ color: #663399;
+ font-family: Menlo, Monaco, Consolas, "Courier New", monospace;
+}
+
+.frame-top {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: 0;
+ border-left: 0;
+}
+
+.frame-bottom {
+ border-top: 0;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-topbot {
+ border-top: solid 1px;
+ border-right: 0;
+ border-bottom: solid 1px;
+ border-left: 0;
+}
+
+.frame-all {
+ border: solid 1px;
+}
+
+.frame-sides {
+ border-top: 0;
+ border-left: solid 1px;
+ border-right: solid 1px;
+ border-bottom: 0;
+}
+
+.frame-none {
+ border: 0;
+}
+
+.scale-50 {
+ font-size: 50%;
+}
+
+.scale-60 {
+ font-size: 60%;
+}
+
+.scale-70 {
+ font-size: 70%;
+}
+
+.scale-80 {
+ font-size: 80%;
+}
+
+.scale-90 {
+ font-size: 90%;
+}
+
+.scale-100 {
+ font-size: 100%;
+}
+
+.scale-110 {
+ font-size: 110%;
+}
+
+.scale-120 {
+ font-size: 120%;
+}
+
+.scale-140 {
+ font-size: 140%;
+}
+
+.scale-160 {
+ font-size: 160%;
+}
+
+.scale-180 {
+ font-size: 180%;
+}
+
+.scale-200 {
+ font-size: 200%;
+}
+
+.expanse-page, .expanse-spread {
+ width: 100%;
+}
+
+.fig {
+ /* Default of italics to set apart figure captions */
+ /* Use @frame to create frames on figures */
+}
+.figcap {
+ font-style: italic;
+}
+.figdesc {
+ font-style: normal;
+}
+.figborder {
+ border-color: Silver;
+ border-style: solid;
+ border-width: 2px;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figsides {
+ border-color: Silver;
+ border-left: 2px solid;
+ border-right: 2px solid;
+ margin-top: 1em;
+ padding-left: 3px;
+ padding-right: 3px;
+}
+.figtop {
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+.figbottom {
+ border-bottom: 2px solid;
+ border-color: Silver;
+}
+.figtopbot {
+ border-bottom: 2px solid;
+ border-color: Silver;
+ border-top: 2px solid;
+ margin-top: 1em;
+}
+
+/* Align images based on @align on topic/image */
+div.imageleft {
+ text-align: left;
+}
+
+div.imagecenter {
+ text-align: center;
+}
+
+div.imageright {
+ text-align: right;
+}
+
+div.imagejustify {
+ text-align: justify;
+}
+
+/* Set heading sizes, getting smaller for deeper nesting */
+.topictitle1 {
+ font-size: 1.34em;
+ margin-bottom: 0.1em;
+ margin-top: 0;
+}
+
+.topictitle2 {
+ font-size: 1.17em;
+ margin-bottom: 0.45em;
+ margin-top: 1pc;
+}
+
+.topictitle3 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0.17em;
+ margin-top: 1pc;
+}
+
+.topictitle4 {
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-top: 0.83em;
+}
+
+.topictitle5 {
+ font-size: 1.17em;
+ font-weight: bold;
+}
+
+.topictitle6 {
+ font-size: 1.17em;
+ font-style: italic;
+}
+
+.sectiontitle {
+ color: #000;
+ font-size: 1.17em;
+ font-weight: bold;
+ margin-bottom: 0;
+ margin-top: 1em;
+}
+
+.section {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.example {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+/* Most link groups are created with <div>. Ensure they have space before and after. */
+.ullinks {
+ list-style-type: none;
+}
+
+.ulchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.olchildlink {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.sublinklist {
+ margin-bottom: 1em;
+ margin-left: 1.5em;
+}
+
+.relconcepts {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.reltasks {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relref {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.relinfo {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.breadcrumb {
+ font-size: smaller;
+ margin-bottom: 1em;
+}
+
+/* Simple lists do not get a bullet */
+ul.simple {
+ list-style-type: none;
+}
+
+/* Default of bold for definition list terms */
+.dlterm {
+ font-weight: bold;
+}
+
+/* Use CSS to expand lists with @compact="no" */
+.dltermexpand {
+ font-weight: bold;
+ margin-top: 1em;
+}
+
+*[compact="yes"] > li {
+ margin-top: 0;
+}
+
+*[compact="no"] > li {
+ margin-top: 0.53em;
+}
+
+.liexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.sliexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.dlexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.ddexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.stepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+.substepexpand {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-left: 20px;
+}
+
+/* All note formats have the same default presentation */
+.note {
+ margin-bottom: 1em;
+ margin-top: 1em;
+}
+.note .notetitle, .note .notelisttitle,
+.note .note__title {
+ font-weight: bold;
+}
+
+/* Various basic phrase styles */
+.bold {
+ font-weight: bold;
+}
+
+.bolditalic {
+ font-style: italic;
+ font-weight: bold;
+}
+
+.italic {
+ font-style: italic;
+}
+
+.underlined {
+ text-decoration: underline;
+}
+
+.uicontrol {
+ font-weight: bold;
+}
+
+.defkwd {
+ font-weight: bold;
+ text-decoration: underline;
+}
+
+.shortcut {
+ text-decoration: underline;
+}
+
+table {
+ border-collapse: collapse;
+}
+
+table .desc {
+ display: block;
+ font-style: italic;
+}
+
+.cellrowborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.row-nocellborder {
+ border-bottom: solid 1px;
+ border-left: 0;
+ border-top: 0;
+}
+
+.cell-norowborder {
+ border-left: 0;
+ border-right: solid 1px;
+ border-top: 0;
+}
+
+.nocellnorowborder {
+ border: 0;
+}
+
+.firstcol {
+ font-weight: bold;
+}
+
+.table--pgwide-1 {
+ width: 100%;
+}
+
+.align-left {
+ text-align: left;
+}
+
+.align-right {
+ text-align: right;
+}
+
+.align-center {
+ text-align: center;
+}
+
+.align-justify {
+ text-align: justify;
+}
+
+.align-char {
+ text-align: char;
+}
+
+.valign-top {
+ vertical-align: top;
+}
+
+.valign-bottom {
+ vertical-align: bottom;
+}
+
+.valign-middle {
+ vertical-align: middle;
+}
+
+.colsep-0 {
+ border-right: 0;
+}
+
+.colsep-1 {
+ border-right: 1px solid;
+}
+
+.rowsep-0 {
+ border-bottom: 0;
+}
+
+.rowsep-1 {
+ border-bottom: 1px solid;
+}
+
+.stentry {
+ border-right: 1px solid;
+ border-bottom: 1px solid;
+}
+
+.stentry:last-child {
+ border-right: 0;
+}
+
+.strow:last-child .stentry {
+ border-bottom: 0;
+}
+
+/* Add space for top level topics */
+.nested0 {
+ margin-top: 1em;
+}
+
+/* div with class=p is used for paragraphs that contain blocks, to keep the XHTML valid */
+.p {
+ margin-top: 1em;
+}
+
+.linklist {
+ margin-bottom: 1em;
+}
+
+.linklistwithchild {
+ margin-right: 1.5em;
+ margin-top: 1em;
+}
+
+.sublinklist {
+ margin-right: 1.5em;
+ margin-top: 1em;
+}
+
+dt.prereq {
+ margin-right: 20px;
+}
+
+.cellrowborder {
+ border-left: solid 1px;
+ border-right: none;
+}
+
+.row-nocellborder {
+ border-left: hidden;
+ border-right: none;
+}
+
+.cell-norowborder {
+ border-left: solid 1px;
+ border-right: none;
+}
+
+.nocellnorowborder {
+ border-left: hidden;
+}
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/images/impala_arch.jpeg
----------------------------------------------------------------------
diff --git a/docs/build3x/html/images/impala_arch.jpeg b/docs/build3x/html/images/impala_arch.jpeg
new file mode 100644
index 0000000..8289469
Binary files /dev/null and b/docs/build3x/html/images/impala_arch.jpeg differ
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/index.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/index.html b/docs/build3x/html/index.html
new file mode 100644
index 0000000..41fc348
--- /dev/null
+++ b/docs/build3x/html/index.html
@@ -0,0 +1,3 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="map"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala"><link rel="stylesheet" type="text/css" href="commonltr.css"><title>Apache Impala Guide</title></head><body id="impala"><h1 class="title topictitle1">Apache Impala Guide</h1><nav><ul class="map"><li class="topicref"><a href="topics/impala_intro.html">Introducing Apache Impala</a></li><li class="topicref"><a href="topics/impala_concepts.html">Concepts and Architecture</a><ul><li class="topicref"><a href="topics/impala_components.html">Components</a></li><li class="topicref"><a href="topics/impala_development.html">Developing Applications</a></li><li class="topicref"><a href="topics/impala_hadoop.html">Role in the Hadoop Ecosystem</a></li></ul></li><li
class="topicref"><a href="topics/impala_planning.html">Deployment Planning</a><ul><li class="topicref"><a href="topics/impala_prereqs.html#prereqs">Requirements</a></li><li class="topicref"><a href="topics/impala_schema_design.html">Designing Schemas</a></li></ul></li><li class="topicref"><a href="topics/impala_install.html#install">Installing Impala</a></li><li class="topicref"><a href="topics/impala_config.html">Managing Impala</a><ul><li class="topicref"><a href="topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></li><li class="topicref"><a href="topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></li><li class="topicref"><a href="topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></li></ul></li><li class="topicref"><a href="topics/impala_upgrading.html">Upgrading Impala</a></li><li class="topicref"><a href="topics/impala_processes.html">Starting Impala</a><ul><li class="topicref"><a href="topics/impala_config_options
.html">Modifying Impala Startup Options</a></li></ul></li><li class="topicref"><a href="topics/impala_tutorial.html">Tutorials</a></li><li class="topicref"><a href="topics/impala_admin.html">Administration</a><ul><li class="topicref"><a href="topics/impala_admission.html">Admission Control and Query Queuing</a></li><li class="topicref"><a href="topics/impala_resource_management.html">Resource Management for Impala</a></li><li class="topicref"><a href="topics/impala_timeouts.html">Setting Timeouts</a></li><li class="topicref"><a href="topics/impala_proxy.html">Load-Balancing Proxy for HA</a></li><li class="topicref"><a href="topics/impala_disk_space.html">Managing Disk Space</a></li></ul></li><li class="topicref"><a href="topics/impala_security.html">Impala Security</a><ul><li class="topicref"><a href="topics/impala_security_guidelines.html">Security Guidelines for Impala</a></li><li class="topicref"><a href="topics/impala_security_files.html">Securing Impala Data and Log Files</a></
li><li class="topicref"><a href="topics/impala_security_install.html">Installation Considerations for Impala Security</a></li><li class="topicref"><a href="topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></li><li class="topicref"><a href="topics/impala_security_webui.html">Securing the Impala Web User Interface</a></li><li class="topicref"><a href="topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></li><li class="topicref"><a href="topics/impala_authorization.html">Enabling Sentry Authorization for Impala</a></li><li class="topicref"><a href="topics/impala_authentication.html">Impala Authentication</a><ul><li class="topicref"><a href="topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></li><li class="topicref"><a href="topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></li><li clas
s="topicref"><a href="topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></li></ul></li><li class="topicref"><a href="topics/impala_auditing.html">Auditing</a></li><li class="topicref"><a href="topics/impala_lineage.html">Viewing Lineage Info</a></li></ul></li><li class="topicref"><a href="topics/impala_langref.html">SQL Reference</a><ul><li class="topicref"><a href="topics/impala_comments.html">Comments</a></li><li class="topicref"><a href="topics/impala_datatypes.html">Data Types</a><ul><li class="topicref"><a href="topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_bigint.html">BIGINT</a></li><li class="topicref"><a href="topics/impala_boolean.html">BOOLEAN</a></li><li class="topicref"><a href="topics/impala_char.html">CHAR</a></li><li class="topicref"><a href="topics/impala_decimal.html">DECIMAL</a></li><li class="topicref"><a href="topics/impala_double.html">DOUBLE</a></l
i><li class="topicref"><a href="topics/impala_float.html">FLOAT</a></li><li class="topicref"><a href="topics/impala_int.html">INT</a></li><li class="topicref"><a href="topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_real.html">REAL</a></li><li class="topicref"><a href="topics/impala_smallint.html">SMALLINT</a></li><li class="topicref"><a href="topics/impala_string.html">STRING</a></li><li class="topicref"><a href="topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></li><li class="topicref"><a href="topics/impala_timestamp.html">TIMESTAMP</a></li><li class="topicref"><a href="topics/impala_tinyint.html">TINYINT</a></li><li class="topicref"><a href="topics/impala_varchar.html">VARCHAR</a></li><li class="topicref"><a href="topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></li></ul></li><li class="topicref"><a href="topics/impala_literals.html">Literals</a></
li><li class="topicref"><a href="topics/impala_operators.html">SQL Operators</a></li><li class="topicref"><a href="topics/impala_schema_objects.html">Schema Objects and Object Names</a><ul><li class="topicref"><a href="topics/impala_aliases.html">Aliases</a></li><li class="topicref"><a href="topics/impala_databases.html">Databases</a></li><li class="topicref"><a href="topics/impala_functions_overview.html">Functions</a></li><li class="topicref"><a href="topics/impala_identifiers.html">Identifiers</a></li><li class="topicref"><a href="topics/impala_tables.html">Tables</a></li><li class="topicref"><a href="topics/impala_views.html">Views</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_sql.html">SQL Statements</a><ul><li class="topicref"><a href="topics/impala_ddl.html">DDL Statements</a></li><li class="topicref"><a href="topics/impala_dml.html">DML Statements</a></li><li class="topicref"><a href="topics/impala_alter_table.html">ALTER TABLE</a></li><li class="topi
cref"><a href="topics/impala_alter_view.html">ALTER VIEW</a></li><li class="topicref"><a href="topics/impala_compute_stats.html">COMPUTE STATS</a></li><li class="topicref"><a href="topics/impala_create_database.html">CREATE DATABASE</a></li><li class="topicref"><a href="topics/impala_create_function.html">CREATE FUNCTION</a></li><li class="topicref"><a href="topics/impala_create_role.html">CREATE ROLE</a></li><li class="topicref"><a href="topics/impala_create_table.html">CREATE TABLE</a></li><li class="topicref"><a href="topics/impala_create_view.html">CREATE VIEW</a></li><li class="topicref"><a href="topics/impala_delete.html">DELETE</a></li><li class="topicref"><a href="topics/impala_describe.html">DESCRIBE</a></li><li class="topicref"><a href="topics/impala_drop_database.html">DROP DATABASE</a></li><li class="topicref"><a href="topics/impala_drop_function.html">DROP FUNCTION</a></li><li class="topicref"><a href="topics/impala_drop_role.html">DROP ROLE</a></li><li class="topicref"
><a href="topics/impala_drop_stats.html">DROP STATS</a></li><li class="topicref"><a href="topics/impala_drop_table.html">DROP TABLE</a></li><li class="topicref"><a href="topics/impala_drop_view.html">DROP VIEW</a></li><li class="topicref"><a href="topics/impala_explain.html">EXPLAIN</a></li><li class="topicref"><a href="topics/impala_grant.html">GRANT</a></li><li class="topicref"><a href="topics/impala_insert.html">INSERT</a></li><li class="topicref"><a href="topics/impala_invalidate_metadata.html">INVALIDATE METADATA</a></li><li class="topicref"><a href="topics/impala_load_data.html">LOAD DATA</a></li><li class="topicref"><a href="topics/impala_refresh.html">REFRESH</a></li><li class="topicref"><a href="topics/impala_revoke.html">REVOKE</a></li><li class="topicref"><a href="topics/impala_select.html">SELECT</a><ul><li class="topicref"><a href="topics/impala_joins.html">Joins</a></li><li class="topicref"><a href="topics/impala_order_by.html">ORDER BY Clause</a></li><li class="topicr
ef"><a href="topics/impala_group_by.html">GROUP BY Clause</a></li><li class="topicref"><a href="topics/impala_having.html">HAVING Clause</a></li><li class="topicref"><a href="topics/impala_limit.html">LIMIT Clause</a></li><li class="topicref"><a href="topics/impala_offset.html">OFFSET Clause</a></li><li class="topicref"><a href="topics/impala_union.html">UNION Clause</a></li><li class="topicref"><a href="topics/impala_subqueries.html">Subqueries</a></li><li class="topicref"><a href="topics/impala_tablesample.html">TABLESAMPLE Clause</a></li><li class="topicref"><a href="topics/impala_with.html">WITH Clause</a></li><li class="topicref"><a href="topics/impala_distinct.html">DISTINCT Operator</a></li></ul></li><li class="topicref"><a href="topics/impala_set.html">SET</a><ul><li class="topicref"><a href="topics/impala_query_options.html">Query Options for the SET Statement</a><ul><li class="topicref"><a href="topics/impala_abort_on_error.html">ABORT_ON_ERROR</a></li><li class="topicref"
><a href="topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS</a></li><li class="topicref"><a href="topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT</a></li><li class="topicref"><a href="topics/impala_batch_size.html">BATCH_SIZE</a></li><li class="topicref"><a href="topics/impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT</a></li><li class="topicref"><a href="topics/impala_compression_codec.html">COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_compute_stats_min_sample_size.html">COMPUTE_STATS_MIN_SAMPLE_SIZE</a></li><li class="topicref"><a href="topics/impala_debug_action.html">DEBUG_ACTION</a></li><li class="topicref"><a href="topics/impala_decimal_v2.html">DECIMAL_V2</a></li><li class="topicref"><a href="topics/impala_default_join_distribution_mode.html">DEFAULT_JOIN_DISTRIBUTION_MODE</a></li><li class="topicref"><a href="topics/impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE</a></li><li class="topicref">
<a href="topics/impala_disable_codegen.html">DISABLE_CODEGEN</a></li><li class="topicref"><a href="topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a></li><li class="topicref"><a href="topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS</a></li><li class="topicref"><a href="topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS</a></li><li class="topicref"><a href="topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD</a></li><li class="topicref"><a href="topics/impala_exec_time_limit_s.html">EXEC_TIME_LIMIT_S</a></li><li class="topicref"><a href="topics/impala_explain_level.html">EXPLAIN_LEVEL</a></li><li class="topicref"><a href="topics/impala_hbase_cache_blocks.html">HBASE_CACHE_BLOCKS</a></li><li class="topicref"><a href="topics/impala_hbase_caching.html">HBASE_CACHING</a></li><li class="topicref"><a href="topics/impala_live_progress.html">LIVE_PROGRESS</a></li><li class="t
opicref"><a href="topics/impala_live_summary.html">LIVE_SUMMARY</a></li><li class="topicref"><a href="topics/impala_max_errors.html">MAX_ERRORS</a></li><li class="topicref"><a href="topics/impala_max_row_size.html">MAX_ROW_SIZE</a></li><li class="topicref"><a href="topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a></li><li class="topicref"><a href="topics/impala_max_scan_range_length.html">MAX_SCAN_RANGE_LENGTH</a></li><li class="topicref"><a href="topics/impala_mem_limit.html">MEM_LIMIT</a></li><li class="topicref"><a href="topics/impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE</a></li><li class="topicref"><a href="topics/impala_mt_dop.html">MT_DOP</a></li><li class="topicref"><a href="topics/impala_num_nodes.html">NUM_NODES</a></li><li class="topicref"><a href="topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS</a></li><li class="topicref"><a href="topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS</a></li><
li class="topicref"><a href="topics/impala_parquet_compression_codec.html">PARQUET_COMPRESSION_CODEC</a></li><li class="topicref"><a href="topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8</a></li><li class="topicref"><a href="topics/impala_parquet_array_resolution.html">PARQUET_ARRAY_RESOLUTION</a></li><li class="topicref"><a href="topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION</a></li><li class="topicref"><a href="topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE</a></li><li class="topicref"><a href="topics/impala_prefetch_mode.html">PREFETCH_MODE</a></li><li class="topicref"><a href="topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S</a></li><li class="topicref"><a href="topics/impala_request_pool.html">REQUEST_POOL</a></li><li class="topicref"><a href="topics/impala_replica_preference.html">REPLICA_PREFERENCE</a></li><li class="topicref"><a href="topics/impala_runtime_bloom_filter_size.html">RUNTIME_
BLOOM_FILTER_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a></li><li class="topicref"><a href="topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a></li><li class="topicref"><a href="topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING</a></li><li class="topicref"><a href="topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA</a></li><li class="topicref"><a href="topics/impala_scratch_limit.html">SCRATCH_LIMIT</a></li><li class="topicref"><a href="topics/impala_shuffle_distinct_exprs.html">SHUFFLE_DISTINCT_EXPRS</a></li><li class="topicref"><a href="topics/impala_support_start_over.html">SUPPORT_START_OVER</a></li><li class="topicref"><a href="topics/impala_sync_dd
l.html">SYNC_DDL</a></li></ul></li></ul></li><li class="topicref"><a href="topics/impala_show.html">SHOW</a></li><li class="topicref"><a href="topics/impala_truncate_table.html">TRUNCATE TABLE</a></li><li class="topicref"><a href="topics/impala_update.html">UPDATE</a></li><li class="topicref"><a href="topics/impala_upsert.html">UPSERT</a></li><li class="topicref"><a href="topics/impala_use.html">USE</a></li><li class="topicref"><a href="topics/impala_hints.html">Optimizer Hints</a></li></ul></li><li class="topicref"><a href="topics/impala_functions.html">Built-In Functions</a><ul><li class="topicref"><a href="topics/impala_math_functions.html">Mathematical Functions</a></li><li class="topicref"><a href="topics/impala_bit_functions.html">Bit Functions</a></li><li class="topicref"><a href="topics/impala_conversion_functions.html">Type Conversion Functions</a></li><li class="topicref"><a href="topics/impala_datetime_functions.html">Date and Time Functions</a></li><li class="topicref"><
a href="topics/impala_conditional_functions.html">Conditional Functions</a></li><li class="topicref"><a href="topics/impala_string_functions.html">String Functions</a></li><li class="topicref"><a href="topics/impala_misc_functions.html">Miscellaneous Functions</a></li><li class="topicref"><a href="topics/impala_aggregate_functions.html">Aggregate Functions</a><ul><li class="topicref"><a href="topics/impala_appx_median.html">APPX_MEDIAN</a></li><li class="topicref"><a href="topics/impala_avg.html">AVG</a></li><li class="topicref"><a href="topics/impala_count.html">COUNT</a></li><li class="topicref"><a href="topics/impala_group_concat.html">GROUP_CONCAT</a></li><li class="topicref"><a href="topics/impala_max.html">MAX</a></li><li class="topicref"><a href="topics/impala_min.html">MIN</a></li><li class="topicref"><a href="topics/impala_ndv.html">NDV</a></li><li class="topicref"><a href="topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP</a></li><li class="topicref"><a href="topi
cs/impala_sum.html">SUM</a></li><li class="topicref"><a href="topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP</a></li></ul></li><li class="topicref"><a href="topics/impala_analytic_functions.html">Analytic Functions</a></li><li class="topicref"><a href="topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></li></ul></li><li class="topicref"><a href="topics/impala_langref_unsupported.html">SQL Differences Between Impala and Hive</a></li><li class="topicref"><a href="topics/impala_porting.html">Porting SQL</a></li></ul></li><li class="topicref"><a href="topics/impala_impala_shell.html">The Impala Shell</a><ul><li class="topicref"><a href="topics/impala_shell_options.html">Configuration Options</a></li><li class="topicref"><a href="topics/impala_connecting.html">Connecting to impalad</a></li><li class="topicref"><a href="topics/impala_shell_running_commands.html">Running Commands and SQL Statements</a></li><li class="topicref"><a href="t
opics/impala_shell_commands.html">Command Reference</a></li></ul></li><li class="topicref"><a href="topics/impala_performance.html">Performance Tuning</a><ul><li class="topicref"><a href="topics/impala_perf_cookbook.html">Performance Best Practices</a></li><li class="topicref"><a href="topics/impala_perf_joins.html">Join Performance</a></li><li class="topicref"><a href="topics/impala_perf_stats.html">Table and Column Statistics</a></li><li class="topicref"><a href="topics/impala_perf_benchmarking.html">Benchmarking</a></li><li class="topicref"><a href="topics/impala_perf_resources.html">Controlling Resource Usage</a></li><li class="topicref"><a href="topics/impala_runtime_filtering.html">Runtime Filtering</a></li><li class="topicref"><a href="topics/impala_perf_hdfs_caching.html">HDFS Caching</a></li><li class="topicref"><a href="topics/impala_perf_testing.html">Testing Impala Performance</a></li><li class="topicref"><a href="topics/impala_explain_plan.html">EXPLAIN Plans and Query
Profiles</a></li><li class="topicref"><a href="topics/impala_perf_skew.html">HDFS Block Skew</a></li></ul></li><li class="topicref"><a href="topics/impala_scalability.html">Scalability Considerations</a></li><li class="topicref"><a href="topics/impala_partitioning.html">Partitioning</a></li><li class="topicref"><a href="topics/impala_file_formats.html">File Formats</a><ul><li class="topicref"><a href="topics/impala_txtfile.html">Text Data Files</a></li><li class="topicref"><a href="topics/impala_parquet.html">Parquet Data Files</a></li><li class="topicref"><a href="topics/impala_avro.html">Avro Data Files</a></li><li class="topicref"><a href="topics/impala_rcfile.html">RCFile Data Files</a></li><li class="topicref"><a href="topics/impala_seqfile.html">SequenceFile Data Files</a></li></ul></li><li class="topicref"><a href="topics/impala_kudu.html">Using Impala to Query Kudu Tables</a></li><li class="topicref"><a href="topics/impala_hbase.html">HBase Tables</a></li><li class="topicref
"><a href="topics/impala_s3.html">S3 Tables</a></li><li class="topicref"><a href="topics/impala_adls.html">ADLS Tables</a></li><li class="topicref"><a href="topics/impala_isilon.html">Isilon Storage</a></li><li class="topicref"><a href="topics/impala_logging.html">Logging</a></li><li class="topicref"><a href="topics/impala_troubleshooting.html">Troubleshooting Impala</a><ul><li class="topicref"><a href="topics/impala_webui.html">Web User Interface</a></li><li class="topicref"><a href="topics/impala_breakpad.html">Breakpad Minidumps</a></li></ul></li><li class="topicref"><a href="topics/impala_ports.html">Ports Used by Impala</a></li><li class="topicref"><a href="topics/impala_reserved_words.html">Impala Reserved Words</a></li><li class="topicref"><a href="topics/impala_faq.html">Impala Frequently Asked Questions</a></li><li class="topicref"><a href="topics/impala_release_notes.html">Impala Release Notes</a><ul><li class="topicref"><a href="topics/impala_new_features.html">New Featur
es in Apache Impala</a></li><li class="topicref"><a href="topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala</a></li><li class="topicref"><a href="topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></li><li class="topicref"><a href="topics/impala_fixed_issues.html">Fixed Issues in Apache Impala</a></li></ul></li></ul></nav></body></html>
\ No newline at end of file
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_abort_on_error.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_abort_on_error.html b/docs/build3x/html/topics/impala_abort_on_error.html
new file mode 100644
index 0000000..6887375
--- /dev/null
+++ b/docs/build3x/html/topics/impala_abort_on_error.html
@@ -0,0 +1,42 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="abort_on_error"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ABORT_ON_ERROR Query Option</title></head><body id="abort_on_error"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ABORT_ON_ERROR Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When this option is enabled, Impala cancels a query immediately when any of the nodes encounters an error,
+ rather than continuing and possibly returning incomplete results. This option is disabled by default, to help
+ gather maximum diagnostic information when an error occurs, for example, whether the same problem occurred on
+ all nodes or only a single node. Currently, the errors that Impala can skip over involve data corruption,
+ such as a column that contains a string value where an integer value is expected.
+ </p>
+
+ <p class="p">
+ To control how much logging Impala does for non-fatal errors when <code class="ph codeph">ABORT_ON_ERROR</code> is turned
+ off, use the <code class="ph codeph">MAX_ERRORS</code> option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
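+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, to make a session fail fast at the first data conversion error instead of
+ continuing with possibly incomplete results (the table name below is only illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from possibly_corrupt_table;
+-- Restore the default behavior of skipping non-fatal errors.
+set abort_on_error=false;</code></pre>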
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_max_errors.html#max_errors">MAX_ERRORS Query Option</a>,
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
[28/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_insert.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_insert.html b/docs/build3x/html/topics/impala_insert.html
new file mode 100644
index 0000000..61044fb
--- /dev/null
+++ b/docs/build3x/html/topics/impala_insert.html
@@ -0,0 +1,911 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="insert"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INSERT Statement</title></head><body id="insert"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INSERT Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports inserting into tables and partitions that you create with the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement, or pre-defined tables and partitions created through Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>[<var class="keyword varname">with_clause</var>]
+ INSERT <span class="ph">[<var class="keyword varname">hint_clause</var>]</span> { INTO | OVERWRITE } [TABLE] <var class="keyword varname">table_name</var>
+ [(<var class="keyword varname">column_list</var>)]
+ [ PARTITION (<var class="keyword varname">partition_clause</var>)]
+{
+ [<var class="keyword varname">hint_clause</var>] <var class="keyword varname">select_statement</var>
+ | VALUES (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) [, (<var class="keyword varname">value</var> [, <var class="keyword varname">value</var> ...]) ...]
+}
+
+partition_clause ::= <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] [, <var class="keyword varname">col_name</var> [= <var class="keyword varname">constant</var>] ...]
+
+hint_clause ::=
+ <var class="keyword varname">hint_with_dashes</var> |
+ <var class="keyword varname">hint_with_cstyle_comments</var> |
+ <var class="keyword varname">hint_with_brackets</var>
+
+hint_with_dashes ::= -- +SHUFFLE | -- +NOSHUFFLE <span class="ph">-- +CLUSTERED</span>
+
+hint_with_cstyle_comments ::= /* +SHUFFLE */ | /* +NOSHUFFLE */ <span class="ph">| /* +CLUSTERED */</span>
+
+hint_with_brackets ::= [SHUFFLE] | [NOSHUFFLE]
+ (With this hint format, the square brackets are part of the syntax.)
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The square bracket style of hint is now deprecated and might be removed in
+ a future release. For that reason, any newly added hints are not available
+ with the square bracket syntax.
+ </div>
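+
+ <p class="p">
+ In the <code class="ph codeph">partition_clause</code>, a partition key column with a constant value
+ specifies a static partition, while a partition key column without a value takes its value
+ from the corresponding trailing column of the <code class="ph codeph">SELECT</code> list (dynamic
+ partitioning). For example, using illustrative table and column names:
+ </p>
+
+<pre class="pre codeblock"><code>-- Static partitioning: the partition key value is a constant.
+insert into sales partition (year=2018) select id, amount from staging_sales;
+
+-- Dynamic partitioning: each row's partition key value comes from the
+-- final column of the select list.
+insert into sales partition (year) select id, amount, year from staging_sales;</code></pre>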
+
+ <p class="p">
+ <strong class="ph b">Appending or replacing (INTO and OVERWRITE clauses):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT INTO</code> syntax appends data to a table. The existing data files are left as-is, and
+ the inserted data is put into one or more new data files.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT OVERWRITE</code> syntax replaces the data in a table.
+
+
+ Currently, the overwritten data files are deleted immediately; they do not go through the HDFS trash
+ mechanism.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement currently does not support writing data files
+ containing complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+ To prepare Parquet data for such tables, you generate the data files outside Impala and then
+ use <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE</code> to associate those
+ data files with the table. Currently, such tables must use the Parquet file format.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about working with complex types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, the <code class="ph codeph">INSERT OVERWRITE</code> syntax cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ Kudu tables require a unique primary key for each row. If an <code class="ph codeph">INSERT</code>
+ statement attempts to insert a row with the same values for the primary key columns
+ as an existing row, that row is discarded and the insert operation continues.
+ When rows are discarded due to duplicate primary keys, the statement finishes
+ with a warning, not an error. (This is a change from early releases of Kudu
+ where the default was to return an error in such cases, and the syntax
+ <code class="ph codeph">INSERT IGNORE</code> was required to make the statement succeed.
+ The <code class="ph codeph">IGNORE</code> clause is no longer part of the <code class="ph codeph">INSERT</code>
+ syntax.)
+ </p>
+
+ <p class="p">
+ For situations where you prefer to replace rows with duplicate primary key values,
+ rather than discarding the new data, you can use the <code class="ph codeph">UPSERT</code>
+ statement instead of <code class="ph codeph">INSERT</code>. <code class="ph codeph">UPSERT</code> inserts
+ rows that are entirely new, and for rows that match an existing primary key in the
+ table, the non-primary-key columns are updated to reflect the values in the
+ <span class="q">"upserted"</span> data.
+ </p>
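+
+ <p class="p">
+ For example, with a hypothetical Kudu table whose primary key is the
+ <code class="ph codeph">id</code> column:
+ </p>
+
+<pre class="pre codeblock"><code>insert into kudu_users values (1, 'alice'), (2, 'bob');
+-- Duplicate primary key 1: the row is discarded and the statement
+-- finishes with a warning rather than an error.
+insert into kudu_users values (1, 'alicia');
+-- UPSERT updates the non-primary-key columns of the existing row instead.
+upsert into kudu_users values (1, 'alicia');</code></pre>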
+
+ <p class="p">
+ If you really want to store new rows, not replace existing ones, but cannot do so
+ because of the primary key uniqueness constraint, consider recreating the table
+ with additional columns included in the primary key.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a> for more details about using Impala with Kudu.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Impala currently supports:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Copying data from another table using a <code class="ph codeph">SELECT</code> query. In Impala 1.2.1 and higher, you can
+ combine <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">INSERT</code> operations into a single step with the
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax, which bypasses the actual <code class="ph codeph">INSERT</code> keyword.
+ </li>
+
+ <li class="li">
+ An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+ <code class="ph codeph">INSERT</code> keyword, to define a subquery referenced in the <code class="ph codeph">SELECT</code> portion.
+ </li>
+
+ <li class="li">
+ Creating one or more new rows using constant expressions through the <code class="ph codeph">VALUES</code> clause. (The
+ <code class="ph codeph">VALUES</code> clause was added in Impala 1.0.1.)
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, the first column of each newly inserted row goes into the first column of the table, the
+ second column into the second column, and so on.
+ </p>
+ <p class="p">
+ You can also specify the columns to be inserted, an arbitrarily ordered subset of the columns in the
+ destination table, by specifying a column list immediately after the name of the destination table. This
+ feature lets you adjust the inserted columns to match the layout of a <code class="ph codeph">SELECT</code> statement,
+ rather than the other way around. (This feature was added in Impala 1.1.)
+ </p>
+ <p class="p">
+ The number of columns mentioned in the column list (known as the <span class="q">"column permutation"</span>) must match
+ the number of columns in the <code class="ph codeph">SELECT</code> list or the <code class="ph codeph">VALUES</code> tuples. The
+ order of columns in the column permutation can be different than in the underlying table, and the columns
+ of each input row are reordered to match. If the number of columns in the column permutation is less than
+ in the destination table, all unmentioned columns are set to <code class="ph codeph">NULL</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ An optional hint clause immediately before the <code class="ph codeph">SELECT</code> keyword or immediately after the
+ <code class="ph codeph">INSERT</code> keyword, to fine-tune the behavior when doing an <code class="ph codeph">INSERT ... SELECT</code>
+ operation into partitioned Parquet tables. The hint clause cannot be specified in multiple places.
+ The hint keywords are <code class="ph codeph">[SHUFFLE]</code> and <code class="ph codeph">[NOSHUFFLE]</code>, including the square brackets.
+ Inserting into partitioned Parquet tables can be a resource-intensive operation because it potentially
+ involves many files being written to HDFS simultaneously, and separate
+ <span class="ph">large</span> memory buffers being allocated to buffer the data for each
+ partition. For usage details, see <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a>.
+ </li>
+ </ul>
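+
+ <p class="p">
+ For example, the following equivalent statements (with illustrative table names) apply the
+ <code class="ph codeph">SHUFFLE</code> hint to an <code class="ph codeph">INSERT ... SELECT</code> into a
+ partitioned Parquet table. Specify the hint in only one of the two positions:
+ </p>
+
+<pre class="pre codeblock"><code>-- Hint immediately after the INSERT keyword.
+insert /* +SHUFFLE */ into sales partition (year) select id, amount, year from staging_sales;
+
+-- Hint immediately before the SELECT keyword.
+insert into sales partition (year) /* +SHUFFLE */ select id, amount, year from staging_sales;</code></pre>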
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <ul class="ul">
+ <li class="li">
+ Insert commands that partition or add files result in changes to Hive metadata. Because Impala uses Hive
+ metadata, such changes may necessitate a metadata refresh. For more information, see the
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> statement.
+ </li>
+
+ <li class="li">
+ Currently, Impala can only insert data into tables that use the text and Parquet formats. For other file
+ formats, insert the data using Hive and use Impala to query it.
+ </li>
+
+ <li class="li">
+ As an alternative to the <code class="ph codeph">INSERT</code> statement, if you have existing data files elsewhere in
+ HDFS, the <code class="ph codeph">LOAD DATA</code> statement can move those files into a table. This statement works
+ with tables of any file format.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ When you insert the results of an expression, particularly of a built-in function call, into a small numeric
+ column such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">TINYINT</code>, or
+ <code class="ph codeph">FLOAT</code>, you might need to use a <code class="ph codeph">CAST()</code> expression to coerce values into the
+ appropriate type. Impala does not automatically convert from a larger type to a smaller one. For example, to
+ insert cosine values into a <code class="ph codeph">FLOAT</code> column, write <code class="ph codeph">CAST(COS(angle) AS FLOAT)</code>
+ in the <code class="ph codeph">INSERT</code> statement to make the conversion explicit.
+ </p>
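+
+ <p class="p">
+ For example, because <code class="ph codeph">COS()</code> returns <code class="ph codeph">DOUBLE</code>, storing its
+ result in a <code class="ph codeph">FLOAT</code> column requires an explicit cast (the source table name
+ is illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>create table angles (angle double, cosine float);
+-- Without the CAST, the DOUBLE-to-FLOAT narrowing is not done implicitly.
+insert into angles select angle, cast(cos(angle) as float) from source_angles;</code></pre>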
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala can read certain file formats that it cannot write,
+ the <code class="ph codeph">INSERT</code> statement does not work for all kinds of
+ Impala tables. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>
+ for details about what file formats are supported by the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+ <p class="p">
+ Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+ to write one block. Because Parquet data files use a block size of 1 GB by default, an
+ <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+ space.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
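+
+ <p class="p">
+ For example, using illustrative table names:
+ </p>
+
+<pre class="pre codeblock"><code>insert into sales select * from staging_sales;
+-- Recompute table and column statistics so the planner accounts for the new data.
+compute stats sales;</code></pre>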
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example sets up new tables with the same definition as the <code class="ph codeph">TAB1</code> table from the
+ <a class="xref" href="impala_tutorial.html#tutorial">Tutorial</a> section, using different file
+ formats, and demonstrates inserting data into the tables created with the <code class="ph codeph">STORED AS TEXTFILE</code>
+ and <code class="ph codeph">STORED AS PARQUET</code> clauses:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE DATABASE IF NOT EXISTS file_formats;
+USE file_formats;
+
+DROP TABLE IF EXISTS text_table;
+CREATE TABLE text_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS TEXTFILE;
+
+DROP TABLE IF EXISTS parquet_table;
+CREATE TABLE parquet_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS PARQUET;</code></pre>
+
+ <p class="p">
+ With the <code class="ph codeph">INSERT INTO TABLE</code> syntax, each new set of inserted rows is appended to any existing
+ data in the table. This is how you would record small amounts of data that arrive continuously, or ingest new
+ batches of data alongside the existing data. For example, after running two <code class="ph codeph">INSERT INTO TABLE</code>
+ statements with 5 rows each, the table contains 10 rows total:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.41s
+
+[localhost:21000] > insert into table text_table select * from default.tab1;
+Inserted 5 rows in 0.46s
+
+[localhost:21000] > select count(*) from text_table;
++----------+
+| count(*) |
++----------+
+| 10 |
++----------+
+Returned 1 row(s) in 0.26s</code></pre>
+
+ <p class="p">
+ With the <code class="ph codeph">INSERT OVERWRITE TABLE</code> syntax, each new set of inserted rows replaces any existing
+ data in the table. This is how you load data to query in a data warehousing scenario where you analyze just
+ the data for a particular day, quarter, and so on, discarding the previous data each time. You might keep the
+ entire set of data in one raw table, and transfer and transform certain rows into a more compact and
+ efficient form to perform intensive analysis on that subset.
+ </p>
+
+ <p class="p">
+ For example, here we insert 5 rows into a table using the <code class="ph codeph">INSERT INTO</code> clause, then replace
+ the data by inserting 3 rows with the <code class="ph codeph">INSERT OVERWRITE</code> clause. Afterward, the table only
+ contains the 3 rows from the final <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into table parquet_table select * from default.tab1;
+Inserted 5 rows in 0.35s
+
+[localhost:21000] > insert overwrite table parquet_table select * from default.tab1 limit 3;
+Inserted 3 rows in 0.43s
+[localhost:21000] > select count(*) from parquet_table;
++----------+
+| count(*) |
++----------+
+| 3 |
++----------+
+Returned 1 row(s) in 0.43s</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_insert.html#values">VALUES</a></code> clause lets you insert one or more
+ rows by specifying constant values for all the columns. The number, types, and order of the expressions must
+ match the table definition.
+ </p>
+
+ <div class="note note note_note" id="insert__insert_values_warning"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+ HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+ data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+ syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do not
+ run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row each
+ time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as one
+ stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code> clause,
+ and use a separate database to make cleanup easier if the operation does produce many tiny files.
+ </div>
+
+ <p class="p">
+ The following example shows how to insert one row or multiple rows, with expressions of different types,
+ using literal values, expressions, and function return values:
+ </p>
+
+<pre class="pre codeblock"><code>create table val_test_1 (c1 int, c2 float, c3 string, c4 boolean, c5 timestamp);
+insert into val_test_1 values (100, 99.9/10, 'abc', true, now());
+create table val_test_2 (id int, token string);
+insert overwrite val_test_2 values (1, 'a'), (2, 'b'), (-1,'xyzzy');</code></pre>
+
+ <p class="p">
+ These examples show the type of <span class="q">"not implemented"</span> error that you see when attempting to insert data into
+ a table with a file format that Impala currently does not write to:
+ </p>
+
+<pre class="pre codeblock"><code>DROP TABLE IF EXISTS sequence_table;
+CREATE TABLE sequence_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS SEQUENCEFILE;
+
+DROP TABLE IF EXISTS rc_table;
+CREATE TABLE rc_table
+( id INT, col_1 BOOLEAN, col_2 DOUBLE, col_3 TIMESTAMP )
+STORED AS RCFILE;
+
+[localhost:21000] > insert into table rc_table select * from default.tab1;
+Remote error
+Backend 0:RC_FILE not implemented.
+
+[localhost:21000] > insert into table sequence_table select * from default.tab1;
+Remote error
+Backend 0:SEQUENCE_FILE not implemented. </code></pre>
+
+ <p class="p">
+ The following examples show how you can copy the data in all the columns from one table to another, copy the
+ data from only some columns, or specify the columns in the select list in a different order than they
+ actually appear in the table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Start with 2 identical tables.
+create table t1 (c1 int, c2 int);
+create table t2 like t1;
+
+-- If there is no () part after the destination table name,
+-- all columns must be specified, either as * or by name.
+insert into t2 select * from t1;
+insert into t2 select c1, c2 from t1;
+
+-- With the () notation following the destination table name,
+-- you can omit columns (all values for that column are NULL
+-- in the destination table), and/or reorder the values
+-- selected from the source table. This is the "column permutation" feature.
+insert into t2 (c1) select c1 from t1;
+insert into t2 (c2, c1) select c1, c2 from t1;
+
+-- The column names can be entirely different in the source and destination tables.
+-- You can copy any columns, not just the corresponding ones, from the source table.
+-- But the number and type of selected columns must match the columns mentioned in the () part.
+alter table t2 replace columns (x int, y int);
+insert into t2 (y) select c1 from t1;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Concurrency considerations:</strong> Each <code class="ph codeph">INSERT</code> operation creates new data files with unique
+ names, so you can run multiple <code class="ph codeph">INSERT INTO</code> statements simultaneously without filename
+ conflicts.
+
+ While data is being inserted into an Impala table, the data is staged temporarily in a subdirectory inside
+ the data directory; during this period, you cannot issue queries against that table in Hive. If an
+ <code class="ph codeph">INSERT</code> operation fails, the temporary data file and the subdirectory could be left behind in
+ the data directory. If so, remove the relevant subdirectory and any data files it contains manually, by
+ issuing an <code class="ph codeph">hdfs dfs -rm -r</code> command, specifying the full path of the work subdirectory, whose
+ name ends in <code class="ph codeph">_dir</code>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="insert__values">
+
+ <h2 class="title topictitle2" id="ariaid-title2">VALUES Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">VALUES</code> clause is a general-purpose way to specify the columns of one or more rows,
+ typically within an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">INSERT ... VALUES</code> technique is not suitable for loading large quantities of data into
+ HDFS-based tables, because the insert operations cannot be parallelized, and each one produces a separate
+ data file. Use it for setting up small dimension tables or tiny amounts of data for experimenting with SQL
+ syntax, or with HBase tables. Do not use it for large ETL jobs or benchmark tests for load operations. Do
+ not run scripts with thousands of <code class="ph codeph">INSERT ... VALUES</code> statements that insert a single row
+ each time. If you do run <code class="ph codeph">INSERT ... VALUES</code> operations to load data into a staging table as
+ one stage in an ETL pipeline, include multiple row values if possible within each <code class="ph codeph">VALUES</code>
+ clause, and use a separate database to make cleanup easier if the operation does produce many tiny files.
+ </div>
+
+ <p class="p">
+ The following examples illustrate:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ How to insert a single row using a <code class="ph codeph">VALUES</code> clause.
+ </li>
+
+ <li class="li">
+ How to insert multiple rows using a <code class="ph codeph">VALUES</code> clause.
+ </li>
+
+ <li class="li">
+ How the row or rows from a <code class="ph codeph">VALUES</code> clause can be appended to a table through
+ <code class="ph codeph">INSERT INTO</code>, or replace the contents of the table through <code class="ph codeph">INSERT
+ OVERWRITE</code>.
+ </li>
+
+ <li class="li">
+ How the entries in a <code class="ph codeph">VALUES</code> clause can be literals, function results, or any other kind
+ of expression. See <a class="xref" href="impala_literals.html#literals">Literals</a> for the notation to use for literal
+ values, especially <a class="xref" href="impala_literals.html#string_literals">String Literals</a> for quoting and escaping
+ conventions for strings. See <a class="xref" href="impala_operators.html#operators">SQL Operators</a> and
+ <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a> for other things you can include in expressions with the
+ <code class="ph codeph">VALUES</code> clause.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] > describe val_example;
+Query: describe val_example
+Query finished, fetching results ...
++-------+---------+---------+
+| name | type | comment |
++-------+---------+---------+
+| id | int | |
+| col_1 | boolean | |
+| col_2 | double | |
++-------+---------+---------+
+
+[localhost:21000] > insert into val_example values (1,true,100.0);
+Inserted 1 rows in 0.30s
+[localhost:21000] > select * from val_example;
++----+-------+-------+
+| id | col_1 | col_2 |
++----+-------+-------+
+| 1 | true | 100 |
++----+-------+-------+
+
+[localhost:21000] > insert overwrite val_example values (10,false,pow(2,5)), (50,true,10/3);
+Inserted 2 rows in 0.16s
+[localhost:21000] > select * from val_example;
++----+-------+-------------------+
+| id | col_1 | col_2 |
++----+-------+-------------------+
+| 10 | false | 32 |
+| 50 | true | 3.333333333333333 |
++----+-------+-------------------+</code></pre>
+
+ <p class="p">
+ When used in an <code class="ph codeph">INSERT</code> statement, the Impala <code class="ph codeph">VALUES</code> clause can specify
+ some or all of the columns in the destination table, and the columns can be specified in a different order
+ than they actually appear in the table. To specify a different set or order of columns than in the table,
+ use the syntax:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO <var class="keyword varname">destination</var>
+ (<var class="keyword varname">col_x</var>, <var class="keyword varname">col_y</var>, <var class="keyword varname">col_z</var>)
+ VALUES
+ (<var class="keyword varname">val_x</var>, <var class="keyword varname">val_y</var>, <var class="keyword varname">val_z</var>);
+</code></pre>
+
+ <p class="p">
+ Any columns in the table that are not listed in the <code class="ph codeph">INSERT</code> statement are set to
+ <code class="ph codeph">NULL</code>.
+ </p>
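+
+ <p class="p">
+ For example, with the <code class="ph codeph">val_example</code> table shown earlier, an
+ <code class="ph codeph">INSERT</code> that lists only the <code class="ph codeph">id</code> column leaves
+ <code class="ph codeph">col_1</code> and <code class="ph codeph">col_2</code> set to <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO val_example (id) VALUES (2);
+-- The new row is stored as (2, NULL, NULL).</code></pre>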
+
+
+
+ <p class="p">
+ To use a <code class="ph codeph">VALUES</code> clause like a table in other statements, wrap it in parentheses and use
+ <code class="ph codeph">AS</code> clauses to specify aliases for the entire object and any columns you need to refer to:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select * from (values(4,5,6),(7,8,9)) as t;
++---+---+---+
+| 4 | 5 | 6 |
++---+---+---+
+| 4 | 5 | 6 |
+| 7 | 8 | 9 |
++---+---+---+
+[localhost:21000] > select * from (values(1 as c1, true as c2, 'abc' as c3),(100,false,'xyz')) as t;
++-----+-------+-----+
+| c1 | c2 | c3 |
++-----+-------+-----+
+| 1 | true | abc |
+| 100 | false | xyz |
++-----+-------+-----+</code></pre>
+
+ <p class="p">
+ For example, you might use a tiny table constructed like this from constant literals or function return
+ values as part of a longer statement involving joins or <code class="ph codeph">UNION ALL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ Impala physically writes all inserted files under the ownership of its default user, typically
+ <code class="ph codeph">impala</code>. Therefore, this user must have HDFS write permission in the corresponding table
+ directory.
+ </p>
+
+ <p class="p">
+ The permission requirement is independent of the authorization performed by the Sentry framework. (If the
+ connected user is not authorized to insert into a table, Sentry blocks that operation immediately,
+ regardless of the privileges available to the <code class="ph codeph">impala</code> user.) Files created by Impala are
+ not owned by and do not inherit permissions from the connected user.
+ </p>
+
+ <p class="p">
+ The number of data files produced by an <code class="ph codeph">INSERT</code> statement depends on the size of the
+ cluster, the number of data blocks that are processed, the partition key columns in a partitioned table,
+ and the mechanism Impala uses for dividing the work in parallel. Do not assume that an
+ <code class="ph codeph">INSERT</code> statement will produce some particular number of output files. In case of
+ performance issues with data written by Impala, check that the output files do not suffer from issues such
+ as many tiny files or many tiny partitions. (In the Hadoop context, even files or partitions of a few tens
+ of megabytes are considered <span class="q">"tiny"</span>.)
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+ directory of the table. Formerly, this hidden work directory was named
+ <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+ <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+ with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+ supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+ adjust them to use the new name.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ You can use the <code class="ph codeph">INSERT</code> statement with HBase tables as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can insert a single row or a small set of rows into an HBase table with the <code class="ph codeph">INSERT ...
+ VALUES</code> syntax. This is a good use case for HBase tables with Impala, because HBase tables are
+ not subject to the same kind of fragmentation from many small insert operations as HDFS tables are.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can insert any number of rows at once into an HBase table using the <code class="ph codeph">INSERT ...
+ SELECT</code> syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If more than one inserted row has the same value for the HBase key column, only the last inserted row
+ with that value is visible to Impala queries. You can take advantage of this fact with <code class="ph codeph">INSERT
+ ... VALUES</code> statements to effectively update rows one at a time, by inserting new rows with the
+ same key values as existing rows. Be aware that after an <code class="ph codeph">INSERT ... SELECT</code> operation
+ copying from an HDFS table, the HBase table might contain fewer rows than were inserted, if the key
+ column in the source table contained duplicate values.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot <code class="ph codeph">INSERT OVERWRITE</code> into an HBase table. New rows are always appended.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ When you create an Impala or Hive table that maps to an HBase table, the column order you specify with
+ the <code class="ph codeph">INSERT</code> statement might be different than the order you declare with the
+ <code class="ph codeph">CREATE TABLE</code> statement. Behind the scenes, HBase arranges the columns based on how
+ they are divided into column families. This might cause a mismatch during insert operations, especially
+ if you use the syntax <code class="ph codeph">INSERT INTO <var class="keyword varname">hbase_table</var> SELECT * FROM
+ <var class="keyword varname">hdfs_table</var></code>. Before inserting data, verify the column order by issuing a
+ <code class="ph codeph">DESCRIBE</code> statement for the table, and adjust the order of the select list in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+ </li>
+ </ul>
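+
+ <p class="p">
+ As a sketch of the update-one-row-at-a-time technique described above (the table name
+ <code class="ph codeph">hbase_t</code> and its columns are hypothetical), inserting a second row with the
+ same key value effectively replaces the first:
+ </p>
+
+<pre class="pre codeblock"><code>INSERT INTO hbase_t VALUES (1, 'original value');
+-- A later insert with the same key column value supersedes the earlier row.
+INSERT INTO hbase_t VALUES (1, 'updated value');</code></pre>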
+
+ <p class="p">
+ See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for more details about using Impala with HBase.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
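+ <p class="p">
+ For example (the bucket and table names here are hypothetical), after copying files into the S3
+ location with S3 tools rather than Impala DML, refresh the table before querying:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE s3_example (id INT, s STRING)
+ LOCATION 's3a://example-bucket/path/to/table/';
+-- After an out-of-band upload to the same S3 location:
+REFRESH s3_example;</code></pre>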
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">ADLS considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
+ <p class="p">See <a class="xref" href="impala_adls.html#adls">Using Impala with the Azure Data Lake Store (ADLS)</a> for details about reading and writing ADLS data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for the files in the source directory of an <code class="ph codeph">INSERT ... SELECT</code>
+ operation, and write permission for all affected directories in the destination table.
+ (An <code class="ph codeph">INSERT</code> operation could write files to multiple different HDFS directories
+ if the destination table is partitioned.)
+ This user must also have write permission to create a temporary work directory
+ in the top-level HDFS directory of the destination table.
+ An <code class="ph codeph">INSERT OVERWRITE</code> operation does not require write permission on
+ the original data files in the table, only on the table directories themselves.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ For <code class="ph codeph">INSERT</code> operations into <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> columns, you
+ must cast all <code class="ph codeph">STRING</code> literals or expressions returning <code class="ph codeph">STRING</code> to a
+ <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type with the appropriate length.
+ </p>
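+
+ <p class="p">
+ For example, with a hypothetical <code class="ph codeph">VARCHAR</code> column, the string literal must be
+ cast explicitly:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE vc_example (c VARCHAR(10));
+INSERT INTO vc_example VALUES (CAST('abc' AS VARCHAR(10)));</code></pre>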
+
+ <p class="p">
+ <strong class="ph b">Related startup options:</strong>
+ </p>
+
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </div>
+ </article>
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="insert__partition_insert">
+ <h2 class="title topictitle2" id="ariaid-title3">Inserting Into Partitioned Tables with PARTITION Clause</h2>
+ <div class="body conbody">
+ <p class="p">
+ For a partitioned table, the optional <code class="ph codeph">PARTITION</code> clause
+ identifies which partition or partitions the values are inserted
+ into.
+ </p>
+ <p class="p">
+ All examples in this section use the following table:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE t1 (w INT) PARTITIONED BY (x INT, y STRING);</code></pre>
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="partition_insert__static_partition_insert">
+ <h3 class="title topictitle3" id="ariaid-title4">Static Partition Inserts</h3>
+ <div class="body conbody">
+ <p class="p">
+ In a static partition insert where a partition key column is given a
+ constant value, such as <code class="ph codeph">PARTITION</code>
+ <code class="ph codeph">(year=2012, month=2)</code>, the rows are inserted with the
+ same values specified for those partition key columns.
+ </p>
+ <p class="p">
+ The number of columns in the <code class="ph codeph">SELECT</code> list must equal
+ the number of columns in the column permutation.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause must be used for static
+ partitioning inserts.
+ </p>
+ <p class="p">
+ Example:
+ </p>
+ <div class="p">
+ The following statement inserts the
+ <code class="ph codeph">some_other_table.c1</code> values into the
+ <code class="ph codeph">w</code> column, and every inserted row has the
+ same <code class="ph codeph">x</code> value of <code class="ph codeph">10</code> and the same
+ <code class="ph codeph">y</code> value of
+ <code class="ph codeph">'a'</code>.<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=10, y='a')
+ SELECT c1 FROM some_other_table;</code></pre>
+ </div>
+ </div>
+ </article>
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="partition_insert__dynamic_partition_insert">
+ <h3 class="title topictitle3" id="ariaid-title5">Dynamic Partition Inserts</h3>
+ <div class="body conbody">
+ <p class="p">
+ In a dynamic partition insert where a partition key
+ column is in the <code class="ph codeph">INSERT</code> statement but not assigned a
+ value, such as in <code class="ph codeph">PARTITION (year, region)</code> (both
+ columns unassigned) or <code class="ph codeph">PARTITION(year, region='CA')</code>
+ (<code class="ph codeph">year</code> column unassigned), the unassigned columns
+ are filled in with the final columns of the <code class="ph codeph">SELECT</code> or
+ <code class="ph codeph">VALUES</code> clause. In this case, the number of columns
+ in the <code class="ph codeph">SELECT</code> list must equal the number of columns
+ in the column permutation plus the number of partition key columns not
+ assigned a constant value.
+ </p>
+ <p class="p">
+ See <a class="xref" href="https://www.cloudera.com/documentation/enterprise/latest/topics/impala_partitioning.html#partition_static_dynamic" target="_blank"><u class="ph u">Static and Dynamic Partitioning
+ Clauses</u></a> for examples and performance characteristics
+ of static and dynamic partitioned inserts.
+ </p>
+ <p class="p">
+ The following rules apply to dynamic partition
+ inserts.
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The columns are bound in the order they appear in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+ <p class="p">
+ The table below shows the values inserted with the
+ <code class="ph codeph">INSERT</code> statements of different column
+ orders.
+ </p>
+ </li>
+ </ul>
+ <table class="table frame-all" id="dynamic_partition_insert__table_vyx_dp3_ldb"><caption></caption><colgroup><col><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">w</code> Value</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">x</code> Value</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">Column <code class="ph codeph">y</code> Value</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">INSERT INTO t1 (w, x, y) VALUES (1, 2,
+ 'c');</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">1</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">2</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">'c'</code></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">INSERT INTO t1 (x,w) PARTITION (y) VALUES (1,
+ 2, 'c');</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">2</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">1</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">'c'</code></td>
+ </tr>
+ </tbody></table>
+ <ul class="ul">
+ <li class="li">
+ When a partition clause is specified but the non-partition
+ columns are not specified in the <code class="ph codeph">INSERT</code> statement,
+ as in the first example below, the non-partition columns are treated
+ as though they had been specified before the
+ <code class="ph codeph">PARTITION</code> clause in the SQL.
+ <p class="p">
+ Example: The following
+ three statements are equivalent, inserting <code class="ph codeph">1</code> into
+ <code class="ph codeph">w</code>, <code class="ph codeph">2</code> into <code class="ph codeph">x</code>,
+ and <code class="ph codeph">'c'</code> into <code class="ph codeph">y</code>.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, 'c');
+INSERT INTO t1 (w) PARTITION (x, y) VALUES (1, 2, 'c');
+INSERT INTO t1 PARTITION (x, y='c') VALUES (1, 2);</code></pre>
+ </li>
+ <li class="li">
+ The <code class="ph codeph">PARTITION</code> clause is not required for
+ dynamic partition inserts, but all the partition columns must be explicitly
+ present in the <code class="ph codeph">INSERT</code> statement in the column list
+ or in the <code class="ph codeph">PARTITION</code> clause. The partition columns
+ cannot be defaulted to <code class="ph codeph">NULL</code>.
+ <p class="p">
+ Example:
+ </p>
+ <p class="p">The following statements are valid because the partition
+ columns, <code class="ph codeph">x</code> and <code class="ph codeph">y</code>, are present in
+ the <code class="ph codeph">INSERT</code> statements, either in the
+ <code class="ph codeph">PARTITION</code> clause or in the column
+ list.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x,y) VALUES (1, 2, 'c');
+INSERT INTO t1 (w, x) PARTITION (y) VALUES (1, 2, 'c');</code></pre>
+ <p class="p">
+ The following statement is not valid for the partitioned table as
+ defined above because the partition columns, <code class="ph codeph">x</code>
+ and <code class="ph codeph">y</code>, are not present in the
+ <code class="ph codeph">INSERT</code> statement.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 VALUES (1, 2, 'c');</code></pre>
+ </li>
+ <li class="li">
+ If partition columns do not exist in the source table, you can
+ specify a constant value for those columns in the
+ <code class="ph codeph">PARTITION</code> clause.
+ <p class="p">
+ Example: The <code class="ph codeph">source</code> table contains only the columns
+ <code class="ph codeph">w</code> and <code class="ph codeph">y</code>. The value,
+ <code class="ph codeph">20</code>, specified in the <code class="ph codeph">PARTITION</code>
+ clause, is inserted into the <code class="ph codeph">x</code> column.
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO t1 PARTITION (x=20, y) SELECT * FROM source;</code></pre>
+ </li>
+ </ul>
+ </div>
+ </article>
+ </article>
+ </article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_install.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_install.html b/docs/build3x/html/topics/impala_install.html
new file mode 100644
index 0000000..9071134
--- /dev/null
+++ b/docs/build3x/html/topics/impala_install.html
@@ -0,0 +1,126 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installing Impala</title></head><body id="install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Installing Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ Impala is an open-source analytic database for Apache Hadoop
+ that returns rapid responses to queries.
+ </p>
+
+ <p class="p">
+ Follow these steps to set up Impala on a cluster by building from source:
+ </p>
+
+
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Download the latest release. See
+ <a class="xref" href="http://impala.apache.org/downloads.html" target="_blank">the Impala downloads page</a>
+ for the link to the latest release.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Check the <span class="ph filepath">README.md</span> file for a pointer
+ to the build instructions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Verify the MD5 and SHA1 checksums and the GPG signature, the latter by using the code signing keys of the release managers.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ Developers interested in working on Impala can clone the Impala source repository:
+<pre class="pre codeblock"><code>
+git clone https://git-wip-us.apache.org/repos/asf/impala.git
+</code></pre>
+ </div>
+ </li>
+ </ul>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="install__install_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">What is Included in an Impala Installation</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala is made up of a set of components that can be installed on multiple nodes throughout your cluster.
+ The key installation step for performance is to install the <span class="keyword cmdname">impalad</span> daemon (which does
+ most of the query processing work) on <em class="ph i">all</em> DataNodes in the cluster.
+ </p>
+
+ <p class="p">
+ Impala primarily consists of these executables, which should be available after you build from source:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impalad</span> - The Impala daemon. Plans and executes queries against HDFS, HBase, <span class="ph">and Amazon S3 data</span>.
+ <a class="xref" href="impala_processes.html#processes">Run one impalad process</a> on each node in the cluster
+ that has a DataNode.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">statestored</span> - Name service that tracks location and status of all
+ <code class="ph codeph">impalad</code> instances in the cluster. <a class="xref" href="impala_processes.html#processes">Run one
+ instance of this daemon</a> on a node in your cluster. Most production deployments run this daemon
+ on the NameNode.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">catalogd</span> - Metadata coordination service that broadcasts changes from Impala DDL and
+ DML statements to all affected Impala nodes, so that new tables, newly loaded data, and so on are
+ immediately visible to queries submitted through any Impala node.
+
+ (Prior to Impala 1.2, you had to run the <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+ METADATA</code> statement on each node to synchronize changed metadata. Now those statements are only
+ required if you perform the DDL or DML through an external mechanism such as Hive <span class="ph">or by uploading
+ data to the Amazon S3 filesystem</span>.)
+ <a class="xref" href="impala_processes.html#processes">Run one instance of this daemon</a> on a node in your cluster,
+ preferably on the same host as the <code class="ph codeph">statestored</code> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> - <a class="xref" href="impala_impala_shell.html#impala_shell">Command-line
+ interface</a> for issuing queries to the Impala daemon. You install this on one or more hosts
+ anywhere on your network, not necessarily DataNodes or even within the same cluster as Impala. It can
+ connect remotely to any instance of the Impala daemon.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ Before you start working with Impala, ensure that you have all necessary prerequisites. See
+ <a class="xref" href="impala_prereqs.html#prereqs">Impala Requirements</a> for details.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_int.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_int.html b/docs/build3x/html/topics/impala_int.html
new file mode 100644
index 0000000..44f4ee1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_int.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="int"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INT Data Type</title></head><body id="int"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 4-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> INT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -2147483648 .. 2147483647. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">BIGINT</code>) or a
+ floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>). Use
+ <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
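+
+ <p class="p">
+ For example, these illustrative queries show an explicit narrowing conversion and an
+ integer-to-<code class="ph codeph">TIMESTAMP</code> conversion under the default UTC behavior:
+ </p>
+
+<pre class="pre codeblock"><code>-- Narrowing an INT to a smaller integer type requires an explicit CAST().
+SELECT CAST(1000 AS SMALLINT);
+
+-- 0 seconds past the epoch: 1970-01-01 00:00:00, in UTC by default.
+SELECT CAST(0 AS TIMESTAMP);
+</code></pre>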
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The data type <code class="ph codeph">INTEGER</code> is an alias for <code class="ph codeph">INT</code>.
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">INT</code> type, call the functions
+ <code class="ph codeph">MIN_INT()</code> and <code class="ph codeph">MAX_INT()</code>.
+ </p>
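+
+ <p class="p">
+ For example, this query confirms the bounds listed above:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_INT(), MAX_INT();  -- returns -2147483648 and 2147483647
+</code></pre>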
+
+ <p class="p">
+ If an integer value is too large to be represented as an <code class="ph codeph">INT</code>, use a <code class="ph codeph">BIGINT</code>
+ instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
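+
+ <p class="p">
+ For example, this illustrative cast of a non-numeric string yields <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST('not_a_number' AS INT);  -- returns NULL
+</code></pre>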
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+SELECT CAST(1000 AS INT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_intro.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_intro.html b/docs/build3x/html/topics/impala_intro.html
new file mode 100644
index 0000000..99c24b3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_intro.html
@@ -0,0 +1,198 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Introducing Apache Impala</title></head><body id="intro"><main role="main"><article role="article" aria-labelledby="intro__impala">
+
+ <h1 class="title topictitle1" id="intro__impala"><span class="ph">Introducing Apache Impala</span></h1>
+
+
+ <div class="body conbody" id="intro__intro_body">
+
+ <p class="p">
+ Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS,
+ HBase, <span class="ph">or the Amazon Simple Storage Service (S3)</span>.
+ In addition to using the same unified storage platform,
+ Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface
+ (Impala query UI in Hue) as Apache Hive. This
+ provides a familiar and unified platform for real-time or batch-oriented queries.
+ </p>
+
+ <p class="p">
+ Impala is an addition to tools available for querying big data. Impala does not replace the batch
+ processing frameworks built on MapReduce such as Hive. Hive and other frameworks built on MapReduce are
+ best suited for long-running batch jobs, such as Extract, Transform,
+ and Load (ETL) workloads.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala graduated from the Apache Incubator on November 15, 2017.
+ In places where the documentation formerly referred to <span class="q">"Cloudera Impala"</span>,
+ now the official name is <span class="q">"Apache Impala"</span>.
+ </div>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro__benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impala Benefits</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Impala provides:
+
+ <ul class="ul">
+ <li class="li">
+ Familiar SQL interface that data scientists and analysts already know.
+ </li>
+
+ <li class="li">
+ Ability to query high volumes of data (<span class="q">"big data"</span>) in Apache Hadoop.
+ </li>
+
+ <li class="li">
+ Distributed queries in a cluster environment, for convenient scaling and to make use of cost-effective
+ commodity hardware.
+ </li>
+
+ <li class="li">
+ Ability to share data files between different components with no copy or export/import step; for example,
+ to write with Pig, transform with Hive, and query with Impala. Impala can read from and write to Hive
+ tables, enabling simple data interchange using Impala for analytics on Hive-produced data.
+ </li>
+
+ <li class="li">
+ Single system for big data processing and analytics, so customers can avoid costly modeling and ETL just
+ for analytics.
+ </li>
+ </ul>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro__impala_hadoop">
+
+ <h2 class="title topictitle2" id="ariaid-title3">How Impala Works with <span class="keyword">Apache Hadoop</span></h2>
+
+
+ <div class="body conbody">
+
+
+
+ <div class="p">
+ The Impala solution is composed of the following components:
+ <ul class="ul">
+ <li class="li">
+ Clients - Entities including Hue, ODBC clients, JDBC clients, and the Impala Shell can all interact
+ with Impala. These interfaces are typically used to issue queries or complete administrative tasks such
+ as connecting to Impala.
+ </li>
+
+ <li class="li">
+ Hive Metastore - Stores information about the data available to Impala. For example, the metastore lets
+ Impala know what databases are available and what the structure of those databases is. As you create,
+ drop, and alter schema objects, load data into tables, and so on through Impala SQL statements, the
+ relevant metadata changes are automatically broadcast to all Impala nodes by the dedicated catalog
+ service introduced in Impala 1.2.
+ </li>
+
+ <li class="li">
+ Impala - This process, which runs on DataNodes, coordinates and executes queries. Each
+ instance of Impala can receive, plan, and coordinate queries from Impala clients. Queries are
+ distributed among Impala nodes, and these nodes then act as workers, executing parallel query
+ fragments.
+ </li>
+
+ <li class="li">
+ HBase and HDFS - Storage for data to be queried.
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Queries executed using Impala are handled as follows:
+ <ol class="ol">
+ <li class="li">
+ User applications send SQL queries to Impala through ODBC or JDBC, which provide standardized querying
+ interfaces. The user application may connect to any <code class="ph codeph">impalad</code> in the cluster. This
+ <code class="ph codeph">impalad</code> becomes the coordinator for the query.
+ </li>
+
+ <li class="li">
+ Impala parses the query and analyzes it to determine what tasks need to be performed by
+ <code class="ph codeph">impalad</code> instances across the cluster. Execution is planned for optimal efficiency.
+ </li>
+
+ <li class="li">
+ Services such as HDFS and HBase are accessed by local <code class="ph codeph">impalad</code> instances to provide
+ data.
+ </li>
+
+ <li class="li">
+ Each <code class="ph codeph">impalad</code> returns data to the coordinating <code class="ph codeph">impalad</code>, which sends
+ these results to the client.
+ </li>
+ </ol>
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro__features">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Primary Impala Features</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Impala provides support for:
+ <ul class="ul">
+ <li class="li">
+ Most common SQL-92 features of Hive Query Language (HiveQL) including
+ <a class="xref" href="../shared/../topics/impala_select.html#select">SELECT</a>,
+ <a class="xref" href="../shared/../topics/impala_joins.html#joins">joins</a>, and
+ <a class="xref" href="../shared/../topics/impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>.
+ </li>
+
+ <li class="li">
+ HDFS, HBase, <span class="ph">and Amazon Simple Storage Service (S3)</span> storage, including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_file_formats.html#file_formats">HDFS file formats</a>: delimited text files, Parquet,
+ Avro, SequenceFile, and RCFile.
+ </li>
+
+ <li class="li">
+ Compression codecs: Snappy, GZIP, Deflate, BZIP.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Common data access interfaces including:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_jdbc.html#impala_jdbc">JDBC driver</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_odbc.html#impala_odbc">ODBC driver</a>.
+ </li>
+
+ <li class="li">
+ Hue Beeswax and the Impala Query UI.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_impala_shell.html#impala_shell">impala-shell command-line interface</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="../shared/../topics/impala_security.html#security">Kerberos authentication</a>.
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_datatypes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_datatypes.html b/docs/build3x/html/topics/impala_datatypes.html
new file mode 100644
index 0000000..45bc6fc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_datatypes.html
@@ -0,0 +1,33 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_array.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bigint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_boolean.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_char.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_double.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_float.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_int.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_map.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_real.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_smallint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_struct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timestamp.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tinyint.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_varchar.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_complex_types.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datatypes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Data Types</title></head><body id="datatypes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Data Types</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports a set of data types that you can use for table columns, expression values, and function
+ arguments and return values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Currently, Impala supports only scalar types, not composite or nested types. Accessing a table containing any
+ columns with unsupported types causes an error.
+ </div>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ For the notation to write literals of each of these data types, see
+ <a class="xref" href="impala_literals.html#literals">Literals</a>.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a> for differences between Impala and
+ Hive data types.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_array.html">ARRAY Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bigint.html">BIGINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_boolean.html">BOOLEAN Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_char.html">CHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal.html">DECIMAL Data Type (Impala 3.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_double.html">DOUBLE Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_float.html">FLOAT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_int.html">INT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_map.html">MAP Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_real.html">REAL Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_smallint.html">SMALLINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string.html">STRING Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_struct.html">STRUCT Complex Type (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timestamp.html">TIMESTAMP Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tinyint.html">TINYINT Data Type</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_varchar.html">VARCHAR Data Type (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_complex_types.html">Complex Types (Impala 2.3 or higher only)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ddl.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ddl.html b/docs/build3x/html/topics/impala_ddl.html
new file mode 100644
index 0000000..e9737bf
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ddl.html
@@ -0,0 +1,141 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ddl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DDL Statements</title></head><body id="ddl"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DDL Statements</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ DDL refers to <span class="q">"Data Definition Language"</span>, a subset of SQL statements that change the structure of the
+ database schema in some way, typically by creating, deleting, or modifying schema objects such as databases,
+ tables, and views. Most Impala DDL statements start with the keywords <code class="ph codeph">CREATE</code>,
+ <code class="ph codeph">DROP</code>, or <code class="ph codeph">ALTER</code>.
+ </p>
+
+ <p class="p">
+ The Impala DDL statements are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>
+ </li>
+ </ul>
+
+ <p class="p">
+ After Impala executes a DDL command, information about available tables, columns, views, partitions, and so
+ on is automatically synchronized between all the Impala nodes in a cluster. (Prior to Impala 1.2, you had to
+ issue a <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> statement manually on the other
+ nodes to make them aware of the changes.)
+ </p>
+
+ <p class="p">
+ If the timing of metadata updates is significant, for example if you use round-robin scheduling where each
+ query could be issued through a different Impala node, you can enable the
+ <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option to make the DDL statement wait until
+ all nodes have been notified about the metadata changes.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about how Impala DDL statements interact with
+ tables and partitions stored in the Amazon S3 filesystem.
+ </p>
+
+ <p class="p">
+ Although the <code class="ph codeph">INSERT</code> statement is officially classified as a DML (data manipulation language)
+ statement, it also involves metadata changes that must be broadcast to all Impala nodes, and so is also
+ affected by the <code class="ph codeph">SYNC_DDL</code> query option.
+ </p>
+
+ <p class="p">
+ Because the <code class="ph codeph">SYNC_DDL</code> query option makes each DDL operation take longer than normal, you
+ might only enable it before the last DDL operation in a sequence. For example, if you are running a script
+ that issues multiple DDL operations to set up an entire new schema, add several new partitions, and so on,
+ you might minimize the performance overhead by enabling the query option only before the last
+ <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code>, <code class="ph codeph">ALTER</code>, or <code class="ph codeph">INSERT</code> statement.
+ The script only finishes when all the relevant metadata changes are recognized by all the Impala nodes, so
+ you could connect to any node and issue queries through it.
+ </p>
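+
+ <p class="p">
+ For example, a hypothetical setup script might enable the option only before its final
+ statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x INT);
+ALTER TABLE t1 ADD COLUMNS (y STRING);
+-- Enable SYNC_DDL only for the last statement, so that the script returns
+-- after all nodes recognize the full set of metadata changes.
+SET SYNC_DDL=1;
+CREATE VIEW v1 AS SELECT x, y FROM t1;
+</code></pre>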
+
+ <p class="p">
+ The classification of DDL, DML, and other statements is not necessarily the same between Impala and Hive.
+ Impala organizes these statements in a way intended to be familiar to users of relational
+ databases or data warehouse products. Statements that modify the metastore database, such as <code class="ph codeph">COMPUTE
+ STATS</code>, are classified as DDL. Statements that only query the metastore database, such as
+ <code class="ph codeph">SHOW</code> or <code class="ph codeph">DESCRIBE</code>, are put into a separate category of utility statements.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The query types shown in the Impala debug web user interface might not match exactly the categories listed
+ here. For example, currently the <code class="ph codeph">USE</code> statement is shown as DDL in the debug web UI. The
+ query types shown in the debug web UI are subject to change, for improved consistency.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The other major classifications of SQL statements are data manipulation language (see
+ <a class="xref" href="impala_dml.html#dml">DML Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_debug_action.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_debug_action.html b/docs/build3x/html/topics/impala_debug_action.html
new file mode 100644
index 0000000..f39c89f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_debug_action.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="debug_action"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEBUG_ACTION Query Option</title></head><body id="debug_action"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEBUG_ACTION Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Introduces artificial problem conditions within queries. For internal debugging and troubleshooting.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> empty string
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_decimal.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_decimal.html b/docs/build3x/html/topics/impala_decimal.html
new file mode 100644
index 0000000..3c0a917
--- /dev/null
+++ b/docs/build3x/html/topics/impala_decimal.html
@@ -0,0 +1,907 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL Data Type (Impala 3.0 or higher only)</title></head><body id="decimal"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DECIMAL Data Type (<span class="keyword">Impala 3.0</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is a numeric data type with fixed scale and
+ precision.
+ </p>
+
+ <p class="p">
+ The data type is useful for storing and doing operations on precise decimal values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DECIMAL[(<var class="keyword varname">precision</var>[, <var class="keyword varname">scale</var>])]</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong>
+ </p>
+
+ <p class="p">
+ <var class="keyword varname">precision</var> represents the total number of digits that can be represented
+ regardless of the location of the decimal point.
+ </p>
+
+ <p class="p">
+ This value must be between 1 and 38, specified as an integer literal.
+ </p>
+
+ <p class="p">
+ The default precision is 9.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Scale:</strong>
+ </p>
+
+ <p class="p">
+ <var class="keyword varname">scale</var> represents the number of fractional digits.
+ </p>
+
+ <p class="p">
+ This value must be less than or equal to the precision, specified as an integer literal.
+ </p>
+
+ <p class="p">
+ The default scale is 0.
+ </p>
+
+ <p class="p">
+ When the precision and the scale are omitted, a <code class="ph codeph">DECIMAL</code> is treated as
+ <code class="ph codeph">DECIMAL(9, 0)</code>.
+ </p>
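+
+ <p class="p">
+ For example, these illustrative column definitions show the precision and scale notation:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE dec_demo
+(
+  amount DECIMAL(10,2),  -- up to 8 digits before the decimal point, 2 after
+  ratio  DECIMAL(38,38), -- fractional values only
+  id     DECIMAL         -- same as DECIMAL(9,0)
+);
+</code></pre>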
+
+ <p class="p">
+ <strong class="ph b">Range:</strong>
+ </p>
+
+ <p class="p">
+ The range of the <code class="ph codeph">DECIMAL</code> type is -10^38 + 1 through 10^38 - 1.
+ </p>
+
+ <p class="p">
+ The largest value is represented by <code class="ph codeph">DECIMAL(38, 0)</code>.
+ </p>
+
+ <p class="p">
+ The most precise fractional value (between 0 and 1, or 0 and -1) is represented by
+ <code class="ph codeph">DECIMAL(38, 38)</code>, with 38 digits to the right of the decimal point. The
+ value closest to 0 would be .0000...1 (37 zeros and the final 1). The value closest to 1
+ would be .999... (9 repeated 38 times).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Memory and disk storage:</strong>
+ </p>
+
+ <p class="p">
+ Only the precision determines the storage size for <code class="ph codeph">DECIMAL</code> values, and
+ the scale setting has no effect on the storage size. The following table describes the
+ in-memory storage once the values are loaded into memory.
+ </p>
+
+ <div class="p">
+ <table class="simpletable frame-all" id="decimal__simpletable_tty_3y2_mdb"><col style="width:50%"><col style="width:50%"><thead><tr class="sthead">
+
+ <th class="stentry" id="decimal__simpletable_tty_3y2_mdb__stentry__1">Precision</th>
+
+ <th class="stentry" id="decimal__simpletable_tty_3y2_mdb__stentry__2">In-memory Storage</th>
+
+ </tr></thead><tbody><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">1 - 9</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">4 bytes</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">10 - 18</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">8 bytes</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__1">19 - 38</td>
+
+ <td class="stentry" headers="decimal__simpletable_tty_3y2_mdb__stentry__2">16 bytes</td>
+
+ </tr></tbody></table>
+ </div>
+
+ <p class="p">
+ The on-disk representation varies depending on the file format of the table.
+ </p>
+
+ <p class="p">
+ Text, RCFile, and SequenceFile tables use ASCII-based formats as below:
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ Leading zeros are not stored.
+ </li>
+
+ <li class="li">
+ Trailing zeros are stored.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Each <code class="ph codeph">DECIMAL</code> value takes up as many bytes as the precision of the
+ value, plus:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ One extra byte if the decimal point is present.
+ </li>
+
+ <li class="li">
+ One extra byte for negative values.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Parquet and Avro tables use binary formats and offer more compact storage for
+ <code class="ph codeph">DECIMAL</code> values. In these tables, Impala stores each value in fewer bytes
+ where possible depending on the precision specified for the <code class="ph codeph">DECIMAL</code>
+ column. To conserve space in large tables, use the smallest-precision
+ <code class="ph codeph">DECIMAL</code> type.
+ </p>
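+
+ <p class="p">
+ For example (table names here are illustrative), declaring only the precision the data
+ actually requires reduces both the in-memory and the on-disk footprint:
+ </p>
+
+<pre class="pre codeblock"><code>-- Values use 4 bytes each in memory and a compact binary encoding on disk.
+CREATE TABLE prices_small (amount DECIMAL(9, 2)) STORED AS PARQUET;
+
+-- Values use 16 bytes each in memory, even if the data never needs 38 digits.
+CREATE TABLE prices_large (amount DECIMAL(38, 2)) STORED AS PARQUET;</code></pre>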
+
+ <p class="p">
+ <strong class="ph b">Precision and scale in arithmetic operations:</strong>
+ </p>
+
+ <p class="p">
+ For all arithmetic operations, the resulting precision is at most 38.
+ </p>
+
+ <p class="p">
+ If the resulting precision would be greater than 38, Impala truncates fractional
+ digits from the back of the result, keeping a scale of at least 6, and rounds the
+ last retained digit.
+ </p>
+
+ <p class="p">
+ For example, <code class="ph codeph">DECIMAL(38, 20) * DECIMAL(38, 20)</code> returns
+ <code class="ph codeph">DECIMAL(38, 6)</code>. According to the table below, the resulting precision and
+ scale would be <code class="ph codeph">(77, 40)</code>, but they are higher than the maximum precision
+ and scale for <code class="ph codeph">DECIMAL</code>. So, Impala sets the precision to the maximum
+ allowed 38, and truncates the scale to 6.
+ </p>
+
+ <div class="p">
+ When you use <code class="ph codeph">DECIMAL</code> values in arithmetic operations, the precision and
+ scale of the result value are determined as follows. For better readability, the following
+ terms are used in the table below:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ P1, P2: Input precisions
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ S1, S2: Input scales
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ L1, L2: Leading digits in input <code class="ph codeph">DECIMAL</code>s, i.e., L1 = P1 - S1 and L2
+ = P2 - S2
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ <table class="table frame-all" id="decimal__table_inl_sz2_mdb"><caption></caption><colgroup><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Operation</strong>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Resulting Precision</strong>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <strong class="ph b">Resulting Scale</strong>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Addition and Subtraction
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ <p class="p">
+ max (L1, L2) + max (S1, S2) + 1
+ </p>
+
+
+
+ <p class="p">
+ 1 is for carry-over.
+ </p>
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1, S2)
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Multiplication
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ P1 + P2 + 1
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ S1 + S2
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Division
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ L1 + S2 + max (S1 + P2 + 1, 6)
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1 + P2 + 1, 6)
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ Modulo
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ min (L1, L2) + max (S1, S2)
+ </td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">
+ max (S1, S2)
+ </td>
+ </tr>
+ </tbody></table>
+ </div>
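+
+ <p class="p">
+ You can check these rules with the <code class="ph codeph">typeof()</code> built-in function.
+ For example, adding <code class="ph codeph">DECIMAL(5, 2)</code> and
+ <code class="ph codeph">DECIMAL(6, 3)</code> values gives max(3, 3) + max(2, 3) + 1 = 7 for the
+ precision and max(2, 3) = 3 for the scale:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT TYPEOF(CAST(1 AS DECIMAL(5, 2)) + CAST(1 AS DECIMAL(6, 3)));
+
+Result: DECIMAL(7,3)</code></pre>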
+
+ <p class="p">
+ <strong class="ph b">Precision and scale in functions:</strong>
+ </p>
+
+ <div class="p">
+ When you use <code class="ph codeph">DECIMAL</code> values in built-in functions, the precision and
+ scale of the result value are determined as follows:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ The result of the <code class="ph codeph">SUM</code> aggregate function on a
+ <code class="ph codeph">DECIMAL</code> value is:
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Precision: 38
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ Scale: The same scale as the input column
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The result of <code class="ph codeph">AVG</code> aggregate function on a <code class="ph codeph">DECIMAL</code>
+ value is:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Precision: 38
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ Scale: max(Scale of input column, 6)
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
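+
+ <p class="p">
+ For example, with a hypothetical column <code class="ph codeph">c</code> of type
+ <code class="ph codeph">DECIMAL(10, 2)</code> in a table <code class="ph codeph">t</code>:
+ </p>
+
+<pre class="pre codeblock"><code>-- SUM keeps the input scale: the result type is DECIMAL(38, 2).
+SELECT SUM(c) FROM t;
+
+-- AVG widens the scale to at least 6: the result type is DECIMAL(38, 6).
+SELECT AVG(c) FROM t;</code></pre>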
+
+ <p class="p">
+ <strong class="ph b">Implicit conversions in DECIMAL assignments:</strong>
+ </p>
+
+ <p class="p">
+ Impala enforces strict conversion rules in decimal assignments like in
+ <code class="ph codeph">INSERT</code> and <code class="ph codeph">UNION</code> statements, or in functions like
+ <code class="ph codeph">COALESCE</code>.
+ </p>
+
+ <p class="p">
+ If the destination does not have enough precision or scale, Impala returns an error.
+ </p>
+
+ <div class="p">
+ Impala performs implicit conversions between <code class="ph codeph">DECIMAL</code> and other numeric
+ types as below:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DECIMAL</code> is implicitly converted to <code class="ph codeph">DOUBLE</code> or
+ <code class="ph codeph">FLOAT</code> when necessary even with a loss of precision. It can be
+ necessary, for example when inserting a <code class="ph codeph">DECIMAL</code> value into a
+ <code class="ph codeph">DOUBLE</code> column. For example:
+<pre class="pre codeblock"><code>CREATE TABLE flt(c FLOAT);
+INSERT INTO flt SELECT CAST(1e37 AS DECIMAL(38, 0));
+SELECT CAST(c AS DECIMAL(38, 0)) FROM flt;
+
+Result: 9999999933815812510711506376257961984</code></pre>
+ <p dir="ltr" class="p">
+ The result has a loss of information due to implicit casting. This is why we
+ discourage using the <code class="ph codeph">DOUBLE</code> and <code class="ph codeph">FLOAT</code> types in
+ general.
+ </p>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DOUBLE</code> and <code class="ph codeph">FLOAT</code> cannot be implicitly converted to
+ <code class="ph codeph">DECIMAL</code>. An error is returned.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DECIMAL</code> is implicitly converted to <code class="ph codeph">DECIMAL</code> if all
+ digits fit in the resulting <code class="ph codeph">DECIMAL</code>.
+ <div class="p">
+ For example, the following query returns an error because a resulting type that
+ guarantees that all digits fit cannot be determined.
+<pre class="pre codeblock"><code>SELECT GREATEST (CAST(1 AS DECIMAL(38, 0)), CAST(2 AS DECIMAL(38, 37)));</code></pre>
+ </div>
+ </li>
+
+ <li class="li">
+ Integer values can be implicitly converted to <code class="ph codeph">DECIMAL</code> when there is
+ enough room in the <code class="ph codeph">DECIMAL</code> to guarantee that all digits fit. The
+ integer types require the following numbers of digits to the left of the decimal point
+ when converted to <code class="ph codeph">DECIMAL</code>:
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">BIGINT</code>: 19 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">INT</code>: 10 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">SMALLINT</code>: 5 digits
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">TINYINT</code>: 3 digits
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ For example:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>CREATE TABLE decimals_10_8 (x DECIMAL(10, 8));
+INSERT INTO decimals_10_8 VALUES (CAST(1 AS TINYINT));</code></pre>
+ </div>
+
+ <p class="p">
+ The above <code class="ph codeph">INSERT</code> statement fails because <code class="ph codeph">TINYINT</code>
+ requires room for 3 digits to the left of the decimal point in the
+ <code class="ph codeph">DECIMAL</code>.
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>CREATE TABLE decimals_11_8(x DECIMAL(11, 8));
+INSERT INTO decimals_11_8 VALUES (CAST(1 AS TINYINT));</code></pre>
+ </div>
+
+ <p class="p">
+ The above <code class="ph codeph">INSERT</code> statement succeeds because there is enough room
+ for 3 digits to the left of the decimal point that <code class="ph codeph">TINYINT</code>
+ requires.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ In <code class="ph codeph">UNION</code>, the resulting precision and scales are determined as follows.
+ <ul class="ul">
+ <li class="li">
+ Precision: max (L1, L2) + max (S1, S2)
+ <p class="p">
+ If the resulting type does not fit in the <code class="ph codeph">DECIMAL</code> type, an error is
+ returned. See the first example below.
+ </p>
+ </li>
+
+ <li class="li">
+ Scale: max (S1, S2)
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Examples for <code class="ph codeph">UNION</code>:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DECIMAL(20, 0) UNION DECIMAL(20, 20)</code> would require a
+ <code class="ph codeph">DECIMAL(40, 20)</code> to fit all the digits. Since this is larger than the
+ max precision for <code class="ph codeph">DECIMAL</code>, Impala returns an error. One way to fix
+ the error is to cast both operands to the desired type, for example
+ <code class="ph codeph">DECIMAL(38, 18)</code>.
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">DECIMAL(20, 2) UNION DECIMAL(8, 6)</code> returns <code class="ph codeph">DECIMAL(24,
+ 6)</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ <code class="ph codeph">INT UNION DECIMAL(9, 4)</code> returns <code class="ph codeph">DECIMAL(14, 4)</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">INT</code> has the precision 10 and the scale 0, so it is treated as
+ <code class="ph codeph">DECIMAL(10, 0) UNION DECIMAL(9, 4)</code>.
+ </p>
+ </li>
+ </ul>
+ </div>
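+
+ <p class="p">
+ To resolve the error in the first example, cast both sides of the
+ <code class="ph codeph">UNION</code> to a type that accommodates the data (table and column
+ names here are illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(a AS DECIMAL(38, 18)) FROM t1
+UNION
+SELECT CAST(b AS DECIMAL(38, 18)) FROM t2;</code></pre>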
+
+ <p class="p">
+ <strong class="ph b">Casting between DECIMAL and other data types:</strong>
+ </p>
+
+ <div class="p">
+ To avoid potential conversion errors, use <code class="ph codeph">CAST</code> to explicitly convert
+ between <code class="ph codeph">DECIMAL</code> and other types in decimal assignments like in
+ <code class="ph codeph">INSERT</code> and <code class="ph codeph">UNION</code> statements, or in functions like
+ <code class="ph codeph">COALESCE</code>:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ You can cast the following types to <code class="ph codeph">DECIMAL</code>:
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ You can cast <code class="ph codeph">DECIMAL</code> to the following types:
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>,
+ <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">TIMESTAMP</code>
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <div class="p">
+ Impala performs <code class="ph codeph">CAST</code> between <code class="ph codeph">DECIMAL</code> and other numeric
+ types as below:
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Precision: If you cast a value with greater precision than the precision of the
+ destination type, Impala returns an error. For example, <code class="ph codeph">CAST(123456 AS
+ DECIMAL(3,0))</code> returns an error because all the digits do not fit into
+ <code class="ph codeph">DECIMAL(3, 0)</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Scale: If you cast a value with more fractional digits than the scale of the
+ destination type, the fractional digits are rounded. For example, <code class="ph codeph">CAST(1.239
+ AS DECIMAL(3, 2))</code> returns <code class="ph codeph">1.24</code>.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Casting STRING to DECIMAL:</strong>
+ </p>
+
+ <div class="p">
+ You can cast a <code class="ph codeph">STRING</code> of numeric characters in columns, literals, or
+ expressions to <code class="ph codeph">DECIMAL</code> as long as the number fits within the specified
+ target <code class="ph codeph">DECIMAL</code> type without overflow.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ If the scale of the <code class="ph codeph">STRING</code> value is greater than the scale of the
+ <code class="ph codeph">DECIMAL</code>, the fractional digits are rounded to the
+ <code class="ph codeph">DECIMAL</code> scale.
+ </p>
+
+ <p dir="ltr" class="p">
+ For example, <code class="ph codeph">CAST('98.678912' AS DECIMAL(15, 1))</code> returns
+ <code class="ph codeph">98.7</code>.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ If the number of leading digits in the <code class="ph codeph">STRING</code> value is greater
+ than the number of leading digits allowed by the <code class="ph codeph">DECIMAL</code>, an
+ error is returned.
+ </p>
+
+ <p dir="ltr" class="p">
+ For example, <code class="ph codeph">CAST('123.45' AS DECIMAL(2, 2))</code> returns an error.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Exponential notation is supported when casting from <code class="ph codeph">STRING</code>.
+ </p>
+
+ <p class="p">
+ For example, <code class="ph codeph">CAST('1.0e6' AS DECIMAL(32, 0))</code> returns
+ <code class="ph codeph">1000000</code>.
+ </p>
+
+ <p class="p">
+ Casting any non-numeric value, such as <code class="ph codeph">'ABC'</code> to the
+ <code class="ph codeph">DECIMAL</code> type returns an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Casting DECIMAL to TIMESTAMP:</strong>
+ </p>
+
+ <p class="p">
+ Casting a <code class="ph codeph">DECIMAL</code> value N to <code class="ph codeph">TIMESTAMP</code> produces a value
+ that is N seconds past the start of the epoch date (January 1, 1970).
+ </p>
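+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- Produces a TIMESTAMP 10.5 seconds past 1970-01-01 00:00:00 (UTC).
+SELECT CAST(CAST(10.5 AS DECIMAL(10, 1)) AS TIMESTAMP);</code></pre>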
+
+ <p class="p">
+ <strong class="ph b">DECIMAL vs FLOAT consideration:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> types can cause problems or
+ unexpected behavior due to inability to precisely represent certain fractional values, for
+ example dollar and cents values for currency. You might find output values slightly
+ different than you inserted, equality tests that do not match precisely, or unexpected
+ values for <code class="ph codeph">GROUP BY</code> columns. The <code class="ph codeph">DECIMAL</code> type can help
+ reduce unexpected behavior and rounding errors, but at the expense of some performance
+ overhead for assignments and comparisons.
+ </p>
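+
+ <p class="p">
+ For example, the fraction 0.1 has no exact binary representation, so the
+ <code class="ph codeph">FLOAT</code> version of the following comparison can fail to match,
+ while the <code class="ph codeph">DECIMAL</code> version is exact:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(0.1 AS FLOAT) = 0.1;
+SELECT CAST(0.1 AS DECIMAL(9, 1)) = 0.1;</code></pre>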
+
+ <p class="p">
+ <strong class="ph b">Literals and expressions:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Numeric literals without a decimal point
+ </p>
+ <ul class="ul">
+ <li class="li">
+ A literal is treated as the smallest integer type that fits it. For
+ example, <code class="ph codeph">111</code> is a <code class="ph codeph">TINYINT</code>, and
+ <code class="ph codeph">1111</code> is a <code class="ph codeph">SMALLINT</code>.
+ </li>
+
+ <li class="li">
+ Large literals that do not fit into any integer type are treated as
+ <code class="ph codeph">DECIMAL</code>.
+ </li>
+
+ <li class="li">
+ Literals too large to fit into a <code class="ph codeph">DECIMAL(38, 0)</code> are treated
+ as <code class="ph codeph">DOUBLE</code>.
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Numeric literals with a decimal point
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Literals with fewer than 38 digits are treated as <code class="ph codeph">DECIMAL</code>.
+ </li>
+
+ <li class="li">
+ Literals with 38 or more digits are treated as <code class="ph codeph">DOUBLE</code>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Exponential notation is supported in <code class="ph codeph">DECIMAL</code> literals.
+ </li>
+
+ <li dir="ltr" class="li">
+ <p class="p">
+ To represent a very large or precise <code class="ph codeph">DECIMAL</code> value as a literal,
+ for example one that contains more digits than can be represented by a
+ <code class="ph codeph">BIGINT</code> literal, use a quoted string or a floating-point value for
+ the number and <code class="ph codeph">CAST</code> the string to the desired
+ <code class="ph codeph">DECIMAL</code> type.
+ </p>
+
+ <p class="p">
+ For example: <code class="ph codeph">CAST('999999999999999999999999999999' AS DECIMAL(38,
+ 5))</code>
+ </p>
+ </li>
+ </ul>
+ </div>
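+
+ <p class="p">
+ The <code class="ph codeph">typeof()</code> built-in function shows how Impala types a literal:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT TYPEOF(111);   -- TINYINT
+SELECT TYPEOF(1111);  -- SMALLINT
+SELECT TYPEOF(3.14);  -- DECIMAL(3,2)</code></pre>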
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <div class="p" dir="ltr">
+ The <code class="ph codeph">DECIMAL</code> data type can be stored in any of the file formats supported
+ by Impala.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Impala can query Avro, RCFile, or SequenceFile tables that contain
+ <code class="ph codeph">DECIMAL</code> columns, created by other Hadoop components.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Impala can query and insert into Kudu tables that contain <code class="ph codeph">DECIMAL</code>
+ columns.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is fully compatible with HBase tables.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ The <code class="ph codeph">DECIMAL</code> data type is fully compatible with Parquet tables.
+ </p>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ Values of the <code class="ph codeph">DECIMAL</code> data type are potentially larger in text
+ tables than in tables using Parquet or other binary formats.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">UDF consideration:</strong>
+ </p>
+
+ <p class="p">
+ When writing a C++ UDF, use the <code class="ph codeph">DecimalVal</code> data type defined in
+ <span class="ph filepath">/usr/include/impala_udf/udf.h</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Changing precision and scale:</strong>
+ </p>
+
+ <div class="p">
+ You can issue an <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement to change the
+ precision and scale of an existing <code class="ph codeph">DECIMAL</code> column.
+ <ul class="ul">
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ For text-based formats (text, RCFile, and SequenceFile tables)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ If the values in the column fit within the new precision and scale, they are
+ returned correctly by a query.
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p" dir="ltr">
+ If any values do not fit within the new precision and scale:
+ <ul class="ul">
+ <li class="li">
+ Impala returns an error if the query option <code class="ph codeph">ABORT_ON_ERROR</code>
+ is set to <code class="ph codeph">true</code>.
+ </li>
+
+ <li class="li">
+ Impala returns <code class="ph codeph">NULL</code> with a warning that the conversion
+ failed if the query option <code class="ph codeph">ABORT_ON_ERROR</code> is set to
+ <code class="ph codeph">false</code>.
+ </li>
+ </ul>
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Leading zeros do not count against the precision value, but trailing zeros after
+ the decimal point do.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li dir="ltr" class="li">
+ <p dir="ltr" class="p">
+ For binary formats (Parquet and Avro tables)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p dir="ltr" class="p">
+ Although an <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement that
+ changes the precision or scale of a <code class="ph codeph">DECIMAL</code> column succeeds,
+ any subsequent attempt to query the changed column results in a fatal error.
+ This is because the metadata about the columns is stored in the data files
+ themselves, and <code class="ph codeph">ALTER TABLE</code> does not actually make any updates
+ to the data files. The other unaltered columns can still be queried
+ successfully.
+ </p>
+ </li>
+
+ <li class="li">
+ <p dir="ltr" class="p">
+ If the metadata in the data files disagrees with the metadata in the metastore
+ database, Impala cancels the query.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
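+
+ <p class="p">
+ For example, for a text-based table (the table name here is illustrative), the
+ following statement widens a <code class="ph codeph">DECIMAL</code> column; note that
+ <code class="ph codeph">REPLACE COLUMNS</code> replaces the entire column list of the table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Existing values that fit within the new precision and scale
+-- are returned correctly by subsequent queries.
+ALTER TABLE text_decimals REPLACE COLUMNS (x DECIMAL(20, 4));</code></pre>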
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Using a <code class="ph codeph">DECIMAL</code> column as a partition key provides a better match
+ between the partition key values and the HDFS directory names than using a
+ <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code> partitioning column.
+ </p>
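+
+ <p class="p">
+ For example (names here are illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>-- Each DECIMAL partition value maps to a predictable
+-- HDFS directory name, e.g. rate=6.50.
+CREATE TABLE tax_rates (id BIGINT) PARTITIONED BY (rate DECIMAL(5, 2));</code></pre>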
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because the <code class="ph codeph">DECIMAL</code> type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE
+ STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility with older version of DECIMAL:</strong>
+ </p>
+
+ <p class="p">
+ This version of <code class="ph codeph">DECIMAL</code> type is the default in
+ <span class="keyword">Impala 3.0</span> and higher. The key differences between this
+ version of <code class="ph codeph">DECIMAL</code> and the previous <code class="ph codeph">DECIMAL</code> V1 in Impala
+ 2.x include the following.
+ </p>
+
+ <div class="p">
+ <table class="simpletable frame-all" id="decimal__simpletable_bwl_khm_rdb"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><thead><tr class="sthead">
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__1"></th>
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__2">DECIMAL in <span class="keyword">Impala 3.0</span> or
+ higher</th>
+
+ <th class="stentry" id="decimal__simpletable_bwl_khm_rdb__stentry__3">DECIMAL in <span class="keyword">Impala 2.12</span> or lower
+ </th>
+
+ </tr></thead><tbody><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Overall behavior</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Returns either the result or an error.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Returns either the result or <code class="ph codeph">NULL</code> with a
+ warning.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Overflow behavior</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Aborts with an error.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Issues a warning and returns <code class="ph codeph">NULL</code>.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">Truncation / rounding behavior in arithmetic</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Truncates and rounds digits from the back.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Truncates digits from the front.</td>
+
+ </tr><tr class="strow">
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__1">String cast</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__2">Truncates from the back and rounds.</td>
+
+ <td class="stentry" headers="decimal__simpletable_bwl_khm_rdb__stentry__3">Truncates from the back.</td>
+
+ </tr></tbody></table>
+ </div>
+
+ <div class="p">
+ If you need to continue using the first version of the <code class="ph codeph">DECIMAL</code> type for
+ the backward compatibility of your queries, set the <code class="ph codeph">DECIMAL_V2</code> query
+ option to <code class="ph codeph">FALSE</code>:
+<pre class="pre codeblock"><code>SET DECIMAL_V2=FALSE;</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Compatibility with other databases:</strong>
+ </p>
+
+ <p dir="ltr" class="p">
+ Use the <code class="ph codeph">DECIMAL</code> data type in Impala for applications where you used the
+ <code class="ph codeph">NUMBER</code> data type in Oracle.
+ </p>
+
+ <p dir="ltr" class="p">
+ The Impala <code class="ph codeph">DECIMAL</code> type does not support the Oracle idioms of
+ <code class="ph codeph">*</code> for scale.
+ </p>
+
+ <p dir="ltr" class="p">
+ The Impala <code class="ph codeph">DECIMAL</code> type does not support negative values for precision.
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_decimal_v2.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_decimal_v2.html b/docs/build3x/html/topics/impala_decimal_v2.html
new file mode 100644
index 0000000..b26c3e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_decimal_v2.html
@@ -0,0 +1,32 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="decimal_v2"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DECIMAL_V2 Query Option</title></head><body id="decimal_v2"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DECIMAL_V2 Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A query option that changes behavior related to the <code class="ph codeph">DECIMAL</code>
+ data type.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ This query option is currently unsupported.
+ Its precise behavior is currently undefined and might change
+ in the future.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_default_join_distribution_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_default_join_distribution_mode.html b/docs/build3x/html/topics/impala_default_join_distribution_mode.html
new file mode 100644
index 0000000..95ae29b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_default_join_distribution_mode.html
@@ -0,0 +1,113 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_join_distribution_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</title></head><body id="default_join_distribution_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This option determines the join distribution that Impala uses when any of the tables
+ involved in a join query is missing statistics.
+ </p>
+
+ <p class="p">
+ Impala optimizes join queries based on the presence of table statistics,
+ which are produced by the Impala <code class="ph codeph">COMPUTE STATS</code> statement.
+ By default, when a table involved in the join query does not have statistics,
+ Impala uses the <span class="q">"broadcast"</span> technique that transmits the entire contents
+ of the table to all executor nodes participating in the query. If one table
+ involved in a join has statistics and the other does not, the table without
+ statistics is broadcast. If both tables are missing statistics, the table
+ that is referenced second in the join order is broadcast. This behavior
+ is appropriate when the table involved is relatively small, but can lead to
+ excessive network, memory, and CPU overhead if the table being broadcast is
+ large.
+ </p>
+
+ <p class="p">
+ Because Impala queries frequently involve very large tables, and suboptimal
+ joins for such tables could result in spilling or out-of-memory errors,
+ the setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code> lets you
+ override the default behavior. The shuffle join mechanism divides the rows
+ of each table involved in the join by applying a hashing algorithm to the join
+ columns, and transmits each subset of rows to the appropriate node for processing.
+ Typically, this kind of join is more efficient for joins between large tables of
+ similar size.
+ </p>
+
+ <p class="p">
+ The setting <code class="ph codeph">DEFAULT_JOIN_DISTRIBUTION_MODE=SHUFFLE</code> is
+ recommended when setting up and deploying new clusters, because it is less likely
+ to result in serious consequences such as spilling or out-of-memory errors if
+ the query plan is based on incomplete information. This setting is not the default,
+ to avoid changing the performance characteristics of join queries for clusters that
+ are already tuned for their existing workloads.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p">
+ The allowed values are <code class="ph codeph">BROADCAST</code> (equivalent to 0)
+ or <code class="ph codeph">SHUFFLE</code> (equivalent to 1).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate appropriate scenarios for each
+ setting of this query option.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Create a billion-row table.
+create table big_table stored as parquet
+ as select * from huge_table limit 1e9;
+
+-- For a big table with no statistics, the
+-- shuffle join mechanism is appropriate.
+set default_join_distribution_mode=shuffle;
+
+...join queries involving the big table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Create a hundred-row table.
+create table tiny_table stored as parquet
+ as select * from huge_table limit 100;
+
+-- For a tiny table with no statistics, the
+-- broadcast join mechanism is appropriate.
+set default_join_distribution_mode=broadcast;
+
+...join queries involving the tiny table...
+</code></pre>
+
+<pre class="pre codeblock"><code>
+compute stats tiny_table;
+compute stats big_table;
+
+-- Once the stats are computed, the query option has
+-- no effect on join queries involving these tables.
+-- Impala can determine the absolute and relative sizes
+-- of each side of the join query by examining the
+-- row size, cardinality, and so on of each table.
+
+...join queries involving both of these tables...
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_joins.html">Joins in Impala SELECT Statements</a>,
+ <a class="xref" href="impala_perf_joins.html">Performance Considerations for Join Queries</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_default_spillable_buffer_size.html b/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
new file mode 100644
index 0000000..3eb3689
--- /dev/null
+++ b/docs/build3x/html/topics/impala_default_spillable_buffer_size.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="default_spillable_buffer_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</title></head><body id="default_spillable_buffer_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Specifies the default size for a memory buffer used when the
+      spill-to-disk mechanism is activated, for example during queries against
+      a large table with no statistics, or during large join operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">2097152</code> (2 MB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value with unrecognized formats, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+      This query option sets an upper bound on the size of the internal
+      buffer that can be used during spill-to-disk operations. The
+ actual size of the buffer is chosen by the query planner.
+ </p>
+ <p class="p">
+ If overall query performance is limited by the time needed for spilling,
+ consider increasing the <code class="ph codeph">DEFAULT_SPILLABLE_BUFFER_SIZE</code> setting.
+ Larger buffer sizes result in Impala issuing larger I/O requests to storage
+ devices, which might result in higher throughput, particularly on rotational
+ disks.
+ </p>
+ <p class="p">
+ The tradeoff with a large value for this setting is increased memory usage during
+ spill-to-disk operations. Reducing this value may reduce memory consumption.
+ </p>
+ <p class="p">
+      To determine whether this setting is capping the spillable buffer size,
+      check the buffer size chosen by the query planner for a particular query:
+      run <code class="ph codeph">EXPLAIN</code> on the query while the setting
+      <code class="ph codeph">EXPLAIN_LEVEL=2</code> is in effect.
+ </p>
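+    <p class="p">
+      For example, the following hypothetical session (the table name and the
+      exact plan output are illustrative, not taken from a real cluster) shows
+      where the planner-chosen buffer size appears in the extended
+      <code class="ph codeph">EXPLAIN</code> output:
+    </p>
+
+<pre class="pre codeblock"><code>
+set explain_level=2;
+set default_spillable_buffer_size=4mb;
+
+-- In the detailed plan output, memory-intensive operators such as
+-- aggregations and joins report the buffer size chosen by the planner,
+-- for example in a "spillable-buffer-size=..." annotation.
+explain select c1, count(*) from big_table group by c1;
+</code></pre>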
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+set default_spillable_buffer_size=4MB;
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_delegation.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_delegation.html b/docs/build3x/html/topics/impala_delegation.html
new file mode 100644
index 0000000..696af37
--- /dev/null
+++ b/docs/build3x/html/topics/impala_delegation.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delegation"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala Delegation for Hue and BI Tools</title></head><body id="delegation"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Configuring Impala Delegation for Hue and BI Tools</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When users submit Impala queries through a separate application, such as Hue or a business intelligence tool,
+ typically all requests are treated as coming from the same user. In Impala 1.2 and higher, authentication is
+ extended by a new feature that allows applications to pass along credentials for the users that connect to
+ them (known as <span class="q">"delegation"</span>), and issue Impala queries with the privileges for those users. Currently,
+ the delegation feature is available only for Impala queries submitted through application interfaces such as
+ Hue and BI tools; for example, Impala cannot issue queries using the privileges of the HDFS user.
+ </p>
+
+ <p class="p">
+ The delegation feature is enabled by a startup option for <span class="keyword cmdname">impalad</span>:
+ <code class="ph codeph">--authorized_proxy_user_config</code>. When you specify this option, users whose names you specify
+ (such as <code class="ph codeph">hue</code>) can delegate the execution of a query to another user. The query runs with the
+ privileges of the delegated user, not the original user such as <code class="ph codeph">hue</code>. The name of the
+ delegated user is passed using the HiveServer2 configuration property <code class="ph codeph">impala.doas.user</code>.
+ </p>
+
+ <p class="p">
+ You can specify a list of users that the application user can delegate to, or <code class="ph codeph">*</code> to allow a
+ superuser to delegate to any other user. For example:
+ </p>
+
+<pre class="pre codeblock"><code>impalad --authorized_proxy_user_config 'hue=user1,user2;admin=*' ...</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Make sure to use single quotes or escape characters to ensure that any <code class="ph codeph">*</code> characters do not
+ undergo wildcard expansion when specified in command-line arguments.
+ </div>
+
+ <p class="p">
+ See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about adding or changing
+ <span class="keyword cmdname">impalad</span> startup options. See
+ <a class="xref" href="http://blog.cloudera.com/blog/2013/07/how-hiveserver2-brings-security-and-concurrency-to-apache-hive/" target="_blank">this
+ blog post</a> for background information about the delegation capability in HiveServer2.
+ </p>
+ <p class="p">
+ To set up authentication for the delegated users:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ On the server side, configure either user/password authentication through LDAP, or Kerberos
+ authentication, for all the delegated users. See <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a> or
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ On the client side, to learn how to enable delegation, consult the documentation
+ for the ODBC driver you are using.
+ </p>
+ </li>
+ </ul>
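+    <p class="p">
+      As an illustration, a server-side setup that combines delegation with
+      Kerberos authentication might resemble the following (the user names,
+      principal, and keytab path are placeholders):
+    </p>
+
+<pre class="pre codeblock"><code>impalad --authorized_proxy_user_config 'hue=user1,user2' \
+  --principal=impala/host.example.com@EXAMPLE.COM \
+  --keytab_file=/etc/impala/conf/impala.keytab ...</code></pre>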
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_delete.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_delete.html b/docs/build3x/html/topics/impala_delete.html
new file mode 100644
index 0000000..668970e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_delete.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="delete"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DELETE Statement (Impala 2.8 or higher only)</title></head><body id="delete"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DELETE Statement (<span class="keyword">Impala 2.8</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Deletes an arbitrary number of rows from a Kudu table.
+ This statement only works for Impala tables that use the Kudu storage engine.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+DELETE [FROM] [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> [ WHERE <var class="keyword varname">where_conditions</var> ]
+
+DELETE <var class="keyword varname">table_ref</var> FROM [<var class="keyword varname">joined_table_refs</var>] [ WHERE <var class="keyword varname">where_conditions</var> ]
+</code></pre>
+
+ <p class="p">
+ The first form evaluates rows from one table against an optional
+ <code class="ph codeph">WHERE</code> clause, and deletes all the rows that
+ match the <code class="ph codeph">WHERE</code> conditions, or all rows if
+ <code class="ph codeph">WHERE</code> is omitted.
+ </p>
+
+ <p class="p">
+ The second form evaluates one or more join clauses, and deletes
+ all matching rows from one of the tables. The join clauses can
+ include non-Kudu tables, but the table from which the rows
+ are deleted must be a Kudu table. The <code class="ph codeph">FROM</code>
+ keyword is required in this case, to separate the name of
+ the table whose rows are being deleted from the table names
+ of the join clauses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The conditions in the <code class="ph codeph">WHERE</code> clause are the same ones allowed
+ for the <code class="ph codeph">SELECT</code> statement. See <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ for details.
+ </p>
+
+ <p class="p">
+ The conditions in the <code class="ph codeph">WHERE</code> clause can refer to
+ any combination of primary key columns or other columns. Referring to
+ primary key columns in the <code class="ph codeph">WHERE</code> clause is more efficient
+ than referring to non-primary key columns.
+ </p>
+
+ <p class="p">
+ If the <code class="ph codeph">WHERE</code> clause is omitted, all rows are removed from the table.
+ </p>
+
+ <p class="p">
+ Because Kudu currently does not enforce strong consistency during concurrent DML operations,
+ be aware that the results after this statement finishes might be different than you
+ intuitively expect:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+          If some rows cannot be deleted because their primary key
+          values are not found, due to those rows being removed by a
+          concurrent <code class="ph codeph">DELETE</code> operation,
+          the statement succeeds but returns a warning.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">DELETE</code> statement might also overlap with
+ <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>,
+ or <code class="ph codeph">UPSERT</code> statements running concurrently on the same table.
+ After the statement finishes, there might be more or fewer rows than expected in the table
+ because it is undefined whether the <code class="ph codeph">DELETE</code> applies to rows that are
+ inserted or updated while the <code class="ph codeph">DELETE</code> is in progress.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The number of affected rows is reported in an <span class="keyword cmdname">impala-shell</span> message
+ and in the query profile.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how to delete rows from a specified
+ table, either all rows or rows that match a <code class="ph codeph">WHERE</code>
+ clause:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Deletes all rows. The FROM keyword is optional.
+DELETE FROM kudu_table;
+DELETE kudu_table;
+
+-- Deletes 0, 1, or more rows.
+-- (If c1 is a single-column primary key, the statement could only
+-- delete 0 or 1 rows.)
+DELETE FROM kudu_table WHERE c1 = 100;
+
+-- Deletes all rows that match all the WHERE conditions.
+DELETE FROM kudu_table WHERE
+ (c1 > c2 OR c3 IN ('hello','world')) AND c4 IS NOT NULL;
+DELETE FROM t1 WHERE
+ (c1 IN (1,2,3) AND c2 > c3) OR c4 IS NOT NULL;
+DELETE FROM time_series WHERE
+ year = 2016 AND month IN (11,12) AND day > 15;
+
+-- WHERE condition with a subquery.
+DELETE FROM t1 WHERE
+ c5 IN (SELECT DISTINCT other_col FROM other_table);
+
+-- Does not delete any rows, because the WHERE condition is always false.
+DELETE FROM kudu_table WHERE 1 = 0;
+</code></pre>
+
+ <p class="p">
+ The following examples show how to delete rows that are part
+ of the result set from a join:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Remove _all_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x;
+
+-- Remove _some_ rows from t1 that have a matching X value in t2.
+DELETE t1 FROM t1 JOIN t2 ON t1.x = t2.x
+ WHERE t1.y = FALSE and t2.z > 100;
+
+-- Delete from a Kudu table based on a join with a non-Kudu table.
+DELETE t1 FROM kudu_table t1 JOIN non_kudu_table t2 ON t1.x = t2.x;
+
+-- The tables can be joined in any order as long as the Kudu table
+-- is specified as the deletion target.
+DELETE t2 FROM non_kudu_table t1 JOIN kudu_table t2 ON t1.x = t2.x;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_kudu.html#impala_kudu">Using Impala to Query Kudu Tables</a>, <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+ <a class="xref" href="impala_update.html#update">UPDATE Statement (Impala 2.8 or higher only)</a>, <a class="xref" href="impala_upsert.html#upsert">UPSERT Statement (Impala 2.8 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_row_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_row_size.html b/docs/build3x/html/topics/impala_max_row_size.html
new file mode 100644
index 0000000..76c6d69
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_row_size.html
@@ -0,0 +1,221 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_row_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ROW_SIZE Query Option</title></head><body id="max_row_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_ROW_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Ensures that Impala can process rows of at least the specified size. (Larger
+ rows might be successfully processed, but that is not guaranteed.) Applies when
+ constructing intermediate or final rows in the result set. This setting prevents
+ out-of-control memory use when accessing columns containing huge strings.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">524288</code> (512 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value with unrecognized formats, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If a query fails because it involves rows with long strings and/or
+ many columns, causing the total row size to exceed <code class="ph codeph">MAX_ROW_SIZE</code>
+ bytes, increase the <code class="ph codeph">MAX_ROW_SIZE</code> setting to accommodate
+ the total bytes stored in the largest row. Examine the error messages for any
+ failed queries to see the size of the row that caused the problem.
+ </p>
+ <p class="p">
+ Impala attempts to handle rows that exceed the <code class="ph codeph">MAX_ROW_SIZE</code>
+ value where practical, so in many cases, queries succeed despite having rows
+ that are larger than this setting.
+ </p>
+ <p class="p">
+ Specifying a value that is substantially higher than actually needed can cause
+ Impala to reserve more memory than is necessary to execute the query.
+ </p>
+ <p class="p">
+ In a Hadoop cluster with highly concurrent workloads and queries that process
+ high volumes of data, traditional SQL tuning advice about minimizing wasted memory
+ is worth remembering. For example, if a table has <code class="ph codeph">STRING</code> columns
+ where a single value might be multiple megabytes, make sure that the
+ <code class="ph codeph">SELECT</code> lists in queries only refer to columns that are actually
+ needed in the result set, instead of using the <code class="ph codeph">SELECT *</code> shorthand.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show the kinds of situations where it is necessary to
+ adjust the <code class="ph codeph">MAX_ROW_SIZE</code> setting. First, we create a table
+ containing some very long values in <code class="ph codeph">STRING</code> columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table big_strings (s1 string, s2 string, s3 string) stored as parquet;
+
+-- Turn off compression to more easily reason about data volume by doing SHOW TABLE STATS.
+-- Does not actually affect query success or failure, because MAX_ROW_SIZE applies when
+-- column values are materialized in memory.
+set compression_codec=none;
+set;
+...
+ MAX_ROW_SIZE: [524288]
+...
+
+-- A very small row.
+insert into big_strings values ('one', 'two', 'three');
+-- A row right around the default MAX_ROW_SIZE limit: a 500 KiB string and a 30 KiB string.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('123',10000));
+-- A row that is too big if the query has to materialize both S1 and S3.
+insert into big_strings values (repeat('12345',100000), 'short', repeat('12345',100000));
+
+</code></pre>
+
+ <p class="p">
+ With the default <code class="ph codeph">MAX_ROW_SIZE</code> setting, different queries succeed
+ or fail based on which column values have to be materialized during query processing:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- All the S1 values can be materialized within the 512 KB MAX_ROW_SIZE buffer.
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 2 |
++--------------------+
+
+-- A row where even the S1 value is too large to materialize within MAX_ROW_SIZE.
+insert into big_strings values (repeat('12345',1000000), 'short', repeat('12345',1000000));
+
+-- The 5 MiB string is too large to materialize. The message explains the size of the result
+-- set row the query is attempting to materialize.
+select count(distinct(s1)) from big_strings;
+WARNINGS: Row of size 4.77 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- If more columns are involved, the result set row being materialized is bigger.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1.
+ Increase the max_row_size query option (currently 512.00 KB) to process larger rows.
+
+-- Column S2, containing only short strings, can still be examined.
+select count(distinct(s2)) from big_strings;
++----------------------+
+| count(distinct (s2)) |
++----------------------+
+| 2 |
++----------------------+
+
+-- Queries that do not materialize the big column values are OK.
+select count(*) from big_strings;
++----------+
+| count(*) |
++----------+
+| 4 |
++----------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how adjusting <code class="ph codeph">MAX_ROW_SIZE</code> upward
+ allows queries involving the long string columns to succeed:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Boosting MAX_ROW_SIZE moderately allows all S1 values to be materialized.
+set max_row_size=7mb;
+
+select count(distinct s1) from big_strings;
++--------------------+
+| count(distinct s1) |
++--------------------+
+| 3 |
++--------------------+
+
+-- But the combination of S1 + S3 strings is still too large.
+select count(distinct s1, s2, s3) from big_strings;
+WARNINGS: Row of size 9.54 MB could not be materialized in plan node with id 1. Increase the max_row_size query option (currently 7.00 MB) to process larger rows.
+
+-- Boosting MAX_ROW_SIZE to larger than the largest row in the table allows
+-- all queries to complete successfully.
+set max_row_size=12mb;
+
+select count(distinct s1, s2, s3) from big_strings;
++----------------------------+
+| count(distinct s1, s2, s3) |
++----------------------------+
+| 4 |
++----------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how to reason about appropriate values for
+ <code class="ph codeph">MAX_ROW_SIZE</code>, based on the characteristics of the
+ columns containing the long values:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- With a large MAX_ROW_SIZE in place, we can examine the columns to
+-- understand the practical lower limit for MAX_ROW_SIZE based on the
+-- table structure and column values.
+select max(length(s1) + length(s2) + length(s3)) / 1e6 as megabytes from big_strings;
++-----------+
+| megabytes |
++-----------+
+| 10.000005 |
++-----------+
+
+-- We can also examine the 'Max Size' for each column after computing stats.
+compute stats big_strings;
+show column stats big_strings;
++--------+--------+------------------+--------+----------+-----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+-----------+
+| s1 | STRING | 2 | -1 | 5000000 | 2500002.5 |
+| s2 | STRING | 2 | -1 | 10 | 7.5 |
+| s3 | STRING | 2 | -1 | 5000000 | 2500005 |
++--------+--------+------------------+--------+----------+-----------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_scan_range_length.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_scan_range_length.html b/docs/build3x/html/topics/impala_max_scan_range_length.html
new file mode 100644
index 0000000..0eaf110
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_scan_range_length.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_scan_range_length"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_SCAN_RANGE_LENGTH Query Option</title></head><body id="max_scan_range_length"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_SCAN_RANGE_LENGTH Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Maximum length of the scan range. Interacts with the number of HDFS blocks in the table to determine how many
+ CPU cores across the cluster are involved with the processing for a query. (Each core processes one scan
+ range.)
+ </p>
+
+ <p class="p">
+ Lowering the value can sometimes increase parallelism if you have unused CPU capacity, but a too-small value
+ can limit query performance because each scan range involves extra overhead.
+ </p>
+
+ <p class="p">
+ Only applicable to HDFS tables. Has no effect on Parquet tables. Leaving the option unspecified, or setting it to 0, uses the backend default,
+ which is the same as the HDFS block size for each table.
+ </p>
+
+ <p class="p">
+ Although the scan range can be arbitrarily long, Impala internally uses an 8 MB read buffer so that it can
+ query tables with huge block sizes without allocating equivalent blocks of memory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, the argument value can include unit specifiers,
+ such as <code class="ph codeph">100m</code> or <code class="ph codeph">100mb</code>. In previous versions,
+ Impala interpreted such formatted values as 0, leading to query failures.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
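+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following sketch is illustrative only; the table name and block layout are
+ hypothetical. For a table stored in large HDFS blocks on a cluster with idle CPU
+ cores, lowering the scan range length splits each block across several cores:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Split each 256 MB HDFS block into four 64 MB scan ranges.
+set max_scan_range_length=64mb;
+select count(*) from sample_table;
+
+-- Revert to the backend default of one scan range per HDFS block.
+set max_scan_range_length=0;
+</code></pre>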
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mem_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mem_limit.html b/docs/build3x/html/topics/impala_mem_limit.html
new file mode 100644
index 0000000..46e1cd3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mem_limit.html
@@ -0,0 +1,206 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MEM_LIMIT Query Option</title></head><body id="mem_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MEM_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When resource management is not enabled, defines the maximum amount of memory a query can allocate on each node.
+ Therefore, the total memory that can be used by a query is the <code class="ph codeph">MEM_LIMIT</code> times the number of nodes.
+ </p>
+
+ <p class="p">
+ There are two levels of memory limit for Impala.
+ The <code class="ph codeph">-mem_limit</code> startup option sets an overall limit for the <span class="keyword cmdname">impalad</span> process
+ (which handles multiple queries concurrently).
+ That limit is typically expressed in terms of a percentage of the RAM available on the host, such as <code class="ph codeph">-mem_limit=70%</code>.
+ The <code class="ph codeph">MEM_LIMIT</code> query option, which you set through <span class="keyword cmdname">impala-shell</span>
+ or the <code class="ph codeph">SET</code> statement in a JDBC or ODBC application, applies to each individual query.
+ The <code class="ph codeph">MEM_LIMIT</code> query option is usually expressed as a fixed size such as <code class="ph codeph">10gb</code>,
+ and must always be less than the <span class="keyword cmdname">impalad</span> memory limit.
+ </p>
+
+ <p class="p">
+ If query processing exceeds the specified memory limit on any node, either the per-query limit or the
+ <span class="keyword cmdname">impalad</span> limit, Impala cancels the query automatically.
+ Memory limits are checked periodically during query processing, so the actual memory in use
+ might briefly exceed the limit without the query being cancelled.
+ </p>
+
+ <p class="p">
+ When resource management is enabled, the mechanism for this option changes. If set, it overrides the
+ automatic memory estimate from Impala. Impala requests this amount of memory from YARN on each node, and the
+ query does not proceed until that much memory is available. The actual memory used by the query could be
+ lower, since some queries use much less memory than others. With resource management, the
+ <code class="ph codeph">MEM_LIMIT</code> setting acts both as a hard limit on the amount of memory a query can use on any
+ node (enforced by YARN) and a guarantee that that much memory will be available on each node while the query
+ is being executed. When resource management is enabled but no <code class="ph codeph">MEM_LIMIT</code> setting is
+ specified, Impala estimates the amount of memory needed on each node for each query, requests that much
+ memory from YARN before starting the query, and then internally sets the <code class="ph codeph">MEM_LIMIT</code> on each
+ node to the requested amount of memory during the query. Thus, if the query takes more memory than was
+ originally estimated, Impala detects that the <code class="ph codeph">MEM_LIMIT</code> is exceeded and cancels the query
+ itself.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents memory size in bytes; you can also use a suffix of <code class="ph codeph">m</code> or <code class="ph codeph">mb</code>
+ for megabytes, or more commonly <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you specify a value in an unrecognized
+ format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (unlimited)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> setting is primarily useful in a high-concurrency setting,
+ or on a cluster with a workload shared between Impala and other data processing components.
+ You can prevent any query from accidentally using much more memory than expected,
+ which could negatively impact other Impala queries.
+ </p>
+
+ <p class="p">
+ Use the output of the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>
+ to get a report of memory used for each phase of your most heavyweight queries on each node,
+ and then set a <code class="ph codeph">MEM_LIMIT</code> somewhat higher than that.
+ See <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for usage information about
+ the <code class="ph codeph">SUMMARY</code> command.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how to set the <code class="ph codeph">MEM_LIMIT</code> query option
+ using a fixed number of bytes, or suffixes representing gigabytes or megabytes.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3000000000;
+MEM_LIMIT set to 3000000000
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3g;
+MEM_LIMIT set to 3g
+[localhost:21000] > select 5;
+Query: select 5
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3gb;
+MEM_LIMIT set to 3gb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+
+[localhost:21000] > set mem_limit=3m;
+MEM_LIMIT set to 3m
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+| 5 |
++---+
+[localhost:21000] > set mem_limit=3mb;
+MEM_LIMIT set to 3mb
+[localhost:21000] > select 5;
++---+
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following examples show how unrecognized <code class="ph codeph">MEM_LIMIT</code>
+ values lead to errors for subsequent queries.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set mem_limit=3tb;
+MEM_LIMIT set to 3tb
+[localhost:21000] > select 5;
+ERROR: Failed to parse query memory limit from '3tb'.
+
+[localhost:21000] > set mem_limit=xyz;
+MEM_LIMIT set to xyz
+[localhost:21000] > select 5;
+Query: select 5
+ERROR: Failed to parse query memory limit from 'xyz'.
+</code></pre>
+
+ <p class="p">
+ The following example shows the automatic query cancellation
+ when the <code class="ph codeph">MEM_LIMIT</code> value is exceeded
+ on any host involved in the Impala query. First it runs a
+ successful query and checks the largest amount of memory
+ used on any node for any stage of the query.
+ Then it sets an artificially low <code class="ph codeph">MEM_LIMIT</code>
+ setting so that the same query cannot run.
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select count(*) from customer;
+Query: select count(*) from customer
++----------+
+| count(*) |
++----------+
+| 150000 |
++----------+
+
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
++------------------------+
+| count(distinct c_name) |
++------------------------+
+| 150000 |
++------------------------+
+
+[localhost:21000] > summary;
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+| 06:AGGREGATE | 1 | 230.00ms | 230.00ms | 1 | 1 | 16.00 KB | -1 B | FINALIZE |
+| 05:EXCHANGE | 1 | 43.44us | 43.44us | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 02:AGGREGATE | 1 | 227.14ms | 227.14ms | 1 | 1 | 12.00 KB | 10.00 MB | |
+| 04:AGGREGATE | 1 | 126.27ms | 126.27ms | 150.00K | 150.00K | 15.17 MB | 10.00 MB | |
+| 03:EXCHANGE | 1 | 44.07ms | 44.07ms | 150.00K | 150.00K | 0 B | 0 B | HASH(c_name) |
+<strong class="ph b">| 01:AGGREGATE | 1 | 361.94ms | 361.94ms | 150.00K | 150.00K | 23.04 MB | 10.00 MB | |</strong>
+| 00:SCAN HDFS | 1 | 43.64ms | 43.64ms | 150.00K | 150.00K | 24.19 MB | 64.00 MB | tpch.customer |
++--------------+--------+----------+----------+---------+------------+----------+---------------+---------------+
+
+[localhost:21000] > set mem_limit=15mb;
+MEM_LIMIT set to 15mb
+[localhost:21000] > select count(distinct c_name) from customer;
+Query: select count(distinct c_name) from customer
+ERROR:
+Memory limit exceeded
+Query did not have enough memory to get the minimum required buffers in the block manager.
+</code></pre>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min.html b/docs/build3x/html/topics/impala_min.html
new file mode 100644
index 0000000..bfdfd0f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min.html
@@ -0,0 +1,297 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN Function</title></head><body id="min"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the minimum value from a set of numbers. Opposite of the
+ <code class="ph codeph">MAX</code> function. Its single argument can be a numeric column, or the numeric result of a function
+ or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+ are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MIN</code> are
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">MIN</code> returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>MIN([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+ bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Find the smallest value for this column in the table.
+select min(c1) from t1;
+-- Find the smallest value for this column from a subset of the table.
+select min(c1) from t1 where month = 'January' and year = '2013';
+-- Find the smallest value from a set of numeric function results.
+select min(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, min(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select min(distinct x) from t1;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">MIN()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">MIN()</code> is reported for each input value, as
+ opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, min(x) over (partition by property) as min from int_t where property in ('odd','even');
++----+----------+-----+
+| x | property | min |
++----+----------+-----+
+| 2 | even | 2 |
+| 4 | even | 2 |
+| 6 | even | 2 |
+| 8 | even | 2 |
+| 10 | even | 2 |
+| 1 | odd | 1 |
+| 3 | odd | 1 |
+| 5 | odd | 1 |
+| 7 | odd | 1 |
+| 9 | odd | 1 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MIN()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the smallest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MIN()</code>
+result only decreases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property, min(x) <strong class="ph b">over (order by property, x desc)</strong> as 'minimum to this point'
+ from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'minimum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | minimum to this point |
++---+----------+-----------------------+
+| 7 | prime | 7 |
+| 5 | prime | 5 |
+| 3 | prime | 3 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 2 |
+| 1 | square | 1 |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running minimum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local minimum |
++---+----------+---------------+
+| 7 | prime | 5 |
+| 5 | prime | 3 |
+| 3 | prime | 2 |
+| 2 | prime | 2 |
+| 9 | square | 2 |
+| 4 | square | 1 |
+| 1 | square | 1 |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ min(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and 1 following</strong>
+ ) as 'local minimum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+ <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_min_spillable_buffer_size.html b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
new file mode 100644
index 0000000..9f3c84e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_min_spillable_buffer_size.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="min_spillable_buffer_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MIN_SPILLABLE_BUFFER_SIZE Query Option</title></head><body id="min_spillable_buffer_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MIN_SPILLABLE_BUFFER_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the minimum size for a memory buffer used when the
+ spill-to-disk mechanism is activated, for example for queries against
+ a large table with no statistics, or large join operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">65536</code> (64 KB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Units:</strong> A numeric argument represents a size in bytes; you can also use a suffix of <code class="ph codeph">m</code>
+ or <code class="ph codeph">mb</code> for megabytes, or <code class="ph codeph">g</code> or <code class="ph codeph">gb</code> for gigabytes. If you
+ specify a value in an unrecognized format, subsequent queries fail with an error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ This query option sets a lower bound on the size of the internal
+ buffer used during spill-to-disk operations. The
+ actual size of the buffer is chosen by the query planner.
+ </p>
+ <p class="p">
+ If overall query performance is limited by the time needed for spilling,
+ consider increasing the <code class="ph codeph">MIN_SPILLABLE_BUFFER_SIZE</code> setting.
+ Larger buffer sizes cause Impala to issue larger I/O requests to storage
+ devices, which can yield higher throughput, particularly on rotational
+ disks.
+ </p>
+ <p class="p">
+ The tradeoff with a large value for this setting is increased memory usage during
+ spill-to-disk operations. Lowering this value can reduce memory consumption.
+ </p>
+ <p class="p">
+ To determine whether this setting is capping the spillable buffer size,
+ check the buffer size chosen by the query planner for a particular query:
+ run <code class="ph codeph">EXPLAIN</code> on the query while the setting
+ <code class="ph codeph">EXPLAIN_LEVEL=2</code> is in effect.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+set min_spillable_buffer_size=128KB;
+
+</code></pre>
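+
+ <p class="p">
+ As a further sketch (the query and table names are hypothetical), you can verify
+ the planner's chosen buffer size by raising the explain verbosity and inspecting
+ the plan output:
+ </p>
+
+<pre class="pre codeblock"><code>
+set explain_level=2;
+set min_spillable_buffer_size=128kb;
+explain select c_name, count(*) from customer group by c_name;
+</code></pre>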
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_misc_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_misc_functions.html b/docs/build3x/html/topics/impala_misc_functions.html
new file mode 100644
index 0000000..4210a99
--- /dev/null
+++ b/docs/build3x/html/topics/impala_misc_functions.html
@@ -0,0 +1,175 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="misc_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Miscellaneous Functions</title></head><body id="misc_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Miscellaneous Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports the following utility functions that do not operate on a particular column or data type:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="misc_functions__current_database">
+ <code class="ph codeph">current_database()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the database that the session is currently using, either <code class="ph codeph">default</code>
+ if no database has been selected, or whatever database the session switched to through a
+ <code class="ph codeph">USE</code> statement or the <span class="keyword cmdname">impalad</span> <code class="ph codeph">-d</code> option.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
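+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          The following is an illustrative sketch of how the return value changes after a
+          <code class="ph codeph">USE</code> statement; the database name <code class="ph codeph">tpch</code> is a
+          hypothetical example.
+        </p>
+<pre class="pre codeblock"><code>
+select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| default            |
++--------------------+
+
+use tpch;
+select current_database();
++--------------------+
+| current_database() |
++--------------------+
+| tpch               |
++--------------------+
+</code></pre>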
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__effective_user">
+ <code class="ph codeph">effective_user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Typically returns the same value as <code class="ph codeph">user()</code>,
+ except if delegation is enabled, in which case it returns the ID of the delegated user.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.5</span>
+ </p>
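+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch comparing the two functions; the scenario is hypothetical, and the values
+          differ only when delegation is enabled.
+        </p>
+<pre class="pre codeblock"><code>
+-- Without delegation, both functions return the same value.
+select user(), effective_user();
+
+-- With delegation enabled, user() returns the connecting user
+-- (for example, a proxy application's service account), while
+-- effective_user() returns the ID of the delegated end user.
+</code></pre>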
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__pid">
+ <code class="ph codeph">pid()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns the process ID of the <span class="keyword cmdname">impalad</span> daemon that the session is
+        connected to. You can use it during low-level debugging, for example to issue Linux commands
+        that trace the <span class="keyword cmdname">impalad</span> process or show its arguments.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
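+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of combining <code class="ph codeph">pid()</code> with Linux commands; the process ID
+          shown is hypothetical.
+        </p>
+<pre class="pre codeblock"><code>
+select pid();   -- Suppose this returns 12345.
+
+-- Then, in a Linux shell on the same host, you could examine
+-- that impalad process, for example:
+--   ps -fp 12345
+--   cat /proc/12345/cmdline
+</code></pre>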
+ </dd>
+
+
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__user">
+ <code class="ph codeph">user()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the username of the Linux user who is connected to the <span class="keyword cmdname">impalad</span>
+ daemon. Typically called a single time, in a query without any <code class="ph codeph">FROM</code> clause, to
+ understand how authorization settings apply in a security context; once you know the logged-in username,
+ you can check which groups that user belongs to, and from the list of groups you can check which roles
+ are available to those groups through the authorization policy file.
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+ <p class="p">
+ When delegation is enabled, consider calling the <code class="ph codeph">effective_user()</code> function instead.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
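+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of a typical call; the user names shown are hypothetical.
+        </p>
+<pre class="pre codeblock"><code>
+select user();
+-- In a non-Kerberized environment, returns an OS user name such as 'jdoe'.
+-- In a Kerberized environment (Impala 2.0 and later), returns the full
+-- principal string, such as 'user@example.com'.
+</code></pre>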
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__uuid">
+ <code class="ph codeph">uuid()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns a <a class="xref" href="https://en.wikipedia.org/wiki/Universally_unique_identifier" target="_blank">universally unique identifier</a>, a 128-bit value encoded as a string with groups of hexadecimal digits separated by dashes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Ascending numeric sequences of type <code class="ph codeph">BIGINT</code> are often used
+ as identifiers within a table, and as join keys across multiple tables.
+ The <code class="ph codeph">uuid()</code> value is a convenient alternative that does not
+ require storing or querying the highest sequence number. For example, you
+ can use it to quickly construct new unique identifiers during a data import job,
+ or to combine data from different tables without the likelihood of ID collisions.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+-- Each call to uuid() produces a new arbitrary value.
+select uuid();
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| c7013e25-1455-457f-bf74-a2046e58caea |
++--------------------------------------+
+
+-- If you get a UUID for each row of a result set, you can use it as a
+-- unique identifier within a table, or even a unique ID across tables.
+select uuid() from four_row_table;
++--------------------------------------+
+| uuid() |
++--------------------------------------+
+| 51d3c540-85e5-4cb9-9110-604e53999e2e |
+| 0bb40071-92f6-4a59-a6a4-60d46e9703e2 |
+| 5e9d7c36-9842-4a96-862d-c13cd0457c02 |
+| cae29095-0cc0-4053-a5ea-7fcd3c780861 |
++--------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="misc_functions__version">
+ <code class="ph codeph">version()</code>
+ </dt>
+
+ <dd class="dd">
+
+        <strong class="ph b">Purpose:</strong> Returns information such as the precise version number and build date for the
+        <code class="ph codeph">impalad</code> daemon that you are currently connected to. Typically used to confirm that you
+        are connected to the expected level of Impala before using a particular feature, or to connect to several nodes
+        and confirm they are all running the same level of <span class="keyword cmdname">impalad</span>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code> (with one or more embedded newlines)
+ </p>
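+        <p class="p">
+          <strong class="ph b">Examples:</strong>
+        </p>
+        <p class="p">
+          A sketch of a typical call; the exact contents of the returned string vary by
+          release and build.
+        </p>
+<pre class="pre codeblock"><code>
+select version();
+-- Returns a multi-line string containing the version number,
+-- build hash, and build date of the connected impalad daemon.
+</code></pre>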
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mixed_security.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mixed_security.html b/docs/build3x/html/topics/impala_mixed_security.html
new file mode 100644
index 0000000..9cadbf7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mixed_security.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mixed_security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Multiple Authentication Methods with Impala</title></head><body id="mixed_security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Multiple Authentication Methods with Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala 2.0 and later automatically handles both Kerberos and LDAP authentication. Each
+ <span class="keyword cmdname">impalad</span> daemon can accept both Kerberos and LDAP requests through the same port. No
+ special actions need to be taken if some users authenticate through Kerberos and some through LDAP.
+ </p>
+
+ <p class="p">
+ Prior to Impala 2.0, you had to configure each <span class="keyword cmdname">impalad</span> to listen on a specific port
+ depending on the kind of authentication, then configure your network load balancer to forward each kind of
+ request to a DataNode that was set up with the appropriate authentication type. Once the initial request was
+ made using either Kerberos or LDAP authentication, Impala automatically handled the process of coordinating
+ the work across multiple nodes and transmitting intermediate results back to the coordinator node.
+ </p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_mt_dop.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_mt_dop.html b/docs/build3x/html/topics/impala_mt_dop.html
new file mode 100644
index 0000000..42d9591
--- /dev/null
+++ b/docs/build3x/html/topics/impala_mt_dop.html
@@ -0,0 +1,190 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mt_dop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MT_DOP Query Option</title></head><body id="mt_dop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MT_DOP Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Sets the degree of intra-node parallelism used for certain operations that
+ can benefit from multithreaded execution. You can specify values
+ higher than zero to find the ideal balance of response time,
+ memory usage, and CPU usage during statement processing.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala execution engine is being revamped incrementally to add
+ additional parallelism within a single host for certain statements and
+ kinds of operations. The setting <code class="ph codeph">MT_DOP=0</code> uses the
+ <span class="q">"old"</span> code path with limited intra-node parallelism.
+ </p>
+
+ <p class="p">
+ Currently, the operations affected by the <code class="ph codeph">MT_DOP</code>
+ query option are:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE [INCREMENTAL] STATS</code>. Impala automatically sets
+ <code class="ph codeph">MT_DOP=4</code> for <code class="ph codeph">COMPUTE STATS</code> and
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements on Parquet tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Queries with execution plans containing only scan and aggregation operators,
+ or local joins that do not need data exchanges (such as for nested types).
+ Other queries produce an error if <code class="ph codeph">MT_DOP</code> is set to a non-zero
+ value. Therefore, this query option is typically only set for the duration of
+ specific long-running, CPU-intensive queries.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">0</code>
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">COMPUTE STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ statements for Parquet tables benefit substantially from extra intra-node
+ parallelism, Impala automatically sets <code class="ph codeph">MT_DOP=4</code> when computing stats
+ for Parquet tables.
+ </p>
+ <p class="p">
+ <strong class="ph b">Range:</strong> 0 to 64
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Any timing figures in the following examples are on a small, lightly loaded development cluster.
+ Your mileage may vary. Speedups depend on many factors, including the number of rows, columns, and
+ partitions within each table.
+ </p>
+ </div>
+
+ <p class="p">
+ The following example shows how to run a <code class="ph codeph">COMPUTE STATS</code>
+ statement against a Parquet table with or without an explicit <code class="ph codeph">MT_DOP</code>
+ setting:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Explicitly setting MT_DOP to 0 selects the old code path.
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- The analysis for the billion rows is distributed among hosts,
+-- but uses only a single core on each host.
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Using 4 logical processors per host is faster.
+set mt_dop = 4;
+MT_DOP set to 4
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+drop stats billion_rows_parquet;
+
+-- Unsetting the option reverts it to its default,
+-- which for COMPUTE STATS on a Parquet table is 4,
+-- so again it uses the fast path.
+unset MT_DOP;
+Unsetting option MT_DOP
+
+compute stats billion_rows_parquet;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows the effects of setting <code class="ph codeph">MT_DOP</code>
+ for a query involving only scan and aggregation operations for a Parquet table:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop = 0;
+MT_DOP set to 0
+
+-- COUNT(DISTINCT) for a unique column is CPU-intensive.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 67.20s
+
+set mt_dop = 16;
+MT_DOP set to 16
+
+-- Introducing more intra-node parallelism for the aggregation
+-- speeds things up, and potentially reduces memory overhead by
+-- reducing the number of scanner threads.
+select count(distinct id) from billion_rows_parquet;
++--------------------+
+| count(distinct id) |
++--------------------+
+| 1000000000 |
++--------------------+
+Fetched 1 row(s) in 17.19s
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how queries that are not compatible with non-zero
+ <code class="ph codeph">MT_DOP</code> settings produce an error when <code class="ph codeph">MT_DOP</code>
+ is set:
+ </p>
+
+<pre class="pre codeblock"><code>
+set mt_dop=1;
+MT_DOP set to 1
+
+select * from a1 inner join a2
+ on a1.id = a2.id limit 4;
+ERROR: NotImplementedException: MT_DOP not supported for plans with
+ base table joins or table sinks.
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_aggregate_functions.html">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ndv.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ndv.html b/docs/build3x/html/topics/impala_ndv.html
new file mode 100644
index 0000000..a3f7e2c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ndv.html
@@ -0,0 +1,226 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ndv"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NDV Function</title></head><body id="ndv"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NDV Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns an approximate value similar to the result of <code class="ph codeph">COUNT(DISTINCT
+ <var class="keyword varname">col</var>)</code>, the <span class="q">"number of distinct values"</span>. It is much faster than the
+ combination of <code class="ph codeph">COUNT</code> and <code class="ph codeph">DISTINCT</code>, and uses a constant amount of memory and
+ thus is less memory-intensive for columns with high cardinality.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NDV([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This is the mechanism used internally by the <code class="ph codeph">COMPUTE STATS</code> statement for computing the
+ number of distinct values in a column.
+ </p>
+
+ <p class="p">
+ Because this number is an estimate, it might not reflect the precise number of different values in the
+ column, especially if the cardinality is very low or very high. If the estimated number is higher than the
+ number of rows in the table, Impala adjusts the value internally during query planning.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example queries a billion-row table to illustrate the relative performance of
+ <code class="ph codeph">COUNT(DISTINCT)</code> and <code class="ph codeph">NDV()</code>. It shows how <code class="ph codeph">COUNT(DISTINCT)</code>
+ gives a precise answer, but is inefficient for large-scale data where an approximate result is sufficient.
+ The <code class="ph codeph">NDV()</code> function gives an approximate result but is much faster.
+ </p>
+
+<pre class="pre codeblock"><code>select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000 |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select cast(ndv(col1) as bigint) as col1 from sample_data;
++----------+
+| col1 |
++----------+
+| 139017 |
++----------+
+Fetched 1 row(s) in 8.91s
+</code></pre>
+
+ <p class="p">
+ The following example shows how you can code multiple <code class="ph codeph">NDV()</code> calls in a single query, to
+ easily learn which columns have substantially more or fewer distinct values. This technique is faster than
+ running a sequence of queries with <code class="ph codeph">COUNT(DISTINCT)</code> calls.
+ </p>
+
+<pre class="pre codeblock"><code>select cast(ndv(col1) as bigint) as col1, cast(ndv(col2) as bigint) as col2,
+ cast(ndv(col3) as bigint) as col3, cast(ndv(col4) as bigint) as col4
+ from sample_data;
++----------+-----------+------------+-----------+
+| col1 | col2 | col3 | col4 |
++----------+-----------+------------+-----------+
+| 139017 | 282 | 46 | 145636240 |
++----------+-----------+------------+-----------+
+Fetched 1 row(s) in 34.97s
+
+select count(distinct col1) from sample_data;
++---------------------+
+| count(distinct col1)|
++---------------------+
+| 100000 |
++---------------------+
+Fetched 1 row(s) in 20.13s
+
+select count(distinct col2) from sample_data;
++----------------------+
+| count(distinct col2) |
++----------------------+
+| 278 |
++----------------------+
+Fetched 1 row(s) in 20.09s
+
+select count(distinct col3) from sample_data;
++-----------------------+
+| count(distinct col3) |
++-----------------------+
+| 46 |
++-----------------------+
+Fetched 1 row(s) in 19.12s
+
+select count(distinct col4) from sample_data;
++----------------------+
+| count(distinct col4) |
++----------------------+
+| 147135880 |
++----------------------+
+Fetched 1 row(s) in 266.95s
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_replica_preference.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_replica_preference.html b/docs/build3x/html/topics/impala_replica_preference.html
new file mode 100644
index 0000000..38b8698
--- /dev/null
+++ b/docs/build3x/html/topics/impala_replica_preference.html
@@ -0,0 +1,68 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="replica_preference"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</title></head><body id="replica_preference"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REPLICA_PREFERENCE Query Option (<span class="keyword">Impala 2.7</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+      The <code class="ph codeph">REPLICA_PREFERENCE</code> query option lets you distribute the work more
+      evenly when hotspots or bottlenecks persist. It causes the access cost of every replica of a
+      data block to be treated as equal to or worse than the configured value. This allows
+      Impala to schedule reads to suboptimal replicas (for example, local replicas when cached
+      replicas exist) in order to distribute the work across more executor nodes.
+ </p>
+
+ <p class="p">
+ Allowed values are: <code class="ph codeph">CACHE_LOCAL</code> (<code class="ph codeph">0</code>),
+ <code class="ph codeph">DISK_LOCAL</code> (<code class="ph codeph">2</code>), <code class="ph codeph">REMOTE</code>
+ (<code class="ph codeph">4</code>)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Enum
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">CACHE_LOCAL (0)</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.7.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage Notes:</strong>
+ </p>
+
+ <p class="p">
+      By default, Impala selects the best replica it can find in terms of access cost. The
+      preferred order is cached, local, then remote. With <code class="ph codeph">REPLICA_PREFERENCE</code>,
+      the preference of all replicas is capped at the selected value. For example, when
+      <code class="ph codeph">REPLICA_PREFERENCE</code> is set to <code class="ph codeph">DISK_LOCAL</code>, cached and
+      local replicas are treated with equal preference. When set to
+      <code class="ph codeph">REMOTE</code>, all three types of replicas (cached, local, and remote) are treated
+      with equal preference.
+ </p>
+
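+    <p class="p">
+      The following sketch shows one way to set the option for a session; the table name
+      <code class="ph codeph">sample_table</code> is hypothetical.
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Treat cached and local replicas as equally preferable, so that
+-- reads can be spread across more executor hosts.
+set replica_preference=disk_local;
+
+select count(*) from sample_table;
+</code></pre>
+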
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ <a class="xref" href="impala_schedule_random_replica.html#schedule_random_replica">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_request_pool.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_request_pool.html b/docs/build3x/html/topics/impala_request_pool.html
new file mode 100644
index 0000000..39be2da
--- /dev/null
+++ b/docs/build3x/html/topics/impala_request_pool.html
@@ -0,0 +1,35 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="request_pool"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REQUEST_POOL Query Option</title></head><body id="request_pool"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REQUEST_POOL Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Specifies the pool or queue that queries are submitted to, that is, the name of the
+      resource pool used for requests from Impala. This option applies only when the Impala
+      admission control feature is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> empty (use the user-to-pool mapping defined by an <span class="keyword cmdname">impalad</span> startup option
+ in the Impala configuration file)
+ </p>
+
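+    <p class="p">
+      The following sketch shows how you might direct a session's queries to a particular
+      pool; the pool name <code class="ph codeph">high_priority</code> is hypothetical and must match a
+      pool defined in your admission control configuration.
+    </p>
+
+<pre class="pre codeblock"><code>
+set request_pool=high_priority;
+
+-- Subsequent queries in this session are submitted to that pool.
+select count(*) from sample_table;
+</code></pre>
+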
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_admission.html">Admission Control and Query Queuing</a>
+ </p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_num_nodes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_num_nodes.html b/docs/build3x/html/topics/impala_num_nodes.html
new file mode 100644
index 0000000..691bab9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_num_nodes.html
@@ -0,0 +1,61 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_nodes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_NODES Query Option</title></head><body id="num_nodes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NUM_NODES Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Limit the number of nodes that process a query, typically during debugging.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+<p class="p">
+ <strong class="ph b">Allowed values:</strong> Only accepts the values 0
+ (meaning all nodes) or 1 (meaning all work is done on the coordinator node).
+</p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you are diagnosing a problem that you suspect is due to a timing issue due to
+ distributed query processing, you can set <code class="ph codeph">NUM_NODES=1</code> to verify
+ if the problem still occurs when all the work is done on a single node.
+ </p>
+
+ <p class="p">
+ You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+ files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+ partitioned table, the default behavior could produce many small files when intuitively you might expect
+ only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+ write operation, making it more likely to produce only one or a few data files.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <p class="p">
+ Because this option results in increased resource utilization on a single host,
+ it could cause problems due to contention with other Impala statements or
+ high resource usage. Symptoms could include queries running slowly, exceeding the memory limit,
+ or appearing to hang. Use it only in a single-user development/test environment;
+ <strong class="ph b">do not</strong> use it in a production environment or in a cluster with a high-concurrency
+ or high-volume or performance-critical workload.
+ </p>
+ </div>
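+ <p class="p">
+ For example, the following sketch of an <span class="keyword cmdname">impala-shell</span> session
+ (the table names are placeholders) temporarily restricts a write operation to the coordinator node
+ so that it produces a single output file, then restores the default distributed behavior:
+ </p>
+
+<pre class="pre codeblock"><code>set num_nodes=1;
+insert overwrite table parquet_table select * from text_table;
+set num_nodes=0;
+</code></pre>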
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_num_scanner_threads.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_num_scanner_threads.html b/docs/build3x/html/topics/impala_num_scanner_threads.html
new file mode 100644
index 0000000..0617048
--- /dev/null
+++ b/docs/build3x/html/topics/impala_num_scanner_threads.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="num_scanner_threads"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>NUM_SCANNER_THREADS Query Option</title></head><body id="num_scanner_threads"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">NUM_SCANNER_THREADS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Maximum number of scanner threads (on each node) used for each query. By default, Impala uses as many cores
+ as are available (one thread per core). You might lower this value if queries are using excessive resources
+ on a busy cluster. Impala imposes a maximum value automatically, so a high value has no practical effect.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
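+ <p class="p">
+ For example, on a busy cluster you might cap the number of scanner threads per node for the
+ current session (the value 2 here is arbitrary, and <code class="ph codeph">large_table</code>
+ is a placeholder name):
+ </p>
+
+<pre class="pre codeblock"><code>set num_scanner_threads=2;
+select count(*) from large_table;
+</code></pre>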
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_odbc.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_odbc.html b/docs/build3x/html/topics/impala_odbc.html
new file mode 100644
index 0000000..9d73173
--- /dev/null
+++ b/docs/build3x/html/topics/impala_odbc.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_odbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with ODBC</title></head><body id="impala_odbc"><main role="main"><article role="article" aria-labelledby="impala_odbc__odbc">
+
+ <h1 class="title topictitle1" id="impala_odbc__odbc">Configuring Impala to Work with ODBC</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Third-party products, especially business intelligence and reporting tools, can access Impala
+ using the ODBC protocol. For the best experience, ensure any third-party product you intend to use is supported.
+ Verifying support includes checking that the versions of Impala, ODBC, the operating system, the
+ Apache Hadoop distribution, and the third-party product have all been approved by the appropriate suppliers
+ for use together. To configure your systems to use ODBC, download and install a connector, typically from
+ the supplier of the third-party product or the Hadoop distribution.
+ You may need to sign in and accept license agreements before accessing the pages required for downloading
+ ODBC connectors.
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_offset.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_offset.html b/docs/build3x/html/topics/impala_offset.html
new file mode 100644
index 0000000..b96e7af
--- /dev/null
+++ b/docs/build3x/html/topics/impala_offset.html
@@ -0,0 +1,67 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="offset"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OFFSET Clause</title></head><body id="offset"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">OFFSET Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">OFFSET</code> clause in a <code class="ph codeph">SELECT</code> query causes the result set to start some
+ number of rows after the logical first item. The result set is numbered starting from zero, so <code class="ph codeph">OFFSET
+ 0</code> produces the same result as leaving out the <code class="ph codeph">OFFSET</code> clause. Always use this clause
+ in combination with <code class="ph codeph">ORDER BY</code> (so that it is clear which item should be first, second, and so
+ on) and <code class="ph codeph">LIMIT</code> (so that the result set covers a bounded range, such as items 0-9, 100-199,
+ and so on).
+ </p>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+ clause to produce a small result set that is different from a top-N query, for example, to return items 11
+ through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+ involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+ rewrite the application logic. For best performance and scalability, wherever practical, query as many
+ items as you expect to need, cache them on the application side, and display small groups of results to
+ users using application logic.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you could run a <span class="q">"paging"</span> query originally written for a traditional
+ database application. Because typical Impala queries process megabytes or gigabytes of data and read large
+ data files from disk each time, it is inefficient to run a separate query to retrieve each small group of
+ items. Use this technique only for compatibility while porting older applications, then rewrite the
+ application code to use a single query with a large result set, and display pages of results from the cached
+ result set.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers select x from very_long_sequence;
+Inserted 1000000 rows in 1.34s
+[localhost:21000] > select x from numbers order by x limit 5 offset 0;
++----+
+| x |
++----+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
++----+
+[localhost:21000] > select x from numbers order by x limit 5 offset 5;
++----+
+| x |
++----+
+| 6 |
+| 7 |
+| 8 |
+| 9 |
+| 10 |
++----+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html b/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
new file mode 100644
index 0000000..2799c1f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shuffle_distinct_exprs.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shuffle_distinct_exprs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SHUFFLE_DISTINCT_EXPRS Query Option</title></head><body id="shuffle_distinct_exprs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SHUFFLE_DISTINCT_EXPRS Query Option</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHUFFLE_DISTINCT_EXPRS</code> query option controls the
+ shuffling behavior when a query has both grouping and distinct expressions.
+ Impala can optionally include the distinct expressions in the hash exchange
+ to spread the data among more nodes. However, this plan requires one more
+ hash exchange phase.
+ </p>
+
+ <p class="p">
+ It is recommended that you turn off this option if the number of distinct values (NDV) of the
+ grouping expressions is high.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
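+ <p class="p">
+ For example, to enable the extra hash exchange phase for the current session before running a
+ query that combines grouping and distinct expressions (the table and column names below are
+ placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>set shuffle_distinct_exprs=true;
+select region, count(distinct user_id) from events group by region;
+</code></pre>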
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_smallint.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_smallint.html b/docs/build3x/html/topics/impala_smallint.html
new file mode 100644
index 0000000..86d5089
--- /dev/null
+++ b/docs/build3x/html/topics/impala_smallint.html
@@ -0,0 +1,127 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="smallint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SMALLINT Data Type</title></head><body id="smallint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SMALLINT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A 2-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> SMALLINT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -32768 .. 32767. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts to a larger integer type (<code class="ph codeph">INT</code> or
+ <code class="ph codeph">BIGINT</code>) or a floating-point type (<code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>)
+ automatically. Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">STRING</code>,
+ or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">SMALLINT</code> type, call the
+ functions <code class="ph codeph">MIN_SMALLINT()</code> and <code class="ph codeph">MAX_SMALLINT()</code>.
+ </p>
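+ <p class="p">
+ For example, you can confirm the range shown above directly:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT MIN_SMALLINT(), MAX_SMALLINT();  -- returns -32768 and 32767
+</code></pre>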
+
+ <p class="p">
+ If an integer value is too large to be represented as a <code class="ph codeph">SMALLINT</code>, use an
+ <code class="ph codeph">INT</code> instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x SMALLINT);
+SELECT CAST(1000 AS SMALLINT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Physically, Parquet files represent <code class="ph codeph">TINYINT</code> and <code class="ph codeph">SMALLINT</code> values as 32-bit
+ integers. Although Impala rejects attempts to insert out-of-range values into such columns, if you create a
+ new table with the <code class="ph codeph">CREATE TABLE ... LIKE PARQUET</code> syntax, any <code class="ph codeph">TINYINT</code> or
+ <code class="ph codeph">SMALLINT</code> columns in the original table turn into <code class="ph codeph">INT</code> columns in the new
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 2-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ssl.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ssl.html b/docs/build3x/html/topics/impala_ssl.html
new file mode 100644
index 0000000..a91b69b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ssl.html
@@ -0,0 +1,180 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ssl"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring TLS/SSL for Impala</title></head><body id="ssl"><main role="main"><article role="article" aria-labelledby="ssl__tls">
+
+ <h1 class="title topictitle1" id="ssl__tls">Configuring TLS/SSL for Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports TLS/SSL network encryption, between Impala and client
+ programs, and between the Impala-related daemons running on different nodes
+ in the cluster. This feature is important when you also use other features such as Kerberos
+ authentication or Sentry authorization, where credentials are being
+ transmitted back and forth.
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ssl__concept_q1p_j2d_rp">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Using the Command Line</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ To enable SSL when client applications connect to Impala, add both of the following flags to the <span class="keyword cmdname">impalad</span> startup options:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_i2p_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl_server_certificate</code>: the full path to the server certificate, on the local filesystem.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ssl_private_key</code>: the full path to the server private key, on the local filesystem.
+ </li>
+ </ul>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, Impala can also use SSL for its own internal communication between the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <code class="ph codeph">catalogd</code> daemons.
+ To enable this additional SSL encryption, set the <code class="ph codeph">--ssl_server_certificate</code>
+ and <code class="ph codeph">--ssl_private_key</code> flags in the startup options for
+ <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>, and <span class="keyword cmdname">statestored</span>,
+ and also add the <code class="ph codeph">--ssl_client_ca_certificate</code> flag for all three of those daemons.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ Prior to <span class="keyword">Impala 2.3.2</span>, you could enable Kerberos authentication between Impala internal components,
+ or SSL encryption between Impala internal components, but not both at the same time.
+ This restriction has now been lifted.
+ See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>
+ to see the maintenance releases for different levels of Impala where the fix has been published.
+ </div>
+
+ <p class="p">
+ If either of these flags is set, both must be set. In that case, Impala starts listening for Beeswax and HiveServer2 requests on
+ SSL-secured ports only. (The port numbers stay the same; see <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for details.)
+ </p>
+
+ <p class="p">
+ Since Impala uses passphrase-less certificates in PEM format, you can reuse a host's existing Java keystore
+ by using the <code class="ph codeph">openssl</code> toolkit to convert it to the PEM format.
+ </p>
+
+ <section class="section" id="concept_q1p_j2d_rp__secref"><h3 class="title sectiontitle">Configuring TLS/SSL Communication for the Impala Shell</h3>
+
+
+
+ <p class="p">
+ With SSL enabled for Impala, use the following options when starting the
+ <span class="keyword cmdname">impala-shell</span> interpreter:
+ </p>
+
+ <ul class="ul" id="concept_q1p_j2d_rp__ul_kgp_m2d_rp">
+ <li class="li">
+ <code class="ph codeph">--ssl</code>: enables TLS/SSL for <span class="keyword cmdname">impala-shell</span>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ca_cert</code>: the local pathname pointing to the third-party CA certificate, or to a copy of the server
+ certificate for self-signed server certificates.
+ </li>
+ </ul>
+
+ <p class="p">
+ If <code class="ph codeph">--ca_cert</code> is not set, <span class="keyword cmdname">impala-shell</span> enables TLS/SSL, but does not validate the server
+ certificate. This is useful for connecting to a known-good Impala that is only running over TLS/SSL, when a copy of the
+ certificate is not available (such as when debugging customer installations).
+ </p>
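+ <p class="p">
+ For example, a connection to a TLS/SSL-enabled cluster might look like the following
+ (the host name and certificate path are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>impala-shell --ssl --ca_cert=/path/to/ca_cert.pem -i secure-host.example.com
+</code></pre>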
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ssl__ssl_jdbc_odbc">
+ <h2 class="title topictitle2" id="ariaid-title3">Using TLS/SSL with Business Intelligence Tools</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+ connections from JDBC and ODBC applications to Impala.
+ See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ for details.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ssl__tls_min_version">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying TLS/SSL Minimum Allowed Version and Ciphers</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Depending on your cluster configuration and the security practices in your
+ organization, you might need to restrict the allowed versions of TLS/SSL
+ used by Impala. Older TLS/SSL versions might have vulnerabilities or lack
+ certain features. In <span class="keyword">Impala 2.10</span>, you can use startup
+ options for the <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+ and <span class="keyword cmdname">statestored</span> daemons to specify a minimum allowed
+ version of TLS/SSL.
+ </p>
+
+ <p class="p">
+ Specify one of the following values for the <code class="ph codeph">--ssl_minimum_version</code>
+ configuration setting:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1</code>: Allow any TLS version of 1.0 or higher.
+ This setting is the default when TLS/SSL is enabled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1.1</code>: Allow any TLS version of 1.1 or higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">tlsv1.2</code>: Allow any TLS version of 1.2 or higher.
+ </p>
+ </li>
+ </ul>
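+ <p class="p">
+ For example, to reject connections that use anything older than TLS 1.2, you might include the
+ following in the startup options for each of the three daemons:
+ </p>
+
+<pre class="pre codeblock"><code>
+--ssl_minimum_version=tlsv1.2
+</code></pre>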
+
+ <p class="p">
+ Along with specifying the version, you can also specify the allowed set of TLS ciphers
+ by using the <code class="ph codeph">--ssl_cipher_list</code> configuration setting. The argument to
+ this option is a list of keywords, separated by colons, commas, or spaces, and
+ optionally including other notation. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+--ssl_cipher_list="RC4-SHA,RC4-MD5"
+</code></pre>
+
+ <p class="p">
+ By default, the cipher list is empty, and Impala uses the default cipher list for
+ the underlying platform. See the output of <span class="keyword cmdname">man ciphers</span> for the full
+ set of keywords and notation allowed in the argument string.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_stddev.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_stddev.html b/docs/build3x/html/topics/impala_stddev.html
new file mode 100644
index 0000000..e775089
--- /dev/null
+++ b/docs/build3x/html/topics/impala_stddev.html
@@ -0,0 +1,121 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="stddev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STDDEV, STDDEV_SAMP, STDDEV_POP Functions</title></head><body id="stddev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ An aggregate function that returns the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Standard_deviation" target="_blank">standard
+ deviation</a> of a set of numbers.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>{ STDDEV | STDDEV_SAMP | STDDEV_POP } ([DISTINCT | ALL] <var class="keyword varname">expression</var>)</code></pre>
+
+ <p class="p">
+ This function works with any numeric data type.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> in Impala 2.0 and higher; <code class="ph codeph">STRING</code> in earlier
+ releases
+ </p>
+
+ <p class="p">
+ This function is typically used in mathematical formulas related to probability distributions.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV_POP()</code> and <code class="ph codeph">STDDEV_SAMP()</code> functions compute the population
+ standard deviation and sample standard deviation, respectively, of the input values.
+ (<code class="ph codeph">STDDEV()</code> is an alias for <code class="ph codeph">STDDEV_SAMP()</code>.) Both functions evaluate all input
+ rows matched by the query. The difference is that <code class="ph codeph">STDDEV_SAMP()</code> is scaled by
+ <code class="ph codeph">1/(N-1)</code> while <code class="ph codeph">STDDEV_POP()</code> is scaled by <code class="ph codeph">1/N</code>.
+ </p>
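The 1/(N-1) versus 1/N scaling can be checked outside Impala. The following is a small Python sketch (not Impala code) that computes both forms by hand and confirms them against the standard library's statistics module; the score values are made up for illustration:

```python
# Sketch (not Impala code): contrast sample vs. population standard deviation,
# mirroring STDDEV_SAMP (1/(N-1) scaling) and STDDEV_POP (1/N scaling).
import math
import statistics

scores = [10.0, 20.0, 30.0, 40.0, 50.0]  # hypothetical test scores
n = len(scores)
mean = sum(scores) / n
ss = sum((x - mean) ** 2 for x in scores)  # sum of squared deviations

stddev_samp = math.sqrt(ss / (n - 1))  # sample: divide by N-1
stddev_pop = math.sqrt(ss / n)         # population: divide by N

# The standard library agrees with the hand-rolled formulas.
assert math.isclose(stddev_samp, statistics.stdev(scores))
assert math.isclose(stddev_pop, statistics.pstdev(scores))
```

Because the sample form divides by the smaller N-1, STDDEV_SAMP() is always slightly larger than STDDEV_POP() for the same input, matching the 28.5 versus 28.4858 results in the example below.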
+
+    <p class="p">
+      If no input rows match the query, the result of any of these functions is <code class="ph codeph">NULL</code>. If a single
+      input row matches the query, the result of any of these functions is <code class="ph codeph">0.0</code>.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+    <p class="p">
+      This example demonstrates how <code class="ph codeph">STDDEV()</code> and <code class="ph codeph">STDDEV_SAMP()</code> return the same
+      result, while <code class="ph codeph">STDDEV_POP()</code> uses a slightly different calculation to reflect that the input
+      data represents the entire <span class="q">"population"</span>, rather than a sample of it.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select stddev(score) from test_scores;
++---------------+
+| stddev(score) |
++---------------+
+| 28.5 |
++---------------+
+[localhost:21000] > select stddev_samp(score) from test_scores;
++--------------------+
+| stddev_samp(score) |
++--------------------+
+| 28.5 |
++--------------------+
+[localhost:21000] > select stddev_pop(score) from test_scores;
++-------------------+
+| stddev_pop(score) |
++-------------------+
+| 28.4858 |
++-------------------+
+</code></pre>
+
+    <p class="p">
+      This example demonstrates using <code class="ph codeph">CAST</code> to convert the
+      <code class="ph codeph">DOUBLE</code> result to a <code class="ph codeph">DECIMAL</code> value with a
+      fixed precision and scale, for example when storing the result in a table column.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table score_stats as select cast(stddev(score) as decimal(7,4)) `standard_deviation`, cast(variance(score) as decimal(7,4)) `variance` from test_scores;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc score_stats;
++--------------------+--------------+---------+
+| name | type | comment |
++--------------------+--------------+---------+
+| standard_deviation | decimal(7,4) | |
+| variance | decimal(7,4) | |
++--------------------+--------------+---------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STDDEV()</code>, <code class="ph codeph">STDDEV_POP()</code>, and <code class="ph codeph">STDDEV_SAMP()</code> functions
+ compute the standard deviation (square root of the variance) based on the results of
+ <code class="ph codeph">VARIANCE()</code>, <code class="ph codeph">VARIANCE_POP()</code>, and <code class="ph codeph">VARIANCE_SAMP()</code>
+ respectively. See <a class="xref" href="impala_variance.html#variance">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a> for details about the variance property.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_string.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_string.html b/docs/build3x/html/topics/impala_string.html
new file mode 100644
index 0000000..6f594ca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_string.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="string"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>STRING Data Type</title></head><body id="string"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">STRING Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> STRING</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Length:</strong> Maximum of 32,767 bytes. Do not use any length constraint when declaring
+ <code class="ph codeph">STRING</code> columns, as you might be familiar with from <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">CHAR</code>, or similar column types from relational database systems. <span class="ph">If you do
+ need to manipulate string values with precise or maximum lengths, in Impala 2.0 and higher you can declare
+ columns as <code class="ph codeph">VARCHAR(<var class="keyword varname">max_length</var>)</code> or
+ <code class="ph codeph">CHAR(<var class="keyword varname">length</var>)</code>, but for best performance use <code class="ph codeph">STRING</code>
+ where practical.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Character sets:</strong> For full support in all Impala subsystems, restrict string values to the ASCII
+ character set. Although some UTF-8 character data can be stored in Impala and retrieved through queries, UTF-8 strings
+ containing non-ASCII characters are not guaranteed to work properly in combination with many SQL aspects,
+ including but not limited to:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ String manipulation functions.
+ </li>
+ <li class="li">
+ Comparison operators.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ORDER BY</code> clause.
+ </li>
+ <li class="li">
+ Values in partition key columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ For any national language aspects such as
+ collation order or interpreting extended ASCII variants such as ISO-8859-1 or ISO-8859-2 encodings, Impala
+ does not include such metadata with the table definition. If you need to sort, manipulate, or display data
+ depending on those national language characteristics of string data, use logic on the application side.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala does not automatically convert <code class="ph codeph">STRING</code> to any numeric type. Impala does
+ automatically convert <code class="ph codeph">STRING</code> to <code class="ph codeph">TIMESTAMP</code> if the value matches one of
+ the accepted <code class="ph codeph">TIMESTAMP</code> formats; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">STRING</code> values to
+ <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot directly cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>. You can use a
+ <code class="ph codeph">CASE</code> expression to evaluate string values such as <code class="ph codeph">'T'</code>,
+ <code class="ph codeph">'true'</code>, and so on and return Boolean <code class="ph codeph">true</code> and <code class="ph codeph">false</code>
+ values as appropriate.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can cast a <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code>
+ for <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+ </p>
+ </li>
+ </ul>
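The CASE-style mapping described above can be sketched outside SQL. The following Python sketch (not Impala code; the function name and accepted spellings are illustrative assumptions) shows the kind of explicit string-to-Boolean mapping you would express with a CASE expression:

```python
# Sketch (not Impala code): emulate a CASE expression that maps common
# truthy/falsy string spellings to Boolean values, since STRING cannot be
# cast directly to BOOLEAN. The accepted spellings here are hypothetical.
def string_to_boolean(s):
    if s is None:
        return None  # NULL input stays NULL
    lowered = s.strip().lower()
    if lowered in ("t", "true", "1", "yes"):
        return True
    if lowered in ("f", "false", "0", "no"):
        return False
    return None  # unrecognized spellings map to NULL

assert string_to_boolean("T") is True
assert string_to_boolean("false") is False
assert string_to_boolean("maybe") is None
```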
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Although it might be convenient to use <code class="ph codeph">STRING</code> columns for partition keys, even when those
+ columns contain numbers, for performance and scalability it is much better to use numeric columns as
+ partition keys whenever practical. Although the underlying HDFS directory name might be the same in either
+ case, the in-memory storage for the partition key columns is more compact, and computations are faster, if
+ partition key columns such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, <code class="ph codeph">DAY</code> and so on
+ are declared as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>, and so on.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
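By analogy, the following Python sketch (not Impala code) shows the three values acting as distinct grouping keys, the way GROUP BY or DISTINCT treats them:

```python
# Sketch (not Impala code): the empty string, NULL (modeled as None), and a
# single space act as three distinct keys when grouping, as in GROUP BY.
from collections import Counter

values = ["", None, " ", "", None]
groups = Counter(values)

assert len(groups) == 3   # three distinct values
assert groups[""] == 2
assert groups[None] == 2
assert groups[" "] == 1
```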
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+ <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because the values of this type have variable size, none of the
+ column statistics fields are filled in until you run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate double-quoted and single-quoted string literals, and required escaping for
+ quotation marks within string literals:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT 'I am a single-quoted string';
+SELECT "I am a double-quoted string";
+SELECT 'I\'m a single-quoted string with an apostrophe';
+SELECT "I\'m a double-quoted string with an apostrophe";
+SELECT 'I am a "short" single-quoted string containing quotes';
+SELECT "I am a \"short\" double-quoted string containing quotes";
+</code></pre>
+
+ <p class="p">
+ The following examples demonstrate calls to string manipulation functions to concatenate strings, convert
+ numbers to strings, or pull out substrings:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CONCAT("Once upon a time, there were ", CAST(3 AS STRING), ' little pigs.');
+SELECT SUBSTR("hello world",7,5);
+</code></pre>
+
+ <p class="p">
+ The following examples show how to perform operations on <code class="ph codeph">STRING</code> columns within a table:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (s1 STRING, s2 STRING);
+INSERT INTO t1 VALUES ("hello", 'world'), (CAST(7 AS STRING), "wonders");
+SELECT s1, s2, length(s1) FROM t1 WHERE s2 LIKE 'w%';
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#string_literals">String Literals</a>, <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>,
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_table.html b/docs/build3x/html/topics/impala_create_table.html
new file mode 100644
index 0000000..e2c3528
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_table.html
@@ -0,0 +1,1346 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE TABLE Statement</title></head><body class="impala sql_statement" id="create_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1 impala_title sql_statement_title" id="ariaid-title1">CREATE TABLE Statement</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a new table and specifies its characteristics. While creating a table, you
+ optionally specify aspects such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Whether the table is internal or external.
+ </li>
+
+ <li class="li">
+ The columns and associated data types.
+ </li>
+
+ <li class="li">
+ The columns used for physically partitioning the data.
+ </li>
+
+ <li class="li">
+ The file format for data files.
+ </li>
+
+ <li class="li">
+ The HDFS directory where the data files are located.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The general syntax for creating a table and specifying its columns is as follows:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Explicit column definitions:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+ [COMMENT '<var class="keyword varname">col_comment</var>']
+ [, ...]
+ )
+ [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] <var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ <span class="ph">[PARTITIONED BY (<var class="keyword varname">col_name</var>[, ...])]</span>
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] <span class="ph">[STORED AS <var class="keyword varname">ctas_file_format</var>]</span>
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+AS
+ <var class="keyword varname">select_statement</var></code></pre>
+
+<pre class="pre codeblock"><code>primitive_type:
+ TINYINT
+ | SMALLINT
+ | INT
+ | BIGINT
+ | BOOLEAN
+ | FLOAT
+ | DOUBLE
+ <span class="ph">| DECIMAL</span>
+ | STRING
+ <span class="ph">| CHAR</span>
+ <span class="ph">| VARCHAR</span>
+ | TIMESTAMP
+
+<span class="ph">complex_type:
+ struct_type
+ | array_type
+ | map_type
+
+struct_type: STRUCT < <var class="keyword varname">name</var> : <var class="keyword varname">primitive_or_complex_type</var> [COMMENT '<var class="keyword varname">comment_string</var>'], ... >
+
+array_type: ARRAY < <var class="keyword varname">primitive_or_complex_type</var> >
+
+map_type: MAP < <var class="keyword varname">primitive_type</var>, <var class="keyword varname">primitive_or_complex_type</var> >
+</span>
+row_format:
+ DELIMITED [FIELDS TERMINATED BY '<var class="keyword varname">char</var>' [ESCAPED BY '<var class="keyword varname">char</var>']]
+ [LINES TERMINATED BY '<var class="keyword varname">char</var>']
+
+file_format:
+ PARQUET
+ | TEXTFILE
+ | AVRO
+ | SEQUENCEFILE
+ | RCFILE
+
+<span class="ph">ctas_file_format:
+ PARQUET
+ | TEXTFILE</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Column definitions inferred from data file:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ LIKE PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>'
+ <span class="ph">[SORT BY ([<var class="keyword varname">column</var> [, <var class="keyword varname">column</var> ...]])]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [PARTITIONED BY (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var> [COMMENT '<var class="keyword varname">col_comment</var>'], ...)]
+ [WITH SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+ [
+ [ROW FORMAT <var class="keyword varname">row_format</var>] [STORED AS <var class="keyword varname">file_format</var>]
+ ]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+<span class="ph"> [CACHED IN '<var class="keyword varname">pool_name</var>'</span> <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED]
+data_type:
+ <var class="keyword varname">primitive_type</var>
+ | array_type
+ | map_type
+ | struct_type
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ (<var class="keyword varname">col_name</var> <var class="keyword varname">data_type</var>
+ <span class="ph">[<var class="keyword varname">kudu_column_attribute</var> ...]</span>
+ [COMMENT '<var class="keyword varname">col_comment</var>']
+ [, ...]
+ [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+ )
+ <span class="ph">[PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+</code></pre>
+
+ <div class="p">
+ <strong class="ph b">Kudu column attributes:</strong>
+<pre class="pre codeblock"><code>
+ PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+ </div>
+
+ <div class="p">
+ <strong class="ph b">kudu_partition_clause:</strong>
+<pre class="pre codeblock"><code>
+kudu_partition_clause ::= [<var class="keyword varname">hash_clause</var>] [, <var class="keyword varname">range_clause</var> [ , <var class="keyword varname">range_clause</var> ] ]
+
+hash_clause ::=
+ HASH [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+ PARTITIONS <var class="keyword varname">n</var>
+
+range_clause ::=
+ RANGE [ (<var class="keyword varname">pk_col</var> [, ...]) ]
+ (
+ {
+ PARTITION <var class="keyword varname">constant_expression</var> <var class="keyword varname">range_comparison_operator</var> VALUES <var class="keyword varname">range_comparison_operator</var> <var class="keyword varname">constant_expression</var>
+ | PARTITION VALUE = <var class="keyword varname">constant_expression_or_tuple</var>
+ }
+ [, ...]
+ )
+
+range_comparison_operator ::= { < | <= }
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">External Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('kudu.table_name'='<var class="keyword varname">internal_kudu_name</var>')]
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE AS SELECT for Kudu tables:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE [IF NOT EXISTS] <var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ [PRIMARY KEY (<var class="keyword varname">col_name</var>[, ...])]
+ [PARTITION BY <var class="keyword varname">kudu_partition_clause</var>]
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ STORED AS KUDU
+ [TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>', ...)]
+AS
+ <var class="keyword varname">select_statement</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column definitions:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the form of the <code class="ph codeph">CREATE TABLE</code> statement, the column
+ definitions are required or not allowed.
+ </p>
+
+ <p class="p">
+ With the <code class="ph codeph">CREATE TABLE AS SELECT</code> and <code class="ph codeph">CREATE TABLE LIKE</code>
+ syntax, you do not specify the columns at all; the column names and types are derived from
+ the source table, query, or data file.
+ </p>
+
+    <p class="p">
+      With the basic <code class="ph codeph">CREATE TABLE</code> syntax, you must list one or more columns,
+      specifying for each column its name, type, and optionally a comment, in addition to any columns
+      used as partitioning keys. There is one exception where the column list is not required: when
+      creating an Avro table with the <code class="ph codeph">STORED AS AVRO</code> clause, you can omit
+      the list of columns and specify the same metadata as part of the <code class="ph codeph">TBLPROPERTIES</code> clause.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or
+ <code class="ph codeph">MAP</code>) are available in <span class="keyword">Impala 2.3</span> and higher.
+ Because you can nest these types (for example, to make an array of maps or a struct with
+ an array field), these types are also sometimes referred to as nested types. See
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details.
+ </p>
+
+
+
+ <p class="p">
+ Impala can create tables containing complex type columns, with any supported file format.
+ Because currently Impala can only query complex type columns in Parquet tables, creating
+ tables with complex type columns and other file formats such as text is of limited use.
+ For example, you might create a text table including some columns with complex types with
+      Impala, and use Hive as part of your ETL pipeline to ingest the nested type data and copy it to an
+ identical Parquet table. Or you might create a partitioned table containing complex type
+ columns using one file format, and use <code class="ph codeph">ALTER TABLE</code> to change the file
+ format of individual partitions to Parquet; Impala can then query only the Parquet-format
+ partitions in that table.
+ </p>
+
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Internal and external tables (EXTERNAL and LOCATION clauses):</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala creates an <span class="q">"internal"</span> table, where Impala manages the underlying
+ data files for the table, and physically deletes the data files when you drop the table.
+ If you specify the <code class="ph codeph">EXTERNAL</code> clause, Impala treats the table as an
+ <span class="q">"external"</span> table, where the data files are typically produced outside Impala and
+ queried from their original locations in HDFS, and Impala leaves the data files in place
+ when you drop the table. For details about internal and external tables, see
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ Typically, for an external table you include a <code class="ph codeph">LOCATION</code> clause to specify
+ the path to the HDFS directory where Impala reads and writes files for the table. For
+ example, if your data pipeline produces Parquet files in the HDFS directory
+ <span class="ph filepath">/user/etl/destination</span>, you might create an external table as follows:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE external_parquet (c1 INT, c2 STRING, c3 TIMESTAMP)
+ STORED AS PARQUET LOCATION '/user/etl/destination';
+</code></pre>
+
+ <p class="p">
+ Although the <code class="ph codeph">EXTERNAL</code> and <code class="ph codeph">LOCATION</code> clauses are often
+ specified together, <code class="ph codeph">LOCATION</code> is optional for external tables, and you can
+ also specify <code class="ph codeph">LOCATION</code> for internal tables. The difference is all about
+ whether Impala <span class="q">"takes control"</span> of the underlying data files and moves them when you
+ rename the table, or deletes them when you drop the table. For more about internal and
+ external tables and how they interact with the <code class="ph codeph">LOCATION</code> attribute, see
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioned tables (PARTITIONED BY clause):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITIONED BY</code> clause divides the data files based on the values from
+ one or more specified columns. Impala queries can use the partition metadata to minimize
+ the amount of data that is read from disk or transmitted across the network, particularly
+ during join queries. For details about partitioning, see
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ All Kudu tables require partitioning, which involves different syntax than non-Kudu
+ tables. See the <code class="ph codeph">PARTITION BY</code> clause, rather than <code class="ph codeph">PARTITIONED
+ BY</code>, for Kudu tables.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, the <code class="ph codeph">PARTITION BY</code>
+ clause is optional for Kudu tables. If the clause is omitted, Impala automatically
+ constructs a single partition that is not connected to any column. Because such a
+ table cannot take advantage of Kudu features for parallelized queries and
+ query optimizations, omitting the <code class="ph codeph">PARTITION BY</code> clause is only
+ appropriate for small lookup tables.
+ </p>
+ </div>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, you could use a partitioned table as the
+ source and copy data from it, but could not specify any partitioning clauses for the new
+ table. In <span class="keyword">Impala 2.5</span> and higher, you can now use the
+ <code class="ph codeph">PARTITIONED BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statement. See the examples under the following discussion of the <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> syntax variation.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sorted tables (SORT BY clause):</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">SORT BY</code> clause lets you specify zero or more columns
+ that are sorted in the data files created by each Impala <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> operation. Creating data files that are
+ sorted is most useful for Parquet tables, where the metadata stored inside each file includes
+ the minimum and maximum values for each column in the file. (The statistics apply to each row group
+ within the file; for simplicity, Impala writes a single row group in each file.) Grouping
+ data values together in relatively narrow ranges within each data file makes it possible
+ for Impala to quickly skip over data files that do not contain value ranges indicated in
+ the <code class="ph codeph">WHERE</code> clause of a query, and can improve the effectiveness
+ of Parquet encoding and compression.
+ </p>
+
+ <p class="p">
+ This clause is not applicable for Kudu tables or HBase tables. Although it works
+ for other HDFS file formats besides Parquet, the more efficient layout is most
+ evident with Parquet tables, because each Parquet data file includes statistics
+ about the data values in that file.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SORT BY</code> columns cannot include any partition key columns
+ for a partitioned table, because those column values are not represented in
+ the underlying data files.
+ </p>
+
+ <p class="p">
+ Because data files can arrive in Impala tables by mechanisms that do not respect
+ the <code class="ph codeph">SORT BY</code> clause, such as <code class="ph codeph">LOAD DATA</code> or ETL
+ tools that create HDFS files, Impala does not guarantee or rely on the data being
+ sorted. The sorting aspect is only used to create a more efficient layout for
+ Parquet files generated by Impala, which helps to optimize the processing of
+ those Parquet files during Impala queries. During an <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">CREATE TABLE AS SELECT</code> operation, the sorting occurs
+ when the <code class="ph codeph">SORT BY</code> clause applies to the destination table
+ for the data, regardless of whether the source table has a <code class="ph codeph">SORT BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ For example, when creating a table intended to contain census data, you might define
+ sort columns such as last name and state. If a data file in this table contains a
+ narrow range of last names, for example from <code class="ph codeph">Smith</code> to <code class="ph codeph">Smythe</code>,
+ Impala can quickly detect that this data file contains no matches for a <code class="ph codeph">WHERE</code>
+ clause such as <code class="ph codeph">WHERE last_name = 'Jones'</code> and avoid reading the entire file.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE census_data (last_name STRING, first_name STRING, state STRING, address STRING)
+ SORT BY (last_name, state)
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Likewise, if an existing table contains data without any sort order, you can reorganize
+ the data in a more efficient way by using <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy that data into a new table with a
+ <code class="ph codeph">SORT BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sorted_census_data
+ SORT BY (last_name, state)
+ STORED AS PARQUET
+ AS SELECT last_name, first_name, state, address
+ FROM unsorted_census_data;
+</code></pre>
+
+ <p class="p">
+ The metadata for the <code class="ph codeph">SORT BY</code> clause is stored in the <code class="ph codeph">TBLPROPERTIES</code>
+ fields for the table. Other SQL engines that can interoperate with Impala tables, such as Hive
+ and Spark SQL, do not recognize this property when inserting into a table that has a <code class="ph codeph">SORT BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Kudu tables do not support clauses related to HDFS and S3 data files and
+ partitioning mechanisms, the syntax associated with the <code class="ph codeph">STORED AS KUDU</code>
+ clause is shown separately in the above syntax descriptions. Kudu tables have their own
+ syntax for <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">CREATE EXTERNAL TABLE</code>, and
+ <code class="ph codeph">CREATE TABLE AS SELECT</code>. <span class="ph">Prior to <span class="keyword">Impala 2.10</span>,
+ all internal Kudu tables require a <code class="ph codeph">PARTITION BY</code> clause, different than
+ the <code class="ph codeph">PARTITIONED BY</code> clause for HDFS-backed tables.</span>
+ </p>
+
+ <p class="p">
+ Here are some examples of creating empty Kudu tables:
+ </p>
+
+<pre class="pre codeblock"><code>
+<span class="ph">-- Single partition. Only for <span class="keyword">Impala 2.10</span> and higher.
+-- Only suitable for small lookup tables.
+CREATE TABLE kudu_no_partition_by_clause
+ (
+ id bigint PRIMARY KEY, s STRING, b BOOLEAN
+ )
+ STORED AS KUDU;</span>
+
+-- Single-column primary key.
+CREATE TABLE kudu_t1 (id BIGINT PRIMARY key, s STRING, b BOOLEAN)
+ PARTITION BY HASH (id) PARTITIONS 20 STORED AS KUDU;
+
+-- Multi-column primary key.
+CREATE TABLE kudu_t2 (id BIGINT, s STRING, b BOOLEAN, PRIMARY KEY (id,s))
+ PARTITION BY HASH (s) PARTITIONS 30 STORED AS KUDU;
+
+-- Meaningful primary key column is good for range partitioning.
+CREATE TABLE kudu_t3 (id BIGINT, year INT, s STRING,
+ b BOOLEAN, PRIMARY KEY (id,year))
+ PARTITION BY HASH (id) PARTITIONS 20,
+ RANGE (year) (PARTITION 1980 <= VALUES < 1990,
+ PARTITION 1990 <= VALUES < 2000,
+ PARTITION VALUE = 2001,
+ PARTITION 2001 < VALUES)
+ STORED AS KUDU;
+
+</code></pre>
+
+ <p class="p">
+ Here is an example of creating an external Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Inherits column definitions from original table.
+-- For tables created through Impala, the kudu.table_name property
+-- comes from DESCRIBE FORMATTED output from the original table.
+CREATE EXTERNAL TABLE external_t1 STORED AS KUDU
+ TBLPROPERTIES ('kudu.table_name'='kudu_tbl_created_via_api');
+
+</code></pre>
+
+ <p class="p">
+ Here is an example of <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The CTAS statement defines the primary key and partitioning scheme.
+-- The rest of the column definitions are derived from the select list.
+CREATE TABLE ctas_t1
+ PRIMARY KEY (id) PARTITION BY HASH (id) PARTITIONS 10
+ STORED AS KUDU
+ AS SELECT id, s FROM kudu_t1;
+
+</code></pre>
+
+ <p class="p">
+ The following <code class="ph codeph">CREATE TABLE</code> clauses are not supported for Kudu tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">PARTITIONED BY</code> (Kudu tables use the clause <code class="ph codeph">PARTITION
+ BY</code> instead)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LOCATION</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROWFORMAT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CACHED IN | UNCACHED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">WITH SERDEPROPERTIES</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ For more on the <code class="ph codeph">PRIMARY KEY</code> clause, see
+ <a class="xref" href="impala_kudu.html#kudu_primary_key">Primary Key Columns for Kudu Tables</a> and
+ <a class="xref" href="impala_kudu.html#kudu_primary_key_attribute">PRIMARY KEY Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code> attributes, see
+ <a class="xref" href="impala_kudu.html#kudu_not_null_attribute">NULL | NOT NULL Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">ENCODING</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_encoding_attribute">ENCODING Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">COMPRESSION</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_compression_attribute">COMPRESSION Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">DEFAULT</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_default_attribute">DEFAULT Attribute</a>.
+ </p>
+
+ <p class="p">
+ For more on the <code class="ph codeph">BLOCK_SIZE</code> attribute, see
+ <a class="xref" href="impala_kudu.html#kudu_block_size_attribute">BLOCK_SIZE Attribute</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning for Kudu tables (PARTITION BY clause)</strong>
+ </p>
+
+ <p class="p">
+ For Kudu tables, you specify logical partitioning across one or more columns using the
+ <code class="ph codeph">PARTITION BY</code> clause. In contrast to partitioning for HDFS-based tables,
+ multiple values for a partition key column can be located in the same partition. The
+ optional <code class="ph codeph">HASH</code> clause lets you divide one or a set of partition key
+ columns into a specified number of buckets. You can use more than one
+ <code class="ph codeph">HASH</code> clause, specifying a distinct set of partition key columns for each.
+ The optional <code class="ph codeph">RANGE</code> clause further subdivides the partitions, based on a
+ set of comparison operations for the partition key columns.
+ </p>
+
+ <p class="p">
+ Here are some examples of the <code class="ph codeph">PARTITION BY HASH</code> syntax:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Apply hash function to 1 primary key column.
+create table hash_t1 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x) partitions 10
+ stored as kudu;
+
+-- Apply hash function to a different primary key column.
+create table hash_t2 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (y) partitions 10
+ stored as kudu;
+
+-- Apply hash function to both primary key columns.
+-- In this case, the total number of partitions is 10.
+create table hash_t3 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x,y) partitions 10
+ stored as kudu;
+
+-- When the column list is omitted, apply hash function to all primary key columns.
+create table hash_t4 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash partitions 10
+ stored as kudu;
+
+-- Hash the X values independently from the Y values.
+-- In this case, the total number of partitions is 10 x 20.
+create table hash_t5 (x bigint, y bigint, s string, primary key (x,y))
+ partition by hash (x) partitions 10, hash (y) partitions 20
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ Here are some examples of the <code class="ph codeph">PARTITION BY RANGE</code> syntax:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Create partitions that cover every possible value of X.
+-- Ranges that span multiple values use the keyword VALUES between
+-- a pair of < and <= comparisons.
+create table range_t1 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100,
+ partition values < 0, partition 100 < values
+ )
+ stored as kudu;
+
+-- Create partitions that cover some possible values of X.
+-- Values outside the covered range(s) are rejected.
+-- New range partitions can be added through ALTER TABLE.
+create table range_t2 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+-- A range can also specify a single specific value, using the keyword VALUE
+-- with an = comparison.
+create table range_t3 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (s)
+ (
+ partition value = 'Yes', partition value = 'No', partition value = 'Maybe'
+ )
+ stored as kudu;
+
+-- Using multiple columns in the RANGE clause and tuples inside the partition spec
+-- only works for partitions specified with the VALUE= syntax.
+create table range_t4 (x bigint, s string, s2 string, primary key (x, s))
+ partition by range (x,s)
+ (
+ partition value = (0,'zero'), partition value = (1,'one'), partition value = (2,'two')
+ )
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ Here are some examples combining both <code class="ph codeph">HASH</code> and <code class="ph codeph">RANGE</code>
+ syntax for the <code class="ph codeph">PARTITION BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Values from each range partition are hashed into 10 associated buckets.
+-- Total number of partitions in this case is 10 x 2.
+create table combined_t1 (x bigint, s string, s2 string, primary key (x, s))
+ partition by hash (x) partitions 10, range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+-- The hash partitioning and range partitioning can apply to different columns.
+-- But all the columns used in either partitioning scheme must be from the primary key.
+create table combined_t2 (x bigint, s string, s2 string, primary key (x, s))
+ partition by hash (s) partitions 10, range (x)
+ (
+ partition 0 <= values <= 49, partition 50 <= values <= 100
+ )
+ stored as kudu;
+
+</code></pre>
+
+ <p class="p">
+ For more usage details and examples of the Kudu partitioning syntax, see
+ <a class="xref" href="impala_kudu.html">Using Impala to Query Kudu Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Specifying file format (STORED AS and ROW FORMAT clauses):</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STORED AS</code> clause identifies the format of the underlying data files.
+ Currently, Impala can query more types of file formats than it can create or insert into.
+ Use Hive to perform any create or data load operations that are not currently available in
+ Impala. For example, Impala can create an Avro, SequenceFile, or RCFile table but cannot
+ insert data into it. There are also Impala-specific procedures for using compression with
+ each kind of file format. For details about working with data files of various formats,
+ see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala 1.4.0 and higher, Impala can create Avro tables, which formerly required doing
+ the <code class="ph codeph">CREATE TABLE</code> statement in Hive. See
+ <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for details and examples.
+ </div>
+
+ <p class="p">
+ By default (when no <code class="ph codeph">STORED AS</code> clause is specified), data files in Impala
+ tables are created as text files with Ctrl-A (hex 01) characters as the delimiter.
+
+ Specify the <code class="ph codeph">ROW FORMAT DELIMITED</code> clause to produce or ingest data files
+ that use a different delimiter character such as tab or <code class="ph codeph">|</code>, or a different
+ line end character such as carriage return or newline. When specifying delimiter and line
+ end characters with the <code class="ph codeph">FIELDS TERMINATED BY</code> and <code class="ph codeph">LINES TERMINATED
+ BY</code> clauses, use <code class="ph codeph">'\t'</code> for tab, <code class="ph codeph">'\n'</code> for newline
+ or linefeed, <code class="ph codeph">'\r'</code> for carriage return, and
+ <code class="ph codeph">\</code><code class="ph codeph">0</code> for ASCII <code class="ph codeph">nul</code> (hex 00). For more
+ examples of text tables, see <a class="xref" href="impala_txtfile.html#txtfile">Using Text Data Files with Impala Tables</a>.
+ </p>
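+
+      <p class="p">
+        For example, a tab-delimited text table might be declared like this. (The table and
+        column names are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>-- Text table with tab-separated fields and newline-terminated lines.
+CREATE TABLE tsv_example (id BIGINT, name STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
+  LINES TERMINATED BY '\n'
+  STORED AS TEXTFILE;
+</code></pre>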
+
+ <p class="p">
+ The <code class="ph codeph">ESCAPED BY</code> clause applies both to text files that you create through
+ an <code class="ph codeph">INSERT</code> statement to an Impala <code class="ph codeph">TEXTFILE</code> table, and to
+ existing data files that you put into an Impala table directory. (You can ingest existing
+ data files either by creating the table with <code class="ph codeph">CREATE EXTERNAL TABLE ...
+ LOCATION</code>, the <code class="ph codeph">LOAD DATA</code> statement, or through an HDFS operation
+ such as <code class="ph codeph">hdfs dfs -put <var class="keyword varname">file</var>
+ <var class="keyword varname">hdfs_path</var></code>.) Choose an escape character that is not used
+ anywhere else in the file, and put it in front of each instance of the delimiter character
+ that occurs within a field value. Surrounding field values with quotation marks does not
+ help Impala to parse fields with embedded delimiter characters; the quotation marks are
+ considered to be part of the column value. If you want to use <code class="ph codeph">\</code> as the
+ escape character, specify the clause in <span class="keyword cmdname">impala-shell</span> as <code class="ph codeph">ESCAPED
+ BY '\\'</code>.
+ </p>
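+
+      <p class="p">
+        For example, the following hypothetical table uses a comma as the field delimiter and
+        backslash as the escape character, so that commas embedded in field values can be
+        preserved by prefixing them with a backslash in the data files:
+      </p>
+
+<pre class="pre codeblock"><code>-- Comma-delimited fields, with backslash escaping any commas
+-- that occur inside field values.
+CREATE TABLE csv_escaped (id BIGINT, description STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  ESCAPED BY '\\'
+  STORED AS TEXTFILE;
+</code></pre>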
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+ BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+ their argument, because they all require a single character. You can use a regular character surrounded by
+ single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+ or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+ single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+ TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+ character used as a delimiter by some data formats.
+ </div>
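+
+      <p class="p">
+        For example, the following hypothetical statements both specify a comma delimiter,
+        once as an octal sequence and once as a decimal ASCII code:
+      </p>
+
+<pre class="pre codeblock"><code>-- '\054' is the octal sequence for a comma; '44' is its decimal ASCII code.
+CREATE TABLE delim_octal (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054';
+CREATE TABLE delim_int (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '44';
+</code></pre>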
+
+ <p class="p">
+ <strong class="ph b">Cloning tables (LIKE clause):</strong>
+ </p>
+
+ <p class="p">
+ To create an empty table with the same columns, comments, and other attributes as another
+ table, use the following variation. The <code class="ph codeph">CREATE TABLE ... LIKE</code> form allows
+ a restricted set of clauses, currently only the <code class="ph codeph">LOCATION</code>,
+ <code class="ph codeph">COMMENT</code>, and <code class="ph codeph">STORED AS</code> clauses.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>
+ <span class="ph">LIKE { [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> | PARQUET '<var class="keyword varname">hdfs_path_of_parquet_file</var>' }</span>
+ [COMMENT '<var class="keyword varname">table_comment</var>']
+ [STORED AS <var class="keyword varname">file_format</var>]
+ [LOCATION '<var class="keyword varname">hdfs_path</var>']</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ To clone the structure of a table and transfer data into it in a single operation, use
+ the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax described in the next subsection.
+ </p>
+ </div>
+
+ <p class="p">
+ When you clone the structure of an existing table using the <code class="ph codeph">CREATE TABLE ...
+ LIKE</code> syntax, the new table keeps the same file format as the original one, so you
+ only need to specify the <code class="ph codeph">STORED AS</code> clause if you want to use a different
+ file format, or when specifying a view as the original table. (Creating a table
+ <span class="q">"like"</span> a view produces a text table by default.)
+ </p>
+
+ <p class="p">
+ Although normally Impala cannot create an HBase table directly, Impala can clone the
+ structure of an existing HBase table with the <code class="ph codeph">CREATE TABLE ... LIKE</code>
+ syntax, preserving the file format and metadata from the original table.
+ </p>
+
+ <p class="p">
+ There are some exceptions to the ability to use <code class="ph codeph">CREATE TABLE ... LIKE</code>
+ with an Avro table. For example, you cannot use this technique for an Avro table that is
+ specified with an Avro schema but no columns. When in doubt, check if a <code class="ph codeph">CREATE
+ TABLE ... LIKE</code> operation works in Hive; if not, it typically will not work in
+ Impala either.
+ </p>
+
+ <p class="p">
+ If the original table is partitioned, the new table inherits the same partition key
+ columns. Because the new table is initially empty, it does not inherit the actual
+ partitions that exist in the original one. To create partitions in the new table, insert
+ data or issue <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements.
+ </p>
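+
+      <p class="p">
+        For example, with hypothetical table names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Clone the layout of a partitioned table; the clone starts with no partitions.
+CREATE TABLE monthly_copy LIKE monthly_data;
+
+-- Recreate a partition in the clone before loading data into it.
+ALTER TABLE monthly_copy ADD PARTITION (year=2016, month=1);
+</code></pre>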
+
+ <p class="p">
+ Prior to Impala 1.4.0, it was not possible to use the <code class="ph codeph">CREATE TABLE LIKE
+ <var class="keyword varname">view_name</var></code> syntax. In Impala 1.4.0 and higher, you can create a table with the
+ same column definitions as a view using the <code class="ph codeph">CREATE TABLE LIKE</code> technique. Although
+ <code class="ph codeph">CREATE TABLE LIKE</code> normally inherits the file format of the original table, a view has no
+ underlying file format, so <code class="ph codeph">CREATE TABLE LIKE <var class="keyword varname">view_name</var></code> produces a text
+ table by default. To specify a different file format, include a <code class="ph codeph">STORED AS
+ <var class="keyword varname">file_format</var></code> clause at the end of the <code class="ph codeph">CREATE TABLE LIKE</code>
+ statement.
+ </p>
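+
+      <p class="p">
+        For example, with a hypothetical view <code class="ph codeph">v1</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- By default, a table created "like" a view is a text table.
+CREATE TABLE view_columns_text LIKE v1;
+
+-- Add STORED AS to choose a different file format for the new table.
+CREATE TABLE view_columns_parquet LIKE v1 STORED AS PARQUET;
+</code></pre>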
+
+ <p class="p">
+ Because <code class="ph codeph">CREATE TABLE ... LIKE</code> only manipulates table metadata, not the
+ physical data of the table, issue <code class="ph codeph">INSERT INTO TABLE</code> statements afterward
+ to copy any data from the original table into the new one, optionally converting the data
+ to a new file format. (For some file formats, Impala can do a <code class="ph codeph">CREATE TABLE ...
+ LIKE</code> to create the table, but Impala cannot insert data in that file format; in
+ these cases, you must load the data in Hive. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details.)
+ </p>
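+
+      <p class="p">
+        For example, the following hypothetical sequence clones the column definitions of a
+        table into a Parquet table, then copies the data, converting it to Parquet in the
+        process:
+      </p>
+
+<pre class="pre codeblock"><code>-- CREATE TABLE ... LIKE sets up the metadata only.
+CREATE TABLE t1_parquet LIKE t1 STORED AS PARQUET;
+
+-- A separate INSERT copies the data and converts the file format.
+INSERT INTO TABLE t1_parquet SELECT * FROM t1;
+</code></pre>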
+
+ <p class="p" id="create_table__ctas">
+ <strong class="ph b">CREATE TABLE AS SELECT:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax is a shorthand notation to create a
+ table based on column definitions from another table, and copy data from the source table
+ to the destination table without issuing any separate <code class="ph codeph">INSERT</code> statement.
+ This idiom is so popular that it has its own acronym, <span class="q">"CTAS"</span>.
+ </p>
+
+ <p class="p">
+ The following examples show how to copy data from a source table <code class="ph codeph">T1</code> to a
+      variety of destination tables, applying various transformations to the table properties,
+ table layout, or the data itself as part of the operation:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Sample table to be the source of CTAS operations.
+CREATE TABLE t1 (x INT, y STRING);
+INSERT INTO t1 VALUES (1, 'one'), (2, 'two'), (3, 'three');
+
+-- Clone all the columns and data from one table to another.
+CREATE TABLE clone_of_t1 AS SELECT * FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Clone the columns and data, and convert the data to a different file format.
+CREATE TABLE parquet_version_of_t1 STORED AS PARQUET AS SELECT * FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Copy only some rows to the new table.
+CREATE TABLE subset_of_t1 AS SELECT * FROM t1 WHERE x >= 2;
++-------------------+
+| summary |
++-------------------+
+| Inserted 2 row(s) |
++-------------------+
+
+-- Same idea as CREATE TABLE LIKE: clone table layout but do not copy any data.
+CREATE TABLE empty_clone_of_t1 AS SELECT * FROM t1 WHERE 1=0;
++-------------------+
+| summary |
++-------------------+
+| Inserted 0 row(s) |
++-------------------+
+
+-- Reorder and rename columns and transform the data.
+CREATE TABLE t5 AS SELECT upper(y) AS s, x+1 AS a, 'Entirely new column' AS n FROM t1;
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+SELECT * FROM t5;
++-------+---+---------------------+
+| s | a | n |
++-------+---+---------------------+
+| ONE | 2 | Entirely new column |
+| TWO | 3 | Entirely new column |
+| THREE | 4 | Entirely new column |
++-------+---+---------------------+
+</code></pre>
+
+
+
+
+
+ <p class="p">
+ See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details about query syntax for the
+ <code class="ph codeph">SELECT</code> portion of a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement.
+ </p>
+
+ <p class="p">
+ The newly created table inherits the column names that you select from the original table,
+ which you can override by specifying column aliases in the query. Any column or table
+ comments from the original table are not carried over to the new table.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When using the <code class="ph codeph">STORED AS</code> clause with a <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement, the destination table must be a file format that Impala can
+ write to: currently, text or Parquet. You cannot specify an Avro, SequenceFile, or RCFile
+ table as the destination table for a CTAS operation.
+ </div>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span> you could use a partitioned table as the source
+ and copy data from it, but could not specify any partitioning clauses for the new table.
+ In <span class="keyword">Impala 2.5</span> and higher, you can now use the <code class="ph codeph">PARTITIONED
+ BY</code> clause with a <code class="ph codeph">CREATE TABLE AS SELECT</code> statement. The following
+ example demonstrates how you can copy data from an unpartitioned table in a <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> operation, creating a new partitioned table in the process. The
+ main syntax consideration is the column order in the <code class="ph codeph">PARTITIONED BY</code>
+ clause and the select list: the partition key columns must be listed last in the select
+ list, in the same order as in the <code class="ph codeph">PARTITIONED BY</code> clause. Therefore, in
+ this case, the column order in the destination table is different from the source table.
+ You also only specify the column names in the <code class="ph codeph">PARTITIONED BY</code> clause, not
+ the data types or column comments.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table partitions_no (year smallint, month tinyint, s string);
+insert into partitions_no values (2016, 1, 'January 2016'),
+ (2016, 2, 'February 2016'), (2016, 3, 'March 2016');
+
+-- Prove that the source table is not partitioned.
+show partitions partitions_no;
+ERROR: AnalysisException: Table is not partitioned: ctas_partition_by.partitions_no
+
+-- Create new table with partitions based on column values from source table.
+<strong class="ph b">create table partitions_yes partitioned by (year, month)
+ as select s, year, month from partitions_no;</strong>
++-------------------+
+| summary |
++-------------------+
+| Inserted 3 row(s) |
++-------------------+
+
+-- Prove that the destination table is partitioned.
+show partitions partitions_yes;
++-------+-------+-------+--------+------+...
+| year | month | #Rows | #Files | Size |...
++-------+-------+-------+--------+------+...
+| 2016 | 1 | -1 | 1 | 13B |...
+| 2016 | 2 | -1 | 1 | 14B |...
+| 2016 | 3 | -1 | 1 | 11B |...
+| Total | | -1 | 3 | 38B |...
++-------+-------+-------+--------+------+...
+</code></pre>
+
+ <p class="p">
+ The most convenient layout for partitioned tables is with all the partition key columns at
+      the end. The CTAS <code class="ph codeph">PARTITIONED BY</code> syntax requires the partition
+      key columns to come last in the select list, which produces that same column order in the
+      destination table.
+ </p>
+
+<pre class="pre codeblock"><code>
+describe partitions_no;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| year | smallint | |
+| month | tinyint | |
+| s | string | |
++-------+----------+---------+
+
+-- The CTAS operation forced us to put the partition key columns last.
+-- Having those columns last works better with idioms such as SELECT *
+-- for partitioned tables.
+describe partitions_yes;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| s | string | |
+| year | smallint | |
+| month | tinyint | |
++-------+----------+---------+
+</code></pre>
+
+ <p class="p">
+ Attempting to use a select list with the partition key columns not at the end results in
+ an error due to a column name mismatch:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- We expect this CTAS to fail because non-key column S
+-- comes after key columns YEAR and MONTH in the select list.
+create table partitions_maybe partitioned by (year, month)
+ as select year, month, s from partitions_no;
+ERROR: AnalysisException: Partition column name mismatch: year != month
+</code></pre>
+
+ <p class="p">
+ For example, the following statements show how you can clone all the data in a table, or a
+ subset of the columns and/or rows, or reorder columns, rename them, or construct them out
+ of expressions:
+ </p>
+
+ <p class="p">
+ As part of a CTAS operation, you can convert the data to any file format that Impala can
+ write (currently, <code class="ph codeph">TEXTFILE</code> and <code class="ph codeph">PARQUET</code>). You cannot
+ specify the lower-level properties of a text table, such as the delimiter.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE LIKE PARQUET:</strong>
+ </p>
+
+ <p class="p">
+ The variation <code class="ph codeph">CREATE TABLE ... LIKE PARQUET
+ '<var class="keyword varname">hdfs_path_of_parquet_file</var>'</code> lets you skip the column
+ definitions of the <code class="ph codeph">CREATE TABLE</code> statement. The column names and data
+ types are automatically configured based on the organization of the specified Parquet data
+ file, which must already reside in HDFS. You can use a data file located outside the
+ Impala database directories, or a file from an existing Impala Parquet table; either way,
+ Impala only uses the column definitions from the file and does not use the HDFS location
+      for the <code class="ph codeph">LOCATION</code> attribute of the new table. (You can, however, also
+      specify the enclosing directory with the <code class="ph codeph">LOCATION</code> attribute, to both use
+      the same schema as the data file and point the Impala table at the associated directory
+      for querying.)
+ </p>
+
+ <p class="p">
+ The following considerations apply when you use the <code class="ph codeph">CREATE TABLE LIKE
+ PARQUET</code> technique:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Any column comments from the original table are not preserved in the new table. Each
+ column in the new table has a comment stating the low-level Parquet field type used to
+ deduce the appropriate SQL column type.
+ </li>
+
+ <li class="li">
+ If you use a data file from a partitioned Impala table, any partition key columns from
+ the original table are left out of the new table, because they are represented in HDFS
+ directory names rather than stored in the data file. To preserve the partition
+ information, repeat the same <code class="ph codeph">PARTITION</code> clause as in the original
+ <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ The file format of the new table defaults to text, as with other kinds of <code class="ph codeph">CREATE
+ TABLE</code> statements. To make the new table also use Parquet format, include the
+ clause <code class="ph codeph">STORED AS PARQUET</code> in the <code class="ph codeph">CREATE TABLE LIKE
+ PARQUET</code> statement.
+ </li>
+
+ <li class="li">
+ If the Parquet data file comes from an existing Impala table, currently, any
+ <code class="ph codeph">TINYINT</code> or <code class="ph codeph">SMALLINT</code> columns are turned into
+ <code class="ph codeph">INT</code> columns in the new table. Internally, Parquet stores such values as
+ 32-bit integers.
+ </li>
+
+ <li class="li">
+ When the destination table uses the Parquet file format, the <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> and <code class="ph codeph">INSERT ... SELECT</code> statements always create at least
+ one data file, even if the <code class="ph codeph">SELECT</code> part of the statement does not match
+ any rows. You can use such an empty Parquet data file as a template for subsequent
+ <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> statements.
+ </li>
+ </ul>
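+
+  <p class="p">
+    Putting these considerations together, a typical statement might look like the
+    following sketch; the HDFS path and table name are illustrative placeholders:
+  </p>
+
+<pre class="pre codeblock"><code>
+-- Derive column definitions from an existing Parquet data file,
+-- and make the new table use Parquet format as well.
+CREATE TABLE new_parquet_table
+  LIKE PARQUET '/user/impala/sample_data/datafile.parq'
+  STORED AS PARQUET;
+</code></pre>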
+
+ <p class="p">
+ For more details about creating Parquet tables, and examples of the <code class="ph codeph">CREATE TABLE
+ LIKE PARQUET</code> syntax, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Visibility and Metadata (TBLPROPERTIES and WITH SERDEPROPERTIES clauses):</strong>
+ </p>
+
+ <p class="p">
+ You can associate arbitrary items of metadata with a table by specifying the
+ <code class="ph codeph">TBLPROPERTIES</code> clause. This clause takes a comma-separated list of
+ key-value pairs and stores those items in the metastore database. You can also change the
+ table properties later with an <code class="ph codeph">ALTER TABLE</code> statement. You can observe the
+ table properties for different delimiter and escape characters using the <code class="ph codeph">DESCRIBE
+ FORMATTED</code> command, and change those settings for an existing table with
+ <code class="ph codeph">ALTER TABLE ... SET TBLPROPERTIES</code>.
+ </p>
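+
+  <p class="p">
+    For example, the following illustrative statements (the property name and values are
+    placeholders) attach a key-value pair at creation time, change it later, and display it:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t1 (x INT) TBLPROPERTIES ('owner_team' = 'analytics');
+ALTER TABLE t1 SET TBLPROPERTIES ('owner_team' = 'reporting');
+DESCRIBE FORMATTED t1;
+</code></pre>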
+
+ <p class="p">
+ You can also associate SerDes properties with the table by specifying key-value pairs
+ through the <code class="ph codeph">WITH SERDEPROPERTIES</code> clause. This metadata is not used by
+ Impala, which has its own built-in serializer and deserializer for the file formats it
+ supports. Particular property values might be needed for Hive compatibility with certain
+ variations of file formats, particularly Avro.
+ </p>
+
+ <p class="p">
+ Some DDL operations that interact with other Hadoop components require specifying
+ particular values in the <code class="ph codeph">SERDEPROPERTIES</code> or
+ <code class="ph codeph">TBLPROPERTIES</code> fields, such as creating an Avro table or an HBase table.
+ (You typically create HBase tables in Hive, because they require additional clauses not
+ currently available in Impala.)
+
+ </p>
+
+ <p class="p">
+ To see the column definitions and column comments for an existing table, for example
+ before issuing a <code class="ph codeph">CREATE TABLE ... LIKE</code> or a <code class="ph codeph">CREATE TABLE ... AS
+ SELECT</code> statement, issue the statement <code class="ph codeph">DESCRIBE
+ <var class="keyword varname">table_name</var></code>. To see even more detail, such as the location of
+ data files and the values for clauses such as <code class="ph codeph">ROW FORMAT</code> and
+ <code class="ph codeph">STORED AS</code>, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code>. <code class="ph codeph">DESCRIBE FORMATTED</code> is also needed
+ to see any overall table comment (as opposed to individual column comments).
+ </p>
+
+ <p class="p">
+ After creating a table, your <span class="keyword cmdname">impala-shell</span> session or another
+ <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately query that
+ table. There might be a brief interval (one statestore heartbeat) before the table can be
+ queried through a different Impala node. To make the <code class="ph codeph">CREATE TABLE</code>
+ statement return only when the table is recognized by all Impala nodes in the cluster,
+ enable the <code class="ph codeph">SYNC_DDL</code> query option.
+ </p>
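+
+  <p class="p">
+    For example, the following sketch makes the <code class="ph codeph">CREATE TABLE</code> statement
+    wait until all Impala nodes recognize the new table:
+  </p>
+
+<pre class="pre codeblock"><code>
+SET SYNC_DDL=1;
+CREATE TABLE t1 (x INT);
+</code></pre>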
+
+ <p class="p">
+ <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+ </p>
+
+ <p class="p">
+ If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in
+ the table directory or the partition subdirectories are designated to be loaded into
+ memory with the HDFS caching mechanism. See
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+ caching feature.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+ When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
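+
+  <p class="p">
+    For example, the following sketch caches a table's data blocks on three hosts;
+    <code class="ph codeph">'four_gig_pool'</code> is a placeholder for an HDFS cache pool that
+    already exists on your cluster:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE cached_t1 (x INT)
+  CACHED IN 'four_gig_pool' WITH REPLICATION = 3;
+</code></pre>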
+
+
+
+ <p class="p">
+ <strong class="ph b">Column order</strong>:
+ </p>
+
+ <p class="p">
+ If you intend to use the table to hold data files produced by some external source,
+ specify the columns in the same order as they appear in the data files.
+ </p>
+
+ <p class="p">
+ If you intend to insert or copy data into the table through Impala, or if you have control
+ over the way externally produced data files are arranged, use your judgment to specify
+ columns in the most convenient order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+          If certain columns are often <code class="ph codeph">NULL</code>, specify those columns last. You
+          can then produce data files that omit these trailing columns entirely, and Impala
+          automatically fills in <code class="ph codeph">NULL</code> values for the omitted columns.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If an unpartitioned table will be used as the source for an <code class="ph codeph">INSERT ...
+ SELECT</code> operation into a partitioned table, specify last in the unpartitioned
+ table any columns that correspond to partition key columns in the partitioned table,
+ and in the same order as the partition key columns are declared in the partitioned
+ table. This technique lets you use <code class="ph codeph">INSERT ... SELECT *</code> when copying
+ data to the partitioned table, rather than specifying each column name individually.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you specify columns in an order that you later discover is suboptimal, you can
+ sometimes work around the problem without recreating the table. You can create a view
+ that selects columns from the original table in a permuted order, then do a
+ <code class="ph codeph">SELECT *</code> from the view. When inserting data into a table, you can
+ specify a permuted order for the inserted columns to match the order in the
+ destination table.
+ </p>
+ </li>
+ </ul>
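+
+  <p class="p">
+    For example, the following sketch (with placeholder names) shows an unpartitioned
+    staging table whose trailing columns match the partition key columns of the
+    destination table, so the copy can use <code class="ph codeph">INSERT ... SELECT *</code>:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE staging (s STRING, year INT, month INT);
+CREATE TABLE dest (s STRING) PARTITIONED BY (year INT, month INT);
+INSERT INTO dest PARTITION (year, month) SELECT * FROM staging;
+</code></pre>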
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ Impala queries can make use of metadata about the table and columns, such as the number of
+ rows in a table or the number of different values in a column. Prior to Impala 1.2.2, to
+ create this metadata, you issued the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive to
+ gather this information, after creating the table and loading representative data into it.
+ In Impala 1.2.2 and higher, the <code class="ph codeph">COMPUTE STATS</code> statement produces these
+ statistics within Impala, without needing to use Hive at all.
+ </p>
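+
+  <p class="p">
+    For example, after loading data into an illustrative table <code class="ph codeph">t1</code>:
+  </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS t1;
+SHOW TABLE STATS t1;
+SHOW COLUMN STATS t1;
+</code></pre>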
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala <code class="ph codeph">CREATE TABLE</code> statement cannot create an HBase table, because
+ it currently does not support the <code class="ph codeph">STORED BY</code> clause needed for HBase
+ tables. Create such tables in Hive, then query them through Impala. For information on
+ using Impala with HBase tables, see <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+      To create a table where the data resides in the Amazon Simple Storage Service (S3),
+      specify a <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> prefix
+      pointing to the data files in S3.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can use this special
+ <code class="ph codeph">LOCATION</code> syntax as part of a <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statement.
+ </p>
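+
+  <p class="p">
+    For example, in the following sketch the bucket name and path are illustrative
+    placeholders:
+  </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE s3_table (x INT, s STRING)
+  LOCATION 's3a://example-bucket/path/to/data/';
+</code></pre>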
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE</code> statement for an internal table creates a directory in
+ HDFS. The <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement associates the table with an
+ existing HDFS directory, and does not create any new directory in HDFS. To locate the HDFS
+ data directory for a table, issue a <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table</var></code> statement. To examine the contents of that HDFS
+ directory, use an OS command such as <code class="ph codeph">hdfs dfs -ls
+ hdfs://<var class="keyword varname">path</var></code>, either from the OS command line or through the
+ <code class="ph codeph">shell</code> or <code class="ph codeph">!</code> commands in <span class="keyword cmdname">impala-shell</span>.
+ </p>
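+
+  <p class="p">
+    For example, in <span class="keyword cmdname">impala-shell</span> (the table name and HDFS path are
+    illustrative placeholders):
+  </p>
+
+<pre class="pre codeblock"><code>
+DESCRIBE FORMATTED t1;
+-- Copy the Location: value from the output into the shell command.
+! hdfs dfs -ls hdfs://nameservice1/user/hive/warehouse/db1.db/t1;
+</code></pre>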
+
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax creates data files under the table data
+ directory to hold any data copied by the <code class="ph codeph">INSERT</code> portion of the statement.
+ (Even if no data is copied, Impala might create one or more empty data files.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under, typically the
+ <code class="ph codeph">impala</code> user, must have both execute and write permission for the database
+ directory where the table is being created.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+ <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>,
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>,
+ <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>,
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>,
+ <a class="xref" href="impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a>, <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+ <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_view.html b/docs/build3x/html/topics/impala_create_view.html
new file mode 100644
index 0000000..f25d810
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_view.html
@@ -0,0 +1,194 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE VIEW Statement</title></head><body id="create_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">CREATE VIEW</code> statement lets you create a shorthand abbreviation for a more complicated
+ query. The base query can involve joins, expressions, reordered columns, column aliases, and other SQL
+ features that can make a query hard to understand or maintain.
+ </p>
+
+ <p class="p">
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+ in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE VIEW [IF NOT EXISTS] <var class="keyword varname">view_name</var> [(<var class="keyword varname">column_list</var>)]
+ AS <var class="keyword varname">select_statement</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE VIEW</code> statement can be useful in scenarios such as the following:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ To turn even the most lengthy and complicated SQL query into a one-liner. You can issue simple queries
+ against the view from applications, scripts, or interactive queries in <span class="keyword cmdname">impala-shell</span>.
+ For example:
+<pre class="pre codeblock"><code>select * from <var class="keyword varname">view_name</var>;
+select * from <var class="keyword varname">view_name</var> order by c1 desc limit 10;</code></pre>
+ The more complicated and hard-to-read the original query, the more benefit there is to simplifying the
+ query using a view.
+ </li>
+
+ <li class="li">
+ To hide the underlying table and column names, to minimize maintenance problems if those names change. In
+ that case, you re-create the view using the new names, and all queries that use the view rather than the
+ underlying tables keep running with no changes.
+ </li>
+
+ <li class="li">
+ To experiment with optimization techniques and make the optimized queries available to all applications.
+ For example, if you find a combination of <code class="ph codeph">WHERE</code> conditions, join order, join hints, and so
+ on that works the best for a class of queries, you can establish a view that incorporates the
+ best-performing techniques. Applications can then make relatively simple queries against the view, without
+ repeating the complicated and optimized logic over and over. If you later find a better way to optimize the
+ original query, when you re-create the view, all the applications immediately take advantage of the
+ optimized base query.
+ </li>
+
+ <li class="li">
+ To simplify a whole class of related queries, especially complicated queries involving joins between
+ multiple tables, complicated expressions in the column list, and other SQL syntax that makes the query
+ difficult to understand and debug. For example, you might create a view that joins several tables, filters
+ using several <code class="ph codeph">WHERE</code> conditions, and selects several columns from the result set.
+ Applications might issue queries against this view that only vary in their <code class="ph codeph">LIMIT</code>,
+ <code class="ph codeph">ORDER BY</code>, and similar simple clauses.
+ </li>
+ </ul>
+
+ <p class="p">
+ For queries that require repeating complicated clauses over and over again, for example in the select list,
+ <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">GROUP BY</code> clauses, you can use the <code class="ph codeph">WITH</code>
+ clause as an alternative to creating a view.
+ </p>
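+
+  <p class="p">
+    For example, the following sketch (with placeholder table and column names) factors
+    out a subquery with <code class="ph codeph">WITH</code> instead of defining a permanent view:
+  </p>
+
+<pre class="pre codeblock"><code>
+WITH totals AS
+  (SELECT seller_id, SUM(amount) AS total FROM sales GROUP BY seller_id)
+SELECT seller_id, total FROM totals ORDER BY total DESC LIMIT 10;
+</code></pre>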
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+ <p class="p">
+ For tables containing complex type columns (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>), you typically use
+ join queries to refer to the complex values. You can use views to
+ hide the join notation, making such tables seem like traditional denormalized
+ tables, and making those tables queryable by business intelligence tools
+ that do not have built-in support for those complex types.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_views">Accessing Complex Type Data in Flattened Form Using Views</a> for details.
+ </p>
+ <p class="p">
+ Because you cannot directly issue <code class="ph codeph">SELECT <var class="keyword varname">col_name</var></code>
+ against a column of complex type, you cannot use a view or a <code class="ph codeph">WITH</code>
+ clause to <span class="q">"rename"</span> a column by selecting it with a column alias.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+
+
+<pre class="pre codeblock"><code>-- Create a view that is exactly the same as the underlying table.
+create view v1 as select * from t1;
+
+-- Create a view that includes only certain columns from the underlying table.
+create view v2 as select c1, c3, c7 from t1;
+
+-- Create a view that filters the values from the underlying table.
+create view v3 as select distinct c1, c3, c7 from t1 where c1 is not null and c5 > 0;
+
+-- Create a view that reorders and renames columns from the underlying table.
+create view v4 as select c4 as last_name, c6 as address, c2 as birth_date from t1;
+
+-- Create a view that runs functions to convert or transform certain columns.
+create view v5 as select c1, cast(c3 as string) c3, concat(c4,c5) c5, trim(c6) c6, "Constant" c8 from t1;
+
+-- Create a view that hides the complexity of a join query.
+create view v6 as select t1.c1, t2.c2 from t1 join t2 on t1.id = t2.id;
+</code></pre>
+
+
+
+ <div class="p">
+    The following example creates a series of views and then drops them. These examples illustrate how views
+    are associated with a particular database, and how the view names in
+    <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> statements can be either
+    unqualified, referring to a view in the current database, or fully qualified with a database name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>,
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_databases.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_databases.html b/docs/build3x/html/topics/impala_databases.html
new file mode 100644
index 0000000..550d744
--- /dev/null
+++ b/docs/build3x/html/topics/impala_databases.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="databases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Databases</title></head><body id="databases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Databases</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala, a database is a logical container for a group of tables. Each database defines a separate
+ namespace. Within a database, you can refer to the tables inside it using their unqualified names. Different
+ databases can contain tables with identical names.
+ </p>
+
+ <p class="p">
+ Creating a database is a lightweight operation. There are minimal database-specific properties to configure,
+ only <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>. There is no <code class="ph codeph">ALTER DATABASE</code> statement.
+ </p>
+
+ <p class="p">
+ Typically, you create a separate database for each project or application, to avoid naming conflicts between
+ tables and to make clear which tables are related to each other. The <code class="ph codeph">USE</code> statement lets
+ you switch between databases. Unqualified references to tables, views, and functions refer to objects
+ within the current database. You can also refer to objects in other databases by using qualified names
+ of the form <code class="ph codeph"><var class="keyword varname">dbname</var>.<var class="keyword varname">object_name</var></code>.
+ </p>
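+
+    <p class="p">
+      For example (with placeholder database and table names):
+    </p>
+
+<pre class="pre codeblock"><code>
+-- Illustrative sketch of unqualified and qualified references.
+USE db1;
+SELECT * FROM t1;       -- Refers to db1.t1.
+SELECT * FROM db2.t1;   -- Refers to a table of the same name in db2.
+</code></pre>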
+
+ <p class="p">
+ Each database is physically represented by a directory in HDFS. When you do not specify a <code class="ph codeph">LOCATION</code>
+ attribute, the directory is located in the Impala data directory with the associated tables managed by Impala.
+ When you do specify a <code class="ph codeph">LOCATION</code> attribute, any read and write operations for tables in that
+ database are relative to the specified HDFS directory.
+ </p>
+
+ <p class="p">
+ There is a special database, named <code class="ph codeph">default</code>, where you begin when you connect to Impala.
+ Tables created in <code class="ph codeph">default</code> are physically located one level higher in HDFS than all the
+ user-created databases.
+ </p>
+
+ <div class="p">
+ Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+ for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+ functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related statements:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_hdfs_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_hdfs_caching.html b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
new file mode 100644
index 0000000..596675d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_hdfs_caching.html
@@ -0,0 +1,578 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hdfs_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using HDFS Caching with Impala (Impala 2.1 or higher only)</title></head><body id="hdfs_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using HDFS Caching with Impala (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HDFS caching provides performance and scalability benefits in production environments where Impala queries
+ and other Hadoop jobs operate on quantities of data much larger than the physical RAM on the DataNodes,
+ making it impractical to rely on the Linux OS cache, which only keeps the most recently used data in memory.
+ Data read from the HDFS cache avoids the overhead of checksumming and memory-to-memory copying involved when
+ using data from the Linux OS cache.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ On a small or lightly loaded cluster, HDFS caching might not produce any speedup. It might even lead to
+ slower queries, if I/O read operations that were performed in parallel across the entire cluster are replaced by in-memory
+ operations running on a smaller number of hosts. The hosts where the HDFS blocks are cached can become
+ bottlenecks because they experience high CPU load while processing the cached data blocks, while other hosts remain idle.
+ Therefore, always compare performance with and without this feature enabled, using a realistic workload.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, you can spread the CPU load more evenly by specifying the <code class="ph codeph">WITH REPLICATION</code>
+ clause of the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements.
+ This clause lets you control the replication factor for
+ HDFS caching for a specific table or partition. By default, each cached block is
+ only present on a single host, which can lead to CPU contention if the same host
+ processes each cached block. Increasing the replication factor lets Impala choose
+ different hosts to process different cached blocks, to better distribute the CPU load.
+ Always use a <code class="ph codeph">WITH REPLICATION</code> setting of at least 3, and adjust upward
+ if necessary to match the replication factor for the underlying HDFS data files.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala automatically randomizes which host processes
+ a cached HDFS block, to avoid CPU hotspots. For tables where HDFS caching is not applied,
+ Impala designates which host to process a data block using an algorithm that estimates
+ the load on each host. If CPU hotspots still arise during queries,
+ you can enable additional randomization for the scheduling algorithm for non-HDFS cached data
+ by setting the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
+ </div>
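+ <p class="p">
+ As an illustration, the two adjustments described in this note might be applied as follows;
+ the table name is a placeholder:
+ </p>
+<pre class="pre codeblock"><code>-- Spread the CPU load for cached blocks across up to 3 hosts.
+alter table t1 set cached in '<var class="keyword varname">pool_name</var>' with replication = 3;
+
+-- If hotspots persist for non-cached data, add scheduling randomization.
+set SCHEDULE_RANDOM_REPLICA=true;
+</code></pre>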
+
+ <p class="p toc inpage"></p>
+
+
+
+ <p class="p">
+ For background information about how to set up and manage HDFS caching for a cluster, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="hdfs_caching__hdfs_caching_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of HDFS Caching for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala can use the HDFS caching feature to make more effective use of RAM, so that
+ repeated queries can take advantage of data <span class="q">"pinned"</span> in memory regardless of how much data is
+ processed overall. The HDFS caching feature lets you designate a subset of frequently accessed data to be
+ pinned permanently in memory, remaining in the cache across multiple queries and never being evicted. This
+ technique is suitable for tables or partitions that are frequently accessed and are small enough to fit
+ entirely within the HDFS memory cache. For example, you might designate several dimension tables to be
+ pinned in the cache, to speed up many different join queries that reference them. Or in a partitioned
+ table, you might pin a partition holding data from the most recent time period because that data will be
+ queried intensively; then when the next set of data arrives, you could unpin the previous partition and pin
+ the partition holding the new data.
+ </p>
+
+ <p class="p">
+ Because this Impala performance feature relies on HDFS infrastructure, it only applies to Impala tables
+ that use HDFS data files. HDFS caching for Impala does not apply to HBase tables, S3 tables,
+ Kudu tables,
+ or Isilon tables.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="hdfs_caching__hdfs_caching_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Setting Up HDFS Caching for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To use HDFS caching with Impala, first set up that feature for your cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Decide how much memory to devote to the HDFS cache on each host. Remember that the total memory available
+ for cached data is the sum of the cache sizes on all the hosts. By default, any data block is only cached on one
+ host, although you can cache a block across multiple hosts by increasing the replication factor.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <div class="p">
+ Issue <span class="keyword cmdname">hdfs cacheadmin</span> commands to set up one or more cache pools, owned by the same
+ user as the <span class="keyword cmdname">impalad</span> daemon (typically <code class="ph codeph">impala</code>). For example:
+<pre class="pre codeblock"><code>hdfs cacheadmin -addPool four_gig_pool -owner impala -limit 4000000000
+</code></pre>
+ For details about the <span class="keyword cmdname">hdfs cacheadmin</span> command, see
+ <span class="xref">the documentation for your Apache Hadoop distribution</span>.
+ </div>
+ </li>
+ </ul>
+
+ <p class="p">
+ Once HDFS caching is enabled and one or more pools are available, see
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching_ddl">Enabling HDFS Caching for Impala Tables and Partitions</a> for how to choose which Impala data to load
+ into the HDFS cache. On the Impala side, you specify the cache pool name defined by the <code class="ph codeph">hdfs
+ cacheadmin</code> command in the Impala DDL statements that enable HDFS caching for a table or partition,
+ such as <code class="ph codeph">CREATE TABLE ... CACHED IN <var class="keyword varname">pool</var></code> or <code class="ph codeph">ALTER TABLE ... SET
+ CACHED IN <var class="keyword varname">pool</var></code>.
+ </p>
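+ <p class="p">
+ For example, reusing the hypothetical <code class="ph codeph">four_gig_pool</code> created
+ in the earlier <span class="keyword cmdname">hdfs cacheadmin</span> example:
+ </p>
+<pre class="pre codeblock"><code>-- Cache a new table from the moment it is created.
+create table lookup_table (id int, val string) cached in 'four_gig_pool';
+
+-- Start caching an existing table.
+alter table census set cached in 'four_gig_pool';
+</code></pre>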
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="hdfs_caching__hdfs_caching_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling HDFS Caching for Impala Tables and Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Begin by choosing which tables or partitions to cache. For example, these might be lookup tables that are
+ accessed by many different join queries, or partitions corresponding to the most recent time period that
+ are analyzed by different reports or ad hoc queries.
+ </p>
+
+ <p class="p">
+ In your SQL statements, you specify logical divisions such as tables and partitions to be cached. Impala
+ translates these requests into HDFS-level directives that apply to particular directories and files. For
+ example, given a partitioned table <code class="ph codeph">CENSUS</code> with a partition key column
+ <code class="ph codeph">YEAR</code>, you could choose to cache all or part of the data as follows:
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+ When Impala processes a cached data block, where the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
+
+<pre class="pre codeblock"><code>-- Cache the entire table (all partitions).
+alter table census set cached in '<var class="keyword varname">pool_name</var>';
+
+-- Remove the entire table from the cache.
+alter table census set uncached;
+
+-- Cache a portion of the table (a single partition).
+-- If the table is partitioned by multiple columns (such as year, month, day),
+-- the ALTER TABLE command must specify values for all those columns.
+alter table census partition (year=1960) set cached in '<var class="keyword varname">pool_name</var>';
+
+<span class="ph">-- Cache the data from one partition on up to 4 hosts, to minimize CPU load on any
+-- single host when the same data block is processed multiple times.
+alter table census partition (year=1970)
+ set cached in '<var class="keyword varname">pool_name</var>' with replication = 4;</span>
+
+-- At each stage, check the volume of cached data.
+-- For large tables or partitions, the background loading might take some time,
+-- so you might have to wait and reissue the statement until all the data
+-- has finished being loaded into the cache.
+show table stats census;
++-------+-------+--------+------+--------------+--------+
+| year | #Rows | #Files | Size | Bytes Cached | Format |
++-------+-------+--------+------+--------------+--------+
+| 1900 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1940 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| 1960 | -1 | 1 | 11B | 11B | TEXT |
+| 1970 | -1 | 1 | 11B | NOT CACHED | TEXT |
+| Total | -1 | 4 | 44B | 11B | |
++-------+-------+--------+------+--------------+--------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">CREATE TABLE considerations:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature affects the Impala <code class="ph codeph">CREATE TABLE</code> statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You can put a <code class="ph codeph">CACHED IN '<var class="keyword varname">pool_name</var>'</code> clause
+ <span class="ph">and optionally a <code class="ph codeph">WITH REPLICATION = <var class="keyword varname">number_of_hosts</var></code> clause</span>
+ at the end of a
+ <code class="ph codeph">CREATE TABLE</code> statement to automatically cache the entire contents of the table,
+ including any partitions added later. The <var class="keyword varname">pool_name</var> is a pool that you previously set
+ up with the <span class="keyword cmdname">hdfs cacheadmin</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Once a table is designated for HDFS caching through the <code class="ph codeph">CREATE TABLE</code> statement, if new
+ partitions are added later through <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> statements, the data in
+ those new partitions is automatically cached in the same pool.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you want to perform repetitive queries on a subset of data from a large table, and it is not practical
+ to designate the entire table or specific partitions for HDFS caching, you can create a new cached table
+ with just a subset of the data by using <code class="ph codeph">CREATE TABLE ... CACHED IN '<var class="keyword varname">pool_name</var>'
+ AS SELECT ... WHERE ...</code>. When you are finished with generating reports from this subset of data,
+ drop the table and both the data files and the data cached in RAM are automatically deleted.
+ </p>
+ </li>
+ </ul>
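+ <p class="p">
+ The last technique can be sketched as follows; the table names and filter condition are
+ hypothetical:
+ </p>
+<pre class="pre codeblock"><code>-- Cache only a subset of a large table for a series of reporting queries.
+create table recent_sales cached in '<var class="keyword varname">pool_name</var>' as
+ select * from sales where year = 2018;
+
+-- Afterward, this removes both the data files and the data cached in RAM.
+drop table recent_sales;
+</code></pre>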
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for the full syntax.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Other memory considerations:</strong>
+ </p>
+
+ <p class="p">
+ Certain DDL operations, such as <code class="ph codeph">ALTER TABLE ... SET LOCATION</code>, are blocked while the
+ underlying HDFS directories contain cached files. You must uncache the files first, before changing the
+ location, dropping the table, and so on.
+ </p>
+
+ <p class="p">
+ When data is requested to be pinned in memory, that process happens in the background without blocking
+ access to the data while the caching is in progress. Loading the data from disk could take some time.
+ Impala reads each HDFS data block from memory if it has been pinned already, or from disk if it has not
+ been pinned yet. When files are added to a table or partition whose contents are cached, Impala
+ automatically detects those changes and performs a <code class="ph codeph">REFRESH</code> once the relevant
+ data is cached.
+ </p>
+
+ <p class="p">
+ The amount of data that you can pin on each node through the HDFS caching mechanism is subject to a quota
+ that is enforced by the underlying HDFS service. Before requesting to pin an Impala table or partition in
+ memory, check that its size does not exceed this quota.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the HDFS cache consists of combined memory from all the DataNodes in the cluster, cached tables or
+ partitions can be bigger than the amount of HDFS cache memory on any single host.
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="hdfs_caching__hdfs_caching_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading and Removing Data with HDFS Caching Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When HDFS caching is enabled, extra processing happens in the background when you add or remove data
+ through statements such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Inserting or loading data:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When Impala performs an <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> or
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code> statement for a table or
+ partition that is cached, the new data files are automatically cached and Impala recognizes that fact
+ automatically.
+ </li>
+
+ <li class="li">
+ If you perform an <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> through Hive, as always, Impala
+ only recognizes the new data files after a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement in Impala.
+ </li>
+
+ <li class="li">
+ If the cache pool is entirely full, or becomes full before all the requested data can be cached, the
+ Impala DDL statement returns an error. This is to avoid situations where only some of the requested data
+ could be cached.
+ </li>
+
+ <li class="li">
+ When HDFS caching is enabled for a table or partition, new data files are cached automatically when they
+ are added to the appropriate directory in HDFS, without the need for a <code class="ph codeph">REFRESH</code> statement
+ in Impala. Impala automatically performs a <code class="ph codeph">REFRESH</code> once the new data is loaded into the
+ HDFS cache.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Dropping tables, partitions, or cache pools:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE</a></code> and
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... DROP PARTITION</a></code>
+ statements as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you issue a <code class="ph codeph">DROP TABLE</code> for a table that is entirely cached, or has some partitions
+ cached, the <code class="ph codeph">DROP TABLE</code> succeeds and all the cache directives Impala submitted for that
+ table are removed from the HDFS cache system.
+ </li>
+
+ <li class="li">
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code>. The operation succeeds and any cache
+ directives are removed.
+ </li>
+
+ <li class="li">
+ As always, the underlying data files are removed if the dropped table is an internal table, or the
+ dropped partition is in its default location underneath an internal table. The data files are left alone
+ if the dropped table is an external table, or if the dropped partition is in a non-default location.
+ </li>
+
+ <li class="li">
+ If you designated the data files as cached through the <span class="keyword cmdname">hdfs cacheadmin</span> command, and
+ the data files are left behind as described in the previous item, the data files remain cached. Impala
+ only removes the cache directives submitted by Impala through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements. It is OK to have multiple redundant cache directives pertaining
+ to the same files; the directives all have unique IDs and owners so that the system can tell them apart.
+ </li>
+
+ <li class="li">
+ If you drop an HDFS cache pool through the <span class="keyword cmdname">hdfs cacheadmin</span> command, all the Impala
+ data files are preserved, just no longer cached. After a subsequent <code class="ph codeph">REFRESH</code>,
+ <code class="ph codeph">SHOW TABLE STATS</code> reports 0 bytes cached for each associated Impala table or partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Relocating a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ The HDFS caching feature interacts with the Impala
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE ... SET LOCATION</a></code>
+ statement as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you have designated a table or partition as cached through the <code class="ph codeph">CREATE TABLE</code> or
+ <code class="ph codeph">ALTER TABLE</code> statements, subsequent attempts to relocate the table or partition through
+ an <code class="ph codeph">ALTER TABLE ... SET LOCATION</code> statement will fail. You must issue an <code class="ph codeph">ALTER
+ TABLE ... SET UNCACHED</code> statement for the table or partition first. Otherwise, Impala would lose
+ track of some cached data files and have no way to uncache them later.
+ </li>
+ </ul>
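+ <p class="p">
+ For example, relocating a cached table is a two-step operation; the table name and path
+ shown here are placeholders:
+ </p>
+<pre class="pre codeblock"><code>-- ALTER TABLE ... SET LOCATION fails while the table is cached,
+-- so uncache the table first.
+alter table t1 set uncached;
+alter table t1 set location '/user/impala/new_location';
+</code></pre>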
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="hdfs_caching__hdfs_caching_admin">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Administration for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are the guidelines and steps to check or change the status of HDFS caching for Impala data:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">hdfs cacheadmin command:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you drop a cache pool with the <span class="keyword cmdname">hdfs cacheadmin</span> command, Impala queries against the
+ associated data files will still work, by falling back to reading the files from disk. After performing a
+ <code class="ph codeph">REFRESH</code> on the table, Impala reports the number of bytes cached as 0 for all associated
+ tables and partitions.
+ </li>
+
+ <li class="li">
+ You might use <span class="keyword cmdname">hdfs cacheadmin</span> to get a list of existing cache pools, or detailed
+ information about the pools, as follows:
+<pre class="pre codeblock"><code>hdfs cacheadmin -listDirectives # Basic info
+Found 122 entries
+ ID POOL REPL EXPIRY PATH
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01
+...
+
+hdfs cacheadmin -listDirectives -stats # More details
+Found 122 entries
+ ID POOL REPL EXPIRY PATH BYTES_NEEDED BYTES_CACHED FILES_NEEDED FILES_CACHED
+ 123 testPool 1 never /user/hive/warehouse/tpcds.store_sales 0 0 0 0
+ 124 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-01-15 143169 143169 1 1
+ 125 testPool 1 never /user/hive/warehouse/tpcds.store_sales/ss_date=1998-02-01 112447 112447 1 1
+...
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala SHOW statement:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For each table or partition, the <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code>
+ statement displays the number of bytes currently cached by the HDFS caching feature. If there are no
+ cache directives in place for that table or partition, the result set displays <code class="ph codeph">NOT
+ CACHED</code>. A value of 0, or a number smaller than the overall size of the table or partition,
+ indicates that the cache request has been submitted but the data has not been entirely loaded into memory
+ yet. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Impala memory limits:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the Impala memory limits as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The maximum size of each HDFS cache pool is specified externally to Impala, through the <span class="keyword cmdname">hdfs
+ cacheadmin</span> command.
+ </li>
+
+ <li class="li">
+ All the memory used for HDFS caching is separate from the <span class="keyword cmdname">impalad</span> daemon address space
+ and does not count towards the limits of the <code class="ph codeph">--mem_limit</code> startup option,
+ <code class="ph codeph">MEM_LIMIT</code> query option, or further limits imposed through YARN resource management or
+ the Linux <code class="ph codeph">cgroups</code> mechanism.
+ </li>
+
+ <li class="li">
+ Because accessing HDFS cached data avoids a memory-to-memory copy operation, queries involving cached
+ data require less memory on the Impala side than the equivalent queries on uncached data. In addition to
+ any performance benefits in a single-user environment, the reduced memory helps to improve scalability
+ under high-concurrency workloads.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="hdfs_caching__hdfs_caching_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Performance Considerations for HDFS Caching with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 1.4.0 and higher, Impala supports efficient reads from data that is pinned in memory through HDFS
+ caching. Impala takes advantage of the HDFS API and reads the data from memory rather than from disk
+ whether the data files are pinned using Impala DDL statements or through the command-line mechanism where
+ you specify HDFS paths.
+ </p>
+
+ <p class="p">
+ When you examine the output of the <span class="keyword cmdname">impala-shell</span> <span class="keyword cmdname">SUMMARY</span> command, or
+ look in the metrics report for the <span class="keyword cmdname">impalad</span> daemon, you see how many bytes are read from
+ the HDFS cache. For example, this excerpt from a query profile illustrates that all the data read during a
+ particular phase of the query came from the HDFS cache, because the <code class="ph codeph">BytesRead</code> and
+ <code class="ph codeph">BytesReadDataNodeCache</code> values are identical.
+ </p>
+
+<pre class="pre codeblock"><code>HDFS_SCAN_NODE (id=0):(Total: 11s114ms, non-child: 11s114ms, % non-child: 100.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 32.75
+<strong class="ph b"> - BytesRead: 10.47 GB (11240756479)
+ - BytesReadDataNodeCache: 10.47 GB (11240756479)</strong>
+ - BytesReadLocal: 10.47 GB (11240756479)
+ - BytesReadShortCircuit: 10.47 GB (11240756479)
+ - DecompressionTime: 27s572ms
+</code></pre>
+
+ <p class="p">
+ For queries involving smaller amounts of data, or in single-user workloads, you might not notice a
+ significant difference in query response time with or without HDFS caching. Even with HDFS caching turned
+ off, the data for the query might still be in the Linux OS buffer cache. The benefits become clearer as
+ data volume increases, and especially as the system processes more concurrent queries. HDFS caching
+ improves the scalability of the overall system. That is, it prevents query performance from declining when
+ the workload outstrips the capacity of the Linux OS cache.
+ </p>
+
+ <p class="p">
+ Due to a limitation of HDFS, zero-copy reads are not supported with
+ encryption. Where practical, avoid HDFS caching for Impala data
+ files in encryption zones. The queries fall back to the normal read
+ path during query execution, which might cause some performance overhead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SELECT considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala HDFS caching feature interacts with the
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement and query performance as
+ follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala automatically reads from memory any data that has been designated as cached and actually loaded
+ into the HDFS cache. (It could take some time after the initial request to fully populate the cache for a
+ table with large size or many partitions.) The speedup comes from two aspects: reading from RAM instead
+ of disk, and accessing the data straight from the cache area instead of copying from one RAM area to
+ another. This second aspect yields further performance improvement over the standard OS caching
+ mechanism, which still results in memory-to-memory copying of cached data.
+ </li>
+
+ <li class="li">
+ For small amounts of data, the query speedup might not be noticeable in terms of wall clock time. The
+ performance might be roughly the same with HDFS caching turned on or off, due to recently used data being
+ held in the Linux OS cache. The difference is more pronounced with:
+ <ul class="ul">
+ <li class="li">
+ Data volumes (for all queries running concurrently) that exceed the size of the Linux OS cache.
+ </li>
+
+ <li class="li">
+ A busy cluster running many concurrent queries, where the reduction in memory-to-memory copying and
+ overall memory usage during queries results in greater scalability and throughput.
+ </li>
+
+ <li class="li">
+ Thus, to really exercise and benchmark this feature in a development environment, you might need to
+ simulate realistic workloads and concurrent queries that match your production environment.
+ </li>
+
+ <li class="li">
+ One way to simulate a heavy workload on a lightly loaded system is to flush the OS buffer cache (on
+ each DataNode) between iterations of queries against the same tables or partitions:
+<pre class="pre codeblock"><code>$ sync
+$ echo 1 > /proc/sys/vm/drop_caches
+</code></pre>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Impala queries take advantage of HDFS cached data regardless of whether the cache directive was issued by
+ Impala or externally through the <span class="keyword cmdname">hdfs cacheadmin</span> command, for example for an external
+ table where the cached data files might be accessed by several different Hadoop components.
+ </li>
+
+ <li class="li">
+ If your query returns a large result set, the time reported for the query could be dominated by the time
+ needed to print the results on the screen. To measure the time for the underlying query processing, query
+ the <code class="ph codeph">COUNT()</code> of the big result set, which does all the same processing but only prints a
+ single line to the screen.
+ </li>
+ </ul>
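+ <p class="p">
+ For example, to time the underlying processing of a query that would otherwise return a
+ large result set (the inner query shown is a placeholder):
+ </p>
+<pre class="pre codeblock"><code>select count(*) from
+ (select c1, c2 from big_table where c3 between 0 and 100) t;
+</code></pre>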
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_joins.html b/docs/build3x/html/topics/impala_perf_joins.html
new file mode 100644
index 0000000..7def5b4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_joins.html
@@ -0,0 +1,508 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Performance Considerations for Join Queries</title></head><body id="perf_joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Performance Considerations for Join Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Queries involving join operations often require more tuning than queries that refer to only one table. The
+ maximum size of the result set from a join query is the product of the number of rows in all the joined
+ tables. When joining several tables with millions or billions of rows, any missed opportunity to filter the
+ result set, or other inefficiency in the query, could lead to an operation that does not finish in a
+ practical time and has to be cancelled.
+ </p>
+
+ <p class="p">
+ The simplest technique for tuning an Impala join query is to collect statistics on each table involved in the
+ join using the <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS</a></code>
+ statement, and then let Impala automatically optimize the query based on the size of each table, number of
+ distinct values of each column, and so on. The <code class="ph codeph">COMPUTE STATS</code> statement and the join
+ optimization are new features introduced in Impala 1.2.2. For accurate statistics about each table, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement after loading the data into that table, and again if the amount of
+ data changes substantially due to an <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, adding a partition,
+ and so on.
+ </p>
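+ <p class="p">
+ For example, assuming two hypothetical tables <code class="ph codeph">sales</code> and
+ <code class="ph codeph">customers</code> with the column names shown:
+ </p>
+<pre class="pre codeblock"><code>compute stats sales;
+compute stats customers;
+
+-- Impala can now estimate table sizes and choose an efficient join order.
+select c.name, sum(s.amount)
+ from sales s join customers c on (s.cust_id = c.id)
+ group by c.name;
+</code></pre>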
+
+ <p class="p">
+ If statistics are not available for all the tables in the join query, or if Impala chooses a join order that
+ is not the most efficient, you can override the automatic join order optimization by specifying the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code>
+ or <code class="ph codeph">ALL</code> keywords. In this case, Impala uses the order the tables appear in the query to guide how the
+ joins are processed.
+ </p>
+
+ <p class="p">
+ When you use the <code class="ph codeph">STRAIGHT_JOIN</code> technique, you must order the tables in the join query
+ manually instead of relying on the Impala optimizer. The optimizer uses sophisticated techniques to estimate
+ the size of the result set at each stage of the join. For manual ordering, use this heuristic approach to
+ start with, and then experiment to fine-tune the order:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Specify the largest table first. This table is read from disk by each Impala node and so its size is not
+ significant in terms of memory usage during the query.
+ </li>
+
+ <li class="li">
+ Next, specify the smallest table. The contents of the second, third, and so on tables are all transmitted
+ across the network. You want to minimize the size of the result set from each subsequent stage of the join
+ query. The most likely approach involves joining a small table first, so that the result set remains small
+ even as subsequent larger tables are processed.
+ </li>
+
+ <li class="li">
+ Join the next smallest table, then the next smallest, and so on.
+ </li>
+
+ <li class="li">
+ For example, if you had tables <code class="ph codeph">BIG</code>, <code class="ph codeph">MEDIUM</code>, <code class="ph codeph">SMALL</code>, and
+ <code class="ph codeph">TINY</code>, the logical join order to try would be <code class="ph codeph">BIG</code>, <code class="ph codeph">TINY</code>,
+ <code class="ph codeph">SMALL</code>, <code class="ph codeph">MEDIUM</code>.
+ </li>
+ </ul>
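+
+      <p class="p">
+        Applied to the hypothetical tables above, a manually ordered query might look like the
+        following sketch (the join column <code class="ph codeph">id</code> is an assumption):
+      </p>
+
+<pre class="pre codeblock"><code>-- Illustrative only: largest table first, then smallest to largest.
+select straight_join count(*)
+  from big
+  join tiny   on big.id = tiny.id
+  join small  on big.id = small.id
+  join medium on big.id = medium.id;</code></pre>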
+
+ <p class="p">
+      The terms <span class="q">"largest"</span> and <span class="q">"smallest"</span> refer to the size of the intermediate result set based on the
+ number of rows and columns from each table that are part of the result set. For example, if you join one
+ table <code class="ph codeph">sales</code> with another table <code class="ph codeph">customers</code>, a query might find results from
+ 100 different customers who made a total of 5000 purchases. In that case, you would specify <code class="ph codeph">SELECT
+ ... FROM sales JOIN customers ...</code>, putting <code class="ph codeph">customers</code> on the right side because it
+ is smaller in the context of this query.
+ </p>
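+
+      <p class="p">
+        A sketch of such a query follows; the join column <code class="ph codeph">customer_id</code>
+        and the <code class="ph codeph">name</code> column are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>select sales.*, customers.name
+  from sales join customers
+  on sales.customer_id = customers.customer_id;</code></pre>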
+
+ <p class="p">
+ The Impala query planner chooses between different techniques for performing join queries, depending on the
+ absolute and relative sizes of the tables. <strong class="ph b">Broadcast joins</strong> are the default, where the right-hand table
+ is considered to be smaller than the left-hand table, and its contents are sent to all the other nodes
+ involved in the query. The alternative technique is known as a <strong class="ph b">partitioned join</strong> (not related to a
+ partitioned table), which is more suitable for large tables of roughly equal size. With this technique,
+ portions of each table are sent to appropriate other nodes where those subsets of rows can be processed in
+ parallel. The choice of broadcast or partitioned join also depends on statistics being available for all
+ tables in the join, gathered by the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, issue an <code class="ph codeph">EXPLAIN</code> statement for
+ the query. If you find that a query uses a broadcast join when you know through benchmarking that a
+ partitioned join would be more efficient, or vice versa, add a hint to the query to specify the precise join
+ mechanism to use. See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </p>
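+
+      <p class="p">
+        For example, to force a partitioned (shuffle) join when benchmarking shows it is faster,
+        you could add a hint immediately after the <code class="ph codeph">JOIN</code> keyword.
+        The table and column names here are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>explain select count(*)
+  from huge1 join [shuffle] huge2 on huge1.id = huge2.id;
+
+-- Or force a broadcast join instead:
+explain select count(*)
+  from huge1 join [broadcast] huge2 on huge1.id = huge2.id;</code></pre>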
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="perf_joins__joins_no_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Joins Are Processed when Statistics Are Unavailable</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If table or column statistics are not available for some tables in a join, Impala still reorders the tables
+ using the information that is available. Tables with statistics are placed on the left side of the join
+ order, in descending order of cost based on overall size and cardinality. Tables without statistics are
+ treated as zero-size, that is, they are always placed on the right side of the join order.
+ </p>
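+
+      <p class="p">
+        Before tuning a join, you can check whether statistics are present with
+        <code class="ph codeph">SHOW TABLE STATS</code>; a value of -1 in the
+        <code class="ph codeph">#Rows</code> column indicates that table statistics have not been
+        computed. The table name here is hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>show table stats t1;
+show column stats t1;</code></pre>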
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="perf_joins__straight_join">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overriding Join Reordering with STRAIGHT_JOIN</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Impala join query is inefficient because of outdated statistics or unexpected data distribution, you
+ can keep Impala from reordering the joined tables by using the <code class="ph codeph">STRAIGHT_JOIN</code> keyword
+ immediately after the <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code>
+ keywords. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword turns off
+ the reordering of join clauses that Impala does internally, and produces a plan that relies on the join
+ clauses being ordered optimally in the query text. In this case, rewrite the query so that the largest
+ table is on the left, followed by the next largest, and so on until the smallest table is on the right.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+ </div>
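+
+    <p class="p">
+      For example, if a query selects from an inline view that itself contains joins, the hint
+      must appear in both the outer and the inner <code class="ph codeph">SELECT</code> lists.
+      The table and column names in this sketch are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>select straight_join o.id, v.total
+  from orders o
+  join (select straight_join cust_id, sum(amount) as total
+          from line_items join shipments on line_items.id = shipments.item_id
+        group by cust_id) v
+  on o.cust_id = v.cust_id;</code></pre>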
+
+ <p class="p">
+ In this example, the subselect from the <code class="ph codeph">BIG</code> table produces a very small result set, but
+ the table might still be treated as if it were the biggest and placed first in the join order. Using
+ <code class="ph codeph">STRAIGHT_JOIN</code> for the last join clause prevents the final table from being reordered,
+ keeping it as the rightmost table in the join order.
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;
+
+-- If the query contains [DISTINCT | ALL], the hint goes after those keywords.
+select distinct straight_join x from medium join small join (select * from big where c1 < 10) as big
+ where medium.id = small.id and small.id = big.id;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="perf_joins__perf_joins_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Examples of Join Order Optimization</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are examples showing joins between tables with 1 billion, 200 million, and 1 million rows. (In this
+ case, the tables are unpartitioned and using Parquet format.) The smaller tables contain subsets of data
+ from the largest one, for convenience of joining on the unique <code class="ph codeph">ID</code> column. The smallest
+ table only contains a subset of columns from the others.
+ </p>
+
+ <p class="p"></p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table big stored as parquet as select * from raw_data;
++----------------------------+
+| summary |
++----------------------------+
+| Inserted 1000000000 row(s) |
++----------------------------+
+Returned 1 row(s) in 671.56s
+[localhost:21000] > desc big;
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.01s
+[localhost:21000] > create table medium stored as parquet as select * from big limit 200 * floor(1e6);
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 200000000 row(s) |
++---------------------------+
+Returned 1 row(s) in 138.31s
+[localhost:21000] > create table small stored as parquet as select id,val,name from big where assertion = true limit 1 * floor(1e6);
++-------------------------+
+| summary |
++-------------------------+
+| Inserted 1000000 row(s) |
++-------------------------+
+Returned 1 row(s) in 6.32s</code></pre>
+
+ <p class="p">
+ For any kind of performance experimentation, use the <code class="ph codeph">EXPLAIN</code> statement to see how any
+ expensive query will be performed without actually running it, and enable verbose <code class="ph codeph">EXPLAIN</code>
+      plans containing more performance-oriented detail. In the following example, the most interesting plan lines are highlighted in bold,
+ showing that without statistics for the joined tables, Impala cannot make a good estimate of the number of
+ rows involved at each stage of processing, and is likely to stick with the <code class="ph codeph">BROADCAST</code> join
+ mechanism that sends a complete copy of one of the tables to each node.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=verbose;
+EXPLAIN_LEVEL set to verbose
+[localhost:21000] > explain select count(*) from big join medium where big.id = medium.id;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=2.10GB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+<strong class="ph b">| | join op: INNER JOIN (BROADCAST) |</strong>
+| | hash predicates: |
+| | big.id = medium.id |
+<strong class="ph b">| | cardinality: unavailable |</strong>
+| | per-host memory: 2.00GB |
+| | tuple ids: 0 1 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: unavailable |
+| | per-host memory: 0B |
+| | tuple ids: 1 |
+| | |
+| 0:SCAN HDFS |
+<strong class="ph b">| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 1:SCAN HDFS |
+<strong class="ph b">| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: unavailable |
+| column stats: unavailable |
+| cardinality: unavailable |</strong>
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
++----------------------------------------------------------+
+Returned 64 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ Gathering statistics for all the tables is straightforward, one <code class="ph codeph">COMPUTE STATS</code> statement
+ per table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats small;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 3 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 4.26s
+[localhost:21000] > compute stats medium;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 42.11s
+[localhost:21000] > compute stats big;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 5 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 165.44s</code></pre>
+
+ <p class="p">
+ With statistics in place, Impala can choose a more effective join order rather than following the
+ left-to-right sequence of tables in the query, and can choose <code class="ph codeph">BROADCAST</code> or
+ <code class="ph codeph">PARTITIONED</code> join strategies based on the overall sizes and number of rows in the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from medium join big where big.id = medium.id;
+Query: explain select count(*) from medium join big where big.id = medium.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=937.23MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = medium.id |
+| | cardinality: 1443004441 |
+| | per-host memory: 839.23MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 200000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.medium #partitions=1/1 size=4.62GB |
+| table stats: 200000000 rows total |
+| column stats: all |
+| cardinality: 200000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.04s
+
+[localhost:21000] > explain select count(*) from small join big where big.id = small.id;
+Query: explain select count(*) from small join big where big.id = small.id
++-----------------------------------------------------------+
+| Explain String |
++-----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=101.15MB VCores=2 |
+| |
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 6:AGGREGATE (merge finalize) |
+| | output: SUM(COUNT(*)) |
+| | cardinality: 1 |
+| | per-host memory: unavailable |
+| | tuple ids: 2 |
+| | |
+| 5:EXCHANGE |
+| cardinality: 1 |
+| per-host memory: unavailable |
+| tuple ids: 2 |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 5 |
+| UNPARTITIONED |
+| |
+| 3:AGGREGATE |
+| | output: COUNT(*) |
+| | cardinality: 1 |
+| | per-host memory: 10.00MB |
+| | tuple ids: 2 |
+| | |
+| 2:HASH JOIN |
+| | join op: INNER JOIN (BROADCAST) |
+| | hash predicates: |
+| | big.id = small.id |
+| | cardinality: 1000000000 |
+| | per-host memory: 3.15MB |
+| | tuple ids: 1 0 |
+| | |
+| |----4:EXCHANGE |
+| | cardinality: 1000000 |
+| | per-host memory: 0B |
+| | tuple ids: 0 |
+| | |
+| 1:SCAN HDFS |
+| table=join_order.big #partitions=1/1 size=23.12GB |
+| table stats: 1000000000 rows total |
+| column stats: all |
+| cardinality: 1000000000 |
+| per-host memory: 88.00MB |
+| tuple ids: 1 |
+| |
+| PLAN FRAGMENT 2 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 4 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=join_order.small #partitions=1/1 size=17.93MB |
+| table stats: 1000000 rows total |
+| column stats: all |
+| cardinality: 1000000 |
+| per-host memory: 32.00MB |
+| tuple ids: 0 |
++-----------------------------------------------------------+
+Returned 64 row(s) in 0.03s</code></pre>
+
+ <p class="p">
+ When queries like these are actually run, the execution times are relatively consistent regardless of the
+ table order in the query text. Here are examples using both the unique <code class="ph codeph">ID</code> column and the
+ <code class="ph codeph">VAL</code> column containing duplicate values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(*) from big join small on (big.id = small.id);
+Query: select count(*) from big join small on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 21.68s
+[localhost:21000] > select count(*) from small join big on (big.id = small.id);
+Query: select count(*) from small join big on (big.id = small.id)
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+Returned 1 row(s) in 20.45s
+
+[localhost:21000] > select count(*) from big join small on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 108.85s
+[localhost:21000] > select count(*) from small join big on (big.val = small.val);
++------------+
+| count(*) |
++------------+
+| 2000948962 |
++------------+
+Returned 1 row(s) in 100.76s</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ When examining the performance of join queries and the effectiveness of the join order optimization, make
+ sure the query involves enough data and cluster resources to see a difference depending on the query plan.
+ For example, a single data file of just a few megabytes will reside in a single HDFS block and be processed
+ on a single node. Likewise, if you use a single-node or two-node cluster, there might not be much
+ difference in efficiency for the broadcast or partitioned join strategies.
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_resources.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_resources.html b/docs/build3x/html/topics/impala_perf_resources.html
new file mode 100644
index 0000000..2bd7503
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_resources.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="mem_limits"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Controlling Impala Resource Usage</title></head><body id="mem_limits"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Controlling Impala Resource Usage</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Sometimes, balancing raw query performance against scalability requires limiting the amount of resources,
+ such as memory or CPU, used by a single query or group of queries. Impala can use several mechanisms that
+ help to smooth out the load during heavy concurrent usage, resulting in faster overall query times and
+      sharing of resources across Impala queries, MapReduce jobs, and other kinds of workloads across a
+ cluster:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The Impala admission control feature uses a fast, distributed mechanism to hold back queries that exceed
+ limits on the number of concurrent queries or the amount of memory used. The queries are queued, and
+ executed as other queries finish and resources become available. You can control the concurrency limits,
+ and specify different limits for different groups of users to divide cluster resources according to the
+ priorities of different classes of users. This feature is new in Impala 1.3.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You can restrict the amount of memory Impala reserves during query execution by specifying the
+ <code class="ph codeph">-mem_limit</code> option for the <code class="ph codeph">impalad</code> daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details. This limit applies only to the
+ memory that is directly consumed by queries; Impala reserves additional memory at startup, for example to
+ hold cached metadata.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For production deployments, implement resource isolation using your cluster management
+ tool.
+ </p>
+ </li>
+ </ul>
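+
+    <p class="p">
+      As a sketch of the second approach, the memory limit can be supplied when the
+      <code class="ph codeph">impalad</code> daemon is started. The exact service management
+      commands depend on your deployment, and the flag values here are only examples:
+    </p>
+
+<pre class="pre codeblock"><code>impalad --mem_limit=70% ...   # cap query memory at a percentage of physical RAM
+impalad --mem_limit=16G ...   # or specify an absolute amount</code></pre>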
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_skew.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_skew.html b/docs/build3x/html/topics/impala_perf_skew.html
new file mode 100644
index 0000000..20e5bfc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_skew.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_skew"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Detecting and Correcting HDFS Block Skew Conditions</title></head><body id="perf_skew"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Detecting and Correcting HDFS Block Skew Conditions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For best performance of Impala parallel queries, the work is divided equally across hosts in the cluster, and
+ all hosts take approximately equal time to finish their work. If one host takes substantially longer than
+ others, the extra time needed for the slow host can become the dominant factor in query performance.
+ Therefore, one of the first steps in performance tuning for Impala is to detect and correct such conditions.
+ </p>
+
+ <p class="p">
+ The main cause of uneven performance that you can correct within Impala is <dfn class="term">skew</dfn> in the number of
+ HDFS data blocks processed by each host, where some hosts process substantially more data blocks than others.
+ This condition can occur because of uneven distribution of the data values themselves, for example causing
+ certain data files or partitions to be large while others are very small. (Although it is possible to have
+ unevenly distributed data without any problems with the distribution of HDFS blocks.) Block skew could also
+ be due to the underlying block allocation policies within HDFS, the replication factor of the data files, and
+ the way that Impala chooses the host to process each data block.
+ </p>
+
+ <p class="p">
+ The most convenient way to detect block skew, or slow-host issues in general, is to examine the <span class="q">"executive
+ summary"</span> information from the query profile after running a query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, issue the <code class="ph codeph">SUMMARY</code> command immediately after the
+ query is complete, to see just the summary information. If you detect issues involving skew, you might
+ switch to issuing the <code class="ph codeph">PROFILE</code> command, which displays the summary information followed
+ by a detailed performance analysis.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In the Impala debug web UI, click on the <span class="ph uicontrol">Profile</span> link associated with the query after it is
+ complete. The executive summary information is displayed early in the profile output.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ For each phase of the query, you see an <span class="ph uicontrol">Avg Time</span> and a <span class="ph uicontrol">Max Time</span>
+ value, along with <span class="ph uicontrol">#Hosts</span> indicating how many hosts are involved in that query phase.
+ For all the phases with <span class="ph uicontrol">#Hosts</span> greater than one, look for cases where the maximum time
+ is substantially greater than the average time. Focus on the phases that took the longest, for example, those
+ taking multiple seconds rather than milliseconds or microseconds.
+ </p>
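+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span> (the query, timings, and
+      operator names below are illustrative only, and the output is abbreviated):
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from big_table join other_table using (id);
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+ ...
+| Operator     | #Hosts | Avg Time | Max Time | ...
++--------------+--------+----------+----------+ ...
+| 02:HASH JOIN | 10     | 2s150ms  | 8s900ms  | ...  -- Max Time far above Avg Time: possible skew
+...</code></pre>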
+
+ <p class="p">
+ If you detect that some hosts take longer than others, first rule out non-Impala causes. One reason that some
+ hosts could be slower than others is if those hosts have less capacity than the others, or if they are
+ substantially busier due to unevenly distributed non-Impala workloads:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For clusters running Impala, keep the relative capacities of all hosts roughly equal. Any cost savings
+ from including some underpowered hosts in the cluster will likely be outweighed by poor or uneven
+ performance, and the time spent diagnosing performance issues.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If non-Impala workloads cause slowdowns on some hosts but not others, use the appropriate load-balancing
+ techniques for the non-Impala components to smooth out the load across the cluster.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If the hosts on your cluster are evenly powered and evenly loaded, examine the detailed profile output to
+ determine which host is taking longer than others for the query phase in question. Examine how many bytes are
+ processed during that phase on that host, how much memory is used, and how many bytes are transmitted across
+ the network.
+ </p>
+
+ <p class="p">
+ The most common symptom is a higher number of bytes read on one host than others, due to one host being
+ requested to process a higher number of HDFS data blocks. This condition is more likely to occur when the
+ number of blocks accessed by the query is relatively small. For example, if you have a 10-node cluster and
+ the query processes 10 HDFS blocks, each node might not process exactly one block. If one node sits idle
+ while another node processes two blocks, the query could take twice as long as if the data was perfectly
+ distributed.
+ </p>
+
+ <p class="p">
+ Possible solutions in this case include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If the query is artificially small, perhaps for benchmarking purposes, scale it up to process a larger
+ data set. For example, if some nodes read 10 HDFS data blocks while others read 11, the overall effect of
+ the uneven distribution is much lower than when some nodes did twice as much work as others. As a
+ guideline, aim for a <span class="q">"sweet spot"</span> where each node reads 2 GB or more from HDFS per query. Queries
+ that process lower volumes than that could experience inconsistent performance that smooths out as
+ queries become more data-intensive.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If the query processes only a few large blocks, so that many nodes sit idle and cannot help to
+ parallelize the query, consider reducing the overall block size. For example, you might adjust the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option before copying or converting data into a Parquet table.
+ Or you might adjust the granularity of data files produced earlier in the ETL pipeline by non-Impala
+ components. In Impala 2.0 and later, the default Parquet block size is 256 MB, reduced from 1 GB, to
+ improve parallelism for common cluster sizes and data volumes.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduce the amount of compression applied to the data. For text data files, the highest degree of
+ compression (gzip) produces unsplittable files that are more difficult for Impala to process in parallel,
+ and require extra memory during processing to hold the compressed and uncompressed data simultaneously.
+ For binary formats such as Parquet and Avro, compression can result in fewer data blocks overall, but
+ remember that when queries process relatively few blocks, there is less opportunity for parallel
+ execution and many nodes in the cluster might sit idle. Note that when Impala writes Parquet data with
+ the query option <code class="ph codeph">COMPRESSION_CODEC=NONE</code> enabled, the data is still typically compact due
+ to the encoding schemes used by Parquet, independent of the final compression step.
+ </p>
+ </li>
+ </ul>
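+
+    <p class="p">
+      As a sketch of the block-size adjustment, you might set the query option before rewriting
+      the data into a new Parquet table. The table names here are hypothetical:
+    </p>
+
+<pre class="pre codeblock"><code>set parquet_file_size=128m;
+create table more_blocks stored as parquet as select * from fewer_blocks;</code></pre>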
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_kudu.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_kudu.html b/docs/build3x/html/topics/impala_kudu.html
new file mode 100644
index 0000000..1f10f44
--- /dev/null
+++ b/docs/build3x/html/topics/impala_kudu.html
@@ -0,0 +1,1449 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_kudu"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query Kudu Tables</title></head><body id="impala_kudu"><main role="main"><article role="article" aria-labelledby="impala_kudu__kudu">
+
+ <h1 class="title topictitle1" id="impala_kudu__kudu">Using Impala to Query Kudu Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query tables stored by Apache Kudu. This capability
+ allows convenient access to a storage system that is tuned for different kinds of
+ workloads than the default HDFS-backed Impala tables.
+ </p>
+
+ <p class="p">
+ By default, Impala tables are stored on HDFS using data files with various file formats.
+ HDFS files are ideal for bulk loads (append operations) and queries using full-table scans,
+ but do not support in-place updates or deletes. Kudu is an alternative storage engine used
+ by Impala that supports both in-place updates (for mixed read/write workloads) and fast scans
+ (for data-warehouse/analytic operations). Using Kudu tables with Impala can simplify the
+ ETL pipeline by avoiding extra steps to segregate and reorganize newly arrived data.
+ </p>
+
+ <p class="p">
+ Certain Impala SQL statements and clauses, such as <code class="ph codeph">DELETE</code>,
+ <code class="ph codeph">UPDATE</code>, <code class="ph codeph">UPSERT</code>, and <code class="ph codeph">PRIMARY KEY</code> work
+ only with Kudu tables. Other statements and clauses, such as <code class="ph codeph">LOAD DATA</code>,
+ <code class="ph codeph">TRUNCATE TABLE</code>, and <code class="ph codeph">INSERT OVERWRITE</code>, are not applicable
+ to Kudu tables.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_kudu__kudu_benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Benefits of Using Kudu Tables with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The combination of Kudu and Impala works best for tables where scan performance is
+ important, but data arrives continuously, in small batches, or needs to be updated
+ without being completely replaced. HDFS-backed tables can require substantial overhead
+ to replace or reorganize data files as new data arrives. Impala can perform efficient
+ lookups and scans within Kudu tables, and Impala can also perform update or
+ delete operations efficiently. You can also use the Kudu Java, C++, and Python APIs to
+ do ingestion or transformation operations outside of Impala, and Impala can query the
+ current data at any time.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_kudu__kudu_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala for Use with Kudu</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">-kudu_master_hosts</code> configuration property must be set correctly
+ for the <span class="keyword cmdname">impalad</span> daemon, for <code class="ph codeph">CREATE TABLE ... STORED AS
+ KUDU</code> statements to connect to the appropriate Kudu server. Typically, the
+ required value for this setting is <code class="ph codeph"><var class="keyword varname">kudu_host</var>:7051</code>.
+ In a high-availability Kudu deployment, specify the names of multiple Kudu hosts separated by commas.
+ </p>
+
+ <p class="p">
+ If the <code class="ph codeph">-kudu_master_hosts</code> configuration property is not set, you can
+ still associate the appropriate value for each table by specifying a
+ <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> clause in the <code class="ph codeph">CREATE TABLE</code> statement or
+ changing the <code class="ph codeph">TBLPROPERTIES('kudu.master_addresses')</code> value with an <code class="ph codeph">ALTER TABLE</code>
+ statement.
+ </p>
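+
+ <p class="p">
+ For example, the following statements show both ways of associating a table with a
+ specific Kudu cluster. (The host names are placeholders; substitute the names of your
+ own Kudu master hosts.)
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Specify the Kudu master address when the table is created.
+CREATE TABLE per_table_master (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.master_addresses'='kudu-master.example.com:7051');
+
+-- Point an existing table at a different Kudu master.
+ALTER TABLE per_table_master
+  SET TBLPROPERTIES ('kudu.master_addresses'='new-master.example.com:7051');
+</code></pre>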
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kudu_config__kudu_topology">
+
+ <h3 class="title topictitle3" id="ariaid-title4">Cluster Topology for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ With HDFS-backed tables, you are typically concerned with the number of DataNodes in
+ the cluster, how many and how large HDFS data files are read during a query, and
+ therefore the amount of work performed by each DataNode and the network communication
+ to combine intermediate results and produce the final result set.
+ </p>
+
+ <p class="p">
+ With Kudu tables, the topology considerations are different, because:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The underlying storage is managed and organized by Kudu, not represented as HDFS
+ data files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Kudu handles some of the underlying mechanics of partitioning the data. You can specify
+ the partitioning scheme with combinations of hash and range partitioning, so that you can
+ decide how much effort to expend to manage the partitions as new data arrives. For example,
+ you can construct partitions that apply to date ranges rather than a separate partition for each
+ day or each hour.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Data is physically divided based on units of storage called <dfn class="term">tablets</dfn>. Tablets are
+ stored by <dfn class="term">tablet servers</dfn>. Each tablet server can store multiple tablets,
+ and each tablet is replicated across multiple tablet servers, managed automatically by Kudu.
+ Where practical, colocate the tablet servers on the same hosts as the DataNodes, although that is not required.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ One consideration for the cluster topology is that the number of replicas for a Kudu table
+ must be odd.
+ </p>
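+
+ <p class="p">
+ For example, the replication factor can be set when a table is created through the
+ <code class="ph codeph">kudu.num_replicas</code> table property, and the value must be odd:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 3 replicas (the default) or another odd value is accepted.
+CREATE TABLE odd_replicas (id BIGINT PRIMARY KEY, s STRING)
+  PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU
+  TBLPROPERTIES ('kudu.num_replicas'='3');
+
+-- Specifying an even number of replicas, such as 2, results in an error.
+</code></pre>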
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_kudu__kudu_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Impala DDL Enhancements for Kudu Tables (CREATE TABLE and ALTER TABLE)</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can use the Impala <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements to create and fine-tune the characteristics of Kudu tables. Because Kudu
+ tables have features and properties that do not apply to other kinds of Impala tables,
+ familiarize yourself with Kudu-related concepts and syntax first.
+ For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+ statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="kudu_ddl__kudu_primary_key">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Primary Key Columns for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables introduce the notion of primary keys to Impala for the first time. The
+ primary key is made up of one or more columns, whose values are combined and used as a
+ lookup key during queries. The tuple represented by these columns must be unique,
+ cannot contain any <code class="ph codeph">NULL</code> values, and can never be updated once inserted. For a
+ Kudu table, all the partition key columns must come from the set of
+ primary key columns.
+ </p>
+
+ <p class="p">
+ The primary key has both physical and logical aspects:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ On the physical side, it is used to map the data values to particular tablets for fast retrieval.
+ Because the tuples formed by the primary key values are unique, the primary key columns are typically
+ highly selective.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ On the logical side, the uniqueness constraint allows you to avoid duplicate data in a table.
+ For example, if an <code class="ph codeph">INSERT</code> operation fails partway through, only some of the
+ new rows might be present in the table. You can re-run the same <code class="ph codeph">INSERT</code>, and
+ only the missing rows will be added. Or if data in the table is stale, you can run an
+ <code class="ph codeph">UPSERT</code> statement that brings the data up to date, without the possibility
+ of creating duplicate copies of existing rows.
+ </p>
+ </li>
+ </ul>
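+
+ <p class="p">
+ The following hypothetical example (the table names are placeholders) sketches the
+ scenario described above, where re-running a statement does not create duplicate rows:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Suppose this INSERT failed partway through, leaving only some new rows in place.
+INSERT INTO kudu_events SELECT * FROM staged_events;
+
+-- Re-running the same statement adds only the missing rows; rows whose primary
+-- key values already exist in the table are not duplicated.
+INSERT INTO kudu_events SELECT * FROM staged_events;
+
+-- UPSERT updates existing rows in place and inserts new ones, with no
+-- possibility of duplicate primary key values.
+UPSERT INTO kudu_events SELECT * FROM staged_events;
+</code></pre>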
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Impala only allows <code class="ph codeph">PRIMARY KEY</code> clauses and <code class="ph codeph">NOT NULL</code>
+ constraints on columns for Kudu tables. These constraints are enforced on the Kudu side.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="kudu_ddl__kudu_column_attributes">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Kudu-Specific Column Attributes for CREATE TABLE</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the general syntax of the <code class="ph codeph">CREATE TABLE</code>
+ statement for Kudu tables, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ The following sections provide more detail for some of the
+ Kudu-specific keywords you can use in column definitions.
+ </p>
+
+ <p class="p">
+ The column list in a <code class="ph codeph">CREATE TABLE</code> statement can include the following
+ attributes, which only apply to Kudu tables:
+ </p>
+
+<pre class="pre codeblock"><code>
+ PRIMARY KEY
+| [NOT] NULL
+| ENCODING <var class="keyword varname">codec</var>
+| COMPRESSION <var class="keyword varname">algorithm</var>
+| DEFAULT <var class="keyword varname">constant_expression</var>
+| BLOCK_SIZE <var class="keyword varname">number</var>
+</code></pre>
+
+ <p class="p toc inpage">
+ See the following sections for details about each column attribute.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title8" id="kudu_column_attributes__kudu_primary_key_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title8">PRIMARY KEY Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The primary key for a Kudu table is a column, or set of columns, that uniquely
+ identifies every row. The primary key value also is used as the natural sort order
+ for the values from the table. The primary key value for each row is based on the
+ combination of values for the columns.
+ </p>
+
+ <p class="p">
+ Because all of the primary key columns must have non-null values, specifying a column
+ in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+ NULL</code> attribute to that column.
+ </p>
+
+ <p class="p">
+ The primary key columns must be the first ones specified in the <code class="ph codeph">CREATE
+ TABLE</code> statement. For a single-column primary key, you can include a
+ <code class="ph codeph">PRIMARY KEY</code> attribute inline with the column definition. For a
+ multi-column primary key, you include a <code class="ph codeph">PRIMARY KEY (<var class="keyword varname">c1</var>,
+ <var class="keyword varname">c2</var>, ...)</code> clause as a separate entry at the end of the
+ column list.
+ </p>
+
+ <p class="p">
+ You can specify the <code class="ph codeph">PRIMARY KEY</code> attribute either inline in a single
+ column definition, or as a separate clause at the end of the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_inline
+(
+ col1 BIGINT PRIMARY KEY,
+ col2 STRING,
+ col3 BOOLEAN
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+
+CREATE TABLE pk_at_end
+(
+ col1 BIGINT,
+ col2 STRING,
+ col3 BOOLEAN,
+ PRIMARY KEY (col1)
+) PARTITION BY HASH(col1) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ When the primary key is a single column, these two forms are equivalent. If the
+ primary key consists of more than one column, you must specify the primary key using
+ a separate entry in the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE pk_multiple_columns
+(
+ col1 BIGINT,
+ col2 STRING,
+ col3 BOOLEAN,
+ <strong class="ph b">PRIMARY KEY (col1, col2)</strong>
+) PARTITION BY HASH(col2) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">SHOW CREATE TABLE</code> statement always represents the
+ <code class="ph codeph">PRIMARY KEY</code> specification as a separate item in the column list:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE inline_pk_rewritten (id BIGINT <strong class="ph b">PRIMARY KEY</strong>, s STRING)
+ PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+SHOW CREATE TABLE inline_pk_rewritten;
++------------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------------+
+| CREATE TABLE user.inline_pk_rewritten ( |
+| id BIGINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| s STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| <strong class="ph b">PRIMARY KEY (id)</strong> |
+| ) |
+| PARTITION BY HASH (id) PARTITIONS 2 |
+| STORED AS KUDU |
+| TBLPROPERTIES ('kudu.master_addresses'='host.example.com') |
++------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The notion of primary key only applies to Kudu tables. Every Kudu table requires a
+ primary key. The primary key consists of one or more columns. You must specify any
+ primary key columns first in the column list.
+ </p>
+
+ <p class="p">
+ The contents of the primary key columns cannot be changed by an
+ <code class="ph codeph">UPDATE</code> or <code class="ph codeph">UPSERT</code> statement. Including too many
+ columns in the primary key (more than 5 or 6) can also reduce the performance of
+ write operations. Therefore, pick the most selective and most frequently
+ tested non-null columns for the primary key specification.
+ If a column must always have a value, but that value
+ might change later, leave it out of the primary key and use a <code class="ph codeph">NOT
+ NULL</code> clause for that column instead. If an existing row has an
+ incorrect or outdated key column value, delete the old row and insert an entirely
+ new row with the correct primary key.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title9" id="kudu_column_attributes__kudu_not_null_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title9">NULL | NOT NULL Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For Kudu tables, you can specify which columns can contain nulls or not. This
+ constraint offers an extra level of consistency enforcement for Kudu tables. If an
+ application requires a field to always be specified, include a <code class="ph codeph">NOT
+ NULL</code> clause in the corresponding column definition, and Kudu prevents rows
+ from being inserted with a <code class="ph codeph">NULL</code> in that column.
+ </p>
+
+ <p class="p">
+ For example, a table containing geographic information might require the latitude
+ and longitude coordinates to always be specified. Other attributes might be allowed
+ to be <code class="ph codeph">NULL</code>. For example, a location might not have a designated
+ place name, its altitude might be unimportant, and its population might be initially
+ unknown, to be filled in later.
+ </p>
+
+ <p class="p">
+ Because all of the primary key columns must have non-null values, specifying a column
+ in the <code class="ph codeph">PRIMARY KEY</code> clause implicitly adds the <code class="ph codeph">NOT
+ NULL</code> attribute to that column.
+ </p>
+
+ <p class="p">
+ For non-Kudu tables, Impala allows any column to contain <code class="ph codeph">NULL</code>
+ values, because it is not practical to enforce a <span class="q">"not null"</span> constraint on HDFS
+ data files that could be prepared using external tools and ETL processes.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE required_columns
+(
+ id BIGINT PRIMARY KEY,
+ latitude DOUBLE NOT NULL,
+ longitude DOUBLE NOT NULL,
+ place_name STRING,
+ altitude DOUBLE,
+ population BIGINT
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <p class="p">
+ During performance optimization, Kudu can use the knowledge that nulls are not
+ allowed to skip certain checks on each input row, speeding up queries and join
+ operations. Therefore, specify <code class="ph codeph">NOT NULL</code> constraints when
+ appropriate.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">NULL</code> clause is the default condition for all columns that are not
+ part of the primary key. You can omit it, or specify it to clarify that you have made a
+ conscious design decision to allow nulls in a column.
+ </p>
+
+ <p class="p">
+ Because primary key columns cannot contain any <code class="ph codeph">NULL</code> values, the
+ <code class="ph codeph">NOT NULL</code> clause is not required for the primary key columns,
+ but you might still specify it to make your code self-describing.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title10" id="kudu_column_attributes__kudu_default_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title10">DEFAULT Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify a default value for columns in Kudu tables. The default value can be
+ any constant expression, for example, a combination of literal values, arithmetic
+ and string operations. It cannot contain references to columns or non-deterministic
+ function calls.
+ </p>
+
+ <p class="p">
+ The following example shows different kinds of expressions for the
+ <code class="ph codeph">DEFAULT</code> clause. The requirement to use a constant value means that
+ you can fill in a placeholder value such as <code class="ph codeph">NULL</code>, empty string,
+ 0, -1, <code class="ph codeph">'N/A'</code>, and so on, but you cannot reference functions or
+ column names. Therefore, you cannot use <code class="ph codeph">DEFAULT</code> to do things such as
+ automatically making an uppercase copy of a string value, storing Boolean values based
+ on tests of other columns, or adding or subtracting one from another column representing a sequence number.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE default_vals
+(
+ id BIGINT PRIMARY KEY,
+ name STRING NOT NULL DEFAULT 'unknown',
+ address STRING DEFAULT upper('no fixed address'),
+ age INT DEFAULT -1,
+ earthling BOOLEAN DEFAULT TRUE,
+ planet_of_origin STRING DEFAULT 'Earth',
+ optional_col STRING DEFAULT NULL
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ When designing an entirely new schema, prefer to use <code class="ph codeph">NULL</code> as the
+ placeholder for any unknown or missing values, because that is the universal convention
+ among database systems. Null values can be stored efficiently, and easily checked with the
+ <code class="ph codeph">IS NULL</code> or <code class="ph codeph">IS NOT NULL</code> operators. The <code class="ph codeph">DEFAULT</code>
+ attribute is appropriate when ingesting data that already has an established convention for
+ representing unknown or missing values, or where the vast majority of rows have some common
+ non-null value.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title11" id="kudu_column_attributes__kudu_encoding_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title11">ENCODING Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Each column in a Kudu table can optionally use an encoding, a low-overhead form of
+ compression that reduces the size on disk, then requires additional CPU cycles to
+ reconstruct the original values during queries. Typically, highly compressible data
+ benefits from the reduced I/O to read the data back from disk.
+ </p>
+
+ <div class="p">
+ The encoding keywords that Impala recognizes are:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">AUTO_ENCODING</code>: use the default encoding based
+ on the column type, which is bitshuffle for numeric columns
+ and dictionary for string columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">PLAIN_ENCODING</code>: leave the value in its original binary format.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">RLE</code>: compress repeated values (when sorted in primary key
+ order) by including a count.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">DICT_ENCODING</code>: when the number of different string values is
+ low, replace the original string with a numeric ID.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">BIT_SHUFFLE</code>: rearrange the bits of the values to efficiently
+ compress sequences of values that are identical or vary only slightly based
+ on primary key order. The resulting encoded data is also compressed with LZ4.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">PREFIX_ENCODING</code>: compress common prefixes in string values; mainly for use internally within Kudu.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+
+
+ <p class="p">
+ The following example shows the Impala keywords representing the encoding types.
+ (The Impala keywords match the symbolic names used within Kudu.)
+ For usage guidelines on the different kinds of encoding, see
+ <a class="xref" href="https://kudu.apache.org/docs/schema_design.html" target="_blank">the Kudu documentation</a>.
+ The <code class="ph codeph">DESCRIBE</code> output shows how the encoding is reported after
+ the table is created, and that omitting the encoding (in this case, for the
+ <code class="ph codeph">ID</code> column) is the same as specifying <code class="ph codeph">AUTO_ENCODING</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE various_encodings
+(
+ id BIGINT PRIMARY KEY,
+ c1 BIGINT ENCODING PLAIN_ENCODING,
+ c2 BIGINT ENCODING AUTO_ENCODING,
+ c3 TINYINT ENCODING BIT_SHUFFLE,
+ c4 DOUBLE ENCODING BIT_SHUFFLE,
+ c5 BOOLEAN ENCODING RLE,
+ c6 STRING ENCODING DICT_ENCODING,
+ c7 STRING ENCODING PREFIX_ENCODING
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+
+-- Some columns are omitted from the output for readability.
+describe various_encodings;
++------+---------+-------------+----------+-----------------+
+| name | type | primary_key | nullable | encoding |
++------+---------+-------------+----------+-----------------+
+| id | bigint | true | false | AUTO_ENCODING |
+| c1 | bigint | false | true | PLAIN_ENCODING |
+| c2 | bigint | false | true | AUTO_ENCODING |
+| c3 | tinyint | false | true | BIT_SHUFFLE |
+| c4 | double | false | true | BIT_SHUFFLE |
+| c5 | boolean | false | true | RLE |
+| c6 | string | false | true | DICT_ENCODING |
+| c7 | string | false | true | PREFIX_ENCODING |
++------+---------+-------------+----------+-----------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title12" id="kudu_column_attributes__kudu_compression_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title12">COMPRESSION Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify a compression algorithm to use for each column in a Kudu table. This
+ attribute imposes more CPU overhead when retrieving the values than the
+ <code class="ph codeph">ENCODING</code> attribute does. Therefore, use it primarily for columns with
+ long strings that do not benefit much from the less-expensive <code class="ph codeph">ENCODING</code>
+ attribute.
+ </p>
+
+ <p class="p">
+ The choices for <code class="ph codeph">COMPRESSION</code> are <code class="ph codeph">LZ4</code>,
+ <code class="ph codeph">SNAPPY</code>, and <code class="ph codeph">ZLIB</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Columns that use the <code class="ph codeph">BIT_SHUFFLE</code> encoding are already compressed
+ using <code class="ph codeph">LZ4</code>, and so typically do not need any additional
+ <code class="ph codeph">COMPRESSION</code> attribute.
+ </p>
+ </div>
+
+ <p class="p">
+ The following example shows design considerations for several
+ <code class="ph codeph">STRING</code> columns with different distribution characteristics, leading
+ to choices for both the <code class="ph codeph">ENCODING</code> and <code class="ph codeph">COMPRESSION</code>
+ attributes. The <code class="ph codeph">country</code> values come from a specific set of strings,
+ therefore this column is a good candidate for dictionary encoding. The
+ <code class="ph codeph">post_id</code> column contains an ascending sequence of integers, where
+ several leading bits are likely to be all zeroes, therefore this column is a good
+ candidate for bitshuffle encoding. The <code class="ph codeph">body</code>
+ column and the corresponding columns for translated versions tend to be long unique
+ strings that are not practical to use with any of the encoding schemes, therefore
+ they employ the <code class="ph codeph">COMPRESSION</code> attribute instead. The ideal compression
+ codec in each case would require some experimentation to determine how much space
+ savings it provided and how much CPU overhead it added, based on real-world data.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE blog_posts
+(
+ user_id STRING ENCODING DICT_ENCODING,
+ post_id BIGINT ENCODING BIT_SHUFFLE,
+ subject STRING ENCODING PLAIN_ENCODING,
+ body STRING COMPRESSION LZ4,
+ spanish_translation STRING COMPRESSION SNAPPY,
+ esperanto_translation STRING COMPRESSION ZLIB,
+ PRIMARY KEY (user_id, post_id)
+) PARTITION BY HASH(user_id, post_id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title13" id="kudu_column_attributes__kudu_block_size_attribute">
+
+ <h4 class="title topictitle4" id="ariaid-title13">BLOCK_SIZE Attribute</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Kudu does not use HDFS files internally, and thus is not affected by
+ the HDFS block size, it does have an underlying unit of I/O called the
+ <dfn class="term">block size</dfn>. The <code class="ph codeph">BLOCK_SIZE</code> attribute lets you set the
+ block size for any column.
+ </p>
+
+ <p class="p">
+ The block size attribute is a relatively advanced feature. Refer to
+ <a class="xref" href="https://kudu.apache.org/docs/index.html" target="_blank">the Kudu documentation</a>
+ for usage details.
+ </p>
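+
+ <p class="p">
+ The following statement illustrates the syntax only: it sets an explicit block size,
+ in bytes, for a single column, leaving the other column at the Kudu default:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE custom_block_size
+(
+  id BIGINT PRIMARY KEY,
+  s STRING BLOCK_SIZE 1048576
+) PARTITION BY HASH(id) PARTITIONS 2 STORED AS KUDU;
+</code></pre>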
+
+
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="kudu_ddl__kudu_partitioning">
+
+ <h3 class="title topictitle3" id="ariaid-title14">Partitioning for Kudu Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables use special mechanisms to distribute data among the underlying
+ tablet servers. Although we refer to such tables as partitioned tables, they are
+ distinguished from traditional Impala partitioned tables by use of different clauses
+ on the <code class="ph codeph">CREATE TABLE</code> statement. Kudu tables use
+ <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">HASH</code>, <code class="ph codeph">RANGE</code>, and
+ range specification clauses rather than the <code class="ph codeph">PARTITIONED BY</code> clause
+ for HDFS-backed tables, which specifies only a column name and creates a new partition for each
+ different value.
+ </p>
+
+ <p class="p">
+ For background information and architectural details about the Kudu partitioning
+ mechanism, see
+ <a class="xref" href="https://kudu.apache.org/kudu.pdf" target="_blank">the Kudu white paper, section 3.2</a>.
+ </p>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Impala DDL syntax for Kudu tables is different than in early Kudu versions,
+ which used an experimental fork of the Impala code. For example, the
+ <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITION BY</code>, the
+ <code class="ph codeph">INTO <var class="keyword varname">n</var> BUCKETS</code> clause is now
+ <code class="ph codeph">PARTITIONS <var class="keyword varname">n</var></code>, and the range partitioning syntax
+ is reworked to replace the <code class="ph codeph">SPLIT ROWS</code> clause with more expressive
+ syntax involving comparison operators.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="kudu_partitioning__kudu_hash_partitioning">
+ <h4 class="title topictitle4" id="ariaid-title15">Hash Partitioning</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ Hash partitioning is the simplest type of partitioning for Kudu tables.
+ For hash-partitioned Kudu tables, inserted rows are divided up between a fixed number
+ of <span class="q">"buckets"</span> by applying a hash function to the values of the columns specified
+ in the <code class="ph codeph">HASH</code> clause.
+ Hashing ensures that rows with similar values are evenly distributed, instead of
+ all clumping together in the same bucket.
+ way lets insertion operations work in parallel across multiple tablet servers.
+ Separating the hashed values can impose additional overhead on queries,
+ because queries with range-based predicates might have to read multiple tablets
+ to retrieve all the relevant values.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 1M rows with 50 hash partitions = approximately 20,000 rows per partition.
+-- The values in each partition are not sequential, but rather based on a hash function.
+-- Rows 1, 99999, and 123456 might be in the same partition.
+CREATE TABLE million_rows (id string primary key, s string)
+ PARTITION BY HASH(id) PARTITIONS 50
+ STORED AS KUDU;
+
+-- Because the ID values are unique, we expect the rows to be roughly
+-- evenly distributed between the buckets in the destination table.
+INSERT INTO million_rows SELECT * FROM billion_rows ORDER BY id LIMIT 1e6;
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The largest number of buckets that you can create with a <code class="ph codeph">PARTITIONS</code>
+ clause varies depending on the number of tablet servers in the cluster, while the smallest is 2.
+ For simplicity, some of the simple <code class="ph codeph">CREATE TABLE</code> statements throughout this section
+ use <code class="ph codeph">PARTITIONS 2</code> to illustrate the minimum requirements for a Kudu table.
+ For large tables, prefer to use roughly 10 partitions per server in the cluster.
+ </p>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title16" id="kudu_partitioning__kudu_range_partitioning">
+ <h4 class="title topictitle4" id="ariaid-title16">Range Partitioning</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ Range partitioning lets you specify partitioning precisely, based on single values or ranges
+ of values within one or more columns. You add one or more <code class="ph codeph">RANGE</code> clauses to the
+ <code class="ph codeph">CREATE TABLE</code> statement, following the <code class="ph codeph">PARTITION BY</code>
+ clause.
+ </p>
+
+ <p class="p">
+ Range-partitioned Kudu tables use one or more range clauses, which include a
+ combination of constant expressions, <code class="ph codeph">VALUE</code> or <code class="ph codeph">VALUES</code>
+ keywords, and comparison operators. (This syntax replaces the <code class="ph codeph">SPLIT
+ ROWS</code> clause used with early Kudu versions.)
+ For the full syntax, see <a class="xref" href="impala_create_table.html">CREATE TABLE Statement</a>.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 50 buckets, all for IDs beginning with a lowercase letter.
+-- Having only a single range enforces the allowed range of values
+-- but does not add any extra parallelism.
+create table million_rows_one_range (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{')
+ stored as kudu;
+
+-- 50 buckets for IDs beginning with a lowercase letter
+-- plus 50 buckets for IDs beginning with an uppercase letter.
+-- Total number of buckets = number in the PARTITIONS clause x number of ranges.
+-- We are still enforcing constraints on the primary key values
+-- allowed in the table, and the 2 ranges provide better parallelism
+-- as rows are inserted or the table is scanned.
+create table million_rows_two_ranges (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{', partition 'A' <= values < '[')
+ stored as kudu;
+
+-- Same as previous table, with an extra range covering the single key value '00000'.
+create table million_rows_three_ranges (id string primary key, s string)
+ partition by hash(id) partitions 50,
+ range (partition 'a' <= values < '{', partition 'A' <= values < '[', partition value = '00000')
+ stored as kudu;
+
+-- The range partitioning can be displayed with a SHOW command in impala-shell.
+show range partitions million_rows_three_ranges;
++---------------------+
+| RANGE (id) |
++---------------------+
+| VALUE = "00000" |
+| "A" <= VALUES < "[" |
+| "a" <= VALUES < "{" |
++---------------------+
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ When defining ranges, be careful to avoid <span class="q">"fencepost errors"</span> where values at the
+ extreme ends might be included or omitted by accident. For example, in the tables defined
+ in the preceding code listings, the range <code class="ph codeph">"a" <= VALUES < "{"</code> ensures that
+ any values starting with <code class="ph codeph">z</code>, such as <code class="ph codeph">za</code> or <code class="ph codeph">zzz</code>
+ or <code class="ph codeph">zzz-ZZZ</code>, are all included, by using a less-than comparison against
+ <code class="ph codeph">"{"</code>, the character immediately following <code class="ph codeph">z</code> in ASCII order.
+ </p>
+ </div>
+
+ <p class="p">
+ For range-partitioned Kudu tables, an appropriate range must exist before a data value can be created in the table.
+ Any <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statements fail if they try to
+ create column values that fall outside the specified ranges. The error checking for ranges is performed on the
+ Kudu side; Impala passes the specified range information to Kudu, and passes back any error or warning if the
+ ranges are not valid. (A nonsensical range specification causes an error for a DDL statement, but only a warning
+ for a DML statement.)
+ </p>
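+
+ <p class="p">
+ As an illustration, using the hypothetical tables defined earlier in this section,
+ a row whose key falls outside every declared range is rejected by Kudu:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- 'zebra' starts with a lowercase letter, so it falls inside the
+-- 'a' <= values < '{' range and is accepted.
+insert into million_rows_one_range values ('zebra', 'accepted');
+
+-- '9abc' does not fall inside any declared range, so Kudu rejects
+-- the row and the DML statement reports a warning.
+insert into million_rows_one_range values ('9abc', 'rejected');
+</code></pre>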
+
+ <p class="p">
+ Ranges can be non-contiguous:
+ </p>
+
+<pre class="pre codeblock"><code>
+partition by range (year) (partition 1885 <= values <= 1889, partition 1893 <= values <= 1897)
+
+partition by range (letter_grade) (partition value = 'A', partition value = 'B',
+ partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ALTER TABLE</code> statement with the <code class="ph codeph">ADD RANGE PARTITION</code> or
+ <code class="ph codeph">DROP RANGE PARTITION</code> clauses can be used to add or remove ranges from an
+ existing Kudu table.
+ </p>
+
+<pre class="pre codeblock"><code>
+ALTER TABLE foo ADD RANGE PARTITION 30 <= VALUES < 50;
+ALTER TABLE foo DROP RANGE PARTITION 1 <= VALUES < 5;
+
+</code></pre>
+
+ <p class="p">
+ When a range is added, the new range must not overlap with any of the previous ranges;
+ that is, it can only fill in gaps within the previous ranges.
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores add range partition value = 'E';
+
+alter table year_ranges add range partition 1890 <= values < 1893;
+
+</code></pre>
+
+ <p class="p">
+ When a range is removed, all the associated rows in the table are deleted. (This
+ is true whether the table is internal or external.)
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table test_scores drop range partition value = 'E';
+
+alter table year_ranges drop range partition 1890 <= values < 1893;
+
+</code></pre>
+
+ <p class="p">
+ Kudu tables can also use a combination of hash and range partitioning.
+ </p>
+
+<pre class="pre codeblock"><code>
+partition by hash (school) partitions 10,
+ range (letter_grade) (partition value = 'A', partition value = 'B',
+ partition value = 'C', partition value = 'D', partition value = 'F')
+
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title17" id="kudu_partitioning__kudu_partitioning_misc">
+ <h4 class="title topictitle4" id="ariaid-title17">Working with Partitioning in Kudu Tables</h4>
+ <div class="body conbody">
+
+ <p class="p">
+ To see the current partitioning scheme for a Kudu table, you can use the <code class="ph codeph">SHOW
+ CREATE TABLE</code> statement or the <code class="ph codeph">SHOW PARTITIONS</code> statement. The
+ <code class="ph codeph">CREATE TABLE</code> syntax displayed by this statement includes all the
+ hash, range, or both clauses that reflect the original table structure plus any
+ subsequent <code class="ph codeph">ALTER TABLE</code> statements that changed the table structure.
+ </p>
+
+ <p class="p">
+ To see the underlying buckets and partitions for a Kudu table, use the
+ <code class="ph codeph">SHOW TABLE STATS</code> or <code class="ph codeph">SHOW PARTITIONS</code> statement.
+ </p>
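+
+ <p class="p">
+ For example, substituting the name of your own Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Reconstructed CREATE TABLE statement, including all partitioning clauses.
+show create table million_rows_three_ranges;
+
+-- Range partitions only.
+show range partitions million_rows_three_ranges;
+
+-- Underlying buckets and partitions.
+show partitions million_rows_three_ranges;
+show table stats million_rows_three_ranges;
+</code></pre>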
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="kudu_ddl__kudu_timestamps">
+
+ <h3 class="title topictitle3" id="ariaid-title18">Handling Date, Time, or Timestamp Data with Kudu</h3>
+
+ <div class="body conbody">
+
+ <div class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can include <code class="ph codeph">TIMESTAMP</code>
+ columns in Kudu tables, instead of representing the date and time as a <code class="ph codeph">BIGINT</code>
+ value. The behavior of <code class="ph codeph">TIMESTAMP</code> for Kudu tables has some special considerations:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any nanoseconds in the original 96-bit value produced by Impala are not stored, because
+ Kudu represents date/time columns using 64-bit values. The nanosecond portion of the value
+ is rounded, not truncated. Therefore, a <code class="ph codeph">TIMESTAMP</code> value
+ that you store in a Kudu table might not be bit-for-bit identical to the value returned by a query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The conversion between the Impala 96-bit representation and the Kudu 64-bit representation
+ introduces some performance overhead when reading or writing <code class="ph codeph">TIMESTAMP</code>
+ columns. You can minimize the overhead during writes by performing inserts through the
+ Kudu API. Because the overhead during reads applies to each query, you might continue to
+ use a <code class="ph codeph">BIGINT</code> column to represent date/time values in performance-critical
+ applications.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The Impala <code class="ph codeph">TIMESTAMP</code> type has a narrower range for years than the underlying
+ Kudu data type. Impala can represent years 1400-9999. If year values outside this range
+ are written to a Kudu table by a non-Impala client, Impala returns <code class="ph codeph">NULL</code>
+ by default when reading those <code class="ph codeph">TIMESTAMP</code> values during a query. Or, if the
+ <code class="ph codeph">ABORT_ON_ERROR</code> query option is enabled, the query fails when it encounters
+ a value with an out-of-range year.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+<pre class="pre codeblock"><code>-- Make a table representing a date/time value as TIMESTAMP.
+-- The strings representing the partition bounds are automatically
+-- cast to TIMESTAMP values.
+create table native_timestamp(id bigint, when_exactly timestamp, event string, primary key (id, when_exactly))
+ partition by hash (id) partitions 20,
+ range (when_exactly)
+ (
+ partition '2015-01-01' <= values < '2016-01-01',
+ partition '2016-01-01' <= values < '2017-01-01',
+ partition '2017-01-01' <= values < '2018-01-01'
+ )
+ stored as kudu;
+
+insert into native_timestamp values (12345, now(), 'Working on doc examples');
+
+select * from native_timestamp;
++-------+-------------------------------+-------------------------+
+| id | when_exactly | event |
++-------+-------------------------------+-------------------------+
+| 12345 | 2017-05-31 16:27:42.667542000 | Working on doc examples |
++-------+-------------------------------+-------------------------+
+
+</code></pre>
+
+ <p class="p">
+ Because converting <code class="ph codeph">TIMESTAMP</code> columns to and from the Impala
+ 96-bit internal representation incurs some overhead for Kudu tables, in performance-critical
+ applications you might store date/time information as the number
+ of seconds, milliseconds, or microseconds since the Unix epoch date of January 1,
+ 1970. Specify the column as <code class="ph codeph">BIGINT</code> in the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement, corresponding to an 8-byte integer (an
+ <code class="ph codeph">int64</code>) in the underlying Kudu table. Then use Impala date/time
+ conversion functions as necessary to produce a numeric, <code class="ph codeph">TIMESTAMP</code>,
+ or <code class="ph codeph">STRING</code> value depending on the context.
+ </p>
+
+ <p class="p">
+ For example, the <code class="ph codeph">unix_timestamp()</code> function returns an integer result
+ representing the number of seconds past the epoch. The <code class="ph codeph">now()</code> function
+ produces a <code class="ph codeph">TIMESTAMP</code> representing the current date and time, which can
+ be passed as an argument to <code class="ph codeph">unix_timestamp()</code>. And string literals
+ representing dates and date/times can be cast to <code class="ph codeph">TIMESTAMP</code>, and from there
+ converted to numeric values. The following examples show how you might store a date/time
+ column as <code class="ph codeph">BIGINT</code> in a Kudu table, but still use string literals and
+ <code class="ph codeph">TIMESTAMP</code> values for convenience.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- now() returns a TIMESTAMP and shows the format for string literals you can cast to TIMESTAMP.
+select now();
++-------------------------------+
+| now() |
++-------------------------------+
+| 2017-01-25 23:50:10.132385000 |
++-------------------------------+
+
+-- unix_timestamp() accepts either a TIMESTAMP or an equivalent string literal.
+select unix_timestamp(now());
++------------------+
+| unix_timestamp() |
++------------------+
+| 1485386670 |
++------------------+
+
+select unix_timestamp('2017-01-01');
++------------------------------+
+| unix_timestamp('2017-01-01') |
++------------------------------+
+| 1483228800 |
++------------------------------+
+
+-- Make a table representing a date/time value as BIGINT.
+-- Construct 1 range partition and 20 associated hash partitions for each year.
+-- Use date/time conversion functions to express the ranges as human-readable dates.
+create table time_series(id bigint, when_exactly bigint, event string, primary key (id, when_exactly))
+ partition by hash (id) partitions 20,
+ range (when_exactly)
+ (
+ partition unix_timestamp('2015-01-01') <= values < unix_timestamp('2016-01-01'),
+ partition unix_timestamp('2016-01-01') <= values < unix_timestamp('2017-01-01'),
+ partition unix_timestamp('2017-01-01') <= values < unix_timestamp('2018-01-01')
+ )
+ stored as kudu;
+
+-- On insert, we can transform a human-readable date/time into a numeric value.
+insert into time_series values (12345, unix_timestamp('2017-01-25 23:24:56'), 'Working on doc examples');
+
+-- On retrieval, we can examine the numeric date/time value or turn it back into a string for readability.
+select id, when_exactly, from_unixtime(when_exactly) as 'human-readable date/time', event
+ from time_series order by when_exactly limit 100;
++-------+--------------+--------------------------+-------------------------+
+| id | when_exactly | human-readable date/time | event |
++-------+--------------+--------------------------+-------------------------+
+| 12345 | 1485386696 | 2017-01-25 23:24:56 | Working on doc examples |
++-------+--------------+--------------------------+-------------------------+
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ If you do high-precision arithmetic involving numeric date/time values,
+ when dividing millisecond values by 1000, or microsecond values by 1 million, always
+ cast the integer numerator to a <code class="ph codeph">DECIMAL</code> with sufficient precision
+ and scale to avoid any rounding or loss of precision.
+ </p>
+ </div>
+
+<pre class="pre codeblock"><code>
+-- 1 million and 1 microseconds = 1.000001 seconds.
+select microseconds,
+ cast (microseconds as decimal(20,7)) / 1e6 as fractional_seconds
+ from table_with_microsecond_column;
++--------------+----------------------+
+| microseconds | fractional_seconds |
++--------------+----------------------+
+| 1000001 | 1.000001000000000000 |
++--------------+----------------------+
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="kudu_ddl__kudu_metadata">
+
+ <h3 class="title topictitle3" id="ariaid-title19">How Impala Handles Kudu Metadata</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed from, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ for a Kudu table only after making a change to the Kudu table schema,
+ such as adding or dropping a column, by a mechanism other than
+ Impala.
+ </p>
+
+ <p class="p">
+ Because Kudu manages the metadata for its own tables separately from the metastore
+ database, there is a table name stored in the metastore database for Impala to use,
+ and a table name on the Kudu side, and these names can be modified independently
+ through <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ To avoid potential name conflicts, the prefix <code class="ph codeph">impala::</code>
+ and the Impala database name are encoded into the underlying Kudu
+ table name:
+ </p>
+
+<pre class="pre codeblock"><code>
+create database some_database;
+use some_database;
+
+create table table_name_demo (x int primary key, y int)
+ partition by hash (x) partitions 2 stored as kudu;
+
+describe formatted table_name_demo;
+...
+kudu.table_name | impala::some_database.table_name_demo
+
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_tables.html">Overview of Impala Tables</a> for examples of how to change the name of
+ the Impala table in the metastore database, the name of the underlying Kudu
+ table, or both.
+ </p>
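+
+ <p class="p">
+ As a sketch, for an external Kudu table (the property value shown here is hypothetical),
+ the underlying Kudu table name can be changed independently of the Impala name:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Point the Impala table at a different underlying Kudu table name.
+alter table table_name_demo set tblproperties ('kudu.table_name' = 'some_other_kudu_table');
+</code></pre>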
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="impala_kudu__kudu_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title20">Loading Data into Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables are well-suited to use cases where data arrives continuously, in small or
+ moderate volumes. To bring data into Kudu tables, use the Impala <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">UPSERT</code> statements. The <code class="ph codeph">LOAD DATA</code> statement does
+ not apply to Kudu tables.
+ </p>
+
+ <p class="p">
+ Because Kudu manages its own storage layer that is optimized for smaller block sizes than
+ HDFS, and performs its own housekeeping to keep data evenly distributed, it is not
+ subject to the <span class="q">"many small files"</span> issue and does not need explicit reorganization
+ and compaction as the data grows over time. The partitions within a Kudu table can be
+ specified to cover a variety of possible data distributions, instead of hardcoding a new
+ partition for each new day, hour, and so on, which can lead to inefficient,
+ hard-to-scale, and hard-to-manage partition schemes with HDFS tables.
+ </p>
+
+ <p class="p">
+ Your strategy for performing ETL or bulk updates on Kudu tables should take into account
+ the limitations on consistency for DML operations.
+ </p>
+
+ <p class="p">
+ Make <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, and <code class="ph codeph">UPSERT</code>
+ operations <dfn class="term">idempotent</dfn>: that is, able to be applied multiple times and still
+ produce an identical result.
+ </p>
+
+ <p class="p">
+ If a bulk operation is in danger of exceeding capacity limits due to timeouts or high
+ memory usage, split it into a series of smaller operations.
+ </p>
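+
+ <p class="p">
+ For example, a very large copy operation might be divided into per-range statements
+ (the table and column names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Instead of one INSERT covering all rows, issue several smaller ones.
+insert into kudu_target select * from hdfs_source where id between 0 and 999999;
+insert into kudu_target select * from hdfs_source where id between 1000000 and 1999999;
+-- ...and so on for the remaining ranges.
+</code></pre>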
+
+ <p class="p">
+ Avoid running concurrent ETL operations where the end results depend on precise
+ ordering. In particular, do not rely on an <code class="ph codeph">INSERT ... SELECT</code> statement
+ that selects from the same table into which it is inserting, unless you include extra
+ conditions in the <code class="ph codeph">WHERE</code> clause to avoid reading the newly inserted rows
+ within the same statement.
+ </p>
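+
+ <p class="p">
+ For example, with a hypothetical table, a <code class="ph codeph">WHERE</code> clause
+ can exclude the rows the statement itself inserts:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Without the WHERE condition, the SELECT might see and re-process
+-- rows inserted by this same statement.
+insert into events select concat(id, '-archived'), s from events
+  where id not like '%-archived';
+</code></pre>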
+
+ <p class="p">
+ Because relationships between tables cannot be enforced by Impala and Kudu, and cannot
+ be committed or rolled back together, do not expect transactional semantics for
+ multi-table operations.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="impala_kudu__kudu_dml">
+
+ <h2 class="title topictitle2" id="ariaid-title21">Impala DML Support for Kudu Tables (INSERT, UPDATE, DELETE, UPSERT)</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports certain DML statements for Kudu tables only. The <code class="ph codeph">UPDATE</code>
+ and <code class="ph codeph">DELETE</code> statements let you modify data within Kudu tables without
+ rewriting substantial amounts of table data. The <code class="ph codeph">UPSERT</code> statement acts
+ as a combination of <code class="ph codeph">INSERT</code> and <code class="ph codeph">UPDATE</code>, inserting rows
+ where the primary key does not already exist, and updating the non-primary key columns
+ where the primary key does already exist in the table.
+ </p>
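+
+ <p class="p">
+ For example, with a hypothetical two-column table:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table user_scores (name string primary key, score bigint)
+  partition by hash (name) partitions 2 stored as kudu;
+
+insert into user_scores values ('alice', 10);
+
+-- 'alice' already exists, so her score is updated;
+-- 'bob' does not exist, so a new row is inserted.
+upsert into user_scores values ('alice', 20), ('bob', 5);
+</code></pre>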
+
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement for Kudu tables honors the unique and <code class="ph codeph">NOT
+ NULL</code> requirements for the primary key columns.
+ </p>
+
+ <p class="p">
+ Because Impala and Kudu do not support transactions, the effects of any
+ <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">DELETE</code> statement
+ are immediately visible. For example, you cannot do a sequence of
+ <code class="ph codeph">UPDATE</code> statements and only make the changes visible after all the
+ statements are finished. Also, if a DML statement fails partway through, any rows that
+ were already inserted, deleted, or changed remain in the table; there is no rollback
+ mechanism to undo the changes.
+ </p>
+
+ <p class="p">
+ In particular, an <code class="ph codeph">INSERT ... SELECT</code> statement that refers to the table
+ being inserted into might insert more rows than expected, because the
+ <code class="ph codeph">SELECT</code> part of the statement sees some of the new rows being inserted
+ and processes them again.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement, which involves manipulation of HDFS data files,
+ does not apply to Kudu tables.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="impala_kudu__kudu_consistency">
+
+ <h2 class="title topictitle2" id="ariaid-title22">Consistency Considerations for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables have consistency characteristics such as uniqueness, controlled by the
+ primary key columns, and non-nullable columns. The emphasis for consistency is on
+ preventing duplicate or incomplete data from being stored in a table.
+ </p>
+
+ <p class="p">
+ Currently, Kudu does not enforce strong consistency for order of operations, total
+ success or total failure of a multi-row statement, or data that is read while a write
+ operation is in progress. Changes are applied atomically to each row, but not applied
+ as a single unit to all rows affected by a multi-row DML statement. That is, Kudu does
+ not currently have atomic multi-row statements or isolation between statements.
+ </p>
+
+ <p class="p">
+ If some rows are rejected during a DML operation because of a mismatch with duplicate
+ primary key values, <code class="ph codeph">NOT NULL</code> constraints, and so on, the statement
+ succeeds with a warning. Impala still inserts, deletes, or updates the other rows that
+ are not affected by the constraint violation.
+ </p>
+
+ <p class="p">
+ Consequently, the number of rows affected by a DML operation on a Kudu table might be
+ different from what you expect.
+ </p>
+
+ <p class="p">
+ Because there is no strong consistency guarantee for information being inserted into,
+ deleted from, or updated across multiple tables simultaneously, consider denormalizing
+ the data where practical. That is, if you run separate <code class="ph codeph">INSERT</code>
+ statements to insert related rows into two different tables, one <code class="ph codeph">INSERT</code>
+ might fail while the other succeeds, leaving the data in an inconsistent state. Even if
+ both inserts succeed, a join query might happen during the interval between the
+ completion of the first and second statements, and the query would encounter incomplete
+ and inconsistent data. Denormalizing the data into a single wide table can reduce the
+ possibility of inconsistency due to multi-table operations.
+ </p>
+
+ <p class="p">
+ Information about the number of rows affected by a DML operation is reported in
+ <span class="keyword cmdname">impala-shell</span> output, and in the <code class="ph codeph">PROFILE</code> output, but
+ is not currently reported to HiveServer2 clients such as JDBC or ODBC applications.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="impala_kudu__kudu_security">
+
+ <h2 class="title topictitle2" id="ariaid-title23">Security Considerations for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Security for Kudu tables involves:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Sentry authorization.
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code>- and <code class="ph codeph">INSERT</code>-specific
+ permissions are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Kerberos authentication. See <a class="xref" href="https://kudu.apache.org/docs/security.html" target="_blank">Kudu Security</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ TLS encryption. See <a class="xref" href="https://kudu.apache.org/docs/security.html" target="_blank">Kudu Security</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Lineage tracking.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Auditing.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Redaction of sensitive information from log files.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="impala_kudu__kudu_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title24">Impala Query Performance for Kudu Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For queries involving Kudu tables, Impala can delegate much of the work of filtering the
+ result set to Kudu, avoiding some of the I/O involved in full table scans of tables
+ containing HDFS data files. This type of optimization is especially effective for
+ partitioned Kudu tables, where the Impala query <code class="ph codeph">WHERE</code> clause refers to
+ one or more primary key columns that are also used as partition key columns. For
+ example, if a partitioned Kudu table uses a <code class="ph codeph">HASH</code> clause for
+ <code class="ph codeph">col1</code> and a <code class="ph codeph">RANGE</code> clause for <code class="ph codeph">col2</code>, a
+ query using a clause such as <code class="ph codeph">WHERE col1 IN (1,2,3) AND col2 > 100</code>
+ can determine exactly which tablet servers contain relevant data, and therefore
+ parallelize the query very efficiently.
+ </p>
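+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> statement shows whether such predicates are
+ pushed down to Kudu (the table and column names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>
+explain select * from kudu_partitioned_table
+  where col1 in (1,2,3) and col2 > 100;
+</code></pre>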
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, Impala can push down additional
+ information to optimize join queries involving Kudu tables. If the join clause
+ contains predicates of the form
+ <code class="ph codeph"><var class="keyword varname">column</var> = <var class="keyword varname">expression</var></code>,
+ after Impala constructs a hash table of possible matching values for the
+ join columns from the bigger table (either an HDFS table or a Kudu table), Impala
+ can <span class="q">"push down"</span> the minimum and maximum matching column values to Kudu,
+ so that Kudu can more efficiently locate matching rows in the second (smaller) table.
+ These min/max filters are affected by the <code class="ph codeph">RUNTIME_FILTER_MODE</code>,
+ <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code>, and <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code>
+ query options; the min/max filters are not affected by the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>, <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code>,
+ <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code>, and <code class="ph codeph">MAX_NUM_RUNTIME_FILTERS</code>
+ query options.
+ </p>
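+
+ <p class="p">
+ For example, the runtime filtering behavior can be adjusted through query options in
+ <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Enable global runtime filters and allow extra time for them to arrive.
+set RUNTIME_FILTER_MODE=GLOBAL;
+set RUNTIME_FILTER_WAIT_TIME_MS=10000;
+</code></pre>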
+
+ <p class="p">
+ See <a class="xref" href="impala_explain.html">EXPLAIN Statement</a> for examples of evaluating the effectiveness of
+ the predicate pushdown for a specific query against a Kudu table.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+ statement does not apply to a table reference derived from a view, a subquery,
+ or anything other than a real base table. This clause only works for tables
+ backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or
+ HBase tables.
+ </p>
+
+
+
+
+ </div>
+
+
+
+
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref.html b/docs/build3x/html/topics/impala_langref.html
new file mode 100644
index 0000000..a515a63
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref.html
@@ -0,0 +1,66 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_comments.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_literals.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_operators.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_unsupported.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_porting.html"><met
a name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Language Reference</title></head><body id="langref"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Language Reference</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses SQL as its query language. To protect user investment in skills development and query
+ design, Impala provides a high degree of compatibility with the Hive Query Language (HiveQL):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Because Impala uses the same metadata store as Hive to record information about table structure and
+ properties, Impala can access tables defined through the native Impala <code class="ph codeph">CREATE TABLE</code>
+ command, or tables created using the Hive data definition language (DDL).
+ </li>
+
+ <li class="li">
+ Impala supports data manipulation (DML) statements similar to the DML component of HiveQL.
+ </li>
+
+ <li class="li">
+ Impala provides many <a class="xref" href="impala_functions.html#builtins">built-in functions</a> with the same
+ names and parameter types as their HiveQL equivalents.
+ </li>
+ </ul>
+
+    <p class="p">
+      Impala supports most of the same <a class="xref" href="impala_langref_sql.html#langref_sql">statements and
+      clauses</a> as HiveQL, including, but not limited to, <code class="ph codeph">JOIN</code>, aggregate functions,
+      <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">UNION ALL</code>, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">LIMIT</code>, and
+      (uncorrelated) subqueries in the <code class="ph codeph">FROM</code> clause. Impala also supports <code class="ph codeph">INSERT
+      INTO</code> and <code class="ph codeph">INSERT OVERWRITE</code>.
+    </p>
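+
+    <p class="p">
+      For example, the following hypothetical queries (the table and column names are illustrative only)
+      combine several of these clauses in the same way as in HiveQL:
+    </p>
+
+<pre class="pre codeblock"><code>-- JOIN, GROUP BY, ORDER BY, and LIMIT together in one query.
+SELECT c.name, COUNT(o.id) AS num_orders
+  FROM customers c JOIN orders o ON c.id = o.customer_id
+  GROUP BY c.name
+  ORDER BY num_orders DESC
+  LIMIT 10;
+
+-- Uncorrelated subquery in the FROM clause.
+SELECT MAX(total)
+  FROM (SELECT SUM(amount) AS total FROM orders GROUP BY customer_id) t;
+
+-- INSERT INTO appends rows; INSERT OVERWRITE replaces the existing data.
+INSERT INTO top_customers SELECT DISTINCT name FROM customers;
+</code></pre>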
+
+    <p class="p">
+      Impala supports data types with the same names and semantics as the equivalent Hive data types:
+      <code class="ph codeph">STRING</code>, <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+      <code class="ph codeph">BIGINT</code>, <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, <code class="ph codeph">BOOLEAN</code>,
+      and <code class="ph codeph">TIMESTAMP</code>.
+    </p>
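+
+    <p class="p">
+      As an illustration, the following hypothetical table definition uses several of these shared type names:
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE product_events
+(
+  id BIGINT,
+  name STRING,
+  quantity INT,
+  price DOUBLE,
+  in_stock BOOLEAN,
+  event_time TIMESTAMP
+);
+</code></pre>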
+
+ <p class="p">
+ For full details about Impala SQL syntax and semantics, see
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>.
+ </p>
+
+ <p class="p">
+ Most HiveQL <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements run unmodified with Impala. For
+ information about Hive syntax not available in Impala, see
+ <a class="xref" href="impala_langref_unsupported.html#langref_hiveql_delta">SQL Differences Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ For a list of the built-in functions available in Impala queries, see
+ <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_comments.html">Comments</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datatypes.html">Data Types</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_literals.html">Literals</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_operators.html">SQL Operators</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_sql.html">Impala SQL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions.html">Impala Built-In Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_langref_unsupported.html">SQL Diff
erences Between Impala and Hive</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_porting.html">Porting SQL from Other Database Systems to Impala</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref_sql.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref_sql.html b/docs/build3x/html/topics/impala_langref_sql.html
new file mode 100644
index 0000000..65c6d55
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref_sql.html
@@ -0,0 +1,28 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ddl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_dml.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_alter_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compute_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_role.html"><meta name
="DC.Relation" scheme="URI" content="../topics/impala_create_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_create_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delete.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_describe.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_database.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_function.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_role.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_drop_view.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_grant.html"><meta name="DC.Relation" scheme="URI" cont
ent="../topics/impala_insert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_invalidate_metadata.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_load_data.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_refresh.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_revoke.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_show.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_truncate_table.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_update.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_upsert.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_use.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hints.html"><meta name="prodname"
content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_sql"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala SQL Statements</title></head><body id="langref_sql"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala SQL Statements</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala SQL dialect supports a range of standard elements, plus some extensions for Big Data use cases
+ related to data loading and data warehousing.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In the <span class="keyword cmdname">impala-shell</span> interpreter, a semicolon at the end of each statement is required.
+ Since the semicolon is not actually part of the SQL syntax, we do not include it in the syntax definition
+ of each statement, but we do show it in examples intended to be run in <span class="keyword cmdname">impala-shell</span>.
+ </p>
+ </div>
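+
+    <p class="p">
+      For example, a statement typed into <span class="keyword cmdname">impala-shell</span> ends with a semicolon:
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select 1 + 1;
+</code></pre>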
+
+ <p class="p toc all">
+ The following sections show the major SQL statements that you work with in Impala:
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_ddl.html">DDL Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_dml.html">DML Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_table.html">ALTER TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_alter_view.html">ALTER VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compute_stats.html">COMPUTE STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_database.html">CREATE DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_function.html">CREATE FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_role.h
tml">CREATE ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_table.html">CREATE TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_create_view.html">CREATE VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_describe.html">DESCRIBE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_database.html">DROP DATABASE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_function.html">DROP FUNCTION Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_role.html">DROP ROLE Statement (Impala 2.0 or higher only)</a></strong><br></li><li cla
ss="link ulchildlink"><strong><a href="../topics/impala_drop_stats.html">DROP STATS Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_table.html">DROP TABLE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_drop_view.html">DROP VIEW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain.html">EXPLAIN Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_insert.html">INSERT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_invalidate_metadata.html">INVALIDATE METADATA Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_load_data.html">LOAD DATA Statement</a></strong><br></li><li class=
"link ulchildlink"><strong><a href="../topics/impala_refresh.html">REFRESH Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_select.html">SELECT Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_set.html">SET Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_show.html">SHOW Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_truncate_table.html">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_upsert.html">UPSERT Statement (Impala 2.8 or higher on
ly)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_use.html">USE Statement</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hints.html">Optimizer Hints</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_median.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_median.html b/docs/build3x/html/topics/impala_appx_median.html
new file mode 100644
index 0000000..3003ec0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_median.html
@@ -0,0 +1,132 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_median"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_MEDIAN Function</title></head><body id="appx_median"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_MEDIAN Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns a value that is approximately the median (midpoint) of values in the set
+ of input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>APPX_MEDIAN([DISTINCT | ALL] <var class="keyword varname">expression</var>)
+</code></pre>
+
+ <p class="p">
+ This function works with any input type, because the only requirement is that the type supports less-than and
+ greater-than comparison operators.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the return value represents the estimated midpoint, it might not reflect the precise midpoint value,
+ especially if the cardinality of the input values is very high. If the cardinality is low (up to
+ approximately 20,000), the result is more accurate because the sampling considers all or almost all of the
+ different values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ The return value is always the same as one of the input values, not an <span class="q">"in-between"</span> value produced by
+ averaging.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+    <p class="p">
+      The <code class="ph codeph">APPX_MEDIAN</code> function returns only the first 10 characters for
+      string values (<code class="ph codeph">STRING</code>, <code class="ph codeph">VARCHAR</code>,
+      <code class="ph codeph">CHAR</code>). Additional characters are truncated.
+    </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses a table of a million random floating-point numbers ranging up to approximately
+ 50,000. The average is approximately 25,000. Because of the random distribution, we would expect the median
+ to be close to this same number. Computing the precise median is a more intensive operation than computing
+ the average, because it requires keeping track of every distinct value and how many times each occurs. The
+ <code class="ph codeph">APPX_MEDIAN()</code> function uses a sampling algorithm to return an approximate result, which in
+ this case is close to the expected value. To make sure that the value is not substantially out of range due
+ to a skewed distribution, subsequent queries confirm that there are approximately 500,000 values higher than
+ the <code class="ph codeph">APPX_MEDIAN()</code> value, and approximately 500,000 values lower than the
+ <code class="ph codeph">APPX_MEDIAN()</code> value.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select min(x), max(x), avg(x) from million_numbers;
++-------------------+-------------------+-------------------+
+| min(x)            | max(x)            | avg(x)            |
++-------------------+-------------------+-------------------+
+| 4.725693727250069 | 49994.56852674231 | 24945.38563793553 |
++-------------------+-------------------+-------------------+
+[localhost:21000] &gt; select appx_median(x) from million_numbers;
++----------------+
+| appx_median(x) |
++----------------+
+| 24721.6        |
++----------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x &gt; (select appx_median(x) from million_numbers);
++--------+
+| higher |
++--------+
+| 502013 |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x &lt; (select appx_median(x) from million_numbers);
++--------+
+| lower  |
++--------+
+| 497987 |
++--------+
+</code></pre>
+
+ <p class="p">
+ The following example computes the approximate median using a subset of the values from the table, and then
+ confirms that the result is a reasonable estimate for the midpoint.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select appx_median(x) from million_numbers where x between 1000 and 5000;
++-------------------+
+| appx_median(x)    |
++-------------------+
+| 3013.107787358159 |
++-------------------+
+[localhost:21000] &gt; select count(x) as higher from million_numbers where x between 1000 and 5000 and x &gt; 3013.107787358159;
++--------+
+| higher |
++--------+
+| 37692  |
++--------+
+[localhost:21000] &gt; select count(x) as lower from million_numbers where x between 1000 and 5000 and x &lt; 3013.107787358159;
++-------+
+| lower |
++-------+
+| 37089 |
++-------+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_array.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_array.html b/docs/build3x/html/topics/impala_array.html
new file mode 100644
index 0000000..caddc89
--- /dev/null
+++ b/docs/build3x/html/topics/impala_array.html
@@ -0,0 +1,321 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="array"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ARRAY Complex Type (Impala 2.3 or higher only)</title></head><body id="array"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ARRAY Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type that can represent an arbitrary number of ordered elements.
+ The elements can be scalars or another complex type (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> ARRAY &lt; <var class="keyword varname">type</var> &gt;
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+    <p class="p">
+      Complex types are often used in combination, for example an <code class="ph codeph">ARRAY</code> of
+      <code class="ph codeph">STRUCT</code> elements. If you are unfamiliar with the Impala complex types,
+      start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+      background information and usage examples.
+    </p>
+
+ <p class="p">
+ The elements of the array have no names. You refer to the value of the array item using the
+ <code class="ph codeph">ITEM</code> pseudocolumn, or its position in the array with the <code class="ph codeph">POS</code>
+ pseudocolumn. See <a class="xref" href="impala_complex_types.html#item">ITEM and POS Pseudocolumns</a> for information about
+ these pseudocolumns.
+ </p>
+
+
+
+ <p class="p">
+ Each row can have a different number of elements (including none) in the array for that row.
+ </p>
+
+
+
+ <p class="p">
+ When an array contains items of scalar types, you can use aggregation functions on the array elements without using join notation. For
+ example, you can find the <code class="ph codeph">COUNT()</code>, <code class="ph codeph">AVG()</code>, <code class="ph codeph">SUM()</code>, and so on of numeric array
+ elements, or the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> of any scalar array elements by referring to
+ <code class="ph codeph"><var class="keyword varname">table_name</var>.<var class="keyword varname">array_column</var></code> in the <code class="ph codeph">FROM</code> clause of the query. When
+ you need to cross-reference values from the array with scalar values from the same row, such as by including a <code class="ph codeph">GROUP
+ BY</code> clause to produce a separate aggregated result for each row, then the join clause is required.
+ </p>
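+
+    <p class="p">
+      For example, assuming a hypothetical table <code class="ph codeph">readings</code> with a scalar column
+      <code class="ph codeph">sensor_id</code> and a numeric array column <code class="ph codeph">temps</code>,
+      the following sketch shows both forms:
+    </p>
+
+<pre class="pre codeblock"><code>-- Aggregate across the array elements of all rows; no join notation needed.
+SELECT COUNT(item), AVG(item), MAX(item) FROM readings.temps;
+
+-- Cross-reference array values with a scalar column from the same row;
+-- here the join notation and the GROUP BY clause are required.
+SELECT r.sensor_id, MAX(t.item)
+  FROM readings r, r.temps t
+GROUP BY r.sensor_id;
+</code></pre>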
+
+ <p class="p">
+ A common usage pattern with complex types is to have an array as the top-level type for the column:
+ an array of structs, an array of maps, or an array of arrays.
+ For example, you can model a denormalized table by creating a column that is an <code class="ph codeph">ARRAY</code>
+ of <code class="ph codeph">STRUCT</code> elements; each item in the array represents a row from a table that would
+ normally be used in a join query. This kind of data structure lets you essentially denormalize tables by
+ associating multiple rows from one table with the matching row in another table.
+ </p>
+
+    <p class="p">
+      You typically do not create more than one top-level <code class="ph codeph">ARRAY</code> column. If there is
+      some relationship between the elements of multiple arrays, it is more convenient to model the data as
+      a single array whose elements are a complex type (<code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>)
+      holding the related values.
+    </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="array__d6e3285">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+    <p class="p">
+      The following example shows how to construct a table with various kinds of <code class="ph codeph">ARRAY</code> columns,
+      both at the top level and nested within other complex types.
+      Whenever an <code class="ph codeph">ARRAY</code> holds elements of a scalar type, as in the <code class="ph codeph">PETS</code>
+      column or the <code class="ph codeph">CHILDREN</code> field, future expansion of the schema is limited.
+      For example, you could not easily evolve the schema to record the kind of pet or the child's birthday alongside the name.
+      Therefore, it is more common to use an <code class="ph codeph">ARRAY</code> whose elements are of <code class="ph codeph">STRUCT</code> type,
+      to associate multiple fields with each array element.
+    </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </div>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE array_demo
+(
+ id BIGINT,
+ name STRING,
+-- An ARRAY of scalar type as a top-level column.
+  pets ARRAY &lt;STRING&gt;,
+
+-- An ARRAY with elements of complex type (STRUCT).
+  places_lived ARRAY &lt; STRUCT &lt;
+    place: STRING,
+    start_year: INT
+  &gt;&gt;,
+
+-- An ARRAY as a field (CHILDREN) within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+  marriages ARRAY &lt; STRUCT &lt;
+    spouse: STRING,
+    children: ARRAY &lt;STRING&gt;
+  &gt;&gt;,
+
+-- An ARRAY as the value part of a MAP.
+-- The first MAP field (the key) would be a value such as
+-- 'Parent' or 'Grandparent', and the corresponding array would
+-- represent 2 parents, 4 grandparents, and so on.
+  ancestors MAP &lt; STRING, ARRAY &lt;STRING&gt; &gt;
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to examine the structure of a table containing one or more <code class="ph codeph">ARRAY</code> columns by using the
+ <code class="ph codeph">DESCRIBE</code> statement. You can visualize each <code class="ph codeph">ARRAY</code> as its own two-column table, with columns
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>DESCRIBE array_demo;
++--------------+---------------------------+
+| name         | type                      |
++--------------+---------------------------+
+| id           | bigint                    |
+| name         | string                    |
+| pets         | array&lt;string&gt;             |
+| marriages    | array&lt;struct&lt;             |
+|              |   spouse:string,          |
+|              |   children:array&lt;string&gt;  |
+|              | &gt;&gt;                        |
+| places_lived | array&lt;struct&lt;             |
+|              |   place:string,           |
+|              |   start_year:int          |
+|              | &gt;&gt;                        |
+| ancestors    | map&lt;string,array&lt;string&gt;&gt; |
++--------------+---------------------------+
+
+DESCRIBE array_demo.pets;
++------+--------+
+| name | type   |
++------+--------+
+| item | string |
+| pos  | bigint |
++------+--------+
+
+DESCRIBE array_demo.marriages;
++------+--------------------------+
+| name | type                     |
++------+--------------------------+
+| item | struct&lt;                  |
+|      |   spouse:string,         |
+|      |   children:array&lt;string&gt; |
+|      | &gt;                        |
+| pos  | bigint                   |
++------+--------------------------+
+
+DESCRIBE array_demo.places_lived;
++------+------------------+
+| name | type             |
++------+------------------+
+| item | struct&lt;          |
+|      |   place:string,  |
+|      |   start_year:int |
+|      | &gt;                |
+| pos  | bigint           |
++------+------------------+
+
+DESCRIBE array_demo.ancestors;
++-------+---------------+
+| name  | type          |
++-------+---------------+
+| key   | string        |
+| value | array&lt;string&gt; |
++-------+---------------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows queries involving <code class="ph codeph">ARRAY</code> columns containing elements of scalar or complex types. You
+ <span class="q">"unpack"</span> each <code class="ph codeph">ARRAY</code> column by referring to it in a join query, as if it were a separate table with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns. If the array element is a scalar type, you refer to its value using the
+ <code class="ph codeph">ITEM</code> pseudocolumn. If the array element is a <code class="ph codeph">STRUCT</code>, you refer to the <code class="ph codeph">STRUCT</code> fields
+ using dot notation and the field names. If the array element is another <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>, you use
+ another level of join to unpack the nested collection elements.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>-- Array of scalar values.
+-- Each array element represents a single string, plus we know its position in the array.
+SELECT id, name, pets.pos, pets.item FROM array_demo, array_demo.pets;
+
+-- Array of structs.
+-- Now each array element has named fields, possibly of different types.
+-- You can consider an ARRAY of STRUCT to represent a table inside another table.
+SELECT id, name, places_lived.pos, places_lived.item.place, places_lived.item.start_year
+FROM array_demo, array_demo.places_lived;
+
+-- The .ITEM name is optional for array elements that are structs.
+-- The following query is equivalent to the previous one, with .ITEM
+-- removed from the column references.
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived;
+
+-- To filter specific items from the array, do comparisons against the .POS or .ITEM
+-- pseudocolumns, or names of struct fields, in the WHERE clause.
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.pos in (0, 1, 3);
+
+SELECT id, name, pets.item FROM array_demo, array_demo.pets
+ WHERE pets.item LIKE 'Mr. %';
+
+SELECT id, name, places_lived.pos, places_lived.place, places_lived.start_year
+ FROM array_demo, array_demo.places_lived
+WHERE places_lived.place like '%California%';
+
+</code></pre>
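+
+  <p class="p">
+    As a sketch of the extra level of join mentioned above, the following queries unpack the
+    <code class="ph codeph">MAP</code> and nested <code class="ph codeph">ARRAY</code> columns of the
+    <code class="ph codeph">array_demo</code> table. These examples are illustrative only; the
+    results depend on your data.
+  </p>
+
+<pre class="pre codeblock"><code>-- Map of scalar keys to array values.
+-- The KEY and VALUE pseudocolumns unpack each map entry; because the VALUE
+-- is itself an ARRAY, a second join clause unpacks its elements.
+SELECT ad.id, ad.name, a.key, an.item
+  FROM array_demo ad, ad.ancestors a, a.value an;
+
+-- Array of structs containing another array.
+-- Each additional level of nesting requires one more join clause.
+SELECT ad.id, ad.name, m.item.spouse, c.item
+  FROM array_demo ad, ad.marriages m, m.item.children c;
+</code></pre>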
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_auditing.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_auditing.html b/docs/build3x/html/topics/impala_auditing.html
new file mode 100644
index 0000000..bbdca95
--- /dev/null
+++ b/docs/build3x/html/topics/impala_auditing.html
@@ -0,0 +1,232 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="auditing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Auditing Impala Operations</title></head><body id="auditing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Auditing Impala Operations</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To monitor how Impala data is being used within your organization, ensure
+ that your Impala authorization and authentication policies are effective.
+ To detect attempts at intrusion or unauthorized access to Impala
+ data, you can use the auditing feature in Impala 1.2.1 and higher:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Enable auditing by including the option
+ <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+ in your <span class="keyword cmdname">impalad</span> startup options.
+ The log directory must be a local directory on the
+ server, not an HDFS directory.
+ </li>
+
+ <li class="li">
+ Decide how many queries will be represented in each audit event log file. By default,
+ Impala starts a new audit event log file every 5000 queries. To specify a different number,
+ <span class="ph">include
+ the option <code class="ph codeph">--max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options</span>.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control how many
+ audit event log files are kept on each host. Specify the option
+ <code class="ph codeph">--max_audit_event_log_files=<var class="keyword varname">number_of_log_files</var></code>
+ in the <span class="keyword cmdname">impalad</span> startup options. Once the limit is reached, older
+ files are rotated out using the same mechanism as for other Impala log files.
+ The default value for this setting is 0, representing an unlimited number of audit
+ event log files.
+ </li>
+
+ <li class="li">
+ Use a cluster manager with governance capabilities to filter, visualize,
+ and produce reports based on the audit logs collected
+ from all the hosts in the cluster.
+ </li>
+ </ul>
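+
+  <p class="p">
+    Putting these options together, the relevant portion of the <span class="keyword cmdname">impalad</span>
+    startup options might look like the following. The directory path and the numeric values are
+    placeholders; substitute values appropriate for your cluster.
+  </p>
+
+<pre class="pre codeblock"><code>impalad ... \
+  -audit_event_log_dir=/var/log/impalad/audit \
+  --max_audit_event_log_file_size=5000 \
+  --max_audit_event_log_files=100 \
+  ...
+</code></pre>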
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="auditing__auditing_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Durability and Performance Considerations for Impala Auditing</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The auditing feature only imposes performance overhead while auditing is enabled.
+ </p>
+
+ <p class="p">
+ Because any Impala host can process a query, enable auditing on all hosts where the
+ <span class="ph"><span class="keyword cmdname">impalad</span> daemon</span>
+ runs. Each host stores its own log
+ files, in a directory in the local filesystem. The log data is periodically flushed to disk (through an
+ <code class="ph codeph">fsync()</code> system call) to avoid loss of audit data in case of a crash.
+ </p>
+
+ <p class="p">
+ The runtime overhead of auditing applies to whichever host serves as the coordinator
+ for the query, that is, the host you connect to when you issue the query. This might
+ be the same host for all queries, or different applications or users might connect to
+ and issue queries through different hosts.
+ </p>
+
+ <p class="p">
+ To avoid excessive I/O overhead on busy coordinator hosts, Impala syncs the audit log
+ data (using the <code class="ph codeph">fsync()</code> system call) periodically rather than after
+ every query. Currently, the <code class="ph codeph">fsync()</code> calls are issued at a fixed
+ interval, every 5 seconds.
+ </p>
+
+ <p class="p">
+ By default, Impala avoids losing any audit log data in the case of an error during a logging operation
+ (such as a disk full error), by immediately shutting down
+ <span class="keyword cmdname">impalad</span> on the host where the auditing problem occurred.
+ <span class="ph">You can override this setting by specifying the option
+ <code class="ph codeph">-abort_on_failed_audit_event=false</code> in the <span class="keyword cmdname">impalad</span> startup options.</span>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="auditing__auditing_format">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Format of the Audit Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The audit log files represent the query information in JSON format, one query per line.
+ Typically, rather than looking at the log files themselves, you should use cluster-management
+ software to consolidate the log data from all Impala hosts and filter and visualize the results
+ in useful ways. (If you do examine the raw log data, you might run the files through
+ a JSON pretty-printer first.)
+ </p>
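+
+  <p class="p">
+    For example, because each line of the log is a self-contained JSON record, a command-line JSON
+    processor such as <code class="ph codeph">jq</code> (used here as one possible choice; any
+    pretty-printer that accepts one JSON record per line works) can make a raw file readable.
+    The file name is a placeholder:
+  </p>
+
+<pre class="pre codeblock"><code>jq '.' <var class="keyword varname">audit_event_log_file</var>
+</code></pre>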
+
+ <p class="p">
+ All the information about schema objects accessed by the query is encoded in a single nested record on the
+ same line. For example, the audit log for an <code class="ph codeph">INSERT ... SELECT</code> statement records that a
+ select operation occurs on the source table and an insert operation occurs on the destination table. The
+ audit log for a query against a view records the base table accessed by the view, or multiple base tables
+ in the case of a view that includes a join query. Every Impala operation that corresponds to a SQL
+ statement is recorded in the audit logs, whether the operation succeeds or fails. Impala records more
+ information for a successful operation than for a failed one, because an unauthorized query is stopped
+ immediately, before all the query planning is completed.
+ </p>
+
+
+
+ <p class="p">
+ The information logged for each query includes:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Client session state:
+ <ul class="ul">
+ <li class="li">
+ Session ID
+ </li>
+
+ <li class="li">
+ User name
+ </li>
+
+ <li class="li">
+ Network address of the client connection
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ SQL statement details:
+ <ul class="ul">
+ <li class="li">
+ Query ID
+ </li>
+
+ <li class="li">
+ Statement Type - DML, DDL, and so on
+ </li>
+
+ <li class="li">
+ SQL statement text
+ </li>
+
+ <li class="li">
+ Execution start time, in local time
+ </li>
+
+ <li class="li">
+ Execution Status - Details on any errors that were encountered
+ </li>
+
+ <li class="li">
+ Target Catalog Objects:
+ <ul class="ul">
+ <li class="li">
+ Object Type - Table, View, or Database
+ </li>
+
+ <li class="li">
+ Fully qualified object name
+ </li>
+
+ <li class="li">
+ Privilege - How the object is being used (<code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>,
+ <code class="ph codeph">CREATE</code>, and so on)
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="auditing__auditing_exceptions">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Which Operations Are Audited</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The kinds of SQL queries represented in the audit log are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries that are prevented due to lack of authorization.
+ </li>
+
+ <li class="li">
+ Queries that Impala can analyze and parse to determine that they are authorized. The audit data is
+ recorded immediately after Impala finishes its analysis, before the query is actually executed.
+ </li>
+ </ul>
+
+ <p class="p">
+ The audit log does not contain entries for queries that could not be parsed and analyzed. For example, a
+ query that fails due to a syntax error is not recorded in the audit log. Nor does the audit log
+ contain queries that fail because a referenced table does not exist, as long as you would have
+ been authorized to access the table had it existed.
+ </p>
+
+ <p class="p">
+ Certain statements in the <span class="keyword cmdname">impala-shell</span> interpreter, such as <code class="ph codeph">CONNECT</code>,
+ <code class="ph codeph">SUMMARY</code>, <code class="ph codeph">PROFILE</code>, <code class="ph codeph">SET</code>, and
+ <code class="ph codeph">QUIT</code>, do not correspond to actual SQL queries, and these statements are not reflected in
+ the audit log.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_authentication.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_authentication.html b/docs/build3x/html/topics/impala_authentication.html
new file mode 100644
index 0000000..b072c37
--- /dev/null
+++ b/docs/build3x/html/topics/impala_authentication.html
@@ -0,0 +1,37 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_kerberos.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ldap.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mixed_security.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_delegation.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authentication"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Authentication</title></head><body id="authentication"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Authentication</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Authentication is the mechanism to ensure that only specified hosts and users can connect to Impala. It also
+ verifies that when clients connect to Impala, they are connected to a legitimate server. This feature
+ prevents spoofing such as <dfn class="term">impersonation</dfn> (setting up a phony client system with the same account
+ and group names as a legitimate user) and <dfn class="term">man-in-the-middle attacks</dfn> (intercepting application
+ requests before they reach Impala and eavesdropping on sensitive information in the requests or the results).
+ </p>
+
+ <p class="p">
+ Impala supports authentication using either Kerberos or LDAP.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ Once you are finished setting up authentication, move on to authorization, which involves specifying what
+ databases, tables, HDFS directories, and so on can be accessed by particular users when they connect through
+ Impala. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_kerberos.html">Enabling Kerberos Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ldap.html">Enabling LDAP Authentication for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mixed_security.html">Using Multiple Authentication Methods with Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_delegation.html">Configuring Impala Delegation for Hue and BI Tools</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_resource_management.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_resource_management.html b/docs/build3x/html/topics/impala_resource_management.html
new file mode 100644
index 0000000..cbc116a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_resource_management.html
@@ -0,0 +1,97 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="resource_management"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Resource Management for Impala</title></head><body id="resource_management"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Resource Management for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+ other data management components, you define static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <p class="p">
+ You can limit the CPU and memory resources used by Impala, to manage and prioritize workloads on clusters
+ that run jobs from many Hadoop components.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="resource_management__rm_enforcement">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Resource Limits Are Enforced</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Limits on memory usage are enforced by Impala's process memory limit (the <code class="ph codeph">MEM_LIMIT</code>
+ query option setting). The admission control feature checks this setting to decide how many queries
+ can be safely run at the same time. Then the Impala daemon enforces the limit by activating the
+ spill-to-disk mechanism when necessary, or cancelling a query altogether if the limit is exceeded at runtime.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="resource_management__rm_query_options">
+
+ <h2 class="title topictitle2" id="ariaid-title3">impala-shell Query Options for Resource Management</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Before issuing SQL statements through the <span class="keyword cmdname">impala-shell</span> interpreter, you can use the
+ <code class="ph codeph">SET</code> command to configure the following parameters related to resource management:
+ </p>
+
+ <ul class="ul" id="rm_query_options__ul_nzt_twf_jp">
+ <li class="li">
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+ </li>
+
+ </ul>
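+
+  <p class="p">
+    For example, the following <span class="keyword cmdname">impala-shell</span> session sketch sets a
+    per-query memory limit and a more verbose <code class="ph codeph">EXPLAIN</code> output level before
+    examining a query plan. The values and table names are illustrative only.
+  </p>
+
+<pre class="pre codeblock"><code>SET MEM_LIMIT=2g;
+SET EXPLAIN_LEVEL=3;
+EXPLAIN SELECT c1, c2 FROM t1 JOIN t2 USING (id);
+</code></pre>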
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="resource_management__rm_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Limitations of Resource Management for Impala</h2>
+
+ <div class="body conbody">
+
+
+
+
+
+
+
+ <p class="p">
+ The <code class="ph codeph">MEM_LIMIT</code> query option, and the other resource-related query options,
+ are settable through the ODBC or JDBC interfaces in Impala 2.0 and higher. In earlier releases,
+ these options could not be set through ODBC or JDBC; that limitation no longer applies.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_revoke.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_revoke.html b/docs/build3x/html/topics/impala_revoke.html
new file mode 100644
index 0000000..02cbb59
--- /dev/null
+++ b/docs/build3x/html/topics/impala_revoke.html
@@ -0,0 +1,151 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="revoke"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REVOKE Statement (Impala 2.0 or higher only)</title></head><body id="revoke"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REVOKE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">REVOKE</code> statement revokes roles from groups, or revokes
+ privileges on a specified object from roles.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>REVOKE ROLE <var class="keyword varname">role_name</var> FROM GROUP <var class="keyword varname">group_name</var>
+
+REVOKE <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+ FROM [ROLE] <var class="keyword varname">role_name</var>
+
+<span class="ph">
+ privilege ::= ALL | ALTER | CREATE | DROP | INSERT | REFRESH | SELECT | SELECT(<var class="keyword varname">column_name</var>)
+</span>
+<span class="ph">
+ object_type ::= TABLE | DATABASE | SERVER | URI
+</span>
+</code></pre>
+
+ <p class="p">
+ See <a href="impala_grant.html"><span class="keyword">GRANT Statement (Impala 2.0 or higher only)</span></a> for the required privileges and the scope
+ for SQL operations.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ALL</code> privilege is a distinct privilege and not a
+ union of all other privileges. Revoking <code class="ph codeph">SELECT</code>,
+ <code class="ph codeph">INSERT</code>, etc. from a role that only has the
+ <code class="ph codeph">ALL</code> privilege has no effect. To reduce the privileges
+ of that role you must <code class="ph codeph">REVOKE ALL</code> and
+ <code class="ph codeph">GRANT</code> the desired privileges.
+ </p>
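+
+  <p class="p">
+    For example, to narrow a role holding only the <code class="ph codeph">ALL</code> privilege on a
+    table down to read-only access (the role and table names here are hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>REVOKE ALL ON TABLE analysis.web_logs FROM ROLE analyst_role;
+GRANT SELECT ON TABLE analysis.web_logs TO ROLE analyst_role;
+</code></pre>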
+
+ <p class="p">
+ Typically, the object name is an identifier. For URIs, it is a string literal.
+ </p>
+
+ <p class="p">
+ The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+ in <span class="keyword">Impala 2.3</span> and higher. See
+ <span class="xref">the documentation for Apache Sentry</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+ <p class="p">Only Sentry administrative users can revoke the role from a group.</p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">REVOKE</code> statements are available in <span class="keyword">Impala 2.0</span> and higher.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 1.4</span> and higher, Impala makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+ use the Sentry service instead of the file-based policy mechanism.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">REVOKE</code> statements do not require the
+ <code class="ph codeph">ROLE</code> keyword to be repeated before each role name,
+ unlike the equivalent Hive statements.
+ </li>
+
+ <li class="li">
+ Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+ revoke a single privilege to or from a single role.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code>- and <code class="ph codeph">INSERT</code>-specific
+ permissions are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
new file mode 100644
index 0000000..7f9466e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_bloom_filter_size.html
@@ -0,0 +1,104 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_bloom_filter_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_bloom_filter_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_BLOOM_FILTER_SIZE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Size (in bytes) of the Bloom filter data structure used by the runtime
+ filtering feature.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, this query option only applies as a fallback, when statistics
+ are not available. By default, Impala estimates the optimal size of the Bloom filter structure
+ regardless of the setting for this option. (This is a change from the original behavior in
+ <span class="keyword">Impala 2.5</span>.)
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, when the value of this query option is used for query planning,
+ it is constrained by the minimum and maximum sizes specified by the
+ <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query options.
+ The filter size is adjusted upward or downward if necessary to fit within the minimum/maximum range.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1048576 (1 MB)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Maximum:</strong> 16 MB
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This setting affects optimizations for large and complex queries, such
+ as dynamic partition pruning for partitioned tables, and join optimization
+ for queries that join large tables.
+ Larger filters are more effective at handling
+ higher cardinality input sets, but consume more memory per filter.
+
+ </p>
+
+ <p class="p">
+ If your query filters on high-cardinality columns (for example, millions of different values)
+ and you do not get the expected speedup from the runtime filtering mechanism, consider
+ doing some benchmarks with a higher value for <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>.
+ The extra memory devoted to the Bloom filter data structures can help make the filtering
+ more accurate.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because the effectiveness of this setting depends so much on query characteristics and data distribution,
+ you typically only use it for specific queries that need some extra tuning, and the ideal value depends
+ on the query. Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_max_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_max_size.html b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
new file mode 100644
index 0000000..b1cf316
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_max_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_max_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_max_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MAX_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the maximum size for a filter,
+ regardless of the estimates produced by the planner.
+ This value also overrides any higher number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
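The rounding behavior described above can be sketched in a few lines of Python (hypothetical helper name, shown for illustration only; Impala performs this rounding internally):

```python
def round_up_pow2(n: int) -> int:
    """Round a filter size in bytes up to the nearest power of two,
    as described for the runtime filter size options."""
    p = 1
    while p < n:
        p <<= 1
    return p

# A 5 MB estimate becomes an 8 MB (8388608-byte) filter:
print(round_up_pow2(5_000_000))  # -> 8388608
```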
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_min_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_min_size.html b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
new file mode 100644
index 0000000..fd70cdb
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_min_size.html
@@ -0,0 +1,65 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_min_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</title></head><body id="runtime_filter_min_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MIN_SIZE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ This option defines the minimum size for a filter,
+ regardless of the estimates produced by the planner.
+ This value also overrides any lower number specified for the
+ <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option.
+ Filter sizes are rounded up to the nearest power of two.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_mode.html b/docs/build3x/html/topics/impala_runtime_filter_mode.html
new file mode 100644
index 0000000..6ce6b3b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_mode.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_MODE Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It turns this feature on and off, and controls how
+ extensively the filters are transmitted between hosts.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1, 2)
+ or corresponding mnemonic strings (<code class="ph codeph">OFF</code>, <code class="ph codeph">LOCAL</code>, <code class="ph codeph">GLOBAL</code>).
+ </p>
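The correspondence between the numeric and mnemonic forms can be captured in a trivial lookup table (a sketch for reference, not part of Impala itself):

```python
# Numeric values accepted by SET RUNTIME_FILTER_MODE and their
# equivalent mnemonic strings, as listed above.
RUNTIME_FILTER_MODES = {0: "OFF", 1: "LOCAL", 2: "GLOBAL"}

print(RUNTIME_FILTER_MODES[2])  # -> GLOBAL
```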
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 2 (equivalent to <code class="ph codeph">GLOBAL</code>); the default was 1 (equivalent to <code class="ph codeph">LOCAL</code>) in <span class="keyword">Impala 2.5</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the default is <code class="ph codeph">GLOBAL</code>.
+ This setting is recommended for a wide variety of workloads, to provide best
+ performance with <span class="q">"out of the box"</span> settings.
+ </p>
+
+ <p class="p">
+ The lowest setting of <code class="ph codeph">LOCAL</code> does a similar level of optimization
+ (such as partition pruning) as in earlier Impala releases.
+ This setting was the default in <span class="keyword">Impala 2.5</span>,
+ to allow for a period of post-upgrade testing for existing workloads.
+ This setting is suitable for workloads with non-performance-critical queries,
+ or if the coordinator node is under heavy CPU or memory pressure.
+ </p>
+
+ <p class="p">
+ You might change the setting to <code class="ph codeph">OFF</code> if your workload contains
+ many queries involving partitioned tables or joins that do not experience a performance
+ increase from the runtime filters feature. If the overhead of producing the runtime filters
+ outweighs the performance benefit for queries, you can turn the feature off entirely.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about runtime filtering.
+ See
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a>,
+ and
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ for tuning options for runtime filtering.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
new file mode 100644
index 0000000..bcee5c6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filter_wait_time_ms.html
@@ -0,0 +1,51 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filter_wait_time_ms"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</title></head><body id="runtime_filter_wait_time_ms"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">RUNTIME_FILTER_WAIT_TIME_MS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option
+ adjusts the settings for the runtime filtering feature.
+ It specifies a time in milliseconds that each scan node waits for
+ runtime filters to be produced by other plan fragments.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning use the value from the corresponding <span class="keyword cmdname">impalad</span> startup option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_runtime_filtering.html b/docs/build3x/html/topics/impala_runtime_filtering.html
new file mode 100644
index 0000000..1280838
--- /dev/null
+++ b/docs/build3x/html/topics/impala_runtime_filtering.html
@@ -0,0 +1,533 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</title></head><body id="runtime_filtering"><main role="main"><article role="article" aria-labelledby="runtime_filtering__runtime_filters">
+
+ <h1 class="title topictitle1" id="runtime_filtering__runtime_filters">Runtime Filtering for Impala Queries (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Runtime filtering</dfn> is a wide-ranging optimization feature available in
+ <span class="keyword">Impala 2.5</span> and higher. When only a fraction of the data in a table is
+ needed for a query against a partitioned table or to evaluate a join condition,
+ Impala determines the appropriate conditions while the query is running, and
+ broadcasts that information to all the <span class="keyword cmdname">impalad</span> nodes that are reading the table.
+ Those nodes can then avoid unnecessary I/O for irrelevant partition data, and avoid
+ unnecessary network transmission by sending only the subset of rows that match the join keys
+ across the network.
+
+ <p class="p">
+ This feature is primarily used to optimize queries against large partitioned tables
+ (under the name <dfn class="term">dynamic partition pruning</dfn>) and joins of large tables.
+ The information in this section includes concepts, internals, and troubleshooting
+ information for the entire runtime filtering feature.
+ For specific tuning steps for partitioned tables,
+
+ see
+ <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a>.
+
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ When this feature made its debut in <span class="keyword">Impala 2.5</span>,
+ the default setting was <code class="ph codeph">RUNTIME_FILTER_MODE=LOCAL</code>.
+ Now the default is <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code> in <span class="keyword">Impala 2.6</span> and higher,
+ which enables more wide-ranging and ambitious query optimization without requiring you to
+ explicitly set any query options.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="runtime_filtering__runtime_filtering_concepts">
+ <h2 class="title topictitle2" id="ariaid-title2">Background Information for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ To understand how runtime filtering works at a detailed level, you must
+ be familiar with some terminology from the field of distributed database technology:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ What a <dfn class="term">plan fragment</dfn> is.
+ Impala decomposes each query into smaller units of work that are distributed across the cluster.
+ Wherever possible, a data block is read, filtered, and aggregated by plan fragments executing
+ on the same host. For some operations, such as joins and combining intermediate results into
+ a final result set, data is transmitted across the network from one DataNode to another.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ What <code class="ph codeph">SCAN</code> and <code class="ph codeph">HASH JOIN</code> plan nodes are, and their role in computing query results:
+ </p>
+ <p class="p">
+ In the Impala query plan, a <dfn class="term">scan node</dfn> performs the I/O to read from the underlying data files.
+ Although this is an expensive operation from the traditional database perspective, Hadoop clusters and Impala are
+ optimized to do this kind of I/O in a highly parallel fashion. The major potential cost savings come from using
+ the columnar Parquet format (where Impala can avoid reading data for unneeded columns) and partitioned tables
+ (where Impala can avoid reading data for unneeded partitions).
+ </p>
+ <p class="p">
+ Most Impala joins use the
+ <a class="xref" href="https://en.wikipedia.org/wiki/Hash_join" target="_blank"><dfn class="term">hash join</dfn></a>
+ mechanism. (Impala only relatively recently added the
+ nested-loop join technique, for certain kinds of non-equijoin queries.)
+ In a hash join, when evaluating join conditions from two tables, Impala constructs a hash table in memory with all
+ the different column values from the table on one side of the join.
+ Then, for each row from the table on the other side of the join, Impala tests whether the relevant column values
+ are in this hash table or not.
+ </p>
+ <p class="p">
+ A <dfn class="term">hash join node</dfn> constructs such an in-memory hash table, then performs the comparisons to
+ identify which rows match the relevant join conditions
+ and should be included in the result set (or at least sent on to the subsequent intermediate stage of
+ query processing). Because some of the input for a hash join might be transmitted across the network from another host,
+ it is especially important from a performance perspective to prune out ahead of time any data that is known to be
+ irrelevant.
+ </p>
+ <p class="p">
+ The more distinct values there are in the columns used as join keys, the larger the in-memory hash table, and
+ thus the more memory is required to process the query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The difference between a <dfn class="term">broadcast join</dfn> and a <dfn class="term">shuffle join</dfn>.
+ (The Hadoop notion of a shuffle join is sometimes referred to in Impala as a <dfn class="term">partitioned join</dfn>.)
+ In a broadcast join, the table from one side of the join (typically the smaller table)
+ is sent in its entirety to all the hosts involved in the query. Then each host can compare its
+ portion of the data from the other (larger) table against the full set of possible join keys.
+ In a shuffle join, there is no obvious <span class="q">"smaller"</span> table, and so the contents of both tables
+ are divided up, and corresponding portions of the data are transmitted to each host involved in the query.
+ See <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for information about how these different kinds of
+ joins are processed.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The notion of the build phase and probe phase when Impala processes a join query.
+ The <dfn class="term">build phase</dfn> is where the rows containing the join key columns, typically for the smaller table,
+ are transmitted across the network and built into an in-memory hash table data structure on one or
+ more destination nodes.
+ The <dfn class="term">probe phase</dfn> is where data is read locally (typically from the larger table) and the join key columns
+ are compared to the values in the in-memory hash table.
+ The corresponding input sources (tables, subqueries, and so on) for these
+ phases are referred to as the <dfn class="term">build side</dfn> and the <dfn class="term">probe side</dfn>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ How to set Impala query options: interactively within an <span class="keyword cmdname">impala-shell</span> session through
+ the <code class="ph codeph">SET</code> command, for a JDBC or ODBC application through the <code class="ph codeph">SET</code> statement, or
+ globally for all <span class="keyword cmdname">impalad</span> daemons through the <code class="ph codeph">default_query_options</code> configuration
+ setting.
+ </p>
+ </li>
+ </ul>
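The build and probe phases described in the list above can be illustrated with a toy sketch (plain Python dictionaries standing in for Impala's native hash join implementation; table contents are made up for illustration):

```python
# Build phase: hash the join-key column of the smaller (build-side) table
# into an in-memory hash table keyed by the join key.
build_side = [(1, "a"), (2, "b"), (4, "d")]          # (id, payload) rows
hash_table = {}
for key, payload in build_side:
    hash_table.setdefault(key, []).append(payload)

# Probe phase: stream the larger (probe-side) table and look up each
# row's join key in the hash table; only matches reach the result set.
probe_side = [(1, "x"), (3, "y"), (4, "z")]
result = [(k, b, p) for k, p in probe_side for b in hash_table.get(k, [])]
print(result)  # -> [(1, 'a', 'x'), (4, 'd', 'z')]
```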
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="runtime_filtering__runtime_filtering_internals">
+ <h2 class="title topictitle2" id="ariaid-title3">Runtime Filtering Internals</h2>
+ <div class="body conbody">
+ <p class="p">
+ The <dfn class="term">filter</dfn> that is transmitted between plan fragments is essentially a list
+ of values for join key columns. When this list of values is transmitted in time to a scan node,
+ Impala can filter out non-matching values immediately after reading them, rather than transmitting
+ the raw data to another host to compare against the in-memory hash table on that host.
+ </p>
+ <p class="p">
+ For HDFS-based tables, this data structure is implemented as a <dfn class="term">Bloom filter</dfn>, which uses
+ a probability-based algorithm to determine all possible matching values. (The probability-based aspect
+ means that the filter might include some non-matching values; if so, those extra values do not cause any inaccuracy
+ in the final results.)
+ </p>
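A minimal Bloom filter sketch makes this property concrete: a value that was added is never rejected, while an absent value can occasionally pass through. This is an illustration only, not Impala's actual (native, cache-optimized) implementation:

```python
import hashlib

class TinyBloom:
    """Toy Bloom filter: false positives are possible, false negatives are not."""
    def __init__(self, nbits=1024, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, value):
        # Derive nhashes bit positions from salted SHA-256 digests.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, value):
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        return all(self.bits & (1 << pos) for pos in self._positions(value))

bf = TinyBloom()
for join_key in (101, 202, 303):
    bf.add(join_key)
print(bf.might_contain(202))  # a present join key is never rejected
```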
+ <p class="p">
+ Another kind of filter is the <span class="q">"min-max"</span> filter. It currently only applies to Kudu tables. The
+ filter is a data structure representing a minimum and maximum value. These filters are passed to
+ Kudu to reduce the number of rows returned to Impala when scanning the probe side of the join.
+ </p>
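Conceptually, a min-max filter is just the smallest and largest join key values from the build side, used to skip probe-side rows outside that range. A sketch with made-up key values (the real pushdown happens inside the Kudu scan, not in Python):

```python
# Build side: compute the min and max of the join key values.
build_keys = [42, 7, 19, 88]
lo, hi = min(build_keys), max(build_keys)

# Probe side (conceptually pushed down to Kudu): rows whose key falls
# outside [lo, hi] are skipped and never returned to the join.
probe_keys = [3, 50, 91, 20]
survivors = [k for k in probe_keys if lo <= k <= hi]
print(survivors)  # -> [50, 20]
```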
+ <p class="p">
+ There are different kinds of filters to match the different kinds of joins (partitioned and broadcast).
+ A broadcast filter reflects the complete list of relevant values and can be immediately evaluated by a scan node.
+ A partitioned filter reflects only the values processed by one host in the
+ cluster; all the partitioned filters must be combined into one (by the coordinator node) before the
+ scan nodes can use the results to accurately filter the data as it is read from storage.
+ </p>
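The coordinator's combining step for partitioned filters amounts to OR-ing the partial Bloom filter bitsets together. A sketch, modeling each host's partial filter as an integer bitmask (illustration only):

```python
# Each host produces a partial filter covering only the build-side rows
# it processed; here each partial filter is modeled as a small bitmask.
partial_filters = [0b0010_0001, 0b0100_0000, 0b0000_1000]

# The coordinator ORs the partial filters into one complete filter
# before the scan nodes can use it to filter data as it is read.
combined = 0
for f in partial_filters:
    combined |= f
print(bin(combined))  # -> 0b1101001
```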
+ <p class="p">
+ Broadcast filters are also classified as local or global. With a local broadcast filter, the information
+ in the filter is used by a subsequent query fragment that is running on the same host that produced the filter.
+ A non-local broadcast filter must be transmitted across the network to a query fragment that is running on a
+ different host. Impala designates 3 hosts to each produce non-local broadcast filters, to guard against the
+ possibility of a single slow host taking too long. Depending on the setting of the <code class="ph codeph">RUNTIME_FILTER_MODE</code> query option
+ (<code class="ph codeph">LOCAL</code> or <code class="ph codeph">GLOBAL</code>), Impala either uses a conservative optimization
+ strategy where filters are only consumed on the same host that produced them, or a more aggressive strategy
+ where filters are eligible to be transmitted across the network.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.6</span> and higher, the default for runtime filtering is the <code class="ph codeph">GLOBAL</code> setting.
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="runtime_filtering__runtime_filtering_file_formats">
+ <h2 class="title topictitle2" id="ariaid-title4">File Format Considerations for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ Parquet tables get the most benefit from
+ the runtime filtering optimizations. Runtime filtering can speed up
+ join queries against partitioned or unpartitioned Parquet tables,
+ and single-table queries against partitioned Parquet tables.
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for information about
+ using Parquet tables with Impala.
+ </p>
+ <p class="p">
+ For other file formats (text, Avro, RCFile, and SequenceFile),
+ runtime filtering speeds up queries against partitioned tables only.
+ Because partitioned tables can use a mixture of formats, Impala produces
+ the filters in all cases, even if they are not ultimately used to
+ optimize the query.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="runtime_filtering__runtime_filtering_timing">
+ <h2 class="title topictitle2" id="ariaid-title5">Wait Intervals for Runtime Filters</h2>
+ <div class="body conbody">
+ <p class="p">
+ Because it takes time to produce runtime filters, especially for
+ partitioned filters that must be combined by the coordinator node,
+ there is a time interval above which it is more efficient for
+ the scan nodes to go ahead and construct their intermediate result sets,
+ even if that intermediate data is larger than optimal. If it only takes
+ a few seconds to produce the filters, it is worth the extra time if pruning
+ the unnecessary data can save minutes in the overall query time.
+ You can specify the maximum wait time in milliseconds using the
+ <code class="ph codeph">RUNTIME_FILTER_WAIT_TIME_MS</code> query option.
+ </p>
+ <p class="p">
+ By default, each scan node waits for up to 1 second (1000 milliseconds)
+ for filters to arrive. If all filters have not arrived within the
+ specified interval, the scan node proceeds, using whatever filters
+ did arrive to help avoid reading unnecessary data. If a filter arrives
+ after the scan node begins reading data, the scan node applies that
+ filter to the data that is read after the filter arrives, but not to
+ the data that was already read.
+ </p>
+ <p class="p">
+ If the cluster is relatively busy and your workload contains many
+ resource-intensive or long-running queries, consider increasing the wait time
+ so that complicated queries do not miss opportunities for optimization.
+ If the cluster is lightly loaded and your workload contains many small queries
+ taking only a few seconds, consider decreasing the wait time to avoid the
+ 1 second delay for each query.
+ </p>
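+      <p class="p">
+        For example, the following sketch raises the wait time for a single long-running query
+        and then restores the default. The 10-second value and the query itself are illustrative
+        assumptions, not recommendations; choose a value based on your own testing.
+      </p>
+<pre class="pre codeblock"><code>
+-- Allow up to 10 seconds for runtime filters to arrive before scans proceed.
+set runtime_filter_wait_time_ms=10000;
+select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;  -- hypothetical expensive query
+-- Restore the default of 1000 milliseconds (1 second).
+set runtime_filter_wait_time_ms=1000;
+</code></pre>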
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="runtime_filtering__runtime_filtering_query_options">
+ <h2 class="title topictitle2" id="ariaid-title6">Query Options for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ See the following sections for information about the query options that control runtime filtering:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first query option adjusts the <span class="q">"sensitivity"</span> of this feature.
+ <span class="ph">By default, it is set to the highest level (<code class="ph codeph">GLOBAL</code>).
+ (This default applies to <span class="keyword">Impala 2.6</span> and higher.
+ In previous releases, the default was <code class="ph codeph">LOCAL</code>.)</span>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The other query options are tuning knobs that you typically only adjust after doing
+ performance testing, and that you might want to change only for the duration of a single
+ expensive query:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>;
+ in <span class="keyword">Impala 2.6</span> and higher, this setting acts as a fallback when
+ statistics are not available, rather than as a directive.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
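+      <p class="p">
+        Because these are query options, you can scope a change to a single
+        <span class="keyword cmdname">impala-shell</span> session. The following sketch adjusts the
+        tuning knobs around one expensive query; the option values and table names are
+        illustrative assumptions, not recommendations.
+      </p>
+<pre class="pre codeblock"><code>
+set runtime_filter_mode=GLOBAL;      -- the default in Impala 2.6 and higher
+set max_num_runtime_filters=20;      -- illustrative value for a query with many joins
+select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;  -- hypothetical expensive query
+set max_num_runtime_filters=10;      -- back to the default
+</code></pre>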
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="runtime_filtering__runtime_filtering_explain_plan">
+ <h2 class="title topictitle2" id="ariaid-title7">Runtime Filtering and Query Plans</h2>
+ <div class="body conbody">
+ <p class="p">
+ In the same way the query plan displayed by the
+ <code class="ph codeph">EXPLAIN</code> statement includes information
+ about predicates used by each plan fragment, it also
+ includes annotations showing whether a plan fragment
+ produces or consumes a runtime filter.
+ A plan fragment that produces a filter includes an
+ annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> <- <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>,
+ while a plan fragment that consumes a filter includes an annotation such as
+ <code class="ph codeph">runtime filters: <var class="keyword varname">filter_id</var> -> <var class="keyword varname">table</var>.<var class="keyword varname">column</var></code>.
+ <span class="ph">Setting the query option <code class="ph codeph">EXPLAIN_LEVEL=2</code> adds additional
+ annotations showing the type of the filter, either <code class="ph codeph"><var class="keyword varname">filter_id</var>[bloom]</code>
+ (for HDFS-based tables) or <code class="ph codeph"><var class="keyword varname">filter_id</var>[min_max]</code> (for Kudu tables).</span>
+ </p>
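+      <p class="p">
+        For example, to see the <code class="ph codeph">[bloom]</code> or
+        <code class="ph codeph">[min_max]</code> annotations described above, raise the detail level
+        before running <code class="ph codeph">EXPLAIN</code>. (The table names here are hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>
+set explain_level=2;
+explain select c1 from huge_t1 join tiny_t2 on huge_t1.id = tiny_t2.id;
+</code></pre>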
+
+ <p class="p">
+        The following example shows a query that uses a single runtime filter (labelled <code class="ph codeph">RF000</code>)
+        to prune the partitions that are scanned in one stage of the query, based on evaluating the
+        result set of a subquery:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+| |
+| 04:EXCHANGE [UNPARTITIONED] |
+| | |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
+| | hash predicates: year = year |
+| | <strong class="ph b">runtime filters: RF000 <- year</strong> |
+| | |
+| |--03:EXCHANGE [BROADCAST] |
+| | | |
+| | 01:SCAN HDFS [dpp.yy] |
+| | partitions=2/4 files=2 size=468B |
+| | |
+| 00:SCAN HDFS [dpp.yy2] |
+| partitions=2/3 files=2 size=468B |
+| <strong class="ph b">runtime filters: RF000 -> year</strong> |
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The query profile (displayed by the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span>)
+ contains both the <code class="ph codeph">EXPLAIN</code> plan and more detailed information about the internal
+ workings of the query. The profile output includes a section labelled the <span class="q">"filter routing table"</span>,
+ with information about each filter based on its ID.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="runtime_filtering__runtime_filtering_queries">
+ <h2 class="title topictitle2" id="ariaid-title8">Examples of Queries that Benefit from Runtime Filtering</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ In this example, Impala would normally do extra work to interpret the columns
+ <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>, <code class="ph codeph">C3</code>, and <code class="ph codeph">ID</code>
+ for each row in <code class="ph codeph">HUGE_T1</code>, before checking the <code class="ph codeph">ID</code>
+ value against the in-memory hash table constructed from all the <code class="ph codeph">TINY_T2.ID</code>
+ values. By producing a filter containing all the <code class="ph codeph">TINY_T2.ID</code> values
+ even before the query starts scanning the <code class="ph codeph">HUGE_T1</code> table, Impala
+ can skip the unnecessary work to parse the column info as soon as it determines
+ that an <code class="ph codeph">ID</code> value does not match any of the values from the other table.
+ </p>
+
+ <p class="p">
+ The example shows <code class="ph codeph">COMPUTE STATS</code> statements for both the tables (even
+ though that is a one-time operation after loading data into those tables) because
+ Impala relies on up-to-date statistics to
+ determine which one has more distinct <code class="ph codeph">ID</code> values than the other.
+ That information lets Impala make effective decisions about which table to use to
+ construct the in-memory hash table, and which table to read from disk and
+ compare against the entries in the hash table.
+ </p>
+
+<pre class="pre codeblock"><code>
+COMPUTE STATS huge_t1;
+COMPUTE STATS tiny_t2;
+SELECT c1, c2, c3 FROM huge_t1 JOIN tiny_t2 WHERE huge_t1.id = tiny_t2.id;
+</code></pre>
+
+
+
+ <p class="p">
+ In this example, <code class="ph codeph">T1</code> is a table partitioned by year. The subquery
+ on <code class="ph codeph">T2</code> produces multiple values, and transmits those values as a filter to the plan
+ fragments that are reading from <code class="ph codeph">T1</code>. Any non-matching partitions in <code class="ph codeph">T1</code>
+ are skipped.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from t1 where year in (select distinct year from t2);
+</code></pre>
+
+ <p class="p">
+ Now the <code class="ph codeph">WHERE</code> clause contains an additional test that does not apply to
+ the partition key column.
+ A filter on a column that is not a partition key is called a per-row filter.
+ Because per-row filters only apply for Parquet, <code class="ph codeph">T1</code> must be a Parquet table.
+ </p>
+
+ <p class="p">
+ The subqueries result in two filters being transmitted to
+ the scan nodes that read from <code class="ph codeph">T1</code>. The filter on <code class="ph codeph">YEAR</code> helps the query eliminate
+ entire partitions based on non-matching years. The filter on <code class="ph codeph">C2</code> lets Impala discard
+ rows with non-matching <code class="ph codeph">C2</code> values immediately after reading them. Without runtime filtering,
+ Impala would have to keep the non-matching values in memory, assemble <code class="ph codeph">C1</code>, <code class="ph codeph">C2</code>,
+ and <code class="ph codeph">C3</code> into rows in the intermediate result set, and transmit all the intermediate rows
+ back to the coordinator node, where they would be eliminated only at the very end of the query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1, c2, c3 from t1
+ where year in (select distinct year from t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a broadcast join.
+ The fact that the <code class="ph codeph">ON</code> clause would
+ return a small number of matching rows (because there
+ are not very many rows in <code class="ph codeph">TINY_T2</code>)
+ means that the corresponding filter is very selective.
+ Therefore, runtime filtering will probably be effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [broadcast] tiny_t2
+ on huge_t1.id = tiny_t2.id
+ where huge_t1.year in (select distinct year from tiny_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ <p class="p">
+ This example involves a shuffle or partitioned join.
+ Assume that most rows in <code class="ph codeph">HUGE_T1</code>
+ have a corresponding row in <code class="ph codeph">HUGE_T2</code>.
+ The fact that the <code class="ph codeph">ON</code> clause could
+ return a large number of matching rows means that
+ the corresponding filter would not be very selective.
+ Therefore, runtime filtering might be less effective
+ in optimizing this query.
+ </p>
+
+<pre class="pre codeblock"><code>
+select c1 from huge_t1 join [shuffle] huge_t2
+ on huge_t1.id = huge_t2.id
+ where huge_t1.year in (select distinct year from huge_t2)
+ and c2 in (select other_column from t3);
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="runtime_filtering__runtime_filtering_tuning">
+ <h2 class="title topictitle2" id="ariaid-title9">Tuning and Troubleshooting Queries that Use Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+ These tuning and troubleshooting procedures apply to queries that are
+ resource-intensive enough, long-running enough, and frequent enough
+ that you can devote special attention to optimizing them individually.
+ </p>
+
+ <p class="p">
+ Use the <code class="ph codeph">EXPLAIN</code> statement and examine the <code class="ph codeph">runtime filters:</code>
+ lines to determine whether runtime filters are being applied to the <code class="ph codeph">WHERE</code> predicates
+ and join clauses that you expect. For example, runtime filtering does not apply to queries that use
+ the nested loop join mechanism due to non-equijoin operators.
+ </p>
+
+ <p class="p">
+ Make sure statistics are up-to-date for all tables involved in the queries.
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement after loading data into non-partitioned tables,
+ and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> after adding new partitions to partitioned tables.
+ </p>
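+      <p class="p">
+        For example, with hypothetical table names:
+      </p>
+<pre class="pre codeblock"><code>
+compute stats huge_t1;           -- after loading data into a non-partitioned table
+compute incremental stats t1;    -- after adding new partitions to a partitioned table
+</code></pre>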
+
+ <p class="p">
+ If join queries involving large tables use unique columns as the join keys,
+ for example joining a primary key column with a foreign key column, the overhead of
+ producing and transmitting the filter might outweigh the performance benefit because
+ not much data could be pruned during the early stages of the query.
+ For such queries, consider setting the query option <code class="ph codeph">RUNTIME_FILTER_MODE=OFF</code>.
+ </p>
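+      <p class="p">
+        For example, this sketch disables runtime filtering for one such primary-key/foreign-key
+        join and then re-enables the default behavior. (The query is hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>
+set runtime_filter_mode=OFF;
+select c1 from huge_t1 join huge_t2 on huge_t1.pk = huge_t2.fk;  -- unique join keys; little to prune
+set runtime_filter_mode=GLOBAL;  -- the default in Impala 2.6 and higher
+</code></pre>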
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="runtime_filtering__runtime_filtering_limits">
+ <h2 class="title topictitle2" id="ariaid-title10">Limitations and Restrictions for Runtime Filtering</h2>
+ <div class="body conbody">
+ <p class="p">
+        The runtime filtering feature is most effective for the Parquet file format.
+ For other file formats, filtering only applies for partitioned tables.
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering_file_formats">File Format Considerations for Runtime Filtering</a>.
+ For the ways in which runtime filtering works for Kudu tables, see
+ <a class="xref" href="impala_kudu.html#kudu_performance">Impala Query Performance for Kudu Tables</a>.
+ </p>
+
+
+ <p class="p">
+ When the spill-to-disk mechanism is activated on a particular host during a query,
+ that host does not produce any filters while processing that query.
+ This limitation does not affect the correctness of results; it only reduces the
+ amount of optimization that can be applied to the query.
+ </p>
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
diff --git a/docs/build3x/html/topics/impala_scalability.html b/docs/build3x/html/topics/impala_scalability.html
new file mode 100644
index 0000000..f2b6a9f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_scalability.html
@@ -0,0 +1,920 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name
="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scalability"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Scalability Considerations for Impala</title></head><body id="scalability"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Scalability Considerations for Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section explains how the size of your cluster and the volume of data influences SQL performance and
+ schema design for Impala tables. Typically, adding more cluster capacity reduces problems due to memory
+ limits or disk throughput. On the other hand, larger clusters are more likely to have other kinds of
+ scalability issues, such as a single slow node that causes performance problems for queries.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ A good source of tips related to scalability and performance tuning is the
+ <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+ presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="scalability__scalability_catalog">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impact of Many Tables or Partitions on Impala Catalog Performance and Memory Usage</h2>
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ Because Hadoop I/O is optimized for reading and writing large files, Impala is optimized for tables
+ containing relatively few, large data files. Schemas containing thousands of tables, or tables containing
+ thousands of partitions, can encounter performance issues during startup or during DDL operations such as
+ <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ Because of a change in the default heap size for the <span class="keyword cmdname">catalogd</span> daemon in
+ <span class="keyword">Impala 2.5</span> and higher, the following procedure to increase the <span class="keyword cmdname">catalogd</span>
+ memory limit might be required following an upgrade to <span class="keyword">Impala 2.5</span> even if not
+ needed previously.
+ </p>
+ </div>
+
+ <div class="p">
+ For schemas with large numbers of tables, partitions, and data files, the <span class="keyword cmdname">catalogd</span>
+ daemon might encounter an out-of-memory error. To increase the memory limit for the
+ <span class="keyword cmdname">catalogd</span> daemon:
+
+ <ol class="ol">
+ <li class="li">
+ <p class="p">
+ Check current memory usage for the <span class="keyword cmdname">catalogd</span> daemon by running the
+ following commands on the host where that daemon runs on your cluster:
+ </p>
+ <pre class="pre codeblock"><code>
+ jcmd <var class="keyword varname">catalogd_pid</var> VM.flags
+ jmap -heap <var class="keyword varname">catalogd_pid</var>
+ </code></pre>
+ </li>
+ <li class="li">
+ <p class="p">
+ Decide on a large enough value for the <span class="keyword cmdname">catalogd</span> heap.
+ You express it as an environment variable value as follows:
+ </p>
+ <pre class="pre codeblock"><code>
+ JAVA_TOOL_OPTIONS="-Xmx8g"
+ </code></pre>
+ </li>
+ <li class="li">
+ <p class="p">
+ On systems not using cluster management software, put this environment variable setting into the
+ startup script for the <span class="keyword cmdname">catalogd</span> daemon, then restart the <span class="keyword cmdname">catalogd</span>
+ daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Use the same <span class="keyword cmdname">jcmd</span> and <span class="keyword cmdname">jmap</span> commands as earlier to
+ verify that the new settings are in effect.
+ </p>
+ </li>
+ </ol>
+ </div>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="scalability__statestore_scalability">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Scalability Considerations for the Impala Statestore</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Before <span class="keyword">Impala 2.1</span>, the statestore sent only one kind of message to its subscribers. This message contained all
+ updates for any topics that a subscriber had subscribed to. It also served to let subscribers know that the
+ statestore had not failed, and conversely the statestore used the success of sending a heartbeat to a
+ subscriber to decide whether or not the subscriber had failed.
+ </p>
+
+ <p class="p">
+ Combining topic updates and failure detection in a single message led to bottlenecks in clusters with large
+ numbers of tables, partitions, and HDFS data blocks. When the statestore was overloaded with metadata
+ updates to transmit, heartbeat messages were sent less frequently, sometimes causing subscribers to time
+ out their connection with the statestore. Increasing the subscriber timeout and decreasing the frequency of
+ statestore heartbeats worked around the problem, but reduced responsiveness when the statestore failed or
+ restarted.
+ </p>
+
+ <p class="p">
+ As of <span class="keyword">Impala 2.1</span>, the statestore now sends topic updates and heartbeats in separate messages. This allows the
+ statestore to send and receive a steady stream of lightweight heartbeats, and removes the requirement to
+ send topic updates according to a fixed schedule, reducing statestore network overhead.
+ </p>
+
+ <p class="p">
+ The statestore now has the following relevant configuration flags for the <span class="keyword cmdname">statestored</span>
+ daemon:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_num_update_threads">
+ <code class="ph codeph">-statestore_num_update_threads</code>
+ </dt>
+
+ <dd class="dd">
+ The number of threads inside the statestore dedicated to sending topic updates. You should not
+ typically need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_update_frequency_ms">
+ <code class="ph codeph">-statestore_update_frequency_ms</code>
+ </dt>
+
+ <dd class="dd">
+ The frequency, in milliseconds, with which the statestore tries to send topic updates to each
+ subscriber. This is a best-effort value; if the statestore is unable to meet this frequency, it sends
+ topic updates as fast as it can. You should not typically need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 2000
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_num_heartbeat_threads">
+ <code class="ph codeph">-statestore_num_heartbeat_threads</code>
+ </dt>
+
+ <dd class="dd">
+ The number of threads inside the statestore dedicated to sending heartbeats. You should not typically
+ need to change this value.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="statestore_scalability__statestore_heartbeat_frequency_ms">
+ <code class="ph codeph">-statestore_heartbeat_frequency_ms</code>
+ </dt>
+
+ <dd class="dd">
+ The frequency, in milliseconds, with which the statestore tries to send heartbeats to each subscriber.
+ This value should be good for large catalogs and clusters up to approximately 150 nodes. Beyond that,
+ you might need to increase this value to make the interval longer between heartbeat messages.
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1000 (one heartbeat message every second)
+ </p>
+ </dd>
+
+
+ </dl>
+
+ <p class="p">
+ If it takes a very long time for a cluster to start up, and <span class="keyword cmdname">impala-shell</span> consistently
+ displays <code class="ph codeph">This Impala daemon is not ready to accept user requests</code>, the statestore might be
+ taking too long to send the entire catalog topic to the cluster. In this case, consider adding
+        <code class="ph codeph">--load_catalog_in_background=false</code> to your catalog service configuration. This setting
+        stops the catalog service from loading the entire catalog into memory at cluster startup. Instead, metadata for
+ each table is loaded when the table is accessed for the first time.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="scalability__scalability_coordinator">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Controlling which Hosts are Coordinators and Executors</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, each host in the cluster that runs the <span class="keyword cmdname">impalad</span>
+ daemon can act as the coordinator for an Impala query, execute the fragments
+ of the execution plan for the query, or both. During highly concurrent
+ workloads for large-scale queries, especially on large clusters, the dual
+ roles can cause scalability issues:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The extra work required for a host to act as the coordinator could interfere
+ with its capacity to perform other work for the earlier phases of the query.
+ For example, the coordinator can experience significant network and CPU overhead
+ during queries containing a large number of query fragments. Each coordinator
+ caches metadata for all table partitions and data files, which can be substantial
+ and contend with memory needed to process joins, aggregations, and other operations
+ performed by query executors.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Having a large number of hosts act as coordinators can cause unnecessary network
+ overhead, or even timeout errors, as each of those hosts communicates with the
+ <span class="keyword cmdname">statestored</span> daemon for metadata updates.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="q">"soft limits"</span> imposed by the admission control feature are more likely
+ to be exceeded when there are a large number of heavily loaded hosts acting as
+ coordinators.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ If such scalability bottlenecks occur, you can explicitly specify that certain
+ hosts act as query coordinators, but not executors for query fragments.
+ These hosts do not participate in I/O-intensive operations such as scans,
+ and CPU-intensive operations such as aggregations.
+ </p>
+
+ <p class="p">
+ Then, you specify that the
+ other hosts act as executors but not coordinators. These hosts do not communicate
+ with the <span class="keyword cmdname">statestored</span> daemon or process the final result sets
+ from queries. You cannot connect to these hosts through clients such as
+ <span class="keyword cmdname">impala-shell</span> or business intelligence tools.
+ </p>
+
+ <p class="p">
+ This feature is available in <span class="keyword">Impala 2.9</span> and higher.
+ </p>
+
+ <p class="p">
+ To use this feature, you specify one of the following startup flags for the
+ <span class="keyword cmdname">impalad</span> daemon on each host:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">is_executor=false</code> for each host that
+ does not act as an executor for Impala queries.
+ These hosts act exclusively as query coordinators.
+ This setting typically applies to a relatively small number of
+ hosts, because the most common topology is to have nearly all
+ DataNodes doing work for query execution.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">is_coordinator=false</code> for each host that
+ does not act as a coordinator for Impala queries.
+ These hosts act exclusively as executors.
+ The number of hosts with this setting typically increases
+ as the cluster grows larger and handles more table partitions,
+ data files, and concurrent queries. As the overhead for query
+ coordination increases, it becomes more important to centralize
+ that work on dedicated hosts.
+ </p>
+ </li>
+ </ul>
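+      <p class="p">
+        For example, the startup flags for a dedicated coordinator host and a dedicated executor
+        host might look like the following sketch. The exact way you pass the flags depends on
+        how you manage <span class="keyword cmdname">impalad</span> startup on your cluster.
+      </p>
+<pre class="pre codeblock"><code>
+impalad -is_executor=false ...      # host acting only as a query coordinator
+impalad -is_coordinator=false ...   # host acting only as an executor
+</code></pre>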
+
+ <p class="p">
+ By default, both of these settings are enabled for each <code class="ph codeph">impalad</code>
+ instance, allowing all such hosts to act as both executors and coordinators.
+ </p>
+
+ <p class="p">
+ For example, on a 100-node cluster, you might specify <code class="ph codeph">is_executor=false</code>
+ for 10 hosts, to dedicate those hosts as query coordinators. Then specify
+ <code class="ph codeph">is_coordinator=false</code> for the remaining 90 hosts. All explicit or
+ load-balanced connections must go to the 10 hosts acting as coordinators. These hosts
+ perform the network communication to keep metadata up-to-date and route query results
+ to the appropriate clients. The remaining 90 hosts perform the intensive I/O, CPU, and
+ memory operations that make up the bulk of the work for each query. If a bottleneck or
+ other performance issue arises on a specific host, you can narrow down the cause more
+ easily because each host is dedicated to specific operations within the overall
+ Impala workload.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="scalability__scalability_buffer_pool">
+ <h2 class="title topictitle2" id="ariaid-title5">Effect of Buffer Pool on Memory Usage (<span class="keyword">Impala 2.10</span> and higher)</h2>
+ <div class="body conbody">
+ <p class="p">
+ The buffer pool feature, available in <span class="keyword">Impala 2.10</span> and higher, changes the
+ way Impala allocates memory during a query. Most of the memory needed is reserved at the
+ beginning of the query, avoiding cases where a query might run for a long time before failing
+ with an out-of-memory error. The actual memory estimates and memory buffers are typically
+ smaller than before, so that more queries can run concurrently or process larger volumes
+ of data than previously.
+ </p>
+ <p class="p">
+ The buffer pool feature includes some query options that you can fine-tune:
+ <a class="xref" href="impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a>,
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>, and
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>.
+ </p>
+ <p class="p">
+ Most of the effects of the buffer pool are transparent to you as an Impala user.
+ Memory use during spilling is now steadier and more predictable, instead of
+ increasing rapidly as more data is spilled to disk. The main change from a user
+ perspective is the need to increase the <code class="ph codeph">MAX_ROW_SIZE</code> query option
+ setting when querying tables with columns containing long strings, many columns,
+ or other combinations of factors that produce very large rows. If Impala encounters
+ rows that are too large to process with the default query option settings, the query
+ fails with an error message suggesting to increase the <code class="ph codeph">MAX_ROW_SIZE</code>
+ setting.
+ </p>
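+ <p class="p">
+ For example, a session that queries a table with very wide rows might raise the limit
+ before running the query. (The 4 MB value and the table name here are only illustrative;
+ choose a value based on the actual row size reported in the error message.)
+ </p>
+<pre class="pre codeblock"><code>SET MAX_ROW_SIZE=4mb;
+SELECT * FROM wide_table WHERE id = 42;
+</code></pre>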
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="scalability__spill_to_disk">
+
+ <h2 class="title topictitle2" id="ariaid-title6">SQL Operations that Spill to Disk</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Certain memory-intensive operations write temporary data to disk (known as <dfn class="term">spilling</dfn> to disk)
+ when Impala is close to exceeding its memory limit on a particular host.
+ </p>
+
+ <p class="p">
+ The result is a query that completes successfully, rather than failing with an out-of-memory error. The
+ tradeoff is decreased performance due to the extra disk I/O to write the temporary data and read it back
+ in. The slowdown could potentially be significant. Thus, while this feature improves reliability,
+ you should optimize your queries, system parameters, and hardware configuration to make this spilling a rare occurrence.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, also see <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for
+ changes to Impala memory allocation that might change the details of which queries spill to disk,
+ and how much memory and disk space is involved in the spilling operation.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">What kinds of queries might spill to disk:</strong>
+ </p>
+
+ <p class="p">
+ Several SQL clauses and constructs require memory allocations that could activate the spilling mechanism:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ When a query uses a <code class="ph codeph">GROUP BY</code> clause for columns
+ with millions or billions of distinct values, Impala keeps a
+ similar number of temporary results in memory, to accumulate the
+ aggregate results for each value in the group.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When large tables are joined together, Impala keeps the values of
+ the join columns from one table in memory, to compare them to
+ incoming values from the other table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When a large result set is sorted by the <code class="ph codeph">ORDER BY</code>
+ clause, each node sorts its portion of the result set in memory.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">UNION</code> operators
+ build in-memory data structures to represent all values found so
+ far, to eliminate duplicates as the query progresses.
+ </p>
+ </li>
+
+ </ul>
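+ <p class="p">
+ For illustration, hypothetical queries such as the following exercise the constructs
+ listed above and could trigger spilling when the data volume is large enough. (The
+ table and column names are made up.)
+ </p>
+<pre class="pre codeblock"><code>-- High-cardinality GROUP BY: one in-memory entry per distinct user_id.
+SELECT user_id, COUNT(*) FROM events GROUP BY user_id;
+
+-- Join of large tables: join column values from one side are held in memory.
+SELECT f.key, d.name FROM big_facts f JOIN big_dims d ON f.key = d.key;
+
+-- Sorting a large result set: each node sorts its portion in memory.
+SELECT * FROM big_facts ORDER BY event_time;
+
+-- DISTINCT builds an in-memory structure of all values seen so far.
+SELECT DISTINCT session_id FROM events;
+</code></pre>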
+
+ <p class="p">
+ When the spill-to-disk feature is activated for a join node within a query, Impala does not
+ produce any runtime filters for that join operation on that host. Other join nodes within
+ the query are not affected.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">How Impala handles scratch disk space for spilling:</strong>
+ </p>
+
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala
+ successfully starts (with a warning written to the log) if it cannot create or read and write files
+ in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Memory usage for SQL operators:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span> and higher, the way SQL operators such as <code class="ph codeph">GROUP BY</code>,
+ <code class="ph codeph">DISTINCT</code>, and joins, transition between using additional memory or activating the
+ spill-to-disk feature is changed. The memory required to spill to disk is reserved up front, and you can
+ examine it in the <code class="ph codeph">EXPLAIN</code> plan when the <code class="ph codeph">EXPLAIN_LEVEL</code> query option is
+ set to 2 or higher.
+ </p>
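+ <p class="p">
+ For example, to see the per-operator memory reservations in the plan output
+ (the table and column names here are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SET EXPLAIN_LEVEL=2;
+EXPLAIN SELECT c1, COUNT(*) FROM big_table GROUP BY c1;
+</code></pre>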
+
+ <p class="p">
+ The infrastructure of the spilling feature affects the way the affected SQL operators, such as
+ <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">DISTINCT</code>, and joins, use memory.
+ On each host that participates in the query, each such operator in a query requires memory
+ to store rows of data and other data structures. For each operator that supports spilling
+ to disk, Impala reserves up front an amount of memory sufficient to execute the
+ operator. If an operator accumulates more data than can fit in the reserved memory, it
+ can either reserve more memory to continue processing data in memory or start spilling
+ data to temporary scratch files on disk. Thus, operators with spill-to-disk support
+ can adapt to different memory constraints by using however much memory is available
+ to speed up execution, yet tolerate low memory conditions by spilling data to disk.
+ </p>
+
+ <p class="p">
+ The amount of data depends on the portion of the data being handled by that host, and thus
+ the operator may end up consuming different amounts of memory on different hosts.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> This feature was added to the <code class="ph codeph">ORDER BY</code> clause in Impala 1.4.
+ This feature was extended to cover join queries, aggregation functions, and analytic
+ functions in Impala 2.0. The size of the memory work area required by
+ each operator that spills was reduced from 512 megabytes to 256 megabytes in Impala 2.2.
+ <span class="ph">The spilling mechanism was reworked to take advantage of the
+ Impala buffer pool feature and be more predictable and stable in <span class="keyword">Impala 2.10</span>.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Avoiding queries that spill to disk:</strong>
+ </p>
+
+ <p class="p">
+ Because the extra I/O can impose significant performance overhead on these types of queries, try to avoid
+ this situation by using the following steps:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Detect how often queries spill to disk, and how much temporary data is written. Refer to the following
+ sources:
+ <ul class="ul">
+ <li class="li">
+ The output of the <code class="ph codeph">PROFILE</code> command in the <span class="keyword cmdname">impala-shell</span>
+ interpreter. This data shows the memory usage for each host and in total across the cluster. The
+ <code class="ph codeph">WriteIoBytes</code> counter reports how much data was written to disk for each operator
+ during the query. (In <span class="keyword">Impala 2.9</span>, the counter was named
+ <code class="ph codeph">ScratchBytesWritten</code>; in <span class="keyword">Impala 2.8</span> and earlier, it was named
+ <code class="ph codeph">BytesWritten</code>.)
+ </li>
+
+ <li class="li">
+ The <span class="ph uicontrol">Queries</span> tab in the Impala debug web user interface. Select the query to
+ examine and click the corresponding <span class="ph uicontrol">Profile</span> link. This data breaks down the
+ memory usage for a single host within the cluster, the host whose web interface you are connected to.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Use one or more techniques to reduce the possibility of the queries spilling to disk:
+ <ul class="ul">
+ <li class="li">
+ Increase the Impala memory limit if practical, for example, if you can increase the available memory
+ by more than the amount of temporary data written to disk on a particular node. Remember that in
+ Impala 2.0 and later, you can issue <code class="ph codeph">SET MEM_LIMIT</code> as a SQL statement, which lets you
+ fine-tune the memory usage for queries from JDBC and ODBC applications.
+ </li>
+
+ <li class="li">
+ Increase the number of nodes in the cluster, to increase the aggregate memory available to Impala and
+ reduce the amount of memory required on each node.
+ </li>
+
+ <li class="li">
+ Increase the overall memory capacity of each DataNode at the hardware level.
+ </li>
+
+ <li class="li">
+ On a cluster with resources shared between Impala and other Hadoop components, use resource
+ management features to allocate more memory for Impala. See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+ </li>
+
+ <li class="li">
+ If the memory pressure is due to running many concurrent queries rather than a few memory-intensive
+ ones, consider using the Impala admission control feature to lower the limit on the number of
+ concurrent queries. By spacing out the most resource-intensive queries, you can avoid spikes in
+ memory usage and improve overall response times. See
+ <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ Tune the queries with the highest memory requirements, using one or more of the following techniques:
+ <ul class="ul">
+ <li class="li">
+ Run the <code class="ph codeph">COMPUTE STATS</code> statement for all tables involved in large-scale joins and
+ aggregation queries.
+ </li>
+
+ <li class="li">
+ Minimize your use of <code class="ph codeph">STRING</code> columns in join columns. Prefer numeric values
+ instead.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">EXPLAIN</code> plan to understand the execution strategy being used for the
+ most resource-intensive queries. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+ details.
+ </li>
+
+ <li class="li">
+ If Impala still chooses a suboptimal execution strategy even with statistics available, or if it
+ is impractical to keep the statistics up to date for huge or rapidly changing tables, add hints
+ to the most resource-intensive queries to select the right execution strategy. See
+ <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a> for details.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ If your queries experience substantial performance overhead due to spilling, enable the
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> query option. This option prevents queries whose memory usage
+ is likely to be exorbitant from spilling to disk. See
+ <a class="xref" href="impala_disable_unsafe_spills.html#disable_unsafe_spills">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a> for details. As you tune
+ problematic queries using the preceding steps, fewer and fewer will be cancelled by this option
+ setting.
+ </li>
+ </ul>
+ </li>
+ </ol>
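+ <p class="p">
+ As a starting point for the tuning steps above, computing statistics is often the
+ single most effective measure. (The table names are illustrative.)
+ </p>
+<pre class="pre codeblock"><code>COMPUTE STATS big_facts;
+COMPUTE STATS big_dims;
+</code></pre>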
+
+ <p class="p">
+ <strong class="ph b">Testing performance implications of spilling to disk:</strong>
+ </p>
+
+ <p class="p">
+ To artificially provoke spilling, to test this feature and understand the performance implications, use a
+ test environment with a memory limit of at least 2 GB. Issue the <code class="ph codeph">SET</code> command with no
+ arguments to check the current setting for the <code class="ph codeph">MEM_LIMIT</code> query option. Set the query
+ option <code class="ph codeph">DISABLE_UNSAFE_SPILLS=true</code>. This option limits the spill-to-disk feature to prevent
+ runaway disk usage from queries that are known in advance to be suboptimal. Within
+ <span class="keyword cmdname">impala-shell</span>, run a query that you expect to be memory-intensive, based on the criteria
+ explained earlier. A self-join of a large table is a good candidate:
+ </p>
+
+<pre class="pre codeblock"><code>select count(*) from big_table a join big_table b using (column_with_many_values);
+</code></pre>
+
+ <p class="p">
+ Issue the <code class="ph codeph">PROFILE</code> command to get a detailed breakdown of the memory usage on each node
+ during the query.
+
+ </p>
+
+
+
+ <p class="p">
+ Set the <code class="ph codeph">MEM_LIMIT</code> query option to a value that is smaller than the peak memory usage
+ reported in the profile output. Now try the memory-intensive query again.
+ </p>
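+ <p class="p">
+ For example, if the profile reported a peak usage of roughly 1.5 GB on a host, you
+ might constrain the next run as follows. (The exact value depends on your own
+ profile output.)
+ </p>
+<pre class="pre codeblock"><code>SET MEM_LIMIT=1g;
+SELECT COUNT(*) FROM big_table a JOIN big_table b USING (column_with_many_values);
+</code></pre>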
+
+ <p class="p">
+ Check if the query fails with a message like the following:
+ </p>
+
+<pre class="pre codeblock"><code>WARNINGS: Spilling has been disabled for plans that do not have stats and are not hinted
+to prevent potentially bad plans from using too many cluster resources. Compute stats on
+these tables, hint the plan or disable this behavior via query options to enable spilling.
+</code></pre>
+
+ <p class="p">
+ If so, the query could have consumed substantial temporary disk space, slowing down so much that it would
+ not complete in any reasonable time. Rather than rely on the spill-to-disk feature in this case, issue the
+ <code class="ph codeph">COMPUTE STATS</code> statement for the table or tables in your sample query. Then run the query
+ again, check the peak memory usage again in the <code class="ph codeph">PROFILE</code> output, and adjust the memory
+ limit again if necessary to be lower than the peak memory usage.
+ </p>
+
+ <p class="p">
+ At this point, you have a query that is memory-intensive, but Impala can optimize it efficiently so that
+ the memory usage is not exorbitant. You have set an artificial constraint through the
+ <code class="ph codeph">MEM_LIMIT</code> option so that the query would normally fail with an out-of-memory error. But
+ the automatic spill-to-disk feature means that the query should actually succeed, at the expense of some
+ extra disk I/O to read and write temporary work data.
+ </p>
+
+ <p class="p">
+ Try the query again, and confirm that it succeeds. Examine the <code class="ph codeph">PROFILE</code> output again. This
+ time, look for lines of this form:
+ </p>
+
+<pre class="pre codeblock"><code>- SpilledPartitions: <var class="keyword varname">N</var>
+</code></pre>
+
+ <p class="p">
+ If you see any such lines with <var class="keyword varname">N</var> greater than 0, that indicates the query would have
+ failed in Impala releases prior to 2.0, but now it succeeded because of the spill-to-disk feature. Examine
+ the total time taken by the <code class="ph codeph">AGGREGATION_NODE</code> or other query fragments containing non-zero
+ <code class="ph codeph">SpilledPartitions</code> values. Compare the times to similar fragments that did not spill, for
+ example in the <code class="ph codeph">PROFILE</code> output when the same query is run with a higher memory limit. This
+ gives you an idea of the performance penalty of the spill operation for a particular query with a
+ particular memory limit. If you make the memory limit just a little lower than the peak memory usage, the
+ query only needs to write a small amount of temporary data to disk. The lower you set the memory limit, the
+ more temporary data is written and the slower the query becomes.
+ </p>
+
+ <p class="p">
+ Now repeat this procedure for actual queries used in your environment. Use the
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> setting to identify cases where queries used more memory than
+ necessary due to lack of statistics on the relevant tables and columns, and issue <code class="ph codeph">COMPUTE
+ STATS</code> where necessary.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">When to use DISABLE_UNSAFE_SPILLS:</strong>
+ </p>
+
+ <p class="p">
+ You might wonder why you would not leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned on all the time. Whether and
+ how frequently to use this option depends on your system environment and workload.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> is suitable for an environment with ad hoc queries whose performance
+ characteristics and memory usage are not known in advance. It prevents <span class="q">"worst-case scenario"</span> queries
+ that use large amounts of memory unnecessarily. Thus, you might turn this option on within a session while
+ developing new SQL code, even though it is turned off for existing applications.
+ </p>
+
+ <p class="p">
+ Organizations where table and column statistics are generally up-to-date might leave this option turned on
+ all the time, again to avoid worst-case scenarios for untested queries or if a problem in the ETL pipeline
+ results in a table with no statistics. Turning on <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> lets you <span class="q">"fail
+ fast"</span> in this case and immediately gather statistics or tune the problematic queries.
+ </p>
+
+ <p class="p">
+ Some organizations might leave this option turned off. For example, you might have tables large enough that
+ the <code class="ph codeph">COMPUTE STATS</code> statement takes substantial time to run, making it impractical to re-run it after
+ loading new data. If you have examined the <code class="ph codeph">EXPLAIN</code> plans of your queries and know that
+ they are operating efficiently, you might leave <code class="ph codeph">DISABLE_UNSAFE_SPILLS</code> turned off. In that
+ case, you know that any queries that spill will not go overboard with their memory consumption.
+ </p>
+
+ </div>
+ </article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title7" id="scalability__complex_query">
+<h2 class="title topictitle2" id="ariaid-title7">Limits on Query Size and Complexity</h2>
+<div class="body conbody">
+<p class="p">
+There are hardcoded limits on the maximum size and complexity of queries.
+Currently, the maximum number of expressions in a query is 2000.
+You might exceed the limits with large or deeply nested queries
+produced by business intelligence tools or other query generators.
+</p>
+<p class="p">
+If you have the ability to customize such queries or the query generation
+logic that produces them, replace sequences of repetitive expressions
+with single operators such as <code class="ph codeph">IN</code> or <code class="ph codeph">BETWEEN</code>
+that can represent multiple values or ranges.
+For example, instead of a large number of <code class="ph codeph">OR</code> clauses:
+</p>
+<pre class="pre codeblock"><code>WHERE val = 1 OR val = 2 OR val = 6 OR val = 100 ...
+</code></pre>
+<p class="p">
+use a single <code class="ph codeph">IN</code> clause:
+</p>
+<pre class="pre codeblock"><code>WHERE val IN (1,2,6,100,...)</code></pre>
+</div>
+</article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title8" id="scalability__scalability_io">
+<h2 class="title topictitle2" id="ariaid-title8">Scalability Considerations for Impala I/O</h2>
+<div class="body conbody">
+<p class="p">
+Impala parallelizes its I/O operations aggressively,
+therefore the more disks you can attach to each host, the better.
+Impala retrieves data from disk so quickly using
+bulk read operations on large blocks, that most queries
+are CPU-bound rather than I/O-bound.
+</p>
+<p class="p">
+Because the kind of sequential scanning typically done by
+Impala queries does not benefit much from the random-access
+capabilities of SSDs, spinning disks typically provide
+the most cost-effective kind of storage for Impala data,
+with little or no performance penalty as compared to SSDs.
+</p>
+<p class="p">
+Resource management features such as YARN, Llama, and admission control
+typically constrain the amount of memory, CPU, or overall number of
+queries in a high-concurrency environment.
+Currently, there is no throttling mechanism for Impala I/O.
+</p>
+</div>
+</article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="scalability__big_tables">
+ <h2 class="title topictitle2" id="ariaid-title9">Scalability Considerations for Table Layout</h2>
+ <div class="body conbody">
+ <p class="p">
+ Due to the overhead of retrieving and updating table metadata
+ in the metastore database, try to limit the number of columns
+ in a table to a maximum of approximately 2000.
+ Although Impala can handle wider tables than this, the metastore overhead
+ can become significant, leading to query performance that is slower
+ than expected based on the actual data volume.
+ </p>
+ <p class="p">
+ To minimize overhead related to the metastore database and Impala query planning,
+ try to limit the number of partitions for any partitioned table to a few tens of thousands.
+ </p>
+ <p class="p">
+ If the volume of data within a table makes it impractical to run exploratory
+ queries, consider using the <code class="ph codeph">TABLESAMPLE</code> clause to limit query processing
+ to only a percentage of data within the table. This technique reduces the overhead
+ for query startup, I/O to read the data, and the amount of network, CPU, and memory
+ needed to process intermediate results during the query. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a>
+ for details.
+ </p>
+ </div>
+ </article>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title10" id="scalability__kerberos_overhead_cluster_size">
+<h2 class="title topictitle2" id="ariaid-title10">Kerberos-Related Network Overhead for Large Clusters</h2>
+<div class="body conbody">
+<p class="p">
+When Impala starts up, or after each <code class="ph codeph">kinit</code> refresh, Impala sends a number of
+simultaneous requests to the KDC. For a cluster with 100 hosts, the KDC might be able to process
+all the requests within roughly 5 seconds. For a cluster with 1000 hosts, the time to process
+the requests would be roughly 500 seconds. Impala also makes a number of DNS requests at the same
+time as these Kerberos-related requests.
+</p>
+<p class="p">
+While these authentication requests are being processed, any submitted Impala queries will fail.
+During this period, the KDC and DNS may be slow to respond to requests from components other than Impala,
+so other secure services might be affected temporarily.
+</p>
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> or earlier, to reduce the
+ frequency of the <code class="ph codeph">kinit</code> renewal that initiates a new set
+ of authentication requests, increase the <code class="ph codeph">kerberos_reinit_interval</code>
+ configuration setting for the <code class="ph codeph">impalad</code> daemons. Currently,
+ the default is 60 minutes. Consider using a higher value such as 360 (6 hours).
+ </p>
+ <p class="p">
+ The <code class="ph codeph">kerberos_reinit_interval</code> configuration setting is removed
+ in <span class="keyword">Impala 3.0</span>, and the above step is no longer needed.
+ </p>
+
+</div>
+</article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="scalability__scalability_hotspots">
+ <h2 class="title topictitle2" id="ariaid-title11">Avoiding CPU Hotspots for HDFS Cached Data</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use the HDFS caching feature, described in <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ with Impala to reduce I/O and memory-to-memory copying for frequently accessed tables or partitions.
+ </p>
+ <p class="p">
+ In the early days of this feature, you might have found that enabling HDFS caching
+ resulted in little or no performance improvement, because it could result in
+ <span class="q">"hotspots"</span>: instead of the I/O to read the table data being parallelized across
+ the cluster, the I/O was reduced but the CPU load to process the data blocks
+ might be concentrated on a single host.
+ </p>
+ <p class="p">
+ To avoid hotspots, include the <code class="ph codeph">WITH REPLICATION</code> clause with the
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements for tables that use HDFS caching.
+ This clause allows more than one host to cache the relevant data blocks, so the CPU load
+ can be shared, reducing the load on any one host.
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>
+ for details.
+ </p>
+ <p class="p">
+ Hotspots with high CPU load for HDFS cached data could still arise in some cases, due to
+ the way that Impala schedules the work of processing data blocks on different hosts.
+ In <span class="keyword">Impala 2.5</span> and higher, scheduling improvements mean that the work for
+ HDFS cached data is divided better among all the hosts that have cached replicas
+ for a particular data block. When more than one host has a cached replica for a data block,
+ Impala assigns the work of processing that block to whichever host has done the least work
+ (in terms of number of bytes read) for the current query. If hotspots persist even with this
+ load-based scheduling algorithm, you can enable the query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA=TRUE</code>
+ to further distribute the CPU load. This setting causes Impala to randomly pick a host to process a cached
+ data block if the scheduling algorithm encounters a tie when deciding which host has done the
+ least work.
+ </p>
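+ <p class="p">
+ For example, a session that observes persistent hotspots might enable the option
+ before rerunning the affected query:
+ </p>
+<pre class="pre codeblock"><code>SET SCHEDULE_RANDOM_REPLICA=TRUE;
+</code></pre>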
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="scalability__scalability_file_handle_cache">
+ <h2 class="title topictitle2" id="ariaid-title12">Scalability Considerations for NameNode Traffic with File Handle Caching</h2>
+ <div class="body conbody">
+ <p class="p">
+ One scalability aspect that affects heavily loaded clusters is the load on the HDFS
+ NameNode, from looking up the details as each HDFS file is opened. Impala queries
+ often access many different HDFS files, for example if a query does a full table scan
+ on a table with thousands of partitions, each partition containing multiple data files.
+ Accessing each column of a Parquet file also involves a separate <span class="q">"open"</span> call,
+ further increasing the load on the NameNode. High NameNode overhead can add startup time
+ (that is, increase latency) to Impala queries, and reduce overall throughput for non-Impala
+ workloads that also require accessing HDFS files.
+ </p>
+ <p class="p"> In <span class="keyword">Impala 2.10</span> and higher, you can reduce
+ NameNode overhead by enabling a caching feature for HDFS file handles.
+ Data files that are accessed by different queries, or even multiple
+ times within the same query, can be accessed without a new <span class="q">"open"</span>
+ call and without fetching the file details again from the NameNode. </p>
+ <p class="p">
+ Because this feature only involves HDFS data files, it does not apply to non-HDFS tables,
+ such as Kudu or HBase tables, or tables that store their data on cloud services such as
+ S3 or ADLS. Any read operations that perform remote reads also skip the cached file handles.
+ </p>
+ <p class="p"> The feature is enabled by default with 20,000 file handles to be
+ cached. To change the value, set the configuration option
+ <code class="ph codeph">max_cached_file_handles</code> to a non-zero value for each
+ <span class="keyword cmdname">impalad</span> daemon. From the initial default value of
+ 20000, adjust upward if NameNode request load is still significant, or
+ downward if it is more important to reduce the extra memory usage on
+ each host. Each cache entry consumes 6 KB, meaning that caching 20,000
+ file handles requires up to 120 MB on each Impala executor. The exact
+ memory usage varies depending on how many file handles have actually
+ been cached; memory is freed as file handles are evicted from the cache. </p>
+ <p class="p">
+ If a manual HDFS operation moves a file to the HDFS Trashcan while the file handle is cached,
+ Impala still accesses the contents of that file. This is a change from prior behavior. Previously,
+ accessing a file that was in the trashcan would cause an error. This behavior only applies to
+ non-Impala methods of removing HDFS files, not the Impala mechanisms such as <code class="ph codeph">TRUNCATE TABLE</code>
+ or <code class="ph codeph">DROP TABLE</code>.
+ </p>
+ <p class="p">
+ If files are removed, replaced, or appended by HDFS operations outside of Impala, the way to bring the
+ file information up to date is to run the <code class="ph codeph">REFRESH</code> statement on the table.
+ </p>
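+ <p class="p">
+ For example, after files are added to a table's HDFS directory outside of Impala
+ (the table name is a placeholder):
+ </p>
+<pre class="pre codeblock"><code>REFRESH external_table;
+</code></pre>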
+ <p class="p">
+ File handle cache entries are evicted as the cache fills up, or based on a timeout period
+ when they have not been accessed for some time.
+ </p>
+ <p class="p">
+ To evaluate the effectiveness of file handle caching for a particular workload, issue the
+ <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> or examine query
+ profiles in the Impala web UI. Look for the ratio of <code class="ph codeph">CachedFileHandlesHitCount</code>
+ (ideally, should be high) to <code class="ph codeph">CachedFileHandlesMissCount</code> (ideally, should be low).
+ Before starting any evaluation, run some representative queries to <span class="q">"warm up"</span> the cache,
+ because the first time each data file is accessed is always recorded as a cache miss.
+ To see metrics about file handle caching for each <span class="keyword cmdname">impalad</span> instance,
+ examine the <span class="ph uicontrol">/metrics</span> page in the Impala web UI, in particular the fields
+ <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-miss-count</span>,
+ <span class="ph uicontrol">impala-server.io.mgr.cached-file-handles-hit-count</span>, and
+ <span class="ph uicontrol">impala-server.io.mgr.num-cached-file-handles</span>.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schedule_random_replica.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schedule_random_replica.html b/docs/build3x/html/topics/impala_schedule_random_replica.html
new file mode 100644
index 0000000..85f724d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schedule_random_replica.html
@@ -0,0 +1,83 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schedule_random_replica"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</title></head><body id="schedule_random_replica"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SCHEDULE_RANDOM_REPLICA Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option fine-tunes the algorithm for deciding which host
+ processes each HDFS data block. It only applies to tables and partitions that are not enabled
+ for the HDFS caching feature.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In the presence of HDFS cached replicas, Impala randomizes
+ which host processes each cached data block.
+ To ensure that HDFS data blocks are cached on more
+ than one host, use the <code class="ph codeph">WITH REPLICATION</code> clause along with
+ the <code class="ph codeph">CACHED IN</code> clause in a
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement.
+ Specify a replication value greater than or equal to the HDFS block replication factor.
+ </p>
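+
+ <p class="p">
+ For example, the following sketch (the cache pool name <code class="ph codeph">pool1</code> and table name
+ <code class="ph codeph">t1</code> are hypothetical) caches each data block on three hosts:
+ </p>
+
+<pre class="pre codeblock"><code>-- Assumes an HDFS cache pool named pool1 already exists.
+CREATE TABLE t1 (x INT)
+  CACHED IN 'pool1' WITH REPLICATION = 3;
+
+-- Or, for an existing table:
+ALTER TABLE t1 SET CACHED IN 'pool1' WITH REPLICATION = 3;
+</code></pre>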
+
+ <p class="p">
+ The <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option applies to tables and partitions
+ that <em class="ph i">do not</em> use HDFS caching.
+ By default, Impala estimates how much work each host has done for
+ the query, and selects the host that has the lowest workload.
+ This algorithm is intended to reduce CPU hotspots arising when the
+ same host is selected to process multiple data blocks, but hotspots
+ might still arise for some combinations of queries and data layout.
+ When the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> option is enabled,
+ Impala further randomizes the scheduling algorithm for non-HDFS cached blocks,
+ which can further reduce the chance of CPU hotspots.
+ </p>
+
+ <p class="p">
+ This query option works in conjunction with the work scheduling improvements
+ in <span class="keyword">Impala 2.5</span> and higher. The scheduling improvements
+ distribute the processing for cached HDFS data blocks to minimize hotspots:
+ if a data block is cached on more than one host, Impala chooses which host
+ to process each block based on which host has read the fewest bytes during
+ the current query. Enable the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> option if CPU hotspots
+ still persist in cases where hosts are <span class="q">"tied"</span> in terms of
+ the amount of work done; by default, Impala picks the first eligible host
+ in this case.
+ </p>
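+
+ <p class="p">
+ For example, to enable the option for the current session in <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>SET SCHEDULE_RANDOM_REPLICA=true;
+-- Run the affected queries, then restore the default if desired:
+SET SCHEDULE_RANDOM_REPLICA=false;
+</code></pre>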
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a>,
+ <a class="xref" href="impala_scalability.html#scalability_hotspots">Avoiding CPU Hotspots for HDFS Cached Data</a>
+ , <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schema_design.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schema_design.html b/docs/build3x/html/topics/impala_schema_design.html
new file mode 100644
index 0000000..31285d6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schema_design.html
@@ -0,0 +1,184 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_design"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Guidelines for Designing Impala Schemas</title></head><body id="schema_design"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Guidelines for Designing Impala Schemas</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The guidelines in this topic help you to construct an optimized and scalable schema, one that integrates well
+ with your existing data management processes. Use these guidelines as a checklist when doing any
+ proof-of-concept work, porting exercise, or before deploying to production.
+ </p>
+
+ <p class="p">
+ If you are adapting an existing database or Hive schema for use with Impala, read the guidelines in this
+ section and then see <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for specific porting and compatibility tips.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <section class="section" id="schema_design__schema_design_text_vs_binary"><h2 class="title sectiontitle">Prefer binary file formats over text-based formats.</h2>
+
+
+
+ <p class="p">
+ To save space and improve memory usage and query performance, use binary file formats for any large or
+ intensively queried tables. Parquet file format is the most efficient for data warehouse-style analytic
+ queries. Avro is the other binary file format that Impala supports, which you might already have as part of
+ a Hadoop ETL pipeline.
+ </p>
+
+ <p class="p">
+ Although Impala can create and query tables with the RCFile and SequenceFile file formats, such tables are
+ relatively bulky due to the text-based nature of those formats, and are not optimized for data
+ warehouse-style queries due to their row-oriented layout. Impala does not support <code class="ph codeph">INSERT</code>
+ operations for tables with these file formats.
+ </p>
+
+ <p class="p">
+ Guidelines:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ For an efficient and scalable format for large, performance-critical tables, use the Parquet file format.
+ </li>
+
+ <li class="li">
+ To deliver intermediate data during the ETL process, in a format that can also be used by other Hadoop
+ components, Avro is a reasonable choice.
+ </li>
+
+ <li class="li">
+ For convenient import of raw data, use a text table instead of RCFile or SequenceFile, and convert to
+ Parquet in a later stage of the ETL process.
+ </li>
+ </ul>
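+
+ <p class="p">
+ As an illustration of the last guideline (table and column names here are hypothetical), raw data can land
+ in a text table for convenient import, then be converted to Parquet in a later ETL stage:
+ </p>
+
+<pre class="pre codeblock"><code>-- Convenient target for raw data files moved into HDFS.
+CREATE TABLE raw_events (event_time STRING, detail STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  STORED AS TEXTFILE;
+
+-- Efficient binary format for the large, intensively queried table.
+CREATE TABLE events STORED AS PARQUET
+  AS SELECT * FROM raw_events;
+</code></pre>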
+ </section>
+
+ <section class="section" id="schema_design__schema_design_compression"><h2 class="title sectiontitle">Use Snappy compression where practical.</h2>
+
+
+
+ <p class="p">
+ Snappy compression involves low CPU overhead to decompress, while still providing substantial space
+ savings. In cases where you have a choice of compression codecs, such as with the Parquet and Avro file
+ formats, use Snappy compression unless you find a compelling reason to use a different codec.
+ </p>
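+
+ <p class="p">
+ For example, Snappy is the default codec for Parquet files written by Impala; you can also set it
+ explicitly for the session before inserting data (table names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=snappy;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>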
+ </section>
+
+ <section class="section" id="schema_design__schema_design_numeric_types"><h2 class="title sectiontitle">Prefer numeric types over strings.</h2>
+
+
+
+ <p class="p">
+ If you have numeric values that you could treat as either strings or numbers (such as
+ <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> for partition key columns), define
+ them as the smallest applicable integer types. For example, <code class="ph codeph">YEAR</code> can be
+ <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code> can be <code class="ph codeph">TINYINT</code>.
+ Although you might not see any difference in the way partitioned tables or text files are laid out on disk,
+ using numeric types will save space in binary formats such as Parquet, and in memory when doing queries,
+ particularly resource-intensive queries such as joins.
+ </p>
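+
+ <p class="p">
+ For example, a hypothetical partitioned table can declare its partition key columns with the smallest
+ applicable integer types:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE sales (id BIGINT, amount DOUBLE)
+  PARTITIONED BY (year SMALLINT, month TINYINT, day TINYINT)
+  STORED AS PARQUET;
+</code></pre>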
+ </section>
+
+
+
+ <section class="section" id="schema_design__schema_design_partitioning"><h2 class="title sectiontitle">Partition, but do not over-partition.</h2>
+
+
+
+ <p class="p">
+ Partitioning is an important aspect of performance tuning for Impala. Follow the procedures in
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> to set up partitioning for your biggest, most
+ intensively queried tables.
+ </p>
+
+ <p class="p">
+ If you are moving to Impala from a traditional database system, or just getting started in the Big Data
+ field, you might not have enough data volume to take advantage of Impala parallel queries with your
+ existing partitioning scheme. For example, if you have only a few tens of megabytes of data per day,
+ partitioning by <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code> columns might be
+ too granular. Most of your cluster might be sitting idle during queries that target a single day, or each
+ node might have very little work to do. Consider reducing the number of partition key columns so that each
+ partition directory contains several gigabytes worth of data.
+ </p>
+
+ <p class="p">
+ For example, consider a Parquet table where each data file is 1 HDFS block, with a maximum block size of 1
+ GB. (In Impala 2.0 and later, the default Parquet block size is reduced to 256 MB. For this exercise, let's
+ assume you have bumped the size back up to 1 GB by setting the query option
+ <code class="ph codeph">PARQUET_FILE_SIZE=1g</code>.) If you have a 10-node cluster, you need 10 data files (up to 10 GB)
+ to give each node some work to do for a query. But each core on each machine can process a separate data
+ block in parallel. With 16-core machines on a 10-node cluster, a query could process up to 160 GB fully in
+ parallel. If there are only a few data files per partition, not only are most cluster nodes sitting idle
+ during queries, but so are most cores on those machines.
+ </p>
+
+ <p class="p">
+ You can reduce the Parquet block size to as low as 128 MB or 64 MB to increase the number of files per
+ partition and improve parallelism. But also consider reducing the level of partitioning so that analytic
+ queries have enough data to work with.
+ </p>
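+
+ <p class="p">
+ For example, to produce more, smaller files per partition during a subsequent insert
+ (table names here are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET PARQUET_FILE_SIZE=128m;
+INSERT OVERWRITE sales_parquet PARTITION (year, month, day)
+  SELECT * FROM sales_staging;
+</code></pre>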
+ </section>
+
+ <section class="section" id="schema_design__schema_design_compute_stats"><h2 class="title sectiontitle">Always compute stats after loading data.</h2>
+
+
+
+ <p class="p">
+ Impala makes extensive use of statistics about data in the overall table and in each column, to help plan
+ resource-intensive operations such as join queries and inserting into partitioned Parquet tables. Because
+ this information is only available after data is loaded, run the <code class="ph codeph">COMPUTE STATS</code> statement
+ on a table after loading or replacing data in a table or partition.
+ </p>
+
+ <p class="p">
+ Having accurate statistics can make the difference between a successful operation and one that fails due to
+ an out-of-memory error or a timeout. When you encounter performance or capacity issues, always use the
+ <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> statements to check whether statistics are present and up-to-date for all tables
+ in the query.
+ </p>
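+
+ <p class="p">
+ For example, after loading data into a hypothetical table:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS sales;
+SHOW TABLE STATS sales;   -- verify the #Rows figure is populated
+SHOW COLUMN STATS sales;  -- verify the per-column figures are populated
+</code></pre>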
+
+ <p class="p">
+ When doing a join query, Impala consults the statistics for each joined table to determine their relative
+ sizes and to estimate the number of rows produced in each join stage. When doing an <code class="ph codeph">INSERT</code>
+ into a Parquet table, Impala consults the statistics for the source table to determine how to distribute
+ the work of constructing the data files for each partition.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for the syntax of the <code class="ph codeph">COMPUTE
+ STATS</code> statement, and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for all the performance
+ considerations for table and column statistics.
+ </p>
+ </section>
+
+ <section class="section" id="schema_design__schema_design_explain"><h2 class="title sectiontitle">Verify sensible execution plans with EXPLAIN and SUMMARY.</h2>
+
+
+
+ <p class="p">
+ Before executing a resource-intensive query, use the <code class="ph codeph">EXPLAIN</code> statement to get an overview
+ of how Impala intends to parallelize the query and distribute the work. If you see that the query plan is
+ inefficient, you can take tuning steps such as changing file formats, using partitioned tables, running the
+ <code class="ph codeph">COMPUTE STATS</code> statement, or adding query hints. For information about all of these
+ techniques, see <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a>.
+ </p>
+
+ <p class="p">
+ After you run a query, you can see performance-related information about how it actually ran by issuing the
+ <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>. Prior to Impala 1.4, you would use
+ the <code class="ph codeph">PROFILE</code> command, but its highly technical output was only useful for the most
+ experienced users. <code class="ph codeph">SUMMARY</code>, new in Impala 1.4, summarizes the most useful information for
+ all stages of execution, aggregated across all nodes rather than split out for each node.
+ </p>
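+
+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span> (the query and table names are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>EXPLAIN SELECT c.name, SUM(o.total) FROM customers c JOIN orders o ON c.id = o.cust_id GROUP BY c.name;
+-- If the plan looks reasonable, run the query, then examine how it actually ran:
+SELECT c.name, SUM(o.total) FROM customers c JOIN orders o ON c.id = o.cust_id GROUP BY c.name;
+SUMMARY;
+</code></pre>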
+ </section>
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_schema_objects.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_schema_objects.html b/docs/build3x/html/topics/impala_schema_objects.html
new file mode 100644
index 0000000..147bb50
--- /dev/null
+++ b/docs/build3x/html/topics/impala_schema_objects.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aliases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_databases.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions_overview.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_identifiers.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_tables.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_views.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name=
"DC.Format" content="XHTML"><meta name="DC.Identifier" content="schema_objects"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Schema Objects and Object Names</title></head><body id="schema_objects"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Schema Objects and Object Names</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ With Impala, you work with schema objects that are familiar to database users: primarily databases, tables, views,
+ and functions. The SQL syntax to work with these objects is explained in
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>. This section explains the conceptual knowledge you need to
+ work with these objects and the various ways to specify their names.
+ </p>
+
+ <p class="p">
+ Within a table, partitions can also be considered a kind of object. Partitioning is an important subject for
+ Impala, with its own documentation section covering use cases and performance considerations. See
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ Impala does not have a counterpart of the <span class="q">"tablespace"</span> notion from some database systems. By default,
+ all the data files for a database, table, or partition are located within nested folders within the HDFS file
+ system. You can also specify a particular HDFS location for a given Impala table or partition. The raw data
+ for these objects is represented as a collection of data files, providing the flexibility to load data by
+ simply moving files into the expected HDFS location.
+ </p>
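+
+ <p class="p">
+ For example, a sketch of specifying an explicit HDFS location (the path and table name are hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE logs (msg STRING)
+  LOCATION '/user/impala/data/logs';
+</code></pre>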
+
+ <p class="p">
+ Information about the schema objects is held in the
+ <a class="xref" href="impala_hadoop.html#intro_metastore">metastore</a> database. This database is shared between
+ Impala and Hive, allowing each to create, drop, and query the other's databases, tables, and so on. When
+ Impala makes a change to schema objects through a <code class="ph codeph">CREATE</code>, <code class="ph codeph">ALTER</code>,
+ <code class="ph codeph">DROP</code>, <code class="ph codeph">INSERT</code>, or <code class="ph codeph">LOAD DATA</code> statement, it broadcasts those
+ changes to all nodes in the cluster through the <a class="xref" href="impala_components.html#intro_catalogd">catalog
+ service</a>. When you make such changes through Hive or by manipulating HDFS files directly, you use
+ the <a class="xref" href="impala_refresh.html#refresh">REFRESH</a> or
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a> statements on the
+ Impala side to recognize the newly loaded data, new tables, and so on.
+ </p>
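+
+ <p class="p">
+ For example (the table name is hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>-- After new data files are added to an existing table outside Impala:
+REFRESH t1;
+
+-- After tables are created, dropped, or altered outside Impala:
+INVALIDATE METADATA t1;
+</code></pre>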
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_aliases.html">Overview of Impala Aliases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_databases.html">Overview of Impala Databases</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_functions_overview.html">Overview of Impala Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_identifiers.html">Overview of Impala Identifiers</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tables.html">Overview of Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_views.html">Overview of Impala Views</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Refer
ence</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_scratch_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_scratch_limit.html b/docs/build3x/html/topics/impala_scratch_limit.html
new file mode 100644
index 0000000..a743dca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_scratch_limit.html
@@ -0,0 +1,77 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="scratch_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SCRATCH_LIMIT Query Option</title></head><body id="scratch_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SCRATCH_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the maximum amount of disk storage, in bytes, that any Impala query can consume
+ on any host through the <span class="q">"spill to disk"</span> mechanism, which handles
+ queries that exceed the memory limit.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+ megabytes or gigabytes. For example:
+ </p>
+
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set SCRATCH_LIMIT=134217728;
+
+-- 512 megabytes.
+set SCRATCH_LIMIT=512m;
+
+-- 1 gigabyte.
+set SCRATCH_LIMIT=1g;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A value of zero turns off the spill to disk feature for queries
+ in the current session, causing them to fail immediately if they
+ exceed the memory limit.
+ </p>
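+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- Disable spilling; queries that exceed the memory limit fail immediately.
+SET SCRATCH_LIMIT=0;
+
+-- Restore the default, unlimited spill space.
+SET SCRATCH_LIMIT=-1;
+</code></pre>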
+
+ <p class="p">
+ The amount of memory used per host for a query is limited by the
+ <code class="ph codeph">MEM_LIMIT</code> query option.
+ </p>
+
+ <p class="p">
+ The more Impala daemon hosts in the cluster, the less memory is used on each host,
+ and therefore the less scratch space is required for queries that
+ exceed the memory limit.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> -1 (amount of spill space is unlimited)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a>,
+ <a class="xref" href="impala_mem_limit.html#mem_limit">MEM_LIMIT Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security.html b/docs/build3x/html/topics/impala_security.html
new file mode 100644
index 0000000..e8b1588
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_guidelines.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_files.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_install.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_metastore.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security_webui.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ssl.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authorization.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="DC.Relation" scheme="URI" content="../topics/i
mpala_auditing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_lineage.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Security</title></head><body id="security"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Impala Security</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala includes a fine-grained authorization framework for Hadoop, based on Apache Sentry.
+ Sentry authorization was added in Impala 1.1.0. Together with the Kerberos
+ authentication framework, Sentry takes Hadoop security to a new level needed for the requirements of
+ highly regulated industries such as healthcare, financial services, and government. Impala also includes
+ an auditing capability, added in Impala 1.1.1: Impala generates audit data that can be
+ consumed, filtered, and visualized by cluster-management components focused on governance.
+ </p>
+
+ <p class="p">
+ The Impala security features have several objectives. At the most basic level, security prevents
+ accidents or mistakes that could disrupt application processing, delete or corrupt data, or reveal data to
+ unauthorized users. More advanced security features and practices can harden the system against malicious
+ users trying to gain unauthorized access or perform other disallowed operations. The auditing feature
+ provides a way to confirm that no unauthorized access occurred, and detect whether any such attempts were
+ made. This is a critical set of features for production deployments in large organizations that handle
+ important or sensitive data. It sets the stage for multi-tenancy, where multiple applications run
+ concurrently and are prevented from interfering with each other.
+ </p>
+
+ <p class="p">
+ The material in this section presumes that you are already familiar with administering secure Linux systems.
+ That is, you should know the general security practices for Linux and Hadoop, and their associated commands
+ and configuration files. For example, you should know how to create Linux users and groups, manage Linux
+ group membership, set Linux and HDFS file permissions and ownership, and designate the default permissions
+ and ownership for new files. You should be familiar with the configuration of the nodes in your Hadoop
+ cluster, and know how to apply configuration changes or run a set of commands across all the nodes.
+ </p>
+
+ <p class="p">
+ The security features are divided into these broad categories:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm">
+ authorization
+ </dt>
+
+ <dd class="dd">
+ Which users are allowed to access which resources, and what operations are they allowed to perform?
+ Impala relies on the open source Sentry project for authorization. By default (when authorization is not
+ enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+ user, which is suitable for a development/test environment but not for a secure production environment.
+ When authorization is enabled, Impala uses the OS user ID of the user who runs
+ <span class="keyword cmdname">impala-shell</span> or other client program, and associates various privileges with each
+ user. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details about setting up and managing
+ authorization.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm">
+ authentication
+ </dt>
+
+ <dd class="dd">
+ How does Impala verify the identity of the user to confirm that they really are allowed to exercise the
+ privileges assigned to that user? Impala relies on the Kerberos subsystem for authentication. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about setting up and managing authentication.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm">
+ auditing
+ </dt>
+
+ <dd class="dd">
+ What operations were attempted, and did they succeed or not? This feature provides a way to look back and
+ diagnose whether attempts were made to perform unauthorized operations. You use this information to track
+ down suspicious activity, and to see where changes are needed in authorization policies. The audit data
+ produced by this feature can be collected and presented in a user-friendly form by cluster-management
+ software. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details about setting up and managing
+ auditing.
+ </dd>
+
+
+ </dl>
+
+ <p class="p toc"></p>
+
+
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_security_guidelines.html">Security Guidelines for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_files.html">Securing Impala Data and Log Files</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_install.html">Installation Considerations for Impala Security</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_metastore.html">Securing the Hive Metastore Database</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_security_webui.html">Securing the Impala Web User Interface</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ssl.html">Configuring TLS/SSL for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authorization.html
">Enabling Sentry Authorization for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_authentication.html">Impala Authentication</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_auditing.html">Auditing Impala Operations</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_lineage.html">Viewing Lineage Information for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
[22/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_math_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_math_functions.html b/docs/build3x/html/topics/impala_math_functions.html
new file mode 100644
index 0000000..9987e34
--- /dev/null
+++ b/docs/build3x/html/topics/impala_math_functions.html
@@ -0,0 +1,1711 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="math_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Mathematical Functions</title></head><body id="math_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Mathematical Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Mathematical functions, or arithmetic functions, perform numeric calculations that are typically more complex
+ than basic addition, subtraction, multiplication, and division. For example, these functions include
+ trigonometric, logarithmic, and base conversion operations.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala, exponentiation uses the <code class="ph codeph">pow()</code> function rather than an exponentiation operator
+ such as <code class="ph codeph">**</code>.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The mathematical functions operate mainly on these data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>, <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>, and <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>. For the operators that
+ perform the standard operations such as addition, subtraction, multiplication, and division, see
+ <a class="xref" href="impala_operators.html#arithmetic_operators">Arithmetic Operators</a>.
+ </p>
+
+ <p class="p">
+ Functions that perform bitwise operations are explained in <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following mathematical functions:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="math_functions__abs">
+ <code class="ph codeph">abs(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the absolute value of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use this function to ensure all return values are positive. This is different from
+ the <code class="ph codeph">positive()</code> function, which returns its argument unchanged (even if the argument
+ was negative).
+ </p>
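+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how <code class="ph codeph">abs()</code> turns a negative input value into its
+ positive counterpart:
+ </p>
+<pre class="pre codeblock"><code>select abs(-10);
++----------+
+| abs(-10) |
++----------+
+| 10       |
++----------+
+</code></pre>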
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__acos">
+ <code class="ph codeph">acos(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arccosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__asin">
+ <code class="ph codeph">asin(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arcsine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__atan">
+ <code class="ph codeph">atan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arctangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__atan2">
+ <code class="ph codeph">atan2(double a, double b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the arctangent of the two arguments, with the signs of the arguments used to determine the
+ quadrant of the result.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__bin">
+ <code class="ph codeph">bin(bigint a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the binary representation of an integer value, that is, a string of 0 and 1
+ digits.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
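+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For example, the decimal value 8 is 1000 in binary:
+ </p>
+<pre class="pre codeblock"><code>select bin(8);
++--------+
+| bin(8) |
++--------+
+| 1000   |
++--------+
+</code></pre>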
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__ceil">
+ <code class="ph codeph">ceil(double a)</code>,
+ <code class="ph codeph">ceil(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__ceiling">ceiling(double a)</code>,
+ <code class="ph codeph">ceiling(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__dceil">dceil(double a)</code>,
+ <code class="ph codeph">dceil(decimal(p,s) a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest integer that is greater than or equal to the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
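+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the result is rounded up toward positive infinity,
+ for both positive and negative inputs:
+ </p>
+<pre class="pre codeblock"><code>select ceil(5.4);
++-----------+
+| ceil(5.4) |
++-----------+
+| 6         |
++-----------+
+
+select ceiling(-5.4);
++---------------+
+| ceiling(-5.4) |
++---------------+
+| -5            |
++---------------+
+</code></pre>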
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__conv">
+ <code class="ph codeph">conv(bigint num, int from_base, int to_base), conv(string num, int from_base, int
+ to_base)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string representation of an integer value in a particular base. The input value
+ can be a string, for example to convert a hexadecimal number such as <code class="ph codeph">fce2</code> to decimal. To
+ use the return value as a number (for example, when converting to base 10), use <code class="ph codeph">CAST()</code>
+ to convert to the appropriate type.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
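+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples convert a hexadecimal string to its decimal representation, and a
+ decimal integer to binary:
+ </p>
+<pre class="pre codeblock"><code>select conv('fce2', 16, 10);
++----------------------+
+| conv('fce2', 16, 10) |
++----------------------+
+| 64738                |
++----------------------+
+
+select conv(100, 10, 2);
++------------------+
+| conv(100, 10, 2) |
++------------------+
+| 1100100          |
++------------------+
+</code></pre>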
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cos">
+ <code class="ph codeph">cos(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the cosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cosh">
+ <code class="ph codeph">cosh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic cosine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__cot">
+ <code class="ph codeph">cot(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the cotangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__degrees">
+ <code class="ph codeph">degrees(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts argument value from radians to degrees.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__e">
+ <code class="ph codeph">e()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+ constant e</a>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__exp">
+ <code class="ph codeph">exp(double a)</code>,
+ <code class="ph codeph" id="math_functions__dexp">dexp(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/E_(mathematical_constant)" target="_blank">mathematical
+ constant e</a> raised to the power of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__factorial">
+ <code class="ph codeph">factorial(integer_type a)</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the <a class="xref" href="https://en.wikipedia.org/wiki/Factorial" target="_blank">factorial</a> of an integer value.
+ It works with any integer type.
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> You can use either the <code class="ph codeph">factorial()</code> function or the <code class="ph codeph">!</code> operator.
+ The factorial of 0 is 1, and the <code class="ph codeph">factorial()</code> function also returns 1 for any negative value.
+ The maximum positive value for the input argument is 20; a value of 21 or greater overflows the
+ range for a <code class="ph codeph">BIGINT</code> and causes an error.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+ </p>
+<pre class="pre codeblock"><code>select factorial(5);
++--------------+
+| factorial(5) |
++--------------+
+| 120          |
++--------------+
+
+select 5!;
++-----+
+| 5!  |
++-----+
+| 120 |
++-----+
+
+select factorial(0);
++--------------+
+| factorial(0) |
++--------------+
+| 1            |
++--------------+
+
+select factorial(-100);
++-----------------+
+| factorial(-100) |
++-----------------+
+| 1               |
++-----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__floor">
+ <code class="ph codeph">floor(double a)</code>,
+ <code class="ph codeph">floor(decimal(p,s) a)</code>,
+ <code class="ph codeph" id="math_functions__dfloor">dfloor(double a)</code>,
+ <code class="ph codeph">dfloor(decimal(p,s) a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the largest integer that is less than or equal to the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__fmod">
+ <code class="ph codeph">fmod(double a, double b), fmod(float a, float b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the modulus of a floating-point number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">float</code> or <code class="ph codeph">double</code>, depending on type of arguments
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.1.1
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function operates on <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+ values, it is subject to potential rounding errors for values that cannot be
+ represented precisely. Prefer to use whole numbers, or values that you know
+ can be represented precisely by the <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code>
+ types.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show equivalent operations with the <code class="ph codeph">fmod()</code>
+ function and the <code class="ph codeph">%</code> arithmetic operator, for values not subject
+ to any rounding error.
+ </p>
+<pre class="pre codeblock"><code>select fmod(10,3);
++-------------+
+| fmod(10, 3) |
++-------------+
+| 1           |
++-------------+
+
+select fmod(5.5,2);
++--------------+
+| fmod(5.5, 2) |
++--------------+
+| 1.5          |
++--------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+</code></pre>
+ <p class="p">
+ The following examples show operations with the <code class="ph codeph">fmod()</code>
+ function for values that cannot be represented precisely by the
+ <code class="ph codeph">DOUBLE</code> or <code class="ph codeph">FLOAT</code> types, and thus are
+ subject to rounding error. <code class="ph codeph">fmod(9.9,3.0)</code> returns a value
+ slightly different from the expected 0.9 because of rounding.
+ <code class="ph codeph">fmod(9.9,3.3)</code> returns a value quite different from
+ the expected value of 0 because of rounding error during intermediate
+ calculations.
+ </p>
+<pre class="pre codeblock"><code>select fmod(9.9,3.0);
++--------------------+
+| fmod(9.9, 3.0)     |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select fmod(9.9,3.3);
++-------------------+
+| fmod(9.9, 3.3)    |
++-------------------+
+| 3.299999713897705 |
++-------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__fnv_hash">
+ <code class="ph codeph">fnv_hash(type v)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a consistent 64-bit value derived from the input argument, for convenience of
+ implementing hashing logic in an application.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You might use the return value in an application where you perform load balancing, bucketing, or some
+ other technique to divide processing or storage.
+ </p>
+ <p class="p">
+ Because the result can be any 64-bit value, to restrict the value to a particular range, you can use an
+ expression that includes the <code class="ph codeph">ABS()</code> function and the <code class="ph codeph">%</code> (modulo)
+ operator. For example, to produce a hash value in the range 0-9, you could use the expression
+ <code class="ph codeph">ABS(FNV_HASH(x)) % 10</code>.
+ </p>
+ <p class="p">
+ This function implements the same algorithm that Impala uses internally for hashing, on systems where
+ the CRC32 instructions are not available.
+ </p>
+ <p class="p">
+ This function implements the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function" target="_blank">Fowler–Noll–Vo
+ hash function</a>, in particular the FNV-1a variation. This is not a perfect hash function: some
+ combinations of values could produce the same result value. It is not suitable for cryptographic use.
+ </p>
+ <p class="p">
+ Similar input values of different types could produce different hash values, for example the same
+ numeric value represented as <code class="ph codeph">SMALLINT</code> or <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">DECIMAL(5,2)</code> or
+ <code class="ph codeph">DECIMAL(20,5)</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table h (x int, s string);
+[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
+[localhost:21000] > select x, fnv_hash(x) from h;
++------------+----------------------+
+| x          | fnv_hash(x)          |
++------------+----------------------+
+| 0          | -2611523532599129963 |
+| 1          | 4307505193096137732  |
+| 1234567890 | 3614724209955230832  |
++------------+----------------------+
+[localhost:21000] > select s, fnv_hash(s) from h;
++------------------------------+---------------------+
+| s                            | fnv_hash(s)         |
++------------------------------+---------------------+
+| hello                        | 6414202926103426347 |
+| world                        | 6535280128821139475 |
+| antidisestablishmentarianism | -209330013948433970 |
++------------------------------+---------------------+
+[localhost:21000] > select s, abs(fnv_hash(s)) % 10 from h;
++------------------------------+-------------------------+
+| s                            | abs(fnv_hash(s)) % 10.0 |
++------------------------------+-------------------------+
+| hello                        | 8                       |
+| world                        | 6                       |
+| antidisestablishmentarianism | 4                       |
++------------------------------+-------------------------+</code></pre>
+ <p class="p">
+ For short argument values, the high-order bits of the result have relatively low entropy:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table b (x boolean);
+[localhost:21000] > insert into b values (true), (true), (false), (false);
+[localhost:21000] > select x, fnv_hash(x) from b;
++-------+---------------------+
+| x     | fnv_hash(x)         |
++-------+---------------------+
+| true  | 2062020650953872396 |
+| true  | 2062020650953872396 |
+| false | 2062021750465500607 |
+| false | 2062021750465500607 |
++-------+---------------------+</code></pre>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.2.2
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__greatest">
+ <code class="ph codeph">greatest(bigint a[, bigint b ...])</code>, <code class="ph codeph">greatest(double a[, double b ...])</code>,
+ <code class="ph codeph">greatest(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">greatest(string a[, string b
+ ...])</code>, <code class="ph codeph">greatest(timestamp a[, timestamp b ...])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the largest value from a list of expressions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
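+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select greatest(3, 7, 5);
++-------------------+
+| greatest(3, 7, 5) |
++-------------------+
+| 7                 |
++-------------------+
+</code></pre>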
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__hex">
+ <code class="ph codeph">hex(bigint a), hex(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hexadecimal representation of an integer value, or of the characters in a
+ string.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
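+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the hexadecimal representation of an integer, and of the
+ characters in a string:
+ </p>
+<pre class="pre codeblock"><code>select hex(16);
++---------+
+| hex(16) |
++---------+
+| 10      |
++---------+
+
+select hex('abc');
++------------+
+| hex('abc') |
++------------+
+| 616263     |
++------------+
+</code></pre>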
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__is_inf">
+ <code class="ph codeph">is_inf(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"inf"</span>, signifying infinity.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+ respectively, and Impala interprets them as these special values. They can also be produced by certain
+ arithmetic expressions; for example, <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code> and
+ <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code>. Or you can cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+ DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+ </p>
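+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For example, because <code class="ph codeph">1/0</code> evaluates to <code class="ph codeph">Infinity</code> as described above,
+ <code class="ph codeph">is_inf()</code> returns <code class="ph codeph">true</code> for that expression:
+ </p>
+<pre class="pre codeblock"><code>select is_inf(1/0);
++---------------+
+| is_inf(1 / 0) |
++---------------+
+| true          |
++---------------+
+</code></pre>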
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__is_nan">
+ <code class="ph codeph">is_nan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether a value is equal to the special value <span class="q">"NaN"</span>, signifying <span class="q">"not a
+ number"</span>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">boolean</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Infinity and NaN can be specified in text data files as <code class="ph codeph">inf</code> and <code class="ph codeph">nan</code>
+ respectively, and Impala interprets them as these special values. They can also be produced by certain
+ arithmetic expressions; for example, <code class="ph codeph">1/0</code> returns <code class="ph codeph">Infinity</code> and
+ <code class="ph codeph">pow(-1, 0.5)</code> returns <code class="ph codeph">NaN</code>. Or you can cast the literal values, such as <code class="ph codeph">CAST('nan' AS
+ DOUBLE)</code> or <code class="ph codeph">CAST('inf' AS DOUBLE)</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__least">
+ <code class="ph codeph">least(bigint a[, bigint b ...])</code>, <code class="ph codeph">least(double a[, double b ...])</code>,
+ <code class="ph codeph">least(decimal(p,s) a[, decimal(p,s) b ...])</code>, <code class="ph codeph">least(string a[, string b
+ ...])</code>, <code class="ph codeph">least(timestamp a[, timestamp b ...])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest value from a list of expressions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
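+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ For string arguments, the smallest value is determined by alphabetic order:
+ </p>
+<pre class="pre codeblock"><code>select least('apple', 'banana', 'pear');
++----------------------------------+
+| least('apple', 'banana', 'pear') |
++----------------------------------+
+| apple                            |
++----------------------------------+
+</code></pre>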
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__ln">
+ <code class="ph codeph">ln(double a)</code>,
+ <code class="ph codeph" id="math_functions__dlog1">dlog1(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the
+ <a class="xref" href="https://en.wikipedia.org/wiki/Natural_logarithm" target="_blank">natural
+ logarithm</a> of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log">
+ <code class="ph codeph">log(double base, double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the second argument to the specified base.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log10">
+ <code class="ph codeph">log10(double a)</code>,
+ <code class="ph codeph" id="math_functions__dlog10">dlog10(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 10.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__log2">
+ <code class="ph codeph">log2(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the logarithm of the argument to the base 2.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__max_int">
+ <code class="ph codeph">max_int(), <span class="ph" id="math_functions__max_tinyint">max_tinyint()</span>, <span class="ph" id="math_functions__max_smallint">max_smallint()</span>,
+ <span class="ph" id="math_functions__max_bigint">max_bigint()</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the largest value of the associated integral type.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+ </p>
+ <p class="p">
+
+ <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+ check if all values in a column are within the allowed range, before copying data or altering column
+ definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+ sufficient precision.
+ </p>
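+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows the largest value of each integral type:
+ </p>
+<pre class="pre codeblock"><code>select max_tinyint(), max_smallint(), max_int(), max_bigint();
++---------------+----------------+------------+---------------------+
+| max_tinyint() | max_smallint() | max_int()  | max_bigint()        |
++---------------+----------------+------------+---------------------+
+| 127           | 32767          | 2147483647 | 9223372036854775807 |
++---------------+----------------+------------+---------------------+
+</code></pre>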
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__min_int">
+ <code class="ph codeph">min_int(), <span class="ph" id="math_functions__min_tinyint">min_tinyint()</span>, <span class="ph" id="math_functions__min_smallint">min_smallint()</span>,
+ <span class="ph" id="math_functions__min_bigint">min_bigint()</span></code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the smallest value of the associated integral type (a negative number).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> The same as the integral type being checked.
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use the corresponding <code class="ph codeph">min_</code> and <code class="ph codeph">max_</code> functions to
+ check if all values in a column are within the allowed range, before copying data or altering column
+ definitions. If not, switch to the next higher integral type or to a <code class="ph codeph">DECIMAL</code> with
+ sufficient precision.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__mod">
+ <code class="ph codeph">mod(<var class="keyword varname">numeric_type</var> a, <var class="keyword varname">same_type</var> b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the modulus of a number. Equivalent to the <code class="ph codeph">%</code> arithmetic operator.
+ Works with any size integer type, any size floating-point type, and <code class="ph codeph">DECIMAL</code>
+ with any precision and scale.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Because this function works with <code class="ph codeph">DECIMAL</code> values, prefer it over <code class="ph codeph">fmod()</code>
+ when working with fractional values. It is not subject to the rounding errors that make
+ <code class="ph codeph">fmod()</code> problematic with floating-point numbers.
+ The <code class="ph codeph">%</code> arithmetic operator now uses the <code class="ph codeph">mod()</code> function
+ in cases where its arguments can be interpreted as <code class="ph codeph">DECIMAL</code> values,
+ increasing the accuracy of that operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the <code class="ph codeph">mod()</code> function works for
+ whole numbers and fractional values, and how the <code class="ph codeph">%</code> operator
+ works the same way. In the case of <code class="ph codeph">mod(9.9,3)</code>,
+ the type conversion for the second argument results in the first argument
+ being interpreted as <code class="ph codeph">DOUBLE</code>, so to produce an accurate
+ <code class="ph codeph">DECIMAL</code> result requires casting the second argument
+ or writing it as a <code class="ph codeph">DECIMAL</code> literal, 3.0.
+ </p>
+<pre class="pre codeblock"><code>select mod(10,3);
++------------+
+| mod(10, 3) |
++------------+
+| 1          |
++------------+
+
+select mod(5.5,2);
++-------------+
+| mod(5.5, 2) |
++-------------+
+| 1.5         |
++-------------+
+
+select 10 % 3;
++--------+
+| 10 % 3 |
++--------+
+| 1      |
++--------+
+
+select 5.5 % 2;
++---------+
+| 5.5 % 2 |
++---------+
+| 1.5     |
++---------+
+
+select mod(9.9,3.3);
++---------------+
+| mod(9.9, 3.3) |
++---------------+
+| 0.0           |
++---------------+
+
+select mod(9.9,3);
++--------------------+
+| mod(9.9, 3)        |
++--------------------+
+| 0.8999996185302734 |
++--------------------+
+
+select mod(9.9, cast(3 as decimal(2,1)));
++-----------------------------------+
+| mod(9.9, cast(3 as decimal(2,1))) |
++-----------------------------------+
+| 0.9                               |
++-----------------------------------+
+
+select mod(9.9,3.0);
++---------------+
+| mod(9.9, 3.0) |
++---------------+
+| 0.9           |
++---------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__murmur_hash">
+ <code class="ph codeph">murmur_hash(type v)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a consistent 64-bit value derived from the input argument, computed with the
+ <a class="xref" href="https://en.wikipedia.org/wiki/MurmurHash" target="_blank">MurmurHash2</a> non-cryptographic hash function.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You might use the return value in an application where you perform load balancing, bucketing, or some
+ other technique to divide processing or storage. The function performs well for all kinds of keys,
+ such as numbers, ASCII strings, and UTF-8 strings, and can be recommended as a general-purpose hashing function.
+ </p>
+ <p class="p">
+ Compared with <code class="ph codeph">fnv_hash()</code>: <code class="ph codeph">murmur_hash()</code> is based on the MurmurHash2 algorithm,
+ while <code class="ph codeph">fnv_hash()</code> is based on the FNV-1a algorithm. Both show very good randomness and performance
+ compared with other well-known hash algorithms, but MurmurHash2 is slightly better on both counts.
+ See <a class="xref" href="https://www.strchr.com/hash_functions" target="_blank">[1]</a><a class="xref" href="https://aras-p.info/blog/2016/08/09/More-Hash-Function-Tests" target="_blank">[2]</a><a class="xref" href="https://www.strchr.com/hash_functions" target="_blank">[3]</a> for details.
+ </p>
+ <p class="p">
+ Similar input values of different types could produce different hash values, for example the same
+ numeric value represented as <code class="ph codeph">SMALLINT</code> or <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>, or <code class="ph codeph">DECIMAL(5,2)</code> or
+ <code class="ph codeph">DECIMAL(20,5)</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table h (x int, s string);
+[localhost:21000] > insert into h values (0, 'hello'), (1,'world'), (1234567890,'antidisestablishmentarianism');
+[localhost:21000] > select x, murmur_hash(x) from h;
++------------+----------------------+
+| x          | murmur_hash(x)       |
++------------+----------------------+
+| 0          | 6960269033020761575  |
+| 1          | -780611581681153783  |
+| 1234567890 | -5754914572385924334 |
++------------+----------------------+
+[localhost:21000] > select s, murmur_hash(s) from h;
++------------------------------+----------------------+
+| s                            | murmur_hash(s)       |
++------------------------------+----------------------+
+| hello                        | 2191231550387646743  |
+| world                        | 5568329560871645431  |
+| antidisestablishmentarianism | -2261804666958489663 |
++------------------------------+----------------------+</code></pre>
+ <p class="p">
+        For short argument values, the high-order bits of the result have relatively higher entropy than those of <code class="ph codeph">fnv_hash()</code>:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table b (x boolean);
+[localhost:21000] > insert into b values (true), (true), (false), (false);
+[localhost:21000] > select x, murmur_hash(x) from b;
++-------+----------------------+
+| x | murmur_hash(x) |
++-------+----------------------+
+| true | -5720937396023583481 |
+| true | -5720937396023583481 |
+| false | 6351753276682545529 |
+| false | 6351753276682545529 |
++-------+----------------------+</code></pre>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 2.12.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__negative">
+ <code class="ph codeph">negative(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the argument with the sign reversed; returns a positive value if the argument was
+ already negative.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">-abs(a)</code> instead if you need to ensure all return values are
+ negative.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pi">
+ <code class="ph codeph">pi()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the constant pi.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pmod">
+ <code class="ph codeph">pmod(bigint a, bigint b), pmod(double a, double b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the positive modulus of a number.
+ Primarily for <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-656" target="_blank">HiveQL compatibility</a>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code> or <code class="ph codeph">double</code>, depending on type of arguments
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the <code class="ph codeph">fmod()</code> function sometimes returns a negative value
+ depending on the sign of its arguments, and the <code class="ph codeph">pmod()</code> function returns the same value
+ as <code class="ph codeph">fmod()</code>, but sometimes with the sign flipped.
+ </p>
+<pre class="pre codeblock"><code>select fmod(-5,2);
++-------------+
+| fmod(-5, 2) |
++-------------+
+| -1 |
++-------------+
+
+select pmod(-5,2);
++-------------+
+| pmod(-5, 2) |
++-------------+
+| 1 |
++-------------+
+
+select fmod(-5,-2);
++--------------+
+| fmod(-5, -2) |
++--------------+
+| -1 |
++--------------+
+
+select pmod(-5,-2);
++--------------+
+| pmod(-5, -2) |
++--------------+
+| -1 |
++--------------+
+
+select fmod(5,-2);
++-------------+
+| fmod(5, -2) |
++-------------+
+| 1 |
++-------------+
+
+select pmod(5,-2);
++-------------+
+| pmod(5, -2) |
++-------------+
+| -1 |
++-------------+
+</code></pre>
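The sign behavior shown above corresponds to two standard definitions of modulus: `fmod()` takes the sign of the numerator (C-style truncated modulus), while `pmod()` takes the sign of the denominator (floored modulus). As a sketch in Python rather than Impala SQL, `math.fmod()` gives the C-style results and Python's `%` operator happens to use the floored semantics:

```python
import math

# C-style truncated modulus: the result takes the sign of the numerator.
print(math.fmod(-5, 2))   # -1.0
print(math.fmod(5, -2))   #  1.0

# Floored modulus: the result takes the sign of the denominator
# (matching the pmod() results shown above).
print(-5 % 2)             #  1
print(5 % -2)             # -1
print(-5 % -2)            # -1
```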
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__positive">
+ <code class="ph codeph">positive(numeric_type a)</code>
+
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the original argument unchanged (even if the argument is negative).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Use <code class="ph codeph">abs()</code> instead if you need to ensure all return values are
+ positive.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__pow">
+ <code class="ph codeph">pow(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__power">power(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__dpow">dpow(double a, double p)</code>,
+ <code class="ph codeph" id="math_functions__fpow">fpow(double a, double p)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+
+
+ <strong class="ph b">Purpose:</strong> Returns the first argument raised to the power of the second argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__precision">
+ <code class="ph codeph">precision(<var class="keyword varname">numeric_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the precision (number of decimal digits) needed to represent the type of the
+ argument expression as a <code class="ph codeph">DECIMAL</code> value.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in combination with the <code class="ph codeph">scale()</code> function, to determine the appropriate
+ <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to declare in a
+ <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples demonstrate how to check the precision and scale of numeric literals or other
+ numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+ <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127, therefore 3 decimal digits are needed to
+ represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+ interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] > select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3 | 0 |
++--------------+----------+
+[localhost:21000] > select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4 | 3 |
++------------------+--------------+
+[localhost:21000] > with t1 as
+ ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+ select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24 | 6 |
++--------------+----------+
+</code></pre>
+ </div>
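For `DECIMAL` literals, the precision/scale computation can be reasoned about with Python's `decimal` module. This is a rough analogue for intuition only: it covers decimal literals like `1.333`, but does not reproduce Impala's integer-type rules (for example, `precision(5)` is 3 in Impala because 5 is typed as `TINYINT`).

```python
from decimal import Decimal

def literal_precision_scale(text: str):
    """Rough analogue of precision()/scale() for a DECIMAL literal (illustrative only)."""
    t = Decimal(text).as_tuple()
    scale = max(0, -t.exponent)             # digits to the right of the decimal point
    precision = max(len(t.digits), scale)   # total significant digits
    return precision, scale

print(literal_precision_scale("1.333"))   # (4, 3)
print(literal_precision_scale("12.34"))   # (4, 2)
```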
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__quotient">
+ <code class="ph codeph">quotient(bigint numerator, bigint denominator)</code>,
+ <code class="ph codeph">quotient(double numerator, double denominator)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the first argument divided by the second argument, discarding any fractional
+ part. Avoids promoting integer arguments to <code class="ph codeph">DOUBLE</code> as happens with the <code class="ph codeph">/</code> SQL
+ operator. <span class="ph">Also includes an overload that accepts <code class="ph codeph">DOUBLE</code> arguments,
+ discards the fractional part of each argument value before dividing, and again returns <code class="ph codeph">BIGINT</code>.
+ With integer arguments, this function works the same as the <code class="ph codeph">DIV</code> operator.</span>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">bigint</code>
+ </p>
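As a sketch of the semantics described above (truncate each argument to an integer, then divide and discard the fractional part, i.e. truncate toward zero), in Python rather than Impala SQL:

```python
import math

def quotient(a, b):
    """Truncate both arguments to integers, then divide, truncating toward zero."""
    a, b = math.trunc(a), math.trunc(b)
    q = a // b
    # Python's // floors, so adjust by one when the result is negative and inexact.
    if q < 0 and q * b != a:
        q += 1
    return q

print(quotient(7, 2))       # 3
print(quotient(-7, 2))      # -3
print(quotient(7.7, 2.2))   # 3  (fractional parts are discarded before dividing)
```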
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__radians">
+ <code class="ph codeph">radians(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts argument value from degrees to radians.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__rand">
+ <code class="ph codeph">rand()</code>, <code class="ph codeph">rand(int seed)</code>,
+ <code class="ph codeph" id="math_functions__random">random()</code>,
+ <code class="ph codeph">random(int seed)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a random value between 0 and 1. After <code class="ph codeph">rand()</code> is called with a
+ seed argument, it produces a consistent random sequence based on the seed value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+        <strong class="ph b">Usage notes:</strong> Currently, the random sequence is reset for each query, so repeated runs of the same
+        query produce the same sequence of values from <code class="ph codeph">rand()</code>. To generate a different
+        sequence for each query, pass a unique seed value to each call to
+        <code class="ph codeph">rand()</code>. For example, <code class="ph codeph">select rand(unix_timestamp()) from ...</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how <code class="ph codeph">rand()</code> can produce sequences of varying predictability,
+ so that you can reproduce query results involving random values or generate unique sequences of random
+ values for each query.
+ When <code class="ph codeph">rand()</code> is called with no argument, it generates the same sequence of values each time,
+ regardless of the ordering of the result set.
+ When <code class="ph codeph">rand()</code> is called with a constant integer, it generates a different sequence of values,
+ but still always the same sequence for the same seed value.
+ If you pass in a seed value that changes, such as the return value of the expression <code class="ph codeph">unix_timestamp(now())</code>,
+ each query will use a different sequence of random values, potentially more useful in probability calculations although
+ more difficult to reproduce at a later time. Therefore, the final two examples with an unpredictable seed value
+ also include the seed in the result set, to make it possible to reproduce the same random sequence later.
+ </p>
+<pre class="pre codeblock"><code>select x, rand() from three_rows;
++---+-----------------------+
+| x | rand() |
++---+-----------------------+
+| 1 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144 |
+| 3 | 0.4431900859080209 |
++---+-----------------------+
+
+select x, rand() from three_rows order by x desc;
++---+-----------------------+
+| x | rand() |
++---+-----------------------+
+| 3 | 0.0004714746030380365 |
+| 2 | 0.5895895192351144 |
+| 1 | 0.4431900859080209 |
++---+-----------------------+
+
+select x, rand(1234) from three_rows order by x;
++---+----------------------+
+| x | rand(1234) |
++---+----------------------+
+| 1 | 0.7377511392057646 |
+| 2 | 0.009428468537250751 |
+| 3 | 0.208117277924026 |
++---+----------------------+
+
+select x, rand(1234) from three_rows order by x desc;
++---+----------------------+
+| x | rand(1234) |
++---+----------------------+
+| 3 | 0.7377511392057646 |
+| 2 | 0.009428468537250751 |
+| 1 | 0.208117277924026 |
++---+----------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+ from three_rows order by x;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 1 | 1440777752 | 0.002051228658320023 |
+| 2 | 1440777752 | 0.5098743483004506 |
+| 3 | 1440777752 | 0.9517714925817081 |
++---+-----------------------+-----------------------------+
+
+select x, unix_timestamp(now()), rand(unix_timestamp(now()))
+ from three_rows order by x desc;
++---+-----------------------+-----------------------------+
+| x | unix_timestamp(now()) | rand(unix_timestamp(now())) |
++---+-----------------------+-----------------------------+
+| 3 | 1440777761 | 0.9985985015512437 |
+| 2 | 1440777761 | 0.3251255333074953 |
+| 1 | 1440777761 | 0.02422675025846192 |
++---+-----------------------+-----------------------------+
+</code></pre>
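The effect of seeding can be illustrated with any seeded pseudo-random generator. The following Python sketch shows the general idea only; it is not Impala's generator, so the values themselves differ from the examples above:

```python
import random

# The same seed always reproduces the same sequence...
a = random.Random(1234)
b = random.Random(1234)
print([round(a.random(), 3) for _ in range(3)])
print([round(b.random(), 3) for _ in range(3)])  # identical to the line above

# ...while a varying seed (like unix_timestamp() in the examples above)
# produces a new sequence each time.
c = random.Random(5678)
print([round(c.random(), 3) for _ in range(3)])  # a different sequence
```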
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__round">
+ <code class="ph codeph">round(double a)</code>,
+ <code class="ph codeph">round(double a, int d)</code>,
+ <code class="ph codeph">round(decimal a, int_type d)</code>,
+ <code class="ph codeph" id="math_functions__dround">dround(double a)</code>,
+ <code class="ph codeph">dround(double a, int d)</code>,
+ <code class="ph codeph">dround(decimal(p,s) a, int_type d)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Rounds a floating-point value. By default (with a
+ single argument), rounds to the nearest integer. Values ending in .5
+ are rounded up for positive numbers, down for negative numbers (that
+ is, away from zero). The optional second argument specifies how many
+ digits to leave after the decimal point; values greater than zero
+ produce a floating-point return value rounded to the requested number
+ of digits to the right of the decimal point.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
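Note that this rule (exact halves round away from zero) differs from the banker's rounding (round-half-to-even) used by, for example, Python's built-in `round()`. A Python sketch of the half-away-from-zero rule described above, for illustration only:

```python
import math

def round_half_away(x: float, d: int = 0) -> float:
    """Round to d digits, with exact halves rounded away from zero."""
    scale = 10 ** d
    return math.copysign(math.floor(abs(x) * scale + 0.5), x) / scale

print(round_half_away(2.5))     #  3.0  (Python's built-in round(2.5) gives 2)
print(round_half_away(-2.5))    # -3.0
print(round_half_away(1.25, 1)) #  1.3
```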
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__scale">
+ <code class="ph codeph">scale(<var class="keyword varname">numeric_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Computes the scale (number of decimal digits to the right of the decimal point) needed to
+ represent the type of the argument expression as a <code class="ph codeph">DECIMAL</code> value.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in combination with the <code class="ph codeph">precision()</code> function, to determine the
+ appropriate <code class="ph codeph">DECIMAL(<var class="keyword varname">precision</var>,<var class="keyword varname">scale</var>)</code> type to
+ declare in a <code class="ph codeph">CREATE TABLE</code> statement or <code class="ph codeph">CAST()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <div class="p">
+ The following examples demonstrate how to check the precision and scale of numeric literals or other
+ numeric expressions. Impala represents numeric literals in the smallest appropriate type. 5 is a
+ <code class="ph codeph">TINYINT</code> value, which ranges from -128 to 127, therefore 3 decimal digits are needed to
+ represent the entire range, and because it is an integer value there are no fractional digits. 1.333 is
+ interpreted as a <code class="ph codeph">DECIMAL</code> value, with 4 digits total and 3 digits after the decimal point.
+<pre class="pre codeblock"><code>[localhost:21000] > select precision(5), scale(5);
++--------------+----------+
+| precision(5) | scale(5) |
++--------------+----------+
+| 3 | 0 |
++--------------+----------+
+[localhost:21000] > select precision(1.333), scale(1.333);
++------------------+--------------+
+| precision(1.333) | scale(1.333) |
++------------------+--------------+
+| 4 | 3 |
++------------------+--------------+
+[localhost:21000] > with t1 as
+ ( select cast(12.34 as decimal(20,2)) x union select cast(1 as decimal(8,6)) x )
+ select precision(x), scale(x) from t1 limit 1;
++--------------+----------+
+| precision(x) | scale(x) |
++--------------+----------+
+| 24 | 6 |
++--------------+----------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sign">
+ <code class="ph codeph">sign(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns -1, 0, or 1 to indicate the signedness of the argument value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sin">
+ <code class="ph codeph">sin(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the sine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sinh">
+ <code class="ph codeph">sinh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic sine of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__sqrt">
+ <code class="ph codeph">sqrt(double a)</code>,
+ <code class="ph codeph" id="math_functions__dsqrt">dsqrt(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+
+ <strong class="ph b">Purpose:</strong> Returns the square root of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__tan">
+ <code class="ph codeph">tan(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the tangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__tanh">
+ <code class="ph codeph">tanh(double a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hyperbolic tangent of the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__truncate">
+ <code class="ph codeph">truncate(double_or_decimal a[, digits_to_leave])</code>,
+ <span class="ph" id="math_functions__dtrunc"><code class="ph codeph">dtrunc(double_or_decimal a[, digits_to_leave])</code></span>,
+ <span class="ph" id="math_functions__trunc_number"><code class="ph codeph">trunc(double_or_decimal a[, digits_to_leave])</code></span>
+ </dt>
+
+ <dd class="dd">
+
+
+
+ <strong class="ph b">Purpose:</strong> Removes some or all fractional digits from a numeric value.
+ <p class="p">
+ <strong class="ph b">Arguments:</strong>
+ With a single floating-point argument, removes all fractional digits, leaving an
+ integer value. The optional second argument specifies the number of fractional digits
+ to include in the return value, and only applies when the argument type is
+ <code class="ph codeph">DECIMAL</code>. A second argument of 0 truncates to a whole integer value.
+        A second argument of negative N sets N digits to 0 on the left side of the decimal point.
+ </p>
+ <p class="p">
+ <strong class="ph b">Scale argument:</strong> The scale argument applies only when truncating
+ <code class="ph codeph">DECIMAL</code> values. It is an integer specifying how many
+ significant digits to leave to the right of the decimal point.
+ A scale argument of 0 truncates to a whole integer value. A scale
+ argument of negative N sets N digits to 0 on the left side of the decimal
+ point.
+ </p>
+ <p class="p">
+ <code class="ph codeph">truncate()</code>, <code class="ph codeph">dtrunc()</code>,
+ <span class="ph">and <code class="ph codeph">trunc()</code></span> are aliases for the
+ same function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input type
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> The <code class="ph codeph">trunc()</code> alias was added in
+ <span class="keyword">Impala 2.10</span>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ You can also pass a <code class="ph codeph">DOUBLE</code> argument, or <code class="ph codeph">DECIMAL</code>
+ argument with optional scale, to the <code class="ph codeph">dtrunc()</code> or
+        <code class="ph codeph">truncate()</code> functions. Using the <code class="ph codeph">trunc()</code>
+ function for numeric values is common with other industry-standard database
+ systems, so you might find such <code class="ph codeph">trunc()</code> calls in code that you
+ are porting to Impala.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">trunc()</code> function also has a signature that applies to
+ <code class="ph codeph">TIMESTAMP</code> values. See <a class="xref" href="impala_datetime_functions.html">Impala Date and Time Functions</a>
+ for details.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate the <code class="ph codeph">truncate()</code>
+ and <code class="ph codeph">dtrunc()</code> signatures for this function:
+ </p>
+<pre class="pre codeblock"><code>select truncate(3.45);
++----------------+
+| truncate(3.45) |
++----------------+
+| 3 |
++----------------+
+
+select truncate(-3.45);
++-----------------+
+| truncate(-3.45) |
++-----------------+
+| -3 |
++-----------------+
+
+select truncate(3.456,1);
++--------------------+
+| truncate(3.456, 1) |
++--------------------+
+| 3.4 |
++--------------------+
+
+select dtrunc(3.456,1);
++------------------+
+| dtrunc(3.456, 1) |
++------------------+
+| 3.4 |
++------------------+
+
+select truncate(3.456,2);
++--------------------+
+| truncate(3.456, 2) |
++--------------------+
+| 3.45 |
++--------------------+
+
+select truncate(3.456,7);
++--------------------+
+| truncate(3.456, 7) |
++--------------------+
+| 3.4560000 |
++--------------------+
+</code></pre>
+ <p class="p">
+ The following examples demonstrate using <code class="ph codeph">trunc()</code> with
+ <code class="ph codeph">DECIMAL</code> or <code class="ph codeph">DOUBLE</code> values, and with
+ an optional scale argument for <code class="ph codeph">DECIMAL</code> values.
+ (The behavior is the same for the <code class="ph codeph">truncate()</code> and
+ <code class="ph codeph">dtrunc()</code> aliases also.)
+ </p>
+<pre class="pre codeblock"><code>
+create table t1 (d decimal(20,7));
+
+-- By default, no digits to the right of the decimal point.
+insert into t1 values (1.1), (2.22), (3.333), (4.4444), (5.55555);
+select trunc(d) from t1 order by d;
++----------+
+| trunc(d) |
++----------+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
++----------+
+
+-- 1 digit to the right of the decimal point.
+select trunc(d,1) from t1 order by d;
++-------------+
+| trunc(d, 1) |
++-------------+
+| 1.1 |
+| 2.2 |
+| 3.3 |
+| 4.4 |
+| 5.5 |
++-------------+
+
+-- 2 digits to the right of the decimal point,
+-- including trailing zeroes if needed.
+select trunc(d,2) from t1 order by d;
++-------------+
+| trunc(d, 2) |
++-------------+
+| 1.10 |
+| 2.22 |
+| 3.33 |
+| 4.44 |
+| 5.55 |
++-------------+
+
+insert into t1 values (9999.9999), (8888.8888);
+
+-- Negative scale truncates digits to the left
+-- of the decimal point.
+select trunc(d,-2) from t1 where d > 100 order by d;
++--------------+
+| trunc(d, -2) |
++--------------+
+| 8800 |
+| 9900 |
++--------------+
+
+-- The scale of the result is adjusted to match the
+-- scale argument.
+select trunc(d,2),
+ precision(trunc(d,2)) as p,
+ scale(trunc(d,2)) as s
+from t1 order by d;
++-------------+----+---+
+| trunc(d, 2) | p | s |
++-------------+----+---+
+| 1.10 | 15 | 2 |
+| 2.22 | 15 | 2 |
+| 3.33 | 15 | 2 |
+| 4.44 | 15 | 2 |
+| 5.55 | 15 | 2 |
+| 8888.88 | 15 | 2 |
+| 9999.99 | 15 | 2 |
++-------------+----+---+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+create table dbl (d double);
+
+insert into dbl values
+ (1.1), (2.22), (3.333), (4.4444), (5.55555),
+ (8888.8888), (9999.9999);
+
+-- With double values, there is no optional scale argument.
+select trunc(d) from dbl order by d;
++----------+
+| trunc(d) |
++----------+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 8888 |
+| 9999 |
++----------+
+</code></pre>
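The `DECIMAL` behavior above, including the negative-scale case, can be sketched with Python's `decimal` module. This is an analogue for illustration only, not Impala's implementation:

```python
from decimal import Decimal, ROUND_DOWN

def trunc_decimal(d: Decimal, digits: int = 0) -> Decimal:
    """Truncate toward zero, keeping `digits` fractional places.
    A negative `digits` zeroes out digits left of the decimal point."""
    quantum = Decimal(1).scaleb(-digits)   # e.g. digits=1 -> 0.1, digits=-2 -> 1E+2
    return d.quantize(quantum, rounding=ROUND_DOWN)

print(trunc_decimal(Decimal("3.456"), 1))            # 3.4
print(trunc_decimal(Decimal("-3.45")))               # -3
print(int(trunc_decimal(Decimal("9999.9999"), -2)))  # 9900
```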
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="math_functions__unhex">
+ <code class="ph codeph">unhex(string a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a string of characters with ASCII values corresponding to pairs of hexadecimal
+ digits in the argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
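The behavior corresponds to decoding pairs of hexadecimal digits into bytes. A rough Python analogue, assuming the input is valid hex and the resulting bytes are ASCII:

```python
def unhex(s: str) -> str:
    """Decode pairs of hex digits into the corresponding ASCII characters."""
    return bytes.fromhex(s).decode("ascii")

print(unhex("4869"))          # Hi
print(unhex("696D70616C61"))  # impala
```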
+ </dd>
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max.html b/docs/build3x/html/topics/impala_max.html
new file mode 100644
index 0000000..00c6e64
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max.html
@@ -0,0 +1,298 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX Function</title></head><body id="max"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the maximum value from a set of numbers. Opposite of the
+      <code class="ph codeph">MIN</code> function. Its single argument can be a numeric column, or the numeric result of a function
+ or expression applied to the column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column
+ are ignored. If the table is empty, or all the values supplied to <code class="ph codeph">MAX</code> are
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">MAX</code> returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>MAX([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong> In Impala 2.0 and higher, this function can be used as an analytic function, but with restrictions on any window clause.
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start
+ bound is <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value, except for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ arguments which produce a <code class="ph codeph">STRING</code> result
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name      | count | sum | avg  | minimum   | maximum        | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA      |     5 |  50 |   10 | ALGERIA   | MOZAMBIQUE     |             5 |
+| AMERICA     |     5 |  47 |  9.4 | ARGENTINA | UNITED STATES  |             5 |
+| ASIA        |     5 |  68 | 13.6 | CHINA     | VIETNAM        |             5 |
+| EUROPE      |     5 |  77 | 15.4 | FRANCE    | UNITED KINGDOM |             5 |
+| MIDDLE EAST |     5 |  58 | 11.6 | EGYPT     | SAUDI ARABIA   |             5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Find the largest value for this column in the table.
+select max(c1) from t1;
+-- Find the largest value for this column from a subset of the table.
+select max(c1) from t1 where month = 'January' and year = '2013';
+-- Find the largest value from a set of numeric function results.
+select max(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, max(purchase_price) from store_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select max(distinct x) from t1;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">MAX()</code> in an analytic context. They use a table
+containing integers from 1 to 10. Notice how <code class="ph codeph">MAX()</code> is reported for each input value, in
+contrast to a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, max(x) over (partition by property) as max from int_t where property in ('odd','even');
++----+----------+-----+
+| x  | property | max |
++----+----------+-----+
+|  2 | even     |  10 |
+|  4 | even     |  10 |
+|  6 | even     |  10 |
+|  8 | even     |  10 |
+| 10 | even     |  10 |
+|  1 | odd      |   9 |
+|  3 | odd      |   9 |
+|  5 | odd      |   9 |
+|  7 | odd      |   9 |
+|  9 | odd      |   9 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">MAX()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to display the largest value of <code class="ph codeph">X</code>
+encountered up to each row in the result set. The examples use two columns in the <code class="ph codeph">ORDER BY</code>
+clause to produce a sequence of values that rises and falls, to illustrate how the <code class="ph codeph">MAX()</code>
+result only increases or stays the same throughout each partition within the result set.
+The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+and therefore all three of these examples produce the same results:
+
+<pre class="pre codeblock"><code>select x, property,
+ max(x) <strong class="ph b">over (order by property, x desc)</strong> as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x desc</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'maximum to this point'
+from int_t where property in ('prime','square');
++---+----------+-----------------------+
+| x | property | maximum to this point |
++---+----------+-----------------------+
+| 7 | prime    |                     7 |
+| 5 | prime    |                     7 |
+| 3 | prime    |                     7 |
+| 2 | prime    |                     7 |
+| 9 | square   |                     9 |
+| 4 | square   |                     9 |
+| 1 | square   |                     9 |
++---+----------+-----------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running maximum taking into account all rows before
+and 1 row after the current row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code> clause.
+Because of an extra Impala restriction on the <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code> functions in an
+analytic context, the lower bound must be <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+<pre class="pre codeblock"><code>select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x</strong>
+ <strong class="ph b">rows between unbounded preceding and 1 following</strong>
+ ) as 'local maximum'
+from int_t where property in ('prime','square');
++---+----------+---------------+
+| x | property | local maximum |
++---+----------+---------------+
+| 2 | prime    |             3 |
+| 3 | prime    |             5 |
+| 5 | prime    |             7 |
+| 7 | prime    |             7 |
+| 1 | square   |             7 |
+| 4 | square   |             9 |
+| 9 | square   |             9 |
++---+----------+---------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ max(x) over
+ (
+ <strong class="ph b">order by property, x</strong>
+ <strong class="ph b">range between unbounded preceding and 1 following</strong>
+ ) as 'local maximum'
+from int_t where property in ('prime','square');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_min.html#min">MIN Function</a>,
+ <a class="xref" href="impala_avg.html#avg">AVG Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_errors.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_errors.html b/docs/build3x/html/topics/impala_max_errors.html
new file mode 100644
index 0000000..0773474
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_errors.html
@@ -0,0 +1,40 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_errors"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_ERRORS Query Option</title></head><body id="max_errors"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_ERRORS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">MAX_ERRORS</code> query option sets the maximum number of non-fatal errors for any particular
+ query that are recorded in the Impala log file. For example, if a billion-row table had a non-fatal data error in every
+ row, you could diagnose the problem without all billion errors being logged. A value of 0, or leaving the option
+ unspecified, uses the built-in default of 1000.
+ </p>
+
+ <p class="p">
+ This option only controls how many errors are reported. To specify whether Impala continues or halts when it
+ encounters such errors, use the <code class="ph codeph">ABORT_ON_ERROR</code> option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning 1000 errors)
+ </p>
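+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The option can be changed for a session through the <code class="ph codeph">SET</code> statement.
+ The values below are illustrative rather than recommendations:
+ </p>
+
+<pre class="pre codeblock"><code>-- Record up to 10000 non-fatal errors per query in the log file.
+set max_errors=10000;
+-- Revert to the built-in default of 1000 logged errors.
+set max_errors=0;
+</code></pre>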
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a>,
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_max_num_runtime_filters.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_max_num_runtime_filters.html b/docs/build3x/html/topics/impala_max_num_runtime_filters.html
new file mode 100644
index 0000000..8e728e8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_max_num_runtime_filters.html
@@ -0,0 +1,75 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="max_num_runtime_filters"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</title></head><body id="max_num_runtime_filters"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAX_NUM_RUNTIME_FILTERS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">MAX_NUM_RUNTIME_FILTERS</code> query option
+ sets an upper limit on the number of runtime filters that can be produced for each query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 10
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Each runtime filter imposes some memory overhead on the query.
+ Depending on the setting of the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code>
+ query option, each filter might consume between 1 and 16 megabytes
+ per plan fragment. There are typically 5 or fewer filters per plan fragment.
+ </p>
+
+ <p class="p">
+ Impala evaluates the effectiveness of each filter, and keeps the
+ ones that eliminate the largest number of partitions or rows.
+ Therefore, this setting can protect against
+ potential problems due to excessive memory overhead for filter production,
+ while still allowing a high level of optimization for suitable queries.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, adjust this query option only when tuning such queries,
+ typically those that combine large partitioned tables with joins against other large tables.
+ </p>
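+
+ <p class="p">
+ For example, on a workload known to benefit from only a few filters, you could lower the limit
+ through the <code class="ph codeph">SET</code> statement. The values below are illustrative rather than
+ tuning recommendations:
+ </p>
+
+<pre class="pre codeblock"><code>-- Allow at most 5 runtime filters per query to reduce memory overhead.
+set max_num_runtime_filters=5;
+-- Restore the default limit.
+set max_num_runtime_filters=10;
+</code></pre>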
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option affects only Bloom filters, not the min/max filters
+ that are applied to Kudu tables. Therefore, it does not affect the
+ performance of queries against Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_show.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_show.html b/docs/build3x/html/topics/impala_show.html
new file mode 100644
index 0000000..6683296
--- /dev/null
+++ b/docs/build3x/html/topics/impala_show.html
@@ -0,0 +1,1525 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version"
content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="show"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SHOW Statement</title></head><body id="show"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SHOW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW</code> statement is a flexible way to get information about different types of Impala
+ objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SHOW DATABASES [[LIKE] '<var class="keyword varname">pattern</var>']
+SHOW SCHEMAS [[LIKE] '<var class="keyword varname">pattern</var>'] - an alias for SHOW DATABASES
+SHOW TABLES [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']
+<span class="ph">SHOW [AGGREGATE | ANALYTIC] FUNCTIONS [IN <var class="keyword varname">database_name</var>] [[LIKE] '<var class="keyword varname">pattern</var>']</span>
+<span class="ph">SHOW CREATE TABLE [<var class="keyword varname">database_name</var>].<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW TABLE STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW COLUMN STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+<span class="ph">SHOW <span class="ph">[RANGE]</span> PARTITIONS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var></span>
+SHOW FILES IN [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PARTITION (<var class="keyword varname">key_col_expression</var> [, <var class="keyword varname">key_col_expression</var>])]</span>
+
+<span class="ph">SHOW ROLES
+SHOW CURRENT ROLES
+SHOW ROLE GRANT GROUP <var class="keyword varname">group_name</var>
+SHOW GRANT ROLE <var class="keyword varname">role_name</var></span>
+</code></pre>
+
+
+
+
+
+
+
+ <p class="p">
+ Issue a <code class="ph codeph">SHOW <var class="keyword varname">object_type</var></code> statement to see the appropriate objects in the
+ current database, or <code class="ph codeph">SHOW <var class="keyword varname">object_type</var> IN <var class="keyword varname">database_name</var></code>
+ to see objects in a specific database.
+ </p>
+
+ <p class="p">
+ The optional <var class="keyword varname">pattern</var> argument is a quoted string literal, using Unix-style
+ <code class="ph codeph">*</code> wildcards and allowing <code class="ph codeph">|</code> for alternation. The preceding
+ <code class="ph codeph">LIKE</code> keyword is also optional. All object names are stored in lowercase, so use all
+ lowercase letters in the pattern string. For example:
+ </p>
+
+<pre class="pre codeblock"><code>show databases 'a*';
+show databases like 'a*';
+show tables in some_db like '*fact*';
+use some_db;
+show tables '*dim*|*fact*';</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="show__show_files">
+
+ <h2 class="title topictitle2" id="ariaid-title2">SHOW FILES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW FILES</code> statement displays the files that constitute a specified table,
+ or a partition within a partitioned table. This syntax is available in <span class="keyword">Impala 2.2</span> and higher
+ only. The output includes the names of the files, the size of each file, and the applicable partition
+ for a partitioned table. The size includes a suffix of <code class="ph codeph">B</code> for bytes,
+ <code class="ph codeph">MB</code> for megabytes, and <code class="ph codeph">GB</code> for gigabytes.
+ </p>
+
+ <div class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, you can use general
+ expressions with operators such as <code class="ph codeph"><</code>, <code class="ph codeph">IN</code>,
+ <code class="ph codeph">LIKE</code>, and <code class="ph codeph">BETWEEN</code> in the <code class="ph codeph">PARTITION</code>
+ clause, instead of only equality operators. For example:
+<pre class="pre codeblock"><code>
+show files in sample_table partition (j < 5);
+show files in sample_table partition (k = 3, l between 1 and 10);
+show files in sample_table partition (month like 'J%');
+
+</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ This statement applies to tables and partitions stored on HDFS, or in the Amazon Simple Storage Service (S3).
+ It does not apply to views.
+ It does not apply to tables mapped onto HBase <span class="ph">or Kudu</span>,
+ because those data management systems do not use the same file-based storage layout.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can use this statement to verify the results of your ETL process: that is, that
+ the expected files are present, with the expected sizes. You can examine the file information
+ to detect conditions such as empty files, missing files, or inefficient layouts due to
+ a large number of small files. When you use <code class="ph codeph">INSERT</code> statements to copy
+ from one table to another, you can see how the file layout changes due to file format
+ conversions, compaction of small input files into large data blocks, and
+ multiple output files from parallel queries and partitioned inserts.
+ </p>
+
+ <p class="p">
+ The output from this statement does not include files that Impala considers to be hidden
+ or invisible, such as those whose names start with a dot or an underscore, or that
+ end with the suffixes <code class="ph codeph">.copying</code> or <code class="ph codeph">.tmp</code>.
+ </p>
+
+ <p class="p">
+ The information for partitioned tables complements the output of the <code class="ph codeph">SHOW PARTITIONS</code>
+ statement, which summarizes information about each partition. <code class="ph codeph">SHOW PARTITIONS</code>
+ produces some output for each partition, while <code class="ph codeph">SHOW FILES</code> does not
+ produce any output for empty partitions because they do not include any data files.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for all the table files, read and execute permission for all the directories that make up the table,
+ and execute permission for the database directory and all its parent directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement
+ for an unpartitioned table using text format:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table unpart_text (x bigint, s string);
+[localhost:21000] > insert into unpart_text (x, s) select id, name
+ > from oreilly.sample_data limit 20e6;
+[localhost:21000] > show files in unpart_text;
++------------------------------------------------------------------------------+----------+-----------+
+| path | size | partition |
++------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB | |
++------------------------------------------------------------------------------+----------+-----------+
+[localhost:21000] > insert into unpart_text (x, s) select id, name from oreilly.sample_data limit 100e6;
+[localhost:21000] > show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+-----------+
+| path | size | partition |
++--------------------------------------------------------------------------------------+----------+-----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB | |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB | |
++--------------------------------------------------------------------------------------+----------+-----------+
+</code></pre>
+
+ <p class="p">
+ This example illustrates how, after issuing some <code class="ph codeph">INSERT ... VALUES</code> statements,
+ the table now contains some tiny files of just a few bytes. Such small files could cause inefficient processing of
+ parallel queries that are expecting multi-megabyte input files. The example shows how you might compact the small files by doing
+ an <code class="ph codeph">INSERT ... SELECT</code> into a different table, possibly converting the data to Parquet in the process:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into unpart_text values (10,'hello'), (20, 'world');
+[localhost:21000] > insert into unpart_text values (-1,'foo'), (-1000, 'bar');
+[localhost:21000] > show files in unpart_text;
++--------------------------------------------------------------------------------------+----------+
+| path | size |
++--------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/4f11b8bdf8b6aa92_238145083_data.0. | 18B      |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/35665776ef85cfaf_1012432410_data.0. | 448.31MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/ac3dba252a8952b8_1663177415_data.0. | 2.19GB   |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_text/cfb8252452445682_1868457216_data.0. | 17B      |
++--------------------------------------------------------------------------------------+----------+
+[localhost:21000] > create table unpart_parq stored as parquet as select * from unpart_text;
++---------------------------+
+| summary |
++---------------------------+
+| Inserted 120000002 row(s) |
++---------------------------+
+[localhost:21000] > show files in unpart_parq;
++------------------------------------------------------------------------------------------+----------+
+| path | size |
++------------------------------------------------------------------------------------------+----------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.0.parq | 255.36MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630184_549959007_data.1.parq | 178.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.0.parq | 255.37MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630185_549959007_data.1.parq | 57.71MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630186_2141167244_data.1.parq | 175.52MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.0.parq | 255.40MB |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/unpart_parq/60798d96ba630187_1006832086_data.1.parq | 214.61MB |
++------------------------------------------------------------------------------------------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned text table
+ with data in two different partitions, and two empty partitions.
+ The partitions with no data are not represented in the <code class="ph codeph">SHOW FILES</code> output.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table part_text (x bigint, y int, s string)
+ > partitioned by (year bigint, month bigint, day bigint);
+[localhost:21000] > insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=1)
+ > select id, val, name from oreilly.normalized_parquet
+where id between 1 and 1000000;
+[localhost:21000] > insert overwrite part_text (x, y, s) partition (year=2014,month=1,day=2)
+ > select id, val, name from oreilly.normalized_parquet
+ > where id between 1000001 and 2000000;
+[localhost:21000] > alter table part_text add partition (year=2014,month=1,day=3);
+[localhost:21000] > alter table part_text add partition (year=2014,month=1,day=4);
+[localhost:21000] > show partitions part_text;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| year  | month | day | #Rows | #Files | Size    | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+| 2014  | 1     | 1   | -1    | 4      | 25.16MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 2   | -1    | 4      | 26.22MB | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 3   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2014  | 1     | 4   | -1    | 0      | 0B      | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total |       |     | -1    | 8      | 51.38MB | 0B           |                   |        |                   |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+--------+-------------------+
+[localhost:21000] > show files in part_text;
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| path | size | partition |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc80689f_1418645991_data.0. | 5.77MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a0_1418645991_data.0. | 6.25MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a1_147082319_data.0. | 7.16MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=1/80732d9dc8068a2_2111411753_data.0. | 5.98MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbb_501271652_data.0. | 6.42MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbc_501271652_data.0. | 6.62MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbd_1393490200_data.0. | 6.98MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_text/year=2014/month=1/day=2/21a828cf494b5bbe_1393490200_data.0. | 6.20MB | year=2014/month=1/day=2 |
++---------------------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+ <p class="p">
+ The following example shows a <code class="ph codeph">SHOW FILES</code> statement for a partitioned Parquet table.
+ The number and sizes of files are different from the equivalent partitioned text table
+ used in the previous example, because <code class="ph codeph">INSERT</code> operations for Parquet tables
+ are parallelized differently than for text tables. (Also, the amount of data is so small
+ that it can be written to Parquet without involving all the hosts in this 4-node cluster.)
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table part_parq (x bigint, y int, s string) partitioned by (year bigint, month bigint, day bigint) stored as parquet;
+[localhost:21000] > insert into part_parq partition (year,month,day) select x, y, s, year, month, day from partitioned_text;
+[localhost:21000] > show partitions part_parq;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+| 2014 | 1 | 1 | -1 | 3 | 17.89MB | NOT CACHED | NOT CACHED | PARQUET | false |
+| 2014 | 1 | 2 | -1 | 3 | 17.89MB | NOT CACHED | NOT CACHED | PARQUET | false |
+| Total | | | -1 | 6 | 35.79MB | 0B | | | |
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+-------------------+
+[localhost:21000] > show files in part_parq;
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| path | size | partition |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/1134113650_data.0.parq | 4.49MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/617567880_data.0.parq | 5.14MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=1/2099499416_data.0.parq | 8.27MB | year=2014/month=1/day=1 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/945567189_data.0.parq | 8.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/2145850112_data.0.parq | 4.80MB | year=2014/month=1/day=2 |
+| hdfs://<var class="keyword varname">impala_data_dir</var>/show_files.db/part_parq/year=2014/month=1/day=2/665613448_data.0.parq | 4.29MB | year=2014/month=1/day=2 |
++-----------------------------------------------------------------------------------------------+--------+-------------------------+
+</code></pre>
+<p class="p">
+ The following example shows output from the <code class="ph codeph">SHOW FILES</code> statement
+ for a table where the data files are stored in Amazon S3:
+</p>
+<pre class="pre codeblock"><code>[localhost:21000] > show files in s3_testing.sample_data_s3;
++-----------------------------------------------------------------------+---------+
+| path | size |
++-----------------------------------------------------------------------+---------+
+| s3a://impala-demo/sample_data/e065453cba1988a6_1733868553_data.0.parq | 24.84MB |
++-----------------------------------------------------------------------+---------+
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="show__show_roles">
+
+ <h2 class="title topictitle2" id="ariaid-title3">SHOW ROLES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW ROLES</code> statement displays roles. This syntax is available in <span class="keyword">Impala 2.0</span> and later
+ only, when you are using the Sentry authorization framework along with the Sentry service, as described in
+ <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+ with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+ output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show roles;
++-----------+
+| role_name |
++-----------+
+| analyst |
+| role1 |
+| sales |
+| superuser |
+| test_role |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="show__show_current_role">
+
+ <h2 class="title topictitle2" id="ariaid-title4">SHOW CURRENT ROLE</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW CURRENT ROLE</code> statement displays roles assigned to the current user. This syntax
+ is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework along with
+ the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not
+ apply when you use the Sentry framework with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ Depending on the roles set up within your organization by the <code class="ph codeph">CREATE ROLE</code> statement, the
+ output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show current roles;
++-----------+
+| role_name |
++-----------+
+| role1 |
+| superuser |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="show__show_role_grant">
+
+ <h2 class="title topictitle2" id="ariaid-title5">SHOW ROLE GRANT Statement</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW ROLE GRANT</code> statement lists all the roles assigned to the specified group. This
+ statement is only allowed for Sentry administrative users and other users that are part of the specified
+ group. This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization
+ framework along with the Sentry service, as described in
+ <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It does not apply when you use the Sentry framework
+ with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
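+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The statement takes the form <code class="ph codeph">SHOW ROLE GRANT GROUP <var class="keyword varname">group_name</var></code>.
+ The group name <code class="ph codeph">analysts</code> and the role name in this example are hypothetical;
+ depending on the groups and roles set up within your organization, the output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show role grant group analysts;
++-----------+
+| role_name |
++-----------+
+| analyst |
++-----------+
+</code></pre>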
+
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="show__show_grant_role">
+
+ <h2 class="title topictitle2" id="ariaid-title6">SHOW GRANT ROLE Statement</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SHOW GRANT ROLE</code> statement lists all the grants for the given role name. This statement
+ is only allowed for Sentry administrative users and other users that have been granted the specified role.
+ This syntax is available in <span class="keyword">Impala 2.0</span> and later only, when you are using the Sentry authorization framework
+ along with the Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>. It
+ does not apply when you use the Sentry framework with privileges defined in a policy file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
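+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The statement takes the form <code class="ph codeph">SHOW GRANT ROLE <var class="keyword varname">role_name</var></code>.
+ The output contains one row for each privilege granted to the role, identifying the object
+ (server, database, table, or URI) and the privilege level. The role name
+ <code class="ph codeph">analyst</code> is hypothetical, and the exact output columns vary by Impala version;
+ the output might look something like this:
+ </p>
+
+<pre class="pre codeblock"><code>show grant role analyst;
++----------+----------+-------+--------+-----+-----------+
+| scope | database | table | column | uri | privilege |
++----------+----------+-------+--------+-----+-----------+
+| database | sales_db | | | | select |
++----------+----------+-------+--------+-----+-----------+
+</code></pre>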
+
+
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="show__show_databases">
+
+ <h2 class="title topictitle2" id="ariaid-title7">SHOW DATABASES</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement is often the first one you issue when connecting to an
+ instance for the first time. You typically issue <code class="ph codeph">SHOW DATABASES</code> to see the names you can
+ specify in a <code class="ph codeph">USE <var class="keyword varname">db_name</var></code> statement, then after switching to a database
+ you issue <code class="ph codeph">SHOW TABLES</code> to see the names you can specify in <code class="ph codeph">SELECT</code> and
+ <code class="ph codeph">INSERT</code> statements.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the output includes a second column showing any associated comment
+ for each database.
+ </p>
+
+ <p class="p">
+ The output of <code class="ph codeph">SHOW DATABASES</code> includes the special <code class="ph codeph">_impala_builtins</code>
+ database, which lets you view definitions of built-in functions, as described under <code class="ph codeph">SHOW
+ FUNCTIONS</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example shows how you might locate a particular table on an unfamiliar system. The
+ <code class="ph codeph">DEFAULT</code> database is the one you initially connect to; a database with that name is present
+ on every system. You can issue <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code> without going
+ into a database, or <code class="ph codeph">SHOW TABLES</code> once you are inside a particular database.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show databases;
++------------------+----------------------------------------------+
+| name | comment |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default | Default Hive database |
+| file_formats | |
++------------------+----------------------------------------------+
+Returned 3 row(s) in 0.02s
+[localhost:21000] > show tables in file_formats;
++--------------------+
+| name |
++--------------------+
+| parquet_table |
+| rcfile_table |
+| sequencefile_table |
+| textfile_table |
++--------------------+
+Returned 4 row(s) in 0.01s
+[localhost:21000] > use file_formats;
+[localhost:21000] > show tables like '*parq*';
++--------------------+
+| name |
++--------------------+
+| parquet_table |
++--------------------+
+Returned 1 row(s) in 0.01s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, <a class="xref" href="impala_use.html#use">USE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>,
+ <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="show__show_tables">
+
+ <h2 class="title topictitle2" id="ariaid-title8">SHOW TABLES Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Displays the names of tables. By default, lists tables in the current database, or with the
+ <code class="ph codeph">IN</code> clause, in a specified database. By default, lists all tables, or with the
+ <code class="ph codeph">LIKE</code> clause, only those whose names match a pattern with <code class="ph codeph">*</code> wildcards.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate the <code class="ph codeph">SHOW TABLES</code> statement.
+ If the database contains no tables, the result set is empty.
+ If the database does contain tables, <code class="ph codeph">SHOW TABLES IN <var class="keyword varname">db_name</var></code>
+ lists all the table names. <code class="ph codeph">SHOW TABLES</code> with no qualifiers lists
+ all the table names in the current database.
+ </p>
+
+<pre class="pre codeblock"><code>create database empty_db;
+show tables in empty_db;
+Fetched 0 row(s) in 0.11s
+
+create database full_db;
+create table full_db.t1 (x int);
+create table full_db.t2 like full_db.t1;
+
+show tables in full_db;
++------+
+| name |
++------+
+| t1 |
+| t2 |
++------+
+
+use full_db;
+show tables;
++------+
+| name |
++------+
+| t1 |
+| t2 |
++------+
+</code></pre>
+
+ <p class="p">
+ This example demonstrates how <code class="ph codeph">SHOW TABLES LIKE '<var class="keyword varname">wildcard_pattern</var>'</code>
+ lists table names that match a pattern, or multiple alternative patterns.
+ Because you can do wildcard matches for table names, it is helpful to establish naming conventions
+ so that you can conveniently locate groups of related tables.
+ </p>
+
+<pre class="pre codeblock"><code>create table fact_tbl (x int);
+create table dim_tbl_1 (s string);
+create table dim_tbl_2 (s string);
+
+/* Asterisk is the wildcard character. Only 2 out of the 3 just-created tables are returned. */
+show tables like 'dim*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* We are already in the FULL_DB database, but just to be sure we can specify the database name also. */
+show tables in full_db like 'dim*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
++-----------+
+
+/* The pipe character separates multiple wildcard patterns. */
+show tables like '*dim*|t*';
++-----------+
+| name |
++-----------+
+| dim_tbl_1 |
+| dim_tbl_2 |
+| t1 |
+| t2 |
++-----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_show.html#show_functions">SHOW FUNCTIONS Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="show__show_create_table">
+
+ <h2 class="title topictitle2" id="ariaid-title9">SHOW CREATE TABLE Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As a schema changes over time, you might run a <code class="ph codeph">CREATE TABLE</code> statement followed by several
+ <code class="ph codeph">ALTER TABLE</code> statements. To capture the cumulative effect of all those statements,
+ <code class="ph codeph">SHOW CREATE TABLE</code> displays a <code class="ph codeph">CREATE TABLE</code> statement that would reproduce
+ the current structure of a table. You can use this output in scripts that set up or clone a group of
+ tables, rather than trying to reproduce the original sequence of <code class="ph codeph">CREATE TABLE</code> and
+ <code class="ph codeph">ALTER TABLE</code> statements. When creating variations on the original table, or cloning the
+ original table on a different system, you might need to edit the <code class="ph codeph">SHOW CREATE TABLE</code> output
+ to change things such as the database name, <code class="ph codeph">LOCATION</code> field, and so on that might be
+ different on the destination system.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ For Kudu tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The column specifications include attributes such as <code class="ph codeph">NULL</code>,
+ <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">ENCODING</code>, and <code class="ph codeph">COMPRESSION</code>.
+ If you do not specify those attributes in the original <code class="ph codeph">CREATE TABLE</code> statement,
+ the <code class="ph codeph">SHOW CREATE TABLE</code> output displays the defaults that were used.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The specifications of any <code class="ph codeph">RANGE</code> clauses are not displayed in full.
+ To see the definition of the range clauses for a Kudu table, use the <code class="ph codeph">SHOW RANGE PARTITIONS</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TBLPROPERTIES</code> output reflects the Kudu master address
+ and the internal Kudu name associated with the Impala table.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>
+show CREATE TABLE numeric_grades_default_letter;
++------------------------------------------------------------------------------------------------+
+| result |
++------------------------------------------------------------------------------------------------+
+| CREATE TABLE user.numeric_grades_default_letter ( |
+| score TINYINT NOT NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| letter_grade STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION DEFAULT '-', |
+| student STRING NULL ENCODING AUTO_ENCODING COMPRESSION DEFAULT_COMPRESSION, |
+| PRIMARY KEY (score) |
+| ) |
+| PARTITION BY <strong class="ph b">RANGE (score) (...)</strong> |
+| STORED AS KUDU |
+| TBLPROPERTIES ('kudu.master_addresses'='vd0342.example.com:7051', |
+| 'kudu.table_name'='impala::USER.numeric_grades_default_letter') |
++------------------------------------------------------------------------------------------------+
+
+show range partitions numeric_grades_default_letter;
++--------------------+
+| RANGE (score) |
++--------------------+
+| 0 <= VALUES < 50 |
+| 50 <= VALUES < 65 |
+| 65 <= VALUES < 80 |
+| 80 <= VALUES < 100 |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how various clauses from the <code class="ph codeph">CREATE TABLE</code> statement are
+ represented in the output of <code class="ph codeph">SHOW CREATE TABLE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>create table show_create_table_demo (id int comment "Unique ID", y double, s string)
+ partitioned by (year smallint)
+ stored as parquet;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo ( |
+| id INT COMMENT 'Unique ID', |
+| y DOUBLE, |
+| s STRING |
+| ) |
+| PARTITIONED BY ( |
+| year SMALLINT |
+| ) |
+| STORED AS PARQUET |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/scratch.db/show_create_table_demo' |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152582') |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how, after a sequence of <code class="ph codeph">ALTER TABLE</code> statements, the output
+ from <code class="ph codeph">SHOW CREATE TABLE</code> represents the current state of the table. This output could be
+ used to create a matching table rather than executing the original <code class="ph codeph">CREATE TABLE</code> and
+ sequence of <code class="ph codeph">ALTER TABLE</code> statements.
+ </p>
+
+<pre class="pre codeblock"><code>alter table show_create_table_demo drop column s;
+alter table show_create_table_demo set fileformat textfile;
+
+show create table show_create_table_demo;
++----------------------------------------------------------------------------------------+
+| result |
++----------------------------------------------------------------------------------------+
+| CREATE TABLE scratch.show_create_table_demo ( |
+| id INT COMMENT 'Unique ID', |
+| y DOUBLE |
+| ) |
+| PARTITIONED BY ( |
+| year SMALLINT |
+| ) |
+| STORED AS TEXTFILE |
+| LOCATION 'hdfs://127.0.0.1:8020/user/hive/warehouse/demo.db/show_create_table_demo' |
+| TBLPROPERTIES ('transient_lastDdlTime'='1418152638') |
++----------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="show__show_table_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title10">SHOW TABLE STATS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+ tuning performance and diagnosing performance issues, especially with the largest tables and the most
+ complex join queries.
+ </p>
+
+ <p class="p">
+ Any values that are not available (because the <code class="ph codeph">COMPUTE STATS</code> statement has not been run
+ yet) are displayed as <code class="ph codeph">-1</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">SHOW TABLE STATS</code> provides some general information about the table, such as the number of
+ files, overall size of the data, whether some or all of the data is in the HDFS cache, and the file format,
+ that is useful whether or not you have run the <code class="ph codeph">COMPUTE STATS</code> statement. A
+ <code class="ph codeph">-1</code> in the <code class="ph codeph">#Rows</code> output column indicates that the <code class="ph codeph">COMPUTE
+ STATS</code> statement has never been run for this table. If the table is partitioned, <code class="ph codeph">SHOW TABLE
+ STATS</code> provides this information for each partition. (It produces the same output as the
+ <code class="ph codeph">SHOW PARTITIONS</code> statement in this case.)
+ </p>
+
+ <p class="p">
+ The output of <code class="ph codeph">SHOW COLUMN STATS</code> is primarily only useful after the <code class="ph codeph">COMPUTE
+ STATS</code> statement has been run on the table. A <code class="ph codeph">-1</code> in the <code class="ph codeph">#Distinct
+ Values</code> output column indicates that the <code class="ph codeph">COMPUTE STATS</code> statement has never been
+ run for this table. Currently, Impala always leaves the <code class="ph codeph">#Nulls</code> column as
+ <code class="ph codeph">-1</code>, even after <code class="ph codeph">COMPUTE STATS</code> has been run.
+ </p>
+
+ <p class="p">
+ These <code class="ph codeph">SHOW</code> statements work on actual tables only, not on views.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+ objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Kudu tables do not have characteristics derived from HDFS, such
+ as number of files, file format, and HDFS cache status, the output of
+ <code class="ph codeph">SHOW TABLE STATS</code> reflects different characteristics
+ that apply to Kudu tables. If the Kudu table is created with the
+ clause <code class="ph codeph">PARTITIONS 20</code>, then the result set of
+ <code class="ph codeph">SHOW TABLE STATS</code> consists of 20 rows, each representing
+ one of the numbered partitions. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+show table stats kudu_table;
++--------+-----------+----------+-----------------------+------------+
+| # Rows | Start Key | Stop Key | Leader Replica | # Replicas |
++--------+-----------+----------+-----------------------+------------+
+| -1 | | 00000001 | host.example.com:7050 | 3 |
+| -1 | 00000001 | 00000002 | host.example.com:7050 | 3 |
+| -1 | 00000002 | 00000003 | host.example.com:7050 | 3 |
+| -1 | 00000003 | 00000004 | host.example.com:7050 | 3 |
+| -1 | 00000004 | 00000005 | host.example.com:7050 | 3 |
+...
+</code></pre>
+
+ <p class="p">
+ Impala does not compute the number of rows for each partition for
+ Kudu tables. Therefore, you do not need to re-run <code class="ph codeph">COMPUTE STATS</code>
+ when you see -1 in the <code class="ph codeph"># Rows</code> column of the output from
+ <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for
+ all Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">SHOW TABLE STATS</code> statement displays physical
+ information about a table and the associated data files:
+ </p>
+
+<pre class="pre codeblock"><code>show table stats store_sales;
++-------+--------+----------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+----------+--------------+--------+-------------------+
+| -1 | 1 | 370.45MB | NOT CACHED | TEXT | false |
++-------+--------+----------+--------------+--------+-------------------+
+
+show table stats customer;
++-------+--------+---------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+---------+--------------+--------+-------------------+
+| -1 | 1 | 12.60MB | NOT CACHED | TEXT | false |
++-------+--------+---------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how, after a <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> statement, the <code class="ph codeph">#Rows</code> field is now filled in. Because the
+ <code class="ph codeph">STORE_SALES</code> table in this example is not partitioned, the <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> statement produces regular stats rather than incremental stats, therefore the
+ <code class="ph codeph">Incremental stats</code> field remains <code class="ph codeph">false</code>.
+ </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+show table stats customer;
++--------+--------+---------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++--------+--------+---------+--------------+--------+-------------------+
+| 100000 | 1 | 12.60MB | NOT CACHED | TEXT | false |
++--------+--------+---------+--------------+--------+-------------------+
+
+compute incremental stats store_sales;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show table stats store_sales;
++---------+--------+----------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++---------+--------+----------+--------------+--------+-------------------+
+| 2880404 | 1 | 370.45MB | NOT CACHED | TEXT | false |
++---------+--------+----------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="show__show_column_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title11">SHOW COLUMN STATS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW COLUMN STATS</code> variants are important for
+ tuning performance and diagnosing performance issues, especially with the largest tables and the most
+ complex join queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+        The output for <code class="ph codeph">SHOW COLUMN STATS</code> includes
+        the relevant information for Kudu tables. Column statistics that
+        originate in the underlying Kudu storage layer are also represented
+        in the metastore database that Impala uses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+ before the <code class="ph codeph">COMPUTE STATS</code> statement is run. Impala deduces some information, such as
+        maximum and average size for fixed-length columns, and leaves any unknown values as <code class="ph codeph">-1</code>.
+ </p>
+
+<pre class="pre codeblock"><code>show column stats customer;
++------------------------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+----------+
+| c_customer_sk | INT | -1 | -1 | 4 | 4 |
+| c_customer_id | STRING | -1 | -1 | -1 | -1 |
+| c_current_cdemo_sk | INT | -1 | -1 | 4 | 4 |
+| c_current_hdemo_sk | INT | -1 | -1 | 4 | 4 |
+| c_current_addr_sk | INT | -1 | -1 | 4 | 4 |
+| c_first_shipto_date_sk | INT | -1 | -1 | 4 | 4 |
+| c_first_sales_date_sk | INT | -1 | -1 | 4 | 4 |
+| c_salutation | STRING | -1 | -1 | -1 | -1 |
+| c_first_name | STRING | -1 | -1 | -1 | -1 |
+| c_last_name | STRING | -1 | -1 | -1 | -1 |
+| c_preferred_cust_flag | STRING | -1 | -1 | -1 | -1 |
+| c_birth_day | INT | -1 | -1 | 4 | 4 |
+| c_birth_month | INT | -1 | -1 | 4 | 4 |
+| c_birth_year | INT | -1 | -1 | 4 | 4 |
+| c_birth_country | STRING | -1 | -1 | -1 | -1 |
+| c_login | STRING | -1 | -1 | -1 | -1 |
+| c_email_address | STRING | -1 | -1 | -1 | -1 |
+| c_last_review_date | STRING | -1 | -1 | -1 | -1 |
++------------------------+--------+------------------+--------+----------+----------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk | INT | -1 | -1 | 4 | 4 |
+| ss_sold_time_sk | INT | -1 | -1 | 4 | 4 |
+| ss_item_sk | INT | -1 | -1 | 4 | 4 |
+| ss_customer_sk | INT | -1 | -1 | 4 | 4 |
+| ss_cdemo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_hdemo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_addr_sk | INT | -1 | -1 | 4 | 4 |
+| ss_store_sk | INT | -1 | -1 | 4 | 4 |
+| ss_promo_sk | INT | -1 | -1 | 4 | 4 |
+| ss_ticket_number | INT | -1 | -1 | 4 | 4 |
+| ss_quantity | INT | -1 | -1 | 4 | 4 |
+| ss_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| ss_list_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_sales_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_discount_amt | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_sales_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_list_price | FLOAT | -1 | -1 | 4 | 4 |
+| ss_ext_tax | FLOAT | -1 | -1 | 4 | 4 |
+| ss_coupon_amt | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_paid | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_paid_inc_tax | FLOAT | -1 | -1 | 4 | 4 |
+| ss_net_profit | FLOAT | -1 | -1 | 4 | 4 |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show the output of the <code class="ph codeph">SHOW COLUMN STATS</code> statement for some tables,
+ after the <code class="ph codeph">COMPUTE STATS</code> statement is run. Now most of the <code class="ph codeph">-1</code> values are
+ changed to reflect the actual table data. The <code class="ph codeph">#Nulls</code> column remains <code class="ph codeph">-1</code>
+ because Impala does not use the number of <code class="ph codeph">NULL</code> values to influence query planning.
+ </p>
+
+<pre class="pre codeblock"><code>compute stats customer;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 18 column(s). |
++------------------------------------------+
+
+compute stats store_sales;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 23 column(s). |
++------------------------------------------+
+
+show column stats customer;
++------------------------+--------+------------------+--------+----------+--------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------------+--------+------------------+--------+----------+--------+
+| c_customer_sk | INT | 139017 | -1 | 4 | 4 |
+| c_customer_id | STRING | 111904 | -1 | 16 | 16 |
+| c_current_cdemo_sk | INT | 95837 | -1 | 4 | 4 |
+| c_current_hdemo_sk | INT | 8097 | -1 | 4 | 4 |
+| c_current_addr_sk | INT | 57334 | -1 | 4 | 4 |
+| c_first_shipto_date_sk | INT | 4374 | -1 | 4 | 4 |
+| c_first_sales_date_sk | INT | 4409 | -1 | 4 | 4 |
+| c_salutation | STRING | 7 | -1 | 4 | 3.1308 |
+| c_first_name | STRING | 3887 | -1 | 11 | 5.6356 |
+| c_last_name | STRING | 4739 | -1 | 13 | 5.9106 |
+| c_preferred_cust_flag | STRING | 3 | -1 | 1 | 0.9656 |
+| c_birth_day | INT | 31 | -1 | 4 | 4 |
+| c_birth_month | INT | 12 | -1 | 4 | 4 |
+| c_birth_year | INT | 71 | -1 | 4 | 4 |
+| c_birth_country | STRING | 205 | -1 | 20 | 8.4001 |
+| c_login | STRING | 1 | -1 | 0 | 0 |
+| c_email_address | STRING | 94492 | -1 | 46 | 26.485 |
+| c_last_review_date | STRING | 349 | -1 | 7 | 6.7561 |
++------------------------+--------+------------------+--------+----------+--------+
+
+show column stats store_sales;
++-----------------------+-------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------------------+-------+------------------+--------+----------+----------+
+| ss_sold_date_sk | INT | 4395 | -1 | 4 | 4 |
+| ss_sold_time_sk | INT | 63617 | -1 | 4 | 4 |
+| ss_item_sk | INT | 19463 | -1 | 4 | 4 |
+| ss_customer_sk | INT | 122720 | -1 | 4 | 4 |
+| ss_cdemo_sk | INT | 242982 | -1 | 4 | 4 |
+| ss_hdemo_sk | INT | 8097 | -1 | 4 | 4 |
+| ss_addr_sk | INT | 70770 | -1 | 4 | 4 |
+| ss_store_sk | INT | 6 | -1 | 4 | 4 |
+| ss_promo_sk | INT | 355 | -1 | 4 | 4 |
+| ss_ticket_number | INT | 304098 | -1 | 4 | 4 |
+| ss_quantity | INT | 105 | -1 | 4 | 4 |
+| ss_wholesale_cost | FLOAT | 9600 | -1 | 4 | 4 |
+| ss_list_price | FLOAT | 22191 | -1 | 4 | 4 |
+| ss_sales_price | FLOAT | 20693 | -1 | 4 | 4 |
+| ss_ext_discount_amt | FLOAT | 228141 | -1 | 4 | 4 |
+| ss_ext_sales_price | FLOAT | 433550 | -1 | 4 | 4 |
+| ss_ext_wholesale_cost | FLOAT | 406291 | -1 | 4 | 4 |
+| ss_ext_list_price | FLOAT | 574871 | -1 | 4 | 4 |
+| ss_ext_tax | FLOAT | 91806 | -1 | 4 | 4 |
+| ss_coupon_amt | FLOAT | 228141 | -1 | 4 | 4 |
+| ss_net_paid | FLOAT | 493107 | -1 | 4 | 4 |
+| ss_net_paid_inc_tax | FLOAT | 653523 | -1 | 4 | 4 |
+| ss_net_profit | FLOAT | 611934 | -1 | 4 | 4 |
++-----------------------+-------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="show__show_partitions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">SHOW PARTITIONS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ <code class="ph codeph">SHOW PARTITIONS</code> displays information about each partition for a partitioned table. (The
+ output is the same as the <code class="ph codeph">SHOW TABLE STATS</code> statement, but <code class="ph codeph">SHOW PARTITIONS</code>
+ only works on a partitioned table.) Because it displays table statistics for all partitions, the output is
+ more informative if you have run the <code class="ph codeph">COMPUTE STATS</code> statement after creating all the
+ partitions. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details. For example, on a
+ <code class="ph codeph">CENSUS</code> table partitioned on the <code class="ph codeph">YEAR</code> column:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">RANGE</code> clause only applies to Kudu tables. It displays only the partitions
+ defined by the <code class="ph codeph">RANGE</code> clause of <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code>.
+ </p>
+
+ <p class="p">
+ Although you can specify <code class="ph codeph"><</code> or
+ <code class="ph codeph"><=</code> comparison operators when defining
+ range partitions for Kudu tables, Kudu rewrites them if necessary
+ to represent each range as
+ <code class="ph codeph"><var class="keyword varname">low_bound</var> <= VALUES < <var class="keyword varname">high_bound</var></code>.
+ This rewriting might involve incrementing one of the boundary values
+ or appending a <code class="ph codeph">\0</code> for string values, so that the
+ partition covers the same range as originally specified.
+ </p>
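+
+      <p class="p">
+        As an illustrative sketch of this rewriting (the table, column, and
+        boundary values here are hypothetical, not from a reference schema),
+        a range partition declared with an inclusive upper bound is displayed
+        by <code class="ph codeph">SHOW RANGE PARTITIONS</code> with the boundary
+        incremented and the comparison rewritten to the canonical form:
+      </p>
+
+<pre class="pre codeblock"><code>
+create table events_by_year (year int primary key, details string)
+  partition by range (year)
+  (partition 2000 <= values <= 2009)
+  stored as kudu;
+
+show range partitions events_by_year;
++------------------------+
+| RANGE (year)           |
++------------------------+
+| 2000 <= VALUES < 2010  |
++------------------------+
+</code></pre>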
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows the output for a Parquet, text, or other
+ HDFS-backed table partitioned on the <code class="ph codeph">YEAR</code> column:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show partitions census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 4 | 1 | 22B | TEXT |
+| 2012 | 4 | 1 | 22B | TEXT |
+| 2013 | 1 | 1 | 231B | PARQUET |
+| Total | 9 | 3 | 275B | |
++-------+-------+--------+------+---------+
+</code></pre>
+
+ <p class="p">
+ The following example shows the output for a Kudu table
+ using the hash partitioning mechanism. The number of
+        rows in the result set corresponds to the value used
+ in the <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>
+ clause of <code class="ph codeph">CREATE TABLE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions million_rows_hash;
+
++--------+-----------+----------+-----------------------+--
+| # Rows | Start Key | Stop Key | Leader Replica | # Replicas
++--------+-----------+----------+-----------------------+--
+| -1 | | 00000001 | n236.example.com:7050 | 3
+| -1 | 00000001 | 00000002 | n236.example.com:7050 | 3
+| -1 | 00000002 | 00000003 | n336.example.com:7050 | 3
+| -1 | 00000003 | 00000004 | n238.example.com:7050 | 3
+| -1 | 00000004 | 00000005 | n338.example.com:7050 | 3
+...
+| -1 | 0000002E | 0000002F | n240.example.com:7050 | 3
+| -1 | 0000002F | 00000030 | n336.example.com:7050 | 3
+| -1 | 00000030 | 00000031 | n240.example.com:7050 | 3
+| -1 | 00000031 | | n334.example.com:7050 | 3
++--------+-----------+----------+-----------------------+--
+Fetched 50 row(s) in 0.05s
+
+</code></pre>
+
+ <p class="p">
+ The following example shows the output for a Kudu table
+ using the range partitioning mechanism:
+ </p>
+
+<pre class="pre codeblock"><code>
+show range partitions million_rows_range;
++-----------------------+
+| RANGE (id) |
++-----------------------+
+| VALUES < "A" |
+| "A" <= VALUES < "[" |
+| "a" <= VALUES < "{" |
+| "{" <= VALUES < "~\0" |
++-----------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+ (A table could span multiple different HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ The Impala user must also have execute
+ permission for the database directory, and any parent directories of the database directory in HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for usage information and examples.
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>, <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="show__show_functions">
+
+ <h2 class="title topictitle2" id="ariaid-title13">SHOW FUNCTIONS Statement</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, <code class="ph codeph">SHOW FUNCTIONS</code> displays user-defined functions (UDFs) and <code class="ph codeph">SHOW
+ AGGREGATE FUNCTIONS</code> displays user-defined aggregate functions (UDAFs) associated with a particular
+ database. The output from <code class="ph codeph">SHOW FUNCTIONS</code> includes the argument signature of each function.
+ You specify this argument signature as part of the <code class="ph codeph">DROP FUNCTION</code> statement. You might have
+ several UDFs with the same name, each accepting different argument data types.
+ </p>
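+
+      <p class="p">
+        For example, if a database contained two overloaded UDFs both named
+        <code class="ph codeph">my_func</code>, the signatures shown by
+        <code class="ph codeph">SHOW FUNCTIONS</code> identify which overload a
+        <code class="ph codeph">DROP FUNCTION</code> statement removes. (The
+        function name and signatures here are illustrative only.)
+      </p>
+
+<pre class="pre codeblock"><code>show functions;
++-------------+-----------------+-------------+---------------+
+| return type | signature       | binary type | is persistent |
++-------------+-----------------+-------------+---------------+
+| BIGINT      | my_func(BIGINT) | NATIVE      | true          |
+| STRING      | my_func(STRING) | NATIVE      | true          |
++-------------+-----------------+-------------+---------------+
+
+drop function my_func(string);
+</code></pre>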
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">SHOW FUNCTIONS</code> output includes
+ a new column, labelled <code class="ph codeph">is persistent</code>. This property is <code class="ph codeph">true</code> for
+ Impala built-in functions, C++ UDFs, and Java UDFs created using the new <code class="ph codeph">CREATE FUNCTION</code>
+ syntax with no signature. It is <code class="ph codeph">false</code> for Java UDFs created using the old
+ <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the types for the arguments and return value.
+        Any functions with <code class="ph codeph">false</code> shown for this property must be re-created with the
+        <code class="ph codeph">CREATE FUNCTION</code> statement each time the Impala catalog server is restarted.
+ See <code class="ph codeph">CREATE FUNCTION</code> for information on switching to the new syntax, so that
+ Java UDFs are preserved across restarts. Java UDFs that are persisted this way are also easier
+ to share across Impala and Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+
+ <p class="p">
+ When authorization is enabled, the output of the <code class="ph codeph">SHOW</code> statement is limited to those
+        objects for which you have some privilege. There might be other databases, tables, and so on, but their
+ names are concealed. If you believe an object exists but you cannot see it in the <code class="ph codeph">SHOW</code>
+ output, check with the system administrator if you need to be granted a new privilege for that object. See
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to set up authorization and add
+ privileges for specific kinds of objects.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ To display Impala built-in functions, specify the special database name <code class="ph codeph">_impala_builtins</code>:
+ </p>
+
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
++--------------+-------------------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT | abs(BIGINT) | BUILTIN | true |
+| DECIMAL(*,*) | abs(DECIMAL(*,*)) | BUILTIN | true |
+| DOUBLE | abs(DOUBLE) | BUILTIN | true |
+| FLOAT | abs(FLOAT) | BUILTIN | true |
++--------------+-------------------------------------------------+-------------+---------------+
+...
+
+show functions in _impala_builtins like '*week*';
++-------------+------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+------------------------------+-------------+---------------+
+| INT | dayofweek(TIMESTAMP) | BUILTIN | true |
+| INT | weekofyear(TIMESTAMP) | BUILTIN | true |
+| TIMESTAMP | weeks_add(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | weeks_add(TIMESTAMP, INT) | BUILTIN | true |
+| TIMESTAMP | weeks_sub(TIMESTAMP, BIGINT) | BUILTIN | true |
+| TIMESTAMP | weeks_sub(TIMESTAMP, INT) | BUILTIN | true |
++-------------+------------------------------+-------------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_functions_overview.html#functions">Overview of Impala Functions</a>, <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>,
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>,
+ <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>
+ </p>
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain.html b/docs/build3x/html/topics/impala_explain.html
new file mode 100644
index 0000000..7768124
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain.html
@@ -0,0 +1,296 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN Statement</title></head><body id="explain"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Returns the execution plan for a statement, showing the low-level mechanisms that Impala will use to read the
+ data, divide the work among nodes in the cluster, and transmit intermediate and final results across the
+      network. Use <code class="ph codeph">EXPLAIN</code> followed by a complete <code class="ph codeph">SELECT</code> query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>EXPLAIN { <var class="keyword varname">select_query</var> | <var class="keyword varname">ctas_stmt</var> | <var class="keyword varname">insert_stmt</var> }
+</code></pre>
+
+ <p class="p">
+ The <var class="keyword varname">select_query</var> is a <code class="ph codeph">SELECT</code> statement, optionally prefixed by a
+ <code class="ph codeph">WITH</code> clause. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The <var class="keyword varname">insert_stmt</var> is an <code class="ph codeph">INSERT</code> statement that inserts into or overwrites an
+ existing table. It can use either the <code class="ph codeph">INSERT ... SELECT</code> or <code class="ph codeph">INSERT ...
+ VALUES</code> syntax. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The <var class="keyword varname">ctas_stmt</var> is a <code class="ph codeph">CREATE TABLE</code> statement using the <code class="ph codeph">AS
+ SELECT</code> clause, typically abbreviated as a <span class="q">"CTAS"</span> operation. See
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.
+ </p>
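+
+    <p class="p">
+      A minimal invocation looks like the following. (The table name
+      <code class="ph codeph">sample_table</code> and the filter column are
+      illustrative only; the plan output depends on your schema and cluster.)
+    </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from sample_table where year = 2018;
+</code></pre>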
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can interpret the output to judge whether the query is performing efficiently, and adjust the query
+ and/or the schema if not. For example, you might change the tests in the <code class="ph codeph">WHERE</code> clause, add
+ hints to make join operations more efficient, introduce subqueries, change the order of tables in a join, add
+ or change partitioning for a table, collect column statistics and/or table statistics in Hive, or any other
+ performance tuning steps.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> output reminds you if table or column statistics are missing from any table
+ involved in the query. These statistics are important for optimizing queries involving large tables or
+ multi-table joins. See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for how to gather statistics,
+ and <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for how to use this information for query tuning.
+ </p>
+
+ <div class="p">
+ Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+ <ul class="ul">
+ <li class="li">
+ The last part of the plan shows the low-level details such as the expected amount of data that will be
+ read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will
+ take to scan a table based on total data size and the size of the cluster.
+ </li>
+
+ <li class="li">
+ As you work your way up, next you see the operations that will be parallelized and performed on each
+ Impala node.
+ </li>
+
+ <li class="li">
+ At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+ from one node to another.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+ <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+ <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+ dealing with logical or physical aspects of the query.
+ </li>
+ </ul>
+ </div>
+
+      <p class="p">
+        If you come from a traditional database background and are not familiar with data warehousing, keep in mind
+        that Impala is optimized for full table scans across very large tables. The structure and distribution of
+        this data is typically not suitable for the kind of indexing and single-row lookups that are common in OLTP
+        environments. It is common to see a query scan an entire large table; that is not necessarily an indication
+        of an inefficient query. Of course, if you can reduce the volume of scanned data by orders of magnitude, for
+        example with a query that reads only certain partitions of a partitioned table, you might be able to
+        optimize the query so that it executes in seconds rather than minutes.
+      </p>
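+
+      <p class="p">
+        As a hypothetical sketch of partition pruning (the table name and sizes below are illustrative, not
+        output from a real cluster), a <code class="ph codeph">WHERE</code> clause on a partition key column lets
+        Impala skip partitions, which shows up in the <code class="ph codeph">partitions=</code> line of the
+        scan node:
+      </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] &gt; explain select count(*) from sales where year = 2018;
+...
+| 00:SCAN HDFS [default.sales]
+|    partitions=1/4 size=1.25GB
+...
+</code></pre>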
+
+ <p class="p">
+ For more information and examples to help you interpret <code class="ph codeph">EXPLAIN</code> output, see
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Extended EXPLAIN output:</strong>
+ </p>
+
+ <p class="p">
+ For performance tuning of complex queries, and capacity planning (such as using the admission control and
+ resource management features), you can enable more detailed and informative output for the
+ <code class="ph codeph">EXPLAIN</code> statement. In the <span class="keyword cmdname">impala-shell</span> interpreter, issue the command
+ <code class="ph codeph">SET EXPLAIN_LEVEL=<var class="keyword varname">level</var></code>, where <var class="keyword varname">level</var> is an integer
+ from 0 to 3 or corresponding mnemonic values <code class="ph codeph">minimal</code>, <code class="ph codeph">standard</code>,
+ <code class="ph codeph">extended</code>, or <code class="ph codeph">verbose</code>.
+ </p>
+
+ <p class="p">
+ When extended <code class="ph codeph">EXPLAIN</code> output is enabled, <code class="ph codeph">EXPLAIN</code> statements print
+ information about estimated memory requirements, minimum number of virtual cores, and so on.
+
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example shows how the standard <code class="ph codeph">EXPLAIN</code> output moves from the lowest (physical) level to
+ the higher (logical) levels. The query begins by scanning a certain amount of data; each node performs an
+ aggregation operation (evaluating <code class="ph codeph">COUNT(*)</code>) on some subset of data that is local to that
+ node; the intermediate results are transmitted back to the coordinator node (labelled here as the
+ <code class="ph codeph">EXCHANGE</code> node); lastly, the intermediate results are summed to display the final result.
+ </p>
+
+<pre class="pre codeblock" id="explain__explain_plan_simple"><code>[impalad-host:21000] > explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [default.customer_address] |
+| partitions=1/1 size=5.25MB |
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ These examples show how the extended <code class="ph codeph">EXPLAIN</code> output becomes more accurate and informative as
+ statistics are gathered by the <code class="ph codeph">COMPUTE STATS</code> statement. Initially, much of the information
+ about data size and distribution is marked <span class="q">"unavailable"</span>. Impala can determine the raw data size, but
+ not the number of rows or number of distinct values for each column without additional analysis. The
+ <code class="ph codeph">COMPUTE STATS</code> statement performs this analysis, so a subsequent <code class="ph codeph">EXPLAIN</code>
+ statement has additional information to use in deciding how to optimize the distributed query.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; set explain_level=extended;
+EXPLAIN_LEVEL set to extended
+[localhost:21000] &gt; explain select x from t1;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=32.00MB VCores=1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=1 per-host-mem=unavailable |
+<strong class="ph b">| | tuple-ids=0 row-size=4B cardinality=unavailable |</strong>
+| | |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM]              |
+| partitions=1/1 size=36B |
+<strong class="ph b">| table stats: unavailable |</strong>
+<strong class="ph b">| column stats: unavailable |</strong>
+| hosts=1 per-host-mem=32.00MB |
+<strong class="ph b">| tuple-ids=0 row-size=4B cardinality=unavailable |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats t1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > explain select x from t1;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=64.00MB VCores=1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=0 row-size=4B cardinality=0 |
+| | |
+| 00:SCAN HDFS [default.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=36B |
+<strong class="ph b">| table stats: 0 rows total |</strong>
+<strong class="ph b">| column stats: all |</strong>
+| hosts=1 per-host-mem=64.00MB |
+<strong class="ph b">| tuple-ids=0 row-size=4B cardinality=0 |</strong>
++----------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ and execute permissions for all applicable directories in all source tables
+ for the query that is being explained.
+ (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories
+ if the source table is partitioned.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> statement displays equivalent plan
+ information for queries against Kudu tables as for queries
+ against HDFS-based tables.
+ </p>
+
+ <p class="p">
+ To see which predicates Impala can <span class="q">"push down"</span> to Kudu for
+ efficient evaluation, without transmitting unnecessary rows back
+ to Impala, look for the <code class="ph codeph">kudu predicates</code> item in
+ the scan phase of the query. The label <code class="ph codeph">kudu predicates</code>
+ indicates a condition that can be evaluated efficiently on the Kudu
+ side. The label <code class="ph codeph">predicates</code> in a <code class="ph codeph">SCAN KUDU</code>
+ node indicates a condition that is evaluated by Impala.
+ For example, in a table with primary key column <code class="ph codeph">X</code>
+ and non-primary key column <code class="ph codeph">Y</code>, you can see that
+ some operators in the <code class="ph codeph">WHERE</code> clause are evaluated
+ immediately by Kudu and others are evaluated later by Impala:
+ </p>
+
+<pre class="pre codeblock"><code>
+EXPLAIN SELECT x,y from kudu_table WHERE
+ x = 1 AND y NOT IN (2,3) AND z = 1
+ AND a IS NOT NULL AND b > 0 AND length(s) > 5;
++----------------
+| Explain String
++----------------
+...
+| 00:SCAN KUDU [kudu_table]
+| predicates: y NOT IN (2, 3), length(s) > 5
+| kudu predicates: a IS NOT NULL, b > 0, x = 1, z = 1
+</code></pre>
+
+    <p class="p">
+      Only binary predicates, <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code>
+      predicates (in <span class="keyword">Impala 2.9</span> and higher), and <code class="ph codeph">IN</code> predicates
+      can be pushed to Kudu, and only when they contain literal values that exactly match the types in the
+      Kudu table and do not require any casting.
+    </p>
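+
+    <p class="p">
+      As a hypothetical sketch of the casting restriction (the exact predicate rendering in the plan can vary
+      by release), a comparison whose literal forces an implicit cast of an <code class="ph codeph">INT</code>
+      column, such as comparing it to a floating-point literal, would be expected to appear under
+      <code class="ph codeph">predicates</code> rather than <code class="ph codeph">kudu predicates</code>:
+    </p>
+
+<pre class="pre codeblock"><code>EXPLAIN SELECT x FROM kudu_table WHERE x = 1.5;
+...
+| 00:SCAN KUDU [kudu_table]
+|    predicates: CAST(x AS DOUBLE) = 1.5
+</code></pre>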
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>,
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain_level.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain_level.html b/docs/build3x/html/topics/impala_explain_level.html
new file mode 100644
index 0000000..23d901a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain_level.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_level"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXPLAIN_LEVEL Query Option</title></head><body id="explain_level"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXPLAIN_LEVEL Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Controls the amount of detail provided in the output of the <code class="ph codeph">EXPLAIN</code> statement. The basic
+ output can help you identify high-level performance issues such as scanning a higher volume of data or more
+ partitions than you expect. The higher levels of detail show how intermediate results flow between nodes and
+ how different SQL operations such as <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, joins, and
+ <code class="ph codeph">WHERE</code> clauses are implemented within a distributed query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code> or <code class="ph codeph">INT</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">1</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Arguments:</strong>
+ </p>
+
+ <p class="p">
+ The allowed range of numeric values for this option is 0 to 3:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">0</code> or <code class="ph codeph">MINIMAL</code>: A barebones list, one line per operation. Primarily useful
+ for checking the join order in very long queries where the regular <code class="ph codeph">EXPLAIN</code> output is too
+ long to read easily.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">1</code> or <code class="ph codeph">STANDARD</code>: The default level of detail, showing the logical way that
+ work is split up for the distributed query.
+ </li>
+
+ <li class="li">
+        <code class="ph codeph">2</code> or <code class="ph codeph">EXTENDED</code>: Includes additional detail about how the query planner
+        uses statistics in its decision-making process, so that you can understand how a query could be tuned by
+        gathering statistics, using query hints, adding or removing predicates, and so on.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">3</code> or <code class="ph codeph">VERBOSE</code>: The maximum level of detail, showing how work is split up
+ within each node into <span class="q">"query fragments"</span> that are connected in a pipeline. This extra detail is
+ primarily useful for low-level performance testing and tuning within Impala itself, rather than for
+ rewriting the SQL code at the user level.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Prior to Impala 1.3, the allowed argument range for <code class="ph codeph">EXPLAIN_LEVEL</code> was 0 to 1: level 0 had
+ the mnemonic <code class="ph codeph">NORMAL</code>, and level 1 was <code class="ph codeph">VERBOSE</code>. In Impala 1.3 and higher,
+ <code class="ph codeph">NORMAL</code> is not a valid mnemonic value, and <code class="ph codeph">VERBOSE</code> still applies to the
+ highest level of detail but now corresponds to level 3. You might need to adjust the values if you have any
+ older <code class="ph codeph">impala-shell</code> script files that set the <code class="ph codeph">EXPLAIN_LEVEL</code> query option.
+ </div>
+
+ <p class="p">
+ Changing the value of this option controls the amount of detail in the output of the <code class="ph codeph">EXPLAIN</code>
+ statement. The extended information from level 2 or 3 is especially useful during performance tuning, when
+ you need to confirm whether the work for the query is distributed the way you expect, particularly for the
+ most resource-intensive operations such as join queries against large tables, queries against tables with
+ large numbers of partitions, and insert operations for Parquet tables. The extended information also helps to
+ check estimated resource usage when you use the admission control or resource management features explained
+ in <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a>. See
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for the syntax of the <code class="ph codeph">EXPLAIN</code> statement, and
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details about how to use the extended information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ As always, read the <code class="ph codeph">EXPLAIN</code> output from bottom to top. The lowest lines represent the
+ initial work of the query (scanning data files), the lines in the middle represent calculations done on each
+ node and how intermediate results are transmitted from one node to another, and the topmost lines represent
+ the final results being sent back to the coordinator node.
+ </p>
+
+ <p class="p">
+ The numbers in the left column are generated internally during the initial planning phase and do not
+ represent the actual order of operations, so it is not significant if they appear out of order in the
+ <code class="ph codeph">EXPLAIN</code> output.
+ </p>
+
+ <p class="p">
+ At all <code class="ph codeph">EXPLAIN</code> levels, the plan contains a warning if any tables in the query are missing
+ statistics. Use the <code class="ph codeph">COMPUTE STATS</code> statement to gather statistics for each table and suppress
+ this warning. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details about how the statistics help
+ query performance.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> always starts with an explain plan
+ showing full detail, the same as with <code class="ph codeph">EXPLAIN_LEVEL=3</code>. <span class="ph">After the explain
+ plan comes the executive summary, the same output as produced by the <code class="ph codeph">SUMMARY</code> command in
+ <span class="keyword cmdname">impala-shell</span>.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples use a trivial, empty table to illustrate how the essential aspects of query planning are shown
+ in <code class="ph codeph">EXPLAIN</code> output:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, s string);
+[localhost:21000] > set explain_level=1;
+[localhost:21000] > explain select count(*) from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [explain_plan.t1] |
+| partitions=1/1 size=0B |
++------------------------------------------------------------------------+
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 00:SCAN HDFS [explain_plan.t1] |
+| partitions=1/1 size=0B |
++------------------------------------------------------------------------+
+[localhost:21000] > set explain_level=2;
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| WARNING: The following tables are missing relevant table and/or column |
+| statistics. |
+| explain_plan.t1 |
+| |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | hosts=0 per-host-mem=unavailable |
+| | tuple-ids=0 row-size=19B cardinality=unavailable |
+| | |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+| table stats: unavailable |
+| column stats: unavailable |
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
++------------------------------------------------------------------------+
+[localhost:21000] > set explain_level=3;
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+<strong class="ph b">| WARNING: The following tables are missing relevant table and/or column |</strong>
+<strong class="ph b">| statistics. |</strong>
+<strong class="ph b">| explain_plan.t1 |</strong>
+| |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| hosts=0 per-host-mem=unavailable |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
+| |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
+| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+<strong class="ph b">| table stats: unavailable |</strong>
+<strong class="ph b">| column stats: unavailable |</strong>
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=19B cardinality=unavailable |
++------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ As the warning message demonstrates, most of the information needed for Impala to do efficient query
+ planning, and for you to understand the performance characteristics of the query, requires running the
+ <code class="ph codeph">COMPUTE STATS</code> statement for the table:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats t1;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+[localhost:21000] > explain select * from t1;
++------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=-9223372036854775808B VCores=0 |
+| |
+| F01:PLAN FRAGMENT [PARTITION=UNPARTITIONED] |
+| 01:EXCHANGE [PARTITION=UNPARTITIONED] |
+| hosts=0 per-host-mem=unavailable |
+| tuple-ids=0 row-size=20B cardinality=0 |
+| |
+| F00:PLAN FRAGMENT [PARTITION=RANDOM] |
+| DATASTREAM SINK [FRAGMENT=F01, EXCHANGE=01, PARTITION=UNPARTITIONED] |
+| 00:SCAN HDFS [explain_plan.t1, PARTITION=RANDOM] |
+| partitions=1/1 size=0B |
+<strong class="ph b">| table stats: 0 rows total |</strong>
+<strong class="ph b">| column stats: all |</strong>
+| hosts=0 per-host-mem=0B |
+| tuple-ids=0 row-size=20B cardinality=0 |
++------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Joins and other complicated, multi-part queries are the ones where you most commonly need to examine the
+ <code class="ph codeph">EXPLAIN</code> output and customize the amount of detail in the output. This example shows the
+ default <code class="ph codeph">EXPLAIN</code> output for a three-way join query, then the equivalent output with a
+ <code class="ph codeph">[SHUFFLE]</code> hint to change the join mechanism between the first two tables from a broadcast
+ join to a shuffle join.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=1;
+[localhost:21000] > explain select one.*, two.*, three.* from t1 one, t1 two, t1 three where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 07:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: two.x = three.x |
+| | |
+<strong class="ph b">| |--06:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: one.x = two.x |
+| | |
+<strong class="ph b">| |--05:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+| | partitions=1/1 size=0B |
+| | |
+| 00:SCAN HDFS [explain_plan.t1 one] |
+| partitions=1/1 size=0B |
++---------------------------------------------------------+
+[localhost:21000] > explain select one.*, two.*, three.*
+ > from t1 one join [shuffle] t1 two join t1 three
+ > where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+| | hash predicates: two.x = three.x |
+| | |
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST] |</strong>
+| | | |
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</strong>
+| | hash predicates: one.x = two.x |
+| | |
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</strong>
+| | | |
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+| | partitions=1/1 size=0B |
+| | |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)] |</strong>
+| | |
+| 00:SCAN HDFS [explain_plan.t1 one] |
+| partitions=1/1 size=0B |
++---------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ For a join involving many different tables, the default <code class="ph codeph">EXPLAIN</code> output might stretch over
+ several pages, and the only details you care about might be the join order and the mechanism (broadcast or
+ shuffle) for joining each pair of tables. In that case, you might set <code class="ph codeph">EXPLAIN_LEVEL</code> to its
+ lowest value of 0, to focus on just the join order and join mechanism for each stage. The following example
+ shows how the rows from the first and second joined tables are hashed and divided among the nodes of the
+ cluster for further filtering; then the entire contents of the third table are broadcast to all nodes for the
+ final stage of join processing.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set explain_level=0;
+[localhost:21000] > explain select one.*, two.*, three.*
+ > from t1 one join [shuffle] t1 two join t1 three
+ > where one.x = two.x and two.x = three.x;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=4.00GB VCores=3 |
+| |
+| 08:EXCHANGE [PARTITION=UNPARTITIONED] |
+<strong class="ph b">| 04:HASH JOIN [INNER JOIN, BROADCAST] |</strong>
+<strong class="ph b">| |--07:EXCHANGE [BROADCAST] |</strong>
+| | 02:SCAN HDFS [explain_plan.t1 three] |
+<strong class="ph b">| 03:HASH JOIN [INNER JOIN, PARTITIONED] |</strong>
+<strong class="ph b">| |--06:EXCHANGE [PARTITION=HASH(two.x)] |</strong>
+| | 01:SCAN HDFS [explain_plan.t1 two] |
+<strong class="ph b">| 05:EXCHANGE [PARTITION=HASH(one.x)] |</strong>
+| 00:SCAN HDFS [explain_plan.t1 one] |
++---------------------------------------------------------+
+</code></pre>
+
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_explain_plan.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_explain_plan.html b/docs/build3x/html/topics/impala_explain_plan.html
new file mode 100644
index 0000000..020c28b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_explain_plan.html
@@ -0,0 +1,592 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="explain_plan"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</title>
</head><body id="explain_plan"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand the high-level performance considerations for Impala queries, read the output of the
+ <code class="ph codeph">EXPLAIN</code> statement for the query. You can get the <code class="ph codeph">EXPLAIN</code> plan without
+ actually running the query itself.
+ </p>
+
+    <p class="p">
+      For an overview of the physical performance characteristics for a query, issue the <code class="ph codeph">SUMMARY</code>
+      command in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This condensed information
+      shows which phases of execution took the most time, and how the estimates for memory usage and number of rows
+      at each phase compare to the actual values.
+    </p>
+
+    <p class="p">
+      To understand the detailed performance characteristics for a query, issue the <code class="ph codeph">PROFILE</code>
+      command in <span class="keyword cmdname">impala-shell</span> immediately after executing a query. This low-level information
+      includes physical details about memory, CPU, I/O, and network usage, and thus is only available after the
+      query is actually run.
+    </p>
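+
+    <p class="p">
+      For example, a typical session runs the query and then issues the <code class="ph codeph">SUMMARY</code>
+      command. (The output below is an illustrative sketch; the exact columns and values vary by release,
+      cluster size, and data volume.)
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; select count(*) from customer_address;
+...
+[localhost:21000] &gt; summary;
++--------------+--------+----------+----------+--------+------------+
+| Operator     | #Hosts | Avg Time | Max Time | #Rows  | Est. #Rows |
++--------------+--------+----------+----------+--------+------------+
+| 03:AGGREGATE | 1      | 1.03ms   | 1.03ms   | 1      | 1          |
+| 02:EXCHANGE  | 1      | 0ns      | 0ns      | 4      | 1          |
+| 01:AGGREGATE | 4      | 30.79ms  | 39.14ms  | 4      | 1          |
+| 00:SCAN HDFS | 4      | 409.33ms | 672.15ms | 50.00K | -1         |
++--------------+--------+----------+----------+--------+------------+
+</code></pre>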
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ Also, see <a class="xref" href="impala_hbase.html#hbase_performance">Performance Considerations for the Impala-HBase Integration</a>
+ and <a class="xref" href="impala_s3.html#s3_performance">Understanding and Tuning Impala Query Performance for S3 Data</a>
+ for examples of interpreting
+ <code class="ph codeph">EXPLAIN</code> plans for queries against HBase tables
+ <span class="ph">and data stored in the Amazon Simple Storage System (S3)</span>.
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="explain_plan__perf_explain">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Using the EXPLAIN Plan for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_explain.html#explain">EXPLAIN</a></code> statement gives you an outline
+ of the logical steps that a query will perform, such as how the work will be distributed among the nodes
+ and how intermediate results will be combined to produce the final result set. Because these details
+ are available before you actually run the query, you can use them to check that the query will not
+ operate in some unexpected or inefficient way.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > explain select count(*) from customer_address;
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=42.00MB VCores=1 |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [default.customer_address] |
+| partitions=1/1 size=5.25MB |
++----------------------------------------------------------+
+</code></pre>
+
+ <div class="p">
+ Read the <code class="ph codeph">EXPLAIN</code> plan from bottom to top:
+ <ul class="ul">
+ <li class="li">
+ The last part of the plan shows the low-level details such as the expected amount of data that will be
+ read, where you can judge the effectiveness of your partitioning strategy and estimate how long it will
+ take to scan a table based on total data size and the size of the cluster.
+ </li>
+
+ <li class="li">
+ As you work your way up, next you see the operations that will be parallelized and performed on each
+ Impala node.
+ </li>
+
+ <li class="li">
+ At the higher levels, you see how data flows when intermediate result sets are combined and transmitted
+ from one node to another.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="../shared/../topics/impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the
+ <code class="ph codeph">EXPLAIN_LEVEL</code> query option, which lets you customize how much detail to show in the
+ <code class="ph codeph">EXPLAIN</code> plan depending on whether you are doing high-level or low-level tuning,
+ dealing with logical or physical aspects of the query.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN</code> plan is also printed at the beginning of the query profile report described in
+ <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining both the logical and physical aspects of the
+ query side-by-side.
+ </p>
+
+ <p class="p">
+ The amount of detail displayed in the <code class="ph codeph">EXPLAIN</code> output is controlled by the
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option. You typically
+ increase this setting from <code class="ph codeph">standard</code> to <code class="ph codeph">extended</code> (or from <code class="ph codeph">1</code>
+ to <code class="ph codeph">2</code>) when double-checking the presence of table and column statistics during performance
+ tuning, or when estimating query resource usage in conjunction with the resource management features.
+ </p>
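+
+ <p class="p">
+ As a sketch, using the <code class="ph codeph">customer_address</code> table from the earlier example, you
+ might temporarily raise the detail level while double-checking statistics, then restore the default
+ afterward. (The exact plan output is elided here.)
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > set explain_level=extended;
+[impalad-host:21000] > explain select count(*) from customer_address;
+...
+[impalad-host:21000] > set explain_level=standard;
+</code></pre>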
+
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="explain_plan__perf_summary">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Using the SUMMARY Report for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph"><a class="xref" href="impala_shell_commands.html#shell_commands">SUMMARY</a></code> command within
+ the <span class="keyword cmdname">impala-shell</span> interpreter gives you an easy-to-digest overview of the timings for the
+ different phases of execution for a query. Like the <code class="ph codeph">EXPLAIN</code> plan, it makes
+ potential performance bottlenecks easy to spot. Like the <code class="ph codeph">PROFILE</code> output, it is
+ available only after the query is run, and so displays actual timing numbers.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SUMMARY</code> report is also printed at the beginning of the query profile report described
+ in <a class="xref" href="#perf_profile">Using the Query Profile for Performance Tuning</a>, for convenience in examining high-level and low-level aspects of the query
+ side-by-side.
+ </p>
+
+ <p class="p">
+ For example, here is a query involving an aggregate function, on a single-node VM. The different stages of
+ the query and their timings are shown (rolled up for all nodes), along with estimated and actual values
+ used in planning the query. In this case, the <code class="ph codeph">AVG()</code> function is computed for a subset of
+ data on each node (stage 01) and then the aggregated results from all nodes are combined at the end (stage
+ 03). You can see which stages took the most time, and whether any estimates were substantially different
+ than the actual data distribution. (When examining the time values, be sure to consider the suffixes such
+ as <code class="ph codeph">us</code> for microseconds and <code class="ph codeph">ms</code> for milliseconds, rather than just looking
+ for the largest numbers.)
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select avg(ss_sales_price) from store_sales where ss_coupon_amt = 0;
++---------------------+
+| avg(ss_sales_price) |
++---------------------+
+| 37.80770926328327 |
++---------------------+
+[localhost:21000] > summary;
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+| 03:AGGREGATE | 1 | 1.03ms | 1.03ms | 1 | 1 | 48.00 KB | -1 B | MERGE FINALIZE |
+| 02:EXCHANGE | 1 | 0ns | 0ns | 1 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 01:AGGREGATE | 1 | 30.79ms | 30.79ms | 1 | 1 | 80.00 KB | 10.00 MB | |
+| 00:SCAN HDFS | 1 | 5.45s | 5.45s | 2.21M | -1 | 64.05 MB | 432.00 MB | tpc.store_sales |
++--------------+--------+----------+----------+-------+------------+----------+---------------+-----------------+
+</code></pre>
+
+ <p class="p">
+ Notice how the longest initial phase of the query is measured in seconds (s), while later phases working on
+ smaller intermediate results are measured in milliseconds (ms) or even nanoseconds (ns).
+ </p>
+
+ <p class="p">
+ Here is an example from a more complicated query, as it would appear in the <code class="ph codeph">PROFILE</code>
+ output:
+ </p>
+
+<pre class="pre codeblock"><code>Operator #Hosts Avg Time Max Time #Rows Est. #Rows Peak Mem Est. Peak Mem Detail
+------------------------------------------------------------------------------------------------------------------------
+09:MERGING-EXCHANGE 1 79.738us 79.738us 5 5 0 -1.00 B UNPARTITIONED
+05:TOP-N 3 84.693us 88.810us 5 5 12.00 KB 120.00 B
+04:AGGREGATE 3 5.263ms 6.432ms 5 5 44.00 KB 10.00 MB MERGE FINALIZE
+08:AGGREGATE 3 16.659ms 27.444ms 52.52K 600.12K 3.20 MB 15.11 MB MERGE
+07:EXCHANGE 3 2.644ms 5.1ms 52.52K 600.12K 0 0 HASH(o_orderpriority)
+03:AGGREGATE 3 342.913ms 966.291ms 52.52K 600.12K 10.80 MB 15.11 MB
+02:HASH JOIN 3 2s165ms 2s171ms 144.87K 600.12K 13.63 MB 941.01 KB INNER JOIN, BROADCAST
+|--06:EXCHANGE 3 8.296ms 8.692ms 57.22K 15.00K 0 0 BROADCAST
+| 01:SCAN HDFS 2 1s412ms 1s978ms 57.22K 15.00K 24.21 MB 176.00 MB tpch.orders o
+00:SCAN HDFS 3 8s032ms 8s558ms 3.79M 600.12K 32.29 MB 264.00 MB tpch.lineitem l
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="explain_plan__perf_profile">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Using the Query Profile for Performance Tuning</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">PROFILE</code> command, available in the <span class="keyword cmdname">impala-shell</span> interpreter,
+ produces a detailed low-level report showing how the most recent query was executed. Unlike the
+ <code class="ph codeph">EXPLAIN</code> plan described in <a class="xref" href="#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a>, this information is only available
+ after the query has finished. It shows physical details such as the number of bytes read, maximum memory
+ usage, and so on for each node. You can use this information to determine if the query is I/O-bound or
+ CPU-bound, whether some network condition is imposing a bottleneck, whether a slowdown is affecting some
+ nodes but not others, and to check that recommended configuration settings such as short-circuit local
+ reads are in effect.
+ </p>
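+
+ <p class="p">
+ For example, to confirm that short-circuit local reads are in effect, look for counters such as the
+ following in the <code class="ph codeph">HDFS_SCAN_NODE</code> sections of the profile. If
+ <code class="ph codeph">BytesReadShortCircuit</code> is close to <code class="ph codeph">BytesRead</code>,
+ short-circuit reads are being used. (The values shown are illustrative.)
+ </p>
+
+<pre class="pre codeblock"><code>      - BytesRead: 960.00 KB
+      - BytesReadLocal: 960.00 KB
+      - BytesReadShortCircuit: 960.00 KB
+</code></pre>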
+
+ <p class="p">
+ By default, time values in the profile output reflect the wall-clock time taken by an operation.
+ For values denoting system time or user time, the type of time being measured is reflected in the metric
+ name, such as <code class="ph codeph">ScannerThreadsSysTime</code> or <code class="ph codeph">ScannerThreadsUserTime</code>.
+ For example, a multi-threaded I/O operation might show a small figure for wall-clock time,
+ while the corresponding system time is larger, representing the sum of the CPU time taken by each thread.
+ Or a wall-clock time figure might be larger because it counts time spent waiting, while
+ the corresponding system and user time figures only measure the time while the operation
+ is actively using CPU cycles.
+ </p>
+
+ <p class="p">
+ The <a class="xref" href="impala_explain_plan.html#perf_explain"><code class="ph codeph">EXPLAIN</code> plan</a> is also printed
+ at the beginning of the query profile report, for convenience in examining both the logical and physical
+ aspects of the query side-by-side. The
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL</a> query option also controls the
+ verbosity of the <code class="ph codeph">EXPLAIN</code> output printed by the <code class="ph codeph">PROFILE</code> command.
+ </p>
+
+
+
+ <p class="p">
+ Here is an example of a query profile, from a relatively straightforward query on a single-node
+ pseudo-distributed cluster, to keep the output brief.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > profile;
+Query Runtime Profile:
+Query (id=6540a03d4bee0691:4963d6269b210ebd):
+ Summary:
+ Session ID: ea4a197f1c7bf858:c74e66f72e3a33ba
+ Session Type: BEESWAX
+ Start Time: 2013-12-02 17:10:30.263067000
+ End Time: 2013-12-02 17:10:50.932044000
+ Query Type: QUERY
+ Query State: FINISHED
+ Query Status: OK
+ Impala Version: impalad version 1.2.1 RELEASE (build edb5af1bcad63d410bc5d47cc203df3a880e9324)
+ User: doc_demo
+ Network Address: 127.0.0.1:49161
+ Default Db: stats_testing
+ Sql Statement: select t1.s, t2.s from t1 join t2 on (t1.id = t2.parent)
+ Plan:
+----------------
+Estimated Per-Host Requirements: Memory=2.09GB VCores=2
+
+PLAN FRAGMENT 0
+ PARTITION: UNPARTITIONED
+
+ 4:EXCHANGE
+ cardinality: unavailable
+ per-host memory: unavailable
+ tuple ids: 0 1
+
+PLAN FRAGMENT 1
+ PARTITION: RANDOM
+
+ STREAM DATA SINK
+ EXCHANGE ID: 4
+ UNPARTITIONED
+
+ 2:HASH JOIN
+ | join op: INNER JOIN (BROADCAST)
+ | hash predicates:
+ | t1.id = t2.parent
+ | cardinality: unavailable
+ | per-host memory: 2.00GB
+ | tuple ids: 0 1
+ |
+ |----3:EXCHANGE
+ | cardinality: unavailable
+ | per-host memory: 0B
+ | tuple ids: 1
+ |
+ 0:SCAN HDFS
+ table=stats_testing.t1 #partitions=1/1 size=33B
+ table stats: unavailable
+ column stats: unavailable
+ cardinality: unavailable
+ per-host memory: 32.00MB
+ tuple ids: 0
+
+PLAN FRAGMENT 2
+ PARTITION: RANDOM
+
+ STREAM DATA SINK
+ EXCHANGE ID: 3
+ UNPARTITIONED
+
+ 1:SCAN HDFS
+ table=stats_testing.t2 #partitions=1/1 size=960.00KB
+ table stats: unavailable
+ column stats: unavailable
+ cardinality: unavailable
+ per-host memory: 96.00MB
+ tuple ids: 1
+----------------
+ Query Timeline: 20s670ms
+ - Start execution: 2.559ms (2.559ms)
+ - Planning finished: 23.587ms (21.27ms)
+ - Rows available: 666.199ms (642.612ms)
+ - First row fetched: 668.919ms (2.719ms)
+ - Unregister query: 20s668ms (20s000ms)
+ ImpalaServer:
+ - ClientFetchWaitTimer: 19s637ms
+ - RowMaterializationTimer: 167.121ms
+ Execution Profile 6540a03d4bee0691:4963d6269b210ebd:(Active: 837.815ms, % non-child: 0.00%)
+ Per Node Peak Memory Usage: impala-1.example.com:22000(7.42 MB)
+ - FinalizationTimer: 0ns
+ Coordinator Fragment:(Active: 195.198ms, % non-child: 0.00%)
+ MemoryUsage(500.0ms): 16.00 KB, 7.42 MB, 7.33 MB, 7.10 MB, 6.94 MB, 6.71 MB, 6.56 MB, 6.40 MB, 6.17 MB, 6.02 MB, 5.79 MB, 5.63 MB, 5.48 MB, 5.25 MB, 5.09 MB, 4.86 MB, 4.71 MB, 4.47 MB, 4.32 MB, 4.09 MB, 3.93 MB, 3.78 MB, 3.55 MB, 3.39 MB, 3.16 MB, 3.01 MB, 2.78 MB, 2.62 MB, 2.39 MB, 2.24 MB, 2.08 MB, 1.85 MB, 1.70 MB, 1.54 MB, 1.31 MB, 1.16 MB, 948.00 KB, 790.00 KB, 553.00 KB, 395.00 KB, 237.00 KB
+ ThreadUsage(500.0ms): 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 7.42 MB
+ - PrepareTime: 36.144us
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 20s449ms
+ - TotalNetworkWaitTime: 191.630ms
+ - TotalStorageWaitTime: 0ns
+ CodeGen:(Active: 150.679ms, % non-child: 77.19%)
+ - CodegenTime: 0ns
+ - CompileTime: 139.503ms
+ - LoadTime: 10.7ms
+ - ModuleFileSize: 95.27 KB
+ EXCHANGE_NODE (id=4):(Active: 194.858ms, % non-child: 99.83%)
+ - BytesReceived: 2.33 MB
+ - ConvertRowBatchTime: 2.732ms
+ - DataArrivalWaitTime: 191.118ms
+ - DeserializeRowBatchTimer: 14.943ms
+ - FirstBatchArrivalWaitTime: 191.117ms
+ - PeakMemoryUsage: 7.41 MB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 504.49 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ Averaged Fragment 1:(Active: 442.360ms, % non-child: 0.00%)
+ split sizes: min: 33.00 B, max: 33.00 B, avg: 33.00 B, stddev: 0.00
+ completion times: min:443.720ms max:443.720ms mean: 443.720ms stddev:0ns
+ execution rates: min:74.00 B/sec max:74.00 B/sec mean:74.00 B/sec stddev:0.00 /sec
+ num instances: 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 6.06 MB
+ - PrepareTime: 7.291ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 784.259ms
+ - TotalNetworkWaitTime: 388.818ms
+ - TotalStorageWaitTime: 3.934ms
+ CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+ - CodegenTime: 2.669ms
+ - CompileTime: 302.467ms
+ - LoadTime: 9.231ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+ - BytesSent: 2.33 MB
+ - NetworkThroughput(*): 35.89 MB/sec
+ - OverallThroughput: 29.06 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 26.487ms
+ - ThriftTransmitTime(*): 64.814ms
+ - UncompressedRowBatchSize: 6.66 MB
+ HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+ - BuildBuckets: 1.02K (1024)
+ - BuildRows: 98.30K (98304)
+ - BuildTime: 12.622ms
+ - LoadFactor: 0.00
+ - PeakMemoryUsage: 6.02 MB
+ - ProbeRows: 3
+ - ProbeTime: 3.579ms
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 271.54 K/sec
+ EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+ - BytesReceived: 1.15 MB
+ - ConvertRowBatchTime: 2.792ms
+ - DataArrivalWaitTime: 339.936ms
+ - DeserializeRowBatchTimer: 9.910ms
+ - FirstBatchArrivalWaitTime: 199.474ms
+ - PeakMemoryUsage: 156.00 KB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 285.20 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 33.00 B
+ - BytesReadLocal: 33.00 B
+ - BytesReadShortCircuit: 33.00 B
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 46.00 KB
+ - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+ - RowsRead: 3
+ - RowsReturned: 3
+ - RowsReturnedRate: 220.33 K/sec
+ - ScanRangesComplete: 1
+ - ScannerThreadsInvoluntaryContextSwitches: 26
+ - ScannerThreadsTotalWallClockTime: 55.199ms
+ - DelimiterParseTime: 2.463us
+ - MaterializeTupleTime(*): 1.226us
+ - ScannerThreadsSysTime: 0ns
+ - ScannerThreadsUserTime: 42.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 1
+ - TotalRawHdfsReadTime(*): 112.86us
+ - TotalReadThroughput: 0.00 /sec
+ Averaged Fragment 2:(Active: 190.120ms, % non-child: 0.00%)
+ split sizes: min: 960.00 KB, max: 960.00 KB, avg: 960.00 KB, stddev: 0.00
+ completion times: min:191.736ms max:191.736ms mean: 191.736ms stddev:0ns
+ execution rates: min:4.89 MB/sec max:4.89 MB/sec mean:4.89 MB/sec stddev:0.00 /sec
+ num instances: 1
+ - AverageThreadTokens: 0.00
+ - PeakMemoryUsage: 906.33 KB
+ - PrepareTime: 3.67ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 403.351ms
+ - TotalNetworkWaitTime: 34.999ms
+ - TotalStorageWaitTime: 108.675ms
+ CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+ - CodegenTime: 3.133ms
+ - CompileTime: 148.316ms
+ - LoadTime: 12.317ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+ - BytesSent: 1.15 MB
+ - NetworkThroughput(*): 23.30 MB/sec
+ - OverallThroughput: 16.23 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 22.69ms
+ - ThriftTransmitTime(*): 49.178ms
+ - UncompressedRowBatchSize: 3.28 MB
+ HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 960.00 KB
+ - BytesReadLocal: 960.00 KB
+ - BytesReadShortCircuit: 960.00 KB
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 869.00 KB
+ - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+ - RowsRead: 98.30K (98304)
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 827.20 K/sec
+ - ScanRangesComplete: 15
+ - ScannerThreadsInvoluntaryContextSwitches: 34
+ - ScannerThreadsTotalWallClockTime: 189.774ms
+ - DelimiterParseTime: 15.703ms
+ - MaterializeTupleTime(*): 3.419ms
+ - ScannerThreadsSysTime: 1.999ms
+ - ScannerThreadsUserTime: 44.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 118
+ - TotalRawHdfsReadTime(*): 7.199ms
+ - TotalReadThroughput: 0.00 /sec
+ Fragment 1:
+ Instance 6540a03d4bee0691:4963d6269b210ebf (host=impala-1.example.com:22000):(Active: 442.360ms, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/33.00 B
+ MemoryUsage(500.0ms): 69.33 KB
+ ThreadUsage(500.0ms): 1
+ - AverageThreadTokens: 1.00
+ - PeakMemoryUsage: 6.06 MB
+ - PrepareTime: 7.291ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 784.259ms
+ - TotalNetworkWaitTime: 388.818ms
+ - TotalStorageWaitTime: 3.934ms
+ CodeGen:(Active: 312.862ms, % non-child: 70.73%)
+ - CodegenTime: 2.669ms
+ - CompileTime: 302.467ms
+ - LoadTime: 9.231ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=4):(Active: 80.63ms, % non-child: 18.10%)
+ - BytesSent: 2.33 MB
+ - NetworkThroughput(*): 35.89 MB/sec
+ - OverallThroughput: 29.06 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 26.487ms
+ - ThriftTransmitTime(*): 64.814ms
+ - UncompressedRowBatchSize: 6.66 MB
+ HASH_JOIN_NODE (id=2):(Active: 362.25ms, % non-child: 3.92%)
+ ExecOption: Build Side Codegen Enabled, Probe Side Codegen Enabled, Hash Table Built Asynchronously
+ - BuildBuckets: 1.02K (1024)
+ - BuildRows: 98.30K (98304)
+ - BuildTime: 12.622ms
+ - LoadFactor: 0.00
+ - PeakMemoryUsage: 6.02 MB
+ - ProbeRows: 3
+ - ProbeTime: 3.579ms
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 271.54 K/sec
+ EXCHANGE_NODE (id=3):(Active: 344.680ms, % non-child: 77.92%)
+ - BytesReceived: 1.15 MB
+ - ConvertRowBatchTime: 2.792ms
+ - DataArrivalWaitTime: 339.936ms
+ - DeserializeRowBatchTimer: 9.910ms
+ - FirstBatchArrivalWaitTime: 199.474ms
+ - PeakMemoryUsage: 156.00 KB
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 285.20 K/sec
+ - SendersBlockedTimer: 0ns
+ - SendersBlockedTotalTimer(*): 0ns
+ HDFS_SCAN_NODE (id=0):(Active: 13.616us, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:1/33.00 B
+ Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+ File Formats: TEXT/NONE:1
+ ExecOption: Codegen enabled: 1 out of 1
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 33.00 B
+ - BytesReadLocal: 33.00 B
+ - BytesReadShortCircuit: 33.00 B
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 46.00 KB
+ - PerReadThreadRawHdfsThroughput: 287.52 KB/sec
+ - RowsRead: 3
+ - RowsReturned: 3
+ - RowsReturnedRate: 220.33 K/sec
+ - ScanRangesComplete: 1
+ - ScannerThreadsInvoluntaryContextSwitches: 26
+ - ScannerThreadsTotalWallClockTime: 55.199ms
+ - DelimiterParseTime: 2.463us
+ - MaterializeTupleTime(*): 1.226us
+ - ScannerThreadsSysTime: 0ns
+ - ScannerThreadsUserTime: 42.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 1
+ - TotalRawHdfsReadTime(*): 112.86us
+ - TotalReadThroughput: 0.00 /sec
+ Fragment 2:
+ Instance 6540a03d4bee0691:4963d6269b210ec0 (host=impala-1.example.com:22000):(Active: 190.120ms, % non-child: 0.00%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:15/960.00 KB
+ - AverageThreadTokens: 0.00
+ - PeakMemoryUsage: 906.33 KB
+ - PrepareTime: 3.67ms
+ - RowsProduced: 98.30K (98304)
+ - TotalCpuTime: 403.351ms
+ - TotalNetworkWaitTime: 34.999ms
+ - TotalStorageWaitTime: 108.675ms
+ CodeGen:(Active: 162.57ms, % non-child: 85.24%)
+ - CodegenTime: 3.133ms
+ - CompileTime: 148.316ms
+ - LoadTime: 12.317ms
+ - ModuleFileSize: 95.27 KB
+ DataStreamSender (dst_id=3):(Active: 70.620ms, % non-child: 37.14%)
+ - BytesSent: 1.15 MB
+ - NetworkThroughput(*): 23.30 MB/sec
+ - OverallThroughput: 16.23 MB/sec
+ - PeakMemoryUsage: 5.33 KB
+ - SerializeBatchTime: 22.69ms
+ - ThriftTransmitTime(*): 49.178ms
+ - UncompressedRowBatchSize: 3.28 MB
+ HDFS_SCAN_NODE (id=1):(Active: 118.839ms, % non-child: 62.51%)
+ Hdfs split stats (<volume id>:<# splits>/<split lengths>): 0:15/960.00 KB
+ Hdfs Read Thread Concurrency Bucket: 0:0% 1:0%
+ File Formats: TEXT/NONE:15
+ ExecOption: Codegen enabled: 15 out of 15
+ - AverageHdfsReadThreadConcurrency: 0.00
+ - AverageScannerThreadConcurrency: 0.00
+ - BytesRead: 960.00 KB
+ - BytesReadLocal: 960.00 KB
+ - BytesReadShortCircuit: 960.00 KB
+ - NumDisksAccessed: 1
+ - NumScannerThreadsStarted: 1
+ - PeakMemoryUsage: 869.00 KB
+ - PerReadThreadRawHdfsThroughput: 130.21 MB/sec
+ - RowsRead: 98.30K (98304)
+ - RowsReturned: 98.30K (98304)
+ - RowsReturnedRate: 827.20 K/sec
+ - ScanRangesComplete: 15
+ - ScannerThreadsInvoluntaryContextSwitches: 34
+ - ScannerThreadsTotalWallClockTime: 189.774ms
+ - DelimiterParseTime: 15.703ms
+ - MaterializeTupleTime(*): 3.419ms
+ - ScannerThreadsSysTime: 1.999ms
+ - ScannerThreadsUserTime: 44.993ms
+ - ScannerThreadsVoluntaryContextSwitches: 118
+ - TotalRawHdfsReadTime(*): 7.199ms
+ - TotalReadThroughput: 0.00 /sec</code></pre>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_faq.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_faq.html b/docs/build3x/html/topics/impala_faq.html
new file mode 100644
index 0000000..ce2ca4c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_faq.html
@@ -0,0 +1,21 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="faq"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Frequently Asked Questions</title></head><body id="faq"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Frequently Asked Questions</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists frequently asked questions for Apache Impala,
+ the interactive SQL engine for Hadoop.
+ </p>
+
+ <p class="p">
+ This section is under construction.
+ </p>
+
+ </div>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_file_formats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_file_formats.html b/docs/build3x/html/topics/impala_file_formats.html
new file mode 100644
index 0000000..c24d6a6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_file_formats.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_txtfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avro.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_rcfile.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_seqfile.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="file_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Works with Hado
op File Formats</title></head><body id="file_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">How Impala Works with Hadoop File Formats</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Impala supports several familiar file formats used in Apache Hadoop. Impala can load and query data files
+ produced by other Hadoop components such as Pig or MapReduce, and data files produced by Impala can also
+ be used by other components. The following sections discuss the procedures, limitations, and performance
+ considerations for using each file format with Impala.
+ </p>
+
+ <p class="p">
+ The file format used for an Impala table has significant performance consequences. Some file formats include
+ compression support that affects the size of data on disk and, consequently, the amount of I/O and CPU
+ resources required to deserialize the data. Because query processing typically begins by reading and
+ decompressing data, the I/O and CPU resources required can be a limiting factor in query performance.
+ Compressing the data reduces the total number of bytes transferred from disk to memory, which shortens the
+ transfer time, at the cost of the extra CPU work needed to decompress the content.
+ </p>
+
+ <p class="p">
+ Impala can query files encoded with most of the popular file formats and compression codecs used in Hadoop.
+ Impala can create tables in, and insert data into, some file formats but not others; for file formats
+ that Impala cannot write to, create the table in Hive, issue the <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ statement in <code class="ph codeph">impala-shell</code>, and query the table through Impala. Some file formats are
+ structured, meaning they include metadata and built-in compression. Supported formats include:
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">File Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="file_formats__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="file_formats__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row" id="file_formats__parquet_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip; currently Snappy by default
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ </td>
+ </tr>
+ <tr class="row" id="file_formats__txtfile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_txtfile.html#txtfile">Text</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Unstructured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ LZO, gzip, bzip2, Snappy
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes. For <code class="ph codeph">CREATE TABLE</code> with no <code class="ph codeph">STORED AS</code> clause, the default file
+ format is uncompressed text, with values separated by ASCII <code class="ph codeph">0x01</code> characters
+ (typically represented as Ctrl-A).
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ If LZO compression is used, you must create the table and load data in Hive. If other kinds of
+ compression are used, you must load data through <code class="ph codeph">LOAD DATA</code>, Hive, or manually in
+ HDFS.
+
+
+ </td>
+ </tr>
+ <tr class="row" id="file_formats__avro_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_avro.html#avro">Avro</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ <tr class="row" id="file_formats__rcfile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ <tr class="row" id="file_formats__sequencefile_support">
+ <td class="entry nocellnorowborder" headers="file_formats__entry__1 ">
+ <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__4 ">Yes.</td>
+ <td class="entry nocellnorowborder" headers="file_formats__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p">
+ Impala can only query the file formats listed in the preceding table.
+ In particular, Impala does not support the ORC file format.
+ </p>
+
+ <p class="p">
+ Impala supports the following compression codecs:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Snappy. Recommended for its effective balance between compression ratio and decompression speed. Snappy
+ compression is very fast, but gzip provides greater space savings. Supported for text files in Impala 2.0
+ and higher.
+
+ </li>
+
+ <li class="li">
+ Gzip. Recommended when achieving the highest level of compression (and therefore greatest disk-space
+ savings) is desired. Supported for text files in Impala 2.0 and higher.
+ </li>
+
+ <li class="li">
+ Deflate. Not supported for text files.
+ </li>
+
+ <li class="li">
+ Bzip2. Supported for text files in Impala 2.0 and higher.
+
+ </li>
+
+ <li class="li">
+ <p class="p"> LZO, for text files only. Impala can query
+ LZO-compressed text tables, but currently cannot create them or insert
+ data into them; perform these operations in Hive. </p>
+ </li>
+ </ul>
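+
+      <p class="p">
+        For example, in <span class="keyword">Impala 2.0</span> and higher, a gzip-compressed text file can be
+        queried as soon as it is placed in the table's data directory and the table metadata is refreshed.
+        (The table, database, and file names in this sketch are hypothetical.)
+      </p>
+
+<pre class="pre codeblock"><code>$ gzip -k measurements.csv
+$ hdfs dfs -put measurements.csv.gz /user/hive/warehouse/logs.db/measurements/
+[localhost:21000] > refresh measurements;
+[localhost:21000] > select count(*) from measurements;</code></pre>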
+ </div>
+
+ <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_txtfile.html">Using Text Data Files with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet.html">Using the Parquet File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avro.html">Using the Avro File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_rcfile.html">Using the RCFile File Format with Impala Tables</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_seqfile.html">Using the SequenceFile File Format with Impala Tables</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="file_formats__file_format_choosing">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Choosing the File Format for a Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Different file formats and compression codecs work better for different data sets. While Impala typically
+ provides performance gains regardless of file format, choosing the proper format for your data can yield
+ further performance improvements. Use the following considerations to decide which combination of file
+ format and compression to use for a particular table:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If you are working with existing files that are already in a supported file format, use the same format
+ for the Impala table where practical. If the original format does not yield acceptable query performance
+ or resource usage, consider creating a new Impala table with different file format or compression
+ characteristics, and doing a one-time conversion by copying the data to the new table using the
+ <code class="ph codeph">INSERT</code> statement. Depending on the file format, you might run the
+ <code class="ph codeph">INSERT</code> statement in <code class="ph codeph">impala-shell</code> or in Hive.
+ </li>
+
+ <li class="li">
+ Text files are convenient to produce through many different tools, and are human-readable for ease of
+ verification and debugging. Those characteristics are why text is the default format for an Impala
+ <code class="ph codeph">CREATE TABLE</code> statement. When performance and resource usage are the primary
+ considerations, use one of the other file formats and consider using compression. A typical workflow
+ might involve bringing data into an Impala table by copying CSV or TSV files into the appropriate data
+ directory, and then using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy the data into a table
+ using a different, more compact file format.
+ </li>
+
+ <li class="li">
+        If your architecture involves storing data to be queried in memory, do not compress the data. There are no
+        I/O savings because the data does not need to be read from disk, but there is a CPU cost to decompress the
+ data.
+ </li>
+ </ul>
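+
+      <p class="p">
+        The CSV-to-Parquet conversion workflow described above might look like the following
+        sketch, using hypothetical table and column names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Staging table in the default text format, matching the incoming CSV files.
+create table staging_events (event_id bigint, event_name string)
+  row format delimited fields terminated by ',';
+
+-- After copying the CSV files into the table directory and running
+-- REFRESH staging_events, do a one-time conversion to a compact binary format.
+create table events stored as parquet as select * from staging_events;</code></pre>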
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_live_summary.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_live_summary.html b/docs/build3x/html/topics/impala_live_summary.html
new file mode 100644
index 0000000..d0792a1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_live_summary.html
@@ -0,0 +1,177 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_summary"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</title></head><body id="live_summary"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIVE_SUMMARY Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+ displays the same output as the <code class="ph codeph">SUMMARY</code> command,
+ with the measurements updated in real time as the query progresses.
+ When the query finishes, the final <code class="ph codeph">SUMMARY</code> output remains
+ visible in the <span class="keyword cmdname">impala-shell</span> console output.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+      any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Command-line equivalent:</strong>
+ </p>
+ <p class="p">
+ You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+ by starting the shell with the <code class="ph codeph">--live_summary</code>
+ command-line option.
+ You can still turn this setting off and on again within the shell through the
+ <code class="ph codeph">SET</code> command.
+ </p>
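+    <p class="p">
+      For example, the following hypothetical session starts the shell with the option
+      enabled, then turns it off again partway through:
+    </p>
+<pre class="pre codeblock"><code>$ impala-shell --live_summary
+[localhost:21000] > select count(*) from huge_table;
+...live summary table updates here while the query runs...
+[localhost:21000] > set live_summary=false;</code></pre>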
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+      The live summary output can be useful for monitoring long-running queries,
+      to see which phase of execution takes up the most time, or whether some hosts
+      take much longer than others for certain operations, dragging down overall performance.
+ By making the information available in real time, this feature lets you decide what
+ action to take even before you cancel a query that is taking much longer than normal.
+ </p>
+ <p class="p">
+ For example, you might see the HDFS scan phase taking a long time, and therefore revisit
+ performance-related aspects of your schema design such as constructing a partitioned table,
+ switching to the Parquet file format, running the <code class="ph codeph">COMPUTE STATS</code> statement
+ for the table, and so on.
+ Or you might see a wide variation between the average and maximum times for all hosts to
+ perform some phase of the query, and therefore investigate if one particular host
+ needed more memory or was experiencing a network problem.
+ </p>
+ <p class="p">
+ The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+ that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+ </p>
+ <p class="p">
+ For a simple and concise way of tracking the progress of an interactive query, see
+ <a class="xref" href="impala_live_progress.html#live_progress">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+ </p>
+ <div class="p">
+ Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+ statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+ command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+ shell-only options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+ system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+ The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+ query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+ systems where you intend to use these query options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+ some information only available in <span class="keyword">Impala 2.3</span> and higher
+ to prepare live progress reports and query summaries. The
+ <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+ query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+ to a cluster running an older version of Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a series of <code class="ph codeph">LIVE_SUMMARY</code> reports that
+      are displayed during the course of a query, showing how the numbers increase as
+      the different phases of the distributed query progress. When you do the same
+ in <span class="keyword cmdname">impala-shell</span>, only a single report is displayed at any one time,
+ with each update overwriting the previous numbers.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set live_summary=true;
+LIVE_SUMMARY set to true
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 0 | 0ns | 0ns | 0 | 22.50B | 0 B | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 0 | 0ns | 0ns | 0 | 150.00K | 0 B | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 17.62s | 17.62s | 81.14M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.53ms | 247.53ms | 1.02K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| Operator | #Hosts | Avg Time | Max Time | #Rows | Est. #Rows | Peak Mem | Est. Peak Mem | Detail |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+| 06:AGGREGATE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | FINALIZE |
+| 05:EXCHANGE | 0 | 0ns | 0ns | 0 | 1 | 0 B | -1 B | UNPARTITIONED |
+| 03:AGGREGATE | 1 | 0ns | 0ns | 0 | 1 | 20.00 KB | 10.00 MB | |
+| 02:NESTED LOOP JOIN | 1 | 61.85s | 61.85s | 283.43M | 22.50B | 3.23 MB | 0 B | CROSS JOIN, BROADCAST |
+| |--04:EXCHANGE | 1 | 26.29ms | 26.29ms | 150.00K | 150.00K | 0 B | 0 B | BROADCAST |
+| | 01:SCAN HDFS | 1 | 503.57ms | 503.57ms | 150.00K | 150.00K | 24.09 MB | 64.00 MB | tpch.customer t2 |
+| 00:SCAN HDFS | 1 | 247.59ms | 247.59ms | 2.05K | 150.00K | 24.39 MB | 64.00 MB | tpch.customer t1 |
++---------------------+--------+----------+----------+---------+------------+----------+---------------+-----------------------+
+
+</code></pre>
+
+
+
+
+ <p class="p">
+ To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_load_data.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_load_data.html b/docs/build3x/html/topics/impala_load_data.html
new file mode 100644
index 0000000..82f689f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_load_data.html
@@ -0,0 +1,322 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="load_data"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LOAD DATA Statement</title></head><body id="load_data"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LOAD DATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">LOAD DATA</code> statement streamlines the ETL process for an internal Impala table by moving a
+ data file or all the data files in a directory from an HDFS location into the Impala data directory for that
+ table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LOAD DATA INPATH '<var class="keyword varname">hdfs_file_or_directory_path</var>' [OVERWRITE] INTO TABLE <var class="keyword varname">tablename</var>
+ [PARTITION (<var class="keyword varname">partcol1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">partcol2</var>=<var class="keyword varname">val2</var> ...)]</code></pre>
+
+ <p class="p">
+ When the <code class="ph codeph">LOAD DATA</code> statement operates on a partitioned table,
+      it always operates on one partition at a time. Specify the <code class="ph codeph">PARTITION</code> clause
+ and list all the partition key columns, with a constant value specified for each.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DML (but still affected by
+ <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL</a> query option)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The loaded data files are moved, not copied, into the Impala data directory.
+ </li>
+
+ <li class="li">
+ You can specify the HDFS path of a single file to be moved, or the HDFS path of a directory to move all the
+ files inside that directory. You cannot specify any sort of wildcard to take only some of the files from a
+ directory. When loading a directory full of data files, keep all the data files at the top level, with no
+ nested directories underneath.
+ </li>
+
+ <li class="li">
+ Currently, the Impala <code class="ph codeph">LOAD DATA</code> statement only imports files from HDFS, not from the local
+ filesystem. It does not support the <code class="ph codeph">LOCAL</code> keyword of the Hive <code class="ph codeph">LOAD DATA</code>
+ statement. You must specify a path, not an <code class="ph codeph">hdfs://</code> URI.
+ </li>
+
+ <li class="li">
+ In the interest of speed, only limited error checking is done. If the loaded files have the wrong file
+ format, different columns than the destination table, or other kind of mismatch, Impala does not raise any
+ error for the <code class="ph codeph">LOAD DATA</code> statement. Querying the table afterward could produce a runtime
+ error or unexpected results. Currently, the only checking the <code class="ph codeph">LOAD DATA</code> statement does is
+ to avoid mixing together uncompressed and LZO-compressed text files in the same table.
+ </li>
+
+ <li class="li">
+ When you specify an HDFS directory name as the <code class="ph codeph">LOAD DATA</code> argument, any hidden files in
+ that directory (files whose names start with a <code class="ph codeph">.</code>) are not moved to the Impala data
+ directory.
+ </li>
+
+ <li class="li">
+ The operation fails if the source directory contains any non-hidden directories.
+        Prior to <span class="keyword">Impala 2.5</span>, if the source directory contained any subdirectory, even a hidden one such as
+ <span class="ph filepath">_impala_insert_staging</span>, the <code class="ph codeph">LOAD DATA</code> statement would fail.
+ In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">LOAD DATA</code> ignores hidden subdirectories in the
+ source directory, and only fails if any of the subdirectories are non-hidden.
+ </li>
+
+ <li class="li">
+ The loaded data files retain their original names in the new location, unless a name conflicts with an
+ existing data file, in which case the name of the new file is modified slightly to be unique. (The
+ name-mangling is a slight difference from the Hive <code class="ph codeph">LOAD DATA</code> statement, which replaces
+ identically named files.)
+ </li>
+
+ <li class="li">
+ By providing an easy way to transport files from known locations in HDFS into the Impala data directory
+ structure, the <code class="ph codeph">LOAD DATA</code> statement lets you avoid memorizing the locations and layout of
+        the HDFS directory tree containing the Impala databases and tables. (For a quick way to check the location of
+ the data files for an Impala table, issue the statement <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code>.)
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">PARTITION</code> clause is especially convenient for ingesting new data for a partitioned
+ table. As you receive new data for a time period, geographic region, or other division that corresponds to
+ one or more partitioning columns, you can load that data straight into the appropriate Impala data
+ directory, which might be nested several levels down if the table is partitioned by multiple columns. When
+ the table is partitioned, you must specify constant values for all the partitioning columns.
+ </li>
+ </ul>
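+
+    <p class="p">
+      For example, with a hypothetical table partitioned by year and month, a new
+      month of data could be ingested directly into the corresponding partition:
+    </p>
+
+<pre class="pre codeblock"><code>load data inpath '/user/doc_demo/sales_2018_05'
+  into table sales partition (year=2018, month=5);</code></pre>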
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because Impala currently cannot create Parquet data files containing complex types
+ (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>), the
+ <code class="ph codeph">LOAD DATA</code> statement is especially important when working with
+ tables containing complex type columns. You create the Parquet data files outside
+ Impala, then use either <code class="ph codeph">LOAD DATA</code>, an external table, or HDFS-level
+ file operations followed by <code class="ph codeph">REFRESH</code> to associate the data files with
+ the corresponding table.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+ </p>
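+
+    <p class="p">
+      A sketch of that workflow, using hypothetical names, where the Parquet files
+      are produced outside Impala (for example, by Hive or Spark):
+    </p>
+
+<pre class="pre codeblock"><code>create table events (id bigint, tags array&lt;string&gt;) stored as parquet;
+-- Parquet files written by another engine, already in HDFS:
+load data inpath '/user/doc_demo/complex_events' into table events;</code></pre>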
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ First, we use a trivial Python script to write different numbers of strings (one per line) into files stored
+ in the <code class="ph codeph">doc_demo</code> HDFS user account. (Substitute the path for your own HDFS user account when
+ doing <span class="keyword cmdname">hdfs dfs</span> operations like these.)
+ </p>
+
+<pre class="pre codeblock"><code>$ random_strings.py 1000 | hdfs dfs -put - /user/doc_demo/thousand_strings.txt
+$ random_strings.py 100 | hdfs dfs -put - /user/doc_demo/hundred_strings.txt
+$ random_strings.py 10 | hdfs dfs -put - /user/doc_demo/ten_strings.txt</code></pre>
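The `random_strings.py` helper is not shown in this topic; one plausible implementation (the function name and string length here are assumptions) is the following sketch:

```python
#!/usr/bin/env python
# Hypothetical sketch of the random_strings.py helper used above:
# prints N random lowercase strings, one per line, for piping into hdfs dfs -put.
import random
import string
import sys

def random_strings(n, length=12):
    """Return n random lowercase ASCII strings of the given length."""
    return ["".join(random.choice(string.ascii_lowercase) for _ in range(length))
            for _ in range(n)]

if __name__ == "__main__" and len(sys.argv) > 1:
    for s in random_strings(int(sys.argv[1])):
        print(s)
```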
+
+ <p class="p">
+ Next, we create a table and load an initial set of data into it. Remember, unless you specify a
+ <code class="ph codeph">STORED AS</code> clause, Impala tables default to <code class="ph codeph">TEXTFILE</code> format with Ctrl-A (hex
+ 01) as the field delimiter. This example uses a single-column table, so the delimiter is not significant. For
+ large-scale ETL jobs, you would typically use binary format data files such as Parquet or Avro, and load them
+ into Impala tables that use the corresponding file format.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (s string);
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.61s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1000 |
++------+
+Returned 1 row(s) in 0.67s
+[localhost:21000] > load data inpath '/user/doc_demo/thousand_strings.txt' into table t1;
+ERROR: AnalysisException: INPATH location '/user/doc_demo/thousand_strings.txt' does not exist. </code></pre>
+
+ <p class="p">
+ As indicated by the message at the end of the previous example, the data file was moved from its original
+ location. The following example illustrates how the data file was moved into the Impala data directory for
+ the destination table, keeping its original filename:
+ </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -ls /user/hive/warehouse/load_data_testing.db/t1
+Found 1 items
+-rw-r--r-- 1 doc_demo doc_demo 13926 2013-06-26 15:40 /user/hive/warehouse/load_data_testing.db/t1/thousand_strings.txt</code></pre>
+
+ <p class="p">
+ The following example demonstrates the difference between the <code class="ph codeph">INTO TABLE</code> and
+      <code class="ph codeph">OVERWRITE INTO TABLE</code> clauses. The table already contains 1000 rows. After issuing the
+ <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">INTO TABLE</code> clause, the table contains 100 more
+ rows, for a total of 1100. After issuing the <code class="ph codeph">LOAD DATA</code> statement with the <code class="ph codeph">OVERWRITE
+ INTO TABLE</code> clause, the former contents are gone, and now the table only contains the 10 rows from
+ the just-loaded data file.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > load data inpath '/user/doc_demo/hundred_strings.txt' into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.24s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++------+
+| _c0 |
++------+
+| 1100 |
++------+
+Returned 1 row(s) in 0.55s
+[localhost:21000] > load data inpath '/user/doc_demo/ten_strings.txt' overwrite into table t1;
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select count(*) from t1;
+Query finished, fetching results ...
++-----+
+| _c0 |
++-----+
+| 10 |
++-----+
+Returned 1 row(s) in 0.62s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
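+ <p class="p">
+ As a sketch (the table and partition names are hypothetical), you might enable the
+ option for a session before running an <code class="ph codeph">INSERT</code>:
+ </p>
+<pre class="pre codeblock"><code>-- Tradeoff: skips the staging-directory step for a faster INSERT, but a
+-- failure partway through the statement can leave partial data behind.
+SET S3_SKIP_INSERT_STAGING=true;
+INSERT INTO sales_s3 PARTITION (year=2018) SELECT * FROM staging_sales;</code></pre>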
+ <p class="p">See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">ADLS considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
+ <p class="p">See <a class="xref" href="impala_adls.html#adls">Using Impala with the Azure Data Lake Store (ADLS)</a> for details about reading and writing ADLS data with Impala.</p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and write
+ permissions for the files in the source directory, and write
+ permission for the destination directory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is an alternative to the
+ <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement.
+ Use <code class="ph codeph">LOAD DATA</code>
+ when you have the data files in HDFS but outside of any Impala table.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement is also an alternative
+ to the <code class="ph codeph">CREATE EXTERNAL TABLE</code> statement. Use
+ <code class="ph codeph">LOAD DATA</code> when it is appropriate to move the
+ data files under Impala control rather than querying them
+ from their original location. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ for information about working with external tables.
+ </p>
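+ <p class="p">
+ For example (all names here are hypothetical), moving existing HDFS files under the
+ control of an Impala table might look like:
+ </p>
+<pre class="pre codeblock"><code>-- The files in /user/etl/incoming are moved, not copied,
+-- into the directory for the SALES table.
+LOAD DATA INPATH '/user/etl/incoming' INTO TABLE sales;
+-- Confirm the newly loaded rows are visible.
+SELECT COUNT(*) FROM sales;</code></pre>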
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_logging.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_logging.html b/docs/build3x/html/topics/impala_logging.html
new file mode 100644
index 0000000..a7cff05
--- /dev/null
+++ b/docs/build3x/html/topics/impala_logging.html
@@ -0,0 +1,423 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="logging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala Logging</title></head><body id="logging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala Logging</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala logs record information about:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Any errors Impala encountered. If Impala experienced a serious error during startup, you must diagnose and
+ troubleshoot that problem before you can do anything further with Impala.
+ </li>
+
+ <li class="li">
+ How Impala is configured.
+ </li>
+
+ <li class="li">
+ Jobs Impala has completed.
+ </li>
+ </ul>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Formerly, the logs contained the query profile for each query, showing low-level details of how the work is
+ distributed among nodes and how intermediate and final results are transmitted across the network. To save
+ space, those query profiles are now stored in zlib-compressed files in
+ <span class="ph filepath">/var/log/impala/profiles</span>. You can access them through the Impala web user interface.
+ For example, at <code class="ph codeph">http://<var class="keyword varname">impalad-node-hostname</var>:25000/queries</code>, each query
+ is followed by a <code class="ph codeph">Profile</code> link leading to a page showing extensive analytical data for the
+ query execution.
+ </p>
+
+ <p class="p">
+ The auditing feature introduced in Impala 1.1.1 produces a separate set of audit log files when
+ enabled. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control how many
+ audit event log files are kept on each host through the
+ <code class="ph codeph">--max_audit_event_log_files</code> startup option for the
+ <span class="keyword cmdname">impalad</span> daemon, similar to the <code class="ph codeph">--max_log_files</code>
+ option for regular log files.
+ </p>
+
+ <p class="p">
+ The lineage feature introduced in Impala 2.2.0 produces a separate lineage log file when
+ enabled. See <a class="xref" href="impala_lineage.html#lineage">Viewing Lineage Information for Impala Data</a> for details.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="logging__logs_details">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Locations and Names of Impala Log Files</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ By default, the log files are under the directory <span class="ph filepath">/var/log/impala</span>.
+ To change log file locations, modify the defaults file described in
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">impalad</code> process are <span class="ph filepath">impalad.INFO</span>,
+ <span class="ph filepath">impalad.WARNING</span>, and <span class="ph filepath">impalad.ERROR</span>. You might also see a file
+ <span class="ph filepath">impalad.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">statestored</code> process are
+ <span class="ph filepath">statestored.INFO</span>, <span class="ph filepath">statestored.WARNING</span>, and
+ <span class="ph filepath">statestored.ERROR</span>. You might also see a file <span class="ph filepath">statestored.FATAL</span>,
+ although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ The significant files for the <code class="ph codeph">catalogd</code> process are <span class="ph filepath">catalogd.INFO</span>,
+ <span class="ph filepath">catalogd.WARNING</span>, and <span class="ph filepath">catalogd.ERROR</span>. You might also see a file
+ <span class="ph filepath">catalogd.FATAL</span>, although this is only present in rare conditions.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.INFO</code> files to see configuration settings for the processes.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.WARNING</code> files to see all kinds of problem information, including such
+ things as suboptimal settings and also serious runtime errors.
+ </li>
+
+ <li class="li">
+ Examine the <code class="ph codeph">.ERROR</code> and/or <code class="ph codeph">.FATAL</code> files to see only the most serious
+ errors, such as when a process crashes or queries fail to complete. These messages are also in the
+ <code class="ph codeph">.WARNING</code> file.
+ </li>
+
+ <li class="li">
+ A new set of log files is produced each time the associated daemon is restarted. These log files have
+ long names including a timestamp. The <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and
+ <code class="ph codeph">.ERROR</code> files are physically represented as symbolic links to the latest applicable log
+ files.
+ </li>
+
+ <li class="li">
+ The init script for the <code class="ph codeph">impala-server</code> service also produces a consolidated log file
+ <code class="ph codeph">/var/log/impala/impala-server.log</code>, with all the same information as the
+ corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+ </li>
+
+ <li class="li">
+ The init script for the <code class="ph codeph">impala-state-store</code> service also produces a consolidated log file
+ <code class="ph codeph">/var/log/impala/impala-state-store.log</code>, with all the same information as the
+ corresponding <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code> files.
+ </li>
+ </ul>
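+ <p class="p">
+ For example, on a host with default settings you might confirm which timestamped
+ file the <code class="ph codeph">.INFO</code> symbolic link currently points to (the exact
+ file names will vary):
+ </p>
+<pre class="pre codeblock"><code># The .INFO, .WARNING, and .ERROR names are symbolic links to the
+# latest timestamped log files for the current daemon instance.
+ls -l /var/log/impala/impalad.INFO</code></pre>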
+
+ <p class="p">
+ Impala records log messages using the <code class="ph codeph">glog</code> logging library, so some messages
+ refer to C++ file names. Logging is affected by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">GLOG_v</code> environment variable specifies which types of messages are logged. See
+ <a class="xref" href="#log_levels">Setting Logging Levels</a> for details.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">--logbuflevel</code> startup flag for the <span class="keyword cmdname">impalad</span> daemon specifies how
+ often the log information is written to disk. The default is 0, meaning that the log is immediately
+ flushed to disk when Impala outputs an important message such as a warning or an error, while less
+ important messages, such as informational ones, are buffered in memory rather than being flushed to disk
+ immediately.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="logging__logs_managing">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Managing Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you have traced an issue back to a specific system, review the Impala log files on that host.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="logging__logs_rotate">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Rotating Impala Logs</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala periodically switches the physical files representing the current log files, after which it is safe
+ to remove the old files if they are no longer needed.
+ </p>
+
+ <p class="p">
+ Impala can automatically remove older unneeded log files, a feature known as <dfn class="term">log rotation</dfn>.
+
+ </p>
+
+ <p class="p">
+ In Impala 2.2 and higher, the <code class="ph codeph">--max_log_files</code> configuration option specifies how many log
+ files to keep at each severity level. You can specify an appropriate setting for each Impala-related daemon
+ (<span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and <span class="keyword cmdname">catalogd</span>). The default
+ value is 10, meaning that Impala preserves the latest 10 log files for each severity level
+ (<code class="ph codeph">INFO</code>, <code class="ph codeph">WARNING</code>, <code class="ph codeph">ERROR</code>, and <code class="ph codeph">FATAL</code>).
+ Impala checks periodically whether any old logs need to be removed, at the interval specified in the
+ <code class="ph codeph">logbufsecs</code> setting (every 5 seconds by default).
+ </p>
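+ <p class="p">
+ As an illustration, assuming a defaults file that passes startup flags through an
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> variable (the exact variable name depends on
+ how your installation starts the daemons), keeping 20 log files per severity level
+ might look like:
+ </p>
+<pre class="pre codeblock"><code># Hypothetical excerpt from the Impala defaults file.
+IMPALA_SERVER_ARGS="${IMPALA_SERVER_ARGS} -max_log_files=20"</code></pre>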
+
+
+
+ <p class="p">
+ A value of 0 preserves all log files, in which case you would set up manual log rotation using your
+ Linux tool or technique of choice. A value of 1 preserves only the very latest log file.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="logging__logs_debug">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Reviewing Impala Logs</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, the Impala logs are stored under <code class="ph codeph">/var/log/impala/</code>. The most comprehensive log,
+ showing informational, warning, and error messages, is the file named <span class="ph filepath">impalad.INFO</span>.
+ View log file contents by using the web interface or by examining the contents of the log file. (When you
+ examine the logs through the file system, you can troubleshoot problems by reading the
+ <span class="ph filepath">impalad.WARNING</span> and/or <span class="ph filepath">impalad.ERROR</span> files, which contain the
+ subsets of messages indicating potential problems.)
+ </p>
+
+ <p class="p">
+ On a machine named <code class="ph codeph">impala.example.com</code> with default settings, you could view the Impala
+ logs on that machine by using a browser to access <code class="ph codeph">http://impala.example.com:25000/logs</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The web interface limits the amount of logging information displayed. To view every log entry, access the
+ log files directly through the file system.
+ </p>
+ </div>
+
+ <p class="p">
+ You can view the contents of the <code class="ph codeph">impalad.INFO</code> log file in the file system. With the
+ default configuration settings, the start of the log file appears as follows:
+ </p>
+
+<pre class="pre codeblock"><code>[user@example impalad]$ pwd
+/var/log/impalad
+[user@example impalad]$ more impalad.INFO
+Log file created at: 2013/01/07 08:42:12
+Running on machine: impala.example.com
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+I0107 08:42:12.292155 14876 daemon.cc:34] impalad version 0.4 RELEASE (build 9d7fadca0461ab40b9e9df8cdb47107ec6b27cff)
+Built on Fri, 21 Dec 2012 12:55:19 PST
+I0107 08:42:12.292484 14876 daemon.cc:35] Using hostname: impala.example.com
+I0107 08:42:12.292706 14876 logging.cc:76] Flags (see also /varz are on debug webserver):
+--dump_ir=false
+--module_output=
+--be_port=22000
+--classpath=
+--hostname=impala.example.com</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The preceding example shows only a small part of the log file. Impala log files are often several megabytes
+ in size.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="logging__log_format">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Understanding Impala Log Contents</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The logs store information about Impala startup options. This information appears once for each time Impala
+ is started and may include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Machine name.
+ </li>
+
+ <li class="li">
+ Impala version number.
+ </li>
+
+ <li class="li">
+ Flags used to start Impala.
+ </li>
+
+ <li class="li">
+ CPU information.
+ </li>
+
+ <li class="li">
+ The number of available disks.
+ </li>
+ </ul>
+
+ <p class="p">
+ There is information about each job Impala has run. Because each Impala job creates an additional set of
+ data about queries, the amount of job-specific data may be very large. Logs may contain detailed
+ information on jobs. These detailed log entries may include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The composition of the query.
+ </li>
+
+ <li class="li">
+ The degree of data locality.
+ </li>
+
+ <li class="li">
+ Statistics on data throughput and response times.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="logging__log_levels">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Setting Logging Levels</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses the GLOG system, which supports three logging levels. You can adjust logging levels
+ by exporting variable settings. To change logging settings manually, use a command
+ similar to the following on each node before starting <code class="ph codeph">impalad</code>:
+ </p>
+
+<pre class="pre codeblock"><code>export GLOG_v=1</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For performance reasons, do not enable the most verbose logging level of 3 unless there is
+ no other alternative for troubleshooting.
+ </div>
+
+ <p class="p">
+ For more information on how to configure GLOG, including how to set variable logging levels for different
+ system components, see
+ <a class="xref" href="https://github.com/google/glog" target="_blank">documentation for the glog project on GitHub</a>.
+ </p>
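+ <p class="p">
+ As a sketch, glog also honors a <code class="ph codeph">GLOG_vmodule</code> environment
+ variable for raising verbosity only in particular source files; the module names
+ below are illustrative examples, not a definitive list:
+ </p>
+<pre class="pre codeblock"><code># Verbose logging for selected modules only; the default level applies elsewhere.
+export GLOG_vmodule=hdfs-scan-node=3,exec-node=2</code></pre>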
+
+ <section class="section" id="log_levels__loglevels_details"><h3 class="title sectiontitle">Understanding What is Logged at Different Logging Levels</h3>
+
+
+
+ <p class="p">
+ As logging levels increase, the categories of information logged are cumulative. For example, GLOG_v=2
+ records everything GLOG_v=1 records, as well as additional information.
+ </p>
+
+ <p class="p">
+ Increasing logging levels imposes performance overhead and increases log size. Where practical, use
+ GLOG_v=1 for most cases: this level has minimal performance impact but still captures useful
+ troubleshooting information.
+ </p>
+
+ <p class="p">
+ Additional information logged at each level is as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ GLOG_v=1 - The default level. Logs information about each connection and query that is initiated to an
+ <code class="ph codeph">impalad</code> instance, including runtime profiles.
+ </li>
+
+ <li class="li">
+ GLOG_v=2 - Everything from the previous level plus information for each RPC initiated. This level also
+ records query execution progress information, including details on each file that is read.
+ </li>
+
+ <li class="li">
+ GLOG_v=3 - Everything from the previous level plus logging of every row that is read. This level is
+ only applicable for the most serious troubleshooting and tuning scenarios, because it can produce
+ exceptionally large and detailed log files, potentially leading to its own set of performance and
+ capacity problems.
+ </li>
+ </ul>
+
+ </section>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="logging__redaction">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Redacting Sensitive Information from Impala Log Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ <dfn class="term">Log redaction</dfn> is a security feature that prevents sensitive information from being displayed in
+ locations used by administrators for monitoring and troubleshooting, such as log files and the Impala debug web
+ user interface. You configure regular expressions that match sensitive types of information processed by your
+ system, such as credit card numbers or tax IDs, and literals matching these patterns are obfuscated wherever
+ they would normally be recorded in log files or displayed in administration or debugging user interfaces.
+ </p>
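+ <p class="p">
+ As a hedged sketch of what such a rule might look like (the exact rules-file format
+ and the startup flag that enables it depend on your distribution), a redaction
+ rules file could contain an entry such as:
+ </p>
+<pre class="pre codeblock"><code>{
+ "version": 1,
+ "rules": [
+ {
+ "description": "Redact 16-digit credit card numbers",
+ "search": "\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{4}",
+ "replace": "XXXX-XXXX-XXXX-XXXX"
+ }
+ ]
+}</code></pre>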
+
+ <p class="p">
+ In a security context, the log redaction feature is complementary to the Sentry authorization framework.
+ Sentry prevents unauthorized users from being able to directly access table data. Redaction prevents
+ administrators or support personnel from seeing the smaller amounts of sensitive or personally identifying
+ information (PII) that might appear in queries issued by those authorized users.
+ </p>
+
+ <p class="p">
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details about how to enable this feature and set
+ up the regular expressions to detect and redact sensitive information within SQL statement text.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_map.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_map.html b/docs/build3x/html/topics/impala_map.html
new file mode 100644
index 0000000..3325a9b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_map.html
@@ -0,0 +1,331 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="map"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>MAP Complex Type (Impala 2.3 or higher only)</title></head><body id="map"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">MAP Complex Type (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A complex data type representing an arbitrary set of key-value pairs.
+ The key part is a scalar type, while the value part can be a scalar or
+ another complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> MAP < <var class="keyword varname">primitive_type</var>, <var class="keyword varname">type</var> >
+
+type ::= <var class="keyword varname">primitive_type</var> | <var class="keyword varname">complex_type</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because complex types are often used in combination,
+ for example an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, if you are unfamiliar with the Impala complex types,
+ start with <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ background information and usage examples.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">MAP</code> complex data type represents a set of key-value pairs.
+ Each element of the map is indexed by a primitive type such as <code class="ph codeph">BIGINT</code> or
+ <code class="ph codeph">STRING</code>, letting you define sequences that are not continuous or categories with arbitrary names.
+ You might find it convenient for modelling data produced in other languages, such as a
+ Python dictionary or Java HashMap, where a single scalar value serves as the lookup key.
+ </p>
+
+ <p class="p">
+ In a big data context, the keys in a map column might represent a numeric sequence of events during a
+ manufacturing process, or <code class="ph codeph">TIMESTAMP</code> values corresponding to sensor observations.
+ The map itself is inherently unordered, so you choose whether to make the key values significant
+ (such as a recorded <code class="ph codeph">TIMESTAMP</code>) or synthetic (such as a randomly generated universally unique ID).
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Behind the scenes, the <code class="ph codeph">MAP</code> type is implemented in a similar way as the
+ <code class="ph codeph">ARRAY</code> type. Impala does not enforce any uniqueness constraint on the
+ <code class="ph codeph">KEY</code> values, and the <code class="ph codeph">KEY</code> values are processed by
+ looping through the elements of the <code class="ph codeph">MAP</code> rather than by a constant-time lookup.
+ Therefore, this type is primarily for ease of understanding when importing data and
+ algorithms from non-SQL contexts, rather than optimizing the performance of key lookups.
+ </div>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Columns with this data type can only be used in tables or partitions with the Parquet file format.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Columns with this data type cannot be used as partition key columns in a partitioned table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement does not produce any statistics for columns of this data type.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p" id="map__d6e3285">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types_limits">Limitations and Restrictions for Complex Types</a> for a full list of limitations
+ and associated guidelines about complex type columns.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ The following example shows a table with various kinds of <code class="ph codeph">MAP</code> columns,
+ both at the top level and nested within other complex types.
+ Each row represents information about a specific country, with complex type fields
+ of various levels of nesting to represent different information associated
+ with the country: factual measurements such as area and population,
+ notable people in different categories, geographic features such as
+ cities, points of interest within each city, and mountains with associated facts.
+ Practice the <code class="ph codeph">CREATE TABLE</code> and query notation for complex type columns
+ using empty tables, until you can visualize a complex data structure and construct corresponding SQL statements reliably.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo
+(
+ country_id BIGINT,
+
+-- Numeric facts about each country, looked up by name.
+-- For example, 'Area':1000, 'Population':999999.
+-- Using a MAP instead of a STRUCT because there could be
+-- a different set of facts for each country.
+ metrics MAP <STRING, BIGINT>,
+
+-- MAP whose value part is an ARRAY.
+-- For example, the key 'Famous Politicians' could represent an array of 10 elements,
+-- while the key 'Famous Actors' could represent an array of 20 elements.
+ notables MAP <STRING, ARRAY <STRING>>,
+
+-- MAP that is a field within a STRUCT.
+-- (The STRUCT is inside another ARRAY, because it is rare
+-- for a STRUCT to be a top-level column.)
+-- For example, city #1 might have points of interest with key 'Zoo',
+-- representing an array of 3 different zoos.
+-- City #2 might have completely different kinds of points of interest.
+-- Because the set of field names is potentially large, and most entries could be blank,
+-- a MAP makes more sense than a STRUCT to represent such a sparse data structure.
+ cities ARRAY < STRUCT <
+ name: STRING,
+ points_of_interest: MAP <STRING, ARRAY <STRING>>
+ >>,
+
+-- MAP that is an element within an ARRAY. The MAP is inside a STRUCT field to associate
+-- the mountain name with all the facts about the mountain.
+-- The "key" of the map (the first STRING field) represents the name of some fact whose value
+-- can be expressed as an integer, such as 'Height', 'Year First Climbed', and so on.
+ mountains ARRAY < STRUCT < name: STRING, facts: MAP <STRING, INT > > >
+)
+STORED AS PARQUET;
+
+</code></pre>
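+
+ <p class="p">
+ As a sketch of the query notation the table above is meant for (hypothetical data;
+ <code class="ph codeph">map_demo</code> as created here is empty), a
+ <code class="ph codeph">MAP</code> column is queried with join notation, which exposes
+ its <code class="ph codeph">key</code> and <code class="ph codeph">value</code> pseudocolumns:
+ </p>
+
+<pre class="pre codeblock"><code>-- Flatten the METRICS map into one row per country/fact combination.
+SELECT country_id, m.key, m.value
+  FROM map_demo, map_demo.metrics AS m
+WHERE m.key = 'Population';
+</code></pre>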
+
+<pre class="pre codeblock"><code>DESCRIBE map_demo;
++------------+------------------------------------------------+
+| name | type |
++------------+------------------------------------------------+
+| country_id | bigint |
+| metrics | map<string,bigint> |
+| notables | map<string,array<string>> |
+| cities | array<struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | >> |
+| mountains | array<struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | >> |
++------------+------------------------------------------------+
+
+DESCRIBE map_demo.metrics;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | bigint |
++-------+--------+
+
+DESCRIBE map_demo.notables;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.notables.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.cities;
++------+------------------------------------------------+
+| name | type |
++------+------------------------------------------------+
+| item | struct< |
+| | name:string, |
+| | points_of_interest:map<string,array<string>> |
+| | > |
+| pos | bigint |
++------+------------------------------------------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest;
++-------+---------------+
+| name | type |
++-------+---------------+
+| key | string |
+| value | array<string> |
++-------+---------------+
+
+DESCRIBE map_demo.cities.item.points_of_interest.value;
++------+--------+
+| name | type |
++------+--------+
+| item | string |
+| pos | bigint |
++------+--------+
+
+DESCRIBE map_demo.mountains;
++------+-------------------------+
+| name | type |
++------+-------------------------+
+| item | struct< |
+| | name:string, |
+| | facts:map<string,int> |
+| | > |
+| pos | bigint |
++------+-------------------------+
+
+DESCRIBE map_demo.mountains.item.facts;
++-------+--------+
+| name | type |
++-------+--------+
+| key | string |
+| value | int |
++-------+--------+
+
+</code></pre>
+
+ <p class="p">
+ The following example shows a table that uses a variety of data types for the <code class="ph codeph">MAP</code>
+ <span class="q">"key"</span> field. Typically, you use <code class="ph codeph">BIGINT</code> or <code class="ph codeph">STRING</code> for
+ numeric or character-based keys, to avoid worrying about exceeding any size or length constraints.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE map_demo_obscure
+(
+ id BIGINT,
+ m1 MAP <INT, INT>,
+ m2 MAP <SMALLINT, INT>,
+ m3 MAP <TINYINT, INT>,
+ m4 MAP <TIMESTAMP, INT>,
+ m5 MAP <BOOLEAN, INT>,
+ m6 MAP <CHAR(5), INT>,
+ m7 MAP <VARCHAR(25), INT>,
+ m8 MAP <FLOAT, INT>,
+ m9 MAP <DOUBLE, INT>,
+ m10 MAP <DECIMAL(12,2), INT>
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+<pre class="pre codeblock"><code>CREATE TABLE celebrities (name STRING, birth_year MAP < STRING, SMALLINT >) STORED AS PARQUET;
+-- A typical row might represent values with 2 different birth years, such as:
+-- ("Joe Movie Star", { "real": 1972, "claimed": 1977 })
+
+CREATE TABLE countries (name STRING, famous_leaders MAP < INT, STRING >) STORED AS PARQUET;
+-- A typical row might represent values with different leaders, with key values corresponding to their numeric sequence, such as:
+-- ("United States", { 1: "George Washington", 3: "Thomas Jefferson", 16: "Abraham Lincoln" })
+
+CREATE TABLE airlines (name STRING, special_meals MAP < STRING, MAP < STRING, STRING > >) STORED AS PARQUET;
+-- A typical row might represent values with multiple kinds of meals, each with several components:
+-- ("Elegant Airlines",
+-- {
+-- "vegetarian": { "breakfast": "pancakes", "snack": "cookies", "dinner": "rice pilaf" },
+-- "gluten free": { "breakfast": "oatmeal", "snack": "fruit", "dinner": "chicken" }
+-- } )
+</code></pre>
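+
+ <p class="p">
+ A nested <code class="ph codeph">MAP</code> such as the <code class="ph codeph">SPECIAL_MEALS</code> column
+ is queried by chaining the join notation through each level. The following
+ hypothetical sketch assumes the table is populated:
+ </p>
+
+<pre class="pre codeblock"><code>-- One row per airline / meal type / course combination.
+SELECT name, meals.key AS meal_type, parts.key AS course, parts.value AS dish
+  FROM airlines, airlines.special_meals AS meals, meals.value AS parts;
+</code></pre>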
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+
+ </p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_stats.html b/docs/build3x/html/topics/impala_perf_stats.html
new file mode 100644
index 0000000..c4bdf0c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_stats.html
@@ -0,0 +1,1192 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Table and Column Statistics</title></head><body id="perf_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Table and Column Statistics</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala can do better optimization for complex or multi-table queries when it has access to
+ statistics about the volume of data and how the values are distributed. Impala uses this
+ information to help parallelize and distribute the work for a query. For example,
+ optimizing join queries requires a way of determining if one table is <span class="q">"bigger"</span> than
+ another, which is a function of the number of rows and the average row size for each
+ table. The following sections describe the categories of statistics Impala can work with,
+ and how to produce them and keep them up to date.
+ </p>
+
+ <p class="p toc inpage all"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav><article class="topic concept nested1" aria-labelledby="perf_table_stats__table_stats" id="perf_stats__perf_table_stats">
+
+ <h2 class="title topictitle2" id="perf_table_stats__table_stats">Overview of Table Statistics</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala query planner can make use of statistics about entire tables and partitions.
+ This information includes physical characteristics such as the number of rows, number of
+ data files, the total size of the data files, and the file format. For partitioned
+ tables, the numbers are calculated per partition, and as totals for the whole table.
+ This metadata is stored in the metastore database, and can be updated by either Impala
+ or Hive. If a number is not available, the value -1 is used as a placeholder. Some
+ numbers, such as number and total sizes of data files, are always kept up to date
+ because they can be calculated cheaply, as part of gathering HDFS block metadata.
+ </p>
+
+ <p class="p">
+ The following example shows table stats for an unpartitioned Parquet table. The values
+ for the number and sizes of files are always available. Initially, the number of rows is
+ not known, because it requires a potentially expensive scan through the entire table,
+ and so that value is displayed as -1. The <code class="ph codeph">COMPUTE STATS</code> statement fills
+ in any unknown table stats values.
+ </p>
+
+<pre class="pre codeblock"><code>
+show table stats parquet_snappy;
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+| -1 | 96 | 23.35GB | NOT CACHED | NOT CACHED | PARQUET | false |...
++-------+--------+---------+--------------+-------------------+---------+-------------------+...
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+
+show table stats parquet_snappy;
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+| 1000000000 | 96 | 23.35GB | NOT CACHED | NOT CACHED | PARQUET | false |...
++------------+--------+---------+--------------+-------------------+---------+-------------------+...
+</code></pre>
+
+ <p class="p">
+ Impala performs some optimizations using this metadata on its own, and other
+ optimizations by using a combination of table and column statistics.
+ </p>
+
+ <p class="p">
+ To check that table statistics are available for a table, and see the details of those
+ statistics, use the statement <code class="ph codeph">SHOW TABLE STATS
+ <var class="keyword varname">table_name</var></code>. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+ details.
+ </p>
+
+ <p class="p">
+ If you use the Hive-based methods of gathering statistics, see
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/StatsDev" target="_blank">the
+ Hive wiki</a> for information about the required configuration on the Hive side.
+ Where practical, use the Impala <code class="ph codeph">COMPUTE STATS</code> statement to avoid
+ potential configuration and scalability issues with the statistics-gathering process.
+ </p>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="perf_column_stats__column_stats" id="perf_stats__perf_column_stats">
+
+ <h2 class="title topictitle2" id="perf_column_stats__column_stats">Overview of Column Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala query planner can make use of statistics about individual columns when that
+ metadata is available in the metastore database. This technique is most valuable for
+ columns compared across tables in <a class="xref" href="impala_perf_joins.html#perf_joins">join
+ queries</a>, to help estimate how many rows the query will retrieve from each table.
+ <span class="ph"> These statistics are also important for correlated subqueries using the
+ <code class="ph codeph">EXISTS()</code> or <code class="ph codeph">IN()</code> operators, which are processed
+ internally the same way as join queries.</span>
+ </p>
+
+ <p class="p">
+ The following example shows column stats for an unpartitioned Parquet table. The values
+ for the maximum and average sizes of some types are always available, because those
+ figures are constant for numeric and other fixed-size types. Initially, the number of
+ distinct values is not known, because it requires a potentially expensive scan through
+ the entire table, and so that value is displayed as -1. The same applies to maximum and
+ average sizes of variable-sized types, such as <code class="ph codeph">STRING</code>. The
+ <code class="ph codeph">COMPUTE STATS</code> statement fills in most unknown column stats values. (It
+ does not record the number of <code class="ph codeph">NULL</code> values, because currently Impala
+ does not use that figure for query optimization.)
+ </p>
+
+<pre class="pre codeblock"><code>
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+----------+
+| id | BIGINT | -1 | -1 | 8 | 8 |
+| val | INT | -1 | -1 | 4 | 4 |
+| zerofill | STRING | -1 | -1 | -1 | -1 |
+| name | STRING | -1 | -1 | -1 | -1 |
+| assertion | BOOLEAN | -1 | -1 | 1 | 1 |
+| location_id | SMALLINT | -1 | -1 | 2 | 2 |
++-------------+----------+------------------+--------+----------+----------+
+
+compute stats parquet_snappy;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 6 column(s). |
++-----------------------------------------+
+
+show column stats parquet_snappy;
++-------------+----------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-------------+----------+------------------+--------+----------+-------------------+
+| id | BIGINT | 183861280 | -1 | 8 | 8 |
+| val | INT | 139017 | -1 | 4 | 4 |
+| zerofill | STRING | 101761 | -1 | 6 | 6 |
+| name | STRING | 145636240 | -1 | 22 | 13.00020027160645 |
+| assertion | BOOLEAN | 2 | -1 | 1 | 1 |
+| location_id | SMALLINT | 339 | -1 | 2 | 2 |
++-------------+----------+------------------+--------+----------+-------------------+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ For column statistics to be effective in Impala, you also need to have table
+ statistics for the applicable tables, as described in
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a>. When you use the Impala
+ <code class="ph codeph">COMPUTE STATS</code> statement, both table and column statistics are
+ automatically gathered at the same time, for all columns in the table.
+ </p>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0,
+ <code class="ph codeph">COMPUTE STATS</code> counted the number of
+ <code class="ph codeph">NULL</code> values in each column and recorded that figure
+ in the metastore database. Because Impala does not currently use the
+ <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+ higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+ skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+ <p class="p">
+ To check whether column statistics are available for a particular set of columns, use
+ the <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statement, or check
+ the extended <code class="ph codeph">EXPLAIN</code> output for a query against that table that refers
+ to those columns. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> and
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> for details.
+ </p>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="perf_stats_partitions__stats_partitions" id="perf_stats__perf_stats_partitions">
+
+ <h2 class="title topictitle2" id="perf_stats_partitions__stats_partitions">How Table and Column Statistics Work for Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use Impala for <span class="q">"big data"</span>, you are highly likely to use partitioning for
+ your biggest tables, the ones representing data that can be logically divided based on
+ dates, geographic regions, or similar criteria. The table and column statistics are
+ especially useful for optimizing queries on such tables. For example, a query involving
+ one year might involve substantially more or less data than a query involving a
+ different year, or a range of several years. Each query might be optimized differently
+ as a result.
+ </p>
+
+ <p class="p">
+ The following examples show how table and column stats work with a partitioned table.
+ The table for this example is partitioned by year, month, and day. For simplicity, the
+ sample data consists of 5 partitions, all from the same year and month. Table stats are
+ collected independently for each partition. (In fact, the <code class="ph codeph">SHOW
+ PARTITIONS</code> statement displays exactly the same information as <code class="ph codeph">SHOW
+ TABLE STATS</code> for a partitioned table.) Column stats apply to the entire table,
+ not to individual partitions. Because the partition key column values are represented as
+ HDFS directories, their characteristics are typically known in advance, even when the
+ values for non-key columns are shown as -1.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | -1 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | -1 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show table stats year_month_day;
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | -1 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | -1 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | -1 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | -1 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+-------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+----------+
+| id | INT | -1 | -1 | 4 | 4 |
+| val | INT | -1 | -1 | 4 | 4 |
+| zfill | STRING | -1 | -1 | -1 | -1 |
+| name | STRING | -1 | -1 | -1 | -1 |
+| assertion | BOOLEAN | -1 | -1 | 1 | 1 |
+| year | INT | 1 | 0 | 4 | 4 |
+| month | INT | 1 | 0 | 4 | 4 |
+| day | INT | 5 | 0 | 4 | 4 |
++-----------+---------+------------------+--------+----------+----------+
+
+compute stats year_month_day;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 5 partition(s) and 5 column(s). |
++-----------------------------------------+
+
+show table stats year_month_day;
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| year | month | day | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+| 2013 | 12 | 1 | 93606 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 2 | 94158 | 1 | 2.53MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 3 | 94122 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 4 | 93559 | 1 | 2.51MB | NOT CACHED | NOT CACHED | PARQUET |...
+| 2013 | 12 | 5 | 93845 | 1 | 2.52MB | NOT CACHED | NOT CACHED | PARQUET |...
+| Total | | | 469290 | 5 | 12.58MB | 0B | | |...
++-------+-------+-----+--------+--------+---------+--------------+-------------------+---------+...
+
+show column stats year_month_day;
++-----------+---------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++-----------+---------+------------------+--------+----------+-------------------+
+| id | INT | 511129 | -1 | 4 | 4 |
+| val | INT | 364853 | -1 | 4 | 4 |
+| zfill | STRING | 311430 | -1 | 6 | 6 |
+| name | STRING | 471975 | -1 | 22 | 13.00160026550293 |
+| assertion | BOOLEAN | 2 | -1 | 1 | 1 |
+| year | INT | 1 | 0 | 4 | 4 |
+| month | INT | 1 | 0 | 4 | 4 |
+| day | INT | 5 | 0 | 4 | 4 |
++-----------+---------+------------------+--------+----------+-------------------+
+</code></pre>
+
+ <p class="p">
+ If you run the Hive statement <code class="ph codeph">ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS</code>,
+ Impala can only use the resulting column statistics if the table is unpartitioned.
+ Impala cannot use Hive-generated column statistics for a partitioned table.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="perf_stats__perf_generating_stats">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Generating Table and Column Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Use the <code class="ph codeph">COMPUTE STATS</code> family of commands to collect table and
+ column statistics. The <code class="ph codeph">COMPUTE STATS</code> variants offer
+ different tradeoffs between computation cost, staleness, and maintenance
+ workflows, which are explained below.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ </div>
+
+
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="perf_generating_stats__concept_y2f_nfl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title6">COMPUTE STATS</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> command collects and sets the table-level
+ and partition-level row counts as well as all column statistics for a given
+ table. The collection process is CPU-intensive and can take a long time to
+ complete for very large tables.
+ </p>
+ <div class="p">
+ To speed up <code class="ph codeph">COMPUTE STATS</code>, consider the following options,
+ which can be combined.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Limit the number of columns for which statistics are collected to increase
+ the efficiency of <code class="ph codeph">COMPUTE STATS</code>. Queries benefit from statistics for
+ columns used in filters, join conditions, and <code class="ph codeph">GROUP BY</code> or
+ <code class="ph codeph">PARTITION BY</code> clauses; other columns are good candidates to exclude.
+ This feature is available in Impala 2.12 and higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set the <code class="ph codeph">MT_DOP</code> query option to use more threads within each participating
+ impalad to compute the statistics faster, but not more efficiently. Note
+ that computing stats on a large table with a high <code class="ph codeph">MT_DOP</code> value can
+ negatively affect other queries running at the same time if
+ <code class="ph codeph">COMPUTE STATS</code> claims most CPU cycles.
+ This feature is available in Impala 2.8 and higher.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Consider the experimental extrapolation and sampling features (see below)
+ to further increase the efficiency of computing stats.
+ </p>
+ </li>
+ </ul>
+ </div>
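+
+ <p class="p">
+ For example, the first two options above might be combined as follows. (The
+ table and column names are hypothetical; the column list syntax requires
+ Impala 2.12 or higher.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Use extra threads within each impalad for this session.
+SET MT_DOP=4;
+
+-- Collect statistics only for the columns used in filters, joins, or GROUP BY.
+COMPUTE STATS sales (customer_id, store_id, sale_date);
+</code></pre>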
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> is intended to be run periodically,
+ for example weekly, or on demand when the contents of a table have changed
+ significantly. Due to the high resource utilization and long response
+ time of <code class="ph codeph">COMPUTE STATS</code>, it is most practical to run it
+ in a scheduled maintenance window where the Impala cluster is idle
+ enough to accommodate the expensive operation. The degree of change that
+ qualifies as <span class="q">"significant"</span> depends on the query workload, but typically,
+ if 30% of the rows have changed, it is recommended to recompute
+ statistics.
+ </p>
+
+ <p class="p">
+ If you reload a complete new set of data for a table, but the number of rows and
+ number of distinct values for each column is relatively unchanged from before, you
+ do not need to recompute stats for the table.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title7" id="concept_y2f_nfl_mdb__experimental_stats_features">
+ <h4 class="title topictitle4" id="ariaid-title7">Experimental: Extrapolation and Sampling</h4>
+ <div class="body conbody">
+ <div class="p">
+ Impala 2.12 and higher includes two experimental features to alleviate
+ common issues for computing and maintaining statistics on very large tables.
+ The following shortcomings are improved upon:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Newly added partitions do not have row count statistics. Table scans
+ that only access those new partitions are treated as not having stats.
+ Similarly, table scans that access both new and old partitions estimate
+ the scan cardinality based on those old partitions that have stats, and
+ the new partitions without stats are treated as having 0 rows.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The row counts of existing partitions become stale when data is added
+ or dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Computing stats for tables with 100,000 or more partitions might fail
+ or be very slow due to the high cost of updating the partition metadata
+ in the Hive Metastore.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ With transient compute resources, it is important to minimize the time
+ from starting a new cluster to successfully running queries.
+ Since the cluster might be relatively short-lived, users might prefer to
+ quickly collect stats that are <span class="q">"good enough"</span> as opposed to spending
+ a lot of time and resources on computing full-fidelity stats.
+ </p>
+ </li>
+ </ul>
+ For very large tables, it is often wasteful or impractical to run a full
+ <code class="ph codeph">COMPUTE STATS</code> on a frequent basis to address the scenarios above.
+ </div>
+ <p class="p">
+ The sampling feature makes <code class="ph codeph">COMPUTE STATS</code> more efficient by processing a
+ fraction of the table data, and the extrapolation feature aims to reduce
+ the frequency at which <code class="ph codeph">COMPUTE STATS</code> needs to be re-run by estimating
+ the row count of new and modified partitions.
+ </p>
+ <p class="p">
+ The sampling and extrapolation features are disabled by default.
+ They can be enabled globally or for specific tables, as follows.
+ Set the impalad start-up flag <code class="ph codeph">--enable_stats_extrapolation</code>
+ to enable the features globally. To enable them only for a specific table, set
+ the <code class="ph codeph">impala.enable.stats.extrapolation</code> table property to
+ <code class="ph codeph">true</code> for the desired table. The table-level property
+ overrides the global setting, so it is also possible to enable sampling and
+ extrapolation globally, but disable them for specific tables by setting the
+ table property to <code class="ph codeph">false</code>. For example:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE test_table SET TBLPROPERTIES("impala.enable.stats.extrapolation"="true");
+</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Why are these features experimental? Due to their probabilistic nature,
+ it is possible that these features perform pathologically poorly on tables
+ with extreme data, file, or size distributions. Because it is not feasible
+ to test all possible scenarios, we only cautiously advertise these new
+ capabilities. That said, the features have been thoroughly tested and
+ are considered functionally stable. If you decide to give these features
+ a try, please tell us about your experience at user@impala.apache.org!
+ We rely on user feedback to guide future improvements in statistics
+ collection.
+ </div>
+ </div>
+
+ <article class="topic concept nested4" aria-labelledby="ariaid-title8" id="experimental_stats_features__experimental_stats_extrapolation">
+ <h5 class="title topictitle5" id="ariaid-title8">Stats Extrapolation</h5>
+ <div class="body conbody">
+ <p class="p">
+ The main idea of stats extrapolation is to estimate the row count of new
+ and modified partitions based on the result of the last <code class="ph codeph">COMPUTE STATS</code>.
+ Enabling stats extrapolation changes the behavior of <code class="ph codeph">COMPUTE STATS</code>,
+ as well as the cardinality estimation of table scans. <code class="ph codeph">COMPUTE STATS</code> no
+ longer computes and stores per-partition row counts, and instead, only
+ computes a table-level row count together with the total number of file
+ bytes in the table at that time. No partition metadata is modified. The
+ input cardinality of a table scan is estimated by converting the data
+ volume of relevant partitions to a row count, based on the table-level
+ row count and file bytes statistics. It is assumed that within the same
+ table, different sets of files with the same data volume correspond
+ to a similar number of rows on average. With extrapolation enabled,
+ the scan cardinality estimation ignores per-partition row counts. It
+ relies only on the table-level statistics and the scanned data volume.
+ </p>
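+ <p class="p">
+ As an illustration with hypothetical numbers: suppose the last
+ <code class="ph codeph">COMPUTE STATS</code> recorded 1,000,000 rows across 100 MB of file
+ bytes for the whole table. A scan that touches partitions totaling 25 MB of file
+ bytes is then estimated as:
+ </p>
+
+<pre class="pre codeblock"><code>estimated rows = table row count * (scanned bytes / total file bytes)
+               = 1,000,000 * (25 MB / 100 MB)
+               = 250,000
+</code></pre>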
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">EXPLAIN</code> commands distinguish between row counts
+ stored in the Hive Metastore, and the row counts extrapolated based on the
+ above process. Consult the <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">EXPLAIN</code> documentation
+ for more details.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested4" aria-labelledby="ariaid-title9" id="experimental_stats_features__experimental_stats_sampling">
+ <h5 class="title topictitle5" id="ariaid-title9">Sampling</h5>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">TABLESAMPLE</code> clause may be added to <code class="ph codeph">COMPUTE STATS</code> to limit the
+ percentage of data to be processed. The final statistics are obtained
+ by extrapolating the statistics from the data sample over the entire table.
+ The extrapolated statistics are stored in the Hive Metastore, just as if no
+ sampling was used. The following example runs <code class="ph codeph">COMPUTE STATS</code> over a 10 percent
+ data sample:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS test_table TABLESAMPLE SYSTEM(10);
+</code></pre>
+ <p class="p">
+ We have found that a 10 percent sampling rate typically offers a good
+ tradeoff between statistics accuracy and execution cost. A sampling rate
+ well below 10 percent has shown poor results and is not recommended.
+ </p>
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Sampling-based techniques sacrifice result accuracy for execution
+ efficiency, so results may vary for different tables and columns
+ depending on their data distribution. The extrapolation procedure Impala
+ uses for estimating the number of distinct values per column is inherently
+ non-deterministic, so results may vary between runs of
+ COMPUTE STATS TABLESAMPLE, even if no data has changed.
+ </div>
+ </div>
+ </article>
+ </article>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="perf_generating_stats__concept_bmk_pfl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title10">COMPUTE INCREMENTAL STATS</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 2.1.0 and higher, you can use the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> and
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> commands.
+ The <code class="ph codeph">INCREMENTAL</code> clauses work with incremental statistics,
+ a specialized feature for partitioned tables.
+ </p>
+
+ <p class="p">
+ When you compute incremental statistics for a partitioned table, by default Impala only
+ processes those partitions that do not yet have incremental statistics. By processing
+ only newly added partitions, you can keep statistics up to date without incurring the
+ overhead of reprocessing the entire table each time.
+ </p>
+
+ <p class="p">
+ You can also compute or drop statistics for a specified subset of partitions by
+ including a <code class="ph codeph">PARTITION</code> clause in the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or <code class="ph codeph">DROP INCREMENTAL STATS</code>
+ statement.
+ </p>
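+ <p class="p">
+ For example, with a hypothetical table <code class="ph codeph">sales</code> partitioned
+ by <code class="ph codeph">year</code>, you could limit either operation to a single
+ partition:
+ </p>
+<pre class="pre codeblock"><code>-- Compute incremental stats for one partition only.
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2017);
+
+-- Remove the incremental stats for that same partition.
+DROP INCREMENTAL STATS sales PARTITION (year=2017);
+</code></pre>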
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed from scratch, regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ </div>
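+ <p class="p">
+ A rough back-of-the-envelope calculation, using assumed table dimensions,
+ shows how the per-column, per-partition overhead adds up:
+ </p>
+<pre class="pre codeblock"><code>-- Hypothetical table with 20,000 partitions and 100 columns:
+--   20,000 partitions * 100 columns * 400 bytes = 800,000,000 bytes
+-- That is roughly 800 MB of metadata, cached on the catalogd host and on
+-- every coordinator-eligible impalad host.
+</code></pre>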
+
+ <p class="p">
+ The metadata for incremental statistics is handled differently from the original style
+ of statistics:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Issuing a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> without a partition
+ clause causes Impala to compute incremental stats for all partitions that
+ do not already have incremental stats. This might be the entire table when
+ running the command for the first time, but subsequent runs should only
+ update new partitions. You can force updating a partition that already has
+ incremental stats by issuing a <code class="ph codeph">DROP INCREMENTAL STATS</code>
+ before running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW TABLE STATS</code> and <code class="ph codeph">SHOW PARTITIONS</code>
+ statements now include an additional column showing whether incremental statistics
+ are available for each column. A partition could already be covered by the original
+ type of statistics based on a prior <code class="ph codeph">COMPUTE STATS</code> statement, as
+ indicated by a value other than <code class="ph codeph">-1</code> under the <code class="ph codeph">#Rows</code>
+ column. Impala query planning uses either kind of statistics when available.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> takes more time than <code class="ph codeph">COMPUTE
+ STATS</code> for the same volume of data. Therefore it is most suitable for tables
+ with large data volume where new partitions are added frequently, making it
+ impractical to run a full <code class="ph codeph">COMPUTE STATS</code> operation for each new
+ partition. For unpartitioned tables, or partitioned tables that are loaded once and
+ not updated with new partitions, use the original <code class="ph codeph">COMPUTE STATS</code>
+ syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> uses some memory in the
+ <span class="keyword cmdname">catalogd</span> process, proportional to the number of partitions and
+ number of columns in the applicable table. The memory overhead is approximately 400
+ bytes for each column in each partition. This memory is reserved in the
+ <span class="keyword cmdname">catalogd</span> daemon, the <span class="keyword cmdname">statestored</span> daemon, and
+ in each instance of the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In cases where new files are added to an existing partition, issue a
+ <code class="ph codeph">REFRESH</code> statement for the table, followed by a <code class="ph codeph">DROP
+ INCREMENTAL STATS</code> and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> sequence
+ for the changed partition.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DROP INCREMENTAL STATS</code> statement operates only on a single
+ partition at a time. To remove statistics (whether incremental or not) from all
+ partitions of a table, issue a <code class="ph codeph">DROP STATS</code> statement with no
+ <code class="ph codeph">INCREMENTAL</code> or <code class="ph codeph">PARTITION</code> clauses.
+ </p>
+ </li>
+ </ul>
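+ <p class="p">
+ The refresh-then-recompute sequence for a changed partition might look like
+ the following, using hypothetical table and partition names:
+ </p>
+<pre class="pre codeblock"><code>-- New data files were added directly to the year=2017 partition directory.
+REFRESH sales;
+DROP INCREMENTAL STATS sales PARTITION (year=2017);
+COMPUTE INCREMENTAL STATS sales PARTITION (year=2017);
+</code></pre>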
+
+ <p class="p">
+ The following considerations apply to incremental statistics when the structure of an
+ existing table is changed (known as <dfn class="term">schema evolution</dfn>):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to drop a column, the existing
+ statistics remain valid and <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> does not
+ rescan any partitions.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to add a column, Impala rescans
+ all partitions and fills in the appropriate column-level values the next time you
+ run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the data type of a
+ column, Impala rescans all partitions and fills in the appropriate column-level
+ values the next time you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use an <code class="ph codeph">ALTER TABLE</code> statement to change the file format of a
+ table, the existing statistics remain valid and a subsequent <code class="ph codeph">COMPUTE
+ INCREMENTAL STATS</code> does not rescan any partitions.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> and
+ <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a> for syntax details.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="perf_stats__perf_stats_checking">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Detecting Missing Statistics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can check whether a specific table has statistics using the <code class="ph codeph">SHOW TABLE
+ STATS</code> statement (for any table) or the <code class="ph codeph">SHOW PARTITIONS</code>
+ statement (for a partitioned table). Both statements display the same information. If a
+ table or a partition does not have any statistics, the <code class="ph codeph">#Rows</code> field
+ contains <code class="ph codeph">-1</code>. Once you compute statistics for the table or partition,
+ the <code class="ph codeph">#Rows</code> field changes to an accurate value.
+ </p>
+
+ <p class="p">
+ The following example shows a table that initially does not have any statistics. The
+ <code class="ph codeph">SHOW TABLE STATS</code> statement displays different values for
+ <code class="ph codeph">#Rows</code> before and after the <code class="ph codeph">COMPUTE STATS</code> operation.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int);
+[localhost:21000] > show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| -1 | 0 | 0B | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > compute stats no_stats;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > show table stats no_stats;
++-------+--------+------+--------------+--------+-------------------+
+| #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+--------+------+--------------+--------+-------------------+
+| 0 | 0 | 0B | NOT CACHED | TEXT | false |
++-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows a similar progression with a partitioned table. Initially,
+ <code class="ph codeph">#Rows</code> is <code class="ph codeph">-1</code>. After a <code class="ph codeph">COMPUTE STATS</code>
+ operation, <code class="ph codeph">#Rows</code> changes to an accurate value. Any newly added
+ partition starts with no statistics, meaning that you must collect statistics after
+ adding a new partition.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats_partitioned (x int) partitioned by (year smallint);
+[localhost:21000] > show table stats no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| Total | -1 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+[localhost:21000] > alter table no_stats_partitioned add partition (year=2013);
+[localhost:21000] > compute stats no_stats_partitioned;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+[localhost:21000] > alter table no_stats_partitioned add partition (year=2014);
+[localhost:21000] > show partitions no_stats_partitioned;
++-------+-------+--------+------+--------------+--------+-------------------+
+| year | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats |
++-------+-------+--------+------+--------------+--------+-------------------+
+| 2013 | 0 | 0 | 0B | NOT CACHED | TEXT | false |
+| 2014 | -1 | 0 | 0B | NOT CACHED | TEXT | false |
+| Total | 0 | 0 | 0B | 0B | | |
++-------+-------+--------+------+--------------+--------+-------------------+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the default <code class="ph codeph">COMPUTE STATS</code> statement creates and updates
+ statistics for all partitions in a table, if you expect to frequently add new
+ partitions, use the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax instead, which
+ lets you compute stats for a single specified partition, or only for those partitions
+ that do not already have incremental stats.
+ </div>
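+ <p class="p">
+ Continuing the example above, the newly added partition could be covered
+ with a single-partition operation:
+ </p>
+<pre class="pre codeblock"><code>-- Compute stats for just the newly added partition.
+COMPUTE INCREMENTAL STATS no_stats_partitioned PARTITION (year=2014);
+</code></pre>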
+
+ <p class="p">
+ If checking each individual table is impractical, because there are many tables or
+ because views hide the underlying base tables, you can instead check for missing
+ statistics for a particular query. Use the <code class="ph codeph">EXPLAIN</code> statement to preview query
+ efficiency before actually running the query. Use the query profile output available
+ through the <code class="ph codeph">PROFILE</code> command in <span class="keyword cmdname">impala-shell</span> or the
+ web UI to verify query execution and timing after running the query. Both the
+ <code class="ph codeph">EXPLAIN</code> plan and the <code class="ph codeph">PROFILE</code> output display a warning
+ if any tables or partitions involved in the query do not have statistics.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table no_stats (x int);
+[localhost:21000] > explain select count(*) from no_stats;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=10.00MB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| incremental_stats.no_stats |
+| |
+| 03:AGGREGATE [FINALIZE] |
+| | output: count:merge(*) |
+| | |
+| 02:EXCHANGE [UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+| 00:SCAN HDFS [incremental_stats.no_stats] |
+| partitions=1/1 files=0 size=0B |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Because Impala uses the <dfn class="term">partition pruning</dfn> technique where possible to
+ evaluate only the relevant partitions, on a partitioned table with statistics for some
+ partitions but not others, whether the <code class="ph codeph">EXPLAIN</code> statement shows
+ the warning depends on which partitions the query actually reads. For example, different
+ queries against the same table might or might not produce the warning:
+ </p>
+
+<pre class="pre codeblock"><code>-- No warning because all the partitions for the year 2012 have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year = 2012;
+
+-- Missing stats warning because one or more partitions in this range
+-- do not have stats.
+EXPLAIN SELECT ... FROM t1 WHERE year BETWEEN 2006 AND 2009;
+</code></pre>
+
+ <p class="p">
+ To confirm if any partitions at all in the table are missing statistics, you might
+ explain a query that scans the entire table, such as <code class="ph codeph">SELECT COUNT(*) FROM
+ <var class="keyword varname">table_name</var></code>.
+ </p>
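+ <p class="p">
+ For example, assuming a partitioned table named <code class="ph codeph">t1</code> as above:
+ </p>
+<pre class="pre codeblock"><code>-- Scans all partitions, so the missing-stats warning appears
+-- if any partition lacks statistics.
+EXPLAIN SELECT COUNT(*) FROM t1;
+</code></pre>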
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="perf_stats__concept_s3c_4gl_mdb">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Manually Setting Table and Column Statistics with ALTER TABLE</h2>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="concept_s3c_4gl_mdb__concept_wpt_pgl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title13">Setting Table Statistics</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The most crucial piece of data in all the statistics is the number of rows in the
+ table (for an unpartitioned or partitioned table) and for each partition (for a
+ partitioned table). The <code class="ph codeph">COMPUTE STATS</code> statement always gathers
+ statistics about all columns, as well as overall table statistics. If it is not
+ practical to do a full <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL
+ STATS</code> operation after adding a partition or inserting data, or if you can see
+ that Impala would produce a more efficient plan if the number of rows was different,
+ you can manually set the number of rows through an <code class="ph codeph">ALTER TABLE</code>
+ statement:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Set total number of rows. Applies to both unpartitioned and partitioned tables.
+alter table <var class="keyword varname">table_name</var> set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+
+-- Set total number of rows for a specific partition. Applies to partitioned tables only.
+-- You must specify all the partition key columns in the PARTITION clause.
+alter table <var class="keyword varname">table_name</var> partition (<var class="keyword varname">keycol1</var>=<var class="keyword varname">val1</var>,<var class="keyword varname">keycol2</var>=<var class="keyword varname">val2</var>...) set tblproperties('numRows'='<var class="keyword varname">new_value</var>', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+</code></pre>
+
+ <p class="p">
+ This statement avoids re-scanning any data files. (The requirement to include the
+ <code class="ph codeph">STATS_GENERATED_VIA_STATS_TASK</code> property is relatively new, as a
+ result of the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/HIVE-8648" target="_blank">HIVE-8648</a>
+ for the Hive metastore.)
+ </p>
+
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+ <p class="p">
+ For a partitioned table, update both the per-partition number of rows and the number
+ of rows for the whole table:
+ </p>
+
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+
+ <p class="p">
+ In practice, the <code class="ph codeph">COMPUTE STATS</code> statement, or <code class="ph codeph">COMPUTE
+ INCREMENTAL STATS</code> for a partitioned table, should be fast and convenient
+ enough that this technique is only useful for the very largest partitioned tables.
+ Because the column statistics might be left in a stale state, do not use this
+ technique as a replacement for <code class="ph codeph">COMPUTE STATS</code>. Only use this technique
+ if all other means of collecting statistics are impractical, or as a low-overhead
+ operation that you run in between periodic <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operations.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="concept_s3c_4gl_mdb__concept_asb_vgl_mdb">
+
+ <h3 class="title topictitle3" id="ariaid-title14">Setting Column Statistics</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can also use the <code class="ph codeph">SET
+ COLUMN STATS</code> clause of <code class="ph codeph">ALTER TABLE</code> to manually set or change
+ column statistics. Only use this technique in cases where it is impractical to run
+ <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ frequently enough to keep up with data changes for a huge table.
+ </p>
+
+ <div class="p">
+ You specify a case-insensitive symbolic name for the kind of statistics:
+ <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+ The key names and values are both quoted. This operation applies to an entire table,
+ not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x | INT | 2 | 0 | 4 | 4 |
+| s | STRING | 3 | -1 | 4 | -1 |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+ </div>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="perf_stats__perf_stats_examples">
+
+ <h2 class="title topictitle2" id="ariaid-title15">Examples of Using Table and Column Statistics with Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples walk through a sequence of <code class="ph codeph">SHOW TABLE STATS</code>,
+ <code class="ph codeph">SHOW COLUMN STATS</code>, <code class="ph codeph">ALTER TABLE</code>, and
+ <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code> statements to illustrate various
+ aspects of how Impala uses statistics to help optimize queries.
+ </p>
+
+ <p class="p">
+ This example shows table and column statistics for the <code class="ph codeph">STORE</code> table
+ used in the <a class="xref" href="http://www.tpc.org/tpcds/" target="_blank">TPC-DS
+ benchmark for decision support</a> systems. It is a tiny table holding data for 12
+ stores. Initially, before any statistics are gathered by a <code class="ph codeph">COMPUTE
+ STATS</code> statement, most of the numeric fields show placeholder values of -1,
+ indicating that the figures are unknown. The figures that are filled in are values that
+ are easily countable or deducible at the physical level, such as the number of files,
+ total data size of the files, and the maximum and average sizes for data types that have
+ a constant size such as <code class="ph codeph">INT</code>, <code class="ph codeph">FLOAT</code>, and
+ <code class="ph codeph">TIMESTAMP</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+--------+--------+
+| -1 | 1 | 3.08KB | TEXT |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] > show column stats store;
++--------------------+-----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------------------+-----------+------------------+--------+----------+----------+
+| s_store_sk | INT | -1 | -1 | 4 | 4 |
+| s_store_id | STRING | -1 | -1 | -1 | -1 |
+| s_rec_start_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| s_rec_end_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| s_closed_date_sk | INT | -1 | -1 | 4 | 4 |
+| s_store_name | STRING | -1 | -1 | -1 | -1 |
+| s_number_employees | INT | -1 | -1 | 4 | 4 |
+| s_floor_space | INT | -1 | -1 | 4 | 4 |
+| s_hours | STRING | -1 | -1 | -1 | -1 |
+| s_manager | STRING | -1 | -1 | -1 | -1 |
+| s_market_id | INT | -1 | -1 | 4 | 4 |
+| s_geography_class | STRING | -1 | -1 | -1 | -1 |
+| s_market_desc | STRING | -1 | -1 | -1 | -1 |
+| s_market_manager | STRING | -1 | -1 | -1 | -1 |
+| s_division_id | INT | -1 | -1 | 4 | 4 |
+| s_division_name | STRING | -1 | -1 | -1 | -1 |
+| s_company_id | INT | -1 | -1 | 4 | 4 |
+| s_company_name | STRING | -1 | -1 | -1 | -1 |
+| s_street_number | STRING | -1 | -1 | -1 | -1 |
+| s_street_name | STRING | -1 | -1 | -1 | -1 |
+| s_street_type | STRING | -1 | -1 | -1 | -1 |
+| s_suite_number | STRING | -1 | -1 | -1 | -1 |
+| s_city | STRING | -1 | -1 | -1 | -1 |
+| s_county | STRING | -1 | -1 | -1 | -1 |
+| s_state | STRING | -1 | -1 | -1 | -1 |
+| s_zip | STRING | -1 | -1 | -1 | -1 |
+| s_country | STRING | -1 | -1 | -1 | -1 |
+| s_gmt_offset | FLOAT | -1 | -1 | 4 | 4 |
+| s_tax_percentage | FLOAT | -1 | -1 | 4 | 4 |
++--------------------+-----------+------------------+--------+----------+----------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ With the Hive <code class="ph codeph">ANALYZE TABLE</code> statement for column statistics, you had to
+ specify each column for which to gather statistics. The Impala <code class="ph codeph">COMPUTE
+ STATS</code> statement automatically gathers statistics for all columns, because it
+ reads through the entire table relatively quickly and can efficiently compute the values
+ for all the columns. This example shows how after running the <code class="ph codeph">COMPUTE
+ STATS</code> statement, statistics are filled in for both the table and all its
+ columns:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats store;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 1 partition(s) and 29 column(s). |
++------------------------------------------+
+Returned 1 row(s) in 1.88s
+[localhost:21000] > show table stats store;
++-------+--------+--------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+--------+--------+
+| 12 | 1 | 3.08KB | TEXT |
++-------+--------+--------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] > show column stats store;
++--------------------+-----------+------------------+--------+----------+-------------------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------------------+-----------+------------------+--------+----------+-------------------+
+| s_store_sk | INT | 12 | -1 | 4 | 4 |
+| s_store_id | STRING | 6 | -1 | 16 | 16 |
+| s_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16 |
+| s_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16 |
+| s_closed_date_sk | INT | 3 | -1 | 4 | 4 |
+| s_store_name | STRING | 8 | -1 | 5 | 4.25 |
+| s_number_employees | INT | 9 | -1 | 4 | 4 |
+| s_floor_space | INT | 10 | -1 | 4 | 4 |
+| s_hours | STRING | 2 | -1 | 8 | 7.083300113677979 |
+| s_manager | STRING | 7 | -1 | 15 | 12 |
+| s_market_id | INT | 7 | -1 | 4 | 4 |
+| s_geography_class | STRING | 1 | -1 | 7 | 7 |
+| s_market_desc | STRING | 10 | -1 | 94 | 55.5 |
+| s_market_manager | STRING | 7 | -1 | 16 | 14 |
+| s_division_id | INT | 1 | -1 | 4 | 4 |
+| s_division_name | STRING | 1 | -1 | 7 | 7 |
+| s_company_id | INT | 1 | -1 | 4 | 4 |
+| s_company_name | STRING | 1 | -1 | 7 | 7 |
+| s_street_number | STRING | 9 | -1 | 3 | 2.833300113677979 |
+| s_street_name | STRING | 12 | -1 | 11 | 6.583300113677979 |
+| s_street_type | STRING | 8 | -1 | 9 | 4.833300113677979 |
+| s_suite_number | STRING | 11 | -1 | 9 | 8.25 |
+| s_city | STRING | 2 | -1 | 8 | 6.5 |
+| s_county | STRING | 1 | -1 | 17 | 17 |
+| s_state | STRING | 1 | -1 | 2 | 2 |
+| s_zip | STRING | 2 | -1 | 5 | 5 |
+| s_country | STRING | 1 | -1 | 13 | 13 |
+| s_gmt_offset | FLOAT | 1 | -1 | 4 | 4 |
+| s_tax_percentage | FLOAT | 5 | -1 | 4 | 4 |
++--------------------+-----------+------------------+--------+----------+-------------------+
+Returned 29 row(s) in 0.04s</code></pre>
+
+ <p class="p">
+ The following example shows how statistics are represented for a partitioned table. In
+ this case, we have set up a table to hold the world's most trivial census data, a single
+ <code class="ph codeph">STRING</code> field, partitioned by a <code class="ph codeph">YEAR</code> column. The table
+ statistics include a separate entry for each partition, plus final totals for the
+ numeric fields. The column statistics include some easily deducible facts for the
+ partitioning column, such as the number of distinct values (the number of partition
+ subdirectories).
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > describe census;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| name | string | |
+| year | smallint | |
++------+----------+---------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] > show table stats census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 0 | 1 | 22B | TEXT |
+| 2012 | -1 | 1 | 22B | TEXT |
+| 2013 | -1 | 1 | 231B | PARQUET |
+| Total | 0 | 3 | 275B | |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] > show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name | STRING | -1 | -1 | -1 | -1 |
+| year | SMALLINT | 7 | -1 | 2 | 2 |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+ <p class="p">
+ The following example shows how the statistics are filled in by a <code class="ph codeph">COMPUTE
+ STATS</code> statement in Impala.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > compute stats census;
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 2.16s
+[localhost:21000] > show table stats census;
++-------+-------+--------+------+---------+
+| year | #Rows | #Files | Size | Format |
++-------+-------+--------+------+---------+
+| 2000 | -1 | 0 | 0B | TEXT |
+| 2004 | -1 | 0 | 0B | TEXT |
+| 2008 | -1 | 0 | 0B | TEXT |
+| 2010 | -1 | 0 | 0B | TEXT |
+| 2011 | 4 | 1 | 22B | TEXT |
+| 2012 | 4 | 1 | 22B | TEXT |
+| 2013 | 1 | 1 | 231B | PARQUET |
+| Total | 9 | 3 | 275B | |
++-------+-------+--------+------+---------+
+Returned 8 row(s) in 0.02s
+[localhost:21000] > show column stats census;
++--------+----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+----------+------------------+--------+----------+----------+
+| name | STRING | 4 | -1 | 5 | 4.5 |
+| year | SMALLINT | 7 | -1 | 2 | 2 |
++--------+----------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s</code></pre>
+
+ <p class="p">
+ For examples showing how some queries work differently when statistics are available,
+ see <a class="xref" href="impala_perf_joins.html#perf_joins_examples">Examples of Join Order Optimization</a>. You can see how Impala
+ executes a query differently in each case by observing the <code class="ph codeph">EXPLAIN</code>
+      output before and after collecting statistics. Measure the query times before and after,
+      and compare the throughput numbers in the corresponding <code class="ph codeph">SUMMARY</code> or
+      <code class="ph codeph">PROFILE</code> output, to verify how much the improved plan speeds up
+      performance.
+ </p>
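+
+      <p class="p">
+        For example, you could compare the plans for a simple aggregation query on the
+        <code class="ph codeph">census</code> table before and after collecting statistics.
+        (This is an illustrative sketch rather than captured output.)
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > explain select count(*) from census group by year;
+[localhost:21000] > compute stats census;
+[localhost:21000] > explain select count(*) from census group by year;</code></pre>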
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_testing.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_testing.html b/docs/build3x/html/topics/impala_perf_testing.html
new file mode 100644
index 0000000..fee319a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_testing.html
@@ -0,0 +1,152 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance_testing"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Testing Impala Performance</title></head><body id="performance_testing"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Testing Impala Performance</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      Test to ensure that Impala is configured for optimal performance. If you installed Impala with cluster
+      management software, complete the procedures described in this topic to verify that Impala is set up
+      correctly.
+ </p>
+
+ <section class="section" id="performance_testing__checking_config_performance"><h2 class="title sectiontitle">Checking Impala Configuration Values</h2>
+
+
+
+ <p class="p">
+ You can inspect Impala configuration values by connecting to your Impala server using a browser.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To check Impala configuration values:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Use a browser to connect to one of the hosts running <code class="ph codeph">impalad</code> in your environment.
+ Connect using an address of the form
+ <code class="ph codeph">http://<var class="keyword varname">hostname</var>:<var class="keyword varname">port</var>/varz</code>.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In the preceding example, replace <code class="ph codeph">hostname</code> and <code class="ph codeph">port</code> with the name and
+ port of your Impala server. The default port is 25000.
+ </div>
+ </li>
+
+ <li class="li">
+ Review the configured values.
+ <p class="p">
+ For example, to check that your system is configured to use block locality tracking information, you
+ would check that the value for <code class="ph codeph">dfs.datanode.hdfs-blocks-metadata.enabled</code> is
+ <code class="ph codeph">true</code>.
+ </p>
+ </li>
+ </ol>
+
+ <p class="p" id="performance_testing__p_31">
+ <strong class="ph b">To check data locality:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Execute a query on a dataset that is available across multiple nodes. For example, for a table named
+ <code class="ph codeph">MyTable</code> that has a reasonable chance of being spread across multiple DataNodes:
+<pre class="pre codeblock"><code>[impalad-host:21000] > SELECT COUNT(*) FROM MyTable;</code></pre>
+ </li>
+
+ <li class="li">
+ After the query completes, review the contents of the Impala logs. You should find a recent message
+ similar to the following:
+<pre class="pre codeblock"><code>Total remote scan volume = 0</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ The presence of remote scans may indicate <code class="ph codeph">impalad</code> is not running on the correct nodes.
+      This can happen either because some DataNodes are not running <code class="ph codeph">impalad</code>, or
+      because the <code class="ph codeph">impalad</code> instance that coordinates the query is unable to contact
+      one or more of the other <code class="ph codeph">impalad</code> instances.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To understand the causes of this issue:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Connect to the debugging web server. By default, this server runs on port 25000. This page lists all
+ <code class="ph codeph">impalad</code> instances running in your cluster. If there are fewer instances than you expect,
+ this often indicates some DataNodes are not running <code class="ph codeph">impalad</code>. Ensure
+ <code class="ph codeph">impalad</code> is started on all DataNodes.
+ </li>
+
+ <li class="li">
+
+ If you are using multi-homed hosts, ensure that the Impala daemon's hostname resolves to the interface on
+ which <code class="ph codeph">impalad</code> is running. The hostname Impala is using is displayed when
+ <code class="ph codeph">impalad</code> starts. To explicitly set the hostname, use the <code class="ph codeph">--hostname</code> flag.
+ </li>
+
+ <li class="li">
+ Check that <code class="ph codeph">statestored</code> is running as expected. Review the contents of the state store
+ log to ensure all instances of <code class="ph codeph">impalad</code> are listed as having connected to the state
+ store.
+ </li>
+ </ol>
+ </section>
+
+ <section class="section" id="performance_testing__checking_config_logs"><h2 class="title sectiontitle">Reviewing Impala Logs</h2>
+
+
+
+ <p class="p">
+ You can review the contents of the Impala logs for signs that short-circuit reads or block location
+ tracking are not functioning. Before checking logs, execute a simple query against a small HDFS dataset.
+ Completing a query task generates log messages using current settings. Information on starting Impala and
+ executing queries can be found in <a class="xref" href="impala_processes.html#processes">Starting Impala</a> and
+ <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>. Information on logging can be found in
+ <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>. Log messages and their interpretations are as follows:
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:75%"><col style="width:25%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__1">
+ Log Message
+ </th>
+ <th class="entry nocellnorowborder" id="performance_testing__checking_config_logs__entry__2">
+ Interpretation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+ <div class="p">
+<pre class="pre">Unknown disk id. This will negatively affect performance. Check your hdfs settings to enable block location metadata
+</pre>
+ </div>
+ </td>
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+ <p class="p">
+ Tracking block locality is not enabled.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__1 ">
+ <div class="p">
+<pre class="pre">Unable to load native-hadoop library for your platform... using builtin-java classes where applicable</pre>
+ </div>
+ </td>
+ <td class="entry nocellnorowborder" headers="performance_testing__checking_config_logs__entry__2 ">
+ <p class="p">
+ Native checksumming is not enabled.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
+ </section>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_performance.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_performance.html b/docs/build3x/html/topics/impala_performance.html
new file mode 100644
index 0000000..bc87821
--- /dev/null
+++ b/docs/build3x/html/topics/impala_performance.html
@@ -0,0 +1,116 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_cookbook.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_stats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_benchmarking.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_resources.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_hdfs_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_testing.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_plan.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_perf_skew.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Tuning Impala for Performance</title></head><body id="performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Tuning Impala for Performance</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections explain the factors affecting the performance of Impala features, and procedures for
+ tuning, monitoring, and benchmarking Impala queries and other SQL operations.
+ </p>
+
+ <p class="p">
+ This section also describes techniques for maximizing Impala scalability. Scalability is tied to performance:
+ it means that performance remains high as the system workload increases. For example, reducing the disk I/O
+ performed by a query can speed up an individual query, and at the same time improve scalability by making it
+ practical to run more queries simultaneously. Sometimes, an optimization technique improves scalability more
+ than performance. For example, reducing memory usage for a query might not change the query performance much,
+ but might improve scalability by allowing more Impala queries or other kinds of jobs to run at the same time
+ without running out of memory.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Before starting any performance tuning or benchmarking, make sure your system is configured with all the
+ recommended minimum hardware requirements from <a class="xref" href="impala_prereqs.html#prereqs_hardware">Hardware Requirements</a> and
+ software settings from <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>.
+ </p>
+ </div>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>. This technique physically divides the data based on
+ the different values in frequently queried columns, allowing queries to skip reading a large percentage of
+ the data in a table.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>. Joins are the main class of queries that you can tune at
+ the SQL level, as opposed to changing physical factors such as the file format or the hardware
+ configuration. The related topics <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a> and
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> are also important primarily for join performance.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_stats.html#perf_table_stats">Overview of Table Statistics</a> and
+ <a class="xref" href="impala_perf_stats.html#perf_column_stats">Overview of Column Statistics</a>. Gathering table and column statistics, using the
+ <code class="ph codeph">COMPUTE STATS</code> statement, helps Impala automatically optimize the performance for join
+ queries, without requiring changes to SQL query statements. (This process is greatly simplified in Impala
+ 1.2.2 and higher, because the <code class="ph codeph">COMPUTE STATS</code> statement gathers both kinds of statistics in
+ one operation, and does not require any setup and configuration as was previously necessary for the
+ <code class="ph codeph">ANALYZE TABLE</code> statement in Hive.)
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_testing.html#performance_testing">Testing Impala Performance</a>. Do some post-setup testing to ensure Impala is
+ using optimal settings for performance, before conducting any benchmark tests.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_benchmarking.html#perf_benchmarks">Benchmarking Impala Queries</a>. The configuration and sample data that you use
+      for initial experiments with Impala are often not appropriate for doing performance tests.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_perf_resources.html#mem_limits">Controlling Impala Resource Usage</a>. The more memory Impala can utilize, the better query
+ performance you can expect. In a cluster running other kinds of workloads as well, you must make tradeoffs
+ to make sure all Hadoop components have enough memory to perform well, so you might cap the memory that
+ Impala can use.
+ </li>
+
+
+
+ <li class="li">
+ <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>. Queries against data stored in the Amazon Simple Storage Service (S3)
+ have different performance characteristics than when the data is stored in HDFS.
+ </li>
+ </ul>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ A good source of tips related to scalability and performance tuning is the
+ <a class="xref" href="http://www.slideshare.net/cloudera/the-impala-cookbook-42530186" target="_blank">Impala Cookbook</a>
+ presentation. These slides are updated periodically as new features come out and new benchmarks are performed.
+ </p>
+
+ </div>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_perf_cookbook.html">Impala Performance Guidelines and Best Practices</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_joins.html">Performance Considerations for Join Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_stats.html">Table and Column Statistics</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_benchmarking.html">Benchmarking Impala Queries</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_resources.html">Controlling Impala Resource Usage</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_hdfs_caching.html">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_testing.html">Testing Impala Performance</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_plan.html">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_perf_skew.html">Detecting and Correcting HDFS Block Skew Conditions</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_langref_unsupported.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_langref_unsupported.html b/docs/build3x/html/topics/impala_langref_unsupported.html
new file mode 100644
index 0000000..769bf86
--- /dev/null
+++ b/docs/build3x/html/topics/impala_langref_unsupported.html
@@ -0,0 +1,337 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="langref_hiveql_delta"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Differences Between Impala and Hive</title></head><body id="langref_hiveql_delta"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SQL Differences Between Impala and Hive</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Impala's SQL syntax follows the SQL-92 standard, and includes many industry extensions in areas such as
+ built-in functions. See <a class="xref" href="impala_porting.html#porting">Porting SQL from Other Database Systems to Impala</a> for a general discussion of adapting SQL
+ code from a variety of database systems to Impala.
+ </p>
+
+ <p class="p">
+ Because Impala and Hive share the same metastore database and their tables are often used interchangeably,
+ the following section covers differences between Impala and Hive in detail.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="langref_hiveql_delta__langref_hiveql_unsupported">
+
+ <h2 class="title topictitle2" id="ariaid-title2">HiveQL Features not Available in Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The current release of Impala does not support the following SQL features that you might be familiar with
+ from HiveQL:
+ </p>
+
+
+
+ <ul class="ul">
+
+
+ <li class="li">
+ Extensibility mechanisms such as <code class="ph codeph">TRANSFORM</code>, custom file formats, or custom SerDes.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">DATE</code> data type.
+ </li>
+
+ <li class="li">
+ XML and JSON functions.
+ </li>
+
+ <li class="li">
+ Certain aggregate functions from HiveQL: <code class="ph codeph">covar_pop</code>, <code class="ph codeph">covar_samp</code>,
+ <code class="ph codeph">corr</code>, <code class="ph codeph">percentile</code>, <code class="ph codeph">percentile_approx</code>,
+ <code class="ph codeph">histogram_numeric</code>, <code class="ph codeph">collect_set</code>; Impala supports the set of aggregate
+ functions listed in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a> and analytic
+ functions listed in <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>.
+ </li>
+
+ <li class="li">
+ Sampling.
+ </li>
+
+ <li class="li">
+ Lateral views. In <span class="keyword">Impala 2.3</span> and higher, Impala supports queries on complex types
+ (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>), using join notation
+ rather than the <code class="ph codeph">EXPLODE()</code> keyword.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </li>
+
+ <li class="li">
+ Multiple <code class="ph codeph">DISTINCT</code> clauses per query, although Impala includes some workarounds for this
+ limitation.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
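+        <p class="p">
+          For example (an illustrative sketch; the table and column names are hypothetical):
+        </p>
+<pre class="pre codeblock"><code>select ndv(col1), ndv(col2) from t1;
+set APPX_COUNT_DISTINCT=true;
+select count(distinct col1), count(distinct col2) from t1;</code></pre>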
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
+ </li>
+ </ul>
+
+ <div class="p">
+ User-defined functions (UDFs) are supported starting in Impala 1.2. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>
+ for full details on Impala UDFs.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala supports high-performance UDFs written in C++, as well as reusing some Java-based Hive UDFs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports scalar UDFs and user-defined aggregate functions (UDAFs). Impala does not currently
+ support user-defined table generating functions (UDTFs).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Only Impala-supported column types are supported in Java-based UDFs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The Hive <code class="ph codeph">current_user()</code> function cannot be
+ called from a Java UDF through Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Impala does not currently support these HiveQL statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ANALYZE TABLE</code> (the Impala equivalent is <code class="ph codeph">COMPUTE STATS</code>)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DESCRIBE COLUMN</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DESCRIBE DATABASE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">EXPORT TABLE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">IMPORT TABLE</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW TABLE EXTENDED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW INDEXES</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHOW COLUMNS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">INSERT OVERWRITE DIRECTORY</code>; use <code class="ph codeph">INSERT OVERWRITE <var class="keyword varname">table_name</var></code>
+ or <code class="ph codeph">CREATE TABLE AS SELECT</code> to materialize query results into the HDFS directory associated
+ with an Impala table.
+ </li>
+ </ul>
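+
+    <p class="p">
+      For example, a sketch of materializing query results into a table rather than a
+      directory (the table names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>create table query_results stored as parquet as
+  select c1, c2 from t1 where c2 is not null;</code></pre>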
+ <p class="p">
+ Impala respects the <code class="ph codeph">serialization.null.format</code> table
+ property only for TEXT tables and ignores the property for Parquet and
+ other formats. Hive respects the <code class="ph codeph">serialization.null.format</code>
+ property for Parquet and other formats and converts matching values
+ to NULL during the scan. See <a class="xref" href="impala_txtfile.html">Using Text Data Files with Impala Tables</a> for
+ using the table property in Impala.
+ </p>
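+    <p class="p">
+      For example, a sketch of setting the property on a text table from Impala (the
+      table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>alter table t1 set tblproperties('serialization.null.format'='null');</code></pre>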
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="langref_hiveql_delta__langref_hiveql_semantics">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Semantic Differences Between Impala and HiveQL Features</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section covers instances where Impala and Hive have similar functionality, sometimes including the
+ same syntax, but there are differences in the runtime semantics of those features.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security:</strong>
+ </p>
+
+ <p class="p">
+ Impala utilizes the <a class="xref" href="http://sentry.apache.org/" target="_blank">Apache
+        Sentry</a> authorization framework, which provides fine-grained role-based access control
+ to protect data against unauthorized access or tampering.
+ </p>
+
+ <p class="p">
+ The Hive component now includes Sentry-enabled <code class="ph codeph">GRANT</code>,
+ <code class="ph codeph">REVOKE</code>, and <code class="ph codeph">CREATE/DROP ROLE</code> statements. Earlier Hive releases had a
+ privilege system with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements that were primarily
+ intended to prevent accidental deletion of data, rather than a security mechanism to protect against
+ malicious users.
+ </p>
+
+ <p class="p">
+ Impala can make use of privileges set up through Hive <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements.
+ Impala has its own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala 2.0 and higher.
+ See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for the details of authorization in Impala, including
+ how to switch from the original policy file-based privilege model to the Sentry service using privileges
+ stored in the metastore database.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">SQL statements and clauses:</strong>
+ </p>
+
+ <p class="p">
+        The semantics of Impala SQL statements vary from HiveQL in some cases where Impala and Hive
+        use similar SQL statement and clause names:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala uses different syntax and names for query hints, <code class="ph codeph">[SHUFFLE]</code> and
+ <code class="ph codeph">[NOSHUFFLE]</code> rather than <code class="ph codeph">MapJoin</code> or <code class="ph codeph">StreamJoin</code>. See
+ <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for the Impala details.
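+          <p class="p">
+            For example, an Impala join hint looks like the following (a sketch with
+            hypothetical table names):
+          </p>
+<pre class="pre codeblock"><code>select t1.id from t1 join [shuffle] t2 on t1.id = t2.id;</code></pre>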
+ </li>
+
+ <li class="li">
+ Impala does not expose MapReduce specific features of <code class="ph codeph">SORT BY</code>, <code class="ph codeph">DISTRIBUTE
+ BY</code>, or <code class="ph codeph">CLUSTER BY</code>.
+ </li>
+
+ <li class="li">
+ Impala does not require queries to include a <code class="ph codeph">FROM</code> clause.
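+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>select 2 + 2;
+select now();</code></pre>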
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Data types:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala supports a limited set of implicit casts. This can help avoid undesired results from unexpected
+ casting behavior.
+ <ul class="ul">
+ <li class="li">
+ Impala does not implicitly cast between string and numeric or Boolean types. Always use
+ <code class="ph codeph">CAST()</code> for these conversions.
+ </li>
+
+ <li class="li">
+ Impala does perform implicit casts among the numeric types, when going from a smaller or less precise
+ type to a larger or more precise one. For example, Impala will implicitly convert a
+ <code class="ph codeph">SMALLINT</code> to a <code class="ph codeph">BIGINT</code> or <code class="ph codeph">FLOAT</code>, but to convert from
+ <code class="ph codeph">DOUBLE</code> to <code class="ph codeph">FLOAT</code> or <code class="ph codeph">INT</code> to <code class="ph codeph">TINYINT</code>
+ requires a call to <code class="ph codeph">CAST()</code> in the query.
+ </li>
+
+ <li class="li">
+ Impala does perform implicit casts from string to timestamp. Impala has a restricted set of literal
+ formats for the <code class="ph codeph">TIMESTAMP</code> data type and the <code class="ph codeph">from_unixtime()</code> format
+ string; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </li>
+ </ul>
+ <p class="p">
+ See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for full details on implicit and explicit casting for
+ all types, and <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details about
+ the <code class="ph codeph">CAST()</code> function.
+ </p>
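+          <p class="p">
+            For example, the following casts must be explicit (a sketch; the table and
+            column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>select cast(s1 as int) + 1 from t1; -- string to numeric needs CAST()
+select cast(pi() as float);        -- DOUBLE to FLOAT needs CAST()</code></pre>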
+ </li>
+
+ <li class="li">
+ Impala does not store or interpret timestamps using the local timezone, to avoid undesired results from
+ unexpected time zone issues. Timestamps are stored and interpreted relative to UTC. This difference can
+ produce different results for some calls to similarly named date/time functions between Impala and Hive.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details about the Impala
+ functions. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for a discussion of how Impala handles
+ time zones, and configuration options you can use to make Impala match the Hive behavior more closely
+ when dealing with Parquet-encoded <code class="ph codeph">TIMESTAMP</code> data or when converting between
+ the local time zone and UTC.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">TIMESTAMP</code> type can represent dates ranging from 1400-01-01 to 9999-12-31.
+ This is different from the Hive date range, which is 0000-01-01 to 9999-12-31.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala does not return column overflows as <code class="ph codeph">NULL</code>, so that users can distinguish
+ between <code class="ph codeph">NULL</code> data and overflow conditions, as they can with traditional
+ database systems. On overflow, Impala returns the largest or smallest value in the range for the type. For example,
+ valid values for a <code class="ph codeph">tinyint</code> range from -128 to 127. In Impala, a <code class="ph codeph">tinyint</code>
+ with a value of -200 returns -128 rather than <code class="ph codeph">NULL</code>, and a <code class="ph codeph">tinyint</code> with a
+ value of 200 returns 127.
+ </p>
+ </li>
+
+ </ul>
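The overflow behavior described above can be illustrated with a small sketch. The helper below is hypothetical (it is not Impala code); it clamps an out-of-range value to the `TINYINT` bounds the same way Impala does.

```python
# Illustrative sketch (not Impala source code): clamp an out-of-range
# value to the bounds of Impala's TINYINT type, mirroring the overflow
# behavior described above (-200 -> -128, 200 -> 127).

TINYINT_MIN, TINYINT_MAX = -128, 127

def clamp_tinyint(value: int) -> int:
    """Return the value Impala would report for a TINYINT overflow."""
    return max(TINYINT_MIN, min(TINYINT_MAX, value))

print(clamp_tinyint(-200))  # -128
print(clamp_tinyint(200))   # 127
print(clamp_tinyint(42))    # 42
```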
+
+ <p class="p">
+ <strong class="ph b">Miscellaneous features:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala does not provide virtual columns.
+ </li>
+
+ <li class="li">
+ Impala does not expose locking.
+ </li>
+
+ <li class="li">
+ Impala does not expose some configuration properties.
+ </li>
+ </ul>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ldap.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ldap.html b/docs/build3x/html/topics/impala_ldap.html
new file mode 100644
index 0000000..7729e93
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ldap.html
@@ -0,0 +1,294 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ldap"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling LDAP Authentication for Impala</title></head><body id="ldap"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling LDAP Authentication for Impala</h1>
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p"> Authentication is the process of allowing only specified named users to
+ access the server (in this case, the Impala server). This feature is
+ crucial for any production deployment, to prevent misuse, tampering, or
+ excessive load on the server. Impala uses LDAP for authentication,
+ verifying the credentials of each user who connects through
+ <span class="keyword cmdname">impala-shell</span>, Hue, a Business Intelligence tool, JDBC
+ or ODBC application, and so on. </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p">
+ An alternative form of authentication you can use is Kerberos, described in
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="ldap__ldap_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with LDAP</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Authentication against LDAP servers is available in Impala 1.2.2 and higher. Impala 1.4.0 adds support for
+ secure LDAP authentication through SSL and TLS.
+ </p>
+
+ <p class="p">
+ The Impala LDAP support lets you use Impala with systems such as Active Directory that use LDAP behind the
+ scenes.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="ldap__ldap_client_server">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Client-Server Considerations for LDAP</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Only connections from clients to Impala can be authenticated through LDAP.
+ </p>
+
+ <p class="p"> You must use the Kerberos authentication mechanism for connections
+ between internal Impala components, such as between the
+ <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">statestored</span>, and
+ <span class="keyword cmdname">catalogd</span> daemons. See <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> on how to set up Kerberos for
+ Impala. </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="ldap__ldap_config">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Server-Side LDAP Setup</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These requirements apply on the server side when configuring and starting Impala:
+ </p>
+
+ <p class="p">
+ To enable LDAP authentication, set the following startup options for <span class="keyword cmdname">impalad</span>:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--enable_ldap_auth</code> enables LDAP-based authentication between the client and Impala.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_uri</code> sets the URI of the LDAP server to use. Typically, the URI is prefixed with
+ <code class="ph codeph">ldap://</code>. In Impala 1.4.0 and higher, you can specify secure SSL-based LDAP transport by
+ using the prefix <code class="ph codeph">ldaps://</code>. The URI can optionally specify the port, for example:
+ <code class="ph codeph">ldap://ldap_server.example.com:389</code> or
+ <code class="ph codeph">ldaps://ldap_server.example.com:636</code>. (389 and 636 are the default ports for non-SSL and
+ SSL LDAP connections, respectively.)
+ </li>
+
+
+
+ <li class="li">
+ For <code class="ph codeph">ldaps://</code> connections secured by SSL,
+ <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+ location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+ filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+ </li>
+
+
+ </ul>
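The port defaults described above can be sketched in a few lines. The helper below is a hypothetical illustration of how an application might resolve the effective port of an <code class="ph codeph">--ldap_uri</code> value; it mimics the documented defaults (389 for <code class="ph codeph">ldap://</code>, 636 for <code class="ph codeph">ldaps://</code>) and is not Impala's actual parsing logic.

```python
from urllib.parse import urlparse

# Illustrative sketch: resolve the effective port of an --ldap_uri value,
# applying the defaults noted above (389 for ldap://, 636 for ldaps://).
# Hypothetical helper; not Impala's actual URI handling.
DEFAULT_PORTS = {"ldap": 389, "ldaps": 636}

def effective_ldap_port(uri: str) -> int:
    parsed = urlparse(uri)
    if parsed.port is not None:
        return parsed.port          # explicit port in the URI wins
    return DEFAULT_PORTS[parsed.scheme]

print(effective_ldap_port("ldap://ldap_server.example.com"))        # 389
print(effective_ldap_port("ldaps://ldap_server.example.com"))       # 636
print(effective_ldap_port("ldap://ldap_server.example.com:10389"))  # 10389
```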
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="ldap__ldap_bind_strings">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Support for Custom Bind Strings</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When Impala connects to LDAP it issues a bind call to the LDAP server to authenticate as the connected
+ user. Impala clients, including the Impala shell, provide the short name of the user to Impala. This is
+ necessary so that Impala can use Sentry for role-based access, which uses short names.
+ </p>
+
+ <p class="p">
+ However, LDAP servers often require more complex, structured usernames for authentication. Impala supports
+ three ways of transforming the short name (for example, <code class="ph codeph">'henry'</code>) to a more complicated
+ string. If necessary, specify one of the following configuration options
+ when starting the <span class="keyword cmdname">impalad</span> daemon on each DataNode:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--ldap_domain</code>: Replaces the username with a string
+ <code class="ph codeph"><var class="keyword varname">username</var>@<var class="keyword varname">ldap_domain</var></code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_baseDN</code>: Replaces the username with a <span class="q">"distinguished name"</span> (DN) of the form:
+ <code class="ph codeph">uid=<var class="keyword varname">userid</var>,ldap_baseDN</code>. (This is equivalent to a Hive option).
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_bind_pattern</code>: This is the most general option, and replaces the username with the
+ string <var class="keyword varname">ldap_bind_pattern</var> where all instances of the string <code class="ph codeph">#UID</code> are
+ replaced with <var class="keyword varname">userid</var>. For example, an <code class="ph codeph">ldap_bind_pattern</code> of
+ <code class="ph codeph">"user=#UID,OU=foo,CN=bar"</code> with a username of <code class="ph codeph">henry</code> will construct a
+ bind name of <code class="ph codeph">"user=henry,OU=foo,CN=bar"</code>.
+ </li>
+ </ul>
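The three transformations above can be sketched as simple string substitutions. The function names below are hypothetical; Impala performs these substitutions internally based on the <span class="keyword cmdname">impalad</span> startup flags.

```python
# Illustrative sketch of the three short-name transformations described
# above. Hypothetical helpers, not Impala code.

def apply_ldap_domain(username: str, ldap_domain: str) -> str:
    # --ldap_domain: username@ldap_domain
    return f"{username}@{ldap_domain}"

def apply_ldap_base_dn(username: str, base_dn: str) -> str:
    # --ldap_baseDN: uid=userid,ldap_baseDN
    return f"uid={username},{base_dn}"

def apply_bind_pattern(username: str, pattern: str) -> str:
    # --ldap_bind_pattern: replace every #UID with the userid
    return pattern.replace("#UID", username)

print(apply_ldap_domain("henry", "example.com"))
# henry@example.com
print(apply_ldap_base_dn("henry", "ou=people,dc=example,dc=com"))
# uid=henry,ou=people,dc=example,dc=com
print(apply_bind_pattern("henry", "user=#UID,OU=foo,CN=bar"))
# user=henry,OU=foo,CN=bar
```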
+
+ <p class="p">
+ These options are mutually exclusive; Impala does not start if more than one of these options is specified.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="ldap__ldap_security">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Secure LDAP Connections</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To avoid sending credentials over the wire in cleartext, you must configure a secure connection between
+ both the client and Impala, and between Impala and the LDAP server. The secure connection could use SSL or
+ TLS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Secure LDAP connections through SSL:</strong>
+ </p>
+
+ <p class="p">
+ For SSL-enabled LDAP connections, specify a prefix of <code class="ph codeph">ldaps://</code> instead of
+ <code class="ph codeph">ldap://</code>. Also, the default port for SSL-enabled LDAP connections is 636 instead of 389.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Secure LDAP connections through TLS:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="http://en.wikipedia.org/wiki/Transport_Layer_Security" target="_blank">TLS</a>,
+ the successor to the SSL protocol, is supported by most modern LDAP servers. Unlike SSL connections, TLS
+ connections can be made on the same server port as non-TLS connections. To secure all connections using
+ TLS, specify the following flags as startup options to the <span class="keyword cmdname">impalad</span> daemon:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">--ldap_tls</code> tells Impala to start a TLS connection to the LDAP server, and to fail
+ authentication if it cannot be done.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">--ldap_ca_certificate="<var class="keyword varname">/path/to/certificate/pem</var>"</code> specifies the
+ location of the certificate in standard <code class="ph codeph">.PEM</code> format. Store this certificate on the local
+ filesystem, in a location that only the <code class="ph codeph">impala</code> user and other trusted users can read.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="ldap__ldap_impala_shell">
+
+ <h2 class="title topictitle2" id="ariaid-title7">LDAP Authentication for impala-shell Interpreter</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To connect to Impala using LDAP authentication, you specify command-line options to the
+ <span class="keyword cmdname">impala-shell</span> command interpreter and enter the password when prompted:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">-l</code> enables LDAP authentication.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">-u</code> sets the user. For Active Directory, the user is the short username, not the full
+ LDAP distinguished name. If your LDAP settings include a search base, use the
+ <code class="ph codeph">--ldap_bind_pattern</code> option on the <span class="keyword cmdname">impalad</span> daemon to translate the short user
+ name from <span class="keyword cmdname">impala-shell</span> automatically into the fully qualified name.
+
+ </li>
+
+ <li class="li">
+ <span class="keyword cmdname">impala-shell</span> automatically prompts for the password.
+ </li>
+ </ul>
+
+ <p class="p">
+ For the full list of available <span class="keyword cmdname">impala-shell</span> options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">LDAP authentication for JDBC applications:</strong> See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> for the
+ format to use with the JDBC connection string for servers using LDAP authentication.
+ </p>
+ </div>
+ </article>
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="ldap__ldap_impala_hue">
+ <h2 class="title topictitle2" id="ariaid-title8">Enabling LDAP for Impala in Hue</h2>
+
+ <div class="body conbody">
+ <section class="section" id="ldap_impala_hue__ldap_impala_hue_cmdline"><h3 class="title sectiontitle">Enabling LDAP for Impala in Hue Using the Command Line</h3>
+
+ <div class="p">LDAP authentication for the Impala app in Hue can be enabled by
+ setting the following properties under the <code class="ph codeph">[impala]</code>
+ section in <code class="ph codeph">hue.ini</code>. <table class="table" id="ldap_impala_hue__ldap_impala_hue_configs"><caption></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder"><code class="ph codeph">auth_username</code></td>
+ <td class="entry nocellnorowborder">LDAP username of Hue user to be authenticated.</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder"><code class="ph codeph">auth_password</code></td>
+ <td class="entry nocellnorowborder">
+ <p class="p">LDAP password of Hue user to be authenticated.</p>
+ </td>
+ </tr>
+ </tbody></table>These login details are only used by Impala to authenticate to
+ LDAP. The Impala service trusts Hue to have already validated the user
+ being impersonated, rather than simply passing on the credentials.</div>
+ </section>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="ldap__ldap_delegation">
+ <h2 class="title topictitle2" id="ariaid-title9">Enabling Impala Delegation for LDAP Users</h2>
+ <div class="body conbody">
+ <p class="p">
+ See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+ that lets certain users submit queries using the credentials of other users.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="ldap__ldap_restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title10">LDAP Restrictions for Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The LDAP support is preliminary. It has currently been tested only against Active Directory.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_limit.html b/docs/build3x/html/topics/impala_limit.html
new file mode 100644
index 0000000..22dc7a5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_limit.html
@@ -0,0 +1,168 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIMIT Clause</title></head><body id="limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIMIT Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">LIMIT</code> clause in a <code class="ph codeph">SELECT</code> query sets a maximum number of rows for the
+ result set. Pre-selecting the maximum size of the result set helps Impala to optimize memory usage while
+ processing a distributed query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LIMIT <var class="keyword varname">constant_integer_expression</var></code></pre>
+
+ <p class="p">
+ The argument to the <code class="ph codeph">LIMIT</code> clause must evaluate to a constant value. It can be a numeric
+ literal, or another kind of numeric expression involving operators, casts, and function return values. You
+ cannot refer to a column or use a subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This clause is useful in contexts such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ To return exactly N items from a top-N query, such as the 10 highest-rated items in a shopping category or
+ the 50 hostnames that refer the most traffic to a web site.
+ </li>
+
+ <li class="li">
+ To demonstrate some sample values from a table or a particular query. (To display some arbitrary items, use
+ a query with no <code class="ph codeph">ORDER BY</code> clause. An <code class="ph codeph">ORDER BY</code> clause causes additional
+ memory and/or disk usage during the query.)
+ </li>
+
+ <li class="li">
+ To keep queries from returning huge result sets by accident if a table is larger than expected, or a
+ <code class="ph codeph">WHERE</code> clause matches more rows than expected.
+ </li>
+ </ul>
+
+ <p class="p">
+ Originally, the value for the <code class="ph codeph">LIMIT</code> clause had to be a numeric literal. In Impala 1.2.1 and
+ higher, it can be a numeric expression.
+ </p>
+
+ <p class="p">
+ Prior to Impala 1.4.0, Impala required any query including an
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+ higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+ sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+ Impala automatically uses a temporary disk work area to perform the sort operation.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details.
+ </p>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, you can combine a <code class="ph codeph">LIMIT</code> clause with an <code class="ph codeph">OFFSET</code>
+ clause to produce a small result set that is different from a top-N query, for example, to return items 11
+ through 20. This technique can be used to simulate <span class="q">"paged"</span> results. Because Impala queries typically
+ involve substantial amounts of I/O, use this technique only for compatibility in cases where you cannot
+ rewrite the application logic. For best performance and scalability, wherever practical, query as many
+ items as you expect to need, cache them on the application side, and display small groups of results to
+ users using application logic.
+ </p>
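The paging technique above can be sketched as a small query-building helper. The function below is hypothetical application-side code, not part of Impala; with 1-based page numbers and a page size of 10, page 2 requests items 11 through 20.

```python
# Illustrative sketch (hypothetical helper, not part of Impala): build
# "paged" queries with LIMIT and OFFSET as described above.

def paged_query(base_query: str, page: int, page_size: int) -> str:
    """Append LIMIT/OFFSET for 1-based page numbers."""
    offset = (page - 1) * page_size
    return f"{base_query} LIMIT {page_size} OFFSET {offset}"

print(paged_query("SELECT x FROM t1 ORDER BY x", 1, 10))
# SELECT x FROM t1 ORDER BY x LIMIT 10 OFFSET 0
print(paged_query("SELECT x FROM t1 ORDER BY x", 2, 10))
# SELECT x FROM t1 ORDER BY x LIMIT 10 OFFSET 10
```

Note that the base query includes an <code class="ph codeph">ORDER BY</code> clause; without a deterministic ordering, consecutive pages are not guaranteed to be consistent.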
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how the <code class="ph codeph">LIMIT</code> clause caps the size of the result set, with the
+ limit being applied after any other clauses such as <code class="ph codeph">WHERE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database limits;
+[localhost:21000] > use limits;
+[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers values (1), (3), (4), (5), (2);
+Inserted 5 rows in 1.34s
+[localhost:21000] > select x from numbers limit 100;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
+| 5 |
+| 2 |
++---+
+Returned 5 row(s) in 0.26s
+[localhost:21000] > select x from numbers limit 3;
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 4 |
++---+
+Returned 3 row(s) in 0.27s
+[localhost:21000] > select x from numbers where x > 2 limit 2;
++---+
+| x |
++---+
+| 3 |
+| 4 |
++---+
+Returned 2 row(s) in 0.27s</code></pre>
+
+ <p class="p">
+ For top-N and bottom-N queries, you use the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses
+ together:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x as "Top 3" from numbers order by x desc limit 3;
++-------+
+| top 3 |
++-------+
+| 5 |
+| 4 |
+| 3 |
++-------+
+[localhost:21000] > select x as "Bottom 3" from numbers order by x limit 3;
++----------+
+| bottom 3 |
++----------+
+| 1 |
+| 2 |
+| 3 |
++----------+
+</code></pre>
+
+ <p class="p">
+ You can use constant values besides integer literals as the <code class="ph codeph">LIMIT</code> argument:
+ </p>
+
+<pre class="pre codeblock"><code>-- Other expressions that yield constant integer values work too.
+SELECT x FROM t1 LIMIT 1e6; -- Limit is one million.
+SELECT x FROM t1 LIMIT length('hello world'); -- Limit is 11.
+SELECT x FROM t1 LIMIT 2+2; -- Limit is 4.
+SELECT x FROM t1 LIMIT cast(truncate(9.9) AS INT); -- Limit is 9.
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_lineage.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_lineage.html b/docs/build3x/html/topics/impala_lineage.html
new file mode 100644
index 0000000..12b3794
--- /dev/null
+++ b/docs/build3x/html/topics/impala_lineage.html
@@ -0,0 +1,91 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="lineage"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Viewing Lineage Information for Impala Data</title></head><body id="lineage"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Viewing Lineage Information for Impala Data</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ <dfn class="term">Lineage</dfn> is a feature that helps you track where data originated, and how
+ data propagates through the system through SQL statements such as
+ <code class="ph codeph">SELECT</code>, <code class="ph codeph">INSERT</code>, and <code class="ph codeph">CREATE
+ TABLE AS SELECT</code>.
+ </p>
+ <p class="p">
+ This type of tracking is important in high-security configurations, especially in
+ highly regulated industries such as healthcare, pharmaceuticals, financial services, and
+ intelligence. For such sensitive data, it is important to know all
+ the places in the system that contain that data or other data derived from it; to verify who has accessed
+ that data; and to be able to double-check that the data used to make a decision was processed correctly and
+ not tampered with.
+ </p>
+
+ <section class="section" id="lineage__column_lineage"><h2 class="title sectiontitle">Column Lineage</h2>
+
+
+
+ <p class="p">
+ <dfn class="term">Column lineage</dfn> tracks information in fine detail, at the level of
+ particular columns rather than entire tables.
+ </p>
+
+ <p class="p">
+ For example, if you have a table with information derived from web logs, you might copy that data into
+ other tables as part of the ETL process. The ETL operations might involve transformations through
+ expressions and function calls, and rearranging the columns into more or fewer tables
+ (<dfn class="term">normalizing</dfn> or <dfn class="term">denormalizing</dfn> the data). Then for reporting, you might issue
+ queries against multiple tables and views. In this example, column lineage helps you determine that data
+ that entered the system as <code class="ph codeph">RAW_LOGS.FIELD1</code> was then turned into
+ <code class="ph codeph">WEBSITE_REPORTS.IP_ADDRESS</code> through an <code class="ph codeph">INSERT ... SELECT</code> statement. Or,
+ conversely, you could start with a reporting query against a view, and trace the origin of the data in a
+ field such as <code class="ph codeph">TOP_10_VISITORS.USER_ID</code> back to the underlying table and even further back
+ to the point where the data was first loaded into Impala.
+ </p>
+
+ <p class="p">
+ When you have tables where you need to track or control access to sensitive information at the column
+ level, see <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for how to implement column-level
+ security. You set up authorization using the Sentry framework, create views that refer to specific sets of
+ columns, and then assign authorization privileges to those views rather than the underlying tables.
+ </p>
+
+ </section>
+
+ <section class="section" id="lineage__lineage_data"><h2 class="title sectiontitle">Lineage Data for Impala</h2>
+
+
+
+ <p class="p">
+ The lineage feature is enabled by default. When lineage logging is enabled, the serialized column lineage
+ graph is computed for each query and stored in a specialized log file in JSON format.
+ </p>
+
+ <p class="p">
+ Impala records queries in the lineage log if they complete successfully, or fail due to authorization
+ errors. For write operations such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the statement is recorded in the lineage log only if it completes successfully. Therefore, the lineage
+ feature tracks data that was accessed by successful queries, and data whose access was attempted by
+ queries that were blocked due to authorization failures. These kinds of queries represent data
+ that really was accessed, or where the attempted access could indicate malicious activity.
+ </p>
+
+ <p class="p">
+ Impala does not record in the lineage log queries that fail due to syntax errors or that fail or are
+ cancelled before they reach the stage of requesting rows from the result set.
+ </p>
+
+ <p class="p">
+ To enable or disable this feature, set or remove the <code class="ph codeph">-lineage_event_log_dir</code>
+ configuration option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ </section>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_literals.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_literals.html b/docs/build3x/html/topics/impala_literals.html
new file mode 100644
index 0000000..b9cfe57
--- /dev/null
+++ b/docs/build3x/html/topics/impala_literals.html
@@ -0,0 +1,424 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="literals"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Literals</title></head><body id="literals"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Literals</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Each of the Impala data types has corresponding notation for literal values of that type. You specify literal
+ values in SQL statements, such as in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clause of a
+ query, or as an argument to a function call. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for a complete
+ list of types, ranges, and conversion rules.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="literals__numeric_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Numeric Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To write literals for the integer types (<code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">INT</code>, and <code class="ph codeph">BIGINT</code>), use a sequence of digits with optional leading zeros.
+ </p>
+
+ <p class="p">
+ To write literals for the floating-point types (<code class="ph codeph">DECIMAL</code>,
+ <code class="ph codeph">FLOAT</code>, and <code class="ph codeph">DOUBLE</code>), use a sequence of digits with an optional decimal
+ point (<code class="ph codeph">.</code> character). To preserve accuracy during arithmetic expressions, Impala interprets
+ floating-point literals as the <code class="ph codeph">DECIMAL</code> type with the smallest appropriate precision and
+ scale, until required by the context to convert the result to <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ Integer values are promoted to floating-point when necessary, based on the context.
+ </p>
+
+ <p class="p">
+ You can also use exponential notation by including an <code class="ph codeph">e</code> character. For example,
+ <code class="ph codeph">1e6</code> is 1 times 10 to the power of 6 (1 million). A number in exponential notation is
+ always interpreted as floating-point.
+ </p>
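+ <p class="p">
+ For example, you can check the type that Impala assigns to a literal with the
+ <code class="ph codeph">typeof()</code> function (available in <span class="keyword">Impala 2.3</span>
+ and higher). The output shown here is illustrative:
+ </p>
+
+<pre class="pre codeblock"><code>-- An exponential literal is always floating-point, while the
+-- same value written out as digits gets the smallest integer type that fits.
+select typeof(1e6) as exp_type, typeof(1000000) as int_type;
++----------+----------+
+| exp_type | int_type |
++----------+----------+
+| DOUBLE   | INT      |
++----------+----------+
+</code></pre>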
+
+ <p class="p">
+ When Impala encounters a numeric literal, it considers the type to be the <span class="q">"smallest"</span> that can
+ accurately represent the value. The type is promoted to larger or more accurate types if necessary, based
+ on subsequent parts of an expression.
+ </p>
+ <p class="p">
+ For example, the column types that Impala assigns in the following
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements show how it interprets the
+ corresponding numeric literals:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table ten as select 10 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc ten;
++------+---------+---------+
+| name | type | comment |
++------+---------+---------+
+| x | tinyint | |
++------+---------+---------+
+
+[localhost:21000] > create table four_k as select 4096 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc four_k;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| x | smallint | |
++------+----------+---------+
+
+[localhost:21000] > create table one_point_five as select 1.5 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_five;
++------+--------------+---------+
+| name | type | comment |
++------+--------------+---------+
+| x | decimal(2,1) | |
++------+--------------+---------+
+
+[localhost:21000] > create table one_point_three_three_three as select 1.333 as x;
++-------------------+
+| summary |
++-------------------+
+| Inserted 1 row(s) |
++-------------------+
+[localhost:21000] > desc one_point_three_three_three;
++------+--------------+---------+
+| name | type | comment |
++------+--------------+---------+
+| x | decimal(4,3) | |
++------+--------------+---------+
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="literals__string_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title3">String Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ String literals are enclosed in either single or double quotation marks. You can even use both
+ kinds of quotation marks for different literals within the same statement.
+ </p>
+
+ <p class="p">
+ Quoted literals are considered to be of type <code class="ph codeph">STRING</code>. To use quoted literals in contexts
+ requiring a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> value, <code class="ph codeph">CAST()</code> the literal to
+ a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> of the appropriate length.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Escaping special characters:</strong>
+ </p>
+
+ <p class="p">
+ To encode special characters within a string literal, precede them with the backslash (<code class="ph codeph">\</code>)
+ escape character:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">\t</code> represents a tab.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\n</code> represents a newline or linefeed. This might cause extra line breaks in
+ <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\r</code> represents a carriage return. This might cause unusual formatting (making it appear
+ that some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\b</code> represents a backspace. This might cause unusual formatting (making it appear that
+ some content is overwritten) in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\0</code> represents an ASCII <code class="ph codeph">nul</code> character (not the same as a SQL
+ <code class="ph codeph">NULL</code>). This might not be visible in <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\Z</code> represents a DOS end-of-file character. This might not be visible in
+ <span class="keyword cmdname">impala-shell</span> output.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\%</code> and <code class="ph codeph">\_</code> can be used to escape wildcard characters within the string
+ passed to the <code class="ph codeph">LIKE</code> operator.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">\</code> followed by 3 octal digits represents the ASCII code of a single character; for
+ example, <code class="ph codeph">\101</code> is ASCII 65, the character <code class="ph codeph">A</code>.
+ </li>
+
+ <li class="li">
+ Use two consecutive backslashes (<code class="ph codeph">\\</code>) to prevent the backslash from being interpreted as
+ an escape character.
+ </li>
+
+ <li class="li">
+ Use the backslash to escape single or double quotation mark characters within a string literal, if the
+ literal is enclosed by the same type of quotation mark.
+ </li>
+
+ <li class="li">
+ If the character following the <code class="ph codeph">\</code> does not represent the start of a recognized escape
+ sequence, the character is passed through unchanged.
+ </li>
+ </ul>
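+ <p class="p">
+ For example, the following query (a sketch; the column aliases are arbitrary) shows a tab
+ escape, octal escapes for the characters <code class="ph codeph">A</code>,
+ <code class="ph codeph">B</code>, and <code class="ph codeph">C</code>, and a doubled backslash:
+ </p>
+
+<pre class="pre codeblock"><code>select 'a\tb' as tab_between, '\101\102\103' as octal_abc, 'c:\\temp' as doubled_backslash;
+</code></pre>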
+
+ <p class="p">
+ <strong class="ph b">Quotes within quotes:</strong>
+ </p>
+
+ <p class="p">
+ To include a single quotation character within a string value, enclose the literal with either single or
+ double quotation marks, and optionally escape the single quote as a <code class="ph codeph">\'</code> sequence. Earlier
+ releases required escaping a single quote inside double quotes. Continue using escape sequences in this
+ case if you also need to run your SQL code on older versions of Impala.
+ </p>
+
+ <p class="p">
+ To include a double quotation character within a string value, enclose the literal with single quotation
+ marks; no escaping is necessary in this case. Alternatively, enclose the literal with double quotation marks and
+ escape the double quote as a <code class="ph codeph">\"</code> sequence.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select "What\'s happening?" as single_within_double,
+ > 'I\'m not sure.' as single_within_single,
+ > "Homer wrote \"The Iliad\"." as double_within_double,
+ > 'Homer also wrote "The Odyssey".' as double_within_single;
++----------------------+----------------------+--------------------------+---------------------------------+
+| single_within_double | single_within_single | double_within_double | double_within_single |
++----------------------+----------------------+--------------------------+---------------------------------+
+| What's happening? | I'm not sure. | Homer wrote "The Iliad". | Homer also wrote "The Odyssey". |
++----------------------+----------------------+--------------------------+---------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Field terminator character in CREATE TABLE:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">CREATE TABLE</code> clauses <code class="ph codeph">FIELDS TERMINATED BY</code>, <code class="ph codeph">ESCAPED
+ BY</code>, and <code class="ph codeph">LINES TERMINATED BY</code> have special rules for the string literal used for
+ their argument, because they all require a single character. You can use a regular character surrounded by
+ single or double quotation marks, an octal sequence such as <code class="ph codeph">'\054'</code> (representing a comma),
+ or an integer in the range '-127'..'128' (with quotation marks but no backslash), which is interpreted as a
+ single-byte ASCII character. Negative values are subtracted from 256; for example, <code class="ph codeph">FIELDS
+ TERMINATED BY '-2'</code> sets the field delimiter to ASCII code 254, the <span class="q">"Icelandic Thorn"</span>
+ character used as a delimiter by some data formats.
+ </div>
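+ <p class="p">
+ For example, the following hypothetical <code class="ph codeph">CREATE TABLE</code> statements
+ show the three ways of specifying a single-character field delimiter: a regular quoted
+ character, an octal sequence (here, a comma), and an integer (here, -2, interpreted as ASCII
+ code 254):
+ </p>
+
+<pre class="pre codeblock"><code>create table pipe_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '|';
+
+create table comma_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '\054';
+
+create table thorn_delimited (c1 int, c2 string)
+ row format delimited fields terminated by '-2';
+</code></pre>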
+
+ <p class="p">
+ <strong class="ph b">impala-shell considerations:</strong>
+ </p>
+
+ <p class="p">
+ When dealing with output that includes non-ASCII or non-printable characters such as linefeeds and
+ backspaces, use the <span class="keyword cmdname">impala-shell</span> options to save to a file, turn off pretty printing, or
+ both rather than relying on how the output appears visually. See
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for a list of <span class="keyword cmdname">impala-shell</span>
+ options.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="literals__boolean_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Boolean Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For <code class="ph codeph">BOOLEAN</code> values, the literals are <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code>,
+ with no quotation marks and case-insensitive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>select true;
+select * from t1 where assertion = false;
+select case bool_col when true then 'yes' when false then 'no' else 'null' end from t1;</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="literals__timestamp_literals">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Timestamp Literals</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala automatically converts <code class="ph codeph">STRING</code> literals of the
+ correct format into <code class="ph codeph">TIMESTAMP</code> values. Timestamp values
+ are accepted in the format <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>,
+ and can consist of just the date, or just the time, with or without the
+ fractional second portion. For example, you can specify <code class="ph codeph">TIMESTAMP</code>
+ values such as <code class="ph codeph">'1966-07-30'</code>, <code class="ph codeph">'08:30:00'</code>,
+ or <code class="ph codeph">'1985-09-25 17:45:30.005'</code>.
+ </p>
+
+ <p class="p">
+ You can also use <code class="ph codeph">INTERVAL</code> expressions to add or subtract from timestamp literal values,
+ such as <code class="ph codeph">CAST('1966-07-30' AS TIMESTAMP) + INTERVAL 5 YEARS + INTERVAL 3 DAYS</code>. See
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </p>
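+ <p class="p">
+ For example, the <code class="ph codeph">INTERVAL</code> expression shown above can be evaluated
+ directly in a query:
+ </p>
+
+<pre class="pre codeblock"><code>select cast('1966-07-30' as timestamp) + interval 5 years + interval 3 days as ts;
++---------------------+
+| ts                  |
++---------------------+
+| 1971-08-02 00:00:00 |
++---------------------+
+</code></pre>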
+
+ <p class="p">
+ Depending on your data pipeline, you might receive date and time data as text, in notation that does not
+ exactly match the format for Impala <code class="ph codeph">TIMESTAMP</code> literals.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for functions that can convert
+ between a variety of string literals (including different field order, separators, and timezone notation)
+ and equivalent <code class="ph codeph">TIMESTAMP</code> or numeric values.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="literals__null">
+
+ <h2 class="title topictitle2" id="ariaid-title6">NULL</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The notion of <code class="ph codeph">NULL</code> values is familiar from all kinds of database systems, but each SQL
+ dialect can have its own behavior and restrictions on <code class="ph codeph">NULL</code> values. For Big Data
+ processing, the precise semantics of <code class="ph codeph">NULL</code> values are significant: any misunderstanding
+ could lead to inaccurate results or misformatted data that could be time-consuming to correct for large
+ data sets.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">NULL</code> is a different value than an empty string. The empty string is represented by a
+ string literal with nothing inside, <code class="ph codeph">""</code> or <code class="ph codeph">''</code>.
+ </li>
+
+ <li class="li">
+ In a delimited text file, the <code class="ph codeph">NULL</code> value is represented by the special token
+ <code class="ph codeph">\N</code>.
+ </li>
+
+ <li class="li">
+ When Impala inserts data into a partitioned table, and the value of one of the partitioning columns is
+ <code class="ph codeph">NULL</code> or the empty string, the data is placed in a special partition that holds only
+ these two kinds of values. When these values are returned in a query, the result is <code class="ph codeph">NULL</code>
+ whether the value was originally <code class="ph codeph">NULL</code> or an empty string. This behavior is compatible
+ with the way Hive treats <code class="ph codeph">NULL</code> values in partitioned tables. Hive does not allow empty
+ strings as partition keys, and it returns a string value such as
+ <code class="ph codeph">__HIVE_DEFAULT_PARTITION__</code> instead of <code class="ph codeph">NULL</code> when such values are
+ returned from a query. For example:
+<pre class="pre codeblock"><code>create table t1 (i int) partitioned by (x int, y string);
+-- Select an INT column from another table, with all rows going into a special HDFS subdirectory
+-- named __HIVE_DEFAULT_PARTITION__. Depending on whether one or both of the partitioning keys
+-- are null, this special directory name occurs at different levels of the physical data directory
+-- for the table.
+insert into t1 partition(x=NULL, y=NULL) select c1 from some_other_table;
+insert into t1 partition(x, y=NULL) select c1, c2 from some_other_table;
+insert into t1 partition(x=NULL, y) select c1, c3 from some_other_table;</code></pre>
+ </li>
+
+ <li class="li">
+ There is no <code class="ph codeph">NOT NULL</code> clause when defining a column to prevent <code class="ph codeph">NULL</code>
+ values in that column.
+ </li>
+
+ <li class="li">
+ There is no <code class="ph codeph">DEFAULT</code> clause to specify a non-<code class="ph codeph">NULL</code> default value.
+ </li>
+
+ <li class="li">
+ If an <code class="ph codeph">INSERT</code> operation mentions some columns but not others, the unmentioned columns
+ contain <code class="ph codeph">NULL</code> for all inserted rows.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+
+ Because the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> keywords are not currently
+ available in Hive queries, any views you create using those keywords will not be available through
+ Hive.
+ </div>
+ </li>
+
+ <li class="li">
+ In all other contexts besides sorting with <code class="ph codeph">ORDER BY</code>, comparing a <code class="ph codeph">NULL</code>
+ to anything else returns <code class="ph codeph">NULL</code>, making the comparison meaningless. For example,
+ <code class="ph codeph">10 > NULL</code> produces <code class="ph codeph">NULL</code>, <code class="ph codeph">10 < NULL</code> also produces
+ <code class="ph codeph">NULL</code>, <code class="ph codeph">5 BETWEEN 1 AND NULL</code> produces <code class="ph codeph">NULL</code>, and so on.
+ </li>
+ </ul>
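+ <p class="p">
+ For example (the table and column names here are hypothetical), the following queries
+ illustrate <code class="ph codeph">NULL</code> comparison semantics and the
+ <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code> clauses:
+ </p>
+
+<pre class="pre codeblock"><code>-- Each of these comparisons evaluates to NULL, not TRUE or FALSE.
+select 10 > null as gt, null = null as eq, 5 between 1 and null as btw;
+
+-- Override the default placement of NULL values in sorted results.
+select x from t1 order by x asc nulls first;
+select x from t1 order by x desc nulls last;
+</code></pre>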
+
+ <p class="p">
+ Several built-in functions serve as shorthand for evaluating expressions and returning
+ <code class="ph codeph">NULL</code>, 0, or some other substitution value depending on the expression result:
+ <code class="ph codeph">ifnull()</code>, <code class="ph codeph">isnull()</code>, <code class="ph codeph">nvl()</code>, <code class="ph codeph">nullif()</code>,
+ <code class="ph codeph">nullifzero()</code>, and <code class="ph codeph">zeroifnull()</code>. See
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
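+ <p class="p">
+ For example, the following query (output shown is illustrative) demonstrates a few of these
+ shorthand functions:
+ </p>
+
+<pre class="pre codeblock"><code>select zeroifnull(null) as z, nullifzero(0) as n, isnull(null, 'fallback') as f;
++---+------+----------+
+| z | n    | f        |
++---+------+----------+
+| 0 | NULL | fallback |
++---+------+----------+
+</code></pre>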
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Columns in Kudu tables have an attribute that specifies whether or not they can contain
+ <code class="ph codeph">NULL</code> values. A column with a <code class="ph codeph">NULL</code> attribute can contain
+ nulls. A column with a <code class="ph codeph">NOT NULL</code> attribute cannot contain any nulls, and
+ an <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or <code class="ph codeph">UPSERT</code> statement
+ will skip any row that attempts to store a null in a column designated as <code class="ph codeph">NOT NULL</code>.
+ Kudu tables default to the <code class="ph codeph">NULL</code> setting for each column, except columns that
+ are part of the primary key.
+ </p>
+ <p class="p">
+ In addition to columns with the <code class="ph codeph">NOT NULL</code> attribute, Kudu tables also have
+ restrictions on <code class="ph codeph">NULL</code> values in columns that are part of the primary key for
+ a table. No column that is part of the primary key in a Kudu table can contain any
+ <code class="ph codeph">NULL</code> values.
+ </p>
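+ <p class="p">
+ For example, a Kudu table definition might look like the following sketch (the table name,
+ columns, and partitioning scheme are hypothetical, and the statement requires a cluster with
+ Kudu available):
+ </p>
+
+<pre class="pre codeblock"><code>create table kudu_example
+(
+ id bigint primary key, -- primary key columns are implicitly NOT NULL
+ name string not null, -- rows with a NULL name are skipped by INSERT/UPDATE/UPSERT
+ nickname string null  -- NULL allowed; also the default for non-key columns
+)
+partition by hash (id) partitions 3
+stored as kudu;
+</code></pre>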
+
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_live_progress.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_live_progress.html b/docs/build3x/html/topics/impala_live_progress.html
new file mode 100644
index 0000000..bce7807
--- /dev/null
+++ b/docs/build3x/html/topics/impala_live_progress.html
@@ -0,0 +1,131 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="live_progress"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</title></head><body id="live_progress"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">LIVE_PROGRESS Query Option (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For queries submitted through the <span class="keyword cmdname">impala-shell</span> command,
+ displays an interactive progress bar showing roughly what percentage of
+ processing has been completed. When the query finishes, the progress bar is erased
+ from the <span class="keyword cmdname">impala-shell</span> console output.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Command-line equivalent:</strong>
+ </p>
+ <p class="p">
+ You can enable this query option within <span class="keyword cmdname">impala-shell</span>
+ by starting the shell with the <code class="ph codeph">--live_progress</code>
+ command-line option.
+ You can still turn this setting off and on again within the shell through the
+ <code class="ph codeph">SET</code> command.
+ </p>
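+ <p class="p">
+ For example, you might start the shell with the option enabled and later turn it off for a
+ particular session:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --live_progress
+[localhost:21000] > set live_progress=false;
+LIVE_PROGRESS set to false
+</code></pre>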
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The output from this query option is printed to standard error. The output is only displayed in interactive mode,
+ that is, not when the <code class="ph codeph">-q</code> or <code class="ph codeph">-f</code> options are used.
+ </p>
+ <p class="p">
+ For a more detailed way of tracking the progress of an interactive query through
+ all phases of processing, see <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+ <p class="p">
+ Because the percentage complete figure is calculated using the number of
+ issued and completed <span class="q">"scan ranges"</span>, which occur while reading the table
+ data, the progress bar might reach 100% before the query is entirely finished.
+ For example, the query might do work to perform aggregations after all the
+ table data has been read. If many of your queries fall into this category,
+ consider using the <code class="ph codeph">LIVE_SUMMARY</code> option instead for
+ more granular progress reporting.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ currently do not produce any output during <code class="ph codeph">COMPUTE STATS</code> operations.
+ </p>
+ <div class="p">
+ Because the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ are available only within the <span class="keyword cmdname">impala-shell</span> interpreter:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You cannot change these query options through the SQL <code class="ph codeph">SET</code>
+ statement using the JDBC or ODBC interfaces. The <code class="ph codeph">SET</code>
+ command in <span class="keyword cmdname">impala-shell</span> recognizes these names as
+ shell-only options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Be careful when using <span class="keyword cmdname">impala-shell</span> on a pre-<span class="keyword">Impala 2.3</span>
+ system to connect to a system running <span class="keyword">Impala 2.3</span> or higher.
+ The older <span class="keyword cmdname">impala-shell</span> does not recognize these
+ query option names. Upgrade <span class="keyword cmdname">impala-shell</span> on the
+ systems where you intend to use these query options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Likewise, the <span class="keyword cmdname">impala-shell</span> command relies on
+ some information only available in <span class="keyword">Impala 2.3</span> and higher
+ to prepare live progress reports and query summaries. The
+ <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code>
+ query options have no effect when <span class="keyword cmdname">impala-shell</span> connects
+ to a cluster running an older version of Impala.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > set live_progress=true;
+LIVE_PROGRESS set to true
+[localhost:21000] > select count(*) from customer;
++----------+
+| count(*) |
++----------+
+| 150000 |
++----------+
+[localhost:21000] > select count(*) from customer t1 cross join customer t2;
+[################################### ] 50%
+[######################################################################] 100%
+
+
+</code></pre>
+
+ <p class="p">
+ To see how the <code class="ph codeph">LIVE_PROGRESS</code> and <code class="ph codeph">LIVE_SUMMARY</code> query options
+ work in real time, see <a class="xref" href="https://asciinema.org/a/1rv7qippo0fe7h5k1b6k4nexk" target="_blank">this animated demo</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_describe.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_describe.html b/docs/build3x/html/topics/impala_describe.html
new file mode 100644
index 0000000..5c4edf9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_describe.html
@@ -0,0 +1,817 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="describe"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DESCRIBE Statement</title></head><body id="describe"><main role="main"><article role="article" aria-labelledby="describe__desc">
+
+ <h1 class="title topictitle1" id="describe__desc">DESCRIBE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DESCRIBE</code> statement displays metadata about a table, such as the column names and their
+ data types.
+ <span class="ph">In <span class="keyword">Impala 2.3</span> and higher, you can specify the name of a complex type column, which takes
+ the form of a dotted path. The path might include multiple components in the case of a nested type definition.</span>
+ <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph">DESCRIBE DATABASE</code> form can display
+ information about a database.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DESCRIBE [DATABASE] [FORMATTED|EXTENDED] <var class="keyword varname">object_name</var>
+
+object_name ::=
+ [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>[.<var class="keyword varname">complex_col_name</var> ...]
+ | <var class="keyword varname">db_name</var>
+</code></pre>
+
+ <p class="p">
+ You can use the abbreviation <code class="ph codeph">DESC</code> for the <code class="ph codeph">DESCRIBE</code> statement.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE FORMATTED</code> variation displays additional information, in a format familiar to
+ users of Apache Hive. The extra information includes low-level details such as whether the table is internal
+ or external, when it was created, the file format, the location of the data in HDFS, whether the object is a
+ table or a view, and (for views) the text of the query from the view definition.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">Compressed</code> field is not a reliable indicator of whether the table contains compressed
+ data. It typically always shows <code class="ph codeph">No</code>, because the compression settings only apply during the
+ session that loads data and are not stored persistently with the table metadata.
+ </div>
+
+<p class="p">
+ <strong class="ph b">Describing databases:</strong>
+</p>
+
+<p class="p">
+ By default, the <code class="ph codeph">DESCRIBE</code> output for a database includes the location
+ and the comment, which can be set by the <code class="ph codeph">LOCATION</code> and <code class="ph codeph">COMMENT</code>
+ clauses on the <code class="ph codeph">CREATE DATABASE</code> statement.
+</p>
+
+<p class="p">
+ The additional information displayed by the <code class="ph codeph">FORMATTED</code> or <code class="ph codeph">EXTENDED</code>
+ keyword includes the HDFS user ID that is considered the owner of the database, and any
+ optional database properties. The properties could be specified by the <code class="ph codeph">WITH DBPROPERTIES</code>
+ clause if the database is created using a Hive <code class="ph codeph">CREATE DATABASE</code> statement.
+ Impala currently does not set or do any special processing based on those properties.
+</p>
+
+<p class="p">
+The following examples show the variations in syntax and output for
+describing databases. This feature is available in <span class="keyword">Impala 2.5</span>
+and higher.
+</p>
+
+<pre class="pre codeblock"><code>
+describe database default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
++---------+----------------------+-----------------------+
+
+describe database formatted default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+
+describe database extended default;
++---------+----------------------+-----------------------+
+| name | location | comment |
++---------+----------------------+-----------------------+
+| default | /user/hive/warehouse | Default Hive database |
+| Owner: | | |
+| | public | ROLE |
++---------+----------------------+-----------------------+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Describing tables:</strong>
+</p>
+
+<p class="p">
+ If the <code class="ph codeph">DATABASE</code> keyword is omitted, the default
+ for the <code class="ph codeph">DESCRIBE</code> statement is to refer to a table.
+</p>
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE</code> returns the data from the columns you have
+ access to.
+ </p>
+
+ <p class="p">
+ If you have the <code class="ph codeph">SELECT</code> privilege on a subset of the table
+ columns and no other relevant table/database/server-level privileges,
+ <code class="ph codeph">DESCRIBE FORMATTED/EXTENDED</code> does not return
+ the <code class="ph codeph">LOCATION</code> field. The <code class="ph codeph">LOCATION</code> data
+ is shown if you have any privilege on the table, the containing database,
+ or the server.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- By default, the table is assumed to be in the current database.
+describe my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- Use a fully qualified table name to specify a table in any database.
+describe my_database.my_table;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| s | string | |
++------+--------+---------+
+
+-- The formatted or extended output includes additional useful information.
+-- The LOCATION field is especially useful to know for DDL statements and HDFS commands
+-- during ETL jobs. (The LOCATION includes a full hdfs:// URL, omitted here for readability.)
+describe formatted my_table;
++------------------------------+----------------------------------------------+----------------------+
+| name | type | comment |
++------------------------------+----------------------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | NULL |
+| s | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | my_database | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Fri Mar 18 15:58:00 PDT 2016 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | /user/hive/warehouse/my_database.db/my_table | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1458341880 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org. ... .LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org. ... .HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------+----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because the column definitions for complex types can become long, particularly when such types are nested,
+ the <code class="ph codeph">DESCRIBE</code> statement uses special formatting for complex type columns to make the output readable.
+ </p>
+
+ <p class="p">
+ For the <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types available in
+ <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">DESCRIBE</code> output is formatted to avoid
+ excessively long lines for multiple fields within a <code class="ph codeph">STRUCT</code>, or a nested sequence of
+ complex types.
+ </p>
+
+ <p class="p">
+ You can pass a multi-part qualified name to <code class="ph codeph">DESCRIBE</code>
+ to specify an <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>
+ column and visualize its structure as if it were a table.
+ For example, if table <code class="ph codeph">T1</code> contains an <code class="ph codeph">ARRAY</code> column
+ <code class="ph codeph">A1</code>, you could issue the statement <code class="ph codeph">DESCRIBE t1.a1</code>.
+ If table <code class="ph codeph">T1</code> contained a <code class="ph codeph">STRUCT</code> column <code class="ph codeph">S1</code>,
+ and a field <code class="ph codeph">F1</code> within the <code class="ph codeph">STRUCT</code> was a <code class="ph codeph">MAP</code>,
+ you could issue the statement <code class="ph codeph">DESCRIBE t1.s1.f1</code>.
+ An <code class="ph codeph">ARRAY</code> is shown as a two-column table, with
+ <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code> columns.
+ A <code class="ph codeph">STRUCT</code> is shown as a table with each field
+ representing a column in the table.
+ A <code class="ph codeph">MAP</code> is shown as a two-column table, with
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> columns.
+ </p>
+
+ <p class="p">
+ For example, here is the <code class="ph codeph">DESCRIBE</code> output for a table containing a single top-level column
+ of each complex type:
+ </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, a array<int>, s struct<f1: string, f2: bigint>, m map<string,int>) stored as parquet;
+
+describe t1;
++------+-----------------+---------+
+| name | type | comment |
++------+-----------------+---------+
+| x | int | |
+| a | array<int> | |
+| s | struct< | |
+| | f1:string, | |
+| | f2:bigint | |
+| | > | |
+| m | map<string,int> | |
++------+-----------------+---------+
+
+</code></pre>
+
+ <p class="p">
+ Here are examples showing how to <span class="q">"drill down"</span> into the layouts of complex types, including
+ using multi-part names to examine the definitions of nested types.
+ The <code class="ph codeph">< ></code> delimiters identify the columns with complex types;
+ these are the columns where you can descend another level to see the parts that make up
+ the complex type.
+ This technique helps you to understand the multi-part names you use as table references in queries
+ involving complex types, and the corresponding column names you refer to in the <code class="ph codeph">SELECT</code> list.
+ These tables are from the <span class="q">"nested TPC-H"</span> schema, shown in detail in
+ <a class="xref" href="impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">REGION</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The first <code class="ph codeph">DESCRIBE</code> specifies the table name, to display the definition
+ of each top-level column.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the name of a complex
+ column, <code class="ph codeph">REGION.R_NATIONS</code>, showing that when you include the name of an <code class="ph codeph">ARRAY</code>
+ column in a <code class="ph codeph">FROM</code> clause, that table reference acts like a two-column table with
+ columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> specifies the fully qualified name of the <code class="ph codeph">ITEM</code> field,
+ to display the layout of its underlying <code class="ph codeph">STRUCT</code> type in table format, with the fields
+ mapped to column names.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>
+-- #1: The overall layout of the entire table.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- #2: The ARRAY column within the table.
+describe region.r_nations;
++------+-------------------------+---------+
+| name | type | comment |
++------+-------------------------+---------+
+| item | struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | > | |
+| pos | bigint | |
++------+-------------------------+---------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe region.r_nations.item;
++-------------+----------+---------+
+| name | type | comment |
++-------------+----------+---------+
+| n_nationkey | smallint | |
+| n_name | string | |
+| n_comment | string | |
++-------------+----------+---------+
+
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">CUSTOMER</code> table contains an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>
+ elements, where one field in the <code class="ph codeph">STRUCT</code> is another <code class="ph codeph">ARRAY</code> of
+ <code class="ph codeph">STRUCT</code> elements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Again, the initial <code class="ph codeph">DESCRIBE</code> specifies only the table name.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The second <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the complex
+ column, <code class="ph codeph">CUSTOMER.C_ORDERS</code>, showing how an <code class="ph codeph">ARRAY</code>
+ is represented as a two-column table with columns <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The third <code class="ph codeph">DESCRIBE</code> specifies the qualified name of the <code class="ph codeph">ITEM</code>
+ of the <code class="ph codeph">ARRAY</code> column, to see the structure of the nested <code class="ph codeph">ARRAY</code>.
+          Again, it has two parts, <code class="ph codeph">ITEM</code> and <code class="ph codeph">POS</code>. Because the
+ <code class="ph codeph">ARRAY</code> contains a <code class="ph codeph">STRUCT</code>, the layout of the <code class="ph codeph">STRUCT</code>
+ is shown.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The fourth and fifth <code class="ph codeph">DESCRIBE</code> statements drill down into a <code class="ph codeph">STRUCT</code> field that
+ is itself a complex type, an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>.
+ The <code class="ph codeph">ITEM</code> portion of the qualified name is only required when the <code class="ph codeph">ARRAY</code>
+ elements are anonymous. The fields of the <code class="ph codeph">STRUCT</code> give names to any other complex types
+ nested inside the <code class="ph codeph">STRUCT</code>. Therefore, the <code class="ph codeph">DESCRIBE</code> parameters
+ <code class="ph codeph">CUSTOMER.C_ORDERS.ITEM.O_LINEITEMS</code> and <code class="ph codeph">CUSTOMER.C_ORDERS.O_LINEITEMS</code>
+          are equivalent. (For brevity, you can leave out the <code class="ph codeph">ITEM</code> portion
+          of a qualified name when it is not required.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The final <code class="ph codeph">DESCRIBE</code> shows the layout of the deeply nested <code class="ph codeph">STRUCT</code> type.
+ Because there are no more complex types nested inside this <code class="ph codeph">STRUCT</code>, this is as far
+ as you can drill down into the layout for this table.
+ </p>
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>-- #1: The overall layout of the entire table.
+describe customer;
++--------------+------------------------------------+
+| name | type |
++--------------+------------------------------------+
+| c_custkey | bigint |
+... more scalar columns ...
+| c_orders | array<struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+| | o_totalprice:decimal(12,2), |
+| | o_orderdate:string, |
+| | o_orderpriority:string, |
+| | o_clerk:string, |
+| | o_shippriority:int, |
+| | o_comment:string, |
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+| | l_linenumber:int, |
+| | l_quantity:decimal(12,2), |
+| | l_extendedprice:decimal(12,2), |
+| | l_discount:decimal(12,2), |
+| | l_tax:decimal(12,2), |
+| | l_returnflag:string, |
+| | l_linestatus:string, |
+| | l_shipdate:string, |
+| | l_commitdate:string, |
+| | l_receiptdate:string, |
+| | l_shipinstruct:string, |
+| | l_shipmode:string, |
+| | l_comment:string |
+| | >> |
+| | >> |
++--------------+------------------------------------+
+
+-- #2: The ARRAY column within the table.
+describe customer.c_orders;
++------+------------------------------------+
+| name | type |
++------+------------------------------------+
+| item | struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+... more struct fields ...
+| | o_lineitems:array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more nested struct fields ...
+| | l_comment:string |
+| | >> |
+| | > |
+| pos | bigint |
++------+------------------------------------+
+
+-- #3: The STRUCT that makes up each ARRAY element.
+-- The fields of the STRUCT act like columns of a table.
+describe customer.c_orders.item;
++-----------------+----------------------------------+
+| name | type |
++-----------------+----------------------------------+
+| o_orderkey | bigint |
+| o_orderstatus | string |
+| o_totalprice | decimal(12,2) |
+| o_orderdate | string |
+| o_orderpriority | string |
+| o_clerk | string |
+| o_shippriority | int |
+| o_comment | string |
+| o_lineitems | array<struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | >> |
++-----------------+----------------------------------+
+
+-- #4: The ARRAY nested inside the STRUCT elements of the first ARRAY.
+describe customer.c_orders.item.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #5: Shorter form of the previous DESCRIBE. Omits the .ITEM portion of the name
+-- because O_LINEITEMS and other field names provide a way to refer to things
+-- inside the ARRAY element.
+describe customer.c_orders.o_lineitems;
++------+----------------------------------+
+| name | type |
++------+----------------------------------+
+| item | struct< |
+| | l_partkey:bigint, |
+| | l_suppkey:bigint, |
+... more struct fields ...
+| | l_comment:string |
+| | > |
+| pos | bigint |
++------+----------------------------------+
+
+-- #6: The STRUCT representing ARRAY elements nested inside
+-- another ARRAY of STRUCTs. The lack of any complex types
+-- in this output means this is as far as DESCRIBE can
+-- descend into the table layout.
+describe customer.c_orders.o_lineitems.item;
++-----------------+---------------+
+| name | type |
++-----------------+---------------+
+| l_partkey | bigint |
+| l_suppkey | bigint |
+... more scalar columns ...
+| l_comment | string |
++-----------------+---------------+
+
+</code></pre>
+
+<p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+<p class="p">
+ After the <span class="keyword cmdname">impalad</span> daemons are restarted, the first query against a table can take longer
+ than subsequent queries, because the metadata for the table is loaded before the query is processed. This
+ one-time delay for each table can cause misleading results in benchmark tests or cause unnecessary concern.
+ To <span class="q">"warm up"</span> the Impala metadata cache, you can issue a <code class="ph codeph">DESCRIBE</code> statement in advance
+ for each table you intend to access later.
+</p>
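+
+<p class="p">
+  For example, a startup script could issue a <code class="ph codeph">DESCRIBE</code> statement
+  for each heavily used table (the table names here are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Issued once after a restart, before any benchmark or production queries,
+-- so that table metadata is already cached when real queries arrive.
+describe sales_fact;
+describe customer_dim;
+describe product_dim;
+</code></pre>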
+
+<p class="p">
+ When you are dealing with data files stored in HDFS, sometimes it is important to know details such as the
+ path of the data files for an Impala table, and the hostname for the namenode. You can get this information
+ from the <code class="ph codeph">DESCRIBE FORMATTED</code> output. You specify HDFS URIs or path specifications with
+ statements such as <code class="ph codeph">LOAD DATA</code> and the <code class="ph codeph">LOCATION</code> clause of <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code>. You might also use HDFS URIs or paths with Linux commands
+ such as <span class="keyword cmdname">hadoop</span> and <span class="keyword cmdname">hdfs</span> to copy, rename, and so on, data files in HDFS.
+</p>
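+
+<p class="p">
+  For example, after noting the <code class="ph codeph">Location:</code> value in the
+  <code class="ph codeph">DESCRIBE FORMATTED</code> output, you might reference HDFS paths in
+  statements such as the following (the paths are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+-- Find the table's data directory in the Location: field of the output.
+describe formatted my_table;
+
+-- Move staged data files into the table, using an HDFS path.
+load data inpath '/user/etl/staging/batch1' into table my_table;
+</code></pre>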
+
+<p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
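+
+<p class="p">
+  For example, within a load-balanced <span class="keyword cmdname">impala-shell</span> session:
+</p>
+
+<pre class="pre codeblock"><code>
+set SYNC_DDL=1;
+-- This DDL statement now waits until all Impala nodes have
+-- received the new metadata before returning.
+create table t3 (x int);
+</code></pre>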
+
+<p class="p">
+ Each table can also have associated table statistics and column statistics. To see these categories of
+ information, use the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and <code class="ph codeph">SHOW COLUMN
+ STATS <var class="keyword varname">table_name</var></code> statements.
+
+ See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+</p>
+
+<div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
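+
+<p class="p">
+  For example, a typical ETL sequence refreshes statistics immediately after loading
+  new data, then confirms them (the table names are illustrative):
+</p>
+
+<pre class="pre codeblock"><code>
+insert into sales_fact select * from staging_sales;
+compute stats sales_fact;
+show table stats sales_fact;
+show column stats sales_fact;
+</code></pre>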
+
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<p class="p">
+ The following example shows the results of both a standard <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">DESCRIBE
+ FORMATTED</code> for different kinds of schema objects:
+</p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">DESCRIBE</code> for a table or a view returns the name, type, and comment for each of the
+ columns. For a view, if the column value is computed by an expression, the column name is automatically
+ generated as <code class="ph codeph">_c0</code>, <code class="ph codeph">_c1</code>, and so on depending on the ordinal number of the
+ column.
+ </li>
+
+ <li class="li">
+ A table created with no special format or storage clauses is designated as a <code class="ph codeph">MANAGED_TABLE</code>
+ (an <span class="q">"internal table"</span> in Impala terminology). Its data files are stored in an HDFS directory under the
+ default Hive data directory. By default, it uses Text data format.
+ </li>
+
+ <li class="li">
+ A view is designated as <code class="ph codeph">VIRTUAL_VIEW</code> in <code class="ph codeph">DESCRIBE FORMATTED</code> output. Some
+ of its properties are <code class="ph codeph">NULL</code> or blank because they are inherited from the base table. The
+ text of the query that defines the view is part of the <code class="ph codeph">DESCRIBE FORMATTED</code> output.
+ </li>
+
+ <li class="li">
+ A table with additional clauses in the <code class="ph codeph">CREATE TABLE</code> statement has differences in
+ <code class="ph codeph">DESCRIBE FORMATTED</code> output. The output for <code class="ph codeph">T2</code> includes the
+ <code class="ph codeph">EXTERNAL_TABLE</code> keyword because of the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, and
+ different <code class="ph codeph">InputFormat</code> and <code class="ph codeph">OutputFormat</code> fields to reflect the Parquet file
+ format.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, y int, s string);
+Query: create table t1 (x int, y int, s string)
+[localhost:21000] > describe t1;
+Query: describe t1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| y | int | |
+| s | string | |
++------+--------+---------+
+Returned 3 row(s) in 0.13s
+[localhost:21000] > describe formatted t1;
+Query: describe formatted t1
+Query finished, fetching results ...
++------------------------------+--------------------------------------------+------------+
+| name | type | comment |
++------------------------------+--------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:03:16 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/hive/warehouse/ | |
+| | describe_formatted.db/t1 | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526996 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy. | |
+| | LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.hadoop.mapred.TextInputFormat | NULL |
+| OutputFormat: | org.apache.hadoop.hive.ql.io. | |
+| | HiveIgnoreKeyTextOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+--------------------------------------------+------------+
+Returned 26 row(s) in 0.03s
+[localhost:21000] > create view v1 as select x, upper(s) from t1;
+Query: create view v1 as select x, upper(s) from t1
+[localhost:21000] > describe v1;
+Query: describe v1
+Query finished, fetching results ...
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | int | |
+| _c1 | string | |
++------+--------+---------+
+Returned 2 row(s) in 0.10s
+[localhost:21000] > describe formatted v1;
+Query: describe formatted v1
+Query finished, fetching results ...
++------------------------------+------------------------------+----------------------+
+| name | type | comment |
++------------------------------+------------------------------+----------------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| _c1 | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 16:56:38 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Table Type: | VIRTUAL_VIEW | NULL |
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1374526598 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | null | NULL |
+| InputFormat: | null | NULL |
+| OutputFormat: | null | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
+| | NULL | NULL |
+| # View Information | NULL | NULL |
+| View Original Text: | SELECT x, upper(s) FROM t1 | NULL |
+| View Expanded Text: | SELECT x, upper(s) FROM t1 | NULL |
++------------------------------+------------------------------+----------------------+
+Returned 28 row(s) in 0.03s
+[localhost:21000] > create external table t2 (x int, y int, s string) stored as parquet location '/user/doc_demo/sample_data';
+[localhost:21000] > describe formatted t2;
+Query: describe formatted t2
+Query finished, fetching results ...
++------------------------------+----------------------------------------------------+------------+
+| name | type | comment |
++------------------------------+----------------------------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | describe_formatted | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 22 17:01:47 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | hdfs://127.0.0.1:8020/user/doc_demo/sample_data | NULL |
+| Table Type: | EXTERNAL_TABLE | NULL |
+| Table Parameters: | NULL | NULL |
+| | EXTERNAL | TRUE |
+| | transient_lastDdlTime | 1374526907 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | NULL |
+| InputFormat: | org.apache.impala.hive.serde.ParquetInputFormat | NULL |
+| OutputFormat: | org.apache.impala.hive.serde.ParquetOutputFormat | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
++------------------------------+----------------------------------------------------+------------+
+Returned 27 row(s) in 0.17s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read and execute
+ permissions for all directories that are part of the table.
+      (A table could span multiple HDFS directories if it is partitioned.
+ The directories could be widely scattered because a partition can reside
+ in an arbitrary HDFS directory based on its <code class="ph codeph">LOCATION</code> attribute.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The information displayed for Kudu tables includes the additional attributes
+ that are only applicable for Kudu tables:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Whether or not the column is part of the primary key. Every Kudu table
+ has a <code class="ph codeph">true</code> value here for at least one column. There
+ could be multiple <code class="ph codeph">true</code> values, for tables with
+ composite primary keys.
+ </li>
+ <li class="li">
+ Whether or not the column is nullable. Specified by the <code class="ph codeph">NULL</code>
+ or <code class="ph codeph">NOT NULL</code> attributes on the <code class="ph codeph">CREATE TABLE</code> statement.
+ Columns that are part of the primary key are automatically non-nullable.
+ </li>
+ <li class="li">
+ The default value, if any, for the column. Specified by the <code class="ph codeph">DEFAULT</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement. If the default value is
+      <code class="ph codeph">NULL</code>, that is not indicated in this column. It is implied when
+      <code class="ph codeph">nullable</code> is true and no other default value is specified.
+ </li>
+ <li class="li">
+ The encoding used for values in the column. Specified by the <code class="ph codeph">ENCODING</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The compression used for values in the column. Specified by the <code class="ph codeph">COMPRESSION</code>
+ attribute on the <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+ <li class="li">
+ The block size (in bytes) used for the underlying Kudu storage layer for the column.
+ Specified by the <code class="ph codeph">BLOCK_SIZE</code> attribute on the <code class="ph codeph">CREATE TABLE</code>
+ statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a simple Kudu table, with
+ a single-column primary key and all column attributes left with their default values:
+ </p>
+
+<pre class="pre codeblock"><code>
+describe million_rows;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| id | string | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| s | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows <code class="ph codeph">DESCRIBE</code> output for a Kudu table with a
+ two-column primary key, and Kudu-specific attributes applied to some columns:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table kudu_describe_example
+(
+ c1 int, c2 int,
+ c3 string, c4 string not null, c5 string default 'n/a', c6 string default '',
+ c7 bigint not null, c8 bigint null default null, c9 bigint default -1 encoding bit_shuffle,
+ primary key(c1,c2)
+)
+partition by hash (c1, c2) partitions 10 stored as kudu;
+
+describe kudu_describe_example;
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| name | type | comment | primary_key | nullable | default_value | encoding | compression | block_size |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+| c1 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c2 | int | | true | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c3 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c4 | string | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c5 | string | | false | true | n/a | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c6 | string | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c7 | bigint | | false | false | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c8 | bigint | | false | true | | AUTO_ENCODING | DEFAULT_COMPRESSION | 0 |
+| c9 | bigint | | false | true | -1 | BIT_SHUFFLE | DEFAULT_COMPRESSION | 0 |
++------+--------+---------+-------------+----------+---------------+---------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_show.html#show_tables">SHOW TABLES Statement</a>, <a class="xref" href="impala_show.html#show_create_table">SHOW CREATE TABLE Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_development.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_development.html b/docs/build3x/html/topics/impala_development.html
new file mode 100644
index 0000000..5b11207
--- /dev/null
+++ b/docs/build3x/html/topics/impala_development.html
@@ -0,0 +1,197 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_dev"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Developing Impala Applications</title></head><body id="intro_dev"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Developing Impala Applications</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The core development language with Impala is SQL. You can also use Java or other languages to interact with
+ Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For
+ specialized kinds of analysis, you can supplement the SQL built-in functions by writing
+ <a class="xref" href="impala_udf.html#udfs">user-defined functions (UDFs)</a> in C++ or Java.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_dev__intro_sql">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of the Impala SQL Dialect</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      The Impala SQL dialect is highly compatible with the SQL syntax used in the Apache Hive component (HiveQL). As
+      such, it is familiar to users who are already accustomed to running SQL queries on the Hadoop
+      infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in
+ functions. Impala also includes additional built-in functions for common industry features, to simplify
+ porting SQL from non-Hadoop systems.
+ </p>
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might seem familiar:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_select.html#select">SELECT statement</a> includes familiar clauses such as <code class="ph codeph">WHERE</code>,
+ <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, and <code class="ph codeph">WITH</code>.
+ You will find familiar notions such as
+ <a class="xref" href="impala_joins.html#joins">joins</a>, <a class="xref" href="impala_functions.html#builtins">built-in
+ functions</a> for processing strings, numbers, and dates,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">aggregate functions</a>,
+ <a class="xref" href="impala_subqueries.html#subqueries">subqueries</a>, and
+ <a class="xref" href="impala_operators.html#comparison_operators">comparison operators</a>
+ such as <code class="ph codeph">IN()</code> and <code class="ph codeph">BETWEEN</code>.
+ The <code class="ph codeph">SELECT</code> statement is the place where SQL standards compliance is most important.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ From the data warehousing world, you will recognize the notion of
+ <a class="xref" href="impala_partitioning.html#partitioning">partitioned tables</a>.
+ One or more columns serve as partition keys, and the data is physically arranged so that
+ queries that refer to the partition key columns in the <code class="ph codeph">WHERE</code> clause
+          can skip partitions that do not match the filter conditions. For example, if you have 10
+          years' worth of data and use a clause such as <code class="ph codeph">WHERE year = 2015</code>,
+ <code class="ph codeph">WHERE year > 2010</code>, or <code class="ph codeph">WHERE year IN (2014, 2015)</code>,
+ Impala skips all the data for non-matching years, greatly reducing the amount of I/O
+ for the query.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In Impala 1.2 and higher, <a class="xref" href="impala_udf.html#udfs">UDFs</a> let you perform custom comparisons
+ and transformation logic during <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT...SELECT</code> statements.
+ </p>
+ </li>
+ </ul>
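+
+    <p class="p">
+      As an illustration of partition pruning (the table and column names here are hypothetical),
+      a query that filters on the partition key column reads only the matching partitions:
+    </p>
+
+<pre class="pre codeblock"><code>create table sales (id bigint, amount decimal(9,2))
+  partitioned by (year int);
+
+-- Only the partitions for 2014 and 2015 are scanned;
+-- data files for all other years are skipped entirely.
+select count(*) from sales where year in (2014, 2015);</code></pre>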
+
+ <p class="p">
+ For users coming to Impala from traditional database or data warehousing backgrounds, the following aspects of the SQL dialect
+ might require some learning and practice for you to become proficient in the Hadoop environment:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala SQL is focused on queries and includes relatively little DML. There is no <code class="ph codeph">UPDATE</code>
+ or <code class="ph codeph">DELETE</code> statement. Stale data is typically discarded (by <code class="ph codeph">DROP TABLE</code>
+ or <code class="ph codeph">ALTER TABLE ... DROP PARTITION</code> statements) or replaced (by <code class="ph codeph">INSERT
+ OVERWRITE</code> statements).
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ All data creation is done by <code class="ph codeph">INSERT</code> statements, which typically insert data in bulk by
+ querying from other tables. There are two variations, <code class="ph codeph">INSERT INTO</code> which appends to the
+ existing data, and <code class="ph codeph">INSERT OVERWRITE</code> which replaces the entire contents of a table or
+ partition (similar to <code class="ph codeph">TRUNCATE TABLE</code> followed by a new <code class="ph codeph">INSERT</code>).
+          Although there is an <code class="ph codeph">INSERT ... VALUES</code> syntax to insert a small number of rows in
+          a single statement, it is far more efficient to use <code class="ph codeph">INSERT ... SELECT</code> to copy
+          and transform large amounts of data from one table to another in a single operation.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You often construct Impala table definitions and data files in some other environment, and then attach
+ Impala so that it can run real-time queries. The same data files and table metadata are shared with other
+ components of the Hadoop ecosystem. In particular, Impala can access tables created by Hive or data
+ inserted by Hive, and Hive can access tables and data produced by Impala. Many other Hadoop components
+          can write files in formats such as Parquet and Avro, which can then be queried by Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL
+ includes some idioms that you might find in the import utilities for traditional database systems. For
+ example, you can create a table that reads comma-separated or tab-separated text files, specifying the
+ separator in the <code class="ph codeph">CREATE TABLE</code> statement. You can create <strong class="ph b">external tables</strong> that read
+ existing data files but do not move or transform them.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does
+ not require length constraints on string data types. For example, you can define a database column as
+ <code class="ph codeph">STRING</code> with unlimited length, rather than <code class="ph codeph">CHAR(1)</code> or
+ <code class="ph codeph">VARCHAR(64)</code>. <span class="ph">(Although in Impala 2.0 and later, you can also use
+ length-constrained <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> types.)</span>
+ </p>
+ </li>
+
+ </ul>
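+
+    <p class="p">
+      The following statements sketch these idioms, using hypothetical table names and an
+      HDFS path chosen for illustration:
+    </p>
+
+<pre class="pre codeblock"><code>-- Append rows copied from another table.
+insert into t2 select * from t1 where c1 &gt; 100;
+
+-- Replace the entire contents of the destination table.
+insert overwrite t2 select * from t1;
+
+-- Query existing tab-separated data files in place, without moving them.
+create external table logs (ts timestamp, msg string)
+  row format delimited fields terminated by '\t'
+  location '/user/etl/logs';</code></pre>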
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_langref.html#langref">Impala SQL Language Reference</a>, especially
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a> and <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>
+ </p>
+ </div>
+ </article>
+
+
+
+
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_dev__intro_apis">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Programming Interfaces</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can connect and submit requests to the Impala daemons through:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph"><a class="xref" href="impala_impala_shell.html#impala_shell">impala-shell</a></code> interactive
+ command interpreter.
+ </li>
+
+ <li class="li">
+ The <a class="xref" href="http://gethue.com/" target="_blank">Hue</a> web-based user interface.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_jdbc.html#impala_jdbc">JDBC</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_odbc.html#impala_odbc">ODBC</a>.
+ </li>
+ </ul>
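+
+    <p class="p">
+      For example, an interactive session might start by connecting <code class="ph codeph">impala-shell</code>
+      to an <code class="ph codeph">impalad</code> daemon (the host name here is a placeholder; 21000 is the
+      default port for <code class="ph codeph">impala-shell</code> connections):
+    </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i impala_host:21000
+[impala_host:21000] &gt; select version();</code></pre>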
+
+ <p class="p">
+      With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications
+      running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence
+      tools that use the JDBC and ODBC interfaces.
+ </p>
+
+ <p class="p">
+ Each <code class="ph codeph">impalad</code> daemon process, running on separate nodes in a cluster, listens to
+ <a class="xref" href="impala_ports.html#ports">several ports</a> for incoming requests. Requests from
+ <code class="ph codeph">impala-shell</code> and Hue are routed to the <code class="ph codeph">impalad</code> daemons through the same
+ port. The <code class="ph codeph">impalad</code> daemons listen on separate ports for JDBC and ODBC requests.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_codegen.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_codegen.html b/docs/build3x/html/topics/impala_disable_codegen.html
new file mode 100644
index 0000000..3fae1e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_codegen.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_codegen"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_CODEGEN Query Option</title></head><body id="disable_codegen"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_CODEGEN Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This is a debug option, intended for diagnosing and working around issues that cause crashes. If a query
+ fails with an <span class="q">"illegal instruction"</span> or other hardware-specific message, try setting
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> and running the query again. If the query succeeds only when the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option is turned on, submit the problem to <span class="keyword">the appropriate support channel</span> and include that
+ detail in the problem report. Do not otherwise run with this setting turned on, because it results in lower
+ overall performance.
+ </p>
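+
+    <p class="p">
+      For example, the diagnostic sequence might look like the following
+      (the query itself is a placeholder for whichever statement failed):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_codegen=true;
+-- Rerun the query that failed with an "illegal instruction" or similar error.
+select c1, count(*) from t1 group by c1;
+-- Restore the default after the diagnosis.
+set disable_codegen=false;</code></pre>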
+
+ <p class="p">
+ Because the code generation phase adds a small amount of overhead for each query, you might turn on the
+ <code class="ph codeph">DISABLE_CODEGEN</code> option to achieve maximum throughput when running many short-lived queries
+ against small tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
new file mode 100644
index 0000000..80d84f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_row_runtime_filtering.html
@@ -0,0 +1,90 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_row_runtime_filtering"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or higher only)</title></head><body id="disable_row_runtime_filtering"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_ROW_RUNTIME_FILTERING Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DISABLE_ROW_RUNTIME_FILTERING</code> query option
+ reduces the scope of the runtime filtering feature. Queries still dynamically prune
+ partitions, but do not apply the filtering logic to individual rows within partitions.
+ </p>
+
+ <p class="p">
+      This option only applies to queries against Parquet tables. For other file formats, Impala
+      only prunes at the level of partitions, not individual rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Impala automatically evaluates whether the per-row filters are effective
+      at reducing the amount of intermediate data. Therefore,
+ this option is typically only needed for the rare case where Impala
+ cannot accurately determine how effective the per-row filtering is
+ for a query.
+ </p>
+
+ <p class="p">
+ Because the runtime filtering feature applies mainly to resource-intensive
+ and long-running queries, only adjust this query option when tuning long-running queries
+ involving some combination of large partitioned tables and joins involving large tables.
+ </p>
+
+ <p class="p">
+ Because this setting only improves query performance in very specific
+ circumstances, depending on the query characteristics and data distribution,
+ only use it when you determine through benchmarking that it improves
+ performance of specific expensive queries.
+ Consider setting this query option immediately before the expensive query and
+ unsetting it immediately afterward.
+ </p>
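+
+    <p class="p">
+      For example (the query shown is a placeholder for the expensive query identified
+      through benchmarking):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_row_runtime_filtering=true;
+select count(*) from big_partitioned t1 join big_table t2 on t1.id = t2.id;
+set disable_row_runtime_filtering=false;</code></pre>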
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ This query option only applies to queries against HDFS-based tables
+ using the Parquet file format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ When applied to a query involving a Kudu table, this option turns off
+ all runtime filtering for the Kudu table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_runtime_filtering.html">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a>,
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a>
+
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
new file mode 100644
index 0000000..bf1f9bc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_streaming_preaggregations.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_streaming_preaggregations"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</title></head><body id="disable_streaming_preaggregations"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_STREAMING_PREAGGREGATIONS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Turns off the <span class="q">"streaming preaggregation"</span> optimization that is available in <span class="keyword">Impala 2.5</span>
+ and higher. This optimization reduces unnecessary work performed by queries that perform aggregation
+ operations on columns with few or no duplicate values, for example <code class="ph codeph">DISTINCT <var class="keyword varname">id_column</var></code>
+ or <code class="ph codeph">GROUP BY <var class="keyword varname">unique_column</var></code>. If the optimization causes regressions in
+ existing queries that use aggregation functions, you can turn it off as needed by setting this query option.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+ <code class="ph codeph">true</code> is not recognized. This limitation is
+ tracked by the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+ which shows the releases where the problem is fixed.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically, queries that would require enabling this option involve very large numbers of
+ aggregated values, such as a billion or more distinct keys being processed on each
+ worker node.
+ </p>
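+
+    <p class="p">
+      For example, for a query aggregating a column with a very large number of distinct
+      values (the table and column names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>-- In Impala 2.5.0, use the value 1 rather than true; see the note above.
+set disable_streaming_preaggregations=1;
+select count(distinct visitor_id) from huge_table;</code></pre>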
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disable_unsafe_spills.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disable_unsafe_spills.html b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
new file mode 100644
index 0000000..63f1c1b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disable_unsafe_spills.html
@@ -0,0 +1,50 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disable_unsafe_spills"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</title></head><body id="disable_unsafe_spills"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISABLE_UNSAFE_SPILLS Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Enable this option if you prefer to have queries fail when they exceed the Impala memory limit, rather than
+ write temporary data to disk.
+ </p>
+
+ <p class="p">
+ Queries that <span class="q">"spill"</span> to disk typically complete successfully, when in earlier Impala releases they would have failed.
+      However, queries with exorbitant memory requirements due to missing statistics or inefficient join clauses could
+      become so slow as a result that you would rather have them cancelled automatically, and instead reduce the memory
+      usage through standard Impala tuning techniques.
+ </p>
+
+ <p class="p">
+ This option prevents only <span class="q">"unsafe"</span> spill operations, meaning that one or more tables are missing
+ statistics or the query does not include a hint to set the most efficient mechanism for a join or
+ <code class="ph codeph">INSERT ... SELECT</code> into a partitioned table. These are the tables most likely to result in
+ suboptimal execution plans that could cause unnecessary spilling. Therefore, leaving this option enabled is a
+ good way to find tables on which to run the <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
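+
+    <p class="p">
+      For example, if a memory-intensive join is cancelled because a table involved lacks
+      statistics, computing those statistics makes subsequent spills <span class="q">"safe"</span>
+      (the table names are hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>set disable_unsafe_spills=true;
+-- If t1 has no statistics and this query exceeds the memory limit,
+-- it is cancelled rather than being allowed to spill.
+select t1.c1, count(*) from t1 join t2 on t1.id = t2.id group by t1.c1;
+
+compute stats t1;  -- Afterward, a spill by this query is no longer "unsafe".</code></pre>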
+
+ <p class="p">
+ See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for information about the <span class="q">"spill to disk"</span>
+ feature for queries processing large result sets with joins, <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP
+ BY</code>, <code class="ph codeph">DISTINCT</code>, aggregation functions, or analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_disk_space.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_disk_space.html b/docs/build3x/html/topics/impala_disk_space.html
new file mode 100644
index 0000000..560be2b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_disk_space.html
@@ -0,0 +1,133 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="disk_space"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Disk Space for Impala Data</title></head><body id="disk_space"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Managing Disk Space for Impala Data</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala typically works with many large files in an HDFS storage system with plenty of capacity,
+ there are times when you might perform some file cleanup to reclaim space, or advise developers on techniques
+ to minimize space consumption and file duplication.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Use compact binary file formats where practical. Numeric and time-based data in particular can be stored
+ in more compact form in binary data files. Depending on the file format, various compression and encoding
+ features can reduce file size even further. You can specify the <code class="ph codeph">STORED AS</code> clause as part
+ of the <code class="ph codeph">CREATE TABLE</code> statement, or <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">SET
+ FILEFORMAT</code> clause for an existing table or partition within a partitioned table. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about file formats, especially
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You manage underlying data files differently depending on whether the corresponding Impala table is
+ defined as an <a class="xref" href="impala_tables.html#internal_tables">internal</a> or
+ <a class="xref" href="impala_tables.html#external_tables">external</a> table:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to check if a particular table is internal
+ (managed by Impala) or external, and to see the physical location of the data files in HDFS. See
+ <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.
+ </li>
+
+ <li class="li">
+ For Impala-managed (<span class="q">"internal"</span>) tables, use <code class="ph codeph">DROP TABLE</code> statements to remove
+ data files. See <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details.
+ </li>
+
+ <li class="li">
+ For tables not managed by Impala (<span class="q">"external"</span> tables), use appropriate HDFS-related commands such
+ as <code class="ph codeph">hadoop fs</code>, <code class="ph codeph">hdfs dfs</code>, or <code class="ph codeph">distcp</code>, to create, move,
+ copy, or delete files within HDFS directories that are accessible by the <code class="ph codeph">impala</code> user.
+ Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement after adding or removing any
+ files from the data directory of an external table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for
+ details.
+ </li>
+
+ <li class="li">
+ Use external tables to reference HDFS data files in their original location. With this technique, you
+ avoid copying the files, and you can map more than one Impala table to the same set of data files. When
+ you drop the Impala table, the data files are left undisturbed. See
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a> for details.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">LOAD DATA</code> statement to move HDFS files into the data directory for an Impala
+ table from inside Impala, without the need to specify the HDFS path of the destination directory. This
+ technique works for both internal and external tables. See
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Make sure that the HDFS trashcan is configured correctly. When you remove files from HDFS, the space
+ might not be reclaimed for use by other files until sometime later, when the trashcan is emptied. See
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> for details. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for permissions needed for the HDFS trashcan to operate
+ correctly.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Drop all tables in a database before dropping the database itself. See
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Clean up temporary files after failed <code class="ph codeph">INSERT</code> statements. If an <code class="ph codeph">INSERT</code>
+ statement encounters an error, and you see a directory named <span class="ph filepath">.impala_insert_staging</span>
+ or <span class="ph filepath">_impala_insert_staging</span> left behind in the data directory for the table, it might
+ contain temporary data files taking up space in HDFS. You might be able to salvage these data files, for
+ example if they are complete but could not be moved into place due to a permission error. Or, you might
+ delete those files through commands such as <code class="ph codeph">hadoop fs</code> or <code class="ph codeph">hdfs dfs</code>, to
+ reclaim space before re-trying the <code class="ph codeph">INSERT</code>. Issue <code class="ph codeph">DESCRIBE FORMATTED
+ <var class="keyword varname">table_name</var></code> to see the HDFS path where you can check for temporary files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+ written to the log) if it cannot create or read and write files in one of the scratch directories.
+ If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use the Amazon Simple Storage Service (S3) as a place to offload
+ data to reduce the volume of local storage, Impala 2.2.0 and higher
+ can query the data directly from S3.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+ </li>
+ </ul>
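+ <p class="p">
+ As an illustration of the <code class="ph codeph">--scratch_dirs</code> option described above, the
+ <span class="keyword cmdname">impalad</span> daemon might be started with a flag such as the following.
+ This is only a sketch; the directory paths are hypothetical and all other startup options are omitted:
+ </p>
+<pre class="pre codeblock"><code># Hypothetical paths: spread scratch space across two local drives.
+impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch"
+</code></pre>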
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_operators.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_operators.html b/docs/build3x/html/topics/impala_operators.html
new file mode 100644
index 0000000..e03240b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_operators.html
@@ -0,0 +1,2042 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="
Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="operators"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SQL Operators</title></head><body id="operators"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SQL Operators</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ SQL operators are a class of comparison functions that are widely used within the <code class="ph codeph">WHERE</code> clauses of
+ <code class="ph codeph">SELECT</code> statements.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="operators__arithmetic_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Arithmetic Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The arithmetic operators use expressions with a left-hand argument, the operator, and then (in most cases) a right-hand argument.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_arg</var> <var class="keyword varname">binary_operator</var> <var class="keyword varname">right_hand_arg</var>
+<var class="keyword varname">unary_operator</var> <var class="keyword varname">single_arg</var>
+</code></pre>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">+</code> and <code class="ph codeph">-</code>: Can be used either as unary or binary operators.
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ With unary notation, such as <code class="ph codeph">+5</code>, <code class="ph codeph">-2.5</code>, or <code class="ph codeph">-<var class="keyword varname">col_name</var></code>,
+ they multiply their single numeric argument by <code class="ph codeph">+1</code> or <code class="ph codeph">-1</code>. Therefore, unary
+ <code class="ph codeph">+</code> returns its argument unchanged, while unary <code class="ph codeph">-</code> flips the sign of its argument. Although
+ you can double up these operators in expressions such as <code class="ph codeph">++5</code> (always positive) or <code class="ph codeph">-+2</code> or
+ <code class="ph codeph">+-2</code> (both always negative), you cannot double the unary minus operator because <code class="ph codeph">--</code> is
+ interpreted as the start of a comment. (You can use a double unary minus operator if you separate the <code class="ph codeph">-</code>
+ characters, for example with a space or parentheses.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ With binary notation, such as <code class="ph codeph">2+2</code>, <code class="ph codeph">5-2.5</code>, or <code class="ph codeph"><var class="keyword varname">col1</var> +
+ <var class="keyword varname">col2</var></code>, they add or subtract respectively the right-hand argument to (or from) the left-hand
+ argument. Both arguments must be of numeric types.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">*</code> and <code class="ph codeph">/</code>: Multiplication and division respectively. Both arguments must be of numeric types.
+ </p>
+
+ <p class="p">
+ When multiplying, the shorter argument is promoted if necessary (such as <code class="ph codeph">SMALLINT</code> to <code class="ph codeph">INT</code> or
+ <code class="ph codeph">BIGINT</code>, or <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>), and then the result is promoted again to the
+ next larger type. Thus, multiplying a <code class="ph codeph">TINYINT</code> and an <code class="ph codeph">INT</code> produces a <code class="ph codeph">BIGINT</code>
+ result. Multiplying a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">FLOAT</code> produces a <code class="ph codeph">DOUBLE</code> result. Multiplying
+ a <code class="ph codeph">FLOAT</code> and a <code class="ph codeph">DOUBLE</code> or a <code class="ph codeph">DOUBLE</code> and a <code class="ph codeph">DOUBLE</code> produces a
+ <code class="ph codeph">DECIMAL(38,17)</code>, because <code class="ph codeph">DECIMAL</code> values can represent much larger and more precise values than
+ <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ When dividing, Impala always treats the arguments and result as <code class="ph codeph">DOUBLE</code> values to avoid losing precision. If you
+ need to insert the results of a division operation into a <code class="ph codeph">FLOAT</code> column, use the <code class="ph codeph">CAST()</code>
+ function to convert the result to the correct type.
+ </p>
+ </li>
+
+ <li class="li" id="arithmetic_operators__div">
+ <p class="p">
+ <code class="ph codeph">DIV</code>: Integer division. Arguments are not promoted to a floating-point type, and any fractional result
+ is discarded. For example, <code class="ph codeph">13 DIV 7</code> returns 1, <code class="ph codeph">14 DIV 7</code> returns 2, and
+ <code class="ph codeph">15 DIV 7</code> returns 2. This operator is the same as the <code class="ph codeph">QUOTIENT()</code> function.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">%</code>: Modulo operator. Returns the remainder of the left-hand argument divided by the right-hand argument. Both
+ arguments must be of one of the integer types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">&</code>, <code class="ph codeph">|</code>, <code class="ph codeph">~</code>, and <code class="ph codeph">^</code>: Bitwise operators that return the
+ logical AND, logical OR, <code class="ph codeph">NOT</code>, or logical XOR (exclusive OR) of their argument values. Both arguments must be of
+ one of the integer types. If the arguments are of different types, the argument with the smaller type is implicitly extended to
+ match the argument with the larger type.
+ </p>
+ </li>
+ </ul>
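+ <p class="p">
+ The following examples, a minimal sketch using arbitrary literal values, illustrate several of
+ these operators:
+ </p>
+
+<pre class="pre codeblock"><code>-- Unary minus can be doubled when the - characters are separated;
+-- a bare -- would begin a comment instead.
+select -(-5);                     -- returns 5
+
+-- DIV performs integer division and discards any fraction,
+-- while / always produces a DOUBLE result.
+select 15 div 7, 15 % 7, 15 / 7;  -- returns 2, 1, and approximately 2.142857
+
+-- The bitwise operators require integer arguments.
+select 6 & 3, 6 | 3, 6 ^ 3, ~6;   -- returns 2, 7, 5, and -7
+</code></pre>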
+
+ <p class="p">
+ You can chain a sequence of arithmetic expressions, optionally grouping them with parentheses.
+ </p>
+
+ <p class="p">
+ The arithmetic operators generally do not have equivalent calling conventions using functional notation. For example, prior to
+ <span class="keyword">Impala 2.2</span>, there is no <code class="ph codeph">MOD()</code> function equivalent to the <code class="ph codeph">%</code> modulo operator.
+ Conversely, there are some arithmetic functions that do not have a corresponding operator. For example, for exponentiation you use
+ the <code class="ph codeph">POW()</code> function, but there is no <code class="ph codeph">**</code> exponentiation operator. See
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for the arithmetic functions you can use.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+aggregates are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey * 10
+ from region, region.r_nations as nation
+where nation.item.n_nationkey < 5;
++-------------+-------------+------------------------------+
+| r_name | item.n_name | nation.item.n_nationkey * 10 |
++-------------+-------------+------------------------------+
+| AMERICA | CANADA | 30 |
+| AMERICA | BRAZIL | 20 |
+| AMERICA | ARGENTINA | 10 |
+| MIDDLE EAST | EGYPT | 40 |
+| AFRICA | ALGERIA | 0 |
++-------------+-------------+------------------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="operators__between">
+
+ <h2 class="title topictitle2" id="ariaid-title3">BETWEEN Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ In a <code class="ph codeph">WHERE</code> clause, compares an expression to both a lower and upper bound. The comparison is successful if the
+ expression is greater than or equal to the lower bound, and less than or equal to the upper bound. If the bound values are switched,
+ so that the lower bound is greater than the upper bound, the operator does not match any values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> BETWEEN <var class="keyword varname">lower_bound</var> AND <var class="keyword varname">upper_bound</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Data types:</strong> Typically used with numeric data types. Works with any data type, although not very practical for
+ <code class="ph codeph">BOOLEAN</code> values. (<code class="ph codeph">BETWEEN false AND true</code> will match all <code class="ph codeph">BOOLEAN</code> values.) Use
+ <code class="ph codeph">CAST()</code> if necessary to ensure the lower and upper bound values are compatible types. Call string or date/time
+ functions if necessary to extract or transform the relevant portion to compare, especially if the value can be transformed into a
+ number.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Be careful when using short string operands. A longer string that starts with the upper bound value will not be included, because it
+ is considered greater than the upper bound. For example, <code class="ph codeph">BETWEEN 'A' and 'M'</code> would not match the string value
+ <code class="ph codeph">'Midway'</code>. Use functions such as <code class="ph codeph">upper()</code>, <code class="ph codeph">lower()</code>, <code class="ph codeph">substr()</code>,
+ <code class="ph codeph">trim()</code>, and so on if necessary to ensure the comparison works as expected.
+ </p>
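+ <p class="p">
+ The caveat above can be checked directly with literal values, as in this minimal sketch:
+ </p>
+
+<pre class="pre codeblock"><code>-- 'Midway' sorts after 'M', so it falls outside the upper bound.
+select 'Midway' between 'A' and 'M';              -- returns false
+
+-- Testing only the first letter includes the whole 'M' range.
+select substr('Midway',1,1) between 'A' and 'M';  -- returns true
+</code></pre>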
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Retrieve data for January through June, inclusive.
+select c1 from t1 where month <strong class="ph b">between 1 and 6</strong>;
+
+-- Retrieve data for names beginning with 'A' through 'M' inclusive.
+-- Only test the first letter to ensure all the values starting with 'M' are matched.
+-- Do a case-insensitive comparison to match names with various capitalization conventions.
+select last_name from customers where upper(substr(last_name,1,1)) <strong class="ph b">between 'A' and 'M'</strong>;
+
+-- Retrieve data for only the first week of each month.
+select count(distinct visitor_id) from web_traffic where dayofmonth(when_viewed) <strong class="ph b">between 1 and 7</strong>;</code></pre>
+
+ <p class="p">
+ The following example shows how to do a <code class="ph codeph">BETWEEN</code> comparison using a numeric field of a <code class="ph codeph">STRUCT</code> type
+ that is an item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it
+ can be used in a comparison operator:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey between 3 and 5;
++-------------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | ETHIOPIA | 5 |
++-------------+-------------+------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="operators__comparison_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Comparison Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the familiar comparison operators for checking equality and sort order for the column data types:
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">left_hand_expression</var> <var class="keyword varname">comparison_operator</var> <var class="keyword varname">right_hand_expression</var></code></pre>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">=</code>, <code class="ph codeph">!=</code>, <code class="ph codeph"><></code>: apply to all types.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><</code>, <code class="ph codeph"><=</code>, <code class="ph codeph">></code>, <code class="ph codeph">>=</code>: apply to all types; for
+ <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">TRUE</code> is considered greater than <code class="ph codeph">FALSE</code>.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Alternatives:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IN</code> and <code class="ph codeph">BETWEEN</code> operators provide shorthand notation for expressing combinations of equality,
+ less than, and greater than comparisons with a single operator.
+ </p>
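+ <p class="p">
+ For example, an <code class="ph codeph">IN</code> test such as the following (a sketch using a hypothetical
+ table <code class="ph codeph">t1</code>) is shorthand for a chain of equality comparisons combined with
+ <code class="ph codeph">OR</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select x from t1 where x in (2, 4, 6);
+-- Equivalent to:
+select x from t1 where x = 2 or x = 4 or x = 6;
+</code></pre>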
+
+ <p class="p">
+ Because comparing any value to <code class="ph codeph">NULL</code> produces <code class="ph codeph">NULL</code> rather than <code class="ph codeph">TRUE</code> or
+ <code class="ph codeph">FALSE</code>, use the <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code> operators to check if a value is
+ <code class="ph codeph">NULL</code> or not.
+ </p>
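+ <p class="p">
+ The following sketch illustrates this <code class="ph codeph">NULL</code> behavior with literal values:
+ </p>
+
+<pre class="pre codeblock"><code>select null = null;   -- returns NULL, not true
+select 1 != null;     -- returns NULL
+select null is null;  -- returns true
+</code></pre>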
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do a comparison using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used with a comparison operator such as <code class="ph codeph"><</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey < 5;
++-------------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| AMERICA | BRAZIL | 2 |
+| AMERICA | ARGENTINA | 1 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | ALGERIA | 0 |
++-------------+-------------+------------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="operators__exists">
+
+ <h2 class="title topictitle2" id="ariaid-title5">EXISTS Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">EXISTS</code> operator tests whether a subquery returns any results. You typically use it to find values from one
+ table that have corresponding values in another table.
+ </p>
+
+ <p class="p">
+ The converse, <code class="ph codeph">NOT EXISTS</code>, helps to find all the values from one table that do not have any corresponding values in
+ another table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>EXISTS (<var class="keyword varname">subquery</var>)
+NOT EXISTS (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The subquery can refer to a different table than the outer query block, or the same table. For example, you might use
+ <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> to check the existence of parent/child relationships between two columns of
+ the same table.
+ </p>
+
+ <p class="p">
+ You can also use operators and function calls within the subquery to test for relationships other than strict
+ equality. For example, you might use a call to <code class="ph codeph">COUNT()</code> in the subquery to check whether the number of matching
+ values is higher or lower than some limit. You might call a UDF in the subquery to check whether values in one table match a
+ hashed representation of those same values in a different table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong>
+ </p>
+
+ <p class="p">
+ If the subquery returns any value at all (even <code class="ph codeph">NULL</code>), <code class="ph codeph">EXISTS</code> returns <code class="ph codeph">TRUE</code> and
+ <code class="ph codeph">NOT EXISTS</code> returns <code class="ph codeph">FALSE</code>.
+ </p>
+
+ <p class="p">
+ The following example shows how even when the subquery returns only <code class="ph codeph">NULL</code> values, <code class="ph codeph">EXISTS</code> still
+ returns <code class="ph codeph">TRUE</code> and thus matches all the rows from the table in the outer query block.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table all_nulls (x int);
+[localhost:21000] > insert into all_nulls values (null), (null), (null);
+[localhost:21000] > select y from t2 where exists (select x from all_nulls);
++---+
+| y |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+ However, if the table in the subquery is empty and so the subquery returns an empty result set, <code class="ph codeph">EXISTS</code> returns
+ <code class="ph codeph">FALSE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table empty (x int);
+[localhost:21000] > select y from t2 where exists (select x from empty);
+[localhost:21000] >
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.6</span>,
+ the <code class="ph codeph">NOT EXISTS</code> operator required a correlated subquery.
+ In <span class="keyword">Impala 2.6</span> and higher, <code class="ph codeph">NOT EXISTS</code> works with
+ uncorrelated queries also.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+
+
+ The following examples refer to these simple tables containing small sets of integers or strings:
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int);
+[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] > create table t2 (y int);
+[localhost:21000] > insert into t2 values (2), (4), (6);
+
+[localhost:21000] > create table t3 (z int);
+[localhost:21000] > insert into t3 values (1), (3), (5);
+
+[localhost:21000] > create table month_names (m string);
+[localhost:21000] > insert into month_names values
+ > ('January'), ('February'), ('March'),
+ > ('April'), ('May'), ('June'), ('July'),
+ > ('August'), ('September'), ('October'),
+ > ('November'), ('December');
+</code></pre>
+ </div>
+
+ <p class="p">
+ The following example shows a correlated subquery that finds all the values in one table that exist in another table. For each value
+ <code class="ph codeph">X</code> from <code class="ph codeph">T1</code>, the query checks if the <code class="ph codeph">Y</code> column of <code class="ph codeph">T2</code> contains an
+ identical value, and the <code class="ph codeph">EXISTS</code> operator returns <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code> as appropriate in
+ each case.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where t1.x = y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+        An uncorrelated query is less interesting in this case. Because the subquery always matches some rows, making the <code class="ph codeph">EXISTS</code> test <code class="ph codeph">TRUE</code>, all rows from
+        <code class="ph codeph">T1</code> are returned. If the table contents were changed so that the subquery did not match any rows, none of the rows
+ from <code class="ph codeph">T1</code> would be returned.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where y > 5);
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example shows how an uncorrelated subquery can test for the existence of some condition within a table. By using
+ <code class="ph codeph">LIMIT 1</code> or an aggregate function, the query returns a single result or no result based on whether the subquery
+ matches any rows. Here, we know that <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code> contain some even numbers, but <code class="ph codeph">T3</code>
+ does not.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select "contains an even number" from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++---------------------------+
+| 'contains an even number' |
++---------------------------+
+| contains an even number |
++---------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t1 where exists (select x from t1 where x % 2 = 0) limit 1;
++-------------------------+
+| assertion |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select x from t2 where y % 2 = 0) limit 1;
+ERROR: AnalysisException: couldn't resolve column reference: 'x'
+[localhost:21000] > select "contains an even number" as assertion from t2 where exists (select y from t2 where y % 2 = 0) limit 1;
++-------------------------+
+| assertion |
++-------------------------+
+| contains an even number |
++-------------------------+
+[localhost:21000] > select "contains an even number" as assertion from t3 where exists (select z from t3 where z % 2 = 0) limit 1;
+[localhost:21000] >
+</code></pre>
+
+ <p class="p">
+ The following example finds numbers in one table that are 1 greater than numbers from another table. The <code class="ph codeph">EXISTS</code>
+ notation is simpler than an equivalent <code class="ph codeph">CROSS JOIN</code> between the tables. (The example then also illustrates how the
+ same test could be performed using an <code class="ph codeph">IN</code> operator.)
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where exists (select y from t2 where x = y + 1);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+[localhost:21000] > select x from t1 where x in (select y + 1 from t2);
++---+
+| x |
++---+
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example finds values from one table that do not exist in another table.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 where not exists (select y from t2 where x = y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
+
+ <p class="p">
+ The following example uses the <code class="ph codeph">NOT EXISTS</code> operator to find all the leaf nodes in tree-structured data. This
+ simplified <span class="q">"tree of life"</span> has multiple levels (class, order, family, and so on), with each item pointing upward through a
+ <code class="ph codeph">PARENT</code> pointer. The example runs an outer query and a subquery on the same table, returning only those items whose
+ <code class="ph codeph">ID</code> value is <em class="ph i">not</em> referenced by the <code class="ph codeph">PARENT</code> of any other item.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table tree (id int, parent int, name string);
+[localhost:21000] > insert overwrite tree values
+ > (0, null, "animals"),
+ > (1, 0, "placentals"),
+ > (2, 0, "marsupials"),
+ > (3, 1, "bats"),
+ > (4, 1, "cats"),
+ > (5, 2, "kangaroos"),
+ > (6, 4, "lions"),
+ > (7, 4, "tigers"),
+ > (8, 5, "red kangaroo"),
+ > (9, 2, "wallabies");
+[localhost:21000] > select name as "leaf node" from tree one
+ > where not exists (select parent from tree two where one.id = two.parent);
++--------------+
+| leaf node |
++--------------+
+| bats |
+| lions |
+| tigers |
+| red kangaroo |
+| wallabies |
++--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="operators__ilike">
+
+ <h2 class="title topictitle2" id="ariaid-title6">ILIKE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A case-insensitive comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using <code class="ph codeph">_</code> to match a single
+ character and <code class="ph codeph">%</code> to match multiple characters. The argument expression must match the entire string value.
+ Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+ </p>
+
+ <p class="p">
+ This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">LIKE</code> operator,
+ but with case-insensitive comparisons.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> ILIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT ILIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ In the following examples, strings that are the same except for differences in uppercase
+ and lowercase match successfully with <code class="ph codeph">ILIKE</code>, but do not match
+ with <code class="ph codeph">LIKE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select 'fooBar' ilike 'FOOBAR';
++-------------------------+
+| 'fooBar' ilike 'FOOBAR' |
++-------------------------+
+| true |
++-------------------------+
+
+select 'fooBar' like 'FOOBAR';
++------------------------+
+| 'fooBar' like 'FOOBAR' |
++------------------------+
+| false |
++------------------------+
+
+select 'FOOBAR' ilike 'f%';
++---------------------+
+| 'FOOBAR' ilike 'f%' |
++---------------------+
+| true |
++---------------------+
+
+select 'FOOBAR' like 'f%';
++--------------------+
+| 'FOOBAR' like 'f%' |
++--------------------+
+| false |
++--------------------+
+
+select 'ABCXYZ' not ilike 'ab_xyz';
++-----------------------------+
+| not 'ABCXYZ' ilike 'ab_xyz' |
++-----------------------------+
+| false |
++-----------------------------+
+
+select 'ABCXYZ' not like 'ab_xyz';
++----------------------------+
+| not 'ABCXYZ' like 'ab_xyz' |
++----------------------------+
+| true |
++----------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For case-sensitive comparisons, see <a class="xref" href="impala_operators.html#like">LIKE Operator</a>.
+ For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="operators__in">
+
+ <h2 class="title topictitle2" id="ariaid-title7">IN Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">IN</code> operator compares an argument value to a set of values, and returns <code class="ph codeph">TRUE</code> if the argument
+ matches any value in the set. The <code class="ph codeph">NOT IN</code> operator reverses the comparison, and checks if the argument value is not
+ part of a set of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> IN (<var class="keyword varname">subquery</var>)
+
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">expression</var> [, <var class="keyword varname">expression</var>])
+<var class="keyword varname">expression</var> NOT IN (<var class="keyword varname">subquery</var>)
+</code></pre>
+
+ <p class="p">
+ The left-hand expression and the set of comparison values must be of compatible types.
+ </p>
+
+ <p class="p">
+ The left-hand expression must consist only of a single value, not a tuple. Although the left-hand expression is typically a column
+ name, it could also be some other value. For example, the <code class="ph codeph">WHERE</code> clauses <code class="ph codeph">WHERE id IN (5)</code> and
+ <code class="ph codeph">WHERE 5 IN (id)</code> produce the same results.
+ </p>
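+
+      <p class="p">
+        For example, using the <code class="ph codeph">T1</code> table shown in the <code class="ph codeph">EXISTS</code> examples
+        earlier on this page, both of the following queries return the same single row:
+      </p>
+
+<pre class="pre codeblock"><code>-- Both forms return the row where X is 5.
+select x from t1 where x in (5);
+select x from t1 where 5 in (x);
+</code></pre>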
+
+ <p class="p">
+ The set of values to check against can be specified as constants, function calls, column names, or other expressions in the query
+ text. The maximum number of expressions in the <code class="ph codeph">IN</code> list is 9999. (The maximum number of elements of
+ a single expression is 10,000 items, and the <code class="ph codeph">IN</code> operator itself counts as one.)
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and higher, the set of values can also be generated by a subquery. <code class="ph codeph">IN</code> can evaluate an unlimited
+ number of results using a subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Any expression using the <code class="ph codeph">IN</code> operator could be rewritten as a series of equality tests connected with
+ <code class="ph codeph">OR</code>, but the <code class="ph codeph">IN</code> syntax is often clearer, more concise, and easier for Impala to optimize. For
+ example, with partitioned tables, queries frequently use <code class="ph codeph">IN</code> clauses to filter data by comparing the partition key
+ columns to specific values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong>
+ </p>
+
+ <p class="p">
+ If there really is a matching non-null value, <code class="ph codeph">IN</code> returns <code class="ph codeph">TRUE</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select 1 in (1,null,2,3);
++----------------------+
+| 1 in (1, null, 2, 3) |
++----------------------+
+| true |
++----------------------+
+[localhost:21000] > select 1 not in (1,null,2,3);
++--------------------------+
+| 1 not in (1, null, 2, 3) |
++--------------------------+
+| false |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ If the searched value is not found in the comparison values, and the comparison values include <code class="ph codeph">NULL</code>, the result is
+ <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select 5 in (1,null,2,3);
++----------------------+
+| 5 in (1, null, 2, 3) |
++----------------------+
+| NULL |
++----------------------+
+[localhost:21000] > select 5 not in (1,null,2,3);
++--------------------------+
+| 5 not in (1, null, 2, 3) |
++--------------------------+
+| NULL |
++--------------------------+
+[localhost:21000] > select 1 in (null);
++-------------+
+| 1 in (null) |
++-------------+
+| NULL |
++-------------+
+[localhost:21000] > select 1 not in (null);
++-----------------+
+| 1 not in (null) |
++-----------------+
+| NULL |
++-----------------+
+</code></pre>
+
+ <p class="p">
+ If the left-hand argument is <code class="ph codeph">NULL</code>, <code class="ph codeph">IN</code> always returns <code class="ph codeph">NULL</code>. This rule applies even
+ if the comparison values include <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select null in (1,2,3);
++-------------------+
+| null in (1, 2, 3) |
++-------------------+
+| NULL |
++-------------------+
+[localhost:21000] > select null not in (1,2,3);
++-----------------------+
+| null not in (1, 2, 3) |
++-----------------------+
+| NULL |
++-----------------------+
+[localhost:21000] > select null in (null);
++----------------+
+| null in (null) |
++----------------+
+| NULL |
++----------------+
+[localhost:21000] > select null not in (null);
++--------------------+
+| null not in (null) |
++--------------------+
+| NULL |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in earlier Impala releases, but new capabilities were added in
+        <span class="keyword">Impala 2.0.0</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+from region, region.r_nations as nation
+where nation.item.n_nationkey in (1,3,5);
++---------+-------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++---------+-------------+------------------+
+| AMERICA | CANADA | 3 |
+| AMERICA | ARGENTINA | 1 |
+| AFRICA | ETHIOPIA | 5 |
++---------+-------------+------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Correlated subqueries used in <code class="ph codeph">EXISTS</code> and <code class="ph codeph">IN</code> operators cannot include a
+ <code class="ph codeph">LIMIT</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Using IN is concise and self-documenting.
+SELECT * FROM t1 WHERE c1 IN (1,2,10);
+-- Equivalent to series of = comparisons ORed together.
+SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2 OR c1 = 10;
+
+SELECT c1 AS "starts with vowel" FROM t2 WHERE upper(substr(c1,1,1)) IN ('A','E','I','O','U');
+
+SELECT COUNT(DISTINCT(visitor_id)) FROM web_traffic WHERE month IN ('January','June','July');</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_subqueries.html#subqueries">Subqueries in Impala SELECT Statements</a>
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="operators__iregexp">
+
+ <h2 class="title topictitle2" id="ariaid-title8">IREGEXP Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Tests whether a value matches a regular expression, using case-insensitive string comparisons.
+ Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+ <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+ represents a sequence of zero or more items, <code class="ph codeph">+</code> represents a sequence of one or more items, <code class="ph codeph">?</code>
+ produces a non-greedy match, and so on.
+ </p>
+
+ <p class="p">
+ This operator, available in <span class="keyword">Impala 2.5</span> and higher, is the equivalent of the <code class="ph codeph">REGEXP</code> operator,
+ but with case-insensitive comparisons.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> IREGEXP <var class="keyword varname">regular_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+        The regular expression must match the entire value, not just occur somewhere inside it. Use <code class="ph codeph">.*</code> at the beginning,
+        the end, or both if you only need to match a pattern somewhere in the middle. Thus, the explicit <code class="ph codeph">^</code> and <code class="ph codeph">$</code>
+        anchors are often redundant, although you might already have them in expression strings that you reuse from elsewhere.
+ </p>
+
+
+
+ <p class="p">
+ The <code class="ph codeph">|</code> symbol is the alternation operator, typically used within <code class="ph codeph">()</code> to match different sequences.
+ The <code class="ph codeph">()</code> groups do not allow backreferences. To retrieve the part of a value matched within a <code class="ph codeph">()</code>
+ section, use the <code class="ph codeph"><a class="xref" href="impala_string_functions.html#string_functions__regexp_extract">regexp_extract()</a></code>
+ built-in function. (Currently, there is not any case-insensitive equivalent for the <code class="ph codeph">regexp_extract()</code> function.)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+ regular expression string that occurs anywhere inside the target string, the same as if the regular
+ expression was enclosed on each side by <code class="ph codeph">.*</code>. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+ succeeded when the regular expression matched the entire target string. This change improves compatibility
+ with the regular expression support for popular database systems. There is no change to the behavior of the
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+ </p>
+ </div>
+
+ <p class="p">
+ In Impala 2.0 and later, the Impala regular expression syntax conforms to the POSIX Extended Regular
+ Expression syntax used by the Google RE2 library. For details, see
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">the RE2 documentation</a>. It
+ has most idioms familiar from regular expressions in Perl, Python, and so on, including
+ <code class="ph codeph">.*?</code> for non-greedy matches.
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, a change in the underlying regular expression library could cause changes in the
+ way regular expressions are interpreted by this function. Test any queries that use regular expressions and
+ adjust the expression patterns if necessary. See
+ <a class="xref" href="../shared/../topics/impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+        You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples demonstrate the syntax for the <code class="ph codeph">IREGEXP</code> operator.
+ </p>
+
+<pre class="pre codeblock"><code>select 'abcABCaabbcc' iregexp '^[a-c]+$';
++-----------------------------------+
+| 'abcABCaabbcc' iregexp '^[a-c]+$' |
++-----------------------------------+
+| true                              |
++-----------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="is_distinct_from__is_distinct" id="operators__is_distinct_from">
+
+ <h2 class="title topictitle2" id="is_distinct_from__is_distinct">IS DISTINCT FROM Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ The <code class="ph codeph">IS DISTINCT FROM</code> operator, and its converse the <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, test whether or
+ not values are identical. <code class="ph codeph">IS NOT DISTINCT FROM</code> is similar to the <code class="ph codeph">=</code> operator, and <code class="ph codeph">IS
+ DISTINCT FROM</code> is similar to the <code class="ph codeph">!=</code> operator, except that <code class="ph codeph">NULL</code> values are treated as
+ identical. Therefore, <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">true</code> rather than <code class="ph codeph">NULL</code>, and
+ <code class="ph codeph">IS DISTINCT FROM</code> returns <code class="ph codeph">false</code> rather than <code class="ph codeph">NULL</code>, when comparing two
+ <code class="ph codeph">NULL</code> values. If one of the values being compared is <code class="ph codeph">NULL</code> and the other is not, <code class="ph codeph">IS DISTINCT
+ FROM</code> returns <code class="ph codeph">true</code> and <code class="ph codeph">IS NOT DISTINCT FROM</code> returns <code class="ph codeph">false</code>, again instead
+ of returning <code class="ph codeph">NULL</code> in both cases.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression1</var> IS DISTINCT FROM <var class="keyword varname">expression2</var>
+
+<var class="keyword varname">expression1</var> IS NOT DISTINCT FROM <var class="keyword varname">expression2</var>
+<var class="keyword varname">expression1</var> <=> <var class="keyword varname">expression2</var>
+</code></pre>
+
+ <p class="p">
+ The operator <code class="ph codeph"><=></code> is an alias for <code class="ph codeph">IS NOT DISTINCT FROM</code>.
+ It is typically used as a <code class="ph codeph">NULL</code>-safe equality operator in join queries.
+ That is, <code class="ph codeph">A <=> B</code> is true if <code class="ph codeph">A</code> equals <code class="ph codeph">B</code>
+ or if both <code class="ph codeph">A</code> and <code class="ph codeph">B</code> are <code class="ph codeph">NULL</code>.
+ </p>
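+
+      <p class="p">
+        For example, the following query shows how <code class="ph codeph"><=></code> returns
+        <code class="ph codeph">true</code> or <code class="ph codeph">false</code> even when one or both operands
+        are <code class="ph codeph">NULL</code>, where <code class="ph codeph">=</code> would return <code class="ph codeph">NULL</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- Returns true, false, and true respectively.
+select 1 <=> 1, 1 <=> null, null <=> null;
+</code></pre>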
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This operator provides concise notation for comparing two values and always producing a <code class="ph codeph">true</code> or
+        <code class="ph codeph">false</code> result, without treating <code class="ph codeph">NULL</code> as a special case. Otherwise, unambiguously distinguishing
+        between two values requires a compound expression involving <code class="ph codeph">IS [NOT] NULL</code> tests of both operands in addition to the
+ <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> operator.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph"><=></code> operator, used like an equality operator in a join query,
+ is more efficient than the equivalent clause: <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+ The <code class="ph codeph"><=></code> operator can use a hash join, while the <code class="ph codeph">OR</code> expression
+ cannot.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how <code class="ph codeph">IS DISTINCT FROM</code> gives output similar to
+ the <code class="ph codeph">!=</code> operator, and <code class="ph codeph">IS NOT DISTINCT FROM</code> gives output
+ similar to the <code class="ph codeph">=</code> operator. The exception is when the expression involves
+ a <code class="ph codeph">NULL</code> value on one side or both sides, where <code class="ph codeph">!=</code> and
+ <code class="ph codeph">=</code> return <code class="ph codeph">NULL</code> but the <code class="ph codeph">IS [NOT] DISTINCT FROM</code>
+ operators still return <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+select 1 is distinct from 0, 1 != 0;
++----------------------+--------+
+| 1 is distinct from 0 | 1 != 0 |
++----------------------+--------+
+| true | true |
++----------------------+--------+
+
+select 1 is distinct from 1, 1 != 1;
++----------------------+--------+
+| 1 is distinct from 1 | 1 != 1 |
++----------------------+--------+
+| false | false |
++----------------------+--------+
+
+select 1 is distinct from null, 1 != null;
++-------------------------+-----------+
+| 1 is distinct from null | 1 != null |
++-------------------------+-----------+
+| true | NULL |
++-------------------------+-----------+
+
+select null is distinct from null, null != null;
++----------------------------+--------------+
+| null is distinct from null | null != null |
++----------------------------+--------------+
+| false | NULL |
++----------------------------+--------------+
+
+select 1 is not distinct from 0, 1 = 0;
++--------------------------+-------+
+| 1 is not distinct from 0 | 1 = 0 |
++--------------------------+-------+
+| false | false |
++--------------------------+-------+
+
+select 1 is not distinct from 1, 1 = 1;
++--------------------------+-------+
+| 1 is not distinct from 1 | 1 = 1 |
++--------------------------+-------+
+| true | true |
++--------------------------+-------+
+
+select 1 is not distinct from null, 1 = null;
++-----------------------------+----------+
+| 1 is not distinct from null | 1 = null |
++-----------------------------+----------+
+| false | NULL |
++-----------------------------+----------+
+
+select null is not distinct from null, null = null;
++--------------------------------+-------------+
+| null is not distinct from null | null = null |
++--------------------------------+-------------+
+| true | NULL |
++--------------------------------+-------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">IS DISTINCT FROM</code> considers
+ <code class="ph codeph">CHAR</code> values to be the same (not distinct from each other)
+ if they only differ in the number of trailing spaces. Therefore, sometimes
+ the result of an <code class="ph codeph">IS [NOT] DISTINCT FROM</code> operator differs
+ depending on whether the values are <code class="ph codeph">STRING</code>/<code class="ph codeph">VARCHAR</code>
+ or <code class="ph codeph">CHAR</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+select
+ 'x' is distinct from 'x ' as string_with_trailing_spaces,
+ cast('x' as char(5)) is distinct from cast('x ' as char(5)) as char_with_trailing_spaces;
++-----------------------------+---------------------------+
+| string_with_trailing_spaces | char_with_trailing_spaces |
++-----------------------------+---------------------------+
+| true | false |
++-----------------------------+---------------------------+
+</code></pre>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="operators__is_null">
+
+ <h2 class="title topictitle2" id="ariaid-title10">IS NULL Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+ The <code class="ph codeph">IS NULL</code> operator, and its converse the <code class="ph codeph">IS NOT NULL</code> operator, test whether a specified value is
+ <code class="ph codeph"><a class="xref" href="impala_literals.html#null">NULL</a></code>. Because using <code class="ph codeph">NULL</code> with any of the other
+ comparison operators such as <code class="ph codeph">=</code> or <code class="ph codeph">!=</code> also returns <code class="ph codeph">NULL</code> rather than
+ <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>, you use a special-purpose comparison operator to check for this special condition.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS UNKNOWN</code> and
+ <code class="ph codeph">IS NOT UNKNOWN</code> as synonyms for
+ <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT NULL</code>,
+ respectively.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IS NULL
+<var class="keyword varname">expression</var> IS NOT NULL
+
+<span class="ph"><var class="keyword varname">expression</var> IS UNKNOWN</span>
+<span class="ph"><var class="keyword varname">expression</var> IS NOT UNKNOWN</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ In many cases, <code class="ph codeph">NULL</code> values indicate some incorrect or incomplete processing during data ingestion or conversion.
+ You might check whether any values in a column are <code class="ph codeph">NULL</code>, and if so take some followup action to fill them in.
+ </p>
+
+ <p class="p">
+ With sparse data, often represented in <span class="q">"wide"</span> tables, it is common for most values to be <code class="ph codeph">NULL</code> with only an
+ occasional non-<code class="ph codeph">NULL</code> value. In those cases, you can use the <code class="ph codeph">IS NOT NULL</code> operator to identify the
+ rows containing any data at all for a particular column, regardless of the actual value.
+ </p>
+
+ <p class="p">
+ With a well-designed database schema, effective use of <code class="ph codeph">NULL</code> values and <code class="ph codeph">IS NULL</code> and <code class="ph codeph">IS NOT
+ NULL</code> operators can save having to design custom logic around special values such as 0, -1, <code class="ph codeph">'N/A'</code>, empty
+ string, and so on. <code class="ph codeph">NULL</code> lets you distinguish between a value that is known to be 0, false, or empty, and a truly
+ unknown value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS [NOT] UNKNOWN</code> operator, as with the <code class="ph codeph">IS [NOT] NULL</code>
+ operator, is not applicable to complex type columns (<code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>). Using a complex type column with this
+ operator causes a query error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- If this value is non-zero, something is wrong.
+select count(*) from employees where employee_id is null;
+
+-- With data from disparate sources, some fields might be blank.
+-- Not necessarily an error condition.
+select count(*) from census where household_income is null;
+
+-- Sometimes we expect fields to be null, and followup action
+-- is needed when they are not.
+select count(*) from web_traffic where weird_http_code is not null;</code></pre>
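+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, the <code class="ph codeph">IS UNKNOWN</code>
+ synonym can be used the same way. For example, against the same
+ <code class="ph codeph">employees</code> table as above:
+ </p>
+
+<pre class="pre codeblock"><code>-- Equivalent to: where employee_id is null
+select count(*) from employees where employee_id is unknown;</code></pre>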
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="operators__is_true">
+
+ <h2 class="title topictitle2" id="ariaid-title11">IS TRUE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+ This variation of the <code class="ph codeph">IS</code> operator tests for truth
+ or falsity, with right-hand arguments <code class="ph codeph">[NOT] TRUE</code>,
+ <code class="ph codeph">[NOT] FALSE</code>, and <code class="ph codeph">[NOT] UNKNOWN</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">expression</var> IS TRUE
+<var class="keyword varname">expression</var> IS NOT TRUE
+
+<var class="keyword varname">expression</var> IS FALSE
+<var class="keyword varname">expression</var> IS NOT FALSE
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS TRUE</code> and <code class="ph codeph">IS FALSE</code> forms are
+ similar to doing equality comparisons with the Boolean values
+ <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code>, except that
+ <code class="ph codeph">IS TRUE</code> and <code class="ph codeph">IS FALSE</code>
+ always return either <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>,
+ even if the left-hand side expression returns <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ These operators let you simplify Boolean comparisons that must also
+ check for <code class="ph codeph">NULL</code>, for example
+ <code class="ph codeph">X != 10 AND X IS NOT NULL</code> is equivalent to
+ <code class="ph codeph">(X != 10) IS TRUE</code>.
+ </p>
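+
+ <p class="p">
+ For example, assuming a table <code class="ph codeph">t1</code> with a numeric column
+ <code class="ph codeph">x</code>, these two queries return the same count:
+ </p>
+
+<pre class="pre codeblock"><code>-- Rows where x is non-NULL and not equal to 10.
+select count(*) from t1 where x != 10 and x is not null;
+
+-- Same result: IS TRUE turns the NULL produced by 'NULL != 10' into false.
+select count(*) from t1 where (x != 10) is true;</code></pre>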
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">IS [NOT] TRUE</code> and <code class="ph codeph">IS [NOT] FALSE</code> operators are not
+ applicable to complex type columns (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or
+ <code class="ph codeph">MAP</code>). Using a complex type column with these operators causes a query error.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.11.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+select assertion, b, b is true, b is false, b is unknown
+ from boolean_test;
++-------------+-------+-----------+------------+--------------+
+| assertion   | b     | b is true | b is false | b is unknown |
++-------------+-------+-----------+------------+--------------+
+| 2 + 2 = 4   | true  | true      | false      | false        |
+| 2 + 2 = 5   | false | false     | true       | false        |
+| 1 = null    | NULL  | false     | false      | true         |
+| null = null | NULL  | false     | false      | true         |
++-------------+-------+-----------+------------+--------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="operators__like">
+
+ <h2 class="title topictitle2" id="ariaid-title12">LIKE Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A comparison operator for <code class="ph codeph">STRING</code> data, with basic wildcard capability using the underscore
+ (<code class="ph codeph">_</code>) to match a single character and the percent sign (<code class="ph codeph">%</code>) to match multiple
+ characters. The argument expression must match the entire string value.
+ Typically, it is more efficient to put any <code class="ph codeph">%</code> wildcard match at the end of the string.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_expression</var> LIKE <var class="keyword varname">wildcard_expression</var>
+<var class="keyword varname">string_expression</var> NOT LIKE <var class="keyword varname">wildcard_expression</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>select distinct c_last_name from customer where c_last_name like 'Mc%' or c_last_name like 'Mac%';
+select count(c_last_name) from customer where c_last_name like 'M%';
+select c_email_address from customer where c_email_address like '%.edu';
+
+-- We can find 4-letter names beginning with 'M' by calling functions...
+select distinct c_last_name from customer where length(c_last_name) = 4 and substr(c_last_name,1,1) = 'M';
+-- ...or in a more readable way by matching M followed by exactly 3 characters.
+select distinct c_last_name from customer where c_last_name like 'M___';</code></pre>
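+
+ <p class="p">
+ The <code class="ph codeph">NOT LIKE</code> form inverts the match. For example, using the
+ same <code class="ph codeph">customer</code> table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Exclude addresses ending in '.edu'.
+select count(*) from customer where c_email_address not like '%.edu';</code></pre>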
+
+ <p class="p">
+ For case-insensitive comparisons, see <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a>.
+ For a more general kind of search operator using regular expressions, see <a class="xref" href="impala_operators.html#regexp">REGEXP Operator</a>
+ or its case-insensitive counterpart <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="operators__logical_operators">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Logical Operators</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Logical operators return a <code class="ph codeph">BOOLEAN</code> value, based on a binary or unary logical operation between arguments that are
+ also Booleans. Typically, the argument expressions use <a class="xref" href="impala_operators.html#comparison_operators">comparison
+ operators</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">boolean_expression</var> <var class="keyword varname">binary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+<var class="keyword varname">unary_logical_operator</var> <var class="keyword varname">boolean_expression</var>
+</code></pre>
+
+ <p class="p">
+ The Impala logical operators are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">AND</code>: A binary operator that returns <code class="ph codeph">true</code> if its left-hand and right-hand arguments both evaluate
+ to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if either argument is <code class="ph codeph">NULL</code>, and <code class="ph codeph">false</code> otherwise.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">OR</code>: A binary operator that returns <code class="ph codeph">true</code> if either of its left-hand and right-hand arguments
+ evaluate to <code class="ph codeph">true</code>, <code class="ph codeph">NULL</code> if one argument is <code class="ph codeph">NULL</code> and the other is either
+ <code class="ph codeph">NULL</code> or <code class="ph codeph">false</code>, and <code class="ph codeph">false</code> otherwise.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">NOT</code>: A unary operator that flips the state of a Boolean expression from <code class="ph codeph">true</code> to
+ <code class="ph codeph">false</code>, or <code class="ph codeph">false</code> to <code class="ph codeph">true</code>. If the argument expression is <code class="ph codeph">NULL</code>,
+ the result remains <code class="ph codeph">NULL</code>. (When <code class="ph codeph">NOT</code> is used this way as a unary logical operator, it works
+ differently than the <code class="ph codeph">IS NOT NULL</code> comparison operator, which returns <code class="ph codeph">true</code> when applied to a
+ <code class="ph codeph">NULL</code>.)
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ You cannot refer to a column with a complex data type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ directly in an operator. You can apply operators only to scalar values that make up a complex type
+ (the fields of a <code class="ph codeph">STRUCT</code>, the items of an <code class="ph codeph">ARRAY</code>,
+ or the key or value portion of a <code class="ph codeph">MAP</code>) as part of a join query that refers to
+ the scalar value using the appropriate dot notation or <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">VALUE</code>
+ pseudocolumn names.
+ </p>
+
+ <p class="p">
+ The following example shows how to do an arithmetic operation using a numeric field of a <code class="ph codeph">STRUCT</code> type that is an
+ item within an <code class="ph codeph">ARRAY</code> column. Once the scalar numeric value <code class="ph codeph">R_NATIONKEY</code> is extracted, it can be
+ used in an arithmetic expression, such as multiplying by 10:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The SMALLINT is a field within an array of structs.
+describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+-- When we refer to the scalar value using dot notation,
+-- we can use arithmetic and comparison operators on it
+-- like any other number.
+select r_name, nation.item.n_name, nation.item.n_nationkey
+ from region, region.r_nations as nation
+where
+ nation.item.n_nationkey between 3 and 5
+ or nation.item.n_nationkey > 15;
++-------------+----------------+------------------+
+| r_name | item.n_name | item.n_nationkey |
++-------------+----------------+------------------+
+| EUROPE | UNITED KINGDOM | 23 |
+| EUROPE | RUSSIA | 22 |
+| EUROPE | ROMANIA | 19 |
+| ASIA | VIETNAM | 21 |
+| ASIA | CHINA | 18 |
+| AMERICA | UNITED STATES | 24 |
+| AMERICA | PERU | 17 |
+| AMERICA | CANADA | 3 |
+| MIDDLE EAST | SAUDI ARABIA | 20 |
+| MIDDLE EAST | EGYPT | 4 |
+| AFRICA | MOZAMBIQUE | 16 |
+| AFRICA | ETHIOPIA | 5 |
++-------------+----------------+------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">AND</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select true and true;
++---------------+
+| true and true |
++---------------+
+| true |
++---------------+
+[localhost:21000] > select true and false;
++----------------+
+| true and false |
++----------------+
+| false |
++----------------+
+[localhost:21000] > select false and false;
++-----------------+
+| false and false |
++-----------------+
+| false |
++-----------------+
+[localhost:21000] > select true and null;
++---------------+
+| true and null |
++---------------+
+| NULL |
++---------------+
+[localhost:21000] > select (10 > 2) and (6 != 9);
++-----------------------+
+| (10 > 2) and (6 != 9) |
++-----------------------+
+| true |
++-----------------------+
+</code></pre>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">OR</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select true or true;
++--------------+
+| true or true |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select true or false;
++---------------+
+| true or false |
++---------------+
+| true |
++---------------+
+[localhost:21000] > select false or false;
++----------------+
+| false or false |
++----------------+
+| false |
++----------------+
+[localhost:21000] > select true or null;
++--------------+
+| true or null |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select null or true;
++--------------+
+| null or true |
++--------------+
+| true |
++--------------+
+[localhost:21000] > select false or null;
++---------------+
+| false or null |
++---------------+
+| NULL |
++---------------+
+[localhost:21000] > select (1 = 1) or ('hello' = 'world');
++--------------------------------+
+| (1 = 1) or ('hello' = 'world') |
++--------------------------------+
+| true |
++--------------------------------+
+[localhost:21000] > select (2 + 2 != 4) or (-1 > 0);
++--------------------------+
+| (2 + 2 != 4) or (-1 > 0) |
++--------------------------+
+| false |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ These examples demonstrate the <code class="ph codeph">NOT</code> operator:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select not true;
++----------+
+| not true |
++----------+
+| false |
++----------+
+[localhost:21000] > select not false;
++-----------+
+| not false |
++-----------+
+| true |
++-----------+
+[localhost:21000] > select not null;
++----------+
+| not null |
++----------+
+| NULL |
++----------+
+[localhost:21000] > select not (1=1);
++-------------+
+| not (1 = 1) |
++-------------+
+| false |
++-------------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="operators__regexp">
+
+ <h2 class="title topictitle2" id="ariaid-title14">REGEXP Operator</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Tests whether a value matches a regular expression. Uses the POSIX regular expression syntax where <code class="ph codeph">^</code> and
+ <code class="ph codeph">$</code> match the beginning and end of the string, <code class="ph codeph">.</code> represents any single character, <code class="ph codeph">*</code>
+ represents a sequence of zero or more items, <code class="ph codeph">+</code> represents a sequence of one or more items, <code class="ph codeph">?</code>
+ produces a non-greedy match, and so on.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">string_ex
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_proxy.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_proxy.html b/docs/build3x/html/topics/impala_proxy.html
new file mode 100644
index 0000000..9a2b90d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_proxy.html
@@ -0,0 +1,501 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="proxy"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala through a Proxy for High Availability</title></head><body id="proxy"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala through a Proxy for High Availability</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ For most clusters that have multiple users and production availability requirements, you might set up a proxy
+ server to relay requests to and from Impala.
+ </p>
+
+ <p class="p">
+ Currently, the Impala statestore mechanism does not include such proxying and load-balancing features. Set up
+ a software package of your choice to perform these functions.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="proxy__proxy_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Proxy Usage and Load Balancing for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Using a load-balancing proxy server for Impala has the following advantages:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Applications connect to a single well-known host and port, rather than keeping track of the hosts where
+ the <span class="keyword cmdname">impalad</span> daemon is running.
+ </li>
+
+ <li class="li">
+ If any host running the <span class="keyword cmdname">impalad</span> daemon becomes unavailable, application connection
+ requests still succeed because you always connect to the proxy server rather than a specific host running
+ the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+ The coordinator node for each Impala query potentially requires more memory and CPU cycles than the other
+ nodes that process the query. The proxy server can issue queries using round-robin scheduling, so that
+ each connection uses a different coordinator node. This load-balancing technique lets the Impala nodes
+ share this additional work, rather than concentrating it on a single machine.
+ </li>
+ </ul>
+
+ <p class="p">
+ The following setup steps are a general outline that apply to any load-balancing proxy software:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Select and download the load-balancing proxy software or other
+ load-balancing hardware appliance. It should only need to be installed
+ and configured on a single host, typically on an edge node. Pick a
+ host other than the DataNodes where <span class="keyword cmdname">impalad</span> is
+ running, because the intention is to protect against the possibility
+ of one or more of these DataNodes becoming unavailable.
+ </li>
+
+ <li class="li">
+ Configure the load balancer (typically by editing a configuration file).
+ In particular:
+ <ul class="ul">
+ <li class="li">
+ Set up a port that the load balancer will listen on to relay
+ Impala requests back and forth. </li>
+ <li class="li">
+ See <a class="xref" href="#proxy_balancing">Choosing the Load-Balancing Algorithm</a> for load
+ balancing algorithm options.
+ </li>
+ <li class="li">
+ For Kerberized clusters, follow the instructions in <a class="xref" href="impala_proxy.html#proxy_kerberos">Special Proxy Considerations for Clusters Using Kerberos</a>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ If you are using Hue or JDBC-based applications, you typically set
+ up load balancing for both ports 21000 and 21050, because these client
+ applications connect through port 21050 while the
+ <span class="keyword cmdname">impala-shell</span> command connects through port
+ 21000. See <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for when to use port
+ 21000, 21050, or another value depending on what type of connections
+ you are load balancing.
+ </li>
+
+ <li class="li">
+ Run the load-balancing proxy server, pointing it at the configuration file that you set up.
+ </li>
+
+ <li class="li">
+ For any scripts, jobs, or configuration settings for applications
+ that formerly connected to a specific DataNode to run Impala SQL
+ statements, change the connection information (such as the
+ <code class="ph codeph">-i</code> option in <span class="keyword cmdname">impala-shell</span>) to
+ point to the load balancer instead.
+ </li>
+ </ol>
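+
+ <p class="p">
+ For example, a connection that formerly named a specific host would instead
+ name the load balancer. The host names below are placeholders:
+ </p>
+
+<pre class="pre codeblock"><code># Before: connect to one specific impalad host.
+impala-shell -i impalad-1.mydomain.com:21000
+
+# After: connect through the load-balancing proxy.
+impala-shell -i loadbalancer-1.mydomain.com:21000</code></pre>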
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The following sections use the HAProxy software as a representative example of a load balancer
+ that you can use with Impala.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="proxy__proxy_balancing">
+ <h2 class="title topictitle2" id="ariaid-title3">Choosing the Load-Balancing Algorithm</h2>
+ <div class="body conbody">
+ <p class="p">
+ Load-balancing software offers a number of algorithms to distribute requests.
+ Each algorithm has its own characteristics that make it suitable in some situations
+ but not others.
+ </p>
+
+ <dl class="dl">
+
+ <dt class="dt dlterm">Leastconn</dt>
+ <dd class="dd">
+ Connects sessions to the coordinator with the fewest connections,
+ to balance the load evenly. Typically used for workloads consisting
+ of many independent, short-running queries. In configurations with
+ only a few client machines, this setting can avoid having all
+ requests go to only a small set of coordinators.
+ </dd>
+ <dd class="dd ddexpand">
+ Recommended for Impala with F5.
+ </dd>
+
+
+ <dt class="dt dlterm">Source IP Persistence</dt>
+ <dd class="dd">
+ <p class="p">
+ Sessions from the same IP address always go to the same
+ coordinator. A good choice for Impala workloads containing a mix
+ of queries and DDL statements, such as <code class="ph codeph">CREATE TABLE</code>
+ and <code class="ph codeph">ALTER TABLE</code>. Because the metadata changes from
+ a DDL statement take time to propagate across the cluster, prefer
+ Source IP Persistence in this case. If you are unable
+ to choose Source IP Persistence, run the DDL and subsequent queries
+ that depend on the results of the DDL through the same session,
+ for example by running <code class="ph codeph">impala-shell -f <var class="keyword varname">script_file</var></code>
+ to submit several statements through a single session.
+ </p>
+ </dd>
+ <dd class="dd ddexpand">
+ <p class="p">
+ Required for setting up high availability with Hue.
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm">Round-robin</dt>
+ <dd class="dd">
+ <p class="p">
+ Distributes connections to all coordinator nodes.
+ Typically not recommended for Impala.
+ </p>
+ </dd>
+
+ </dl>
+
+ <p class="p">
+ You might need to perform benchmarks and load testing to determine
+ which setting is optimal for your use case. Always set up two
+ load-balancing algorithms: Source IP Persistence for Hue, and Leastconn
+ for other clients.
+ </p>
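+
+ <p class="p">
+ As an illustration, with HAProxy (the representative load balancer used in the
+ following sections) the two algorithms could be configured as separate listeners.
+ The host names and ports below are placeholders, not recommended values:
+ </p>
+
+<pre class="pre codeblock"><code># Sketch of an HAProxy configuration fragment using both algorithms.
+# Replace host names and ports with values appropriate for your cluster.
+listen impala-shell
+    bind 0.0.0.0:21000
+    balance leastconn        # short-running queries from most clients
+    server impalad1 host1.example.com:21000 check
+    server impalad2 host2.example.com:21000 check
+
+listen impala-hue
+    bind 0.0.0.0:21051
+    balance source           # Source IP Persistence for Hue sessions
+    server impalad1 host1.example.com:21050 check
+    server impalad2 host2.example.com:21050 check</code></pre>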
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="proxy__proxy_kerberos">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Special Proxy Considerations for Clusters Using Kerberos</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In a cluster using Kerberos, applications check host credentials to
+ verify that the host they are connecting to is the same one that is
+ actually processing the request, to prevent man-in-the-middle attacks.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and lower
+ versions, once you enable a proxy server in a Kerberized cluster, users
+ will not be able to connect to individual impala daemons directly from
+ impala-shell.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> and higher,
+ if you enable a proxy server in a Kerberized cluster, users have an
+ option to connect to Impala daemons directly from
+ <span class="keyword cmdname">impala-shell</span> using the <code class="ph codeph">-b</code> /
+ <code class="ph codeph">--kerberos_host_fqdn</code> option when you start
+ <span class="keyword cmdname">impala-shell</span>. This option can be used for testing or
+ troubleshooting purposes, but is not recommended for live production
+ environments as it defeats the purpose of a load balancer/proxy.
+ </p>
+
+ <div class="p">
+ Example:
+<pre class="pre codeblock"><code>
+impala-shell -i impalad-1.mydomain.com -k -b loadbalancer-1.mydomain.com
+</code></pre>
+ </div>
+
+ <div class="p">
+ Alternatively, use the equivalent fully qualified options:
+<pre class="pre codeblock"><code>impala-shell --impalad=impalad-1.mydomain.com:21000 --kerberos --kerberos_host_fqdn=loadbalancer-1.mydomain.com</code></pre>
+ </div>
+ <p class="p">
+ See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for
+ information about the option.
+ </p>
+
+ <p class="p">
+ To clarify that the load-balancing proxy server is legitimate, perform
+ these extra Kerberos setup steps:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ This section assumes you are starting with a Kerberos-enabled cluster. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for instructions for setting up Impala with Kerberos. See
+ <span class="xref">the documentation for your Apache Hadoop distribution</span> for general steps to set up Kerberos.
+ </li>
+
+ <li class="li">
+ Choose the host you will use for the proxy server. Based on the Kerberos setup procedure, it should
+ already have an entry <code class="ph codeph">impala/<var class="keyword varname">proxy_host</var>@<var class="keyword varname">realm</var></code> in
+ its keytab. If not, go back over the initial Kerberos configuration steps for the keytab on each host
+ running the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+ Copy the keytab file from the proxy host to all other hosts in the cluster that run the
+ <span class="keyword cmdname">impalad</span> daemon. (For optimal performance, <span class="keyword cmdname">impalad</span> should be running
+ on all DataNodes in the cluster.) Put the keytab file in a secure location on each of these other hosts.
+ </li>
+
+ <li class="li">
+ Add an entry <code class="ph codeph">impala/<var class="keyword varname">actual_hostname</var>@<var class="keyword varname">realm</var></code> to the keytab on each
+ host running the <span class="keyword cmdname">impalad</span> daemon.
+ </li>
+
+ <li class="li">
+
+ For each impalad node, merge the existing keytab with the proxy’s keytab using
+ <span class="keyword cmdname">ktutil</span>, producing a new keytab file. For example:
+ <pre class="pre codeblock"><code>$ ktutil
+ ktutil: read_kt proxy.keytab
+ ktutil: read_kt impala.keytab
+ ktutil: write_kt proxy_impala.keytab
+ ktutil: quit</code></pre>
+
+ </li>
+
+ <li class="li">
+
+ To verify that the keytabs are merged, run the command:
+<pre class="pre codeblock"><code>
+klist -k <var class="keyword varname">keytabfile</var>
+</code></pre>
+ which lists the credentials for both <code class="ph codeph">principal</code> and <code class="ph codeph">be_principal</code> on
+ all nodes.
+ </li>
+
+
+ <li class="li">
+
+ Make sure that the <code class="ph codeph">impala</code> user has permission to read this merged keytab file.
+
+ </li>
+
+ <li class="li">
+ Change the following configuration settings for each host in the cluster that participates
+ in the load balancing:
+ <ul class="ul">
+ <li class="li">
+ In the <span class="keyword cmdname">impalad</span> option definition, add:
+<pre class="pre codeblock"><code>
+--principal=impala/<em class="ph i">proxy_host@realm</em>
+--be_principal=impala/<em class="ph i">actual_host@realm</em>
+--keytab_file=<em class="ph i">path_to_merged_keytab</em>
+</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Every host has a different <code class="ph codeph">--be_principal</code> value because the actual
+ hostname differs on each host.
+
+ Specify the fully qualified domain name (FQDN) for the proxy host, not the IP
+ address. Use the exact FQDN as returned by a reverse DNS lookup for the associated
+ IP address.
+
+ </div>
+ </li>
+
+ <li class="li">
+ Modify the startup options. See <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for the procedure to modify the startup
+ options.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ Restart Impala to make the changes take effect. Restart the <span class="keyword cmdname">impalad</span> daemons on all
+ hosts in the cluster, as well as the <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span>
+ daemons.
+ </li>
+
+ </ol>
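<p class="p">
The keytab verification in the steps above can also be scripted. The following sketch checks the
output of <code class="ph codeph">klist -k</code> with a plain substring match per principal; the
function name, host names, and realm are hypothetical, and the sample output is only illustrative of
the typical <span class="keyword cmdname">klist</span> layout:
</p>

```python
def keytab_has_principals(klist_output, principals):
    # True if every expected principal appears in the output of
    # `klist -k keytabfile`. Column layout varies across Kerberos
    # versions, so only a substring check per principal is done.
    return all(p in klist_output for p in principals)

# Illustrative klist-style output (hypothetical host and realm names):
sample_output = """Keytab name: FILE:proxy_impala.keytab
KVNO Principal
---- --------------------------------------------------
   2 impala/proxy.example.com@EXAMPLE.COM
   2 impala/node1.example.com@EXAMPLE.COM
"""

# Both the shared proxy principal (--principal) and this host's own
# principal (--be_principal) must be present in the merged keytab.
assert keytab_has_principals(sample_output, [
    "impala/proxy.example.com@EXAMPLE.COM",
    "impala/node1.example.com@EXAMPLE.COM",
])
```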
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="proxy__tut_proxy">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Example of Configuring HAProxy Load Balancer for Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you are not already using a load-balancing proxy, you can experiment with
+ <a class="xref" href="http://haproxy.1wt.eu/" target="_blank">HAProxy</a>, a free, open source load
+ balancer. This example shows how you might install and configure that load balancer on a Red Hat Enterprise
+ Linux system.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Install the load balancer: <code class="ph codeph">yum install haproxy</code>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Set up the configuration file: <span class="ph filepath">/etc/haproxy/haproxy.cfg</span>. See the following section
+ for a sample configuration file.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Run the load balancer (on a single host, preferably one not running <span class="keyword cmdname">impalad</span>):
+ </p>
+<pre class="pre codeblock"><code>/usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, JDBC applications, or ODBC applications, connect to the listener
+ port of the proxy host, rather than port 21000 or 21050 on a host actually running <span class="keyword cmdname">impalad</span>.
+ The sample configuration file sets haproxy to listen on port 25003, so you would send all
+ requests to <code class="ph codeph"><var class="keyword varname">haproxy_host</var>:25003</code>.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ This is the sample <span class="ph filepath">haproxy.cfg</span> used in this example:
+ </p>
+
+<pre class="pre codeblock"><code>global
+ # To have these messages end up in /var/log/haproxy.log you will
+ # need to:
+ #
+ # 1) configure syslog to accept network log events. This is done
+ # by adding the '-r' option to the SYSLOGD_OPTIONS in
+ # /etc/sysconfig/syslog
+ #
+ # 2) configure local2 events to go to the /var/log/haproxy.log
+ # file. A line like the following can be added to
+ # /etc/sysconfig/syslog
+ #
+ # local2.* /var/log/haproxy.log
+ #
+ log 127.0.0.1 local0
+ log 127.0.0.1 local1 notice
+ chroot /var/lib/haproxy
+ pidfile /var/run/haproxy.pid
+ maxconn 4000
+ user haproxy
+ group haproxy
+ daemon
+
+ # turn on stats unix socket
+ #stats socket /var/lib/haproxy/stats
+
+#---------------------------------------------------------------------
+# common defaults that all the 'listen' and 'backend' sections will
+# use if not designated in their block
+#
+# You might need to adjust timing values to prevent timeouts.
+#
+# The timeout values should be dependent on how you use the cluster
+# and how long your queries run.
+#---------------------------------------------------------------------
+defaults
+ mode http
+ log global
+ option httplog
+ option dontlognull
+ option http-server-close
+ option forwardfor except 127.0.0.0/8
+ option redispatch
+ retries 3
+ maxconn 3000
+ timeout connect 5000
+ timeout client 3600s
+ timeout server 3600s
+
+#
+# This sets up the admin page for HAProxy at port 25002.
+#
+listen stats :25002
+ balance
+ mode http
+ stats enable
+ stats auth <var class="keyword varname">username</var>:<var class="keyword varname">password</var>
+
+# This is the setup for Impala. Impala clients connect to load_balancer_host:25003.
+# HAProxy will balance connections among the list of servers listed below.
+# Each impalad listed below listens on port 21000 for Beeswax (impala-shell) or the original ODBC driver.
+# For the JDBC or ODBC version 2.x driver, use port 21050 instead of 21000.
+listen impala :25003
+ mode tcp
+ option tcplog
+ balance leastconn
+
+ server <var class="keyword varname">symbolic_name_1</var> impala-host-1.example.com:21000
+ server <var class="keyword varname">symbolic_name_2</var> impala-host-2.example.com:21000
+ server <var class="keyword varname">symbolic_name_3</var> impala-host-3.example.com:21000
+ server <var class="keyword varname">symbolic_name_4</var> impala-host-4.example.com:21000
+
+# Setup for Hue or other JDBC-enabled applications.
+# In particular, Hue requires sticky sessions.
+# The application connects to load_balancer_host:21051, and HAProxy balances
+# connections to the associated hosts, where Impala listens for JDBC
+# requests on port 21050.
+listen impalajdbc :21051
+ mode tcp
+ option tcplog
+ balance source
+ server <var class="keyword varname">symbolic_name_5</var> impala-host-1.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_6</var> impala-host-2.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_7</var> impala-host-3.example.com:21050 check
+ server <var class="keyword varname">symbolic_name_8</var> impala-host-4.example.com:21050 check
+</code></pre>
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Hue requires the <code class="ph codeph">check</code> option at the end of each server line in
+ the above file so that HAProxy can detect an unreachable <span class="keyword cmdname">impalad</span>
+ server and fail over successfully. Without the TCP check, you might hit
+ an error when the <span class="keyword cmdname">impalad</span> daemon to which Hue tries to
+ connect is down.
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If your JDBC or ODBC application connects to Impala through a load balancer such as
+ <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+ connection timeout values, either check the connection frequently so that it never sits idle longer than
+ the load balancer timeout value, or check the connection validity before using it and create a new one if
+ the connection has been closed.
+ </div>
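<p class="p">
One defensive pattern for the connection-reuse caveat above is to track when a pooled connection was
last used and reopen it once it has sat idle longer than the load balancer allows. This is a minimal
Python sketch; the class name and the <code class="ph codeph">connect_fn</code> callback are
illustrative, and any DB-API-style connection object would work in place of what the callback returns:
</p>

```python
import time

class IdleAwareConnection:
    """Reopen a pooled connection that the load balancer may have
    closed after it sat idle too long (sketch; names are illustrative)."""

    def __init__(self, connect_fn, lb_timeout_s):
        self._connect_fn = connect_fn      # called to (re)open a connection
        self._lb_timeout_s = lb_timeout_s  # load balancer idle timeout, seconds
        self._conn = connect_fn()
        self._last_used = time.monotonic()

    def get(self):
        # If idle longer than the balancer allows, assume the old
        # socket is dead and open a fresh connection.
        if time.monotonic() - self._last_used > self._lb_timeout_s:
            self._conn = self._connect_fn()
        self._last_used = time.monotonic()
        return self._conn
```

<p class="p">
The alternative mentioned in the note, checking the connection frequently so it never sits idle past
the timeout, amounts to calling <code class="ph codeph">get()</code> on a schedule shorter than
<code class="ph codeph">lb_timeout_s</code>.
</p>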
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_query_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_query_options.html b/docs/build3x/html/topics/impala_query_options.html
new file mode 100644
index 0000000..40b0c8e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_query_options.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_set.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_abort_on_error.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_allow_unsupported_formats.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_count_distinct.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_batch_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_buffer_pool_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_compute_stats_min_sample_size.html"><meta name="DC.Relation" scheme="UR
I" content="../topics/impala_debug_action.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_decimal_v2.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_join_distribution_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_default_spillable_buffer_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_codegen.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_row_runtime_filtering.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_streaming_preaggregations.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disable_unsafe_spills.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_single_node_rows_threshold.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_exec_time_limit_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_explain_level.html"><meta name=
"DC.Relation" scheme="URI" content="../topics/impala_hbase_cache_blocks.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hbase_caching.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_progress.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_live_summary.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_errors.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_row_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_num_runtime_filters.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max_scan_range_length.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mem_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min_spillable_buffer_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_mt_dop.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_n
um_nodes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_num_scanner_threads.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_optimize_partition_key_scans.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_compression_codec.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_annotate_strings_utf8.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_array_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_fallback_schema_resolution.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_parquet_file_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prefetch_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_timeout_s.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_request_pool.html"><meta name="DC.Relation" scheme="URI" content="../topics
/impala_replica_preference.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_bloom_filter_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_max_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_min_size.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_mode.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_runtime_filter_wait_time_ms.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_s3_skip_insert_staging.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schedule_random_replica.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_scratch_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shuffle_distinct_exprs.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_support_start_over.html"><meta name="DC.Relation" scheme="URI" c
ontent="../topics/impala_sync_ddl.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Query Options for the SET Statement</title></head><body id="query_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Query Options for the SET Statement</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can specify the following options using the <code class="ph codeph">SET</code> statement, and those settings affect all
+ queries issued from that session.
+ </p>
+
+ <p class="p">
+ Some query options are useful in day-to-day operations for improving usability, performance, or flexibility.
+ </p>
+
+ <p class="p">
+ Other query options control special-purpose aspects of Impala operation and are intended primarily for
+ advanced debugging or troubleshooting.
+ </p>
+
+ <p class="p">
+ Options with Boolean parameters can be set to <code class="ph codeph">1</code> or <code class="ph codeph">true</code> to enable them,
+ or <code class="ph codeph">0</code> or <code class="ph codeph">false</code> to turn them off.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 2.0 and later, you can set query options directly through the JDBC and ODBC interfaces by using the
+ <code class="ph codeph">SET</code> statement. Formerly, <code class="ph codeph">SET</code> was only available as a command within the
+ <span class="keyword cmdname">impala-shell</span> interpreter.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and later, you can set query options for an <span class="keyword cmdname">impala-shell</span> session
+ by specifying one or more command-line arguments of the form
+ <code class="ph codeph">--query_option=<var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>.
+ See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for details.
+ </p>
+ </div>
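<p class="p">
Because each option is just a <code class="ph codeph">SET</code> statement, applying several of them
at the start of a session is easy to wrap in a helper. This sketch assumes a DB-API-style cursor with
an <code class="ph codeph">execute</code> method; the helper name is made up for illustration:
</p>

```python
def apply_query_options(cursor, options):
    # Issue one SET statement per query option at session start (sketch).
    # `options` maps option names to values, e.g.
    # {"query_timeout_s": 300, "sync_ddl": "true"}.
    for name, value in options.items():
        cursor.execute("SET {}={}".format(name.upper(), value))
```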
+
+
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_set.html#set">SET Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_abort_on_error.html">ABORT_ON_ERROR Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_allow_unsupported_formats.html">ALLOW_UNSUPPORTED_FORMATS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_appx_count_distinct.html">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_batch_size.html">BATCH_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_buffer_pool_limit.html">BUFFER_POOL_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_compression_codec.html">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a hr
ef="../topics/impala_compute_stats_min_sample_size.html">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_debug_action.html">DEBUG_ACTION Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_decimal_v2.html">DECIMAL_V2 Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_join_distribution_mode.html">DEFAULT_JOIN_DISTRIBUTION_MODE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_codegen.html">DISABLE_CODEGEN Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING Query Option (Impala 2.5 or hi
gher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_streaming_preaggregations.html">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disable_unsafe_spills.html">DISABLE_UNSAFE_SPILLS Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_single_node_rows_threshold.html">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_exec_time_limit_s.html">EXEC_TIME_LIMIT_S Query Option (Impala 2.12 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_explain_level.html">EXPLAIN_LEVEL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_cache_blocks.html">HBASE_CAC
HE_BLOCKS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hbase_caching.html">HBASE_CACHING Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_progress.html">LIVE_PROGRESS Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_live_summary.html">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_errors.html">MAX_ERRORS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_row_size.html">MAX_ROW_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max_scan_r
ange_length.html">MAX_SCAN_RANGE_LENGTH Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mem_limit.html">MEM_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_mt_dop.html">MT_DOP Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_nodes.html">NUM_NODES Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_num_scanner_threads.html">NUM_SCANNER_THREADS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_co
mpression_codec.html">PARQUET_COMPRESSION_CODEC Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_annotate_strings_utf8.html">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_array_resolution.html">PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_fallback_schema_resolution.html">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_parquet_file_size.html">PARQUET_FILE_SIZE Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_prefetch_mode.html">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href
="../topics/impala_query_timeout_s.html">QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_request_pool.html">REQUEST_POOL Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_max_size.html">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_min_size.html">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a
href="../topics/impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_s3_skip_insert_staging.html">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_scratch_limit.html">SCRATCH_LIMIT Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shuffle_distinct_exprs.html">SHUFFLE_DISTINCT_EXPRS Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a hr
ef="../topics/impala_support_start_over.html">SUPPORT_START_OVER Query Option</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_sync_ddl.html">SYNC_DDL Query Option</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_set.html">SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_query_timeout_s.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_query_timeout_s.html b/docs/build3x/html/topics/impala_query_timeout_s.html
new file mode 100644
index 0000000..d6c11e6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_query_timeout_s.html
@@ -0,0 +1,62 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="query_timeout_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>QUERY_TIMEOUT_S Query Option (Impala 2.0 or higher only)</title></head><body id="query_timeout_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">QUERY_TIMEOUT_S Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Sets the idle query timeout value for the session, in seconds. Queries that sit idle for longer than the
+ timeout value are automatically cancelled. If the system administrator specified the
+ <code class="ph codeph">--idle_query_timeout</code> startup option, <code class="ph codeph">QUERY_TIMEOUT_S</code> must be smaller than
+ or equal to the <code class="ph codeph">--idle_query_timeout</code> value.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The timeout clock for queries and sessions only starts ticking when the query or session is idle.
+ For queries, this means the query has results ready but is waiting for a client to fetch the data. A
+ query can run for an arbitrary time without triggering a timeout, because the query is computing results
+ rather than sitting idle waiting for the results to be fetched. The timeout period is intended to prevent
+ unclosed queries from consuming resources and taking up slots in the admission count of running queries,
+ potentially preventing other queries from starting.
+ </p>
+ <p class="p">
+ For sessions, this means that no query has been submitted for some period of time.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET QUERY_TIMEOUT_S=<var class="keyword varname">seconds</var>;</code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (no timeout if <code class="ph codeph">--idle_query_timeout</code> not in effect; otherwise, use
+ <code class="ph codeph">--idle_query_timeout</code> value)
+ </p>
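<p class="p">
The interaction between the session option and the server-wide flag described above can be summarized
as a small rule: 0 defers to the server setting, and a nonzero session value can only tighten it. A
sketch of that logic (the function name is invented for illustration):
</p>

```python
def effective_query_timeout(query_timeout_s, idle_query_timeout=0):
    # Resolve the idle timeout that applies to a query, in seconds.
    # query_timeout_s    -- session-level QUERY_TIMEOUT_S (0 = unset)
    # idle_query_timeout -- server-wide --idle_query_timeout flag (0 = unset)
    # Returns 0 when no timeout applies at all.
    if query_timeout_s == 0:
        return idle_query_timeout      # fall back to the server flag
    if idle_query_timeout == 0:
        return query_timeout_s         # no server-side cap
    # The session value cannot exceed the server flag.
    return min(query_timeout_s, idle_query_timeout)
```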
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_rcfile.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_rcfile.html b/docs/build3x/html/topics/impala_rcfile.html
new file mode 100644
index 0000000..72b1bd8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_rcfile.html
@@ -0,0 +1,246 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="rcfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the RCFile File Format with Impala Tables</title></head><body id="rcfile"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the RCFile File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using RCFile data files.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">RCFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="rcfile__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="rcfile__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="rcfile__entry__1 ">
+ <a class="xref" href="impala_rcfile.html#rcfile">RCFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="rcfile__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="rcfile__rcfile_create">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating RCFile Tables and Loading Data</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you do not have an existing data file to use, begin by creating one in the appropriate format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To create an RCFile table:</strong>
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>create table rcfile_table (<var class="keyword varname">column_specs</var>) stored as rcfile;</code></pre>
+
+      <p class="p">
+        Because Impala can query some kinds of tables that it cannot currently write to, for tables in
+        certain file formats you might create the table in Impala and then use the Hive shell to load the data. See
+        <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+        Hive or another mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+        statement the next time you connect to the Impala node, before querying the table, so that Impala recognizes
+        the new data.
+      </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ See <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a> for potential compatibility issues with
+ RCFile tables created in Hive 0.12, due to a change in the default RCFile SerDe for Hive.
+ </div>
+
+ <p class="p">
+ For example, here is how you might create some RCFile tables in Impala (by specifying the columns
+ explicitly, or cloning the structure of another table), load data through Hive, and query them through
+ Impala:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] > create table rcfile_table (x int) stored as rcfile;
+[localhost:21000] > create table rcfile_clone like some_other_table stored as rcfile;
+[localhost:21000] > quit;
+
+$ hive
+hive> insert into table rcfile_table select x from some_other_table;
+3 Rows loaded to rcfile_table
+Time taken: 19.015 seconds
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > select * from rcfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] > -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] > refresh rcfile_table;
+[localhost:21000] > select * from rcfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="rcfile__rcfile_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for RCFile Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You may want to enable compression on existing tables. Enabling compression provides performance gains in
+ most cases and is supported for RCFile tables. For example, to enable Snappy compression, you would specify
+ the following additional settings when loading data through the Hive shell:
+ </p>
+
+<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+ additional settings similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) PARTITIONED BY (<var class="keyword varname">partition_cols</var>) STORED AS <var class="keyword varname">new_format</var>;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE <var class="keyword varname">new_table</var> PARTITION(<var class="keyword varname">comma_separated_partition_cols</var>) SELECT * FROM <var class="keyword varname">old_table</var>;</code></pre>
+
+      <p class="p">
+        Remember that Hive does not require you to specify a source format for the data. Consider the case of
+        converting a table to a Snappy-compressed RCFile. Combining the components outlined previously to
+        complete this table conversion, you would specify settings similar to the following:
+      </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) STORED AS RCFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_rc SELECT * FROM tbl;</code></pre>
+
+ <p class="p">
+ To complete a similar process for a table that includes partitions, you would specify settings similar to
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_rc (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS RCFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_rc PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The compression type is specified in the following command:
+ </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+ <p class="p">
+ You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+ </p>
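+      <p class="p">
+        For example, to switch to gzip compression instead of Snappy, you would change only the codec
+        setting (a sketch using the standard Hadoop codec class name):
+      </p>
+<pre class="pre codeblock"><code>hive&gt; SET mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec;</code></pre>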
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="rcfile__rcfile_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala RCFile Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with RCFile tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
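+      <p class="p">
+        As an illustration, the corresponding property in <span class="ph filepath">core-site.xml</span> might
+        look like the following for the 128 MB case (a sketch; adjust the value to match how your data files
+        were written):
+      </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>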
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_real.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_real.html b/docs/build3x/html/topics/impala_real.html
new file mode 100644
index 0000000..5e772c2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_real.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="real"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REAL Data Type</title></head><body id="real"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REAL Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An alias for the <code class="ph codeph">DOUBLE</code> data type. See <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+    <p class="p">
+      These examples show how you can use the type names <code class="ph codeph">REAL</code> and <code class="ph codeph">DOUBLE</code>
+      interchangeably; behind the scenes, Impala always treats them as <code class="ph codeph">DOUBLE</code>.
+    </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table r1 (x real);
+[localhost:21000] > describe r1;
++------+--------+---------+
+| name | type | comment |
++------+--------+---------+
+| x | double | |
++------+--------+---------+
+[localhost:21000] > insert into r1 values (1.5), (cast (2.2 as double));
+[localhost:21000] > select cast (1e6 as real);
++---------------------------+
+| cast(1000000.0 as double) |
++---------------------------+
+| 1000000 |
++---------------------------+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_refresh.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_refresh.html b/docs/build3x/html/topics/impala_refresh.html
new file mode 100644
index 0000000..5359668
--- /dev/null
+++ b/docs/build3x/html/topics/impala_refresh.html
@@ -0,0 +1,408 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="refresh"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>REFRESH Statement</title></head><body id="refresh"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">REFRESH Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To accurately respond to queries, the Impala node that acts as the coordinator (the node to which you are
+ connected through <span class="keyword cmdname">impala-shell</span>, JDBC, or ODBC) must have current metadata about those
+ databases and tables that are referenced in Impala queries. If you are not familiar with the way Impala uses
+ metadata and how it shares the same metastore database as Hive, see
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>REFRESH [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">key_col1</var>=<var class="keyword varname">val1</var> [, <var class="keyword varname">key_col2</var>=<var class="keyword varname">val2</var>...])]
+<span class="ph">REFRESH FUNCTIONS <var class="keyword varname">db_name</var></span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Use the <code class="ph codeph">REFRESH</code> statement to load the latest metastore metadata and block location data for
+ a particular table in these scenarios:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ After loading new data files into the HDFS data directory for the table. (Once you have set up an ETL
+ pipeline to bring data into Impala on a regular basis, this is typically the most frequent reason why
+ metadata needs to be refreshed.)
+ </li>
+
+ <li class="li">
+ After issuing <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or other
+ table-modifying SQL statement in Hive.
+ </li>
+ </ul>
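+    <p class="p">
+      For example, after manually copying a new data file into a table's HDFS data directory, the sequence
+      might look like the following (the file name and path are hypothetical):
+    </p>
+<pre class="pre codeblock"><code>$ hdfs dfs -put new_data.csv /user/hive/warehouse/t1/
+$ impala-shell -i localhost
+[localhost:21000] > refresh t1;
+[localhost:21000] > select count(*) from t1;</code></pre>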
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the syntax <code class="ph codeph">ALTER TABLE <var class="keyword varname">table_name</var> RECOVER PARTITIONS</code>
+ is a faster alternative to <code class="ph codeph">REFRESH</code> when the only change to the table data is the addition of
+ new partition directories through Hive or manual HDFS operations.
+ See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for details.
+ </p>
+ </div>
+
+ <p class="p">
+ You only need to issue the <code class="ph codeph">REFRESH</code> statement on the node to which you connect to issue
+ queries. The coordinator node divides the work among all the Impala nodes in a cluster, and sends read
+ requests for the correct HDFS blocks without relying on the metadata on the other nodes.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">REFRESH</code> reloads the metadata for the table from the metastore database, and does an
+ incremental reload of the low-level block location data to account for any new data files added to the HDFS
+ data directory for the table. It is a low-overhead, single-table operation, specifically tuned for the common
+ scenario where new data files are added to HDFS.
+ </p>
+
+ <p class="p">
+ Only the metadata for the specified table is flushed. The table must already exist and be known to Impala,
+ either because the <code class="ph codeph">CREATE TABLE</code> statement was run in Impala rather than Hive, or because a
+ previous <code class="ph codeph">INVALIDATE METADATA</code> statement caused Impala to reload its entire metadata catalog.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The catalog service broadcasts any changed metadata as a result of Impala
+ <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code> statements to all
+ Impala nodes. Thus, the <code class="ph codeph">REFRESH</code> statement is only required if you load data through Hive
+ or by manipulating data files in HDFS directly. See <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for
+ more information on the catalog service.
+ </p>
+ <p class="p">
+ Another way to avoid inconsistency across nodes is to enable the
+ <code class="ph codeph">SYNC_DDL</code> query option before performing a DDL statement or an <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">LOAD DATA</code>.
+ </p>
+ <p class="p">
+ The table name is a required parameter. To flush the metadata for all tables, use the
+ <code class="ph codeph"><a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA</a></code>
+ command.
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+ Impala node is already aware of, when you create a new table in the Hive shell, enter
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+ <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> after you add data files for that table.
+ </p>
+ </div>
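+    <p class="p">
+      For example, for a table first created through Hive, the sequence might look like the following
+      (the table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>hive> create table new_table (x int);
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > invalidate metadata new_table;
+[localhost:21000] > -- Later, after data files are added for the table outside Impala:
+[localhost:21000] > refresh new_table;</code></pre>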
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+ METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+ metadata for the table, which can be an expensive operation, especially for large tables with many
+ partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+ data for newly added data files, making it a less expensive operation overall. If data was altered in some
+ more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+ METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+ the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+ statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+ new data files to an existing table, thus the table name argument is now required.
+ </p>
+
+ <p class="p">
+ A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A metadata change occurs.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made through Hive.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+ connect.
+ </li>
+ </ul>
+
+ <p class="p">
+ A metadata update for an Impala node is <strong class="ph b">not</strong> required after you run <code class="ph codeph">ALTER TABLE</code>,
+ <code class="ph codeph">INSERT</code>, or other table-modifying statement in Impala rather than Hive. Impala handles the
+ metadata synchronization automatically through the catalog service.
+ </p>
+
+ <p class="p">
+ Database and table metadata is typically modified by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Hive - through <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+ <code class="ph codeph">INSERT</code> operations.
+ </li>
+
+ <li class="li">
+ Impalad - through <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+ operations. <span class="ph">Such changes are propagated to all Impala nodes by the
+ Impala catalog service.</span>
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">REFRESH</code> causes the metadata for that table to be immediately reloaded. For a huge table,
+ that process could take a noticeable amount of time; but doing the refresh up front avoids an unpredictable
+ delay later, for example if the next reference to the table is during a benchmark test.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Refreshing a single partition:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, the <code class="ph codeph">REFRESH</code> statement can apply to a single partition at a time,
+ rather than the whole table. Include the optional <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code>
+ clause and specify values for each of the partition key columns.
+ </p>
+
+ <p class="p">
+ The following examples show how to make Impala aware of data added to a single partition, after data is loaded into
+ a partition's data directory using some mechanism outside Impala, such as Hive or Spark. The partition can be one that
+ Impala created and is already aware of, or a new partition created through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>
+impala> create table p (x int) partitioned by (y int);
+impala> insert into p (x,y) values (1,2), (2,2), (2,1);
+impala> show partitions p;
++-------+-------+--------+------+...
+| y | #Rows | #Files | Size |...
++-------+-------+--------+------+...
+| 1 | -1 | 1 | 2B |...
+| 2 | -1 | 1 | 4B |...
+| Total | -1 | 2 | 6B |...
++-------+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline> insert into p partition (y = 1) values(1000);
+
+impala> refresh p partition (y=1);
+impala> select x from p where y=1;
++------+
+| x |
++------+
+| 2 | <- Original data created by Impala
+| 1000 | <- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+ <p class="p">
+ The same applies for tables with more than one partition key column.
+ The <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">REFRESH</code>
+ statement must include all the partition key columns.
+ </p>
+
+<pre class="pre codeblock"><code>
+impala> create table p2 (x int) partitioned by (y int, z int);
+impala> insert into p2 (x,y,z) values (0,0,0), (1,2,3), (2,2,3);
+impala> show partitions p2;
++-------+---+-------+--------+------+...
+| y | z | #Rows | #Files | Size |...
++-------+---+-------+--------+------+...
+| 0 | 0 | -1 | 1 | 2B |...
+| 2 | 3 | -1 | 1 | 4B |...
+| Total | | -1 | 2 | 6B |...
++-------+---+-------+--------+------+...
+
+-- ... Data is inserted into one of the partitions by some external mechanism ...
+beeline> insert into p2 partition (y = 2, z = 3) values(1000);
+
+impala> refresh p2 partition (y=2, z=3);
+impala> select x from p2 where y=2 and z = 3;
++------+
+| x |
++------+
+| 1 | <- Original data created by Impala
+| 2 | <- Original data created by Impala
+| 1000 | <- Additional data inserted through Beeline
++------+
+
+</code></pre>
+
+ <p class="p">
+ The following examples show how specifying a nonexistent partition does not cause any error,
+ and the order of the partition key columns does not have to match the column order in the table.
+ The partition spec must include all the partition key columns; specifying an incomplete set of
+ columns does cause an error.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Partition doesn't exist.
+refresh p2 partition (y=0, z=3);
+refresh p2 partition (y=0, z=-1);
+-- Key columns specified in a different order than the table definition.
+refresh p2 partition (z=1, y=0);
+-- Incomplete partition spec causes an error.
+refresh p2 partition (y=0);
+ERROR: AnalysisException: Items in partition spec must exactly match the partition columns in the table definition: default.p2 (1 vs 2)
+
+</code></pre>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
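+    <p class="p">
+      For example, you might enable the option before a DDL statement as follows (a sketch with a
+      hypothetical table name):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > set SYNC_DDL=1;
+[impalad-host:21000] > create table t3 (x int);
+-- The CREATE TABLE statement returns only after all Impala nodes have received the new metadata.</code></pre>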
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you might use the <code class="ph codeph">REFRESH</code> statement after manually adding
+ new HDFS data files to the Impala data directory for a table:
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh t1;
+[impalad-host:21000] > refresh t2;
+[impalad-host:21000] > select * from t1;
+...
+[impalad-host:21000] > select * from t2;
+... </code></pre>
+
+ <p class="p">
+ For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+ combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related impala-shell options:</strong>
+ </p>
+
+ <p class="p">
+ The <span class="keyword cmdname">impala-shell</span> option <code class="ph codeph">-r</code> issues an <code class="ph codeph">INVALIDATE METADATA</code> statement
+ when starting up the shell, effectively performing a <code class="ph codeph">REFRESH</code> of all tables.
+ Due to the expense of reloading the metadata for all tables, the <span class="keyword cmdname">impala-shell</span> <code class="ph codeph">-r</code>
+ option is not recommended for day-to-day use in a production environment. (This option was mainly intended as a workaround
+ for synchronization issues in very old Impala versions.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have execute
+ permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories,
+ or in unexpected paths, if it uses partitioning or
+ specifies a <code class="ph codeph">LOCATION</code> attribute for
+ individual partitions or the entire table.)
+ Issues with permissions might not cause an immediate error for this statement,
+ but subsequent statements such as <code class="ph codeph">SELECT</code>
+ or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+ </p>
+ <p class="p">
+ All HDFS and Sentry permissions and privileges are the same whether you refresh the entire table
+ or a single partition.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> command checks HDFS permissions of the underlying data files and directories,
+ caching this information so that a statement can be cancelled immediately if for example the
+ <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the table. Impala
+ reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case that
+ represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
+ user, issue another <code class="ph codeph">REFRESH</code> to make Impala aware of the change.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ After adding or replacing data in a table used in performance-critical queries, issue a <code class="ph codeph">COMPUTE
+ STATS</code> statement to make sure all statistics are up-to-date. Consider updating statistics for a
+ table after any <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement in Impala, or after loading data through Hive and doing a <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> in Impala. This technique is especially important for tables that
+ are very large, used in join queries, or both.
+ </div>
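+    <p class="p">
+      For example, after loading data through Hive, the follow-up steps in Impala might look like this
+      (the table name is hypothetical):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh sales_data;
+[impalad-host:21000] > compute stats sales_data;
+[impalad-host:21000] > show table stats sales_data;</code></pre>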
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also cache metadata
+ for tables where the data resides in the Amazon Simple Storage Service (S3).
+ In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+ in the associated S3 data directory.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ for a Kudu table only after making a change to the Kudu table schema,
+ such as adding or dropping a column, by a mechanism other than
+ Impala.
+ </p>
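+    <p class="p">
+      For example, if a column is added to a Kudu table through a client program using the Kudu API,
+      you might run the following in <span class="keyword cmdname">impala-shell</span> (the table name is
+      hypothetical):
+    </p>
+<pre class="pre codeblock"><code>[impalad-host:21000] > refresh kudu_table;</code></pre>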
+
+ <p class="p">
+ <strong class="ph b">UDF considerations:</strong>
+ </p>
+ <div class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can refresh the user-defined functions (UDFs)
+ that Impala recognizes, at the database level, by running the <code class="ph codeph">REFRESH FUNCTIONS</code>
+ statement with the database name as an argument. Java-based UDFs can be added to the metastore
+ database through Hive <code class="ph codeph">CREATE FUNCTION</code> statements, and made visible to Impala
+ by subsequently running <code class="ph codeph">REFRESH FUNCTIONS</code>. For example:
+
+<pre class="pre codeblock"><code>CREATE DATABASE shared_udfs;
+USE shared_udfs;
+...use CREATE FUNCTION statements in Hive to create some Java-based UDFs
+ that Impala is not initially aware of...
+REFRESH FUNCTIONS shared_udfs;
+SELECT udf_created_by_hive(c1) FROM ...
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_release_notes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_release_notes.html b/docs/build3x/html/topics/impala_release_notes.html
new file mode 100644
index 0000000..86359c9
--- /dev/null
+++ b/docs/build3x/html/topics/impala_release_notes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_new_features.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_incompatible_changes.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_known_issues.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_fixed_issues.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_release_notes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="impala_release_notes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+ features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+ Impala versions up to <span class="ph">Impala 3.0.x</span>. For users
+ upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+ software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala</a> lists any changes to
+ file formats, SQL syntax, or software dependencies to take into account.
+ </p>
+
+ <p class="p">
+ After reviewing these release notes, see
+ <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a> for more information about using Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_new_features.html">New Features in Apache Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_incompatible_changes.html">Incompatible Changes and Limitations in Apache Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_known_issues.html">Known Issues and Workarounds in Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_fixed_issues.html">Fixed Issues in Apache Impala</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_relnotes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_relnotes.html b/docs/build3x/html/topics/impala_relnotes.html
new file mode 100644
index 0000000..f9b8d62
--- /dev/null
+++ b/docs/build3x/html/topics/impala_relnotes.html
@@ -0,0 +1,26 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="relnotes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Release Notes</title></head><body id="relnotes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Release Notes</h1>
+
+
+ <div class="body conbody" id="relnotes__relnotes_intro">
+
+ <p class="p">
+ These release notes provide information on the <a class="xref" href="impala_new_features.html#new_features">new
+ features</a> and <a class="xref" href="impala_known_issues.html#known_issues">known issues and limitations</a> for
+ Impala versions up to <span class="ph">Impala 3.0.x</span>. For users
+ upgrading from earlier Impala releases, or using Impala in combination with specific versions of other
+ software, <a class="xref" href="impala_incompatible_changes.html#incompatible_changes">Incompatible Changes and Limitations in Apache Impala</a> lists any changes to
+ file formats, SQL syntax, or software dependencies to take into account.
+ </p>
+
+ <p class="p">
+ After reviewing these release notes, see
+ <a class="xref" href="impala_concepts.html">Impala Concepts and Architecture</a> for more information about using Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_count.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_count.html b/docs/build3x/html/topics/impala_count.html
new file mode 100644
index 0000000..a451013
--- /dev/null
+++ b/docs/build3x/html/topics/impala_count.html
@@ -0,0 +1,353 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="count"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COUNT Function</title></head><body id="count"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COUNT Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the number of rows, or the number of non-<code class="ph codeph">NULL</code> rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>COUNT([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]</code></pre>
+
+ <p class="p">
+ Depending on the argument, <code class="ph codeph">COUNT()</code> considers rows that meet certain conditions:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The notation <code class="ph codeph">COUNT(*)</code> includes <code class="ph codeph">NULL</code> values in the total.
+ </li>
+
+ <li class="li">
+ The notation <code class="ph codeph">COUNT(<var class="keyword varname">column_name</var>)</code> only considers rows where the column
+ contains a non-<code class="ph codeph">NULL</code> value.
+ </li>
+
+ <li class="li">
+ You can also combine <code class="ph codeph">COUNT</code> with the <code class="ph codeph">DISTINCT</code> operator to eliminate
+ duplicates before counting, and to count the combinations of values across multiple columns.
+ </li>
+ </ul>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, <code class="ph codeph">COUNT()</code> returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BIGINT</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric values such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">R_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+values are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations   | array&lt;struct&lt;           |         |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+|             | &gt;&gt;                      |         |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- How many rows total are in the table, regardless of NULL values?
+select count(*) from t1;
+-- How many rows are in the table with non-NULL values for a column?
+select count(c1) from t1;
+-- Count the rows that meet certain conditions.
+-- Again, * includes NULLs, so COUNT(*) might be greater than COUNT(col).
+select count(*) from t1 where x > 10;
+select count(c1) from t1 where x > 10;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Combine COUNT and DISTINCT to find the number of unique values.
+-- Must use column names rather than * with COUNT(DISTINCT ...) syntax.
+-- Rows with NULL values are not counted.
+select count(distinct c1) from t1;
+-- Rows with a NULL value in _either_ column are not counted.
+select count(distinct c1, c2) from t1;
+-- Return more than one result.
+select month, year, count(distinct visitor_id) from web_stats group by month, year;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">COUNT()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">COUNT()</code> result is reported for each input row,
+ in contrast to a query with a <code class="ph codeph">GROUP BY</code> clause, which condenses the result set to one row per group.
+<pre class="pre codeblock"><code>select x, property, count(x) over (partition by property) as count from int_t where property in ('odd','even');
++----+----------+-------+
+| x | property | count |
++----+----------+-------+
+| 2 | even | 5 |
+| 4 | even | 5 |
+| 6 | even | 5 |
+| 8 | even | 5 |
+| 10 | even | 5 |
+| 1 | odd | 5 |
+| 3 | odd | 5 |
+| 5 | odd | 5 |
+| 7 | odd | 5 |
+| 9 | odd | 5 |
++----+----------+-------+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">COUNT()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running count of all the even values,
+then a running count of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+therefore all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+ count(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative count'
+ from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+  ) as 'cumulative count'
+ from int_t where property in ('odd','even');
++----+----------+------------------+
+| x | property | cumulative count |
++----+----------+------------------+
+| 2 | even | 1 |
+| 4 | even | 2 |
+| 6 | even | 3 |
+| 8 | even | 4 |
+| 10 | even | 5 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running count taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Therefore, the count is consistently 3 for rows in the middle of the window, and 2 for
+rows at either end of the partition, where there is no preceding or no following row.
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between 1 preceding and 1 following</strong>
+ ) as 'moving total'
+ from int_t where property in ('odd','even');
++----+----------+--------------+
+| x | property | moving total |
++----+----------+--------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 3 |
+| 8 | even | 3 |
+| 10 | even | 2 |
+| 1 | odd | 2 |
+| 3 | odd | 3 |
+| 5 | odd | 3 |
+| 7 | odd | 3 |
+| 9 | odd | 2 |
++----+----------+--------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ count(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between 1 preceding and 1 following</strong>
+ ) as 'moving total'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
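+    <p class="p">
+      For example (the table and column names are illustrative), <code class="ph codeph">NDV()</code>
+      lets a single query estimate the number of distinct values in several columns:
+    </p>
+<pre class="pre codeblock"><code>-- Estimates rather than exact counts; multiple NDV() calls are allowed.
+select ndv(col1) as approx_c1, ndv(col2) as approx_c2 from t1;
+
+-- Alternatively, let Impala rewrite COUNT(DISTINCT) to NDV() automatically:
+set appx_count_distinct=true;
+select count(distinct col1), count(distinct col2) from t1;
+</code></pre>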
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_database.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_database.html b/docs/build3x/html/topics/impala_create_database.html
new file mode 100644
index 0000000..14cd785
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_database.html
@@ -0,0 +1,209 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE DATABASE Statement</title></head><body id="create_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE DATABASE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a new database.
+ </p>
+
+ <p class="p">
+ In Impala, a database is both:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A logical construct for grouping together related tables, views, and functions within their own namespace.
+ You might use a separate database for each application, set of related tables, or round of experimentation.
+ </li>
+
+ <li class="li">
+ A physical construct represented by a directory tree in HDFS. Tables (internal tables), partitions, and
+ data files are all located under this directory. You can perform HDFS-level operations such as backing it up and measuring space usage,
+ or remove it with a <code class="ph codeph">DROP DATABASE</code> statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] <var class="keyword varname">database_name</var> [COMMENT '<var class="keyword varname">database_comment</var>']
+ [LOCATION <var class="keyword varname">hdfs_path</var>];</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A database is physically represented as a directory in HDFS, with a filename extension <code class="ph codeph">.db</code>,
+ under the main Impala data directory. If the associated HDFS directory does not exist, it is created for you.
+ All databases and their associated directories are top-level objects, with no physical or logical nesting.
+ </p>
+
+ <p class="p">
+ After creating a database, to make it the current database within an <span class="keyword cmdname">impala-shell</span> session,
+ use the <code class="ph codeph">USE</code> statement. You can refer to tables in the current database without prepending
+ any qualifier to their names.
+ </p>
+
+ <p class="p">
+ When you first connect to Impala through <span class="keyword cmdname">impala-shell</span>, the database you start in (before
+ issuing any <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">USE</code> statements) is named
+ <code class="ph codeph">default</code>.
+ </p>
+
+ <div class="p">
+ Impala includes another predefined database, <code class="ph codeph">_impala_builtins</code>, that serves as the location
+ for the <a class="xref" href="../shared/../topics/impala_functions.html#builtins">built-in functions</a>. To see the built-in
+ functions, use a statement like the following:
+<pre class="pre codeblock"><code>show functions in _impala_builtins;
+show functions in _impala_builtins like '*<var class="keyword varname">substring</var>*';
+</code></pre>
+ </div>
+
+ <p class="p">
+ After creating a database, your <span class="keyword cmdname">impala-shell</span> session or another
+ <span class="keyword cmdname">impala-shell</span> connected to the same node can immediately access that database. To access
+ the database through the Impala daemon on a different node, issue the <code class="ph codeph">INVALIDATE METADATA</code>
+ statement first while connected to that other node.
+ </p>
+
+ <p class="p">
+ Setting the <code class="ph codeph">LOCATION</code> attribute for a new database is a way to work with sets of files in an
+ HDFS directory structure outside the default Impala data directory, as opposed to setting the
+ <code class="ph codeph">LOCATION</code> attribute for each individual table.
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ When you create a database in Impala, the database can also be used by Hive.
+ When you create a database in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code>
+ statement in Impala to make Impala permanently aware of the new database.
+ </p>
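+
+    <p class="p">
+      For example (the database name is illustrative), after creating a database in Hive:
+    </p>
+
+<pre class="pre codeblock"><code>-- In Hive:
+create database hive_created_db;
+
+-- Then in impala-shell:
+invalidate metadata;
+show databases like 'hive_created_db';
+</code></pre>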
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement lists all databases, or the databases whose name
+ matches a wildcard pattern. <span class="ph">In <span class="keyword">Impala 2.5</span> and higher, the
+ <code class="ph codeph">SHOW DATABASES</code> output includes a second column that displays the associated
+ comment, if any, for each database.</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ To specify that any tables created within a database reside on the Amazon S3 system,
+ you can include an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code>
+ attribute. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically creates any
+ required folders as the databases, tables, and partitions are created, and removes
+ them when they are dropped.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for the parent HDFS directory under which the database
+ is located.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3 |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>,
+ <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>,
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_function.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_function.html b/docs/build3x/html/topics/impala_create_function.html
new file mode 100644
index 0000000..9b25620
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_function.html
@@ -0,0 +1,502 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE FUNCTION Statement</title></head><body id="create_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE FUNCTION Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Creates a user-defined function (UDF), which you can use to implement custom logic during
+ <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The syntax is different depending on whether you create a scalar UDF, which is called once for each row and
+ implemented by a single function, or a user-defined aggregate function (UDA), which is implemented by
+ multiple functions that compute intermediate results across sets of rows.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, the syntax is also different for creating or dropping scalar Java-based UDFs.
+ The statements for Java UDFs use a new syntax, without any argument types or return type specified. Java-based UDFs
+ created using the new syntax persist across restarts of the Impala catalog server, and can be shared transparently
+ between Impala and Hive.
+ </p>
+
+ <p class="p">
+ To create a persistent scalar C++ UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...])
+ RETURNS <var class="keyword varname">return_type</var>
+ LOCATION '<var class="keyword varname">hdfs_path_to_dot_so</var>'
+ SYMBOL='<var class="keyword varname">symbol_name</var>'</code></pre>
+
+ <div class="p">
+ To create a persistent Java UDF with <code class="ph codeph">CREATE FUNCTION</code>:
+<pre class="pre codeblock"><code>CREATE FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>
+ LOCATION '<var class="keyword varname">hdfs_path_to_jar</var>'
+ SYMBOL='<var class="keyword varname">class_name</var>'</code></pre>
+ </div>
+
+
+
+ <p class="p">
+ To create a persistent UDA, which must be written in C++, issue a <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE [AGGREGATE] FUNCTION [IF NOT EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>([<var class="keyword varname">arg_type</var>[, <var class="keyword varname">arg_type</var>...])
+ RETURNS <var class="keyword varname">return_type</var>
+ LOCATION '<var class="keyword varname">hdfs_path</var>'
+ [INIT_FN='<var class="keyword varname">function</var>']
+ UPDATE_FN='<var class="keyword varname">function</var>'
+ MERGE_FN='<var class="keyword varname">function</var>'
+ [PREPARE_FN='<var class="keyword varname">function</var>']
+ [CLOSE_FN='<var class="keyword varname">function</var>']
+ <span class="ph">[SERIALIZE_FN='<var class="keyword varname">function</var>']</span>
+ [FINALIZE_FN='<var class="keyword varname">function</var>']
+ <span class="ph">[INTERMEDIATE <var class="keyword varname">type_spec</var>]</span></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Varargs notation:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Variable-length argument lists are supported for C++ UDFs, but currently not for Java UDFs.
+ </p>
+ </div>
+
+ <p class="p">
+ If the underlying implementation of your function accepts a variable number of arguments:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The variable arguments must go last in the argument list.
+ </li>
+
+ <li class="li">
+ The variable arguments must all be of the same type.
+ </li>
+
+ <li class="li">
+ You must include at least one instance of the variable arguments in every function call invoked from SQL.
+ </li>
+
+ <li class="li">
+ You designate the variable portion of the argument list in the <code class="ph codeph">CREATE FUNCTION</code> statement
+ by including <code class="ph codeph">...</code> immediately after the type name of the first variable argument. For
+ example, to create a function that accepts an <code class="ph codeph">INT</code> argument, followed by a
+ <code class="ph codeph">BOOLEAN</code>, followed by one or more <code class="ph codeph">STRING</code> arguments, your <code class="ph codeph">CREATE
+ FUNCTION</code> statement would look like:
+<pre class="pre codeblock"><code>CREATE FUNCTION <var class="keyword varname">func_name</var> (INT, BOOLEAN, STRING ...)
+ RETURNS <var class="keyword varname">type</var> LOCATION '<var class="keyword varname">path</var>' SYMBOL='<var class="keyword varname">entry_point</var>';
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ See <a class="xref" href="impala_udf.html#udf_varargs">Variable-Length Argument Lists</a> for how to code a C++ UDF to accept
+ variable-length argument lists.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Scalar and aggregate functions:</strong>
+ </p>
+
+ <p class="p">
+ The simplest kind of user-defined function returns a single scalar value each time it is called, typically
+ once for each row in the result set. This general kind of function is what is usually meant by UDF.
+ User-defined aggregate functions (UDAs) are a specialized kind of UDF that produce a single value based on
+ the contents of multiple rows. You usually use UDAs in combination with a <code class="ph codeph">GROUP BY</code> clause to
+ condense a large result set into a smaller one, or even a single row summarizing column values across an
+ entire table.
+ </p>
+
+ <p class="p">
+ You create UDAs by using the <code class="ph codeph">CREATE AGGREGATE FUNCTION</code> syntax. The clauses
+ <code class="ph codeph">INIT_FN</code>, <code class="ph codeph">UPDATE_FN</code>, <code class="ph codeph">MERGE_FN</code>,
+ <span class="ph"><code class="ph codeph">SERIALIZE_FN</code>,</span> <code class="ph codeph">FINALIZE_FN</code>, and
+ <code class="ph codeph">INTERMEDIATE</code> only apply when you create a UDA rather than a scalar UDF.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">*_FN</code> clauses specify functions to call at different phases of function processing.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <strong class="ph b">Initialize:</strong> The function you specify with the <code class="ph codeph">INIT_FN</code> clause does any initial
+ setup, such as initializing member variables in internal data structures. This function is often a stub for
+ simple UDAs. You can omit this clause and a default (no-op) function will be used.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Update:</strong> The function you specify with the <code class="ph codeph">UPDATE_FN</code> clause is called once for each
+ row in the original result set, that is, before any <code class="ph codeph">GROUP BY</code> clause is applied. A separate
+ instance of the function is called for each different value returned by the <code class="ph codeph">GROUP BY</code>
+ clause. The final argument passed to this function is a pointer, to which you write an updated value based
+ on its original value and the value of the first argument.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Merge:</strong> The function you specify with the <code class="ph codeph">MERGE_FN</code> clause is called an arbitrary
+ number of times, to combine intermediate values produced by different nodes or different threads as Impala
+ reads and processes data files in parallel. The final argument passed to this function is a pointer, to
+ which you write an updated value based on its original value and the value of the first argument.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Serialize:</strong> The function you specify with the <code class="ph codeph">SERIALIZE_FN</code> clause frees memory
+ allocated to intermediate results. It is required if any memory was allocated by the Allocate function in
+ the Init, Update, or Merge functions, or if the intermediate type contains any pointers. See
+ <span class="xref">the UDA code samples</span> for details.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Finalize:</strong> The function you specify with the <code class="ph codeph">FINALIZE_FN</code> clause does any required
+ teardown for resources acquired by your UDF, such as freeing memory, closing file handles if you explicitly
+ opened any files, and so on. This function is often a stub for simple UDAs. You can omit this clause and a
+ default (no-op) function will be used. It is required in UDAs where the final return type differs from
+ the intermediate type, or if any memory was allocated by the Allocate function in the Init, Update, or
+ Merge functions. See <span class="xref">the UDA code samples</span> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you use a consistent naming convention for each of the underlying functions, Impala can automatically
+ determine the names based on the first such clause, so the others are optional.
+ </p>
+
+
+
+ <p class="p">
+ For end-to-end examples of UDAs, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, Impala UDFs cannot accept arguments or return values of the Impala complex types
+ (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You can write Impala UDFs in either C++ or Java. C++ UDFs are the recommended format for
+ high performance, because they run as native code. Java-based UDFs are compatible between Impala and Hive, and are
+ best suited to reusing existing Hive UDFs. (Impala can run Java-based Hive UDFs but not Hive UDAs.)
+ </li>
+
+ <li class="li">
+ <span class="keyword">Impala 2.5</span> introduces UDF improvements to persistence for both C++ and Java UDFs,
+ and better compatibility between Impala and Hive for Java UDFs.
+ See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+ </li>
+
+ <li class="li">
+ The body of the UDF is represented by a <code class="ph codeph">.so</code> or <code class="ph codeph">.jar</code> file, which you store
+ in HDFS and the <code class="ph codeph">CREATE FUNCTION</code> statement distributes to each Impala node.
+ </li>
+
+ <li class="li">
+ Impala calls the underlying code during SQL statement evaluation, as many times as needed to process all
+ the rows from the result set. All UDFs are assumed to be deterministic, that is, to always return the same
+ result when passed the same argument values. Impala might or might not skip some invocations of a UDF if
+ the result value is already known from a previous call. Therefore, do not rely on the UDF being called a
+ specific number of times, and do not return different result values based on some external factor such as
+ the current time, a random number function, or an external data source that could be updated while an
+ Impala query is in progress.
+ </li>
+
+ <li class="li">
+ The names of the function arguments in the UDF are not significant, only their number, positions, and data
+ types.
+ </li>
+
+ <li class="li">
+ You can overload the same function name by creating multiple versions of the function, each with a
+ different argument signature. For security reasons, you cannot make a UDF with the same name as any
+ built-in function.
+ </li>
+
+ <li class="li">
+ In the UDF code, you represent the function return result as a <code class="ph codeph">struct</code>. This
+ <code class="ph codeph">struct</code> contains 2 fields. The first field is a <code class="ph codeph">boolean</code> representing
+ whether the value is <code class="ph codeph">NULL</code> or not. (When this field is <code class="ph codeph">true</code>, the return
+ value is interpreted as <code class="ph codeph">NULL</code>.) The second field is the same type as the specified function
+ return type, and holds the return value when the function returns something other than
+ <code class="ph codeph">NULL</code>.
+ </li>
+
+ <li class="li">
+ In the UDF code, you represent the function arguments as an initial pointer to a UDF context structure,
+ followed by references to zero or more <code class="ph codeph">struct</code>s, corresponding to each of the arguments.
+ Each <code class="ph codeph">struct</code> has the same 2 fields as with the return value, a <code class="ph codeph">boolean</code>
+ field representing whether the argument is <code class="ph codeph">NULL</code>, and a field of the appropriate type
+ holding any non-<code class="ph codeph">NULL</code> argument value.
+ </li>
+
+ <li class="li">
+ For sample code and build instructions for UDFs,
+ see <span class="xref">the sample UDFs in the Impala github repo</span>.
+ </li>
+
+ <li class="li">
+ Because the file representing the body of the UDF is stored in HDFS, it is automatically available to all
+ the Impala nodes. You do not need to manually copy any UDF-related files between servers.
+ </li>
+
+ <li class="li">
+ Because Impala currently does not have any <code class="ph codeph">ALTER FUNCTION</code> statement, if you need to rename
+ a function, move it to a different database, or change its signature or other properties, issue a
+ <code class="ph codeph">DROP FUNCTION</code> statement for the original function followed by a <code class="ph codeph">CREATE
+ FUNCTION</code> with the desired properties.
+ </li>
+
+ <li class="li">
+ Because each UDF is associated with a particular database, either issue a <code class="ph codeph">USE</code> statement
+ before doing any <code class="ph codeph">CREATE FUNCTION</code> statements, or specify the name of the function as
+ <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">function_name</var></code>.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala can run UDFs that were created through Hive, as long as they refer to Impala-compatible data types
+ (not composite or nested column types). Hive can run Java-based UDFs that were created through Impala, but
+ not Impala UDFs written in C++.
+ </p>
+
+ <p class="p">
+ The Hive <code class="ph codeph">current_user()</code> function cannot be
+ called from a Java UDF through Impala.
+ </p>
+
+ <p class="p"><strong class="ph b">Persistence:</strong></p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span>, the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For additional examples of all kinds of user-defined functions, see <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+ <p class="p">
+ The following example shows how to take a Java jar file and make all the functions inside one of its classes
+ into UDFs under a single (overloaded) function name in Impala. Each <code class="ph codeph">CREATE FUNCTION</code> or
+ <code class="ph codeph">DROP FUNCTION</code> statement applies to all the overloaded Java functions with the same name.
+ This example uses the signatureless syntax for <code class="ph codeph">CREATE FUNCTION</code> and <code class="ph codeph">DROP FUNCTION</code>,
+ which is available in <span class="keyword">Impala 2.5</span> and higher.
+ </p>
+ <p class="p">
+ At the start, the jar file is in the local filesystem. Then it is copied into HDFS, so that it is
+ available for Impala to reference through the <code class="ph codeph">CREATE FUNCTION</code> statement and
+ queries that refer to the Impala function name.
+ </p>
+<pre class="pre codeblock"><code>
+$ jar -tvf udf-examples.jar
+ 0 Mon Feb 22 04:06:50 PST 2016 META-INF/
+ 122 Mon Feb 22 04:06:48 PST 2016 META-INF/MANIFEST.MF
+ 0 Mon Feb 22 04:06:46 PST 2016 org/
+ 0 Mon Feb 22 04:06:46 PST 2016 org/apache/
+ 0 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/
+ 2460 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/IncompatibleUdfTest.class
+ 541 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdfException.class
+ 3438 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/JavaUdfTest.class
+ 5872 Mon Feb 22 04:06:46 PST 2016 org/apache/impala/TestUdf.class
+...
+$ hdfs dfs -put udf-examples.jar /user/impala/udfs
+$ hdfs dfs -ls /user/impala/udfs
+Found 2 items
+-rw-r--r-- 3 jrussell supergroup 853 2015-10-09 14:05 /user/impala/udfs/hello_world.jar
+-rw-r--r-- 3 jrussell supergroup 7366 2016-06-08 14:25 /user/impala/udfs/udf-examples.jar
+</code></pre>
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, the <code class="ph codeph">CREATE FUNCTION</code> refers to the HDFS path of the jar file
+ and the fully qualified class name inside the jar. Each of the functions inside the class becomes an
+ Impala function, each one overloaded under the specified Impala function name.
+ </p>
+<pre class="pre codeblock"><code>
+[localhost:21000] > create function testudf location '/user/impala/udfs/udf-examples.jar' symbol='org.apache.impala.TestUdf';
+[localhost:21000] > show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN, BOOLEAN) | JAVA | true |
+| DOUBLE | testudf(DOUBLE) | JAVA | true |
+| DOUBLE | testudf(DOUBLE, DOUBLE) | JAVA | true |
+| DOUBLE | testudf(DOUBLE, DOUBLE, DOUBLE) | JAVA | true |
+| FLOAT | testudf(FLOAT) | JAVA | true |
+| FLOAT | testudf(FLOAT, FLOAT) | JAVA | true |
+| FLOAT | testudf(FLOAT, FLOAT, FLOAT) | JAVA | true |
+| INT | testudf(INT) | JAVA | true |
+| DOUBLE | testudf(INT, DOUBLE) | JAVA | true |
+| INT | testudf(INT, INT) | JAVA | true |
+| INT | testudf(INT, INT, INT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT, SMALLINT) | JAVA | true |
+| SMALLINT | testudf(SMALLINT, SMALLINT, SMALLINT) | JAVA | true |
+| STRING | testudf(STRING) | JAVA | true |
+| STRING | testudf(STRING, STRING) | JAVA | true |
+| STRING | testudf(STRING, STRING, STRING) | JAVA | true |
+| TINYINT | testudf(TINYINT) | JAVA | true |
++-------------+---------------------------------------+-------------+---------------+
+</code></pre>
+ <p class="p">
+ These are all simple functions that return their single argument, or
+ combine their multiple arguments by summing, concatenating, and so on. Impala determines which
+ overloaded function to use based on the number and types of the arguments.
+ </p>
+<pre class="pre codeblock"><code>
+insert into bigint_x values (1), (2), (4), (3);
+select testudf(x) from bigint_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| 1 |
+| 2 |
+| 4 |
+| 3 |
++-----------------+
+
+insert into int_x values (1), (2), (4), (3);
+select testudf(x, x+1, x*x) from int_x;
++-------------------------------+
+| udfs.testudf(x, x + 1, x * x) |
++-------------------------------+
+| 4 |
+| 9 |
+| 25 |
+| 16 |
++-------------------------------+
+
+select testudf(x) from string_x;
++-----------------+
+| udfs.testudf(x) |
++-----------------+
+| one |
+| two |
+| four |
+| three |
++-----------------+
+select testudf(x,x) from string_x;
++--------------------+
+| udfs.testudf(x, x) |
++--------------------+
+| oneone |
+| twotwo |
+| fourfour |
+| threethree |
++--------------------+
+</code></pre>
+
+ <p class="p">
+ The previous example used the same Impala function name as the name of the class.
+ This example shows how the Impala function name is independent of the underlying
+ Java class or function names. A second <code class="ph codeph">CREATE FUNCTION</code> statement
+ results in a set of overloaded functions all named <code class="ph codeph">my_func</code>,
+ to go along with the overloaded functions all named <code class="ph codeph">testudf</code>.
+ </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+ symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | my_func(BIGINT) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+ <p class="p">
+ The corresponding <code class="ph codeph">DROP FUNCTION</code> statement with no signature
+ drops all the overloaded functions with that name.
+ </p>
+<pre class="pre codeblock"><code>
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+ <p class="p">
+ The signatureless <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs ensures that
+ the functions shown in this example remain available after the Impala service
+ (specifically, the Catalog Server) is restarted.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for more background information, usage instructions, and examples for
+ Impala UDFs; <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_create_role.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_create_role.html b/docs/build3x/html/topics/impala_create_role.html
new file mode 100644
index 0000000..2930c3a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_create_role.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="create_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CREATE ROLE Statement (Impala 2.0 or higher only)</title></head><body id="create_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CREATE ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ The <code class="ph codeph">CREATE ROLE</code> statement creates a role to which privileges can be granted. Privileges can
+ be granted to roles, which can then be assigned to users. A user that has been assigned a role will only be
+ able to exercise the privileges of that role. Only users that have administrative privileges can create or drop
+ roles. By default, the <code class="ph codeph">hive</code>, <code class="ph codeph">impala</code>, and <code class="ph codeph">hue</code> users have
+ administrative privileges in Sentry.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (those with <code class="ph codeph">ALL</code> privileges on the server, defined in the Sentry
+ policy file) can use this statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+ and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+ repeated before each role name, unlike the equivalent Hive statements.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
[45/51] [partial] impala git commit: [DOCS] Impala doc site update for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_avro.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_avro.html b/docs/build3x/html/topics/impala_avro.html
new file mode 100644
index 0000000..2c6c196
--- /dev/null
+++ b/docs/build3x/html/topics/impala_avro.html
@@ -0,0 +1,565 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta
name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avro"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Avro File Format with Impala Tables</title></head><body id="avro"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Avro File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using tables whose data files use the Avro file format. Impala can query Avro
+ tables, and in Impala 1.4.0 and higher can create them, but currently cannot insert data into them. For
+ insert operations, use Hive, then switch back to Impala to run queries.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Avro Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="avro__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="avro__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="avro__entry__1 ">
+ <a class="xref" href="impala_avro.html#avro">Avro</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__4 ">
+ Yes, in Impala 1.4.0 and higher. Before that, create the table using Hive.
+ </td>
+ <td class="entry nocellnorowborder" headers="avro__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="avro__avro_create_table">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To create a new table using the Avro file format, issue the <code class="ph codeph">CREATE TABLE</code> statement through
+ Impala with the <code class="ph codeph">STORED AS AVRO</code> clause, or through Hive. If you create the table through
+ Impala, you must include column definitions that match the fields specified in the Avro schema. With Hive,
+ you can omit the columns and just specify the Avro schema.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">CREATE TABLE</code> for Avro tables can include
+ SQL-style column definitions rather than specifying Avro notation through the <code class="ph codeph">TBLPROPERTIES</code>
+ clause. Impala issues warning messages if there are any mismatches between the types specified in the
+ SQL column definitions and the underlying types; for example, any <code class="ph codeph">TINYINT</code> or
+ <code class="ph codeph">SMALLINT</code> columns are treated as <code class="ph codeph">INT</code> in the underlying Avro files,
+ and therefore are displayed as <code class="ph codeph">INT</code> in any <code class="ph codeph">DESCRIBE</code> or
+ <code class="ph codeph">SHOW CREATE TABLE</code> output.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Avro tables cannot contain <code class="ph codeph">TIMESTAMP</code> columns. If you need to store date and
+ time values in Avro tables, as a workaround you can use a <code class="ph codeph">STRING</code> representation of the
+ values, convert the values to <code class="ph codeph">BIGINT</code> with the <code class="ph codeph">UNIX_TIMESTAMP()</code> function,
+ or create separate numeric columns for individual date and time fields using the <code class="ph codeph">EXTRACT()</code>
+ function.
+ </p>
+ </div>
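+
+      <p class="p">
+        As one possible sketch of this workaround (the table and column names below are illustrative, not
+        part of any standard schema), a date/time value can be kept as both a <code class="ph codeph">STRING</code>
+        and a <code class="ph codeph">BIGINT</code> number of seconds:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example of the TIMESTAMP workaround; names are illustrative.
+CREATE TABLE avro_events (event_time_str STRING, event_time_sec BIGINT) STORED AS AVRO;
+-- Populate through Hive, producing both representations from a TIMESTAMP source column:
+--   INSERT INTO avro_events SELECT CAST(ts AS STRING), UNIX_TIMESTAMP(ts) FROM source_table;
+</code></pre>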
+
+
+
+ <p class="p">
+ The following examples demonstrate creating an Avro table in Impala, using either an inline column
+ specification or one taken from a JSON file stored in HDFS:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > CREATE TABLE avro_only_sql_columns
+ > (
+ > id INT,
+ > bool_col BOOLEAN,
+ > tinyint_col TINYINT, /* Gets promoted to INT */
+ > smallint_col SMALLINT, /* Gets promoted to INT */
+ > int_col INT,
+ > bigint_col BIGINT,
+ > float_col FLOAT,
+ > double_col DOUBLE,
+ > date_string_col STRING,
+ > string_col STRING
+ > )
+ > STORED AS AVRO;
+
+[localhost:21000] > CREATE TABLE impala_avro_table
+ > (bool_col BOOLEAN, int_col INT, long_col BIGINT, float_col FLOAT, double_col DOUBLE, string_col STRING, nullable_int INT)
+ > STORED AS AVRO
+ > TBLPROPERTIES ('avro.schema.literal'='{
+ > "name": "my_record",
+ > "type": "record",
+ > "fields": [
+ > {"name":"bool_col", "type":"boolean"},
+ > {"name":"int_col", "type":"int"},
+ > {"name":"long_col", "type":"long"},
+ > {"name":"float_col", "type":"float"},
+ > {"name":"double_col", "type":"double"},
+ > {"name":"string_col", "type":"string"},
+ > {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+[localhost:21000] > CREATE TABLE avro_examples_of_all_types (
+ > id INT,
+ > bool_col BOOLEAN,
+ > tinyint_col TINYINT,
+ > smallint_col SMALLINT,
+ > int_col INT,
+ > bigint_col BIGINT,
+ > float_col FLOAT,
+ > double_col DOUBLE,
+ > date_string_col STRING,
+ > string_col STRING
+ > )
+ > STORED AS AVRO
+ > TBLPROPERTIES ('avro.schema.url'='hdfs://localhost:8020/avro_schemas/alltypes.json');
+
+</code></pre>
+
+ <p class="p">
+ The following example demonstrates creating an Avro table in Hive:
+ </p>
+
+<pre class="pre codeblock"><code>
+hive> CREATE TABLE hive_avro_table
+ > ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+ > STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+ > OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+ > TBLPROPERTIES ('avro.schema.literal'='{
+ > "name": "my_record",
+ > "type": "record",
+ > "fields": [
+ > {"name":"bool_col", "type":"boolean"},
+ > {"name":"int_col", "type":"int"},
+ > {"name":"long_col", "type":"long"},
+ > {"name":"float_col", "type":"float"},
+ > {"name":"double_col", "type":"double"},
+ > {"name":"string_col", "type":"string"},
+ > {"name": "nullable_int", "type": ["null", "int"]}]}');
+
+</code></pre>
+
+ <p class="p">
+ Each field of the record becomes a column of the table. Note that any other information, such as the record
+ name, is ignored.
+ </p>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For nullable Avro columns, make sure to put the <code class="ph codeph">"null"</code> entry before the actual type name.
+ In Impala, all columns are nullable; Impala currently does not have a <code class="ph codeph">NOT NULL</code> clause. Any
+ non-nullable property is only enforced on the Avro side.
+ </div>
+
+ <p class="p">
+ Most column types map directly from Avro to Impala under the same names. These are the exceptions and
+ special cases to consider:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">DECIMAL</code> type is defined in Avro as a <code class="ph codeph">BYTE</code> type with the
+ <code class="ph codeph">logicalType</code> property set to <code class="ph codeph">"decimal"</code> and a specified precision and
+ scale.
+ </li>
+
+ <li class="li">
+ The Avro <code class="ph codeph">long</code> type maps to <code class="ph codeph">BIGINT</code> in Impala.
+ </li>
+ </ul>
+
+ <p class="p">
+ If you create the table through Hive, switch back to <span class="keyword cmdname">impala-shell</span> and issue an
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. Then you can run queries for
+ that table through <span class="keyword cmdname">impala-shell</span>.
+ </p>
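+
+      <p class="p">
+        For example, assuming a table named <code class="ph codeph">hive_avro_table</code> was just created in Hive:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; INVALIDATE METADATA hive_avro_table;
+[localhost:21000] &gt; SELECT COUNT(*) FROM hive_avro_table;
+</code></pre>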
+
+ <div class="p">
+ In rare instances, a mismatch could occur between the Avro schema and the column definitions in the
+ metastore database. In <span class="keyword">Impala 2.3</span> and higher, Impala checks for such inconsistencies during
+ a <code class="ph codeph">CREATE TABLE</code> statement and each time it loads the metadata for a table (for example,
+ after <code class="ph codeph">INVALIDATE METADATA</code>). Impala uses the following rules to determine how to treat
+ mismatching columns, a process known as <dfn class="term">schema reconciliation</dfn>:
+ <ul class="ul">
+ <li class="li">
+ If there is a mismatch in the number of columns, Impala uses the column
+ definitions from the Avro schema.
+ </li>
+ <li class="li">
+ If there is a mismatch in column name or type, Impala uses the column definition from the Avro schema.
+ Because a <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> column in Impala maps to an Avro <code class="ph codeph">STRING</code>,
+ this case is not considered a mismatch and the column is preserved as <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+          in the reconciled schema. <span class="ph">Prior to <span class="keyword">Impala 2.7</span>, the column
+          name and comment for such <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns were also taken from the SQL column definition.
+ In <span class="keyword">Impala 2.7</span> and higher, the column name and comment from the Avro schema file take precedence for such columns,
+ and only the <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code> type is preserved from the SQL column definition.</span>
+ </li>
+ <li class="li">
+ An Impala <code class="ph codeph">TIMESTAMP</code> column definition maps to an Avro <code class="ph codeph">STRING</code> and is presented as a <code class="ph codeph">STRING</code>
+ in the reconciled schema, because Avro has no binary <code class="ph codeph">TIMESTAMP</code> representation.
+ As a result, no Avro table can have a <code class="ph codeph">TIMESTAMP</code> column; this restriction is the same as
+ in earlier Impala releases.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="avro__avro_map_table">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Using a Hive-Created Avro Table in Impala</h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ If you have an Avro table created through Hive, you can use it in Impala as long as it contains only
+ Impala-compatible data types. It cannot contain:
+ <ul class="ul">
+ <li class="li">
+ Complex types: <code class="ph codeph">array</code>, <code class="ph codeph">map</code>, <code class="ph codeph">record</code>,
+ <code class="ph codeph">struct</code>, <code class="ph codeph">union</code> other than
+ <code class="ph codeph">[<var class="keyword varname">supported_type</var>,null]</code> or
+ <code class="ph codeph">[null,<var class="keyword varname">supported_type</var>]</code>
+ </li>
+
+ <li class="li">
+ The Avro-specific types <code class="ph codeph">enum</code>, <code class="ph codeph">bytes</code>, and <code class="ph codeph">fixed</code>
+ </li>
+
+ <li class="li">
+ Any scalar type other than those listed in <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>
+ </li>
+ </ul>
+ Because Impala and Hive share the same metastore database, Impala can directly access the table definitions
+ and data for tables that were created in Hive.
+ </div>
+
+ <p class="p">
+ If you create an Avro table in Hive, issue an <code class="ph codeph">INVALIDATE METADATA</code> the next time you
+ connect to Impala through <span class="keyword cmdname">impala-shell</span>. This is a one-time operation to make Impala
+ aware of the new table. You can issue the statement while connected to any Impala node, and the catalog
+ service broadcasts the change to all other Impala nodes.
+ </p>
+
+ <p class="p">
+ If you load new data into an Avro table through Hive, either through a Hive <code class="ph codeph">LOAD DATA</code> or
+ <code class="ph codeph">INSERT</code> statement, or by manually copying or moving files into the data directory for the
+ table, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement the next time you connect
+ to Impala through <span class="keyword cmdname">impala-shell</span>. You can issue the statement while connected to any
+ Impala node, and the catalog service broadcasts the change to all other Impala nodes. If you issue the
+ <code class="ph codeph">LOAD DATA</code> statement through Impala, you do not need a <code class="ph codeph">REFRESH</code> afterward.
+ </p>
+
+ <p class="p">
+ Impala only supports fields of type <code class="ph codeph">boolean</code>, <code class="ph codeph">int</code>, <code class="ph codeph">long</code>,
+ <code class="ph codeph">float</code>, <code class="ph codeph">double</code>, and <code class="ph codeph">string</code>, or unions of these types with
+ null; for example, <code class="ph codeph">["string", "null"]</code>. Unions with <code class="ph codeph">null</code> essentially
+ create a nullable type.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="avro__avro_json">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying the Avro Schema through JSON</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ While you can embed a schema directly in your <code class="ph codeph">CREATE TABLE</code> statement, as shown above,
+ column width restrictions in the Hive metastore limit the length of schema you can specify. If you
+ encounter problems with long schema literals, try storing your schema as a <code class="ph codeph">JSON</code> file in
+ HDFS instead. Specify your schema in HDFS using table properties similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>tblproperties ('avro.schema.url'='hdfs://your-name-node:port/path/to/schema.json');</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="avro__avro_load_data">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading Data into an Avro Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Currently, Impala cannot write Avro data files. Therefore, an Avro table cannot be used as the destination
+ of an Impala <code class="ph codeph">INSERT</code> statement or <code class="ph codeph">CREATE TABLE AS SELECT</code>.
+ </p>
+
+ <p class="p">
+ To copy data from another table, issue any <code class="ph codeph">INSERT</code> statements through Hive. For information
+ about loading data into Avro tables through Hive, see
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/AvroSerDe" target="_blank">Avro
+ page on the Hive wiki</a>.
+ </p>
+
+ <p class="p">
+ If you already have data files in Avro format, you can also issue <code class="ph codeph">LOAD DATA</code> in either
+ Impala or Hive. Impala can move existing Avro data files into an Avro table, it just cannot create new
+ Avro data files.
+ </p>
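+
+      <p class="p">
+        For example, assuming an existing Avro data file at a hypothetical HDFS staging path:
+      </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] &gt; LOAD DATA INPATH '/user/hive/staging/sample.avro' INTO TABLE avro_table;
+</code></pre>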
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="avro__avro_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Enabling Compression for Avro Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ To enable compression for Avro tables, specify settings in the Hive shell to enable compression and to
+ specify a codec, then issue a <code class="ph codeph">CREATE TABLE</code> statement as in the preceding examples. Impala
+ supports the <code class="ph codeph">snappy</code> and <code class="ph codeph">deflate</code> codecs for Avro tables.
+ </p>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>hive> set hive.exec.compress.output=true;
+hive> set avro.output.codec=snappy;</code></pre>
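+
+      <p class="p">
+        With those settings in effect, subsequent Hive <code class="ph codeph">INSERT</code> statements into an
+        Avro table write Snappy-compressed data files. The following sketch assumes a Hive version that
+        supports the <code class="ph codeph">STORED AS AVRO</code> shorthand; the table and column names are
+        illustrative:
+      </p>
+
+<pre class="pre codeblock"><code>hive&gt; CREATE TABLE compressed_avro_table (id INT, s STRING) STORED AS AVRO;
+hive&gt; INSERT INTO compressed_avro_table SELECT id, string_col FROM some_source_table;
+</code></pre>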
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="avro__avro_schema_evolution">
+
+ <h2 class="title topictitle2" id="ariaid-title7">How Impala Handles Avro Schema Evolution</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Starting in Impala 1.1, Impala can deal with Avro data files that employ <dfn class="term">schema evolution</dfn>,
+ where different data files within the same table use slightly different type definitions. (You would
+ perform the schema evolution operation by issuing an <code class="ph codeph">ALTER TABLE</code> statement in the Hive
+      shell.) The old and new types for any changed columns must be compatible; for example, a column might start
+ as an <code class="ph codeph">int</code> and later change to a <code class="ph codeph">bigint</code> or <code class="ph codeph">float</code>.
+ </p>
+
+ <p class="p">
+ As with any other tables where the definitions are changed or data is added outside of the current
+ <span class="keyword cmdname">impalad</span> node, ensure that Impala loads the latest metadata for the table if the Avro
+ schema is modified through Hive. Issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement. <code class="ph codeph">REFRESH</code>
+      reloads the metadata immediately, while <code class="ph codeph">INVALIDATE METADATA</code> reloads it the next time
+ the table is accessed.
+ </p>
+
+ <p class="p">
+ When Avro data files or columns are not consulted during a query, Impala does not check for consistency.
+ Thus, if you issue <code class="ph codeph">SELECT c1, c2 FROM t1</code>, Impala does not return any error if the column
+ <code class="ph codeph">c3</code> changed in an incompatible way. If a query retrieves data from some partitions but not
+ others, Impala does not check the data files for the unused partitions.
+ </p>
+
+ <p class="p">
+ In the Hive DDL statements, you can specify an <code class="ph codeph">avro.schema.literal</code> table property (if the
+ schema definition is short) or an <code class="ph codeph">avro.schema.url</code> property (if the schema definition is
+ long, or to allow convenient editing for the definition).
+ </p>
+
+ <p class="p">
+ For example, running the following SQL code in the Hive shell creates a table using the Avro file format
+ and puts some sample data into it:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE avro_table (a string, b string)
+ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
+STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
+OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
+TBLPROPERTIES (
+ 'avro.schema.literal'='{
+ "type": "record",
+ "name": "my_record",
+ "fields": [
+ {"name": "a", "type": "int"},
+ {"name": "b", "type": "string"}
+ ]}');
+
+INSERT OVERWRITE TABLE avro_table SELECT 1, "avro" FROM functional.alltypes LIMIT 1;
+</code></pre>
+
+ <p class="p">
+ Once the Avro table is created and contains data, you can query it through the
+ <span class="keyword cmdname">impala-shell</span> command:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select * from avro_table;
++---+------+
+| a | b |
++---+------+
+| 1 | avro |
++---+------+
+</code></pre>
+
+ <p class="p">
+ Now in the Hive shell, you change the type of a column and add a new column with a default value:
+ </p>
+
+<pre class="pre codeblock"><code>-- Promote column "a" from INT to FLOAT (no need to update Avro schema)
+ALTER TABLE avro_table CHANGE A A FLOAT;
+
+-- Add column "c" with default
+ALTER TABLE avro_table ADD COLUMNS (c int);
+ALTER TABLE avro_table SET TBLPROPERTIES (
+ 'avro.schema.literal'='{
+ "type": "record",
+ "name": "my_record",
+ "fields": [
+ {"name": "a", "type": "int"},
+ {"name": "b", "type": "string"},
+ {"name": "c", "type": "int", "default": 10}
+ ]}');
+</code></pre>
+
+ <p class="p">
+ Once again in <span class="keyword cmdname">impala-shell</span>, you can query the Avro table based on its latest schema
+ definition. Because the table metadata was changed outside of Impala, you issue a <code class="ph codeph">REFRESH</code>
+ statement first so that Impala has up-to-date metadata for the table.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > refresh avro_table;
+[localhost:21000] > select * from avro_table;
++---+------+----+
+| a | b | c |
++---+------+----+
+| 1 | avro | 10 |
++---+------+----+
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="avro__avro_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Data Type Considerations for Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Avro format defines a set of data types whose names differ from the names of the corresponding Impala
+ data types. If you are preparing Avro files using other Hadoop components such as Pig or MapReduce, you
+ might need to work with the type names defined by Avro. The following figure lists the Avro-defined types
+ and the equivalent types in Impala.
+ </p>
+
+<pre class="pre codeblock"><code>Primitive Types (Avro -> Impala)
+--------------------------------
+STRING -> STRING
+STRING -> CHAR
+STRING -> VARCHAR
+INT -> INT
+BOOLEAN -> BOOLEAN
+LONG -> BIGINT
+FLOAT -> FLOAT
+DOUBLE -> DOUBLE
+
+Logical Types
+-------------
+BYTES + logicalType = "decimal" -> DECIMAL
+
+Avro Types with No Impala Equivalent
+------------------------------------
+RECORD, MAP, ARRAY, UNION, ENUM, FIXED, NULL
+
+Impala Types with No Avro Equivalent
+------------------------------------
+TIMESTAMP
+
+</code></pre>
+
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="avro__avro_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Query Performance for Impala Avro Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with Avro tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_batch_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_batch_size.html b/docs/build3x/html/topics/impala_batch_size.html
new file mode 100644
index 0000000..cf89ad1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_batch_size.html
@@ -0,0 +1,34 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="batch_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BATCH_SIZE Query Option</title></head><body id="batch_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BATCH_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      Number of rows evaluated at a time by SQL operators. If unspecified or set to 0, Impala uses a predefined
+      default size. Using a larger number improves responsiveness, especially for scan operations, at the cost of a higher memory footprint.
+ </p>
+
+ <p class="p">
+ This option is primarily for testing during Impala development, or for use under the direction of <span class="keyword">the appropriate support channel</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (meaning the predefined default of 1024)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 0-65536. The value of 0 still has the special meaning of <span class="q">"use the default"</span>,
+ so the effective range is 1-65536. The maximum applies in <span class="keyword">Impala 2.11</span> and higher.
+ </p>
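+
+  <p class="p">
+    For example, to set a larger batch size for the current session in <span class="keyword cmdname">impala-shell</span>:
+  </p>
+
+<pre class="pre codeblock"><code>set batch_size=2048;
+</code></pre>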
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_bigint.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_bigint.html b/docs/build3x/html/topics/impala_bigint.html
new file mode 100644
index 0000000..ac3d700
--- /dev/null
+++ b/docs/build3x/html/topics/impala_bigint.html
@@ -0,0 +1,138 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bigint"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BIGINT Data Type</title></head><body id="bigint"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BIGINT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An 8-byte integer data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BIGINT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> -9223372036854775808 .. 9223372036854775807. There is no <code class="ph codeph">UNSIGNED</code> subtype.
+ </p>
+
+ <p class="p">
+      <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">BIGINT</code> to a floating-point type (<code class="ph codeph">FLOAT</code> or
+      <code class="ph codeph">DOUBLE</code>). Use <code class="ph codeph">CAST()</code> to convert to <code class="ph codeph">TINYINT</code>,
+ <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">STRING</code>, or <code class="ph codeph">TIMESTAMP</code>.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x BIGINT);
+SELECT CAST(1000 AS BIGINT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">BIGINT</code> is a convenient type to use for column declarations because you can use any kind of
+ integer values in <code class="ph codeph">INSERT</code> statements and they are promoted to <code class="ph codeph">BIGINT</code> where
+ necessary. However, <code class="ph codeph">BIGINT</code> also requires the most bytes of any integer type on disk and in
+ memory, meaning your queries are not as efficient and scalable as possible if you overuse this type.
+ Therefore, prefer to use the smallest integer type with sufficient range to hold all input values, and
+ <code class="ph codeph">CAST()</code> when necessary to the appropriate type.
+ </p>
+
+ <p class="p">
+ For a convenient and automated way to check the bounds of the <code class="ph codeph">BIGINT</code> type, call the
+ functions <code class="ph codeph">MIN_BIGINT()</code> and <code class="ph codeph">MAX_BIGINT()</code>.
+ </p>
+
+ <p class="p">
+ If an integer value is too large to be represented as a <code class="ph codeph">BIGINT</code>, use a
+ <code class="ph codeph">DECIMAL</code> instead with sufficient digits of precision.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> Casting any non-numeric value to this type produces a <code class="ph codeph">NULL</code>
+ value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Prefer to use this type for a partition key column. Impala can process the numeric
+ type more efficiently than a <code class="ph codeph">STRING</code> representation of the value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Available in all versions of Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sqoop considerations:</strong>
+ </p>
+
+ <p class="p"> If you use Sqoop to
+ convert RDBMS data to Parquet, be careful with interpreting any
+ resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+ or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+ represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+ represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+ values represent the time in milliseconds, while Impala interprets
+ <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+ a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+ this way from Sqoop, divide the values by 1000 when interpreting as the
+ <code class="ph codeph">TIMESTAMP</code> type.</p>
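+
+    <p class="p">
+      For example, with a hypothetical <code class="ph codeph">BIGINT</code> column that holds millisecond values
+      imported this way (the column and table names are illustrative), the conversion looks like:
+    </p>
+
+<pre class="pre codeblock"><code>-- sqoop_ms_col holds milliseconds; divide by 1000 so Impala interprets the value as seconds.
+SELECT CAST(sqoop_ms_col / 1000 AS TIMESTAMP) FROM imported_parquet_table;
+</code></pre>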
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_bit_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_bit_functions.html b/docs/build3x/html/topics/impala_bit_functions.html
new file mode 100644
index 0000000..4c33b22
--- /dev/null
+++ b/docs/build3x/html/topics/impala_bit_functions.html
@@ -0,0 +1,848 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="bit_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Bit Functions</title></head><body id="bit_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Bit Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Bit manipulation functions perform bitwise operations involved in scientific processing or computer science algorithms.
+ For example, these functions include setting, clearing, or testing bits within an integer value, or changing the
+ positions of bits with or without wraparound.
+ </p>
+
+ <p class="p">
+ If a function takes two integer arguments that are required to be of the same type, the smaller argument is promoted
+ to the type of the larger one if required. For example, <code class="ph codeph">BITAND(1,4096)</code> treats both arguments as
+ <code class="ph codeph">SMALLINT</code>, because 1 can be represented as a <code class="ph codeph">TINYINT</code> but 4096 requires a <code class="ph codeph">SMALLINT</code>.
+ </p>
+
+ <p class="p">
+ Remember that all Impala integer values are signed. Therefore, when dealing with binary values where the most significant
+ bit is 1, the specified or returned values might be negative when represented in base 10.
+ </p>
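The signed interpretation can be sketched in Python. Python integers are unbounded, so the fixed width of an Impala integer type is simulated explicitly here; the function name and the 8-bit default are assumptions of this illustration:

```python
def to_signed(bit_pattern, bits=8):
    """Interpret a raw bit pattern as a signed two's-complement integer."""
    mask = (1 << bits) - 1
    value = bit_pattern & mask
    if value >= 1 << (bits - 1):  # most significant bit set -> negative
        value -= 1 << bits
    return value

print(to_signed(0b11111111))  # -1: all bits set in a signed 8-bit value
print(to_signed(0b10000000))  # -128
print(to_signed(0b01111111))  # 127
```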
+
+ <p class="p">
+ If any argument is <code class="ph codeph">NULL</code>, whether it is the input value, the bit position, or the number of shift or rotate positions,
+ the return value from any of these functions is also <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The bit functions operate on all the integral data types: <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>, <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>, and
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following bit functions:
+ </p>
+
+
+
+ <dl class="dl">
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitand">
+ <code class="ph codeph">bitand(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in both of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitand()</code> function is equivalent to the <code class="ph codeph">&</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of ANDing integer values.
+ 255 contains all 1 bits in its lowermost 8 bits.
+ 32767 contains all 1 bits in its lowermost 15 bits.
+
+ You can use the <code class="ph codeph">bin()</code> function to check the binary representation of any
+ integer value, although the result is always represented as a 64-bit value.
+ If necessary, the smaller argument is promoted to the
+ type of the larger one.
+ </p>
+<pre class="pre codeblock"><code>select bitand(255, 32767); /* 0000000011111111 & 0111111111111111 */
++--------------------+
+| bitand(255, 32767) |
++--------------------+
+| 255 |
++--------------------+
+
+select bitand(32767, 1); /* 0111111111111111 & 0000000000000001 */
++------------------+
+| bitand(32767, 1) |
++------------------+
+| 1 |
++------------------+
+
+select bitand(32, 16); /* 00100000 & 00010000 */
++----------------+
+| bitand(32, 16) |
++----------------+
+| 0 |
++----------------+
+
+select bitand(12,5); /* 00001100 & 00000101 */
++---------------+
+| bitand(12, 5) |
++---------------+
+| 4 |
++---------------+
+
+select bitand(-1,15); /* 11111111 & 00001111 */
++----------------+
+| bitand(-1, 15) |
++----------------+
+| 15 |
++----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitnot">
+ <code class="ph codeph">bitnot(integer_type a)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Inverts all the bits of the input argument.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitnot()</code> function is equivalent to the <code class="ph codeph">~</code> unary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ These examples illustrate what happens when you flip all the bits of an integer value.
+ The sign always changes, and because <code class="ph codeph">bitnot(x)</code> is equivalent to <code class="ph codeph">-(x + 1)</code>,
+ the decimal representations of the original and inverted values always differ by one in magnitude.
+
+ </p>
+<pre class="pre codeblock"><code>select bitnot(127); /* 01111111 -> 10000000 */
++-------------+
+| bitnot(127) |
++-------------+
+| -128 |
++-------------+
+
+select bitnot(16); /* 00010000 -> 11101111 */
++------------+
+| bitnot(16) |
++------------+
+| -17 |
++------------+
+
+select bitnot(0); /* 00000000 -> 11111111 */
++-----------+
+| bitnot(0) |
++-----------+
+| -1 |
++-----------+
+
+select bitnot(-128); /* 10000000 -> 01111111 */
++--------------+
+| bitnot(-128) |
++--------------+
+| 127 |
++--------------+
+</code></pre>
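Python's <code>~</code> operator follows the same two's-complement rule, so the results above can be reproduced directly (illustration only, not Impala code):

```python
# ~x equals -(x + 1) in two's complement, matching the bitnot() results.
for x in (127, 16, 0, -128):
    print(x, "->", ~x)
# 127 -> -128, 16 -> -17, 0 -> -1, -128 -> 127
```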
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitor">
+ <code class="ph codeph">bitor(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in either of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitor()</code> function is equivalent to the <code class="ph codeph">|</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of ORing integer values.
+ </p>
+<pre class="pre codeblock"><code>select bitor(1,4); /* 00000001 | 00000100 */
++-------------+
+| bitor(1, 4) |
++-------------+
+| 5 |
++-------------+
+
+select bitor(16,48); /* 00010000 | 00110000 */
++---------------+
+| bitor(16, 48) |
++---------------+
+| 48 |
++---------------+
+
+select bitor(0,7); /* 00000000 | 00000111 */
++-------------+
+| bitor(0, 7) |
++-------------+
+| 7 |
++-------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__bitxor">
+ <code class="ph codeph">bitxor(integer_type a, same_type b)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns an integer value representing the bits that are set to 1 in one but not both of the arguments.
+ If the arguments are of different sizes, the smaller is promoted to the type of the larger.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> The <code class="ph codeph">bitxor()</code> function is equivalent to the <code class="ph codeph">^</code> binary operator.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the results of XORing integer values.
+ XORing a non-zero value with zero returns the non-zero value.
+ XORing two identical values returns zero, because all the 1 bits from the first argument are also 1 bits in the second argument.
+ XORing different non-zero values turns off some bits and leaves others turned on, based on whether the same bit is set in both arguments.
+ </p>
+<pre class="pre codeblock"><code>select bitxor(0,15); /* 00000000 ^ 00001111 */
++---------------+
+| bitxor(0, 15) |
++---------------+
+| 15 |
++---------------+
+
+select bitxor(7,7); /* 00000111 ^ 00000111 */
++--------------+
+| bitxor(7, 7) |
++--------------+
+| 0 |
++--------------+
+
+select bitxor(8,4); /* 00001000 ^ 00000100 */
++--------------+
+| bitxor(8, 4) |
++--------------+
+| 12 |
++--------------+
+
+select bitxor(3,7); /* 00000011 ^ 00000111 */
++--------------+
+| bitxor(3, 7) |
++--------------+
+| 4 |
++--------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__countset">
+ <code class="ph codeph">countset(integer_type a [, int zero_or_one])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> By default, returns the number of 1 bits in the specified integer value.
+ If the optional second argument is set to zero, it returns the number of 0 bits instead.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In discussions of information theory, this operation is referred to as the
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Hamming_weight" target="_blank">population count</a>"</span>
+ or <span class="q">"popcount"</span>.
+ </p>
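A population-count sketch in Python for a fixed-width value; the function name, the optional zero-count argument, and the 8-bit default width are assumptions of this illustration, not Impala's implementation:

```python
def countset(value, zero_or_one=1, bits=8):
    """Count 1 bits (default) or, with zero_or_one=0, count 0 bits."""
    ones = bin(value & ((1 << bits) - 1)).count("1")
    return ones if zero_or_one else bits - ones

print(countset(7))     # 3 one-bits in 00000111
print(countset(7, 0))  # 5 zero-bits in 00000111
```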
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to count the number of 1 bits in an integer value.
+ </p>
+<pre class="pre codeblock"><code>select countset(1); /* 00000001 */
++-------------+
+| countset(1) |
++-------------+
+| 1 |
++-------------+
+
+select countset(3); /* 00000011 */
++-------------+
+| countset(3) |
++-------------+
+| 2 |
++-------------+
+
+select countset(16); /* 00010000 */
++--------------+
+| countset(16) |
++--------------+
+| 1 |
++--------------+
+
+select countset(17); /* 00010001 */
++--------------+
+| countset(17) |
++--------------+
+| 2 |
++--------------+
+
+select countset(7,1); /* 00000111 = 3 1 bits; the function counts 1 bits by default */
++----------------+
+| countset(7, 1) |
++----------------+
+| 3 |
++----------------+
+
+select countset(7,0); /* 00000111 = 5 0 bits; the second argument can only be 0 or 1 */
++----------------+
+| countset(7, 0) |
++----------------+
+| 5 |
++----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__getbit">
+ <code class="ph codeph">getbit(integer_type a, int position)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a 0 or 1 representing the bit at a
+ specified position. The positions are numbered right to left, starting at zero.
+ The position argument cannot be negative.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ When you use a literal input value, it is treated as the smallest integer type that can hold it
+ (an 8-bit, 16-bit, and so on value). The type of the input value limits the range of valid
+ bit positions. Cast the input value to the appropriate type if you need to
+ ensure it is treated as a 64-bit, 32-bit, and so on value.
+ </p>
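The right-to-left position numbering and the width-dependent range check can be sketched like this; the explicit <code>bits</code> parameter stands in for the input's integer type and is an assumption of this illustration:

```python
def getbit(value, position, bits=8):
    """Return the bit at `position`, numbered right to left from zero."""
    if position < 0 or position >= bits:
        raise ValueError("Invalid bit position: %d" % position)
    return (value >> position) & 1

print(getbit(16, 4))       # 1: 00010000 has bit 4 set
print(getbit(-1, 3))       # 1: a signed -1 has all bits set
print(getbit(-1, 25, 32))  # 1: widening the type makes position 25 valid
```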
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to test a specific bit within an integer value.
+ </p>
+<pre class="pre codeblock"><code>select getbit(1,0); /* 00000001 */
++--------------+
+| getbit(1, 0) |
++--------------+
+| 1 |
++--------------+
+
+select getbit(16,1); /* 00010000 */
++---------------+
+| getbit(16, 1) |
++---------------+
+| 0 |
++---------------+
+
+select getbit(16,4); /* 00010000 */
++---------------+
+| getbit(16, 4) |
++---------------+
+| 1 |
++---------------+
+
+select getbit(16,5); /* 00010000 */
++---------------+
+| getbit(16, 5) |
++---------------+
+| 0 |
++---------------+
+
+select getbit(-1,3); /* 11111111 */
++---------------+
+| getbit(-1, 3) |
++---------------+
+| 1 |
++---------------+
+
+select getbit(-1,25); /* 11111111 */
+ERROR: Invalid bit position: 25
+
+select getbit(cast(-1 as int),25); /* 11111111111111111111111111111111 */
++-----------------------------+
+| getbit(cast(-1 as int), 25) |
++-----------------------------+
+| 1 |
++-----------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__rotateleft">
+ <code class="ph codeph">rotateleft(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Rotates an integer value left by a specified number of bits.
+ As the most significant bit is taken out of the original value,
+ if it is a 1 bit, it is <span class="q">"rotated"</span> back to the least significant bit.
+ Therefore, the final value has the same number of 1 bits as the original value,
+ just in different positions.
+ In computer science terms, this operation is a
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Rotating a -1 value by any number of positions still returns -1,
+ because the original value has all 1 bits and all the 1 bits are
+ preserved during rotation.
+ Similarly, rotating a 0 value by any number of positions still returns 0.
+ Rotating a value by the same number of bits as in the value returns the same value.
+ Because this is a circular operation, the number of positions is not limited
+ to the number of bits in the input value.
+ For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+ identical result in each case.
+ </p>
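The circular behavior described above can be sketched in Python for a fixed width; the 8-bit default and the signed reinterpretation at the end are assumptions of this illustration, not Impala's implementation:

```python
def rotateleft(value, positions, bits=8):
    """Circular left shift within a fixed width, returning a signed result."""
    mask = (1 << bits) - 1
    v = value & mask
    positions %= bits  # rotation wraps, so only positions % bits matters
    r = ((v << positions) | (v >> (bits - positions))) & mask
    return r - (1 << bits) if r >= 1 << (bits - 1) else r

print(rotateleft(1, 4))     # 16: 00000001 -> 00010000
print(rotateleft(-128, 1))  # 1:  10000000 -> 00000001
print(rotateleft(-1, 155))  # -1: all 1 bits stay all 1 bits
```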
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select rotateleft(1,4); /* 00000001 -> 00010000 */
++------------------+
+| rotateleft(1, 4) |
++------------------+
+| 16 |
++------------------+
+
+select rotateleft(-1,155); /* 11111111 -> 11111111 */
++---------------------+
+| rotateleft(-1, 155) |
++---------------------+
+| -1 |
++---------------------+
+
+select rotateleft(-128,1); /* 10000000 -> 00000001 */
++---------------------+
+| rotateleft(-128, 1) |
++---------------------+
+| 1 |
++---------------------+
+
+select rotateleft(-127,3); /* 10000001 -> 00001100 */
++---------------------+
+| rotateleft(-127, 3) |
++---------------------+
+| 12 |
++---------------------+
+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__rotateright">
+ <code class="ph codeph">rotateright(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Rotates an integer value right by a specified number of bits.
+ As the least significant bit is taken out of the original value,
+ if it is a 1 bit, it is <span class="q">"rotated"</span> back to the most significant bit.
+ Therefore, the final value has the same number of 1 bits as the original value,
+ just in different positions.
+ In computer science terms, this operation is a
+ <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Circular_shift" target="_blank">circular shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Rotating a -1 value by any number of positions still returns -1,
+ because the original value has all 1 bits and all the 1 bits are
+ preserved during rotation.
+ Similarly, rotating a 0 value by any number of positions still returns 0.
+ Rotating a value by the same number of bits as in the value returns the same value.
+ Because this is a circular operation, the number of positions is not limited
+ to the number of bits in the input value.
+ For example, rotating an 8-bit value by 1, 9, 17, and so on positions returns an
+ identical result in each case.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select rotateright(16,4); /* 00010000 -> 00000001 */
++--------------------+
+| rotateright(16, 4) |
++--------------------+
+| 1 |
++--------------------+
+
+select rotateright(-1,155); /* 11111111 -> 11111111 */
++----------------------+
+| rotateright(-1, 155) |
++----------------------+
+| -1 |
++----------------------+
+
+select rotateright(-128,1); /* 10000000 -> 01000000 */
++----------------------+
+| rotateright(-128, 1) |
++----------------------+
+| 64 |
++----------------------+
+
+select rotateright(-127,3); /* 10000001 -> 00110000 */
++----------------------+
+| rotateright(-127, 3) |
++----------------------+
+| 48 |
++----------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__setbit">
+ <code class="ph codeph">setbit(integer_type a, int position [, int zero_or_one])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> By default, changes a bit at a specified position to a 1, if it is not already.
+ If the optional third argument is set to zero, the specified bit is set to 0 instead.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ If the bit at the specified position was already 1 (by default)
+ or 0 (with a third argument of zero), the return value is
+ the same as the first argument.
+ The positions are numbered right to left, starting at zero.
+ (Therefore, the return value could be different from the first argument
+ even if the position argument is zero.)
+ The position argument cannot be negative.
+ <p class="p">
+ When you use a literal input value, it is treated as the smallest integer type that can hold it
+ (an 8-bit, 16-bit, and so on value). The type of the input value limits the range of valid
+ bit positions. Cast the input value to the appropriate type if you need to
+ ensure it is treated as a 64-bit, 32-bit, and so on value.
+ </p>
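Setting and clearing a bit reduce to an OR with a mask and an AND with the inverted mask. A minimal sketch (Python integers are unbounded, so unlike Impala there is no invalid-position error here; names are assumptions of this illustration):

```python
def setbit(value, position, zero_or_one=1):
    """Set (default) or, with zero_or_one=0, clear the bit at `position`."""
    if zero_or_one:
        return value | (1 << position)
    return value & ~(1 << position)

print(setbit(7, 3))     # 15: 00000111 -> 00001111
print(setbit(7, 2, 0))  # 3:  00000111 -> 00000011
print(setbit(15, 3))    # 15: bit 3 was already set
```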
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select setbit(0,0); /* 00000000 -> 00000001 */
++--------------+
+| setbit(0, 0) |
++--------------+
+| 1 |
++--------------+
+
+select setbit(0,3); /* 00000000 -> 00001000 */
++--------------+
+| setbit(0, 3) |
++--------------+
+| 8 |
++--------------+
+
+select setbit(7,3); /* 00000111 -> 00001111 */
++--------------+
+| setbit(7, 3) |
++--------------+
+| 15 |
++--------------+
+
+select setbit(15,3); /* 00001111 -> 00001111 */
++---------------+
+| setbit(15, 3) |
++---------------+
+| 15 |
++---------------+
+
+select setbit(0,32); /* By default, 0 is a TINYINT with only 8 bits. */
+ERROR: Invalid bit position: 32
+
+select setbit(cast(0 as bigint),32); /* For BIGINT, the position can be 0..63. */
++-------------------------------+
+| setbit(cast(0 as bigint), 32) |
++-------------------------------+
+| 4294967296 |
++-------------------------------+
+
+select setbit(7,3,1); /* 00000111 -> 00001111; setting to 1 is the default */
++-----------------+
+| setbit(7, 3, 1) |
++-----------------+
+| 15 |
++-----------------+
+
+select setbit(7,2,0); /* 00000111 -> 00000011; third argument of 0 clears instead of sets */
++-----------------+
+| setbit(7, 2, 0) |
++-----------------+
+| 3 |
++-----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__shiftleft">
+ <code class="ph codeph">shiftleft(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Shifts an integer value left by a specified number of bits.
+ As the most significant bit is taken out of the original value,
+ it is discarded and the least significant bit becomes 0.
+ In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The final value has either the same number of 1 bits as the original value, or fewer.
+ Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+ a result of zero.
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Shifting any value by 0 returns the original value.
+ Shifting any value by 1 is the same as multiplying it by 2,
+ as long as the value is small enough; larger values eventually
+ become negative when shifted, as the sign bit is set.
+ Starting with the value 1 and shifting it left by N positions gives
+ the same result as 2 to the Nth power, or <code class="ph codeph">pow(2,<var class="keyword varname">N</var>)</code>.
+ </p>
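The relationships above (doubling, powers of two, and the sign bit becoming set) can be sketched for a fixed width; the 8-bit default and signed reinterpretation are assumptions of this illustration:

```python
def shiftleft(value, positions, bits=8):
    """Logical left shift within a fixed width, returning a signed result."""
    mask = (1 << bits) - 1
    r = (value << positions) & mask
    return r - (1 << bits) if r >= 1 << (bits - 1) else r

print(shiftleft(1, 3))    # 8: same as pow(2, 3) while the result still fits
print(shiftleft(127, 1))  # -2: the sign bit becomes set
print(shiftleft(-1, 4))   # -16: 11111111 -> 11110000
```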
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select shiftleft(1,0); /* 00000001 -> 00000001 */
++-----------------+
+| shiftleft(1, 0) |
++-----------------+
+| 1 |
++-----------------+
+
+select shiftleft(1,3); /* 00000001 -> 00001000 */
++-----------------+
+| shiftleft(1, 3) |
++-----------------+
+| 8 |
++-----------------+
+
+select shiftleft(8,2); /* 00001000 -> 00100000 */
++-----------------+
+| shiftleft(8, 2) |
++-----------------+
+| 32 |
++-----------------+
+
+select shiftleft(127,1); /* 01111111 -> 11111110 */
++-------------------+
+| shiftleft(127, 1) |
++-------------------+
+| -2 |
++-------------------+
+
+select shiftleft(127,5); /* 01111111 -> 11100000 */
++-------------------+
+| shiftleft(127, 5) |
++-------------------+
+| -32 |
++-------------------+
+
+select shiftleft(-1,4); /* 11111111 -> 11110000 */
++------------------+
+| shiftleft(-1, 4) |
++------------------+
+| -16 |
++------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="bit_functions__shiftright">
+ <code class="ph codeph">shiftright(integer_type a, int positions)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Shifts an integer value right by a specified number of bits.
+ As the least significant bit is taken out of the original value,
+ it is discarded and the most significant bit becomes 0.
+ In computer science terms, this operation is a <span class="q">"<a class="xref" href="https://en.wikipedia.org/wiki/Logical_shift" target="_blank">logical shift</a>"</span>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The final value has either the same number of 1 bits as the original value, or fewer.
+ Shifting an 8-bit value by 8 positions, a 16-bit value by 16 positions, and so on produces
+ a result of zero.
+ </p>
+ <p class="p">
+ Specifying a second argument of zero leaves the original value unchanged.
+ Shifting any value by 0 returns the original value.
+ Shifting any positive value right by 1 is the same as dividing it by 2.
+ Negative values become positive when shifted right.
+ </p>
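Because this is a logical (not arithmetic) shift, masking the value to its width before shifting reproduces the behavior, including negative inputs coming back positive. A sketch under the same assumed 8-bit default as above:

```python
def shiftright(value, positions, bits=8):
    """Logical right shift: mask to the fixed width first, so negative
    inputs are treated as their unsigned bit patterns."""
    mask = (1 << bits) - 1
    return (value & mask) >> positions

print(shiftright(16, 4))  # 1:   00010000 -> 00000001
print(shiftright(-1, 1))  # 127: 11111111 -> 01111111
```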
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the input value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select shiftright(16,0); /* 00010000 -> 00010000 */
++-------------------+
+| shiftright(16, 0) |
++-------------------+
+| 16 |
++-------------------+
+
+select shiftright(16,4); /* 00010000 -> 00000001 */
++-------------------+
+| shiftright(16, 4) |
++-------------------+
+| 1 |
++-------------------+
+
+select shiftright(16,5); /* 00010000 -> 00000000 */
++-------------------+
+| shiftright(16, 5) |
++-------------------+
+| 0 |
++-------------------+
+
+select shiftright(-1,1); /* 11111111 -> 01111111 */
++-------------------+
+| shiftright(-1, 1) |
++-------------------+
+| 127 |
++-------------------+
+
+select shiftright(-1,5); /* 11111111 -> 00000111 */
++-------------------+
+| shiftright(-1, 5) |
++-------------------+
+| 7 |
++-------------------+
+</code></pre>
+ </dd>
+
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_boolean.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_boolean.html b/docs/build3x/html/topics/impala_boolean.html
new file mode 100644
index 0000000..afbf2e3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_boolean.html
@@ -0,0 +1,170 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="boolean"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BOOLEAN Data Type</title></head><body id="boolean"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BOOLEAN Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, representing a
+ single true/false choice.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> BOOLEAN</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> <code class="ph codeph">TRUE</code> or <code class="ph codeph">FALSE</code>. Do not use quotation marks around the
+ <code class="ph codeph">TRUE</code> and <code class="ph codeph">FALSE</code> literal values. You can write the literal values in
+ uppercase, lowercase, or mixed case. The values queried from a table are always returned in lowercase,
+ <code class="ph codeph">true</code> or <code class="ph codeph">false</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala does not automatically convert any other type to <code class="ph codeph">BOOLEAN</code>. All
+ conversions must use an explicit call to the <code class="ph codeph">CAST()</code> function.
+ </p>
+
+ <p class="p">
+ You can use <code class="ph codeph">CAST()</code> to convert
+
+ any integer or floating-point type to
+ <code class="ph codeph">BOOLEAN</code>: a value of 0 represents <code class="ph codeph">false</code>, and any non-zero value is converted
+ to <code class="ph codeph">true</code>.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(42 AS BOOLEAN) AS nonzero_int, CAST(99.44 AS BOOLEAN) AS nonzero_decimal,
+ CAST(000 AS BOOLEAN) AS zero_int, CAST(0.0 AS BOOLEAN) AS zero_decimal;
++-------------+-----------------+----------+--------------+
+| nonzero_int | nonzero_decimal | zero_int | zero_decimal |
++-------------+-----------------+----------+--------------+
+| true | true | false | false |
++-------------+-----------------+----------+--------------+
+</code></pre>
+
+ <p class="p">
+ When you cast the opposite way, from <code class="ph codeph">BOOLEAN</code> to a numeric type,
+ the result becomes either 1 or 0:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(true AS INT) AS true_int, CAST(true AS DOUBLE) AS true_double,
+ CAST(false AS INT) AS false_int, CAST(false AS DOUBLE) AS false_double;
++----------+-------------+-----------+--------------+
+| true_int | true_double | false_int | false_double |
++----------+-------------+-----------+--------------+
+| 1 | 1 | 0 | 0 |
++----------+-------------+-----------+--------------+
+</code></pre>
+
+ <p class="p">
+
+ You can cast <code class="ph codeph">DECIMAL</code> values to <code class="ph codeph">BOOLEAN</code>, with the same treatment of zero and
+ non-zero values as the other numeric types. You cannot cast a <code class="ph codeph">BOOLEAN</code> to a
+ <code class="ph codeph">DECIMAL</code>.
+ </p>
+
+ <p class="p">
+ You cannot cast a <code class="ph codeph">STRING</code> value to <code class="ph codeph">BOOLEAN</code>, although you can cast a
+ <code class="ph codeph">BOOLEAN</code> value to <code class="ph codeph">STRING</code>, returning <code class="ph codeph">'1'</code> for
+ <code class="ph codeph">true</code> values and <code class="ph codeph">'0'</code> for <code class="ph codeph">false</code> values.
+ </p>
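+
+    <p class="p">
+      For example, the <code class="ph codeph">BOOLEAN</code>-to-<code class="ph codeph">STRING</code> cast can be
+      checked directly (illustrative query showing the <code class="ph codeph">'1'</code> and
+      <code class="ph codeph">'0'</code> result values described above):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(true AS STRING) AS true_str, CAST(false AS STRING) AS false_str;
++----------+-----------+
+| true_str | false_str |
++----------+-----------+
+| 1        | 0         |
++----------+-----------+
+</code></pre>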
+
+ <p class="p">
+ Although you can cast a <code class="ph codeph">TIMESTAMP</code> to a <code class="ph codeph">BOOLEAN</code> or a
+ <code class="ph codeph">BOOLEAN</code> to a <code class="ph codeph">TIMESTAMP</code>, the results are unlikely to be useful. Any non-zero
+ <code class="ph codeph">TIMESTAMP</code> (that is, any value other than <code class="ph codeph">1970-01-01 00:00:00</code>) becomes
+ <code class="ph codeph">TRUE</code> when converted to <code class="ph codeph">BOOLEAN</code>, while <code class="ph codeph">1970-01-01 00:00:00</code>
+      becomes <code class="ph codeph">FALSE</code>. A value of <code class="ph codeph">FALSE</code> becomes <code class="ph codeph">1970-01-01
+      00:00:00</code> when converted to <code class="ph codeph">TIMESTAMP</code>, and <code class="ph codeph">TRUE</code> becomes one second
+      past this epoch date, that is, <code class="ph codeph">1970-01-01 00:00:01</code>.
+ </p>
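+
+    <p class="p">
+      A sketch of the <code class="ph codeph">BOOLEAN</code>-to-<code class="ph codeph">TIMESTAMP</code> direction
+      (illustrative; the displayed values assume the default UTC interpretation of
+      <code class="ph codeph">TIMESTAMP</code>):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT CAST(false AS TIMESTAMP) AS false_ts, CAST(true AS TIMESTAMP) AS true_ts;
++---------------------+---------------------+
+| false_ts            | true_ts             |
++---------------------+---------------------+
+| 1970-01-01 00:00:00 | 1970-01-01 00:00:01 |
++---------------------+---------------------+
+</code></pre>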
+
+ <p class="p">
+ <strong class="ph b">NULL considerations:</strong> An expression of this type produces a <code class="ph codeph">NULL</code> value if any
+ argument of the expression is <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong>
+ </p>
+
+ <p class="p">
+ Do not use a <code class="ph codeph">BOOLEAN</code> column as a partition key. Although you can create such a table,
+ subsequent operations produce errors:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table truth_table (assertion string) partitioned by (truth boolean);
+[localhost:21000] > insert into truth_table values ('Pigs can fly',false);
+ERROR: AnalysisException: INSERT into table with BOOLEAN partition column (truth) is not supported: partitioning.truth_table
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SELECT 1 < 2;
+SELECT 2 = 5;
+SELECT 100 < NULL, 100 > NULL;
+CREATE TABLE assertions (claim STRING, really BOOLEAN);
+INSERT INTO assertions VALUES
+ ("1 is less than 2", 1 < 2),
+ ("2 is the same as 5", 2 = 5),
+ ("Grass is green", true),
+ ("The moon is made of green cheese", false);
+SELECT claim FROM assertions WHERE really = TRUE;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_literals.html#boolean_literals">Boolean Literals</a>,
+ <a class="xref" href="impala_operators.html#operators">SQL Operators</a>,
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_float.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_float.html b/docs/build3x/html/topics/impala_float.html
new file mode 100644
index 0000000..53661d0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_float.html
@@ -0,0 +1,153 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="float"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>FLOAT Data Type</title></head><body id="float"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">FLOAT Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A single precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+ TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> FLOAT</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 1.40129846432481707e-45 .. 3.40282346638528860e+38, positive or negative
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong> 6 to 9 significant digits, depending on usage. The number of significant digits does
+ not depend on the position of the decimal point.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Representation:</strong> The values are stored in 4 bytes, using
+ <a class="xref" href="https://en.wikipedia.org/wiki/Single-precision_floating-point_format" target="_blank">IEEE 754 Single Precision Binary Floating Point</a> format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala automatically converts <code class="ph codeph">FLOAT</code> to more precise
+ <code class="ph codeph">DOUBLE</code> values, but not the other way around. You can use <code class="ph codeph">CAST()</code> to convert
+ <code class="ph codeph">FLOAT</code> values to <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>,
+ <code class="ph codeph">BIGINT</code>, <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>.
+ You can use exponential notation in <code class="ph codeph">FLOAT</code> literals or when casting from
+ <code class="ph codeph">STRING</code>, for example <code class="ph codeph">1.0e6</code> to represent one million.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
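+
+    <p class="p">
+      For example (illustrative queries; the <code class="ph codeph">TIMESTAMP</code> result assumes the
+      default UTC interpretation):
+    </p>
+
+<pre class="pre codeblock"><code>-- Exponential notation in a literal and in a cast from STRING.
+SELECT CAST(1.0e6 AS FLOAT) AS a_million, CAST('1.5e-2' AS FLOAT) AS small_val;
+
+-- Casting N seconds past the start of the epoch.
+SELECT CAST(100 AS TIMESTAMP) AS epoch_plus_100;
++---------------------+
+| epoch_plus_100      |
++---------------------+
+| 1970-01-01 00:01:40 |
++---------------------+
+</code></pre>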
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+
+ <p class="p">
+ Impala does not evaluate NaN (not a number) as equal to any other numeric values,
+ including other NaN values. For example, the following statement, which evaluates equality
+ between two NaN values, returns <code class="ph codeph">false</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+SELECT CAST('nan' AS FLOAT)=CAST('nan' AS FLOAT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x FLOAT);
+SELECT CAST(1000.5 AS FLOAT);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+ type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+ expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a 4-byte value.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+ high-performance hardware instructions, and distributed queries can perform these operations in different
+ order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+ <p class="p">
+ The inability to exactly represent certain floating-point values means that
+ <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+ transferring data from other database systems that use different representations
+ or file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+ <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_functions.html b/docs/build3x/html/topics/impala_functions.html
new file mode 100644
index 0000000..44fa0c2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_functions.html
@@ -0,0 +1,162 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_math_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_bit_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conversion_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datetime_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_conditional_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_string_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_misc_functions.html"><meta name="DC.Relation" scheme="URI" content=
"../topics/impala_aggregate_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_analytic_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_udf.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="builtins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Built-In Functions</title></head><body id="builtins"><main role="main"><article role="article" aria-labelledby="builtins__title_functions">
+
+ <h1 class="title topictitle1" id="builtins__title_functions">Impala Built-In Functions</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ Impala supports several categories of built-in functions. These functions let you perform mathematical
+ calculations, string manipulation, date calculations, and other kinds of data transformations directly in
+ <code class="ph codeph">SELECT</code> statements. The built-in functions let a SQL query return results with all
+ formatting, calculating, and type conversions applied, rather than performing time-consuming postprocessing
+ in another application. By applying function calls where practical, you can make a SQL query that is as
+ convenient as an expression in a procedural programming language or a formula in a spreadsheet.
+ </p>
+
+ <p class="p">
+ The categories of functions supported by Impala are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+ </li>
+
+ <li class="li">
+ Aggregation functions, explained in <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </li>
+ </ul>
+
+ <p class="p">
+ You call any of these functions through the <code class="ph codeph">SELECT</code> statement. For most functions, you can
+ omit the <code class="ph codeph">FROM</code> clause and supply literal values for any required arguments:
+ </p>
+
+<pre class="pre codeblock"><code>select abs(-1);
++---------+
+| abs(-1) |
++---------+
+| 1 |
++---------+
+
+select concat('The rain ', 'in Spain');
++---------------------------------+
+| concat('the rain ', 'in spain') |
++---------------------------------+
+| The rain in Spain |
++---------------------------------+
+
+select power(2,5);
++-------------+
+| power(2, 5) |
++-------------+
+| 32 |
++-------------+
+</code></pre>
+
+ <p class="p">
+ When you use a <code class="ph codeph">FROM</code> clause and specify a column name as a function argument, the function is
+ applied for each item in the result set:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>select concat('Country = ',country_code) from all_countries where population > 100000000;
+select round(price) as dollar_value from product_catalog where price between 0.0 and 100.0;
+</code></pre>
+
+ <p class="p">
+ Typically, if any argument to a built-in function is <code class="ph codeph">NULL</code>, the result value is also
+ <code class="ph codeph">NULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select cos(null);
++-----------+
+| cos(null) |
++-----------+
+| NULL |
++-----------+
+
+select power(2,null);
++----------------+
+| power(2, null) |
++----------------+
+| NULL |
++----------------+
+
+select concat('a',null,'b');
++------------------------+
+| concat('a', null, 'b') |
++------------------------+
+| NULL |
++------------------------+
+</code></pre>
+
+ <p class="p">
+ Aggregate functions are a special category with different rules. These functions calculate a return value
+ across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+ </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age > 20;
+</code></pre>
+
+ <p class="p">
+ Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+ result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+ ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+ <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+ <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+ </p>
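+
+    <p class="p">
+      A small demonstration of this <code class="ph codeph">NULL</code> handling (hypothetical table; here
+      <code class="ph codeph">COUNT(x)</code> returns 2, <code class="ph codeph">COUNT(*)</code> returns 3, and
+      <code class="ph codeph">AVG(x)</code> averages only the two non-<code class="ph codeph">NULL</code> values):
+    </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE nullable_vals (x INT);
+INSERT INTO nullable_vals VALUES (1), (NULL), (3);
+SELECT COUNT(x) AS non_null_count, COUNT(*) AS all_rows, AVG(x) AS avg_non_null
+  FROM nullable_vals;
+</code></pre>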
+
+ <p class="p">
+ Analytic functions are a variation on aggregate functions. Instead of returning a single value, or an
+ identical value for each group of rows, they can compute values that vary based on a <span class="q">"window"</span> consisting
+ of other rows around them in the result set.
+ </p>
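+
+    <p class="p">
+      As a sketch, an analytic function uses an <code class="ph codeph">OVER</code> clause to define that
+      window (the <code class="ph codeph">category</code> column here is hypothetical):
+    </p>
+
+<pre class="pre codeblock"><code>SELECT name, price,
+  AVG(price) OVER (PARTITION BY category) AS avg_price_in_category
+FROM product_catalog;
+</code></pre>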
+
+ <p class="p toc"></p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_math_functions.html">Impala Mathematical Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_bit_functions.html">Impala Bit Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conversion_functions.html">Impala Type Conversion Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_datetime_functions.html">Impala Date and Time Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_conditional_functions.html">Impala Conditional Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_string_functions.html">Impala String Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_misc_functions.html">Impala Miscellaneous Functions</
a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_analytic_functions.html">Impala Analytic Functions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_udf.html">Impala User-Defined Functions (UDFs)</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_functions_overview.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_functions_overview.html b/docs/build3x/html/topics/impala_functions_overview.html
new file mode 100644
index 0000000..fef454e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_functions_overview.html
@@ -0,0 +1,109 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Functions</title></head><body id="functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Functions let you apply arithmetic, string, or other computations and transformations to Impala data. You
+ typically use them in <code class="ph codeph">SELECT</code> lists and <code class="ph codeph">WHERE</code> clauses to filter and format
+ query results so that the result set is exactly what you want, with no further processing needed on the
+ application side.
+ </p>
+
+ <p class="p">
+ Scalar functions return a single result for each input row. See <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select name, population from country where continent = 'North America' order by population desc limit 4;
+[localhost:21000] > select upper(name), population from country where continent = 'North America' order by population desc limit 4;
++-------------+------------+
+| upper(name) | population |
++-------------+------------+
+| USA | 320000000 |
+| MEXICO | 122000000 |
+| CANADA | 25000000 |
+| GUATEMALA | 16000000 |
++-------------+------------+
+</code></pre>
+ <p class="p">
+ Aggregate functions combine the results from multiple rows:
+ either a single result for the entire table, or a separate result for each group of rows.
+ Aggregate functions are frequently used in combination with <code class="ph codeph">GROUP BY</code>
+ and <code class="ph codeph">HAVING</code> clauses in the <code class="ph codeph">SELECT</code> statement.
+ See <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select continent, <strong class="ph b">sum(population)</strong> as howmany from country <strong class="ph b">group by continent</strong> order by howmany desc;
++---------------+------------+
+| continent | howmany |
++---------------+------------+
+| Asia | 4298723000 |
+| Africa | 1110635000 |
+| Europe | 742452000 |
+| North America | 565265000 |
+| South America | 406740000 |
+| Oceania | 38304000 |
++---------------+------------+
+</code></pre>
+
+ <p class="p">
+ User-defined functions (UDFs) let you code your own logic. They can be either scalar or aggregate functions.
+ UDFs let you implement important business or scientific logic using high-performance code for Impala to automatically parallelize.
+ You can also use UDFs to implement convenience functions to simplify reporting or porting SQL from other database systems.
+ See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select <strong class="ph b">rot13('Hello world!')</strong> as 'Weak obfuscation';
++------------------+
+| weak obfuscation |
++------------------+
+| Uryyb jbeyq! |
++------------------+
+[localhost:21000] > select <strong class="ph b">likelihood_of_new_subatomic_particle(sensor1, sensor2, sensor3)</strong> as probability
+ > from experimental_results group by experiment;
+</code></pre>
+
+ <p class="p">
+ Each function is associated with a specific database. For example, if you issue a <code class="ph codeph">USE somedb</code>
+ statement followed by <code class="ph codeph">CREATE FUNCTION somefunc</code>, the new function is created in the
+ <code class="ph codeph">somedb</code> database, and you could refer to it through the fully qualified name
+ <code class="ph codeph">somedb.somefunc</code>. You could then issue another <code class="ph codeph">USE</code> statement
+ and create a function with the same name in a different database.
+ </p>
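+
+    <p class="p">
+      For example (a hypothetical UDF; the library path and symbol name are illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>USE somedb;
+CREATE FUNCTION somefunc(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/libudf.so' SYMBOL='MyUdf';
+SELECT somedb.somefunc(claim) FROM assertions;
+</code></pre>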
+
+ <p class="p">
+ Impala built-in functions are associated with a special database named <code class="ph codeph">_impala_builtins</code>,
+ which lets you refer to them from any database without qualifying the name.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > show databases;
++-------------------------+
+| name |
++-------------------------+
+| <strong class="ph b">_impala_builtins</strong> |
+| analytic_functions |
+| avro_testing |
+| data_file_size |
+...
+[localhost:21000] > show functions in _impala_builtins like '*subs*';
++-------------+-----------------------------------+
+| return type | signature |
++-------------+-----------------------------------+
+| STRING | substr(STRING, BIGINT) |
+| STRING | substr(STRING, BIGINT, BIGINT) |
+| STRING | substring(STRING, BIGINT) |
+| STRING | substring(STRING, BIGINT, BIGINT) |
++-------------+-----------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related statements:</strong> <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>,
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_grant.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_grant.html b/docs/build3x/html/topics/impala_grant.html
new file mode 100644
index 0000000..33b0a45
--- /dev/null
+++ b/docs/build3x/html/topics/impala_grant.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="grant"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GRANT Statement (Impala 2.0 or higher only)</title></head><body id="grant"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GRANT Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The
+ <code class="ph codeph">GRANT</code> statement grants a privilege on a specified object
+ to a role or grants a role to a group.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE <var class="keyword varname">role_name</var> TO GROUP <var class="keyword varname">group_name</var>
+
+GRANT <var class="keyword varname">privilege</var> ON <var class="keyword varname">object_type</var> <var class="keyword varname">object_name</var>
+ TO [ROLE] <var class="keyword varname">roleName</var>
+ [WITH GRANT OPTION]
+
+<span class="ph" id="grant__privileges">privilege ::= ALL | ALTER | CREATE | DROP | INSERT | REFRESH | SELECT | SELECT(<var class="keyword varname">column_name</var>)</span>
+<span class="ph" id="grant__priv_objs">object_type ::= TABLE | DATABASE | SERVER | URI</span>
+</code></pre>
+
+ <p class="p">
+ Typically, the object name is an identifier. For URIs, it is a string literal.
+ </p>
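+
+  <p class="p">
+    For example (the role, group, table, and URI names are hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE analyst_role TO GROUP analysts;
+GRANT SELECT ON TABLE sales.orders TO ROLE analyst_role;
+GRANT ALL ON URI 'hdfs://nameservice1/warehouse/staging' TO ROLE etl_role;
+</code></pre>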
+
+
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (initially, a predefined set of users
+ specified in the Sentry service configuration file) can use this
+ statement.
+ </p>
+ <p class="p">Only Sentry administrative users can grant roles to a group. </p>
+
+  <p class="p"> The <code class="ph codeph">WITH GRANT OPTION</code> clause allows members of the
+        specified role to issue <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code>
+        statements for those same privileges. Hence, if a role has the
+        <code class="ph codeph">ALL</code> privilege on a database and the <code class="ph codeph">WITH GRANT
+          OPTION</code> set, users granted that role can execute
+        <code class="ph codeph">GRANT</code>/<code class="ph codeph">REVOKE</code> statements only for that
+        database or for tables within that database. This means a user could revoke
+        the privileges of the user that granted them the <code class="ph codeph">GRANT
+          OPTION</code>. </p>
+
+ <p class="p"> Impala does not currently support revoking only the <code class="ph codeph">WITH GRANT
+ OPTION</code> from a privilege previously granted to a role. To remove
+ the <code class="ph codeph">WITH GRANT OPTION</code>, revoke the privilege and grant it
+ again without the <code class="ph codeph">WITH GRANT OPTION</code> flag. </p>
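+
+ <p class="p">
+ For example, using hypothetical database and role names, the following sequence removes the
+ grant option by revoking the privilege and granting it again without the flag:
+ </p>
+
+<pre class="pre codeblock"><code>-- Originally granted as:
+--   GRANT ALL ON DATABASE sales_db TO ROLE report_role WITH GRANT OPTION;
+REVOKE ALL ON DATABASE sales_db FROM ROLE report_role;
+GRANT ALL ON DATABASE sales_db TO ROLE report_role;
+</code></pre>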
+
+ <p class="p">
+ The ability to grant or revoke <code class="ph codeph">SELECT</code> privilege on specific columns is available
+ in <span class="keyword">Impala 2.3</span> and higher. See <span class="xref">the documentation for Apache Sentry</span> for details.
+ </p>
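+
+ <p class="p">
+ For example, the following statements sketch a typical sequence, using hypothetical role,
+ group, database, table, and column names:
+ </p>
+
+<pre class="pre codeblock"><code>GRANT ROLE analyst_role TO GROUP analysts;
+GRANT SELECT ON TABLE sales_db.orders TO ROLE analyst_role;
+GRANT SELECT(customer_id) ON TABLE sales_db.customers TO ROLE analyst_role;
+</code></pre>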
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You can only grant the <code class="ph codeph">ALL</code> privilege to the
+ <code class="ph codeph">URI</code> object. Finer-grained privileges mentioned below on
+ a <code class="ph codeph">URI</code> are not supported.
+ </p>
+
+ <div class="p">
+ Starting in <span class="keyword">Impala 3.0</span>, finer-grained privileges
+ are enforced as below.<table class="simpletable frame-all" id="grant__simpletable_kmb_ppn_ndb"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><col style="width:33.33333333333333%"><thead><tr class="sthead">
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__1">Privilege</th>
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__2">Scope</th>
+ <th class="stentry" id="grant__simpletable_kmb_ppn_ndb__stentry__3">SQL Allowed to Execute</th>
+ </tr></thead><tbody><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on all tables in all
+ databases<p class="p"><code class="ph codeph">REFRESH</code> on all tables and functions
+ in all databases</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on all tables in the
+ named database<p class="p"><code class="ph codeph">REFRESH</code> on all tables and
+ functions in the named database</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">REFRESH</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">INVALIDATE METADATA</code> on the named
+ table<p class="p"><code class="ph codeph">REFRESH</code> on the named
+ table</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">CREATE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">CREATE DATABASE</code> on all
+ databases<p class="p"><code class="ph codeph">CREATE TABLE</code> on all
+ tables</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">CREATE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">CREATE TABLE</code> on all tables in the named
+ database</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP DATABASE</code> on all databases<p class="p"><code class="ph codeph">DROP
+ TABLE</code> on all tables</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP DATABASE</code> on the named
+ database<p class="p"><code class="ph codeph">DROP TABLE</code> on all tables in the
+ named database</p></td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">DROP</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">DROP TABLE</code> on the named table</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">SERVER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on all tables</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">DATABASE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on the tables in the named
+ database</td>
+ </tr><tr class="strow">
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__1"><code class="ph codeph">ALTER</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__2"><code class="ph codeph">TABLE</code></td>
+ <td class="stentry" headers="grant__simpletable_kmb_ppn_ndb__stentry__3"><code class="ph codeph">ALTER TABLE</code> on the named table</td>
+ </tr></tbody></table>
+ </div>
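+
+ <p class="p">
+ For example, granting the finer-grained <code class="ph codeph">REFRESH</code> privilege at the
+ database scope, using hypothetical database and role names:
+ </p>
+
+<pre class="pre codeblock"><code>GRANT REFRESH ON DATABASE sales_db TO ROLE etl_role;
+</code></pre>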
+
+ <div class="p">
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ALTER TABLE RENAME</code> requires the
+ <code class="ph codeph">ALTER</code> privilege at the <code class="ph codeph">TABLE</code>
+ level and the <code class="ph codeph">CREATE</code> privilege at the
+ <code class="ph codeph">DATABASE</code> level.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> requires the
+ <code class="ph codeph">CREATE</code> privilege on the database that should
+ contain the new table and the <code class="ph codeph">SELECT</code> privilege on
+ the tables referenced in the query portion of the statement.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COMPUTE STATS</code> requires the
+ <code class="ph codeph">ALTER</code> and <code class="ph codeph">SELECT</code> privileges on
+ the target table.
+ </li>
+ </ul>
+ </div>
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <div class="p">
+ <ul class="ul">
+ <li class="li">
+ The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements are available in
+ <span class="keyword">Impala 2.0</span> and later.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 1.4</span> and later, Impala can make use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Hive, when your system is configured to
+ use the Sentry service instead of the file-based policy mechanism.
+ </li>
+
+ <li class="li">
+ The Impala <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements for privileges do not require
+ the <code class="ph codeph">ROLE</code> keyword to be repeated before each role name, unlike the equivalent Hive
+ statements.
+ </li>
+
+ <li class="li">
+ Currently, each Impala <code class="ph codeph">GRANT</code> or <code class="ph codeph">REVOKE</code> statement can only grant or
+ revoke a single privilege to or from a single role.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <div class="p">
+ Access to Kudu tables must be granted to and revoked from roles with the
+ following considerations:
+ <ul class="ul">
+ <li class="li">
+ Only users with the <code class="ph codeph">ALL</code> privilege on
+ <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">ALL</code> privilege on <code class="ph codeph">SERVER</code> is
+ required to specify the <code class="ph codeph">kudu.master_addresses</code>
+ property in the <code class="ph codeph">CREATE TABLE</code> statements for managed
+ tables as well as external tables.
+ </li>
+ <li class="li">
+ Access to Kudu tables is enforced at the table level and at the
+ column level.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">SELECT</code> and <code class="ph codeph">INSERT</code>
+ privileges are supported.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">DELETE</code>, <code class="ph codeph">UPDATE</code>, and
+ <code class="ph codeph">UPSERT</code> operations require the <code class="ph codeph">ALL</code>
+ privilege.
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary
+ and subject to change.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_drop_role.html#drop_role">DROP ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_group_by.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_group_by.html b/docs/build3x/html/topics/impala_group_by.html
new file mode 100644
index 0000000..bcc6c1d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_group_by.html
@@ -0,0 +1,140 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP BY Clause</title></head><body id="group_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GROUP BY Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Specify the <code class="ph codeph">GROUP BY</code> clause in queries that use aggregation functions, such as
+ <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, and
+ <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>. Specify in the
+ <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause the names of all the
+ columns that do not participate in the aggregation operation.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+ be referenced directly in the <code class="ph codeph">GROUP BY</code> clause.
+ When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+ of the complex type, and within the join query you can include a <code class="ph codeph">GROUP BY</code>
+ clause to aggregate over the scalar elements from the complex type.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, the following query finds the 5 items that sold the highest total quantity (using the
+ <code class="ph codeph">SUM()</code> function), and also counts the number of sales transactions for those items
+ (using the <code class="ph codeph">COUNT()</code> function). Because the column representing the item IDs is not
+ used in any aggregation function, we specify that column in the <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>select
+ <strong class="ph b">ss_item_sk</strong> as Item,
+ <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+ <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+ <strong class="ph b">group by ss_item_sk</strong>
+ order by sum(ss_quantity) desc
+ limit 5;
++-------+-----------------+--------------------------+
+| item | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 9325 | 372 | 19072 |
+| 4279 | 357 | 18501 |
+| 7507 | 371 | 18475 |
+| 5953 | 369 | 18451 |
+| 16753 | 375 | 18446 |
++-------+-----------------+--------------------------+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">HAVING</code> clause lets you filter the results of aggregate functions, because you cannot
+ refer to those expressions in the <code class="ph codeph">WHERE</code> clause. For example, to find the 5 lowest-selling
+ items that were included in at least 100 sales transactions, we could use this query:
+ </p>
+
+<pre class="pre codeblock"><code>select
+ <strong class="ph b">ss_item_sk</strong> as Item,
+ <strong class="ph b">count</strong>(ss_item_sk) as Times_Purchased,
+ <strong class="ph b">sum</strong>(ss_quantity) as Total_Quantity_Purchased
+from store_sales
+ <strong class="ph b">group by ss_item_sk</strong>
+ <strong class="ph b">having times_purchased >= 100</strong>
+ order by sum(ss_quantity)
+ limit 5;
++-------+-----------------+--------------------------+
+| item | times_purchased | total_quantity_purchased |
++-------+-----------------+--------------------------+
+| 13943 | 105 | 4087 |
+| 2992 | 101 | 4176 |
+| 4773 | 107 | 4204 |
+| 14350 | 103 | 4260 |
+| 11956 | 102 | 4275 |
++-------+-----------------+--------------------------+</code></pre>
+
+ <p class="p">
+ When performing calculations involving scientific or financial data, remember that columns with type
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> are stored as true floating-point numbers, which cannot
+ precisely represent every possible fractional value. Thus, if you include a <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> column in a <code class="ph codeph">GROUP BY</code> clause, the results might not precisely match
+ literal values in your query or from an original Text data file. Use rounding operations, the
+ <code class="ph codeph">BETWEEN</code> operator, or another arithmetic technique to match floating-point values that are
+ <span class="q">"near"</span> literal values you expect. For example, this query on the <code class="ph codeph">ss_wholesale_cost</code>
+ column returns cost values that are close but not identical to the original figures that were entered as
+ decimal fractions.
+ </p>
+
+<pre class="pre codeblock"><code>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale
+ from store_sales
+ group by ss_wholesale_cost
+ order by avg_revenue_per_sale desc
+ limit 5;
++-------------------+----------------------+
+| ss_wholesale_cost | avg_revenue_per_sale |
++-------------------+----------------------+
+| 96.94000244140625 | 4454.351539300434 |
+| 95.93000030517578 | 4423.119941283189 |
+| 98.37999725341797 | 4332.516490316291 |
+| 97.97000122070312 | 4330.480601655014 |
+| 98.52999877929688 | 4291.316953108634 |
++-------------------+----------------------+</code></pre>
+
+ <p class="p">
+ Notice how wholesale cost values originally entered as decimal fractions such as <code class="ph codeph">96.94</code> and
+ <code class="ph codeph">98.38</code> are slightly larger or smaller in the result set, due to precision limitations in the
+ hardware floating-point types. The imprecise representation of <code class="ph codeph">FLOAT</code> and
+ <code class="ph codeph">DOUBLE</code> values is why financial data processing systems often store currency using data
+ types, such as <code class="ph codeph">DECIMAL</code>, that are less space-efficient but avoid these rounding errors.
+ </p>
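+
+ <p class="p">
+ For example, the following sketch (against the same hypothetical table) uses the
+ <code class="ph codeph">BETWEEN</code> operator to match wholesale cost values <span class="q">"near"</span>
+ an expected figure, instead of testing for exact equality:
+ </p>
+
+<pre class="pre codeblock"><code>select ss_wholesale_cost, avg(ss_quantity * ss_sales_price) as avg_revenue_per_sale
+  from store_sales
+  where ss_wholesale_cost between 96.93 and 96.95
+  group by ss_wholesale_cost;
+</code></pre>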
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_group_concat.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_group_concat.html b/docs/build3x/html/topics/impala_group_concat.html
new file mode 100644
index 0000000..3a390c0
--- /dev/null
+++ b/docs/build3x/html/topics/impala_group_concat.html
@@ -0,0 +1,141 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="group_concat"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>GROUP_CONCAT Function</title></head><body id="group_concat"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">GROUP_CONCAT Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns a single string built by concatenating the argument values from
+ each row of the result set. If the optional separator string is specified, the separator is added between
+ each pair of concatenated values. The default separator is a comma followed by a space.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>GROUP_CONCAT([ALL<span class="ph"> | DISTINCT</span>] <var class="keyword varname">expression</var> [, <var class="keyword varname">separator</var>])</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> <code class="ph codeph">concat()</code> and <code class="ph codeph">concat_ws()</code> are appropriate for
+ concatenating the values of multiple columns within the same row, while <code class="ph codeph">group_concat()</code>
+ joins together values from different rows.
+ </p>
+
+ <p class="p">
+ By default, this function returns a single string covering the whole result set. To include other columns or values in the
+ result set, or to produce multiple concatenated strings for subsets of rows, include a <code class="ph codeph">GROUP
+ BY</code> clause in the query.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ This function cannot be used in an analytic context. That is, the <code class="ph codeph">OVER()</code> clause is not allowed at all with this function.
+ </p>
+
+ <p class="p">
+ Currently, Impala returns an error if the result value grows larger than 1 GiB.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples illustrate various aspects of the <code class="ph codeph">GROUP_CONCAT()</code> function.
+ </p>
+
+ <p class="p">
+ You can call the function directly on a <code class="ph codeph">STRING</code> column. To use it with a numeric column, cast
+ the value to <code class="ph codeph">STRING</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int, s string);
+[localhost:21000] > insert into t1 values (1, "one"), (3, "three"), (2, "two"), (1, "one");
+[localhost:21000] > select group_concat(s) from t1;
++----------------------+
+| group_concat(s) |
++----------------------+
+| one, three, two, one |
++----------------------+
+[localhost:21000] > select group_concat(cast(x as string)) from t1;
++---------------------------------+
+| group_concat(cast(x as string)) |
++---------------------------------+
+| 1, 3, 2, 1 |
++---------------------------------+
+</code></pre>
+
+ <p class="p">
+ Specify the <code class="ph codeph">DISTINCT</code> keyword to eliminate duplicate values from
+ the concatenated result:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > select group_concat(distinct s) from t1;
++--------------------------+
+| group_concat(distinct s) |
++--------------------------+
+| three, two, one |
++--------------------------+
+</code></pre>
+
+ <p class="p">
+ The optional separator lets you format the result in flexible ways. The separator can be an arbitrary string
+ expression, not just a single character.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select group_concat(s,"|") from t1;
++----------------------+
+| group_concat(s, '|') |
++----------------------+
+| one|three|two|one |
++----------------------+
+[localhost:21000] > select group_concat(s,'---') from t1;
++-------------------------+
+| group_concat(s, '---') |
++-------------------------+
+| one---three---two---one |
++-------------------------+
+</code></pre>
+
+ <p class="p">
+ The default separator is a comma followed by a space. To get a comma-delimited result without extra spaces,
+ specify a delimiter character that is only a comma.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select group_concat(s,',') from t1;
++----------------------+
+| group_concat(s, ',') |
++----------------------+
+| one,three,two,one |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ Including a <code class="ph codeph">GROUP BY</code> clause lets you produce a different concatenated result for each group
+ in the result set. In this example, the only <code class="ph codeph">X</code> value that occurs more than once is
+ <code class="ph codeph">1</code>, so that is the only row in the result set where <code class="ph codeph">GROUP_CONCAT()</code> returns a
+ delimited value. For groups containing a single value, <code class="ph codeph">GROUP_CONCAT()</code> returns the original
+ value of its <code class="ph codeph">STRING</code> argument.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x, group_concat(s) from t1 group by x;
++---+-----------------+
+| x | group_concat(s) |
++---+-----------------+
+| 2 | two |
+| 3 | three |
+| 1 | one, one |
++---+-----------------+
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hadoop.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hadoop.html b/docs/build3x/html/topics/impala_hadoop.html
new file mode 100644
index 0000000..30c0a97
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hadoop.html
@@ -0,0 +1,138 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_hadoop"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>How Impala Fits Into the Hadoop Ecosystem</title></head><body id="intro_hadoop"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">How Impala Fits Into the Hadoop Ecosystem</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala makes use of many familiar components within the Hadoop ecosystem. Impala can interchange data with
+ other Hadoop components, as both a consumer and a producer, so it can fit in flexible ways into your ETL and
+ ELT pipelines.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_hadoop__intro_hive">
+
+ <h2 class="title topictitle2" id="ariaid-title2">How Impala Works with Hive</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A major Impala goal is to make SQL-on-Hadoop operations fast and efficient enough to appeal to new
+ categories of users and open up Hadoop to new types of use cases. Where practical, it makes use of existing
+ Apache Hive infrastructure that many Hadoop users already have in place to perform long-running,
+ batch-oriented SQL queries.
+ </p>
+
+ <p class="p">
+ In particular, Impala keeps its table definitions in a traditional MySQL or PostgreSQL database known as
+ the <strong class="ph b">metastore</strong>, the same database where Hive keeps this type of data. Thus, Impala can access tables
+ defined or loaded by Hive, as long as all columns use Impala-supported data types, file formats, and
+ compression codecs.
+ </p>
+
+ <p class="p">
+ The initial focus on query features and performance means that Impala can read more types of data with the
+ <code class="ph codeph">SELECT</code> statement than it can write with the <code class="ph codeph">INSERT</code> statement. To query
+ data using the Avro, RCFile, or SequenceFile <a class="xref" href="impala_file_formats.html#file_formats">file
+ formats</a>, you load the data using Hive.
+ </p>
+
+ <p class="p">
+ The Impala query optimizer can also make use of <a class="xref" href="impala_perf_stats.html#perf_table_stats">table
+ statistics</a> and <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a>.
+ Originally, you gathered this information with the <code class="ph codeph">ANALYZE TABLE</code> statement in Hive; in
+ Impala 1.2.2 and higher, use the Impala <code class="ph codeph"><a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE
+ STATS</a></code> statement instead. <code class="ph codeph">COMPUTE STATS</code> requires less setup, is more
+ reliable, and does not require switching back and forth between <span class="keyword cmdname">impala-shell</span>
+ and the Hive shell.
+ </p>
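+
+ <p class="p">
+ For example, with a hypothetical table name:
+ </p>
+
+<pre class="pre codeblock"><code>COMPUTE STATS sales_db.store_sales;
+</code></pre>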
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_hadoop__intro_metastore">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Metadata and the Metastore</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As discussed in <a class="xref" href="impala_hadoop.html#intro_hive">How Impala Works with Hive</a>, Impala maintains information about table
+ definitions in a central database known as the <strong class="ph b">metastore</strong>. Impala also tracks other metadata for the
+ low-level characteristics of data files:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The physical locations of blocks within HDFS.
+ </li>
+ </ul>
+
+ <p class="p">
+ For tables with a large volume of data and/or many partitions, retrieving all the metadata for a table can
+ be time-consuming, taking minutes in some cases. Thus, each Impala node caches all of this metadata to
+ reuse for future queries against the same table.
+ </p>
+
+ <p class="p">
+ If the table definition or the data in the table is updated, all other Impala daemons in the cluster must
+ receive the latest metadata, replacing the obsolete cached metadata, before issuing a query against that
+ table. In Impala 1.2 and higher, the metadata update is automatic, coordinated through the
+ <span class="keyword cmdname">catalogd</span> daemon, for all DDL and DML statements issued through Impala. See
+ <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for details.
+ </p>
+
+ <p class="p">
+ For DDL and DML issued through Hive, or changes made manually to files in HDFS, you still use the
+ <code class="ph codeph">REFRESH</code> statement (when new data files are added to existing tables) or the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement (for entirely new tables, or after dropping a table,
+ performing an HDFS rebalance operation, or deleting data files). Issuing <code class="ph codeph">INVALIDATE
+ METADATA</code> by itself retrieves metadata for all the tables tracked by the metastore. If you know
+ that only specific tables have been changed outside of Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> for each affected table to only retrieve the latest metadata for
+ those tables.
+ </p>
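+
+ <p class="p">
+ For example, after loading new data files into an existing table through Hive or HDFS commands, or
+ after creating a new table through Hive, you might issue statements such as the following in
+ <span class="keyword cmdname">impala-shell</span>. (The table names here are hypothetical, for
+ illustration only.)
+ </p>
+
+<pre class="pre codeblock"><code>-- New data files were added to an existing table outside of Impala:
+REFRESH sales_data;
+
+-- A new table was created through Hive:
+INVALIDATE METADATA new_table;
+</code></pre>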
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_hadoop__intro_hdfs">
+
+ <h2 class="title topictitle2" id="ariaid-title4">How Impala Uses HDFS</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala uses the distributed filesystem HDFS as its primary data storage medium. Impala relies on the
+ redundancy provided by HDFS to guard against hardware or network outages on individual nodes. Impala table
+ data is physically represented as data files in HDFS, using familiar HDFS file formats and compression
+ codecs. When data files are present in the directory for a new table, Impala reads them all, regardless of
+ file name. New data is added in files with names controlled by Impala.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="intro_hadoop__intro_hbase">
+
+ <h2 class="title topictitle2" id="ariaid-title5">How Impala Uses HBase</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HBase is an alternative to HDFS as a storage medium for Impala data. It is a database storage system built
+ on top of HDFS, without built-in SQL support. Many Hadoop users already have it configured and store large
+ (often sparse) data sets in it. By defining tables in Impala and mapping them to equivalent tables in
+ HBase, you can query the contents of the HBase tables through Impala, and even perform join queries
+ including both Impala and HBase tables. See <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+ </p>
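+
+ <p class="p">
+ For example, a mapping between an HBase table and an Impala-queryable table is typically defined
+ through the Hive shell with a statement such as the following. (The table, column, and column
+ family names here are hypothetical; see
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>
+ for the full requirements.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Issued through the Hive shell, not impala-shell:
+CREATE EXTERNAL TABLE hbase_events (id STRING, event_type STRING, payload STRING)
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES ("hbase.columns.mapping" = ":key,cf1:event_type,cf1:payload")
+TBLPROPERTIES ("hbase.table.name" = "events");
+</code></pre>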
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_having.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_having.html b/docs/build3x/html/topics/impala_having.html
new file mode 100644
index 0000000..dd255ab
--- /dev/null
+++ b/docs/build3x/html/topics/impala_having.html
@@ -0,0 +1,39 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="having"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HAVING Clause</title></head><body id="having"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HAVING Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Performs a filter operation on a <code class="ph codeph">SELECT</code> query, by examining the results of aggregation
+ functions rather than testing each individual table row. Therefore, it is always used in conjunction with a
+ function such as <code class="ph codeph"><a class="xref" href="impala_count.html#count">COUNT()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_sum.html#sum">SUM()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_avg.html#avg">AVG()</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_min.html#min">MIN()</a></code>, or
+ <code class="ph codeph"><a class="xref" href="impala_max.html#max">MAX()</a></code>, and typically with the
+ <code class="ph codeph"><a class="xref" href="impala_group_by.html#group_by">GROUP BY</a></code> clause also.
+ </p>
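+
+ <p class="p">
+ For example, the following query (against a hypothetical <code class="ph codeph">orders</code>
+ table) returns only those customers whose aggregate order total exceeds a threshold, a test that
+ cannot go in the <code class="ph codeph">WHERE</code> clause because it applies to an aggregate
+ value rather than to individual rows:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT customer_id, SUM(amount) AS total
+  FROM orders
+  GROUP BY customer_id
+  HAVING SUM(amount) &gt; 1000
+  ORDER BY total DESC;
+</code></pre>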
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ The filter expression in the <code class="ph codeph">HAVING</code> clause cannot include a scalar subquery.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_select.html#select">SELECT Statement</a>,
+ <a class="xref" href="impala_group_by.html#group_by">GROUP BY Clause</a>,
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_impala_shell.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_impala_shell.html b/docs/build3x/html/topics/impala_impala_shell.html
new file mode 100644
index 0000000..42b01e7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_impala_shell.html
@@ -0,0 +1,87 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_options.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_connecting.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_running_commands.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_shell_commands.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_shell"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Impala Shell (impala-shell Command)</title></head><body id=
"impala_shell"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Impala Shell (impala-shell Command)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use the Impala shell tool (<code class="ph codeph">impala-shell</code>) to set up databases and tables, insert
+ data, and issue queries. For ad hoc queries and exploration, you can submit SQL statements in an interactive
+ session. To automate your work, you can specify command-line options to process a single statement or a
+ script file. The <span class="keyword cmdname">impala-shell</span> interpreter accepts all the same SQL statements listed in
+ <a class="xref" href="impala_langref_sql.html#langref_sql">Impala SQL Statements</a>, plus some shell-only commands that you can use for tuning
+ performance and diagnosing problems.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">impala-shell</code> command fits into the familiar Unix toolchain:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">-q</code> option lets you issue a single query from the command line, without starting the
+ interactive interpreter. You could use this option to run <code class="ph codeph">impala-shell</code> from inside a shell
+ script or with the command invocation syntax from a Python, Perl, or other kind of script.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-f</code> option lets you process a file containing multiple SQL statements,
+ such as a set of reports or DDL statements to create a group of tables and views.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">--var</code> option lets you pass substitution variables to the statements that
+ are executed by that <span class="keyword cmdname">impala-shell</span> session, for example the statements
+ in a script file processed by the <code class="ph codeph">-f</code> option. You encode the substitution variable
+ on the command line using the notation
+ <code class="ph codeph">--var=<var class="keyword varname">variable_name</var>=<var class="keyword varname">value</var></code>.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-o</code> option lets you save query output to a file.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">-B</code> option turns off pretty-printing, so that you can produce comma-separated,
+ tab-separated, or other delimited text files as output. (Use the <code class="ph codeph">--output_delimiter</code> option
+ to choose the delimiter character; the default is the tab character.)
+ </li>
+
+ <li class="li">
+ In non-interactive mode, query output is printed to <code class="ph codeph">stdout</code> or to the file specified by the
+ <code class="ph codeph">-o</code> option, while incidental output is printed to <code class="ph codeph">stderr</code>, so that you can
+ process just the query output as part of a Unix pipeline.
+ </li>
+
+ <li class="li">
+ In interactive mode, <code class="ph codeph">impala-shell</code> uses the <code class="ph codeph">readline</code> facility to recall
+ and edit previous commands.
+ </li>
+ </ul>
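+
+ <p class="p">
+ For example, the following commands combine several of these options. (The file names, table name,
+ and query text are hypothetical, for illustration only.)
+ </p>
+
+<pre class="pre codeblock"><code>$ # Run a single query non-interactively and save comma-delimited output to a file:
+$ impala-shell -q 'SELECT * FROM t1' -B --output_delimiter=',' -o results.csv
+
+$ # Run a script of SQL statements, substituting a table name at run time;
+$ # inside the script, refer to the variable as ${var:tablename}:
+$ impala-shell -f report_queries.sql --var=tablename=sales_2018
+</code></pre>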
+
+ <p class="p">
+ For information on installing the Impala shell, see <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+ </p>
+
+ <p class="p">
+ For information about establishing a connection to a DataNode running the <code class="ph codeph">impalad</code> daemon
+ through the <code class="ph codeph">impala-shell</code> command, see <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a>.
+ </p>
+
+ <p class="p">
+ For a list of the <code class="ph codeph">impala-shell</code> command-line options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>. For reference information about the
+ <code class="ph codeph">impala-shell</code> interactive commands, see
+ <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a>.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_shell_options.html">impala-shell Configuration Options</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_connecting.html">Connecting to impalad through impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_shell_commands.html">impala-shell Command Reference</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_incompatible_changes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_incompatible_changes.html b/docs/build3x/html/topics/impala_incompatible_changes.html
new file mode 100644
index 0000000..3d25658
--- /dev/null
+++ b/docs/build3x/html/topics/impala_incompatible_changes.html
@@ -0,0 +1,1526 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="incompatible_changes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Incompatible Changes and Limitations in Apache Impala</title></head><body id="incompatible_changes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Incompatible Changes and Limitations in Apache Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala version covered by this documentation library contains the following incompatible changes. These
+ are things such as file format changes, removed features, or changes to implementation, default
+ configuration, dependencies, or prerequisites that could cause issues during or after an Impala upgrade.
+ </p>
+
+ <p class="p">
+ Even newly added SQL statements or clauses can produce incompatibilities if you have databases, tables, or columns
+ whose names conflict with the new keywords. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the set of reserved words for the current
+ release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="incompatible_changes__incompatible_changes_300x">
+ <h2 class="title topictitle2" id="ariaid-title2">Incompatible Changes Introduced in Impala 3.0.x</h2>
+ <div class="body conbody">
+ <p class="p"> For the full list of issues closed in this release, including any that
+ introduce behavior changes or incompatibilities, see the <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>. </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="incompatible_changes__incompatible_changes_212x">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Incompatible Changes Introduced in Impala 2.12.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="incompatible_changes__incompatible_changes_211x">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Incompatible Changes Introduced in Impala 2.11.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="incompatible_changes__incompatible_changes_210x">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Incompatible Changes Introduced in Impala 2.10.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="incompatible_changes__incompatible_changes_29x">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Incompatible Changes Introduced in Impala 2.9.x</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including any that introduce
+ behavior changes or incompatibilities, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="incompatible_changes__incompatible_changes_28x">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Incompatible Changes Introduced in Impala 2.8.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Llama support is removed completely from Impala. Related flags (<code class="ph codeph">--enable_rm</code>)
+ and query options (such as <code class="ph codeph">V_CPU_CORES</code>) remain but do not have any effect.
+ </p>
+ <p class="p">
+ If <code class="ph codeph">--enable_rm</code> is passed to Impala, a warning is printed to the log on startup.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The syntax related to Kudu tables includes a number of new reserved words,
+ such as <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">ENCODING</code>, that
+ might conflict with names of existing tables, columns, or other identifiers from older Impala versions.
+ See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the full list of reserved words.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The DDL syntax for Kudu tables, particularly in the <code class="ph codeph">CREATE TABLE</code> statement, is different
+ from the special <code class="ph codeph">impala_next</code> fork that was previously used for accessing Kudu tables
+ from Impala:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DISTRIBUTE BY</code> clause is now <code class="ph codeph">PARTITIONED BY</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INTO <var class="keyword varname">N</var> BUCKETS</code>
+ clause is now <code class="ph codeph">PARTITIONS <var class="keyword varname">N</var></code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SPLIT ROWS</code> clause is replaced by different syntax for specifying
+ the ranges covered by each partition.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE</code> output for Kudu tables includes several extra columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Non-primary-key columns can contain <code class="ph codeph">NULL</code> values by default. The
+ <code class="ph codeph">SHOW CREATE TABLE</code> output for these columns displays the <code class="ph codeph">NULL</code>
+ attribute. There was a period during early experimental versions of Impala + Kudu where
+ non-primary-key columns had the <code class="ph codeph">NOT NULL</code> attribute by default.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">IGNORE</code> keyword that was present in early experimental versions of Impala + Kudu
+ is no longer present. The behavior of the <code class="ph codeph">IGNORE</code> keyword is now the default:
+ DML statements continue with warnings, instead of failing with errors, if they encounter conditions
+ such as <span class="q">"primary key already exists"</span> for an <code class="ph codeph">INSERT</code> statement or
+ <span class="q">"primary key already deleted"</span> for a <code class="ph codeph">DELETE</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The replication factor for Kudu tables must be an odd number.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A UDF compiled into an LLVM IR bitcode module (<code class="ph codeph">.bc</code>) might
+ encounter a runtime error when native code generation is turned off by
+ setting the query option <code class="ph codeph">DISABLE_CODEGEN=1</code>.
+ This issue also applies when running a built-in or native UDF with
+ more than 20 arguments.
+ See <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4432" target="_blank">IMPALA-4432</a> for details.
+ As a workaround, either turn native code generation back on with the query option
+ <code class="ph codeph">DISABLE_CODEGEN=0</code>, or use the regular UDF compilation path
+ that does not produce an IR module.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="incompatible_changes__incompatible_changes_27x">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Incompatible Changes Introduced in Impala 2.7.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Bug fixes related to parsing of floating-point values (IMPALA-1731 and IMPALA-3868) can change
+ the results of casting strings that represent invalid floating-point values.
+ For example, string values beginning or ending with <code class="ph codeph">inf</code>,
+ such as <code class="ph codeph">1.23inf</code> or <code class="ph codeph">infinite</code>, are now converted to <code class="ph codeph">NULL</code>
+ when interpreted as floating-point values.
+ Formerly, they were interpreted as the special <span class="q">"infinity"</span> value when converting from string to floating-point.
+ Similarly, now only the string <code class="ph codeph">NaN</code> (case-sensitive) is interpreted as the special <span class="q">"not a number"</span>
+ value. String values containing multiple dots, such as <code class="ph codeph">3..141</code> or <code class="ph codeph">3.1.4.1</code>,
+ are now interpreted as <code class="ph codeph">NULL</code> rather than being converted to valid floating-point values.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="incompatible_changes__incompatible_changes_26x">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Incompatible Changes Introduced in Impala 2.6.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+ query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+ as a fallback if statistics are not available; otherwise, Impala
+ uses the statistics to estimate the appropriate size to use for each filter.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Admission control and dynamic resource pools are enabled by default.
+ When upgrading from an earlier release, you must turn on these settings yourself
+ if they are not already enabled.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+ about admission control.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala reserves some new keywords, in preparation for support for Kudu syntax:
+ <code class="ph codeph">buckets</code>, <code class="ph codeph">delete</code>, <code class="ph codeph">distribute</code>,
+ <code class="ph codeph">hash</code>, <code class="ph codeph">ignore</code>, <code class="ph codeph">split</code>, and <code class="ph codeph">update</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ For Kerberized clusters, the Catalog service now uses
+ the Kerberos principal instead of the operating system user that runs
+ the <span class="keyword cmdname">catalogd</span> daemon.
+ This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+ setting to put the OS user into the Sentry administrative group, on clusters where the principal
+ and the OS user name for this user are different.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+ improved, no longer going through an intermediate conversion step
+ to <code class="ph codeph">DOUBLE</code>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+ produces a more precise
+ value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+ now succeed, when formerly they failed due to lack of a function
+ signature with a <code class="ph codeph">DOUBLE</code> argument.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+ If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+ expression are of <code class="ph codeph">CHAR</code> type, the final result
+ is also <code class="ph codeph">CHAR</code> instead of being converted to
+ <code class="ph codeph">STRING</code>.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ The initial release of <span class="keyword">Impala 2.5</span> sometimes has a higher peak memory usage than in previous releases
+ while reading Parquet files.
+ The following query options might help to reduce memory consumption in the Parquet scanner:
+ <ul class="ul">
+ <li class="li">
+ Reduce the number of scanner threads, for example: <code class="ph codeph">set num_scanner_threads=30</code>
+ </li>
+ <li class="li">
+ Reduce the batch size, for example: <code class="ph codeph">set batch_size=512</code>
+ </li>
+ <li class="li">
+ Increase the memory limit, for example: <code class="ph codeph">set mem_limit=64g</code>
+ </li>
+ </ul>
+ You can track the status of the fix for this issue at
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3662" target="_blank">IMPALA-3662</a>.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option, which is enabled by
+ default, increases the speed of <code class="ph codeph">INSERT</code> operations for S3 tables.
+ The speedup applies to regular <code class="ph codeph">INSERT</code>, but not <code class="ph codeph">INSERT OVERWRITE</code>.
+ The tradeoff is the possibility of inconsistent output files left behind if a
+ node fails during <code class="ph codeph">INSERT</code> execution.
+ See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ Certain features are turned off by default, to avoid regressions or unexpected
+ behavior following an upgrade. Consider turning on these features after suitable testing:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default; to enable it,
+ specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> configuration settings.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code>,
+ makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+ metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+ and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+ by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statements.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option,
+ <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code>,
+ lets Impala locate columns within Parquet files based on
+ column name rather than ordinal position.
+ This enhancement improves interoperability with applications
+ that write Parquet files with a different order or subset of
+ columns than are used in the Impala table.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="incompatible_changes__incompatible_changes_25x">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Incompatible Changes Introduced in Impala 2.5.x</h2>
+
+ <div class="body conbody">
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The admission control default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+ setting) is now unlimited instead of 200.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Multiplying a mixture of <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> values now returns
+ <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This
+ change avoids some cases where an intermediate value would underflow or overflow
+ and become <code class="ph codeph">NULL</code> unexpectedly. The results of
+ multiplying <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">FLOAT</code> or
+ <code class="ph codeph">DOUBLE</code> might now be slightly less precise than
+ before. Previously, the intermediate types and thus the final result
+ depended on the exact order of the values of different types being
+ multiplied, which made the final result values difficult to
+ reason about.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Previously, the <code class="ph codeph">_</code> and <code class="ph codeph">%</code> wildcard
+ characters for the <code class="ph codeph">LIKE</code> operator would not match
+ characters on the second or subsequent lines of multi-line string values. The fix for issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2204" target="_blank">IMPALA-2204</a> causes
+ the wildcard matching to apply to the entire string for values
+ containing embedded <code class="ph codeph">\n</code> characters. This could cause
+ different results than in previous Impala releases for identical
+ queries on identical data.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, all Impala UDFs and UDAs required running the
+ <code class="ph codeph">CREATE FUNCTION</code> statements to
+ re-create them after each <span class="keyword cmdname">catalogd</span> restart.
+ In <span class="keyword">Impala 2.5</span> and higher, functions written in C++ are persisted across
+ restarts, and the requirement to
+ re-create functions only applies to functions written in Java. Adapt any
+ function-reloading logic that you have added to your Impala environment.
+ </p>
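+          <p class="p">
+            For example, a Java-based function (all names and paths here are hypothetical)
+            must still be re-created after a <span class="keyword cmdname">catalogd</span> restart:
+          </p>
+<pre class="pre codeblock"><code>-- Java functions are not persisted; re-run statements like this after a restart.
+CREATE FUNCTION my_java_udf LOCATION '/user/impala/udfs/my_udfs.jar'
+  SYMBOL='com.example.MyUdf';
+</code></pre>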
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">CREATE TABLE LIKE</code> no longer inherits HDFS caching settings from the source table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+ The second column includes the associated comment string, if any, for each database.
+ Adjust any application code that examines the list of databases and assumes the
+ result set contains only a single column.
+ </p>
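+          <p class="p">
+            For example, the two-column result set looks like the following (the database
+            names and comment strings are illustrative):
+          </p>
+<pre class="pre codeblock"><code>SHOW DATABASES;
++------------------+----------------------------------------------+
+| name             | comment                                      |
++------------------+----------------------------------------------+
+| _impala_builtins | System database for Impala builtin functions |
+| default          | Default Hive database                        |
++------------------+----------------------------------------------+
+</code></pre>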
+ </li>
+ <li class="li">
+ <p class="p">
+ The output of the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes
+ two new columns, showing the kind of the function (for example,
+ <code class="ph codeph">BUILTIN</code>) and whether or not the function persists
+ across catalog server restarts. For example, the <code class="ph codeph">SHOW
+ FUNCTIONS</code> output for the
+ <code class="ph codeph">_impala_builtins</code> database starts with:
+ </p>
+<pre class="pre codeblock"><code>
++--------------+-------------------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++--------------+-------------------------------------------------+-------------+---------------+
+| BIGINT | abs(BIGINT) | BUILTIN | true |
+| DECIMAL(*,*) | abs(DECIMAL(*,*)) | BUILTIN | true |
+| DOUBLE | abs(DOUBLE) | BUILTIN | true |
+...
+</code></pre>
+ </li>
+ </ul>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="incompatible_changes__incompatible_changes_24x">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Incompatible Changes Introduced in Impala 2.4.x</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ Other than support for DSSD storage, the Impala feature set for <span class="keyword">Impala 2.4</span> is the same as for <span class="keyword">Impala 2.3</span>.
+ Therefore, there are no incompatible changes for Impala introduced in <span class="keyword">Impala 2.4</span>.
+ </p>
+ </div>
+
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="incompatible_changes__incompatible_changes_23x">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Incompatible Changes Introduced in Impala 2.3.x</h2>
+
+ <div class="body conbody">
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+          other data management components, you use static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <ul class="ul">
+
+ <li class="li">
+ <p class="p">
+ If Impala encounters a Parquet file that is invalid because of an incorrect magic number,
+ the query skips the file. This change is caused by the fix for issue <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2130" target="_blank">IMPALA-2130</a>.
+ Previously, Impala would attempt to read the file despite the possibility that the file was corrupted.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Previously, calls to overloaded built-in functions could treat parameters as <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when no overload had a signature that matched the exact argument types.
+ Now Impala prefers the function signature with <code class="ph codeph">DECIMAL</code> parameters in this case.
+ This change avoids a possible loss of precision in function calls such as <code class="ph codeph">greatest(0, 99999.8888)</code>;
+ now both parameters are treated as <code class="ph codeph">DECIMAL</code> rather than <code class="ph codeph">DOUBLE</code>, avoiding
+ any loss of precision in the fractional value.
+ This could cause slightly different results than in previous Impala releases for certain function calls.
+ </p>
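+          <p class="p">
+            For example, with the <code class="ph codeph">DECIMAL</code> signature now preferred:
+          </p>
+<pre class="pre codeblock"><code>-- Both arguments are now treated as DECIMAL, so no fractional precision is lost.
+SELECT greatest(0, 99999.8888);
+</code></pre>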
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, adding or subtracting a large interval value to a <code class="ph codeph">TIMESTAMP</code> could produce
+ a nonsensical result. Now when the result goes outside the range of <code class="ph codeph">TIMESTAMP</code> values,
+ Impala returns <code class="ph codeph">NULL</code>.
+ </p>
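+          <p class="p">
+            For example (the interval is deliberately chosen to push the result out of range):
+          </p>
+<pre class="pre codeblock"><code>-- The result falls outside the supported TIMESTAMP range, so Impala returns NULL
+-- instead of a nonsensical value.
+SELECT CAST('9999-12-31 00:00:00' AS TIMESTAMP) + INTERVAL 100 YEARS;
+</code></pre>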
+ </li>
+ <li class="li">
+ <p class="p">
+            Formerly, it was possible to accidentally create a table with identical row and column delimiters,
+            for example by specifying one of the delimiters explicitly and using the
+            default value for the other. Now an attempt to use identical delimiters still succeeds,
+ but displays a warning message.
+ </p>
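+          <p class="p">
+            For example, the following statement (table and column names are hypothetical)
+            specifies a field delimiter identical to the default
+            <code class="ph codeph">'\n'</code> line delimiter, and now succeeds with a warning:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE odd_delims (c1 STRING, c2 STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\n';
+</code></pre>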
+ </li>
+ <li class="li">
+ <p class="p">
+ Formerly, Impala could include snippets of table data in log files by default, for example
+ when reporting conversion errors for data values. Now any such log messages are only produced
+ at higher logging levels that you would enable only during debugging.
+ </p>
+ </li>
+
+ </ul>
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="incompatible_changes__incompatible_changes_22x">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Incompatible Changes Introduced in Impala 2.2.x</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_22x__files_220"><h3 class="title sectiontitle">
+ Changes to File Handling
+ </h3>
+
+ <p class="p">
+ Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+ files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+ Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+ <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+ </p>
+ <p class="p">
+ The log rotation feature in Impala 2.2.0 and higher
+ means that older log files are now removed by default.
+ The default is to preserve the latest 10 log files for each
+ severity level, for each Impala-related daemon. If you have
+ set up your own log rotation processes that expect older
+ files to be present, either adjust your procedures or
+ change the Impala <code class="ph codeph">-max_log_files</code> setting.
+ <span class="ph">See <a class="xref" href="impala_logging.html#logs_rotate">Rotating Impala Logs</a> for details.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_22x__prereqs_210"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ The prerequisite for CPU architecture has been relaxed in Impala 2.2.0 and higher. From this release
+ onward, Impala works on CPUs that have the SSSE3 instruction set. The SSE4 instruction set is no longer
+ required. This relaxed requirement simplifies the upgrade planning from Impala 1.x releases, which also
+ worked on SSSE3-enabled processors.
+ </p>
+ </section>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="incompatible_changes__incompatible_changes_21x">
+
+ <h2 class="title topictitle2" id="ariaid-title14">Incompatible Changes Introduced in Impala 2.1.x</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_21x__prereqs_210"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ Currently, Impala 2.1.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+ requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+ the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.1</span>.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_21x__output_format_210"><h3 class="title sectiontitle">
+ Changes to Output Format
+ </h3>
+
+ <p class="p">
+ The <span class="q">"small query"</span> optimization feature introduces some new information in the
+ <code class="ph codeph">EXPLAIN</code> plan, which you might need to account for if you parse the text of the plan
+ output.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_21x__reserved_words_210"><h3 class="title sectiontitle">
+ New Reserved Words
+ </h3>
+
+ <p class="p">
+ New SQL syntax introduces additional reserved words:
+ <code class="ph codeph">FOR</code>, <code class="ph codeph">GRANT</code>, <code class="ph codeph">REVOKE</code>, <code class="ph codeph">ROLE</code>, <code class="ph codeph">ROLES</code>,
+ <code class="ph codeph">INCREMENTAL</code>.
+ <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+ </section>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="incompatible_changes__incompatible_changes_205">
+
+ <h2 class="title topictitle2" id="ariaid-title15">Incompatible Changes Introduced in Impala 2.0.5</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="incompatible_changes__incompatible_changes_204">
+
+ <h2 class="title topictitle2" id="ariaid-title16">Incompatible Changes Introduced in Impala 2.0.4</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="incompatible_changes__incompatible_changes_203">
+
+ <h2 class="title topictitle2" id="ariaid-title17">Incompatible Changes Introduced in Impala 2.0.3</h2>
+
+ <div class="body conbody">
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="incompatible_changes__incompatible_changes_202">
+
+ <h2 class="title topictitle2" id="ariaid-title18">Incompatible Changes Introduced in Impala 2.0.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title19" id="incompatible_changes__incompatible_changes_201">
+
+ <h2 class="title topictitle2" id="ariaid-title19">Incompatible Changes Introduced in Impala 2.0.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement has always left behind a hidden work directory inside the data
+ directory of the table. Formerly, this hidden work directory was named
+            <span class="ph filepath">.impala_insert_staging</span>. In Impala 2.0.1 and later, this directory name is changed to
+            <span class="ph filepath">_impala_insert_staging</span>. (While HDFS tools are expected to treat names beginning
+            with either an underscore or a dot as hidden, in practice names beginning with an underscore are more widely
+ supported.) If you have any scripts, cleanup jobs, and so on that rely on the name of this work directory,
+ adjust them to use the new name.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">abs()</code> function now takes a broader range of numeric types as arguments, and the
+ return type is the same as the argument type.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Shorthand notation for character classes in regular expressions, such as <code class="ph codeph">\d</code> for digit,
+            is now available again in regular expression operators and functions such as
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code>. Some other differences in
+ regular expression behavior remain between Impala 1.x and Impala 2.x releases. See
+ <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_200">Incompatible Changes Introduced in Impala 2.0.0</a> for details.
+ </p>
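+          <p class="p">
+            For example (the argument values are illustrative; note the doubled backslash
+            needed inside the string literal):
+          </p>
+<pre class="pre codeblock"><code>-- The \d shorthand for digits works again in Impala 2.0.1 and higher.
+SELECT regexp_extract('abc123def', '(\\d+)', 1);
+</code></pre>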
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title20" id="incompatible_changes__incompatible_changes_200">
+
+ <h2 class="title topictitle2" id="ariaid-title20">Incompatible Changes Introduced in Impala 2.0.0</h2>
+
+ <div class="body conbody">
+
+ <section class="section" id="incompatible_changes_200__prereqs_200"><h3 class="title sectiontitle">
+ Changes to Prerequisites
+ </h3>
+
+ <p class="p">
+ Currently, Impala 2.0.x does not function on CPUs without the SSE4.1 instruction set. This minimum CPU
+ requirement is higher than in previous versions, which relied on the older SSSE3 instruction set. Check
+ the CPU level of the hosts in your cluster before upgrading to <span class="keyword">Impala 2.0</span>.
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__queries_200"><h3 class="title sectiontitle">
+ Changes to Query Syntax
+ </h3>
+
+
+ <p class="p">
+ The new syntax where query hints are allowed in comments causes some changes in the way comments are
+ parsed in the <span class="keyword cmdname">impala-shell</span> interpreter. Previously, you could end a
+ <code class="ph codeph">--</code> comment line with a semicolon and <span class="keyword cmdname">impala-shell</span> would treat that
+ as a no-op statement. Now, a comment line ending with a semicolon is passed as an empty statement to
+ the Impala daemon, where it is flagged as an error.
+ </p>
+
+ <p class="p">
+ Impala 2.0 and later uses a different support library for regular expression parsing than in earlier
+ Impala versions. Now, Impala uses the
+ <a class="xref" href="https://code.google.com/p/re2/" target="_blank">Google RE2 library</a>
+ rather than Boost for evaluating regular expressions. This implementation change causes some
+ differences in the allowed regular expression syntax, and in the way certain regex operators are
+ interpreted. The following are some of the major differences (not necessarily a complete list):
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">.*?</code> notation for non-greedy matches is now supported, where it was not in earlier
+ Impala releases.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, <code class="ph codeph">^</code> and <code class="ph codeph">$</code> now match only begin/end of buffer, not
+ begin/end of each line. This behavior can be overridden in the regex itself using the
+ <code class="ph codeph">m</code> flag.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ By default, <code class="ph codeph">.</code> does not match newline. This behavior can be overridden in the regex
+ itself using the <code class="ph codeph">s</code> flag.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">\Z</code> is not supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            <code class="ph codeph">&lt;</code> and <code class="ph codeph">&gt;</code> for start of word and end of word are not
+ supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Lookahead and lookbehind are not supported.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Shorthand notation for character classes, such as <code class="ph codeph">\d</code> for digit, is not recognized.
+ (This restriction is lifted in Impala 2.0.1, which restores the shorthand notation.)
+ </p>
+ </li>
+ </ul>
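+      <p class="p">
+        For example, the flags mentioned above are embedded in the pattern itself using the
+        RE2 inline syntax (the string values here are illustrative):
+      </p>
+<pre class="pre codeblock"><code>-- Without the s flag, '.' does not match the embedded newline.
+SELECT 'a\nb' REGEXP 'a.b';
+-- With the inline (?s) flag, '.' matches the newline as well.
+SELECT 'a\nb' REGEXP '(?s)a.b';
+</code></pre>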
+ </section>
+
+ <section class="section" id="incompatible_changes_200__output_format_210"><h3 class="title sectiontitle">
+ Changes to Output Format
+ </h3>
+
+
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+
+ <p class="p">
+ The changed format for the user name in secure environments is also reflected where the user name is
+ displayed in the output of the <code class="ph codeph">PROFILE</code> command.
+ </p>
+
+ <p class="p">
+ In the output from <code class="ph codeph">SHOW FUNCTIONS</code>, <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code>, and
+ <code class="ph codeph">SHOW ANALYTIC FUNCTIONS</code>, arguments and return types of arbitrary
+ <code class="ph codeph">DECIMAL</code> scale and precision are represented as <code class="ph codeph">DECIMAL(*,*)</code>.
+ Formerly, these items were displayed as <code class="ph codeph">DECIMAL(-1,-1)</code>.
+ </p>
+
+ </section>
+
+ <section class="section" id="incompatible_changes_200__query_options_200"><h3 class="title sectiontitle">
+ Changes to Query Options
+ </h3>
+
+ <p class="p">
+ The <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> query option has been replaced by the
+ <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+ <span class="ph">See <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__config_options_200"><h3 class="title sectiontitle">
+ Changes to Configuration Options
+ </h3>
+
+
+ <p class="p">
+ The meaning of the <code class="ph codeph">--idle_query_timeout</code> configuration option is changed, to
+ accommodate the new <code class="ph codeph">QUERY_TIMEOUT_S</code> query option. Rather than setting an absolute
+ timeout period that applies to all queries, it now sets a maximum timeout period, which can be adjusted
+ downward for individual queries by specifying a value for the <code class="ph codeph">QUERY_TIMEOUT_S</code> query
+ option. In sessions where no <code class="ph codeph">QUERY_TIMEOUT_S</code> query option is specified, the
+ <code class="ph codeph">--idle_query_timeout</code> timeout period applies the same as in earlier versions.
+ </p>
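+      <p class="p">
+        For example, within a session you can now shorten the timeout below the server-wide
+        maximum (the value shown is arbitrary):
+      </p>
+<pre class="pre codeblock"><code>-- Applies only to this session; the --idle_query_timeout value remains the maximum.
+SET QUERY_TIMEOUT_S=60;
+</code></pre>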
+
+ <p class="p">
+ The <code class="ph codeph">--strict_unicode</code> option of <span class="keyword cmdname">impala-shell</span> was removed. To avoid
+ problems with Unicode values in <span class="keyword cmdname">impala-shell</span>, define the following locale setting
+ before running <span class="keyword cmdname">impala-shell</span>:
+ </p>
+<pre class="pre codeblock"><code>export LC_CTYPE=en_US.UTF-8
+</code></pre>
+
+ </section>
+
+ <section class="section" id="incompatible_changes_200__reserved_words_210"><h3 class="title sectiontitle">
+ New Reserved Words
+ </h3>
+
+ <p class="p">
+ Some new SQL syntax requires the addition of new reserved words: <code class="ph codeph">ANTI</code>,
+ <code class="ph codeph">ANALYTIC</code>, <code class="ph codeph">OVER</code>, <code class="ph codeph">PRECEDING</code>,
+ <code class="ph codeph">UNBOUNDED</code>, <code class="ph codeph">FOLLOWING</code>, <code class="ph codeph">CURRENT</code>,
+ <code class="ph codeph">ROWS</code>, <code class="ph codeph">RANGE</code>, <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>.
+ <span class="ph">As always, see <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the set of reserved words for the current release, and the quoting techniques to avoid name conflicts.</span>
+ </p>
+ </section>
+
+ <section class="section" id="incompatible_changes_200__output_files_200"><h3 class="title sectiontitle">
+ Changes to Data Files
+ </h3>
+
+
+ <p class="p" id="incompatible_changes_200__parquet_block_size">
+ The default Parquet block size for Impala is changed from 1 GB to 256 MB. This change could have
+ implications for the sizes of Parquet files produced by <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> statements.
+ </p>
+ <p class="p">
+ Although older Impala releases typically produced files that were smaller than the old default size of
+        1 GB, the file size now more closely matches whatever value is specified for the
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> query option. Thus, if you use a non-default value for this setting,
+ the output files could be larger than before. They still might be somewhat smaller than the specified
+ value, because Impala makes conservative estimates about the space needed to represent each column as
+ it encodes the data.
+ </p>
+ <p class="p">
+ When you do not specify an explicit value for the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option,
+ Impala tries to keep the file size within the 256 MB default size, but Impala might adjust the file
+ size to be somewhat larger if needed to accommodate the layout for <dfn class="term">wide</dfn> tables, that is,
+ tables with hundreds or thousands of columns.
+ </p>
+ <p class="p">
+ This change is unlikely to affect memory usage while writing Parquet files, because Impala does not
+ pre-allocate the memory needed to hold the entire Parquet block.
+ </p>
+
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title21" id="incompatible_changes__incompatible_changes_144">
+ <h2 class="title topictitle2" id="ariaid-title21">Incompatible Changes Introduced in Impala 1.4.4</h2>
+ <div class="body conbody">
+ <p class="p">
+ No incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title22" id="incompatible_changes__incompatible_changes_143">
+
+ <h2 class="title topictitle2" id="ariaid-title22">Incompatible Changes Introduced in Impala 1.4.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+ Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="incompatible_changes__incompatible_changes_142">
+
+ <h2 class="title topictitle2" id="ariaid-title23">Incompatible Changes Introduced in Impala 1.4.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ None. Impala 1.4.2 is purely a bug-fix release. It does not include any incompatible changes.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="incompatible_changes__incompatible_changes_141">
+
+ <h2 class="title topictitle2" id="ariaid-title24">Incompatible Changes Introduced in Impala 1.4.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ None. Impala 1.4.1 is purely a bug-fix release. It does not include any incompatible changes.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title25" id="incompatible_changes__incompatible_changes_140">
+
+ <h2 class="title topictitle2" id="ariaid-title25">Incompatible Changes Introduced in Impala 1.4.0</h2>
+
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ There is a slight change to required security privileges in the Sentry framework. To create a new
+            object, you now need the <code class="ph codeph">ALL</code> privilege on the parent object. For example, creating a
+            new table, view, or function requires the <code class="ph codeph">ALL</code> privilege on the database
+ containing the new object. See <a class="xref" href="impala_authorization.html">Enabling Sentry Authorization for Impala</a> for a full list of operations and
+ associated privileges.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ With the ability of <code class="ph codeph">ORDER BY</code> queries to process unlimited amounts of data with no
+ <code class="ph codeph">LIMIT</code> clause, the query options <code class="ph codeph">DEFAULT_ORDER_BY_LIMIT</code> and
+ <code class="ph codeph">ABORT_ON_DEFAULT_LIMIT_EXCEEDED</code> are now deprecated and have no effect.
+ <span class="ph">See <a class="xref" href="impala_order_by.html#order_by">ORDER BY Clause</a> for details about improvements to
+ the <code class="ph codeph">ORDER BY</code> clause.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ There are some changes to the list of reserved words. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span> The following
+ keywords are new:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">API_VERSION</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BINARY</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CACHED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">CLASS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">PARTITIONS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">PRODUCED</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">UNCACHED</code>
+ </li>
+ </ul>
+ <p class="p">
+ The following were formerly reserved keywords, but are no longer reserved:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">COUNT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">GROUP_CONCAT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">NDV</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SUM</code>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The fix for issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-973" target="_blank">IMPALA-973</a>
+ changes the behavior of the <code class="ph codeph">INVALIDATE METADATA</code> statement regarding nonexistent
+ tables. In Impala 1.4.0 and higher, the statement returns an error if the specified table is not in the
+ metastore database at all. It completes successfully if the specified table is in the metastore
+ database but not yet recognized by Impala, for example if the table was created through Hive. Formerly,
+ you could issue this statement for a completely nonexistent table, with no error.
+ </p>
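+          <p class="p">
+            For example (the table names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Fails in Impala 1.4.0 and higher if no such table exists in the metastore.
+INVALIDATE METADATA no_such_table;
+-- Succeeds if the table exists in the metastore, for example created through Hive,
+-- but is not yet recognized by Impala.
+INVALIDATE METADATA table_created_in_hive;
+</code></pre>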
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title26" id="incompatible_changes__incompatible_changes_133">
+
+ <h2 class="title topictitle2" id="ariaid-title26">Incompatible Changes Introduced in Impala 1.3.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ No incompatible changes. The TLS/SSL security fix does not require any change in the way you interact with
+ Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title27" id="incompatible_changes__incompatible_changes_132">
+
+ <h2 class="title topictitle2" id="ariaid-title27">Incompatible Changes Introduced in Impala 1.3.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ With the fix for IMPALA-1019, you can use HDFS caching for files that are accessed by Impala.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title28" id="incompatible_changes__incompatible_changes_131">
+
+ <h2 class="title topictitle2" id="ariaid-title28">Incompatible Changes Introduced in Impala 1.3.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Impala 1.3.1 and higher, the <code class="ph codeph">REGEXP</code> and <code class="ph codeph">RLIKE</code> operators now match a
+ regular expression string that occurs anywhere inside the target string, the same as if the regular
+            expression were enclosed on each side by <code class="ph codeph">.*</code>. See
+ <a class="xref" href="../shared/../topics/impala_operators.html#regexp">REGEXP Operator</a> for examples. Previously, these operators only
+ succeeded when the regular expression matched the entire target string. This change improves compatibility
+ with the regular expression support for popular database systems. There is no change to the behavior of the
+ <code class="ph codeph">regexp_extract()</code> and <code class="ph codeph">regexp_replace()</code> built-in functions.
+ </p>
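+          <p class="p">
+            For example (the string values are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- True in Impala 1.3.1 and higher; formerly this pattern had to match
+-- the entire string, as if written '.*nan.*'.
+SELECT 'banana' REGEXP 'nan';
+</code></pre>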
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The result set for the <code class="ph codeph">SHOW FUNCTIONS</code> statement includes a new first column, with the
+ data type of the return value. <span class="ph">See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for
+ examples.</span>
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title29" id="incompatible_changes__incompatible_changes_130">
+
+ <h2 class="title topictitle2" id="ariaid-title29">Incompatible Changes Introduced in Impala 1.3.0</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">EXPLAIN_LEVEL</code> query option now accepts numeric options from 0 (most concise) to 3
+ (most verbose), rather than only 0 or 1. If you formerly used <code class="ph codeph">SET EXPLAIN_LEVEL=1</code> to
+ get detailed explain plans, switch to <code class="ph codeph">SET EXPLAIN_LEVEL=3</code>. If you used the mnemonic
+ keyword (<code class="ph codeph">SET EXPLAIN_LEVEL=verbose</code>), you do not need to change your code because now
+ level 3 corresponds to <code class="ph codeph">verbose</code>. <span class="ph">See
+ <a class="xref" href="impala_explain_level.html#explain_level">EXPLAIN_LEVEL Query Option</a> for details about the allowed explain levels, and
+ <a class="xref" href="impala_explain_plan.html#explain_plan">Understanding Impala Query Performance - EXPLAIN Plans and Query Profiles</a> for usage information.</span>
+ </p>
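+          <p class="p">
+            For example:
+          </p>
+<pre class="pre codeblock"><code>-- Levels now range from 0 to 3; level 3 corresponds to the former VERBOSE setting.
+SET EXPLAIN_LEVEL=3;
+</code></pre>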
+ </li>
+
+ <li class="li">
+ <div class="p">
+ The keyword <code class="ph codeph">DECIMAL</code> is now a reserved word. If you have any databases, tables,
+ columns, or other objects already named <code class="ph codeph">DECIMAL</code>, quote any references to them using
+ backticks (<code class="ph codeph">``</code>) to avoid name conflicts with the keyword.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Although the <code class="ph codeph">DECIMAL</code> keyword is a reserved word, currently Impala does not support
+ <code class="ph codeph">DECIMAL</code> as a data type for columns.
+ </div>
+ </div>
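+          <p class="p">
+            For example (the table and column names are hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Quote the identifier with backticks to avoid a conflict with the new keyword.
+SELECT `decimal` FROM t1;
+</code></pre>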
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The query option formerly named <code class="ph codeph">YARN_POOL</code> is now named
+ <code class="ph codeph">REQUEST_POOL</code> to reflect its broader use with the Impala admission control feature.
+ <span class="ph">See <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a> for information about the
+ option, and <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details about its use with the
+ admission control feature.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ There are some changes to the list of reserved words. <span class="ph">See
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a> for the most current list.</span>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The names of aggregate functions are no longer reserved words, so you can have databases, tables,
+ columns, or other objects named <code class="ph codeph">AVG</code>, <code class="ph codeph">MIN</code>, and so on without any
+ name conflicts.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The internal function names <code class="ph codeph">DISTINCTPC</code> and <code class="ph codeph">DISTINCTPCSA</code> are no
+ longer reserved words, although <code class="ph codeph">DISTINCT</code> is still a reserved word.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The keywords <code class="ph codeph">CLOSE_FN</code> and <code class="ph codeph">PREPARE_FN</code> are now reserved words.
+ <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for their role in
+ the <code class="ph codeph">CREATE FUNCTION</code> statement, and <a class="xref" href="impala_udf.html#udf_threads">Thread-Safe Work Area for UDFs</a> for
+ usage information.</span>
+ </p>
+ </li>
+ </ul>
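+          <p class="p">
+            For example, the following statements (with hypothetical object names) illustrate these changes:
+          </p>
+<pre class="pre codeblock"><code>-- Aggregate function names are no longer reserved, so AVG and MIN
+-- are valid unquoted column names.
+CREATE TABLE stats_demo (avg DOUBLE, min DOUBLE);
+-- CLOSE_FN and PREPARE_FN are now reserved words, so quote any
+-- pre-existing objects with those names.
+SELECT `close_fn` FROM legacy_table;</code></pre>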
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The HDFS property <code class="ph codeph">dfs.client.file-block-storage-locations.timeout</code> was renamed to
+ <code class="ph codeph">dfs.client.file-block-storage-locations.timeout.millis</code>, to emphasize that the unit of
+ measure is milliseconds, not seconds. Impala requires a timeout of at least 10 seconds, making the
+ minimum value for this setting 10000. If you are not using cluster management software, you might need to
+ edit the <span class="ph filepath">hdfs-site.xml</span> file in the Impala configuration directory for the new name
+ and minimum value.
+ </p>
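+          <p class="p">
+            A minimal <span class="ph filepath">hdfs-site.xml</span> snippet (a hypothetical example) showing the renamed property with the minimum value:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.client.file-block-storage-locations.timeout.millis&lt;/name&gt;
+  &lt;value&gt;10000&lt;/value&gt; &lt;!-- 10 seconds, the minimum Impala accepts --&gt;
+&lt;/property&gt;</code></pre>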
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title30" id="incompatible_changes__incompatible_changes_124">
+
+ <h2 class="title topictitle2" id="ariaid-title30">Incompatible Changes Introduced in Impala 1.2.4</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes introduced in Impala 1.2.4.
+ </p>
+
+ <p class="p">
+ Previously, after creating a table in Hive, you had to issue the <code class="ph codeph">INVALIDATE METADATA</code>
+ statement with no table name, a potentially expensive operation on clusters with many databases, tables,
+ and partitions. Starting in Impala 1.2.4, you can issue the statement <code class="ph codeph">INVALIDATE METADATA
+ <var class="keyword varname">table_name</var></code> for a table newly created through Hive. Loading the metadata for
+ only this one table is faster and involves less network overhead. Therefore, you might revisit your setup
+ DDL scripts to add the table name to <code class="ph codeph">INVALIDATE METADATA</code> statements, in cases where you
+ create and populate the tables through Hive before querying them through Impala.
+ </p>
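+      <p class="p">
+        For example, a setup DDL script might change as follows (table name hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- Before Impala 1.2.4: reloads metadata for all tables and databases.
+INVALIDATE METADATA;
+-- Impala 1.2.4 and higher: loads metadata only for the newly created table.
+INVALIDATE METADATA new_hive_table;</code></pre>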
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title31" id="incompatible_changes__incompatible_changes_123">
+
+ <h2 class="title topictitle2" id="ariaid-title31">Incompatible Changes Introduced in Impala 1.2.3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because the feature set of Impala 1.2.3 is identical to Impala 1.2.2, there are no new incompatible
+ changes. See <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_122">Incompatible Changes Introduced in Impala 1.2.2</a> if you are upgrading
+ from Impala 1.2.1 or 1.1.x.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title32" id="incompatible_changes__incompatible_changes_122">
+
+ <h2 class="title topictitle2" id="ariaid-title32">Incompatible Changes Introduced in Impala 1.2.2</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following changes to SQL syntax and semantics in Impala 1.2.2 could require updates to your SQL code,
+ or schema objects such as tables or views:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ With the addition of the <code class="ph codeph">CROSS JOIN</code> keyword, you might need to rewrite any queries
+ that refer to a table named <code class="ph codeph">CROSS</code> or use the name <code class="ph codeph">CROSS</code> as a table
+ alias:
+ </p>
+<pre class="pre codeblock"><code>-- Formerly, 'cross' in this query was an alias for t1
+-- and it was a normal join query.
+-- In 1.2.2 and higher, CROSS JOIN is a keyword, so 'cross'
+-- is not interpreted as a table alias, and the query
+-- uses the special CROSS JOIN processing rather than a
+-- regular join.
+select * from t1 cross join t2...
+
+-- Now if CROSS is used in other context such as a table or column name,
+-- use backticks to escape it.
+create table `cross` (x int);
+select * from `cross`;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Formerly, a <code class="ph codeph">DROP DATABASE</code> statement in Impala would not remove the top-level HDFS
+ directory for that database. The <code class="ph codeph">DROP DATABASE</code> has been enhanced to remove that
+ directory. (You still need to drop all the tables inside the database first; this change only applies
+ to the top-level directory for the entire database.)
+ </p>
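+          <p class="p">
+            For example (database and table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- In Impala 1.2.2 and higher, this sequence also removes the
+-- top-level HDFS directory for OLD_DB.
+DROP TABLE old_db.t1;
+DROP DATABASE old_db;</code></pre>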
+ </li>
+
+ <li class="li">
+ The keyword <code class="ph codeph">PARQUET</code> is introduced as a synonym for <code class="ph codeph">PARQUETFILE</code> in the
+ <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements, because that is the common
+ name for the file format. (As opposed to SequenceFile and RCFile where the <span class="q">"File"</span> suffix is part of
+ the name.) Documentation examples have been changed to prefer the new shorter keyword. The
+ <code class="ph codeph">PARQUETFILE</code> keyword is still available for backward compatibility with older Impala
+ versions.
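+          <p class="p">
+            For example (table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Preferred shorter keyword in Impala 1.2.2 and higher:
+CREATE TABLE pq_tbl (x INT) STORED AS PARQUET;
+-- Still accepted for backward compatibility:
+CREATE TABLE pq_tbl2 (x INT) STORED AS PARQUETFILE;</code></pre>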
+ </li>
+
+ <li class="li">
+ New overloads are available for several operators and built-in functions, allowing you to insert their
+ result values into smaller numeric columns such as <code class="ph codeph">INT</code>, <code class="ph codeph">SMALLINT</code>,
+ <code class="ph codeph">TINYINT</code>, and <code class="ph codeph">FLOAT</code> without using a <code class="ph codeph">CAST()</code> call. If you
+ remove the <code class="ph codeph">CAST()</code> calls from <code class="ph codeph">INSERT</code> statements, those statements might
+ not work with earlier versions of Impala.
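+        <p class="p">
+          For example (table and column names hypothetical):
+        </p>
+<pre class="pre codeblock"><code>-- Works in Impala 1.2.2 and higher without an explicit cast,
+-- where TINY_COL is a TINYINT column:
+INSERT INTO small_ints SELECT x + 1 FROM t1;
+-- Backward-compatible form for earlier Impala versions:
+INSERT INTO small_ints SELECT CAST(x + 1 AS TINYINT) FROM t1;</code></pre>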
+ </li>
+ </ul>
+
+ <p class="p">
+ Because many users are likely to upgrade straight from Impala 1.x to Impala 1.2.2, also read
+ <a class="xref" href="impala_incompatible_changes.html#incompatible_changes_121">Incompatible Changes Introduced in Impala 1.2.1</a> for things to note about upgrading
+ to Impala 1.2.x in general.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title33" id="incompatible_changes__incompatible_changes_121">
+
+ <h2 class="title topictitle2" id="ariaid-title33">Incompatible Changes Introduced in Impala 1.2.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following changes to SQL syntax and semantics in Impala 1.2.1 could require updates to your SQL code,
+ or schema objects such as tables or views:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
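+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Impala 1.2.1 default: NULLs sort last for ASC, first for DESC.
+SELECT x FROM t1 ORDER BY x DESC;
+-- Override the default placement of NULL values:
+SELECT x FROM t1 ORDER BY x DESC NULLS LAST;
+SELECT x FROM t1 ORDER BY x ASC NULLS FIRST;</code></pre>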
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for more information.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+ start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+ <code class="ph codeph">INVALIDATE METADATA</code> statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+ information for those statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+ <span class="keyword cmdname">catalogd</span> service.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title34" id="incompatible_changes__incompatible_changes_120">
+
+ <h2 class="title topictitle2" id="ariaid-title34">Incompatible Changes Introduced in Impala 1.2.0 (Beta)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes to SQL syntax in Impala 1.2.0 (beta).
+ </p>
+
+ <p class="p">
+ The new <span class="keyword cmdname">catalogd</span> service might require changes to any user-written scripts that stop,
+ start, or restart Impala services, install or upgrade Impala packages, or issue <code class="ph codeph">REFRESH</code> or
+ <code class="ph codeph">INVALIDATE METADATA</code> statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="../shared/../topics/impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="../shared/../topics/impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are no longer needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="../shared/../topics/impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="../shared/../topics/impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage
+ information for those statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="../shared/../topics/impala_components.html#intro_catalogd">The Impala Catalog Service</a> for background information on the
+ <span class="keyword cmdname">catalogd</span> service.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The new resource management feature interacts with both YARN and Llama services.
+ <span class="ph">See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for usage information for Impala resource
+ management.</span>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title35" id="incompatible_changes__incompatible_changes_111">
+
+ <h2 class="title topictitle2" id="ariaid-title35">Incompatible Changes Introduced in Impala 1.1.1</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ There are no incompatible changes in Impala 1.1.1.
+ </p>
+
+
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+ that Parquet support is available for Hive 0.10, reusing existing Impala Parquet data files in Hive requires
+ updating the table metadata. Use the following command if you are already running Impala 1.1.1:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUETFILE;
+</code></pre>
+
+ <p class="p">
+ If you are running a version of Impala older than 1.1.1, do the metadata update through Hive:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+ INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+ OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+ <p class="p">
+ Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+ </p>
+
+ <p class="p">
+ As usual, make sure to upgrade the Impala LZO package to the latest level at the same
+ time as you upgrade the Impala server.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title36" id="incompatible_changes__incompatible_changes_11">
+
+ <h2 class="title topictitle2" id="ariaid-title36">Incompatible Change Introduced in Impala 1.1</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement now requires a table name; in Impala 1.0, the table name was
+ optional. This syntax change is part of the internal rework to make <code class="ph codeph">REFRESH</code> a true
+ Impala SQL statement so that it can be called through the JDBC and ODBC APIs. <code class="ph codeph">REFRESH</code>
+ now reloads the metadata immediately, rather than marking it for update the next time any affected
+ table is accessed. The previous behavior, where omitting the table name caused a refresh of the entire
+ Impala metadata catalog, is available through the new <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ <code class="ph codeph">INVALIDATE METADATA</code> can be specified with a table name to affect a single table, or
+ without a table name to affect the entire metadata catalog; the relevant metadata is reloaded the next
+ time it is requested during the processing for a SQL statement. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest details about these
+ statements.
+ </p>
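+          <p class="p">
+            For example (table name hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Impala 1.0 syntax, no longer accepted in 1.1:
+-- REFRESH;
+-- Impala 1.1 equivalents:
+REFRESH t1;                  -- reload metadata for one table immediately
+INVALIDATE METADATA t1;      -- mark one table's metadata stale
+INVALIDATE METADATA;         -- mark all metadata stale, reload lazily</code></pre>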
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title37" id="incompatible_changes__incompatible_changes_10">
+
+ <h2 class="title topictitle2" id="ariaid-title37">Incompatible Changes Introduced in Impala 1.0</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ If you use LZO-compressed text files, when you upgrade Impala to version 1.0, also update the
+ Impala LZO package to the latest level. See <a class="xref" href="impala_txtfile.html#lzo">Using LZO-Compressed Text Files</a> for
+ details.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_invalidate_metadata.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_invalidate_metadata.html b/docs/build3x/html/topics/impala_invalidate_metadata.html
new file mode 100644
index 0000000..ae7e419
--- /dev/null
+++ b/docs/build3x/html/topics/impala_invalidate_metadata.html
@@ -0,0 +1,286 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="invalidate_metadata"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>INVALIDATE METADATA Statement</title></head><body id="invalidate_metadata"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">INVALIDATE METADATA Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Marks the metadata for one or all tables as stale. Required after a table is created through the Hive shell,
+ before the table is available for Impala queries. The next time the current Impala node performs a query
+ against a table whose metadata is invalidated, Impala reloads the associated metadata before the query
+ proceeds. This is a relatively expensive operation compared to the incremental metadata update done by the
+ <code class="ph codeph">REFRESH</code> statement, so in the common scenario of adding new data files to an existing table,
+ prefer <code class="ph codeph">REFRESH</code> rather than <code class="ph codeph">INVALIDATE METADATA</code>. If you are not familiar
+ with the way Impala uses metadata and how it shares the same metastore database as Hive, see
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a> for background information.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>INVALIDATE METADATA [[<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var>]</code></pre>
+
+ <p class="p">
+ By default, the cached metadata for all tables is flushed. If you specify a table name, only the metadata for
+ that one table is flushed. Even for a single table, <code class="ph codeph">INVALIDATE METADATA</code> is more expensive
+ than <code class="ph codeph">REFRESH</code>, so prefer <code class="ph codeph">REFRESH</code> in the common case where you add new data
+ files for an existing table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ To accurately respond to queries, Impala must have current metadata about those databases and tables that
+ clients query directly. Therefore, if some other entity modifies information used by Impala in the metastore
+ that Impala and Hive share, the information cached by Impala must be updated. However, this does not mean
+ that all metadata updates require an Impala update.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+ the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+ reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+ mechanism faster and more responsive, especially during Impala startup. See
+ <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+ </p>
+ <p class="p">
+ In Impala 1.2 and higher, a dedicated daemon (<span class="keyword cmdname">catalogd</span>) broadcasts DDL changes made
+ through Impala to all Impala nodes. Formerly, after you created a database or table while connected to one
+ Impala node, you needed to issue an <code class="ph codeph">INVALIDATE METADATA</code> statement on another Impala node
+ before accessing the new database or table from the other node. Now, newly created or altered objects are
+ picked up automatically by all Impala nodes. You must still use the <code class="ph codeph">INVALIDATE METADATA</code>
+ technique after creating or altering objects through Hive. See
+ <a class="xref" href="impala_components.html#intro_catalogd">The Impala Catalog Service</a> for more information on the catalog service.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">INVALIDATE METADATA</code> statement is new in Impala 1.1 and higher, and takes over some of
+ the use cases of the Impala 1.0 <code class="ph codeph">REFRESH</code> statement. Because <code class="ph codeph">REFRESH</code> now
+ requires a table name parameter, to flush the metadata for all tables at once, use the <code class="ph codeph">INVALIDATE
+ METADATA</code> statement.
+ </p>
+ <p class="p">
+ Because <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> only works for tables that the current
+ Impala node is already aware of, when you create a new table in the Hive shell, enter
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">new_table</var></code> before you can see the new table in
+ <span class="keyword cmdname">impala-shell</span>. Once the table is known by Impala, you can issue <code class="ph codeph">REFRESH
+ <var class="keyword varname">table_name</var></code> after you add data files for that table.
+ </p>
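+      <p class="p">
+        For example (table name hypothetical):
+      </p>
+<pre class="pre codeblock"><code>-- After creating t_from_hive in the Hive shell:
+INVALIDATE METADATA t_from_hive;   -- make the new table visible to Impala
+-- Later, after adding data files to the table outside Impala:
+REFRESH t_from_hive;               -- the less expensive incremental reload</code></pre>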
+ </div>
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> and <code class="ph codeph">REFRESH</code> are counterparts: <code class="ph codeph">INVALIDATE
+ METADATA</code> waits to reload the metadata when needed for a subsequent query, but reloads all the
+ metadata for the table, which can be an expensive operation, especially for large tables with many
+ partitions. <code class="ph codeph">REFRESH</code> reloads the metadata immediately, but only loads the block location
+ data for newly added data files, making it a less expensive operation overall. If data was altered in some
+ more extensive way, such as being reorganized by the HDFS balancer, use <code class="ph codeph">INVALIDATE
+ METADATA</code> to avoid a performance penalty from reduced local reads. If you used Impala version 1.0,
+ the <code class="ph codeph">INVALIDATE METADATA</code> statement works just like the Impala 1.0 <code class="ph codeph">REFRESH</code>
+ statement did, while the Impala 1.1 <code class="ph codeph">REFRESH</code> is optimized for the common use case of adding
+ new data files to an existing table, thus the table name argument is now required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ A metadata update for an <code class="ph codeph">impalad</code> instance <strong class="ph b">is</strong> required if:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ A metadata change occurs.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made from another <code class="ph codeph">impalad</code> instance in your cluster, or through
+ Hive.
+ </li>
+
+ <li class="li">
+ <strong class="ph b">and</strong> the change is made to a metastore database to which clients such as the Impala shell or ODBC directly
+ connect.
+ </li>
+ </ul>
+
+ <p class="p">
+ A metadata update for an Impala node is <strong class="ph b">not</strong> required when you issue queries from the same Impala node
+ where you ran <code class="ph codeph">ALTER TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-modifying statement.
+ </p>
+
+ <p class="p">
+ Database and table metadata is typically modified by:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Hive - via <code class="ph codeph">ALTER</code>, <code class="ph codeph">CREATE</code>, <code class="ph codeph">DROP</code> or
+ <code class="ph codeph">INSERT</code> operations.
+ </li>
+
+ <li class="li">
+ Impalad - via <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, and <code class="ph codeph">INSERT</code>
+ operations.
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">INVALIDATE METADATA</code> causes the metadata for that table to be marked as stale, and reloaded
+ the next time the table is referenced. For a huge table, that process could take a noticeable amount of time;
+ thus you might prefer to use <code class="ph codeph">REFRESH</code> where practical, to avoid an unpredictable delay later,
+ for example if the next reference to the table is during a benchmark test.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how you might use the <code class="ph codeph">INVALIDATE METADATA</code> statement after
+ creating new tables (such as SequenceFile or HBase tables) through the Hive shell. Before the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement was issued, Impala would give a <span class="q">"table not found"</span> error
+ if you tried to refer to those table names. The <code class="ph codeph">DESCRIBE</code> statements cause the latest
+ metadata to be immediately loaded for the tables, avoiding a delay the next time those tables are queried.
+ </p>
+
+<pre class="pre codeblock"><code>[impalad-host:21000] > invalidate metadata;
+[impalad-host:21000] > describe t1;
+...
+[impalad-host:21000] > describe t2;
+... </code></pre>
+
+ <p class="p">
+ For more examples of using <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> with a
+ combination of Impala and Hive operations, see <a class="xref" href="impala_tutorial.html#tutorial_impala_hive">Switching Back and Forth Between Impala and Hive</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have execute
+ permissions for all the relevant directories holding table data.
+ (A table could have data spread across multiple directories,
+ or in unexpected paths, if it uses partitioning or
+ specifies a <code class="ph codeph">LOCATION</code> attribute for
+ individual partitions or the entire table.)
+ Issues with permissions might not cause an immediate error for this statement,
+ but subsequent statements such as <code class="ph codeph">SELECT</code>
+ or <code class="ph codeph">SHOW TABLE STATS</code> could fail.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS considerations:</strong>
+ </p>
+
+ <p class="p">
+ By default, the <code class="ph codeph">INVALIDATE METADATA</code> command checks HDFS permissions of the underlying data
+ files and directories, caching this information so that a statement can be cancelled immediately if for
+ example the <code class="ph codeph">impala</code> user does not have permission to write to the data directory for the
+ table. (This checking does not apply when the <span class="keyword cmdname">catalogd</span> configuration option
+ <code class="ph codeph">--load_catalog_in_background</code> is set to <code class="ph codeph">false</code>, which it is by default.)
+ Impala reports any lack of write permissions as an <code class="ph codeph">INFO</code> message in the log file, in case
+ that represents an oversight. If you change HDFS permissions to make data readable or writeable by the Impala
+ user, issue another <code class="ph codeph">INVALIDATE METADATA</code> to make Impala aware of the change.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This example illustrates creating a new database and new table in Hive, then doing an <code class="ph codeph">INVALIDATE
+ METADATA</code> statement in Impala using the fully qualified table name, after which both the new table
+ and the new database are visible to Impala. The ability to specify <code class="ph codeph">INVALIDATE METADATA
+ <var class="keyword varname">table_name</var></code> for a table created in Hive is a new capability in Impala 1.2.4. In
+ earlier releases, that statement would have returned an error indicating an unknown table, requiring you to
+ do <code class="ph codeph">INVALIDATE METADATA</code> with no table name, a more expensive operation that reloaded metadata
+ for all tables and databases.
+ </p>
+
+<pre class="pre codeblock"><code>$ hive
+hive> create database new_db_from_hive;
+OK
+Time taken: 4.118 seconds
+hive> create table new_db_from_hive.new_table_from_hive (x int);
+OK
+Time taken: 0.618 seconds
+hive> quit;
+$ impala-shell
+[localhost:21000] > show databases like 'new*';
+[localhost:21000] > refresh new_db_from_hive.new_table_from_hive;
+ERROR: AnalysisException: Database does not exist: new_db_from_hive
+[localhost:21000] > invalidate metadata new_db_from_hive.new_table_from_hive;
+[localhost:21000] > show databases like 'new*';
++--------------------+
+| name |
++--------------------+
+| new_db_from_hive |
++--------------------+
+[localhost:21000] > show tables in new_db_from_hive;
++---------------------+
+| name |
++---------------------+
+| new_table_from_hive |
++---------------------+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements also apply to tables
+ whose data resides in the Amazon Simple Storage Service (S3).
+ In particular, issue a <code class="ph codeph">REFRESH</code> for a table after adding or removing files
+ in the associated S3 data directory.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Much of the metadata for Kudu tables is handled by the underlying
+ storage layer. Kudu tables have less reliance on the metastore
+ database, and require less metadata caching on the Impala side.
+ For example, information about partitions in Kudu tables is managed
+ by Kudu, and Impala does not cache any block locality metadata
+ for Kudu tables.
+ </p>
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code>
+ statements are needed less frequently for Kudu tables than for
+ HDFS-backed tables. Neither statement is needed when data is
+ added to, removed, or updated in a Kudu table, even if the changes
+ are made directly to Kudu through a client program using the Kudu API.
+ Run <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> or
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+      for a Kudu table only after making a change to the Kudu table schema
+      through a mechanism other than Impala, such as adding or dropping a column.
+ </p>
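+
+      <p class="p">
+        For example, if a column is added to a Kudu table through the Kudu API or
+        command-line tools rather than through Impala, refresh that table's metadata
+        in Impala afterward. (The table name here is hypothetical.)
+      </p>
+<pre class="pre codeblock"><code>-- After the Kudu table schema was changed outside of Impala:
+REFRESH kudu_metrics;
+-- Or, to discard and reload all cached metadata for the table:
+INVALIDATE METADATA kudu_metrics;</code></pre>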
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hadoop.html#intro_metastore">Overview of Impala Metadata and the Metastore</a>,
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_isilon.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_isilon.html b/docs/build3x/html/topics/impala_isilon.html
new file mode 100644
index 0000000..b0a2a2a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_isilon.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_isilon"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with Isilon Storage</title></head><body id="impala_isilon"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with Isilon Storage</h1>
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query data files that reside on EMC Isilon storage devices, rather than in HDFS.
+ This capability allows convenient query access to a storage system where you might already be
+ managing large volumes of data. The combination of the Impala query engine and Isilon storage is
+ certified on <span class="keyword">Impala 2.2.4</span> or higher.
+ </p>
+
+ <div class="p">
+ Because the EMC Isilon storage devices use a global value for the block size
+ rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option has no effect when Impala inserts data into a table or partition
+ residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+ default block size globally on the Isilon device. For example, to set the
+ Isilon default block size to 256 MB, the recommended size for Parquet
+ data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+ </div>
+
+ <p class="p">
+ The typical use case for Impala and Isilon together is to use Isilon for the
+ default filesystem, replacing HDFS entirely. In this configuration,
+ when you create a database, table, or partition, the data always resides on
+ Isilon storage and you do not need to specify any special <code class="ph codeph">LOCATION</code>
+ attribute. If you do specify a <code class="ph codeph">LOCATION</code> attribute, its value refers
+ to a path within the Isilon filesystem.
+ For example:
+ </p>
+<pre class="pre codeblock"><code>-- If the default filesystem is Isilon, all Impala data resides there
+-- and all Impala databases and tables are located there.
+CREATE TABLE t1 (x INT, s STRING);
+
+-- You can specify LOCATION for database, table, or partition,
+-- using values from the Isilon filesystem.
+CREATE DATABASE d1 LOCATION '/some/path/on/isilon/server/d1.db';
+CREATE TABLE d1.t2 (a TINYINT, b BOOLEAN);
+</code></pre>
+
+ <p class="p">
+ Impala can write to, delete, and rename data files and database, table,
+ and partition directories on Isilon storage. Therefore, Impala statements such
+ as
+ <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP TABLE</code>,
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">DROP DATABASE</code>,
+ <code class="ph codeph">ALTER TABLE</code>,
+ and
+ <code class="ph codeph">INSERT</code> work the same with Isilon storage as with HDFS.
+ </p>
+
+ <p class="p">
+ When the Impala spill-to-disk feature is activated by a query that approaches
+ the memory limit, Impala writes all the temporary data to a local (not Isilon)
+ storage device. Because the I/O bandwidth for the temporary data depends on
+ the number of local disks, and clusters using Isilon storage might not have
+ as many local disks attached, pay special attention on Isilon-enabled clusters
+ to any queries that use the spill-to-disk feature. Where practical, tune the
+ queries or allocate extra memory for Impala to avoid spilling.
+ Although you can specify an Isilon storage device as the destination for
+ the temporary data for the spill-to-disk feature, that configuration is
+ not recommended due to the need to transfer the data both ways using remote I/O.
+ </p>
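+
+      <p class="p">
+        For example, you might raise the <code class="ph codeph">MEM_LIMIT</code> query option
+        before running a memory-intensive query, making the query less likely to spill.
+        (The 8 GB value is illustrative only; choose a figure appropriate for your cluster.)
+      </p>
+<pre class="pre codeblock"><code>SET MEM_LIMIT=8gb;
+-- Then run the memory-intensive query in the same session.
+SELECT ...;</code></pre>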
+
+ <p class="p">
+ When tuning Impala queries on HDFS, you typically try to avoid any remote reads.
+ When the data resides on Isilon storage, all the I/O consists of remote reads.
+ Do not be alarmed when you see non-zero numbers for remote read measurements
+ in query profile output. The benefit of the Impala and Isilon integration is
+ primarily convenience of not having to move or copy large volumes of data to HDFS,
+ rather than raw query performance. You can increase the performance of Impala
+ I/O for Isilon systems by increasing the value for the
+ <code class="ph codeph">--num_remote_hdfs_io_threads</code> startup option for the
+ <span class="keyword cmdname">impalad</span> daemon.
+ </p>
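+
+      <p class="p">
+        For example, you might start the daemon with a higher thread count such as
+        the following. (The value 64 is illustrative only; the appropriate number
+        depends on the bandwidth between the cluster and the Isilon device.)
+      </p>
+<pre class="pre codeblock"><code>impalad --num_remote_hdfs_io_threads=64 <var class="keyword varname">other_options</var></code></pre>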
+
+
+ </div>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_jdbc.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_jdbc.html b/docs/build3x/html/topics/impala_jdbc.html
new file mode 100644
index 0000000..33ed714
--- /dev/null
+++ b/docs/build3x/html/topics/impala_jdbc.html
@@ -0,0 +1,340 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_jdbc"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Configuring Impala to Work with JDBC</title></head><body id="impala_jdbc"><main role="main"><article role="article" aria-labelledby="impala_jdbc__jdbc">
+
+ <h1 class="title topictitle1" id="impala_jdbc__jdbc">Configuring Impala to Work with JDBC</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the standard JDBC interface, allowing access from commercial Business Intelligence tools and
+ custom software written in Java or other programming languages. The JDBC driver allows you to access Impala
+ from a Java program that you write, or a Business Intelligence or similar tool that uses JDBC to communicate
+ with various database products.
+ </p>
+
+ <p class="p">
+ Setting up a JDBC connection to Impala involves the following steps:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Verifying the communication port where the Impala daemons in your cluster are listening for incoming JDBC
+ requests.
+ </li>
+
+ <li class="li">
+ Installing the JDBC driver on every system that runs the JDBC-enabled application.
+ </li>
+
+ <li class="li">
+ Specifying a connection string for the JDBC application to access one of the servers running the
+ <span class="keyword cmdname">impalad</span> daemon, with the appropriate security settings.
+ </li>
+ </ul>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_jdbc__jdbc_port">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Configuring the JDBC Port</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+        The default port used by JDBC 2.0 and later (as well as ODBC 2.x) is 21050. The Impala server accepts JDBC
+        connections through this port by default. Make sure this port is available for communication
+ with other hosts on your network, for example, that it is not blocked by firewall software. If your JDBC
+ client software connects to a different port, specify that alternative port number with the
+ <code class="ph codeph">--hs2_port</code> option when starting <code class="ph codeph">impalad</code>. See
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a> for details about Impala startup options. See
+ <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a> for information about all ports used for communication between Impala
+ and clients or between Impala components.
+ </p>
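+
+    <p class="p">
+      For example, to make <span class="keyword cmdname">impalad</span> listen for JDBC connections
+      on a nondefault port, you might start it as follows. (Port 28000 is an arbitrary example.)
+    </p>
+<pre class="pre codeblock"><code>impalad --hs2_port=28000 <var class="keyword varname">other_options</var></code></pre>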
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_jdbc__jdbc_driver_choice">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Choosing the JDBC Driver</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ In Impala 2.0 and later, you can use the Hive 0.13 JDBC driver. If you are
+ already using JDBC applications with an earlier Impala release, you should update
+ your JDBC driver, because the Hive 0.12 driver that was formerly the only choice
+ is not compatible with Impala 2.0 and later.
+ </p>
+
+ <p class="p">
+ The Hive JDBC driver provides a substantial speed increase for JDBC
+ applications with Impala 2.0 and higher, for queries that return large result sets.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ The Impala complex types (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+ are available in <span class="keyword">Impala 2.3</span> and higher.
+ To use these types with JDBC requires version 2.5.28 or higher of the JDBC Connector for Impala.
+ To use these types with ODBC requires version 2.5.30 or higher of the ODBC Connector for Impala.
+      Consider upgrading all JDBC and ODBC drivers at the same time you upgrade to <span class="keyword">Impala 2.3</span> or higher.
+ </p>
+ <p class="p">
+ Although the result sets from queries involving complex types consist of all scalar values,
+ the queries involve join notation and column references that might not be understood by
+ a particular JDBC or ODBC connector. Consider defining a view that represents the
+ flattened version of a table containing complex type columns, and pointing the JDBC
+ or ODBC application at the view.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </p>
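+
+    <p class="p">
+      For example, a view can flatten a complex column in advance, so that the JDBC or ODBC
+      application sees only scalar columns. (The table, column, and view names here are hypothetical.)
+    </p>
+<pre class="pre codeblock"><code>-- T1 has a scalar ID column and an ARRAY&lt;STRING&gt; column named TAGS.
+CREATE VIEW t1_flattened AS
+  SELECT t.id, a.item AS tag
+  FROM t1 t, t.tags a;
+
+-- Point the JDBC or ODBC application at t1_flattened instead of t1.</code></pre>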
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_jdbc__jdbc_setup">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Enabling Impala JDBC Support on Client Systems</h2>
+
+
+ <div class="body conbody">
+
+ <section class="section" id="jdbc_setup__install_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+
+ <p class="p">
+ You install the Hive JDBC driver (<code class="ph codeph">hive-jdbc</code> package) through the Linux package manager, on
+ hosts within the cluster. The driver consists of several Java JAR files. The same driver can be used by Impala and Hive.
+ </p>
+
+ <p class="p">
+ To get the JAR files, install the Hive JDBC driver on each host in the cluster that will run
+ JDBC applications.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The latest JDBC driver, corresponding to Hive 0.13, provides substantial performance improvements for
+ Impala queries that return large result sets. Impala 2.0 and later are compatible with the Hive 0.13
+ driver. If you already have an older JDBC driver installed, and are running Impala 2.0 or higher, consider
+ upgrading to the latest Hive JDBC driver for best performance with JDBC applications.
+ </div>
+
+ <p class="p">
+        If you are using JDBC-enabled applications on hosts outside the cluster, you cannot use the same install
+        procedure on those hosts. Install the JDBC driver on at least one cluster host using the preceding
+ procedure. Then download the JAR files to each client machine that will use JDBC with Impala:
+ </p>
+
+<pre class="pre codeblock"><code>commons-logging-X.X.X.jar
+hadoop-common.jar
+hive-common-X.XX.X.jar
+hive-jdbc-X.XX.X.jar
+hive-metastore-X.XX.X.jar
+hive-service-X.XX.X.jar
+httpclient-X.X.X.jar
+httpcore-X.X.X.jar
+libfb303-X.X.X.jar
+libthrift-X.X.X.jar
+log4j-X.X.XX.jar
+slf4j-api-X.X.X.jar
+slf4j-log4j12-X.X.X.jar</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To enable JDBC support for Impala on the system where you run the JDBC application:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Download the JAR files listed above to each client machine.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For Maven users, see
+            <a class="xref" href="https://github.com/onefoursix/Cloudera-Impala-JDBC-Example" target="_blank">this sample GitHub repository</a> for an example of the
+ dependencies you could add to a <code class="ph codeph">pom</code> file instead of downloading the individual JARs.
+ </div>
+ </li>
+
+ <li class="li">
+ Store the JAR files in a location of your choosing, ideally a directory already referenced in your
+ <code class="ph codeph">CLASSPATH</code> setting. For example:
+ <ul class="ul">
+ <li class="li">
+ On Linux, you might use a location such as <code class="ph codeph">/opt/jars/</code>.
+ </li>
+
+ <li class="li">
+ On Windows, you might use a subdirectory underneath <span class="ph filepath">C:\Program Files</span>.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ To successfully load the Impala JDBC driver, client programs must be able to locate the associated JAR
+ files. This often means setting the <code class="ph codeph">CLASSPATH</code> for the client process to include the
+ JARs. Consult the documentation for your JDBC client for more details on how to install new JDBC drivers,
+ but some examples of how to set <code class="ph codeph">CLASSPATH</code> variables include:
+ <ul class="ul">
+ <li class="li">
+ On Linux, if you extracted the JARs to <code class="ph codeph">/opt/jars/</code>, you might issue the following
+ command to prepend the JAR files path to an existing classpath:
+ <pre class="pre codeblock"><code>export CLASSPATH=/opt/jars/*.jar:$CLASSPATH</code></pre>
+ </li>
+
+ <li class="li">
+ On Windows, use the <strong class="ph b">System Properties</strong> control panel item to modify the <strong class="ph b">Environment
+ Variables</strong> for your system. Modify the environment variables to include the path to which you
+ extracted the files.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If the existing <code class="ph codeph">CLASSPATH</code> on your client machine refers to some older version of
+ the Hive JARs, ensure that the new JARs are the first ones listed. Either put the new JAR files
+ earlier in the listings, or delete the other references to Hive JAR files.
+ </div>
+ </li>
+ </ul>
+ </li>
+ </ol>
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_jdbc__jdbc_connect">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Establishing JDBC Connections</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The JDBC driver class depends on which driver you select.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If your JDBC or ODBC application connects to Impala through a load balancer such as
+ <code class="ph codeph">haproxy</code>, be cautious about reusing the connections. If the load balancer has set up
+ connection timeout values, either check the connection frequently so that it never sits idle longer than
+ the load balancer timeout value, or check the connection validity before using it and create a new one if
+ the connection has been closed.
+ </div>
+
+ <section class="section" id="jdbc_connect__class_hive_driver"><h3 class="title sectiontitle">Using the Hive JDBC Driver</h3>
+
+
+ <p class="p">
+ For example, with the Hive JDBC driver, the class name is <code class="ph codeph">org.apache.hive.jdbc.HiveDriver</code>.
+        Once you have configured Impala to work with JDBC, you can establish connections from your client applications to Impala.
+ To do so for a cluster that does not use
+ Kerberos authentication, use a connection string of the form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;auth=noSasl</code>.
+
+ For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;auth=noSasl</code></pre>
+
+ <p class="p">
+ To connect to an instance of Impala that requires Kerberos authentication, use a connection string of the
+ form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/;principal=<var class="keyword varname">principal_name</var></code>.
+ The principal must be the same user principal you used when starting Impala. For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/;principal=impala/myhost.example.com@H2.EXAMPLE.COM</code></pre>
+
+ <p class="p">
+ To connect to an instance of Impala that requires LDAP authentication, use a connection string of the form
+ <code class="ph codeph">jdbc:hive2://<var class="keyword varname">host</var>:<var class="keyword varname">port</var>/<var class="keyword varname">db_name</var>;user=<var class="keyword varname">ldap_userid</var>;password=<var class="keyword varname">ldap_password</var></code>.
+ For example, you might use:
+ </p>
+
+<pre class="pre codeblock"><code>jdbc:hive2://myhost.example.com:21050/test_db;user=fred;password=xyz123</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
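+
+      <p class="p">
+        The following minimal Java program sketches how the driver class and connection
+        string fit together. The host name and query are placeholders, and the program
+        assumes the driver JAR files described earlier are on the <code class="ph codeph">CLASSPATH</code>:
+      </p>
+<pre class="pre codeblock"><code>import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.ResultSet;
+import java.sql.Statement;
+
+public class ImpalaJdbcExample {
+  public static void main(String[] args) throws Exception {
+    // Register the Hive JDBC driver class.
+    Class.forName("org.apache.hive.jdbc.HiveDriver");
+    // Connection string for a cluster without Kerberos authentication.
+    String url = "jdbc:hive2://myhost.example.com:21050/;auth=noSasl";
+    try (Connection con = DriverManager.getConnection(url);
+         Statement stmt = con.createStatement();
+         ResultSet rs = stmt.executeQuery("SELECT version()")) {
+      while (rs.next()) {
+        System.out.println(rs.getString(1));
+      }
+    }
+  }
+}</code></pre>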
+
+ </section>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_jdbc__jdbc_odbc_notes">
+ <h2 class="title topictitle2" id="ariaid-title6">Notes about JDBC and ODBC Interaction with Impala SQL Features</h2>
+ <div class="body conbody">
+ <p class="p">
+      Most Impala SQL features work equivalently through the <span class="keyword cmdname">impala-shell</span> interpreter
+      or the JDBC or ODBC APIs. The following are some exceptions to keep in mind when switching between
+ the interactive shell and applications using the APIs:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Queries involving the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+ require notation that might not be available in all levels of JDBC and ODBC drivers.
+ If you have trouble querying such a table due to the driver level or
+ inability to edit the queries used by the application, you can create a view that exposes
+ a <span class="q">"flattened"</span> version of the complex columns and point the application at the view.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The complex types available in <span class="keyword">Impala 2.3</span> and higher are supported by the
+ JDBC <code class="ph codeph">getColumns()</code> API.
+ Both <code class="ph codeph">MAP</code> and <code class="ph codeph">ARRAY</code> are reported as the JDBC SQL Type <code class="ph codeph">ARRAY</code>,
+ because this is the closest matching Java SQL type. This behavior is consistent with Hive.
+ <code class="ph codeph">STRUCT</code> types are reported as the JDBC SQL Type <code class="ph codeph">STRUCT</code>.
+ </p>
+ <div class="p">
+          To be consistent with Hive's behavior, the <code class="ph codeph">TYPE_NAME</code> field is populated
+ with the primitive type name for scalar types, and with the full <code class="ph codeph">toSql()</code>
+ for complex types. The resulting type names are somewhat inconsistent,
+ because nested types are printed differently than top-level types. For example,
+          the following list shows how <code class="ph codeph">toSql()</code> output for Impala types is
+          translated to <code class="ph codeph">TYPE_NAME</code> values:
+<pre class="pre codeblock"><code>DECIMAL(10,10) becomes DECIMAL
+CHAR(10) becomes CHAR
+VARCHAR(10) becomes VARCHAR
+ARRAY&lt;DECIMAL(10,10)&gt; becomes ARRAY&lt;DECIMAL(10,10)&gt;
+ARRAY&lt;CHAR(10)&gt; becomes ARRAY&lt;CHAR(10)&gt;
+ARRAY&lt;VARCHAR(10)&gt; becomes ARRAY&lt;VARCHAR(10)&gt;</code></pre>
+ </div>
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_jdbc__jdbc_kudu">
+ <h2 class="title topictitle2" id="ariaid-title7">Kudu Considerations for DML Statements</h2>
+ <div class="body conbody">
+ <p class="p">
+ Currently, Impala <code class="ph codeph">INSERT</code>, <code class="ph codeph">UPDATE</code>, or
+ other DML statements issued through the JDBC interface against a Kudu
+        table do not return JDBC error codes for conditions such as duplicate
+        primary key values. Therefore, for applications that issue a high
+ volume of DML statements, prefer to use the Kudu Java API directly
+ rather than a JDBC application.
+ </p>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_joins.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_joins.html b/docs/build3x/html/topics/impala_joins.html
new file mode 100644
index 0000000..51ccf6b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_joins.html
@@ -0,0 +1,531 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="joins"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Joins in Impala SELECT Statements</title></head><body id="joins"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Joins in Impala SELECT Statements</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ A join query is a <code class="ph codeph">SELECT</code> statement that combines data from two or more tables,
+ and returns a result set containing items from some or all of those tables. It is a way to
+ cross-reference and correlate related data that is organized into multiple tables, typically
+ using identifiers that are repeated in each of the joined tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins
+ are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+ and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+ internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+ <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var> [INNER] JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <var class="keyword varname">table_or_subquery1</var> {LEFT [OUTER] | RIGHT [OUTER] | FULL [OUTER]} JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} SEMI JOIN <var class="keyword varname">table_or_subquery2</var> |
+ <span class="ph"><var class="keyword varname">table_or_subquery1</var> {LEFT | RIGHT} ANTI JOIN <var class="keyword varname">table_or_subquery2</var> |</span>
+ [ ON <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...] |
+ USING (<var class="keyword varname">col1</var> [, <var class="keyword varname">col2</var> ...]) ]
+ [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var>, <var class="keyword varname">table_or_subquery2</var> [, <var class="keyword varname">table_or_subquery3</var> ...]
+ [<var class="keyword varname">other_join_clause</var> ...]
+WHERE
+ <var class="keyword varname">col1</var> = <var class="keyword varname">col2</var> [AND <var class="keyword varname">col3</var> = <var class="keyword varname">col4</var> ...]
+
+SELECT <var class="keyword varname">select_list</var> FROM
+ <var class="keyword varname">table_or_subquery1</var> CROSS JOIN <var class="keyword varname">table_or_subquery2</var>
+ [<var class="keyword varname">other_join_clause</var> ...]
+[ WHERE <var class="keyword varname">where_clauses</var> ]</code></pre>
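+
+  <p class="p">
+    For example, to make Impala join the tables in the order they are listed in the query,
+    place <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+    <code class="ph codeph">SELECT</code> keyword. (The tables <code class="ph codeph">t1</code>,
+    <code class="ph codeph">t2</code>, and <code class="ph codeph">t3</code> here are hypothetical.)
+  </p>
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN t1.c1, t2.c2
+  FROM t1 JOIN t2 ON t1.id = t2.id
+  JOIN t3 ON t2.id = t3.id;</code></pre>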
+
+ <p class="p">
+ <strong class="ph b">SQL-92 and SQL-89 Joins:</strong>
+ </p>
+
+ <p class="p">
+ Queries with the explicit <code class="ph codeph">JOIN</code> keywords are known as SQL-92 style joins, referring to the
+ level of the SQL standard where they were introduced. The corresponding <code class="ph codeph">ON</code> or
+ <code class="ph codeph">USING</code> clauses clearly show which columns are used as the join keys in each case:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+ <strong class="ph b">ON t1.id = t2.id and t1.type_flag = t2.type_flag</strong>
+ WHERE t1.c1 > 100;
+
+SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1 JOIN t2</strong>
+ <strong class="ph b">USING (id, type_flag)</strong>
+ WHERE t1.c1 > 100;</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ON</code> clause is a general way to compare columns across the two tables, even if the column
+ names are different. The <code class="ph codeph">USING</code> clause is a shorthand notation for specifying the join
+ columns, when the column names are the same in both tables. You can code equivalent <code class="ph codeph">WHERE</code>
+ clauses that compare the columns, instead of <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clauses, but that
+ practice is not recommended because mixing the join comparisons with other filtering clauses is typically
+ less readable and harder to maintain.
+ </p>
+
+ <p class="p">
+ Queries with a comma-separated list of tables and subqueries are known as SQL-89 style joins. In these
+ queries, the equality comparisons between columns of the joined tables go in the <code class="ph codeph">WHERE</code>
+ clause alongside other kinds of comparisons. This syntax is easy to learn, but it is also easy to
+ accidentally remove a <code class="ph codeph">WHERE</code> clause needed for the join to work correctly.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t2.c2 FROM <strong class="ph b">t1, t2</strong>
+ WHERE
+ <strong class="ph b">t1.id = t2.id AND t1.type_flag = t2.type_flag</strong>
+ AND t1.c1 > 100;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Self-joins:</strong>
+ </p>
+
+ <p class="p">
+ Impala can do self-joins, for example to join on two different columns in the same table to represent
+ parent-child relationships or other tree-structured data. There is no explicit syntax for this; just use the
+ same table name for both the left-hand and right-hand table, and assign different table aliases to use when
+ referring to the fully qualified column names:
+ </p>
+
+<pre class="pre codeblock"><code>-- Combine fields from both parent and child rows.
+SELECT lhs.id, rhs.parent, lhs.c1, rhs.c2 FROM tree_data lhs, tree_data rhs WHERE lhs.id = rhs.parent;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Cartesian joins:</strong>
+ </p>
+
+ <div class="p">
+ To avoid producing huge result sets by mistake, Impala does not allow Cartesian joins of the form:
+<pre class="pre codeblock"><code>SELECT ... FROM t1 JOIN t2;
+SELECT ... FROM t1, t2;</code></pre>
+ If you intend to join the tables based on common values, add <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code>
+ clauses to compare columns across the tables. If you truly intend to do a Cartesian join, use the
+ <code class="ph codeph">CROSS JOIN</code> keyword as the join operator. The <code class="ph codeph">CROSS JOIN</code> form does not use
+ any <code class="ph codeph">ON</code> clause, because it produces a result set with all combinations of rows from the
+ left-hand and right-hand tables. The result set can still be filtered by subsequent <code class="ph codeph">WHERE</code>
+ clauses. For example:
+ </div>
+
+<pre class="pre codeblock"><code>SELECT ... FROM t1 CROSS JOIN t2;
+SELECT ... FROM t1 CROSS JOIN t2 WHERE <var class="keyword varname">tests_on_non_join_columns</var>;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Inner and outer joins:</strong>
+ </p>
+
+ <p class="p">
+ An inner join is the most common and familiar type: rows in the result set contain the requested columns from
+ the appropriate tables, for all combinations of rows where the join columns of the tables have identical
+ values. If a column with the same name occurs in both tables, use a fully qualified name or a column alias to
+ refer to the column in the select list or other clauses. Impala performs inner joins by default for both
+ SQL-89 and SQL-92 join syntax:
+ </p>
+
+<pre class="pre codeblock"><code>-- The following 3 forms are all equivalent.
+SELECT t1.id, c1, c2 FROM t1, t2 WHERE t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 JOIN t2 ON t1.id = t2.id;
+SELECT t1.id, c1, c2 FROM t1 INNER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+ <p class="p">
+ An outer join retrieves all rows from the left-hand table, or the right-hand table, or both; wherever there
+ is no matching data in the table on the other side of the join, the corresponding columns in the result set
+ are set to <code class="ph codeph">NULL</code>. To perform an outer join, include the <code class="ph codeph">OUTER</code> keyword in the
+ join operator, along with either <code class="ph codeph">LEFT</code>, <code class="ph codeph">RIGHT</code>, or <code class="ph codeph">FULL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 RIGHT OUTER JOIN t2 ON t1.id = t2.id;
+SELECT * FROM t1 FULL OUTER JOIN t2 ON t1.id = t2.id;</code></pre>
+
+ <p class="p">
+ For outer joins, Impala requires SQL-92 syntax; that is, the <code class="ph codeph">JOIN</code> keyword instead of
+ comma-separated table names. Impala does not support vendor extensions such as <code class="ph codeph">(+)</code> or
+ <code class="ph codeph">*=</code> notation for doing outer joins with SQL-89 query syntax.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Equijoins and Non-Equijoins:</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala requires an equality comparison between the left-hand and right-hand tables, either
+ through <code class="ph codeph">ON</code>, <code class="ph codeph">USING</code>, or <code class="ph codeph">WHERE</code> clauses. These types of
+ queries are classified broadly as equijoins. Inner, outer, full, and semi joins can all be equijoins based on
+ the presence of equality tests between columns in the left-hand and right-hand tables.
+ </p>
+
+ <p class="p">
+ In Impala 1.2.2 and higher, non-equijoin queries are also possible, with comparisons such as
+ <code class="ph codeph">!=</code> or <code class="ph codeph"><</code> between the join columns. These kinds of queries require care to
+ avoid producing huge result sets that could exceed resource limits. Once you have planned a non-equijoin
+ query that produces a result set of acceptable size, you can code the query using the <code class="ph codeph">CROSS
+ JOIN</code> operator, and add the extra comparisons in the <code class="ph codeph">WHERE</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 CROSS JOIN t2 WHERE t1.total > t2.maximum_price;</code></pre>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, additional non-equijoin queries are possible due to the addition
+ of nested loop joins. These queries typically involve <code class="ph codeph">SEMI JOIN</code>,
+ <code class="ph codeph">ANTI JOIN</code>, or <code class="ph codeph">FULL OUTER JOIN</code> clauses.
+ Impala sometimes also uses nested loop joins internally when evaluating <code class="ph codeph">OUTER JOIN</code>
+ queries involving complex type columns.
+ Query phases involving nested loop joins do not use the spill-to-disk mechanism if they
+ exceed the memory limit. Impala decides internally when to use each join mechanism; you cannot
+ specify any query hint to choose between the nested loop join or the original hash join algorithm.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT * FROM t1 LEFT OUTER JOIN t2 ON t1.int_col < t2.int_col;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Semi-joins:</strong>
+ </p>
+
+ <p class="p">
+ Semi-joins are a relatively rarely used variation. With the left semi-join, only data from the left-hand
+ table is returned, for rows where there is matching data in the right-hand table, based on comparisons
+ between join columns in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses. Only one instance of each row
+ from the left-hand table is returned, regardless of how many matching rows exist in the right-hand table.
+ <span class="ph">A right semi-join (available in Impala 2.0 and higher) reverses the comparison and returns
+ data from the right-hand table.</span>
+ </p>
+
+<pre class="pre codeblock"><code>SELECT t1.c1, t1.c2 FROM t1 LEFT SEMI JOIN t2 ON t1.id = t2.id;</code></pre>
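+
+ <p class="p">
+ As an illustration, a right semi-join reverses the direction: only columns from the right-hand table
+ can appear in the select list. (The table and column names here are placeholders following the same
+ pattern as the preceding example.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Available in Impala 2.0 and higher. Returns one instance of each matching
+-- row from the right-hand table, regardless of how many rows in t1 match.
+SELECT t2.c1, t2.c2 FROM t1 RIGHT SEMI JOIN t2 ON t1.id = t2.id;</code></pre>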
+
+ <p class="p">
+ <strong class="ph b">Natural joins (not supported):</strong>
+ </p>
+
+ <p class="p">
+ Impala does not support the <code class="ph codeph">NATURAL JOIN</code> operator, again to avoid inconsistent or huge
+ result sets. Natural joins do away with the <code class="ph codeph">ON</code> and <code class="ph codeph">USING</code> clauses, and
+ instead automatically join on all columns with the same names in the left-hand and right-hand tables. Because
+ such a query can produce different results as columns are added to or removed from tables, it is not
+ recommended for rapidly evolving data structures such as are typically used in Hadoop.
+ </p>
+
+ <p class="p">
+ If you do have any queries that use <code class="ph codeph">NATURAL JOIN</code>, make sure to rewrite them with explicit
+ <code class="ph codeph">USING</code> clauses, because Impala could interpret the <code class="ph codeph">NATURAL</code> keyword as a
+ table alias:
+ </p>
+
+<pre class="pre codeblock"><code>-- 'NATURAL' is interpreted as an alias for 't1' and Impala attempts an inner join,
+-- resulting in an error because inner joins require explicit comparisons between columns.
+SELECT t1.c1, t2.c2 FROM t1 NATURAL JOIN t2;
+ERROR: NotImplementedException: Join with 't2' requires at least one conjunctive equality predicate.
+ To perform a Cartesian product between two tables, use a CROSS JOIN.
+
+-- If you expect the tables to have identically named columns with matching values,
+-- list the corresponding column names in a USING clause.
+SELECT t1.c1, t2.c2 FROM t1 JOIN t2 USING (id, type_flag, name, address);</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Anti-joins (<span class="keyword">Impala 2.0</span> and higher only):</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the <code class="ph codeph">LEFT ANTI JOIN</code> and <code class="ph codeph">RIGHT ANTI JOIN</code> clauses in
+ <span class="keyword">Impala 2.0</span> and higher. The <code class="ph codeph">LEFT</code> or <code class="ph codeph">RIGHT</code>
+ keyword is required for this kind of join. For <code class="ph codeph">LEFT ANTI JOIN</code>, this clause returns those
+ values from the left-hand table that have no matching value in the right-hand table. <code class="ph codeph">RIGHT ANTI
+ JOIN</code> reverses the comparison and returns values from the right-hand table. You can express this
+ negative relationship either through the <code class="ph codeph">ANTI JOIN</code> clause or through a <code class="ph codeph">NOT
+ EXISTS</code> operator with a subquery.
+ </p>
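+
+ <p class="p">
+ As a sketch of this equivalence (using generic tables <code class="ph codeph">t1</code> and
+ <code class="ph codeph">t2</code> with a hypothetical join column <code class="ph codeph">id</code>), the following
+ two queries return the same rows:
+ </p>
+
+<pre class="pre codeblock"><code>-- Rows from t1 with no matching id value in t2, using ANTI JOIN...
+SELECT t1.* FROM t1 LEFT ANTI JOIN t2 ON t1.id = t2.id;
+
+-- ...and the equivalent formulation using NOT EXISTS with a correlated subquery.
+SELECT t1.* FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t1.id = t2.id);</code></pre>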
+
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+
+
+ <p class="p">
+ When referring to a column with a complex type (<code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, or <code class="ph codeph">MAP</code>)
+ in a query, you use join notation to <span class="q">"unpack"</span> the scalar fields of the struct, the elements of the array, or
+ the key-value pairs of the map. (The join notation is not required for aggregation operations, such as
+ <code class="ph codeph">COUNT()</code> or <code class="ph codeph">SUM()</code> for array elements.) Because Impala recognizes which complex type elements are associated with which row
+ of the result set, you use the same syntax as for a cross or Cartesian join, without an explicit join condition.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
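+
+ <p class="p">
+ For example, assuming a hypothetical table <code class="ph codeph">customers</code> with an
+ <code class="ph codeph">ARRAY</code> column named <code class="ph codeph">orders</code>, the join notation
+ unpacks the array elements with no <code class="ph codeph">ON</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table: customers (id INT, name STRING, orders ARRAY<STRING>).
+-- Each array element is matched up with its containing row automatically,
+-- so no join condition is needed or allowed.
+SELECT c.name, o.item FROM customers c, c.orders o;</code></pre>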
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You typically use join queries in situations like these:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When related data arrives from different sources, with each data set physically residing in a separate
+ table. For example, you might have address data from business records that you cross-check against phone
+ listings or census data.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala can join tables of different file formats, including Impala-managed tables and HBase tables. For
+ example, you might keep small dimension tables in HBase, for convenience of single-row lookups and
+ updates, and for the larger fact tables use Parquet or other binary file format optimized for scan
+ operations. Then, you can issue a join query to cross-reference the fact tables with the dimension
+ tables.
+ </div>
+ </li>
+
+ <li class="li">
+ When data is normalized, a technique for reducing data duplication by dividing it across multiple tables.
+ This kind of organization is often found in data that comes from traditional relational database systems.
+ For example, instead of repeating some long string such as a customer name in multiple tables, each table
+ might contain a numeric customer ID. Queries that need to display the customer name could <span class="q">"join"</span> the
+ table that specifies which customer ID corresponds to which name.
+ </li>
+
+ <li class="li">
+ When certain columns are rarely needed for queries, so they are moved into separate tables to reduce
+ overhead for common queries. For example, a <code class="ph codeph">biography</code> field might be rarely needed in
+ queries on employee data. Putting that field in a separate table reduces the amount of I/O for common
+ queries on employee addresses or phone numbers. Queries that do need the <code class="ph codeph">biography</code> column
+ can retrieve it by performing a join with that separate table.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.3</span> or higher, when referring to complex type columns in queries.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </li>
+ </ul>
+
+ <p class="p">
+ When comparing columns with the same names in <code class="ph codeph">ON</code> or <code class="ph codeph">WHERE</code> clauses, use the
+ fully qualified names such as <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>, or
+ assign table aliases, column aliases, or both to make the code more compact and understandable:
+ </p>
+
+<pre class="pre codeblock"><code>select t1.c1 as first_id, t2.c2 as second_id from
+ t1 join t2 on first_id = second_id;
+
+select fact.custno, dimension.custno from
+ customer_data as fact join customer_address as dimension
+ using (custno);</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Performance for join queries is a crucial aspect for Impala, because complex join queries are
+ resource-intensive operations. An efficient join query produces much less network traffic and CPU overhead
+ than an inefficient one. For best results:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Make sure that both <a class="xref" href="impala_perf_stats.html#perf_stats">table and column statistics</a> are
+ available for all the tables involved in a join query, and especially for the columns referenced in any
+ join conditions. Impala uses the statistics to automatically deduce an efficient join order.
+ Use <a class="xref" href="impala_show.html#show"><code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+ <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code></a> to check if statistics are
+ already present. Issue the <code class="ph codeph">COMPUTE STATS <var class="keyword varname">table_name</var></code> for a nonpartitioned table,
+ or (in Impala 2.1.0 and higher) <code class="ph codeph">COMPUTE INCREMENTAL STATS <var class="keyword varname">table_name</var></code>
+ for a partitioned table, to collect the initial statistics at both the table and column levels, and to keep the
+ statistics up to date after any substantial <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> operations.
+ </li>
+
+ <li class="li">
+ If table or column statistics are not available, join the largest table first. You can check the
+ existence of statistics with the <code class="ph codeph">SHOW TABLE STATS <var class="keyword varname">table_name</var></code> and
+ <code class="ph codeph">SHOW COLUMN STATS <var class="keyword varname">table_name</var></code> statements.
+ </li>
+
+ <li class="li">
+ If table or column statistics are not available, join subsequent tables according to which table has the
+ most selective filter, based on overall size and <code class="ph codeph">WHERE</code> clauses. Joining the table with
+ the most selective filter results in the fewest number of rows being returned.
+ </li>
+ </ul>
+ <p class="p">
+ For more information and examples of performance for join queries, see
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+ </div>
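+
+ <p class="p">
+ For illustration, the statistics check and collection steps described above might look like the
+ following (the table names are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>-- Check whether statistics are already present.
+SHOW TABLE STATS sales_fact;
+SHOW COLUMN STATS sales_fact;
+
+-- Collect table and column statistics for a nonpartitioned table.
+COMPUTE STATS sales_fact;
+
+-- For a partitioned table, in Impala 2.1.0 and higher.
+COMPUTE INCREMENTAL STATS sales_by_region;</code></pre>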
+
+ <p class="p">
+ To control the result set from a join query, include the names of the corresponding columns from both tables
+ in an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause, or code equality comparisons for those
+ columns in the <code class="ph codeph">WHERE</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select c_last_name, ca_city from customer join customer_address where c_customer_sk = ca_address_sk;
++-------------+-----------------+
+| c_last_name | ca_city |
++-------------+-----------------+
+| Lewis | Fairfield |
+| Moses | Fairview |
+| Hamilton | Pleasant Valley |
+| White | Oak Ridge |
+| Moran | Glendale |
+...
+| Richards | Lakewood |
+| Day | Lebanon |
+| Painter | Oak Hill |
+| Bentley | Greenfield |
+| Jones | Stringtown |
++-------------+------------------+
+Returned 50000 row(s) in 9.82s</code></pre>
+
+ <p class="p">
+ One potential downside of joins is the possibility of excess resource usage in poorly constructed queries.
+ Impala imposes restrictions on join queries to guard against such issues. To minimize the chance of runaway
+ queries on large data sets, Impala requires every join query to contain at least one equality predicate
+ between the columns of the various tables. For example, if <code class="ph codeph">T1</code> contains 1000 rows and
+ <code class="ph codeph">T2</code> contains 1,000,000 rows, a query <code class="ph codeph">SELECT <var class="keyword varname">columns</var> FROM t1 JOIN
+ t2</code> could return up to 1 billion rows (1000 * 1,000,000); Impala requires that the query include a
+ clause such as <code class="ph codeph">ON t1.c1 = t2.c2</code> or <code class="ph codeph">WHERE t1.c1 = t2.c2</code>.
+ </p>
+
+ <p class="p">
+ Because the result set can still be large even with equality clauses, as the previous example shows, you
+ might use a <code class="ph codeph">LIMIT</code> clause to return a subset of the results:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select c_last_name, ca_city from customer, customer_address where c_customer_sk = ca_address_sk limit 10;
++-------------+-----------------+
+| c_last_name | ca_city |
++-------------+-----------------+
+| Lewis | Fairfield |
+| Moses | Fairview |
+| Hamilton | Pleasant Valley |
+| White | Oak Ridge |
+| Moran | Glendale |
+| Sharp | Lakeview |
+| Wiles | Farmington |
+| Shipman | Union |
+| Gilbert | New Hope |
+| Brunson | Martinsville |
++-------------+-----------------+
+Returned 10 row(s) in 0.63s</code></pre>
+
+ <p class="p">
+ Or you might use additional comparison operators or aggregation functions to condense a large result set into
+ a smaller set of values:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > -- Find the names of customers who live in one particular town.
+[localhost:21000] > select distinct c_last_name from customer, customer_address where
+ c_customer_sk = ca_address_sk
+ and ca_city = "Green Acres";
++---------------+
+| c_last_name |
++---------------+
+| Hensley |
+| Pearson |
+| Mayer |
+| Montgomery |
+| Ricks |
+...
+| Barrett |
+| Price |
+| Hill |
+| Hansen |
+| Meeks |
++---------------+
+Returned 332 row(s) in 0.97s
+
+[localhost:21000] > -- See how many different customers in this town have names starting with "A".
+[localhost:21000] > select count(distinct c_last_name) from customer, customer_address where
+ c_customer_sk = ca_address_sk
+ and ca_city = "Green Acres"
+ and substr(c_last_name,1,1) = "A";
++-----------------------------+
+| count(distinct c_last_name) |
++-----------------------------+
+| 12 |
++-----------------------------+
+Returned 1 row(s) in 1.00s</code></pre>
+
+ <p class="p">
+ Because a join query can involve reading large amounts of data from disk, sending large amounts of data
+ across the network, and loading large amounts of data into memory to do the comparisons and filtering, you
+ might do benchmarking, performance analysis, and query tuning to find the most efficient join queries for
+ your data set, hardware capacity, network configuration, and cluster workload.
+ </p>
+
+ <p class="p">
+ The two categories of joins in Impala are known as <strong class="ph b">partitioned joins</strong> and <strong class="ph b">broadcast joins</strong>. If
+ inaccurate table or column statistics, or some quirk of the data distribution, causes Impala to choose the
+ wrong mechanism for a particular join, consider using query hints as a temporary workaround. For details, see
+ <a class="xref" href="impala_hints.html#hints">Optimizer Hints</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Handling NULLs in Join Columns:</strong>
+ </p>
+
+ <p class="p">
+ By default, join key columns do not match if either one contains a <code class="ph codeph">NULL</code> value.
+ To treat such columns as equal if both contain <code class="ph codeph">NULL</code>, you can use an expression
+ such as <code class="ph codeph">A = B OR (A IS NULL AND B IS NULL)</code>.
+ In <span class="keyword">Impala 2.5</span> and higher, the <code class="ph codeph"><=></code> operator (shorthand for
+ <code class="ph codeph">IS NOT DISTINCT FROM</code>) performs the same comparison in a concise and efficient form.
+ The <code class="ph codeph"><=></code> operator is more efficient than the <code class="ph codeph">OR</code> expression for comparing join keys in a <code class="ph codeph">NULL</code>-safe
+ manner, because the operator can use a hash join while the <code class="ph codeph">OR</code> expression cannot.
+ </p>
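+
+ <p class="p">
+ For example, the following two queries are logically equivalent, but only the second form can use
+ a hash join (the tables and columns here are placeholders):
+ </p>
+
+<pre class="pre codeblock"><code>-- NULL-safe comparison spelled out with OR; cannot use a hash join.
+SELECT t1.id, t2.id FROM t1 JOIN t2
+ ON t1.a = t2.b OR (t1.a IS NULL AND t2.b IS NULL);
+
+-- Equivalent NULL-safe comparison in Impala 2.5 and higher; can use a hash join.
+SELECT t1.id, t2.id FROM t1 JOIN t2 ON t1.a <=> t2.b;</code></pre>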
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+ The following examples refer to these simple tables containing small sets of integers:
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1 (x int);
+[localhost:21000] > insert into t1 values (1), (2), (3), (4), (5), (6);
+
+[localhost:21000] > create table t2 (y int);
+[localhost:21000] > insert into t2 values (2), (4), (6);
+
+[localhost:21000] > create table t3 (z int);
+[localhost:21000] > insert into t3 values (1), (3), (5);
+</code></pre>
+ </div>
+
+
+
+ <p class="p">
+ The following example demonstrates an anti-join, returning the values from <code class="ph codeph">T1</code> that do not
+ exist in <code class="ph codeph">T2</code> (in this case, the odd numbers 1, 3, and 5):
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 left anti join t2 on (t1.x = t2.y);
++---+
+| x |
++---+
+| 1 |
+| 3 |
+| 5 |
++---+
+</code></pre>
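+
+ <p class="p">
+ Using the same tables, a left semi-join returns each <code class="ph codeph">T1</code> value that has a
+ match in <code class="ph codeph">T2</code> (in this case, the even numbers 2, 4, and 6):
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select x from t1 left semi join t2 on (t1.x = t2.y);
++---+
+| x |
++---+
+| 2 |
+| 4 |
+| 6 |
++---+
+</code></pre>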
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See these tutorials for examples of different kinds of joins:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_tutorial.html#tut_cross_join">Cross Joins and Cartesian Products with the CROSS JOIN Operator</a>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase.html b/docs/build3x/html/topics/impala_hbase.html
new file mode 100644
index 0000000..ef339ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase.html
@@ -0,0 +1,772 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="impala_hbase"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala to Query HBase Tables</title></head><body id="impala_hbase"><main role="main"><article role="article" aria-labelledby="impala_hbase__hbase">
+
+ <h1 class="title topictitle1" id="impala_hbase__hbase">Using Impala to Query HBase Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query HBase tables. This capability allows convenient access to a storage system that
+ is tuned for different kinds of workloads than the default with Impala. The default Impala tables use data
+ files stored on HDFS, which are ideal for bulk loads and queries using full-table scans. In contrast, HBase
+ can do efficient queries for data organized for OLTP-style workloads, with lookups of individual rows or
+ ranges of values.
+ </p>
+
+ <p class="p">
+ From the perspective of an Impala user, coming from an RDBMS background, HBase is a kind of key-value store
+ where the value consists of multiple fields. The key is mapped to one column in the Impala table, and the
+ various fields of the value are mapped to the other columns in the Impala table.
+ </p>
+
+ <p class="p">
+ For background information on HBase, see <a class="xref" href="https://hbase.apache.org/book.html" target="_blank">the Apache HBase documentation</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="impala_hbase__hbase_using">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Using HBase with Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use Impala with HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You create the tables on the Impala side using the Hive shell, because the Impala <code class="ph codeph">CREATE
+ TABLE</code> statement currently does not support custom SerDes and some other syntax needed for these
+ tables:
+ <ul class="ul">
+ <li class="li">
+ You designate it as an HBase table using the <code class="ph codeph">STORED BY
+ 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'</code> clause on the Hive <code class="ph codeph">CREATE
+ TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ You map these specially created tables to corresponding tables that exist in HBase, with the clause
+ <code class="ph codeph">TBLPROPERTIES("hbase.table.name" = "<var class="keyword varname">table_name_in_hbase</var>")</code> on the
+ Hive <code class="ph codeph">CREATE TABLE</code> statement.
+ </li>
+
+ <li class="li">
+ See <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for a full example.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ You define the column corresponding to the HBase row key as a string with the <code class="ph codeph">#string</code>
+ keyword, or map it to a <code class="ph codeph">STRING</code> column.
+ </li>
+
+ <li class="li">
+ Because Impala and Hive share the same metastore database, once you create the table in Hive, you can
+ query or insert into it through Impala. (After creating a new table through Hive, issue the
+ <code class="ph codeph">INVALIDATE METADATA</code> statement in <span class="keyword cmdname">impala-shell</span> to make Impala aware of
+ the new table.)
+ </li>
+
+ <li class="li">
+ You issue queries against the Impala tables. For efficient queries, use <code class="ph codeph">WHERE</code> clauses to
+ find a single key value or a range of key values wherever practical, by testing the Impala column
+ corresponding to the HBase row key. Avoid queries that do full-table scans, which are efficient for
+ regular Impala tables but inefficient in HBase.
+ </li>
+ </ul>
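+
+ <p class="p">
+ Putting these pieces together, a Hive <code class="ph codeph">CREATE TABLE</code> statement for an
+ HBase-backed table follows this general pattern. (The table, column, and column family names here
+ are placeholders; see the examples later in this topic for a complete walkthrough.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Issued in the Hive shell, not impala-shell.
+CREATE EXTERNAL TABLE hbase_customers (
+ cust_id STRING, -- Mapped to the HBase row key (:key).
+ name STRING,
+ birth_year STRING
+)
+STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+WITH SERDEPROPERTIES (
+ "hbase.columns.mapping" = ":key,info:name,info:birth_year"
+)
+TBLPROPERTIES ("hbase.table.name" = "customers");</code></pre>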
+
+ <p class="p">
+ To work with an HBase table from Impala, ensure that the <code class="ph codeph">impala</code> user has read/write
+ privileges for the HBase table, using the <code class="ph codeph">GRANT</code> command in the HBase shell. For details
+ about HBase security, see <a class="xref" href="https://hbase.apache.org/book.html#security" target="_blank">the Security chapter in the Apache HBase documentation</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="impala_hbase__hbase_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring HBase for Use with Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ HBase works out of the box with Impala. There is no mandatory configuration needed to use these two
+ components together.
+ </p>
+
+ <p class="p">
+ To avoid delays if HBase is unavailable during Impala startup or after an <code class="ph codeph">INVALIDATE
+ METADATA</code> statement, set timeout values similar to the following in
+ <span class="ph filepath">/etc/impala/conf/hbase-site.xml</span>:
+ </p>
+
+<pre class="pre codeblock"><code><property>
+ <name>hbase.client.retries.number</name>
+ <value>3</value>
+</property>
+<property>
+ <name>hbase.rpc.timeout</name>
+ <value>3000</value>
+</property>
+</code></pre>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="impala_hbase__hbase_types">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Supported Data Types for HBase Columns</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand how Impala column data types are mapped to fields in HBase, you should have some background
+ knowledge about HBase first. You set up the mapping by running the <code class="ph codeph">CREATE TABLE</code> statement
+ in the Hive shell. See
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+ Hive wiki</a> for a starting point, and <a class="xref" href="#hbase_queries">Examples of Querying HBase Tables from Impala</a> for examples.
+ </p>
+
+ <p class="p">
+ HBase works as a kind of <span class="q">"bit bucket"</span>, in the sense that HBase does not enforce any typing for the
+ key or value fields. All the type enforcement is done on the Impala side.
+ </p>
+
+ <p class="p">
+ For best performance of Impala queries against HBase tables, most queries will perform comparisons in the
+ <code class="ph codeph">WHERE</code> clause against the column that corresponds to the HBase row key. When creating the table
+ through the Hive shell, use the <code class="ph codeph">STRING</code> data type for the column that corresponds to the
+ HBase row key. Impala can translate conditional tests (through operators such as <code class="ph codeph">=</code>,
+ <code class="ph codeph"><</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">IN</code>) against this column into fast
+ lookups in HBase, but this optimization (<span class="q">"predicate pushdown"</span>) only works when that column is
+ defined as <code class="ph codeph">STRING</code>.
+ </p>
+
+ <p class="p">
+ Starting in Impala 1.1, Impala also supports reading and writing to columns that are defined in the Hive
+ <code class="ph codeph">CREATE TABLE</code> statement using binary data types, represented in the Hive table definition
+ using the <code class="ph codeph">#binary</code> keyword, often abbreviated as <code class="ph codeph">#b</code>. Defining numeric
+ columns as binary can reduce the overall data volume in the HBase tables. You should still define the
+ column that corresponds to the HBase row key as a <code class="ph codeph">STRING</code>, to allow fast lookups using
+ those columns.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="impala_hbase__hbase_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Performance Considerations for the Impala-HBase Integration</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To understand the performance characteristics of SQL queries against data stored in HBase, you should have
+ some background knowledge about how HBase interacts with SQL-oriented systems first. See
+ <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/HBaseIntegration" target="_blank">the
+ Hive wiki</a> for a starting point; because Impala shares the same metastore database as Hive, the
+ information about mapping columns from Hive tables to HBase tables is generally applicable to Impala too.
+ </p>
+
+ <p class="p">
+ Impala uses the HBase client API via Java Native Interface (JNI) to query data stored in HBase. This
+ querying does not read HFiles directly. The extra communication overhead makes it important to choose what
+ data to store in HBase or in HDFS, and construct efficient queries that can retrieve the HBase data
+ efficiently:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Use HBase tables for queries that return a single row or a range of rows, not queries that scan the entire
+ table. (If a query has no <code class="ph codeph">WHERE</code> clause, that is a strong indicator that it is an
+ inefficient query for an HBase table.)
+ </li>
+
+ <li class="li">
+ If you have join queries that do aggregation operations on large fact tables and join the results against
+ small dimension tables, consider using Impala for the fact tables and HBase for the dimension tables.
+ (Because Impala does a full scan on the HBase table in this case, rather than doing single-row HBase
+ lookups based on the join column, only use this technique where the HBase table is small enough that
+ doing a full table scan does not cause a performance bottleneck for the query.)
+ </li>
+ </ul>
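+
+ <p class="p">
+ For example, assuming the HBase row key is mapped to a <code class="ph codeph">STRING</code> column named
+ <code class="ph codeph">cust_id</code> in a hypothetical table <code class="ph codeph">hbase_customers</code>,
+ the first two queries below can be turned into efficient HBase lookups, while the third forces a full
+ table scan:
+ </p>
+
+<pre class="pre codeblock"><code>-- Efficient: single-row lookup based on the row key column.
+SELECT * FROM hbase_customers WHERE cust_id = 'id_12345';
+
+-- Efficient: range scan bounded by start and stop keys.
+SELECT * FROM hbase_customers WHERE cust_id BETWEEN 'id_10000' AND 'id_19999';
+
+-- Inefficient: no WHERE clause on the row key means a full table scan.
+SELECT COUNT(*) FROM hbase_customers;</code></pre>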
+
+ <p class="p">
+ Query predicates are applied to row keys as start and stop keys, thereby limiting the scope of a particular
+ lookup. If row keys are not mapped to string columns, ordering is typically incorrect and comparison
+ operations, such as greater than (>) or less than (<), do not work.
+ </p>
+
+ <p class="p">
+ Predicates on non-key columns can be sent to HBase to scan as <code class="ph codeph">SingleColumnValueFilters</code>,
+ providing some performance gain: HBase then returns fewer rows to Impala than if the same predicates
+ were evaluated on the Impala side. The gain is smaller than when start and stop rows are used,
+ because the number of rows that HBase must examine is not limited. As long as a row key predicate
+ applies to a single row, HBase locates and returns that row directly. Conversely, if a non-key
+ predicate is used, even one that applies to only a single row, HBase must still scan the entire
+ table to find the correct result.
+ </p>
+
+ <div class="example"><h3 class="title sectiontitle">Interpreting EXPLAIN Output for HBase Queries</h3>
+
+
+
+ <p class="p">
+ For example, here are some queries against the following Impala table, which is mapped to an HBase table.
+ The examples show excerpts from the output of the <code class="ph codeph">EXPLAIN</code> statement, demonstrating what
+ things to look for to indicate an efficient or inefficient query against an HBase table.
+ </p>
+
+ <p class="p">
+ The first column (<code class="ph codeph">cust_id</code>) was specified as the key column in the <code class="ph codeph">CREATE
+ EXTERNAL TABLE</code> statement; for performance, it is important to declare this column as
+ <code class="ph codeph">STRING</code>. Other columns, such as <code class="ph codeph">BIRTH_YEAR</code> and
+ <code class="ph codeph">NEVER_LOGGED_ON</code>, are also declared as <code class="ph codeph">STRING</code>, rather than their
+ <span class="q">"natural"</span> types of <code class="ph codeph">INT</code> or <code class="ph codeph">BOOLEAN</code>, because Impala can optimize
+ those types more effectively in HBase tables. For comparison, we leave one column,
+ <code class="ph codeph">YEAR_REGISTERED</code>, as <code class="ph codeph">INT</code> to show that filtering on this column is
+ inefficient.
+ </p>
+
+<pre class="pre codeblock"><code>describe hbase_table;
+Query: describe hbase_table
++-----------------------+--------+---------+
+| name                  | type   | comment |
++-----------------------+--------+---------+
+| cust_id               | <strong class="ph b">string</strong> |         |
+| birth_year            | <strong class="ph b">string</strong> |         |
+| never_logged_on       | <strong class="ph b">string</strong> |         |
+| private_email_address | string |         |
+| year_registered       | <strong class="ph b">int</strong>    |         |
++-----------------------+--------+---------+
+</code></pre>
+
+ <p class="p">
+ The best case for performance involves a single row lookup using an equality comparison on the column
+ defined as the row key:
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = 'some_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.01GB VCores=1 |
+| WARNING: The following tables are missing relevant table and/or column statistics. |
+| hbase.hbase_table |
+| |
+| 03:AGGREGATE [MERGE FINALIZE] |
+| | output: sum(count(*)) |
+| | |
+| 02:EXCHANGE [PARTITION=UNPARTITIONED] |
+| | |
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: some_user@example.com |</strong>
+<strong class="ph b">| stop key: some_user@example.com\0 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Another type of efficient query involves a range lookup on the row key column, using SQL operators such
+ as greater than (or equal), less than (or equal), or <code class="ph codeph">BETWEEN</code>. This example also includes
+ an equality test on a non-key column; because that column is a <code class="ph codeph">STRING</code>, Impala can let
+ HBase perform that test, indicated by the <code class="ph codeph">hbase filters:</code> line in the
+ <code class="ph codeph">EXPLAIN</code> output. Doing the filtering within HBase is more efficient than transmitting all
+ the data to Impala and doing the filtering on the Impala side.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id between 'a' and 'b'
+ and never_logged_on = 'true';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: a |</strong>
+<strong class="ph b">| stop key: b\0 |</strong>
+<strong class="ph b">| hbase filters: cols:never_logged_on EQUAL 'true' |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The query is less efficient if Impala has to evaluate any of the predicates, because Impala must scan the
+ entire HBase table. Impala can only push down predicates to HBase for columns declared as
+ <code class="ph codeph">STRING</code>. This example tests a column declared as <code class="ph codeph">INT</code>, and the
+ <code class="ph codeph">predicates:</code> line in the <code class="ph codeph">EXPLAIN</code> output indicates that the test is
+ performed after the data is transmitted to Impala.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where year_registered = 2010;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: year_registered = 2010 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The same inefficiency applies if the key column is compared to any non-constant value. Here, even though
+ the key column is a <code class="ph codeph">STRING</code>, and is tested using an equality operator, Impala must scan
+ the entire HBase table because the key column is compared to another column value rather than a constant.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where cust_id = private_email_address;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id = private_email_address |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Currently, tests on the row key using <code class="ph codeph">OR</code> or <code class="ph codeph">IN</code> clauses are not
+ optimized into direct lookups either. Such limitations might be lifted in the future, so always check the
+ <code class="ph codeph">EXPLAIN</code> output to be sure whether a particular SQL construct results in an efficient
+ query or not for HBase tables.
+ </p>
+
+<pre class="pre codeblock"><code>explain select count(*) from hbase_table where
+ cust_id = 'some_user@example.com' or cust_id = 'other_user@example.com';
++----------------------------------------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id = 'some_user@example.com' OR cust_id = 'other_user@example.com' |</strong>
++----------------------------------------------------------------------------------------+
+
+explain select count(*) from hbase_table where
+ cust_id in ('some_user@example.com', 'other_user@example.com');
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| 01:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 00:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| predicates: cust_id IN ('some_user@example.com', 'other_user@example.com') |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ Either rewrite the statement into separate queries for each value and combine the results in the
+ application, or combine the single-row queries using <code class="ph codeph">UNION ALL</code>:
+ </p>
+
+<pre class="pre codeblock"><code>select count(*) from hbase_table where cust_id = 'some_user@example.com';
+select count(*) from hbase_table where cust_id = 'other_user@example.com';
+
+explain
+ select count(*) from hbase_table where cust_id = 'some_user@example.com'
+ union all
+ select count(*) from hbase_table where cust_id = 'other_user@example.com';
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+...
+
+| | 04:AGGREGATE |
+| | | output: count(*) |
+| | | |
+<strong class="ph b">| | 03:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| | start key: other_user@example.com |</strong>
+<strong class="ph b">| | stop key: other_user@example.com\0 |</strong>
+| | |
+| 10:MERGE |
+...
+
+| 02:AGGREGATE |
+| | output: count(*) |
+| | |
+<strong class="ph b">| 01:SCAN HBASE [hbase.hbase_table] |</strong>
+<strong class="ph b">| start key: some_user@example.com |</strong>
+<strong class="ph b">| stop key: some_user@example.com\0 |</strong>
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ </div>
+
+ <div class="example"><h3 class="title sectiontitle">Configuration Options for Java HBase Applications</h3>
+
+
+
+ <p class="p"> If you have an HBase Java application that calls the
+ <code class="ph codeph">setCacheBlocks</code> or <code class="ph codeph">setCaching</code>
+ methods of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, you can set these same
+ caching behaviors through Impala query options, to control the memory
+ pressure on the HBase RegionServer. For example, when doing queries in
+ HBase that result in full-table scans (which by default are
+ inefficient for HBase), you can reduce memory usage and speed up the
+ queries by turning off the <code class="ph codeph">HBASE_CACHE_BLOCKS</code> setting
+ and specifying a large number for the <code class="ph codeph">HBASE_CACHING</code>
+ setting.
+ </p>
+
+ <p class="p">
+ To set these options, issue commands like the following in <span class="keyword cmdname">impala-shell</span>:
+ </p>
+
+<pre class="pre codeblock"><code>-- Same as calling setCacheBlocks(true) or setCacheBlocks(false).
+set hbase_cache_blocks=true;
+set hbase_cache_blocks=false;
+
+-- Same as calling setCaching(rows).
+set hbase_caching=1000;
+</code></pre>
+
+ <p class="p">
+ Or update the <span class="keyword cmdname">impalad</span> defaults file <span class="ph filepath">/etc/default/impala</span> and
+ include settings for <code class="ph codeph">HBASE_CACHE_BLOCKS</code> and/or <code class="ph codeph">HBASE_CACHING</code> in the
+ <code class="ph codeph">-default_query_options</code> setting for <code class="ph codeph">IMPALA_SERVER_ARGS</code>. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In Impala 2.0 and later, these options are settable through the JDBC or ODBC interfaces using the
+ <code class="ph codeph">SET</code> statement.
+ </div>
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="impala_hbase__hbase_scenarios">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Use Cases for Querying HBase through Impala</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are popular use cases for using Impala to query HBase tables:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Keeping large fact tables in Impala, and smaller dimension tables in HBase. The fact tables use Parquet
+ or other binary file format optimized for scan operations. Join queries scan through the large Impala
+ fact tables, and cross-reference the dimension tables using efficient single-row lookups in HBase.
+ </li>
+
+ <li class="li">
+ Using HBase to store rapidly incrementing counters, such as how many times a web page has been viewed, or
+ on a social network, how many connections a user has or how many votes a post received. HBase is
+ efficient for capturing such changeable data: the append-only storage mechanism is efficient for writing
+ each change to disk, and a query always returns the latest value. An application could query specific
+ totals like these from HBase, and combine the results with a broader set of data queried from Impala.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Storing very wide tables in HBase. Wide tables have many columns, possibly thousands, typically
+ recording many attributes for an important subject such as a user of an online service. These tables
+ are also often sparse, that is, most of the column values are <code class="ph codeph">NULL</code>, 0,
+ <code class="ph codeph">false</code>, empty string, or other blank or placeholder value. (For example, any particular
+ web site user might have never used some site feature, filled in a certain field in their profile,
+ visited a particular part of the site, and so on.) A typical query against this kind of table is to
+ look up a single row to retrieve all the information about a specific subject, rather than summing,
+ averaging, or filtering millions of rows as in typical Impala-managed tables.
+ </p>
+ <p class="p">
+ Or the HBase table could be joined with a larger Impala-managed table. For example, analyze the large
+ Impala table representing web traffic for a site and pick out 50 users who view the most pages. Join
+ that result with the wide user table in HBase to look up attributes of those users. The HBase side of
+ the join would result in 50 efficient single-row lookups in HBase, rather than scanning the entire user
+ table.
+ </p>
+ </li>
+ </ul>
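+
+ <p class="p">
+ As an illustration of the second pattern, a query along the following lines could drive the 50
+ efficient single-row HBase lookups. This is only a sketch: the table and column names
+ (<code class="ph codeph">web_traffic</code>, <code class="ph codeph">hbase_users</code>, and so on) are
+ hypothetical, not part of any sample schema.
+ </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: web_traffic is a large Impala-managed fact table;
+-- hbase_users is a wide HBase table keyed by a STRING user_id column.
+with top_users as
+  (select user_id, count(*) as page_views
+   from web_traffic
+   group by user_id
+   order by page_views desc
+   limit 50)
+select t.user_id, t.page_views, u.signup_date, u.home_country
+from top_users t join hbase_users u on (t.user_id = u.user_id);
+</code></pre>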
+ </div>
+ </article>
+
+
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="impala_hbase__hbase_loading">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Loading Data into an HBase Table</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala <code class="ph codeph">INSERT</code> statement works for HBase tables. The <code class="ph codeph">INSERT ... VALUES</code>
+ syntax is ideally suited to HBase tables, because inserting a single row is an efficient operation for an
+ HBase table. (For regular Impala tables, with data files in HDFS, the tiny data files produced by
+ <code class="ph codeph">INSERT ... VALUES</code> are extremely inefficient, so you would not use that technique with
+ tables containing any significant data volume.)
+ </p>
+
+
+
+ <p class="p">
+ When you use the <code class="ph codeph">INSERT ... SELECT</code> syntax, the result in the HBase table could be fewer
+ rows than you expect. HBase only stores the most recent version of each unique row key, so if an
+ <code class="ph codeph">INSERT ... SELECT</code> statement copies over multiple rows containing the same value for the
+ key column, subsequent queries return only one row for each distinct key column value.
+ </p>
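+
+ <p class="p">
+ For example, assuming a hypothetical staging table <code class="ph codeph">new_customers</code> whose
+ rows contain duplicate values for the key column, the effect might look like this sketch:
+ </p>
+
+<pre class="pre codeblock"><code>-- Suppose new_customers contains 1000 rows but only 900 distinct cust_id values.
+insert into hbase_table select * from new_customers;
+-- HBase keeps only the most recent row for each duplicate key, so the
+-- count reflects the number of distinct keys, not the 1000 source rows.
+select count(*) from hbase_table;
+</code></pre>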
+
+ <p class="p">
+ Although Impala does not have an <code class="ph codeph">UPDATE</code> statement, you can achieve the same effect by
+ issuing successive <code class="ph codeph">INSERT</code> statements that use the same value for the key column each time.
+ </p>
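+
+ <p class="p">
+ For example, the following sketch (reusing the column names from the earlier
+ <code class="ph codeph">hbase_table</code> example) shows how a repeated key acts as an update:
+ </p>
+
+<pre class="pre codeblock"><code>-- The second INSERT supersedes the first, because both use the same row key.
+insert into hbase_table (cust_id, birth_year) values ('user1@example.com', '1985');
+insert into hbase_table (cust_id, birth_year) values ('user1@example.com', '1986');
+-- A query now sees only the most recently inserted value for that key.
+select birth_year from hbase_table where cust_id = 'user1@example.com';
+</code></pre>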
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="impala_hbase__hbase_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Limitations and Restrictions of the Impala and HBase Integration</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala integration with HBase has the following limitations and restrictions, some inherited from the
+ integration between HBase and Hive, and some unique to Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you issue a <code class="ph codeph">DROP TABLE</code> for an internal (Impala-managed) table that is mapped to an
+ HBase table, the underlying table is not removed in HBase. The Hive <code class="ph codeph">DROP TABLE</code>
+ statement, by contrast, does remove the underlying HBase table in this case.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT OVERWRITE</code> statement is not available for HBase tables. You can insert new
+ data, or modify an existing row by inserting a new row with the same key value, but not replace the
+ entire contents of the table. You can do an <code class="ph codeph">INSERT OVERWRITE</code> in Hive if you need this
+ capability.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you issue a <code class="ph codeph">CREATE TABLE LIKE</code> statement for a table mapped to an HBase table, the
+ new table is also an HBase table, but inherits the same underlying HBase table name as the original.
+ The new table is effectively an alias for the old one, not a new table with identical column structure.
+ Avoid using <code class="ph codeph">CREATE TABLE LIKE</code> for HBase tables, to avoid any confusion.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Copying data into an HBase table using the Impala <code class="ph codeph">INSERT ... SELECT</code> syntax might
+ produce fewer new rows than are in the query result set. If the result set contains multiple rows with
+ the same value for the key column, each row supersedes any previous rows with the same key value.
+ Because the order of the inserted rows is unpredictable, you cannot rely on this technique to preserve
+ the <span class="q">"latest"</span> version of a particular key value.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Because the complex data types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>)
+ available in <span class="keyword">Impala 2.3</span> and higher are currently only supported in Parquet tables, you cannot
+ use these types in HBase tables that are queried through Impala.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">LOAD DATA</code> statement cannot be used with HBase tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TABLESAMPLE</code> clause of the <code class="ph codeph">SELECT</code>
+ statement does not apply to a table reference derived from a view, a subquery,
+ or anything other than a real base table. This clause only works for tables
+ backed by HDFS or HDFS-like data files, therefore it does not apply to Kudu or
+ HBase tables.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="impala_hbase__hbase_queries">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Examples of Querying HBase Tables from Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples create an HBase table with four column families,
+ create a corresponding table through Hive,
+ then insert and query the table through Impala.
+ </p>
+ <p class="p">
+ In HBase shell, the table
+ name is quoted in <code class="ph codeph">CREATE</code> and <code class="ph codeph">DROP</code> statements. Tables created in HBase
+ begin in <span class="q">"enabled"</span> state; before dropping them through the HBase shell, you must issue a
+ <code class="ph codeph">disable '<var class="keyword varname">table_name</var>'</code> statement.
+ </p>
+
+<pre class="pre codeblock"><code>$ hbase shell
+15/02/10 16:07:45
+HBase Shell; enter 'help&lt;RETURN&gt;' for list of supported commands.
+Type "exit&lt;RETURN&gt;" to leave the HBase Shell
+...
+
+hbase(main):001:0> create 'hbasealltypessmall', 'boolsCF', 'intsCF', 'floatsCF', 'stringsCF'
+0 row(s) in 4.6520 seconds
+
+=> Hbase::Table - hbasealltypessmall
+hbase(main):006:0> quit
+</code></pre>
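+
+ <p class="p">
+ Later, when the table is no longer needed, the corresponding cleanup in the HBase shell would look
+ something like the following:
+ </p>
+
+<pre class="pre codeblock"><code>hbase(main):001:0> disable 'hbasealltypessmall'
+hbase(main):002:0> drop 'hbasealltypessmall'
+</code></pre>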
+
+ <p class="p">
+ Issue the following <code class="ph codeph">CREATE TABLE</code> statement in the Hive shell. (The Impala <code class="ph codeph">CREATE
+ TABLE</code> statement currently does not support the <code class="ph codeph">STORED BY</code> clause, so you switch into Hive to
+ create the table, then back to Impala and the <span class="keyword cmdname">impala-shell</span> interpreter to issue the
+ queries.)
+ </p>
+
+ <p class="p">
+ This example creates an external table mapped to the HBase table, usable by both Impala and Hive. It is
+ defined as an external table so that when dropped by Impala or Hive, the original HBase table is not touched at all.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">WITH SERDEPROPERTIES</code> clause
+ specifies that the first column (<code class="ph codeph">ID</code>) represents the row key, and maps the remaining
+ columns of the SQL table to HBase column families. The mapping relies on the ordinal position of the
+ columns in the table, not on the column names in the <code class="ph codeph">CREATE TABLE</code> statement.
+ The first column is defined to be the lookup key; the
+ <code class="ph codeph">STRING</code> data type produces the fastest key-based lookups for HBase tables.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For Impala with HBase tables, the most important aspect to ensure good performance is to use a
+ <code class="ph codeph">STRING</code> column as the row key, as shown in this example.
+ </div>
+
+<pre class="pre codeblock"><code>$ hive
+...
+hive> use hbase;
+OK
+Time taken: 4.095 seconds
+hive> CREATE EXTERNAL TABLE hbasestringids (
+ > id string,
+ > bool_col boolean,
+ > tinyint_col tinyint,
+ > smallint_col smallint,
+ > int_col int,
+ > bigint_col bigint,
+ > float_col float,
+ > double_col double,
+ > date_string_col string,
+ > string_col string,
+ > timestamp_col timestamp)
+ > STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
+ > WITH SERDEPROPERTIES (
+ > "hbase.columns.mapping" =
+ > ":key,boolsCF:bool_col,intsCF:tinyint_col,intsCF:smallint_col,intsCF:int_col,intsCF:\
+ > bigint_col,floatsCF:float_col,floatsCF:double_col,stringsCF:date_string_col,\
+ > stringsCF:string_col,stringsCF:timestamp_col"
+ > )
+ > TBLPROPERTIES("hbase.table.name" = "hbasealltypessmall");
+OK
+Time taken: 2.879 seconds
+hive> quit;
+</code></pre>
+
+ <p class="p">
+ Once you have established the mapping to an HBase table, you can issue DML statements and queries
+ from Impala. The following example shows a series of <code class="ph codeph">INSERT</code>
+ statements followed by a query.
+ The ideal kind of query from a performance standpoint
+ retrieves a row from the table based on a row key
+ mapped to a string column.
+ An initial <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code>
+ statement makes the table created through Hive visible to Impala.
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost -d hbase
+Starting Impala Shell without Kerberos authentication
+Connected to localhost:21000
+...
+Query: use `hbase`
+[localhost:21000] > invalidate metadata hbasestringids;
+Fetched 0 row(s) in 0.09s
+[localhost:21000] > desc hbasestringids;
++-----------------+-----------+---------+
+| name            | type      | comment |
++-----------------+-----------+---------+
+| id              | string    |         |
+| bool_col        | boolean   |         |
+| double_col      | double    |         |
+| float_col       | float     |         |
+| bigint_col      | bigint    |         |
+| int_col         | int       |         |
+| smallint_col    | smallint  |         |
+| tinyint_col     | tinyint   |         |
+| date_string_col | string    |         |
+| string_col      | string    |         |
+| timestamp_col   | timestamp |         |
++-----------------+-----------+---------+
+Fetched 11 row(s) in 0.02s
+[localhost:21000] > insert into hbasestringids values ('0001',true,3.141,9.94,1234567,32768,4000,76,'2014-12-31','Hello world',now());
+Inserted 1 row(s) in 0.26s
+[localhost:21000] > insert into hbasestringids values ('0002',false,2.004,6.196,1500,8000,129,127,'2014-01-01','Foo bar',now());
+Inserted 1 row(s) in 0.12s
+[localhost:21000] > select * from hbasestringids where id = '0001';
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| id   | bool_col | double_col | float_col         | bigint_col | int_col | smallint_col | tinyint_col | date_string_col | string_col  | timestamp_col                 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+| 0001 | true     | 3.141      | 9.939999580383301 | 1234567    | 32768   | 4000         | 76          | 2014-12-31      | Hello world | 2015-02-10 16:36:59.764838000 |
++------+----------+------------+-------------------+------------+---------+--------------+-------------+-----------------+-------------+-------------------------------+
+Fetched 1 row(s) in 0.54s
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ After you create a table in Hive, such as the HBase mapping table in this example, issue an
+ <code class="ph codeph">INVALIDATE METADATA <var class="keyword varname">table_name</var></code> statement the next time you connect to
+ Impala, to make Impala aware of the new table. (Prior to Impala 1.2.4, you could not specify the table name if
+ Impala was not aware of the table yet; in Impala 1.2.4 and higher, specifying the table name avoids
+ reloading the metadata for other tables that are not changed.)
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_cache_blocks.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_cache_blocks.html b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
new file mode 100644
index 0000000..27ebee3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_cache_blocks.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_cache_blocks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHE_BLOCKS Query Option</title></head><body id="hbase_cache_blocks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHE_BLOCKS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Setting this option is equivalent to calling the
+ <code class="ph codeph">setCacheBlocks</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+ application. This option helps control the memory pressure on the HBase
+ RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHING</code> query
+ option. </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+ <a class="xref" href="impala_hbase_caching.html#hbase_caching">HBASE_CACHING Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hbase_caching.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hbase_caching.html b/docs/build3x/html/topics/impala_hbase_caching.html
new file mode 100644
index 0000000..e2082d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hbase_caching.html
@@ -0,0 +1,36 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hbase_caching"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>HBASE_CACHING Query Option</title></head><body id="hbase_caching"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">HBASE_CACHING Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Setting this option is equivalent to calling the
+ <code class="ph codeph">setCaching</code> method of the class <a class="xref" href="http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/client/Scan.html" target="_blank">org.apache.hadoop.hbase.client.Scan</a>, in an HBase Java
+ application. This option helps control the memory pressure on the HBase
+ RegionServer, in conjunction with the <code class="ph codeph">HBASE_CACHE_BLOCKS</code>
+ query option. </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer (number of rows)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a>,
+ <a class="xref" href="impala_hbase_cache_blocks.html#hbase_cache_blocks">HBASE_CACHE_BLOCKS Query Option</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_hints.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_hints.html b/docs/build3x/html/topics/impala_hints.html
new file mode 100644
index 0000000..7777fa2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_hints.html
@@ -0,0 +1,488 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="hints"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Optimizer Hints</title></head><body id="hints"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Optimizer Hints</h1>
+
+
+
+ <div class="body conbody">
+
+    <p class="p">
+      Impala SQL supports query hints for fine-tuning the inner workings of
+      queries. Specify hints as a temporary workaround for expensive queries,
+      where missing statistics or other factors cause inefficient performance. </p>
+
+    <p class="p"> Hints are most often used for resource-intensive Impala queries,
+      such as: </p>
+
+ <ul class="ul">
+ <li class="li">
+ Join queries involving large tables, where intermediate result sets are transmitted across the network to
+ evaluate the join conditions.
+ </li>
+
+ <li class="li">
+ Inserting into partitioned Parquet tables, where many memory buffers could be allocated on each host to
+ hold intermediate results for each partition.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+    <p class="p"> In <span class="keyword">Impala 2.0</span> and higher, you can
+      specify hints inside comments that use either the <code class="ph codeph">/*
+        */</code> or <code class="ph codeph">--</code> notation, with a
+      <code class="ph codeph">+</code> symbol immediately before the first hint name.
+      Recently added hints are only available using the <code class="ph codeph">/* */</code> and
+      <code class="ph codeph">--</code> notation. For clarity, the <code class="ph codeph">/* */</code>
+      and <code class="ph codeph">--</code> styles are used in the syntax and examples
+      throughout this section. Multiple hints can be
+      specified separated by commas, for example <code class="ph codeph">/* +clustered,shuffle
+        */</code>.
+    </p>
+
+<pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN /* +BROADCAST|SHUFFLE */
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN -- +BROADCAST|SHUFFLE
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ /* +SHUFFLE|NOSHUFFLE */
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ -- +SHUFFLE|NOSHUFFLE
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+INSERT /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+INSERT -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">
+UPSERT -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">SELECT <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">table_ref</var>
+ /* +{SCHEDULE_CACHE_LOCAL | SCHEDULE_DISK_LOCAL | SCHEDULE_REMOTE}
+ [,RANDOM_REPLICA] */
+<var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT <var class="keyword varname">insert_clauses</var>
+ -- +CLUSTERED
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ /* +CLUSTERED */
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+<span class="ph">INSERT -- +CLUSTERED
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+INSERT /* +CLUSTERED */
+ <var class="keyword varname">insert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT -- +CLUSTERED
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+UPSERT /* +CLUSTERED */
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+
+CREATE /* +SHUFFLE|NOSHUFFLE */
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +SHUFFLE|NOSHUFFLE
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE /* +CLUSTER|NOCLUSTER */
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+
+CREATE -- +CLUSTER|NOCLUSTER
+ <var class="keyword varname">table_clauses</var>
+ AS SELECT <var class="keyword varname">remainder_of_query</var>;
+</code></pre>
+ <p class="p">The square bracket style hints are supported for backward compatibility,
+ but the syntax is deprecated and will be removed in a future release. For
+ that reason, any newly added hints are not available with the square
+ bracket syntax.</p>
+ <pre class="pre codeblock"><code>SELECT STRAIGHT_JOIN <var class="keyword varname">select_list</var> FROM
+<var class="keyword varname">join_left_hand_table</var>
+ JOIN [{ /* +BROADCAST */ | /* +SHUFFLE */ }]
+<var class="keyword varname">join_right_hand_table</var>
+<var class="keyword varname">remainder_of_query</var>;
+
+INSERT <var class="keyword varname">insert_clauses</var>
+ [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+ [<span class="ph">/* +CLUSTERED */</span>]
+ SELECT <var class="keyword varname">remainder_of_query</var>;
+
+<span class="ph">
+UPSERT [{ /* +SHUFFLE */ | /* +NOSHUFFLE */ }]
+ [<span class="ph">/* +CLUSTERED */</span>]
+ <var class="keyword varname">upsert_clauses</var>
+ SELECT <var class="keyword varname">remainder_of_query</var>;</span>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ With both forms of hint syntax, include the <code class="ph codeph">STRAIGHT_JOIN</code>
+ keyword immediately after the <code class="ph codeph">SELECT</code> and any
+ <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords to prevent Impala from
+ reordering the tables in a way that makes the join-related hints ineffective.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">STRAIGHT_JOIN</code> hint affects the join order of table references in the query
+ block containing the hint. It does not affect the join order of nested queries, such as views,
+ inline views, or <code class="ph codeph">WHERE</code>-clause subqueries. To use this hint for performance
+ tuning of complex queries, apply the hint to all query blocks that need a fixed join order.
+ </p>
+
+ <p class="p">
+ To reduce the need to use hints, run the <code class="ph codeph">COMPUTE STATS</code> statement against all tables involved
+ in joins, or used as the source tables for <code class="ph codeph">INSERT ... SELECT</code> operations where the
+ destination is a partitioned Parquet table. Do this operation after loading data or making substantial
+ changes to the data within each table. Having up-to-date statistics helps Impala choose more efficient query
+ plans without the need for hinting. See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details and
+ examples.
+ </p>
+
+ <p class="p">
+ To see which join strategy is used for a particular query, examine the <code class="ph codeph">EXPLAIN</code> output for
+ that query. See <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hints for join queries:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints control the execution strategy for join
+ queries. Specify one of the following constructs immediately after the <code class="ph codeph">JOIN</code> keyword in a
+ query:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> - Makes that join operation use the <span class="q">"partitioned"</span> technique, which divides
+ up corresponding rows from both tables using a hashing algorithm, sending subsets of the rows to other
+ nodes for processing. (The keyword <code class="ph codeph">SHUFFLE</code> is used to indicate a <span class="q">"partitioned join"</span>,
+ because that type of join is not related to <span class="q">"partitioned tables"</span>.) Since the alternative
+ <span class="q">"broadcast"</span> join mechanism is the default when table and index statistics are unavailable, you might
+ use this hint for queries where broadcast joins are unsuitable; typically, partitioned joins are more
+ efficient for joins between large tables of similar size.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +BROADCAST */</code> - Makes that join operation use the <span class="q">"broadcast"</span> technique that sends the
+ entire contents of the right-hand table to all nodes involved in processing the join. This is the default
+ mode of operation when table and index statistics are unavailable, so you would typically only need it if
+ stale metadata caused Impala to mistakenly choose a partitioned join operation. Typically, broadcast joins
+ are more efficient in cases where one table is much smaller than the other. (Put the smaller table on the
+ right side of the <code class="ph codeph">JOIN</code> operator.)
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Hints for INSERT ... SELECT and CREATE TABLE AS SELECT (CTAS):</strong>
+ </p>
+    <p class="p" id="hints__insert_hints">
+      When inserting into partitioned tables, such as when using the Parquet file
+      format, you can include a hint in the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT (CTAS)</code>
+      statement to fine-tune the overall performance of the operation and its
+      resource usage.</p>
+ <p class="p">
+ You would only use hints if an <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CTAS</code> into a partitioned table was failing due to
+ capacity limits, or if such an operation was succeeding but with
+ less-than-optimal performance.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> and <code class="ph codeph">/* +NOSHUFFLE */</code> Hints
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> adds an exchange node, before
+ writing the data, which re-partitions the result of the
+ <code class="ph codeph">SELECT</code> based on the partitioning columns of the
+ target table. With this hint, only one node writes to a partition at
+ a time, minimizing the global number of simultaneous writes and the
+ number of memory buffers holding data for individual partitions.
+ This also reduces fragmentation, resulting in fewer files. Thus it
+ reduces overall resource usage of the <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CTAS</code> operation and allows some operations to
+ succeed that otherwise would fail. It does involve some data
+ transfer between the nodes so that the data files for a particular
+ partition are all written on the same node.
+
+ <p class="p">
+ Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">CTAS</code> statement fails or runs inefficiently due
+ to all nodes attempting to write data for all partitions.
+ </p>
+
+ <p class="p"> If the table is unpartitioned or every partitioning expression
+ is constant, then <code class="ph codeph">/* +SHUFFLE */</code> will cause every
+ write to happen on the coordinator node.
+ </p>
+ </li>
+
+        <li class="li">
+          <code class="ph codeph">/* +NOSHUFFLE */</code> does not add an exchange node before
+          inserting into partitioned tables, and disables re-partitioning. The
+          selected execution plan might be faster overall, but might also
+          produce a larger number of small data files or exceed capacity
+          limits, causing the <code class="ph codeph">INSERT</code> or <code class="ph codeph">CTAS</code>
+          operation to fail.
+
+ <p class="p"> Impala automatically uses the <code class="ph codeph">/*
+ +SHUFFLE */</code> method if any partition key column in the
+ source table, mentioned in the <code class="ph codeph">SELECT</code> clause,
+ does not have column statistics. In this case, use the <code class="ph codeph">/*
+ +NOSHUFFLE */</code> hint if you want to override this default
+ behavior.
+ </p>
+ </li>
+
+ <li class="li">
+ If column statistics are available for all partition key columns
+ in the source table mentioned in the <code class="ph codeph">INSERT ...
+ SELECT</code> or <code class="ph codeph">CTAS</code> query, Impala chooses
+ whether to use the <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/*
+ +NOSHUFFLE */</code> technique based on the estimated number of
+ distinct values in those columns and the number of nodes involved in
+ the operation. In this case, you might need the <code class="ph codeph">/* +SHUFFLE
+ */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to
+ override the execution plan selected by Impala.
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +CLUSTERED */</code> and <code class="ph codeph">/* +NOCLUSTERED
+ */</code> Hints
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">/* +CLUSTERED */</code> sorts data by the partition
+ columns before inserting to ensure that only one partition is
+ written at a time per node. Use this hint to reduce the number of
+ files kept open and the number of buffers kept in memory
+ simultaneously. This technique is primarily useful for inserts into
+ Parquet tables, where the large block size requires substantial
+ memory to buffer data for multiple output files at once. This hint
+ is available in <span class="keyword">Impala 2.8</span> or higher.
+
+ <p class="p">
+ Starting in <span class="keyword">Impala 3.0</span>, <code class="ph codeph">/*
+ +CLUSTERED */</code> is the default behavior for HDFS tables.
+ </p>
+ </li>
+
+        <li class="li">
+          <code class="ph codeph">/* +NOCLUSTERED */</code> does not sort by primary key
+          before inserting. This hint is available in <span class="keyword">Impala 2.8</span> or higher.
+
+ <p class="p">
+ Use this hint when inserting to Kudu tables.
+ </p>
+
+          <p class="p">
+            In versions lower than <span class="keyword">Impala 3.0</span>,
+            <code class="ph codeph">/* +NOCLUSTERED */</code> is the default for HDFS
+            tables.
+          </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
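+
+  <p class="p">
+      For example, the following hypothetical statement inserts into a Parquet
+      table partitioned by year and month, using <code class="ph codeph">/* +shuffle */</code>
+      so that each partition is written by a single node. (The table and
+      column names are illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: one node writes each (year, month) partition,
+-- reducing the number of simultaneous memory buffers and small files.
+INSERT INTO sales_parquet PARTITION (year, month)
+  /* +shuffle */
+  SELECT id, amount, year, month FROM sales_staging;</code></pre>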
+
+    <p class="p">
+      Starting in <span class="keyword">Impala 2.9</span>, <code class="ph codeph">INSERT</code>
+      or <code class="ph codeph">UPSERT</code> operations into Kudu tables automatically have
+      an exchange and sort node added to the plan that partitions and sorts the
+      rows according to the partitioning/primary key scheme of the target table
+      (unless the number of rows to be inserted is small enough to trigger
+      single node execution). Use the <code class="ph codeph">/* +NOCLUSTERED */</code> and
+      <code class="ph codeph">/* +NOSHUFFLE */</code> hints together to disable partitioning
+      and sorting before the rows are sent to Kudu.
+    </p>
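+
+  <p class="p">
+      For example, if the source data is already sorted and partitioned to
+      match the target Kudu table, a hypothetical statement such as the
+      following bypasses the automatic exchange and sort. (The table names
+      are illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: skip the automatic exchange and sort nodes
+-- when inserting presorted, prepartitioned data into a Kudu table.
+INSERT /* +noclustered,noshuffle */ INTO kudu_metrics
+  SELECT host, ts, metric_value FROM staging_metrics;</code></pre>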
+
+ <p class="p">
+ <strong class="ph b">Hints for scheduling of HDFS blocks:</strong>
+ </p>
+
+ <p class="p">
+ The hints <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL */</code>,
+ <code class="ph codeph">/* +SCHEDULE_DISK_LOCAL */</code>, and
+ <code class="ph codeph">/* +SCHEDULE_REMOTE */</code> have the same effect
+ as specifying the <code class="ph codeph">REPLICA_PREFERENCE</code> query
+ option with the respective option settings of <code class="ph codeph">CACHE_LOCAL</code>,
+ <code class="ph codeph">DISK_LOCAL</code>, or <code class="ph codeph">REMOTE</code>.
+ The hint <code class="ph codeph">/* +RANDOM_REPLICA */</code> is the same as
+ enabling the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code> query option.
+ </p>
+
+ <p class="p">
+ You can use these hints in combination by separating them with commas,
+ for example, <code class="ph codeph">/* +SCHEDULE_CACHE_LOCAL,RANDOM_REPLICA */</code>.
+ See <a class="xref" href="impala_replica_preference.html">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> and
+ <a class="xref" href="impala_schedule_random_replica.html">SCHEDULE_RANDOM_REPLICA Query Option (Impala 2.5 or higher only)</a> for information about how
+ these settings influence the way Impala processes HDFS data blocks.
+ </p>
+
+ <p class="p">
+ Specifying the replica preference as a query hint always overrides the
+ query option setting. Specifying either the <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+ query option or the corresponding <code class="ph codeph">RANDOM_REPLICA</code> query hint
+ enables the random tie-breaking behavior when processing data blocks
+ during the query.
+ </p>
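+
+  <p class="p">
+      For example, the following hypothetical query prefers replicas cached
+      in HDFS memory and breaks ties among eligible replicas randomly. (The
+      table name is illustrative.)
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: scan HDFS-cached replicas where available,
+-- choosing randomly among eligible replicas for each block.
+SELECT COUNT(*) FROM web_logs
+  /* +schedule_cache_local,random_replica */
+WHERE year = 2018;</code></pre>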
+
+ <p class="p">
+ <strong class="ph b">Suggestions versus directives:</strong>
+ </p>
+
+ <p class="p">
+ In early Impala releases, hints were always obeyed and so acted more like directives. Once Impala gained join
+ order optimizations, sometimes join queries were automatically reordered in a way that made a hint
+ irrelevant. Therefore, the hints act more like suggestions in Impala 1.2.2 and higher.
+ </p>
+
+ <p class="p">
+ To force Impala to follow the hinted execution mechanism for a join query, include the
+ <code class="ph codeph">STRAIGHT_JOIN</code> keyword in the <code class="ph codeph">SELECT</code> statement. See
+ <a class="xref" href="impala_perf_joins.html#straight_join">Overriding Join Reordering with STRAIGHT_JOIN</a> for details. When you use this technique, Impala does not
+ reorder the joined tables at all, so you must be careful to arrange the join order to put the largest table
+ (or subquery result set) first, then the smallest, second smallest, third smallest, and so on. This ordering lets Impala do the
+ most I/O-intensive parts of the query using local reads on the DataNodes, and then reduce the size of the
+ intermediate result set as much as possible as each subsequent table or subquery result set is joined.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Queries that include subqueries in the <code class="ph codeph">WHERE</code> clause can be rewritten internally as join
+ queries. Currently, you cannot apply hints to the joins produced by these types of queries.
+ </p>
+
+ <p class="p">
+ Because hints can prevent queries from taking advantage of new metadata or improvements in query planning,
+ use them only when required to work around performance issues, and be prepared to remove them when they are
+ no longer required, such as after a new Impala release or bug fix.
+ </p>
+
+ <p class="p">
+ In particular, the <code class="ph codeph">/* +BROADCAST */</code> and <code class="ph codeph">/* +SHUFFLE */</code> hints are expected to be
+ needed much less frequently in Impala 1.2.2 and higher, because the join order optimization feature in
+ combination with the <code class="ph codeph">COMPUTE STATS</code> statement now automatically choose join order and join
+ mechanism without the need to rewrite the query and add hints. See
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ The hints embedded within <code class="ph codeph">--</code> comments are compatible with Hive queries. The hints embedded
+ within <code class="ph codeph">/* */</code> comments or <code class="ph codeph">[ ]</code> square brackets are not recognized by or not
+ compatible with Hive. For example, Hive raises an error for Impala hints within <code class="ph codeph">/* */</code>
+ comments because it does not recognize the Impala hint names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Considerations for views:</strong>
+ </p>
+
+ <p class="p">
+ If you use a hint in the query that defines a view, the hint is preserved when you query the view. Impala
+ internally rewrites all hints in views to use the <code class="ph codeph">--</code> comment notation, so that Hive can
+ query such views without errors due to unrecognized hint names.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ For example, this query joins a large customer table with a small lookup table of less than 100 rows. The
+ right-hand table can be broadcast efficiently to all nodes involved in the join. Thus, you would use the
+ <code class="ph codeph">/* +broadcast */</code> hint to force a broadcast join strategy:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join customer.address, state_lookup.state_name
+ from customer join <strong class="ph b">/* +broadcast */</strong> state_lookup
+ on customer.state_id = state_lookup.state_id;</code></pre>
+
+ <p class="p">
+ This query joins two large tables of unpredictable size. You might benchmark the query with both kinds of
+ hints and find that it is more efficient to transmit portions of each table to other nodes for processing.
+ Thus, you would use the <code class="ph codeph">/* +shuffle */</code> hint to force a partitioned join strategy:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join weather.wind_velocity, geospatial.altitude
+ from weather join <strong class="ph b">/* +shuffle */</strong> geospatial
+ on weather.lat = geospatial.lat and weather.long = geospatial.long;</code></pre>
+
+ <p class="p">
+ For joins involving three or more tables, the hint applies to the tables on either side of that specific
+ <code class="ph codeph">JOIN</code> keyword. The <code class="ph codeph">STRAIGHT_JOIN</code> keyword ensures that joins are processed
+ in a predictable order from left to right. For example, this query joins
+ <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code> using a partitioned join, then joins that result set to
+ <code class="ph codeph">t3</code> using a broadcast join:
+ </p>
+
+<pre class="pre codeblock"><code>select straight_join t1.name, t2.id, t3.price
+ from t1 join <strong class="ph b">/* +shuffle */</strong> t2 join <strong class="ph b">/* +broadcast */</strong> t3
+ on t1.id = t2.id and t2.id = t3.id;</code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For more background information about join queries, see <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>. For
+ performance considerations, see <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_identifiers.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_identifiers.html b/docs/build3x/html/topics/impala_identifiers.html
new file mode 100644
index 0000000..267b91d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_identifiers.html
@@ -0,0 +1,110 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="identifiers"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Identifiers</title></head><body id="identifiers"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Identifiers</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Identifiers are the names of databases, tables, or columns that you specify in a SQL statement. The rules for
+ identifiers govern what names you can give to things you create, the notation for referring to names
+ containing unusual characters, and other aspects such as case sensitivity.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The minimum length of an identifier is 1 character.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The maximum length of an identifier is currently 128 characters, enforced by the metastore database.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ An identifier must start with an alphabetic character. The remainder can contain any combination of
+ alphanumeric characters and underscores. Quoting the identifier with backticks has no effect on the allowed
+ characters in the name.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ An identifier can contain only ASCII characters.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ To use an identifier name that matches one of the Impala reserved keywords (listed in
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with <code class="ph codeph">``</code>
+ characters (backticks). Quote the reserved word even if it is part of a fully qualified name.
+ The following example shows how a reserved word can be used as a column name if it is quoted
+ with backticks in the <code class="ph codeph">CREATE TABLE</code> statement, and how the column name
+ must also be quoted with backticks in a query:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table reserved (`data` string);
+
+[localhost:21000] > select data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select data from reserved
+ ^
+Encountered: DATA
+Expected: ALL, CASE, CAST, DISTINCT, EXISTS, FALSE, IF, INTERVAL, NOT, NULL, STRAIGHT_JOIN, TRUE, IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] > select reserved.data from reserved;
+ERROR: AnalysisException: Syntax error in line 1:
+select reserved.data from reserved
+ ^
+Encountered: DATA
+Expected: IDENTIFIER
+CAUSED BY: Exception: Syntax error
+
+[localhost:21000] > select reserved.`data` from reserved;
+
+[localhost:21000] >
+</code></pre>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Because the list of reserved words grows over time as new SQL syntax is added,
+ consider adopting coding conventions (especially for any automated scripts
+ or in packaged applications) to always quote all identifiers with backticks.
+ Quoting all identifiers protects your SQL from compatibility issues if
+ new reserved words are added in later releases.
+ </div>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala identifiers are always case-insensitive. That is, tables named <code class="ph codeph">t1</code> and
+ <code class="ph codeph">T1</code> always refer to the same table, regardless of quote characters. Internally, Impala
+ always folds all specified table and column names to lowercase. This is why the column headers in query
+ output are always displayed in lowercase.
+ </p>
+ </li>
+ </ul>
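+
+  <p class="p">
+      For example, because identifiers are case-insensitive, the following
+      hypothetical queries all refer to the same table and column, and the
+      column header in the output is displayed in lowercase in each case:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical example: all three statements reference the same table;
+-- Impala folds the identifiers to lowercase internally.
+SELECT C1 FROM T1;
+SELECT c1 FROM t1;
+SELECT C1 FROM `t1`;</code></pre>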
+
+ <p class="p">
+ See <a class="xref" href="impala_aliases.html#aliases">Overview of Impala Aliases</a> for how to define shorter or easier-to-remember aliases if the
+ original names are long or cryptic identifiers.
+ <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+ insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+ include additional characters such as spaces and dashes when they are quoted using backtick characters.
+ </span>
+ </p>
+
+ <p class="p">
+ Another way to define different names for the same tables or columns is to create views. See
+ <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_optimize_partition_key_scans.html b/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
new file mode 100644
index 0000000..6fea36f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_optimize_partition_key_scans.html
@@ -0,0 +1,188 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="optimize_partition_key_scans"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</title></head><body id="optimize_partition_key_scans"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">OPTIMIZE_PARTITION_KEY_SCANS Query Option (<span class="keyword">Impala 2.5</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Enables a fast code path for queries that apply simple aggregate functions to partition key
+ columns: <code class="ph codeph">MIN(<var class="keyword varname">key_column</var>)</code>, <code class="ph codeph">MAX(<var class="keyword varname">key_column</var>)</code>,
+ or <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">key_column</var>)</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ In <span class="keyword">Impala 2.5.0</span>, only the value 1 enables the option, and the value
+ <code class="ph codeph">true</code> is not recognized. This limitation is
+ tracked by the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3334" target="_blank">IMPALA-3334</a>,
+ which shows the releases where the problem is fixed.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ This optimization speeds up common <span class="q">"introspection"</span> operations when using queries
+ to calculate the cardinality and range for partition key columns.
+ </p>
+
+ <p class="p">
+ This optimization does not apply if the queries contain any <code class="ph codeph">WHERE</code>,
+ <code class="ph codeph">GROUP BY</code>, or <code class="ph codeph">HAVING</code> clause. The relevant queries
+ should only compute the minimum, maximum, or number of distinct values for the
+ partition key columns across the whole table.
+ </p>
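+
+ <p class="p">
+ For example, the following hypothetical queries (assuming the partitioned table
+ <code class="ph codeph">t1</code> defined later in this topic) illustrate the distinction. The first
+ query can use the fast code path; the second cannot, because its <code class="ph codeph">WHERE</code>
+ clause disqualifies it:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Eligible: a simple aggregate on the partition key column across the whole table.
+SELECT MAX(year) FROM t1;
+
+-- Not eligible: the WHERE clause disqualifies the query from this optimization,
+-- so the data files are scanned as usual.
+SELECT MAX(year) FROM t1 WHERE year > 2015;
+</code></pre>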
+
+ <p class="p">
+ This optimization is enabled by a query option because it skips some consistency checks
+ and therefore can return slightly different partition values if partitions are in the
+ process of being added, dropped, or loaded outside of Impala. Queries might exhibit different
+ behavior depending on the setting of this option in the following cases:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If files are removed from a partition using HDFS or other non-Impala operations,
+ there is a period until the next <code class="ph codeph">REFRESH</code> of the table where regular
+ queries fail at run time because they detect the missing files. With this optimization
+ enabled, queries that evaluate only the partition key column values (not the contents of
+ the partition itself) succeed, and treat the partition as if it still exists.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If a partition contains any data files, but the data files do not contain any rows,
+ a regular query considers that the partition does not exist. With this optimization
+ enabled, the partition is treated as if it exists.
+ </p>
+ <p class="p">
+ If the partition includes no files at all, this optimization does not change the query
+ behavior: the partition is considered to not exist whether or not this optimization is enabled.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows initial schema setup and the default behavior of queries that
+ return just the partition key column for a table:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Make a partitioned table with 3 partitions.
+create table t1 (s string) partitioned by (year int);
+insert into t1 partition (year=2015) values ('last year');
+insert into t1 partition (year=2016) values ('this year');
+insert into t1 partition (year=2017) values ('next year');
+
+-- Regardless of the option setting, this query must read the
+-- data files to know how many rows to return for each year value.
+explain select year from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 00:SCAN HDFS [key_cols.t1] |
+| partitions=3/3 files=4 size=40B |
+| table stats: 3 rows total |
+| column stats: all |
+| hosts=3 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+
+-- The aggregation operation means the query does not need to read
+-- the data within each partition: the result set contains exactly 1 row
+-- per partition, derived from the partition key column value.
+-- By default, Impala still includes a 'scan' operation in the query.
+explain select distinct year from t1;
++------------------------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| 01:AGGREGATE [FINALIZE] |
+| | group by: year |
+| | |
+| 00:SCAN HDFS [key_cols.t1] |
+| partitions=0/0 files=0 size=0B |
++------------------------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the plan is made more efficient when the
+ <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> option is enabled:
+ </p>
+
+<pre class="pre codeblock"><code>
+set optimize_partition_key_scans=1;
+OPTIMIZE_PARTITION_KEY_SCANS set to 1
+
+-- The aggregation operation is turned into a UNION internally,
+-- with constant values known in advance based on the metadata
+-- for the partitioned table.
+explain select distinct year from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 01:AGGREGATE [FINALIZE] |
+| | group by: year |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=1 row-size=4B cardinality=3 |
+| | |
+| 00:UNION |
+| constant-operands=3 |
+| hosts=1 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+
+-- The same optimization applies to other aggregation queries
+-- that only return values based on partition key columns:
+-- MIN, MAX, COUNT(DISTINCT), and so on.
+explain select min(year) from t1;
++-----------------------------------------------------+
+| Explain String |
++-----------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=0B VCores=0 |
+| |
+| F00:PLAN FRAGMENT [UNPARTITIONED] |
+| 01:AGGREGATE [FINALIZE] |
+| | output: min(year) |
+| | hosts=1 per-host-mem=unavailable |
+| | tuple-ids=1 row-size=4B cardinality=1 |
+| | |
+| 00:UNION |
+| constant-operands=3 |
+| hosts=1 per-host-mem=unavailable |
+| tuple-ids=0 row-size=4B cardinality=3 |
++-----------------------------------------------------+
+</code></pre>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_order_by.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_order_by.html b/docs/build3x/html/topics/impala_order_by.html
new file mode 100644
index 0000000..b4cc1f3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_order_by.html
@@ -0,0 +1,398 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="order_by"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ORDER BY Clause</title></head><body id="order_by"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ORDER BY Clause</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The familiar <code class="ph codeph">ORDER BY</code> clause of a <code class="ph codeph">SELECT</code> statement sorts the result set
+ based on the values from one or more columns.
+ </p>
+
+ <p class="p">
+ For distributed queries, this is a relatively expensive operation, because the entire result set must be
+ produced and transferred to one node before the sorting can happen. This can require more memory capacity
+ than a query without <code class="ph codeph">ORDER BY</code>. Even if the query takes approximately the same time to finish
+ with or without the <code class="ph codeph">ORDER BY</code> clause, subjectively it can appear slower because no results
+ are available until all processing is finished, rather than results coming back gradually as rows matching
+ the <code class="ph codeph">WHERE</code> clause are found. Therefore, if you only need the first N results from the sorted
+ result set, also include the <code class="ph codeph">LIMIT</code> clause, which reduces network overhead and the memory
+ requirement on the coordinator node.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.4.0 and higher, the <code class="ph codeph">LIMIT</code> clause is now optional (rather than required) for
+ queries that use the <code class="ph codeph">ORDER BY</code> clause. Impala automatically uses a temporary disk work area
+ to perform the sort if the sort operation would otherwise exceed the Impala memory limit for a particular
+ DataNode.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ The full syntax for the <code class="ph codeph">ORDER BY</code> clause is:
+ </p>
+
+<pre class="pre codeblock"><code>ORDER BY <var class="keyword varname">col_ref</var> [, <var class="keyword varname">col_ref</var> ...] [ASC | DESC] [NULLS FIRST | NULLS LAST]
+
+col_ref ::= <var class="keyword varname">column_name</var> | <var class="keyword varname">integer_literal</var>
+</code></pre>
+
+ <p class="p">
+ Although the most common usage is <code class="ph codeph">ORDER BY <var class="keyword varname">column_name</var></code>, you can also
+ specify <code class="ph codeph">ORDER BY 1</code> to sort by the first column of the result set, <code class="ph codeph">ORDER BY
+ 2</code> to sort by the second column, and so on. The number must be a numeric literal, not some other kind
+ of constant expression. (If the argument is some other expression, even a <code class="ph codeph">STRING</code> value, the
+ query succeeds but the order of results is undefined.)
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY <var class="keyword varname">column_number</var></code> can only be used when the query explicitly lists
+ the columns in the <code class="ph codeph">SELECT</code> list, not with <code class="ph codeph">SELECT *</code> queries.
+ </p>
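+
+ <p class="p">
+ For example, the following hypothetical query (assuming a table <code class="ph codeph">t1</code>
+ with a <code class="ph codeph">year</code> column) sorts by the second item in the
+ <code class="ph codeph">SELECT</code> list, the aggregate value, without repeating the expression:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- ORDER BY 2 refers to COUNT(*), the second item in the SELECT list.
+SELECT year, COUNT(*) FROM t1 GROUP BY year ORDER BY 2 DESC;
+</code></pre>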
+
+ <p class="p">
+ <strong class="ph b">Ascending and descending sorts:</strong>
+ </p>
+
+ <p class="p">
+ The default sort order (the same as using the <code class="ph codeph">ASC</code> keyword) puts the smallest values at the
+ start of the result set, and the largest values at the end. Specifying the <code class="ph codeph">DESC</code> keyword
+ reverses that order.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Sort order for NULL values:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are positioned
+ in the sorted result set, and how to use the <code class="ph codeph">NULLS FIRST</code> and <code class="ph codeph">NULLS LAST</code>
+ clauses. (The sort position for <code class="ph codeph">NULL</code> values in <code class="ph codeph">ORDER BY ... DESC</code> queries is
+ changed in Impala 1.2.1 and higher to be more standards-compliant, and the <code class="ph codeph">NULLS FIRST</code> and
+ <code class="ph codeph">NULLS LAST</code> keywords are new in Impala 1.2.1.)
+ </p>
+
+ <p class="p">
+ Prior to Impala 1.4.0, Impala required any query including an
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_order_by.html#order_by">ORDER BY</a></code> clause to also use a
+ <code class="ph codeph"><a class="xref" href="../shared/../topics/impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and
+ higher, the <code class="ph codeph">LIMIT</code> clause is optional for <code class="ph codeph">ORDER BY</code> queries. In cases where
+ sorting a huge result set requires enough memory to exceed the Impala memory limit for a particular node,
+ Impala automatically uses a temporary disk work area to perform the sort operation.
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code> are available. These columns cannot
+ be referenced directly in the <code class="ph codeph">ORDER BY</code> clause.
+ When you query a complex type column, you use join notation to <span class="q">"unpack"</span> the elements
+ of the complex type, and within the join query you can include an <code class="ph codeph">ORDER BY</code>
+ clause to control the order in the result set of the scalar elements from the complex type.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about Impala support for complex types.
+ </p>
+
+ <p class="p">
+ The following query shows how a complex type column cannot be directly used in an <code class="ph codeph">ORDER BY</code> clause:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games (id BIGINT, score ARRAY &lt;BIGINT&gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id FROM games ORDER BY score DESC;
+ERROR: AnalysisException: ORDER BY expression 'score' with complex type 'ARRAY&lt;BIGINT&gt;' is not supported.
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following query retrieves the user ID and score, only for scores greater than one million,
+ with the highest scores for each user listed first.
+ Because the individual array elements are now represented as separate rows in the result set,
+ they can be used in the <code class="ph codeph">ORDER BY</code> clause, referenced using the <code class="ph codeph">ITEM</code>
+ pseudocolumn that represents each array element.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT id, item FROM games, games.score
+ WHERE item > 1000000
+ORDER BY id, item desc;
+</code></pre>
+
+ <p class="p">
+ The following queries use similar <code class="ph codeph">ORDER BY</code> techniques with variations of the <code class="ph codeph">GAMES</code>
+ table, where the complex type is an <code class="ph codeph">ARRAY</code> containing <code class="ph codeph">STRUCT</code> or <code class="ph codeph">MAP</code>
+ elements to represent additional details about each game that was played.
+ For an array of structures, the fields of the structure are referenced as <code class="ph codeph">ITEM.<var class="keyword varname">field_name</var></code>.
+ For an array of maps, the keys and values within each array element are referenced as <code class="ph codeph">ITEM.KEY</code>
+ and <code class="ph codeph">ITEM.VALUE</code>.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE games2 (id BIGINT, play ARRAY &lt; STRUCT &lt;game_name: STRING, score: BIGINT, high_score: BOOLEAN&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, item.game_name, item.score FROM games2, games2.play
+  WHERE item.score > 1000000
+ORDER BY id, item.score DESC;
+
+CREATE TABLE games3 (id BIGINT, play ARRAY &lt; MAP &lt;STRING, BIGINT&gt; &gt;) STORED AS PARQUET;
+...use LOAD DATA to load externally created Parquet files into the table...
+SELECT id, info.key AS k, info.value AS v FROM games3, games3.play AS plays, games3.play.item AS info
+  WHERE info.key = 'score' AND info.value > 1000000
+ORDER BY id, info.value DESC;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Although the <code class="ph codeph">LIMIT</code> clause is now optional on <code class="ph codeph">ORDER BY</code> queries, if your
+ query only needs some number of rows that you can predict in advance, use the <code class="ph codeph">LIMIT</code> clause
+ to reduce unnecessary processing. For example, if the query has a clause <code class="ph codeph">LIMIT 10</code>, each data
+ node sorts its portion of the relevant result set and only returns 10 rows to the coordinator node. The
+ coordinator node picks the 10 highest or lowest row values out of this small intermediate result set.
+ </p>
+
+ <p class="p">
+ If an <code class="ph codeph">ORDER BY</code> clause is applied to an early phase of query processing, such as a subquery
+ or a view definition, Impala ignores the <code class="ph codeph">ORDER BY</code> clause. To get ordered results from a
+ subquery or view, apply an <code class="ph codeph">ORDER BY</code> clause to the outermost or final <code class="ph codeph">SELECT</code>
+ level.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY</code> is often used in combination with <code class="ph codeph">LIMIT</code> to perform <span class="q">"top-N"</span>
+ queries:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT user_id AS "Top 10 Visitors", SUM(page_views) FROM web_stats
+  GROUP BY user_id
+ ORDER BY SUM(page_views) DESC LIMIT 10;
+</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">ORDER BY</code> is sometimes used in combination with <code class="ph codeph">OFFSET</code> and
+ <code class="ph codeph">LIMIT</code> to paginate query results, although it is relatively inefficient to issue multiple
+ queries like this against the large tables typically used with Impala:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT page_title AS "Page 1 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 0;
+SELECT page_title AS "Page 2 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 10;
+SELECT page_title AS "Page 3 of search results", page_url FROM search_content
+  WHERE LOWER(page_title) LIKE '%game%'
+  ORDER BY page_title LIMIT 10 OFFSET 20;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ Impala sorts the intermediate results of an <code class="ph codeph">ORDER BY</code> clause in memory whenever practical. In
+ a cluster of N DataNodes, each node sorts roughly 1/Nth of the result set, with the exact proportion varying
+ depending on how the data matching the query is distributed in HDFS.
+ </p>
+
+ <p class="p">
+ If the size of the sorted intermediate result set on any DataNode would cause the query to exceed the Impala
+ memory limit, Impala sorts as much as practical in memory, then writes partially sorted data to disk. (This
+ technique is known in industry terminology as <span class="q">"external sorting"</span> and <span class="q">"spilling to disk"</span>.) As each
+ 8 MB batch of data is written to disk, Impala frees the corresponding memory to sort a new 8 MB batch of
+ data. When all the data has been processed, a final merge sort operation is performed to correctly order the
+ in-memory and on-disk results as the result set is transmitted back to the coordinator node. When external
+ sorting becomes necessary, Impala requires approximately 60 MB of RAM at a minimum for the buffers needed to
+ read, write, and sort the intermediate results. If more RAM is available on the DataNode, Impala will use
+ the additional RAM to minimize the amount of disk I/O for sorting.
+ </p>
+
+ <p class="p">
+ This external sort technique is used as appropriate on each DataNode (possibly including the coordinator
+ node) to sort the portion of the result set that is processed on that node. When the sorted intermediate
+ results are sent back to the coordinator node to produce the final result set, the coordinator node uses a
+ merge sort technique to produce a final sorted result set without using any extra resources on the
+ coordinator node.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Configuration for disk usage:</strong>
+ </p>
+
+ <p class="p">
+ By default, intermediate files used during large sort, join, aggregation, or analytic function operations
+ are stored in the directory <span class="ph filepath">/tmp/impala-scratch</span>. These files are removed when the
+ operation finishes. (Multiple concurrent queries can perform operations that use the <span class="q">"spill to disk"</span>
+ technique, without any name conflicts for these temporary files.) You can specify a different location by
+ starting the <span class="keyword cmdname">impalad</span> daemon with the
+ <code class="ph codeph">--scratch_dirs="<var class="keyword varname">path_to_directory</var>"</code> configuration option.
+ You can specify a single directory, or a comma-separated list of directories. The scratch directories must
+ be on the local filesystem, not in HDFS. You might specify different directory paths for different hosts,
+ depending on the capacity and speed
+ of the available storage devices. In <span class="keyword">Impala 2.3</span> or higher, Impala successfully starts (with a warning
+ written to the log) if it cannot create or read and write files
+ in one of the scratch directories. If there is less than 1 GB free on the filesystem where that directory resides,
+ Impala still runs, but writes a warning message to its log. If Impala encounters an error reading or writing
+ files in a scratch directory during a query, Impala logs the error and the query fails.
+ </p>
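+
+ <p class="p">
+ For example, to spread scratch space across two local drives (the paths shown here are
+ hypothetical placeholders, not defaults), you might start the daemon with:
+ </p>
+
+<pre class="pre codeblock"><code>
+impalad --scratch_dirs="/data1/impala-scratch,/data2/impala-scratch" ...
+</code></pre>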
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Sorting considerations:</strong> Although you can specify an <code class="ph codeph">ORDER BY</code> clause in an
+ <code class="ph codeph">INSERT ... SELECT</code> statement, any <code class="ph codeph">ORDER BY</code> clause is ignored and the
+ results are not necessarily sorted. An <code class="ph codeph">INSERT ... SELECT</code> operation potentially creates
+ many different data files, prepared on different data nodes, and therefore the notion of the data being
+ stored in sorted order is impractical.
+ </p>
+
+ <div class="p">
+ An <code class="ph codeph">ORDER BY</code> clause without an additional <code class="ph codeph">LIMIT</code> clause is ignored in any
+ view definition. If you need to sort the entire result set from a view, use an <code class="ph codeph">ORDER BY</code>
+ clause in the <code class="ph codeph">SELECT</code> statement that queries the view. You can still make a simple <span class="q">"top
+ 10"</span> report by combining the <code class="ph codeph">ORDER BY</code> and <code class="ph codeph">LIMIT</code> clauses in the same
+ view definition:
+<pre class="pre codeblock"><code>[localhost:21000] > create table unsorted (x bigint);
+[localhost:21000] > insert into unsorted values (1), (9), (3), (7), (5), (8), (4), (6), (2);
+[localhost:21000] > create view sorted_view as select x from unsorted order by x;
+[localhost:21000] > select x from sorted_view; -- ORDER BY clause in view has no effect.
++---+
+| x |
++---+
+| 1 |
+| 9 |
+| 3 |
+| 7 |
+| 5 |
+| 8 |
+| 4 |
+| 6 |
+| 2 |
++---+
+[localhost:21000] > select x from sorted_view order by x; -- View query requires ORDER BY at outermost level.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
+| 4 |
+| 5 |
+| 6 |
+| 7 |
+| 8 |
+| 9 |
++---+
+[localhost:21000] > create view top_3_view as select x from unsorted order by x limit 3;
+[localhost:21000] > select x from top_3_view; -- ORDER BY and LIMIT together in view definition are preserved.
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+</code></pre>
+ </div>
+
+ <p class="p">
+ With the lifting of the requirement to include a <code class="ph codeph">LIMIT</code> clause in every <code class="ph codeph">ORDER
+ BY</code> query (in Impala 1.4 and higher):
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Now the use of scratch disk space raises the possibility of an <span class="q">"out of disk space"</span> error on a
+ particular DataNode, as opposed to the previous possibility of an <span class="q">"out of memory"</span> error. Make sure
+ to keep at least 1 GB free on the filesystem used for temporary sorting work.
+ </p>
+ </li>
+
+ </ul>
+
+ <p class="p">
+ In Impala 1.2.1 and higher, all <code class="ph codeph">NULL</code> values come at the end of the result set for
+ <code class="ph codeph">ORDER BY ... ASC</code> queries, and at the beginning of the result set for <code class="ph codeph">ORDER BY ...
+ DESC</code> queries. In effect, <code class="ph codeph">NULL</code> is considered greater than all other values for
+ sorting purposes. The original Impala behavior always put <code class="ph codeph">NULL</code> values at the end, even for
+ <code class="ph codeph">ORDER BY ... DESC</code> queries. The new behavior in Impala 1.2.1 makes Impala more compatible
+ with other popular database systems. In Impala 1.2.1 and higher, you can override or specify the sorting
+ behavior for <code class="ph codeph">NULL</code> by adding the clause <code class="ph codeph">NULLS FIRST</code> or <code class="ph codeph">NULLS
+ LAST</code> at the end of the <code class="ph codeph">ORDER BY</code> clause.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table numbers (x int);
+[localhost:21000] > insert into numbers values (1), (null), (2), (null), (3);
+[localhost:21000] > select x from numbers order by x nulls first;
++------+
+| x |
++------+
+| NULL |
+| NULL |
+| 1 |
+| 2 |
+| 3 |
++------+
+[localhost:21000] > select x from numbers order by x desc nulls first;
++------+
+| x |
++------+
+| NULL |
+| NULL |
+| 3 |
+| 2 |
+| 1 |
++------+
+[localhost:21000] > select x from numbers order by x nulls last;
++------+
+| x |
++------+
+| 1 |
+| 2 |
+| 3 |
+| NULL |
+| NULL |
++------+
+[localhost:21000] > select x from numbers order by x desc nulls last;
++------+
+| x |
++------+
+| 3 |
+| 2 |
+| 1 |
+| NULL |
+| NULL |
++------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for further examples of queries with the <code class="ph codeph">ORDER
+ BY</code> clause.
+ </p>
+
+ <p class="p">
+ Analytic functions use the <code class="ph codeph">ORDER BY</code> clause in a different context to define the sequence in
+ which rows are analyzed. See <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
[32/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_fixed_issues.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_fixed_issues.html b/docs/build3x/html/topics/impala_fixed_issues.html
new file mode 100644
index 0000000..0458052
--- /dev/null
+++ b/docs/build3x/html/topics/impala_fixed_issues.html
@@ -0,0 +1,5961 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="fixed_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Fixed Issues in Apache Impala</title></head><body id="fixed_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Fixed Issues in Apache Impala</span></h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections describe the major issues fixed in each Impala release.
+ </p>
+
+ <p class="p">
+ For known issues that are currently unresolved, see <a class="xref" href="impala_known_issues.html#known_issues">Known Issues and Workarounds in Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="fixed_issues__fixed_issues_3_0_0">
+ <h2 class="title topictitle2" id="ariaid-title2">Issues Fixed in <span class="keyword">Impala 3.0</span></h2>
+ <div class="body conbody">
+ <p class="p"> For the full list of issues closed in this release, including bug
+ fixes, see the <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>. </p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="fixed_issues__fixed_issues_2_12_0">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Issues Fixed in <span class="keyword">Impala 2.12</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="fixed_issues__fixed_issues_2_11_0">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Issues Fixed in <span class="keyword">Impala 2.11</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="fixed_issues__fixed_issues_2100">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Issues Fixed in <span class="keyword">Impala 2.10</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="fixed_issues__fixed_issues_290">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Issues Fixed in <span class="keyword">Impala 2.9.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including bug fixes,
+ see the <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="fixed_issues__fixed_issues_280">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Issues Fixed in <span class="keyword">Impala 2.8.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of Impala fixed issues in <span class="keyword">Impala 2.8</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.8.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.8.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="fixed_issues__fixed_issues_270">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Issues Fixed in <span class="keyword">Impala 2.7.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ For the full list of Impala fixed issues in Impala 2.7.0, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.7.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.7.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="fixed_issues__fixed_issues_263">
+ <h2 class="title topictitle2" id="ariaid-title9">Issues Fixed in <span class="keyword">Impala 2.6.3</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="fixed_issues__fixed_issues_262">
+ <h2 class="title topictitle2" id="ariaid-title10">Issues Fixed in <span class="keyword">Impala 2.6.2</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="fixed_issues__fixed_issues_260">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Issues Fixed in <span class="keyword">Impala 2.6.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following list contains the most critical fixed issues
+ (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+ For the full list of fixed issues in <span class="keyword">Impala 2.6.0</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.6.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.6.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="fixed_issues_260__IMPALA-3385">
+ <h3 class="title topictitle3" id="ariaid-title12">RuntimeState::error_log_ crashes</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur, with stack trace pointing to <code class="ph codeph">impala::RuntimeState::ErrorLog</code>.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3385" target="_blank">IMPALA-3385</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="fixed_issues_260__IMPALA-3378">
+ <h3 class="title topictitle3" id="ariaid-title13">HiveUdfCall::Open() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur because of contention between multiple calls to Java UDFs.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3378" target="_blank">IMPALA-3378</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="fixed_issues_260__IMPALA-3379">
+ <h3 class="title topictitle3" id="ariaid-title14">HBaseTableWriter::CreatePutList() produces unsynchronized access to JniUtil::global_refs_ vector</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur because of contention between multiple concurrent statements writing to HBase.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3379" target="_blank">IMPALA-3379</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="fixed_issues_260__IMPALA-3317">
+ <h3 class="title topictitle3" id="ariaid-title15">Stress test failure: sorter.cc:745] Check failed: i == 0 (1 vs. 0) </h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash or wrong results could occur if the spill-to-disk mechanism encountered a zero-length string at
+ the very end of a data block.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3317" target="_blank">IMPALA-3317</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="fixed_issues_260__IMPALA-3311">
+ <h3 class="title topictitle3" id="ariaid-title16">String data coming out of agg can be corrupted by blocking operators</h3>
+ <div class="body conbody">
+ <p class="p">
+ If a query plan contains an aggregation node producing string values anywhere within a subplan
+ (that is, if in the SQL statement the aggregate function appears within an inline view over a collection column),
+ the results of the aggregation may be incorrect.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3311" target="_blank">IMPALA-3311</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title17" id="fixed_issues_260__IMPALA-3269">
+ <h3 class="title topictitle3" id="ariaid-title17">CTAS with subquery throws AuthzException</h3>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE AS SELECT</code> operation could fail with an authorization error,
+ due to a slight difference in the privilege checking for the CTAS operation.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3269" target="_blank">IMPALA-3269</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="fixed_issues_260__IMPALA-3237">
+ <h3 class="title topictitle3" id="ariaid-title18">Crash on inserting into table with binary and parquet</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala incorrectly allowed <code class="ph codeph">BINARY</code> to be specified as a column type,
+ resulting in a crash during a write to a Parquet table with a column of that type.
+ </p>
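+ <p class="p">
+ For illustration, a statement of the following shape could trigger the problem
+ (the table and column names are hypothetical, not taken from the original report):
+ </p>
+<pre class="pre codeblock"><code>
+-- BINARY was incorrectly accepted at table creation time;
+-- a later write to the Parquet table could then crash the daemon.
+CREATE TABLE parquet_tbl (b BINARY) STORED AS PARQUET;
+</code></pre>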
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3237" target="_blank">IMPALA-3237</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="fixed_issues_260__IMPALA-3105">
+ <h3 class="title topictitle3" id="ariaid-title19">RowBatch::MaxTupleBufferSize() calculation incorrect, may lead to memory corruption</h3>
+ <div class="body conbody">
+ <p class="p">
+ A crash could occur while querying tables with very large rows, for example wide tables with many
+ columns or very large string values. This problem was identified in Impala 2.3, but had low
+ reproducibility in subsequent releases. The fix ensures the memory allocation size is correct.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3105" target="_blank">IMPALA-3105</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="fixed_issues_260__IMPALA-3494">
+ <h3 class="title topictitle3" id="ariaid-title20">Thrift buffer overflows when serializing more than 3355443200 bytes in Impala</h3>
+ <div class="body conbody">
+ <p class="p">
+ A very large memory allocation within the <span class="keyword cmdname">catalogd</span> daemon could exceed an internal Thrift limit,
+ causing a crash.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3494" target="_blank">IMPALA-3494</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title21" id="fixed_issues_260__IMPALA-3314">
+ <h3 class="title topictitle3" id="ariaid-title21">Altering table partition's storage format is not working and crashing the daemon</h3>
+ <div class="body conbody">
+ <p class="p">
+ If a partitioned table used a file format other than Avro, and the file format of an individual partition
+ was changed to Avro, subsequent queries could encounter a crash.
+ </p>
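+ <p class="p">
+ A sketch of the sequence that could lead to the crash (table, column, and partition names are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+-- The table originally uses a non-Avro format such as Parquet:
+CREATE TABLE part_tbl (c INT) PARTITIONED BY (p INT) STORED AS PARQUET;
+ALTER TABLE part_tbl ADD PARTITION (p = 1);
+-- Switching an individual partition to Avro could crash later queries:
+ALTER TABLE part_tbl PARTITION (p = 1) SET FILEFORMAT AVRO;
+SELECT * FROM part_tbl;
+</code></pre>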
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3314" target="_blank">IMPALA-3314</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="fixed_issues_260__IMPALA-3798">
+ <h3 class="title topictitle3" id="ariaid-title22">Race condition may cause scanners to spin with runtime filters on Avro or Sequence files</h3>
+ <div class="body conbody">
+ <p class="p">
+ A timing problem during runtime filter processing could cause queries against Avro or SequenceFile tables
+ to hang.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3798" target="_blank">IMPALA-3798</a></p>
+ <p class="p"><strong class="ph b">Severity:</strong> High</p>
+ </div>
+ </article>
+
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title23" id="fixed_issues__fixed_issues_254">
+ <h2 class="title topictitle2" id="ariaid-title23">Issues Fixed in <span class="keyword">Impala 2.5.4</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title24" id="fixed_issues__fixed_issues_252">
+ <h2 class="title topictitle2" id="ariaid-title24">Issues Fixed in <span class="keyword">Impala 2.5.2</span></h2>
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title25" id="fixed_issues__fixed_issues_251">
+
+ <h2 class="title topictitle2" id="ariaid-title25">Issues Fixed in <span class="keyword">Impala 2.5.1</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title26" id="fixed_issues__fixed_issues_250">
+
+ <h2 class="title topictitle2" id="ariaid-title26">Issues Fixed in <span class="keyword">Impala 2.5.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following list contains the most critical issues (<code class="ph codeph">priority='Blocker'</code>) from the JIRA system.
+ For the full list of fixed issues in <span class="keyword">Impala 2.5</span>, see
+ <a class="xref" href="https://issues.apache.org/jira/issues/?jql=%20type%20%3D%20bug%20and%20project%20%3D%20IMPALA%20AND%20resolution%20%3D%20fixed%20AND%20affectedVersion%20!%3D%20%22Impala%202.5.0%22%20AND%20fixVersion%20%3D%20%22Impala%202.5.0%22%20and%20not%20labels%20%3D%20broken-build%20order%20by%20priority%20desc" target="_blank">this report in the Impala JIRA tracker</a>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="fixed_issues_250__IMPALA-2683">
+ <h3 class="title topictitle3" id="ariaid-title27">Stress test hit assert in LLVM: external function could not be resolved</h3>
+ <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2683" target="_blank">IMPALA-2683</a></p>
+<p class="p">The stress test was running a build with the TPC-H, TPC-DS, and nested TPC-H queries at scale factor 3.</p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title28" id="fixed_issues_250__IMPALA-2365">
+ <h3 class="title topictitle3" id="ariaid-title28">impalad crashes if the UDF JAR is not available in the HDFS location the first time the function is used</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2365" target="_blank">IMPALA-2365</a></p>
+ <p class="p">
+ If a UDF JAR was not available in the HDFS location specified in the <code class="ph codeph">CREATE FUNCTION</code> statement,
+ the <span class="keyword cmdname">impalad</span> daemon could crash.
+ </p>
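+ <p class="p">
+ For example, a function definition of this form (the path, function, and class names are hypothetical)
+ could trigger the crash if the JAR did not exist at the specified HDFS location:
+ </p>
+<pre class="pre codeblock"><code>
+CREATE FUNCTION my_func(STRING) RETURNS STRING
+  LOCATION '/user/impala/udfs/missing.jar'
+  SYMBOL='com.example.MyUdf';
+</code></pre>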
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="fixed_issues_250__IMPALA-2535-570">
+ <h3 class="title topictitle3" id="ariaid-title29">PAGG hits mem_limit when switching to I/O buffers</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+ <p class="p">
+ A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+ The cause was an internal ordering of operations that could lead a later phase of the query to
+ allocate memory required by an earlier phase. The workaround was to either increase
+ or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+ combination of memory limit and data volume.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="fixed_issues_250__IMPALA-2643-570">
+ <h3 class="title topictitle3" id="ariaid-title30">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+ <p class="p">
+ Referring to the same column twice in a view definition could cause the view to omit
+ rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+ incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+ from the result set.
+ </p>
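+ <p class="p">
+ For illustration, a view of the following shape (table and column names are hypothetical) could exhibit the problem:
+ </p>
+<pre class="pre codeblock"><code>
+-- The same column c1 is referenced twice in the view definition:
+CREATE VIEW v AS SELECT c1 AS a, c1 AS b FROM t;
+-- Rows where c1 was NULL could be omitted, making COUNT(*) inaccurate:
+SELECT COUNT(*) FROM v;
+</code></pre>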
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title31" id="fixed_issues_250__IMPALA-1459-570">
+ <h3 class="title topictitle3" id="ariaid-title31">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
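+ <p class="p">
+ The conditions above can be sketched as follows (table and column names are hypothetical):
+ </p>
+<pre class="pre codeblock"><code>
+SELECT v.id, t3.name
+FROM (SELECT t1.id, t2.val
+      FROM t1 LEFT OUTER JOIN t2 ON t1.id = t2.id) v
+JOIN t3
+  ON v.id = t3.id
+     AND v.val = 10;  -- references only t2, the outer-joined side of the inline view
+</code></pre>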
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="fixed_issues_250__IMPALA-2093">
+ <h3 class="title topictitle3" id="ariaid-title32">Wrong plan of NOT IN aggregate subquery when a constant is used in subquery predicate</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2093" target="_blank">IMPALA-2093</a></p>
+ <p class="p">
+ <code class="ph codeph">IN</code> subqueries might return wrong results if the left-hand side of the <code class="ph codeph">IN</code> is a constant.
+ For example:
+ </p>
+<pre class="pre codeblock"><code>
+select * from alltypestiny t1
+ where 10 not in (select sum(int_col) from alltypestiny);
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="fixed_issues_250__IMPALA-2940">
+ <h3 class="title topictitle3" id="ariaid-title33">Parquet DictDecoders accumulate throughout query</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2940" target="_blank">IMPALA-2940</a></p>
+ <p class="p">
+ Parquet dictionary decoders could accumulate throughout query execution, leading to excessive memory usage, because one decoder was created per column per split.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="fixed_issues_250__IMPALA-3056">
+ <h3 class="title topictitle3" id="ariaid-title34">Planner doesn't set the has_local_target field correctly</h3>
+ <div class="body conbody">
+<p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3056" target="_blank">IMPALA-3056</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="fixed_issues_250__IMPALA-2742">
+ <h3 class="title topictitle3" id="ariaid-title35">MemPool allocation growth behavior</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2742" target="_blank">IMPALA-2742</a></p>
+ <p class="p">
+ Previously, the MemPool always doubled the size of its last allocation.
+ This could lead to poor behavior if the MemPool transferred ownership of all its data
+ except the last chunk: the next allocation would then double the size of that
+ large chunk, which could be undesirable.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="fixed_issues_250__IMPALA-3035">
+ <h3 class="title topictitle3" id="ariaid-title36">Drop partition operations don't follow the catalog's locking protocol</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3035" target="_blank">IMPALA-3035</a></p>
+ <p class="p">
+ The <code class="ph codeph">CatalogOpExecutor.alterTableDropPartition()</code> function violated
+ the locking protocol used in the catalog, which requires <code class="ph codeph">catalogLock_</code>
+ to be acquired before any table-level lock. This could cause deadlocks when <code class="ph codeph">ALTER TABLE DROP PARTITION</code>
+ was executed concurrently with other DDL operations.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title37" id="fixed_issues_250__IMPALA-2215">
+ <h3 class="title topictitle3" id="ariaid-title37">HAVING clause without aggregation not applied properly</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2215" target="_blank">IMPALA-2215</a></p>
+ <p class="p">
+ A query with a <code class="ph codeph">HAVING</code> clause but no <code class="ph codeph">GROUP BY</code> clause was not being rejected,
+ despite being invalid syntax. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+select case when 1=1 then 'didit' end as c1 from (select 1 as one) a having 1!=1;
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title38" id="fixed_issues_250__IMPALA-2914">
+ <h3 class="title topictitle3" id="ariaid-title38">Hit DCHECK Check failed: HasDateOrTime()</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2914" target="_blank">IMPALA-2914</a></p>
+ <p class="p">
+ <code class="ph codeph">TimestampValue::ToTimestampVal()</code> requires a valid <code class="ph codeph">TimestampValue</code> as input.
+ This requirement was not enforced in some places, leading to serious errors.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title39" id="fixed_issues_250__IMPALA-2986">
+ <h3 class="title topictitle3" id="ariaid-title39">Aggregation spill loop gives up too early leading to mem limit exceeded errors</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2986" target="_blank">IMPALA-2986</a></p>
+ <p class="p">
+ An aggregation query could fail with an out-of-memory error, despite sufficient memory being reported as available.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title40" id="fixed_issues_250__IMPALA-2592">
+ <h3 class="title topictitle3" id="ariaid-title40">DataStreamSender::Channel::CloseInternal() does not close the channel on an error.</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2592" target="_blank">IMPALA-2592</a></p>
+ <p class="p">
+ Some queries did not close an internal communication channel on an error,
+ causing the node on the other side of the channel to wait indefinitely and the query to hang.
+ For example, this issue could happen on a Kerberos-enabled system if the credential cache was outdated.
+ Although the affected query hung, the <span class="keyword cmdname">impalad</span> daemons continued processing other queries.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title41" id="fixed_issues_250__IMPALA-2184">
+ <h3 class="title topictitle3" id="ariaid-title41">Codegen does not catch exceptions in FROM_UNIXTIME()</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2184" target="_blank">IMPALA-2184</a></p>
+ <p class="p">
+ Querying for the minimum or maximum value of a timestamp cast from a <code class="ph codeph">BIGINT</code> via <code class="ph codeph">from_unixtime()</code>
+ failed silently and crashed <span class="keyword cmdname">impalad</span> instances when the input included a value outside the valid range.
+ </p>
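+ <p class="p">
+ A query of this shape (the table and column names are hypothetical) could trigger the crash
+ when the column contained out-of-range values:
+ </p>
+<pre class="pre codeblock"><code>
+SELECT MIN(CAST(from_unixtime(bigint_col) AS TIMESTAMP)) FROM t;
+</code></pre>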
+
+ <p class="p"><strong class="ph b">Workaround:</strong> Disable native code generation with:</p>
+<pre class="pre codeblock"><code>
+SET disable_codegen=true;
+</code></pre>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title42" id="fixed_issues_250__IMPALA-2788">
+ <h3 class="title topictitle3" id="ariaid-title42">Impala returns wrong result for function 'conv(bigint, from_base, to_base)'</h3>
+ <div class="body conbody">
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2788" target="_blank">IMPALA-2788</a></p>
+ <p class="p">
+ Impala returned a wrong result for the <code class="ph codeph">conv()</code> function:
+ <code class="ph codeph">conv(bigint, from_base, to_base)</code> returned an incorrect result,
+ while <code class="ph codeph">conv(string, from_base, to_base)</code> returned the correct value.
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+select 2061013007, conv(2061013007, 16, 10), conv('2061013007', 16, 10);
++------------+--------------------------+----------------------------+
+| 2061013007 | conv(2061013007, 16, 10) | conv('2061013007', 16, 10) |
++------------+--------------------------+----------------------------+
+| 2061013007 | 1627467783               | 139066421255               |
++------------+--------------------------+----------------------------+
+Fetched 1 row(s) in 0.65s
+
+select 2061013007, conv(cast(2061013007 as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                               | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(2061013007 as string), 16, 10), conv('2061013007', 16, 10);
++------------+------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(2061013007 as string), 16, 10) | conv('2061013007', 16, 10) |
++------------+------------------------------------------+----------------------------+
+| 2061013007 | 139066421255                             | 139066421255               |
++------------+------------------------------------------+----------------------------+
+
+select 2061013007, conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10), conv('2061013007', 16, 10);
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | conv(cast(cast(2061013007 as decimal(20,0)) as bigint), 16, 10) | conv('2061013007', 16, 10) |
++------------+-----------------------------------------------------------------+----------------------------+
+| 2061013007 | 1627467783                                                      | 139066421255               |
++------------+-----------------------------------------------------------------+----------------------------+
+
+</code></pre>
+
+ <p class="p"><strong class="ph b">Workaround:</strong>
+ Cast the value to string and use <code class="ph codeph">conv(string, from_base, to_base)</code> for conversion.
+ </p>
+ </div>
+ </article>
+
+
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title43" id="fixed_issues__fixed_issues_241">
+
+ <h2 class="title topictitle2" id="ariaid-title43">Issues Fixed in <span class="keyword">Impala 2.4.1</span></h2>
+
+ <div class="body conbody">
+ <p class="p">
+ </p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title44" id="fixed_issues__fixed_issues_240">
+
+ <h2 class="title topictitle2" id="ariaid-title44">Issues Fixed in <span class="keyword">Impala 2.4.0</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The set of fixes for Impala in <span class="keyword">Impala 2.4.0</span> is the same as
+ in <span class="keyword">Impala 2.3.2</span>.
+
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title45" id="fixed_issues__fixed_issues_234">
+
+ <h2 class="title topictitle2" id="ariaid-title45">Issues Fixed in <span class="keyword">Impala 2.3.4</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title46" id="fixed_issues__fixed_issues_232">
+
+ <h2 class="title topictitle2" id="ariaid-title46">Issues Fixed in <span class="keyword">Impala 2.3.2</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most serious or frequently encountered customer
+ issues fixed in <span class="keyword">Impala 2.3.2</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title47" id="fixed_issues_232__IMPALA-2829">
+ <h3 class="title topictitle3" id="ariaid-title47">SEGV in AnalyticEvalNode touching NULL input_stream_</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query involving an analytic function could encounter a serious error.
+ This issue occurred infrequently, depending on specific combinations
+ of queries and data.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2829" target="_blank">IMPALA-2829</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title48" id="fixed_issues_232__IMPALA-2722">
+ <h3 class="title topictitle3" id="ariaid-title48">Free local allocations per row batch in non-partitioned AGG and HJ</h3>
+ <div class="body conbody">
+ <p class="p">
+ An outer join query could fail unexpectedly with an out-of-memory error
+ when the <span class="q">"spill to disk"</span> mechanism was turned off.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2722" target="_blank">IMPALA-2722</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title49" id="fixed_issues_232__IMPALA-2612">
+
+ <h3 class="title topictitle3" id="ariaid-title49">Free local allocations once for every row batch when building hash tables</h3>
+ <div class="body conbody">
+ <p class="p">
+ A join query could encounter a serious error due to an internal failure to allocate memory, which
+ resulted in dereferencing a <code class="ph codeph">NULL</code> pointer.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2612" target="_blank">IMPALA-2612</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title50" id="fixed_issues_232__IMPALA-2643">
+ <h3 class="title topictitle3" id="ariaid-title50">Prevent migrating incorrectly inferred identity predicates into inline views</h3>
+ <div class="body conbody">
+ <p class="p">
+ Referring to the same column twice in a view definition could cause the view to omit
+ rows where that column contained a <code class="ph codeph">NULL</code> value. This could cause
+ incorrect results due to an inaccurate <code class="ph codeph">COUNT(*)</code> value or rows missing
+ from the result set.
+ </p>
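+         <p class="p">
+           As a hypothetical sketch (table and column names are invented), a view of this
+           shape could trigger the problem:
+         </p>
+ <pre class="pre codeblock"><code>-- The view refers to column c1 twice; before the fix, an identity
+ -- predicate (c1 = c1) could be inferred incorrectly, silently dropping
+ -- rows where c1 IS NULL and skewing COUNT(*).
+ CREATE VIEW v AS SELECT c1, c1 AS c1_copy FROM t1;
+ SELECT COUNT(*) FROM v;</code></pre>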
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2643" target="_blank">IMPALA-2643</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title51" id="fixed_issues_232__IMPALA-2695">
+ <h3 class="title topictitle3" id="ariaid-title51">Fix GRANTs on URIs with uppercase letters</h3>
+ <div class="body conbody">
+ <p class="p">
+ A <code class="ph codeph">GRANT</code> statement for a URI could be ineffective if the URI
+ contained uppercase letters, for example in an uppercase directory name.
+ Subsequent statements, such as <code class="ph codeph">CREATE EXTERNAL TABLE</code>
+ with a <code class="ph codeph">LOCATION</code> clause, could fail with an authorization exception.
+ </p>
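+         <p class="p">
+           For illustration (the role name, table, and HDFS path are hypothetical), a
+           sequence like the following could fail before the fix because of the uppercase
+           directory names:
+         </p>
+ <pre class="pre codeblock"><code>-- The grant could silently fail to take effect because of the uppercase
+ -- letters in the URI, so the CREATE EXTERNAL TABLE below would hit an
+ -- authorization exception.
+ GRANT ALL ON URI 'hdfs://nameservice1/Data/Sales' TO ROLE analyst_role;
+ CREATE EXTERNAL TABLE sales_ext (id INT)
+   LOCATION 'hdfs://nameservice1/Data/Sales';</code></pre>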
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2695" target="_blank">IMPALA-2695</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="IMPALA-2664-552__IMPALA-2648-552" id="fixed_issues_232__IMPALA-2664-552">
+ <h3 class="title topictitle3" id="IMPALA-2664-552__IMPALA-2648-552">Avoid sending large partition stats objects over thrift</h3>
+ <div class="body conbody">
+ <p class="p">
+ The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error
+ when loading the incremental statistics metadata for tables with large
+ numbers of partitions and columns. The problem occurred when the
+ internal representation of metadata for the table exceeded 2
+ GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it
+ would produce metadata that exceeded the maximum size.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title53" id="fixed_issues_232__IMPALA-2226">
+ <h3 class="title topictitle3" id="ariaid-title53">Throw AnalysisError if table properties are too large (for the Hive metastore)</h3>
+ <div class="body conbody">
+ <p class="p">
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements could fail with
+ metastore database errors due to length limits on the <code class="ph codeph">SERDEPROPERTIES</code> and <code class="ph codeph">TBLPROPERTIES</code> clauses.
+       (The metastore limit on key length is 256 characters, while the limit on value length is 4000 characters.) The fix makes Impala handle these error conditions
+ more cleanly, by detecting too-long values rather than passing them to the metastore database.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2226" target="_blank">IMPALA-2226</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title54" id="fixed_issues_232__IMPALA-2273-552">
+ <h3 class="title topictitle3" id="ariaid-title54">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala could fail to access Parquet data files with page headers larger than 8 MB, which could
+ occur, for example, if the minimum or maximum values for a column were long strings. The
+ fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you can use to
+ increase the Impala size limit to a value higher than 8 MB.
+ </p>
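+       <p class="p">
+         As a sketch (the flag name comes from the fix; the 16 MB value is only an example),
+         the limit can be raised in the <span class="keyword cmdname">impalad</span> startup options:
+       </p>
+ <pre class="pre codeblock"><code># Example only: raise the Parquet page-header limit to 16 MB.
+ # 16777216 bytes = 16 MB; choose a value that fits your data.
+ impalad --max_page_header_size=16777216</code></pre>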
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title55" id="fixed_issues_232__IMPALA-2473">
+       <h3 class="title topictitle3" id="ariaid-title55">Reduce scanner memory usage</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries on Parquet tables could consume excessive memory (potentially multiple gigabytes) due to producing
+ large intermediate data values while evaluating groups of rows. The workaround was to reduce the size of
+ the <code class="ph codeph">NUM_SCANNER_THREADS</code> query option, the <code class="ph codeph">BATCH_SIZE</code> query option,
+ or both.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2473" target="_blank">IMPALA-2473</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title56" id="fixed_issues_232__IMPALA-2113">
+ <h3 class="title topictitle3" id="ariaid-title56">Handle error when distinct and aggregates are used with a having clause</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that included a <code class="ph codeph">DISTINCT</code> operator and a <code class="ph codeph">HAVING</code> clause, but no
+ aggregate functions or <code class="ph codeph">GROUP BY</code>, would fail with an uninformative error message.
+ </p>
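+       <p class="p">
+         A minimal hypothetical example of the affected shape of query:
+       </p>
+ <pre class="pre codeblock"><code>-- DISTINCT plus HAVING, with no aggregate functions and no GROUP BY;
+ -- before the fix, this failed with an uninformative message.
+ SELECT DISTINCT c1 FROM t1 HAVING c1 > 10;</code></pre>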
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2113" target="_blank">IMPALA-2113</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title57" id="fixed_issues_232__IMPALA-2225">
+ <h3 class="title topictitle3" id="ariaid-title57">Handle error when star based select item and aggregate are incorrectly used</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that included <code class="ph codeph">*</code> in the <code class="ph codeph">SELECT</code> list, in addition to an
+ aggregate function call, would fail with an uninformative message if the query had no
+ <code class="ph codeph">GROUP BY</code> clause.
+ </p>
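+       <p class="p">
+         For illustration, a query of the affected shape (hypothetical table):
+       </p>
+ <pre class="pre codeblock"><code>-- Mixing * with an aggregate call and no GROUP BY is invalid; before
+ -- the fix, it failed with an unclear message instead of a proper error.
+ SELECT *, COUNT(c1) FROM t1;</code></pre>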
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2225" target="_blank">IMPALA-2225</a></p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title58" id="fixed_issues_232__IMPALA-2731-552">
+ <h3 class="title topictitle3" id="ariaid-title58">Refactor MemPool usage in HBase scan node</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+ The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+ The fix for this issue involves removing a separate memory work area for HBase queries
+ and reusing other memory that was already allocated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title59" id="fixed_issues_232__IMPALA-1459-552">
+ <h3 class="title topictitle3" id="ariaid-title59">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
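+       <p class="p">
+         The tables below are hypothetical, but a query of this shape matches all three
+         conditions:
+       </p>
+ <pre class="pre codeblock"><code>-- Inline view v contains an outer join; v is joined to t2 in the
+ -- enclosing block; and the ON clause references only v.x, which
+ -- originates from the outer-joined table c inside the inline view.
+ SELECT v.id, t2.name
+ FROM (SELECT p.id, c.x
+       FROM parent p LEFT OUTER JOIN child c ON p.id = c.parent_id) v
+ JOIN t2 ON v.x = t2.x;</code></pre>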
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title60" id="fixed_issues_232__IMPALA-2558">
+ <h3 class="title topictitle3" id="ariaid-title60">DCHECK in parquet scanner after block read error</h3>
+ <div class="body conbody">
+ <p class="p">
+           A debug build of Impala could encounter a serious error after certain kinds of I/O
+           errors for Parquet files. Release builds were not affected.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2558" target="_blank">IMPALA-2558</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title61" id="fixed_issues_232__IMPALA-2535">
+ <h3 class="title topictitle3" id="ariaid-title61">PAGG hits mem_limit when switching to I/O buffers</h3>
+ <div class="body conbody">
+ <p class="p">
+ A join query could fail with an out-of-memory error despite the apparent presence of sufficient memory.
+ The cause was the internal ordering of operations that could cause a later phase of the query to
+ allocate memory required by an earlier phase of the query. The workaround was to either increase
+ or decrease the <code class="ph codeph">MEM_LIMIT</code> query option, because the issue would only occur for a specific
+ combination of memory limit and data volume.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2535" target="_blank">IMPALA-2535</a></p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title62" id="fixed_issues_232__IMPALA-2559">
+ <h3 class="title topictitle3" id="ariaid-title62">Fix check failed: sorter_runs_.back()->is_pinned_</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could fail with an internal error while calculating the memory limit.
+ This was an infrequent condition uncovered during stress testing.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title63" id="fixed_issues_232__IMPALA-2614">
+ <h3 class="title topictitle3" id="ariaid-title63">Don't ignore Status returned by DataStreamRecvr::CreateMerger()</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could fail with an internal error while calculating the memory limit.
+ This was an infrequent condition uncovered during stress testing.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2614" target="_blank">IMPALA-2614</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2559" target="_blank">IMPALA-2559</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title64" id="fixed_issues_232__IMPALA-2591">
+ <h3 class="title topictitle3" id="ariaid-title64">DataStreamSender::Send() does not return an error status if SendBatch() failed</h3>
+ <div class="body conbody">
+
+         <p class="p">
+           An error that occurred while transmitting a row batch between Impala daemons could go
+           unreported, because <code class="ph codeph">DataStreamSender::Send()</code> did not
+           propagate the error status returned by a failed <code class="ph codeph">SendBatch()</code> call.
+         </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2591" target="_blank">IMPALA-2591</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title65" id="fixed_issues_232__IMPALA-2598">
+ <h3 class="title topictitle3" id="ariaid-title65">Re-enable SSL and Kerberos on server-server</h3>
+ <div class="body conbody">
+ <p class="p">
+ These fixes lift the restriction on using SSL encryption and Kerberos authentication together
+ for internal communication between Impala components.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2598" target="_blank">IMPALA-2598</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2747" target="_blank">IMPALA-2747</a></p>
+ </div>
+ </article>
+
+
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title66" id="fixed_issues__fixed_issues_231">
+
+ <h2 class="title topictitle2" id="ariaid-title66">Issues Fixed in <span class="keyword">Impala 2.3.1</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+       <span class="keyword">Impala 2.3.1</span> is identical to <span class="keyword">Impala 2.3.0</span>.
+       There are no new bug fixes, new features, or incompatible changes.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title67" id="fixed_issues__fixed_issues_230">
+
+ <h2 class="title topictitle2" id="ariaid-title67">Issues Fixed in <span class="keyword">Impala 2.3.0</span></h2>
+
+ <div class="body conbody">
+ <p class="p"> This section lists the most serious or frequently encountered customer
+ issues fixed in <span class="keyword">Impala 2.3</span>. Any issues already fixed in
+ <span class="keyword">Impala 2.2</span> maintenance releases (up through <span class="keyword">Impala 2.2.8</span>) are also included.
+ Those issues are listed under the respective <span class="keyword">Impala 2.2</span> sections and are
+ not repeated here.
+ </p>
+
+
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title68" id="fixed_issues_230__serious_230">
+ <h3 class="title topictitle3" id="ariaid-title68">Fixes for Serious Errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A number of issues were resolved that could result in serious errors
+ when encountered. The most critical or commonly encountered are
+ listed here.
+ </p>
+ <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2168" target="_blank">IMPALA-2168</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2378" target="_blank">IMPALA-2378</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2369" target="_blank">IMPALA-2369</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2319" target="_blank">IMPALA-2319</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2314" target="_blank">IMPALA-2314</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2016" target="_blank">IMPALA-2016</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title69" id="fixed_issues_230__correctness_230">
+ <h3 class="title topictitle3" id="ariaid-title69">Fixes for Correctness Errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A number of issues were resolved that could result in wrong results
+ when encountered. The most critical or commonly encountered are
+ listed here.
+ </p>
+ <p class="p"><strong class="ph b">Bugs:</strong>
+
+
+
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2192" target="_blank">IMPALA-2192</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2440" target="_blank">IMPALA-2440</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2090" target="_blank">IMPALA-2090</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2086" target="_blank">IMPALA-2086</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1947" target="_blank">IMPALA-1947</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a>
+ </p>
+ </div>
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title70" id="fixed_issues__fixed_issues_2210">
+
+ <h2 class="title topictitle2" id="ariaid-title70">Issues Fixed in <span class="keyword">Impala 2.2.10</span></h2>
+
+ <div class="body conbody">
+ <p class="p"></p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title71" id="fixed_issues__fixed_issues_229">
+
+ <h2 class="title topictitle2" id="ariaid-title71">Issues Fixed in <span class="keyword">Impala 2.2.9</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.9</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title72" id="fixed_issues_229__IMPALA-1917">
+
+       <h3 class="title topictitle3" id="ariaid-title72">Query returns an empty result if it contains a NULL literal in an inline view</h3>
+ <div class="body conbody">
+ <p class="p">
+ If an inline view in a <code class="ph codeph">FROM</code> clause contained a <code class="ph codeph">NULL</code> literal,
+ the result set was empty.
+ </p>
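+         <p class="p">
+           A hypothetical illustration of the affected pattern:
+         </p>
+ <pre class="pre codeblock"><code>-- Before the fix, the NULL literal in the inline view caused the whole
+ -- result set to be empty, instead of returning t1's rows with a NULL
+ -- flag column.
+ SELECT v.c1, v.flag
+ FROM (SELECT c1, NULL AS flag FROM t1) v;</code></pre>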
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1917" target="_blank">IMPALA-1917</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title73" id="fixed_issues_229__IMPALA-2731">
+
+ <h3 class="title topictitle3" id="ariaid-title73">HBase scan node uses 2-4x memory after upgrade to Impala 2.2.8</h3>
+ <div class="body conbody">
+ <p class="p">
+ Queries involving HBase tables used substantially more memory than in earlier Impala versions.
+ The problem occurred starting in Impala 2.2.8, as a result of the changes for IMPALA-2284.
+ The fix for this issue involves removing a separate memory work area for HBase queries
+ and reusing other memory that was already allocated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2731" target="_blank">IMPALA-2731</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title74" id="fixed_issues_229__IMPALA-1459">
+ <h3 class="title topictitle3" id="ariaid-title74">Fix migration/assignment of On-clause predicates inside inline views</h3>
+ <div class="body conbody">
+
+ <p class="p">
+ Some combinations of <code class="ph codeph">ON</code> clauses in join queries could result in comparisons
+ being applied at the wrong stage of query processing, leading to incorrect results.
+ Wrong predicate assignment could happen under the following conditions:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ The query includes an inline view that contains an outer join.
+ </li>
+ <li class="li">
+ That inline view is joined with another table in the enclosing query block.
+ </li>
+ <li class="li">
+ That join has an <code class="ph codeph">ON</code> clause containing a predicate that
+ only references columns originating from the outer-joined tables inside the inline view.
+ </li>
+ </ul>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1459" target="_blank">IMPALA-1459</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title75" id="fixed_issues_229__IMPALA-2446">
+ <h3 class="title topictitle3" id="ariaid-title75">Fix wrong predicate assignment in outer joins</h3>
+ <div class="body conbody">
+ <p class="p">
+ The join predicate for an <code class="ph codeph">OUTER JOIN</code> clause could be applied at the wrong stage
+ of query processing, leading to incorrect results.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2446" target="_blank">IMPALA-2446</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title76" id="fixed_issues_229__IMPALA-2648">
+ <h3 class="title topictitle3" id="ariaid-title76">Avoid sending large partition stats objects over thrift</h3>
+ <div class="body conbody">
+ <p class="p"> The <span class="keyword cmdname">catalogd</span> daemon could encounter a serious error when loading the
+ incremental statistics metadata for tables with large numbers of partitions and columns.
+ The problem occurred when the internal representation of metadata for the table exceeded 2
+ GB, for example in a table with 20K partitions and 77 columns. The fix causes a
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> operation to fail if it would produce
+ metadata that exceeded the maximum size. </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2648" target="_blank">IMPALA-2648</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2664" target="_blank">IMPALA-2664</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title77" id="fixed_issues_229__IMPALA-1675">
+ <h3 class="title topictitle3" id="ariaid-title77">Avoid overflow when adding large intervals to TIMESTAMPs</h3>
+ <div class="body conbody">
+ <p class="p"> Adding or subtracting a large <code class="ph codeph">INTERVAL</code> value to a
+ <code class="ph codeph">TIMESTAMP</code> value could produce an incorrect result, with the value
+ wrapping instead of returning an out-of-range error. </p>
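+         <p class="p">
+           For example (the values are illustrative), an expression like the following should
+           return an out-of-range error rather than a wrapped value:
+         </p>
+ <pre class="pre codeblock"><code>-- Adding an extremely large interval now produces an error instead of
+ -- a silently wrapped TIMESTAMP.
+ SELECT CAST('2015-01-01' AS TIMESTAMP) + INTERVAL 999999999 YEARS;</code></pre>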
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1675" target="_blank">IMPALA-1675</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title78" id="fixed_issues_229__IMPALA-1949">
+ <h3 class="title topictitle3" id="ariaid-title78">Analysis exception when a binary operator contains an IN operator with values</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">IN</code> operator with literal values could cause a statement to fail if used
+ as the argument to a binary operator, such as an equality test for a <code class="ph codeph">BOOLEAN</code> value.
+ </p>
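+         <p class="p">
+           A hypothetical example of the affected pattern:
+         </p>
+ <pre class="pre codeblock"><code>-- Using an IN predicate with literal values as an operand of a binary
+ -- operator previously raised an analysis exception.
+ SELECT c1 FROM t1 WHERE (c2 IN (1, 2, 3)) = TRUE;</code></pre>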
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1949" target="_blank">IMPALA-1949</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title79" id="fixed_issues_229__IMPALA-2273">
+
+ <h3 class="title topictitle3" id="ariaid-title79">Make MAX_PAGE_HEADER_SIZE configurable</h3>
+ <div class="body conbody">
+ <p class="p"> Impala could fail to access Parquet data files with page headers larger than 8 MB, which
+ could occur, for example, if the minimum or maximum values for a column were long strings.
+ The fix adds a configuration setting <code class="ph codeph">--max_page_header_size</code>, which you
+ can use to increase the Impala size limit to a value higher than 8 MB. </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2273" target="_blank">IMPALA-2273</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title80" id="fixed_issues_229__IMPALA-2357">
+ <h3 class="title topictitle3" id="ariaid-title80">Fix spilling sorts with var-len slots that are NULL or empty.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query that activated the spill-to-disk mechanism could fail if it contained a sort expression
+ involving certain combinations of fixed-length or variable-length types.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2357" target="_blank">IMPALA-2357</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title81" id="fixed_issues_229__block_pin_oom">
+ <h3 class="title topictitle3" id="ariaid-title81">Work-around IMPALA-2344: Fail query with OOM in case block->Pin() fails</h3>
+ <div class="body conbody">
+ <p class="p">
+ Some queries that activated the spill-to-disk mechanism could produce a serious error
+ if there was insufficient memory to set up internal work areas. Now those queries
+ produce normal out-of-memory errors instead.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2344" target="_blank">IMPALA-2344</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title82" id="fixed_issues_229__IMPALA-2252">
+ <h3 class="title topictitle3" id="ariaid-title82">Crash (likely race) tearing down BufferedBlockMgr on query failure</h3>
+ <div class="body conbody">
+ <p class="p">
+ A serious error could occur under rare circumstances, due to a race condition while freeing memory during heavily concurrent workloads.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2252" target="_blank">IMPALA-2252</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title83" id="fixed_issues_229__IMPALA-1746">
+ <h3 class="title topictitle3" id="ariaid-title83">QueryExecState doesn't check for query cancellation or errors</h3>
+ <div class="body conbody">
+ <p class="p">
+ A call to <code class="ph codeph">SetError()</code> in a user-defined function (UDF) would not cause the query to fail as expected.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1746" target="_blank">IMPALA-1746</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title84" id="fixed_issues_229__IMPALA-2533">
+       <h3 class="title topictitle3" id="ariaid-title84">Impala throws IllegalStateException when inserting data into a partition while the SELECT
+         subquery groups by the partition columns</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">INSERT ... SELECT</code> operation into a partitioned table could fail if the <code class="ph codeph">SELECT</code> query
+ included a <code class="ph codeph">GROUP BY</code> clause referring to the partition key columns.
+ </p>
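+         <p class="p">
+           A hypothetical sketch of the affected pattern, using a table partitioned by
+           <code class="ph codeph">year</code>:
+         </p>
+ <pre class="pre codeblock"><code>-- Before the fix, grouping by the partition key column in the SELECT
+ -- portion could make the INSERT fail with an IllegalStateException.
+ INSERT INTO part_tab PARTITION (year)
+ SELECT MAX(id), year FROM src GROUP BY year;</code></pre>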
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2533" target="_blank">IMPALA-2533</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title85" id="fixed_issues__fixed_issues_228">
+
+ <h2 class="title topictitle2" id="ariaid-title85">Issues Fixed in <span class="keyword">Impala 2.2.8</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.8</span>.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title86" id="fixed_issues_228__IMPALA-1136">
+       <h3 class="title topictitle3" id="ariaid-title86">Impala is unable to read Hive tables created with the "STORED AS AVRO" clause</h3>
+ <div class="body conbody">
+ <p class="p">Impala could not read Avro tables created in Hive with the <code class="ph codeph">STORED AS AVRO</code> clause.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1136" target="_blank">IMPALA-1136</a>,
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2161" target="_blank">IMPALA-2161</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title87" id="fixed_issues_228__IMPALA-2213">
+       <h3 class="title topictitle3" id="ariaid-title87">Make Parquet scanner fail query if the file size metadata is stale</h3>
+ <div class="body conbody">
+ <p class="p">If a Parquet file in HDFS was overwritten by a smaller file, Impala could encounter a serious error.
+ Issuing a <code class="ph codeph">INVALIDATE METADATA</code> statement before a subsequent query would avoid the error.
+ The fix allows Impala to handle such inconsistencies in Parquet file length cleanly regardless of whether the
+ table metadata is up-to-date.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2213" target="_blank">IMPALA-2213</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title88" id="fixed_issues_228__IMPALA-2249">
+ <h3 class="title topictitle3" id="ariaid-title88">Avoid allocating StringBuffer > 1GB in ScannerContext::Stream::GetBytesInternal()</h3>
+ <div class="body conbody">
+ <p class="p">Impala could encounter a serious error when reading compressed text files larger than 1 GB. The fix causes Impala
+ to issue an error message instead in this case.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2249" target="_blank">IMPALA-2249</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title89" id="fixed_issues_228__IMPALA-2284">
+       <h3 class="title topictitle3" id="ariaid-title89">Disallow long (1&lt;&lt;30) strings in group_concat()</h3>
+ <div class="body conbody">
+ <p class="p">A query using the <code class="ph codeph">group_concat()</code> function could encounter a serious error if the returned string value was larger than 1 GB.
+ Now the query fails with an error message in this case.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2284" target="_blank">IMPALA-2284</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title90" id="fixed_issues_228__IMPALA-2270">
+       <h3 class="title topictitle3" id="ariaid-title90">Avoid FnvHash64to32 with empty inputs</h3>
+ <div class="body conbody">
+ <p class="p">An edge case in the algorithm used to distribute data among nodes could result in uneven distribution of work for some queries,
+ with all data sent to the same node.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2270" target="_blank">IMPALA-2270</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title91" id="fixed_issues_228__IMPALA-2348">
+ <h3 class="title topictitle3" id="ariaid-title91">The catalog does not close the connection to HMS during table invalidation</h3>
+ <div class="body conbody">
+ <p class="p">A communication error could occur between Impala and the Hive metastore database, causing Impala operations that update
+ table metadata to fail.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2348" target="_blank">IMPALA-2348</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title92" id="fixed_issues_228__IMPALA-2364-548">
+ <h3 class="title topictitle3" id="ariaid-title92">Wrong DCHECK in PHJ::ProcessProbeBatch</h3>
+ <div class="body conbody">
+ <p class="p">Certain queries could encounter a serious error if the spill-to-disk mechanism was activated.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2364" target="_blank">IMPALA-2364</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title93" id="fixed_issues_228__IMPALA-2165-548">
+ <h3 class="title topictitle3" id="ariaid-title93">Avoid cardinality 0 in scan nodes of small tables and low selectivity</h3>
+ <div class="body conbody">
+ <p class="p">Impala could generate a suboptimal query plan for some queries involving small tables.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2165" target="_blank">IMPALA-2165</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title94" id="fixed_issues__fixed_issues_227">
+
+ <h2 class="title topictitle2" id="ariaid-title94">Issues Fixed in <span class="keyword">Impala 2.2.7</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.7</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title95" id="fixed_issues_227__IMPALA-1983">
+ <h3 class="title topictitle3" id="ariaid-title95">Warn if table stats are potentially corrupt.</h3>
+ <div class="body conbody">
+ <p class="p">
+ Impala warns if it detects a discrepancy in table statistics: a table considered to have zero rows even though there are data files present.
+ In this case, Impala also skips query optimizations that are normally applied to very small tables.
+ </p>
+          <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1983" target="_blank">IMPALA-1983</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title96" id="fixed_issues_227__IMPALA-2266">
+ <h3 class="title topictitle3" id="ariaid-title96">Pass correct child node in 2nd phase merge aggregation.</h3>
+ <div class="body conbody">
+ <p class="p">A query could encounter a serious error if it included a particular combination of aggregate functions and inline views.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2266" target="_blank">IMPALA-2266</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title97" id="fixed_issues_227__IMPALA-2216">
+ <h3 class="title topictitle3" id="ariaid-title97">Set the output smap of an EmptySetNode produced from an empty inline view.</h3>
+ <div class="body conbody">
+ <p class="p">A query could encounter a serious error if it included an inline view whose subquery had no <code class="ph codeph">FROM</code> clause.</p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2216" target="_blank">IMPALA-2216</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title98" id="fixed_issues_227__IMPALA-2203">
+ <h3 class="title topictitle3" id="ariaid-title98">Set an InsertStmt's result exprs from the source statement's result exprs.</h3>
+ <div class="body conbody">
+ <p class="p">
+            A <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statement could produce
+            different results than a <code class="ph codeph">SELECT</code> statement for queries that included a <code class="ph codeph">FULL JOIN</code> clause
+            and literal values in the select list.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2203" target="_blank">IMPALA-2203</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title99" id="fixed_issues_227__IMPALA-2088">
+ <h3 class="title topictitle3" id="ariaid-title99">Fix planning of empty union operands with analytics.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if it contained a <code class="ph codeph">UNION</code> clause,
+ calls to analytic functions, and a constant expression that evaluated to <code class="ph codeph">FALSE</code>.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2088" target="_blank">IMPALA-2088</a></p>
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title100" id="fixed_issues_227__IMPALA-2089">
+ <h3 class="title topictitle3" id="ariaid-title100">Retain eq predicates bound by grouping slots with complex grouping exprs.</h3>
+ <div class="body conbody">
+ <p class="p">
+            A query containing an <code class="ph codeph">INNER JOIN</code> clause could return undesired rows.
+            A predicate specified in the <code class="ph codeph">ON</code> clause could be omitted from the filtering operation.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2089" target="_blank">IMPALA-2089</a></p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title101" id="fixed_issues_227__IMPALA-2199">
+ <h3 class="title topictitle3" id="ariaid-title101">Row count not set for empty partition when spec is used with compute incremental stats</h3>
+ <div class="body conbody">
+ <p class="p">
+            A <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement could leave the row count for an empty partition as -1,
+ rather than initializing the row count to 0. The missing statistic value could result in reduced query performance.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2199" target="_blank">IMPALA-2199</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title102" id="fixed_issues_227__IMPALA-1898">
+ <h3 class="title topictitle3" id="ariaid-title102">Explicit aliases + ordinals analysis bug</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could encounter a serious error if it included column aliases with the same names as table columns, and used
+ ordinal numbers in an <code class="ph codeph">ORDER BY</code> or <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1898" target="_blank">IMPALA-1898</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title103" id="fixed_issues_227__IMPALA-1987">
+ <h3 class="title topictitle3" id="ariaid-title103">Fix TupleIsNullPredicate to return false if no tuples are nullable.</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if it included an outer join clause, inline views, and calls to functions such as <code class="ph codeph">coalesce()</code>
+ that can generate <code class="ph codeph">NULL</code> values.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1987" target="_blank">IMPALA-1987</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title104" id="fixed_issues_227__IMPALA-2178">
+ <h3 class="title topictitle3" id="ariaid-title104">fix Expr::ComputeResultsLayout() logic</h3>
+ <div class="body conbody">
+ <p class="p">
+ A query could return incorrect results if the table contained multiple <code class="ph codeph">CHAR</code> columns with length of 2 or less,
+ and the query included a <code class="ph codeph">GROUP BY</code> clause that referred to multiple such columns.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2178" target="_blank">IMPALA-2178</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title105" id="fixed_issues_227__IMPALA-1737">
+ <h3 class="title topictitle3" id="ariaid-title105">Substitute an InsertStmt's partition key exprs with the root node's smap.</h3>
+ <div class="body conbody">
+ <p class="p">
+ An <code class="ph codeph">INSERT</code> statement could encounter a serious error if the <code class="ph codeph">SELECT</code>
+ portion called an analytic function.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1737" target="_blank">IMPALA-1737</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title106" id="fixed_issues__fixed_issues_225">
+
+    <h2 class="title topictitle2" id="ariaid-title106">Issues Fixed in <span class="keyword">Impala 2.2.5</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.5</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title107" id="fixed_issues_225__IMPALA-2048">
+ <h3 class="title topictitle3" id="ariaid-title107">Impala DML/DDL operations corrupt table metadata leading to Hive query failures</h3>
+ <div class="body conbody">
+ <p class="p">
+ When the Impala <code class="ph codeph">COMPUTE STATS</code> statement was run on a partitioned Parquet table that was created in Hive, the table subsequently became inaccessible in Hive.
+ The table was still accessible to Impala. Regaining access in Hive required a workaround of creating a new table. The error displayed in Hive was:
+ </p>
+<pre class="pre codeblock"><code>Error: Error while compiling statement: FAILED: SemanticException Class not found: org.apache.impala.hive.serde.ParquetInputFormat (state=42000,code=40000)</code></pre>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2048" target="_blank">IMPALA-2048</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title108" id="fixed_issues_225__IMPALA-1929">
+ <h3 class="title topictitle3" id="ariaid-title108">Avoiding a DCHECK of NULL hash table in spilled right joins</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ A query could encounter a serious error if it contained a <code class="ph codeph">RIGHT OUTER</code>, <code class="ph codeph">RIGHT ANTI</code>, or <code class="ph codeph">FULL OUTER</code> join clause
+ and approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1929" target="_blank">IMPALA-1929</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title109" id="fixed_issues_225__IMPALA-2136">
+ <h3 class="title topictitle3" id="ariaid-title109">Bug in PrintTColumnValue caused wrong stats for TINYINT partition cols</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ Declaring a partition key column as a <code class="ph codeph">TINYINT</code> caused problems with the <code class="ph codeph">COMPUTE STATS</code> statement.
+ The associated partitions would always have zero estimated rows, leading to potential inefficient query plans.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2136" target="_blank">IMPALA-2136</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title110" id="fixed_issues_225__IMPALA-2018">
+ <h3 class="title topictitle3" id="ariaid-title110">Where clause does not propagate to joins inside nested views</h3>
+
+ <div class="body conbody">
+ <p class="p">
+          A query that referred to a view whose query referred to another view containing a join could return incorrect results.
+ <code class="ph codeph">WHERE</code> clauses for the outermost query were not always applied, causing the result
+ set to include additional rows that should have been filtered out.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2018" target="_blank">IMPALA-2018</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title111" id="fixed_issues_225__IMPALA-2064">
+ <h3 class="title topictitle3" id="ariaid-title111">Add effective_user() builtin</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ The <code class="ph codeph">user()</code> function returned the name of the logged-in user, which might not be the
+ same as the user name being checked for authorization if, for example, delegation was enabled.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2064" target="_blank">IMPALA-2064</a></p>
+ <p class="p"><strong class="ph b">Resolution:</strong> Rather than change the behavior of the <code class="ph codeph">user()</code> function,
+ the fix introduces an additional function <code class="ph codeph">effective_user()</code> that returns the user name that is checked during authorization.</p>
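+        <p class="p">
+          For example, you can compare the two values in a single query. This is an illustrative
+          sketch; the actual values returned depend on your authentication and delegation setup.
+        </p>
+<pre class="pre codeblock"><code>-- user() returns the logged-in user; effective_user() returns the user name
+-- checked during authorization. The two differ only when delegation is in effect.
+SELECT user(), effective_user();</code></pre>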
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title112" id="fixed_issues_225__IMPALA-2125">
+ <h3 class="title topictitle3" id="ariaid-title112">Make UTC to local TimestampValue conversion faster.</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ Query performance was improved substantially for Parquet files containing <code class="ph codeph">TIMESTAMP</code>
+ data written by Hive, when the <code class="ph codeph">-convert_legacy_hive_parquet_utc_timestamps=true</code> setting
+ is in effect.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2125" target="_blank">IMPALA-2125</a></p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title113" id="fixed_issues_225__IMPALA-2065">
+ <h3 class="title topictitle3" id="ariaid-title113">Workaround IMPALA-1619 in BufferedBlockMgr::ConsumeMemory()</h3>
+
+ <div class="body conbody">
+ <p class="p">
+ A join query could encounter a serious error if the query
+ approached the memory limit on a host so that the <span class="q">"spill to disk"</span> mechanism was activated,
+ and data volume in the join was large enough that an internal memory buffer exceeded 1 GB in size on a particular host.
+ (Exceeding this limit would only happen for huge join queries, because Impala could split this intermediate data
+ into 16 parts during the join query, and the buffer only contains compact bookkeeping data rather than the actual
+ join column data.)
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2065" target="_blank">IMPALA-2065</a></p>
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title114" id="fixed_issues__fixed_issues_223">
+
+ <h2 class="title topictitle2" id="ariaid-title114">Issues Fixed in <span class="keyword">Impala 2.2.3</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section lists the most frequently encountered customer issues fixed in <span class="keyword">Impala 2.2.3</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title115" id="fixed_issues_223__isilon_support">
+ <h3 class="title topictitle3" id="ariaid-title115">Enable using Isilon as the underlying filesystem.</h3>
+ <div class="body conbody">
+ <p class="p">
+ Enabling Impala to work with the Isilon filesystem involves a number of
+ fixes to performance and flexibility for dealing with I/O using remote reads.
+ See <a class="xref" href="impala_isilon.html#impala_isilon">Using Impala with Isilon Storage</a> for details on using Impala and Isilon together.
+ </p>
+ <p class="p"><strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1968" target="_blank">IMPALA-1968</a>,
+ <a class="xref" href="https://issues.apache.org/jira/b
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet.html b/docs/build3x/html/topics/impala_parquet.html
new file mode 100644
index 0000000..ce5242e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet.html
@@ -0,0 +1,1421 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content=
"Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content=
"parquet"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the Parquet File Format with Impala Tables</title></head><body id="parquet"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using the Parquet File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala helps you to create, manage, and query Parquet tables. Parquet is a column-oriented binary file format
+ intended to be highly efficient for the types of large-scale queries that Impala is best at. Parquet is
+ especially good for queries scanning particular columns within a table, for example to query <span class="q">"wide"</span>
+ tables with many columns, or to perform aggregation operations such as <code class="ph codeph">SUM()</code> and
+ <code class="ph codeph">AVG()</code> that need to process most or all of the values from a column. Each data file contains
+ the values for a set of rows (the <span class="q">"row group"</span>). Within a data file, the values from each column are
+ organized so that they are all adjacent, enabling good compression for the values from that column. Queries
+ against a Parquet table can retrieve and analyze these values from any column quickly and with minimal I/O.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Parquet Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="parquet__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="parquet__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="parquet__entry__1 ">
+ <a class="xref" href="impala_parquet.html#parquet">Parquet</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__3 ">
+ Snappy, gzip; currently Snappy by default
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__4 ">
+ Yes.
+ </td>
+ <td class="entry nocellnorowborder" headers="parquet__entry__5 ">
+ Yes: <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, and query.
+ </td>
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="parquet__parquet_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating Parquet Tables in Impala</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To create a table named <code class="ph codeph">PARQUET_TABLE</code> that uses the Parquet format, you would use a
+ command like the following, substituting your own table name, column names, and data types:
+ </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] > create table <var class="keyword varname">parquet_table_name</var> (x INT, y STRING) STORED AS PARQUET;</code></pre>
+
+
+
+ <p class="p">
+ Or, to clone the column names and data types of an existing table:
+ </p>
+
+<pre class="pre codeblock"><code>[impala-host:21000] > create table <var class="keyword varname">parquet_table_name</var> LIKE <var class="keyword varname">other_table_name</var> STORED AS PARQUET;</code></pre>
+
+ <p class="p">
+ In Impala 1.4.0 and higher, you can derive column definitions from a raw Parquet data file, even without an
+ existing Impala table. For example, you can create an external table pointing to an HDFS directory, and
+ base the column definitions on one of the files in that directory:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE EXTERNAL TABLE ingest_existing_files LIKE PARQUET '/user/etl/destination/datafile1.dat'
+ STORED AS PARQUET
+ LOCATION '/user/etl/destination';
+</code></pre>
+
+ <p class="p">
+ Or, you can refer to an existing data file and create a new empty table with suitable column definitions.
+ Then you can use <code class="ph codeph">INSERT</code> to create new data files or <code class="ph codeph">LOAD DATA</code> to transfer
+ existing data files into the new table.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ The default properties of the newly created table are the same as for any other <code class="ph codeph">CREATE
+ TABLE</code> statement. For example, the default file format is text; if you want the new table to use
+        the Parquet file format, include the <code class="ph codeph">STORED AS PARQUET</code> clause also.
+ </p>
+
+ <p class="p">
+ In this example, the new table is partitioned by year, month, and day. These partition key columns are not
+ part of the data file, so you specify them in the <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE columns_from_data_file LIKE PARQUET '/user/etl/destination/datafile1.dat'
+  PARTITIONED BY (year INT, month TINYINT, day TINYINT)
+ STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about the <code class="ph codeph">CREATE TABLE
+ LIKE PARQUET</code> syntax.
+ </p>
+
+ <p class="p">
+ Once you have created a table, to insert data into that table, use a command similar to the following,
+ again with your own table names:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[impala-host:21000] > insert overwrite table <var class="keyword varname">parquet_table_name</var> select * from <var class="keyword varname">other_table_name</var>;</code></pre>
+
+ <p class="p">
+ If the Parquet table has a different number of columns or different column names than the other table,
+ specify the names of columns from the other table rather than <code class="ph codeph">*</code> in the
+ <code class="ph codeph">SELECT</code> statement.
+ </p>
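+      <p class="p">
+        For example, if the source table has more columns than the Parquet table, name only the
+        matching columns. The tables below reuse the hypothetical names from the earlier examples:
+      </p>
+<pre class="pre codeblock"><code>-- Copy only the columns x and y, in the order the Parquet table declares them,
+-- rather than using SELECT *.
+INSERT OVERWRITE TABLE parquet_table_name SELECT x, y FROM other_table_name;</code></pre>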
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="parquet__parquet_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Loading Data into Parquet Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Choose from the following techniques for loading data into Parquet tables, depending on whether the
+ original data is already in an Impala table, or exists as raw data files outside Impala.
+ </p>
+
+ <p class="p">
+ If you already have data in an Impala or Hive table, perhaps in a different file format or partitioning
+ scheme, you can transfer the data to a Parquet table using the Impala <code class="ph codeph">INSERT...SELECT</code>
+ syntax. You can convert, filter, repartition, and do other things to the data as part of this same
+ <code class="ph codeph">INSERT</code> statement. See <a class="xref" href="#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a> for some examples showing how to
+ insert data into Parquet tables.
+ </p>
+
+ <div class="p">
+ When inserting into partitioned tables, especially using the Parquet file format, you can include a hint in
+ the <code class="ph codeph">INSERT</code> statement to fine-tune the overall performance of the operation and its
+ resource usage:
+ <ul class="ul">
+
+ <li class="li">
+ You would only use hints if an <code class="ph codeph">INSERT</code> into a partitioned Parquet table was
+ failing due to capacity limits, or if such an <code class="ph codeph">INSERT</code> was succeeding but with
+ less-than-optimal performance.
+ </li>
+
+        <li class="li">
+          To use a hint to influence the <code class="ph codeph">INSERT</code> execution plan, put the hint <code class="ph codeph">/* +SHUFFLE */</code> or <code class="ph codeph">/* +NOSHUFFLE */</code>
+          (including the comment markers) after the <code class="ph codeph">PARTITION</code> clause, immediately before the
+          <code class="ph codeph">SELECT</code> keyword.
+        </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +SHUFFLE */</code> selects an execution plan that reduces the number of files being written
+ simultaneously to HDFS, and the number of memory buffers holding data for individual partitions. Thus
+ it reduces overall resource usage for the <code class="ph codeph">INSERT</code> operation, allowing some
+ <code class="ph codeph">INSERT</code> operations to succeed that otherwise would fail. It does involve some data
+ transfer between the nodes so that the data files for a particular partition are all constructed on the
+ same node.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">/* +NOSHUFFLE */</code> selects an execution plan that might be faster overall, but might also
+ produce a larger number of small data files or exceed capacity limits, causing the
+ <code class="ph codeph">INSERT</code> operation to fail. Use <code class="ph codeph">/* +SHUFFLE */</code> in cases where an
+ <code class="ph codeph">INSERT</code> statement fails or runs inefficiently due to all nodes attempting to construct
+ data for all partitions.
+ </li>
+
+ <li class="li">
+ Impala automatically uses the <code class="ph codeph">/* +SHUFFLE */</code> method if any partition key column in the
+ source table, mentioned in the <code class="ph codeph">INSERT ... SELECT</code> query, does not have column
+ statistics. In this case, only the <code class="ph codeph">/* +NOSHUFFLE */</code> hint would have any effect.
+ </li>
+
+ <li class="li">
+ If column statistics are available for all partition key columns in the source table mentioned in the
+ <code class="ph codeph">INSERT ... SELECT</code> query, Impala chooses whether to use the <code class="ph codeph">/* +SHUFFLE */</code>
+ or <code class="ph codeph">/* +NOSHUFFLE */</code> technique based on the estimated number of distinct values in those
+ columns and the number of nodes involved in the <code class="ph codeph">INSERT</code> operation. In this case, you
+ might need the <code class="ph codeph">/* +SHUFFLE */</code> or the <code class="ph codeph">/* +NOSHUFFLE */</code> hint to override the
+ execution plan selected by Impala.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.8</span> or higher, you can make the
+ <code class="ph codeph">INSERT</code> operation organize (<span class="q">"cluster"</span>)
+ the data for each partition to avoid buffering data for multiple partitions
+ and reduce the risk of an out-of-memory condition. Specify the hint as
+ <code class="ph codeph">/* +CLUSTERED */</code>. This technique is primarily
+ useful for inserts into Parquet tables, where the large block
+ size requires substantial memory to buffer data for multiple
+ output files at once.
+ </li>
+
+ </ul>
+ </div>
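+      <p class="p">
+        For example, the hint goes between the <code class="ph codeph">PARTITION</code> clause and the
+        <code class="ph codeph">SELECT</code> keyword. The table and column names below are hypothetical:
+      </p>
+<pre class="pre codeblock"><code>-- Route each partition's data to a single node before writing,
+-- reducing memory usage and the number of files written concurrently.
+INSERT INTO sales_parquet PARTITION (year, month) /* +SHUFFLE */
+  SELECT * FROM sales_staging;</code></pre>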
+
+ <p class="p">
+ Any <code class="ph codeph">INSERT</code> statement for a Parquet table requires enough free space in the HDFS filesystem
+ to write one block. Because Parquet data files use a block size of 1 GB by default, an
+ <code class="ph codeph">INSERT</code> might fail (even for a very small amount of data) if your HDFS is running low on
+ space.
+ </p>
+
+
+
+ <p class="p">
+ Avoid the <code class="ph codeph">INSERT...VALUES</code> syntax for Parquet tables, because
+ <code class="ph codeph">INSERT...VALUES</code> produces a separate tiny data file for each
+ <code class="ph codeph">INSERT...VALUES</code> statement, and the strength of Parquet is in its handling of data
+ (compressing, parallelizing, and so on) in <span class="ph">large</span> chunks.
+ </p>
+
+ <p class="p">
+ If you have one or more Parquet data files produced outside of Impala, you can quickly make the data
+ queryable through Impala by one of the following methods:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">LOAD DATA</code> statement moves a single data file or a directory full of data files into
+ the data directory for an Impala table. It does no validation or conversion of the data. The original
+ data files must be somewhere in HDFS, not the local filesystem.
+
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">CREATE TABLE</code> statement with the <code class="ph codeph">LOCATION</code> clause creates a table
+ where the data continues to reside outside the Impala data directory. The original data files must be
+ somewhere in HDFS, not the local filesystem. For extra safety, if the data is intended to be long-lived
+ and reused by other applications, you can use the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax so that
+ the data files are not deleted by an Impala <code class="ph codeph">DROP TABLE</code> statement.
+
+ </li>
+
+ <li class="li">
+ If the Parquet table already exists, you can copy Parquet data files directly into it, then use the
+ <code class="ph codeph">REFRESH</code> statement to make Impala recognize the newly added data. Remember to preserve
+ the block size of the Parquet data files by using the <code class="ph codeph">hadoop distcp -pb</code> command rather
+ than a <code class="ph codeph">-put</code> or <code class="ph codeph">-cp</code> operation on the Parquet files. See
+ <a class="xref" href="#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example of this kind of operation.
+ </li>
+ </ul>
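+
+      <p class="p">
+        The first two methods can be sketched as follows, using hypothetical paths, columns, and table names:
+      </p>
+
+<pre class="pre codeblock"><code>-- Move existing Parquet data files in HDFS into the table's data directory.
+load data inpath '/user/etl/staging_parquet_dir' into table parquet_table;
+
+-- Or leave the files in place and point an external table at their directory.
+create external table parquet_external (id bigint, val string)
+  stored as parquet
+  location '/user/etl/staging_parquet_dir';
+</code></pre>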
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Impala always decodes the column data in Parquet files based on the ordinal position of the
+ columns, not by looking up the position of each column based on its name. Parquet files produced outside
+ of Impala must write column data in the same order as the columns are declared in the Impala table. Any
+ optional columns that are omitted from the data files must be the rightmost columns in the Impala table
+ definition.
+ </p>
+
+ <p class="p">
+ If you created compressed Parquet files through some tool other than Impala, make sure that any
+ compression codecs are supported in Parquet by Impala. For example, Impala does not currently support LZO
+        compression in Parquet files. Also double-check that you used any recommended compatibility settings in
+ the other tool, such as <code class="ph codeph">spark.sql.parquet.binaryAsString</code> when writing Parquet files
+ through Spark.
+ </p>
+ </div>
+
+ <p class="p">
+ Recent versions of Sqoop can produce Parquet output files using the <code class="ph codeph">--as-parquetfile</code>
+ option.
+ </p>
+
+ <p class="p"> If you use Sqoop to
+ convert RDBMS data to Parquet, be careful with interpreting any
+ resulting values from <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>,
+ or <code class="ph codeph">TIMESTAMP</code> columns. The underlying values are
+ represented as the Parquet <code class="ph codeph">INT64</code> type, which is
+ represented as <code class="ph codeph">BIGINT</code> in the Impala table. The Parquet
+ values represent the time in milliseconds, while Impala interprets
+ <code class="ph codeph">BIGINT</code> as the time in seconds. Therefore, if you have
+ a <code class="ph codeph">BIGINT</code> column in a Parquet table that was imported
+ this way from Sqoop, divide the values by 1000 when interpreting as the
+ <code class="ph codeph">TIMESTAMP</code> type.</p>
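+
+      <p class="p">
+        For example, assuming a hypothetical <code class="ph codeph">BIGINT</code> column named
+        <code class="ph codeph">sqoop_ts</code> that holds milliseconds, the conversion can be sketched as:
+      </p>
+
+<pre class="pre codeblock"><code>-- sqoop_ts holds milliseconds; Impala interprets a BIGINT cast to TIMESTAMP as seconds.
+select cast(sqoop_ts div 1000 as timestamp) from imported_table;
+</code></pre>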
+
+ <p class="p">
+ If the data exists outside Impala and is in some other format, combine both of the preceding techniques.
+ First, use a <code class="ph codeph">LOAD DATA</code> or <code class="ph codeph">CREATE EXTERNAL TABLE ... LOCATION</code> statement to
+ bring the data into an Impala table that uses the appropriate file format. Then, use an
+ <code class="ph codeph">INSERT...SELECT</code> statement to copy the data to the Parquet table, converting to Parquet
+ format as part of the process.
+ </p>
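+
+      <p class="p">
+        As a sketch with hypothetical names, the two-step conversion might look like:
+      </p>
+
+<pre class="pre codeblock"><code>-- Step 1: expose the existing data (here, delimited text) through an external table.
+create external table text_staging (id bigint, val string)
+  row format delimited fields terminated by ','
+  location '/user/etl/csv_dir';
+
+-- Step 2: copy into the Parquet table, converting the file format along the way.
+insert into parquet_table select * from text_staging;
+</code></pre>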
+
+
+
+ <p class="p">
+ Loading data into Parquet tables is a memory-intensive operation, because the incoming data is buffered
+ until it reaches <span class="ph">one data block</span> in size, then that chunk of data is
+ organized and compressed in memory before being written out. The memory consumption can be larger when
+ inserting data into partitioned Parquet tables, because a separate data file is written for each
+ combination of partition key column values, potentially requiring several
+ <span class="ph">large</span> chunks to be manipulated in memory at once.
+ </p>
+
+ <p class="p">
+ When inserting into a partitioned Parquet table, Impala redistributes the data among the nodes to reduce
+ memory consumption. You might still need to temporarily increase the memory dedicated to Impala during the
+ insert operation, or break up the load operation into several <code class="ph codeph">INSERT</code> statements, or both.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ All the preceding techniques assume that the data you are loading matches the structure of the destination
+ table, including column order, column names, and partition layout. To transform or reorganize the data,
+ start by loading the data into a Parquet table that matches the underlying structure of the data, then use
+ one of the table-copying techniques such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ...
+      SELECT</code> to reorder or rename columns, divide the data among multiple partitions, and so on. For
+      example, to take a single comprehensive Parquet data file and load it into a partitioned table, you would
+ use an <code class="ph codeph">INSERT ... SELECT</code> statement with dynamic partitioning to let Impala create separate
+ data files with the appropriate partition values; for an example, see
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>.
+ </div>
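+
+      <p class="p">
+        A minimal sketch of such a dynamic partitioning operation, using hypothetical table and column names:
+      </p>
+
+<pre class="pre codeblock"><code>create table sales_by_year (id bigint, amount double)
+  partitioned by (year int) stored as parquet;
+
+-- Impala routes each row to the appropriate partition based on the year value.
+insert into sales_by_year partition (year)
+  select id, amount, year from sales_unpartitioned;
+</code></pre>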
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="parquet__parquet_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala Parquet Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Query performance for Parquet tables depends on the number of columns needed to process the
+ <code class="ph codeph">SELECT</code> list and <code class="ph codeph">WHERE</code> clauses of the query, the way data is divided into
+ <span class="ph">large data files with block size equal to file size</span>, the reduction in I/O
+ by reading the data for each column in compressed format, which data files can be skipped (for partitioned
+ tables), and the CPU overhead of decompressing the data for each column.
+ </p>
+
+ <div class="p">
+ For example, the following is an efficient query for a Parquet table:
+<pre class="pre codeblock"><code>select avg(income) from census_data where state = 'CA';</code></pre>
+ The query processes only 2 columns out of a large number of total columns. If the table is partitioned by
+ the <code class="ph codeph">STATE</code> column, it is even more efficient because the query only has to read and decode
+ 1 column from each data file, and it can read only the data files in the partition directory for the state
+ <code class="ph codeph">'CA'</code>, skipping the data files for all the other states, which will be physically located
+ in other directories.
+ </div>
+
+ <div class="p">
+ The following is a relatively inefficient query for a Parquet table:
+<pre class="pre codeblock"><code>select * from census_data;</code></pre>
+ Impala would have to read the entire contents of each <span class="ph">large</span> data file,
+ and decompress the contents of each column for each row group, negating the I/O optimizations of the
+ column-oriented format. This query might still be faster for a Parquet table than a table with some other
+ file format, but it does not take advantage of the unique strengths of Parquet data files.
+ </div>
+
+ <p class="p">
+ Impala can optimize queries on Parquet tables, especially join queries, better when statistics are
+ available for all the tables. Issue the <code class="ph codeph">COMPUTE STATS</code> statement for each table after
+ substantial amounts of data are loaded into or appended to it. See
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+ </p>
+
+ <p class="p">
+ The runtime filtering feature, available in <span class="keyword">Impala 2.5</span> and higher, works best with Parquet tables.
+ The per-row filtering aspect only applies to Parquet tables.
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
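+
+      <p class="p">
+        For example, the setting could appear in <span class="ph filepath">core-site.xml</span> as follows
+        (a configuration sketch; adjust the value to match how your Parquet files were written):
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+  &lt;!-- 268435456 bytes = 256 MB, matching Impala-written Parquet row groups. --&gt;
+  &lt;value&gt;268435456&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>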
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, Parquet files written by Impala include
+ embedded metadata specifying the minimum and maximum values for each column, within
+ each row group and each data page within the row group. Impala-written Parquet files
+ typically contain a single row group; a row group can contain many data pages.
+ Impala uses this information (currently, only the metadata for each row group)
+ when reading each Parquet data file during a query, to quickly determine whether each
+ row group within the file potentially includes any rows that match the conditions in the
+ <code class="ph codeph">WHERE</code> clause. For example, if the column <code class="ph codeph">X</code> within
+ a particular Parquet file has a minimum value of 1 and a maximum value of 100, then
+ a query including the clause <code class="ph codeph">WHERE x > 200</code> can quickly determine
+ that it is safe to skip that particular file, instead of scanning all the associated
+ column values. This optimization technique is especially effective for tables that
+ use the <code class="ph codeph">SORT BY</code> clause for the columns most frequently checked in
+ <code class="ph codeph">WHERE</code> clauses, because any <code class="ph codeph">INSERT</code> operation on
+ such tables produces Parquet data files with relatively narrow ranges of column values
+ within each file.
+ </p>
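+
+      <p class="p">
+        As a sketch with hypothetical names, a table designed to benefit from this row group skipping
+        might look like:
+      </p>
+
+<pre class="pre codeblock"><code>-- Rows are sorted by x within each inserted file, so each row group covers
+-- a narrow range of x values and can often be skipped entirely.
+create table events (x bigint, payload string)
+  sort by (x) stored as parquet;
+
+-- Row groups whose min/max range excludes values over 200 are skipped.
+select count(*) from events where x &gt; 200;
+</code></pre>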
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="parquet_performance__parquet_partitioning">
+
+ <h3 class="title topictitle3" id="ariaid-title5">Partitioning for Parquet Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ As explained in <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, partitioning is an important
+ performance technique for Impala generally. This section explains some of the performance considerations
+ for partitioned Parquet tables.
+ </p>
+
+ <p class="p">
+ The Parquet file format is ideal for tables containing many columns, where most queries only refer to a
+ small subset of the columns. As explained in <a class="xref" href="#parquet_data_files">How Parquet Data Files Are Organized</a>, the physical layout of
+ Parquet data files lets Impala read only a small fraction of the data for many queries. The performance
+ benefits of this approach are amplified when you use Parquet tables in combination with partitioning.
+ Impala can skip the data files for certain partitions entirely, based on the comparisons in the
+ <code class="ph codeph">WHERE</code> clause that refer to the partition key columns. For example, queries on
+ partitioned tables often analyze data for time intervals based on columns such as <code class="ph codeph">YEAR</code>,
+ <code class="ph codeph">MONTH</code>, and/or <code class="ph codeph">DAY</code>, or for geographic regions. Remember that Parquet
+ data files use a <span class="ph">large</span> block size, so when deciding how finely to
+ partition the data, try to find a granularity where each partition contains
+ <span class="ph">256 MB</span> or more of data, rather than creating a large number of smaller
+ files split among many partitions.
+ </p>
+
+ <p class="p">
+ Inserting into a partitioned Parquet table can be a resource-intensive operation, because each Impala
+ node could potentially be writing a separate data file to HDFS for each combination of different values
+ for the partition key columns. The large number of simultaneous open files could exceed the HDFS
+ <span class="q">"transceivers"</span> limit. To avoid exceeding this limit, consider the following techniques:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Load different subsets of data using separate <code class="ph codeph">INSERT</code> statements with specific values
+ for the <code class="ph codeph">PARTITION</code> clause, such as <code class="ph codeph">PARTITION (year=2010)</code>.
+ </li>
+
+ <li class="li">
+ Increase the <span class="q">"transceivers"</span> value for HDFS, sometimes spelled <span class="q">"xcievers"</span> (sic). The property
+ value in the <span class="ph filepath">hdfs-site.xml</span> configuration file is
+
+ <code class="ph codeph">dfs.datanode.max.transfer.threads</code>. For example, if you were loading 12 years of data
+ partitioned by year, month, and day, even a value of 4096 might not be high enough. This
+ <a class="xref" href="http://blog.cloudera.com/blog/2012/03/hbase-hadoop-xceivers/" target="_blank">blog post</a> explores the considerations for setting this value
+ higher or lower, using HBase examples for illustration.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement to collect
+ <a class="xref" href="impala_perf_stats.html#perf_column_stats">column statistics</a> on the source table from
+ which data is being copied, so that the Impala query can estimate the number of different values in the
+ partition key columns and distribute the work accordingly.
+ </li>
+ </ul>
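+
+        <p class="p">
+          For example, the transceivers limit could be raised in <span class="ph filepath">hdfs-site.xml</span>
+          as follows (the value 8192 is only an illustration; tune it for your workload):
+        </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;dfs.datanode.max.transfer.threads&lt;/name&gt;
+  &lt;value&gt;8192&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>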
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="parquet__parquet_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Snappy and GZip Compression for Parquet Data Files</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying
+ compression is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option. (Prior to Impala 2.0, the
+ query option name was <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>.) The allowed values for this query option
+ are <code class="ph codeph">snappy</code> (the default), <code class="ph codeph">gzip</code>, and <code class="ph codeph">none</code>. The option
+ value is not case-sensitive. If the option is set to an unrecognized value, all kinds of queries will fail
+ due to the invalid option setting, not just queries involving Parquet tables.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="parquet_compression__parquet_snappy">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Example of Parquet Table with Snappy Compression</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ By default, the underlying data files for a Parquet table are compressed with Snappy. The combination of
+ fast compression and decompression makes it a good choice for many data sets. To ensure Snappy
+ compression is used, for example after experimenting with other compression codecs, set the
+ <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">snappy</code> before inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database parquet_compression;
+[localhost:21000] > use parquet_compression;
+[localhost:21000] > create table parquet_snappy like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=snappy;
+[localhost:21000] > insert into parquet_snappy select * from raw_text_data;
+Inserted 1000000000 rows in 181.98s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="parquet_compression__parquet_gzip">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Example of Parquet Table with GZip Compression</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you need more intensive compression (at the expense of more CPU cycles for uncompressing during
+ queries), set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">gzip</code> before
+ inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_gzip like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=gzip;
+[localhost:21000] > insert into parquet_gzip select * from raw_text_data;
+Inserted 1000000000 rows in 1418.24s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="parquet_compression__parquet_none">
+
+ <h3 class="title topictitle3" id="ariaid-title9">Example of Uncompressed Parquet Table</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your data compresses very poorly, or you want to avoid the CPU overhead of compression and
+ decompression entirely, set the <code class="ph codeph">COMPRESSION_CODEC</code> query option to <code class="ph codeph">none</code>
+ before inserting the data:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_none like raw_text_data;
+[localhost:21000] > set COMPRESSION_CODEC=none;
+[localhost:21000] > insert into parquet_none select * from raw_text_data;
+Inserted 1000000000 rows in 146.90s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="parquet_compression__parquet_compression_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Examples of Sizes and Speeds for Compressed Parquet Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are some examples showing differences in data sizes and query speeds for 1 billion rows of synthetic
+ data, compressed with each kind of codec. As always, run similar tests with realistic data sets of your
+ own. The actual compression ratios, and relative insert and query speeds, will vary depending on the
+ characteristics of the actual data.
+ </p>
+
+ <p class="p">
+ In this case, switching from Snappy to GZip compression shrinks the data by an additional 40% or so,
+ while switching from Snappy compression to no compression expands the data also by about 40%:
+ </p>
+
+<pre class="pre codeblock"><code>$ hdfs dfs -du -h /user/hive/warehouse/parquet_compression.db
+23.1 G /user/hive/warehouse/parquet_compression.db/parquet_snappy
+13.5 G /user/hive/warehouse/parquet_compression.db/parquet_gzip
+32.8 G /user/hive/warehouse/parquet_compression.db/parquet_none
+</code></pre>
+
+ <p class="p">
+ Because Parquet data files are typically <span class="ph">large</span>, each directory will
+ have a different number of data files and the row groups will be arranged differently.
+ </p>
+
+ <p class="p">
+          At the same time, the less aggressive the compression, the faster the data can be decompressed. In this
+ case using a table with a billion rows, a query that evaluates all the values for a particular column
+ runs faster with no compression than with Snappy compression, and faster with Snappy compression than
+ with Gzip compression. Query performance depends on several other factors, so as always, run your own
+ benchmarks with your own data to determine the ideal tradeoff between data size, CPU efficiency, and
+ speed of insert and query operations.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > desc parquet_snappy;
+Query finished, fetching results ...
++-----------+---------+---------+
+| name | type | comment |
++-----------+---------+---------+
+| id | int | |
+| val | int | |
+| zfill | string | |
+| name | string | |
+| assertion | boolean | |
++-----------+---------+---------+
+Returned 5 row(s) in 0.14s
+[localhost:21000] > select avg(val) from parquet_snappy;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 4.29s
+[localhost:21000] > select avg(val) from parquet_gzip;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 6.97s
+[localhost:21000] > select avg(val) from parquet_none;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 3.67s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="parquet_compression__parquet_compression_multiple">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Example of Copying Parquet Data Files</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here is a final example, to illustrate how the data files using the various compression codecs are all
+ compatible with each other for read operations. The metadata about the compression format is written into
+ each data file, and can be decoded during queries regardless of the <code class="ph codeph">COMPRESSION_CODEC</code>
+ setting in effect at the time. In this example, we copy data files from the
+ <code class="ph codeph">PARQUET_SNAPPY</code>, <code class="ph codeph">PARQUET_GZIP</code>, and <code class="ph codeph">PARQUET_NONE</code> tables
+ used in the previous examples, each containing 1 billion rows, all to the data directory of a new table
+ <code class="ph codeph">PARQUET_EVERYTHING</code>. A couple of sample queries demonstrate that the new table now
+ contains 3 billion rows featuring a variety of compression codecs for the data files.
+ </p>
+
+ <p class="p">
+ First, we create the table in Impala so that there is a destination directory in HDFS to put the data
+ files:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table parquet_everything like parquet_snappy;
+Query: create table parquet_everything like parquet_snappy
+</code></pre>
+
+ <p class="p">
+ Then in the shell, we copy the relevant data files into the data directory for this new table. Rather
+ than using <code class="ph codeph">hdfs dfs -cp</code> as with typical files, we use <code class="ph codeph">hadoop distcp -pb</code>
+        to ensure that the special <span class="ph">block size</span> of the Parquet data files is
+ preserved.
+ </p>
+
+<pre class="pre codeblock"><code>$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_snappy \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_gzip \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+$ hadoop distcp -pb /user/hive/warehouse/parquet_compression.db/parquet_none \
+ /user/hive/warehouse/parquet_compression.db/parquet_everything
+...<var class="keyword varname">MapReduce output</var>...
+</code></pre>
+
+ <p class="p">
+ Back in the <span class="keyword cmdname">impala-shell</span> interpreter, we use the <code class="ph codeph">REFRESH</code> statement to
+ alert the Impala server to the new data files for this table, then we can run queries demonstrating that
+ the data files represent 3 billion rows, and the values for one of the numeric columns match what was in
+ the original smaller tables:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > refresh parquet_everything;
+Query finished, fetching results ...
+
+Returned 0 row(s) in 0.32s
+[localhost:21000] > select count(*) from parquet_everything;
+Query finished, fetching results ...
++------------+
+| _c0 |
++------------+
+| 3000000000 |
++------------+
+Returned 1 row(s) in 8.18s
+[localhost:21000] > select avg(val) from parquet_everything;
+Query finished, fetching results ...
++-----------------+
+| _c0 |
++-----------------+
+| 250000.93577915 |
++-----------------+
+Returned 1 row(s) in 13.35s
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="parquet__parquet_complex_types">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Parquet Tables for Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, Impala supports the complex types
+        <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ Because these data types are currently supported only for the Parquet file format,
+ if you plan to use them, become familiar with the performance and storage aspects
+ of Parquet first.
+ </p>
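+
+      <p class="p">
+        As a sketch with hypothetical column names, a Parquet table using complex types might be declared as:
+      </p>
+
+<pre class="pre codeblock"><code>create table customers_nested
+  (id bigint,
+   phone_numbers array&lt;string&gt;,
+   address struct&lt;street:string, city:string&gt;,
+   attributes map&lt;string,string&gt;)
+  stored as parquet;
+</code></pre>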
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="parquet__parquet_interop">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Exchanging Parquet Data Files with Other Hadoop Components</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+        You can read and write Parquet data files from other Hadoop components.
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ Previously, it was not possible to create Parquet data through Impala and reuse that table within Hive. Now
+ that Parquet support is available for Hive, reusing existing Impala Parquet data files in Hive
+ requires updating the table metadata. Use the following command if you are already running Impala 1.1.1 or
+ higher:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT PARQUET;
+</code></pre>
+
+ <p class="p">
+ If you are running a level of Impala that is older than 1.1.1, do the metadata update through Hive:
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET SERDE 'parquet.hive.serde.ParquetHiveSerDe';
+ALTER TABLE <var class="keyword varname">table_name</var> SET FILEFORMAT
+ INPUTFORMAT "parquet.hive.DeprecatedParquetInputFormat"
+ OUTPUTFORMAT "parquet.hive.DeprecatedParquetOutputFormat";
+</code></pre>
+
+ <p class="p">
+ Impala 1.1.1 and higher can reuse Parquet data files created by Hive, without any action required.
+ </p>
+
+
+
+ <p class="p">
+ Impala supports the scalar data types that you can encode in a Parquet data file, but not composite or
+ nested types such as maps or arrays. In <span class="keyword">Impala 2.2</span> and higher, Impala can query Parquet data
+ files that include composite or nested types, as long as the query only refers to columns with scalar
+ types.
+
+ </p>
+
+ <p class="p">
+ If you copy Parquet data files between nodes, or even between different directories on the same node, make
+ sure to preserve the block size by using the command <code class="ph codeph">hadoop distcp -pb</code>. To verify that the
+ block size was preserved, issue the command <code class="ph codeph">hdfs fsck -blocks
+ <var class="keyword varname">HDFS_path_of_impala_table_dir</var></code> and check that the average block size is at or
+ near <span class="ph">256 MB (or whatever other size is defined by the
+      <code class="ph codeph">PARQUET_FILE_SIZE</code> query option)</span>. (The <code class="ph codeph">hadoop distcp</code> operation
+ typically leaves some directories behind, with names matching <span class="ph filepath">_distcp_logs_*</span>, that you
+ can delete from the destination directory afterward.)
+
+
+
+      Issue the <span class="keyword cmdname">hadoop distcp</span> command with no arguments to see usage information for the
+      <span class="keyword cmdname">distcp</span> command syntax.
+ </p>
+
+
+
+ <p class="p">
+ Impala can query Parquet files that use the <code class="ph codeph">PLAIN</code>, <code class="ph codeph">PLAIN_DICTIONARY</code>,
+ <code class="ph codeph">BIT_PACKED</code>, and <code class="ph codeph">RLE</code> encodings.
+ Currently, Impala does not support <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+ When creating files outside of Impala for use by Impala, make sure to use one of the supported encodings.
+      In particular, for MapReduce jobs, leave <code class="ph codeph">parquet.writer.version</code> unset
+      (and especially do not set it to <code class="ph codeph">PARQUET_2_0</code>) in the configuration of Parquet MR jobs.
+      Use the default format version, 1.0, which includes some enhancements that are compatible with older versions.
+ Data using the 2.0 format might not be consumable by Impala, due to use of the <code class="ph codeph">RLE_DICTIONARY</code> encoding.
+ </p>
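+
+      <p class="p">
+        For example, if a job's configuration must set the property explicitly, pin the 1.0 format
+        (a configuration sketch):
+      </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;parquet.writer.version&lt;/name&gt;
+  &lt;value&gt;PARQUET_1_0&lt;/value&gt;
+&lt;/property&gt;
+</code></pre>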
+ <div class="p">
+ To examine the internal structure and data of Parquet files, you can use the
+ <span class="keyword cmdname">parquet-tools</span> command. Make sure this
+ command is in your <code class="ph codeph">$PATH</code>. (Typically, it is symlinked from
+ <span class="ph filepath">/usr/bin</span>; sometimes, depending on your installation setup, you
+ might need to locate it under an alternative <code class="ph codeph">bin</code> directory.)
+ The arguments to this command let you perform operations such as:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">cat</code>: Print a file's contents to standard out. In <span class="keyword">Impala 2.3</span> and higher, you can use
+ the <code class="ph codeph">-j</code> option to output JSON.
+ </li>
+ <li class="li">
+ <code class="ph codeph">head</code>: Print the first few records of a file to standard output.
+ </li>
+ <li class="li">
+ <code class="ph codeph">schema</code>: Print the Parquet schema for the file.
+ </li>
+ <li class="li">
+ <code class="ph codeph">meta</code>: Print the file footer metadata, including key-value properties (like Avro schema), compression ratios,
+ encodings, compression used, and row group information.
+ </li>
+ <li class="li">
+ <code class="ph codeph">dump</code>: Print all data and metadata.
+ </li>
+ </ul>
+ Use <code class="ph codeph">parquet-tools -h</code> to see usage information for all the arguments.
+ Here are some examples showing <span class="keyword cmdname">parquet-tools</span> usage:
+
+<pre class="pre codeblock"><code>
+$ # Be careful doing this for a big file! Use parquet-tools head to be safe.
+$ parquet-tools cat sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools head -n 2 sample.parq
+year = 1992
+month = 1
+day = 2
+dayofweek = 4
+dep_time = 748
+crs_dep_time = 750
+arr_time = 851
+crs_arr_time = 846
+carrier = US
+flight_num = 53
+actual_elapsed_time = 63
+crs_elapsed_time = 56
+arrdelay = 5
+depdelay = -2
+origin = CMH
+dest = IND
+distance = 182
+cancelled = 0
+diverted = 0
+
+year = 1992
+month = 1
+day = 3
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools schema sample.parq
+message schema {
+ optional int32 year;
+ optional int32 month;
+ optional int32 day;
+ optional int32 dayofweek;
+ optional int32 dep_time;
+ optional int32 crs_dep_time;
+ optional int32 arr_time;
+ optional int32 crs_arr_time;
+ optional binary carrier;
+ optional int32 flight_num;
+...
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+$ parquet-tools meta sample.parq
+creator: impala version 2.2.0-...
+
+file schema: schema
+-------------------------------------------------------------------
+year: OPTIONAL INT32 R:0 D:1
+month: OPTIONAL INT32 R:0 D:1
+day: OPTIONAL INT32 R:0 D:1
+dayofweek: OPTIONAL INT32 R:0 D:1
+dep_time: OPTIONAL INT32 R:0 D:1
+crs_dep_time: OPTIONAL INT32 R:0 D:1
+arr_time: OPTIONAL INT32 R:0 D:1
+crs_arr_time: OPTIONAL INT32 R:0 D:1
+carrier: OPTIONAL BINARY R:0 D:1
+flight_num: OPTIONAL INT32 R:0 D:1
+...
+
+row group 1: RC:20636601 TS:265103674
+-------------------------------------------------------------------
+year: INT32 SNAPPY DO:4 FPO:35 SZ:10103/49723/4.92 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+month: INT32 SNAPPY DO:10147 FPO:10210 SZ:11380/35732/3.14 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+day: INT32 SNAPPY DO:21572 FPO:21714 SZ:3071658/9868452/3.21 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dayofweek: INT32 SNAPPY DO:3093276 FPO:3093319 SZ:2274375/5941876/2.61 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+dep_time: INT32 SNAPPY DO:5367705 FPO:5373967 SZ:28281281/28573175/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_dep_time: INT32 SNAPPY DO:33649039 FPO:33654262 SZ:10220839/11574964/1.13 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+arr_time: INT32 SNAPPY DO:43869935 FPO:43876489 SZ:28562410/28797767/1.01 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+crs_arr_time: INT32 SNAPPY DO:72432398 FPO:72438151 SZ:10908972/12164626/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+carrier: BINARY SNAPPY DO:83341427 FPO:83341558 SZ:114916/128611/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+flight_num: INT32 SNAPPY DO:83456393 FPO:83488603 SZ:10216514/11474301/1.12 VC:20636601 ENC:PLAIN_DICTIONARY,RLE,PLAIN
+...
+
+</code></pre>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="parquet__parquet_data_files">
+
+ <h2 class="title topictitle2" id="ariaid-title14">How Parquet Data Files Are Organized</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Parquet is a column-oriented file format, do not expect to find one data file for each column.
+ Parquet keeps all the data for a row within the same data file, to ensure that the columns for a row are
+ always available on the same node for processing. What Parquet does is to set a large HDFS block size and a
+ matching maximum data file size, to ensure that I/O and network transfer requests apply to large batches of
+ data.
+ </p>
+
+ <p class="p">
+ Within that data file, the data for a set of rows is rearranged so that all the values from the first
+ column are organized in one contiguous block, then all the values from the second column, and so on.
+ Putting the values from the same column next to each other lets Impala use effective compression techniques
+ on the values in that column.
+ </p>
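As a conceptual illustration, the rearrangement of row data into contiguous column blocks can be sketched in Python (an illustration only, not Impala's actual writer code):

```python
# Conceptual sketch of Parquet's columnar layout within a row group:
# values from each column are gathered into one contiguous run.
# This is an illustration only, not Impala's implementation.
def to_columnar(rows):
    """Group all values of each column into one contiguous list."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns

rows = [
    {"year": 1992, "month": 1, "carrier": "US"},
    {"year": 1992, "month": 1, "carrier": "US"},
    {"year": 1992, "month": 2, "carrier": "DL"},
]
columnar = to_columnar(rows)
assert columnar["carrier"] == ["US", "US", "DL"]
```

Storing each column's values together is what makes per-column compression effective.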
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Impala <code class="ph codeph">INSERT</code> statements write Parquet data files using an HDFS block size
+ <span class="ph">that matches the data file size</span>, to ensure that each data file is
+ represented by a single HDFS block, and the entire file can be processed on a single node without
+ requiring any remote reads.
+ </p>
+
+ <p class="p">
+ If you create Parquet data files outside of Impala, such as through a MapReduce or Pig job, ensure that
+ the HDFS block size is greater than or equal to the file size, so that the <span class="q">"one file per block"</span>
+ relationship is maintained. Set the <code class="ph codeph">dfs.block.size</code> or the <code class="ph codeph">dfs.blocksize</code>
+ property large enough that each file fits within a single HDFS block, even if that size is larger than
+ the normal HDFS block size.
+ </p>
+
+ <p class="p">
+ If the block size is reset to a lower value during a file copy, you will see lower performance for
+ queries involving those files, and the <code class="ph codeph">PROFILE</code> statement will reveal that some I/O is
+ being done suboptimally, through remote reads. See
+ <a class="xref" href="impala_parquet.html#parquet_compression_multiple">Example of Copying Parquet Data Files</a> for an example showing how to preserve the
+ block size when copying Parquet data files.
+ </p>
+ </div>
+
+ <p class="p">
+ When Impala retrieves or tests the data for a particular column, it opens all the data files, but only
+ reads the portion of each file containing the values for that column. The column values are stored
+ consecutively, minimizing the I/O required to process the values within a single column. If other columns
+ are named in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">WHERE</code> clauses, the data for all columns
+ in the same row is available within that same data file.
+ </p>
+
+ <p class="p">
+ If an <code class="ph codeph">INSERT</code> statement brings in less than <span class="ph">one Parquet
+ block's worth</span> of data, the resulting data file is smaller than ideal. Thus, if you do split up an ETL
+ job to use multiple <code class="ph codeph">INSERT</code> statements, try to keep the volume of data for each
+ <code class="ph codeph">INSERT</code> statement to approximately <span class="ph">256 MB, or a multiple of
+ 256 MB</span>.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title15" id="parquet_data_files__parquet_encoding">
+
+ <h3 class="title topictitle3" id="ariaid-title15">RLE and Dictionary Encoding for Parquet Data Files</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Parquet uses some automatic compression techniques, such as run-length encoding (RLE) and dictionary
+ encoding, based on analysis of the actual data values. Once the data values are encoded in a compact
+ form, the encoded data can optionally be further compressed using a compression algorithm. Parquet data
+ files created by Impala can use Snappy, GZip, or no compression; the Parquet spec also allows LZO
+ compression, but currently Impala does not support LZO-compressed Parquet files.
+ </p>
+
+ <p class="p">
+ RLE and dictionary encoding are compression techniques that Impala applies automatically to groups of
+ Parquet data values, in addition to any Snappy or GZip compression applied to the entire data files.
+ These automatic optimizations can save you time and planning that are normally needed for a traditional
+ data warehouse. For example, dictionary encoding reduces the need to create numeric IDs as abbreviations
+ for longer string values.
+ </p>
+
+ <p class="p">
+ Run-length encoding condenses sequences of repeated data values. For example, if many consecutive rows
+ all contain the same value for a country code, those repeating values can be represented by the value
+ followed by a count of how many times it appears consecutively.
+ </p>
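A minimal Python sketch of the idea (illustrative only; Parquet's actual RLE encoding also involves bit-packing details not shown here):

```python
# Run-length encoding sketch: collapse consecutive repeats into
# (value, count) pairs, as with the repeated country codes above.
def rle_encode(values):
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]

def rle_decode(runs):
    decoded = []
    for value, count in runs:
        decoded.extend([value] * count)
    return decoded

codes = ["US", "US", "US", "DE", "DE", "FR"]
assert rle_encode(codes) == [("US", 3), ("DE", 2), ("FR", 1)]
assert rle_decode(rle_encode(codes)) == codes
```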
+
+ <p class="p">
+ Dictionary encoding takes the different values present in a column, and represents each one in compact
+ 2-byte form rather than the original value, which could be several bytes. (Additional compression is
+ applied to the compacted values, for extra space savings.) This type of encoding applies when the number
+ of different values for a column is less than 2**16 (65,536). It does not apply to columns of data type
+ <code class="ph codeph">BOOLEAN</code>, which are already very short. <code class="ph codeph">TIMESTAMP</code> columns sometimes have
+ a unique value for each row, in which case they can quickly exceed the 2**16 limit on distinct values.
+ The 2**16 limit on different values within a column is reset for each data file, so if several different
+ data files each contained 10,000 different city names, the city name column in each data file could still
+ be condensed using dictionary encoding.
+ </p>
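The mechanism can be sketched in Python as follows (a simplified illustration; the `max_distinct` fallback mirrors the per-file limit on distinct values described above, but the real encoder works at the byte level):

```python
# Dictionary-encoding sketch: replace each distinct value with a
# small integer index into a per-file dictionary. When the number
# of distinct values exceeds the limit, give up so the caller can
# fall back to plain encoding. Simplified illustration only.
def dict_encode(values, max_distinct=2**16):
    dictionary = {}
    indexes = []
    for value in values:
        if value not in dictionary:
            if len(dictionary) >= max_distinct:
                return None  # too many distinct values
            dictionary[value] = len(dictionary)
        indexes.append(dictionary[value])
    return list(dictionary), indexes

cities = ["Columbus", "Indianapolis", "Columbus", "Columbus"]
dictionary, indexes = dict_encode(cities)
assert dictionary == ["Columbus", "Indianapolis"]
assert indexes == [0, 1, 0, 0]
```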
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="parquet__parquet_compacting">
+
+ <h2 class="title topictitle2" id="ariaid-title16">Compacting Data Files for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you reuse existing table structures or ETL processes for Parquet tables, you might encounter a <span class="q">"many
+ small files"</span> situation, which is suboptimal for query efficiency. For example, statements like these
+ might produce inefficiently organized data files:
+ </p>
+
+<pre class="pre codeblock"><code>-- In an N-node cluster, each node produces a data file
+-- for the INSERT operation. If you have less than
+-- N GB of data to copy, some files are likely to be
+-- much smaller than the <span class="ph">default Parquet</span> block size.
+insert into parquet_table select * from text_table;
+
+-- Even if this operation involves an overall large amount of data,
+-- when split up by year/month/day, each partition might only
+-- receive a small amount of data. Then the data files for
+-- the partition might be divided between the N nodes in the cluster.
+-- A multi-gigabyte copy operation might produce files of only
+-- a few MB each.
+insert into partitioned_parquet_table partition (year, month, day)
+ select year, month, day, url, referer, user_agent, http_code, response_time
+ from web_stats;
+</code></pre>
+
+ <p class="p">
+ Here are techniques to help you produce large data files in Parquet <code class="ph codeph">INSERT</code> operations, and
+ to compact existing too-small data files:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ When inserting into a partitioned Parquet table, use statically partitioned <code class="ph codeph">INSERT</code>
+ statements where the partition key values are specified as constant values. Ideally, use a separate
+ <code class="ph codeph">INSERT</code> statement for each partition.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You might set the <code class="ph codeph">NUM_NODES</code> option to 1 briefly, during <code class="ph codeph">INSERT</code> or
+ <code class="ph codeph">CREATE TABLE AS SELECT</code> statements. Normally, those statements produce one or more data
+ files per data node. If the write operation involves small amounts of data, a Parquet table, and/or a
+ partitioned table, the default behavior could produce many small files when intuitively you might expect
+ only a single output file. <code class="ph codeph">SET NUM_NODES=1</code> turns off the <span class="q">"distributed"</span> aspect of the
+ write operation, making it more likely to produce only one or a few data files.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Be prepared to reduce the number of partition key columns from what you are used to with traditional
+ analytic database systems.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Do not expect Impala-written Parquet files to fill up the entire Parquet block size. Impala estimates
+ on the conservative side when figuring out how much data to write to each Parquet file. Typically, the
+ amount of uncompressed data in memory is substantially reduced on disk by the compression and encoding
+ techniques in the Parquet file format.
+
+ The final data file size varies depending on the compressibility of the data. Therefore, it is not an
+ indication of a problem if <span class="ph">256 MB</span> of text data is turned into 2
+ Parquet data files, each less than <span class="ph">256 MB</span>.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you accidentally end up with a table with many small data files, consider using one or more of the
+ preceding techniques and copying all the data into a new Parquet table, either through <code class="ph codeph">CREATE
+ TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code> statements.
+ </p>
+
+ <p class="p">
+ To avoid rewriting queries to change table names, you can adopt a convention of always running
+ important queries against a view. Changing the view definition immediately switches any subsequent
+ queries to use the new underlying tables:
+ </p>
+<pre class="pre codeblock"><code>create view production_table as select * from table_with_many_small_files;
+-- CTAS or INSERT...SELECT all the data into a more efficient layout...
+alter view production_table as select * from table_with_few_big_files;
+select * from production_table where c1 = 100 and c2 < 50 and ...;
+</code></pre>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="parquet__parquet_schema_evolution">
+
+ <h2 class="title topictitle2" id="ariaid-title17">Schema Evolution for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Schema evolution refers to using the statement <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to change
+ the names, data type, or number of columns in a table. You can perform schema evolution for Parquet tables
+ as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The Impala <code class="ph codeph">ALTER TABLE</code> statement never changes any data files in the tables. From the
+ Impala side, schema evolution involves interpreting the same data files in terms of a new table
+ definition. Some types of schema changes make sense and are represented correctly. Other types of
+ changes cannot be represented in a sensible way, and produce special result values or conversion errors
+ during queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement always creates data using the latest table definition. You might
+ end up with data files with different numbers of columns or internal data representations if you do a
+ sequence of <code class="ph codeph">INSERT</code> and <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define additional columns at the end,
+ when the original data files are used in a query, these final columns are considered to be all
+ <code class="ph codeph">NULL</code> values.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> to define fewer columns than before, when
+ the original data files are used in a query, the unused columns still present in the data file are
+ ignored.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Parquet represents the <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, and <code class="ph codeph">INT</code>
+ types the same internally, all stored in 32-bit integers.
+ </p>
+ <ul class="ul">
+ <li class="li">
+ That means it is easy to promote a <code class="ph codeph">TINYINT</code> column to <code class="ph codeph">SMALLINT</code> or
+ <code class="ph codeph">INT</code>, or a <code class="ph codeph">SMALLINT</code> column to <code class="ph codeph">INT</code>. The numbers are
+ represented exactly the same in the data file, and the columns being promoted would not contain any
+ out-of-range values.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you change any of these column types to a smaller type, any values that are out-of-range for the
+ new type are returned incorrectly, typically as negative numbers.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You cannot change a <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, or <code class="ph codeph">INT</code>
+ column to <code class="ph codeph">BIGINT</code>, or the other way around. Although the <code class="ph codeph">ALTER
+ TABLE</code> succeeds, any attempt to query those columns results in conversion errors.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Any other type conversion for columns produces a conversion error during queries. For example,
+ <code class="ph codeph">INT</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">FLOAT</code> to <code class="ph codeph">DOUBLE</code>,
+ <code class="ph codeph">TIMESTAMP</code> to <code class="ph codeph">STRING</code>, <code class="ph codeph">DECIMAL(9,0)</code> to
+ <code class="ph codeph">DECIMAL(5,2)</code>, and so on.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
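Because all three types share the same 32-bit storage, narrowing a column simply reinterprets the stored bytes. The following Python sketch illustrates the effect (it is not Impala's code) and shows why out-of-range values often come back negative:

```python
import struct

# TINYINT/SMALLINT/INT are all stored as 32-bit integers in Parquet.
# Reading a stored value through a narrower type effectively keeps
# only the low-order byte(s), shown here with little-endian packing.
def reinterpret_as_tinyint(stored_int32):
    low_byte = struct.pack("<i", stored_int32)[:1]
    return struct.unpack("<b", low_byte)[0]

assert reinterpret_as_tinyint(100) == 100   # in range: unchanged
assert reinterpret_as_tinyint(300) == 44    # out of range: garbage
assert reinterpret_as_tinyint(200) == -56   # often appears negative
```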
+
+ <div class="p">
+ You might find that you have Parquet files where the columns do not line up in the same
+ order as in your Impala table. For example, you might have a Parquet file that was part of
+ a table with columns <code class="ph codeph">C1,C2,C3,C4</code>, and now you want to reuse the same
+ Parquet file in a table with columns <code class="ph codeph">C4,C2</code>. By default, Impala expects the
+ columns in the data file to appear in the same order as the columns defined for the table,
+ making it impractical to do some kinds of file reuse or schema evolution. In <span class="keyword">Impala 2.6</span>
+ and higher, the query option <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION=name</code> lets Impala
+ resolve columns by name, and therefore handle out-of-order or extra columns in the data file.
+ For example:
+
+<pre class="pre codeblock"><code>
+create database schema_evolution;
+use schema_evolution;
+create table t1 (c1 int, c2 boolean, c3 string, c4 timestamp)
+ stored as parquet;
+insert into t1 values
+ (1, true, 'yes', now()),
+ (2, false, 'no', now() + interval 1 day);
+
+select * from t1;
++----+-------+-----+-------------------------------+
+| c1 | c2 | c3 | c4 |
++----+-------+-----+-------------------------------+
+| 1 | true | yes | 2016-06-28 14:53:26.554369000 |
+| 2 | false | no | 2016-06-29 14:53:26.554369000 |
++----+-------+-----+-------------------------------+
+
+desc formatted t1;
+...
+| Location: | /user/hive/warehouse/schema_evolution.db/t1 |
+...
+
+-- T2 declares only two of T1's columns, in a different order.
+create table t2 (c4 timestamp, c2 boolean) stored as parquet;
+
+-- Make T2 have the same data file as in T1, including 2
+-- unused columns and column order different from what T2 expects.
+load data inpath '/user/hive/warehouse/schema_evolution.db/t1'
+ into table t2;
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 1 |
++----------------------------------------------------------+
+
+-- 'position' is the default setting.
+-- Impala cannot read the Parquet file if the column order does not match.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=position;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to position
+
+select * from t2;
+WARNINGS:
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+File 'schema_evolution.db/t2/45331705_data.0.parq'
+has an incompatible Parquet schema for column 'schema_evolution.t2.c4'.
+Column type: TIMESTAMP, Parquet schema: optional int32 c1 [i:0 d:1 r:0]
+
+-- With the 'name' setting, Impala can read the Parquet data files
+-- despite mismatching column order.
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;
+PARQUET_FALLBACK_SCHEMA_RESOLUTION set to name
+
+select * from t2;
++-------------------------------+-------+
+| c4 | c2 |
++-------------------------------+-------+
+| 2016-06-28 14:53:26.554369000 | true |
+| 2016-06-29 14:53:26.554369000 | false |
++-------------------------------+-------+
+
+</code></pre>
+
+ See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+ for more details.
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="parquet__parquet_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title18">Data Type Considerations for Parquet Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Parquet format defines a set of data types whose names differ from the names of the corresponding
+ Impala data types. If you are preparing Parquet files using other Hadoop components such as Pig or
+ MapReduce, you might need to work with the type names defined by Parquet. The following lists show the
+ Parquet-defined types and the equivalent types in Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Primitive types:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>BINARY -> STRING
+BOOLEAN -> BOOLEAN
+DOUBLE -> DOUBLE
+FLOAT -> FLOAT
+INT32 -> INT
+INT64 -> BIGINT
+INT96 -> TIMESTAMP
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Logical types:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>BINARY + OriginalType UTF8 -> STRING
+BINARY + OriginalType ENUM -> STRING
+BINARY + OriginalType DECIMAL -> DECIMAL
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex types:</strong>
+ </p>
+
+ <p class="p">
+ For the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code>)
+ available in <span class="keyword">Impala 2.3</span> and higher, Impala only supports queries
+ against those types in Parquet tables.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html b/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
new file mode 100644
index 0000000..f72b664
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_annotate_strings_utf8.html
@@ -0,0 +1,54 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_annotate_strings_utf8"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_annotate_strings_utf8"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Causes Impala <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ to write Parquet files that use the UTF-8 annotation for <code class="ph codeph">STRING</code> columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ By default, Impala represents a <code class="ph codeph">STRING</code> column in Parquet as an unannotated binary field.
+ </p>
+ <p class="p">
+ Impala always uses the UTF-8 annotation when writing <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>
+ columns to Parquet files. An alternative to using the query option is to cast <code class="ph codeph">STRING</code>
+ values to <code class="ph codeph">VARCHAR</code>.
+ </p>
+ <p class="p">
+ This option helps make Impala-written data more interoperable with other data processing engines.
+ Impala itself currently does not support all operations on UTF-8 data.
+ Although data processed by Impala is typically represented in ASCII, it is valid to designate the
+ data as UTF-8 when storing on disk, because ASCII is a subset of UTF-8.
+ </p>
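The "ASCII is a subset of UTF-8" point is easy to verify: for any ASCII string, the ASCII and UTF-8 byte representations are identical.

```python
# ASCII is a strict subset of UTF-8: an ASCII byte sequence is
# already valid UTF-8 and decodes to the same characters, which is
# why annotating ASCII data as UTF-8 on disk is safe.
value = "CMH"
assert value.encode("ascii") == value.encode("utf-8")
assert value.encode("ascii").decode("utf-8") == value
```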
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_array_resolution.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_array_resolution.html b/docs/build3x/html/topics/impala_parquet_array_resolution.html
new file mode 100644
index 0000000..831ac46
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_array_resolution.html
@@ -0,0 +1,180 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_array_resolution"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_ARRAY_RESOLUTION Query Option (Impala 2.9 or higher only)</title></head><body id="parquet_array_resolution"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">
+ PARQUET_ARRAY_RESOLUTION Query Option (<span class="keyword">Impala 2.9</span> or higher only)
+ </h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> query option controls the
+ behavior of the indexed-based resolution for nested arrays in Parquet.
+ </p>
+
+ <p class="p">
+ In Parquet, you can represent an array using a 2-level or 3-level
+ representation. The modern, standard representation is 3-level. The legacy
+ 2-level scheme is supported for compatibility with older Parquet files.
+ However, there is no reliable metadata within Parquet files to indicate
+ which encoding was used. It is even possible to have mixed encodings within
+ the same file if there are multiple arrays. The
+ <code class="ph codeph">PARQUET_ARRAY_RESOLUTION</code> option controls the resolution
+ process that matches each column/field reference from a query to a
+ column in the Parquet file.</p>
+
+ <p class="p">
+ The supported values for the query option are:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">THREE_LEVEL</code>: Assumes arrays are encoded with the 3-level
+ representation, and does not attempt the 2-level resolution.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TWO_LEVEL</code>: Assumes arrays are encoded with the 2-level
+ representation, and does not attempt the 3-level resolution.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>: First tries to resolve
+ assuming a 2-level representation, and if unsuccessful, tries a 3-level
+ representation.
+ </li>
+ </ul>
+
+ <p class="p">
+ All of the above options resolve arrays encoded with a single level.
+ </p>
+
+ <p class="p">
+ A failure to resolve a column/field reference in a query with a given array
+ resolution policy does not necessarily result in a warning or error returned
+ by the query. A mismatch might be treated like a missing column (returns
+ NULL values), and it is not possible to reliably distinguish the 'bad
+ resolution' and 'legitimately missing column' cases.
+ </p>
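The policy's effect can be sketched as a fallback loop in Python. The schema model and resolver functions here are hypothetical stand-ins for Impala's internal logic, shown only to illustrate the order of attempts and the silent-NULL outcome:

```python
# Hypothetical sketch of PARQUET_ARRAY_RESOLUTION policies: attempt
# one interpretation of the array encoding, optionally fall back to
# the other, and return None (surfaced as NULLs) on failure.
def try_two_level(schema):
    # 2-level: the repeated node is itself the array element.
    return schema["repeated"] if schema.get("levels") == 2 else None

def try_three_level(schema):
    # 3-level: the repeated node wraps a single element field.
    return schema["repeated"]["element"] if schema.get("levels") == 3 else None

def resolve_array(schema, policy):
    attempts = {
        "TWO_LEVEL": [try_two_level],
        "THREE_LEVEL": [try_three_level],
        "TWO_LEVEL_THEN_THREE_LEVEL": [try_two_level, try_three_level],
    }[policy]
    for attempt in attempts:
        resolved = attempt(schema)
        if resolved is not None:
            return resolved
    return None  # unresolved: column reads back as NULLs, not an error

two_level_file = {"levels": 2, "repeated": "list_of_ints_tuple"}
assert resolve_array(two_level_file, "THREE_LEVEL") is None
assert resolve_array(two_level_file,
                     "TWO_LEVEL_THEN_THREE_LEVEL") == "list_of_ints_tuple"
```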
+
+ <p class="p">
+ The name-based policy generally does not have the problem of ambiguous
+ array representations. You specify to use the name-based policy by setting
+ the <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option to
+ <code class="ph codeph">NAME</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Enum of <code class="ph codeph">TWO_LEVEL</code>,
+ <code class="ph codeph">TWO_LEVEL_THEN_THREE_LEVEL</code>, <code class="ph codeph">THREE_LEVEL</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">THREE_LEVEL</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ EXAMPLE A: The following Parquet schema of a file can be interpreted as a
+ 2-level or 3-level:
+ </p>
+
+<pre class="pre codeblock"><code>
+ParquetSchemaExampleA {
+ optional group single_element_groups (LIST) {
+ repeated group single_element_group {
+ required int64 count;
+ }
+ }
+}
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<struct<f1: bigint>>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM.f1 FROM t.col1;
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 3-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<bigint>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 3-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ <p class="p">
+ EXAMPLE B: The following Parquet schema of a file can only be successfully
+ interpreted as a 2-level:
+ </p>
+
+<pre class="pre codeblock"><code>
+ParquetSchemaExampleB {
+ required group list_of_ints (LIST) {
+ repeated int32 list_of_ints_tuple;
+ }
+}
+</code></pre>
+
+ <p class="p">
+ The following table schema corresponds to a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t (col1 array<int>) STORED AS PARQUET;
+</code></pre>
+
+ <p class="p">
+ Successful query with a 2-level interpretation:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=TWO_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ <p class="p">
+ Unsuccessful query with a 3-level interpretation. The query returns
+ <code class="ph codeph">NULL</code>s as if the column was missing in the file:
+ </p>
+
+<pre class="pre codeblock"><code>
+SET PARQUET_ARRAY_RESOLUTION=THREE_LEVEL;
+SELECT ITEM FROM t.col1
+</code></pre>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_compression_codec.html b/docs/build3x/html/topics/impala_parquet_compression_codec.html
new file mode 100644
index 0000000..ac5551a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_compression_codec.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_COMPRESSION_CODEC Query Option</title></head><body id="parquet_compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_COMPRESSION_CODEC Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Deprecated. Use <code class="ph codeph">COMPRESSION_CODEC</code> in Impala 2.0 and later. See
+ <a class="xref" href="impala_compression_codec.html#compression_codec">COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_planning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_planning.html b/docs/build3x/html/topics/impala_planning.html
new file mode 100644
index 0000000..e571e42
--- /dev/null
+++ b/docs/build3x/html/topics/impala_planning.html
@@ -0,0 +1,20 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_prereqs.html#prereqs"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_design.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="planning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Planning for Impala Deployment</title></head><body id="planning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Planning for Impala Deployment</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Before you set up Impala in production, do some planning to make sure that your hardware setup has sufficient
+ capacity, that your cluster topology is optimal for Impala queries, and that your schema design and ETL
+ processes follow the best practices for Impala.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_prereqs.html#prereqs">Impala Requirements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_schema_design.html">Guidelines for Designing Impala Schemas</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_porting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_porting.html b/docs/build3x/html/topics/impala_porting.html
new file mode 100644
index 0000000..8a8ba7e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_porting.html
@@ -0,0 +1,603 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="porting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Porting SQL from Other Database Systems to Impala</title></head><body id="porting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Porting SQL from Other Database Systems to Impala</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Although Impala uses standard SQL for queries, you might need to modify SQL source when bringing applications
+ to Impala, due to variations in data types, built-in functions, vendor language extensions, and
+ Hadoop-specific syntax. Even when SQL is working correctly, you might make further minor modifications for
+ best performance.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="porting__porting_ddl_dml">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Porting DDL and DML Statements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When adapting SQL code from a traditional database system to Impala, expect to find a number of differences
+ in the DDL statements that you use to set up the schema. Clauses related to physical layout of files,
+ tablespaces, and indexes have no equivalent in Impala. You might restructure your schema considerably to
+ account for the Impala partitioning scheme and Hadoop file formats.
+ </p>
+
+ <p class="p">
+ Expect SQL queries to have a much higher degree of compatibility. With modest rewriting to address vendor
+ extensions and features not yet supported in Impala, you might be able to run identical or almost-identical
+ query text on both systems.
+ </p>
+
+ <p class="p">
+ Therefore, consider separating out the DDL into a separate Impala-specific setup script. Focus your reuse
+ and ongoing tuning efforts on the code for SQL queries.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="porting__porting_data_types">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Porting Data Types from Other Database Systems</h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">VARCHAR</code>, <code class="ph codeph">VARCHAR2</code>, and <code class="ph codeph">CHAR</code> columns to
+ <code class="ph codeph">STRING</code>. Remove any length constraints from the column declarations; for example,
+ change <code class="ph codeph">VARCHAR(32)</code> or <code class="ph codeph">CHAR(1)</code> to <code class="ph codeph">STRING</code>. Impala is
+ very flexible about the length of string values; it does not impose any length constraints
+ or do any special processing (such as blank-padding) for <code class="ph codeph">STRING</code> columns.
+ (In Impala 2.0 and higher, there are data types <code class="ph codeph">VARCHAR</code> and <code class="ph codeph">CHAR</code>,
+ with length constraints for both types and blank-padding for <code class="ph codeph">CHAR</code>.
+ However, for performance reasons, it is still preferable to use <code class="ph codeph">STRING</code>
+ columns where practical.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For national language character types such as <code class="ph codeph">NCHAR</code>, <code class="ph codeph">NVARCHAR</code>, or
+ <code class="ph codeph">NCLOB</code>, be aware that while Impala can store and query UTF-8 character data, currently
+ some string manipulation operations only work correctly with ASCII data. See
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DATE</code>, <code class="ph codeph">DATETIME</code>, or <code class="ph codeph">TIME</code> columns to
+ <code class="ph codeph">TIMESTAMP</code>. Remove any precision constraints. Remove any timezone clauses, and make
+ sure your application logic or ETL process accounts for the fact that Impala expects all
+ <code class="ph codeph">TIMESTAMP</code> values to be in
+ <a class="xref" href="http://en.wikipedia.org/wiki/Coordinated_Universal_Time" target="_blank">Coordinated
+ Universal Time (UTC)</a>. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about
+ the <code class="ph codeph">TIMESTAMP</code> data type, and
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for conversion functions for different
+ date and time formats.
+ </p>
+ <p class="p">
+ You might also need to adapt date- and time-related literal values and format strings to use the
+          supported Impala date and time formats. If your date and time literals use different separators, or a
+          different number of placeholders (<code class="ph codeph">YY</code>, <code class="ph codeph">MM</code>, and so on) than Impala
+ expects, consider using calls to <code class="ph codeph">regexp_replace()</code> to transform those values to the
+ Impala-compatible format. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for information about the
+ allowed formats for date and time literals, and
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for string conversion functions such as
+ <code class="ph codeph">regexp_replace()</code>.
+ </p>
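+          <p class="p">
+            For example, the following sketch (using a hypothetical table
+            <code class="ph codeph">raw_events</code> whose dates are stored as
+            <code class="ph codeph">MM/DD/YYYY</code> strings) rearranges the fields into the
+            <code class="ph codeph">YYYY-MM-DD</code> form that Impala can cast to
+            <code class="ph codeph">TIMESTAMP</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Rearrange MM/DD/YYYY into YYYY-MM-DD, then cast to TIMESTAMP.
+-- Backslashes in the backreferences are doubled inside the SQL string literal.
+SELECT CAST(regexp_replace(event_date, '([0-9]{2})/([0-9]{2})/([0-9]{4})', '\\3-\\1-\\2') AS TIMESTAMP)
+  FROM raw_events;</code></pre>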
+ <p class="p">
+ Instead of <code class="ph codeph">SYSDATE</code>, call the function <code class="ph codeph">NOW()</code>.
+ </p>
+ <p class="p">
+ Instead of adding or subtracting directly from a date value to produce a value <var class="keyword varname">N</var>
+ days in the past or future, use an <code class="ph codeph">INTERVAL</code> expression, for example <code class="ph codeph">NOW() +
+ INTERVAL 30 DAYS</code>.
+ </p>
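+          <p class="p">
+            For example (a sketch; the exact equivalents depend on which functions the
+            source system used):
+          </p>
+<pre class="pre codeblock"><code>-- Instead of date arithmetic such as SYSDATE + 30 or SYSDATE - 14:
+SELECT NOW() + INTERVAL 30 DAYS;
+SELECT NOW() - INTERVAL 2 WEEKS;</code></pre>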
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Although Impala supports <code class="ph codeph">INTERVAL</code> expressions for datetime arithmetic, as shown in
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>, <code class="ph codeph">INTERVAL</code> is not available as a column
+ data type in Impala. For any <code class="ph codeph">INTERVAL</code> values stored in tables, convert them to numeric
+ values that you can add or subtract using the functions in
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a>. For example, if you had a table
+ <code class="ph codeph">DEADLINES</code> with an <code class="ph codeph">INT</code> column <code class="ph codeph">TIME_PERIOD</code>, you could
+            construct dates <var class="keyword varname">N</var> days in the future like so:
+ </p>
+<pre class="pre codeblock"><code>SELECT NOW() + INTERVAL time_period DAYS from deadlines;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <code class="ph codeph">YEAR</code> columns, change to the smallest Impala integer type that has sufficient
+ range. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on
+ for the various numeric data types.
+ </p>
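+          <p class="p">
+            For example, <code class="ph codeph">SMALLINT</code> (range -32768 to 32767) has
+            sufficient range for four-digit year values; the table and column names below
+            are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>-- Source system: CREATE TABLE events (event_year YEAR, event_name VARCHAR(100));
+CREATE TABLE events (event_year SMALLINT, event_name STRING);</code></pre>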
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Change any <code class="ph codeph">DECIMAL</code> and <code class="ph codeph">NUMBER</code> types. If fixed-point precision is not
+ required, you can use <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> on the Impala side depending on
+ the range of values. For applications that require precise decimal values, such as financial data, you
+ might need to make more extensive changes to table structure and application logic, such as using
+ separate integer columns for dollars and cents, or encoding numbers as string values and writing UDFs
+ to manipulate them. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges,
+ casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">FLOAT</code>, <code class="ph codeph">DOUBLE</code>, and <code class="ph codeph">REAL</code> types are supported in
+ Impala. Remove any precision and scale specifications. (In Impala, <code class="ph codeph">REAL</code> is just an
+ alias for <code class="ph codeph">DOUBLE</code>; columns declared as <code class="ph codeph">REAL</code> are turned into
+ <code class="ph codeph">DOUBLE</code> behind the scenes.) See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for
+ details about ranges, casting, and so on for the various numeric data types.
+ </p>
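+          <p class="p">
+            For example, a column declared as <code class="ph codeph">REAL</code> is reported as
+            <code class="ph codeph">DOUBLE</code> afterward (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- FLOAT(7) or DOUBLE PRECISION(10,2) from another system become plain FLOAT or DOUBLE:
+CREATE TABLE measurements (reading REAL);
+DESCRIBE measurements;  -- the READING column shows up as DOUBLE</code></pre>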
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Most integer types from other systems have equivalents in Impala, perhaps under different names such as
+ <code class="ph codeph">BIGINT</code> instead of <code class="ph codeph">INT8</code>. For any that are unavailable, for example
+ <code class="ph codeph">MEDIUMINT</code>, switch to the smallest Impala integer type that has sufficient range.
+ Remove any precision specifications. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details
+ about ranges, casting, and so on for the various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Remove any <code class="ph codeph">UNSIGNED</code> constraints. All Impala numeric types are signed. See
+ <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about ranges, casting, and so on for the
+ various numeric data types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any types holding bitwise values, use an integer type with enough range to hold all the relevant
+ bits within a positive integer. See <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a> for details about
+ ranges, casting, and so on for the various numeric data types.
+ </p>
+ <p class="p">
+            For example, <code class="ph codeph">TINYINT</code> has a maximum positive value of 127, not 255, so to manipulate
+            8-bit bitfields as positive numbers, switch to the next largest type, <code class="ph codeph">SMALLINT</code>.
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > select cast(127*2 as tinyint);
++--------------------------+
+| cast(127 * 2 as tinyint) |
++--------------------------+
+| -2 |
++--------------------------+
+[localhost:21000] > select cast(128 as tinyint);
++----------------------+
+| cast(128 as tinyint) |
++----------------------+
+| -128 |
++----------------------+
+[localhost:21000] > select cast(127*2 as smallint);
++---------------------------+
+| cast(127 * 2 as smallint) |
++---------------------------+
+| 254 |
++---------------------------+</code></pre>
+ <p class="p">
+ Impala does not support notation such as <code class="ph codeph">b'0101'</code> for bit literals.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For BLOB values, use <code class="ph codeph">STRING</code> to represent <code class="ph codeph">CLOB</code> or
+ <code class="ph codeph">TEXT</code> types (character based large objects) up to 32 KB in size. Binary large objects
+            such as <code class="ph codeph">BLOB</code>, <code class="ph codeph">RAW</code>, <code class="ph codeph">BINARY</code>, and
+ <code class="ph codeph">VARBINARY</code> do not currently have an equivalent in Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For Boolean-like types such as <code class="ph codeph">BOOL</code>, use the Impala <code class="ph codeph">BOOLEAN</code> type.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Because Impala currently does not support composite or nested types, any spatial data types in other
+ database systems do not have direct equivalents in Impala. You could represent spatial values in string
+ format and write UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details. Where
+ practical, separate spatial types into separate tables so that Impala can still work with the
+ non-spatial data.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">DEFAULT</code> clauses. Impala can use data files produced from many different
+ sources, such as Pig, Hive, or MapReduce jobs. The fast import mechanisms of <code class="ph codeph">LOAD DATA</code>
+ and external tables mean that Impala is flexible about the format of data files, and Impala does not
+ necessarily validate or cleanse data before querying it. When copying data through Impala
+ <code class="ph codeph">INSERT</code> statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or
+ <code class="ph codeph">NVL</code> to substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
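+          <p class="p">
+            For example, instead of a <code class="ph codeph">DEFAULT 'unknown'</code> clause on a
+            column, you could substitute the value at copy time (table and column names
+            hypothetical):
+          </p>
+<pre class="pre codeblock"><code>INSERT INTO clean_table
+  SELECT id, NVL(category, 'unknown') FROM staging_table;</code></pre>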
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any constraints from your <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements, for example <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">FOREIGN KEY</code>,
+ <code class="ph codeph">UNIQUE</code>, <code class="ph codeph">NOT NULL</code>, <code class="ph codeph">UNSIGNED</code>, or
+ <code class="ph codeph">CHECK</code> constraints. Impala can use data files produced from many different sources,
+ such as Pig, Hive, or MapReduce jobs. Therefore, Impala expects initial data validation to happen
+ earlier during the ETL or ELT cycle. After data is loaded into Impala tables, you can perform queries
+ to test for <code class="ph codeph">NULL</code> values. When copying data through Impala <code class="ph codeph">INSERT</code>
+ statements, you can use conditional functions such as <code class="ph codeph">CASE</code> or <code class="ph codeph">NVL</code> to
+ substitute some other value for <code class="ph codeph">NULL</code> fields; see
+ <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a> for details.
+ </p>
+ <p class="p">
+ Do as much verification as practical before loading data into Impala. After data is loaded into Impala,
+ you can do further verification using SQL queries to check if values have expected ranges, if values
+ are <code class="ph codeph">NULL</code> or not, and so on. If there is a problem with the data, you will need to
+ re-run earlier stages of the ETL process, or do an <code class="ph codeph">INSERT ... SELECT</code> statement in
+ Impala to copy the faulty data to a new table and transform or filter out the bad values.
+ </p>
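+          <p class="p">
+            For example, checks that would have been <code class="ph codeph">NOT NULL</code> or
+            <code class="ph codeph">CHECK</code> constraints can become verification queries run
+            after loading (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Rows that would have violated a NOT NULL constraint:
+SELECT count(*) FROM t1 WHERE c1 IS NULL;
+-- Rows outside the range a CHECK (c2 BETWEEN 0 AND 100) would have enforced:
+SELECT count(*) FROM t1 WHERE c2 NOT BETWEEN 0 AND 100;</code></pre>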
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Take out any <code class="ph codeph">CREATE INDEX</code>, <code class="ph codeph">DROP INDEX</code>, and <code class="ph codeph">ALTER
+ INDEX</code> statements, and equivalent <code class="ph codeph">ALTER TABLE</code> statements. Remove any
+ <code class="ph codeph">INDEX</code>, <code class="ph codeph">KEY</code>, or <code class="ph codeph">PRIMARY KEY</code> clauses from
+ <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> statements. Impala is optimized for bulk
+ read operations for data warehouse-style queries, and therefore does not support indexes for its
+ tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            Calls to built-in functions with out-of-range or otherwise incorrect arguments return
+            <code class="ph codeph">NULL</code> in Impala instead of raising exceptions. (This rule applies even when the
+            <code class="ph codeph">ABORT_ON_ERROR=true</code> query option is in effect.) Run small-scale queries using
+            representative data to double-check that calls to built-in functions return expected values
+ rather than <code class="ph codeph">NULL</code>. For example, unsupported <code class="ph codeph">CAST</code> operations do not
+ raise an error in Impala:
+ </p>
+<pre class="pre codeblock"><code>select cast('foo' as int);
++--------------------+
+| cast('foo' as int) |
++--------------------+
+| NULL |
++--------------------+</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For any other type not supported in Impala, you could represent their values in string format and write
+ UDFs to process them. See <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            To detect the presence of unsupported or unconvertible data types in data files, do initial testing
+ with the <code class="ph codeph">ABORT_ON_ERROR=true</code> query option in effect. This option causes queries to
+ fail immediately if they encounter disallowed type conversions. See
+ <a class="xref" href="impala_abort_on_error.html#abort_on_error">ABORT_ON_ERROR Query Option</a> for details. For example:
+ </p>
+<pre class="pre codeblock"><code>set abort_on_error=true;
+select count(*) from (select * from t1);
+-- The above query will fail if the data files for T1 contain any
+-- values that can't be converted to the expected Impala data types.
+-- For example, if T1.C1 is defined as INT but the column contains
+-- floating-point values like 1.1, the query will return an error.</code></pre>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="porting__porting_statements">
+
+ <h2 class="title topictitle2" id="ariaid-title4">SQL Statements to Remove or Adapt</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL statements or clauses that you might be familiar with are not currently supported in Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">DELETE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">DELETE</code>, use <code class="ph codeph">INSERT OVERWRITE</code> to entirely replace the contents of a
+ table or partition, or use <code class="ph codeph">INSERT ... SELECT</code> to copy a subset of data (everything but
+ the rows you intended to delete) from one table to another. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for
+ an overview of Impala DML statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no <code class="ph codeph">UPDATE</code> statement. Impala is intended for data warehouse-style operations
+ where you do bulk moves and transforms of large quantities of data. Instead of using
+ <code class="ph codeph">UPDATE</code>, do all necessary transformations early in the ETL process, such as in the job
+ that generates the original data, or when copying from one table to another to convert to a particular
+ file format or partitioning scheme. See <a class="xref" href="impala_dml.html#dml">DML Statements</a> for an overview of Impala DML
+ statements.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has no transactional statements, such as <code class="ph codeph">COMMIT</code> or <code class="ph codeph">ROLLBACK</code>.
+ Impala effectively works like the <code class="ph codeph">AUTOCOMMIT</code> mode in some database systems, where
+ changes take effect as soon as they are made.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your database, table, column, or other names conflict with Impala reserved words, use different
+ names or quote the names with backticks. See <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>
+ for the current list of Impala reserved words.
+ </p>
+ <p class="p">
+ Conversely, if you use a keyword that Impala does not recognize, it might be interpreted as a table or
+ column alias. For example, in <code class="ph codeph">SELECT * FROM t1 NATURAL JOIN t2</code>, Impala does not
+ recognize the <code class="ph codeph">NATURAL</code> keyword and interprets it as an alias for the table
+ <code class="ph codeph">t1</code>. If you experience any unexpected behavior with queries, check the list of reserved
+ words to make sure all keywords in join and <code class="ph codeph">WHERE</code> clauses are recognized.
+ </p>
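+          <p class="p">
+            For example, to keep identifiers that clash with reserved words (the names
+            here are hypothetical), quote them with backticks:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE `update` (`location` STRING, `date` TIMESTAMP);
+SELECT `location`, `date` FROM `update`;</code></pre>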
+ </li>
+
+ <li class="li">
+ <p class="p">
+            Impala supports subqueries only in the <code class="ph codeph">FROM</code> clause of a query, not in
+ <code class="ph codeph">WHERE</code> clauses. Therefore, you cannot use clauses such as <code class="ph codeph">WHERE
+ <var class="keyword varname">column</var> IN (<var class="keyword varname">subquery</var>)</code>. Also, Impala does not allow
+ <code class="ph codeph">EXISTS</code> or <code class="ph codeph">NOT EXISTS</code> clauses (although <code class="ph codeph">EXISTS</code> is a
+ reserved keyword).
+ </p>
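+          <p class="p">
+            One possible rewrite is to turn the subquery into a join; this sketch assumes
+            hypothetical tables <code class="ph codeph">t1</code> and <code class="ph codeph">t2</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: SELECT * FROM t1 WHERE c1 IN (SELECT c1 FROM t2);
+SELECT t1.*
+  FROM t1 JOIN (SELECT DISTINCT c1 FROM t2) t2_keys
+  ON t1.c1 = t2_keys.c1;</code></pre>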
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code> set operators, but not
+ <code class="ph codeph">INTERSECT</code>. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
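+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Keeps any duplicate rows; no duplicate-elimination pass is needed:
+SELECT c1 FROM t1 UNION ALL SELECT c1 FROM t2;</code></pre>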
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Within queries, Impala requires query aliases for any subqueries:
+ </p>
+<pre class="pre codeblock"><code>-- Without the alias 'contents_of_t1' at the end, query gives syntax error.
+select count(*) from (select * from t1) contents_of_t1;</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ When an alias is declared for an expression in a query, that alias cannot be referenced again within
+ the same query block:
+ </p>
+<pre class="pre codeblock"><code>-- Can't reference AVERAGE twice in the SELECT list where it's defined.
+select avg(x) as average, average+1 from t1 group by x;
+ERROR: AnalysisException: couldn't resolve column reference: 'average'
+
+-- Although it can be referenced again later in the same query.
+select avg(x) as average from t1 group by x having average > 3;</code></pre>
+ <p class="p">
+ For Impala, either repeat the expression again, or abstract the expression into a <code class="ph codeph">WITH</code>
+ clause, creating named columns that can be referenced multiple times anywhere in the base query:
+ </p>
+<pre class="pre codeblock"><code>-- The following 2 query forms are equivalent.
+select avg(x) as average, avg(x)+1 from t1 group by x;
+with avg_t as (select avg(x) average from t1 group by x) select average, average+1 from avg_t;</code></pre>
+
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala does not support certain rarely used join types that are less appropriate for high-volume tables
+ used for data warehousing. In some cases, Impala supports join types but requires explicit syntax to
+ ensure you do not do inefficient joins of huge tables by accident. For example, Impala does not support
+ natural joins or anti-joins, and requires the <code class="ph codeph">CROSS JOIN</code> operator for Cartesian
+ products. See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details on the syntax for Impala join clauses.
+ </p>
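+          <p class="p">
+            For example, a deliberate Cartesian product must use the explicit operator
+            (table names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- A comma join with no join condition is not accepted as an implicit Cartesian product; use:
+SELECT a.x, b.y FROM a CROSS JOIN b;</code></pre>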
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala has a limited choice of partitioning types. Partitions are defined based on each distinct
+ combination of values for one or more partition key columns. Impala does not redistribute or check data
+ to create evenly distributed partitions; you must choose partition key columns based on your knowledge
+ of the data volume and distribution. Adapt any tables that use range, list, hash, or key partitioning
+ to use the Impala partition syntax for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements. Impala partitioning is similar to range partitioning where every range has exactly one
+ value, or key partitioning where the hash function produces a separate bucket for every combination of
+ key values. See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for usage details, and
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> and
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Because the number of separate partitions is potentially higher than in other database systems, keep a
+ close eye on the number of partitions and the volume of data in each one; scale back the number of
+ partition key columns if you end up with too many partitions with a small volume of data in each one.
+ Remember, to distribute work for a query across a cluster, you need at least one HDFS block per node.
+ HDFS blocks are typically multiple megabytes, <span class="ph">especially</span> for Parquet
+ files. Therefore, if each partition holds only a few megabytes of data, you are unlikely to see much
+ parallelism in the query because such a small amount of data is typically processed by a single node.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For <span class="q">"top-N"</span> queries, Impala uses the <code class="ph codeph">LIMIT</code> clause rather than comparing against a
+ pseudocolumn named <code class="ph codeph">ROWNUM</code> or <code class="ph codeph">ROW_NUM</code>. See
+ <a class="xref" href="impala_limit.html#limit">LIMIT Clause</a> for details.
+ </p>
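+          <p class="p">
+            For example (table and column names hypothetical):
+          </p>
+<pre class="pre codeblock"><code>-- Instead of: SELECT ... WHERE ROWNUM &lt;= 10:
+SELECT c1 FROM t1 ORDER BY c1 DESC LIMIT 10;</code></pre>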
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="porting__porting_antipatterns">
+
+    <h2 class="title topictitle2" id="ariaid-title5">SQL Constructs to Double-Check</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Some SQL constructs that are supported have behavior or defaults more oriented towards convenience than
+ optimal performance. Also, sometimes machine-generated SQL, perhaps issued through JDBC or ODBC
+ applications, might have inefficiencies or exceed internal Impala limits. As you port SQL code, be alert
+ and change these things where appropriate:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">STORED AS</code> clause creates data files
+ in plain text format, which is convenient for data interchange but not a good choice for high-volume
+ data with high-performance queries. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for why and
+ how to use specific file formats for compact data and high-performance queries. Especially see
+ <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>, for details about the file format most heavily optimized for
+ large-scale data warehouse queries.
+ </p>
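+          <p class="p">
+            For example, to store the data in Parquet format rather than the default text format,
+            include the <code class="ph codeph">STORED AS</code> clause explicitly. The table and column
+            names here are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_data (id BIGINT, amount DECIMAL(10,2), sale_date TIMESTAMP)
+  STORED AS PARQUET;</code></pre>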
+ </li>
+
+ <li class="li">
+ <p class="p">
+ A <code class="ph codeph">CREATE TABLE</code> statement with no <code class="ph codeph">PARTITIONED BY</code> clause stores all the
+ data files in the same physical location, which can lead to scalability problems when the data volume
+ becomes large.
+ </p>
+ <p class="p">
+ On the other hand, adapting tables that were already partitioned in a different database system could
+ produce an Impala table with a high number of partitions and not enough data in each one, leading to
+ underutilization of Impala's parallel query features.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for details about setting up partitioning and
+ tuning the performance of queries on partitioned tables.
+ </p>
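+          <p class="p">
+            A minimal sketch of a partitioned table, with hypothetical table and column names:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE web_logs (ip STRING, url STRING, request_time TIMESTAMP)
+  PARTITIONED BY (log_year INT, log_month INT)
+  STORED AS PARQUET;</code></pre>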
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT ... VALUES</code> syntax is suitable for setting up toy tables with a few rows for
+ functional testing, but because each such statement creates a separate tiny file in HDFS, it is not a
+ scalable technique for loading megabytes or gigabytes (let alone petabytes) of data. Consider revising
+ your data load process to produce raw data files outside of Impala, then setting up Impala external
+ tables or using the <code class="ph codeph">LOAD DATA</code> statement to use those data files instantly in Impala
+ tables, with no conversion or indexing stage. See <a class="xref" href="impala_tables.html#external_tables">External Tables</a> and
+ <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> for details about the Impala techniques for working with
+ data files produced outside of Impala; see <a class="xref" href="impala_tutorial.html#tutorial_etl">Data Loading and Querying Examples</a> for examples
+ of ETL workflow for Impala.
+ </p>
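+          <p class="p">
+            A sketch of both techniques; the table names and HDFS paths are hypothetical:
+          </p>
+<pre class="pre codeblock"><code>-- Attach Impala to data files produced outside Impala.
+CREATE EXTERNAL TABLE raw_events (id BIGINT, payload STRING)
+  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
+  LOCATION '/user/etl/raw_events';
+
+-- Or move already-staged HDFS files into an existing table, with no conversion step.
+LOAD DATA INPATH '/user/etl/staging/events.csv' INTO TABLE raw_events;</code></pre>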
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If your ETL process is not optimized for Hadoop, you might end up with highly fragmented small data
+ files, or a single giant data file that cannot take advantage of distributed parallel queries or
+ partitioning. In this case, use an <code class="ph codeph">INSERT ... SELECT</code> statement to copy the data into a
+ new table and reorganize into a more efficient layout in the same operation. See
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details about the <code class="ph codeph">INSERT</code> statement.
+ </p>
+ <p class="p">
+ You can do <code class="ph codeph">INSERT ... SELECT</code> into a table with a more efficient file format (see
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>) or from an unpartitioned table into a partitioned
+ one (see <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>).
+ </p>
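+          <p class="p">
+            For example, the following sketch copies data from a hypothetical fragmented text table
+            into a compact, partitioned Parquet layout in a single operation:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE events_parquet (id BIGINT, payload STRING)
+  PARTITIONED BY (event_year INT)
+  STORED AS PARQUET;
+
+INSERT INTO events_parquet PARTITION (event_year)
+  SELECT id, payload, event_year FROM raw_events_text;</code></pre>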
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The number of expressions allowed in an Impala query might be smaller than for some other database
+ systems, causing failures for very complicated queries (typically produced by automated SQL
+ generators). Where practical, keep the number of expressions in the <code class="ph codeph">WHERE</code> clauses to
+ approximately 2000 or fewer. As a workaround, set the query option
+ <code class="ph codeph">DISABLE_CODEGEN=true</code> if queries fail for this reason. See
+ <a class="xref" href="impala_disable_codegen.html#disable_codegen">DISABLE_CODEGEN Query Option</a> for details.
+ </p>
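+          <p class="p">
+            The workaround can be applied for the current session only:
+          </p>
+<pre class="pre codeblock"><code>SET DISABLE_CODEGEN=true;
+-- Rerun the failing query, then restore the default:
+SET DISABLE_CODEGEN=false;</code></pre>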
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If practical, rewrite <code class="ph codeph">UNION</code> queries to use the <code class="ph codeph">UNION ALL</code> operator
+ instead. <span class="ph">Prefer <code class="ph codeph">UNION ALL</code> over <code class="ph codeph">UNION</code> when you know the
+ data sets are disjoint or duplicate values are not a problem; <code class="ph codeph">UNION ALL</code> is more efficient
+ because it avoids materializing and sorting the entire result set to eliminate duplicate values.</span>
+ </p>
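+          <p class="p">
+            For example, when duplicates are acceptable or known to be impossible:
+          </p>
+<pre class="pre codeblock"><code>-- Instead of:
+SELECT c1 FROM t1 UNION SELECT c1 FROM t2;
+-- prefer:
+SELECT c1 FROM t1 UNION ALL SELECT c1 FROM t2;</code></pre>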
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="porting__porting_next">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Next Porting Steps after Verifying Syntax and Semantics</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Throughout this section, some of the decisions you make during the porting process also have a substantial
+        impact on performance. After your SQL code is ported and working correctly, double-check the
+ performance-related aspects of your schema design, physical layout, and queries to make sure that the
+ ported application is taking full advantage of Impala's parallelism, performance-related SQL features, and
+ integration with Hadoop components.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Have you run the <code class="ph codeph">COMPUTE STATS</code> statement on each table involved in join queries? Have
+ you also run <code class="ph codeph">COMPUTE STATS</code> for each table used as the source table in an <code class="ph codeph">INSERT
+ ... SELECT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statement?
+ </li>
+
+ <li class="li">
+ Are you using the most efficient file format for your data volumes, table structure, and query
+ characteristics?
+ </li>
+
+ <li class="li">
+ Are you using partitioning effectively? That is, have you partitioned on columns that are often used for
+ filtering in <code class="ph codeph">WHERE</code> clauses? Have you partitioned at the right granularity so that there
+ is enough data in each partition to parallelize the work for each query?
+ </li>
+
+ <li class="li">
+ Does your ETL process produce a relatively small number of multi-megabyte data files (good) rather than a
+ huge number of small files (bad)?
+ </li>
+ </ul>
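+      <p class="p">
+        The statistics item in the checklist above can be addressed with statements like the
+        following; the table name is hypothetical:
+      </p>
+<pre class="pre codeblock"><code>COMPUTE STATS sales_data;
+SHOW TABLE STATS sales_data;
+SHOW COLUMN STATS sales_data;</code></pre>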
+
+ <p class="p">
+ See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details about the whole performance tuning
+ process.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_ports.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_ports.html b/docs/build3x/html/topics/impala_ports.html
new file mode 100644
index 0000000..5acc1b6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_ports.html
@@ -0,0 +1,421 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="ports"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Ports Used by Impala</title></head><body id="ports"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Ports Used by Impala</h1>
+
+
+ <div class="body conbody" id="ports__conbody_ports">
+
+ <p class="p">
+
+ Impala uses the TCP ports listed in the following table. Before deploying Impala, ensure these ports are open
+ on each system.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"><col style="width:9.090909090909092%"><col style="width:18.181818181818183%"><col style="width:27.27272727272727%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__1">
+ Component
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__2">
+ Service
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__3">
+ Port
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__4">
+ Access Requirement
+ </th>
+ <th class="entry nocellnorowborder" id="ports__conbody_ports__entry__5">
+ Comment
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by <code class="ph codeph">impala-shell</code> and
+ some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Frontend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 21050
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Used to transmit commands and receive results by applications, such as Business Intelligence tools,
+ using JDBC, the Beeswax query editor in Hue, and some ODBC drivers.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon Backend Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 22000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons use this port to communicate with each other.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. Impala daemons listen on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStoreSubscriber Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 23020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog daemon listens on this port for updates from the statestore daemon.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Impala Daemon HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Impala web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25010
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ StateStore web interface for administrators to monitor and troubleshoot.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Catalog HTTP Server Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 25020
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Catalog service web interface for administrators to monitor and troubleshoot. New in Impala 1.2 and
+ higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala StateStore Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ StateStore Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 24000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The statestore daemon listens on this port for registration/unregistration
+ requests.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Catalog Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+              Catalog Service Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 26000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. The catalog service uses this port to communicate with the Impala daemons. New
+ in Impala 1.2 and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Daemon
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Callback Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 28000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+              Internal use only. Impala daemons use this port to communicate with Llama. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Admin Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15002
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama Thrift Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15000
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ Internal
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Internal use only. New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__1 ">
+ <p class="p">
+ Impala Llama ApplicationMaster
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__2 ">
+ <p class="p">
+ Llama HTTP Port
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__3 ">
+ <p class="p">
+ 15001
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__4 ">
+ <p class="p">
+ External
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="ports__conbody_ports__entry__5 ">
+ <p class="p">
+ Llama service web interface for administrators to monitor and troubleshoot.
+ New in <span class="keyword">Impala 1.3</span> and higher.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
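+    <p class="p">
+      As a quick check after deployment, you can probe the externally visible ports from a client
+      machine; the hostname below is hypothetical:
+    </p>
+<pre class="pre codeblock"><code>nc -z -w 5 impala-host.example.com 21000 &amp;&amp; echo "shell port 21000 open"
+nc -z -w 5 impala-host.example.com 21050 &amp;&amp; echo "JDBC/ODBC port 21050 open"</code></pre>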
+ </div>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prefetch_mode.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prefetch_mode.html b/docs/build3x/html/topics/impala_prefetch_mode.html
new file mode 100644
index 0000000..b7cc1f5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prefetch_mode.html
@@ -0,0 +1,47 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prefetch_mode"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PREFETCH_MODE Query Option (Impala 2.6 or higher only)</title></head><body id="prefetch_mode"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PREFETCH_MODE Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Determines whether the prefetching optimization is applied during
+ join query processing.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric (0, 1)
+ or corresponding mnemonic strings (<code class="ph codeph">NONE</code>, <code class="ph codeph">HT_BUCKET</code>).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 1 (equivalent to <code class="ph codeph">HT_BUCKET</code>)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The default mode is 1, which means that hash table buckets are
+ prefetched during join query processing.
+ </p>
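+    <p class="p">
+      For example, to disable prefetching for the current session and later restore the default:
+    </p>
+<pre class="pre codeblock"><code>SET PREFETCH_MODE=NONE;
+-- ...run queries...
+SET PREFETCH_MODE=HT_BUCKET;</code></pre>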
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a>,
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a>.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_prereqs.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_prereqs.html b/docs/build3x/html/topics/impala_prereqs.html
new file mode 100644
index 0000000..88293d4
--- /dev/null
+++ b/docs/build3x/html/topics/impala_prereqs.html
@@ -0,0 +1,275 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_planning.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="prereqs"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Requirements</title></head><body id="prereqs"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Requirements</h1>
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ To perform as expected, Impala depends on the availability of the software, hardware, and configurations
+ described in the following sections.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_planning.html">Planning for Impala Deployment</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="prereqs__prereqs_os">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Supported Operating Systems</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ Apache Impala runs on Linux systems only. See the <span class="ph filepath">README.md</span>
+ file for more information.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="prereqs__prereqs_hive">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Hive Metastore and Related Configuration</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala can interoperate with data stored in Hive, and uses the same infrastructure as Hive for tracking
+ metadata about schema objects such as tables and columns. The following components are prerequisites for
+ Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ MySQL or PostgreSQL, to act as a metastore database for both Impala and Hive.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Installing and configuring a Hive metastore is an Impala requirement. Impala does not work without
+ the metastore database. For the process of installing and configuring the metastore, see
+ <a class="xref" href="impala_install.html#install">Installing Impala</a>.
+ </p>
+
+ <p class="p">
+ Always configure a <strong class="ph b">Hive metastore service</strong> rather than connecting directly to the metastore
+ database. The Hive metastore service is required to interoperate between different levels of
+ metastore APIs if this is necessary for your environment, and using it avoids known issues with
+ connecting directly to the metastore database.
+ </p>
+
+ <p class="p">
+ A summary of the metastore installation process is as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ Install a MySQL or PostgreSQL database. Start the database if it is not started after installation.
+ </li>
+
+ <li class="li">
+ Download the
+ <a class="xref" href="http://www.mysql.com/products/connector/" target="_blank">MySQL
+ connector</a> or the
+ <a class="xref" href="http://jdbc.postgresql.org/download.html" target="_blank">PostgreSQL
+ connector</a> and place it in the <code class="ph codeph">/usr/share/java/</code> directory.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to create the metastore database.
+ </li>
+
+ <li class="li">
+ Use the appropriate command line tool for your database to grant privileges for the metastore
+ database to the <code class="ph codeph">hive</code> user.
+ </li>
+
+ <li class="li">
+ Modify <code class="ph codeph">hive-site.xml</code> to include information matching your particular database: its
+ URL, username, and password. You will copy the <code class="ph codeph">hive-site.xml</code> file to the Impala
+ Configuration Directory later in the Impala installation process.
+ </li>
+ </ul>
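+          <p class="p">
+            For example, a MySQL-backed metastore might use <code class="ph codeph">hive-site.xml</code>
+            entries like the following; the host, user, and password values are placeholders:
+          </p>
+<pre class="pre codeblock"><code>&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionURL&lt;/name&gt;
+  &lt;value&gt;jdbc:mysql://metastore-host.example.com/metastore&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionDriverName&lt;/name&gt;
+  &lt;value&gt;com.mysql.jdbc.Driver&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionUserName&lt;/name&gt;
+  &lt;value&gt;hive&lt;/value&gt;
+&lt;/property&gt;
+&lt;property&gt;
+  &lt;name&gt;javax.jdo.option.ConnectionPassword&lt;/name&gt;
+  &lt;value&gt;hive_password&lt;/value&gt;
+&lt;/property&gt;</code></pre>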
+ </div>
+ </li>
+
+ <li class="li">
+ <strong class="ph b">Optional:</strong> Hive. Although only the Hive metastore database is required for Impala to function, you
+ might install Hive on some client machines to create and load data into tables that use certain file
+ formats. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. Hive does not need to be
+ installed on the same DataNodes as Impala; it just needs access to the same metastore database.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="prereqs__prereqs_java">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Java Dependencies</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ Although Impala is primarily written in C++, it does use Java to communicate with various Hadoop
+ components:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The officially supported JVM for Impala is the Oracle JVM. Other JVMs might cause issues, typically
+ resulting in a failure at <span class="keyword cmdname">impalad</span> startup. In particular, the JamVM used by default on
+ certain levels of Ubuntu systems can cause <span class="keyword cmdname">impalad</span> to fail to start.
+ </li>
+
+ <li class="li">
+ Internally, the <span class="keyword cmdname">impalad</span> daemon relies on the <code class="ph codeph">JAVA_HOME</code> environment
+ variable to locate the system Java libraries. Make sure the <span class="keyword cmdname">impalad</span> service is not run
+ from an environment with an incorrect setting for this variable.
+ </li>
+
+ <li class="li">
+ All Java dependencies are packaged in the <code class="ph codeph">impala-dependencies.jar</code> file, which is located
+ at <code class="ph codeph">/usr/lib/impala/lib/</code>. These map to everything that is built under
+ <code class="ph codeph">fe/target/dependency</code>.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="prereqs__prereqs_network">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Networking Configuration Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      As part of ensuring best performance, Impala attempts to complete tasks on local data rather than
+      using network connections to work with remote data. To support this goal, Impala matches
+      the <strong class="ph b">hostname</strong> provided to each Impala daemon with the <strong class="ph b">IP address</strong> of each DataNode by
+      resolving the hostname flag to an IP address. For Impala to work with local data, use a single IP interface
+      for the DataNode and the Impala daemon on each machine, and ensure that the Impala daemon's hostname
+      flag resolves to the IP address of the DataNode. For single-homed machines this is usually automatic,
+      but for multi-homed machines, ensure that the hostname resolves to the correct interface. Impala
+      tries to detect the correct hostname at start-up, and prints the derived hostname at the start of the log
+ </p>
+
+<pre class="pre codeblock"><code>Using hostname: impala-daemon-1.example.com</code></pre>
+
+ <p class="p">
+ In the majority of cases, this automatic detection works correctly. If you need to explicitly set the
+ hostname, do so by setting the <code class="ph codeph">--hostname</code> flag.
+ </p>
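+    <p class="p">
+      For example, on a multi-homed machine you might start the daemon with an explicit hostname;
+      the value shown is hypothetical:
+    </p>
+<pre class="pre codeblock"><code>impalad --hostname=impala-daemon-1.example.com</code></pre>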
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="prereqs__prereqs_hardware">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Hardware Requirements</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+
+
+
+
+ During join operations, portions of data from each joined table are loaded into memory. Data sets can be
+ very large, so ensure your hardware has sufficient memory to accommodate the joins you anticipate
+ completing.
+ </p>
+
+ <p class="p">
+ While requirements vary according to data set size, the following is generally recommended:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ CPU - Impala version 2.2 and higher uses the SSSE3 instruction set, which is included in newer processors.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ This required level of processor is the same as in Impala version 1.x. The Impala 2.0 and 2.1 releases
+ had a stricter requirement for the SSE4.1 instruction set, which has now been relaxed.
+ </div>
+
+ </li>
+
+ <li class="li">
+ Memory - 128 GB or more recommended, ideally 256 GB or more. If the intermediate results during query
+ processing on a particular node exceed the amount of memory available to Impala on that node, the query
+ writes temporary work data to disk, which can lead to long query times. Note that because the work is
+ parallelized, and intermediate results for aggregate queries are typically smaller than the original
+ data, Impala can query and join tables that are much larger than the memory available on an individual
+ node.
+ </li>
+
+ <li class="li">
+ Storage - DataNodes with 12 or more disks each. I/O speeds are often the limiting factor for disk
+ performance with Impala. Ensure that you have sufficient disk space to store the data Impala will be
+ querying.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="prereqs__prereqs_account">
+
+ <h2 class="title topictitle2" id="ariaid-title7">User Account Requirements</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Impala creates and uses a user and group named <code class="ph codeph">impala</code>. Do not delete this account or group
+ and do not modify the account's or group's permissions and rights. Ensure no existing systems obstruct the
+ functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in
+ a white-list, add these accounts to the list of permitted accounts.
+ </p>
+
+ <p class="p">
+ For correct file deletion during <code class="ph codeph">DROP TABLE</code> operations, Impala must be able to move files
+ to the HDFS trashcan. You might need to create an HDFS directory <span class="ph filepath">/user/impala</span>,
+ writeable by the <code class="ph codeph">impala</code> user, so that the trashcan can be created. Otherwise, data files
+ might remain behind after a <code class="ph codeph">DROP TABLE</code> statement.
+ </p>
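The trashcan setup described above can be sketched with standard HDFS shell commands. This is a hedged sketch, not a prescribed procedure: it assumes a running HDFS cluster where the `hdfs` account is the HDFS superuser, and that the trashcan location is the `impala` user's HDFS home directory.

```shell
# Sketch: create the HDFS home directory for the impala user so that
# DROP TABLE can move data files into the trashcan (.Trash under this path).
# Assumes the "hdfs" account is the HDFS superuser on your cluster.
sudo -u hdfs hdfs dfs -mkdir -p /user/impala
sudo -u hdfs hdfs dfs -chown impala /user/impala
```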
+
+ <p class="p">
+ Impala should not run as root. Best Impala performance is achieved using direct reads, but root is not
+ permitted to use direct reads. Therefore, running Impala as root negatively affects performance.
+ </p>
+
+ <p class="p">
+ By default, any user can connect to Impala and access all the associated databases and tables. You can
+ enable authorization and authentication based on the Linux OS user who connects to the Impala server, and
+ the associated groups for that user. See <a class="xref" href="impala_security.html#security">Impala Security</a> for details. These
+ security features do not change the underlying file permission requirements; the <code class="ph codeph">impala</code>
+ user still needs to be able to access the data files.
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_processes.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_processes.html b/docs/build3x/html/topics/impala_processes.html
new file mode 100644
index 0000000..4d64072
--- /dev/null
+++ b/docs/build3x/html/topics/impala_processes.html
@@ -0,0 +1,115 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="processes"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Starting Impala</title></head><body id="processes"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Starting Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ To activate Impala if it is installed but not yet started:
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Set any necessary configuration options for the Impala services. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala statestore. The statestore helps Impala to distribute work efficiently,
+ and to continue running in the event of availability problems for other Impala nodes. If the statestore
+ becomes unavailable, Impala continues to function.
+ </li>
+
+ <li class="li">
+ Start one instance of the Impala catalog service.
+ </li>
+
+ <li class="li">
+ Start the main Impala service on one or more DataNodes, ideally on all DataNodes to maximize local
+ processing and avoid network traffic due to remote reads.
+ </li>
+ </ol>
+
+ <p class="p">
+ Once Impala is running, you can conduct interactive experiments using the instructions in
+ <a class="xref" href="impala_tutorial.html#tutorial">Impala Tutorials</a> and try <a class="xref" href="impala_impala_shell.html#impala_shell">Using the Impala Shell (impala-shell Command)</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_options.html">Modifying Impala Startup Options</a></strong><br></li></ul></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="processes__starting_via_cmdline">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Starting Impala from the Command Line</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To start the Impala statestore and Impala from the command line or a script, you can either use the
+ <span class="keyword cmdname">service</span> command or you can start the daemons directly through the
+ <span class="keyword cmdname">impalad</span>, <code class="ph codeph">statestored</code>, and <span class="keyword cmdname">catalogd</span> executables.
+ </p>
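Starting the daemons directly, rather than through the service scripts, can be sketched as follows. This is a hedged sketch: it assumes the binaries are on the PATH, that default ports are in use, and that the log directory exists and is writable; the flag values shown are illustrative.

```shell
# Sketch: start the three Impala daemons directly, statestore first.
# --log_dir is a standard Impala daemon flag; the path is an example.
sudo -u impala statestored --log_dir=/var/log/impala &
sudo -u impala catalogd --log_dir=/var/log/impala &
sudo -u impala impalad --log_dir=/var/log/impala &
```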
+
+ <p class="p">
+ Start the Impala statestore and then start <code class="ph codeph">impalad</code> instances. You can modify the values
+ the service initialization scripts use when starting the statestore and Impala by editing
+ <code class="ph codeph">/etc/default/impala</code>.
+ </p>
+
+ <p class="p">
+ Start the statestore service using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-state-store start</code></pre>
+ </div>
+
+ <p class="p">
+ Start the catalog service using a command similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog start</code></pre>
+
+ <p class="p">
+ Start the Impala service on each DataNode using a command similar to the following:
+ </p>
+
+ <div class="p">
+<pre class="pre codeblock"><code>$ sudo service impala-server start</code></pre>
+ </div>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+ </div>
+
+ <div class="p">
+ If any of the services fail to start, review:
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_logging.html#logs_debug">Reviewing Impala Logs</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_troubleshooting.html#troubleshooting">Troubleshooting Impala</a>
+ </li>
+ </ul>
+ </div>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_files.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_files.html b/docs/build3x/html/topics/impala_security_files.html
new file mode 100644
index 0000000..b7fa280
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_files.html
@@ -0,0 +1,58 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="secure_files"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing Impala Data and Log Files</title></head><body id="secure_files"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing Impala Data and Log Files</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ One aspect of security is to protect files from unauthorized access at the filesystem level. For example, if
+ you store sensitive data in HDFS, you specify permissions on the associated files and directories in HDFS to
+ restrict read and write permissions to the appropriate users and groups.
+ </p>
+
+ <p class="p">
+ If you issue queries containing sensitive values in the <code class="ph codeph">WHERE</code> clause, such as financial
+ account numbers, those values are stored in Impala log files in the Linux filesystem and you must secure
+ those files also. For the locations of Impala log files, see <a class="xref" href="impala_logging.html#logging">Using Impala Logging</a>.
+ </p>
+
+ <p class="p">
+ All Impala read and write operations are performed under the filesystem privileges of the
+ <code class="ph codeph">impala</code> user. The <code class="ph codeph">impala</code> user must be able to read all directories and data
+ files that you query, and write into all the directories and data files for <code class="ph codeph">INSERT</code> and
+ <code class="ph codeph">LOAD DATA</code> statements. At a minimum, make sure the <code class="ph codeph">impala</code> user is in the
+ <code class="ph codeph">hive</code> group so that it can access files and directories shared between Impala and Hive. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for more details.
+ </p>
+
+ <p class="p">
+ Setting file permissions is necessary for Impala to function correctly, but is not an effective security
+ practice by itself:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The way to ensure that only authorized users can submit requests for databases and tables they are allowed
+ to access is to set up Sentry authorization, as explained in
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>. With authorization enabled, the checking of the user
+ ID and group is done by Impala, and unauthorized access is blocked by Impala itself. The actual low-level
+ read and write requests are still done by the <code class="ph codeph">impala</code> user, so you must have appropriate
+ file and directory permissions for that user ID.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You must also set up Kerberos authentication, as described in <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a>,
+ so that users can only connect from trusted hosts. With Kerberos enabled, if someone connects a new host to
+ the network and creates user IDs that match your privileged IDs, they will be blocked from connecting to
+ Impala at all from that host.
+ </p>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_guidelines.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_guidelines.html b/docs/build3x/html/topics/impala_security_guidelines.html
new file mode 100644
index 0000000..c8bc24c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_guidelines.html
@@ -0,0 +1,99 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_guidelines"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Security Guidelines for Impala</title></head><body id="security_guidelines"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Security Guidelines for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are the major steps to harden a cluster running Impala against accidents and mistakes, or
+ malicious attackers trying to access sensitive data:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Secure the <code class="ph codeph">root</code> account. The <code class="ph codeph">root</code> user can tamper with the
+ <span class="keyword cmdname">impalad</span> daemon, read and write the data files in HDFS, log into other user accounts, and
+ access other system services that are beyond the control of Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Restrict membership in the <code class="ph codeph">sudoers</code> list (in the <span class="ph filepath">/etc/sudoers</span> file).
+ The users who can run the <code class="ph codeph">sudo</code> command can do many of the same things as the
+ <code class="ph codeph">root</code> user.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure the Hadoop ownership and permissions for Impala data files are restricted.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure the Hadoop ownership and permissions for Impala log files are restricted.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Ensure that the Impala web UI (available by default on port 25000 on each Impala node) is
+ password-protected. See <a class="xref" href="impala_webui.html#webui">Impala Web User Interface for Debugging</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Create a policy file that specifies which Impala privileges are available to users in particular Hadoop
+ groups (which by default map to Linux OS groups). Create the associated Linux groups using the
+ <span class="keyword cmdname">groupadd</span> command if necessary.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The Impala authorization feature makes use of the HDFS file ownership and permissions mechanism; for
+ background information, see the
+ <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html" target="_blank">HDFS Permissions Guide</a>.
+ Set up users and assign them to groups at the OS level, corresponding to the
+ different categories of users with different access levels for various databases, tables, and HDFS
+ locations (URIs). Create the associated Linux users using the <span class="keyword cmdname">useradd</span> command if
+ necessary, and add them to the appropriate groups with the <span class="keyword cmdname">usermod</span> command.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Design your databases, tables, and views so that policy rules can be simple and
+ consistent. For example, if all tables related to an application are inside a single
+ database, you can assign privileges for that database and use the <code class="ph codeph">*</code> wildcard for the table
+ name. If you are creating views with different privileges than the underlying base tables, you might put
+ the views in a separate database so that you can use the <code class="ph codeph">*</code> wildcard for the database
+ containing the base tables, while specifying the precise names of the individual views. (For specifying
+ table or database names, you either specify the exact name or <code class="ph codeph">*</code> to mean all the databases
+ on a server, or all the tables and views in a database.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Enable authorization by running the <code class="ph codeph">impalad</code> daemons with the <code class="ph codeph">-server_name</code>
+ and <code class="ph codeph">-authorization_policy_file</code> options on all nodes. (The authorization feature does not
+ apply to the <span class="keyword cmdname">statestored</span> daemon, which has no access to schema objects or data files.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Set up authentication using Kerberos, to make sure users really are who they say they are.
+ </p>
+ </li>
+ </ul>
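The OS-level group and user setup mentioned in the list above can be sketched as follows. The group and user names are illustrative assumptions, and the commands require root privileges.

```shell
# Sketch: one Linux group per access category, mapped to a Hadoop group.
# "analysts", "alice", and "bob" are hypothetical names.
groupadd analysts          # create the group
useradd -G analysts alice  # create a user and add to the group
usermod -aG analysts bob   # add an existing user to the group
```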
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_install.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_install.html b/docs/build3x/html/topics/impala_security_install.html
new file mode 100644
index 0000000..09d4e38
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_install.html
@@ -0,0 +1,17 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_install"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Installation Considerations for Impala Security</title></head><body id="security_install"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Installation Considerations for Impala Security</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala 1.1 comes set up with all the software and settings needed to enable security when you run the
+ <span class="keyword cmdname">impalad</span> daemon with the new security-related options (<code class="ph codeph">-server_name</code> and
+ <code class="ph codeph">-authorization_policy_file</code>). You do not need to change any environment variables or install
+ any additional JAR files.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_metastore.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_metastore.html b/docs/build3x/html/topics/impala_security_metastore.html
new file mode 100644
index 0000000..b9034a8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_metastore.html
@@ -0,0 +1,30 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_metastore"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Hive Metastore Database</title></head><body id="security_metastore"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing the Hive Metastore Database</h1>
+
+
+ <div class="body conbody">
+
+
+
+ <p class="p">
+ It is important to secure the Hive metastore, so that users cannot access the names or other information
+ about databases and tables through the Hive client or by querying the metastore database. Do this by
+ turning on Hive metastore security, using the instructions in
+ <span class="xref">the documentation for your Apache Hadoop distribution</span> for securing different Hive components:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Secure the Hive Metastore.
+ </li>
+
+ <li class="li">
+ In addition, allow access to the metastore only from the HiveServer2 server, and then disable local access
+ to the HiveServer2 server.
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_security_webui.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_security_webui.html b/docs/build3x/html/topics/impala_security_webui.html
new file mode 100644
index 0000000..44f7a19
--- /dev/null
+++ b/docs/build3x/html/topics/impala_security_webui.html
@@ -0,0 +1,57 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="security_webui"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Securing the Impala Web User Interface</title></head><body id="security_webui"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Securing the Impala Web User Interface</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The instructions in this section presume you are familiar with the
+ <a class="xref" href="http://en.wikipedia.org/wiki/.htpasswd" target="_blank">
+ <span class="ph filepath">.htpasswd</span> mechanism</a> commonly used to password-protect pages on web servers.
+ </p>
+
+ <p class="p">
+ Password-protect the Impala web UI that listens on port 25000 by default. Set up a
+ <span class="ph filepath">.htpasswd</span> file in the <code class="ph codeph">$IMPALA_HOME</code> directory, or start both the
+ <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+ <code class="ph codeph">--webserver_password_file</code> option to specify a different location (including the filename).
+ </p>
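Generating an entry for the password file can be sketched as follows. This is a hedged sketch: it assumes the file uses the HTTP digest (htdigest-style) format <code>user:domain:MD5("user:domain:password")</code>, which you should verify against your Impala version, and the username, domain, and password shown are illustrative.

```shell
# Sketch: append one entry to the web UI password file.
# ASSUMPTION: the file uses the htdigest-style format
#   user:domain:MD5("user:domain:password")
WEBUI_USER=admin          # illustrative username
WEBUI_DOMAIN=mydomain.com # must match --webserver_authentication_domain
WEBUI_PASS=secret         # illustrative password

HASH=$(printf '%s:%s:%s' "$WEBUI_USER" "$WEBUI_DOMAIN" "$WEBUI_PASS" | md5sum | cut -d' ' -f1)
printf '%s:%s:%s\n' "$WEBUI_USER" "$WEBUI_DOMAIN" "$HASH" >> .htpasswd
chmod 600 .htpasswd       # keep readable only by Impala and administrators
```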
+
+ <p class="p">
+ This file should only be readable by the Impala process and machine administrators, because it contains
+ (hashed) versions of passwords. The username / password pairs are not derived from Unix usernames, Kerberos
+ users, or any other system. The <code class="ph codeph">domain</code> field in the password file must match the domain
+ supplied to Impala by the new command-line option <code class="ph codeph">--webserver_authentication_domain</code>. The
+ default is <code class="ph codeph">mydomain.com</code>.
+
+ </p>
+
+ <p class="p">
+ Impala also supports using HTTPS for secure web traffic. To do so, set
+ <code class="ph codeph">--webserver_certificate_file</code> to refer to a valid <code class="ph codeph">.pem</code> TLS/SSL certificate file.
+ Impala will automatically start using HTTPS once the TLS/SSL certificate has been read and validated. A
+ <code class="ph codeph">.pem</code> file consists of a private key followed by a signed TLS/SSL certificate; make sure to
+ concatenate both parts when constructing the <code class="ph codeph">.pem</code> file.
+
+ </p>
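The concatenation step described above can be sketched as follows. The file names are illustrative, and the key and certificate bodies below are placeholders standing in for real base64-encoded material.

```shell
# Sketch: build the .pem file for --webserver_certificate_file by
# concatenating the private key and the signed certificate, key first.
# server.key and server.crt stand in for your real key and certificate.
cat > server.key <<'EOF'
-----BEGIN PRIVATE KEY-----
(placeholder key material)
-----END PRIVATE KEY-----
EOF
cat > server.crt <<'EOF'
-----BEGIN CERTIFICATE-----
(placeholder certificate)
-----END CERTIFICATE-----
EOF
cat server.key server.crt > server.pem  # key first, then certificate
chmod 600 server.pem                    # restrict access to the private key
```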
+
+ <p class="p">
+ If Impala cannot find or parse the <code class="ph codeph">.pem</code> file, it prints an error message and quits.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ If the private key is encrypted using a passphrase, Impala will ask for that passphrase on startup, which
+ is not useful for a large cluster. In that case, remove the passphrase and make the <code class="ph codeph">.pem</code>
+ file readable only by Impala and administrators.
+ </p>
+ <p class="p">
+ When you turn on TLS/SSL for the Impala web UI, the associated URLs change from <code class="ph codeph">http://</code>
+ prefixes to <code class="ph codeph">https://</code>. Adjust any bookmarks or application code that refers to those URLs.
+ </p>
+ </div>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_select.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_select.html b/docs/build3x/html/topics/impala_select.html
new file mode 100644
index 0000000..9d99913
--- /dev/null
+++ b/docs/build3x/html/topics/impala_select.html
@@ -0,0 +1,236 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_joins.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_order_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_by.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_having.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_limit.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_offset.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_union.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_subqueries.html"><meta name="DC.Relation" scheme="U
RI" content="../topics/impala_tablesample.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_with.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_distinct.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="select"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SELECT Statement</title></head><body id="select"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SELECT Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">SELECT</code> statement performs queries, retrieving data from one or more tables and producing
+ result sets consisting of rows and columns.
+ </p>
+
+ <p class="p">
+ The Impala <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code> statement also typically ends
+ with a <code class="ph codeph">SELECT</code> statement, to define data to copy from one table to another.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>[WITH <em class="ph i">name</em> AS (<em class="ph i">select_expression</em>) [, ...] ]
+SELECT
+ [ALL | DISTINCT]
+ [STRAIGHT_JOIN]
+ <em class="ph i">expression</em> [, <em class="ph i">expression</em> ...]
+FROM <em class="ph i">table_reference</em> [, <em class="ph i">table_reference</em> ...]
+[[FULL | [LEFT | RIGHT] INNER | [LEFT | RIGHT] OUTER | [LEFT | RIGHT] SEMI | [LEFT | RIGHT] ANTI | CROSS]
+ JOIN <em class="ph i">table_reference</em>
+ [ON <em class="ph i">join_equality_clauses</em> | USING (<var class="keyword varname">col1</var>[, <var class="keyword varname">col2</var> ...])]] ...
+WHERE <em class="ph i">conditions</em>
+GROUP BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [, ...] }
+HAVING <em class="ph i">conditions</em>
+ORDER BY { <em class="ph i">column</em> | <em class="ph i">expression</em> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, ...] }
+LIMIT <em class="ph i">expression</em> [OFFSET <em class="ph i">expression</em>]
+[UNION [ALL] <em class="ph i">select_statement</em>] ...
+
+table_reference := { <var class="keyword varname">table_name</var> | (<var class="keyword varname">subquery</var>) }
+ <span class="ph">[ TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)] ]</span>
+</code></pre>
+
+ <p class="p">
+ Impala <code class="ph codeph">SELECT</code> queries support:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ SQL scalar data types: <code class="ph codeph"><a class="xref" href="impala_boolean.html#boolean">BOOLEAN</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_tinyint.html#tinyint">TINYINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_smallint.html#smallint">SMALLINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_int.html#int">INT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_bigint.html#bigint">BIGINT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_decimal.html#decimal">DECIMAL</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_float.html#float">FLOAT</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_double.html#double">DOUBLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_string.html#string">STRING</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_varchar.html#varchar">VARCHAR</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_char.html#char">CHAR</a></code>.
+ </li>
+
+
+ <li class="li">
+ The complex data types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>
+ are available in <span class="keyword">Impala 2.3</span> and higher.
+ Queries involving these types typically involve special qualified names
+ using dot notation for referring to the complex column fields,
+ and join clauses for bringing the complex columns into the result set.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details.
+ </li>
+
+ <li class="li">
+ An optional <a class="xref" href="impala_with.html#with"><code class="ph codeph">WITH</code> clause</a> before the
+ <code class="ph codeph">SELECT</code> keyword, to define a subquery whose name or column names can be referenced from
+ later in the main query. This clause lets you abstract repeated clauses, such as aggregation functions,
+ that are referenced multiple times in the same query.
+ </li>
+
+ <li class="li">
+ By default, one <code class="ph codeph">DISTINCT</code> clause per query. See <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>
+ for details. See <a class="xref" href="impala_appx_count_distinct.html#appx_count_distinct">APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</a> for a query option to
+ allow multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions in the same query.
+ </li>
+
+ <li class="li">
+ Subqueries in a <code class="ph codeph">FROM</code> clause. In <span class="keyword">Impala 2.0</span> and higher,
+ subqueries can also go in the <code class="ph codeph">WHERE</code> clause, for example with the
+ <code class="ph codeph">IN()</code>, <code class="ph codeph">EXISTS</code>, and <code class="ph codeph">NOT EXISTS</code> operators.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">WHERE</code>, <code class="ph codeph">GROUP BY</code>, and <code class="ph codeph">HAVING</code> clauses.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_order_by.html#order_by">ORDER BY</a></code>. Prior to Impala 1.4.0, Impala
+ required that queries using an <code class="ph codeph">ORDER BY</code> clause also include a
+ <code class="ph codeph"><a class="xref" href="impala_limit.html#limit">LIMIT</a></code> clause. In Impala 1.4.0 and higher, this
+ restriction is lifted; sort operations that would exceed the Impala memory limit automatically use a
+ temporary disk work area to perform the sort.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala supports a wide variety of <code class="ph codeph">JOIN</code> clauses. Left, right, semi, full, and outer joins
+ are supported in all Impala versions. The <code class="ph codeph">CROSS JOIN</code> operator is available in Impala 1.2.2
+ and higher. During performance tuning, you can override the reordering of join clauses that Impala does
+ internally by including the keyword <code class="ph codeph">STRAIGHT_JOIN</code> immediately after the
+ <code class="ph codeph">SELECT</code> and any <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">ALL</code> keywords.
+ </p>
+ <p class="p">
+ See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details and examples of join queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">UNION ALL</code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LIMIT</code>.
+ </li>
+
+ <li class="li">
+ External tables.
+ </li>
+
+ <li class="li">
+ Relational operators such as greater than, less than, or equal to.
+ </li>
+
+ <li class="li">
+ Arithmetic operators such as addition or subtraction.
+ </li>
+
+ <li class="li">
+ Logical/Boolean operators <code class="ph codeph">AND</code>, <code class="ph codeph">OR</code>, and <code class="ph codeph">NOT</code>. Impala does
+ not support the corresponding symbols <code class="ph codeph">&&</code>, <code class="ph codeph">||</code>, and
+ <code class="ph codeph">!</code>.
+ </li>
+
+ <li class="li">
+ Common SQL built-in functions such as <code class="ph codeph">COUNT</code>, <code class="ph codeph">SUM</code>, <code class="ph codeph">CAST</code>,
+ <code class="ph codeph">LIKE</code>, <code class="ph codeph">IN</code>, <code class="ph codeph">BETWEEN</code>, and <code class="ph codeph">COALESCE</code>. Impala
+ specifically supports built-ins described in <a class="xref" href="impala_functions.html#builtins">Impala Built-In Functions</a>.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.9</span> and higher, an optional <code class="ph codeph">TABLESAMPLE</code>
+ clause immediately after a table reference, to specify that the query only processes a
+ specified percentage of the table data. See <a class="xref" href="impala_tablesample.html">TABLESAMPLE Clause</a> for details.
+ </li>
+ </ul>
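+
+ <p class="p">
+ For example, a single query can combine several of these features. The table and column
+ names in this sketch (<code class="ph codeph">orders</code>, <code class="ph codeph">customers</code>, and so on) are hypothetical:
+ </p>
+
+<pre class="pre codeblock"><code>with big_spenders as
+ (select customer_id, sum(total) as lifetime_total
+ from orders
+ group by customer_id
+ having sum(total) &gt; 1000)
+select c.name, b.lifetime_total
+ from big_spenders b join customers c on (b.customer_id = c.customer_id)
+ where c.region in ('EMEA', 'Asia')
+ order by b.lifetime_total desc
+ limit 10;</code></pre>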
+
+ <p class="p">
+ Impala queries ignore files with extensions commonly used for temporary work files by Hadoop tools. Any
+ files with extensions <code class="ph codeph">.tmp</code> or <code class="ph codeph">.copying</code> are not considered part of the
+ Impala table. The suffix matching is case-insensitive, so for example Impala ignores both
+ <code class="ph codeph">.copying</code> and <code class="ph codeph">.COPYING</code> suffixes.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
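+
+ <p class="p">
+ For example, to make Impala treat S3 data files as 128 MB blocks, you might set the
+ property in <span class="ph filepath">core-site.xml</span> as follows (adjust the value to
+ match the row group size of your own files):
+ </p>
+
+<pre class="pre codeblock"><code>&lt;property&gt;
+ &lt;name&gt;fs.s3a.block.size&lt;/name&gt;
+ &lt;!-- 128 MB expressed in bytes. --&gt;
+ &lt;value&gt;134217728&lt;/value&gt;
+&lt;/property&gt;</code></pre>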
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Can be cancelled. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permissions for the files in all applicable directories in all source tables,
+ and read and execute permissions for the relevant data directories.
+ (A <code class="ph codeph">SELECT</code> operation could read files from multiple different HDFS directories
+ if the source table is partitioned.)
+ If a query attempts to read a data file and is unable to because of an HDFS permission error,
+ the query halts and does not return any further results.
+ </p>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SELECT</code> syntax is so extensive that it forms its own category of statements: queries. The
+ other major classifications of SQL statements are data definition language (see
+ <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and data manipulation language (see <a class="xref" href="impala_dml.html#dml">DML Statements</a>).
+ </p>
+
+ <p class="p">
+ Because the focus of Impala is on fast queries with interactive response times over huge data sets, query
+ performance and scalability are important considerations. See
+ <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> and <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for
+ details.
+ </p>
+ </div>
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_joins.html">Joins in Impala SELECT Statements</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_order_by.html">ORDER BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_by.html">GROUP BY Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_having.html">HAVING Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_limit.html">LIMIT Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_offset.html">OFFSET Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_union.html">UNION Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_subqueries.html">Subqueries in Impala SELECT Statements</a></strong><
br></li><li class="link ulchildlink"><strong><a href="../topics/impala_tablesample.html">TABLESAMPLE Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_with.html">WITH Clause</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_distinct.html">DISTINCT Operator</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_seqfile.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_seqfile.html b/docs/build3x/html/topics/impala_seqfile.html
new file mode 100644
index 0000000..5899ba3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_seqfile.html
@@ -0,0 +1,240 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_file_formats.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="seqfile"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using the SequenceFile File Format with Impala Tables</title></head><body id="seqfile"><main role="main"><article role="article" aria-labelledby="seqfile__sequencefile">
+
+ <h1 class="title topictitle1" id="seqfile__sequencefile">Using the SequenceFile File Format with Impala Tables</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports using SequenceFile data files.
+ </p>
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">SequenceFile Format Support in Impala</span></caption><colgroup><col style="width:10%"><col style="width:10%"><col style="width:20%"><col style="width:30%"><col style="width:30%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="seqfile__entry__1">
+ File Type
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__2">
+ Format
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__3">
+ Compression Codecs
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__4">
+ Impala Can CREATE?
+ </th>
+ <th class="entry nocellnorowborder" id="seqfile__entry__5">
+ Impala Can INSERT?
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="seqfile__entry__1 ">
+ <a class="xref" href="impala_seqfile.html#seqfile">SequenceFile</a>
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__2 ">
+ Structured
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__3 ">
+ Snappy, gzip, deflate, bzip2
+ </td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__4 ">Yes.</td>
+ <td class="entry nocellnorowborder" headers="seqfile__entry__5 ">
+ No. Import data by using <code class="ph codeph">LOAD DATA</code> on data files already in the right format, or use
+ <code class="ph codeph">INSERT</code> in Hive followed by <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> in Impala.
+ </td>
+
+ </tr>
+ </tbody></table>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_file_formats.html">How Impala Works with Hadoop File Formats</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="seqfile__seqfile_create">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Creating SequenceFile Tables and Loading Data</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If you do not have an existing data file to use, begin by creating one in the appropriate format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To create a SequenceFile table:</strong>
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">impala-shell</code> interpreter, issue a command similar to:
+ </p>
+
+<pre class="pre codeblock"><code>create table sequencefile_table (<var class="keyword varname">column_specs</var>) stored as sequencefile;</code></pre>
+
+ <p class="p">
+ Because Impala can query some kinds of tables that it cannot currently write to, after creating tables of
+ certain file formats, you might use the Hive shell to load the data. See
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details. After loading data into a table through
+ Hive or other mechanism outside of Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement the next time you connect to the Impala node, before querying the table, to make Impala recognize
+ the new data.
+ </p>
+
+ <p class="p">
+ For example, here is how you might create some SequenceFile tables in Impala (by specifying the columns
+ explicitly, or cloning the structure of another table), load data through Hive, and query them through
+ Impala:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell -i localhost
+[localhost:21000] > create table seqfile_table (x int) stored as sequencefile;
+[localhost:21000] > create table seqfile_clone like some_other_table stored as sequencefile;
+[localhost:21000] > quit;
+
+$ hive
+hive> insert into table seqfile_table select x from some_other_table;
+3 Rows loaded to seqfile_table
+Time taken: 19.047 seconds
+hive> quit;
+
+$ impala-shell -i localhost
+[localhost:21000] > select * from seqfile_table;
+Returned 0 row(s) in 0.23s
+[localhost:21000] > -- Make Impala recognize the data loaded through Hive;
+[localhost:21000] > refresh seqfile_table;
+[localhost:21000] > select * from seqfile_table;
++---+
+| x |
++---+
+| 1 |
+| 2 |
+| 3 |
++---+
+Returned 3 row(s) in 0.23s</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ Although you can create tables in this file format using
+ the complex types (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>,
+ and <code class="ph codeph">MAP</code>) available in <span class="keyword">Impala 2.3</span> and higher,
+ currently, Impala can query these types only in Parquet tables.
+ <span class="ph">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on SequenceFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </span>
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="seqfile__seqfile_compression">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Enabling Compression for SequenceFile Tables</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You may want to enable compression on existing tables. Enabling compression provides performance gains in
+ most cases and is supported for SequenceFile tables. For example, to enable Snappy compression, you would
+ specify the following additional settings when loading data through the Hive shell:
+ </p>
+
+<pre class="pre codeblock"><code>hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> insert overwrite table <var class="keyword varname">new_table</var> select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ If you are converting partitioned tables, you must complete additional steps. In such a case, specify
+ additional settings similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> create table <var class="keyword varname">new_table</var> (<var class="keyword varname">your_cols</var>) partitioned by (<var class="keyword varname">partition_cols</var>) stored as <var class="keyword varname">new_format</var>;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> insert overwrite table <var class="keyword varname">new_table</var> partition(<var class="keyword varname">comma_separated_partition_cols</var>) select * from <var class="keyword varname">old_table</var>;</code></pre>
+
+ <p class="p">
+ Note that you do not need to tell Hive the file format of the source table. To convert a
+ table to a Snappy-compressed SequenceFile, you would combine the settings outlined
+ previously and specify statements similar to the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> create table TBL_SEQ (int_col int, string_col string) STORED AS SEQUENCEFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_seq SELECT * FROM tbl;</code></pre>
+
+ <p class="p">
+ To complete a similar process for a table that includes partitions, you would specify settings similar to
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>hive> CREATE TABLE tbl_seq (int_col INT, string_col STRING) PARTITIONED BY (year INT) STORED AS SEQUENCEFILE;
+hive> SET hive.exec.compress.output=true;
+hive> SET mapred.max.split.size=256000000;
+hive> SET mapred.output.compression.type=BLOCK;
+hive> SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
+hive> SET hive.exec.dynamic.partition.mode=nonstrict;
+hive> SET hive.exec.dynamic.partition=true;
+hive> INSERT OVERWRITE TABLE tbl_seq PARTITION(year) SELECT * FROM tbl;</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The compression type is specified in the following command:
+ </p>
+<pre class="pre codeblock"><code>SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;</code></pre>
+ <p class="p">
+ You could elect to specify alternative codecs such as <code class="ph codeph">GzipCodec</code> here.
+ </p>
+ </div>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="seqfile__seqfile_performance">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Query Performance for Impala SequenceFile Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In general, expect query performance with SequenceFile tables to be
+ faster than with tables using text data, but slower than with
+ Parquet tables. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for information about using the Parquet file format for
+ high-performance analytic queries.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_set.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_set.html b/docs/build3x/html/topics/impala_set.html
new file mode 100644
index 0000000..4dd5f77
--- /dev/null
+++ b/docs/build3x/html/topics/impala_set.html
@@ -0,0 +1,280 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="set"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>SET Statement</title></head><body id="set"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">SET Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies values for query options that control the runtime behavior of other statements within the same
+ session.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, <code class="ph codeph">SET</code> also defines user-specified substitution variables for
+ the <span class="keyword cmdname">impala-shell</span> interpreter. This feature uses the <code class="ph codeph">SET</code> command
+ built into <span class="keyword cmdname">impala-shell</span> instead of the SQL <code class="ph codeph">SET</code> statement.
+ Therefore the substitution mechanism only works with queries processed by <span class="keyword cmdname">impala-shell</span>,
+ not with queries submitted through JDBC or ODBC.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, the output of the <code class="ph codeph">SET</code>
+ statement changes in some important ways:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The options are divided into groups: <code class="ph codeph">Regular Query Options</code>,
+ <code class="ph codeph">Advanced Query Options</code>, <code class="ph codeph">Development Query Options</code>, and
+ <code class="ph codeph">Deprecated Query Options</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The advanced options are intended for use in specific
+ kinds of performance tuning and debugging scenarios. The development options are
+ related to internal development of Impala or features that are not yet finalized;
+ these options might be changed or removed without notice.
+ The deprecated options are related to features that are removed or changed so that
+ the options no longer have any purpose; these options might be removed in future
+ versions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ By default, only the first two groups (regular and advanced) are
+ displayed by the <code class="ph codeph">SET</code> command. Use the syntax <code class="ph codeph">SET ALL</code>
+ to see all groups of options.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> options and user-specified variables are always displayed
+ at the end of the list of query options, after all appropriate option groups.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ When the <code class="ph codeph">SET</code> command is run through the JDBC or ODBC interfaces,
+ the result set has a new third column, <code class="ph codeph">level</code>, indicating which
+ group each option belongs to. The same distinction of <code class="ph codeph">SET</code>
+ returning the regular and advanced options, and <code class="ph codeph">SET ALL</code>
+ returning all option groups, applies to JDBC and ODBC also.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET [<var class="keyword varname">query_option</var>=<var class="keyword varname">option_value</var>]
+<span class="ph">SET ALL</span>
+</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">SET</code> and <code class="ph codeph">SET ALL</code> with no arguments return a
+ result set consisting of all the applicable query options and their current values.
+ </p>
+
+ <p class="p">
+ The query option name and any string argument values are case-insensitive.
+ </p>
+
+ <p class="p">
+ Each query option has a specific allowed notation for its arguments. Boolean options can be enabled and
+ disabled by assigning a value of either <code class="ph codeph">true</code> or <code class="ph codeph">false</code>, or
+ <code class="ph codeph">1</code> or <code class="ph codeph">0</code>. Some numeric options accept a final character signifying the unit,
+ such as <code class="ph codeph">2g</code> for 2 gigabytes or <code class="ph codeph">100m</code> for 100 megabytes. See
+ <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the details of each query option.
+ </p>
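+
+ <p class="p">
+ For example, the following statements illustrate the Boolean and unit-suffix notations,
+ using query options discussed elsewhere in this documentation:
+ </p>
+
+<pre class="pre codeblock"><code>set mem_limit=2g;
+set appx_count_distinct=true;
+set abort_on_error=0;</code></pre>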
+
+ <p class="p">
+ <strong class="ph b">Setting query options during impala-shell invocation:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use one or more command-line options
+ of the form <code class="ph codeph">--query_option=<var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>
+ when running the <span class="keyword cmdname">impala-shell</span> command. The corresponding query option settings
+ take effect for that <span class="keyword cmdname">impala-shell</span> session.
+ </p>
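+
+ <p class="p">
+ For example, the following invocation (with illustrative option values) sets two query
+ options for the duration of the session:
+ </p>
+
+<pre class="pre codeblock"><code>$ impala-shell --query_option=mem_limit=2g --query_option=abort_on_error=1</code></pre>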
+
+ <p class="p">
+ <strong class="ph b">User-specified substitution variables:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can specify your own names and string substitution values
+ within the <span class="keyword cmdname">impala-shell</span> interpreter. Once a substitution variable is set up,
+ its value is inserted into any SQL statement in that same <span class="keyword cmdname">impala-shell</span> session
+ that contains the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>.
+ Using <code class="ph codeph">SET</code> in an interactive <span class="keyword cmdname">impala-shell</span> session overrides
+ any value for that same variable passed in through the <code class="ph codeph">--var=<var class="keyword varname">varname</var>=<var class="keyword varname">value</var></code>
+ command-line option.
+ </p>
+
+ <p class="p">
+ For example, to set up some default parameters for report queries, but then override those default
+ within an <span class="keyword cmdname">impala-shell</span> session, you might issue commands and statements such as
+ the following:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Initial setup for this example.
+create table staging_table (s string);
+insert into staging_table values ('foo'), ('bar'), ('bletch');
+
+create table production_table (s string);
+insert into production_table values ('North America'), ('EMEA'), ('Asia');
+quit;
+
+-- Start impala-shell with user-specified substitution variables,
+-- run a query, then override the variables with SET and run the query again.
+$ impala-shell --var=table_name=staging_table --var=cutoff=2
+... <var class="keyword varname">banner message</var> ...
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from staging_table order by s limit 2
++--------+
+| s |
++--------+
+| bar |
+| bletch |
++--------+
+Fetched 2 row(s) in 1.06s
+
+[localhost:21000] > set var:table_name=production_table;
+Variable TABLE_NAME set to production_table
+[localhost:21000] > set var:cutoff=3;
+Variable CUTOFF set to 3
+
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Query: select s from production_table order by s limit 3
++---------------+
+| s |
++---------------+
+| Asia |
+| EMEA |
+| North America |
++---------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">SET ALL</code> with no parameters displays
+ all groups of query options, shell options, and user-specified substitution variables, and how
+ <code class="ph codeph">UNSET</code> removes a substitution variable entirely:
+ </p>
+
+<pre class="pre codeblock"><code>
+[localhost:21000] > set all;
+Query options (defaults shown in []):
+ABORT_ON_ERROR: [0]
+COMPRESSION_CODEC: []
+DISABLE_CODEGEN: [0]
+...
+
+Advanced Query Options:
+APPX_COUNT_DISTINCT: [0]
+BUFFER_POOL_LIMIT: []
+DEFAULT_JOIN_DISTRIBUTION_MODE: [0]
+...
+
+Development Query Options:
+BATCH_SIZE: [0]
+DEBUG_ACTION: []
+DECIMAL_V2: [0]
+...
+
+Deprecated Query Options:
+ABORT_ON_DEFAULT_LIMIT_EXCEEDED: [0]
+ALLOW_UNSUPPORTED_FORMATS: [0]
+DEFAULT_ORDER_BY_LIMIT: [-1]
+...
+
+Shell Options
+ LIVE_PROGRESS: False
+ LIVE_SUMMARY: False
+
+Variables:
+ CUTOFF: 3
+ TABLE_NAME: production_table
+
+[localhost:21000] > unset var:cutoff;
+Unsetting variable CUTOFF
+[localhost:21000] > select s from ${var:table_name} order by s limit ${var:cutoff};
+Error: Unknown variable CUTOFF
+</code></pre>
+
+ <p class="p">
+ See <a class="xref" href="impala_shell_running_commands.html">Running Commands and SQL Statements in impala-shell</a> for more examples of using the
+ <code class="ph codeph">--var</code>, <code class="ph codeph">SET</code>, and <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+ substitution technique in <span class="keyword cmdname">impala-shell</span>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">MEM_LIMIT</code> is probably the most commonly used query option. You can specify a high value to
+ allow a resource-intensive query to complete. For testing how queries would work on memory-constrained
+ systems, you might specify an artificially low value.
+ </p>
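+
+ <p class="p">
+ For example (the values shown are illustrative; choose limits appropriate for your cluster):
+ </p>
+
+<pre class="pre codeblock"><code>-- Allow a resource-intensive query to use up to 10 GB of memory per host.
+set mem_limit=10g;
+
+-- Simulate a memory-constrained system for testing.
+set mem_limit=200m;
+</code></pre>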
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example sets some numeric and some Boolean query options to control usage of memory, disk
+ space, and timeout periods, then runs a query whose success could depend on the options in effect:
+ </p>
+
+<pre class="pre codeblock"><code>set mem_limit=64g;
+set DISABLE_UNSAFE_SPILLS=true;
+set parquet_file_size=400m;
+set RESERVATION_REQUEST_TIMEOUT=900000;
+insert overwrite parquet_table select c1, c2, count(c3) from text_table group by c1, c2;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">SET</code> has always been available as an <span class="keyword cmdname">impala-shell</span> command. Promoting it to
+ a SQL statement lets you use this feature in client applications through the JDBC and ODBC APIs.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the query options you can adjust using this
+ statement.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_query_options.html">Query Options for the SET Statement</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_shell_commands.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_shell_commands.html b/docs/build3x/html/topics/impala_shell_commands.html
new file mode 100644
index 0000000..1d67a69
--- /dev/null
+++ b/docs/build3x/html/topics/impala_shell_commands.html
@@ -0,0 +1,416 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="shell_commands"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>impala-shell Command Reference</title></head><body id="shell_commands"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">impala-shell Command Reference</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Use the following commands within <code class="ph codeph">impala-shell</code> to pass requests to the
+ <code class="ph codeph">impalad</code> daemon that the shell is connected to. You can enter a command interactively at the
+ prompt, or pass it as the argument to the <code class="ph codeph">-q</code> option of <code class="ph codeph">impala-shell</code>. Most
+ of these commands are passed to the Impala daemon as SQL statements; refer to the corresponding
+ <a class="xref" href="impala_langref_sql.html#langref_sql">SQL language reference sections</a> for full syntax
+ details.
+ </p>
+
+ <table class="table"><caption></caption><colgroup><col style="width:20%"><col style="width:80%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="shell_commands__entry__1">
+ Command
+ </th>
+ <th class="entry nocellnorowborder" id="shell_commands__entry__2">
+ Explanation
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row" id="shell_commands__alter_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">alter</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Changes the underlying structure or settings of an Impala table, or a table shared between Impala
+ and Hive. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> and
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__compute_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">compute stats</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Gathers important performance-related information for a table, used by Impala to optimize queries.
+ See <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__connect_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">connect</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Connects to the specified instance of <code class="ph codeph">impalad</code>. The default port of 21000 is
+ assumed unless you provide another value. You can connect to any host in your cluster that is
+ running <code class="ph codeph">impalad</code>. If you connect to an instance of <code class="ph codeph">impalad</code> that
+ was started with an alternate port specified by the <code class="ph codeph">--fe_port</code> flag, you must
+ provide that alternate port. See <a class="xref" href="impala_connecting.html#connecting">Connecting to impalad through impala-shell</a> for examples.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+ connected to an Impala server. Once you are connected, any query options you set remain in effect as you
+ issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__describe_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">describe</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Shows the columns, column data types, and any column comments for a specified table.
+ <code class="ph codeph">DESCRIBE FORMATTED</code> shows additional information such as the HDFS data directory,
+ partitions, and internal properties for the table. See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a>
+ for details about the basic <code class="ph codeph">DESCRIBE</code> output and the <code class="ph codeph">DESCRIBE
+ FORMATTED</code> variant. You can use <code class="ph codeph">DESC</code> as shorthand for the
+ <code class="ph codeph">DESCRIBE</code> command.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__drop_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">drop</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Removes a schema object, and in some cases its associated data files. See
+ <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>, <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>,
+ <a class="xref" href="impala_drop_database.html#drop_database">DROP DATABASE Statement</a>, and
+ <a class="xref" href="impala_drop_function.html#drop_function">DROP FUNCTION Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__explain_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">explain</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Provides the execution plan for a query. <code class="ph codeph">EXPLAIN</code> represents a query as a series of
+ steps. For example, these steps might be table scans, join and aggregation nodes, or data exchange
+ operations between hosts. See <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and
+ <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__help_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">help</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays a list of all available commands and options.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__history_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">history</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Maintains an enumerated cross-session command history. This history is stored in the
+ <span class="ph filepath">~/.impalahistory</span> file.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__insert_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">insert</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Writes the results of a query to a specified table. This either overwrites table data or appends
+ data to the existing table content. See <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__invalidate_metadata_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">invalidate metadata</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Updates <span class="keyword cmdname">impalad</span> metadata for table existence and structure. Use this command
+ after creating, dropping, or altering databases, tables, or partitions in Hive. See
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__profile_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">profile</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays low-level information about the most recent query. Used for performance diagnosis and
+ tuning. <span class="ph"> The report starts with the same information as produced by the
+ <code class="ph codeph">EXPLAIN</code> statement and the <code class="ph codeph">SUMMARY</code> command.</span> See
+ <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__quit_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">quit</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Exits the shell. Remember to include the final semicolon so that the shell recognizes the end of
+ the command.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__refresh_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">refresh</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Refreshes <span class="keyword cmdname">impalad</span> metadata for the locations of HDFS blocks corresponding to
+ Impala data files. Use this command after loading new data files into an Impala table through Hive
+ or through HDFS commands. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__rerun_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">rerun</code> or <code class="ph codeph">@</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes a previous <span class="keyword cmdname">impala-shell</span> command again,
+ from the list of commands displayed by the <code class="ph codeph">history</code>
+ command. These could be SQL statements, or commands specific to
+ <span class="keyword cmdname">impala-shell</span> such as <code class="ph codeph">quit</code>
+ or <code class="ph codeph">profile</code>.
+ </p>
+ <p class="p">
+ Specify an integer argument. A positive integer <code class="ph codeph">N</code>
+ represents the command labeled <code class="ph codeph">N</code> in the history list.
+ A negative integer <code class="ph codeph">-N</code> represents the <code class="ph codeph">N</code>th
+ command from the end of the list, such as -1 for the most recent command.
+ Commands that are executed again do not produce new entries in the
+ history list.
+ </p>
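+ <p class="p">
+ For example, in a session whose history contained the following entries (the
+ statements shown are hypothetical), you could re-execute them as follows:
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > history;
+[1]: show tables;
+[2]: select count(*) from web_logs;
+[3]: profile;
+[localhost:21000] > @2;        -- Rerun the command labeled 2 in the history list.
+[localhost:21000] > rerun -1;  -- Rerun the most recent command.
+</code></pre>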
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__select_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">select</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Specifies the data set on which to perform some action. All information returned from
+ <code class="ph codeph">select</code> can be sent to an output such as the console or a file, or can be used in
+ another element of a query. See <a class="xref" href="impala_select.html#select">SELECT Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__set_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">set</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Manages query options for an <span class="keyword cmdname">impala-shell</span> session. The available options are the
+ ones listed in <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a>. These options are used for
+ query tuning and troubleshooting. Issue <code class="ph codeph">SET</code> with no arguments to see the current
+ query options, either based on the <span class="keyword cmdname">impalad</span> defaults, as specified by you at
+ <span class="keyword cmdname">impalad</span> startup, or based on earlier <code class="ph codeph">SET</code> statements in the same
+ session. To modify option values, issue commands with the syntax <code class="ph codeph">set
+ <var class="keyword varname">option</var>=<var class="keyword varname">value</var></code>. To restore an option to its default,
+ use the <code class="ph codeph">unset</code> command. Some options take Boolean values of <code class="ph codeph">true</code>
+ and <code class="ph codeph">false</code>. Others take numeric arguments, or quoted string values.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">SET</code> statement has no effect until the <span class="keyword cmdname">impala-shell</span> interpreter is
+ connected to an Impala server. Once you are connected, any query options you set remain in effect as you
+ issue a subsequent <code class="ph codeph">CONNECT</code> command to connect to a different Impala host.
+ </p>
+
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">SET</code> is available as a SQL statement for any kind of
+ application, not only through <span class="keyword cmdname">impala-shell</span>. See
+ <a class="xref" href="impala_set.html#set">SET Statement</a> for details.
+ </p>
+
+ <p class="p">
+ In Impala 2.5 and later, you can use <code class="ph codeph">SET</code> to define your own substitution variables
+ within an <span class="keyword cmdname">impala-shell</span> session.
+ Within a SQL statement, you substitute the value by using the notation <code class="ph codeph">${var:<var class="keyword varname">variable_name</var>}</code>.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__shell_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">shell</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes the specified command in the operating system shell without exiting
+ <code class="ph codeph">impala-shell</code>. You can use the <code class="ph codeph">!</code> character as shorthand for the
+ <code class="ph codeph">shell</code> command.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Quote any instances of the <code class="ph codeph">--</code> or <code class="ph codeph">/*</code> tokens to avoid them being
+ interpreted as the start of a comment. To embed comments within <code class="ph codeph">source</code> or
+ <code class="ph codeph">!</code> commands, use the shell comment character <code class="ph codeph">#</code> before the comment
+ portion of the line.
+ </div>
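+ <p class="p">
+ For example (the operating system commands and paths shown are illustrative):
+ </p>
+<pre class="pre codeblock"><code>[localhost:21000] > shell ls -l /tmp;
+[localhost:21000] > ! hdfs dfs -ls /user/impala; # Comment after the shell comment character.
+</code></pre>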
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__show_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">show</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Displays metastore data for schema objects created and accessed through Impala, Hive, or both.
+ <code class="ph codeph">show</code> can be used to gather information about objects such as databases, tables, and functions.
+ See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__source_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">source</code> or <code class="ph codeph">src</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Executes one or more statements residing in a specified file from the local filesystem.
+ Allows you to perform the same kinds of batch operations as with the <code class="ph codeph">-f</code> option,
+ but interactively within the interpreter. The file can contain SQL statements and other
+ <span class="keyword cmdname">impala-shell</span> commands, including additional <code class="ph codeph">SOURCE</code> commands
+ to perform a flexible sequence of actions. Each command or statement, except the last one in the file,
+ must end with a semicolon.
+ See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for examples.
+ </p>
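+ <p class="p">
+ For example, a file named <span class="ph filepath">refresh_stats.sql</span> (a
+ hypothetical name; the database and table are also illustrative) might hold a
+ sequence of statements to run as a batch:
+ </p>
+<pre class="pre codeblock"><code>-- Contents of refresh_stats.sql:
+use analytics_db;
+refresh web_logs;
+compute stats web_logs;
+
+-- Within impala-shell:
+[localhost:21000] > source refresh_stats.sql;
+</code></pre>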
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__summary_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">summary</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Summarizes the work performed in various stages of a query. It provides a higher-level view of the
+ information displayed by the <code class="ph codeph">EXPLAIN</code> command. Added in Impala 1.4.0. See
+ <a class="xref" href="impala_explain_plan.html#perf_summary">Using the SUMMARY Report for Performance Tuning</a> for details about the report format
+ and how to interpret it.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can see a continuously updated report of
+ the summary information while a query is in progress.
+ See <a class="xref" href="impala_live_summary.html#live_summary">LIVE_SUMMARY Query Option (Impala 2.3 or higher only)</a> for details.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__unset_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">unset</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Removes any user-specified value for a query option and returns the option to its default value.
+ See <a class="xref" href="impala_query_options.html#query_options">Query Options for the SET Statement</a> for the available query options.
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, it can also remove user-specified substitution variables
+ using the notation <code class="ph codeph">UNSET VAR:<var class="keyword varname">variable_name</var></code>.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__use_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">use</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Indicates the database against which to execute subsequent commands. Lets you avoid using fully
+ qualified names when referring to tables in databases other than <code class="ph codeph">default</code>. See
+ <a class="xref" href="impala_use.html#use">USE Statement</a> for details. Not effective with the <code class="ph codeph">-q</code> option,
+ because that option only allows a single statement in the argument.
+ </p>
+ </td>
+ </tr>
+ <tr class="row" id="shell_commands__version_cmd">
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__1 ">
+ <p class="p">
+ <code class="ph codeph">version</code>
+ </p>
+ </td>
+ <td class="entry nocellnorowborder" headers="shell_commands__entry__2 ">
+ <p class="p">
+ Returns Impala version information.
+ </p>
+ </td>
+ </tr>
+ </tbody></table>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_complex_types.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_complex_types.html b/docs/build3x/html/topics/impala_complex_types.html
new file mode 100644
index 0000000..32e40d5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_complex_types.html
@@ -0,0 +1,2606 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="complex_types"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Complex Types (Impala 2.3 or higher only)</title></head><body id="complex_types"><main role="main"><article role="article" aria-labelledby="complex_types__nested_types">
+
+ <h1 class="title topictitle1" id="complex_types__nested_types">Complex Types (<span class="keyword">Impala 2.3</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ <dfn class="term">Complex types</dfn> (also referred to as <dfn class="term">nested types</dfn>) let you represent multiple data values within a single
+ row/column position. They differ from the familiar column types such as <code class="ph codeph">BIGINT</code> and <code class="ph codeph">STRING</code>, known as
+ <dfn class="term">scalar types</dfn> or <dfn class="term">primitive types</dfn>, which represent a single data value within a given row/column position.
+ Impala supports the complex types <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> in <span class="keyword">Impala 2.3</span>
+ and higher. The Hive <code class="ph codeph">UNION</code> type is not currently supported.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax
+ and examples:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>
+ </li>
+ </ul>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="complex_types__complex_types_benefits">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Benefits of Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The reasons for using Impala complex types include the following:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ You already have data produced by Hive or another non-Impala component that uses complex type columns. You might need to
+ convert the underlying data to Parquet to use it with Impala.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Your data model originates with a non-SQL programming language or a NoSQL data management system. For example, if you are
+ representing Python data expressed as nested lists, dictionaries, and tuples, those data structures correspond closely to Impala
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Your analytic queries involving multiple tables could benefit from greater locality during join processing. By packing more
+ related data items within each HDFS data block, complex types let join queries avoid the network overhead of the traditional
+ Hadoop shuffle or broadcast join techniques.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The Impala complex type support produces result sets with all scalar values, and the scalar components of complex types can be used
+ with all SQL clauses, such as <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">ORDER BY</code>, all kinds of joins, subqueries, and inline
+ views. The ability to process complex type data entirely in SQL reduces the need to write application-specific code in Java or other
+ programming languages to deconstruct the underlying data structures.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="complex_types__complex_types_overview">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Overview of Impala Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> types are closely related: they represent collections with arbitrary numbers of
+ elements, where each element is the same type. In contrast, <code class="ph codeph">STRUCT</code> groups together a fixed number of items into a
+ single element. The parts of a <code class="ph codeph">STRUCT</code> element (the <dfn class="term">fields</dfn>) can be of different types, and each field
+ has a name.
+ </p>
+
+ <p class="p">
+ The elements of an <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>, can also be other
+ complex types. You can construct elaborate data structures with up to 100 levels of nesting. For example, you can make an
+ <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">STRUCT</code>s. Within each <code class="ph codeph">STRUCT</code>, you can have some fields
+ that are <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, or another kind of <code class="ph codeph">STRUCT</code>. The Impala documentation uses the
+ terms complex and nested types interchangeably; for simplicity, it primarily uses the term complex types to encompass all the
+ properties of these types.
+ </p>
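+
+ <p class="p">
+ For example, the following hypothetical table definition nests a <code class="ph codeph">MAP</code> inside a
+ <code class="ph codeph">STRUCT</code>, inside an <code class="ph codeph">ARRAY</code>:
+ </p>
+
+<pre class="pre codeblock"><code>create table contacts
+(
+  id bigint,
+  phones array&lt;struct&lt;
+    label: string,
+    numbers: map&lt;string,string&gt;
+  &gt;&gt;
+)
+stored as parquet;
+</code></pre>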
+
+ <p class="p">
+ When visualizing your data model in familiar SQL terms, you can think of each <code class="ph codeph">ARRAY</code> or <code class="ph codeph">MAP</code> as a
+ miniature table, and each <code class="ph codeph">STRUCT</code> as a row within such a table. By default, the table represented by an
+        <code class="ph codeph">ARRAY</code> has two columns: <code class="ph codeph">POS</code>, representing the ordering of the elements, and <code class="ph codeph">ITEM</code>,
+        representing the value of each element. Likewise, by default, the table represented by a <code class="ph codeph">MAP</code> encodes key-value
+        pairs, and therefore has two columns, <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>.
+
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ITEM</code> and <code class="ph codeph">VALUE</code> names are only required for the very simplest kinds of <code class="ph codeph">ARRAY</code>
+ and <code class="ph codeph">MAP</code> columns, ones that hold only scalar values. When the elements within the <code class="ph codeph">ARRAY</code> or
+ <code class="ph codeph">MAP</code> are of type <code class="ph codeph">STRUCT</code> rather than a scalar type, then the result set contains columns with names
+ corresponding to the <code class="ph codeph">STRUCT</code> fields rather than <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code>.
+ </p>
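+      <p class="p">
+        As a sketch of this naming behavior (the table and column names here are hypothetical), a scalar
+        <code class="ph codeph">ARRAY</code> is referenced through the <code class="ph codeph">ITEM</code> pseudocolumn, while an
+        <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> is referenced through its field names:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: pets is ARRAY < STRING >,
+-- addresses is ARRAY < STRUCT < city: STRING, zip: STRING > >.
+
+-- Scalar ARRAY: the result set exposes the ITEM pseudocolumn.
+SELECT name, pets.item
+FROM people_demo, people_demo.pets;
+
+-- ARRAY of STRUCT: the result set exposes the STRUCT field names instead.
+SELECT name, addresses.city, addresses.zip
+FROM people_demo, people_demo.addresses;
+</code></pre>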
+
+
+
+ <p class="p">
+ You write most queries that process complex type columns using familiar join syntax, even though the data for both sides of the join
+ resides in a single table. The join notation brings together the scalar values from a row with the values from the complex type
+ columns for that same row. The final result set contains all scalar values, allowing you to do all the familiar filtering,
+ aggregation, ordering, and so on for the complex data entirely in SQL or using business intelligence tools that issue SQL queries.
+
+ </p>
+
+ <p class="p">
+ Behind the scenes, Impala ensures that the processing for each row is done efficiently on a single host, without the network traffic
+ involved in broadcast or shuffle joins. The most common type of join query for tables with complex type columns is <code class="ph codeph">INNER
+ JOIN</code>, which returns results only in those cases where the complex type contains some elements. Therefore, most query
+ examples in this section use either the <code class="ph codeph">INNER JOIN</code> clause or the equivalent comma notation.
+ </p>
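+      <p class="p">
+        For example, the following hypothetical queries (table and column names are illustrative only) show the
+        <code class="ph codeph">INNER JOIN</code> clause and the equivalent comma notation for unpacking an
+        <code class="ph codeph">ARRAY</code> column:
+      </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: interests is an ARRAY < STRING > column.
+SELECT c.name, i.item
+FROM customers_demo c INNER JOIN c.interests i;
+
+-- Equivalent comma notation:
+SELECT c.name, i.item
+FROM customers_demo c, c.interests i;
+</code></pre>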
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Although Impala can query complex types that are present in Parquet files, Impala currently cannot create new Parquet files
+ containing complex types. Therefore, the discussion and examples presume that you are working with existing Parquet data produced
+ through Hive, Spark, or some other source. See <a class="xref" href="#complex_types_ex_hive_etl">Constructing Parquet Files with Complex Columns Using Hive</a> for examples of constructing Parquet data
+ files with complex type columns.
+ </p>
+
+ <p class="p">
+ For learning purposes, you can create empty tables with complex type columns and practice query syntax, even if you do not have
+ sample data with the required structure.
+ </p>
+ </div>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="complex_types__complex_types_design">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Design Considerations for Complex Types</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When planning to use Impala complex types, and designing the Impala schema, first learn how this kind of schema differs from
+ traditional table layouts from the relational database and data warehousing fields. Because you might have already encountered
+ complex types in a Hadoop context while using Hive for ETL, also learn how to write high-performance analytic queries for complex
+ type data using Impala SQL syntax.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="complex_types_design__complex_types_vs_rdbms">
+
+ <h3 class="title topictitle3" id="ariaid-title5">How Complex Types Differ from Traditional Data Warehouse Schemas</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Complex types let you associate arbitrary data structures with a particular row. If you are familiar with schema design for
+ relational database management systems or data warehouses, a schema with complex types has the following differences:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Logically, related values can now be grouped tightly together in the same table.
+ </p>
+
+ <p class="p">
+ In traditional data warehousing, related values were typically arranged in one of two ways:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Split across multiple normalized tables. Foreign key columns specified which rows from each table were associated with
+ each other. This arrangement avoided duplicate data and therefore the data was compact, but join queries could be
+ expensive because the related data had to be retrieved from separate locations. (In the case of distributed Hadoop
+ queries, the joined tables might even be transmitted between different hosts in a cluster.)
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Flattened into a single denormalized table. Although this layout eliminated some potential performance issues by removing
+ the need for join queries, the table typically became larger because values were repeated. The extra data volume could
+ cause performance issues in other parts of the workflow, such as longer ETL cycles or more expensive full-table scans
+ during queries.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ Complex types represent a middle ground that addresses these performance and volume concerns. By physically locating related
+ data within the same data files, complex types increase locality and reduce the expense of join queries. By associating an
+ arbitrary amount of data with a single row, complex types avoid the need to repeat lengthy values such as strings. Because
+ Impala knows which complex type values are associated with each row, you can save storage by avoiding artificial foreign key
+ values that are only used for joins. The flexibility of the <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and
+ <code class="ph codeph">MAP</code> types lets you model familiar constructs such as fact and dimension tables from a data warehouse, and
+ wide tables representing sparse matrixes.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="complex_types_design__complex_types_physical">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Physical Storage for Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Physically, the scalar and complex columns in each row are located adjacent to each other in the same Parquet data file, ensuring
+ that they are processed on the same host rather than being broadcast across the network when cross-referenced within a query. This
+          co-location simplifies the process of copying, converting, and backing up all the columns at once. Because of the column-oriented
+ layout of Parquet files, you can still query only the scalar columns of a table without imposing the I/O penalty of reading the
+ (possibly large) values of the composite columns.
+ </p>
+
+ <p class="p">
+ Within each Parquet data file, the constituent parts of complex type columns are stored in column-oriented format:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Each field of a <code class="ph codeph">STRUCT</code> type is stored like a column, with all the scalar values adjacent to each other and
+ encoded, compressed, and so on using the Parquet space-saving techniques.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For an <code class="ph codeph">ARRAY</code> containing scalar values, all those values (represented by the <code class="ph codeph">ITEM</code>
+ pseudocolumn) are stored adjacent to each other.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ For a <code class="ph codeph">MAP</code>, the values of the <code class="ph codeph">KEY</code> pseudocolumn are stored adjacent to each other. If the
+ <code class="ph codeph">VALUE</code> pseudocolumn is a scalar type, its values are also stored adjacent to each other.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If an <code class="ph codeph">ARRAY</code> element, <code class="ph codeph">STRUCT</code> field, or <code class="ph codeph">MAP</code> <code class="ph codeph">VALUE</code> part is
+ another complex type, the column-oriented storage applies to the next level down (or the next level after that, and so on for
+ deeply nested types) where the final elements, fields, or values are of scalar types.
+ </p>
+ </li>
+ </ul>
+
+ <p class="p">
+ The numbers represented by the <code class="ph codeph">POS</code> pseudocolumn of an <code class="ph codeph">ARRAY</code> are not physically stored in the
+ data files. They are synthesized at query time based on the order of the <code class="ph codeph">ARRAY</code> elements associated with each row.
+ </p>
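+        <p class="p">
+          For example, a hypothetical query (the table and column names are illustrative only) can still retrieve those
+          synthesized position values alongside the stored elements:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- POS values are synthesized at query time from the element order;
+-- they are not stored in the Parquet data files.
+SELECT name, pets.pos, pets.item
+FROM people_demo, people_demo.pets;
+</code></pre>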
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="complex_types_design__complex_types_file_formats">
+
+ <h3 class="title topictitle3" id="ariaid-title7">File Format Support for Impala Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Currently, Impala queries support complex type data only in the Parquet file format. See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>
+ for details about the performance benefits and physical layout of this file format.
+ </p>
+
+ <p class="p">
+          Each table, or each partition within a table, can have a separate file format, and you can change the file format at the table or
+ partition level through an <code class="ph codeph">ALTER TABLE</code> statement. Because this flexibility makes it difficult to guarantee ahead
+ of time that all the data files for a table or partition are in a compatible format, Impala does not throw any errors when you
+ change the file format for a table or partition using <code class="ph codeph">ALTER TABLE</code>. Any errors come at runtime when Impala
+ actually processes a table or partition that contains nested types and is not in one of the supported formats. If a query on a
+ partitioned table only processes some partitions, and all those partitions are in one of the supported formats, the query
+ succeeds.
+ </p>
+
+ <p class="p">
+ Because Impala does not parse the data structures containing nested types for unsupported formats such as text, Avro,
+ SequenceFile, or RCFile, you cannot use data files in these formats with Impala, even if the query does not refer to the nested
+ type columns. Also, if a table using an unsupported format originally contained nested type columns, and then those columns were
+ dropped from the table using <code class="ph codeph">ALTER TABLE ... DROP COLUMN</code>, any existing data files in the table still contain the
+ nested type data and Impala queries on that table will generate errors.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The one exception to the preceding rule is <code class="ph codeph">COUNT(*)</code> queries on RCFile tables that include complex types.
+ Such queries are allowed in <span class="keyword">Impala 2.6</span> and higher.
+ </p>
+ </div>
+
+ <p class="p">
+ You can perform DDL operations (even <code class="ph codeph">CREATE TABLE</code>) for tables involving complex types in file formats other than
+ Parquet. The DDL support lets you set up intermediate tables in your ETL pipeline, to be populated by Hive, before the final stage
+ where the data resides in a Parquet table and is queryable by Impala. Also, you can have a partitioned table with complex type
+ columns that uses a non-Parquet format, and use <code class="ph codeph">ALTER TABLE</code> to change the file format to Parquet for individual
+ partitions. When you put Parquet data files into those partitions, Impala can execute queries against that data as long as the
+ query does not involve any of the non-Parquet partitions.
+ </p>
+
+ <p class="p">
+ If you use the <span class="keyword cmdname">parquet-tools</span> command to examine the structure of a Parquet data file that includes complex
+ types, you see that both <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> are represented as a <code class="ph codeph">Bag</code> in Parquet
+ terminology, with all fields marked <code class="ph codeph">Optional</code> because Impala allows any column to be nullable.
+ </p>
+
+ <p class="p">
+          Impala supports either 2-level or 3-level encoding within each Parquet data file. When constructing Parquet data files outside
+ Impala, use either encoding style but do not mix 2-level and 3-level encoding within the same data file.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="complex_types_design__complex_types_vs_normalization">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Choosing Between Complex Types and Normalized Tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Choosing between multiple normalized fact and dimension tables, or a single table containing complex types, is an important design
+ decision.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ If you are coming from a traditional database or data warehousing background, you might be familiar with how to split up data
+ between tables. Your business intelligence tools might already be optimized for dealing with this kind of multi-table scenario
+ through join queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you are pulling data from Impala into an application written in a programming language that has data structures analogous
+ to the complex types, such as Python or Java, complex types in Impala could simplify data interchange and improve
+ understandability and reliability of your program logic.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ You might already be faced with existing infrastructure or receive high volumes of data that assume one layout or the other.
+              For example, complex types are popular with web-oriented applications, such as keeping information about an online user
+              all in one place for convenient lookup and analysis, or dealing with sparse or constantly evolving data fields.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If some parts of the data change over time while related data remains constant, using multiple normalized tables lets you
+ replace certain parts of the data without reloading the entire data set. Conversely, if you receive related data all bundled
+ together, such as in JSON files, using complex types can save the overhead of splitting the related items across multiple
+ tables.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ From a performance perspective:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In Parquet tables, Impala can skip columns that are not referenced in a query, avoiding the I/O penalty of reading the
+ embedded data. When complex types are nested within a column, the data is physically divided at a very granular level; for
+ example, a query referring to data nested multiple levels deep in a complex type column does not have to read all the data
+ from that column, only the data for the relevant parts of the column type hierarchy.
+
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Complex types avoid the possibility of expensive join queries when data from fact and dimension tables is processed in
+                parallel across multiple hosts. All the information for a row containing complex types is typically in the same data
+ block, and therefore does not need to be transmitted across the network when joining fields that are all part of the same
+ row.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The tradeoff with complex types is that fewer rows fit in each data block. Whether it is better to have more data blocks
+ with fewer rows, or fewer data blocks with many rows, depends on the distribution of your data and the characteristics of
+ your query workload. If the complex columns are rarely referenced, using them might lower efficiency. If you are seeing
+ low parallelism due to a small volume of data (relatively few data blocks) in each table partition, increasing the row
+ size by including complex columns might produce more data blocks and thus spread the work more evenly across the cluster.
+ See <a class="xref" href="impala_scalability.html#scalability">Scalability Considerations for Impala</a> for more on this advanced topic.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="complex_types_design__complex_types_hive">
+
+ <h3 class="title topictitle3" id="ariaid-title9">Differences Between Impala and Hive Complex Types</h3>
+
+ <div class="body conbody">
+
+
+
+
+
+
+
+ <p class="p">
+ Impala can query Parquet tables containing <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> columns
+ produced by Hive. There are some differences to be aware of between the Impala SQL and HiveQL syntax for complex types, primarily
+ for queries.
+ </p>
+
+ <p class="p">
+ The syntax for specifying <code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code> types in a <code class="ph codeph">CREATE
+ TABLE</code> statement is compatible between Impala and Hive.
+ </p>
+
+ <p class="p">
+ Because Impala <code class="ph codeph">STRUCT</code> columns include user-specified field names, you use the <code class="ph codeph">NAMED_STRUCT()</code>
+ constructor in Hive rather than the <code class="ph codeph">STRUCT()</code> constructor when you populate an Impala <code class="ph codeph">STRUCT</code>
+ column using a Hive <code class="ph codeph">INSERT</code> statement.
+ </p>
+
+ <p class="p">
+ The Hive <code class="ph codeph">UNION</code> type is not currently supported in Impala.
+ </p>
+
+ <p class="p">
+ While Impala usually aims for a high degree of compatibility with HiveQL query syntax, Impala syntax differs from Hive for queries
+ involving complex types. The differences are intended to provide extra flexibility for queries involving these kinds of tables.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Impala uses dot notation for referring to element names or elements within complex types, and join notation for
+ cross-referencing scalar columns with the elements of complex types within the same row, rather than the <code class="ph codeph">LATERAL
+ VIEW</code> clause and <code class="ph codeph">EXPLODE()</code> function of HiveQL.
+ </li>
+
+ <li class="li">
+            Using join notation lets you use all kinds of join queries with complex type columns. For example, you can use a
+ <code class="ph codeph">LEFT OUTER JOIN</code>, <code class="ph codeph">LEFT ANTI JOIN</code>, or <code class="ph codeph">LEFT SEMI JOIN</code> query to evaluate
+ different scenarios where the complex columns do or do not contain any elements.
+ </li>
+
+ <li class="li">
+ You can include references to collection types inside subqueries and inline views. For example, you can construct a
+ <code class="ph codeph">FROM</code> clause where one of the <span class="q">"tables"</span> is a subquery against a complex type column, or use a subquery
+ against a complex type column as the argument to an <code class="ph codeph">IN</code> or <code class="ph codeph">EXISTS</code> clause.
+ </li>
+
+ <li class="li">
+ The Impala pseudocolumn <code class="ph codeph">POS</code> lets you retrieve the position of elements in an array along with the elements
+ themselves, equivalent to the <code class="ph codeph">POSEXPLODE()</code> function of HiveQL. You do not use index notation to retrieve a
+ single array element in a query; the join query loops through the array elements and you use <code class="ph codeph">WHERE</code> clauses to
+ specify which elements to return.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Join clauses involving complex type columns do not require an <code class="ph codeph">ON</code> or <code class="ph codeph">USING</code> clause. Impala
+ implicitly applies the join key so that the correct array entries or map elements are associated with the correct row from the
+ table.
+ </p>
+ </li>
+
+ </ul>
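+        <p class="p">
+          As a sketch of the syntax difference (the table and column names here are hypothetical), the following queries
+          retrieve the same array elements and positions in HiveQL and in Impala SQL:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- HiveQL: explode the ARRAY column with LATERAL VIEW.
+SELECT t.id, p.pos, p.val
+FROM events_demo t
+LATERAL VIEW POSEXPLODE(t.tags) p AS pos, val;
+
+-- Impala SQL: join notation plus the POS and ITEM pseudocolumns.
+SELECT t.id, tags.pos, tags.item
+FROM events_demo t, t.tags tags;
+</code></pre>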
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="complex_types_design__complex_types_limits">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Limitations and Restrictions for Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Complex type columns can only be used in tables or partitions with the Parquet file format.
+ </p>
+
+ <p class="p">
+ Complex type columns cannot be used as partition key columns in a partitioned table.
+ </p>
+
+ <p class="p">
+ When you use complex types with the <code class="ph codeph">ORDER BY</code>, <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>, or
+ <code class="ph codeph">WHERE</code> clauses, you cannot refer to the column name by itself. Instead, you refer to the names of the scalar
+ values within the complex type, such as the <code class="ph codeph">ITEM</code>, <code class="ph codeph">POS</code>, <code class="ph codeph">KEY</code>, or
+ <code class="ph codeph">VALUE</code> pseudocolumns, or the field names from a <code class="ph codeph">STRUCT</code>.
+ </p>
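+        <p class="p">
+          For example, a hypothetical query (table and column names are illustrative only) filters and groups on the
+          scalar values inside a <code class="ph codeph">MAP</code> column rather than on the column itself:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table: readings is a MAP < STRING,BIGINT > column.
+-- Refer to the KEY and VALUE pseudocolumns, not the MAP column name itself.
+SELECT r.key, COUNT(*) AS high_readings
+FROM sensors_demo s, s.readings r
+WHERE r.value > 100
+GROUP BY r.key;
+</code></pre>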
+
+ <p class="p">
+ The maximum depth of nesting for complex types is 100 levels.
+ </p>
+
+ <p class="p">
+ The maximum length of the column definition for any complex type, including declarations for any nested types,
+ is 4000 characters.
+ </p>
+
+ <p class="p">
+ For ideal performance and scalability, use small or medium-sized collections, where all the complex columns contain at most a few
+ hundred megabytes per row. Remember, all the columns of a row are stored in the same HDFS data block, whose size in Parquet files
+ typically ranges from 256 MB to 1 GB.
+ </p>
+
+ <p class="p">
+ Including complex type columns in a table introduces some overhead that might make queries that do not reference those columns
+ somewhat slower than Impala queries against tables without any complex type columns. Expect at most a 2x slowdown compared to
+ tables that do not have any complex type columns.
+ </p>
+
+ <p class="p">
+ Currently, the <code class="ph codeph">COMPUTE STATS</code> statement does not collect any statistics for columns containing complex types.
+ Impala uses heuristics to construct execution plans involving complex type columns.
+ </p>
+
+ <p class="p">
+ Currently, Impala built-in functions and user-defined functions cannot accept complex types as parameters or produce them as
+ function return values. (When the complex type values are materialized in an Impala result set, the result set contains the scalar
+ components of the values, such as the <code class="ph codeph">POS</code> or <code class="ph codeph">ITEM</code> for an <code class="ph codeph">ARRAY</code>, the
+ <code class="ph codeph">KEY</code> or <code class="ph codeph">VALUE</code> for a <code class="ph codeph">MAP</code>, or the fields of a <code class="ph codeph">STRUCT</code>; these
+ scalar data items <em class="ph i">can</em> be used with built-in functions and UDFs as usual.)
+ </p>
+
+ <p class="p">
+ Impala currently cannot write new data files containing complex type columns.
+ Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+ involving complex type columns, you cannot use a statement form that writes
+ data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+ To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+ ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+ </p>
+
+ <p class="p">
+ Currently, Impala can query complex type columns only from Parquet tables or Parquet partitions within partitioned tables.
+ Although you can use complex types in tables with Avro, text, and other file formats as part of your ETL pipeline, for example as
+ intermediate tables populated through Hive, doing analytics through Impala requires that the data eventually ends up in a Parquet
+ table. The requirement for Parquet data files means that you can use complex types with Impala tables hosted on other kinds of
+ file storage systems such as Isilon and Amazon S3, but you cannot use Impala to query complex types from HBase tables. See
+ <a class="xref" href="impala_complex_types.html#complex_types_file_formats">File Format Support for Impala Complex Types</a> for more details.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="complex_types__complex_types_using">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Using Complex Types from SQL</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+        When using complex types through SQL in Impala, you learn the <code class="ph codeph">< ></code> delimiter notation for declaring complex
+        type columns in <code class="ph codeph">CREATE TABLE</code> statements, and how to construct join queries to <span class="q">"unpack"</span> the scalar values
+        nested inside the complex data structures. You might need to condense a traditional RDBMS or data warehouse schema into a smaller
+        number of Parquet tables, and use Hive, Spark, Pig, or another mechanism outside Impala to populate the tables with data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="complex_types_using__nested_types_ddl">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Complex Type Syntax for DDL Statements</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The definition of <var class="keyword varname">data_type</var>, as seen in the <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code>
+ statements, now includes complex types in addition to primitive types:
+ </p>
+
+<pre class="pre codeblock"><code> primitive_type
+| array_type
+| map_type
+| struct_type
+</code></pre>
+
+ <p class="p">
+ Unions are not currently supported.
+ </p>
+
+ <p class="p">
+ Array, struct, and map column type declarations are specified in the <code class="ph codeph">CREATE TABLE</code> statement. You can also add or
+ change the type of complex columns through the <code class="ph codeph">ALTER TABLE</code> statement.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Currently, Impala queries allow complex types only in tables that use the Parquet format. If an Impala query encounters complex
+ types in a table or partition using another file format, the query returns a runtime error.
+ </p>
+
+ <p class="p">
+ The Impala DDL support for complex types works for all file formats, so that you can create tables using text or other
+ non-Parquet formats for Hive to use as staging tables in an ETL cycle that ends with the data in a Parquet table. You can also
+ use <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT PARQUET</code> to change the file format of an existing table containing complex
+ types to Parquet, after which Impala can query it. Make sure to load Parquet files into the table after changing the file
+ format, because the <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> statement does not convert existing data to the new file
+ format.
+ </p>
+ </div>
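+        <p class="p">
+          For example, a hypothetical staging table populated through Hive in text format could be switched to Parquet
+          before Parquet data files are loaded into it:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical table name. The statement changes only the metadata;
+-- load Parquet data files into the table afterwards.
+ALTER TABLE staging_nested SET FILEFORMAT PARQUET;
+</code></pre>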
+
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
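+        <p class="p">
+          For example, a hypothetical partitioned table could combine a complex type column with a scalar partition key:
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Complex type columns are allowed in the table itself;
+-- all partition key columns must be scalar types.
+CREATE TABLE events_by_day
+(
+  id BIGINT
+  , details MAP < STRING,STRING >
+)
+PARTITIONED BY (event_date STRING)
+STORED AS PARQUET;
+</code></pre>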
+
+ <p class="p">
+ Because use cases for Impala complex types require that you already have Parquet data files produced outside of Impala, you can
+ use the Impala <code class="ph codeph">CREATE TABLE LIKE PARQUET</code> syntax to produce a table with columns that match the structure of an
+ existing Parquet file, including complex type columns for nested data structures. Remember to include the <code class="ph codeph">STORED AS
+ PARQUET</code> clause in this case, because even with <code class="ph codeph">CREATE TABLE LIKE PARQUET</code>, the default file format of the
+ resulting table is still text.
+ </p>
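+        <p class="p">
+          As a sketch of this technique (the table name and HDFS path here are hypothetical):
+        </p>
+
+<pre class="pre codeblock"><code>
+-- Hypothetical HDFS path. Include STORED AS PARQUET, because even with
+-- CREATE TABLE LIKE PARQUET, the default file format is still text.
+CREATE TABLE nested_from_file
+  LIKE PARQUET '/user/etl/sample_nested.parq'
+  STORED AS PARQUET;
+</code></pre>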
+
+ <p class="p">
+ Because the complex columns are omitted from the result set of an Impala <code class="ph codeph">SELECT *</code> or <code class="ph codeph">SELECT
+ <var class="keyword varname">col_name</var></code> query, and because Impala currently does not support writing Parquet files with complex type
+ columns, you cannot use the <code class="ph codeph">CREATE TABLE AS SELECT</code> syntax to create a table with nested type columns.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Once you have a table set up with complex type columns, use the <code class="ph codeph">DESCRIBE</code> and <code class="ph codeph">SHOW CREATE TABLE</code>
+ statements to see the correct notation with <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters and comma and colon
+ separators within the complex type definitions. If you do not have existing data with the same layout as the table, you can
+ query the empty table to practice with the notation for the <code class="ph codeph">SELECT</code> statement. In the <code class="ph codeph">SELECT</code>
+ list, you use dot notation and pseudocolumns such as <code class="ph codeph">ITEM</code>, <code class="ph codeph">KEY</code>, and <code class="ph codeph">VALUE</code> for
+ referring to items within the complex type columns. In the <code class="ph codeph">FROM</code> clause, you use join notation to construct
+ table aliases for any referenced <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns.
+ </p>
+ </div>
+
+
+
+ <p class="p">
+ For example, when defining a table that holds contact information, you might represent phone numbers differently depending on the
+ expected layout and relationships of the data, and how well you can predict those properties in advance.
+ </p>
+
+ <p class="p">
+ Here are different ways that you might represent phone numbers in a traditional relational schema, with equivalent representations
+ using complex types.
+ </p>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_fixed"><figcaption><span class="fig--title-label">Figure 1. </span>Traditional Relational Representation of Phone Numbers: Single Table</figcaption>
+
+
+
+ <p class="p">
+ The traditional, simplest way to represent phone numbers in a relational table is to store all contact info in a single table,
+ with all columns having scalar types, and each potential phone number represented as a separate column. In this example, each
+ person can only have these 3 types of phone numbers. If the person does not have a particular kind of phone number, the
+ corresponding column is <code class="ph codeph">NULL</code> for that row.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_fixed_phones
+(
+ id BIGINT
+ , name STRING
+ , address STRING
+ , home_phone STRING
+ , work_phone STRING
+ , mobile_phone STRING
+) STORED AS PARQUET;
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array"><figcaption><span class="fig--title-label">Figure 2. </span>An Array of Phone Numbers</figcaption>
+
+
+
+ <p class="p">
+ Using a complex type column to represent the phone numbers adds some extra flexibility. Now there could be an unlimited number
+ of phone numbers. Because the array elements have an order but not symbolic names, you could decide in advance that
+ phone_number[0] is the home number, [1] is the work number, [2] is the mobile number, and so on. (In subsequent examples, you
+ will see how to create a more flexible naming scheme using other complex type variations, such as a <code class="ph codeph">MAP</code> or an
+ <code class="ph codeph">ARRAY</code> where each element is a <code class="ph codeph">STRUCT</code>.)
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_array_of_phones
+(
+ id BIGINT
+ , name STRING
+ , address STRING
+ , phone_number ARRAY < STRING >
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_map"><figcaption><span class="fig--title-label">Figure 3. </span>A Map of Phone Numbers</figcaption>
+
+
+
+ <p class="p">
+ Another way to represent an arbitrary set of phone numbers is with a <code class="ph codeph">MAP</code> column. With a <code class="ph codeph">MAP</code>,
+ each element is associated with a key value that you specify, which could be a numeric, string, or other scalar type. This
+ example uses a <code class="ph codeph">STRING</code> key to give each phone number a name, such as <code class="ph codeph">'home'</code> or
+ <code class="ph codeph">'mobile'</code>. A query could filter the data based on the key values, or display the key values in reports.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_unlimited_phones
+(
+ id BIGINT, name STRING, address STRING, phone_number MAP < STRING,STRING >
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_flat_normalized"><figcaption><span class="fig--title-label">Figure 4. </span>Traditional Relational Representation of Phone Numbers: Normalized Tables</figcaption>
+
+
+
+ <p class="p">
+ If you are an experienced database designer, you already know how to work around the limitations of the single-table schema from
+ <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_fixed">Figure 1</a>. By normalizing the schema, with the phone numbers in their own
+ table, you can associate an arbitrary set of phone numbers with each person, and associate additional details with each phone
+ number, such as whether it is a home, work, or mobile phone.
+ </p>
+
+ <p class="p">
+ The flexibility of this approach comes with some drawbacks. Reconstructing all the data for a particular person requires a join
+ query, which might require performance tuning on Hadoop because the data from each table might be transmitted from a different
+ host. Data management tasks such as backups and refreshing the data require dealing with multiple tables instead of a single
+ table.
+ </p>
+
+ <p class="p">
+ This example illustrates a traditional database schema that stores contact info normalized across two tables. The fact table
+ establishes the identity and basic information about each person. A dimension table stores information only about phone numbers,
+ using an ID value to associate each phone number with a person ID from the fact table. Each person can have 0, 1, or many
+ phones; the categories are not restricted to a few predefined ones; and the phone table can contain as many columns as desired,
+ to represent all sorts of details about each phone number.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE fact_contacts (id BIGINT, name STRING, address STRING) STORED AS PARQUET;
+CREATE TABLE dim_phones
+(
+ contact_id BIGINT
+ , category STRING
+ , international_code STRING
+ , area_code STRING
+ , exchange STRING
+ , extension STRING
+ , mobile BOOLEAN
+ , carrier STRING
+ , current BOOLEAN
+ , service_start_date TIMESTAMP
+ , service_end_date TIMESTAMP
+)
+STORED AS PARQUET;
+</code></pre>
+
+ </figure>
+
+ <figure class="fig fignone" id="nested_types_ddl__complex_types_phones_array_struct"><figcaption><span class="fig--title-label">Figure 5. </span>Phone Numbers Represented as an Array of Structs</figcaption>
+
+
+
+ <p class="p">
+ To represent a schema equivalent to the one from <a class="xref" href="#nested_types_ddl__complex_types_phones_flat_normalized">Figure 4</a> using
+ complex types, this example uses an <code class="ph codeph">ARRAY</code> where each array element is a <code class="ph codeph">STRUCT</code>. As with the
+ earlier complex type examples, each person can have an arbitrary set of associated phone numbers. Making each array element into
+ a <code class="ph codeph">STRUCT</code> lets us associate multiple data items with each phone number, and give a separate name and type to
+ each data item. The <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code> elements reproduce the columns of the dimension
+ table from the previous example.
+ </p>
+
+ <p class="p">
+ You can do all the same kinds of queries with the complex type schema as with the normalized schema from the previous example.
+ The advantages of the complex type design are in the areas of convenience and performance. Now your backup and ETL processes
+ only deal with a single table. When a query uses a join to cross-reference the information about a person with their associated
+ phone numbers, all the relevant data for each row resides in the same HDFS data block, meaning each row can be processed on a
+ single host without requiring network transmission.
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE contacts_detailed_phones
+(
+ id BIGINT, name STRING, address STRING
+ , phone ARRAY < STRUCT <
+ category: STRING
+ , international_code: STRING
+ , area_code: STRING
+ , exchange: STRING
+ , extension: STRING
+ , mobile: BOOLEAN
+ , carrier: STRING
+ , current: BOOLEAN
+ , service_start_date: TIMESTAMP
+ , service_end_date: TIMESTAMP
+ >>
+) STORED AS PARQUET;
+
+</code></pre>
+
+ </figure>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="complex_types_using__complex_types_sql">
+
+ <h3 class="title topictitle3" id="ariaid-title13">SQL Statements that Support Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala SQL statements that support complex types are currently
+ <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_describe.html#describe">DESCRIBE</a></code>,
+ <code class="ph codeph"><a class="xref" href="impala_load_data.html#load_data">LOAD DATA</a></code>, and
+ <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code>. That is, Impala can create or alter tables
+ containing complex type columns, examine the structure of a table containing complex type columns, import existing data files
+ containing complex type columns into a table, and query Parquet tables containing complex types.
+ </p>
+
+ <p class="p">
+ Impala currently cannot write new data files containing complex type columns.
+ Therefore, although the <code class="ph codeph">SELECT</code> statement works for queries
+ involving complex type columns, you cannot use a statement form that writes
+ data to complex type columns, such as <code class="ph codeph">CREATE TABLE AS SELECT</code> or <code class="ph codeph">INSERT ... SELECT</code>.
+ To create data files containing complex type data, use the Hive <code class="ph codeph">INSERT</code> statement, or another
+ ETL mechanism such as MapReduce jobs, Spark jobs, Pig, and so on.
+ </p>
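+
+ <p class="p">
+ For example, you might populate the <code class="ph codeph">contacts_array_of_phones</code> table shown earlier by running an <code class="ph codeph">INSERT</code> statement in Hive, then refreshing the table metadata in Impala. (This is a minimal sketch; the exact constructor syntax available in Hive, such as the <code class="ph codeph">array()</code> function shown here, depends on your Hive version.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Run in Hive, which can write data files containing complex types:
+INSERT INTO contacts_array_of_phones
+ SELECT 1, 'Alice', '123 Main St', array('555-0100','555-0101');
+
+-- Then run in Impala, to pick up the newly written data files:
+REFRESH contacts_array_of_phones;
+</code></pre>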
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title14" id="complex_types_sql__complex_types_ddl">
+
+ <h4 class="title topictitle4" id="ariaid-title14">DDL Statements and Complex Types</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Column specifications for complex or nested types use <code class="ph codeph"><</code> and <code class="ph codeph">></code> delimiters:
+ </p>
+
+<pre class="pre codeblock"><code>-- What goes inside the < > for an ARRAY is a single type, either a scalar or another
+-- complex type (ARRAY, STRUCT, or MAP).
+CREATE TABLE array_t
+(
+ id BIGINT,
+ a1 ARRAY <STRING>,
+ a2 ARRAY <BIGINT>,
+ a3 ARRAY <TIMESTAMP>,
+ a4 ARRAY <STRUCT <f1: STRING, f2: INT, f3: BOOLEAN>>
+)
+STORED AS PARQUET;
+
+-- What goes inside the < > for a MAP is two comma-separated types specifying the types of the key-value pair:
+-- a scalar type representing the key, and a scalar or complex type representing the value.
+CREATE TABLE map_t
+(
+ id BIGINT,
+ m1 MAP <STRING, STRING>,
+ m2 MAP <STRING, BIGINT>,
+ m3 MAP <BIGINT, STRING>,
+ m4 MAP <BIGINT, BIGINT>,
+ m5 MAP <STRING, ARRAY <STRING>>
+)
+STORED AS PARQUET;
+
+-- What goes inside the < > for a STRUCT is a comma-separated list of fields, each field defined as
+-- name:type. The type can be a scalar or a complex type. The field names are scoped to their own STRUCT, so they
+-- can repeat the names of table columns or of fields in other STRUCTs without conflict. A STRUCT is most often used inside
+-- an ARRAY or a MAP rather than as a top-level column.
+CREATE TABLE struct_t
+(
+ id BIGINT,
+ s1 STRUCT <f1: STRING, f2: BIGINT>,
+ s2 ARRAY <STRUCT <f1: INT, f2: TIMESTAMP>>,
+ s3 MAP <BIGINT, STRUCT <name: STRING, birthday: TIMESTAMP>>
+)
+STORED AS PARQUET;
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="ariaid-title15" id="complex_types_sql__complex_types_queries">
+
+ <h4 class="title topictitle4" id="ariaid-title15">Queries and Complex Types</h4>
+
+ <div class="body conbody">
+
+
+
+
+
+ <p class="p">
+ The result set of an Impala query always contains only scalar values; the elements and fields within any complex type columns must
+ be <span class="q">"unpacked"</span> using join queries. A query cannot directly retrieve the entire value for a complex type column. Impala
+ returns an error in this case. Queries using <code class="ph codeph">SELECT *</code> are allowed for tables with complex types, but the
+ columns with complex types are skipped.
+ </p>
+
+ <p class="p">
+ The following example shows how referring directly to a complex type column returns an error, while <code class="ph codeph">SELECT *</code> on
+ the same table succeeds, but only retrieves the scalar columns.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+
+
+<pre class="pre codeblock"><code>SELECT c_orders FROM customer LIMIT 1;
+ERROR: AnalysisException: Expr 'c_orders' in select list returns a complex type 'ARRAY<STRUCT<o_orderkey:BIGINT,o_orderstatus:STRING, ... l_receiptdate:STRING,l_shipinstruct:STRING,l_shipmode:STRING,l_comment:STRING>>>>'.
+Only scalar types are allowed in the select list.
+
+-- The original table has several scalar columns and one complex column.
+DESCRIBE customer;
++--------------+------------------------------------+
+| name | type |
++--------------+------------------------------------+
+| c_custkey | bigint |
+| c_name | string |
+...
+| c_orders | array<struct< |
+| | o_orderkey:bigint, |
+| | o_orderstatus:string, |
+| | o_totalprice:decimal(12,2), |
+...
+| | >> |
++--------------+------------------------------------+
+
+-- When we SELECT * from that table, only the scalar columns come back in the result set.
+CREATE TABLE select_star_customer STORED AS PARQUET AS SELECT * FROM customer;
++------------------------+
+| summary |
++------------------------+
+| Inserted 150000 row(s) |
++------------------------+
+
+-- The c_orders column, being of complex type, was not included in the SELECT * result set.
+DESC select_star_customer;
++--------------+---------------+
+| name | type |
++--------------+---------------+
+| c_custkey | bigint |
+| c_name | string |
+| c_address | string |
+| c_nationkey | smallint |
+| c_phone | string |
+| c_acctbal | decimal(12,2) |
+| c_mktsegment | string |
+| c_comment | string |
++--------------+---------------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ References to fields within <code class="ph codeph">STRUCT</code> columns use dot notation. If the field name is unambiguous, you can omit
+ qualifiers such as table name, column name, or even the <code class="ph codeph">ITEM</code> or <code class="ph codeph">VALUE</code> pseudocolumn names for
+ <code class="ph codeph">STRUCT</code> elements inside an <code class="ph codeph">ARRAY</code> or a <code class="ph codeph">MAP</code>.
+ </p>
+
+
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, address.city FROM customers WHERE address.zip = 94305;
+</code></pre>
+
+ <p class="p">
+ References to elements within <code class="ph codeph">ARRAY</code> columns use the <code class="ph codeph">ITEM</code> pseudocolumn:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>select r_name, r_nations.item.n_name from region, region.r_nations limit 7;
++--------+----------------+
+| r_name | item.n_name |
++--------+----------------+
+| EUROPE | UNITED KINGDOM |
+| EUROPE | RUSSIA |
+| EUROPE | ROMANIA |
+| EUROPE | GERMANY |
+| EUROPE | FRANCE |
+| ASIA | VIETNAM |
+| ASIA | CHINA |
++--------+----------------+
+</code></pre>
+
+ <p class="p">
+ References to fields within <code class="ph codeph">MAP</code> columns use the <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> pseudocolumns.
+ In this example, once the query establishes the alias <code class="ph codeph">MAP_FIELD</code> for a <code class="ph codeph">MAP</code> column with a
+ <code class="ph codeph">STRING</code> key and an <code class="ph codeph">INT</code> value, the query can refer to <code class="ph codeph">MAP_FIELD.KEY</code> and
+ <code class="ph codeph">MAP_FIELD.VALUE</code>, which have zero, one, or many instances for each row from the containing table.
+ </p>
+
+<pre class="pre codeblock"><code>DESCRIBE table_0;
++---------+-----------------------+
+| name | type |
++---------+-----------------------+
+| field_0 | string |
+| field_1 | map<string,int> |
+...
+
+SELECT field_0, map_field.key, map_field.value
+ FROM table_0, table_0.field_1 AS map_field
+WHERE length(field_0) = 1
+LIMIT 10;
++---------+-----------+-------+
+| field_0 | key | value |
++---------+-----------+-------+
+| b | gshsgkvd | NULL |
+| b | twrtcxj6 | 18 |
+| b | 2vp5 | 39 |
+| b | fh0s | 13 |
+| v | 2 | 41 |
+| v | 8b58mz | 20 |
+| v | hw | 16 |
+| v | 65l388pyt | 29 |
+| v | 03k68g91z | 30 |
+| v | r2hlg5b | NULL |
++---------+-----------+-------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ When complex types are nested inside each other, you use a combination of joins, pseudocolumn names, and dot notation to refer
+ to specific fields at the appropriate level. This is the most frequent form of query syntax for complex columns, because the
+ typical use case involves two levels of complex types, such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements.
+ </p>
+
+
+
+
+
+<pre class="pre codeblock"><code>SELECT id, phone_numbers.area_code FROM contact_info_many_structs INNER JOIN contact_info_many_structs.phone_numbers phone_numbers LIMIT 3;
+</code></pre>
+
+ <p class="p">
+ You can express relationships between <code class="ph codeph">ARRAY</code> and <code class="ph codeph">MAP</code> columns at different levels as joins. You
+ include comparison operators between fields at the top level and within the nested type columns so that Impala can do the
+ appropriate join operation.
+ </p>
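+
+ <p class="p">
+ For example, using the <code class="ph codeph">CUSTOMER</code> table from the sample TPC-H schema, the following sketch joins the <code class="ph codeph">C_ORDERS</code> array to its containing row and compares a field of the array elements (<code class="ph codeph">O_TOTALPRICE</code>) against a scalar column from the same row (<code class="ph codeph">C_ACCTBAL</code>):
+ </p>
+
+<pre class="pre codeblock"><code>-- The comparison in the WHERE clause relates a field inside the
+-- complex column to a top-level scalar column of the same table.
+SELECT c.c_name, o.o_orderkey, o.o_totalprice
+ FROM customer c, c.c_orders o
+WHERE o.o_totalprice > c.c_acctbal
+LIMIT 5;
+</code></pre>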
+
+
+
+
+
+
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Many of the complex type examples refer to tables
+ such as <code class="ph codeph">CUSTOMER</code> and <code class="ph codeph">REGION</code>
+ adapted from the tables used in the TPC-H benchmark.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_sample_schema">Sample Schema and Data for Experimenting with Impala Complex Types</a>
+ for the table definitions.
+ </div>
+
+ <p class="p">
+ For example, the following queries work equivalently. They each return customer and order data for customers that have at least
+ one order.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_name, o.o_orderkey FROM customer c, c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821 |
+| Customer#000072578 | 2079810 |
+| Customer#000072578 | 5768068 |
+| Customer#000072578 | 1805604 |
+| Customer#000072578 | 3436389 |
++--------------------+------------+
+
+SELECT c.c_name, o.o_orderkey FROM customer c INNER JOIN c.c_orders o LIMIT 5;
++--------------------+------------+
+| c_name | o_orderkey |
++--------------------+------------+
+| Customer#000072578 | 558821 |
+| Customer#000072578 | 2079810 |
+| Customer#000072578 | 5768068 |
+| Customer#000072578 | 1805604 |
+| Customer#000072578 | 3436389 |
++--------------------+------------+
+</code></pre>
+
+ <p class="p">
+ The following query using an outer join returns customers that have orders, plus customers with no orders (no entries in the
+ <code class="ph codeph">C_ORDERS</code> array):
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, o.o_orderkey
+ FROM customer c LEFT OUTER JOIN c.c_orders o
+LIMIT 5;
++-----------+------------+
+| c_custkey | o_orderkey |
++-----------+------------+
+| 60210 | NULL |
+| 147873 | NULL |
+| 72578 | 558821 |
+| 72578 | 2079810 |
+| 72578 | 5768068 |
++-----------+------------+
+
+</code></pre>
+
+ <p class="p">
+ The following query returns <em class="ph i">only</em> customers that have no orders. (With <code class="ph codeph">LEFT ANTI JOIN</code> or <code class="ph codeph">LEFT
+ SEMI JOIN</code>, the query can only refer to columns from the left-hand table, because by definition there is no matching
+ information in the right-hand table.)
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c.c_custkey, c.c_name
+ FROM customer c LEFT ANTI JOIN c.c_orders o
+LIMIT 5;
++-----------+--------------------+
+| c_custkey | c_name |
++-----------+--------------------+
+| 60210 | Customer#000060210 |
+| 147873 | Customer#000147873 |
+| 141576 | Customer#000141576 |
+| 85365 | Customer#000085365 |
+| 70998 | Customer#000070998 |
++-----------+--------------------+
+
+</code></pre>
+
+
+
+ <p class="p">
+ You can also perform correlated subqueries to examine the properties of complex type columns for each row in the result set.
+ </p>
+
+ <p class="p">
+ Count the number of orders per customer. Note the correlated reference to the table alias <code class="ph codeph">C</code>. The
+ <code class="ph codeph">COUNT(*)</code> operation applies to all the elements of the <code class="ph codeph">C_ORDERS</code> array for the corresponding
+ row, avoiding the need for a <code class="ph codeph">GROUP BY</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>select c_name, howmany FROM customer c, (SELECT COUNT(*) howmany FROM c.c_orders) v limit 5;
++--------------------+---------+
+| c_name | howmany |
++--------------------+---------+
+| Customer#000030065 | 15 |
+| Customer#000065455 | 18 |
+| Customer#000113644 | 21 |
+| Customer#000111078 | 0 |
+| Customer#000024621 | 0 |
++--------------------+---------+
+</code></pre>
+
+ <p class="p">
+ Count the number of orders per customer, ignoring any customers that have not placed any orders:
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany_orders
+FROM
+ customer c,
+ (SELECT COUNT(*) howmany_orders FROM c.c_orders) subq1
+WHERE howmany_orders > 0
+LIMIT 5;
++--------------------+----------------+
+| c_name | howmany_orders |
++--------------------+----------------+
+| Customer#000072578 | 7 |
+| Customer#000046378 | 26 |
+| Customer#000069815 | 11 |
+| Customer#000079058 | 12 |
+| Customer#000092239 | 26 |
++--------------------+----------------+
+</code></pre>
+
+ <p class="p">
+ Count the number of line items in each order. The reference to <code class="ph codeph">C.C_ORDERS</code> in the <code class="ph codeph">FROM</code> clause
+ is needed because the <code class="ph codeph">O_ORDERKEY</code> field is a member of the elements in the <code class="ph codeph">C_ORDERS</code> array. The
+ subquery labelled <code class="ph codeph">SUBQ1</code> is correlated: it is re-evaluated for the <code class="ph codeph">C_ORDERS.O_LINEITEMS</code> array
+ from each row of the <code class="ph codeph">CUSTOMERS</code> table.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, o_orderkey, howmany_line_items
+FROM
+ customer c,
+ c.c_orders t2,
+ (SELECT COUNT(*) howmany_line_items FROM c.c_orders.o_lineitems) subq1
+WHERE howmany_line_items > 0
+LIMIT 5;
++--------------------+------------+--------------------+
+| c_name | o_orderkey | howmany_line_items |
++--------------------+------------+--------------------+
+| Customer#000020890 | 1884930 | 95 |
+| Customer#000020890 | 4570754 | 95 |
+| Customer#000020890 | 3771072 | 95 |
+| Customer#000020890 | 2555489 | 95 |
+| Customer#000020890 | 919171 | 95 |
++--------------------+------------+--------------------+
+</code></pre>
+
+ <p class="p">
+ Get the number of orders, the average order price, and the maximum items in any order per customer. For this example, the
+ subqueries labelled <code class="ph codeph">SUBQ1</code> and <code class="ph codeph">SUBQ2</code> are correlated: they are re-evaluated for each row from
+ the original <code class="ph codeph">CUSTOMER</code> table, and only apply to the complex columns associated with that row.
+ </p>
+
+<pre class="pre codeblock"><code>SELECT c_name, howmany, average_price, most_items
+FROM
+ customer c,
+ (SELECT COUNT(*) howmany, AVG(o_totalprice) average_price FROM c.c_orders) subq1,
+ (SELECT MAX(l_quantity) most_items FROM c.c_orders.o_lineitems ) subq2
+LIMIT 5;
++--------------------+---------+---------------+------------+
+| c_name | howmany | average_price | most_items |
++--------------------+---------+---------------+------------+
+| Customer#000030065 | 15 | 128908.34 | 50.00 |
+| Customer#000088191 | 0 | NULL | NULL |
+| Customer#000101555 | 10 | 164250.31 | 50.00 |
+| Customer#000022092 | 0 | NULL | NULL |
+| Customer#000036277 | 27 | 166040.06 | 50.00 |
++--------------------+---------+---------------+------------+
+</code></pre>
+
+ <p class="p">
+ For example, these queries show how to access information about the <code class="ph codeph">ARRAY</code> elements within the
+ <code class="ph codeph">CUSTOMER</code> table from the <span class="q">"nested TPC-H"</span> schema, starting with the initial <code class="ph codeph">ARRAY</code> elements
+ and progressing to examine the <code class="ph codeph">STRUCT</code> fields of the <code class="ph codeph">ARRAY</code>, and then the elements nested within
+ another <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code>:
+ </p>
+
+<pre class="pre codeblock"><code>-- How many orders does each customer have?
+-- The type of the ARRAY column doesn't matter; this query just counts the elements.
+SELECT c_custkey, count(*)
+ FROM customer, customer.c_orders
+GROUP BY c_custkey
+LIMIT 5;
++-----------+----------+
+| c_custkey | count(*) |
++-----------+----------+
+| 61081 | 21 |
+| 115987 | 15 |
+| 69685 | 19 |
+| 109124 | 15 |
+| 50491 | 12 |
++-----------+----------+
+
+-- How many line items are part of each customer order?
+-- Now we examine a field from a STRUCT nested inside the ARRAY.
+SELECT c_custkey, c_orders.o_orderkey, count(*)
+ FROM customer, customer.c_orders c_orders, c_orders.o_lineitems
+GROUP BY c_custkey, c_orders.o_orderkey
+LIMIT 5;
++-----------+------------+----------+
+| c_custkey | o_orderkey | count(*) |
++-----------+------------+----------+
+| 63367 | 4985959 | 7 |
+| 53989 | 1972230 | 2 |
+| 143513 | 5750498 | 5 |
+| 17849 | 4857989 | 1 |
+| 89881 | 1046437 | 1 |
++-----------+------------+----------+
+
+-- What are the line items in each customer order?
+-- One of the STRUCT fields inside the ARRAY is another
+-- ARRAY containing STRUCT elements. The join finds
+-- all the related items from both levels of ARRAY.
+SELECT c_custkey, o_orderkey, l_partkey
+ FROM customer, customer.c_orders, c_orders.o_lineitems
+LIMIT 5;
++-----------+------------+-----------+
+| c_custkey | o_orderkey | l_partkey |
++-----------+------------+-----------+
+| 113644 | 2738497 | 175846 |
+| 113644 | 2738497 | 27309 |
+| 113644 | 2738497 | 175873 |
+| 113644 | 2738497 | 88559 |
+| 113644 | 2738497 | 8032 |
++-----------+------------+-----------+
+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="complex_types_using__pseudocolumns">
+
+ <h3 class="title topictitle3" id="ariaid-title16">Pseudocolumns for ARRAY and MAP Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Each element in an <code class="ph codeph">ARRAY</code> type has a position, indexed starting from zero, and a value. Each element in a
+ <code class="ph codeph">MAP</code> type represents a key-value pair. Impala provides pseudocolumns that let you retrieve this metadata as part
+ of a query, or filter query results by referring to this metadata in a <code class="ph codeph">WHERE</code> clause. You refer to the pseudocolumns as
+ part of qualified column names in queries:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ITEM</code>: The value of an array element. If the <code class="ph codeph">ARRAY</code> contains <code class="ph codeph">STRUCT</code> elements,
+ you can refer to either <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM.<var class="keyword varname">field_name</var></code> or use the shorthand
+ <code class="ph codeph"><var class="keyword varname">array_name</var>.<var class="keyword varname">field_name</var></code>.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">POS</code>: The position of an element within an array.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">KEY</code>: The value forming the first part of a key-value pair in a map. It is not necessarily unique.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">VALUE</code>: The data item forming the second part of a key-value pair in a map. If the <code class="ph codeph">VALUE</code> part
+ of the <code class="ph codeph">MAP</code> element is a <code class="ph codeph">STRUCT</code>, you can refer to either
+ <code class="ph codeph"><var class="keyword varname">map_name</var>.VALUE.<var class="keyword varname">field_name</var></code> or use the shorthand
+ <code class="ph codeph"><var class="keyword varname">map_name</var>.<var class="keyword varname">field_name</var></code>.
+ </li>
+ </ul>
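+
+ <p class="p">
+ As a brief sketch, using the <code class="ph codeph">array_t</code> and <code class="ph codeph">map_t</code> tables defined earlier under <span class="q">"DDL Statements and Complex Types"</span>, the pseudocolumns appear in queries like the following:
+ </p>
+
+<pre class="pre codeblock"><code>-- ITEM and POS for an ARRAY of scalar elements.
+SELECT t.id, a.ITEM, a.POS
+ FROM array_t t, t.a1 a;
+
+-- KEY and VALUE for a MAP column.
+SELECT t.id, m.KEY, m.VALUE
+ FROM map_t t, t.m2 m;
+</code></pre>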
+
+
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested3" aria-labelledby="item__pos" id="pseudocolumns__item">
+
+ <h4 class="title topictitle4" id="item__pos">ITEM and POS Pseudocolumns</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When an <code class="ph codeph">ARRAY</code> column contains <code class="ph codeph">STRUCT</code> elements, you can refer to a field within the
+ <code class="ph codeph">STRUCT</code> using a qualified name of the form
+ <code class="ph codeph"><var class="keyword varname">array_column</var>.<var class="keyword varname">field_name</var></code>. If the <code class="ph codeph">ARRAY</code> contains scalar
+ values, Impala recognizes the special name <code class="ph codeph"><var class="keyword varname">array_column</var>.ITEM</code> to represent the value of each
+ scalar array element. For example, if a column contained an <code class="ph codeph">ARRAY</code> where each element was a
+ <code class="ph codeph">STRING</code>, you would use <code class="ph codeph"><var class="keyword varname">array_name</var>.ITEM</code> to refer to each scalar value in the
+ <code class="ph codeph">SELECT</code> list, or the <code class="ph codeph">WHERE</code> or other clauses.
+ </p>
+
+ <p class="p">
+ This example shows a table with two <code class="ph codeph">ARRAY</code> columns whose elements are of the scalar type
+ <code class="ph codeph">STRING</code>. When referring to the values of the array elements in the <code class="ph codeph">SELECT</code> list,
+ <code class="ph codeph">WHERE</code> clause, or <code class="ph codeph">ORDER BY</code> clause, you use the <code class="ph codeph">ITEM</code> pseudocolumn because
+ within the array, the individual elements have no defined names.
+ </p>
+
+<pre class="pre codeblock"><code>create TABLE persons_of_interest
+(
+person_id BIGINT,
+aliases ARRAY <STRING>,
+associates ARRAY <STRING>,
+real_name STRING
+)
+STORED AS PARQUET;
+
+-- Get all the aliases of each person.
+SELECT real_name, aliases.ITEM
+ FROM persons_of_interest, persons_of_interest.aliases
+ORDER BY real_name, aliases.item;
+
+-- Search for particular associates of each person.
+SELECT real_name, associates.ITEM
+ FROM persons_of_interest, persons_of_interest.associates
+WHERE associates.item LIKE '% MacGuffin';
+
+</code></pre>
+
+ <p class="p">
+ Because an array is inherently an ordered data structure, Impala recognizes the special name
+ <code class="ph codeph"><var class="keyword varname">array_column</var>.POS</code> to represent the numeric position of each element within the array. The
+ <code class="ph codeph">POS</code> pseudocolumn lets you filter or reorder the result set based on the sequence of array elements.
+ </p>
+
+ <p class="p">
+ The following example uses a table from a flattened version of the TPC-H schema. The <code class="ph codeph">REGION</code> table only has a
+ few rows, such as one row for Europe and one for Asia. The row for each region represents all the countries in that region as an
+ <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> elements:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > desc region;
++-------------+--------------------------------------------------------------------+
+| name | type |
++-------------+--------------------------------------------------------------------+
+| r_regionkey | smallint |
+| r_name | string |
+| r_comment | string |
+| r_nations | array<struct<n_nationkey:smallint,n_name:string,n_comment:string>> |
++-------------+--------------------------------------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ To find the countries within a specific region, you use a join query. To find out the order of elements in the array, you also
+ refer to the <code class="ph codeph">POS</code> pseudocolumn in the select list:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, <strong class="ph b">r2.POS</strong>
+ > FROM region r1 INNER JOIN r1.r_nations r2
+ > WHERE r1.r_name = 'ASIA';
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | VIETNAM | 0 |
+| ASIA | CHINA | 1 |
+| ASIA | JAPAN | 2 |
+| ASIA | INDONESIA | 3 |
+| ASIA | INDIA | 4 |
++--------+-----------+-----+
+</code></pre>
+
+ <p class="p">
+ Once you know the positions of the elements, you can use that information in subsequent queries, for example to change the
+ ordering of results from the complex type column or to filter certain elements from the array:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > SELECT r1.r_name, r2.n_name, r2.POS
+ > FROM region r1 INNER JOIN r1.r_nations r2
+ > WHERE r1.r_name = 'ASIA'
+ > <strong class="ph b">ORDER BY r2.POS DESC</strong>;
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | INDIA | 4 |
+| ASIA | INDONESIA | 3 |
+| ASIA | JAPAN | 2 |
+| ASIA | CHINA | 1 |
+| ASIA | VIETNAM | 0 |
++--------+-----------+-----+
+[localhost:21000] > SELECT r1.r_name, r2.n_name, r2.POS
+ > FROM region r1 INNER JOIN r1.r_nations r2
+                  > WHERE r1.r_name = 'ASIA' AND <strong class="ph b">r2.POS BETWEEN 1 AND 3</strong>;
++--------+-----------+-----+
+| r_name | n_name | pos |
++--------+-----------+-----+
+| ASIA | CHINA | 1 |
+| ASIA | JAPAN | 2 |
+| ASIA | INDONESIA | 3 |
++--------+-----------+-----+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested3" aria-labelledby="key__value" id="pseudocolumns__key">
+
+ <h4 class="title topictitle4" id="key__value">KEY and VALUE Pseudocolumns</h4>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">MAP</code> data type is suitable for representing sparse or wide data structures, where each row might only have
+ entries for a small subset of named fields. Because the element names (the map keys) vary depending on the row, a query must be
+ able to refer to both the key and the value parts of each key-value pair. The <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code>
+ pseudocolumns let you refer to the parts of the key-value pair independently within the query, as
+ <code class="ph codeph"><var class="keyword varname">map_column</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">KEY</code> must always be a scalar type, such as <code class="ph codeph">STRING</code>, <code class="ph codeph">BIGINT</code>, or
+ <code class="ph codeph">TIMESTAMP</code>. It can be <code class="ph codeph">NULL</code>. Values of the <code class="ph codeph">KEY</code> field are not necessarily unique
+ within the same <code class="ph codeph">MAP</code>. You apply any required <code class="ph codeph">DISTINCT</code>, <code class="ph codeph">GROUP BY</code>, and other
+ clauses in the query, and loop through the result set to process all the values matching any specified keys.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">VALUE</code> can be either a scalar type or another complex type. If the <code class="ph codeph">VALUE</code> is a
+ <code class="ph codeph">STRUCT</code>, you can construct a qualified name
+ <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE.<var class="keyword varname">struct_field</var></code> to refer to the individual fields inside
+ the value part. If the <code class="ph codeph">VALUE</code> is an <code class="ph codeph">ARRAY</code> or another <code class="ph codeph">MAP</code>, you must include
+ another join condition that establishes a table alias for <code class="ph codeph"><var class="keyword varname">map_column</var>.VALUE</code>, and then
+ construct another qualified name using that alias, for example <code class="ph codeph"><var class="keyword varname">table_alias</var>.ITEM</code> or
+          <code class="ph codeph"><var class="keyword varname">table_alias</var>.KEY</code> and <code class="ph codeph"><var class="keyword varname">table_alias</var>.VALUE</code>.
+ </p>
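+
+      <p class="p">
+        The following sketch shows the extra join and alias needed when the <code class="ph codeph">VALUE</code> part is itself an
+        <code class="ph codeph">ARRAY</code>. The <code class="ph codeph">events</code> table and its columns are hypothetical, used here only
+        to illustrate the qualified-name syntax:
+      </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical schema: events (event_id BIGINT, attributes MAP <STRING, ARRAY <STRING>>)
+-- The first join exposes attrs.KEY and attrs.VALUE; the second join
+-- gives the ARRAY inside the VALUE its own alias, so that its elements
+-- can be referenced as vals.ITEM.
+SELECT e.event_id, attrs.KEY, vals.ITEM
+  FROM events e, e.attributes attrs, attrs.VALUE vals
+WHERE attrs.KEY = 'tags';
+</code></pre>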
+
+ <p class="p">
+ The following example shows different ways to access a <code class="ph codeph">MAP</code> column using the <code class="ph codeph">KEY</code> and
+ <code class="ph codeph">VALUE</code> pseudocolumns. The <code class="ph codeph">DETAILS</code> column has a <code class="ph codeph">STRING</code> first part with short,
+ standardized values such as <code class="ph codeph">'Recurring'</code>, <code class="ph codeph">'Lucid'</code>, or <code class="ph codeph">'Anxiety'</code>. This is the
+ <span class="q">"key"</span> that is used to look up particular kinds of elements from the <code class="ph codeph">MAP</code>. The second part, also a
+ <code class="ph codeph">STRING</code>, is a longer free-form explanation. Impala gives you the standard pseudocolumn names
+ <code class="ph codeph">KEY</code> and <code class="ph codeph">VALUE</code> for the two parts, and you apply your own conventions and interpretations to the
+ underlying values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you find that the single-item nature of the <code class="ph codeph">VALUE</code> makes it difficult to model your data accurately, the
+ solution is typically to add some nesting to the complex type. For example, to have several sets of key-value pairs, make the
+ column an <code class="ph codeph">ARRAY</code> whose elements are <code class="ph codeph">MAP</code>. To make a set of key-value pairs that holds more
+ elaborate information, make a <code class="ph codeph">MAP</code> column whose <code class="ph codeph">VALUE</code> part contains an <code class="ph codeph">ARRAY</code>
+ or a <code class="ph codeph">STRUCT</code>.
+ </div>
+
+<pre class="pre codeblock"><code>CREATE TABLE dream_journal
+(
+ dream_id BIGINT,
+ details MAP <STRING,STRING>
+)
+STORED AS PARQUET;
+
+
+-- What are all the types of dreams that are recorded?
+SELECT DISTINCT details.KEY FROM dream_journal, dream_journal.details;
+
+-- How many lucid dreams were recorded?
+-- Because there is no GROUP BY, we count the 'Lucid' keys across all rows.
+SELECT <strong class="ph b">COUNT(details.KEY)</strong>
+ FROM dream_journal, dream_journal.details
+WHERE <strong class="ph b">details.KEY = 'Lucid'</strong>;
+
+-- Print a report of a subset of dreams, filtering based on both the lookup key
+-- and the detailed value.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE AS "Dream Summary"</strong>
+ FROM dream_journal, dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Happy', 'Pleasant', 'Joyous')</strong>
+ AND <strong class="ph b">details.VALUE LIKE '%childhood%'</strong>;
+</code></pre>
+
+ <p class="p">
+ The following example shows a more elaborate version of the previous table, where the <code class="ph codeph">VALUE</code> part of the
+ <code class="ph codeph">MAP</code> entry is a <code class="ph codeph">STRUCT</code> rather than a scalar type. Now instead of referring to the
+ <code class="ph codeph">VALUE</code> pseudocolumn directly, you use dot notation to refer to the <code class="ph codeph">STRUCT</code> fields inside it.
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE better_dream_journal
+(
+ dream_id BIGINT,
+ details MAP <STRING,STRUCT <summary: STRING, when_happened: TIMESTAMP, duration: DECIMAL(5,2), woke_up: BOOLEAN> >
+)
+STORED AS PARQUET;
+
+
+-- Do more elaborate reporting and filtering by examining multiple attributes within the same dream.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.VALUE.summary AS "Dream Summary"</strong>, <strong class="ph b">details.VALUE.duration AS "Duration"</strong>
+ FROM better_dream_journal, better_dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+ AND <strong class="ph b">details.VALUE.duration > 60</strong>
+ AND <strong class="ph b">details.VALUE.woke_up = TRUE</strong>;
+
+-- Remember that if the ITEM or VALUE contains a STRUCT, you can reference
+-- the STRUCT fields directly without the .ITEM or .VALUE qualifier.
+SELECT dream_id, <strong class="ph b">details.KEY AS "Dream Type"</strong>, <strong class="ph b">details.summary AS "Dream Summary"</strong>, <strong class="ph b">details.duration AS "Duration"</strong>
+ FROM better_dream_journal, better_dream_journal.details
+WHERE
+ <strong class="ph b">details.KEY IN ('Anxiety', 'Nightmare')</strong>
+ AND <strong class="ph b">details.duration > 60</strong>
+ AND <strong class="ph b">details.woke_up = TRUE</strong>;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="complex_types_using__complex_types_etl">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title19">Loading Data Containing Complex Types</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because the Impala <code class="ph codeph">INSERT</code> statement does not currently support creating new data with complex type columns, or
+ copying existing complex type values from one table to another, you primarily use Impala to query Parquet tables with complex
+ types where the data was inserted through Hive, or create tables with complex types where you already have existing Parquet data
+ files.
+
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_new_features.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_new_features.html b/docs/build3x/html/topics/impala_new_features.html
new file mode 100644
index 0000000..cd1ecc5
--- /dev/null
+++ b/docs/build3x/html/topics/impala_new_features.html
@@ -0,0 +1,3806 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="new_features"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>New Features in Apache Impala</title></head><body id="new_features"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">New Features in Apache Impala</span></h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This release of Impala contains the following changes and enhancements from previous releases.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="new_features__new_features_300">
+ <h2 class="title topictitle2" id="ariaid-title2">New Features in <span class="keyword">Impala 3.0</span></h2>
+ <div class="body conbody">
+ <p class="p">
+ For the full list of issues closed in this release, including the
+ issues marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-3.0.html" target="_blank">changelog for <span class="keyword">Impala 3.0</span></a>.
+ </p>
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="new_features__new_features_2120">
+
+ <h2 class="title topictitle2" id="ariaid-title3">New Features in <span class="keyword">Impala 2.12</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.12.html" target="_blank">changelog for <span class="keyword">Impala 2.12</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="new_features__new_features_2110">
+
+ <h2 class="title topictitle2" id="ariaid-title4">New Features in <span class="keyword">Impala 2.11</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.11.html" target="_blank">changelog for <span class="keyword">Impala 2.11</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="new_features__new_features_2100">
+
+ <h2 class="title topictitle2" id="ariaid-title5">New Features in <span class="keyword">Impala 2.10</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.10.html" target="_blank">changelog for <span class="keyword">Impala 2.10</span></a>.
+ </p>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="new_features__new_features_290">
+
+ <h2 class="title topictitle2" id="ariaid-title6">New Features in <span class="keyword">Impala 2.9</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For the full list of issues closed in this release, including the issues
+ marked as <span class="q">"new features"</span> or <span class="q">"improvements"</span>, see the
+ <a class="xref" href="https://impala.apache.org/docs/changelog-2.9.html" target="_blank">changelog for <span class="keyword">Impala 2.9</span></a>.
+ </p>
+
+ <p class="p">
+ The following are some of the most significant new features in this release:
+ </p>
+
+ <ul class="ul" id="new_features_290__feature_list">
+ <li class="li">
+ <p class="p">
+ A new function, <code class="ph codeph">replace()</code>, which is faster than
+ <code class="ph codeph">regexp_replace()</code> for simple string substitutions.
+ See <a class="xref" href="impala_string_functions.html">Impala String Functions</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Startup flags for the <span class="keyword cmdname">impalad</span> daemon, <code class="ph codeph">is_executor</code>
+ and <code class="ph codeph">is_coordinator</code>, let you divide the work on a large, busy cluster
+ between a small number of hosts acting as query coordinators, and a larger number of
+ hosts acting as query executors. By default, each host can act in both roles,
+ potentially introducing bottlenecks during heavily concurrent workloads.
+ See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="new_features__new_features_280">
+
+ <h2 class="title topictitle2" id="ariaid-title7">New Features in <span class="keyword">Impala 2.8</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul" id="new_features_280__feature_list">
+ <li class="li">
+ <p class="p">
+ Performance and scalability improvements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement can
+ take advantage of multithreading.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved scalability for highly concurrent loads by reducing the possibility of TCP/IP timeouts.
+ A configuration setting, <code class="ph codeph">accepted_cnxn_queue_depth</code>, can be adjusted upwards to
+ avoid this type of timeout on large clusters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Several performance improvements were made to the mechanism for generating native code:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Some queries involving analytic functions can take better advantage of native code generation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Modules produced during intermediate code generation are organized
+ to be easier to cache and reuse during the lifetime of a long-running or complicated query.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+ (less time for the codegen phase) for tables with a large number
+ of columns, especially for tables containing <code class="ph codeph">TIMESTAMP</code>
+ columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The logic for determining whether or not to use a runtime filter is more reliable, and the
+ evaluation process itself is faster because of native code generation.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">MT_DOP</code> query option enables
+ multithreading for a number of Impala operations.
+ <code class="ph codeph">COMPUTE STATS</code> statements for Parquet tables
+ use a default of <code class="ph codeph">MT_DOP=4</code> to improve the
+ intra-node parallelism and CPU efficiency of this data-intensive
+ operation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement is more efficient
+ (less time for the codegen phase) for tables with a large number
+ of columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new hint, <code class="ph codeph">CLUSTERED</code>,
+ allows Impala <code class="ph codeph">INSERT</code> operations on a Parquet table
+ that use dynamic partitioning to process a high number of
+ partitions in a single statement. The data is ordered based on the
+ partition key columns, and each partition is only written
+ by a single host, reducing the amount of memory needed to buffer
+ Parquet data while the data blocks are being constructed.
+ </p>
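+          <p class="p">
+            For example (the table and column names here are illustrative, and the hint is placed immediately before the
+            <code class="ph codeph">SELECT</code> portion, the usual position for Impala insert hints):
+          </p>
+<pre class="pre codeblock"><code>-- Sorts the incoming rows by the partition key columns, so each host
+-- buffers Parquet data for only one partition at a time.
+INSERT INTO sales_parquet PARTITION (year, month)
+  /* +CLUSTERED */
+  SELECT amount, region, year, month FROM sales_staging;
+</code></pre>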
+ </li>
+ <li class="li">
+ <p class="p">
+ The new configuration setting <code class="ph codeph">inc_stats_size_limit_bytes</code>
+ lets you reduce the load on the catalog server when running the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement for very large tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala folds many constant expressions within query statements,
+ rather than evaluating them for each row. This optimization
+ is especially useful when using functions to manipulate and
+ format <code class="ph codeph">TIMESTAMP</code> values, such as the result
+ of an expression such as <code class="ph codeph">to_date(now() - interval 1 day)</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Parsing of complicated expressions is faster. This speedup is
+ especially useful for queries containing large <code class="ph codeph">CASE</code>
+ expressions.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Evaluation is faster for <code class="ph codeph">IN</code> operators with many constant
+ arguments. The same performance improvement applies to other functions
+ with many constant arguments.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala optimizes identical comparison operators within multiple <code class="ph codeph">OR</code>
+ blocks.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The reporting for wall-clock times and total CPU time in profile output is more accurate.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">SCRATCH_LIMIT</code>, lets you restrict the amount of
+ space used when a query exceeds the memory limit and activates the <span class="q">"spill to disk"</span> mechanism.
+ This option helps to avoid runaway queries or make queries <span class="q">"fail fast"</span> if they require more
+ memory than anticipated. You can prevent runaway queries from using excessive amounts of spill space,
+ without restarting the cluster to turn the spilling feature off entirely.
+ See <a class="xref" href="impala_scratch_limit.html#scratch_limit">SCRATCH_LIMIT Query Option</a> for details.
+ </p>
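+          <p class="p">
+            For example (the specific values are illustrative; see the linked topic for the accepted formats):
+          </p>
+<pre class="pre codeblock"><code>-- Cap spill-to-disk space at 2 GB for subsequent queries in this session.
+SET SCRATCH_LIMIT='2g';
+-- Remove the cap again; -1 means unlimited scratch space.
+SET SCRATCH_LIMIT=-1;
+</code></pre>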
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Integration with Apache Kudu:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The experimental Impala support for the Kudu storage layer has been folded
+ into the main Impala development branch. Impala can now directly access Kudu tables,
+ opening up new capabilities such as enhanced DML operations and continuous ingestion.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DELETE</code> statement is a flexible way to remove data from a Kudu table. Previously,
+ removing data from an Impala table involved removing or rewriting the underlying data files, dropping entire partitions,
+ or rewriting the entire table. This Impala statement only works for Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">UPDATE</code> statement is a flexible way to modify data within a Kudu table. Previously,
+ updating data in an Impala table involved replacing the underlying data files, dropping entire partitions,
+ or rewriting the entire table. This Impala statement only works for Kudu tables.
+ </p>
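+          <p class="p">
+            For example, with a hypothetical Kudu table <code class="ph codeph">metrics</code> (both statements are rejected for
+            non-Kudu tables):
+          </p>
+<pre class="pre codeblock"><code>-- Modify matching rows in place.
+UPDATE metrics SET load_avg = 0.5 WHERE host = 'node2';
+-- Remove matching rows without rewriting any data files.
+DELETE FROM metrics WHERE host = 'node1';
+</code></pre>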
+ </li>
+ <li class="li">
+ <p class="p">
+          The <code class="ph codeph">UPSERT</code> statement is a flexible way to ingest new data, modify existing data, or both, within a Kudu table. Previously,
+ ingesting data that might contain duplicates involved an inefficient multi-stage operation, and there was no
+ built-in protection against duplicate data. The <code class="ph codeph">UPSERT</code> statement, in combination with
+ the primary key designation for Kudu tables, lets you add or replace rows in a single operation, and
+ automatically avoids creating any duplicate data.
+ </p>
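+          <p class="p">
+            For example, with a hypothetical Kudu table <code class="ph codeph">metrics</code> whose primary key is
+            <code class="ph codeph">host</code>:
+          </p>
+<pre class="pre codeblock"><code>-- Inserts a new row if no row with host 'node1' exists; otherwise
+-- replaces the existing row, with no duplicates either way.
+UPSERT INTO metrics (host, load_avg) VALUES ('node1', 0.85);
+</code></pre>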
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE</code> statement gains some new clauses that are specific to Kudu tables:
+ <code class="ph codeph">PARTITION BY</code>, <code class="ph codeph">PARTITIONS</code>, <code class="ph codeph">STORED AS KUDU</code>, and column
+ attributes <code class="ph codeph">PRIMARY KEY</code>, <code class="ph codeph">NULL</code> and <code class="ph codeph">NOT NULL</code>,
+ <code class="ph codeph">ENCODING</code>, <code class="ph codeph">COMPRESSION</code>, <code class="ph codeph">DEFAULT</code>, and <code class="ph codeph">BLOCK_SIZE</code>.
+ These clauses replace the explicit <code class="ph codeph">TBLPROPERTIES</code> settings that were required in the
+ early experimental phases of integration between Impala and Kudu.
+ </p>
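+          <p class="p">
+            The following sketch shows several of these clauses together; the table name, columns, and partition count are
+            illustrative only:
+          </p>
+<pre class="pre codeblock"><code>CREATE TABLE kudu_example
+(
+  id BIGINT PRIMARY KEY,
+  name STRING NOT NULL,
+  note STRING NULL
+)
+PARTITION BY HASH (id) PARTITIONS 4
+STORED AS KUDU;
+</code></pre>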
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">ALTER TABLE</code> statement can change certain attributes of Kudu tables.
+ You can add, drop, or rename columns.
+ You can add or drop range partitions.
+ You can change the <code class="ph codeph">TBLPROPERTIES</code> value to rename or point to a different underlying Kudu table,
+ independently from the Impala table name in the metastore database.
+ You cannot change the data type of an existing column in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SHOW PARTITIONS</code> statement displays information about the distribution of data
+ between partitions in Kudu tables. A new variation, <code class="ph codeph">SHOW RANGE PARTITIONS</code>,
+ displays information about the Kudu-specific partitions that apply across ranges of key values.
+ </p>
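+          <p class="p">
+            For example (<code class="ph codeph">SHOW RANGE PARTITIONS</code> applies only to Kudu tables that use range
+            partitioning; the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>SHOW PARTITIONS kudu_example;
+SHOW RANGE PARTITIONS kudu_example;
+</code></pre>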
+ </li>
+ <li class="li">
+ <p class="p">
+ Not all Impala data types are supported in Kudu tables. In particular, currently the Impala
+ <code class="ph codeph">TIMESTAMP</code> type is not allowed in a Kudu table. Impala does not recognize the
+ <code class="ph codeph">UNIXTIME_MICROS</code> Kudu type when it is present in a Kudu table. (These two
+ representations of date/time data use different units and are not directly compatible.)
+ You cannot create columns of type <code class="ph codeph">TIMESTAMP</code>, <code class="ph codeph">DECIMAL</code>,
+ <code class="ph codeph">VARCHAR</code>, or <code class="ph codeph">CHAR</code> within a Kudu table. Within a query, you can
+ cast values in a result set to these types. Certain types, such as <code class="ph codeph">BOOLEAN</code>,
+ cannot be used as primary key columns.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Currently, Kudu tables are not interchangeable between Impala and Hive the way other kinds of Impala tables are.
+ Although the metadata for Kudu tables is stored in the metastore database, currently Hive cannot access Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">INSERT</code> statement works for Kudu tables. The organization
+ of the Kudu data makes it more efficient than with HDFS-backed tables to insert
+ data in small batches, such as with the <code class="ph codeph">INSERT ... VALUES</code> syntax.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Some audit data is recorded for data governance purposes.
+ All <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>, and <code class="ph codeph">UPSERT</code> statements are characterized
+ as <code class="ph codeph">INSERT</code> operations in the audit log. Currently, lineage metadata is not generated for
+ <code class="ph codeph">UPDATE</code> and <code class="ph codeph">DELETE</code> operations on Kudu tables.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ Currently, Kudu tables have limited support for Sentry:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Access to Kudu tables must be granted to roles as usual.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Currently, access to a Kudu table through Sentry is <span class="q">"all or nothing"</span>.
+ You cannot enforce finer-grained permissions such as at the column level,
+ or permissions on certain operations such as <code class="ph codeph">INSERT</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Only users with <code class="ph codeph">ALL</code> privileges on <code class="ph codeph">SERVER</code> can create external Kudu tables.
+ </p>
+ </li>
+ </ul>
+ Because non-SQL APIs can access Kudu data without going through Sentry
+ authorization, currently the Sentry support is considered preliminary.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Equality and <code class="ph codeph">IN</code> predicates in Impala queries are pushed to
+ Kudu and evaluated efficiently by the Kudu storage layer.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ <strong class="ph b">Security:</strong>
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can take advantage of the S3 encrypted credential
+ store, to avoid exposing the secret key when accessing
+ data stored on S3.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement now updates information about HDFS block locations.
+ Therefore, you can perform a fast and efficient <code class="ph codeph">REFRESH</code> after doing an HDFS
+ rebalancing operation instead of the more expensive <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1654" target="_blank">IMPALA-1654</a>]
+ Several kinds of DDL operations
+ can now work on a range of partitions. The partitions can be specified
+ using operators such as <code class="ph codeph"><</code>, <code class="ph codeph">>=</code>, and
+ <code class="ph codeph">!=</code> rather than just an equality predicate applying to a single
+ partition.
+ This new feature extends the syntax of several clauses
+ of the <code class="ph codeph">ALTER TABLE</code> statement
+ (<code class="ph codeph">DROP PARTITION</code>, <code class="ph codeph">SET [UN]CACHED</code>,
+ <code class="ph codeph">SET FILEFORMAT | SERDEPROPERTIES | TBLPROPERTIES</code>),
+ the <code class="ph codeph">SHOW FILES</code> statement, and the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ It does not apply to statements that are defined to only apply to a single
+ partition, such as <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code>,
+ <code class="ph codeph">SET LOCATION</code>, and <code class="ph codeph">INSERT</code> with a static
+ partitioning clause.
+ </p>
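+          <p class="p">
+            For example (illustrative table and partition key names):
+          </p>
+<pre class="pre codeblock"><code>-- Drop every partition earlier than a cutoff year in one statement.
+ALTER TABLE historical_data DROP PARTITION (year < 1996);
+-- Compute incremental stats across a range of recent partitions.
+COMPUTE INCREMENTAL STATS historical_data PARTITION (year >= 2015);
+</code></pre>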
+ </li>
+ <li class="li">
+ <p class="p">
+          The <code class="ph codeph">instr()</code> function has optional third and fourth arguments, representing
+          the character position at which to begin searching for the substring, and which occurrence
+          of the substring to find.
+ </p>
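+          <p class="p">
+            The following sketch illustrates the effect of the optional arguments (positions are 1-based):
+          </p>
+<pre class="pre codeblock"><code>SELECT instr('foo bar bar', 'bar');        -- 5: first match
+SELECT instr('foo bar bar', 'bar', 6);     -- 9: search starts past the first match
+SELECT instr('foo bar bar', 'bar', 1, 2);  -- 9: second occurrence from the start
+</code></pre>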
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved error handling for malformed Avro data. In particular, incorrect
+ precision or scale for <code class="ph codeph">DECIMAL</code> types is now handled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala debug web UI:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In addition to <span class="q">"inflight"</span> and <span class="q">"finished"</span> queries, the web UI
+ now also includes a section for <span class="q">"queued"</span> queries.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="ph uicontrol">/sessions</span> tab now clarifies how many of the displayed
+ sections are active, and lets you sort by <span class="ph uicontrol">Expired</span> status
+ to distinguish active sessions from expired ones.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved stability when DDL operations such as <code class="ph codeph">CREATE DATABASE</code>
+ or <code class="ph codeph">DROP DATABASE</code> are run in Hive at the same time as an Impala
+ <code class="ph codeph">INVALIDATE METADATA</code> statement.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="q">"out of memory"</span> error report was made more user-friendly, with additional
+ diagnostic information to help identify the spot where the memory limit was exceeded.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved disk space usage for Java-based UDFs. Temporary copies of the associated JAR
+ files are removed when no longer needed, so that they do not accumulate across restarts
+ of the <span class="keyword cmdname">catalogd</span> daemon and potentially cause an out-of-space condition.
+ These temporary files are also created in the directory specified by the <code class="ph codeph">local_library_dir</code>
+ configuration setting, so that the storage for these temporary files can be independent
+ from any capacity limits on the <span class="ph filepath">/tmp</span> filesystem.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="new_features__new_features_270">
+
+ <h2 class="title topictitle2" id="ariaid-title8">New Features in <span class="keyword">Impala 2.7</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul" id="new_features_270__feature_list">
+ <li class="li">
+ <p class="p">
+ Performance improvements:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3206" target="_blank">IMPALA-3206</a>]
+ Speedup for queries against <code class="ph codeph">DECIMAL</code> columns in Avro tables.
+ The code that parses <code class="ph codeph">DECIMAL</code> values from Avro now uses
+ native code generation.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3674" target="_blank">IMPALA-3674</a>]
+ Improved efficiency in LLVM code generation can reduce codegen time, especially
+ for short queries.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2979" target="_blank">IMPALA-2979</a>]
+ Improvements to scheduling on worker nodes,
+ enabled by the <code class="ph codeph">REPLICA_PREFERENCE</code> query option.
+ See <a class="xref" href="impala_replica_preference.html#replica_preference">REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)</a> for details.
+ </p>
+ </li>
+ </ul>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1683" target="_blank">IMPALA-1683</a>]
+ The <code class="ph codeph">REFRESH</code> statement can be applied to a single partition,
+ rather than the entire table. See <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a>
+ and <a class="xref" href="impala_partitioning.html#partition_refresh">Refreshing a Single Partition</a> for details.
+ </p>
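+          <p class="p">
+            For example, a sketch of refreshing the metadata for a single newly loaded
+            partition (the table and partition names are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Reload the file listing for one partition instead of the whole table:
+REFRESH sales_table PARTITION (year=2017, month=12);</code></pre>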
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to the Impala web user interface:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2767" target="_blank">IMPALA-2767</a>]
+ You can now force a session to expire by clicking a link in the web UI,
+ on the <span class="ph uicontrol">/sessions</span> tab.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3715" target="_blank">IMPALA-3715</a>]
+ The <span class="ph uicontrol">/memz</span> tab includes more information about
+ Impala memory usage.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3716" target="_blank">IMPALA-3716</a>]
+ The <span class="ph uicontrol">Details</span> page for a query now includes
+ a <span class="ph uicontrol">Memory</span> tab.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3499" target="_blank">IMPALA-3499</a>]
+ Scalability improvements to the catalog server. Impala handles internal communication
+ more efficiently for tables with large numbers of columns and partitions, where the
+ size of the metadata exceeds 2 GiB.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3677" target="_blank">IMPALA-3677</a>]
+ You can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+ Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+ without triggering a crash. See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for
+ details about the Breakpad minidump feature.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3687" target="_blank">IMPALA-3687</a>]
+ The schema reconciliation rules for Avro tables have changed slightly
+ for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns. Now, if
+ the definition of such a column is changed in the Avro schema file,
+ the column retains its <code class="ph codeph">CHAR</code> or <code class="ph codeph">VARCHAR</code>
+ type as specified in the SQL definition, but the column name and comment
+ from the Avro schema file take precedence.
+ See <a class="xref" href="impala_avro.html#avro_create_table">Creating Avro Tables</a> for details about
+ column definitions in Avro tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ [<a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3575" target="_blank">IMPALA-3575</a>]
+ Some network
+            operations now have additional timeout and retry settings. The extra
+            configuration helps avoid query failures caused by transient network
+            problems, avoids hangs when a sender or receiver fails in the
+            middle of a network transmission, and makes cancellation requests
+            more reliable despite network issues.
+          </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="new_features__new_features_260">
+
+ <h2 class="title topictitle2" id="ariaid-title9">New Features in <span class="keyword">Impala 2.6</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Improvements to Impala support for the Amazon S3 filesystem:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can now write to S3 tables through the <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for general information about
+ using Impala with S3.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new query option, <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, lets you
+ trade off between fast <code class="ph codeph">INSERT</code> performance and
+ slower <code class="ph codeph">INSERT</code>s that are more consistent if a
+ problem occurs during the statement. The new behavior is enabled by default.
+ See <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details
+ about this option.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for the runtime filtering feature:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The default for the <code class="ph codeph">RUNTIME_FILTER_MODE</code>
+ query option is changed to <code class="ph codeph">GLOBAL</code> (the highest setting).
+ See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for
+ details about this option.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> setting is now only used
+ as a fallback if statistics are not available; otherwise, Impala
+ uses the statistics to estimate the appropriate size to use for each filter.
+ See <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE Query Option (Impala 2.5 or higher only)</a> for
+ details about this option.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ New query options <code class="ph codeph">RUNTIME_FILTER_MIN_SIZE</code> and
+ <code class="ph codeph">RUNTIME_FILTER_MAX_SIZE</code> let you fine-tune
+ the sizes of the Bloom filter structures used for runtime filtering.
+ If the filter size derived from Impala internal estimates or from
+                the <code class="ph codeph">RUNTIME_BLOOM_FILTER_SIZE</code> query option falls outside the size
+ range specified by these options, any too-small filter size is adjusted
+ to the minimum, and any too-large filter size is adjusted to the maximum.
+ See <a class="xref" href="impala_runtime_filter_min_size.html#runtime_filter_min_size">RUNTIME_FILTER_MIN_SIZE Query Option (Impala 2.6 or higher only)</a>
+ and <a class="xref" href="impala_runtime_filter_max_size.html#runtime_filter_max_size">RUNTIME_FILTER_MAX_SIZE Query Option (Impala 2.6 or higher only)</a>
+ for details about these options.
+ </p>
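+              <p class="p">
+                For example, a sketch of clamping the filter sizes within a session
+                (the byte values shown are illustrative):
+              </p>
+<pre class="pre codeblock"><code>-- Keep all runtime Bloom filters between 1 MB and 16 MB:
+SET RUNTIME_FILTER_MIN_SIZE=1048576;
+SET RUNTIME_FILTER_MAX_SIZE=16777216;</code></pre>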
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filter propagation now applies to all the
+ operands of <code class="ph codeph">UNION</code> and <code class="ph codeph">UNION ALL</code>
+ operators.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filters can now be produced during join queries even
+ when the join processing activates the spill-to-disk mechanism.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for
+ general information about the runtime filtering feature.
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Admission control and dynamic resource pools are enabled by default.
+ See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details
+ about admission control.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+            You can now manually set column statistics in Impala,
+ using the <code class="ph codeph">ALTER TABLE</code> statement with a
+ <code class="ph codeph">SET COLUMN STATS</code> clause.
+            See <a class="xref" href="impala_perf_stats.html#perf_column_stats_manual">Setting Column Stats Manually through ALTER TABLE</a> for details.
+ </p>
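+          <p class="p">
+            For example, a sketch of supplying two statistics for a column without running
+            <code class="ph codeph">COMPUTE STATS</code> (the table, column, and values are illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Record the number of distinct values and the count of NULLs for a column:
+ALTER TABLE t1 SET COLUMN STATS c1 ('numDVs'='100','numNulls'='0');</code></pre>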
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala can now write lightweight <span class="q">"minidump"</span> files, rather
+ than large core files, to save diagnostic information when
+ any of the Impala-related daemons crash. This feature uses the
+ open source <code class="ph codeph">breakpad</code> framework.
+ See <a class="xref" href="impala_breakpad.html#breakpad">Breakpad Minidumps for Impala (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ New query options improve interoperability with Parquet files:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option
+ lets Impala locate columns within Parquet files based on
+ column name rather than ordinal position.
+ This enhancement improves interoperability with applications
+ that write Parquet files with a different order or subset of
+ columns than are used in the Impala table.
+ See <a class="xref" href="impala_parquet_fallback_schema_resolution.html#parquet_fallback_schema_resolution">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</a>
+ for details.
+ </p>
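+              <p class="p">
+                For example, a sketch of switching to name-based column resolution
+                for the current session:
+              </p>
+<pre class="pre codeblock"><code>-- Match Parquet columns by name instead of ordinal position:
+SET PARQUET_FALLBACK_SCHEMA_RESOLUTION=name;</code></pre>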
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">PARQUET_ANNOTATE_STRINGS_UTF8</code> query option
+ makes Impala include the <code class="ph codeph">UTF-8</code> annotation
+ metadata for <code class="ph codeph">STRING</code>, <code class="ph codeph">CHAR</code>,
+ and <code class="ph codeph">VARCHAR</code> columns in Parquet files created
+ by <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ statements.
+ See <a class="xref" href="impala_parquet_annotate_strings_utf8.html#parquet_annotate_strings_utf8">PARQUET_ANNOTATE_STRINGS_UTF8 Query Option (Impala 2.6 or higher only)</a>
+ for details.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for general information about working
+ with Parquet files.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to security and reduction in overhead for secure clusters:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Overall performance improvements for secure clusters.
+                (TPC-H queries on a secure cluster were benchmarked
+                at roughly 3 times the speed of the previous release.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala now recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default; to enable it,
+ specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> configuration settings.
+ See <a class="xref" href="impala_kerberos.html#auth_to_local">Mapping Kerberos Principals to Short Names for Impala</a> for details.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Timing improvements in the mechanism for the <span class="keyword cmdname">impalad</span>
+ daemon to acquire Kerberos tickets. This feature spreads out the overhead
+ on the KDC during Impala startup, especially for large clusters.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ For Kerberized clusters, the Catalog service now uses
+                the Kerberos principal instead of the operating system user that runs
+ the <span class="keyword cmdname">catalogd</span> daemon.
+ This eliminates the requirement to configure a <code class="ph codeph">hadoop.user.group.static.mapping.overrides</code>
+ setting to put the OS user into the Sentry administrative group, on clusters where the principal
+ and the OS user name for this user are different.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Overall performance improvements for join queries, by using a prefetching mechanism
+ while building the in-memory hash table to evaluate join predicates.
+ See <a class="xref" href="impala_prefetch_mode.html#prefetch_mode">PREFETCH_MODE Query Option (Impala 2.6 or higher only)</a> for the query option
+ to control this optimization.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <span class="keyword cmdname">impala-shell</span> interpreter has a new command,
+ <code class="ph codeph">SOURCE</code>, that lets you run a set of SQL statements
+ or other <span class="keyword cmdname">impala-shell</span> commands stored in a file.
+ You can run additional <code class="ph codeph">SOURCE</code> commands from inside
+ a file, to set up flexible sequences of statements for use cases
+ such as schema setup, ETL, or reporting.
+ See <a class="xref" href="impala_shell_commands.html#shell_commands">impala-shell Command Reference</a> for details
+ and <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a>
+ for examples.
+ </p>
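+          <p class="p">
+            For example, a sketch of running a statement file from within the
+            <span class="keyword cmdname">impala-shell</span> interpreter (the file name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>[localhost:21000] &gt; source setup_schema.sql;</code></pre>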
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">millisecond()</code> built-in function lets you extract
+ the fractional seconds part of a <code class="ph codeph">TIMESTAMP</code> value.
+ See <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+ </p>
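+          <p class="p">
+            For example, a sketch of extracting the fractional seconds part of a
+            literal <code class="ph codeph">TIMESTAMP</code> value:
+          </p>
+<pre class="pre codeblock"><code>-- Returns the millisecond portion (123) of the timestamp:
+SELECT millisecond(CAST('2016-01-01 12:00:00.123' AS TIMESTAMP));</code></pre>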
+ </li>
+ <li class="li">
+ <p class="p">
+ If an Avro table is created without column definitions in the
+ <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+ added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+ table is now queryable. Missing values from the newly added
+ columns now default to <code class="ph codeph">NULL</code>.
+ See <a class="xref" href="impala_avro.html#avro">Using the Avro File Format with Impala Tables</a> for general details about
+ working with Avro files.
+ </p>
+ </li>
+ <li class="li">
+ <div class="p">
+ The mechanism for interpreting <code class="ph codeph">DECIMAL</code> literals is
+ improved, no longer going through an intermediate conversion step
+ to <code class="ph codeph">DOUBLE</code>:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+                Casting a <code class="ph codeph">DECIMAL</code> value to <code class="ph codeph">TIMESTAMP</code>
+                produces a more precise
+                value for the <code class="ph codeph">TIMESTAMP</code> than formerly.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Certain function calls involving <code class="ph codeph">DECIMAL</code> literals
+ now succeed, when formerly they failed due to lack of a function
+ signature with a <code class="ph codeph">DOUBLE</code> argument.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Faster runtime performance for <code class="ph codeph">DECIMAL</code> constant
+ values, through improved native code generation for all combinations
+ of precision and scale.
+ </p>
+ </li>
+ </ul>
+ See <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a> for details about the <code class="ph codeph">DECIMAL</code> type.
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved type accuracy for <code class="ph codeph">CASE</code> return values.
+ If all <code class="ph codeph">WHEN</code> clauses of the <code class="ph codeph">CASE</code>
+ expression are of <code class="ph codeph">CHAR</code> type, the final result
+ is also <code class="ph codeph">CHAR</code> instead of being converted to
+ <code class="ph codeph">STRING</code>.
+ See <a class="xref" href="impala_conditional_functions.html#conditional_functions">Impala Conditional Functions</a>
+ for details about the <code class="ph codeph">CASE</code> function.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Uncorrelated queries using the <code class="ph codeph">NOT EXISTS</code> operator
+ are now supported. Formerly, the <code class="ph codeph">NOT EXISTS</code>
+ operator was only available for correlated subqueries.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved performance for reading Parquet files.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improved performance for <dfn class="term">top-N</dfn> queries, that is,
+ those including both <code class="ph codeph">ORDER BY</code> and
+ <code class="ph codeph">LIMIT</code> clauses.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Impala optionally skips an arbitrary number of header lines from text input
+ files on HDFS based on the <code class="ph codeph">skip.header.line.count</code> value
+ in the <code class="ph codeph">TBLPROPERTIES</code> field of the table metadata.
+ See <a class="xref" href="impala_txtfile.html#text_data_files">Data Files for Text Tables</a> for details.
+ </p>
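+          <p class="p">
+            For example, a sketch of marking an existing text table so that the first
+            line of each data file is ignored (the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Skip one header line at the start of each text data file:
+ALTER TABLE header_csv SET TBLPROPERTIES ('skip.header.line.count'='1');</code></pre>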
+ </li>
+ <li class="li">
+ <p class="p">
+ Trailing comments are now allowed in queries processed by
+ the <span class="keyword cmdname">impala-shell</span> options <code class="ph codeph">-q</code>
+ and <code class="ph codeph">-f</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala can run <code class="ph codeph">COUNT</code> queries for RCFile tables
+ that include complex type columns.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for
+ general information about working with complex types,
+ and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>,
+ <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>
+ for syntax details of each type.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="new_features__new_features_250">
+
+ <h2 class="title topictitle2" id="ariaid-title10">New Features in <span class="keyword">Impala 2.5</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Dynamic partition pruning. When a query refers to a partition key column in a <code class="ph codeph">WHERE</code>
+ clause, and the exact set of column values are not known until the query is executed,
+ Impala evaluates the predicate and skips the I/O for entire partitions that are not needed.
+ For example, if a table was partitioned by year, Impala would apply this technique to a query
+ such as <code class="ph codeph">SELECT c1 FROM partitioned_table WHERE year = (SELECT MAX(year) FROM other_table)</code>.
+ <span class="ph">See <a class="xref" href="impala_partitioning.html#dynamic_partition_pruning">Dynamic Partition Pruning</a> for details.</span>
+ </p>
+ <p class="p">
+ The dynamic partition pruning optimization technique lets Impala avoid reading
+ data files from partitions that are not part of the result set, even when
+ that determination cannot be made in advance. This technique is especially valuable
+ when performing join queries involving partitioned tables. For example, if a join
+ query includes an <code class="ph codeph">ON</code> clause and a <code class="ph codeph">WHERE</code> clause
+ that refer to the same columns, the query can find the set of column values that
+ match the <code class="ph codeph">WHERE</code> clause, and only scan the associated partitions
+ when evaluating the <code class="ph codeph">ON</code> clause.
+ </p>
+ <p class="p">
+ Dynamic partition pruning is controlled by the same settings as the runtime filtering feature.
+ By default, this feature is enabled at a medium level, because the maximum setting can use
+ slightly more memory for queries than in previous releases.
+ To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Runtime filtering. This is a wide-ranging set of optimizations that are especially valuable for join queries.
+ Using the same technique as with dynamic partition pruning,
+ Impala uses the predicates from <code class="ph codeph">WHERE</code> and <code class="ph codeph">ON</code> clauses
+            to determine the subset of column values from one of the joined tables that could possibly be part of the
+ result set. Impala sends a compact representation of the filter condition to the hosts in the cluster,
+ instead of the full set of values or the entire table.
+ <span class="ph">See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ <p class="p">
+ By default, this feature is enabled at a medium level, because the maximum setting can use
+ slightly more memory for queries than in previous releases.
+ To fully enable this feature, set the query option <code class="ph codeph">RUNTIME_FILTER_MODE=GLOBAL</code>.
+ <span class="ph">See <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ <p class="p">
+ This feature involves some new query options:
+ <a class="xref" href="impala_runtime_filter_mode.html">RUNTIME_FILTER_MODE</a>,
+ <a class="xref" href="impala_max_num_runtime_filters.html">MAX_NUM_RUNTIME_FILTERS</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html">RUNTIME_BLOOM_FILTER_SIZE</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html">RUNTIME_FILTER_WAIT_TIME_MS</a>,
+ and <a class="xref" href="impala_disable_row_runtime_filtering.html">DISABLE_ROW_RUNTIME_FILTERING</a>.
+ <span class="ph">See
+ <a class="xref" href="impala_runtime_filter_mode.html#runtime_filter_mode">RUNTIME_FILTER_MODE</a>,
+ <a class="xref" href="impala_max_num_runtime_filters.html#max_num_runtime_filters">MAX_NUM_RUNTIME_FILTERS</a>,
+ <a class="xref" href="impala_runtime_bloom_filter_size.html#runtime_bloom_filter_size">RUNTIME_BLOOM_FILTER_SIZE</a>,
+ <a class="xref" href="impala_runtime_filter_wait_time_ms.html#runtime_filter_wait_time_ms">RUNTIME_FILTER_WAIT_TIME_MS</a>, and
+ <a class="xref" href="impala_disable_row_runtime_filtering.html#disable_row_runtime_filtering">DISABLE_ROW_RUNTIME_FILTERING</a>
+ for details.
+ </span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ More efficient use of the HDFS caching feature, to avoid
+ hotspots and bottlenecks that could occur if heavily used
+ cached data blocks were always processed by the same host.
+ By default, Impala now randomizes which host processes each cached
+ HDFS data block, when cached replicas are available on multiple hosts.
+ (Remember to use the <code class="ph codeph">WITH REPLICATION</code> clause with the
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement
+ when enabling HDFS caching for a table or partition, to cache the same
+ data blocks across multiple hosts.)
+            The new query option <code class="ph codeph">SCHEDULE_RANDOM_REPLICA</code>
+            lets you fine-tune the interaction with HDFS caching even more.
+ <span class="ph">See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TRUNCATE TABLE</code> statement now accepts an <code class="ph codeph">IF EXISTS</code>
+ clause, making <code class="ph codeph">TRUNCATE TABLE</code> easier to use in setup or ETL scripts where the table might or
+ might not exist.
+ <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+ </p>
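+          <p class="p">
+            For example, a sketch of emptying a staging table in an ETL script without
+            failing when the table is absent (the table name is illustrative):
+          </p>
+<pre class="pre codeblock"><code>-- Succeeds whether or not staging_data exists:
+TRUNCATE TABLE IF EXISTS staging_data;</code></pre>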
+ </li>
+ <li class="li">
+ <div class="p">
+ Improved performance and reliability for the <code class="ph codeph">DECIMAL</code> data type:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Using <code class="ph codeph">DECIMAL</code> values in a <code class="ph codeph">GROUP BY</code> clause now
+ triggers the native code generation optimization, speeding up queries that
+ group by values such as prices.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Checking for overflow in <code class="ph codeph">DECIMAL</code>
+ multiplication is now substantially faster, making <code class="ph codeph">DECIMAL</code>
+ a more practical data type in some use cases where formerly <code class="ph codeph">DECIMAL</code>
+ was much slower than <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Multiplying a mixture of <code class="ph codeph">DECIMAL</code>
+ and <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code> values now returns the
+ <code class="ph codeph">DOUBLE</code> rather than <code class="ph codeph">DECIMAL</code>. This change avoids
+ some cases where an intermediate value would underflow or overflow and become
+ <code class="ph codeph">NULL</code> unexpectedly.
+ </p>
+ </li>
+ </ul>
+ <span class="ph">See <a class="xref" href="impala_decimal.html">DECIMAL Data Type (Impala 3.0 or higher only)</a> for details.</span>
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ For UDFs written in Java, or Hive UDFs reused for Impala,
+ Impala now allows parameters and return values to be primitive types.
+ Formerly, these things were required to be one of the <span class="q">"Writable"</span>
+ object types.
+ <span class="ph">See <a class="xref" href="impala_udf.html#udfs_hive">Using Hive UDFs with Impala</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for HDFS I/O. Impala now caches HDFS file handles to avoid the
+ overhead of repeatedly opening the same file.
+ </p>
+ </li>
+
+
+ <li class="li">
+ <p class="p">
+ Performance improvements for queries involving nested complex types.
+ Certain basic query types, such as counting the elements of a complex column,
+ now use an optimized code path.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Improvements to the memory reservation mechanism for the Impala
+ admission control feature. You can specify more settings, such
+ as the timeout period and maximum aggregate memory used, for each
+ resource pool instead of globally for the Impala instance. The
+ default limit for concurrent queries (the <span class="ph uicontrol">max requests</span>
+ setting) is now unlimited instead of 200.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Performance improvements related to code generation.
+ Even in queries where code generation is not performed
+ for some phases of execution (such as reading data from
+ Parquet tables), Impala can still use code generation in
+ other parts of the query, such as evaluating
+ functions in the <code class="ph codeph">WHERE</code> clause.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for queries using aggregation functions
+ on high-cardinality columns.
+ Formerly, Impala could do unnecessary extra work to produce intermediate
+ results for operations such as <code class="ph codeph">DISTINCT</code> or <code class="ph codeph">GROUP BY</code>
+ on columns that were unique or had few duplicate values.
+ Now, Impala decides at run time whether it is more efficient to
+ do an initial aggregation phase and pass along a smaller set of intermediate data,
+            or to pass raw intermediate data back to the next phase of query processing to be aggregated there.
+ This feature is known as <dfn class="term">streaming pre-aggregation</dfn>.
+ In case of performance regression, this feature can be turned off
+ using the <code class="ph codeph">DISABLE_STREAMING_PREAGGREGATIONS</code> query option.
+ <span class="ph">See <a class="xref" href="impala_disable_streaming_preaggregations.html#disable_streaming_preaggregations">DISABLE_STREAMING_PREAGGREGATIONS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Spill-to-disk feature now always recommended. In earlier releases, the spill-to-disk feature
+ could be turned off using a pair of configuration settings,
+ <code class="ph codeph">enable_partitioned_aggregation=false</code> and
+ <code class="ph codeph">enable_partitioned_hash_join=false</code>.
+ The latest improvements in the spill-to-disk mechanism, and related features that
+ interact with it, make this feature robust enough that disabling it is now
+ no longer needed or supported. In particular, some new features in <span class="keyword">Impala 2.5</span>
+ and higher do not work when the spill-to-disk feature is disabled.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to scripting capability for the <span class="keyword cmdname">impala-shell</span> command,
+ through user-specified substitution variables that can appear in statements processed
+ by <span class="keyword cmdname">impala-shell</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">--var</code> command-line option lets you pass key-value pairs to
+ <span class="keyword cmdname">impala-shell</span>. The shell can substitute the values
+ into queries before executing them, where the query text contains the notation
+ <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>. For example, you might prepare a SQL file
+ containing a set of DDL statements and queries containing variables for
+ database and table names, and then pass the applicable names as part of the
+ <code class="ph codeph">impala-shell -f <var class="keyword varname">filename</var></code> command.
+ <span class="ph">See <a class="xref" href="impala_shell_running_commands.html#shell_running_commands">Running Commands and SQL Statements in impala-shell</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">SET</code> and <code class="ph codeph">UNSET</code> commands within the
+ <span class="keyword cmdname">impala-shell</span> interpreter now work with user-specified
+ substitution variables, as well as the built-in query options.
+                The two kinds of variables are shown in separate groups in the <code class="ph codeph">SET</code> output.
+ As with variables defined by the <code class="ph codeph">--var</code> command-line option,
+ you refer to the user-specified substitution variables in queries by using
+ the notation <code class="ph codeph">${var:<var class="keyword varname">varname</var>}</code>
+ in the query text. Because the substitution variables are processed by
+ <span class="keyword cmdname">impala-shell</span> instead of the <span class="keyword cmdname">impalad</span>
+ backend, you cannot define your own substitution variables through the
+ <code class="ph codeph">SET</code> statement in a JDBC or ODBC application.
+ <span class="ph">See <a class="xref" href="impala_set.html#set">SET Statement</a> for details.</span>
+ </p>
+ </li>
+ </ul>
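+          <p class="p">
+            For example, a sketch of parameterizing a query with a substitution
+            variable (the variable name, value, and table name are illustrative):
+          </p>
+<pre class="pre codeblock"><code>$ impala-shell --var=tname=sales -q 'SELECT COUNT(*) FROM ${var:tname}'</code></pre>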
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance improvements for query startup. Impala better parallelizes certain work
+ when coordinating plan distribution between <span class="keyword cmdname">impalad</span> instances, which improves
+ startup time for queries involving tables with many partitions on large clusters,
+ or complicated queries with many plan fragments.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance and scalability improvements for tables with many partitions.
+ The memory requirements on the coordinator node are reduced, making it substantially
+ faster and less resource-intensive
+ to do joins involving several tables with thousands of partitions each.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Whitelisting for access to internal APIs. For applications that need direct access
+ to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+ specify a list of Kerberos users who are allowed to call those APIs. By default, the
+ <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+ for this kind of access.
+ Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+ configuration setting are blocked from accessing the APIs. This setting applies to all the
+ Impala-related daemons, although currently it is primarily used for HDFS to control the
+ behavior of the catalog server.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Improvements to Impala integration and usability for Hue. (The code changes
+ are actually on the Hue side.)
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The list of tables now refreshes dynamically.
+ </p>
+ </li>
+ </ul>
+ </li>
+ <li class="li">
+ <p class="p">
+ Usability improvements for case-insensitive queries.
+ You can now use the operators <code class="ph codeph">ILIKE</code> and <code class="ph codeph">IREGEXP</code>
+ to perform case-insensitive wildcard matches or regular expression matches,
+ rather than explicitly converting column values with <code class="ph codeph">UPPER</code>
+ or <code class="ph codeph">LOWER</code>.
+ <span class="ph">See <a class="xref" href="impala_operators.html#ilike">ILIKE Operator</a> and <a class="xref" href="impala_operators.html#iregexp">IREGEXP Operator</a> for details.</span>
+ </p>
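+ <p class="p">
+ For example, with a hypothetical table <code class="ph codeph">t1</code> containing a string
+ column <code class="ph codeph">s</code>:
+ </p>
+ <pre class="pre codeblock"><code>-- Matches 'banana', 'BANANA', 'BaNaNa', and so on.
+ select s from t1 where s ilike 'ban%';
+ -- Case-insensitive regular expression match.
+ select s from t1 where s iregexp '^ban';</code></pre>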
+ </li>
+ <li class="li">
+ <p class="p">
+ Performance and reliability improvements for DDL and insert operations on partitioned tables with a large
+ number of partitions. Impala only re-evaluates metadata for partitions that are affected by
+ a DDL operation, not all partitions in the table. While a DDL or insert statement is in progress,
+ other Impala statements that attempt to modify metadata for the same table wait until the first one
+ finishes.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Reliability improvements for the <code class="ph codeph">LOAD DATA</code> statement.
+ Previously, this statement would fail if the source HDFS directory
+ contained any subdirectories at all. Now, the statement ignores
+ any hidden subdirectories, for example <span class="ph filepath">_impala_insert_staging</span>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new operator, <code class="ph codeph">IS [NOT] DISTINCT FROM</code>, lets you compare values
+ and always get a <code class="ph codeph">true</code> or <code class="ph codeph">false</code> result,
+ even if one or both of the values are <code class="ph codeph">NULL</code>.
+ The <code class="ph codeph">IS NOT DISTINCT FROM</code> operator, or its equivalent
+ <code class="ph codeph"><=></code> notation, improves the efficiency of join queries that
+ treat key values that are <code class="ph codeph">NULL</code> in both tables as equal.
+ <span class="ph">See <a class="xref" href="impala_operators.html#is_distinct_from">IS DISTINCT FROM Operator</a> for details.</span>
+ </p>
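+ <p class="p">
+ For example, the following join (the tables and columns are hypothetical) matches rows
+ where the key is <code class="ph codeph">NULL</code> in both tables, which a plain
+ <code class="ph codeph">=</code> comparison would not:
+ </p>
+ <pre class="pre codeblock"><code>select t1.id, t2.id from t1 join t2
+   on t1.id is not distinct from t2.id;
+ -- Equivalent shorthand notation:
+ select t1.id, t2.id from t1 join t2 on t1.id &lt;=&gt; t2.id;</code></pre>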
+ </li>
+ <li class="li">
+ <p class="p">
+ Security enhancements for the <span class="keyword cmdname">impala-shell</span> command.
+ A new option, <code class="ph codeph">--ldap_password_cmd</code>, lets you specify
+ a command to retrieve the LDAP password. The resulting password is
+ then used to authenticate the <span class="keyword cmdname">impala-shell</span> command
+ with the LDAP server.
+ <span class="ph">See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for details.</span>
+ </p>
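+ <p class="p">
+ For example (the script path is illustrative; the command you specify must print the
+ password on standard output):
+ </p>
+ <pre class="pre codeblock"><code>impala-shell --ldap --ldap_password_cmd="cat /home/username/.ldap_password"</code></pre>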
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">CREATE TABLE AS SELECT</code> statement now accepts a
+ <code class="ph codeph">PARTITIONED BY</code> clause, which lets you create a
+ partitioned table and insert data into it with a single statement.
+ <span class="ph">See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for details.</span>
+ </p>
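+ <p class="p">
+ For example (the source table <code class="ph codeph">raw_events</code> and its columns are
+ illustrative; the partition key columns must come last in the select list):
+ </p>
+ <pre class="pre codeblock"><code>create table events_by_year partitioned by (year)
+   as select event_id, details, year from raw_events;</code></pre>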
+ </li>
+ <li class="li">
+ <p class="p">
+ User-defined functions (UDFs and UDAFs) written in C++ now persist automatically
+ when the <span class="keyword cmdname">catalogd</span> daemon is restarted. You no longer
+ have to run the <code class="ph codeph">CREATE FUNCTION</code> statements again after a restart.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ User-defined functions (UDFs) written in Java can now persist
+ when the <span class="keyword cmdname">catalogd</span> daemon is restarted, and can be shared
+ transparently between Impala and Hive. You must do a one-time operation to recreate these
+ UDFs using new <code class="ph codeph">CREATE FUNCTION</code> syntax, without a signature for arguments
+ or the return value. Afterwards, you no longer have to run the <code class="ph codeph">CREATE FUNCTION</code>
+ statements again after a restart.
+ Although Impala does not have visibility into the UDFs that implement the
+ Hive built-in functions, user-created Hive UDFs are now automatically available
+ for calling through Impala.
+ <span class="ph">See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+
+ <p class="p">
+ Reliability enhancements for memory management. Some aggregation and join queries
+ that formerly might have failed with an out-of-memory error due to memory contention
+ can now succeed by using the spill-to-disk mechanism.
+ </p>
+ </li>
+ <li class="li">
+
+ <p class="p">
+ The <code class="ph codeph">SHOW DATABASES</code> statement now returns two columns rather than one.
+ The second column includes the associated comment string, if any, for each database.
+ Adjust any application code that examines the list of databases and assumes the
+ result set contains only a single column.
+ <span class="ph">See <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a> for details.</span>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ A new optimization speeds up aggregation operations that involve only the partition key
+ columns of partitioned tables. For example, a query such as <code class="ph codeph">SELECT COUNT(DISTINCT k), MIN(k), MAX(k) FROM t1</code>
+ can avoid reading any data files if <code class="ph codeph">T1</code> is a partitioned table and <code class="ph codeph">K</code>
+ is one of the partition key columns. Because this technique can produce different results in cases
+ where HDFS files in a partition are manually deleted or are empty, you must enable the optimization
+ by setting the query option <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>.
+ <span class="ph">See <a class="xref" href="impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.</span>
+ </p>
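+ <p class="p">
+ For example, assuming <code class="ph codeph">t1</code> is partitioned by the column
+ <code class="ph codeph">k</code>:
+ </p>
+ <pre class="pre codeblock"><code>set optimize_partition_key_scans=1;
+ -- Can be answered from partition metadata, without reading any data files.
+ select count(distinct k), min(k), max(k) from t1;</code></pre>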
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DESCRIBE</code> statement can now display metadata about a database, using the
+ syntax <code class="ph codeph">DESCRIBE DATABASE <var class="keyword varname">db_name</var></code>.
+ <span class="ph">See <a class="xref" href="impala_describe.html#describe">DESCRIBE Statement</a> for details.</span>
+ </p>
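+ <p class="p">
+ For example:
+ </p>
+ <pre class="pre codeblock"><code>describe database default;</code></pre>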
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">uuid()</code> built-in function generates an
+ alphanumeric value that you can use as a guaranteed unique identifier.
+ The uniqueness applies even across tables, for cases where an ascending
+ numeric sequence is not suitable.
+ <span class="ph">See <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details.</span>
+ </p>
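+ <p class="p">
+ For example:
+ </p>
+ <pre class="pre codeblock"><code>-- Returns a different 36-character identifier on each call.
+ select uuid();</code></pre>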
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="new_features__new_features_240">
+
+ <h2 class="title topictitle2" id="ariaid-title11">New Features in <span class="keyword">Impala 2.4</span></h2>
+
+ <div class="body conbody">
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Impala can be used on the DSSD D5 Storage Appliance.
+ From a user perspective, the Impala features are the same as in <span class="keyword">Impala 2.3</span>.
+ </p>
+ </li>
+ </ul>
+
+ </div>
+ </article>
+
+
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="new_features__new_features_230">
+
+ <h2 class="title topictitle2" id="ariaid-title12">New Features in <span class="keyword">Impala 2.3</span></h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following are the major new features in Impala 2.3.x. This major release
+ contains improvements to SQL syntax (particularly new support for complex types), performance,
+ manageability, and security.
+ </p>
+
+ <ul class="ul">
+
+ <li class="li">
+ <p class="p">
+ Complex data types: <code class="ph codeph">STRUCT</code>, <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>. These
+ types can encode multiple named fields, positional items, or key-value pairs within a single column.
+ You can combine these types to produce nested types with arbitrarily deep nesting,
+ such as an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> values,
+ a <code class="ph codeph">MAP</code> where each key-value pair is an <code class="ph codeph">ARRAY</code> of other <code class="ph codeph">MAP</code> values,
+ and so on. Currently, complex data types are only supported for the Parquet file format.
+ <span class="ph">See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for usage details and <a class="xref" href="impala_array.html#array">ARRAY Complex Type (Impala 2.3 or higher only)</a>, <a class="xref" href="impala_struct.html#struct">STRUCT Complex Type (Impala 2.3 or higher only)</a>, and <a class="xref" href="impala_map.html#map">MAP Complex Type (Impala 2.3 or higher only)</a> for syntax.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Column-level authorization lets you define access to particular columns within a table,
+ rather than the entire table. This feature lets you reduce the reliance on creating views to
+ set up authorization schemes for subsets of information.
+ See <span class="xref">the documentation for Apache Sentry</span> for background details, and
+ <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a> and <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a> for Impala-specific syntax.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">TRUNCATE TABLE</code> statement removes all the data from a table without removing the table itself.
+ <span class="ph">See <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> for details.</span>
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-2015">
+ <p class="p">
+ Nested loop join queries. Some join queries that formerly required equality comparisons can now use
+ operators such as <code class="ph codeph"><</code> or <code class="ph codeph">>=</code>. This same join mechanism is used
+ internally to optimize queries that retrieve values from complex type columns.
+ <span class="ph">See <a class="xref" href="impala_joins.html#joins">Joins in Impala SELECT Statements</a> for details about Impala join queries.</span>
+ </p>
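+ <p class="p">
+ For example, a non-equijoin such as the following (the tables and columns are
+ hypothetical) is now executed as a nested loop join:
+ </p>
+ <pre class="pre codeblock"><code>select t1.id, t2.id from t1 join t2 on t1.id &lt; t2.id;</code></pre>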
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Reduced memory usage and improved performance and robustness for spill-to-disk feature.
+ <span class="ph">See <a class="xref" href="impala_scalability.html#spill_to_disk">SQL Operations that Spill to Disk</a> for details about this feature.</span>
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Performance improvements for querying Parquet data files containing multiple row groups
+ and multiple data blocks:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p"> For files written by Hive, SparkSQL, and other Parquet MR writers
+ and spanning multiple HDFS blocks, Impala now scans the extra
+ data blocks locally when possible, rather than using remote
+ reads. </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Impala queries benefit from the improved alignment of row groups with HDFS blocks for Parquet
+ files written by Hive, MapReduce, and other components. (Impala itself never writes
+ multiblock Parquet files, so the alignment change does not apply to Parquet files produced by Impala.)
+ These Parquet writers now add padding to Parquet files that they write to align row groups with HDFS blocks.
+ The <code class="ph codeph">parquet.writer.max-padding</code> setting specifies the maximum number of bytes, by default
+ 8 megabytes, that can be added to the file between row groups to fill the gap at the end of one block
+ so that the next row group starts at the beginning of the next block.
+ If the gap is larger than this size, the writer attempts to fit another entire row group in the remaining space.
+ Include this setting in the <span class="ph filepath">hive-site</span> configuration file to influence Parquet files written by Hive,
+ or the <span class="ph filepath">hdfs-site</span> configuration file to influence Parquet files written by all non-Impala components.
+ </p>
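+ <p class="p">
+ For example, a <span class="ph filepath">hive-site.xml</span> entry along these lines
+ (the value shown is the 8 megabyte default) controls the padding:
+ </p>
+ <pre class="pre codeblock"><code>&lt;property&gt;
+   &lt;name&gt;parquet.writer.max-padding&lt;/name&gt;
+   &lt;value&gt;8388608&lt;/value&gt;
+ &lt;/property&gt;</code></pre>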
+ </li>
+ </ul>
+ <p class="p">
+ See <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for instructions about using Parquet data files
+ with Impala.
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-1660">
+ <p class="p">
+ Many new built-in scalar functions, for convenience and enhanced portability of SQL that uses common industry extensions.
+ </p>
+
+ <p class="p">
+ Math functions<span class="ph"> (see <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">ATAN2</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COSH</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DCEIL</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DEXP</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DFLOOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DLOG10</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DPOW</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DROUND</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DSQRT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">DTRUNC</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">FACTORIAL</code>, and corresponding <code class="ph codeph">!</code> operator
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">FPOW</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RADIANS</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANDOM</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SINH</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">TANH</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ String functions<span class="ph"> (see <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">BTRIM</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">CHR</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">REGEXP_LIKE</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">SPLIT_PART</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ Date and time functions<span class="ph"> (see <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">INT_MONTHS_BETWEEN</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">MONTHS_BETWEEN</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">TIMEOFDAY</code>
+ </li>
+ <li class="li">
+ <code class="ph codeph">TIMESTAMP_CMP</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ Bit manipulation functions<span class="ph"> (see <a class="xref" href="impala_bit_functions.html#bit_functions">Impala Bit Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">BITAND</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITNOT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">BITXOR</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">COUNTSET</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">GETBIT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROTATELEFT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">ROTATERIGHT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SETBIT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHIFTLEFT</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">SHIFTRIGHT</code>
+ </li>
+ </ul>
+ <p class="p">
+ Type conversion functions<span class="ph"> (see <a class="xref" href="impala_conversion_functions.html#conversion_functions">Impala Type Conversion Functions</a> for details)</span>:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">TYPEOF</code>
+ </li>
+ </ul>
+ <p class="p">
+ The <code class="ph codeph">effective_user()</code> function<span class="ph"> (see <a class="xref" href="impala_misc_functions.html#misc_functions">Impala Miscellaneous Functions</a> for details)</span>.
+ </p>
+ </li>
+
+ <li class="li" id="new_features_230__IMPALA-2081">
+ <p class="p">
+ New built-in analytic
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_components.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_components.html b/docs/build3x/html/topics/impala_components.html
new file mode 100644
index 0000000..eb6e0f6
--- /dev/null
+++ b/docs/build3x/html/topics/impala_components.html
@@ -0,0 +1,227 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_concepts.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="intro_components"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Components of the Impala Server</title></head><body id="intro_components"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Components of the Impala Server</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala server is a distributed, massively parallel processing (MPP) database engine. It consists of
+ different daemon processes that run on specific hosts within your <span class="keyword"></span> cluster.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_concepts.html">Impala Concepts and Architecture</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="intro_components__intro_impalad">
+
+ <h2 class="title topictitle2" id="ariaid-title2">The Impala Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The core Impala component is a daemon process that runs on each DataNode of the cluster, physically represented
+ by the <code class="ph codeph">impalad</code> process. It reads and writes to data files; accepts queries transmitted
+ from the <code class="ph codeph">impala-shell</code> command, Hue, JDBC, or ODBC; parallelizes the queries and
+ distributes work across the cluster; and transmits intermediate query results back to the
+ central coordinator node.
+ </p>
+
+ <p class="p">
+ You can submit a query to the Impala daemon running on any DataNode, and that instance of the daemon serves as the
+ <dfn class="term">coordinator node</dfn> for that query. The other nodes transmit partial results back to the
+ coordinator, which constructs the final result set for a query. When running experiments with functionality
+ through the <code class="ph codeph">impala-shell</code> command, you might always connect to the same Impala daemon for
+ convenience. For clusters running production workloads, you might load-balance by
+ submitting each query to a different Impala daemon in round-robin style, using the JDBC or ODBC interfaces.
+ </p>
+
+ <p class="p">
+ The Impala daemons are in constant communication with the <dfn class="term">statestore</dfn>, to confirm which nodes
+ are healthy and can accept new work.
+ </p>
+
+ <p class="p">
+ They also receive broadcast messages from the <span class="keyword cmdname">catalogd</span> daemon (introduced in Impala 1.2)
+ whenever any Impala node in the cluster creates, alters, or drops any type of object, or when an
+ <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statement is processed through Impala. This
+ background communication minimizes the need for <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE
+ METADATA</code> statements that were needed to coordinate metadata across nodes prior to Impala 1.2.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can control which hosts act as query coordinators
+ and which act as query executors, to improve scalability for highly concurrent workloads on large clusters.
+ See <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_timeouts.html#impalad_timeout">Setting the Idle Query and Idle Session Timeouts for impalad</a>,
+ <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>, <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="intro_components__intro_statestore">
+
+ <h2 class="title topictitle2" id="ariaid-title3">The Impala Statestore</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala component known as the <dfn class="term">statestore</dfn> checks on the health of Impala daemons on all the
+ DataNodes in a cluster, and continuously relays its findings to each of those daemons. It is physically
+ represented by a daemon process named <code class="ph codeph">statestored</code>; you only need such a process on one
+ host in the cluster. If an Impala daemon goes offline due to hardware failure, network error, software issue,
+ or other reason, the statestore informs all the other Impala daemons so that future queries can avoid making
+ requests to the unreachable node.
+ </p>
+
+ <p class="p">
+ Because the statestore's purpose is to help when things go wrong, it is not critical to the normal
+ operation of an Impala cluster. If the statestore is not running or becomes unreachable, the Impala daemons
+ continue running and distributing work among themselves as usual; the cluster just becomes less robust if
+ other Impala daemons fail while the statestore is offline. When the statestore comes back online, it re-establishes
+ communication with the Impala daemons and resumes its monitoring function.
+ </p>
+
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_scalability.html#statestore_scalability">Scalability Considerations for the Impala Statestore</a>,
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>, <a class="xref" href="impala_processes.html#processes">Starting Impala</a>,
+ <a class="xref" href="impala_timeouts.html#statestore_timeout">Increasing the Statestore Timeout</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="intro_components__intro_catalogd">
+
+ <h2 class="title topictitle2" id="ariaid-title4">The Impala Catalog Service</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The Impala component known as the <dfn class="term">catalog service</dfn> relays the metadata changes from Impala SQL
+ statements to all the Impala daemons in a cluster. It is physically represented by a daemon process named
+ <code class="ph codeph">catalogd</code>; you only need such a process on one host in the cluster. Because the requests
+ are passed through the statestore daemon, it makes sense to run the <span class="keyword cmdname">statestored</span> and
+ <span class="keyword cmdname">catalogd</span> services on the same host.
+ </p>
+
+ <p class="p">
+ The catalog service avoids the need to issue
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements when the metadata changes are
+ performed by statements issued through Impala. When you create a table, load data, and so on through Hive,
+ you do need to issue <code class="ph codeph">REFRESH</code> or <code class="ph codeph">INVALIDATE METADATA</code> on an Impala node
+ before executing a query there.
+ </p>
+
+ <p class="p">
+ This feature touches a number of aspects of Impala:
+ </p>
+
+
+
+ <ul class="ul" id="intro_catalogd__catalogd_xrefs">
+ <li class="li">
+ <p class="p">
+ See <a class="xref" href="impala_install.html#install">Installing Impala</a>, <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a> and
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, for usage information for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> statements are not needed
+ when the <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">INSERT</code>, or other table-changing or
+ data-changing operation is performed through Impala. These statements are still needed if such
+ operations are done through Hive or by manipulating data files directly in HDFS, but in those cases the
+ statements only need to be issued on one Impala node rather than on all nodes. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> and
+ <a class="xref" href="impala_invalidate_metadata.html#invalidate_metadata">INVALIDATE METADATA Statement</a> for the latest usage information for
+ those statements.
+ </p>
+ </li>
+ </ul>
+
+ <div class="p">
+ Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when
+ the metadata of a table is loaded.
+ <ul class="ul">
+ <li class="li">
+ If set to <code class="ph codeph">false</code>, the metadata of a table is
+ loaded when it is referenced for the first time. This means that the
+ first run of a particular query can be slower than subsequent runs.
+ Starting in Impala 2.2, the default for
+ <code class="ph codeph">load_catalog_in_background</code> is
+ <code class="ph codeph">false</code>.
+ </li>
+ <li class="li">
+ If set to <code class="ph codeph">true</code>, the catalog service attempts to
+ load metadata for a table even if no query needs that metadata, so the
+ metadata might already be loaded when the first query that
+ needs it is run. However, for the following reasons, we
+ recommend not setting the option to <code class="ph codeph">true</code>:
+ <ul class="ul">
+ <li class="li">
+ Background load can interfere with query-specific metadata
+ loading. This can happen on startup or after invalidating
+ metadata, lasts for a period that depends on the amount of metadata,
+ and can lead to seemingly random long-running queries that are
+ difficult to diagnose.
+ </li>
+ <li class="li">
+ Impala might load metadata for tables that are never
+ used, potentially increasing catalog size and consequently memory
+ usage for both the catalog service and the Impala daemons.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Most considerations for load balancing and high availability apply to the <span class="keyword cmdname">impalad</span> daemon.
+ The <span class="keyword cmdname">statestored</span> and <span class="keyword cmdname">catalogd</span> daemons do not have special
+ requirements for high availability, because problems with those daemons do not result in data loss.
+ If those daemons become unavailable due to an outage on a particular
+ host, you can stop the Impala service, delete the <span class="ph uicontrol">Impala StateStore</span> and
+ <span class="ph uicontrol">Impala Catalog Server</span> roles, add the roles on a different host, and restart the
+ Impala service.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In Impala 1.2.4 and higher, you can specify a table name with <code class="ph codeph">INVALIDATE METADATA</code> after
+ the table is created in Hive, allowing you to make individual tables visible to Impala without doing a full
+ reload of the catalog metadata. Impala 1.2.4 also includes other changes to make the metadata broadcast
+ mechanism faster and more responsive, especially during Impala startup. See
+ <a class="xref" href="../shared/../topics/impala_new_features.html#new_features_124">New Features in Impala 1.2.4</a> for details.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong> <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>,
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>, <a class="xref" href="impala_ports.html#ports">Ports Used by Impala</a>
+ </p>
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compression_codec.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compression_codec.html b/docs/build3x/html/topics/impala_compression_codec.html
new file mode 100644
index 0000000..5933efa
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compression_codec.html
@@ -0,0 +1,92 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compression_codec"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPRESSION_CODEC Query Option (Impala 2.0 or higher only)</title></head><body id="compression_codec"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COMPRESSION_CODEC Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+
+
+ <p class="p">
+
+ When Impala writes Parquet data files using the <code class="ph codeph">INSERT</code> statement, the underlying compression
+ is controlled by the <code class="ph codeph">COMPRESSION_CODEC</code> query option.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Prior to Impala 2.0, this option was named <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code>. In Impala 2.0 and
+ later, the <code class="ph codeph">PARQUET_COMPRESSION_CODEC</code> name is not recognized. Use the more general name
+ <code class="ph codeph">COMPRESSION_CODEC</code> for new code.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET COMPRESSION_CODEC=<var class="keyword varname">codec_name</var>;</code></pre>
+
+ <p class="p">
+ The allowed values for this query option are <code class="ph codeph">SNAPPY</code> (the default), <code class="ph codeph">GZIP</code>,
+ and <code class="ph codeph">NONE</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ A Parquet file created with <code class="ph codeph">COMPRESSION_CODEC=NONE</code> is still typically smaller than the
+ original data, due to encoding schemes such as run-length encoding and dictionary encoding that are applied
+ separately from compression.
+ </div>
+
+
+ <p class="p">
+ The option value is not case-sensitive.
+ </p>
+
+ <p class="p">
+ If the option is set to an unrecognized value, all kinds of queries will fail due to the invalid option
+ setting, not just queries involving Parquet tables. (The value <code class="ph codeph">BZIP2</code> is also recognized, but
+ is not compatible with Parquet tables.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">STRING</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">SNAPPY</code>
+ </p>
+
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>set compression_codec=gzip;
+insert into parquet_table_highly_compressed select * from t1;
+
+set compression_codec=snappy;
+insert into parquet_table_compression_plus_fast_queries select * from t1;
+
+set compression_codec=none;
+insert into parquet_table_no_compression select * from t1;
+
+set compression_codec=foo;
+select * from t1 limit 5;
+ERROR: Invalid compression codec: foo
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For information about how compressing Parquet data files affects query performance, see
+ <a class="xref" href="impala_parquet.html#parquet_compression">Snappy and GZip Compression for Parquet Data Files</a>.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats.html b/docs/build3x/html/topics/impala_compute_stats.html
new file mode 100644
index 0000000..407ba97
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats.html
@@ -0,0 +1,637 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE STATS Statement</title></head><body id="compute_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">COMPUTE STATS Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+      The <code class="ph codeph">COMPUTE STATS</code> statement gathers information about the volume and
+      distribution of data in a table and all associated columns and partitions. The
+      information is stored in the metastore database, and used by Impala to
+      help optimize queries. For example, if Impala can determine that a table
+      is large or small, or has many or few distinct values, it can organize and
+      parallelize the work appropriately for a join query or insert operation.
+ For details about the kinds of information gathered by this statement, see
+ <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code><span class="ph">COMPUTE STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [ ( <var class="keyword varname">column_list</var> ) ] [TABLESAMPLE SYSTEM(<var class="keyword varname">percentage</var>) [REPEATABLE(<var class="keyword varname">seed</var>)]]</span>
+
+<var class="keyword varname">column_list</var> ::= <var class="keyword varname">column_name</var> [ , <var class="keyword varname">column_name</var>, ... ]
+
+COMPUTE INCREMENTAL STATS [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+ clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+ INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+ specification, and specify constant values for all the partition key columns.
+ </p>
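+
+    <p class="p">
+      For example, for a hypothetical table partitioned by <code class="ph codeph">year</code> and
+      <code class="ph codeph">month</code>, the <code class="ph codeph">PARTITION</code> clause must specify constant
+      values for both partition key columns:
+    </p>
+
+<pre class="pre codeblock"><code>-- Hypothetical table name; both partition key columns are required.
+compute incremental stats sales partition (year=2018, month=5);
+drop incremental stats sales partition (year=2018, month=5);</code></pre>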
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Originally, Impala relied on users to run the Hive <code class="ph codeph">ANALYZE
+ TABLE</code> statement, but that method of gathering statistics proved
+ unreliable and difficult to use. The Impala <code class="ph codeph">COMPUTE STATS</code>
+ statement was built to improve the reliability and user-friendliness of
+ this operation. <code class="ph codeph">COMPUTE STATS</code> does not require any setup
+ steps or special configuration. You only run a single Impala
+ <code class="ph codeph">COMPUTE STATS</code> statement to gather both table and column
+ statistics, rather than separate Hive <code class="ph codeph">ANALYZE TABLE</code>
+ statements for each kind of statistics.
+ </p>
+
+    <p class="p">
+      For a non-incremental <code class="ph codeph">COMPUTE STATS</code>
+      statement, the columns for which statistics are computed can be specified
+      with an optional comma-separated list of columns.
+    </p>
+
+ <p class="p">
+ If no column list is given, the <code class="ph codeph">COMPUTE STATS</code> statement
+ computes column-level statistics for all columns of the table. This adds
+ potentially unneeded work for columns whose stats are not needed by
+ queries. It can be especially costly for very wide tables and unneeded
+ large string fields.
+ </p>
+    <p class="p">
+      <code class="ph codeph">COMPUTE STATS</code> returns an error when a specified column
+      cannot be analyzed, such as when the column does not exist, the column is
+      of a type unsupported by <code class="ph codeph">COMPUTE STATS</code> (for example, a column
+      of a complex type), or the column is a partitioning column.
+    </p>
+ <p class="p">
+ If an empty column list is given, no column is analyzed by <code class="ph codeph">COMPUTE
+ STATS</code>.
+ </p>
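+
+    <p class="p">
+      For example, the following statements show the column list syntax against a hypothetical
+      table <code class="ph codeph">t1</code> with columns <code class="ph codeph">c1</code> and
+      <code class="ph codeph">c2</code>:
+    </p>
+
+<pre class="pre codeblock"><code>-- Compute table statistics, plus column statistics only for c1 and c2.
+compute stats t1 (c1, c2);
+
+-- An empty column list computes table statistics but no column statistics.
+compute stats t1 ();</code></pre>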
+
+ <p class="p">
+ In <span class="keyword">Impala 2.12</span> and
+ higher, an optional <code class="ph codeph">TABLESAMPLE</code> clause immediately after
+ a table reference specifies that the <code class="ph codeph">COMPUTE STATS</code>
+ operation only processes a specified percentage of the table data. For
+ tables that are so large that a full <code class="ph codeph">COMPUTE STATS</code>
+ operation is impractical, you can use <code class="ph codeph">COMPUTE STATS</code> with
+ a <code class="ph codeph">TABLESAMPLE</code> clause to extrapolate statistics from a
+      sample of the table data. See <a href="impala_perf_stats.html"><span class="keyword">Table and Column Statistics</span></a> for details about the
+ experimental stats extrapolation and sampling features.
+ </p>
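+
+    <p class="p">
+      For example, the following statement (against a hypothetical large table) computes
+      statistics from a sample of roughly 10 percent of the data; the optional
+      <code class="ph codeph">REPEATABLE</code> clause supplies a seed so that repeated runs
+      sample the same data:
+    </p>
+
+<pre class="pre codeblock"><code>compute stats huge_table tablesample system(10) repeatable(55);</code></pre>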
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation is a shortcut for partitioned tables that works on a
+ subset of partitions rather than the entire table. The incremental nature makes it suitable for large tables
+ with many partitions, where a full <code class="ph codeph">COMPUTE STATS</code> operation takes too long to be practical
+      each time a partition is added or dropped. See <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">Table and Column Statistics</a>
+ for full usage details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed again from scratch regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ </div>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> only applies to partitioned tables. If you use the
+ <code class="ph codeph">INCREMENTAL</code> clause for an unpartitioned table, Impala automatically uses the original
+ <code class="ph codeph">COMPUTE STATS</code> statement. Such tables display <code class="ph codeph">false</code> under the
+ <code class="ph codeph">Incremental stats</code> column of the <code class="ph codeph">SHOW TABLE STATS</code> output.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <div class="p">
+ Because many of the most performance-critical and resource-intensive
+ operations rely on table and column statistics to construct accurate and
+ efficient plans, <code class="ph codeph">COMPUTE STATS</code> is an important step at
+ the end of your ETL process. Run <code class="ph codeph">COMPUTE STATS</code> on all
+ tables as your first step during performance tuning for slow queries, or
+ troubleshooting for out-of-memory conditions:
+ <ul class="ul">
+ <li class="li">
+ Accurate statistics help Impala construct an efficient query plan
+ for join queries, improving performance and reducing memory usage.
+ </li>
+ <li class="li">
+ Accurate statistics help Impala distribute the work effectively
+ for insert operations into Parquet tables, improving performance and
+ reducing memory usage.
+ </li>
+ <li class="li">
+ Accurate statistics help Impala estimate the memory
+ required for each query, which is important when you use resource
+ management features, such as admission control and the YARN resource
+ management framework. The statistics help Impala to achieve high
+ concurrency, full utilization of available memory, and avoid
+ contention with workloads from other Hadoop components.
+ </li>
+ <li class="li">
+ In <span class="keyword">Impala 2.8</span> and
+ higher, when you run the <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement against a
+ Parquet table, Impala automatically applies the query option setting
+ <code class="ph codeph">MT_DOP=4</code> to increase the amount of intra-node
+ parallelism during this CPU-intensive operation. See <a class="xref" href="impala_mt_dop.html">MT_DOP Query Option</a> for details about what this query option does
+ and how to use it with CPU-intensive <code class="ph codeph">SELECT</code>
+ statements.
+ </li>
+ </ul>
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Computing stats for groups of partitions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, you can run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ on multiple partitions, instead of the entire table or one partition at a time. You include
+ comparison operators other than <code class="ph codeph">=</code> in the <code class="ph codeph">PARTITION</code> clause,
+ and the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement applies to all partitions that
+ match the comparison expression.
+ </p>
+
+ <p class="p">
+ For example, the <code class="ph codeph">INT_PARTITIONS</code> table contains 4 partitions.
+ The following <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statements affect some but not all
+ partitions, as indicated by the <code class="ph codeph">Updated <var class="keyword varname">n</var> partition(s)</code>
+ messages. The partitions that are affected depend on values in the partition key column <code class="ph codeph">X</code>
+ that match the comparison expression in the <code class="ph codeph">PARTITION</code> clause.
+ </p>
+
+<pre class="pre codeblock"><code>
+show partitions int_partitions;
++-------+-------+--------+------+--------------+-------------------+---------+...
+| x | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+| 99 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | PARQUET |...
+| 120 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| 150 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| 200 | -1 | 0 | 0B | NOT CACHED | NOT CACHED | TEXT |...
+| Total | -1 | 0 | 0B | 0B | | |...
++-------+-------+--------+------+--------------+-------------------+---------+...
+
+compute incremental stats int_partitions partition (x < 100);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200));
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x between 100 and 175);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 2 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x in (100, 150, 200) or x < 100);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+compute incremental stats int_partitions partition (x != 150);
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 3 partition(s) and 1 column(s). |
++-----------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ Currently, the statistics created by the <code class="ph codeph">COMPUTE STATS</code> statement do not include
+ information about complex type columns. The column stats metrics for complex columns are always shown
+ as -1. For queries involving complex type columns, Impala uses
+ heuristics to estimate the data distribution within such columns.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> works for HBase tables also. The statistics gathered for HBase tables are
+ somewhat different than for HDFS-backed tables, but that metadata is still used for optimization when HBase
+ tables are involved in join queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">COMPUTE STATS</code> also works for tables where data resides in the Amazon Simple Storage Service (S3).
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Performance considerations:</strong>
+ </p>
+
+    <p class="p">
+      The statistics collected by <code class="ph codeph">COMPUTE STATS</code> are used to optimize join queries,
+      <code class="ph codeph">INSERT</code> operations into Parquet tables, and other resource-intensive kinds of SQL statements.
+      See <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for details.
+    </p>
+
+ <p class="p">
+ For large tables, the <code class="ph codeph">COMPUTE STATS</code> statement itself might take a long time and you
+ might need to tune its performance. The <code class="ph codeph">COMPUTE STATS</code> statement does not work with the
+ <code class="ph codeph">EXPLAIN</code> statement, or the <code class="ph codeph">SUMMARY</code> command in <span class="keyword cmdname">impala-shell</span>.
+ You can use the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span> to examine timing information
+ for the statement as a whole. If a basic <code class="ph codeph">COMPUTE STATS</code> statement takes a long time for a
+ partitioned table, consider switching to the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax so that only
+ newly added partitions are analyzed each time.
+ </p>
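+
+    <p class="p">
+      For example, in <span class="keyword cmdname">impala-shell</span> you can examine the timing of the
+      most recently executed statement (hypothetical table name shown):
+    </p>
+
+<pre class="pre codeblock"><code>compute stats big_table;
+profile;</code></pre>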
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+      This example shows two tables, <code class="ph codeph">T1</code> and <code class="ph codeph">T2</code>, with a small number of distinct
+      values linked by a parent-child relationship between <code class="ph codeph">T1.ID</code> and <code class="ph codeph">T2.PARENT</code>.
+      <code class="ph codeph">T1</code> is tiny, while <code class="ph codeph">T2</code> has approximately 100K rows. Initially, the statistics
+      include physical measurements such as the number of files, the total size, and size measurements for
+      fixed-length columns such as those of the <code class="ph codeph">INT</code> type. Unknown values are represented by -1. After
+ running <code class="ph codeph">COMPUTE STATS</code> for each table, much more information is available through the
+ <code class="ph codeph">SHOW STATS</code> statements. If you were running a join query involving both of these tables, you
+ would need statistics for both tables to get the most effective optimization for the query.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>[localhost:21000] > show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| -1 | 1 | 33B | TEXT |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.02s
+[localhost:21000] > show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+----------+--------+
+| -1 | 28 | 960.00KB | TEXT |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] > show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 1.71s
+[localhost:21000] > show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT | -1 | -1 | 4 | 4 |
+| s | STRING | -1 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s
+[localhost:21000] > compute stats t1;
+Query: compute stats t1
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.30s
+[localhost:21000] > show table stats t1;
+Query: show table stats t1
++-------+--------+------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+------+--------+
+| 3 | 1 | 33B | TEXT |
++-------+--------+------+--------+
+Returned 1 row(s) in 0.01s
+[localhost:21000] > show column stats t1;
+Query: show column stats t1
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| id | INT | 3 | -1 | 4 | 4 |
+| s | STRING | 3 | -1 | -1 | -1 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.02s
+[localhost:21000] > compute stats t2;
+Query: compute stats t2
++-----------------------------------------+
+| summary |
++-----------------------------------------+
+| Updated 1 partition(s) and 2 column(s). |
++-----------------------------------------+
+Returned 1 row(s) in 5.70s
+[localhost:21000] > show table stats t2;
+Query: show table stats t2
++-------+--------+----------+--------+
+| #Rows | #Files | Size | Format |
++-------+--------+----------+--------+
+| 98304 | 1 | 960.00KB | TEXT |
++-------+--------+----------+--------+
+Returned 1 row(s) in 0.03s
+[localhost:21000] > show column stats t2;
+Query: show column stats t2
++--------+--------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| parent | INT | 3 | -1 | 4 | 4 |
+| s | STRING | 6 | -1 | 14 | 9.3 |
++--------+--------+------------------+--------+----------+----------+
+Returned 2 row(s) in 0.01s</code></pre>
+
+ <p class="p">
+ The following example shows how to use the <code class="ph codeph">INCREMENTAL</code> clause, available in Impala 2.1.0 and
+ higher. The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> syntax lets you collect statistics for newly added or
+ changed partitions, without rescanning the entire table.
+ </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Camping | -1 | 1 | 408.02KB | NOT CACHED | PARQUET | false
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 11 | 2.65MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Camping | 5328 | 1 | 408.02KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 11 | 2.65MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">File format considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with tables created with any of the file formats supported
+ by Impala. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for details about working with the
+ different file formats. The following considerations apply to <code class="ph codeph">COMPUTE STATS</code> depending on the
+ file format of the table.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with text tables with no restrictions. These tables can be
+ created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with Parquet tables. These tables can be created through
+ either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with Avro tables without restriction in <span class="keyword">Impala 2.2</span>
+ and higher. In earlier releases, <code class="ph codeph">COMPUTE STATS</code> worked only for Avro tables created through Hive,
+ and required the <code class="ph codeph">CREATE TABLE</code> statement to use SQL-style column names and types rather than an
+ Avro-style schema specification.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with RCFile tables with no restrictions. These tables can
+ be created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with SequenceFile tables with no restrictions. These
+ tables can be created through either Impala or Hive.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement works with partitioned tables, whether all the partitions use
+ the same file format, or some partitions are defined through <code class="ph codeph">ALTER TABLE</code> to use different
+ file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Certain multi-stage statements (<code class="ph codeph">CREATE TABLE AS SELECT</code> and
+ <code class="ph codeph">COMPUTE STATS</code>) can be cancelled during some stages, when running <code class="ph codeph">INSERT</code>
+ or <code class="ph codeph">SELECT</code> operations internally. To cancel this statement, use Ctrl-C from the
+ <span class="keyword cmdname">impala-shell</span> interpreter, the <span class="ph uicontrol">Cancel</span> button from the
+ <span class="ph uicontrol">Watch</span> page in Hue, or <span class="ph uicontrol">Cancel</span> from the list of
+ in-flight queries (for a particular node) on the <span class="ph uicontrol">Queries</span> tab in the Impala web UI
+ (port 25000).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> Prior to Impala 1.4.0,
+ <code class="ph codeph">COMPUTE STATS</code> counted the number of
+ <code class="ph codeph">NULL</code> values in each column and recorded that figure
+ in the metastore database. Because Impala does not currently use the
+ <code class="ph codeph">NULL</code> count during query planning, Impala 1.4.0 and
+ higher speeds up the <code class="ph codeph">COMPUTE STATS</code> statement by
+ skipping this <code class="ph codeph">NULL</code> counting. </div>
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+ <p class="p">
+ Behind the scenes, the <code class="ph codeph">COMPUTE STATS</code> statement
+ executes two statements: one to count the rows of each partition
+ in the table (or the entire table if unpartitioned) through the
+ <code class="ph codeph">COUNT(*)</code> function,
+ and another to count the approximate number of distinct values
+ in each column through the <code class="ph codeph">NDV()</code> function.
+ You might see these queries in your monitoring and diagnostic displays.
+ The same factors that affect the performance, scalability, and
+ execution of other queries (such as parallel execution, memory usage,
+ admission control, and timeouts) also apply to the queries run by the
+ <code class="ph codeph">COMPUTE STATS</code> statement.
+ </p>
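+ <p class="p">
+ For example, for a table <code class="ph codeph">t1</code> with columns
+ <code class="ph codeph">c1</code> and <code class="ph codeph">c2</code>, the internal queries
+ are conceptually similar to the following. (This is a simplified sketch,
+ not the exact SQL that Impala generates.)
+ </p>
+<pre class="pre codeblock"><code>-- One query counts the rows, per partition for partitioned tables:
+select count(*) from t1;
+-- Another estimates the number of distinct values in each column:
+select ndv(c1), ndv(c2) from t1;
+</code></pre>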
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have read
+ permission for all affected files in the source directory:
+ all the files in the table (whether partitioned or not) in the
+ case of <code class="ph codeph">COMPUTE STATS</code>;
+ or all the files in partitions without incremental stats in
+ the case of <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ It must also have read and execute permissions for all
+ relevant directories holding the data files.
+ (Essentially, <code class="ph codeph">COMPUTE STATS</code> requires the
+ same permissions as the underlying <code class="ph codeph">SELECT</code> queries it runs
+ against the table.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE STATS</code> statement applies to Kudu tables.
+ Impala does not compute the number of rows for each partition for
+ Kudu tables. Therefore, you do not need to re-run the operation when
+ you see -1 in the <code class="ph codeph"># Rows</code> column of the output from
+ <code class="ph codeph">SHOW TABLE STATS</code>. That column always shows -1 for
+ all Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_drop_stats.html#drop_stats">DROP STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
new file mode 100644
index 0000000..03d21e2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_compute_stats_min_sample_size.html
@@ -0,0 +1,23 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="compute_stats_sample_min_sample_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</title></head><body id="compute_stats_sample_min_sample_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+ <h1 class="title topictitle1" id="ariaid-title1">COMPUTE_STATS_MIN_SAMPLE_SIZE Query Option</h1>
+
+
+ <div class="body conbody">
+ <p class="p">The <code class="ph codeph">COMPUTE_STATS_MIN_SAMPLE_SIZE</code> query option specifies
+ the minimum number of bytes that will be scanned in <code class="ph codeph">COMPUTE STATS
+ TABLESAMPLE</code>, regardless of the user-supplied sampling percent.
+ This query option effectively disables sampling for very small tables,
+ where accurate stats can be obtained cheaply by scanning the whole table,
+ because a minimum amount of data must be scanned to produce meaningful
+ stats.</p>
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+ <p class="p"><strong class="ph b">Default:</strong> 1GB</p>
+ <p class="p"><strong class="ph b">Added in</strong>: <span class="keyword">Impala 2.12</span></p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
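+ <p class="p">
+ For example, to let a 10 percent sample take effect on tables smaller than
+ the 1GB default, you could lower the minimum before running
+ <code class="ph codeph">COMPUTE STATS TABLESAMPLE</code>. (The table name and the
+ 128MB value below are for illustration only.)
+ </p>
+<pre class="pre codeblock"><code>-- Lower the minimum sample size to 128MB, specified in bytes:
+set compute_stats_min_sample_size=134217728;
+compute stats t1 tablesample system(10);
+</code></pre>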
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_concepts.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_concepts.html b/docs/build3x/html/topics/impala_concepts.html
new file mode 100644
index 0000000..b98e4ce
--- /dev/null
+++ b/docs/build3x/html/topics/impala_concepts.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_components.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_development.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_hadoop.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="concepts"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Concepts and Architecture</title></head><body id="concepts"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Concepts and Architecture</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections provide background information to help you become productive using Impala and
+ its features. Where appropriate, the explanations include context to help understand how aspects of Impala
+ relate to other technologies you might already be familiar with, such as relational database management
+ systems and data warehouses, or other Hadoop components such as Hive, HDFS, and HBase.
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_components.html">Components of the Impala Server</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_development.html">Developing Impala Applications</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_hadoop.html">How Impala Fits Into the Hadoop Ecosystem</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_conditional_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_conditional_functions.html b/docs/build3x/html/topics/impala_conditional_functions.html
new file mode 100644
index 0000000..476fb82
--- /dev/null
+++ b/docs/build3x/html/topics/impala_conditional_functions.html
@@ -0,0 +1,611 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conditional_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Conditional Functions</title></head><body id="conditional_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Conditional Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports the following conditional functions for testing equality, comparison operators, and nullity:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="conditional_functions__case">
+ <code class="ph codeph">CASE a WHEN b THEN c [WHEN d THEN e]... [ELSE f] END</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+ when a match is found.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In this form of the <code class="ph codeph">CASE</code> expression, the initial value <code class="ph codeph">A</code>
+ being evaluated for each row is typically a column reference, or an expression involving
+ a column. This form can only compare against a set of specified values, not ranges,
+ multi-value comparisons such as <code class="ph codeph">BETWEEN</code> or <code class="ph codeph">IN</code>,
+ regular expressions, or <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+ on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+ <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+ </p>
+<pre class="pre codeblock"><code>select case x
+ when 1 then 'one'
+ when 2 then 'two'
+ when 0 then 'zero'
+ else 'out of range'
+ end
+ from t1;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__case2">
+ <code class="ph codeph">CASE WHEN a THEN b [WHEN c THEN d]... [ELSE e] END</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests whether any of a sequence of expressions is true, and returns a corresponding
+ result for the first true expression.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ <code class="ph codeph">CASE</code> expressions without an initial test value have more flexibility.
+ For example, they can test different columns in different <code class="ph codeph">WHEN</code> clauses,
+ or use comparison operators such as <code class="ph codeph">BETWEEN</code>, <code class="ph codeph">IN</code> and <code class="ph codeph">IS NULL</code>
+ rather than comparing against discrete values.
+ </p>
+ <p class="p">
+ <code class="ph codeph">CASE</code> expressions are often the foundation of long queries that
+ summarize and format results for easy-to-read reports. For example, you might
+ use a <code class="ph codeph">CASE</code> function call to turn values from a numeric column
+ into category strings corresponding to integer values, or labels such as <span class="q">"Small"</span>,
+ <span class="q">"Medium"</span> and <span class="q">"Large"</span> based on ranges. Then subsequent parts of the
+ query might aggregate based on the transformed values, such as how many
+ values are classified as small, medium, or large. You can also use <code class="ph codeph">CASE</code>
+ to signal problems with out-of-bounds values, <code class="ph codeph">NULL</code> values,
+ and so on.
+ </p>
+ <p class="p">
+ By using operators such as <code class="ph codeph">OR</code>, <code class="ph codeph">IN</code>,
+ <code class="ph codeph">REGEXP</code>, and so on in <code class="ph codeph">CASE</code> expressions,
+ you can build extensive tests and transformations into a single query.
+ Therefore, applications that construct SQL statements often rely heavily on <code class="ph codeph">CASE</code>
+ calls in the generated SQL code.
+ </p>
+ <p class="p">
+ Because this flexible form of the <code class="ph codeph">CASE</code> expression allows you to perform
+ many comparisons and call multiple functions when evaluating each row, be careful applying
+ elaborate <code class="ph codeph">CASE</code> expressions to queries that process large amounts of data.
+ For example, when practical, evaluate and transform values through <code class="ph codeph">CASE</code>
+ after applying operations such as aggregations that reduce the size of the result set;
+ transform numbers to strings after performing joins with the original numeric values.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ Although this example is split across multiple lines, you can put any or all parts of a <code class="ph codeph">CASE</code> expression
+ on a single line, with no punctuation or other separators between the <code class="ph codeph">WHEN</code>,
+ <code class="ph codeph">ELSE</code>, and <code class="ph codeph">END</code> clauses.
+ </p>
+<pre class="pre codeblock"><code>select case
+ when dayname(now()) in ('Saturday','Sunday') then 'result undefined on weekends'
+ when x > y then 'x greater than y'
+ when x = y then 'x and y are equal'
+ when x is null or y is null then 'one of the columns is null'
+ else null
+ end
+ from t1;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__coalesce">
+ <code class="ph codeph">coalesce(type v1, type v2, ...)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the first specified argument that is not <code class="ph codeph">NULL</code>, or
+ <code class="ph codeph">NULL</code> if all arguments are <code class="ph codeph">NULL</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
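+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example returns the first non-<code class="ph codeph">NULL</code>
+ value among two columns and a literal fallback. (The table and column
+ names are for illustration only.)
+ </p>
+<pre class="pre codeblock"><code>-- Returns mobile_phone if it is not NULL, else home_phone,
+-- else the literal 'none on file' if both are NULL.
+select coalesce(mobile_phone, home_phone, 'none on file') from contacts;
+</code></pre>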
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__decode">
+ <code class="ph codeph">decode(type expression, type search1, type result1 [, type search2, type result2 ...] [, type
+ default] )</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Compares an expression to one or more possible values, and returns a corresponding result
+ when a match is found.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Can be used as shorthand for a <code class="ph codeph">CASE</code> expression.
+ </p>
+ <p class="p">
+ The original expression and the search expressions must be of the same type or convertible types. The
+ result expression can be a different type, but all result expressions must be of the same type.
+ </p>
+ <p class="p">
+ Returns a successful match if the original expression is <code class="ph codeph">NULL</code> and a search expression
+ is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ Returns <code class="ph codeph">NULL</code> if the final <code class="ph codeph">default</code> value is omitted and none of the
+ search expressions match the original expression.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example translates numeric day values into descriptive names:
+ </p>
+<pre class="pre codeblock"><code>SELECT event, decode(day_of_week, 1, "Monday", 2, "Tuesday", 3, "Wednesday",
+ 4, "Thursday", 5, "Friday", 6, "Saturday", 7, "Sunday", "Unknown day")
+ FROM calendar;
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__if">
+ <code class="ph codeph">if(boolean condition, type ifTrue, type ifFalseOrNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests an expression and returns a corresponding result depending on whether the result is
+ true, false, or <code class="ph codeph">NULL</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the <code class="ph codeph">ifTrue</code> argument value
+ </p>
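+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example labels each row based on a single test. Because the
+ condition evaluates to <code class="ph codeph">NULL</code> when <code class="ph codeph">x</code>
+ is <code class="ph codeph">NULL</code>, those rows also produce the third argument:
+ </p>
+<pre class="pre codeblock"><code>-- Rows where x > 0 evaluates to false or NULL produce the second label.
+select if(x > 0, 'positive', 'zero, negative, or null') from t1;
+</code></pre>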
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__ifnull">
+ <code class="ph codeph">ifnull(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function, with the same behavior, provided
+ to simplify porting SQL with vendor extensions to Impala.
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isfalse">
+ <code class="ph codeph">isfalse(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">false</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+ Identical to <code class="ph codeph">isnottrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
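+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- isfalse() returns true only for false; a NULL argument yields false.
+select isfalse(false), isfalse(true), isfalse(null);
+-- Results: true, false, false
+</code></pre>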
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnotfalse">
+ <code class="ph codeph">isnotfalse(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">false</code> (that is, either <code class="ph codeph">true</code> or <code class="ph codeph">NULL</code>).
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+ Identical to <code class="ph codeph">istrue()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnottrue">
+ <code class="ph codeph">isnottrue(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is not <code class="ph codeph">true</code> (that is, either <code class="ph codeph">false</code> or <code class="ph codeph">NULL</code>).
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">true</code>.
+ Identical to <code class="ph codeph">isfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__isnull">
+ <code class="ph codeph">isnull(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression is <code class="ph codeph">NULL</code>, and returns the expression result value
+ if not. If the first argument is <code class="ph codeph">NULL</code>, returns the second argument.
+ <p class="p">
+ <strong class="ph b">Compatibility notes:</strong> Equivalent to the <code class="ph codeph">nvl()</code> function from Oracle Database or
+ <code class="ph codeph">ifnull()</code> from MySQL. The <code class="ph codeph">nvl()</code> and <code class="ph codeph">ifnull()</code>
+ functions are also available in Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
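+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>-- The second argument is returned only when the first is NULL.
+select isnull(null, 'fallback'); -- returns 'fallback'
+select isnull('data', 'fallback'); -- returns 'data'
+</code></pre>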
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__istrue">
+ <code class="ph codeph">istrue(<var class="keyword varname">boolean</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if a Boolean expression is <code class="ph codeph">true</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ If the argument is <code class="ph codeph">NULL</code>, returns <code class="ph codeph">false</code>.
+ Identical to <code class="ph codeph">isnotfalse()</code>, except it returns the opposite value for a <code class="ph codeph">NULL</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ In <span class="keyword">Impala 2.11</span> and higher, you can use
+ the operators <code class="ph codeph">IS [NOT] TRUE</code> and
+ <code class="ph codeph">IS [NOT] FALSE</code> as equivalents for the built-in
+ functions <code class="ph codeph">istrue()</code>, <code class="ph codeph">isnottrue()</code>,
+ <code class="ph codeph">isfalse()</code>, and <code class="ph codeph">isnotfalse()</code>.
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nonnullvalue">
+ <code class="ph codeph">nonnullvalue(<var class="keyword varname">expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+ Returns <code class="ph codeph">false</code> if so.
+ The converse of <code class="ph codeph">nullvalue()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullif">
+ <code class="ph codeph">nullif(<var class="keyword varname">expr1</var>,<var class="keyword varname">expr2</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the two specified arguments are equal. If the specified
+ arguments are not equal, returns the value of <var class="keyword varname">expr1</var>. The data types of the expressions
+ must be compatible, according to the conversion rules from <a class="xref" href="impala_datatypes.html#datatypes">Data Types</a>.
+ You cannot use an expression that evaluates to <code class="ph codeph">NULL</code> for <var class="keyword varname">expr1</var>; that
+ way, you can distinguish a return value of <code class="ph codeph">NULL</code> from an argument value of
+ <code class="ph codeph">NULL</code>, which would never match <var class="keyword varname">expr2</var>.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> This function is effectively shorthand for a <code class="ph codeph">CASE</code> expression of
+ the form:
+ </p>
+<pre class="pre codeblock"><code>CASE
+ WHEN <var class="keyword varname">expr1</var> = <var class="keyword varname">expr2</var> THEN NULL
+ ELSE <var class="keyword varname">expr1</var>
+END</code></pre>
+ <p class="p">
+ It is commonly used in division expressions, to produce a <code class="ph codeph">NULL</code> result instead of a
+ divide-by-zero error when the divisor is equal to zero:
+ </p>
+<pre class="pre codeblock"><code>select 1.0 / nullif(c1,0) as reciprocal from t1;</code></pre>
+ <p class="p">
+ You might also use it for compatibility with other database systems that support the same
+ <code class="ph codeph">NULLIF()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullifzero">
+ <code class="ph codeph">nullifzero(<var class="keyword varname">numeric_expr</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns <code class="ph codeph">NULL</code> if the numeric expression evaluates to 0, otherwise returns
+ the result of the expression.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Used to avoid error conditions such as divide-by-zero in numeric calculations.
+ Serves as shorthand for a more elaborate <code class="ph codeph">CASE</code> expression, to simplify porting SQL with
+ vendor extensions to Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
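+ For illustration, the following sketch (assuming a table <code class="ph codeph">t1</code> with a numeric column <code class="ph codeph">c1</code>, as in the <code class="ph codeph">nullif()</code> example above) shows <code class="ph codeph">nullifzero()</code> guarding a division, plus its behavior on literal arguments:

```sql
-- Returns NULL instead of a divide-by-zero error for rows where c1 = 0.
select 1.0 / nullifzero(c1) as reciprocal from t1;

-- A zero argument yields NULL; any other value passes through.
select nullifzero(0) as always_null, nullifzero(5) as unchanged;
```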
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nullvalue">
+ <code class="ph codeph">nullvalue(<var class="keyword varname">expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Tests if an expression (of any type) is <code class="ph codeph">NULL</code> or not.
+ Returns <code class="ph codeph">true</code> if so.
+ The converse of <code class="ph codeph">nonnullvalue()</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">BOOLEAN</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Primarily for compatibility with code containing industry extensions to SQL.
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.2.0</span>
+ </p>
+ </dd>
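+ A minimal sketch with literal arguments (chosen here purely for illustration) shows the boolean results:

```sql
-- nullvalue() is true only for a NULL argument.
select nullvalue(null) as a, nullvalue('text') as b;
-- a = true, b = false
```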
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nvl">
+ <code class="ph codeph">nvl(type a, type ifNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">isnull()</code> function. Tests if an expression is
+ <code class="ph codeph">NULL</code>, and returns the expression result value if not. If the first argument is
+ <code class="ph codeph">NULL</code>, returns the second argument. Equivalent to the <code class="ph codeph">nvl()</code> function
+ from Oracle Database or <code class="ph codeph">ifnull()</code> from MySQL.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.1
+ </p>
+ </dd>
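+ For example, with illustrative literal arguments:

```sql
-- The second argument is returned only when the first is NULL.
select nvl(null, 'fallback') as a, nvl('value', 'fallback') as b;
-- a = 'fallback', b = 'value'
```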
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__nvl2">
+ <code class="ph codeph">nvl2(type a, type ifNull, type ifNotNull)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Enhanced variant of the <code class="ph codeph">nvl()</code> function. Tests an expression
+ and returns different result values depending on whether it is <code class="ph codeph">NULL</code> or not.
+ If the first argument is <code class="ph codeph">NULL</code>, returns the second argument.
+ If the first argument is not <code class="ph codeph">NULL</code>, returns the third argument.
+ Equivalent to the <code class="ph codeph">nvl2()</code> function from Oracle Database.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> Same as the first argument value
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how a query can use special indicator values
+ to represent null and not-null expression values. The first example tests
+ an <code class="ph codeph">INT</code> column and so uses special integer values.
+ The second example tests a <code class="ph codeph">STRING</code> column and so uses
+ special string values.
+ </p>
+<pre class="pre codeblock"><code>
+select x, nvl2(x, 999, 0) from nvl2_demo;
++------+---------------------------+
+| x | if(x is not null, 999, 0) |
++------+---------------------------+
+| NULL | 0 |
+| 1 | 999 |
+| NULL | 0 |
+| 2 | 999 |
++------+---------------------------+
+
+select s, nvl2(s, 'is not null', 'is null') from nvl2_demo;
++------+---------------------------------------------+
+| s | if(s is not null, 'is not null', 'is null') |
++------+---------------------------------------------+
+| NULL | is null |
+| one | is not null |
+| NULL | is null |
+| two | is not null |
++------+---------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="conditional_functions__zeroifnull">
+ <code class="ph codeph">zeroifnull(<var class="keyword varname">numeric_expr</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns 0 if the numeric expression evaluates to <code class="ph codeph">NULL</code>, otherwise returns
+ the result of the expression.
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Used to avoid unexpected results due to unexpected propagation of
+ <code class="ph codeph">NULL</code> values in numeric calculations. Serves as shorthand for a more elaborate
+ <code class="ph codeph">CASE</code> expression, to simplify porting SQL with vendor extensions to Impala.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> same as the initial argument value, except that integer values are promoted to
+ <code class="ph codeph">BIGINT</code> and floating-point values are promoted to <code class="ph codeph">DOUBLE</code>; use
+ <code class="ph codeph">CAST()</code> when inserting into a smaller numeric column
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> Impala 1.3.0
+ </p>
+ </dd>
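+ As a sketch (the table <code class="ph codeph">t1</code> and column <code class="ph codeph">price</code> are hypothetical), <code class="ph codeph">zeroifnull()</code> lets rows with missing values count as 0 instead of being skipped:

```sql
-- Count rows with a missing price as 0 in an average, rather than
-- letting AVG() skip them.
select avg(zeroifnull(price)) as avg_price from t1;

-- Literal behavior: NULL becomes 0; other values pass through.
select zeroifnull(null) as zero, zeroifnull(42) as unchanged;
```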
+
+
+ </dl>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config.html b/docs/build3x/html/topics/impala_config.html
new file mode 100644
index 0000000..c2686d8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config.html
@@ -0,0 +1,48 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config_performance.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_odbc.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_jdbc.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Managing Impala</title></head><body id="config"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Managing Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This section explains how to configure Impala to accept connections from applications that use popular
+ programming APIs:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_config_performance.html#config_performance">Post-Installation Configuration for Impala</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a>
+ </li>
+ </ul>
+
+ <p class="p">
+ This type of configuration is especially useful when using Impala in combination with Business Intelligence
+ tools, which use these standard interfaces to query different kinds of database and Big Data systems.
+ </p>
+
+ <p class="p">
+ You can also configure these other aspects of Impala:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_security.html#security">Impala Security</a>
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a>
+ </li>
+ </ul>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_config_performance.html">Post-Installation Configuration for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_odbc.html">Configuring Impala to Work with ODBC</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_jdbc.html">Configuring Impala to Work with JDBC</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config_options.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config_options.html b/docs/build3x/html/topics/impala_config_options.html
new file mode 100644
index 0000000..12af2bc
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config_options.html
@@ -0,0 +1,389 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_processes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_options"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Modifying Impala Startup Options</title></head><body id="config_options"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Modifying Impala Startup Options</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration options for the Impala-related daemons let you choose which hosts and
+ ports to use for the services that run on a single host, specify directories for logging,
+ control resource usage and security, and specify other aspects of the Impala software.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_processes.html">Starting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="config_options__config_options_noncm">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Configuring Impala Startup Options through the Command Line</h2>
+
+ <div class="body conbody">
+
+ <p class="p"> The Impala server, statestore, and catalog services start up using values provided in a
+ defaults file, <span class="ph filepath">/etc/default/impala</span>. </p>
+
+ <p class="p">
+ This file includes information about many resources used by Impala. Most of the defaults
+ included in this file should be effective in most cases. For example, typically you
+ would not change the definition of the <code class="ph codeph">CLASSPATH</code> variable, but you
+ would always set the address used by the statestore server. Some of the content you
+ might modify includes:
+ </p>
+
+
+
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1
+IMPALA_STATE_STORE_PORT=24000
+IMPALA_BACKEND_PORT=22000
+IMPALA_LOG_DIR=/var/log/impala
+IMPALA_CATALOG_SERVICE_HOST=...
+IMPALA_STATE_STORE_HOST=...
+
+export IMPALA_STATE_STORE_ARGS=${IMPALA_STATE_STORE_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT}}
+IMPALA_SERVER_ARGS=" \
+-log_dir=${IMPALA_LOG_DIR} \
+-catalog_service_host=${IMPALA_CATALOG_SERVICE_HOST} \
+-state_store_port=${IMPALA_STATE_STORE_PORT} \
+-state_store_host=${IMPALA_STATE_STORE_HOST} \
+-be_port=${IMPALA_BACKEND_PORT}"
+export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+
+ <p class="p">
+ To use alternate values, edit the defaults file, then restart all the Impala-related
+ services so that the changes take effect. Restart the Impala server using the following
+ commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-server restart
+Stopping Impala Server: [ OK ]
+Starting Impala Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Restart the Impala statestore using the following commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-state-store restart
+Stopping Impala State Store Server: [ OK ]
+Starting Impala State Store Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Restart the Impala catalog service using the following commands:
+ </p>
+
+<pre class="pre codeblock"><code>$ sudo service impala-catalog restart
+Stopping Impala Catalog Server: [ OK ]
+Starting Impala Catalog Server: [ OK ]</code></pre>
+
+ <p class="p">
+ Some common settings to change include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Statestore address. Where practical, put the statestore on a separate host not
+ running the <span class="keyword cmdname">impalad</span> daemon. In that recommended configuration,
+ the <span class="keyword cmdname">impalad</span> daemon cannot refer to the statestore server using
+ the loopback address. If the statestore is hosted on a machine with an IP address of
+ 192.168.0.27, change:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=127.0.0.1</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_STATE_STORE_HOST=192.168.0.27</code></pre>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Catalog server address (including both the hostname and the port number). Update the
+ value of the <code class="ph codeph">IMPALA_CATALOG_SERVICE_HOST</code> variable. Where
+ practical, run the catalog server on the same host as the statestore. In that
+ recommended configuration, the <span class="keyword cmdname">impalad</span> daemon cannot refer to the
+ catalog server using the loopback address. If the catalog service is hosted on a
+ machine with an IP address of 192.168.0.27, add the following line:
+ </p>
+<pre class="pre codeblock"><code>IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000</code></pre>
+ <p class="p">
+ The <span class="ph filepath">/etc/default/impala</span> defaults file currently does not define
+ an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> environment variable, but if you add one it
+ will be recognized by the service startup/shutdown script. Add a definition for this
+ variable to <span class="ph filepath">/etc/default/impala</span> and add the option
+ <code class="ph codeph">-catalog_service_host=<var class="keyword varname">hostname</var></code>. If the port is
+ different than the default 26000, also add the option
+ <code class="ph codeph">-catalog_service_port=<var class="keyword varname">port</var></code>.
+ </p>
+ </li>
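+ Putting the steps above together, the added lines in <span class="ph filepath">/etc/default/impala</span> might look like the following sketch (192.168.0.27 and 26000 are example values; substitute your own host and port):

```shell
# Example values only; substitute your catalog service host and port.
IMPALA_CATALOG_SERVICE_HOST=192.168.0.27:26000

# Not present by default, but recognized by the service startup/shutdown
# script if you add it.
export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} \
    -catalog_service_host=192.168.0.27 \
    -catalog_service_port=26000}
```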
+
+ <li class="li" id="config_options_noncm__mem_limit">
+ <p class="p">
+ Memory limits. You can limit the amount of memory available to Impala. For example,
+ to allow Impala to use no more than 70% of system memory, change:
+ </p>
+
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} \
+ -state_store_port=${IMPALA_STATE_STORE_PORT} \
+ -state_store_host=${IMPALA_STATE_STORE_HOST} \
+ -be_port=${IMPALA_BACKEND_PORT}}</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>export IMPALA_SERVER_ARGS=${IMPALA_SERVER_ARGS:- \
+ -log_dir=${IMPALA_LOG_DIR} -state_store_port=${IMPALA_STATE_STORE_PORT} \
+ -state_store_host=${IMPALA_STATE_STORE_HOST} \
+ -be_port=${IMPALA_BACKEND_PORT} -mem_limit=70%}</code></pre>
+ <p class="p">
+ You can specify the memory limit using absolute notation such as
+ <code class="ph codeph">500m</code> or <code class="ph codeph">2G</code>, or as a percentage of physical memory
+ such as <code class="ph codeph">60%</code>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Queries that exceed the specified memory limit are aborted. Percentage limits are
+ based on the physical memory of the machine and do not consider cgroups.
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p"> Core dump enablement. To enable core dumps, change: </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-false}</code></pre>
+ <p class="p">
+ to:
+ </p>
+<pre class="pre codeblock"><code>export ENABLE_CORE_DUMPS=${ENABLE_COREDUMPS:-true}</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The location of core dump files may vary according to your operating system configuration.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Other security settings may prevent Impala from writing core dumps even when this option is enabled.
+ </p>
+ </li>
+ </ul>
+ </div>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Authorization using the open source Sentry plugin. Specify the
+ <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code>
+ options as part of the <code class="ph codeph">IMPALA_SERVER_ARGS</code> and
+ <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> settings to enable the core Impala support
+ for authentication. See <a class="xref" href="impala_authorization.html#secure_startup">Starting the impalad Daemon with Sentry Authorization Enabled</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Auditing for successful or blocked Impala queries, another aspect of security.
+ Specify the <code class="ph codeph">-audit_event_log_dir=<var class="keyword varname">directory_path</var></code>
+ option and optionally the
+ <code class="ph codeph">-max_audit_event_log_file_size=<var class="keyword varname">number_of_queries</var></code>
+ and <code class="ph codeph">-abort_on_failed_audit_event</code> options as part of the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> settings, for each Impala node, to enable and
+ customize auditing. See <a class="xref" href="impala_auditing.html#auditing">Auditing Impala Operations</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Password protection for the Impala web UI, which listens on port 25000 by default.
+ This feature involves adding some or all of the
+ <code class="ph codeph">--webserver_password_file</code>,
+ <code class="ph codeph">--webserver_authentication_domain</code>, and
+ <code class="ph codeph">--webserver_certificate_file</code> options to the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code>
+ settings. See <a class="xref" href="impala_security_guidelines.html#security_guidelines">Security Guidelines for Impala</a> for
+ details.
+ </p>
+ </li>
+
+ <li class="li" id="config_options_noncm__default_query_options">
+ <div class="p">
+ Another setting you might add to <code class="ph codeph">IMPALA_SERVER_ARGS</code> is a
+ comma-separated list of query options and values:
+<pre class="pre codeblock"><code>-default_query_options='<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,<var class="keyword varname">option</var>=<var class="keyword varname">value</var>,...'
+</code></pre>
+ These options control the behavior of queries performed by this
+ <span class="keyword cmdname">impalad</span> instance. The option values you specify here override the
+ default values for <a class="xref" href="impala_query_options.html#query_options">Impala query
+ options</a>, as shown by the <code class="ph codeph">SET</code> statement in
+ <span class="keyword cmdname">impala-shell</span>.
+ </div>
+ </li>
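+ For instance, an illustrative combination of two query options (the values here are only examples; any option shown by the <code class="ph codeph">SET</code> statement can appear in the list):

```shell
# Fragment appended to IMPALA_SERVER_ARGS; mem_limit and sync_ddl are
# standard Impala query options, with example values.
-default_query_options='mem_limit=2g,sync_ddl=true'
```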
+
+
+
+ <li class="li">
+ <p class="p">
+ During troubleshooting, <span class="keyword">the appropriate support channel</span> might direct you to change other values,
+ particularly for <code class="ph codeph">IMPALA_SERVER_ARGS</code>, to work around issues or
+ gather debugging information.
+ </p>
+ </li>
+ </ul>
+
+
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ These startup options for the <span class="keyword cmdname">impalad</span> daemon are different from the
+ command-line options for the <span class="keyword cmdname">impala-shell</span> command. For the
+ <span class="keyword cmdname">impala-shell</span> options, see
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a>.
+ </p>
+ </div>
+
+
+
+ </div>
+
+
+
+
+
+
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="config_options__config_options_checking">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Checking the Values of Impala Configuration Options</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can check the current runtime value of all these settings through the Impala web
+ interface, available by default at
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25000/varz</code> for the
+ <span class="keyword cmdname">impalad</span> daemon,
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25010/varz</code> for the
+ <span class="keyword cmdname">statestored</span> daemon, or
+ <code class="ph codeph">http://<var class="keyword varname">impala_hostname</var>:25020/varz</code> for the
+ <span class="keyword cmdname">catalogd</span> daemon.
+ </p>
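+ For example, you can fetch the same pages from the command line (the hostname is a placeholder; the ports are the defaults listed above):

```shell
curl http://impala-host:25000/varz   # impalad flags
curl http://impala-host:25010/varz   # statestored flags
curl http://impala-host:25020/varz   # catalogd flags
```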
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="config_options__config_options_impalad">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Startup Options for impalad Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">impalad</code> daemon implements the main Impala service, which performs
+ query processing and reads and writes the data files.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="config_options__config_options_statestored">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Startup Options for statestored Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <span class="keyword cmdname">statestored</span> daemon implements the Impala statestore service,
+ which monitors the availability of Impala services across the cluster, and handles
+ situations such as nodes becoming unavailable or becoming available again.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="config_options__config_options_catalogd">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Startup Options for catalogd Daemon</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <span class="keyword cmdname">catalogd</span> daemon implements the Impala catalog service, which
+ broadcasts metadata changes to all the Impala nodes when Impala creates a table, inserts
+ data, or performs other kinds of DDL and DML operations.
+ </p>
+
+ <div class="p">
+ Use the <code class="ph codeph">--load_catalog_in_background</code> option to control when
+ the metadata of a table is loaded.
+ <ul class="ul">
+ <li class="li">
+ If set to <code class="ph codeph">false</code>, the metadata of a table is
+ loaded when it is referenced for the first time. This means that the
+ first run of a particular query can be slower than subsequent runs.
+ Starting in Impala 2.2, the default for
+ <code class="ph codeph">load_catalog_in_background</code> is
+ <code class="ph codeph">false</code>.
+ </li>
+ <li class="li">
+ If set to <code class="ph codeph">true</code>, the catalog service attempts to
+ load metadata for a table even if no query needed that metadata. So
+ metadata will possibly be already loaded when the first query that
+ would need it is run. However, for the following reasons, we
+ recommend not to set the option to <code class="ph codeph">true</code>.
+ <ul class="ul">
+ <li class="li">
+ Background load can interfere with query-specific metadata
+ loading. This can happen on startup or after invalidating
+ metadata, with a duration depending on the amount of metadata,
+ and can lead to seemingly random long-running queries that are
+ difficult to diagnose.
+ </li>
+ <li class="li">
+ Impala may load metadata for tables that are never
+ used, potentially increasing catalog size and, consequently, memory
+ usage for both the catalog service and the Impala daemons.
+ </li>
+ </ul>
+ </li>
+ </ul>
+ </div>
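+ Given these trade-offs, one way to pin the recommended value explicitly is a sketch like the following in <span class="ph filepath">/etc/default/impala</span> (assuming an <code class="ph codeph">IMPALA_CATALOG_ARGS</code> definition has been added, since the defaults file does not define one out of the box):

```shell
export IMPALA_CATALOG_ARGS=${IMPALA_CATALOG_ARGS:- \
    -log_dir=${IMPALA_LOG_DIR} \
    -load_catalog_in_background=false}
```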
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_config_performance.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_config_performance.html b/docs/build3x/html/topics/impala_config_performance.html
new file mode 100644
index 0000000..ad91a39
--- /dev/null
+++ b/docs/build3x/html/topics/impala_config_performance.html
@@ -0,0 +1,149 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_config.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="config_performance"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Post-Installation Configuration for Impala</title></head><body id="config_performance"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Post-Installation Configuration for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p" id="config_performance__p_24">
+ This section describes the mandatory and recommended configuration settings for Impala. If Impala is
+ installed using cluster management software, some of these configurations might be completed automatically; you must still
+ configure short-circuit reads manually. If you want to customize your environment, consider making the changes described in this topic.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ You must enable short-circuit reads, whether or not Impala was installed with cluster
+ management software. This setting goes in the Impala configuration settings, not the Hadoop-wide settings.
+ </li>
+
+ <li class="li">
+ You must enable block location tracking, and you can optionally enable native checksumming for optimal performance.
+ </li>
+ </ul>
+
+ <section class="section" id="config_performance__section_fhq_wyv_ls"><h2 class="title sectiontitle">Mandatory: Short-Circuit Reads</h2>
+
+ <p class="p"> Enabling short-circuit reads allows Impala to read local data directly
+ from the file system. This removes the need to communicate through the
+ DataNodes, improving performance. This setting also minimizes the number
+ of additional copies of data. Short-circuit reads require
+ <code class="ph codeph">libhadoop.so</code>
+ (the Hadoop Native Library) to be accessible to both the server and the
+ client. <code class="ph codeph">libhadoop.so</code> is not available if you have
+ installed from a tarball. You must install from an
+ <code class="ph codeph">.rpm</code>, <code class="ph codeph">.deb</code>, or parcel to use
+ short-circuit local reads.
+ </p>
+ <p class="p">
+ <strong class="ph b">To configure DataNodes for short-circuit reads:</strong>
+ </p>
+ <ol class="ol" id="config_performance__ol_qlq_wyv_ls">
+ <li class="li" id="config_performance__copy_config_files"> Copy the client
+ <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+ configuration files from the Hadoop configuration directory to the
+ Impala configuration directory. The default Impala configuration
+ location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+ <li class="li">
+
+
+
+ On all Impala nodes, configure the following properties in
+
+ Impala's copy of <code class="ph codeph">hdfs-site.xml</code> as shown: <pre class="pre codeblock"><code><property>
+ <name>dfs.client.read.shortcircuit</name>
+ <value>true</value>
+</property>
+
+<property>
+ <name>dfs.domain.socket.path</name>
+ <value>/var/run/hdfs-sockets/dn</value>
+</property>
+
+<property>
+ <name>dfs.client.file-block-storage-locations.timeout.millis</name>
+ <value>10000</value>
+</property></code></pre>
+
+
+ </li>
+ <li class="li">
+ <p class="p"> If <code class="ph codeph">/var/run/hadoop-hdfs/</code> is group-writable, make
+ sure its group is <code class="ph codeph">root</code>. </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you are also going to enable block location tracking, you
+ can skip copying configuration files and restarting DataNodes and go
+ straight to <a class="xref" href="#config_performance__block_location_tracking">Optional: Block Location Tracking</a>.
+ Configuring short-circuit reads and block location tracking require
+ the same process of copying files and restarting services, so you
+ can complete that process once when you have completed all
+ configuration changes. Whether you copy files and restart services
+ now or during configuring block location tracking, short-circuit
+ reads are not enabled until you complete those final steps. </div>
+ </li>
+ <li class="li" id="config_performance__restart_all_datanodes"> After applying these changes, restart
+ all DataNodes. </li>
+ </ol>
+ </section>
+
+ <section class="section" id="config_performance__block_location_tracking"><h2 class="title sectiontitle">Mandatory: Block Location Tracking</h2>
+
+
+
+ <p class="p">
+ Enabling block location metadata allows Impala to know which disks data blocks are located on, allowing
+ better utilization of the underlying disks. Impala will not start unless this setting is enabled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To enable block location tracking:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ For each DataNode, add the following to the <code class="ph codeph">hdfs-site.xml</code> file:
+<pre class="pre codeblock"><code><property>
+ <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
+ <value>true</value>
+</property> </code></pre>
+ </li>
+
+ <li class="li"> Copy the client
+ <code class="ph codeph">core-site.xml</code> and <code class="ph codeph">hdfs-site.xml</code>
+ configuration files from the Hadoop configuration directory to the
+ Impala configuration directory. The default Impala configuration
+ location is <code class="ph codeph">/etc/impala/conf</code>. </li>
+
+ <li class="li"> After applying these changes, restart
+ all DataNodes. </li>
+ </ol>
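+
+        <p class="p">
+          For example, on a cluster where the Hadoop configuration files live in
+          <code class="ph codeph">/etc/hadoop/conf</code> (this path varies by distribution and is
+          shown here only as an illustration), the copy step might look like the following:
+        </p>
+<pre class="pre codeblock"><code># Run as root or as a user with write access to the Impala configuration directory.
+$ cp /etc/hadoop/conf/core-site.xml /etc/hadoop/conf/hdfs-site.xml /etc/impala/conf/
+</code></pre>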
+ </section>
+
+ <section class="section" id="config_performance__native_checksumming"><h2 class="title sectiontitle">Optional: Native Checksumming</h2>
+
+
+
+ <p class="p">
+ Enabling native checksumming causes Impala to use an optimized native library for computing checksums, if
+ that library is available.
+ </p>
+
+ <p class="p" id="config_performance__p_29">
+ <strong class="ph b">To enable native checksumming:</strong>
+ </p>
+
+ <p class="p">
+        If you installed <span class="keyword"></span> from packages, the native checksumming library is installed and set up correctly. In
+ such a case, no additional steps are required. Conversely, if you installed by other means, such as with
+ tarballs, native checksumming may not be available due to missing shared objects. Finding the message
+ "<code class="ph codeph">Unable to load native-hadoop library for your platform... using builtin-java classes where
+ applicable</code>" in the Impala logs indicates native checksumming may be unavailable. To enable native
+ checksumming, you must build and install <code class="ph codeph">libhadoop.so</code> (the
+
+
+ Hadoop Native Library).
+ </p>
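+      <p class="p">
+        To check whether the native library is visible to Hadoop on a particular host, you can
+        run the <code class="ph codeph">hadoop checknative</code> command; a
+        <code class="ph codeph">hadoop: true</code> line in its output indicates that
+        <code class="ph codeph">libhadoop.so</code> was found:
+      </p>
+<pre class="pre codeblock"><code>$ hadoop checknative -a
+</code></pre>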
+ </section>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_config.html">Managing Impala</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_connecting.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_connecting.html b/docs/build3x/html/topics/impala_connecting.html
new file mode 100644
index 0000000..1411525
--- /dev/null
+++ b/docs/build3x/html/topics/impala_connecting.html
@@ -0,0 +1,187 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_impala_shell.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="connecting"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Connecting to impalad through impala-shell</title></head><body id="connecting"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Connecting to impalad through impala-shell</h1>
+
+
+
+ <div class="body conbody">
+
+
+
+ <div class="p">
+ Within an <span class="keyword cmdname">impala-shell</span> session, you can only issue queries while connected to an instance
+ of the <span class="keyword cmdname">impalad</span> daemon. You can specify the connection information:
+ <ul class="ul">
+ <li class="li">
+ Through command-line options when you run the <span class="keyword cmdname">impala-shell</span> command.
+ </li>
+ <li class="li">
+ Through a configuration file that is read when you run the <span class="keyword cmdname">impala-shell</span> command.
+ </li>
+ <li class="li">
+ During an <span class="keyword cmdname">impala-shell</span> session, by issuing a <code class="ph codeph">CONNECT</code> command.
+ </li>
+ </ul>
+ See <a class="xref" href="impala_shell_options.html">impala-shell Configuration Options</a> for the command-line and configuration file options you can use.
+ </div>
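+
+    <div class="p">
+      For example, connection information can be placed in the
+      <code class="ph codeph">$HOME/.impalarc</code> configuration file that
+      <span class="keyword cmdname">impala-shell</span> reads at startup. A minimal file
+      specifying a default coordinator host and port (the hostname shown here is
+      illustrative) might look like this:
+<pre class="pre codeblock"><code>[impala]
+impalad=coordinator01.example.com:21000
+</code></pre>
+    </div>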
+
+ <p class="p">
+ You can connect to any DataNode where an instance of <span class="keyword cmdname">impalad</span> is running,
+ and that host coordinates the execution of all queries sent to it.
+ </p>
+
+ <p class="p">
+ For simplicity during development, you might always connect to the same host, perhaps running <span class="keyword cmdname">impala-shell</span> on
+ the same host as <span class="keyword cmdname">impalad</span> and specifying the hostname as <code class="ph codeph">localhost</code>.
+ </p>
+
+ <p class="p">
+      In a production environment, you might enable load balancing, in which you connect to a specific host/port combination
+ but queries are forwarded to arbitrary hosts. This technique spreads the overhead of acting as the coordinator
+ node among all the DataNodes in the cluster. See <a class="xref" href="impala_proxy.html">Using Impala through a Proxy for High Availability</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To connect the Impala shell during shell startup:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Locate the hostname of a DataNode within the cluster that is running an instance of the
+ <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+ other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+ port number also.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">-i</code> option to the
+ <span class="keyword cmdname">impala-shell</span> interpreter to specify the connection information for
+ that instance of <span class="keyword cmdname">impalad</span>:
+<pre class="pre codeblock"><code># When you are logged into the same machine running impalad.
+# The prompt will reflect the current hostname.
+$ impala-shell
+
+# When you are logged into the same machine running impalad.
+# The host will reflect the hostname 'localhost'.
+$ impala-shell -i localhost
+
+# When you are logged onto a different host, perhaps a client machine
+# outside the Hadoop cluster.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>
+
+# When you are logged onto a different host, and impalad is listening
+# on a non-default port. Perhaps a load balancer is forwarding requests
+# to a different host/port combination behind the scenes.
+$ impala-shell -i <var class="keyword varname">some.other.hostname</var>:<var class="keyword varname">port_number</var>
+</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To connect the Impala shell after shell startup:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Start the Impala shell with no connection:
+<pre class="pre codeblock"><code>$ impala-shell</code></pre>
+ <p class="p">
+ You should see a prompt like the following:
+ </p>
+<pre class="pre codeblock"><code>Welcome to the Impala shell. Press TAB twice to see a list of available commands.
+...
+<span class="ph">(Shell
+ build version: Impala Shell v3.0.x (<var class="keyword varname">hash</var>) built on
+ <var class="keyword varname">date</var>)</span>
+[Not connected] > </code></pre>
+ </li>
+
+ <li class="li">
+ Locate the hostname of a DataNode within the cluster that is running an instance of the
+ <span class="keyword cmdname">impalad</span> daemon. If that DataNode uses a non-default port (something
+ other than port 21000) for <span class="keyword cmdname">impala-shell</span> connections, find out the
+ port number also.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">connect</code> command to connect to an Impala instance. Enter a command of the form:
+<pre class="pre codeblock"><code>[Not connected] > connect <var class="keyword varname">impalad-host</var>
+[<var class="keyword varname">impalad-host</var>:21000] ></code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Replace <var class="keyword varname">impalad-host</var> with the hostname you have configured for any DataNode running
+ Impala in your environment. The changed prompt indicates a successful connection.
+ </div>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To start <span class="keyword cmdname">impala-shell</span> in a specific database:</strong>
+ </p>
+
+ <p class="p">
+ You can use all the same connection options as in previous examples.
+ For simplicity, these examples assume that you are logged into one of
+ the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Find the name of the database containing the relevant tables, views, and so
+ on that you want to operate on.
+ </li>
+
+ <li class="li">
+ Use the <code class="ph codeph">-d</code> option to the
+ <span class="keyword cmdname">impala-shell</span> interpreter to connect and immediately
+ switch to the specified database, without the need for a <code class="ph codeph">USE</code>
+ statement or fully qualified names:
+<pre class="pre codeblock"><code># Subsequent queries with unqualified names operate on
+# tables, views, and so on inside the database named 'staging'.
+$ impala-shell -i localhost -d staging
+
+# It is common during development, ETL, benchmarking, and so on
+# to have different databases containing the same table names
+# but with different contents or layouts.
+$ impala-shell -i localhost -d parquet_snappy_compression
+$ impala-shell -i localhost -d parquet_gzip_compression
+</code></pre>
+ </li>
+ </ol>
+
+ <p class="p">
+ <strong class="ph b">To run one or several statements in non-interactive mode:</strong>
+ </p>
+
+ <p class="p">
+ You can use all the same connection options as in previous examples.
+ For simplicity, these examples assume that you are logged into one of
+ the DataNodes that is running the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Construct a statement, or a file containing a sequence of statements,
+ that you want to run in an automated way, without typing or copying
+ and pasting each time.
+ </li>
+
+ <li class="li">
+ Invoke <span class="keyword cmdname">impala-shell</span> with the <code class="ph codeph">-q</code> option to run a single statement, or
+ the <code class="ph codeph">-f</code> option to run a sequence of statements from a file.
+ The <span class="keyword cmdname">impala-shell</span> command returns immediately, without going into
+ the interactive interpreter.
+<pre class="pre codeblock"><code># A utility command that you might run while developing shell scripts
+# to manipulate HDFS files.
+$ impala-shell -i localhost -d database_of_interest -q 'show tables'
+
+# A sequence of CREATE TABLE, CREATE VIEW, and similar DDL statements
+# can go into a file to make the setup process repeatable.
+$ impala-shell -i localhost -d database_of_interest -f recreate_tables.sql
+</code></pre>
+ </li>
+ </ol>
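+
+    <p class="p">
+      When the output of a non-interactive query is consumed by other scripts, you can combine
+      <code class="ph codeph">-q</code> with options such as <code class="ph codeph">-B</code>
+      (delimited rather than pretty-printed output) and <code class="ph codeph">-o</code>
+      (write results to a file). For example (the table and file names here are illustrative):
+    </p>
+<pre class="pre codeblock"><code># Write delimited query results to a file for further processing.
+$ impala-shell -i localhost -B -q 'select count(*) from t1' -o row_count.txt
+</code></pre>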
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_impala_shell.html">Using the Impala Shell (impala-shell Command)</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_conversion_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_conversion_functions.html b/docs/build3x/html/topics/impala_conversion_functions.html
new file mode 100644
index 0000000..5532c8e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_conversion_functions.html
@@ -0,0 +1,288 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="conversion_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Type Conversion Functions</title></head><body id="conversion_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Type Conversion Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Conversion functions are usually used in combination with other functions, to explicitly pass the expected
+ data types. Impala has strict rules regarding data types for function parameters. For example, Impala does
+ not automatically convert a <code class="ph codeph">DOUBLE</code> value to <code class="ph codeph">FLOAT</code>, a
+ <code class="ph codeph">BIGINT</code> value to <code class="ph codeph">INT</code>, or other conversion where precision could be lost or
+ overflow could occur. Also, for reporting or dealing with loosely defined schemas in big data contexts,
+ you might frequently need to convert values to or from the <code class="ph codeph">STRING</code> type.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Although in <span class="keyword">Impala 2.3</span>, the <code class="ph codeph">SHOW FUNCTIONS</code> output for
+ database <code class="ph codeph">_IMPALA_BUILTINS</code> contains some function signatures
+ matching the pattern <code class="ph codeph">castto*</code>, these functions are not intended
+      for public use and are expected to be hidden in the future.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+ Impala supports the following type conversion functions:
+ </p>
+
+<dl class="dl">
+
+
+<dt class="dt dlterm" id="conversion_functions__cast">
+<code class="ph codeph">cast(<var class="keyword varname">expr</var> AS <var class="keyword varname">type</var>)</code>
+</dt>
+
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Converts the value of an expression to any other type.
+If the expression value is of a type that cannot be converted to the target type, the result is <code class="ph codeph">NULL</code>.
+<p class="p"><strong class="ph b">Usage notes:</strong>
+Use <code class="ph codeph">CAST</code> when passing a column value or literal to a function that
+expects a parameter with a different type.
+Frequently used in SQL operations such as <code class="ph codeph">CREATE TABLE AS SELECT</code>
+and <code class="ph codeph">INSERT ... VALUES</code> to ensure that values from various sources
+are of the appropriate type for the destination columns.
+Where practical, do a one-time <code class="ph codeph">CAST()</code> operation during the ingestion process
+to make each column into the appropriate type, rather than using many <code class="ph codeph">CAST()</code>
+operations in each query; doing type conversions for each row during each query can be expensive
+for tables with millions or billions of rows.
+</p>
+ <p class="p">
+ The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+ values is affected by the <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+ how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
+
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select concat('Here are the first ',10,' results.'); -- Fails
+select concat('Here are the first ',cast(10 as string),' results.'); -- Succeeds
+</code></pre>
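+<p class="p">
+As noted above, a conversion that cannot succeed produces <code class="ph codeph">NULL</code>
+rather than an error:
+</p>
+<pre class="pre codeblock"><code>select cast('not_a_number' as int) as x;
++------+
+| x    |
++------+
+| NULL |
++------+
+</code></pre>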
+<p class="p">
+The following example starts with a text table where every column has a type of <code class="ph codeph">STRING</code>,
+which might be how you ingest data of unknown schema until you can verify the cleanliness of the underlying values.
+Then it uses <code class="ph codeph">CAST()</code> to create a new Parquet table with the same data, but using specific
+numeric data types for the columns with numeric data. Using numeric types of appropriate sizes can result in
+substantial space savings on disk and in memory, and performance improvements in queries,
+over using strings or larger-than-necessary numeric types.
+</p>
+<pre class="pre codeblock"><code>create table t1 (name string, x string, y string, z string);
+
+create table t2 stored as parquet
+as select
+ name,
+ cast(x as bigint) x,
+ cast(y as timestamp) y,
+ cast(z as smallint) z
+from t1;
+
+describe t2;
++------+-----------+---------+
+| name | type      | comment |
++------+-----------+---------+
+| name | string    |         |
+| x    | bigint    |         |
+| y    | timestamp |         |
+| z    | smallint  |         |
++------+-----------+---------+
+</code></pre>
+<p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+<p class="p">
+
+ For details of casts from each kind of data type, see the description of
+ the appropriate type:
+ <a class="xref" href="impala_tinyint.html#tinyint">TINYINT Data Type</a>,
+ <a class="xref" href="impala_smallint.html#smallint">SMALLINT Data Type</a>,
+ <a class="xref" href="impala_int.html#int">INT Data Type</a>,
+ <a class="xref" href="impala_bigint.html#bigint">BIGINT Data Type</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>,
+ <a class="xref" href="impala_double.html#double">DOUBLE Data Type</a>,
+ <a class="xref" href="impala_decimal.html#decimal">DECIMAL Data Type (Impala 3.0 or higher only)</a>,
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a>,
+ <a class="xref" href="impala_char.html#char">CHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>,
+ <a class="xref" href="impala_boolean.html#boolean">BOOLEAN Data Type</a>
+</p>
+</dd>
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+<dt class="dt dlterm" id="conversion_functions__typeof">
+<code class="ph codeph">typeof(type value)</code>
+</dt>
+<dd class="dd">
+
+<strong class="ph b">Purpose:</strong> Returns the name of the data type corresponding to an expression. For types with
+extra attributes, such as length for <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code>,
+or precision and scale for <code class="ph codeph">DECIMAL</code>, includes the full specification of the type.
+
+<p class="p"><strong class="ph b">Return type:</strong> <code class="ph codeph">string</code></p>
+<p class="p"><strong class="ph b">Usage notes:</strong> Typically used in interactive exploration of a schema, or in application code that programmatically generates schema definitions such as <code class="ph codeph">CREATE TABLE</code> statements.
+For example, previously, to understand the type of an expression such as
+<code class="ph codeph">col1 / col2</code> or <code class="ph codeph">concat(col1, col2, col3)</code>,
+you might have created a dummy table with a single row, using syntax such as <code class="ph codeph">CREATE TABLE foo AS SELECT 5 / 3.0</code>,
+and then doing a <code class="ph codeph">DESCRIBE</code> to see the type of the resulting column.
+Or you might have done a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation to create a table and
+copy data into it, only learning the types of the columns by doing a <code class="ph codeph">DESCRIBE</code> afterward.
+This technique is especially useful for arithmetic expressions involving <code class="ph codeph">DECIMAL</code> types,
+because the precision and scale of the result is typically different than that of the operands.
+</p>
+<p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+<p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<p class="p">
+These examples show how to check the type of a simple literal or function value.
+Notice how adding even tiny integers together changes the data type of the result to
+avoid overflow, and how the results of arithmetic operations on <code class="ph codeph">DECIMAL</code> values
+have specific precision and scale attributes.
+</p>
+<pre class="pre codeblock"><code>select typeof(2)
++-----------+
+| typeof(2) |
++-----------+
+| TINYINT |
++-----------+
+
+select typeof(2+2)
++---------------+
+| typeof(2 + 2) |
++---------------+
+| SMALLINT |
++---------------+
+
+select typeof('xyz')
++---------------+
+| typeof('xyz') |
++---------------+
+| STRING |
++---------------+
+
+select typeof(now())
++---------------+
+| typeof(now()) |
++---------------+
+| TIMESTAMP |
++---------------+
+
+select typeof(5.3 / 2.1)
++-------------------+
+| typeof(5.3 / 2.1) |
++-------------------+
+| DECIMAL(6,4) |
++-------------------+
+
+select typeof(5.30001 / 2342.1);
++--------------------------+
+| typeof(5.30001 / 2342.1) |
++--------------------------+
+| DECIMAL(13,11) |
++--------------------------+
+
+select typeof(typeof(2+2))
++-----------------------+
+| typeof(typeof(2 + 2)) |
++-----------------------+
+| STRING |
++-----------------------+
+</code></pre>
+
+<p class="p">
+This example shows how even if you do not have a record of the type of a column,
+for example because the type was changed by <code class="ph codeph">ALTER TABLE</code> after the
+original <code class="ph codeph">CREATE TABLE</code>, you can still find out the type in a
+more compact form than examining the full <code class="ph codeph">DESCRIBE</code> output.
+Remember to use <code class="ph codeph">LIMIT 1</code> in such cases, to avoid an identical
+result value for every row in the table.
+</p>
+<pre class="pre codeblock"><code>create table typeof_example (a int, b tinyint, c smallint, d bigint);
+
+/* Empty result set if there is no data in the table. */
+select typeof(a) from typeof_example;
+
+/* OK, now we have some data but the type of column A is being changed. */
+insert into typeof_example values (1, 2, 3, 4);
+alter table typeof_example change a a bigint;
+
+/* We can always find out the current type of that column without doing a full DESCRIBE. */
+select typeof(a) from typeof_example limit 1;
++-----------+
+| typeof(a) |
++-----------+
+| BIGINT |
++-----------+
+</code></pre>
+<p class="p">
+This example shows how you might programmatically generate a <code class="ph codeph">CREATE TABLE</code> statement
+with the appropriate column definitions to hold the result values of arbitrary expressions.
+The <code class="ph codeph">typeof()</code> function lets you construct a detailed <code class="ph codeph">CREATE TABLE</code> statement
+without actually creating the table, as opposed to <code class="ph codeph">CREATE TABLE AS SELECT</code> operations
+where you create the destination table but only learn the column data types afterward through <code class="ph codeph">DESCRIBE</code>.
+</p>
+<pre class="pre codeblock"><code>describe typeof_example;
++------+----------+---------+
+| name | type | comment |
++------+----------+---------+
+| a | bigint | |
+| b | tinyint | |
+| c | smallint | |
+| d | bigint | |
++------+----------+---------+
+
+/* An ETL or business intelligence tool might create variations on a table with different file formats,
+ different sets of columns, and so on. TYPEOF() lets an application introspect the types of the original columns. */
+select concat('create table derived_table (a ', typeof(a), ', b ', typeof(b), ', c ',
+ typeof(c), ', d ', typeof(d), ') stored as parquet;')
+ as 'create table statement'
+from typeof_example limit 1;
++-------------------------------------------------------------------------------------------+
+| create table statement |
++-------------------------------------------------------------------------------------------+
+| create table derived_table (a BIGINT, b TINYINT, c SMALLINT, d BIGINT) stored as parquet; |
++-------------------------------------------------------------------------------------------+
+</code></pre>
+</dd>
+
+
+</dl>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
[50/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_adls.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_adls.html b/docs/build3x/html/topics/impala_adls.html
new file mode 100644
index 0000000..4353825
--- /dev/null
+++ b/docs/build3x/html/topics/impala_adls.html
@@ -0,0 +1,638 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="adls"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Azure Data Lake Store (ADLS)</title></head><body id="adls"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Azure Data Lake Store (ADLS)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ You can use Impala to query data residing on the Azure Data Lake Store (ADLS) filesystem.
+ This capability allows convenient access to a storage system that is remotely managed,
+ accessible from anywhere, and integrated with various cloud-based services. Impala can
+ query files in any supported file format from ADLS. The ADLS storage location
+ can be for an entire table, or individual partitions in a partitioned table.
+ </p>
+
+ <p class="p">
+ The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+ full-table scans. In contrast, queries against ADLS data are less performant, making ADLS suitable for holding
+ <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+ HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+ to put some partitions on HDFS and others on ADLS, typically depending on the age of the data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="adls__prereqs">
+ <h2 class="title topictitle2" id="ariaid-title2">Prerequisites</h2>
+ <div class="body conbody">
+ <p class="p">
+ These procedures presume that you have already set up an Azure account,
+ configured an ADLS store, and configured your Hadoop cluster with appropriate
+ credentials to be able to access ADLS. See the following resources for information:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-get-started-portal" target="_blank">Get started with Azure Data Lake Store using the Azure Portal</a>
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <a class="xref" href="https://hadoop.apache.org/docs/current/hadoop-azure-datalake/index.html" target="_blank">Hadoop Azure Data Lake Support</a>
+ </p>
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="adls__sql">
+ <h2 class="title topictitle2" id="ariaid-title3">How Impala SQL Statements Work with ADLS</h2>
+ <div class="body conbody">
+ <p class="p">
+ Impala SQL statements work with data on ADLS as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+ can specify that a table resides on the ADLS filesystem by
+ encoding an <code class="ph codeph">adl://</code> prefix for the <code class="ph codeph">LOCATION</code>
+ property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+ property for an individual partition, so that some data in a table resides on
+ ADLS and other data in the same table resides on HDFS.
+ </p>
+ <div class="p">
+ The full format of the location URI is typically:
+<pre class="pre codeblock"><code>
+adl://<var class="keyword varname">your_account</var>.azuredatalakestore.net/<var class="keyword varname">rest_of_directory_path</var>
+</code></pre>
+ </div>
+ </li>
+ <li class="li">
+ <p class="p">
+ Once a table or partition is designated as residing on ADLS, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ statement transparently accesses the data files from the appropriate storage layer.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If the ADLS table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+ removes the corresponding data files from ADLS when the table is dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+ data files from ADLS when the table is truncated.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+ an ADLS table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a>, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another ADLS
+ table into an ADLS table.
+ </p>
+ </li>
+ </ul>
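+      <p class="p">
+        For example, a table whose data files reside entirely on ADLS could be created like
+        this (the account name and directory path are placeholders for your own values):
+      </p>
+<pre class="pre codeblock"><code>create table sales_adls (id bigint, amount decimal(10,2))
+  stored as parquet
+  location 'adl://your_account.azuredatalakestore.net/sales_data';
+</code></pre>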
+ <p class="p">
+ For usage information about Impala SQL statements with ADLS tables, see <a class="xref" href="impala_adls.html#ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</a>
+ and <a class="xref" href="impala_adls.html#dml">Using Impala DML Statements for ADLS Data</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="adls__creds">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Specifying Impala Credentials to Access Data in ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To allow Impala to access data in ADLS, specify values for the following configuration settings in your
+ <span class="ph filepath">core-site.xml</span> file:
+ </p>
+
+<pre class="pre codeblock"><code>
+<property>
+ <name>dfs.adls.oauth2.access.token.provider.type</name>
+ <value>ClientCredential</value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.client.id</name>
+ <value><varname>your_client_id</varname></value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.credential</name>
+ <value><varname>your_client_secret</varname></value>
+</property>
+<property>
+ <name>dfs.adls.oauth2.refresh.url</name>
+ <value><varname>refresh_URL</varname></value>
+</property>
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Check if your Hadoop distribution or cluster management tool includes support for
+ filling in and distributing credentials across the cluster in an automated way.
+ </p>
+ </div>
+
+ <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala queries, <code class="ph codeph">CREATE TABLE</code> statements, and so on go
+        through the Hive metastore.)
+ </p>
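+
+      <p class="p">
+        To verify the credentials before running Impala queries, you can list an ADLS location from the
+        command line. (The store name here is a placeholder.)
+      </p>
+
+<pre class="pre codeblock"><code>$ hadoop fs -ls adl://your_store.azuredatalakestore.net/
+</code></pre>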
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="adls__etl">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Loading Data into ADLS for Impala Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your ETL pipeline involves moving data into ADLS and then querying through Impala,
+ you can either use Impala DML statements to create, move, or copy the data, or
+ use the same data loading techniques as you would for non-Impala data.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="etl__dml">
+ <h3 class="title topictitle3" id="ariaid-title6">Using Impala DML Statements for ADLS Data</h3>
+ <div class="body conbody">
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Azure Data Lake Store (ADLS).
+ The syntax of the DML statements is the same as for any other tables, because the ADLS location for tables and
+ partitions is specified by an <code class="ph codeph">adl://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into ADLS using the normal ADLS transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the ADLS data.
+ </p>
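+        <p class="p">
+          For example, the following hypothetical session (the store name, paths, and table names are
+          placeholders) writes to ADLS with both <code class="ph codeph">INSERT</code> and
+          <code class="ph codeph">CREATE TABLE AS SELECT</code>:
+        </p>
+<pre class="pre codeblock"><code>[localhost:21000] > create table t1_on_adls (x int)
+                 >   location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+[localhost:21000] > insert into t1_on_adls values (1), (2), (3);
+[localhost:21000] > create table t2_on_adls
+                 >   location 'adl://your_store.azuredatalakestore.net/dir1/t2'
+                 >   as select x * 10 as x from t1_on_adls;
+</code></pre>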
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="etl__manual_etl">
+ <h3 class="title topictitle3" id="ariaid-title7">Manually Loading Data into Impala Tables on ADLS</h3>
+ <div class="body conbody">
+ <p class="p">
+ As an alternative, you can use the Microsoft-provided methods to bring data files
+ into ADLS for querying through Impala. See
+ <a class="xref" href="https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-copy-data-azure-storage-blob" target="_blank">the Microsoft ADLS documentation</a>
+ for details.
+ </p>
+
+ <p class="p">
+ After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+ files in ADLS from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement to make Impala aware of the new set of data files.
+ </p>
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="adls__ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Creating Impala Databases, Tables, and Partitions for Data Stored on ADLS</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala reads data for a table or partition from ADLS based on the <code class="ph codeph">LOCATION</code> attribute for the
+ table or partition. Specify the ADLS details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+ clause is <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">path/to/file</var></code>.
+ </p>
+
+ <p class="p">
+ For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+ or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in ADLS to mirror
+ the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, ADLS filenames do not
+ have directory paths, Impala treats ADLS filenames with <code class="ph codeph">/</code> characters the same as HDFS
+ pathnames that include directories.
+ </p>
+
+ <p class="p">
+ To point a nonpartitioned table or an individual partition at ADLS, specify a single directory
+        path in ADLS, which can be any arbitrary directory. Replicating the structure of an entire Impala
+        partitioned table or database in ADLS requires more care, with directories and subdirectories nested and
+ named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area if
+ necessary in HDFS, and recording the complete directory structure so that you can replicate it in ADLS.
+ </p>
+
+ <p class="p">
+ For example, the following session creates a partitioned table where only a single partition resides on ADLS.
+ The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL, and so refers to data residing on
+ ADLS, under a specific path underneath the store <code class="ph codeph">impalademo</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_hdfs;
+[localhost:21000] > use db_on_hdfs;
+[localhost:21000] > create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2015)
+ > location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3/t1';
+</code></pre>
+
+ <p class="p">
+ For convenience when working with multiple tables with data files stored in ADLS, you can create a database
+ with a <code class="ph codeph">LOCATION</code> attribute pointing to an ADLS path.
+ Specify a URL of the form <code class="ph codeph">adl://<var class="keyword varname">store</var>/<var class="keyword varname">root/path/for/database</var></code>
+ for the <code class="ph codeph">LOCATION</code> attribute of the database.
+ Any tables created inside that database
+ automatically create directories underneath the one specified by the database
+ <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ <p class="p">
+ The following session creates a database and two partitioned tables residing entirely on ADLS, one
+ partitioned by a single column and the other partitioned by multiple columns. Because a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">adl://</code> URL is specified for the database, the
+ tables inside that database are automatically created on ADLS underneath the database directory. To see the
+ names of the associated subdirectories, including the partition key values, we use an ADLS client tool to
+ examine how the directory structure is organized on ADLS. For example, Impala partition directories such as
+ <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+ through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_adls location 'adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3';
+[localhost:21000] > use db_on_adls;
+
+[localhost:21000] > create table partitioned_on_adls (x int) partitioned by (year int);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2013);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2014);
+[localhost:21000] > alter table partitioned_on_adls add partition (year=2015);
+
+[localhost:21000] > ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+
+[localhost:21000] > create table partitioned_multiple_keys (x int)
+ > partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=1);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=31);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=2,day=28);
+
+[localhost:21000] > ! hadoop fs -ls -R adl://impalademo.azuredatalakestore.net/dir1/dir2/dir3;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:47:13 0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_adls/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_adls/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_adls/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_adls/year=2015/
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+ directory paths if they do not already exist. You can specify multiple levels of directories, and the
+ <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+ -p</code>.
+ </p>
+
+ <p class="p">
+ Use the standard ADLS file upload methods to actually put the data files into the right locations. You can
+ also put the directory paths and data files in place before creating the associated Impala databases or
+ tables, and Impala automatically uses the data from the appropriate location after the associated databases
+ and tables are created.
+ </p>
+
+ <p class="p">
+ You can switch whether an existing table or partition points to data in HDFS or ADLS. For example, if you
+ have an Impala table or partition pointing to data files in HDFS or ADLS, and you later transfer those data
+ files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+ <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+ Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+ practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+ </p>
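+
+      <p class="p">
+        For example, these hypothetical statements (the store name, table names, and paths are
+        placeholders) repoint a table and a single partition after the underlying data files have been
+        transferred:
+      </p>
+
+<pre class="pre codeblock"><code>-- Data files for the whole table moved from HDFS to ADLS.
+alter table t1 set location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+refresh t1;
+
+-- One partition's data files moved from ADLS back to HDFS.
+alter table part_tbl partition (year=2015)
+  set location '/user/impala/part_tbl/year=2015';
+refresh part_tbl;
+</code></pre>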
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="adls__internal_external">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Internal and External Tables Located on ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Just as with tables located on HDFS storage, you can designate ADLS-based tables as either internal (managed
+ by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+ removed, even if they are on ADLS storage. When you drop an external table, the files associated with the
+ table are left alone, and are still available for access by other tools or components. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ If the data on ADLS is intended to be long-lived and accessed by other tools in addition to Impala, create
+ any associated ADLS tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+ deleted from ADLS when the table is dropped.
+ </p>
+
+ <p class="p">
+ If the data on ADLS is only needed for querying by Impala and can be safely discarded once the Impala
+ workflow is complete, create the associated ADLS tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+ that dropping the table also deletes the corresponding data files on ADLS.
+ </p>
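+
+      <p class="p">
+        For example, a long-lived dataset shared with other tools could be declared with
+        <code class="ph codeph">CREATE EXTERNAL TABLE</code>, so that dropping the table leaves the ADLS
+        files in place. (The store name, path, and columns are placeholders.)
+      </p>
+
+<pre class="pre codeblock"><code>create external table shared_on_adls (id bigint, name string)
+  stored as parquet
+  location 'adl://your_store.azuredatalakestore.net/shared/dataset1';
+</code></pre>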
+
+ <p class="p">
+ For example, this session creates a table in ADLS with the same column layout as a table in HDFS, then
+ examines the ADLS table and queries some data from it. The table in ADLS works the same as a table in HDFS as
+ far as the expected file format of the data, table and column statistics, and other table properties. The
+ only indication that it is not an HDFS table is the <code class="ph codeph">adl://</code> URL in the
+ <code class="ph codeph">LOCATION</code> property. Many data files can reside in the ADLS directory, and their combined
+ contents form the table data. Because the data in this example is uploaded after the table is created, a
+ <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table usa_cities_adls like usa_cities location 'adl://impalademo.azuredatalakestore.net/usa_cities';
+[localhost:21000] > desc usa_cities_adls;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| id | smallint | |
+| city | string | |
+| state | string | |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to ADLS as in the HDFS table,
+-- under the relevant store and path. If you already have the data in ADLS, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] > refresh usa_cities_adls;
+[localhost:21000] > select count(*) from usa_cities_adls;
++----------+
+| count(*) |
++----------+
+| 289 |
++----------+
+[localhost:21000] > select distinct state from usa_cities_adls limit 5;
++----------------------+
+| state |
++----------------------+
+| Louisiana |
+| Minnesota |
+| Georgia |
+| Alaska |
+| Ohio |
++----------------------+
+[localhost:21000] > desc formatted usa_cities_adls;
++------------------------------+----------------------------------------------------+---------+
+| name | type | comment |
++------------------------------+----------------------------------------------------+---------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| id | smallint | NULL |
+| city | string | NULL |
+| state | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | adls_testing | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Mon Mar 16 11:36:25 PDT 2017 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | adl://impalademo.azuredatalakestore.net/usa_cities | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+...
++------------------------------+----------------------------------------------------+---------+
+</code></pre>
+
+ <p class="p">
+ In this case, we have already uploaded a Parquet file with a million rows of data to the
+ <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impalademo</code> store on ADLS. This
+ session creates a table with matching column settings pointing to the corresponding location in ADLS, then
+ queries the table. Because the data is already in place on ADLS when the table is created, no
+ <code class="ph codeph">REFRESH</code> statement is required.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table sample_data_adls
+                 > (id bigint, val int, zerofill string,
+ > name string, assertion boolean, city string, state string)
+ > stored as parquet location 'adl://impalademo.azuredatalakestore.net/sample_data';
+[localhost:21000] > select count(*) from sample_data_adls;
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+[localhost:21000] > select count(*) howmany, assertion from sample_data_adls group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149 | true |
+| 332851 | false |
++---------+-----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="adls__queries">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Running and Tuning Impala Queries for Data Stored on ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+ query data stored in ADLS exactly the same as data stored on HDFS or in HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries against ADLS data support all the same file formats as for HDFS data.
+ </li>
+
+ <li class="li">
+ Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in ADLS
+ corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+ ADD PARTITION</code> to set up the appropriate paths in ADLS.
+ </li>
+
+ <li class="li">
+ HDFS, Kudu, and HBase tables can be joined to ADLS tables, or ADLS tables can be joined with each other.
+ </li>
+
+ <li class="li">
+ Authorization using the Sentry framework to control access to databases, tables, or columns works the
+ same whether the data is in HDFS or in ADLS.
+ </li>
+
+ <li class="li">
+ The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and ADLS tables. Use
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for ADLS tables in the same situations
+ where you would issue those statements for HDFS tables.
+ </li>
+
+ <li class="li">
+ Queries against ADLS tables are subject to the same kinds of admission control and resource management as
+ HDFS tables.
+ </li>
+
+ <li class="li">
+ Metadata about ADLS tables is stored in the same metastore database as for HDFS tables.
+ </li>
+
+ <li class="li">
+ You can set up views referring to ADLS tables, the same as for HDFS tables.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+ STATS</code> statements work for ADLS tables also.
+ </li>
+ </ul>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="queries__performance">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Understanding and Tuning Impala Query Performance for ADLS Data</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala queries for data stored in ADLS might be less performant than queries against the
+ equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+ interpret explain plans and profiles for queries against ADLS data, and tips to achieve the best
+ performance possible for such queries.
+ </p>
+
+ <p class="p">
+ All else being equal, performance is expected to be lower for queries running against data on ADLS rather
+ than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+ data is in ADLS. Although the work is still distributed across the datanodes of the cluster, Impala might
+ parallelize the work for a distributed query differently for data on HDFS and ADLS. ADLS does not have the
+ same block notion as HDFS, so Impala uses heuristics to determine how to split up large ADLS files for
+ processing in parallel. Because all hosts can access any ADLS data file with equal efficiency, the
+ distribution of work might be different than for HDFS data, where the data blocks are physically read
+ using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+ read the ADLS data might be spread evenly across the hosts of the cluster, the fact that all data is
+ initially retrieved across the network means that the overall query performance is likely to be lower for
+ ADLS data than for HDFS data.
+ </p>
+
+ <p class="p">
+ Because ADLS does not expose the block sizes of data files the way HDFS does,
+ any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of
+ Parquet data files. (Using a large block size is more important for Parquet tables than
+ for tables that use other file formats.)
+ </p>
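+
+        <p class="p">
+          For example, to produce larger Parquet files for an ADLS table, you could set the query option
+          before the write operation. (The table names are illustrative.)
+        </p>
+
+<pre class="pre codeblock"><code>set PARQUET_FILE_SIZE=256m;
+insert overwrite parquet_on_adls select * from staging_table;
+</code></pre>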
+
+ <p class="p">
+          When optimizing aspects of complex queries such as the join order, Impala treats tables on HDFS and
+ ADLS the same way. Therefore, follow all the same tuning recommendations for ADLS tables as for HDFS ones,
+ such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+ row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+ </p>
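+
+        <p class="p">
+          For example, the statistics statements work the same as on HDFS; here they run against the
+          <code class="ph codeph">sample_data_adls</code> table shown earlier:
+        </p>
+
+<pre class="pre codeblock"><code>compute stats sample_data_adls;
+show table stats sample_data_adls;
+show column stats sample_data_adls;
+</code></pre>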
+
+ <p class="p">
+ In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+ <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+ <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+ If you do see any indications that a query against an ADLS table performed <span class="q">"remote read"</span>
+ operations, do not be alarmed. That is expected because, by definition, all the I/O for ADLS tables involves
+ remote reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="adls__restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Restrictions on Impala Support for ADLS</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala requires that the default filesystem for the cluster be HDFS. You cannot use ADLS as the only
+ filesystem in the cluster.
+ </p>
+
+ <p class="p">
+ Although ADLS is often used to store JSON-formatted data, the current Impala support for ADLS does not include
+ directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+ flattened version of that data for querying by Impala as part of your ETL cycle.
+ </p>
+
+ <p class="p">
+ You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+ located in ADLS.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="adls__best_practices">
+ <h2 class="title topictitle2" id="ariaid-title13">Best Practices for Using Impala with ADLS</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ The following guidelines represent best practices derived from testing and real-world experience with Impala on ADLS:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any reference to an ADLS location must be fully qualified. (This rule applies when
+ ADLS is not designated as the default filesystem.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set any appropriate configuration settings for <span class="keyword cmdname">impalad</span>.
+ </p>
+ </li>
+ </ul>
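+
+      <p class="p">
+        For example, refer to ADLS data with a fully qualified URL that includes the
+        <code class="ph codeph">adl://</code> scheme and the store name, rather than a bare path.
+        (The store name below is a placeholder.)
+      </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int)
+  location 'adl://your_store.azuredatalakestore.net/dir1/t1';
+</code></pre>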
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admin.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admin.html b/docs/build3x/html/topics/impala_admin.html
new file mode 100644
index 0000000..7c76987
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admin.html
@@ -0,0 +1,52 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admission.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_resource_management.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_timeouts.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_proxy.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_disk_space.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admin"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Administration</title></head><body id="admin"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Administration</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ As an administrator, you monitor Impala's use of resources and take action when necessary to keep Impala
+ running smoothly and avoid conflicts with other Hadoop components running on the same cluster. When you
+ detect that an issue has happened or could happen in the future, you reconfigure Impala or other components
+ such as HDFS or even the hardware of the cluster itself to resolve or avoid problems.
+ </p>
+
+ <p class="p toc"></p>
+
+ <p class="p">
+ <strong class="ph b">Related tasks:</strong>
+ </p>
+
+ <p class="p">
+ As an administrator, you can expect to perform installation, upgrade, and configuration tasks for Impala on
+ all machines in a cluster. See <a class="xref" href="impala_install.html#install">Installing Impala</a>,
+ <a class="xref" href="impala_upgrading.html#upgrading">Upgrading Impala</a>, and <a class="xref" href="impala_config.html#config">Managing Impala</a> for details.
+ </p>
+
+ <p class="p">
+ For security tasks typically performed by administrators, see <a class="xref" href="impala_security.html#security">Impala Security</a>.
+ </p>
+
+ <div class="p">
+ Administrators also decide how to allocate cluster resources so that all Hadoop components can run smoothly
+ together. For Impala, this task primarily involves:
+ <ul class="ul">
+ <li class="li">
+ Deciding how many Impala queries can run concurrently and with how much memory, through the admission
+ control feature. See <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for details.
+ </li>
+
+ <li class="li">
+ Dividing cluster resources such as memory between Impala and other components, using YARN for overall
+ resource management, and Llama to mediate resource requests from Impala to YARN. See
+ <a class="xref" href="impala_resource_management.html#resource_management">Resource Management for Impala</a> for details.
+ </li>
+ </ul>
+ </div>
+
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_admission.html">Admission Control and Query Queuing</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_resource_management.html">Resource Management for Impala</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_timeouts.html">Setting Timeout Periods for Daemons, Queries, and Sessions</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_proxy.html">Using Impala through a Proxy for High Availability</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_disk_space.html">Managing Disk Space for Impala Data</a></strong><br></li></ul></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_admission.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_admission.html b/docs/build3x/html/topics/impala_admission.html
new file mode 100644
index 0000000..9eff7ea
--- /dev/null
+++ b/docs/build3x/html/topics/impala_admission.html
@@ -0,0 +1,822 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_admin.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="admission_control"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Admission Control and Query Queuing</title></head><body id="admission_control"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Admission Control and Query Queuing</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p" id="admission_control__admission_control_intro">
+ Admission control is an Impala feature that imposes limits on concurrent SQL queries, to avoid resource usage
+ spikes and out-of-memory conditions on busy clusters.
+ It is a form of <span class="q">"throttling"</span>.
+ New queries are accepted and executed until
+ certain conditions are met, such as too many queries or too much
+ total memory used across the cluster.
+ When one of these thresholds is reached,
+ incoming queries wait to begin execution. These queries are
+ queued and are admitted (that is, begin executing) when the resources become available.
+ </p>
+ <p class="p">
+ In addition to the threshold values for currently executing queries,
+ you can place limits on the maximum number of queries that are
+ queued (waiting) and a limit on the amount of time they might wait
+ before returning with an error. These queue settings let you ensure that queries do
+ not wait indefinitely, so that you can detect and correct <span class="q">"starvation"</span> scenarios.
+ </p>
+ <p class="p">
+ Enable this feature if your cluster is
+ underutilized at some times and overutilized at others. Overutilization is indicated by performance
+ bottlenecks and queries being cancelled due to out-of-memory conditions, when those same queries are
+ successful and perform well during times with less concurrent load. Admission control works as a safeguard to
+ avoid out-of-memory conditions during heavy concurrent usage.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The use of the Llama component for integrated resource management within YARN
+ is no longer supported with <span class="keyword">Impala 2.3</span> and higher.
+ The Llama support code is removed entirely in <span class="keyword">Impala 2.8</span> and higher.
+ </p>
+ <p class="p">
+ For clusters running Impala alongside
+ other data management components, you define static service pools to define the resources
+ available to Impala and other components. Then within the area allocated for Impala,
+ you can create dynamic service pools, each with its own settings for the Impala admission control feature.
+ </p>
+ </div>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_admin.html">Impala Administration</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="admission_control__admission_intro">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Overview of Impala Admission Control</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ On a busy cluster, you might find there is an optimal number of Impala queries that run concurrently.
+ For example, when the I/O capacity is fully utilized by I/O-intensive queries,
+ you might not find any throughput benefit in running more concurrent queries.
+ By allowing some queries to run at full speed while others wait, rather than having
+ all queries contend for resources and run slowly, admission control can result in higher overall throughput.
+ </p>
+
+ <p class="p">
+ For another example, consider a memory-bound workload such as many large joins or aggregation queries.
+ Each such query could briefly use many gigabytes of memory to process intermediate results.
+ Because Impala by default cancels queries that exceed the specified memory limit,
+ running multiple large-scale queries at once might require
+ re-running some queries that are cancelled. In this case, admission control improves the
+ reliability and stability of the overall workload by only allowing as many concurrent queries
+ as the overall memory of the cluster can accommodate.
+ </p>
+
+ <p class="p">
+ The admission control feature lets you set an upper limit on the number of concurrent Impala
+ queries and on the memory used by those queries. Any additional queries are queued until the earlier ones
+ finish, rather than being cancelled or running slowly and causing contention. As other queries finish, the
+ queued queries are allowed to proceed.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can specify these limits and thresholds for each
+ pool rather than globally. That way, you can balance the resource usage and throughput
+ between steady well-defined workloads, rare resource-intensive queries, and ad hoc
+ exploratory queries.
+ </p>
+
+ <p class="p">
+ For details on the internal workings of admission control, see
+ <a class="xref" href="impala_admission.html#admission_architecture">How Impala Schedules and Enforces Limits on Concurrent Queries</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="admission_control__admission_concurrency">
+ <h2 class="title topictitle2" id="ariaid-title3">Concurrent Queries and Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ One way to limit resource usage through admission control is to set an upper limit
+ on the number of concurrent queries. This is the initial technique you might use
+ when you do not have extensive information about memory usage for your workload.
+ This setting can be specified separately for each dynamic resource pool.
+ </p>
+ <p class="p">
+ You can combine this setting with the memory-based approach described in
+ <a class="xref" href="impala_admission.html#admission_memory">Memory Limits and Admission Control</a>. If either the maximum number of concurrent queries
+ or their expected memory usage is exceeded, subsequent queries
+ are queued until the concurrent workload falls below the threshold again.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="admission_control__admission_memory">
+ <h2 class="title topictitle2" id="ariaid-title4">Memory Limits and Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ Each dynamic resource pool can have an upper limit on the cluster-wide memory used by queries executing in that pool.
+ This is the technique to use once you have a stable workload with well-understood memory requirements.
+ </p>
+ <p class="p">
+ Always specify the <span class="ph uicontrol">Default Query Memory Limit</span> for the expected maximum amount of RAM
+ that a query might require on each host, which is equivalent to setting the <code class="ph codeph">MEM_LIMIT</code>
+ query option for every query run in that pool. That value affects the execution of each query, preventing it
+ from overallocating memory on each host, and potentially activating the spill-to-disk mechanism or cancelling
+ the query when necessary.
+ </p>
+ <p class="p">
+ Optionally, specify the <span class="ph uicontrol">Max Memory</span> setting, a cluster-wide limit that determines
+ how many queries can be safely run concurrently, based on the upper memory limit per host multiplied by the
+ number of Impala nodes in the cluster.
+ </p>
+ <div class="p">
+ For example, consider the following scenario:
+ <ul class="ul">
+ <li class="li"> The cluster is running <span class="keyword cmdname">impalad</span> daemons on five
+ DataNodes. </li>
+ <li class="li"> A dynamic resource pool has <span class="ph uicontrol">Max Memory</span> set
+ to 100 GB. </li>
+ <li class="li"> The <span class="ph uicontrol">Default Query Memory Limit</span> for the
+ pool is 10 GB. Therefore, any query running in this pool could use
+ up to 50 GB of memory (default query memory limit * number of Impala
+ nodes). </li>
+ <li class="li"> The maximum number of queries that Impala executes concurrently
+ within this dynamic resource pool is two, which is the most that
+ could be accommodated within the 100 GB <span class="ph uicontrol">Max
+ Memory</span> cluster-wide limit. </li>
+ <li class="li"> There is no memory penalty if queries use less memory than the
+ <span class="ph uicontrol">Default Query Memory Limit</span> per-host setting
+ or the <span class="ph uicontrol">Max Memory</span> cluster-wide limit. These
+ values are only used to estimate how many queries can be run
+ concurrently within the resource constraints for the pool. </li>
+ </ul>
+ </div>
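<p class="p">
The arithmetic behind the scenario above can be sketched as follows. The function name is illustrative, not an Impala API; the sketch assumes each admitted query reserves its full per-host memory limit on every node.
</p>

```python
# Hypothetical helper (not an Impala API): estimate how many queries a pool
# can admit concurrently under its Max Memory setting.

def max_concurrent_queries(max_memory_gb, per_query_mem_limit_gb, num_nodes):
    # Each admitted query reserves its per-host limit on every node, so its
    # cluster-wide cost is per_query_mem_limit_gb * num_nodes.
    per_query_cluster_cost_gb = per_query_mem_limit_gb * num_nodes
    return max_memory_gb // per_query_cluster_cost_gb

# The scenario above: 5 nodes, Max Memory = 100 GB, per-query limit = 10 GB.
print(max_concurrent_queries(100, 10, 5))  # 2
```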
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span> If you specify <span class="ph uicontrol">Max
+ Memory</span> for an Impala dynamic resource pool, you must also
+ specify the <span class="ph uicontrol">Default Query Memory Limit</span>.
+ <span class="ph uicontrol">Max Memory</span> relies on the <span class="ph uicontrol">Default
+ Query Memory Limit</span> to produce a reliable estimate of
+ overall memory consumption for a query. </div>
+ <p class="p">
+ You can combine the memory-based settings with the upper limit on concurrent queries described in
+ <a class="xref" href="impala_admission.html#admission_concurrency">Concurrent Queries and Admission Control</a>. If either the maximum number of concurrent queries
+ or their expected memory usage is exceeded, subsequent queries
+ are queued until the concurrent workload falls below the threshold again.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="admission_control__admission_yarn">
+
+ <h2 class="title topictitle2" id="ariaid-title5">How Impala Admission Control Relates to Other Resource Management Tools</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The admission control feature is similar in some ways to the YARN resource management framework. These features
+ can be used separately or together. This section describes some similarities and differences, to help you
+ decide which combination of resource management features to use for Impala.
+ </p>
+
+ <p class="p">
+ Admission control is a lightweight, decentralized system that is suitable for workloads consisting
+ primarily of Impala queries and other SQL statements. It sets <span class="q">"soft"</span> limits that smooth out Impala
+ memory usage during times of heavy load, rather than taking an all-or-nothing approach that cancels jobs
+ that are too resource-intensive.
+ </p>
+
+ <p class="p">
+ Because the admission control system does not interact with other Hadoop workloads such as MapReduce jobs, you
+ might use YARN with static service pools on clusters where resources are shared between
+ Impala and other Hadoop components. This configuration is recommended when using Impala in a
+ <dfn class="term">multitenant</dfn> cluster. Devote a percentage of cluster resources to Impala, and allocate another
+ percentage for MapReduce and other batch-style workloads. Let admission control handle the concurrency and
+ memory usage for the Impala work within the cluster, and let YARN manage the work for other components within the
+ cluster. In this scenario, Impala's resources are not managed by YARN.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature uses the same configuration mechanism as the YARN resource manager to map users to
+ pools and authenticate them.
+ </p>
+
+ <p class="p">
+ Although the Impala admission control feature uses a <code class="ph codeph">fair-scheduler.xml</code> configuration file
+ behind the scenes, this file does not depend on which scheduler is used for YARN. You still use this file
+ even when YARN is using the capacity scheduler.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="admission_control__admission_architecture">
+
+ <h2 class="title topictitle2" id="ariaid-title6">How Impala Schedules and Enforces Limits on Concurrent Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The admission control system is decentralized, embedded in each Impala daemon and communicating through the
+ statestore mechanism. Although the limits you set for memory usage and number of concurrent queries apply
+ cluster-wide, each Impala daemon makes its own decisions about whether to allow each query to run
+ immediately or to queue it for a less-busy time. These decisions are fast, meaning the admission control
+ mechanism is low-overhead, but might be imprecise during times of heavy load across many coordinators. There could be times when
+ more queries are queued (in aggregate across the cluster) than the specified limit, or when the number of admitted queries
+ exceeds the expected number. Thus, you typically err on the
+ high side for the size of the queue, because there is not a big penalty for having a large number of queued
+ queries; and you typically err on the low side for configuring memory resources, to leave some headroom in case more
+ queries are admitted than expected, without running out of memory and being cancelled as a result.
+ </p>
+
+
+
+ <p class="p">
+ To avoid a large backlog of queued requests, you can set an upper limit on the size of the queue for
+ queries that are queued. When the number of queued queries exceeds this limit, further queries are
+ cancelled rather than being queued. You can also configure a timeout period per pool, after which queued queries are
+ cancelled, to avoid indefinite waits. If a cluster reaches this state where queries are cancelled due to
+ too many concurrent requests or long waits for query execution to begin, that is a signal for an
+ administrator to take action, either by provisioning more resources, scheduling work on the cluster to
+ smooth out the load, or by doing <a class="xref" href="impala_performance.html#performance">Impala performance
+ tuning</a> to enable higher throughput.
+ </p>
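<p class="p">
The admit/queue/reject decision described above can be illustrated with a simplified model. This is a sketch, not Impala's implementation; real coordinators work from an eventually consistent view of cluster state propagated through the statestore, which is why the limits behave as soft limits.
</p>

```python
# Simplified model (not Impala code) of a per-pool admission decision:
# admit while concurrency and memory thresholds allow, queue up to a limit,
# and reject once the queue is full.
from collections import deque

class AdmissionController:
    def __init__(self, max_requests, max_queued, pool_mem_limit):
        self.max_requests = max_requests      # cf. default_pool_max_requests
        self.max_queued = max_queued          # cf. default_pool_max_queued
        self.pool_mem_limit = pool_mem_limit  # cf. default_pool_mem_limit
        self.running = 0
        self.mem_reserved = 0
        self.queue = deque()

    def submit(self, query_id, mem_estimate):
        # Admit immediately if both the concurrency and memory thresholds allow.
        if (self.running < self.max_requests
                and self.mem_reserved + mem_estimate <= self.pool_mem_limit):
            self.running += 1
            self.mem_reserved += mem_estimate
            return "ADMITTED"
        # Otherwise queue the query, unless the queue itself is full.
        if len(self.queue) >= self.max_queued:
            return "REJECTED"
        self.queue.append((query_id, mem_estimate))
        return "QUEUED"

ac = AdmissionController(max_requests=2, max_queued=1, pool_mem_limit=100)
print([ac.submit(q, 50) for q in ("q1", "q2", "q3", "q4")])
# ['ADMITTED', 'ADMITTED', 'QUEUED', 'REJECTED']
```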
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="admission_control__admission_jdbc_odbc">
+
+ <h2 class="title topictitle2" id="ariaid-title7">How Admission Control Works with Impala Clients (JDBC, ODBC, HiveServer2)</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Most aspects of admission control work transparently with client interfaces such as JDBC and ODBC:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If a SQL statement is put into a queue rather than running immediately, the API call blocks until the
+ statement is dequeued and begins execution. At that point, the client program can request to fetch
+ results, which might also block until results become available.
+ </li>
+
+ <li class="li">
+ If a SQL statement is cancelled because it has been queued for too long or because it exceeded the memory
+ limit during execution, the error is returned to the client program with a descriptive error message.
+ </li>
+
+ </ul>
+
+ <p class="p">
+ In Impala 2.0 and higher, you can submit
+ a SQL <code class="ph codeph">SET</code> statement from the client application
+ to change the <code class="ph codeph">REQUEST_POOL</code> query option.
+ This option lets you submit queries to different resource pools,
+ as described in <a class="xref" href="impala_request_pool.html#request_pool">REQUEST_POOL Query Option</a>.
+
+ </p>
+
+ <p class="p">
+ At any time, the set of queued queries could include queries submitted through multiple different Impala
+ daemon hosts. All the queries submitted through a particular host will be executed in order, so a
+ <code class="ph codeph">CREATE TABLE</code> followed by an <code class="ph codeph">INSERT</code> on the same table would succeed.
+ Queries submitted through different hosts are not guaranteed to be executed in the order they were
+ received. Therefore, if you are using load-balancing or other round-robin scheduling where different
+ statements are submitted through different hosts, set up all table structures ahead of time so that the
+ statements controlled by the queuing system are primarily queries, where order is not significant. Or, if a
+ sequence of statements needs to happen in strict order (such as an <code class="ph codeph">INSERT</code> followed by a
+ <code class="ph codeph">SELECT</code>), submit all those statements through a single session, while connected to the same
+ Impala daemon host.
+ </p>
+
+ <p class="p">
+ Admission control has the following limitations or special behavior when used with JDBC or ODBC
+ applications:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The other resource-related query options,
+ <code class="ph codeph">RESERVATION_REQUEST_TIMEOUT</code> and <code class="ph codeph">V_CPU_CORES</code>, are no longer used. Those query options only
+ applied to using Impala with Llama, which is no longer supported.
+ </li>
+ </ul>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="admission_control__admission_schema_config">
+ <h2 class="title topictitle2" id="ariaid-title8">SQL and Schema Considerations for Admission Control</h2>
+ <div class="body conbody">
+ <p class="p">
+ When queries complete quickly and are tuned for optimal memory usage, there is less chance of
+ performance or capacity problems during times of heavy load. Before setting up admission control,
+ tune your Impala queries to ensure that the query plans are efficient and the memory estimates
+ are accurate. Understanding the nature of your workload, and which queries are the most
+ resource-intensive, helps you to plan how to divide the queries into different pools and
+ decide what limits to define for each pool.
+ </p>
+ <p class="p">
+ For large tables, especially those involved in join queries, keep their statistics up to date
+ after loading substantial amounts of new data or adding new partitions.
+ Use the <code class="ph codeph">COMPUTE STATS</code> statement for unpartitioned tables, and
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> for partitioned tables.
+ </p>
+ <p class="p">
+ When you use dynamic resource pools with a <span class="ph uicontrol">Max Memory</span> setting enabled,
+ you typically override the memory estimates that Impala makes based on the statistics from the
+ <code class="ph codeph">COMPUTE STATS</code> statement.
+ You can set the <code class="ph codeph">MEM_LIMIT</code> query option within a particular session to
+ set an upper memory limit for queries within that session, set a default <code class="ph codeph">MEM_LIMIT</code>
+ for all queries processed by the <span class="keyword cmdname">impalad</span> instance, or
+ set a default <code class="ph codeph">MEM_LIMIT</code> for all queries assigned to a particular
+ dynamic resource pool. By designating a consistent memory limit for a set of similar queries
+ that use the same resource pool, you avoid unnecessary query queuing or out-of-memory conditions
+ that can arise during high-concurrency workloads when memory estimates for some queries are inaccurate.
+ </p>
+ <p class="p">
+ Follow other steps from <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> to tune your queries.
+ </p>
+ </div>
+ </article>
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="admission_control__admission_config">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Configuring Admission Control</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration options for admission control range from the simple (a single resource pool with a single
+ set of options) to the complex (multiple resource pools with different options, each pool handling queries
+ for a different set of users and groups).
+ </p>
+
+ <section class="section" id="admission_config__admission_flags"><h3 class="title sectiontitle">Impala Service Flags for Admission Control (Advanced)</h3>
+
+
+
+ <p class="p">
+ The following Impala configuration options let you adjust the settings of the admission control feature. When supplying the
+ options on the <span class="keyword cmdname">impalad</span> command line, prepend the option name with <code class="ph codeph">--</code>.
+ </p>
+
+ <dl class="dl" id="admission_config__admission_control_option_list">
+
+ <dt class="dt dlterm" id="admission_config__queue_wait_timeout_ms">
+ <code class="ph codeph">queue_wait_timeout_ms</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum amount of time (in milliseconds) that a
+ request waits to be admitted before timing out.
+ <p class="p">
+ <strong class="ph b">Type:</strong> <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">60000</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_max_requests">
+ <code class="ph codeph">default_pool_max_requests</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum number of concurrent outstanding requests
+ allowed to run before incoming requests are queued. Because this
+ limit applies cluster-wide, but each Impala node makes independent
+ decisions to run queries immediately or queue them, it is a soft
+ limit; the overall number of concurrent queries might be slightly
+ higher during times of heavy load. A negative value indicates no
+ limit. Ignored if <code class="ph codeph">fair_scheduler_config_path</code> and
+ <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+ <strong class="ph b">Type:</strong>
+ <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <span class="ph">-1, meaning unlimited (prior to <span class="keyword">Impala 2.5</span> the default was 200)</span>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_max_queued">
+ <code class="ph codeph">default_pool_max_queued</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum number of requests allowed to be queued
+ before rejecting requests. Because this limit applies
+ cluster-wide, but each Impala node makes independent decisions to
+ run queries immediately or queue them, it is a soft limit; the
+ overall number of queued queries might be slightly higher during
+ times of heavy load. A negative value or 0 indicates requests are
+ always rejected once the maximum concurrent requests are
+ executing. Ignored if <code class="ph codeph">fair_scheduler_config_path</code>
+ and <code class="ph codeph">llama_site_path</code> are set. <p class="p">
+ <strong class="ph b">Type:</strong>
+ <code class="ph codeph">int64</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <span class="ph">unlimited</span>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__default_pool_mem_limit">
+ <code class="ph codeph">default_pool_mem_limit</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Maximum amount of memory (across the entire
+ cluster) that all outstanding requests in this pool can use before
+ new requests to this pool are queued. Specified in bytes,
+ megabytes, or gigabytes by a number followed by the suffix
+ <code class="ph codeph">b</code> (optional), <code class="ph codeph">m</code>, or
+ <code class="ph codeph">g</code>, either uppercase or lowercase. You can
+ specify floating-point values for megabytes and gigabytes, to
+ represent fractional numbers such as <code class="ph codeph">1.5</code>. You can
+ also specify it as a percentage of the physical memory by
+ specifying the suffix <code class="ph codeph">%</code>. 0 or no setting
+ indicates no limit. Defaults to bytes if no unit is given. Because
+ this limit applies cluster-wide, but each Impala node makes
+ independent decisions to run queries immediately or queue them, it
+ is a soft limit; the overall memory used by concurrent queries
+ might be slightly higher during times of heavy load. Ignored if
+ <code class="ph codeph">fair_scheduler_config_path</code> and
+ <code class="ph codeph">llama_site_path</code> are set. <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Impala relies on the statistics produced by the <code class="ph codeph">COMPUTE STATS</code> statement to estimate memory
+ usage for each query. See <a class="xref" href="../shared/../topics/impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a> for guidelines
+ about how and when to use this statement.
+ </div>
+ <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">""</code> (empty string, meaning unlimited) </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__disable_pool_max_requests">
+ <code class="ph codeph">disable_pool_max_requests</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Disables all per-pool limits on the maximum number
+ of running requests. <p class="p">
+ <strong class="ph b">Type:</strong> Boolean </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">false</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__disable_pool_mem_limits">
+ <code class="ph codeph">disable_pool_mem_limits</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Disables all per-pool mem limits. <p class="p">
+ <strong class="ph b">Type:</strong> Boolean </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">false</code>
+ </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__fair_scheduler_allocation_path">
+ <code class="ph codeph">fair_scheduler_allocation_path</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Path to the fair scheduler allocation file
+ (<code class="ph codeph">fair-scheduler.xml</code>). <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ <code class="ph codeph">""</code> (empty string) </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Admission control only uses a small subset
+ of the settings that can go in this file, as described below.
+ For details about all the Fair Scheduler configuration settings,
+ see the <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache wiki</a>. </p>
+ </dd>
+
+
+ <dt class="dt dlterm" id="admission_config__llama_site_path">
+ <code class="ph codeph">llama_site_path</code>
+ </dt>
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Path to the configuration file used by admission control
+ (<code class="ph codeph">llama-site.xml</code>). If set,
+ <code class="ph codeph">fair_scheduler_allocation_path</code> must also be set.
+ <p class="p">
+ <strong class="ph b">Type:</strong> string
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">""</code> (empty string) </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Admission control only uses a few
+ of the settings that can go in this file, as described below.
+ </p>
+ </dd>
+
+ </dl>
+ </section>
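<p class="p">
The memory-specification format accepted by <code class="ph codeph">default_pool_mem_limit</code> can be sketched as a small parser. This is only an illustration of the format described above, not Impala's actual parsing code.
</p>

```python
# Illustrative parser for memory-limit strings: optional case-insensitive
# b/m/g (or mb/gb) suffix, floats allowed for megabytes and gigabytes,
# '%' for a percentage of physical memory, and bytes if no unit is given.
def parse_mem_spec(spec, physical_mem_bytes=None):
    spec = spec.strip().lower()
    if spec.endswith("%"):
        return int(float(spec[:-1]) / 100 * physical_mem_bytes)
    # Check two-letter suffixes before single-letter ones.
    for suffix, mult in (("mb", 1024 ** 2), ("gb", 1024 ** 3),
                         ("m", 1024 ** 2), ("g", 1024 ** 3), ("b", 1)):
        if spec.endswith(suffix):
            return int(float(spec[: -len(suffix)]) * mult)
    return int(spec)  # no unit: bytes

print(parse_mem_spec("1.5g"))    # 1610612736
print(parse_mem_spec("12345"))   # 12345
```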
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="admission_config__admission_config_manual">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Configuring Admission Control Using the Command Line</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ To configure admission control, use a combination of startup options for the Impala daemon and edit
+ or create the configuration files <span class="ph filepath">fair-scheduler.xml</span> and
+ <span class="ph filepath">llama-site.xml</span>.
+ </p>
+
+ <p class="p">
+ For a straightforward configuration using a single resource pool named <code class="ph codeph">default</code>, you can
+ specify configuration options on the command line and skip the <span class="ph filepath">fair-scheduler.xml</span>
+ and <span class="ph filepath">llama-site.xml</span> configuration files.
+ </p>
+
+ <p class="p">
+ For an advanced configuration with multiple resource pools using different settings, set up the
+ <span class="ph filepath">fair-scheduler.xml</span> and <span class="ph filepath">llama-site.xml</span> configuration files
+ manually. Provide the paths to each one using the <span class="keyword cmdname">impalad</span> command-line options,
+ <code class="ph codeph">--fair_scheduler_allocation_path</code> and <code class="ph codeph">--llama_site_path</code> respectively.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature only uses the Fair Scheduler configuration settings to determine how
+ to map users and groups to different resource pools. For example, you might set up different resource
+ pools with separate memory limits, and maximum number of concurrent and queued queries, for different
+ categories of users within your organization. For details about all the Fair Scheduler configuration
+ settings, see the
+ <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Apache
+ wiki</a>.
+ </p>
+
+ <p class="p">
+ The Impala admission control feature only uses a small subset of possible settings from the
+ <span class="ph filepath">llama-site.xml</span> configuration file:
+ </p>
+
+<pre class="pre codeblock"><code>llama.am.throttling.maximum.placed.reservations.<var class="keyword varname">queue_name</var>
+llama.am.throttling.maximum.queued.reservations.<var class="keyword varname">queue_name</var>
+<span class="ph">impala.admission-control.pool-default-query-options.<var class="keyword varname">queue_name</var>
+impala.admission-control.pool-queue-timeout-ms.<var class="keyword varname">queue_name</var></span>
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">impala.admission-control.pool-queue-timeout-ms</code>
+ setting specifies the timeout value for this pool, in milliseconds.
+ The <code class="ph codeph">impala.admission-control.pool-default-query-options</code>
+ setting designates the default query options for all queries that run
+ in this pool. Its argument value is a comma-delimited string of
+ 'key=value' pairs, for example, <code class="ph codeph">'key1=val1,key2=val2'</code>.
+ This is where you might set a default memory limit
+ for all queries in the pool, using an argument such as <code class="ph codeph">MEM_LIMIT=5G</code>.
+ </p>
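<p class="p">
The comma-delimited 'key=value' format can be illustrated with a short parser. This sketch assumes only the format described above; it is not Impala's own option-parsing code.
</p>

```python
# Illustrative parser for a pool-default-query-options value such as
# 'MEM_LIMIT=5G,QUERY_TIMEOUT_S=30'.
def parse_pool_options(value):
    options = {}
    for pair in value.split(","):
        key, _, val = pair.partition("=")
        # Query option names are case-insensitive, so normalize the key.
        options[key.strip().upper()] = val.strip()
    return options

print(parse_pool_options("mem_limit=128m,query_timeout_s=20"))
# {'MEM_LIMIT': '128m', 'QUERY_TIMEOUT_S': '20'}
```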
+
+ <p class="p">
+ The <code class="ph codeph">impala.admission-control.*</code> configuration settings are available in
+ <span class="keyword">Impala 2.5</span> and higher.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="admission_config__admission_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Example of Admission Control Configuration</h3>
+
+ <div class="body conbody">
+
+ <p class="p"> Here are sample <span class="ph filepath">fair-scheduler.xml</span> and
+ <span class="ph filepath">llama-site.xml</span> files that define resource pools
+ <code class="ph codeph">root.default</code>, <code class="ph codeph">root.development</code>, and
+ <code class="ph codeph">root.production</code>. These sample files are stripped down: in a real
+ deployment they might contain other settings for use with various aspects of the YARN
+ component. The settings shown here are the significant ones for the Impala admission
+ control feature. </p>
+
+ <p class="p">
+ <strong class="ph b">fair-scheduler.xml:</strong>
+ </p>
+
+ <p class="p">
+ Although Impala does not use the <code class="ph codeph">vcores</code> value, you must still specify it to satisfy
+ YARN requirements for the file contents.
+ </p>
+
+ <p class="p">
+ Each <code class="ph codeph"><aclSubmitApps></code> tag (other than the one for <code class="ph codeph">root</code>) contains
+ a comma-separated list of users, then a space, then a comma-separated list of groups; these are the
+ users and groups allowed to submit Impala statements to the corresponding resource pool.
+ </p>
+
+ <p class="p">
+ If you leave the <code class="ph codeph"><aclSubmitApps></code> element empty for a pool, nobody can submit
+ directly to that pool; child pools can specify their own <code class="ph codeph"><aclSubmitApps></code> values
+ to authorize users and groups to submit to those pools.
+ </p>
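<p class="p">
The <code class="ph codeph"><aclSubmitApps></code> value format (a comma-separated user list, a space, then a comma-separated group list) can be illustrated with a short parser. This is a sketch based on the description above, not YARN or Impala code.
</p>

```python
# Illustrative parser for an <aclSubmitApps> value, e.g.
# "user1,user2 dev,ops,admin": users before the first space, groups after it.
def parse_acl(acl):
    users_part, _, groups_part = acl.partition(" ")
    users = [u for u in users_part.split(",") if u]
    groups = [g for g in groups_part.split(",") if g]
    return users, groups

print(parse_acl("user1,user2 dev,ops,admin"))
# (['user1', 'user2'], ['dev', 'ops', 'admin'])
print(parse_acl(" ops,admin"))  # leading space: no users, only groups
# ([], ['ops', 'admin'])
```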
+
+ <pre class="pre codeblock"><code><allocations>
+
+ <queue name="root">
+ <aclSubmitApps> </aclSubmitApps>
+ <queue name="default">
+ <maxResources>50000 mb, 0 vcores</maxResources>
+ <aclSubmitApps>*</aclSubmitApps>
+ </queue>
+ <queue name="development">
+ <maxResources>200000 mb, 0 vcores</maxResources>
+ <aclSubmitApps>user1,user2 dev,ops,admin</aclSubmitApps>
+ </queue>
+ <queue name="production">
+ <maxResources>1000000 mb, 0 vcores</maxResources>
+ <aclSubmitApps> ops,admin</aclSubmitApps>
+ </queue>
+ </queue>
+ <queuePlacementPolicy>
+ <rule name="specified" create="false"/>
+ <rule name="default" />
+ </queuePlacementPolicy>
+</allocations>
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">llama-site.xml:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>
+<?xml version="1.0" encoding="UTF-8"?>
+<configuration>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.default</name>
+ <value>10</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.default</name>
+ <value>50</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.default</name>
+ <value>mem_limit=128m,query_timeout_s=20,max_io_buffers=10</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.default</name>
+ <value>30000</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.development</name>
+ <value>50</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.development</name>
+ <value>100</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.development</name>
+ <value>mem_limit=256m,query_timeout_s=30,max_io_buffers=10</value>
+ </property>
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.development</name>
+ <value>15000</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.placed.reservations.root.production</name>
+ <value>100</value>
+ </property>
+ <property>
+ <name>llama.am.throttling.maximum.queued.reservations.root.production</name>
+ <value>200</value>
+ </property>
+<!--
+ Default query options for the 'root.production' pool.
+ THIS IS A NEW PARAMETER in Impala 2.5.
+ Note that the MEM_LIMIT query option still shows up in here even though it is a
+ separate box in the UI. We do that because it is the most important query option
+ that people will need (everything else is somewhat advanced).
+
+ MEM_LIMIT takes a per-node memory limit which is specified using one of the following:
+ - '<int>[bB]?' -> bytes (default if no unit given)
+ - '<float>[mM(bB)]' -> megabytes
+ - '<float>[gG(bB)]' -> gigabytes
+ E.g. 'MEM_LIMIT=12345' (no unit) means 12345 bytes, and you can append m or g
+ to specify megabytes or gigabytes, though that is not required.
+-->
+ <property>
+ <name>impala.admission-control.pool-default-query-options.root.production</name>
+ <value>mem_limit=386m,query_timeout_s=30,max_io_buffers=10</value>
+ </property>
+<!--
+ Default queue timeout (ms) for the pool 'root.production'.
+ If this isn’t set, the process-wide flag is used.
+ THIS IS A NEW PARAMETER in Impala 2.5.
+-->
+ <property>
+ <name>impala.admission-control.pool-queue-timeout-ms.root.production</name>
+ <value>30000</value>
+ </property>
+</configuration>
+
+</code></pre>
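The MEM_LIMIT unit rules spelled out in the comment above (a bare integer is bytes, an m/mb suffix means megabytes, g/gb means gigabytes) can be sketched as a small parser. This is illustrative only, under the assumption that the comment's grammar is complete; it is not Impala's actual parsing code:

```python
import re

def parse_mem_limit(value):
    """Convert a MEM_LIMIT string to bytes, following the unit rules in
    the llama-site.xml comment: bare number = bytes, m/mb = megabytes,
    g/gb = gigabytes (a trailing 'b' alone also means bytes)."""
    scale = {"": 1, "b": 1, "m": 2**20, "mb": 2**20, "g": 2**30, "gb": 2**30}
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*([a-zA-Z]*)", value.strip())
    if not m or m.group(2).lower() not in scale:
        raise ValueError("bad MEM_LIMIT value: %r" % value)
    return int(float(m.group(1)) * scale[m.group(2).lower()])

print(parse_mem_limit("12345"))  # bare number: bytes
print(parse_mem_limit("128m"))   # the root.default pool limit above
print(parse_mem_limit("1g"))
```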
+
+ </div>
+ </article>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="admission_config__admission_guidelines">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Guidelines for Using Admission Control</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To see how admission control works for particular queries, examine the profile output for the query. This
+ information is available through the <code class="ph codeph">PROFILE</code> statement in <span class="keyword cmdname">impala-shell</span>
+ immediately after running a query in the shell, on the <span class="ph uicontrol">queries</span> page of the Impala
+ debug web UI, or in the Impala log file (basic information at log level 1, more detailed information at log
+ level 2). The profile output contains details about the admission decision, such as whether the query was
+ queued or not and which resource pool it was assigned to. It also includes the estimated and actual memory
+ usage for the query, so you can fine-tune the configuration for the memory limits of the resource pools.
+ </p>
+
+ <p class="p">
+ Remember that the limits imposed by admission control are <span class="q">"soft"</span> limits.
+ The decentralized nature of this mechanism means that each Impala node makes its own decisions about whether
+ to allow queries to run immediately or to queue them. These decisions rely on information passed back and forth
+ between nodes by the statestore service. If a sudden surge in requests causes more queries than anticipated to run
+ concurrently, then throughput could decrease due to queries spilling to disk or contending for resources;
+ or queries could be cancelled if they exceed the <code class="ph codeph">MEM_LIMIT</code> setting while running.
+ </p>
+
+
+
+ <p class="p">
+ In <span class="keyword cmdname">impala-shell</span>, you can also specify which resource pool to direct queries to by
+ setting the <code class="ph codeph">REQUEST_POOL</code> query option.
+ </p>
+
+ <p class="p">
+ The statements affected by the admission control feature are primarily queries, but also include statements
+ that write data such as <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>. Most write
+ operations in Impala are not resource-intensive, but inserting into a Parquet table can require substantial
+ memory due to buffering intermediate data before writing out each Parquet data block. See
+ <a class="xref" href="impala_parquet.html#parquet_etl">Loading Data into Parquet Tables</a> for instructions about inserting data efficiently into
+ Parquet tables.
+ </p>
+
+ <p class="p">
+ Although admission control does not scrutinize memory usage for other kinds of DDL statements, if a query
+ is queued due to a limit on concurrent queries or memory usage, subsequent statements in the same session
+ are also queued so that they are processed in the correct order:
+ </p>
+
+<pre class="pre codeblock"><code>-- This query could be queued to avoid out-of-memory at times of heavy load.
+select * from huge_table join enormous_table using (id);
+-- If so, this subsequent statement in the same session is also queued
+-- until the previous statement completes.
+drop table huge_table;
+</code></pre>
+
+ <p class="p">
+ If you set up different resource pools for different users and groups, consider reusing any classifications
+ you developed for use with Sentry security. See <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a> for details.
+ </p>
+
+ <p class="p">
+ For details about all the Fair Scheduler configuration settings, see
+ <a class="xref" href="http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html#Configuration" target="_blank">Fair Scheduler Configuration</a>, in particular the tags such as <code class="ph codeph"><queue></code> and
+ <code class="ph codeph"><aclSubmitApps></code> to map users and groups to particular resource pools (queues).
+ </p>
+
+
+ </div>
+ </article>
+</article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_s3.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_s3.html b/docs/build3x/html/topics/impala_s3.html
new file mode 100644
index 0000000..33aa361
--- /dev/null
+++ b/docs/build3x/html/topics/impala_s3.html
@@ -0,0 +1,775 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Using Impala with the Amazon S3 Filesystem</title></head><body id="s3"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Using Impala with the Amazon S3 Filesystem</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala supports both queries (<code class="ph codeph">SELECT</code>)
+ and DML (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>, <code class="ph codeph">CREATE TABLE AS SELECT</code>)
+ for data residing on Amazon S3. With the inclusion of write support,
+
+ the Impala support for S3 is now considered ready for production use.
+ </p>
+ </div>
+
+ <p class="p">
+
+
+
+ You can use Impala to query data residing on the Amazon S3 filesystem. This capability allows convenient
+ access to a storage system that is remotely managed, accessible from anywhere, and integrated with various
+ cloud-based services. Impala can query files in any supported file format from S3. The S3 storage location
+ can be for an entire table, or individual partitions in a partitioned table.
+ </p>
+
+ <p class="p">
+ The default Impala tables use data files stored on HDFS, which are ideal for bulk loads and queries using
+ full-table scans. In contrast, queries against S3 data are less performant, making S3 suitable for holding
+ <span class="q">"cold"</span> data that is only queried occasionally, while more frequently accessed <span class="q">"hot"</span> data resides in
+ HDFS. In a partitioned table, you can set the <code class="ph codeph">LOCATION</code> attribute for individual partitions
+ to put some partitions on HDFS and others on S3, typically depending on the age of the data.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="s3__s3_sql">
+ <h2 class="title topictitle2" id="ariaid-title2">How Impala SQL Statements Work with S3</h2>
+ <div class="body conbody">
+ <p class="p">
+ Impala SQL statements work with data on S3 as follows:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>
+ or <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> statements
+ can specify that a table resides on the S3 filesystem by
+ encoding an <code class="ph codeph">s3a://</code> prefix for the <code class="ph codeph">LOCATION</code>
+ property. <code class="ph codeph">ALTER TABLE</code> can also set the <code class="ph codeph">LOCATION</code>
+ property for an individual partition, so that some data in a table resides on
+ S3 and other data in the same table resides on HDFS.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Once a table or partition is designated as residing on S3, the <a class="xref" href="impala_select.html#select">SELECT Statement</a>
+ statement transparently accesses the data files from the appropriate storage layer.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If the S3 table is an internal table, the <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a> statement
+ removes the corresponding data files from S3 when the table is dropped.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_truncate_table.html#truncate_table">TRUNCATE TABLE Statement (Impala 2.3 or higher only)</a> statement always removes the corresponding
+ data files from S3 when the table is truncated.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_load_data.html#load_data">LOAD DATA Statement</a> can move data files residing in HDFS into
+ an S3 table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> statement, or the <code class="ph codeph">CREATE TABLE AS SELECT</code>
+ form of the <code class="ph codeph">CREATE TABLE</code> statement, can copy data from an HDFS table or another S3
+ table into an S3 table. The <a class="xref" href="impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a>
+ query option chooses whether or not to use a fast code path for these write operations to S3,
+ with the tradeoff of potential inconsistency in the case of a failure during the statement.
+ </p>
+ </li>
+ </ul>
+ <p class="p">
+ For usage information about Impala SQL statements with S3 tables, see <a class="xref" href="impala_s3.html#s3_ddl">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</a>
+ and <a class="xref" href="impala_s3.html#s3_dml">Using Impala DML Statements for S3 Data</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="s3__s3_creds">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Specifying Impala Credentials to Access Data in S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+
+ To allow Impala to access data in S3, specify values for the following configuration settings in your
+ <span class="ph filepath">core-site.xml</span> file:
+ </p>
+
+
+<pre class="pre codeblock"><code>
+<property>
+<name>fs.s3a.access.key</name>
+<value><var class="keyword varname">your_access_key</var></value>
+</property>
+<property>
+<name>fs.s3a.secret.key</name>
+<value><var class="keyword varname">your_secret_key</var></value>
+</property>
+</code></pre>
+
+ <p class="p">
+        After specifying the credentials, restart both the Impala and
+        Hive services. (Restarting Hive is required because Impala statements such as
+        <code class="ph codeph">CREATE TABLE</code> store and retrieve table metadata
+        through the Hive metastore.)
+      </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+
+ <p class="p">
+ Although you can specify the access key ID and secret key as part of the <code class="ph codeph">s3a://</code> URL in the
+ <code class="ph codeph">LOCATION</code> attribute, doing so makes this sensitive information visible in many places, such
+ as <code class="ph codeph">DESCRIBE FORMATTED</code> output and Impala log files. Therefore, specify this information
+ centrally in the <span class="ph filepath">core-site.xml</span> file, and restrict read access to that file to only
+ trusted users.
+ </p>
+
+
+
+ </div>
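Because the keys belong in the core-site.xml file rather than in SQL, generating the two property entries programmatically can avoid copy/paste mistakes and unescaped characters. A minimal sketch (a hypothetical helper, not a tool shipped with Impala):

```python
from xml.sax.saxutils import escape

def s3a_credential_properties(access_key, secret_key):
    """Render the fs.s3a.* <property> entries for core-site.xml,
    XML-escaping the values (secret keys may contain '+' or '/')."""
    entries = [("fs.s3a.access.key", access_key),
               ("fs.s3a.secret.key", secret_key)]
    parts = []
    for name, value in entries:
        parts.append(
            "<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>"
            % (escape(name), escape(value)))
    return "\n".join(parts)

print(s3a_credential_properties("AKIAEXAMPLE", "secret/Example+Key"))
```

Remember that the generated file should be readable only by trusted users, for the reasons given in the note above.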
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="s3__s3_etl">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Loading Data into S3 for Impala Queries</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ If your ETL pipeline involves moving data into S3 and then querying through Impala,
+ you can either use Impala DML statements to create, move, or copy the data, or
+ use the same data loading techniques as you would for non-Impala data.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="s3_etl__s3_dml">
+ <h3 class="title topictitle3" id="ariaid-title5">Using Impala DML Statements for S3 Data</h3>
+ <div class="body conbody">
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, the Impala DML statements (<code class="ph codeph">INSERT</code>, <code class="ph codeph">LOAD DATA</code>,
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code>) can write data into a table or partition that resides in the
+ Amazon Simple Storage Service (S3).
+ The syntax of the DML statements is the same as for any other tables, because the S3 location for tables and
+ partitions is specified by an <code class="ph codeph">s3a://</code> prefix in the
+ <code class="ph codeph">LOCATION</code> attribute of
+ <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statements.
+ If you bring data into S3 using the normal S3 transfer mechanisms instead of Impala DML statements,
+ issue a <code class="ph codeph">REFRESH</code> statement for the table before using Impala to query the S3 data.
+ </p>
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="s3_etl__s3_manual_etl">
+ <h3 class="title topictitle3" id="ariaid-title6">Manually Loading Data into Impala Tables on S3</h3>
+ <div class="body conbody">
+ <p class="p">
+ As an alternative, or on earlier Impala releases without DML support for S3,
+ you can use the Amazon-provided methods to bring data files into S3 for querying through Impala. See
+ <a class="xref" href="http://aws.amazon.com/s3/" target="_blank">the Amazon S3 web site</a> for
+ details.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <div class="p">
+ For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+ and higher:
+ <ul class="ul">
+ <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+ <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+ </ul>
+ By default, when you drop an internal (managed) table, the data files are
+ moved to the HDFS trashcan. This operation is expensive for tables that
+ reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+ <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+ The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+ skipping the HDFS trashcan.
+ For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+ data files on S3 using one of the tools from the Hadoop ecosystem, such as
+ <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+ </div>
+ </div>
+
+ <p class="p">
+ Alternative file creation techniques (less compatible with the <code class="ph codeph">PURGE</code> clause) include:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <a class="xref" href="https://console.aws.amazon.com/s3/home" target="_blank">Amazon AWS / S3
+ web interface</a> to upload from a web browser.
+ </li>
+
+ <li class="li">
+ The <a class="xref" href="http://aws.amazon.com/cli/" target="_blank">Amazon AWS CLI</a> to
+ manipulate files from the command line.
+ </li>
+
+ <li class="li">
+ Other S3-enabled software, such as
+ <a class="xref" href="http://s3tools.org/s3cmd" target="_blank">the S3Tools client software</a>.
+ </li>
+ </ul>
+
+ <p class="p">
+ After you upload data files to a location already mapped to an Impala table or partition, or if you delete
+ files in S3 from such a location, issue the <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code>
+ statement to make Impala aware of the new set of data files.
+ </p>
+
+ </div>
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="s3__s3_ddl">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Creating Impala Databases, Tables, and Partitions for Data Stored on S3</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala reads data for a table or partition from S3 based on the <code class="ph codeph">LOCATION</code> attribute for the
+ table or partition. Specify the S3 details in the <code class="ph codeph">LOCATION</code> clause of a <code class="ph codeph">CREATE
+ TABLE</code> or <code class="ph codeph">ALTER TABLE</code> statement. The notation for the <code class="ph codeph">LOCATION</code>
+ clause is <code class="ph codeph">s3a://<var class="keyword varname">bucket_name</var>/<var class="keyword varname">path/to/file</var></code>. The
+ filesystem prefix is always <code class="ph codeph">s3a://</code> because Impala does not support the <code class="ph codeph">s3://</code> or
+ <code class="ph codeph">s3n://</code> prefixes.
+ </p>
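A quick sanity check for LOCATION URLs, reflecting the rule above that only the s3a:// prefix is supported, might look like the following. This is illustrative only; Impala performs its own validation:

```python
from urllib.parse import urlparse

def check_impala_s3_location(location):
    """Validate a LOCATION URL per the rule above: the scheme must be
    s3a (s3:// and s3n:// are not supported by Impala), and a bucket
    name must be present. Returns (bucket, path)."""
    parsed = urlparse(location)
    if parsed.scheme in ("s3", "s3n"):
        raise ValueError("Impala requires s3a://, not %s://" % parsed.scheme)
    if parsed.scheme != "s3a" or not parsed.netloc:
        raise ValueError("expected s3a://bucket/path, got %r" % location)
    return parsed.netloc, parsed.path.lstrip("/")

print(check_impala_s3_location("s3a://impala-demo/dir1/dir2/dir3"))
```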
+
+ <p class="p">
+ For a partitioned table, either specify a separate <code class="ph codeph">LOCATION</code> clause for each new partition,
+ or specify a base <code class="ph codeph">LOCATION</code> for the table and set up a directory structure in S3 to mirror
+ the way Impala partitioned tables are structured in HDFS. Although, strictly speaking, S3 filenames do not
+ have directory paths, Impala treats S3 filenames with <code class="ph codeph">/</code> characters the same as HDFS
+ pathnames that include directories.
+ </p>
+
+ <p class="p">
+ You point a nonpartitioned table or an individual partition at S3 by specifying a single directory
+ path in S3, which could be any arbitrary directory. To replicate the structure of an entire Impala
+ partitioned table or database in S3 requires more care, with directories and subdirectories nested and
+ named to match the equivalent directory tree in HDFS. Consider setting up an empty staging area if
+ necessary in HDFS, and recording the complete directory structure so that you can replicate it in S3.
+
+ </p>
+
+ <p class="p">
+ For convenience when working with multiple tables with data files stored in S3, you can create a database
+ with a <code class="ph codeph">LOCATION</code> attribute pointing to an S3 path.
+ Specify a URL of the form <code class="ph codeph">s3a://<var class="keyword varname">bucket</var>/<var class="keyword varname">root/path/for/database</var></code>
+ for the <code class="ph codeph">LOCATION</code> attribute of the database.
+ Any tables created inside that database
+ automatically create directories underneath the one specified by the database
+ <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ <p class="p">
+ For example, the following session creates a partitioned table where only a single partition resides on S3.
+ The partitions for years 2013 and 2014 are located on HDFS. The partition for year 2015 includes a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL, and so refers to data residing on
+ S3, under a specific path underneath the bucket <code class="ph codeph">impala-demo</code>.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_hdfs;
+[localhost:21000] > use db_on_hdfs;
+[localhost:21000] > create table mostly_on_hdfs (x int) partitioned by (year int);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2013);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2014);
+[localhost:21000] > alter table mostly_on_hdfs add partition (year=2015)
+ > location 's3a://impala-demo/dir1/dir2/dir3/t1';
+</code></pre>
+
+ <p class="p">
+ The following session creates a database and two partitioned tables residing entirely on S3, one
+ partitioned by a single column and the other partitioned by multiple columns. Because a
+ <code class="ph codeph">LOCATION</code> attribute with an <code class="ph codeph">s3a://</code> URL is specified for the database, the
+ tables inside that database are automatically created on S3 underneath the database directory. To see the
+ names of the associated subdirectories, including the partition key values, we use an S3 client tool to
+ examine how the directory structure is organized on S3. For example, Impala partition directories such as
+ <code class="ph codeph">month=1</code> do not include leading zeroes, which sometimes appear in partition directories created
+ through Hive.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create database db_on_s3 location 's3a://impala-demo/dir1/dir2/dir3';
+[localhost:21000] > use db_on_s3;
+
+[localhost:21000] > create table partitioned_on_s3 (x int) partitioned by (year int);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2013);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2014);
+[localhost:21000] > alter table partitioned_on_s3 add partition (year=2015);
+
+[localhost:21000] > !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+
+[localhost:21000] > create table partitioned_multiple_keys (x int)
+ > partitioned by (year smallint, month tinyint, day tinyint);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=1);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=1,day=31);
+[localhost:21000] > alter table partitioned_multiple_keys
+ > add partition (year=2015,month=2,day=28);
+
+[localhost:21000] > !aws s3 ls s3://impala-demo/dir1/dir2/dir3 --recursive;
+2015-03-17 13:56:34 0 dir1/dir2/dir3/
+2015-03-17 16:47:13 0 dir1/dir2/dir3/partitioned_multiple_keys/
+2015-03-17 16:47:44 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=1/
+2015-03-17 16:47:50 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=1/day=31/
+2015-03-17 16:47:57 0 dir1/dir2/dir3/partitioned_multiple_keys/year=2015/month=2/day=28/
+2015-03-17 16:43:28 0 dir1/dir2/dir3/partitioned_on_s3/
+2015-03-17 16:43:49 0 dir1/dir2/dir3/partitioned_on_s3/year=2013/
+2015-03-17 16:43:53 0 dir1/dir2/dir3/partitioned_on_s3/year=2014/
+2015-03-17 16:43:58 0 dir1/dir2/dir3/partitioned_on_s3/year=2015/
+</code></pre>
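When staging files with S3 tools instead of ALTER TABLE, the key prefixes must match the layout shown in the listing above: key=value segments, with integer values rendered without leading zeroes (month=1, not month=01). A small sketch of building such prefixes (a hypothetical helper, not part of Impala):

```python
def partition_key_path(base, **partition_values):
    """Build an S3 key prefix in the layout Impala uses for partitioned
    tables: one key=value segment per partition column, in order.
    Integers keep Python's default rendering, so no leading zeroes."""
    segments = ["%s=%s" % (k, v) for k, v in partition_values.items()]
    return "/".join([base.rstrip("/")] + segments) + "/"

# Matches one of the directories in the listing above.
print(partition_key_path("dir1/dir2/dir3/partitioned_multiple_keys",
                         year=2015, month=1, day=31))
```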
+
+ <p class="p">
+ The <code class="ph codeph">CREATE DATABASE</code> and <code class="ph codeph">CREATE TABLE</code> statements create the associated
+ directory paths if they do not already exist. You can specify multiple levels of directories, and the
+ <code class="ph codeph">CREATE</code> statement creates all appropriate levels, similar to using <code class="ph codeph">mkdir
+ -p</code>.
+ </p>
+
+ <p class="p">
+ Use the standard S3 file upload methods to actually put the data files into the right locations. You can
+ also put the directory paths and data files in place before creating the associated Impala databases or
+ tables, and Impala automatically uses the data from the appropriate location after the associated databases
+ and tables are created.
+ </p>
+
+ <p class="p">
+ You can switch whether an existing table or partition points to data in HDFS or S3. For example, if you
+ have an Impala table or partition pointing to data files in HDFS or S3, and you later transfer those data
+ files to the other filesystem, use an <code class="ph codeph">ALTER TABLE</code> statement to adjust the
+ <code class="ph codeph">LOCATION</code> attribute of the corresponding table or partition to reflect that change. Because
+ Impala does not have an <code class="ph codeph">ALTER DATABASE</code> statement, this location-switching technique is not
+ practical for entire databases that have a custom <code class="ph codeph">LOCATION</code> attribute.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="s3__s3_internal_external">
+
+ <h2 class="title topictitle2" id="ariaid-title8">Internal and External Tables Located on S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Just as with tables located on HDFS storage, you can designate S3-based tables as either internal (managed
+ by Impala) or external, by using the syntax <code class="ph codeph">CREATE TABLE</code> or <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> respectively. When you drop an internal table, the files associated with the table are
+ removed, even if they are on S3 storage. When you drop an external table, the files associated with the
+ table are left alone, and are still available for access by other tools or components. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details.
+ </p>
+
+ <p class="p">
+ If the data on S3 is intended to be long-lived and accessed by other tools in addition to Impala, create
+ any associated S3 tables with the <code class="ph codeph">CREATE EXTERNAL TABLE</code> syntax, so that the files are not
+ deleted from S3 when the table is dropped.
+ </p>
+
+ <p class="p">
+ If the data on S3 is only needed for querying by Impala and can be safely discarded once the Impala
+ workflow is complete, create the associated S3 tables using the <code class="ph codeph">CREATE TABLE</code> syntax, so
+ that dropping the table also deletes the corresponding data files on S3.
+ </p>
+
+ <p class="p">
+ For example, this session creates a table in S3 with the same column layout as a table in HDFS, then
+ examines the S3 table and queries some data from it. The table in S3 works the same as a table in HDFS as
+ far as the expected file format of the data, table and column statistics, and other table properties. The
+ only indication that it is not an HDFS table is the <code class="ph codeph">s3a://</code> URL in the
+ <code class="ph codeph">LOCATION</code> property. Many data files can reside in the S3 directory, and their combined
+ contents form the table data. Because the data in this example is uploaded after the table is created, a
+ <code class="ph codeph">REFRESH</code> statement prompts Impala to update its cached information about the data files.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table usa_cities_s3 like usa_cities location 's3a://impala-demo/usa_cities';
+[localhost:21000] > desc usa_cities_s3;
++-------+----------+---------+
+| name | type | comment |
++-------+----------+---------+
+| id | smallint | |
+| city | string | |
+| state | string | |
++-------+----------+---------+
+
+-- Now from a web browser, upload the same data file(s) to S3 as in the HDFS table,
+-- under the relevant bucket and path. If you already have the data in S3, you would
+-- point the table LOCATION at an existing path.
+
+[localhost:21000] > refresh usa_cities_s3;
+[localhost:21000] > select count(*) from usa_cities_s3;
++----------+
+| count(*) |
++----------+
+| 289 |
++----------+
+[localhost:21000] > select distinct state from usa_cities_s3 limit 5;
++----------------------+
+| state |
++----------------------+
+| Louisiana |
+| Minnesota |
+| Georgia |
+| Alaska |
+| Ohio |
++----------------------+
+[localhost:21000] > desc formatted usa_cities_s3;
++------------------------------+------------------------------+---------+
+| name | type | comment |
++------------------------------+------------------------------+---------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| id | smallint | NULL |
+| city | string | NULL |
+| state | string | NULL |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | s3_testing | NULL |
+| Owner: | jrussell | NULL |
+| CreateTime: | Mon Mar 16 11:36:25 PDT 2015 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+| Location: | s3a://impala-demo/usa_cities | NULL |
+| Table Type: | MANAGED_TABLE | NULL |
+...
++------------------------------+------------------------------+---------+
+</code></pre>
+
+
+
+ <p class="p">
+ In this case, we have already uploaded a Parquet file with a million rows of data to the
+ <code class="ph codeph">sample_data</code> directory underneath the <code class="ph codeph">impala-demo</code> bucket on S3. This
+ session creates a table with matching column settings pointing to the corresponding location in S3, then
+ queries the table. Because the data is already in place on S3 when the table is created, no
+ <code class="ph codeph">REFRESH</code> statement is required.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table sample_data_s3
+ > (id bigint, val int, zerofill string,
+ > name string, assertion boolean, city string, state string)
+ > stored as parquet location 's3a://impala-demo/sample_data';
+[localhost:21000] > select count(*) from sample_data_s3;
++----------+
+| count(*) |
++----------+
+| 1000000 |
++----------+
+[localhost:21000] > select count(*) howmany, assertion from sample_data_s3 group by assertion;
++---------+-----------+
+| howmany | assertion |
++---------+-----------+
+| 667149 | true |
+| 332851 | false |
++---------+-----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="s3__s3_queries">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Running and Tuning Impala Queries for Data Stored on S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Once the appropriate <code class="ph codeph">LOCATION</code> attributes are set up at the table or partition level, you
+ query data stored in S3 exactly the same as data stored on HDFS or in HBase:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Queries against S3 data support all the same file formats as for HDFS data.
+ </li>
+
+ <li class="li">
+ Tables can be unpartitioned or partitioned. For partitioned tables, either manually construct paths in S3
+ corresponding to the HDFS directories representing partition key values, or use <code class="ph codeph">ALTER TABLE ...
+ ADD PARTITION</code> to set up the appropriate paths in S3.
+ </li>
+
+ <li class="li">
+ HDFS and HBase tables can be joined to S3 tables, or S3 tables can be joined with each other.
+ </li>
+
+ <li class="li">
+ Authorization using the Sentry framework to control access to databases, tables, or columns works the
+ same whether the data is in HDFS or in S3.
+ </li>
+
+ <li class="li">
+ The <span class="keyword cmdname">catalogd</span> daemon caches metadata for both HDFS and S3 tables. Use
+ <code class="ph codeph">REFRESH</code> and <code class="ph codeph">INVALIDATE METADATA</code> for S3 tables in the same situations
+ where you would issue those statements for HDFS tables.
+ </li>
+
+ <li class="li">
+ Queries against S3 tables are subject to the same kinds of admission control and resource management as
+ HDFS tables.
+ </li>
+
+ <li class="li">
+ Metadata about S3 tables is stored in the same metastore database as for HDFS tables.
+ </li>
+
+ <li class="li">
+ You can set up views referring to S3 tables, the same as for HDFS tables.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">COMPUTE STATS</code>, <code class="ph codeph">SHOW TABLE STATS</code>, and <code class="ph codeph">SHOW COLUMN
+ STATS</code> statements work for S3 tables also.
+ </li>
+ </ul>
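As a sketch of the partitioned-table point above, the S3 paths might be wired up as follows (the table name, bucket, and partition values are hypothetical):

```sql
-- Hypothetical partitioned table whose data lives under an S3 bucket.
CREATE TABLE sales_s3 (id BIGINT, amount DECIMAL(9,2))
  PARTITIONED BY (year INT)
  LOCATION 's3a://impala-demo/sales';

-- Point each partition at the corresponding path in S3.
ALTER TABLE sales_s3 ADD PARTITION (year=2015)
  LOCATION 's3a://impala-demo/sales/year=2015';

-- Make files already present at that path visible to Impala.
REFRESH sales_s3;
```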
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="s3_queries__s3_performance">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Understanding and Tuning Impala Query Performance for S3 Data</h3>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Although Impala queries for data stored in S3 might be less performant than queries against the
+ equivalent data stored in HDFS, you can still do some tuning. Here are techniques you can use to
+ interpret explain plans and profiles for queries against S3 data, and tips to achieve the best
+ performance possible for such queries.
+ </p>
+
+ <p class="p">
+ All else being equal, performance is expected to be lower for queries running against data on S3 rather
+ than HDFS. The actual mechanics of the <code class="ph codeph">SELECT</code> statement are somewhat different when the
+ data is in S3. Although the work is still distributed across the datanodes of the cluster, Impala might
+ parallelize the work for a distributed query differently for data on HDFS and S3. S3 does not have the
+ same block notion as HDFS, so Impala uses heuristics to determine how to split up large S3 files for
+ processing in parallel. Because all hosts can access any S3 data file with equal efficiency, the
+ distribution of work might be different from that for HDFS data, where the data blocks are physically read
+ using short-circuit local reads by hosts that contain the appropriate block replicas. Although the I/O to
+ read the S3 data might be spread evenly across the hosts of the cluster, the fact that all data is
+ initially retrieved across the network means that the overall query performance is likely to be lower for
+ S3 data than for HDFS data.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala queries are optimized for files stored in Amazon S3.
+ For Impala tables that use the file formats Parquet, RCFile, SequenceFile,
+ Avro, and uncompressed text, the setting <code class="ph codeph">fs.s3a.block.size</code>
+ in the <span class="ph filepath">core-site.xml</span> configuration file determines
+ how Impala divides the I/O work of reading the data files. This configuration
+ setting is specified in bytes. By default, this
+ value is 33554432 (32 MB), meaning that Impala parallelizes S3 read operations on the files
+ as if they were made up of 32 MB blocks. For example, if your S3 queries primarily access
+ Parquet files written by MapReduce or Hive, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 134217728 (128 MB) to match the row group size of those files. If most S3 queries involve
+ Parquet files written by Impala, increase <code class="ph codeph">fs.s3a.block.size</code>
+ to 268435456 (256 MB) to match the row group size produced by Impala.
+ </p>
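As a sketch, the setting might appear in <span class="ph filepath">core-site.xml</span> as follows; the 134217728 value shown assumes your S3 Parquet files were written by Hive or MapReduce:

```xml
<property>
  <name>fs.s3a.block.size</name>
  <!-- 128 MB, expressed in bytes, to match Hive/MapReduce Parquet row group size.
       Use 268435456 (256 MB) instead if most files were written by Impala. -->
  <value>134217728</value>
</property>
```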
+
+ <p class="p">
+ Because of differences between S3 and traditional filesystems, DML operations
+ for S3 tables can take longer than for tables on HDFS. For example, both the
+ <code class="ph codeph">LOAD DATA</code> statement and the final stage of the <code class="ph codeph">INSERT</code>
+ and <code class="ph codeph">CREATE TABLE AS SELECT</code> statements involve moving files from one directory
+ to another. (In the case of <code class="ph codeph">INSERT</code> and <code class="ph codeph">CREATE TABLE AS SELECT</code>,
+ the files are moved from a temporary staging directory to the final destination directory.)
+ Because S3 does not support a <span class="q">"rename"</span> operation for existing objects, in these cases Impala
+ actually copies the data files from one location to another and then removes the original files.
+ In <span class="keyword">Impala 2.6</span>, the <code class="ph codeph">S3_SKIP_INSERT_STAGING</code> query option provides a way
+ to speed up <code class="ph codeph">INSERT</code> statements for S3 tables and partitions, with the tradeoff
+ that a problem during statement execution could leave data in an inconsistent state.
+ It does not apply to <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA</code> statements.
+ See <a class="xref" href="../shared/../topics/impala_s3_skip_insert_staging.html#s3_skip_insert_staging">S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</a> for details.
+ </p>
+
+ <p class="p">
+ When optimizing aspects of complex queries, such as the join order, Impala treats tables on HDFS and
+ S3 the same way. Therefore, follow all the same tuning recommendations for S3 tables as for HDFS ones,
+ such as using the <code class="ph codeph">COMPUTE STATS</code> statement to help Impala construct accurate estimates of
+ row counts and cardinality. See <a class="xref" href="impala_performance.html#performance">Tuning Impala for Performance</a> for details.
+ </p>
+
+ <p class="p">
+ In query profile reports, the numbers for <code class="ph codeph">BytesReadLocal</code>,
+ <code class="ph codeph">BytesReadShortCircuit</code>, <code class="ph codeph">BytesReadDataNodeCached</code>, and
+ <code class="ph codeph">BytesReadRemoteUnexpected</code> are blank because those metrics come from HDFS.
+ If you do see any indications that a query against an S3 table performed <span class="q">"remote read"</span>
+ operations, do not be alarmed. That is expected because, by definition, all the I/O for S3 tables involves
+ remote reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="s3__s3_restrictions">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Restrictions on Impala Support for S3</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala requires that the default filesystem for the cluster be HDFS. You cannot use S3 as the only
+ filesystem in the cluster.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.6</span>, Impala could not perform DML operations (<code class="ph codeph">INSERT</code>,
+ <code class="ph codeph">LOAD DATA</code>, or <code class="ph codeph">CREATE TABLE AS SELECT</code>) where the destination is a table
+ or partition located on an S3 filesystem. This restriction is lifted in <span class="keyword">Impala 2.6</span> and higher.
+ </p>
+
+ <p class="p">
+ Impala does not support the old <code class="ph codeph">s3://</code> block-based and <code class="ph codeph">s3n://</code> filesystem
+ schemes, only <code class="ph codeph">s3a://</code>.
+ </p>
+
+ <p class="p">
+ Although S3 is often used to store JSON-formatted data, the current Impala support for S3 does not include
+ directly querying JSON data. For Impala queries, use data files in one of the file formats listed in
+ <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>. If you have data in JSON format, you can prepare a
+ flattened version of that data for querying by Impala as part of your ETL cycle.
+ </p>
+
+ <p class="p">
+ You cannot use the <code class="ph codeph">ALTER TABLE ... SET CACHED</code> statement for tables or partitions that are
+ located in S3.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="s3__s3_best_practices">
+ <h2 class="title topictitle2" id="ariaid-title12">Best Practices for Using Impala with S3</h2>
+
+ <div class="body conbody">
+ <p class="p">
+ The following guidelines represent best practices derived from testing and field experience with Impala on S3:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Any reference to an S3 location must be fully qualified. (This rule applies when
+ S3 is not designated as the default filesystem.)
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set the safety valve <code class="ph codeph">fs.s3a.connection.maximum</code> to 1500 for <span class="keyword cmdname">impalad</span>.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Set safety valve <code class="ph codeph">fs.s3a.block.size</code> to 134217728
+ (128 MB in bytes) if most Parquet files queried by Impala were written by Hive
+ or ParquetMR jobs. Set the block size to 268435456 (256 MB in bytes) if most Parquet
+ files queried by Impala were written by Impala.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">DROP TABLE .. PURGE</code> is much faster than the default <code class="ph codeph">DROP TABLE</code>.
+ The same applies to <code class="ph codeph">ALTER TABLE ... DROP PARTITION PURGE</code>
+ versus the default <code class="ph codeph">DROP PARTITION</code> operation.
+ However, due to the eventually consistent nature of S3, the files for that
+ table or partition could remain for some unbounded time when using <code class="ph codeph">PURGE</code>.
+ The default <code class="ph codeph">DROP TABLE/PARTITION</code> is slow because Impala copies the files to the HDFS trash folder,
+ and Impala waits until all the data is moved. <code class="ph codeph">DROP TABLE/PARTITION .. PURGE</code> is a
+ fast delete operation, and the Impala statement finishes quickly even though the change might not
+ have propagated fully throughout S3.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ <code class="ph codeph">INSERT</code> statements are faster than <code class="ph codeph">INSERT OVERWRITE</code> for S3.
+ The query option <code class="ph codeph">S3_SKIP_INSERT_STAGING</code>, which is set to <code class="ph codeph">true</code> by default,
+ skips the staging step for regular <code class="ph codeph">INSERT</code> (but not <code class="ph codeph">INSERT OVERWRITE</code>).
+ This makes the operation much faster, but consistency is not guaranteed: if a node fails during execution, the
+ table could end up with inconsistent data. Set this option to <code class="ph codeph">false</code> if stronger
+ consistency is required; however, this setting makes <code class="ph codeph">INSERT</code> operations slower.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Too many files in a table can make metadata loading and updating slow on S3.
+ If too many requests are made to S3, S3 has a back-off mechanism and
+ responds slower than usual. You might have many small files because of:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Too many partitions due to over-granular partitioning. Prefer partitions with
+ many megabytes of data, so that even a query against a single partition can
+ be parallelized effectively.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Many small <code class="ph codeph">INSERT</code> queries. Prefer bulk
+ <code class="ph codeph">INSERT</code>s so that more data is written to fewer
+ files.
+ </p>
+ </li>
+ </ul>
+ </li>
+ </ul>
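A minimal illustration of the <code class="ph codeph">PURGE</code> variants mentioned above (the table and partition names are hypothetical):

```sql
-- Fast delete: skips the copy to the HDFS trash folder.
DROP TABLE sales_s3 PURGE;

-- Likewise for an individual partition of a hypothetical partitioned table.
ALTER TABLE sales_partitioned_s3 DROP PARTITION (year=2015) PURGE;
```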
+
+ </div>
+ </article>
+
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_s3_skip_insert_staging.html b/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
new file mode 100644
index 0000000..72e4be8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_s3_skip_insert_staging.html
@@ -0,0 +1,78 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="s3_skip_insert_staging"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>S3_SKIP_INSERT_STAGING Query Option (Impala 2.6 or higher only)</title></head><body id="s3_skip_insert_staging"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">S3_SKIP_INSERT_STAGING Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ </p>
+
+ <p class="p">
+ Speeds up <code class="ph codeph">INSERT</code> operations on tables or partitions residing on the
+ Amazon S3 filesystem. The tradeoff is the possibility of inconsistent data left behind
+ if an error occurs partway through the operation.
+ </p>
+
+ <p class="p">
+ By default, Impala write operations to S3 tables and partitions involve a two-stage process.
+ Impala writes intermediate files to S3, then (because S3 does not provide a <span class="q">"rename"</span>
+ operation) those intermediate files are copied to their final location, making the process
+ more expensive than on a filesystem that supports renaming or moving files.
+ This query option makes Impala skip the intermediate files, and instead write the
+ new data directly to the final destination.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ If a host that is participating in the <code class="ph codeph">INSERT</code> operation fails partway through
+ the query, you might be left with a table or partition that contains some but not all of the
+ expected data files. Therefore, this option is most appropriate for a development or test
+ environment where you have the ability to reconstruct the table if a problem during
+ <code class="ph codeph">INSERT</code> leaves the data in an inconsistent state.
+ </p>
+ </div>
+
+ <p class="p">
+ The timing of file deletion during an <code class="ph codeph">INSERT OVERWRITE</code> operation
+ makes it impractical to write new files to S3 and delete the old files in a single operation.
+ Therefore, this query option only affects regular <code class="ph codeph">INSERT</code> statements that add
+ to the existing data in a table, not <code class="ph codeph">INSERT OVERWRITE</code> statements.
+ Use <code class="ph codeph">TRUNCATE TABLE</code> if you need to remove all contents from an S3 table
+ before performing a fast <code class="ph codeph">INSERT</code> with this option enabled.
+ </p>
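A sketch of the sequence described above, using a hypothetical table name:

```sql
-- INSERT OVERWRITE does not skip staging, so clear the table separately first.
TRUNCATE TABLE sales_s3;

-- With the option enabled (the default), INSERT writes directly to the destination.
SET S3_SKIP_INSERT_STAGING=true;
INSERT INTO sales_s3 SELECT * FROM sales_staging;
```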
+
+ <p class="p">
+ Performance improvements with this option enabled can be substantial. The speed increase
+ might be more noticeable for non-partitioned tables than for partitioned tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">true</code> (shown as 1 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_distinct.html b/docs/build3x/html/topics/impala_distinct.html
new file mode 100644
index 0000000..08d6232
--- /dev/null
+++ b/docs/build3x/html/topics/impala_distinct.html
@@ -0,0 +1,81 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_select.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DISTINCT Operator</title></head><body id="distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DISTINCT Operator</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">DISTINCT</code> operator in a <code class="ph codeph">SELECT</code> statement filters the result set to
+ remove duplicates:
+ </p>
+
+<pre class="pre codeblock"><code>-- Returns the unique values from one column.
+-- NULL is included in the set of values if any rows have a NULL in this column.
+select distinct c_birth_country from customer;
+-- Returns the unique combinations of values from multiple columns.
+select distinct c_salutation, c_last_name from customer;</code></pre>
+
+ <p class="p">
+ You can use <code class="ph codeph">DISTINCT</code> in combination with an aggregation function, typically
+ <code class="ph codeph">COUNT()</code>, to find how many different values a column contains:
+ </p>
+
+<pre class="pre codeblock"><code>-- Counts the unique values from one column.
+-- NULL is not included as a distinct value in the count.
+select count(distinct c_birth_country) from customer;
+-- Counts the unique combinations of values from multiple columns.
+select count(distinct c_salutation, c_last_name) from customer;</code></pre>
+
+ <p class="p">
+ One construct that Impala SQL does <em class="ph i">not</em> support is using <code class="ph codeph">DISTINCT</code> in more than one
+ aggregation function in the same query. For example, you could not have a single query with both
+ <code class="ph codeph">COUNT(DISTINCT c_first_name)</code> and <code class="ph codeph">COUNT(DISTINCT c_last_name)</code> in the
+ <code class="ph codeph">SELECT</code> list.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Zero-length strings:</strong> For purposes of clauses such as <code class="ph codeph">DISTINCT</code> and <code class="ph codeph">GROUP
+ BY</code>, Impala considers zero-length strings (<code class="ph codeph">""</code>), <code class="ph codeph">NULL</code>, and space
+ to all be different values.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ By default, Impala only allows a single <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">columns</var>)</code>
+ expression in each query.
+ </p>
+ <p class="p">
+ If you do not need precise accuracy, you can produce an estimate of the distinct values for a column by
+ specifying <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>; a query can contain multiple instances of
+ <code class="ph codeph">NDV(<var class="keyword varname">column</var>)</code>. To make Impala automatically rewrite
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions to <code class="ph codeph">NDV()</code>, enable the
+ <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option.
+ </p>
+ <p class="p">
+ To produce the same result as multiple <code class="ph codeph">COUNT(DISTINCT)</code> expressions, you can use the
+ following technique for queries involving a single table:
+ </p>
+<pre class="pre codeblock"><code>select v1.c1 result1, v2.c1 result2 from
+ (select count(distinct col1) as c1 from t1) v1
+ cross join
+ (select count(distinct col2) as c1 from t1) v2;
+</code></pre>
+ <p class="p">
+ Because <code class="ph codeph">CROSS JOIN</code> is an expensive operation, prefer to use the <code class="ph codeph">NDV()</code>
+ technique wherever practical.
+ </p>
+ </div>
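For example, the <code class="ph codeph">NDV()</code> approach for estimating several distinct counts in one query might look like this sketch, reusing the <code class="ph codeph">customer</code> table from the examples above:

```sql
-- Estimates rather than exact counts; multiple NDV() calls are allowed per query.
SELECT NDV(c_first_name), NDV(c_last_name) FROM customer;

-- Alternatively, let Impala rewrite COUNT(DISTINCT) to NDV() automatically.
SET APPX_COUNT_DISTINCT=true;
SELECT COUNT(DISTINCT c_first_name), COUNT(DISTINCT c_last_name) FROM customer;
```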
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ In contrast with some database systems that always return <code class="ph codeph">DISTINCT</code> values in sorted order,
+ Impala does not do any ordering of <code class="ph codeph">DISTINCT</code> values. Always include an <code class="ph codeph">ORDER
+ BY</code> clause if you need the values in alphabetical or numeric sorted order.
+ </p>
+ </div>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_select.html">SELECT Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_dml.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_dml.html b/docs/build3x/html/topics/impala_dml.html
new file mode 100644
index 0000000..4fb1296
--- /dev/null
+++ b/docs/build3x/html/topics/impala_dml.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="dml"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DML Statements</title></head><body id="dml"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DML Statements</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ DML refers to <span class="q">"Data Manipulation Language"</span>, a subset of SQL statements that modify the data stored in
+ tables. Because Impala focuses on query performance and leverages the append-only nature of HDFS storage,
+ currently Impala only supports a small set of DML statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <a class="xref" href="impala_delete.html">DELETE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_insert.html">INSERT Statement</a>.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_load_data.html">LOAD DATA Statement</a>. Does not apply for HBase or Kudu tables.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_update.html">UPDATE Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+
+ <li class="li">
+ <a class="xref" href="impala_upsert.html">UPSERT Statement (Impala 2.8 or higher only)</a>. Works for Kudu tables only.
+ </li>
+ </ul>
+
+ <p class="p">
+ <code class="ph codeph">INSERT</code> in Impala is primarily optimized for inserting large volumes of data in a single
+ statement, to make effective use of the multi-megabyte HDFS blocks. This is the way in Impala to create new
+ data files. If you intend to insert one or a few rows at a time, such as using the <code class="ph codeph">INSERT ...
+ VALUES</code> syntax, that technique is much more efficient for Impala tables stored in HBase. See
+ <a class="xref" href="impala_hbase.html#impala_hbase">Using Impala to Query HBase Tables</a> for details.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">LOAD DATA</code> moves existing data files into the directory for an Impala table, making them
+ immediately available for Impala queries. This is one way in Impala to work with data files produced by other
+ Hadoop components. (<code class="ph codeph">CREATE EXTERNAL TABLE</code> is the other alternative; with external tables,
+ you can query existing data files, while the files remain in their original location.)
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher, Impala does support the <code class="ph codeph">UPDATE</code>, <code class="ph codeph">DELETE</code>,
+ and <code class="ph codeph">UPSERT</code> statements for Kudu tables.
+ For HDFS or S3 tables, to simulate the effects of an <code class="ph codeph">UPDATE</code> or <code class="ph codeph">DELETE</code> statement
+ in other database systems, typically you use <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> to copy data
+ from one table to another, filtering out or changing the appropriate rows during the copy operation.
+ </p>
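As a sketch of that copy-and-filter technique (the table and column names are hypothetical):

```sql
-- Simulate DELETE: copy every row except the ones to remove.
CREATE TABLE t1_new AS
  SELECT * FROM t1 WHERE id <> 12345;

-- Simulate UPDATE: rewrite the changed column during the copy.
CREATE TABLE t2_new AS
  SELECT id, CASE WHEN state = 'CA' THEN 'California' ELSE state END AS state
  FROM t2;
```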
+
+ <p class="p">
+ You can also achieve a result similar to <code class="ph codeph">UPDATE</code> by using Impala tables stored in HBase.
+ When you insert a row into an HBase table, and the table
+ already contains a row with the same value for the key column, the older row is hidden, effectively the same
+ as a single-row <code class="ph codeph">UPDATE</code>.
+ </p>
+
+ <p class="p">
+ Impala can perform DML operations for tables or partitions stored in the Amazon S3 filesystem
+ with <span class="keyword">Impala 2.6</span> and higher. See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ The other major classifications of SQL statements are data definition language (see
+ <a class="xref" href="impala_ddl.html#ddl">DDL Statements</a>) and queries (see <a class="xref" href="impala_select.html#select">SELECT Statement</a>).
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_double.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_double.html b/docs/build3x/html/topics/impala_double.html
new file mode 100644
index 0000000..afff3cf
--- /dev/null
+++ b/docs/build3x/html/topics/impala_double.html
@@ -0,0 +1,157 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="double"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DOUBLE Data Type</title></head><body id="double"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DOUBLE Data Type</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ A double precision floating-point data type used in <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER
+ TABLE</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> DOUBLE</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Range:</strong> 4.94065645841246544e-324d .. 1.79769313486231570e+308, positive or negative
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Precision:</strong> 15 to 17 significant digits, depending on usage. The number of significant digits does
+ not depend on the position of the decimal point.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Representation:</strong> The values are stored in 8 bytes, using
+ <a class="xref" href="https://en.wikipedia.org/wiki/Double-precision_floating-point_format" target="_blank">IEEE 754 Double Precision Binary Floating Point</a> format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Conversions:</strong> Impala does not automatically convert <code class="ph codeph">DOUBLE</code> to any other type. You can
+ use <code class="ph codeph">CAST()</code> to convert <code class="ph codeph">DOUBLE</code> values to <code class="ph codeph">FLOAT</code>,
+ <code class="ph codeph">TINYINT</code>, <code class="ph codeph">SMALLINT</code>, <code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>,
+ <code class="ph codeph">STRING</code>, <code class="ph codeph">TIMESTAMP</code>, or <code class="ph codeph">BOOLEAN</code>. You can use exponential
+ notation in <code class="ph codeph">DOUBLE</code> literals or when casting from <code class="ph codeph">STRING</code>, for example
+ <code class="ph codeph">1.0e6</code> to represent one million.
+ <span class="ph">
+ Casting an integer or floating-point value <code class="ph codeph">N</code> to
+ <code class="ph codeph">TIMESTAMP</code> produces a value that is <code class="ph codeph">N</code> seconds past the start of the epoch
+ date (January 1, 1970). By default, the result value represents a date and time in the UTC time zone.
+ If the setting <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions=true</code> is in effect,
+ the resulting <code class="ph codeph">TIMESTAMP</code> represents a date and time in the local time zone.
+ </span>
+ </p>
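+
+ <p class="p">
+ For example, the following illustrative queries (hypothetical, not drawn from the Impala test
+ suite) show exponential notation in a literal and an epoch-based cast to
+ <code class="ph codeph">TIMESTAMP</code>; the timestamp result assumes the default UTC conversion behavior:
+ </p>
+
+<pre class="pre codeblock"><code>-- Exponential notation representing one million.
+SELECT CAST(1.0e6 AS INT);
+-- One million seconds past the start of the epoch: 1970-01-12 13:46:40 UTC.
+SELECT CAST(1.0e6 AS TIMESTAMP);
+</code></pre>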
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The data type <code class="ph codeph">REAL</code> is an alias for <code class="ph codeph">DOUBLE</code>.
+ </p>
+
+
+ <p class="p">
+ Impala does not evaluate NaN (not a number) as equal to any other numeric values,
+ including other NaN values. For example, the following statement, which evaluates equality
+ between two NaN values, returns <code class="ph codeph">false</code>:
+ </p>
+
+<pre class="pre codeblock"><code>
+SELECT CAST('nan' AS DOUBLE)=CAST('nan' AS DOUBLE);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CREATE TABLE t1 (x DOUBLE);
+SELECT CAST(1000.5 AS DOUBLE);
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> Because fractional values of this type are not always represented precisely, when this
+ type is used for a partition key column, the underlying HDFS directories might not be named exactly as you
+ expect. Prefer to partition on a <code class="ph codeph">DECIMAL</code> column instead.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type is fully compatible with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong> This type is fully compatible with Parquet tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong> Values of this type are potentially larger in text tables than in tables
+ using Parquet or other binary formats.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as an 8-byte value.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Due to the way arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+ high-performance hardware instructions, and distributed queries can perform these operations in a different
+ order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
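+
+ <p class="p">
+ As an illustrative sketch (the table and column names here are hypothetical), casting to
+ <code class="ph codeph">DECIMAL</code> before aggregating trades some performance for repeatable results:
+ </p>
+
+<pre class="pre codeblock"><code>-- On a large DOUBLE column, this result can vary slightly from run to run.
+SELECT SUM(price) FROM sales;
+-- Aggregating DECIMAL values produces the same result on every run.
+SELECT SUM(CAST(price AS DECIMAL(18,4))) FROM sales;
+</code></pre>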
+
+ <p class="p">
+ The inability to exactly represent certain floating-point values means that
+ <code class="ph codeph">DECIMAL</code> is sometimes a better choice than <code class="ph codeph">DOUBLE</code>
+ or <code class="ph codeph">FLOAT</code> when precision is critical, particularly when
+ transferring data from other database systems that use different representations
+ or file formats.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">BOOLEAN</code>, <code class="ph codeph">FLOAT</code>,
+ and <code class="ph codeph">DOUBLE</code> cannot be used for primary key columns in Kudu tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_literals.html#numeric_literals">Numeric Literals</a>, <a class="xref" href="impala_math_functions.html#math_functions">Impala Mathematical Functions</a>,
+ <a class="xref" href="impala_float.html#float">FLOAT Data Type</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_database.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_database.html b/docs/build3x/html/topics/impala_drop_database.html
new file mode 100644
index 0000000..9bbda27
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_database.html
@@ -0,0 +1,193 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_database"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP DATABASE Statement</title></head><body id="drop_database"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP DATABASE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes a database from the system. The physical operations involve removing the metadata for the database
+ from the metastore, and deleting the corresponding <code class="ph codeph">*.db</code> directory from HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP (DATABASE|SCHEMA) [IF EXISTS] <var class="keyword varname">database_name</var> <span class="ph">[RESTRICT | CASCADE]</span>;</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ By default, the database must be empty before it can be dropped, to avoid losing any data.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can include the <code class="ph codeph">CASCADE</code>
+ clause to make Impala drop all tables and other objects in the database before dropping the database itself.
+ The <code class="ph codeph">RESTRICT</code> clause enforces the original requirement that the database be empty
+ before being dropped. Because the <code class="ph codeph">RESTRICT</code> behavior is still the default, this
+ clause is optional.
+ </p>
+
+ <p class="p">
+ The automatic dropping resulting from the <code class="ph codeph">CASCADE</code> clause follows the same rules as the
+ corresponding <code class="ph codeph">DROP TABLE</code>, <code class="ph codeph">DROP VIEW</code>, and <code class="ph codeph">DROP FUNCTION</code> statements.
+ In particular, the HDFS directories and data files for any external tables are left behind when the
+ tables are removed.
+ </p>
+
+ <p class="p">
+ When you do not use the <code class="ph codeph">CASCADE</code> clause, drop or move all the objects inside the database manually
+ before dropping the database itself:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Use the <code class="ph codeph">SHOW TABLES</code> statement to locate all tables and views in the database,
+ and issue <code class="ph codeph">DROP TABLE</code> and <code class="ph codeph">DROP VIEW</code> statements to remove them all.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Use the <code class="ph codeph">SHOW FUNCTIONS</code> and <code class="ph codeph">SHOW AGGREGATE FUNCTIONS</code> statements
+ to locate all user-defined functions in the database, and issue <code class="ph codeph">DROP FUNCTION</code>
+ and <code class="ph codeph">DROP AGGREGATE FUNCTION</code> statements to remove them all.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ To keep tables or views contained by a database while removing the database itself, use
+ <code class="ph codeph">ALTER TABLE</code> and <code class="ph codeph">ALTER VIEW</code> to move the relevant
+ objects to a different database before dropping the original database.
+ </p>
+ </li>
+ </ul>
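+
+ <p class="p">
+ For example (an illustrative sketch using hypothetical database, table, and view names), the
+ manual cleanup steps above might look like this:
+ </p>
+
+<pre class="pre codeblock"><code>-- Remove the objects inside the database.
+drop view my_db.v1;
+drop table my_db.t1;
+-- Preserve a table by moving it to a different database.
+alter table my_db.keep_me rename to other_db.keep_me;
+-- Now the empty database can be dropped.
+drop database my_db;
+</code></pre>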
+
+ <p class="p">
+ You cannot drop the current database, that is, the database that your session is connected to,
+ either through the <code class="ph codeph">USE</code> statement or the <code class="ph codeph">-d</code> option of <span class="keyword cmdname">impala-shell</span>.
+ Issue a <code class="ph codeph">USE</code> statement to switch to a different database first.
+ Because the <code class="ph codeph">default</code> database is always available, issuing
+ <code class="ph codeph">USE default</code> is a convenient way to leave the current database
+ before dropping it.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Hive considerations:</strong>
+ </p>
+
+ <p class="p">
+ When you drop a database in Impala, the database can no longer be used by Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+
+
+ <p class="p">
+ See <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a> for examples covering <code class="ph codeph">CREATE
+ DATABASE</code>, <code class="ph codeph">USE</code>, and <code class="ph codeph">DROP DATABASE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create the folders yourself and point
+ Impala databases, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for the directory associated with the database.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <pre class="pre codeblock"><code>create database first_db;
+use first_db;
+create table t1 (x int);
+
+create database second_db;
+use second_db;
+-- Each database has its own namespace for tables.
+-- You can reuse the same table names in each database.
+create table t1 (s string);
+
+create database temp;
+
+-- You can either USE a database after creating it,
+-- or qualify all references to the table name with the name of the database.
+-- Here, tables T2 and T3 are both created in the TEMP database.
+
+create table temp.t2 (x int, y int);
+use temp;
+create table t3 (s string);
+
+-- You cannot drop a database while it is selected by the USE statement.
+drop database temp;
+<em class="ph i">ERROR: AnalysisException: Cannot drop current default database: temp</em>
+
+-- The always-available database 'default' is a convenient one to USE
+-- before dropping a database you created.
+use default;
+
+-- Before dropping a database, first drop all the tables inside it,
+<span class="ph">-- or in <span class="keyword">Impala 2.3</span> and higher use the CASCADE clause.</span>
+drop database temp;
+ERROR: ImpalaRuntimeException: Error making 'dropDatabase' RPC to Hive Metastore:
+CAUSED BY: InvalidOperationException: Database temp is not empty
+show tables in temp;
++------+
+| name |
++------+
+| t3 |
++------+
+
+<span class="ph">-- <span class="keyword">Impala 2.3</span> and higher:</span>
+<span class="ph">drop database temp cascade;</span>
+
+-- Earlier releases:
+drop table temp.t3;
+drop database temp;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_databases.html#databases">Overview of Impala Databases</a>, <a class="xref" href="impala_create_database.html#create_database">CREATE DATABASE Statement</a>,
+ <a class="xref" href="impala_use.html#use">USE Statement</a>, <a class="xref" href="impala_show.html#show_databases">SHOW DATABASES</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_function.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_function.html b/docs/build3x/html/topics/impala_drop_function.html
new file mode 100644
index 0000000..a398e94
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_function.html
@@ -0,0 +1,136 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_function"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP FUNCTION Statement</title></head><body id="drop_function"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP FUNCTION Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes a user-defined function (UDF), so that it is not available for execution during Impala
+ <code class="ph codeph">SELECT</code> or <code class="ph codeph">INSERT</code> operations.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ To drop C++ UDFs and UDAs:
+ </p>
+
+<pre class="pre codeblock"><code>DROP [AGGREGATE] FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var>(<var class="keyword varname">type</var>[, <var class="keyword varname">type</var>...])</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The preceding syntax, which includes the function signature, also applies to Java UDFs that were created
+ using the corresponding <code class="ph codeph">CREATE FUNCTION</code> syntax that includes the argument and return types.
+ After upgrading to <span class="keyword">Impala 2.5</span> or higher, consider re-creating all Java UDFs with the
+ <code class="ph codeph">CREATE FUNCTION</code> syntax that does not include the function signature. Java UDFs created this
+ way are now persisted in the metastore database and do not need to be re-created after an Impala restart.
+ </p>
+ </div>
+
+ <p class="p">
+ To drop Java UDFs (created using the <code class="ph codeph">CREATE FUNCTION</code> syntax with no function signature):
+ </p>
+
+<pre class="pre codeblock"><code>DROP FUNCTION [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">function_name</var></code></pre>
+
+
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Because the same function name could be overloaded with different argument signatures, you specify the
+ argument types to identify the exact function to drop.
+ </p>
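+
+ <p class="p">
+ For example (the function names here are hypothetical), only the overload matching the
+ specified argument types is removed:
+ </p>
+
+<pre class="pre codeblock"><code>-- Drops my_length(STRING) but leaves any my_length(INT) overload in place.
+DROP FUNCTION IF EXISTS my_db.my_length(STRING);
+-- Dropping a UDA requires the AGGREGATE keyword.
+DROP AGGREGATE FUNCTION IF EXISTS my_db.my_avg(DOUBLE);
+</code></pre>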
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, Impala UDFs and UDAs written in C++ are persisted in the metastore database.
+ Java UDFs are also persisted, if they were created with the new <code class="ph codeph">CREATE FUNCTION</code> syntax for Java UDFs,
+ where the Java function argument and return types are omitted.
+ Java-based UDFs created with the old <code class="ph codeph">CREATE FUNCTION</code> syntax do not persist across restarts
+ because they are held in the memory of the <span class="keyword cmdname">catalogd</span> daemon.
+ Until you re-create such Java UDFs using the new <code class="ph codeph">CREATE FUNCTION</code> syntax,
+ you must reload those Java-based UDFs by running the original <code class="ph codeph">CREATE FUNCTION</code> statements again each time
+ you restart the <span class="keyword cmdname">catalogd</span> daemon.
+ Prior to <span class="keyword">Impala 2.5</span> the requirement to reload functions after a restart applied to both C++ and Java functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, does not need any
+ particular HDFS permissions to perform this statement.
+ All read and write operations are on the metastore database,
+ not HDFS files and directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how to drop Java functions created with the signatureless
+ <code class="ph codeph">CREATE FUNCTION</code> syntax in <span class="keyword">Impala 2.5</span> and higher.
+ Issuing <code class="ph codeph">DROP FUNCTION <var class="keyword varname">function_name</var></code> removes all the
+ overloaded functions under that name.
+ (See <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a> for a longer example
+ showing how to set up such functions in the first place.)
+ </p>
+<pre class="pre codeblock"><code>
+create function my_func location '/user/impala/udfs/udf-examples.jar'
+ symbol='org.apache.impala.TestUdf';
+
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | my_func(BIGINT) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN) | JAVA | true |
+| BOOLEAN | my_func(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+
+drop function my_func;
+show functions;
++-------------+---------------------------------------+-------------+---------------+
+| return type | signature | binary type | is persistent |
++-------------+---------------------------------------+-------------+---------------+
+| BIGINT | testudf(BIGINT) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN) | JAVA | true |
+| BOOLEAN | testudf(BOOLEAN, BOOLEAN) | JAVA | true |
+...
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_udf.html#udfs">Impala User-Defined Functions (UDFs)</a>, <a class="xref" href="impala_create_function.html#create_function">CREATE FUNCTION Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_role.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_role.html b/docs/build3x/html/topics/impala_drop_role.html
new file mode 100644
index 0000000..53a5c73
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_role.html
@@ -0,0 +1,71 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_role"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP ROLE Statement (Impala 2.0 or higher only)</title></head><body id="drop_role"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP ROLE Statement (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+ The <code class="ph codeph">DROP ROLE</code> statement removes a role from the metastore database. Once dropped, the role
+ is revoked for all users to whom it was previously assigned, and all privileges granted to that role are
+ revoked. Queries that are already executing are not affected. Impala verifies the role information
+ approximately every 60 seconds, so the effect of <code class="ph codeph">DROP ROLE</code> might not be visible to new
+ Impala queries for a brief period.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP ROLE <var class="keyword varname">role_name</var>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Required privileges:</strong>
+ </p>
+
+ <p class="p">
+ Only administrative users (initially, a predefined set of users specified in the Sentry service configuration
+ file) can use this statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ Impala makes use of any roles and privileges specified by the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in Hive, and Hive makes use of any roles and privileges specified by the
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements in Impala. The Impala <code class="ph codeph">GRANT</code>
+ and <code class="ph codeph">REVOKE</code> statements for privileges do not require the <code class="ph codeph">ROLE</code> keyword to be
+ repeated before each role name, unlike the equivalent Hive statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>, <a class="xref" href="impala_grant.html#grant">GRANT Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_revoke.html#revoke">REVOKE Statement (Impala 2.0 or higher only)</a>, <a class="xref" href="impala_create_role.html#create_role">CREATE ROLE Statement (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_show.html#show">SHOW Statement</a>
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_stats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_stats.html b/docs/build3x/html/topics/impala_drop_stats.html
new file mode 100644
index 0000000..2175f20
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_stats.html
@@ -0,0 +1,285 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_stats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP STATS Statement</title></head><body id="drop_stats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP STATS Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes the specified statistics from a table or partition. The statistics were originally created by the
+ <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var>
+DROP INCREMENTAL STATS [<var class="keyword varname">database_name</var>.]<var class="keyword varname">table_name</var> PARTITION (<var class="keyword varname">partition_spec</var>)
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>[, <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var> ...]
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION</code> clause is only allowed in combination with the <code class="ph codeph">INCREMENTAL</code>
+ clause. It is optional for <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, and required for <code class="ph codeph">DROP
+ INCREMENTAL STATS</code>. Whenever you specify partitions through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause in a <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> or
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> statement, you must include all the partitioning columns in the
+ specification, and specify constant values for all the partition key columns.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DROP STATS</code> removes all statistics from the table, whether created by <code class="ph codeph">COMPUTE
+ STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">DROP INCREMENTAL STATS</code> only affects incremental statistics for a single partition, specified
+ through the <code class="ph codeph">PARTITION</code> clause. The incremental stats are marked as outdated, so that they are
+ recomputed by the next <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement.
+ </p>
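+
+ <p class="p">
+ For instance, with a table partitioned by <code class="ph codeph">i_category</code> (such as the
+ <code class="ph codeph">item_partitioned</code> table in the example later in this topic), you might
+ refresh the statistics for a single changed partition like this:
+ </p>
+
+<pre class="pre codeblock"><code>-- Mark the incremental stats for one partition as outdated...
+DROP INCREMENTAL STATS item_partitioned PARTITION (i_category='Books');
+-- ...then recompute statistics for just that partition.
+COMPUTE INCREMENTAL STATS item_partitioned PARTITION (i_category='Books');
+</code></pre>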
+
+
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ You typically use this statement when the statistics for a table or a partition have become stale due to data
+ files being added to or removed from the associated HDFS data directories, whether by manual HDFS operations
+ or <code class="ph codeph">INSERT</code>, <code class="ph codeph">INSERT OVERWRITE</code>, or <code class="ph codeph">LOAD DATA</code> statements, or
+ adding or dropping partitions.
+ </p>
+
+ <p class="p">
+ When a table or partition has no associated statistics, Impala treats it as essentially zero-sized when
+ constructing the execution plan for a query. In particular, the statistics influence the order in which
+ tables are joined in a join query. To ensure proper query planning and good query performance and
+ scalability, make sure to run <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on
+ the table or partition after removing any stale statistics.
+ </p>
+
+ <p class="p">
+ Dropping the statistics is not required for an unpartitioned table or a partitioned table covered by the
+ original type of statistics. A subsequent <code class="ph codeph">COMPUTE STATS</code> statement replaces any existing
+ statistics with new ones, for all partitions, regardless of whether the old ones were outdated. Therefore,
+ this statement was rarely used before the introduction of incremental statistics.
+ </p>
+
+ <p class="p">
+ Dropping the statistics is required for a partitioned table containing incremental statistics, to make a
+ subsequent <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement rescan an existing partition. See
+ <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a> for information about incremental statistics, a feature
+ available in Impala 2.1.0 and higher.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ The user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, does not need any
+ particular HDFS permissions to perform this statement.
+ All read and write operations are on the metastore database,
+ not HDFS files and directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a partitioned table that has associated statistics produced by the
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and how the situation evolves as statistics are dropped
+ from specific partitions, then the entire table.
+ </p>
+
+ <p class="p">
+ Initially, all table and column statistics are filled in.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-----------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | 1812 | 1 | 232.67KB | NOT CACHED | PARQUET | true
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | 1783 | 1 | 227.97KB | NOT CACHED | PARQUET | true
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned;
++------------------+-----------+------------------+--------+----------+--------------
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk | INT | 19443 | -1 | 4 | 4
+| i_item_id | STRING | 9025 | -1 | 16 | 16
+| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
+| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
+| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
+| i_current_price | FLOAT | 2807 | -1 | 4 | 4
+| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
+| i_brand_id | INT | 965 | -1 | 4 | 4
+| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
+| i_class_id | INT | 16 | -1 | 4 | 4
+| i_class | STRING | 101 | -1 | 15 | 7.76749992370
+| i_category_id | INT | 10 | -1 | 4 | 4
+| i_manufact_id | INT | 1857 | -1 | 4 | 4
+| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
+| i_size | STRING | 8 | -1 | 11 | 4.33459997177
+| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
+| i_color | STRING | 92 | -1 | 10 | 5.38089990615
+| i_units | STRING | 22 | -1 | 7 | 4.18690013885
+| i_container | STRING | 2 | -1 | 7 | 6.99259996414
+| i_manager_id | INT | 105 | -1 | 4 | 4
+| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
+| i_category | STRING | 10 | 0 | -1 | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+ <p class="p">
+ To remove statistics for particular partitions, use the <code class="ph codeph">DROP INCREMENTAL STATS</code> statement.
+ After removing statistics for two partitions, the table-level statistics reflect that change in the
+ <code class="ph codeph">#Rows</code> and <code class="ph codeph">Incremental stats</code> fields. The counts, maximums, and averages of
+ the column-level statistics are unaffected.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The row count might be preserved after a <code class="ph codeph">DROP INCREMENTAL
+ STATS</code> statement in a future release. Check the resolution of the issue
+ <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1615" target="_blank">IMPALA-1615</a>.
+ </div>
+
+<pre class="pre codeblock"><code>drop incremental stats item_partitioned partition (i_category='Sports');
+drop incremental stats item_partitioned partition (i_category='Electronics');
+
+show table stats item_partitioned
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+-----------------
+| Books | 1733 | 1 | 223.74KB | NOT CACHED | PARQUET | true
+| Children | 1786 | 1 | 230.05KB | NOT CACHED | PARQUET | true
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | 1807 | 1 | 232.56KB | NOT CACHED | PARQUET | true
+| Jewelry | 1740 | 1 | 223.72KB | NOT CACHED | PARQUET | true
+| Men | 1811 | 1 | 231.25KB | NOT CACHED | PARQUET | true
+| Music | 1860 | 1 | 237.90KB | NOT CACHED | PARQUET | true
+| Shoes | 1835 | 1 | 234.90KB | NOT CACHED | PARQUET | true
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | 1790 | 1 | 226.27KB | NOT CACHED | PARQUET | true
+| Total | 17957 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+-----------------
+show column stats item_partitioned
++------------------+-----------+------------------+--------+----------+--------------
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size
++------------------+-----------+------------------+--------+----------+--------------
+| i_item_sk | INT | 19443 | -1 | 4 | 4
+| i_item_id | STRING | 9025 | -1 | 16 | 16
+| i_rec_start_date | TIMESTAMP | 4 | -1 | 16 | 16
+| i_rec_end_date | TIMESTAMP | 3 | -1 | 16 | 16
+| i_item_desc | STRING | 13330 | -1 | 200 | 100.302803039
+| i_current_price | FLOAT | 2807 | -1 | 4 | 4
+| i_wholesale_cost | FLOAT | 2105 | -1 | 4 | 4
+| i_brand_id | INT | 965 | -1 | 4 | 4
+| i_brand | STRING | 725 | -1 | 22 | 16.1776008605
+| i_class_id | INT | 16 | -1 | 4 | 4
+| i_class | STRING | 101 | -1 | 15 | 7.76749992370
+| i_category_id | INT | 10 | -1 | 4 | 4
+| i_manufact_id | INT | 1857 | -1 | 4 | 4
+| i_manufact | STRING | 1028 | -1 | 15 | 11.3295001983
+| i_size | STRING | 8 | -1 | 11 | 4.33459997177
+| i_formulation | STRING | 12884 | -1 | 20 | 19.9799995422
+| i_color | STRING | 92 | -1 | 10 | 5.38089990615
+| i_units | STRING | 22 | -1 | 7 | 4.18690013885
+| i_container | STRING | 2 | -1 | 7 | 6.99259996414
+| i_manager_id | INT | 105 | -1 | 4 | 4
+| i_product_name | STRING | 19094 | -1 | 25 | 18.0233001708
+| i_category | STRING | 10 | 0 | -1 | -1
++------------------+-----------+------------------+--------+----------+--------------
+</code></pre>
+
+ <p class="p">
+ To remove all statistics from the table, whether produced by <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, use the <code class="ph codeph">DROP STATS</code> statement without the
+ <code class="ph codeph">INCREMENTAL</code> clause. Now, both table-level and column-level statistics are reset.
+ </p>
+
+<pre class="pre codeblock"><code>drop stats item_partitioned;
+
+show table stats item_partitioned
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+show column stats item_partitioned
++------------------+-----------+------------------+--------+----------+----------+
+| Column | Type | #Distinct Values | #Nulls | Max Size | Avg Size |
++------------------+-----------+------------------+--------+----------+----------+
+| i_item_sk | INT | -1 | -1 | 4 | 4 |
+| i_item_id | STRING | -1 | -1 | -1 | -1 |
+| i_rec_start_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| i_rec_end_date | TIMESTAMP | -1 | -1 | 16 | 16 |
+| i_item_desc | STRING | -1 | -1 | -1 | -1 |
+| i_current_price | FLOAT | -1 | -1 | 4 | 4 |
+| i_wholesale_cost | FLOAT | -1 | -1 | 4 | 4 |
+| i_brand_id | INT | -1 | -1 | 4 | 4 |
+| i_brand | STRING | -1 | -1 | -1 | -1 |
+| i_class_id | INT | -1 | -1 | 4 | 4 |
+| i_class | STRING | -1 | -1 | -1 | -1 |
+| i_category_id | INT | -1 | -1 | 4 | 4 |
+| i_manufact_id | INT | -1 | -1 | 4 | 4 |
+| i_manufact | STRING | -1 | -1 | -1 | -1 |
+| i_size | STRING | -1 | -1 | -1 | -1 |
+| i_formulation | STRING | -1 | -1 | -1 | -1 |
+| i_color | STRING | -1 | -1 | -1 | -1 |
+| i_units | STRING | -1 | -1 | -1 | -1 |
+| i_container | STRING | -1 | -1 | -1 | -1 |
+| i_manager_id | INT | -1 | -1 | 4 | 4 |
+| i_product_name | STRING | -1 | -1 | -1 | -1 |
+| i_category | STRING | 10 | 0 | -1 | -1 |
++------------------+-----------+------------------+--------+----------+----------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_compute_stats.html#compute_stats">COMPUTE STATS Statement</a>, <a class="xref" href="impala_show.html#show_table_stats">SHOW TABLE STATS Statement</a>,
+ <a class="xref" href="impala_show.html#show_column_stats">SHOW COLUMN STATS Statement</a>, <a class="xref" href="impala_perf_stats.html#perf_stats">Table and Column Statistics</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_table.html b/docs/build3x/html/topics/impala_drop_table.html
new file mode 100644
index 0000000..ff98d9c
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_table.html
@@ -0,0 +1,192 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP TABLE Statement</title></head><body id="drop_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP TABLE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes an Impala table. Also removes the underlying HDFS data files for internal tables, although not for
+ external tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP TABLE [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">table_name</var> <span class="ph">[PURGE]</span></code></pre>
+
+ <p class="p">
+ <strong class="ph b">IF EXISTS clause:</strong>
+ </p>
+
+ <p class="p">
+ The optional <code class="ph codeph">IF EXISTS</code> clause makes the statement succeed whether or not the table exists.
+ If the table does exist, it is dropped; if it does not exist, the statement has no effect. This capability is
+ useful in standardized setup scripts that remove existing schema objects and create new ones. By using some
+ combination of <code class="ph codeph">IF EXISTS</code> for the <code class="ph codeph">DROP</code> statements and <code class="ph codeph">IF NOT
+ EXISTS</code> clauses for the <code class="ph codeph">CREATE</code> statements, the script can run successfully the first
+ time you run it (when the objects do not exist yet) and subsequent times (when some or all of the objects do
+ already exist).
+ </p>
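+
+ <p class="p">
+ For example, a rerunnable setup script might combine the two clauses as follows. (The table name and
+ columns are illustrative.)
+ </p>
+
+<pre class="pre codeblock"><code>-- Safe to run whether or not the table already exists.
+drop table if exists staging_events;
+-- Safe to run whether or not a table with this name already exists.
+create table if not exists staging_events (id bigint, payload string);
+</code></pre>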
+
+ <p class="p">
+ <strong class="ph b">PURGE clause:</strong>
+ </p>
+
+ <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+ <span class="keyword">Impala 2.3</span> and higher, causes Impala to remove the associated
+ HDFS data files immediately, rather than going through the HDFS trashcan
+ mechanism. Use this keyword when dropping a table if it is crucial to
+ remove the data as quickly as possible to free up space, or if there is a
+ problem with the trashcan, such as the trashcan not being configured or
+ being in a different HDFS encryption zone than the data files. </p>
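+
+ <p class="p">
+ For example, to delete the data files immediately when dropping a hypothetical internal table:
+ </p>
+
+<pre class="pre codeblock"><code>-- Skip the HDFS trashcan; the data files are removed immediately.
+drop table large_staging_table purge;
+</code></pre>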
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ By default, Impala removes the associated HDFS directory and data files for the table. If you issue a
+ <code class="ph codeph">DROP TABLE</code> and the data files are not deleted, it might be for the following reasons:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ If the table was created with the
+ <code class="ph codeph"><a class="xref" href="impala_tables.html#external_tables">EXTERNAL</a></code> clause, Impala leaves all
+ files and directories untouched. Use external tables when the data is under the control of other Hadoop
+ components, and Impala is only used to query the data files from their original locations.
+ </li>
+
+ <li class="li">
+ Impala might leave the data files behind unintentionally, if there is no HDFS location available to hold
+ the HDFS trashcan for the <code class="ph codeph">impala</code> user. See
+ <a class="xref" href="impala_prereqs.html#prereqs_account">User Account Requirements</a> for the procedure to set up the required HDFS home
+ directory.
+ </li>
+ </ul>
+
+ <p class="p">
+ Make sure that you are in the correct database before dropping a table, either by issuing a
+ <code class="ph codeph">USE</code> statement first or by using a fully qualified name
+ <code class="ph codeph"><var class="keyword varname">db_name</var>.<var class="keyword varname">table_name</var></code>.
+ </p>
+
+ <p class="p">
+ If you intend to issue a <code class="ph codeph">DROP DATABASE</code> statement, first issue <code class="ph codeph">DROP TABLE</code>
+ statements to remove all the tables in that database.
+ </p>
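+
+ <p class="p">
+ For example, assuming a database <code class="ph codeph">scratch</code> containing two tables (the names are
+ illustrative):
+ </p>
+
+<pre class="pre codeblock"><code>-- Remove all tables first, then the database itself.
+drop table scratch.t1;
+drop table scratch.t2;
+drop database scratch;
+</code></pre>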
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>create database temporary;
+use temporary;
+create table unimportant (x int);
+create table trivial (s string);
+-- Drop a table in the current database.
+drop table unimportant;
+-- Switch to a different database.
+use default;
+-- To drop a table in a different database...
+drop table trivial;
+<em class="ph i">ERROR: AnalysisException: Table does not exist: default.trivial</em>
+-- ...use a fully qualified name.
+drop table temporary.trivial;</code></pre>
+
+ <p class="p">
+ For other tips about managing and reclaiming Impala disk space, see
+ <a class="xref" href="../shared/../topics/impala_disk_space.html#disk_space">Managing Disk Space for Impala Data</a>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">DROP TABLE</code> statement can remove data files from S3
+ if the associated S3 table is an internal table.
+ In <span class="keyword">Impala 2.6</span> and higher, as part of improved support for writing
+ to S3, Impala also removes the associated folder when dropping an internal table
+ that resides on S3.
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about working with S3 tables.
+ </p>
+
+ <div class="p">
+ For best compatibility with the S3 write support in <span class="keyword">Impala 2.6</span>
+ and higher:
+ <ul class="ul">
+ <li class="li">Use native Hadoop techniques to create data files in S3 for querying through Impala.</li>
+ <li class="li">Use the <code class="ph codeph">PURGE</code> clause of <code class="ph codeph">DROP TABLE</code> when dropping internal (managed) tables.</li>
+ </ul>
+ By default, when you drop an internal (managed) table, the data files are
+ moved to the HDFS trashcan. This operation is expensive for tables that
+ reside on the Amazon S3 filesystem. Therefore, for S3 tables, prefer to use
+ <code class="ph codeph">DROP TABLE <var class="keyword varname">table_name</var> PURGE</code> rather than the default <code class="ph codeph">DROP TABLE</code> statement.
+ The <code class="ph codeph">PURGE</code> clause makes Impala delete the data files immediately,
+ skipping the HDFS trashcan.
+ For the <code class="ph codeph">PURGE</code> clause to work effectively, you must originally create the
+ data files on S3 using one of the tools from the Hadoop ecosystem, such as
+ <code class="ph codeph">hadoop fs -cp</code>, or <code class="ph codeph">INSERT</code> in Impala or Hive.
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ For an internal table, the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, must have write
+ permission for all the files and directories that make up the table.
+ </p>
+ <p class="p">
+ For an external table, dropping the table only involves changes to metadata in the metastore database.
+ Because Impala does not remove any HDFS files or directories when external tables are dropped,
+ no particular permissions are needed for the associated HDFS files or directories.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Kudu tables can be managed or external, the same as with HDFS-based
+ tables. For a managed table, the underlying Kudu table and its data
+ are removed by <code class="ph codeph">DROP TABLE</code>. For an external table,
+ the underlying Kudu table and its data remain after a
+ <code class="ph codeph">DROP TABLE</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a>, <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_drop_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_drop_view.html b/docs/build3x/html/topics/impala_drop_view.html
new file mode 100644
index 0000000..523e50a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_drop_view.html
@@ -0,0 +1,80 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="drop_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>DROP VIEW Statement</title></head><body id="drop_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">DROP VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Removes the specified view, which was originally created by the <code class="ph codeph">CREATE VIEW</code> statement.
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">DROP VIEW</code> only involves changes to metadata in the metastore database, not any data files in
+ HDFS.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DROP VIEW [IF EXISTS] [<var class="keyword varname">db_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <div class="p">
+ The following example creates a series of views and then drops them. These examples illustrate how views
+ are associated with a particular database, and both the view definitions and the view names for
+ <code class="ph codeph">CREATE VIEW</code> and <code class="ph codeph">DROP VIEW</code> can refer to a view in the current database or
+ a fully qualified view name.
+<pre class="pre codeblock"><code>
+-- Create and drop a view in the current database.
+CREATE VIEW few_rows_from_t1 AS SELECT * FROM t1 LIMIT 10;
+DROP VIEW few_rows_from_t1;
+
+-- Create and drop a view referencing a table in a different database.
+CREATE VIEW table_from_other_db AS SELECT x FROM db1.foo WHERE x IS NOT NULL;
+DROP VIEW table_from_other_db;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Switch into the other database and drop the view.
+USE db2;
+DROP VIEW v1;
+
+USE db1;
+-- Create a view in a different database.
+CREATE VIEW db2.v1 AS SELECT * FROM db2.foo;
+-- Drop a view in the other database.
+DROP VIEW db2.v1;
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+ <a class="xref" href="impala_alter_view.html#alter_view">ALTER VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html b/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
new file mode 100644
index 0000000..9ca982a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_exec_single_node_rows_threshold.html
@@ -0,0 +1,89 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="exec_single_node_rows_threshold"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (Impala 2.1 or higher only)</title></head><body id="exec_single_node_rows_threshold"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXEC_SINGLE_NODE_ROWS_THRESHOLD Query Option (<span class="keyword">Impala 2.1</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ This setting controls the cutoff point (in terms of number of rows scanned) below which Impala treats a query
+ as a <span class="q">"small"</span> query, turning off optimizations such as parallel execution and native code generation. The
+ overhead of these optimizations is worthwhile for queries involving substantial amounts of data, but it
+ makes sense to skip them for queries involving tiny amounts of data. Reducing the overhead for small queries
+ allows Impala to complete them more quickly, keeping YARN resources, admission control slots, and so on
+ available for data-intensive queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=<var class="keyword varname">number_of_rows</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 100
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Typically, you increase the default value to make this optimization apply to more queries.
+ If incorrect or corrupted table and column statistics cause Impala to apply this optimization
+ incorrectly to queries that actually involve substantial work, you might see the queries being slower as a
+ result of remote reads. In that case, recompute statistics with the <code class="ph codeph">COMPUTE STATS</code>
+ or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement. If there is a problem collecting accurate
+ statistics, you can turn this feature off by setting the value to -1.
+ </p>
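+
+ <p class="p">
+ For example, to turn off the small-query optimization entirely for the current session:
+ </p>
+
+<pre class="pre codeblock"><code>-- A value of -1 disables the small-query optimization.
+SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=-1;
+</code></pre>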
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong>
+ </p>
+
+ <p class="p">
+ This setting applies to query fragments where the amount of data to scan can be accurately determined, either
+ through table and column statistics, or by the presence of a <code class="ph codeph">LIMIT</code> clause. If Impala cannot
+ accurately estimate the size of the input data, this setting does not apply.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, where Impala supports the complex data types <code class="ph codeph">STRUCT</code>,
+ <code class="ph codeph">ARRAY</code>, and <code class="ph codeph">MAP</code>, if a query refers to any column of those types,
+ the small-query optimization is turned off for that query regardless of the
+ <code class="ph codeph">EXEC_SINGLE_NODE_ROWS_THRESHOLD</code> setting.
+ </p>
+
+ <p class="p">
+ For a query that is determined to be <span class="q">"small"</span>, all work is performed on the coordinator node. This might
+ result in some I/O being performed by remote reads. The savings from not distributing the query work and not
+ generating native code are expected to outweigh any overhead from the remote reads.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.1</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ A common use case is to query just a few rows from a table to inspect typical data values. In this example,
+ Impala does not parallelize the query or perform native code generation because the result set is guaranteed
+ to be smaller than the threshold value from this query option:
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_SINGLE_NODE_ROWS_THRESHOLD=500;
+SELECT * FROM enormous_table LIMIT 300;
+</code></pre>
+
+
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_exec_time_limit_s.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_exec_time_limit_s.html b/docs/build3x/html/topics/impala_exec_time_limit_s.html
new file mode 100644
index 0000000..df2d28a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_exec_time_limit_s.html
@@ -0,0 +1,70 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="exec_time_limit_s"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>EXEC_TIME_LIMIT_S Query Option (Impala 2.12 or higher only)</title></head><body id="exec_time_limit_s"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">EXEC_TIME_LIMIT_S Query Option (<span class="keyword">Impala 2.12</span> or higher only)</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">EXEC_TIME_LIMIT_S</code> query option sets a time limit on query execution.
+ If a query is still executing when the time limit expires, it is automatically cancelled. The
+ option is intended to prevent runaway queries that execute for much longer than intended.
+ </p>
+
+ <p class="p">
+ For example, an Impala administrator could set a default value of
+ <code class="ph codeph">EXEC_TIME_LIMIT_S=3600</code> for a resource pool to automatically kill queries
+ that execute for longer than one hour (see
+ <a class="xref" href="impala_admission.html#admission_control">Admission Control and Query Queuing</a> for information about default query
+ options). Then, if a user accidentally runs a large query that executes for more than one
+ hour, it will be automatically killed after the time limit expires to free up resources.
+ Users can override the default value per query or per session if they do not want the
+ default <code class="ph codeph">EXEC_TIME_LIMIT_S</code> value to apply to a specific query or a
+ session.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The time limit only starts once the query is executing. Time spent planning the query,
+ scheduling the query, or in admission control is not counted towards the execution time
+ limit. <code class="ph codeph">SELECT</code> statements are eligible for automatic cancellation until
+ the client has fetched all result rows. DML queries are eligible for automatic
+ cancellation until the DML statement has finished.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_TIME_LIMIT_S=<var class="keyword varname">seconds</var>;</code></pre>
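+ <p class="p">
+ For example, to cap queries in the current session at ten minutes (a sketch; the table
+ name is hypothetical):
+ </p>
+
+<pre class="pre codeblock"><code>SET EXEC_TIME_LIMIT_S=600;
+-- Canceled automatically if still executing after 600 seconds.
+SELECT COUNT(*) FROM huge_fact_table;
+</code></pre>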
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (no time limit)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong>
+ <span class="keyword">Impala 2.12</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_timeouts.html#timeouts">Setting Timeout Periods for Daemons, Queries, and Sessions</a>
+ </p>
+
+ </div>
+
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_authorization.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_authorization.html b/docs/build3x/html/topics/impala_authorization.html
new file mode 100644
index 0000000..79c8cec
--- /dev/null
+++ b/docs/build3x/html/topics/impala_authorization.html
@@ -0,0 +1,1176 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_security.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="authorization"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Sentry Authorization for Impala</title></head><body id="authorization"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling Sentry Authorization for Impala</h1>
+
+
+ <div class="body conbody" id="authorization__sentry">
+
+ <p class="p">
+ Authorization determines which users are allowed to access which resources, and what operations they are
+ allowed to perform. In Impala 1.1 and higher, you use Apache Sentry for
+ authorization. Sentry adds a fine-grained authorization framework for Hadoop. By default (when authorization
+ is not enabled), Impala does all read and write operations with the privileges of the <code class="ph codeph">impala</code>
+ user, which is suitable for a development/test environment but not for a secure production environment. When
+ authorization is enabled, Impala uses the OS user ID of the user who runs <span class="keyword cmdname">impala-shell</span> or
+ other client program, and associates various privileges with each user.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Sentry is typically used in conjunction with Kerberos authentication, which defines which hosts are allowed
+ to connect to each server. Using the combination of Sentry and Kerberos prevents malicious users from being
+ able to connect by creating a named account on an untrusted machine. See
+ <a class="xref" href="impala_kerberos.html#kerberos">Enabling Kerberos Authentication for Impala</a> for details about Kerberos authentication.
+ </div>
+
+ <p class="p toc inpage">
+ See the following sections for details about using the Impala authorization features:
+ </p>
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_security.html">Impala Security</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="authorization__sentry_priv_model">
+
+ <h2 class="title topictitle2" id="ariaid-title2">The Sentry Privilege Model</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+ associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+ hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+ database systems such as MySQL.
+ </p>
+
+ <p class="p">
+ The object hierarchy for Impala covers Server, URI, Database, Table, and Column. (The Table privileges apply to views as well;
+ anywhere you specify a table name, you can specify a view name instead.)
+ Column-level authorization is available in <span class="keyword">Impala 2.3</span> and higher.
+ Previously, you constructed views to query specific columns and assigned privilege based on
+ the views rather than the base tables. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+ <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+ in a table.
+ </p>
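+ <p class="p">
+ For example, the following statements (a sketch; the role, group, table, and column
+ names are hypothetical) expose a single column of a table to a group:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE masked_reader;
+GRANT ROLE masked_reader TO GROUP interns;
+-- Grant access to the name column only, not the whole customers table.
+GRANT SELECT(name) ON TABLE sales_db.customers TO ROLE masked_reader;
+</code></pre>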
+
+ <p class="p">
+ A restricted set of privileges determines what you can do with each object:
+ </p>
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__select_priv">
+ SELECT privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you read data from a table or view, for example with the <code class="ph codeph">SELECT</code> statement, the
+ <code class="ph codeph">INSERT...SELECT</code> syntax, or <code class="ph codeph">CREATE TABLE...LIKE</code>. Also required to
+ issue the <code class="ph codeph">DESCRIBE</code> statement or the <code class="ph codeph">EXPLAIN</code> statement for a query
+ against a particular table. Only objects for which a user has this privilege are shown in the output
+ for <code class="ph codeph">SHOW DATABASES</code> and <code class="ph codeph">SHOW TABLES</code> statements. The
+ <code class="ph codeph">REFRESH</code> statement and <code class="ph codeph">INVALIDATE METADATA</code> statements only access
+ metadata for tables for which the user has this privilege.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__insert_priv">
+ INSERT privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you write data to a table. Applies to the <code class="ph codeph">INSERT</code> and <code class="ph codeph">LOAD DATA</code>
+ statements.
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="sentry_priv_model__all_priv">
+ ALL privilege
+ </dt>
+
+ <dd class="dd">
+ Lets you create or modify the object. Required to run DDL statements such as <code class="ph codeph">CREATE
+ TABLE</code>, <code class="ph codeph">ALTER TABLE</code>, or <code class="ph codeph">DROP TABLE</code> for a table,
+ <code class="ph codeph">CREATE DATABASE</code> or <code class="ph codeph">DROP DATABASE</code> for a database, or <code class="ph codeph">CREATE
+ VIEW</code>, <code class="ph codeph">ALTER VIEW</code>, or <code class="ph codeph">DROP VIEW</code> for a view. Also required for
+ the URI of the <span class="q">"location"</span> parameter for the <code class="ph codeph">CREATE EXTERNAL TABLE</code> and
+ <code class="ph codeph">LOAD DATA</code> statements.
+
+ </dd>
+
+
+ </dl>
+
+ <p class="p">
+ Privileges can be specified for a table or view before that object actually exists. If you do not have
+ sufficient privilege to perform an operation, the error message does not disclose if the object exists or
+ not.
+ </p>
+
+ <p class="p">
+ Originally, privileges were encoded in a policy file, stored in HDFS. This mode of operation is still an
+ option, but the emphasis of privilege management is moving towards being SQL-based. In addition to its
+ own <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements (added in Impala 2.0), Impala can make use of
+ privileges assigned through <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+ Hive. The mode of operation with <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements instead of
+ the policy file requires that a special Sentry service be enabled; this service stores, retrieves, and
+ manipulates privilege information stored inside the metastore database.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="authorization__secure_startup">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Starting the impalad Daemon with Sentry Authorization Enabled</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ To run the <span class="keyword cmdname">impalad</span> daemon with authorization enabled, you add one or more options to the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> declaration in the <span class="ph filepath">/etc/default/impala</span>
+ configuration file:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">-server_name</code> option turns on Sentry authorization for Impala. The authorization
+ rules refer to a symbolic server name, and you specify the name to use as the argument to the
+ <code class="ph codeph">-server_name</code> option.
+ </li>
+
+ <li class="li">
+ If you specify just <code class="ph codeph">-server_name</code>, Impala uses the Sentry service for authorization,
+ relying on the results of <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statements issued through
+ Hive. (This mode of operation is available in Impala 1.4.0 and higher.) Prior to Impala 1.4.0, or if you
+ want to continue storing privilege rules in the policy file, also specify the
+ <code class="ph codeph">-authorization_policy_file</code> option as in the following item.
+ </li>
+
+ <li class="li">
+ Specifying the <code class="ph codeph">-authorization_policy_file</code> option in addition to
+ <code class="ph codeph">-server_name</code> makes Impala read privilege information from a policy file, rather than
+ from the metastore database. The argument to the <code class="ph codeph">-authorization_policy_file</code> option
+ specifies the HDFS path to the policy file that defines the privileges on different schema objects.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example, you might adapt your <span class="ph filepath">/etc/default/impala</span> configuration to contain lines
+ like the following. To use the Sentry service rather than the policy file:
+ </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-server_name=server1 \
+...
+</code></pre>
+
+ <p class="p">
+ Or to use the policy file, as in releases prior to Impala 1.4:
+ </p>
+
+<pre class="pre codeblock"><code>IMPALA_SERVER_ARGS=" \
+-authorization_policy_file=/user/hive/warehouse/auth-policy.ini \
+-server_name=server1 \
+...
+</code></pre>
+
+ <p class="p">
+ The preceding examples set up a symbolic name of <code class="ph codeph">server1</code> to refer to the current instance
+ of Impala. This symbolic name is used in the following ways:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Specify the <code class="ph codeph">server1</code> value for the <code class="ph codeph">sentry.hive.server</code> property in the
+ <span class="ph filepath">sentry-site.xml</span> configuration file for Hive, as well as in the
+ <code class="ph codeph">-server_name</code> option for <span class="keyword cmdname">impalad</span>.
+ </p>
+ <p class="p">
+ If the <span class="keyword cmdname">impalad</span> daemon is not already running, start it as described in
+ <a class="xref" href="impala_processes.html#processes">Starting Impala</a>. If it is already running, restart it with the command
+ <code class="ph codeph">sudo /etc/init.d/impala-server restart</code>. Run the appropriate commands on all the nodes
+ where <span class="keyword cmdname">impalad</span> normally runs.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ If you use the mode of operation using the policy file, the rules in the <code class="ph codeph">[roles]</code>
+ section of the policy file refer to this same <code class="ph codeph">server1</code> name. For example, the following
+ rule sets up a role <code class="ph codeph">report_generator</code> that lets users with that role query any table in
+ a database named <code class="ph codeph">reporting_db</code> on a node where the <span class="keyword cmdname">impalad</span> daemon
+ was started up with the <code class="ph codeph">-server_name=server1</code> option:
+ </p>
+<pre class="pre codeblock"><code>[roles]
+report_generator = server=server1->db=reporting_db->table=*->action=SELECT
+</code></pre>
+ </li>
+ </ul>
+
+ <p class="p">
+ When <span class="keyword cmdname">impalad</span> is started with one or both of the <code class="ph codeph">-server_name=server1</code>
+ and <code class="ph codeph">-authorization_policy_file</code> options, Impala authorization is enabled. If Impala detects
+ any errors or inconsistencies in the authorization settings or the policy file, the daemon refuses to
+ start.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="authorization__sentry_service">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Using Impala with the Sentry Service (<span class="keyword">Impala 1.4</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you use the Sentry service rather than the policy file, you set up privileges through
+ <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> statement in either Impala or Hive, then both components
+ use those same privileges automatically. (Impala added the <code class="ph codeph">GRANT</code> and
+ <code class="ph codeph">REVOKE</code> statements in <span class="keyword">Impala 2.0</span>.)
+ </p>
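+ <p class="p">
+ For example, the following statements (a sketch; the role, group, and table names are
+ hypothetical) could be issued through either Impala or Hive, and the resulting
+ privileges are enforced by both components:
+ </p>
+
+<pre class="pre codeblock"><code>CREATE ROLE report_reader;
+GRANT ROLE report_reader TO GROUP analysts;
+GRANT SELECT ON TABLE reporting_db.orders TO ROLE report_reader;
+</code></pre>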
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="authorization__security_policy_file">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Using Impala with the Sentry Policy File</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The policy file is a file that you put in a designated location in HDFS, and is read during the startup of
+ the <span class="keyword cmdname">impalad</span> daemon when you specify both the <code class="ph codeph">-server_name</code> and
+ <code class="ph codeph">-authorization_policy_file</code> startup options. It controls which objects (databases, tables,
+ and HDFS directory paths) can be accessed by the user who connects to <span class="keyword cmdname">impalad</span>, and what
+ operations that user can perform on the objects.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The Sentry service, as described in <a class="xref" href="impala_authorization.html#sentry_service">Using Impala with the Sentry Service (Impala 1.4 or higher only)</a>, stores
+ authorization metadata in a relational database. This means you can manage user privileges for Impala tables
+ using traditional <code class="ph codeph">GRANT</code> and <code class="ph codeph">REVOKE</code> SQL statements, rather than the
+ policy file approach described here. If you are still using policy files, migrate to the
+ database-backed service whenever practical.
+ </p>
+ </div>
+
+ <p class="p">
+ The location of the policy file is listed in the <span class="ph filepath">auth-site.xml</span> configuration file. To
+ minimize overhead, the security information from this file is cached by each <span class="keyword cmdname">impalad</span>
+ daemon and refreshed automatically, with a default interval of 5 minutes. After making a substantial change
+ to security policies, restart all Impala daemons to pick up the changes immediately.
+ </p>
+
+ <p class="p toc inpage"></p>
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="security_policy_file__security_policy_file_details">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Policy File Location and Format</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The policy file uses the familiar <code class="ph codeph">.ini</code> format, divided into the major sections
+ <code class="ph codeph">[groups]</code> and <code class="ph codeph">[roles]</code>. There is also an optional
+ <code class="ph codeph">[databases]</code> section, which allows you to specify a specific policy file for a particular
+ database, as explained in <a class="xref" href="#security_multiple_policy_files">Using Multiple Policy Files for Different Databases</a>. Another optional section,
+ <code class="ph codeph">[users]</code>, allows you to override the OS-level mapping of users to groups; that is an
+ advanced technique primarily for testing and debugging, and is beyond the scope of this document.
+ </p>
+
+ <p class="p">
+ In the <code class="ph codeph">[groups]</code> section, you define various categories of users and select which roles
+ are associated with each category.
+ </p>
+
+ <p class="p">
+ The group and usernames in the <code class="ph codeph">[groups]</code> section correspond to Linux groups and users on
+ the server where the <span class="keyword cmdname">impalad</span> daemon runs. When you access Impala through the
+ <span class="keyword cmdname">impala-shell</span> interpreter, for purposes of authorization, the user is the logged-in Linux
+ user and the groups are the Linux groups that user is a member of. When you access Impala through the
+ ODBC or JDBC interfaces, the user and password specified through the connection string are used as login
+ credentials for the Linux server, and authorization is based on that username and the associated Linux
+ group membership.
+ </p>
+
+ <div class="p">
+ In the <code class="ph codeph">[roles]</code> section, you define a set of roles. For each role, you specify precisely the set
+ of privileges that is available. That is, which objects users with that role can access, and what operations
+ they can perform on those objects. This is the lowest-level category of security information; the other
+ sections in the policy file map the privileges to higher-level divisions of groups and users. In the
+ <code class="ph codeph">[groups]</code> section, you specify which roles are associated with which groups. The
+ privileges are specified using patterns like:
+<pre class="pre codeblock"><code>server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=SELECT
+server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=CREATE
+server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=ALL
+</code></pre>
+ For the <var class="keyword varname">server_name</var> value, substitute the same symbolic name you specify with the
+ <span class="keyword cmdname">impalad</span> <code class="ph codeph">-server_name</code> option. You can use <code class="ph codeph">*</code> wildcard
+ characters at each level of the privilege specification to allow access to all such objects. For example:
+<pre class="pre codeblock"><code>server=impala-host.example.com->db=default->table=t1->action=SELECT
+server=impala-host.example.com->db=*->table=*->action=CREATE
+server=impala-host.example.com->db=*->table=audit_log->action=SELECT
+server=impala-host.example.com->db=default->table=t1->action=*
+</code></pre>
+ </div>
+
+ <p class="p">
+ When authorization is enabled, Impala uses the policy file as a <em class="ph i">whitelist</em>, representing every
+ privilege available to any user on any object. That is, only operations specified for the appropriate
+ combination of object, role, group, and user are allowed; all other operations are not allowed. If a
+ group or role is defined multiple times in the policy file, the last definition takes precedence.
+ </p>
+
+ <p class="p">
+ To understand the notion of whitelisting, set up a minimal policy file that does not provide any
+ privileges for any object. When you connect to an Impala node where this policy file is in effect, you
+ get no results for <code class="ph codeph">SHOW DATABASES</code>, and an error when you issue any <code class="ph codeph">SHOW
+ TABLES</code>, <code class="ph codeph">USE <var class="keyword varname">database_name</var></code>, <code class="ph codeph">DESCRIBE
+<var class="keyword varname">table_name</var></code>, <code class="ph codeph">SELECT</code>, or other statements that expect to
+ access databases or tables, even if the corresponding databases and tables exist.
+ </p>
+
+ <p class="p">
+ The contents of the policy file are cached, to avoid a performance penalty for each query. The policy
+ file is re-checked by each <span class="keyword cmdname">impalad</span> node every 5 minutes. When you make a
+ non-time-sensitive change such as adding new privileges or new users, you can let the change take effect
+ automatically a few minutes later. If you remove or reduce privileges, and want the change to take effect
+ immediately, restart the <span class="keyword cmdname">impalad</span> daemon on all nodes, again specifying the
+ <code class="ph codeph">-server_name</code> and <code class="ph codeph">-authorization_policy_file</code> options so that the rules
+ from the updated policy file are applied.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="security_policy_file__security_examples">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Examples of Policy File Rules for Security Scenarios</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following examples show rules that might go in the policy file to deal with various
+ authorization-related scenarios. For illustration purposes, this section shows several very small policy
+ files with only a few rules each. In your environment, typically you would define many roles to cover all
+ the scenarios involving your own databases, tables, and applications, and a smaller number of groups,
+ whose members are given the privileges from one or more roles.
+ </p>
+
+ <div class="example" id="security_examples__sec_ex_unprivileged"><h4 class="title sectiontitle">A User with No Privileges</h4>
+
+
+
+ <p class="p">
+ If a user has no privileges at all, that user cannot access any schema objects in the system. The error
+ messages do not disclose the names or existence of objects that the user is not authorized to read.
+ </p>
+
+ <p class="p">
+
+ This is the experience you want a user to have if they somehow log into a system where they are not an
+ authorized Impala user. In a real deployment with a filled-in policy file, a user might have no
+ privileges because they are not a member of any of the relevant groups mentioned in the policy file.
+ </p>
+
+
+
+ </div>
+
+ <div class="example" id="security_examples__sec_ex_superuser"><h4 class="title sectiontitle">Examples of Privileges for Administrative Users</h4>
+
+
+
+ <p class="p">
+ When an administrative user has broad access to tables or databases, the associated rules in the
+ <code class="ph codeph">[roles]</code> section typically use wildcards and/or inheritance. For example, in the
+ following sample policy file, <code class="ph codeph">db=*</code> refers to all databases and
+ <code class="ph codeph">db=*->table=*</code> refers to all tables in all databases.
+ </p>
+
+ <p class="p">
+ Omitting the rightmost portion of a rule means that the privileges apply to all the objects that could
+ be specified there. For example, in the following sample policy file, the
+ <code class="ph codeph">all_databases</code> role has all privileges for all tables in all databases, while the
+ <code class="ph codeph">one_database</code> role has all privileges for all tables in one specific database. The
+ <code class="ph codeph">all_databases</code> role does not grant privileges on URIs, so a group with that role could
+ not issue a <code class="ph codeph">CREATE TABLE</code> statement with a <code class="ph codeph">LOCATION</code> clause. The
+ <code class="ph codeph">entire_server</code> role has all privileges on both databases and URIs within the server.
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = all_databases
+
+[roles]
+read_all_tables = server=server1->db=*->table=*->action=SELECT
+all_tables = server=server1->db=*->table=*
+all_databases = server=server1->db=*
+one_database = server=server1->db=test_db
+entire_server = server=server1
+</code></pre>
+
+ </div>
+
+ <div class="example" id="security_examples__sec_ex_detailed"><h4 class="title sectiontitle">A User with Privileges for Specific Databases and Tables</h4>
+
+
+
+ <p class="p">
+ If a user has privileges for specific tables in specific databases, the user can access those things
+ but nothing else. They can see the tables and their parent databases in the output of <code class="ph codeph">SHOW
+ TABLES</code> and <code class="ph codeph">SHOW DATABASES</code>, <code class="ph codeph">USE</code> the appropriate databases,
+ and perform the relevant actions (<code class="ph codeph">SELECT</code> and/or <code class="ph codeph">INSERT</code>) based on the
+ table privileges. To actually create a table requires the <code class="ph codeph">ALL</code> privilege at the
+ database level, so you might define separate roles for the user that sets up a schema and other users
+ or applications that perform day-to-day operations on the tables.
+ </p>
+
+ <p class="p">
+ The following sample policy file shows some of the syntax that is appropriate as the policy file grows,
+ such as the <code class="ph codeph">#</code> comment syntax, <code class="ph codeph">\</code> continuation syntax, and comma
+ separation for roles assigned to groups or privileges assigned to roles.
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+employee = training_sysadmin, instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1->db=training, \
+server=server1->db=instructor_private, \
+server=server1->db=lesson_development
+instructor = server=server1->db=training->table=*->action=*, \
+server=server1->db=instructor_private->table=*->action=*, \
+server=server1->db=lesson_development->table=lesson*
+# This particular course is all about queries, so the students can SELECT but not INSERT or CREATE/DROP.
+student = server=server1->db=training->table=lesson_*->action=SELECT
+</code></pre>
+
+ </div>
+
+
+
+ <div class="example" id="security_examples__sec_ex_external_files"><h4 class="title sectiontitle">Privileges for Working with External Data Files</h4>
+
+
+
+ <p class="p">
+ When data is being inserted through the <code class="ph codeph">LOAD DATA</code> statement, or is referenced from an
+ HDFS location outside the normal Impala database directories, the user also needs appropriate
+ permissions on the URIs corresponding to those HDFS locations.
+ </p>
+
+ <p class="p">
+ In this sample policy file:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">external_table</code> role lets us insert into and query the Impala table,
+ <code class="ph codeph">external_table.sample</code>.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">staging_dir</code> role lets us specify the HDFS path
+ <span class="ph filepath">/user/username/external_data</span> with the <code class="ph codeph">LOAD DATA</code> statement.
+ Remember, when Impala queries or loads data files, it operates on all the files in that directory,
+ not just a single file, so any Impala <code class="ph codeph">LOCATION</code> parameters refer to a directory
+ rather than an individual file.
+ </li>
+
+ <li class="li">
+ We included the IP address and port of the Hadoop name node in the HDFS URI of the
+ <code class="ph codeph">staging_dir</code> rule. We found those details in
+ <span class="ph filepath">/etc/hadoop/conf/core-site.xml</span>, under the <code class="ph codeph">fs.default.name</code>
+ element. That is what we use in any roles that specify URIs (that is, the locations of directories in
+ HDFS).
+ </li>
+
+ <li class="li">
+ We start this example after the table <code class="ph codeph">external_table.sample</code> is already created. In
+ the policy file for the example, we have already taken away the <code class="ph codeph">external_table_admin</code>
+ role from the <code class="ph codeph">username</code> group, and replaced it with the lesser-privileged
+ <code class="ph codeph">external_table</code> role.
+ </li>
+
+ <li class="li">
+ We assign privileges to a subdirectory underneath <span class="ph filepath">/user/username</span> in HDFS,
+ because such privileges also apply to any subdirectories underneath. If we had assigned privileges to
+ the parent directory <span class="ph filepath">/user/username</span>, it would be too likely to mess up other
+ files by specifying a wrong location by mistake.
+ </li>
+
+ <li class="li">
+ The <code class="ph codeph">username</code> under the <code class="ph codeph">[groups]</code> section refers to the
+ <code class="ph codeph">username</code> group. (In this example, there is a <code class="ph codeph">username</code> user
+ that is a member of a <code class="ph codeph">username</code> group.)
+ </li>
+ </ul>
+
+ <p class="p">
+ Policy file:
+ </p>
+
+<pre class="pre codeblock"><code>[groups]
+username = external_table, staging_dir
+
+[roles]
+external_table_admin = server=server1->db=external_table
+external_table = server=server1->db=external_table->table=sample->action=*
+staging_dir = server=server1->uri=hdfs://127.0.0.1:8020/user/username/external_data->action=*
+</code></pre>
+
+ <p class="p">
+ <span class="keyword cmdname">impala-shell</span> session:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > use external_table;
+Query: use external_table
+[localhost:21000] > show tables;
+Query: show tables
+Query finished, fetching results ...
++--------+
+| name |
++--------+
+| sample |
++--------+
+Returned 1 row(s) in 0.02s
+
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-----+
+| x |
++-----+
+| 1 |
+| 5 |
+| 150 |
++-----+
+Returned 3 row(s) in 1.04s
+
+[localhost:21000] > load data inpath '/user/username/external_data' into table sample;
+Query: load data inpath '/user/username/external_data' into table sample
+Query finished, fetching results ...
++----------------------------------------------------------+
+| summary |
++----------------------------------------------------------+
+| Loaded 1 file(s). Total files in destination location: 2 |
++----------------------------------------------------------+
+Returned 1 row(s) in 0.26s
+[localhost:21000] > select * from sample;
+Query: select * from sample
+Query finished, fetching results ...
++-------+
+| x |
++-------+
+| 2 |
+| 4 |
+| 6 |
+| 8 |
+| 64738 |
+| 49152 |
+| 1 |
+| 5 |
+| 150 |
++-------+
+Returned 9 row(s) in 0.22s
+
+[localhost:21000] > load data inpath '/user/username/unauthorized_data' into table sample;
+Query: load data inpath '/user/username/unauthorized_data' into table sample
+ERROR: AuthorizationException: User 'username' does not have privileges to access: hdfs://127.0.0.1:8020/user/username/unauthorized_data
+</code></pre>
+
+ </div>
+
+
+
+ <div class="example" id="security_examples__sec_sysadmin"><h4 class="title sectiontitle">Separating Administrator Responsibility from Read and Write Privileges</h4>
+
+
+
+ <p class="p">
+          Remember that creating a database requires full privilege on that database, while day-to-day
+          operations on tables within that database can be performed with lower levels of privilege on specific
+          tables. Thus, you might set up separate roles for each database or application: an administrative one
+          that could create or drop the database, and a user-level one that can access only the relevant tables.
+ </p>
+
+ <p class="p">
+          For example, this policy file divides responsibilities among users in three different groups:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Members of the <code class="ph codeph">supergroup</code> group have the <code class="ph codeph">training_sysadmin</code> role and
+ so can set up a database named <code class="ph codeph">training</code>.
+ </li>
+
+ <li class="li"> Members of the <code class="ph codeph">employee</code> group have the
+ <code class="ph codeph">instructor</code> role and so can create, insert into,
+ and query any tables in the <code class="ph codeph">training</code> database,
+ but cannot create or drop the database itself. </li>
+
+ <li class="li">
+ Members of the <code class="ph codeph">visitor</code> group have the <code class="ph codeph">student</code> role and so can query
+ those tables in the <code class="ph codeph">training</code> database.
+ </li>
+ </ul>
+
+<pre class="pre codeblock"><code>[groups]
+supergroup = training_sysadmin
+employee = instructor
+visitor = student
+
+[roles]
+training_sysadmin = server=server1->db=training
+instructor = server=server1->db=training->table=*->action=*
+student = server=server1->db=training->table=*->action=SELECT
+</code></pre>
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="security_policy_file__security_multiple_policy_files">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Using Multiple Policy Files for Different Databases</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For an Impala cluster with many databases being accessed by many users and applications, it might be
+ cumbersome to update the security policy file for each privilege change or each new database, table, or
+ view. You can allow security to be managed separately for individual databases, by setting up a separate
+ policy file for each database:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Add the optional <code class="ph codeph">[databases]</code> section to the main policy file.
+ </li>
+
+ <li class="li">
+ Add entries in the <code class="ph codeph">[databases]</code> section for each database that has its own policy file.
+ </li>
+
+ <li class="li">
+ For each listed database, specify the HDFS path of the appropriate policy file.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>[databases]
+# Defines the location of the per-DB policy files for the 'customers' and 'sales' databases.
+customers = hdfs://ha-nn-uri/etc/access/customers.ini
+sales = hdfs://ha-nn-uri/etc/access/sales.ini
+</code></pre>
+
+ <p class="p">
+ To enable URIs in per-DB policy files, the Java configuration option <code class="ph codeph">sentry.allow.uri.db.policyfile</code>
+ must be set to <code class="ph codeph">true</code>. For example:
+ </p>
+
+<pre class="pre codeblock"><code>JAVA_TOOL_OPTIONS="-Dsentry.allow.uri.db.policyfile=true"
+</code></pre>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+        Enabling URIs in per-DB policy files introduces a security risk: the owner of a db-level
+        policy file can grant themselves load privileges to anything the <code class="ph codeph">impala</code> user has
+        read permission for in HDFS (including data in other databases controlled by different db-level policy
+        files).
+ </div>
+ </div>
+ </article>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="authorization__security_schema">
+
+ <h2 class="title topictitle2" id="ariaid-title9">Setting Up Schema Objects for a Secure Impala Deployment</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Remember that in your role definitions, you specify privileges at the level of individual databases and
+ tables, or all databases or all tables within a database. To simplify the structure of these rules, plan
+ ahead of time how to name your schema objects so that data with different authorization requirements is
+ divided into separate databases.
+ </p>
+
+ <p class="p">
+ If you are adding security on top of an existing Impala deployment, remember that you can rename tables or
+ even move them between databases using the <code class="ph codeph">ALTER TABLE</code> statement. In Impala, creating new
+ databases is a relatively inexpensive operation, basically just creating a new directory in HDFS.
+ </p>
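+
+      <p class="p">
+        For example, a sequence like the following (with hypothetical database and table names) moves a
+        table into a database whose privileges are managed separately:
+      </p>
+
+<pre class="pre codeblock"><code>-- Group sensitive tables into their own database so the policy
+-- file can grant privileges on that database as a unit.
+CREATE DATABASE sensitive_data;
+ALTER TABLE default.customer_accounts RENAME TO sensitive_data.customer_accounts;
+</code></pre>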
+
+ <p class="p">
+ You can also plan the security scheme and set up the policy file before the actual schema objects named in
+ the policy file exist. Because the authorization capability is based on whitelisting, a user can only
+ create a new database or table if the required privilege is already in the policy file: either by listing
+        the exact name of the object being created, or by using a <code class="ph codeph">*</code> wildcard to match all the applicable
+ objects within the appropriate container.
+ </p>
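+
+      <p class="p">
+        For example, a rule like the following (with a hypothetical role and database name) can be added
+        to the policy file before the database exists; once the <code class="ph codeph">new_app</code> database
+        is created, members of the associated group can immediately create and drop tables within it:
+      </p>
+
+<pre class="pre codeblock"><code>[roles]
+new_app_admin = server=server1->db=new_app
+</code></pre>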
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="authorization__security_privileges">
+
+ <h2 class="title topictitle2" id="ariaid-title10">Privilege Model and Object Hierarchy</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Privileges can be granted on different objects in the schema. Any privilege that can be granted is
+ associated with a level in the object hierarchy. If a privilege is granted on a container object in the
+ hierarchy, the child object automatically inherits it. This is the same privilege model as Hive and other
+ database systems such as MySQL.
+ </p>
+
+ <p class="p">
+ The kinds of objects in the schema hierarchy are:
+ </p>
+
+<pre class="pre codeblock"><code>Server
+URI
+Database
+ Table
+</code></pre>
+
+ <p class="p">
+ The server name is specified by the <code class="ph codeph">-server_name</code> option when <span class="keyword cmdname">impalad</span>
+ starts. Specify the same name for all <span class="keyword cmdname">impalad</span> nodes in the cluster.
+ </p>
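+
+      <p class="p">
+        For example, you might start each daemon with an option such as the following, so that the name
+        matches the <code class="ph codeph">server=</code> element used in the policy file rules:
+      </p>
+
+<pre class="pre codeblock"><code>impalad -server_name=server1 ...
+</code></pre>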
+
+ <p class="p">
+ URIs represent the HDFS paths you specify as part of statements such as <code class="ph codeph">CREATE EXTERNAL
+ TABLE</code> and <code class="ph codeph">LOAD DATA</code>. Typically, you specify what look like UNIX paths, but these
+ locations can also be prefixed with <code class="ph codeph">hdfs://</code> to make clear that they are really URIs. To
+ set privileges for a URI, specify the name of a directory, and the privilege applies to all the files in
+ that directory and any directories underneath it.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, you can specify privileges for individual columns.
+ Formerly, to specify read privileges at this level, you created a view that queried specific columns
+ and/or partitions from a base table, and gave <code class="ph codeph">SELECT</code> privilege on the view but not
+ the underlying table. Now, you can use Impala's <a class="xref" href="impala_grant.html">GRANT Statement (Impala 2.0 or higher only)</a> and
+ <a class="xref" href="impala_revoke.html">REVOKE Statement (Impala 2.0 or higher only)</a> statements to assign and revoke privileges from specific columns
+ in a table.
+ </p>
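+
+      <p class="p">
+        For example, statements like the following (with hypothetical role, table, and column names) grant
+        and then revoke read access to a single column:
+      </p>
+
+<pre class="pre codeblock"><code>GRANT SELECT(salary) ON TABLE hr.employees TO ROLE analyst_role;
+REVOKE SELECT(salary) ON TABLE hr.employees FROM ROLE analyst_role;
+</code></pre>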
+
+ <div class="p">
+ URIs must start with either <code class="ph codeph">hdfs://</code> or <code class="ph codeph">file://</code>. If a URI starts with
+ anything else, it will cause an exception and the policy file will be invalid. When defining URIs for HDFS,
+ you must also specify the NameNode. For example:
+<pre class="pre codeblock"><code>data_read = server=server1->uri=file:///path/to/dir, \
+server=server1->uri=hdfs://namenode:port/path/to/dir
+</code></pre>
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <p class="p">
+ Because the NameNode host and port must be specified, enable High Availability (HA) to ensure
+ that the URI will remain constant even if the NameNode changes.
+ </p>
+<pre class="pre codeblock"><code>data_read = server=server1->uri=file:///path/to/dir,\
+server=server1->uri=hdfs://ha-nn-uri/path/to/dir
+</code></pre>
+ </div>
+ </div>
+
+
+
+
+
+ <table class="table"><caption><span class="table--title-label">Table 1. </span><span class="title">Valid privilege types and objects they apply to</span></caption><colgroup><col style="width:33.33333333333333%"><col style="width:66.66666666666666%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="security_privileges__entry__1"><strong class="ph b">Privilege</strong></th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__2"><strong class="ph b">Object</strong></th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">DB, TABLE, COLUMN</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__1 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__2 ">SERVER, TABLE, DB, URI</td>
+ </tr>
+ </tbody></table>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ Although this document refers to the <code class="ph codeph">ALL</code> privilege, currently if you use the policy file
+ mode, you do not use the actual keyword <code class="ph codeph">ALL</code> in the policy file. When you code role
+ entries in the policy file:
+ </p>
+ <ul class="ul">
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a server, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var></code>.
+ </li>
+
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a database, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var></code>.
+ </li>
+
+ <li class="li">
+ To specify the <code class="ph codeph">ALL</code> privilege for a table, use a role like
+ <code class="ph codeph">server=<var class="keyword varname">server_name</var>->db=<var class="keyword varname">database_name</var>->table=<var class="keyword varname">table_name</var>->action=*</code>.
+ </li>
+ </ul>
+ </div>
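+
+      <p class="p">
+        Putting those three forms together, a policy file fragment that grants the equivalent of
+        <code class="ph codeph">ALL</code> at each level might look like this (with hypothetical role and
+        object names):
+      </p>
+
+<pre class="pre codeblock"><code>[roles]
+entire_server_admin = server=server1
+sales_db_admin = server=server1->db=sales
+orders_table_admin = server=server1->db=sales->table=orders->action=*
+</code></pre>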
+ <table class="table"><caption></caption><colgroup><col style="width:29.241071428571423%"><col style="width:26.116071428571423%"><col style="width:22.32142857142857%"><col style="width:22.32142857142857%"></colgroup><thead class="thead">
+ <tr class="row">
+ <th class="entry nocellnorowborder" id="security_privileges__entry__9">
+ Operation
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__10">
+ Scope
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__11">
+ Privileges
+ </th>
+ <th class="entry nocellnorowborder" id="security_privileges__entry__12">
+ URI
+ </th>
+ </tr>
+ </thead><tbody class="tbody">
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">EXPLAIN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE; COLUMN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">LOAD DATA</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DESCRIBE TABLE<p class="p">-Output shows <em class="ph i">all</em> columns if the
+                user has table-level privileges or <code class="ph codeph">SELECT</code>
+                privilege on at least one table column</p></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD COLUMNS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. REPLACE COLUMNS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. CHANGE column</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. RENAME</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET TBLPROPERTIES</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET FILEFORMAT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. ADD PARTITION location</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. DROP PARTITION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. PARTITION SET FILEFORMAT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET SERDEPROPERTIES</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE VIEW<p class="p">-This operation is allowed if you have
+ column-level <code class="ph codeph">SELECT</code> access to the columns
+ being used.</p></td>
+              <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">DATABASE; SELECT on TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP VIEW</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__alter_view_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ ALTER VIEW
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ You need <code class="ph codeph">ALL</code> privilege on the named view <span class="ph">and the parent
+ database</span>, plus <code class="ph codeph">SELECT</code> privilege for any tables or views referenced by the
+ view query. Once the view is created or altered by a high-privileged system administrator, it can
+ be queried by a lower-privileged user who does not have full query privileges for the base tables.
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ ALL, SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">ALTER TABLE .. SET LOCATION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL on DATABASE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 ">URI</td>
+ </tr>
+ <tr class="row" id="security_privileges__create_external_table_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ CREATE EXTERNAL TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ Database (ALL), URI (SELECT)
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ ALL, SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">SELECT<p class="p">-You can grant the SELECT privilege on a view to
+ give users access to specific columns of a table they do not
+ otherwise have access to.</p><p class="p">-See
+ <span class="xref">the documentation for Apache Sentry</span>
+ for details on allowed column-level
+ operations.</p></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">VIEW/TABLE; COLUMN</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">USE <dbName></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">Any</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">CREATE FUNCTION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">DROP FUNCTION</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">REFRESH <table name> or REFRESH <table name> PARTITION (<partition_spec>)</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">SERVER</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">INVALIDATE METADATA <table name></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">SELECT/INSERT</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">COMPUTE STATS</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">TABLE</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">ALL</td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_table_stats_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW TABLE STATS, SHOW PARTITIONS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT/INSERT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" id="security_privileges__show_column_stats_privs" headers="security_privileges__entry__9 ">
+ SHOW COLUMN STATS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ TABLE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT/INSERT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row">
+ <td class="entry nocellnorowborder" id="security_privileges__show_functions_privs" headers="security_privileges__entry__9 ">
+ SHOW FUNCTIONS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 ">
+ DATABASE
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ SELECT
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_tables_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW TABLES
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ No special privileges needed to issue the statement, but only shows objects you are authorized for
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ <tr class="row" id="security_privileges__show_databases_privs">
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__9 ">
+ SHOW DATABASES, SHOW SCHEMAS
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__10 "></td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__11 ">
+ No special privileges needed to issue the statement, but only shows objects you are authorized for
+ </td>
+ <td class="entry nocellnorowborder" headers="security_privileges__entry__12 "></td>
+ </tr>
+ </tbody></table>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="authorization__sentry_debug">
+
+ <h2 class="title topictitle2" id="ariaid-title11"><span class="ph">Debugging Failed Sentry Authorization Requests</span></h2>
+
+ <div class="body conbody">
+
+ <div class="p">
+ Sentry logs all facts that lead up to authorization decisions at the debug level. If you do not understand
+ why Sentry is denying access, the best way to debug is to temporarily turn on debug logging:
+ <ul class="ul">
+ <li class="li">
+ Add <code class="ph codeph">log4j.logger.org.apache.sentry=DEBUG</code> to the <span class="ph filepath">log4j.properties</span>
+ file on each host in the cluster, in the appropriate configuration directory for each service.
+ </li>
+ </ul>
+ Specifically, look for exceptions and messages such as:
+<pre class="pre codeblock"><code>FilePermission server..., RequestPermission server...., result [true|false]</code></pre>
+ which indicate each evaluation Sentry makes. The <code class="ph codeph">FilePermission</code> is from the policy file,
+ while <code class="ph codeph">RequestPermission</code> is the privilege required for the query. A
+ <code class="ph codeph">RequestPermission</code> will iterate over all appropriate <code class="ph codeph">FilePermission</code>
+ settings until a match is found. If no matching privilege is found, Sentry returns <code class="ph codeph">false</code>
+          indicating <span class="q">"Access Denied"</span>.
+
+ </div>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="authorization__sec_ex_default">
+
+ <h2 class="title topictitle2" id="ariaid-title12">The DEFAULT Database in a Secure Deployment</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because of the extra emphasis on granular access controls in a secure deployment, you should move any
+ important or sensitive information out of the <code class="ph codeph">DEFAULT</code> database into a named database whose
+ privileges are specified in the policy file. Sometimes you might need to give privileges on the
+ <code class="ph codeph">DEFAULT</code> database for administrative reasons; for example, as a place you can reliably
+ specify with a <code class="ph codeph">USE</code> statement when preparing to drop a database.
+ </p>
+
+
+ </div>
+ </article>
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_avg.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_avg.html b/docs/build3x/html/topics/impala_avg.html
new file mode 100644
index 0000000..a63791f
--- /dev/null
+++ b/docs/build3x/html/topics/impala_avg.html
@@ -0,0 +1,318 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_aggregate_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="avg"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>AVG Function</title></head><body id="avg"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">AVG Function</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ An aggregate function that returns the average value from a set of numbers or <code class="ph codeph">TIMESTAMP</code> values.
+      Its single argument can be a numeric column, or the numeric result of a function or expression applied to the
+ column value. Rows with a <code class="ph codeph">NULL</code> value for the specified column are ignored. If the table is empty,
+ or all the values supplied to <code class="ph codeph">AVG</code> are <code class="ph codeph">NULL</code>, <code class="ph codeph">AVG</code> returns
+ <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>AVG([DISTINCT | ALL] <var class="keyword varname">expression</var>) [OVER (<var class="keyword varname">analytic_clause</var>)]
+</code></pre>
+
+ <p class="p">
+ When the query contains a <code class="ph codeph">GROUP BY</code> clause, returns one value for each combination of
+ grouping values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">DOUBLE</code> for numeric values; <code class="ph codeph">TIMESTAMP</code> for
+ <code class="ph codeph">TIMESTAMP</code> values
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ To access a column with a complex type (<code class="ph codeph">ARRAY</code>, <code class="ph codeph">STRUCT</code>, or <code class="ph codeph">MAP</code>)
+ in an aggregation function, you unpack the individual elements using join notation in the query,
+ and then apply the function to the final scalar item, field, key, or value at the bottom of any nested type hierarchy in the column.
+ See <a class="xref" href="../shared/../topics/impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types in Impala.
+ </p>
+
+ <div class="p">
+The following example demonstrates calls to several aggregation functions
+using values from a column containing nested complex types
+(an <code class="ph codeph">ARRAY</code> of <code class="ph codeph">STRUCT</code> items).
+The array is unpacked inside the query using join notation.
+The array elements are referenced using the <code class="ph codeph">ITEM</code>
+pseudocolumn, and the structure fields inside the array elements
+are referenced using dot notation.
+Numeric aggregates such as <code class="ph codeph">SUM()</code> and <code class="ph codeph">AVG()</code>
+are computed using the numeric <code class="ph codeph">N_NATIONKEY</code> field, and
+the general-purpose <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>
+aggregates are computed from the string <code class="ph codeph">N_NAME</code> field.
+<pre class="pre codeblock"><code>describe region;
++-------------+-------------------------+---------+
+| name | type | comment |
++-------------+-------------------------+---------+
+| r_regionkey | smallint | |
+| r_name | string | |
+| r_comment | string | |
+| r_nations | array<struct< | |
+| | n_nationkey:smallint, | |
+| | n_name:string, | |
+| | n_comment:string | |
+| | >> | |
++-------------+-------------------------+---------+
+
+select r_name, r_nations.item.n_nationkey
+ from region, region.r_nations as r_nations
+order by r_name, r_nations.item.n_nationkey;
++-------------+------------------+
+| r_name | item.n_nationkey |
++-------------+------------------+
+| AFRICA | 0 |
+| AFRICA | 5 |
+| AFRICA | 14 |
+| AFRICA | 15 |
+| AFRICA | 16 |
+| AMERICA | 1 |
+| AMERICA | 2 |
+| AMERICA | 3 |
+| AMERICA | 17 |
+| AMERICA | 24 |
+| ASIA | 8 |
+| ASIA | 9 |
+| ASIA | 12 |
+| ASIA | 18 |
+| ASIA | 21 |
+| EUROPE | 6 |
+| EUROPE | 7 |
+| EUROPE | 19 |
+| EUROPE | 22 |
+| EUROPE | 23 |
+| MIDDLE EAST | 4 |
+| MIDDLE EAST | 10 |
+| MIDDLE EAST | 11 |
+| MIDDLE EAST | 13 |
+| MIDDLE EAST | 20 |
++-------------+------------------+
+
+select
+ r_name,
+ count(r_nations.item.n_nationkey) as count,
+ sum(r_nations.item.n_nationkey) as sum,
+ avg(r_nations.item.n_nationkey) as avg,
+ min(r_nations.item.n_name) as minimum,
+ max(r_nations.item.n_name) as maximum,
+ ndv(r_nations.item.n_nationkey) as distinct_vals
+from
+ region, region.r_nations as r_nations
+group by r_name
+order by r_name;
++-------------+-------+-----+------+-----------+----------------+---------------+
+| r_name | count | sum | avg | minimum | maximum | distinct_vals |
++-------------+-------+-----+------+-----------+----------------+---------------+
+| AFRICA | 5 | 50 | 10 | ALGERIA | MOZAMBIQUE | 5 |
+| AMERICA | 5 | 47 | 9.4 | ARGENTINA | UNITED STATES | 5 |
+| ASIA | 5 | 68 | 13.6 | CHINA | VIETNAM | 5 |
+| EUROPE | 5 | 77 | 15.4 | FRANCE | UNITED KINGDOM | 5 |
+| MIDDLE EAST | 5 | 58 | 11.6 | EGYPT | SAUDI ARABIA | 5 |
++-------------+-------+-----+------+-----------+----------------+---------------+
+</code></pre>
+</div>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>-- Average all the non-NULL values in a column.
+insert overwrite avg_t values (2),(4),(6),(null),(null);
+-- The average of the above values is 4: (2+4+6) / 3. The 2 NULL values are ignored.
+select avg(x) from avg_t;
+-- Average only certain values from the column.
+select avg(x) from t1 where month = 'January' and year = '2013';
+-- Apply a calculation to the value of the column before averaging.
+select avg(x/3) from t1;
+-- Apply a function to the value of the column before averaging.
+-- Here we are substituting a value of 0 for all NULLs in the column,
+-- so that those rows do factor into the return value.
+select avg(isnull(x,0)) from t1;
+-- Apply some number-returning function to a string column and average the results.
+-- If column s contains any NULLs, length(s) also returns NULL and those rows are ignored.
+select avg(length(s)) from t1;
+-- Can also be used in combination with DISTINCT and/or GROUP BY.
+-- Return more than one result.
+select month, year, avg(page_visits) from web_stats group by month, year;
+-- Filter the input to eliminate duplicates before performing the calculation.
+select avg(distinct x) from t1;
+-- Filter the output after performing the calculation.
+select avg(x) from t1 group by y having avg(x) between 1 and 20;
+</code></pre>
+
+ <div class="p">
+ The following examples show how to use <code class="ph codeph">AVG()</code> in an analytic context. They use a table
+ containing integers from 1 to 10. Notice how the <code class="ph codeph">AVG()</code> is reported for each input value, as
+ opposed to the <code class="ph codeph">GROUP BY</code> clause which condenses the result set.
+<pre class="pre codeblock"><code>select x, property, avg(x) over (partition by property) as avg from int_t where property in ('odd','even');
++----+----------+-----+
+| x | property | avg |
++----+----------+-----+
+| 2 | even | 6 |
+| 4 | even | 6 |
+| 6 | even | 6 |
+| 8 | even | 6 |
+| 10 | even | 6 |
+| 1 | odd | 5 |
+| 3 | odd | 5 |
+| 5 | odd | 5 |
+| 7 | odd | 5 |
+| 9 | odd | 5 |
++----+----------+-----+
+</code></pre>
+
+Adding an <code class="ph codeph">ORDER BY</code> clause lets you experiment with results that are cumulative or apply to a moving
+set of rows (the <span class="q">"window"</span>). The following examples use <code class="ph codeph">AVG()</code> in an analytic context
+(that is, with an <code class="ph codeph">OVER()</code> clause) to produce a running average of all the even values,
+then a running average of all the odd values. The basic <code class="ph codeph">ORDER BY x</code> clause implicitly
+activates a window clause of <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>,
+which is effectively the same as <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>;
+therefore, all of these examples produce the same results:
+<pre class="pre codeblock"><code>select x, property,
+ avg(x) over (partition by property <strong class="ph b">order by x</strong>) as 'cumulative average'
+ from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between unbounded preceding and current row</strong>
+ ) as 'cumulative average'
+from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between unbounded preceding and current row</strong>
+ ) as 'cumulative average'
+ from int_t where property in ('odd','even');
++----+----------+--------------------+
+| x | property | cumulative average |
++----+----------+--------------------+
+| 2 | even | 2 |
+| 4 | even | 3 |
+| 6 | even | 4 |
+| 8 | even | 5 |
+| 10 | even | 6 |
+| 1 | odd | 1 |
+| 3 | odd | 2 |
+| 5 | odd | 3 |
+| 7 | odd | 4 |
+| 9 | odd | 5 |
++----+----------+--------------------+
+</code></pre>
+
+The following examples show how to construct a moving window, with a running average taking into account 1 row before
+and 1 row after the current row, within the same partition (all the even values or all the odd values).
+Because of a restriction in the Impala <code class="ph codeph">RANGE</code> syntax, this type of
+moving window is possible with the <code class="ph codeph">ROWS BETWEEN</code> clause but not the <code class="ph codeph">RANGE BETWEEN</code>
+clause:
+<pre class="pre codeblock"><code>select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">rows between 1 preceding and 1 following</strong>
+ ) as 'moving average'
+ from int_t where property in ('odd','even');
++----+----------+----------------+
+| x | property | moving average |
++----+----------+----------------+
+| 2 | even | 3 |
+| 4 | even | 4 |
+| 6 | even | 6 |
+| 8 | even | 8 |
+| 10 | even | 9 |
+| 1 | odd | 2 |
+| 3 | odd | 3 |
+| 5 | odd | 5 |
+| 7 | odd | 7 |
+| 9 | odd | 8 |
++----+----------+----------------+
+
+-- Doesn't work because of syntax restriction on RANGE clause.
+select x, property,
+ avg(x) over
+ (
+ partition by property
+ <strong class="ph b">order by x</strong>
+ <strong class="ph b">range between 1 preceding and 1 following</strong>
+ ) as 'moving average'
+from int_t where property in ('odd','even');
+ERROR: AnalysisException: RANGE is only supported with both the lower and upper bounds UNBOUNDED or one UNBOUNDED and the other CURRENT ROW.
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+
+
+ <p class="p">
+      Because arithmetic on <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns uses
+      high-performance hardware instructions, and because distributed queries can perform these operations in a different
+      order for each query, results can vary slightly for aggregate function calls such as <code class="ph codeph">SUM()</code>
+ and <code class="ph codeph">AVG()</code> for <code class="ph codeph">FLOAT</code> and <code class="ph codeph">DOUBLE</code> columns, particularly on
+ large data sets where millions or billions of values are summed or averaged. For perfect consistency and
+ repeatability, use the <code class="ph codeph">DECIMAL</code> data type for such operations instead of
+ <code class="ph codeph">FLOAT</code> or <code class="ph codeph">DOUBLE</code>.
+ </p>
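+
+    <div class="p">
+      For example, a sketch of this approach (the <code class="ph codeph">sales</code> table and its
+      <code class="ph codeph">price</code> column are hypothetical):
+<pre class="pre codeblock"><code>-- Results can vary slightly from run to run on large data sets:
+select avg(price) from sales;
+-- Casting the input to DECIMAL makes the result repeatable:
+select avg(cast(price as decimal(12,2))) from sales;
+</code></pre>
+    </div>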
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#analytic_functions">Impala Analytic Functions</a>, <a class="xref" href="impala_max.html#max">MAX Function</a>,
+ <a class="xref" href="impala_min.html#min">MIN Function</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_aggregate_functions.html">Impala Aggregate Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html b/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
new file mode 100644
index 0000000..74ec966
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_fallback_schema_resolution.html
@@ -0,0 +1,55 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_fallback_schema_resolution"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (Impala 2.6 or higher only)</title></head><body id="parquet_fallback_schema_resolution"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FALLBACK_SCHEMA_RESOLUTION Query Option (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <div class="p">
+
+ The <code class="ph codeph">PARQUET_FALLBACK_SCHEMA_RESOLUTION</code> query option allows Impala to look
+ up columns within Parquet files by column name, rather than column order,
+ when necessary.
+ The allowed values are:
+ <ul class="ul">
+ <li class="li">
+ POSITION (0)
+ </li>
+ <li class="li">
+ NAME (1)
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ By default, Impala looks up columns within a Parquet file based on
+ the order of columns in the table.
+ The <code class="ph codeph">name</code> setting for this option enables behavior for
+      Impala queries similar to the Hive setting <code class="ph codeph">parquet.column.index.access=false</code>.
+ It also allows Impala to query Parquet files created by Hive with the
+ <code class="ph codeph">parquet.column.index.access=false</code> setting in effect.
+ </p>
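+
+    <div class="p">
+      For example, to switch the current session to name-based resolution
+      (either the string or the equivalent integer value is accepted):
+<pre class="pre codeblock"><code>set PARQUET_FALLBACK_SCHEMA_RESOLUTION=NAME;
+-- Equivalent, using the integer value:
+set PARQUET_FALLBACK_SCHEMA_RESOLUTION=1;
+</code></pre>
+    </div>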
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer or string
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.6.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_parquet.html#parquet_schema_evolution">Schema Evolution for Parquet Tables</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_parquet_file_size.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_parquet_file_size.html b/docs/build3x/html/topics/impala_parquet_file_size.html
new file mode 100644
index 0000000..b62341e
--- /dev/null
+++ b/docs/build3x/html/topics/impala_parquet_file_size.html
@@ -0,0 +1,101 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="parquet_file_size"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>PARQUET_FILE_SIZE Query Option</title></head><body id="parquet_file_size"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">PARQUET_FILE_SIZE Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Specifies the maximum size of each Parquet data file produced by Impala <code class="ph codeph">INSERT</code> statements.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ Specify the size in bytes, or with a trailing <code class="ph codeph">m</code> or <code class="ph codeph">g</code> character to indicate
+ megabytes or gigabytes. For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- 128 megabytes.
+set PARQUET_FILE_SIZE=134217728;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 512 megabytes.
+set PARQUET_FILE_SIZE=512m;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+
+-- 1 gigabyte.
+set PARQUET_FILE_SIZE=1g;
+INSERT OVERWRITE parquet_table SELECT * FROM text_table;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ With tables that are small or finely partitioned, the default Parquet block size (formerly 1 GB, now 256 MB
+ in Impala 2.0 and later) could be much larger than needed for each data file. For <code class="ph codeph">INSERT</code>
+ operations into such tables, you can increase parallelism by specifying a smaller
+ <code class="ph codeph">PARQUET_FILE_SIZE</code> value, resulting in more HDFS blocks that can be processed by different
+ nodes.
+
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> numeric, with optional unit specifier
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ Currently, the maximum value for this setting is 1 gigabyte (<code class="ph codeph">1g</code>).
+ Setting a value higher than 1 gigabyte could result in errors during
+ an <code class="ph codeph">INSERT</code> operation.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> 0 (produces files with a target size of 256 MB; files might be larger for very wide tables)
+ </p>
+
+ <p class="p">
+ Because ADLS does not expose the block sizes of data files the way HDFS does,
+ any Impala <code class="ph codeph">INSERT</code> or <code class="ph codeph">CREATE TABLE AS SELECT</code> statements
+ use the <code class="ph codeph">PARQUET_FILE_SIZE</code> query option setting to define the size of
+ Parquet data files. (Using a large block size is more important for Parquet tables than
+ for tables that use other file formats.)
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Isilon considerations:</strong>
+ </p>
+ <div class="p">
+ Because the EMC Isilon storage devices use a global value for the block size
+ rather than a configurable value for each file, the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option has no effect when Impala inserts data into a table or partition
+ residing on Isilon storage. Use the <code class="ph codeph">isi</code> command to set the
+ default block size globally on the Isilon device. For example, to set the
+ Isilon default block size to 256 MB, the recommended size for Parquet
+ data files for Impala, issue the following command:
+<pre class="pre codeblock"><code>isi hdfs settings modify --default-block-size=256MB</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ For information about the Parquet file format, and how the number and size of data files affects query
+ performance, see <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a>.
+ </p>
+
+
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_partitioning.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_partitioning.html b/docs/build3x/html/topics/impala_partitioning.html
new file mode 100644
index 0000000..c99d10d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_partitioning.html
@@ -0,0 +1,801 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version"
content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="partitioning"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Partitioning for Impala Tables</title></head><body id="partitioning"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Partitioning for Impala Tables</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ By default, all the data files for a table are located in a single directory. Partitioning is a technique for physically dividing the
+ data during loading, based on values from one or more columns, to speed up queries that test those columns. For example, with a
+ <code class="ph codeph">school_records</code> table partitioned on a <code class="ph codeph">year</code> column, there is a separate data directory for each
+ different year value, and all the data for that year is stored in a data file in that directory. A query that includes a
+ <code class="ph codeph">WHERE</code> condition such as <code class="ph codeph">YEAR=1966</code>, <code class="ph codeph">YEAR IN (1989,1999)</code>, or <code class="ph codeph">YEAR BETWEEN
+ 1984 AND 1989</code> can examine only the data files from the appropriate directory or directories, greatly reducing the amount of
+ data to read and test.
+ </p>
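+
+    <div class="p">
+      For example, such a table might be declared as follows (a sketch; the non-partitioning
+      columns are hypothetical). Each distinct <code class="ph codeph">year</code> value gets its own
+      HDFS subdirectory named <code class="ph codeph">year=<var class="keyword varname">value</var></code>:
+<pre class="pre codeblock"><code>create table school_records (id int, name string)
+  partitioned by (year smallint);
+-- Rows with year=1966 are stored under a directory such as:
+--   .../school_records/year=1966/
+-- and this query reads only the files in that directory:
+select count(*) from school_records where year = 1966;
+</code></pre>
+    </div>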
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ See <a class="xref" href="impala_tutorial.html#tut_external_partition_data">Attaching an External Partitioned Table to an HDFS Directory Structure</a> for an example that illustrates the syntax for creating partitioned
+ tables, the underlying directory structure in HDFS, and how to attach a partitioned Impala external table to data files stored
+ elsewhere in HDFS.
+ </p>
+
+ <p class="p">
+ Parquet is a popular format for partitioned Impala tables because it is well suited to handle huge data volumes. See
+ <a class="xref" href="impala_parquet.html#parquet_performance">Query Performance for Impala Parquet Tables</a> for performance considerations for partitioned Parquet tables.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_literals.html#null">NULL</a> for details about how <code class="ph codeph">NULL</code> values are represented in partitioned tables.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about setting up tables where some or all partitions reside on the Amazon Simple
+ Storage Service (S3).
+ </p>
+
+ </div>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title2" id="partitioning__partitioning_choosing">
+
+ <h2 class="title topictitle2" id="ariaid-title2">When to Use Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partitioning is typically appropriate for:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Tables that are very large, where reading the entire data set takes an impractical amount of time.
+ </li>
+
+ <li class="li">
+ Tables that are always or almost always queried with conditions on the partitioning columns. In our example of a table partitioned
+ by year, <code class="ph codeph">SELECT COUNT(*) FROM school_records WHERE year = 1985</code> is efficient, only examining a small fraction of
+ the data; but <code class="ph codeph">SELECT COUNT(*) FROM school_records</code> has to process a separate data file for each year, resulting in
+ more overall work than in an unpartitioned table. You would probably not partition this way if you frequently queried the table
+ based on last name, student ID, and so on without testing the year.
+ </li>
+
+ <li class="li">
+ Columns that have reasonable cardinality (number of different values). If a column only has a small number of values, for example
+ <code class="ph codeph">Male</code> or <code class="ph codeph">Female</code>, you do not gain much efficiency by eliminating only about 50% of the data to
+ read for each query. If a column has only a few rows matching each value, the number of directories to process can become a
+ limiting factor, and the data file in each directory could be too small to take advantage of the Hadoop mechanism for transmitting
+ data in multi-megabyte blocks. For example, you might partition census data by year, store sales data by year and month, and web
+ traffic data by year, month, and day. (Some users with high volumes of incoming data might even partition down to the individual
+ hour and minute.)
+ </li>
+
+ <li class="li">
+ Data that already passes through an extract, transform, and load (ETL) pipeline. The values of the partitioning columns are
+ stripped from the original data files and represented by directory names, so loading data into a partitioned table involves some
+ sort of transformation or preprocessing.
+ </li>
+ </ul>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="partitioning__partition_sql">
+
+ <h2 class="title topictitle2" id="ariaid-title3">SQL Statements for Partitioned Tables</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ In terms of Impala SQL syntax, partitioning affects these statements:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_create_table.html#create_table">CREATE TABLE</a></code>: you specify a <code class="ph codeph">PARTITIONED
+ BY</code> clause when creating the table to identify names and data types of the partitioning columns. These columns are not
+ included in the main list of columns for the table.
+ </li>
+
+ <li class="li">
+ In <span class="keyword">Impala 2.5</span> and higher, you can also use the <code class="ph codeph">PARTITIONED BY</code> clause in a <code class="ph codeph">CREATE TABLE AS
+ SELECT</code> statement. This syntax lets you use a single statement to create a partitioned table, copy data into it, and
+ create new partitions based on the values in the inserted data.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE</a></code>: you can add or drop partitions, to work with
+ different portions of a huge data set. You can designate the HDFS directory that holds the data files for a specific partition.
+ With data partitioned by date values, you might <span class="q">"age out"</span> data that is no longer relevant.
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph"><a class="xref" href="impala_insert.html#insert">INSERT</a></code>: When you insert data into a partitioned table, you identify
+        the partitioning columns. One or more values from each inserted row are not stored in data files, but instead determine the
+        directory where that row is stored. You can also specify which partition to load a set of data into, with <code class="ph codeph">INSERT
+ OVERWRITE</code> statements; you can replace the contents of a specific partition but you cannot append data to a specific
+ partition.
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
+ </li>
+
+ <li class="li">
+ Although the syntax of the <code class="ph codeph"><a class="xref" href="impala_select.html#select">SELECT</a></code> statement is the same whether or
+ not the table is partitioned, the way queries interact with partitioned tables can have a dramatic impact on performance and
+ scalability. The mechanism that lets queries skip certain partitions during a query is known as partition pruning; see
+ <a class="xref" href="impala_partitioning.html#partition_pruning">Partition Pruning for Queries</a> for details.
+ </li>
+
+ <li class="li">
+ In Impala 1.4 and later, there is a <code class="ph codeph">SHOW PARTITIONS</code> statement that displays information about each partition in a
+ table. See <a class="xref" href="impala_show.html#show">SHOW Statement</a> for details.
+ </li>
+ </ul>
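+
+      <p class="p">
+        As an illustrative sketch (the <code class="ph codeph">sales</code> table and the HDFS path here are hypothetical),
+        these statements might be combined as follows:
+      </p>
+
+<pre class="pre codeblock"><code>-- Partition key columns go in the PARTITIONED BY clause,
+-- not in the main column list.
+create table sales (id bigint, amount decimal(9,2))
+  partitioned by (year int, month int);
+
+-- Add a partition and set its location in a single ALTER TABLE statement.
+alter table sales add partition (year=2018, month=1)
+  location '/warehouse/sales/year=2018/month=1';
+
+-- In Impala 2.5 and higher, CREATE TABLE AS SELECT can create
+-- a partitioned table and its partitions in one statement.
+create table sales_by_year partitioned by (year) as
+  select id, amount, year from sales;
+</code></pre>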
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="partitioning__partition_static_dynamic">
+
+ <h2 class="title topictitle2" id="ariaid-title4">Static and Dynamic Partitioning Clauses</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Specifying all the partition columns in a SQL statement is called <dfn class="term">static partitioning</dfn>, because the statement affects a
+ single predictable partition. For example, you use static partitioning with an <code class="ph codeph">ALTER TABLE</code> statement that affects
+ only one partition, or with an <code class="ph codeph">INSERT</code> statement that inserts all values into the same partition:
+ </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x=10, y='a')</strong> select c1 from some_other_table;
+</code></pre>
+
+ <p class="p">
+        When you specify some partition key columns in an <code class="ph codeph">INSERT</code> statement, but leave out the values, Impala determines
+        which partition to insert each row into, based on the values in the <code class="ph codeph">SELECT</code> list. This technique is called <dfn class="term">dynamic partitioning</dfn>:
+ </p>
+
+<pre class="pre codeblock"><code>insert into t1 <strong class="ph b">partition(x, y='b')</strong> select c1, c2 from some_other_table;
+-- Create new partition if necessary based on variable year, month, and day; insert a single row.
+insert into weather <strong class="ph b">partition (year, month, day)</strong> select 'cloudy',2014,4,21;
+-- Create new partition if necessary for specified year and month but variable day; insert a single row.
+insert into weather <strong class="ph b">partition (year=2014, month=04, day)</strong> select 'sunny',22;
+</code></pre>
+
+ <p class="p">
+ The more key columns you specify in the <code class="ph codeph">PARTITION</code> clause, the fewer columns you need in the <code class="ph codeph">SELECT</code>
+ list. The trailing columns in the <code class="ph codeph">SELECT</code> list are substituted in order for the partition key columns with no
+ specified value.
+ </p>
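+
+      <p class="p">
+        For example, if both partition key columns are left dynamic, the <code class="ph codeph">SELECT</code> list needs two trailing
+        columns, matched positionally to <code class="ph codeph">x</code> and <code class="ph codeph">y</code>:
+      </p>
+
+<pre class="pre codeblock"><code>-- c2 supplies values for x, and c3 supplies values for y.
+insert into t1 partition(x, y) select c1, c2, c3 from some_other_table;
+</code></pre>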
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="partitioning__partition_refresh">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Refreshing a Single Partition</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">REFRESH</code> statement is typically used with partitioned tables when new data files are loaded into a partition by
+ some non-Impala mechanism, such as a Hive or Spark job. The <code class="ph codeph">REFRESH</code> statement makes Impala aware of the new data
+ files so that they can be used in Impala queries. Because partitioned tables typically contain a high volume of data, the
+ <code class="ph codeph">REFRESH</code> operation for a full partitioned table can take significant time.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher, you can include a <code class="ph codeph">PARTITION (<var class="keyword varname">partition_spec</var>)</code> clause in the
+ <code class="ph codeph">REFRESH</code> statement so that only a single partition is refreshed. For example, <code class="ph codeph">REFRESH big_table PARTITION
+ (year=2017, month=9, day=30)</code>. The partition spec must include all the partition key columns. See
+ <a class="xref" href="impala_refresh.html#refresh">REFRESH Statement</a> for more details and examples of <code class="ph codeph">REFRESH</code> syntax and usage.
+ </p>
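+
+      <p class="p">
+        For example, after an external job adds files for one day to the <code class="ph codeph">big_table</code> example above:
+      </p>
+
+<pre class="pre codeblock"><code>-- Refreshes metadata for a single partition only.
+refresh big_table partition (year=2017, month=9, day=30);
+
+-- Refreshes metadata for every partition; can take much longer.
+refresh big_table;
+</code></pre>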
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="partitioning__partition_permissions">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Permissions for Partition Subdirectories</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ By default, if an <code class="ph codeph">INSERT</code> statement creates any new subdirectories underneath a partitioned
+ table, those subdirectories are assigned default HDFS permissions for the <code class="ph codeph">impala</code> user. To
+ make each subdirectory have the same permissions as its parent directory in HDFS, specify the
+ <code class="ph codeph">--insert_inherit_permissions</code> startup option for the <span class="keyword cmdname">impalad</span> daemon.
+ </p>
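+
+      <p class="p">
+        For example, the option might be supplied on the <span class="keyword cmdname">impalad</span> command line
+        (any other startup options are omitted here):
+      </p>
+
+<pre class="pre codeblock"><code>impalad --insert_inherit_permissions=true</code></pre>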
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="partitioning__partition_pruning">
+
+ <h2 class="title topictitle2" id="ariaid-title7">Partition Pruning for Queries</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partition pruning refers to the mechanism where a query can skip reading the data files corresponding to one or more partitions. If
+ you can arrange for queries to prune large numbers of unnecessary partitions from the query execution plan, the queries use fewer
+ resources and are thus proportionally faster and more scalable.
+ </p>
+
+ <p class="p">
+ For example, if a table is partitioned by columns <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>, then
+ <code class="ph codeph">WHERE</code> clauses such as <code class="ph codeph">WHERE year = 2013</code>, <code class="ph codeph">WHERE year < 2010</code>, or <code class="ph codeph">WHERE
+ year BETWEEN 1995 AND 1998</code> allow Impala to skip the data files in all partitions outside the specified range. Likewise,
+ <code class="ph codeph">WHERE year = 2013 AND month BETWEEN 1 AND 3</code> could prune even more partitions, reading the data files for only a
+ portion of one year.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title8" id="partition_pruning__partition_pruning_checking">
+
+ <h3 class="title topictitle3" id="ariaid-title8">Checking if Partition Pruning Happens for a Query</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+          To check the effectiveness of partition pruning for a query, examine the <code class="ph codeph">EXPLAIN</code> output for the query before
+          running it. The following example shows a table with 3 partitions, where the query reads only 1 of them. The notation
+ <code class="ph codeph">#partitions=1/3</code> in the <code class="ph codeph">EXPLAIN</code> plan confirms that Impala can do the appropriate partition
+ pruning.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > insert into census partition (year=2010) values ('Smith'),('Jones');
+[localhost:21000] > insert into census partition (year=2011) values ('Smith'),('Jones'),('Doe');
+[localhost:21000] > insert into census partition (year=2012) values ('Smith'),('Doe');
+[localhost:21000] > select name from census where year=2010;
++-------+
+| name |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] > explain select name from census <strong class="ph b">where year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 1:EXCHANGE |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 1 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=12B |
++------------------------------------------------------------------+</code></pre>
+
+ <p class="p">
+ For a report of the volume of data that was actually read and processed at each stage of the query, check the output of the
+ <code class="ph codeph">SUMMARY</code> command immediately after running the query. For a more detailed analysis, look at the output of the
+ <code class="ph codeph">PROFILE</code> command; it includes this same summary report near the start of the profile output.
+ </p>
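+
+          <p class="p">
+            For example, in <span class="keyword cmdname">impala-shell</span>:
+          </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select name from census where year=2010;
+[localhost:21000] > summary;
+[localhost:21000] > profile;
+</code></pre>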
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="partition_pruning__partition_pruning_sql">
+
+ <h3 class="title topictitle3" id="ariaid-title9">What SQL Constructs Work with Partition Pruning</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala can even do partition pruning in cases where the partition key column is not directly compared to a constant, by applying
+ the transitive property to other parts of the <code class="ph codeph">WHERE</code> clause. This technique is known as predicate propagation, and
+ is available in Impala 1.2.2 and later. In this example, the census table includes another column indicating when the data was
+ collected, which happens in 10-year intervals. Even though the query does not compare the partition key column
+ (<code class="ph codeph">YEAR</code>) to a constant value, Impala can deduce that only the partition <code class="ph codeph">YEAR=2010</code> is required, and
+ again only reads 1 out of 3 partitions.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > drop table census;
+[localhost:21000] > create table census (name string, census_year int) partitioned by (year int);
+[localhost:21000] > insert into census partition (year=2010) values ('Smith',2010),('Jones',2010);
+[localhost:21000] > insert into census partition (year=2011) values ('Smith',2020),('Jones',2020),('Doe',2020);
+[localhost:21000] > insert into census partition (year=2012) values ('Smith',2020),('Doe',2020);
+[localhost:21000] > select name from census where year = census_year and census_year=2010;
++-------+
+| name |
++-------+
+| Smith |
+| Jones |
++-------+
+[localhost:21000] > explain select name from census <strong class="ph b">where year = census_year and census_year=2010</strong>;
++------------------------------------------------------------------+
+| Explain String |
++------------------------------------------------------------------+
+| PLAN FRAGMENT 0 |
+| PARTITION: UNPARTITIONED |
+| |
+| 1:EXCHANGE |
+| |
+| PLAN FRAGMENT 1 |
+| PARTITION: RANDOM |
+| |
+| STREAM DATA SINK |
+| EXCHANGE ID: 1 |
+| UNPARTITIONED |
+| |
+| 0:SCAN HDFS |
+| table=predicate_propagation.census <strong class="ph b">#partitions=1/3</strong> size=22B |
+| predicates: census_year = 2010, year = census_year |
++------------------------------------------------------------------+
+</code></pre>
+
+ <p class="p">
+          If a view applies to a partitioned table, partition pruning considers the <code class="ph codeph">WHERE</code> clauses from both
+          the query in the <code class="ph codeph">CREATE VIEW</code> statement and the query that refers to the view.
+          Prior to Impala 1.4, only the <code class="ph codeph">WHERE</code> clauses from the
+          <code class="ph codeph">CREATE VIEW</code> statement were used for partition pruning.
+ </p>
+
+ <p class="p">
+ In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+ clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+ the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+ for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+ </p>
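+
+          <p class="p">
+            As a sketch (the <code class="ph codeph">sales_table</code> columns used here are hypothetical):
+          </p>
+
+<pre class="pre codeblock"><code>-- Because the OVER clause includes PARTITION BY year, the
+-- WHERE year=2016 predicate can prune the other YEAR partitions.
+select name, rank() over (partition by year order by amount desc) as rnk
+  from sales_table
+  where year = 2016;
+</code></pre>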
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title10" id="partition_pruning__dynamic_partition_pruning">
+
+ <h3 class="title topictitle3" id="ariaid-title10">Dynamic Partition Pruning</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+            The original mechanism used to prune partitions is <dfn class="term">static partition pruning</dfn>, in which the conditions in the
+ <code class="ph codeph">WHERE</code> clause are analyzed to determine in advance which partitions can be safely skipped. In <span class="keyword">Impala 2.5</span>
+ and higher, Impala can perform <dfn class="term">dynamic partition pruning</dfn>, where information about the partitions is collected during
+ the query, and Impala prunes unnecessary partitions in ways that were impractical to predict in advance.
+ </p>
+
+ <p class="p">
+ For example, if partition key columns are compared to literal values in a <code class="ph codeph">WHERE</code> clause, Impala can perform static
+ partition pruning during the planning phase to only read the relevant partitions:
+ </p>
+
+<pre class="pre codeblock"><code>
+-- The query only needs to read 3 partitions whose key values are known ahead of time.
+-- That's static partition pruning.
+SELECT COUNT(*) FROM sales_table WHERE year IN (2005, 2010, 2015);
+</code></pre>
+
+ <p class="p">
+ Dynamic partition pruning involves using information only available at run time, such as the result of a subquery:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table yy (s string) partitioned by (year int) stored as parquet;
+insert into yy partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001), ('2010',2010);
+compute stats yy;
+
+create table yy2 (s string) partitioned by (year int) stored as parquet;
+insert into yy2 partition (year) values ('1999', 1999), ('2000', 2000),
+ ('2001', 2001);
+compute stats yy2;
+
+-- The query reads an unknown number of partitions, whose key values are only
+-- known at run time. The 'runtime filters' lines show how the information about
+-- the partitions is calculated in query fragment 02, and then used in query
+-- fragment 00 to decide which partitions to skip.
+explain select s from yy2 where year in (select year from yy where year between 2000 and 2005);
++----------------------------------------------------------+
+| Explain String |
++----------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=16.00MB VCores=2 |
+| |
+| 04:EXCHANGE [UNPARTITIONED] |
+| | |
+| 02:HASH JOIN [LEFT SEMI JOIN, BROADCAST] |
+| | hash predicates: year = year |
+| | <strong class="ph b">runtime filters: RF000 <- year</strong> |
+| | |
+| |--03:EXCHANGE [BROADCAST] |
+| | | |
+| | 01:SCAN HDFS [dpp.yy] |
+| | partitions=2/4 files=2 size=468B |
+| | |
+| 00:SCAN HDFS [dpp.yy2] |
+| partitions=2/3 files=2 size=468B |
+| <strong class="ph b">runtime filters: RF000 -> year</strong> |
++----------------------------------------------------------+
+</code></pre>
+
+
+
+ <p class="p">
+ In this case, Impala evaluates the subquery, sends the subquery results to all Impala nodes participating in the query, and then
+ each <span class="keyword cmdname">impalad</span> daemon uses the dynamic partition pruning optimization to read only the partitions with the
+ relevant key values.
+ </p>
+
+ <p class="p">
+ Dynamic partition pruning is especially effective for queries involving joins of several large partitioned tables. Evaluating the
+ <code class="ph codeph">ON</code> clauses of the join predicates might normally require reading data from all partitions of certain tables. If
+ the <code class="ph codeph">WHERE</code> clauses of the query refer to the partition key columns, Impala can now often skip reading many of the
+ partitions while evaluating the <code class="ph codeph">ON</code> clauses. The dynamic partition pruning optimization reduces the amount of I/O
+ and the amount of intermediate data stored and transmitted across the network during the query.
+ </p>
+
+ <p class="p">
+ When the spill-to-disk feature is activated for a join node within a query, Impala does not
+ produce any runtime filters for that join operation on that host. Other join nodes within
+ the query are not affected.
+ </p>
+
+ <p class="p">
+ Dynamic partition pruning is part of the runtime filtering feature, which applies to other kinds of queries in addition to queries
+ against partitioned tables. See <a class="xref" href="impala_runtime_filtering.html#runtime_filtering">Runtime Filtering for Impala Queries (Impala 2.5 or higher only)</a> for full details about this feature.
+ </p>
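+
+          <p class="p">
+            Runtime filtering, and thus dynamic partition pruning, can be tuned per session through query options; for example:
+          </p>
+
+<pre class="pre codeblock"><code>-- OFF disables runtime filters; LOCAL applies them only within each host;
+-- GLOBAL (the default in Impala 2.6 and higher) transmits them across the network.
+set runtime_filter_mode=global;
+-- How long a scan node waits for filters to arrive before proceeding.
+set runtime_filter_wait_time_ms=1000;
+</code></pre>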
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="partitioning__partition_key_columns">
+
+ <h2 class="title topictitle2" id="ariaid-title11">Partition Key Columns</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The columns you choose as the partition keys should be ones that are frequently used to filter query results in important,
+ large-scale queries. Popular examples are some combination of year, month, and day when the data has associated time values, and
+ geographic region when the data is associated with some place.
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ For time-based data, split out the separate parts into their own columns, because Impala cannot partition based on a
+ <code class="ph codeph">TIMESTAMP</code> column.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ The data type of the partition columns does not have a significant effect on the storage required, because the values from those
+            columns are not stored in the data files; instead, they are represented as strings inside HDFS directory names.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.5</span> and higher, you can enable the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code> query option to speed up
+ queries that only refer to partition key columns, such as <code class="ph codeph">SELECT MAX(year)</code>. This setting is not enabled by
+ default because the query behavior is slightly different if the table contains partition directories without actual data inside.
+ See <a class="xref" href="impala_optimize_partition_key_scans.html#optimize_partition_key_scans">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a> for details.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Partitioned tables can contain complex type columns.
+ All the partition key columns must be scalar types.
+ </p>
+ </li>
+
+ <li class="li">
+ <p class="p">
+ Remember that when Impala queries data stored in HDFS, it is most efficient to use multi-megabyte files to take advantage of the
+ HDFS block size. For Parquet tables, the block size (and ideal size of the data files) is <span class="ph">256 MB in
+ Impala 2.0 and later</span>. Therefore, avoid specifying too many partition key columns, which could result in individual
+ partitions containing only small amounts of data. For example, if you receive 1 GB of data per day, you might partition by year,
+ month, and day; while if you receive 5 GB of data per minute, you might partition by year, month, day, hour, and minute. If you
+ have data with a geographic component, you might partition based on postal code if you have many megabytes of data for each
+            postal code, but if not, you might partition by some larger region such as city, state, or country.
+ </p>
+ </li>
+ </ul>
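+
+      <p class="p">
+        For example, the separate time parts can be derived at load time; the
+        <code class="ph codeph">raw_events</code> table and its columns here are hypothetical:
+      </p>
+
+<pre class="pre codeblock"><code>-- A TIMESTAMP column cannot be a partition key, so derive
+-- year/month/day columns from it when populating the table.
+create table events_by_day partitioned by (year, month, day) as
+  select event_id, details,
+         year(event_ts) as year, month(event_ts) as month, day(event_ts) as day
+  from raw_events;
+</code></pre>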
+
+ <p class="p">
+ If you frequently run aggregate functions such as <code class="ph codeph">MIN()</code>, <code class="ph codeph">MAX()</code>, and
+ <code class="ph codeph">COUNT(DISTINCT)</code> on partition key columns, consider enabling the <code class="ph codeph">OPTIMIZE_PARTITION_KEY_SCANS</code>
+ query option, which optimizes such queries. This feature is available in <span class="keyword">Impala 2.5</span> and higher.
+ See <a class="xref" href="../shared/../topics/impala_optimize_partition_key_scans.html">OPTIMIZE_PARTITION_KEY_SCANS Query Option (Impala 2.5 or higher only)</a>
+ for the kinds of queries that this option applies to, and slight differences in how partitions are
+ evaluated when this query option is enabled.
+ </p>
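+
+      <p class="p">
+        For example, with the <code class="ph codeph">census</code> table used earlier in this topic:
+      </p>
+
+<pre class="pre codeblock"><code>set optimize_partition_key_scans=1;
+-- Can be answered from partition metadata alone, without scanning data files.
+select min(year), max(year), count(distinct year) from census;
+</code></pre>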
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="partitioning__mixed_format_partitions">
+
+ <h2 class="title topictitle2" id="ariaid-title12">Setting Different File Formats for Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Partitioned tables have the flexibility to use different file formats for different partitions. (For background information about
+ the different file formats Impala supports, see <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a>.) For example, if you originally
+ received data in text format, then received new data in RCFile format, and eventually began receiving data in Parquet format, all
+ that data could reside in the same table for queries. You just need to ensure that the table is structured so that the data files
+ that use different file formats reside in separate partitions.
+ </p>
+
+ <p class="p">
+ For example, here is how you might switch from text to Parquet data as you receive data for different years:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > create table census (name string) partitioned by (year smallint);
+[localhost:21000] > alter table census add partition (year=2012); -- Stays in text format.
+
+[localhost:21000] > alter table census add partition (year=2013); -- Starts in text format...
+[localhost:21000] > alter table census partition (year=2013) set fileformat parquet; -- ...switched to Parquet before data is loaded.
+
+[localhost:21000] > insert into census partition (year=2012) values ('Smith'),('Jones'),('Lee'),('Singh');
+[localhost:21000] > insert into census partition (year=2013) values ('Flores'),('Bogomolov'),('Cooper'),('Appiah');</code></pre>
+
+ <p class="p">
+ At this point, the HDFS directory for <code class="ph codeph">year=2012</code> contains a text-format data file, while the HDFS directory for
+ <code class="ph codeph">year=2013</code> contains a Parquet data file. As always, when loading non-trivial data, you would use <code class="ph codeph">INSERT ...
+ SELECT</code> or <code class="ph codeph">LOAD DATA</code> to import data in large batches, rather than <code class="ph codeph">INSERT ... VALUES</code> which
+ produces small files that are inefficient for real-world queries.
+ </p>
+
+ <p class="p">
+ For other file types that Impala cannot create natively, you can switch into Hive and issue the <code class="ph codeph">ALTER TABLE ... SET
+ FILEFORMAT</code> statements and <code class="ph codeph">INSERT</code> or <code class="ph codeph">LOAD DATA</code> statements there. After switching back to
+ Impala, issue a <code class="ph codeph">REFRESH <var class="keyword varname">table_name</var></code> statement so that Impala recognizes any partitions or new
+ data added through Hive.
+ </p>
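+
+      <p class="p">
+        A sketch of that sequence, using SequenceFile as an example format and an illustrative HDFS path:
+      </p>
+
+<pre class="pre codeblock"><code>-- In Hive:
+ALTER TABLE census ADD PARTITION (year=2014);
+ALTER TABLE census PARTITION (year=2014) SET FILEFORMAT SEQUENCEFILE;
+LOAD DATA INPATH '/tmp/census_2014.seq' INTO TABLE census PARTITION (year=2014);
+
+-- Back in impala-shell:
+REFRESH census;
+</code></pre>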
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="partitioning__partition_management">
+
+ <h2 class="title topictitle2" id="ariaid-title13">Managing Partitions</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can add, drop, set the expected file format, or set the HDFS location of the data files for individual partitions within an
+ Impala table. See <a class="xref" href="impala_alter_table.html#alter_table">ALTER TABLE Statement</a> for syntax details, and
+ <a class="xref" href="impala_partitioning.html#mixed_format_partitions">Setting Different File Formats for Partitions</a> for tips on managing tables containing partitions with different file
+ formats.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+
+ <p class="p">
+ What happens to the data files when a partition is dropped depends on whether the partitioned table is designated as internal or
+ external. For an internal (managed) table, the data files are deleted. For example, if data in the partitioned table is a copy of
+ raw data files stored elsewhere, you might save disk space by dropping older partitions that are no longer required for reporting,
+ knowing that the original data is still available if needed later. For an external table, the data files are left alone. For
+ example, dropping a partition without deleting the associated files lets Impala consider a smaller set of partitions, improving
+ query efficiency and reducing overhead for DDL operations on the table; if the data is needed again later, you can add the partition
+ again. See <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for details and examples.
+ </p>
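+
+      <p class="p">
+        For example, using the <code class="ph codeph">census</code> table from earlier in this topic:
+      </p>
+
+<pre class="pre codeblock"><code>-- For an internal table, this also deletes the partition's data files;
+-- for an external table, the files are left in place.
+alter table census drop partition (year=2010);
+</code></pre>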
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="partitioning__partition_kudu">
+
+ <h2 class="title topictitle2" id="ariaid-title14">Using Partitioning with Kudu Tables</h2>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Kudu tables use a more fine-grained partitioning scheme than tables containing HDFS data files. You specify a <code class="ph codeph">PARTITION
+ BY</code> clause with the <code class="ph codeph">CREATE TABLE</code> statement to identify how to divide the values from the partition key
+ columns.
+ </p>
+
+ <p class="p">
+ See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for
+ details and examples of the partitioning techniques
+ for Kudu tables.
+ </p>
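+
+      <p class="p">
+        As a brief sketch (the table and column names here are hypothetical):
+      </p>
+
+<pre class="pre codeblock"><code>-- Kudu tables distribute rows through hash and/or range clauses in
+-- PARTITION BY, rather than through separate partition key columns.
+create table metrics (host string, ts timestamp, value double,
+    primary key (host, ts))
+  partition by hash (host) partitions 4,
+  range (ts) (partition values < '2018-01-01')
+  stored as kudu;
+</code></pre>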
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="partitioning__partition_stats">
+ <h2 class="title topictitle2" id="ariaid-title15">Keeping Statistics Up to Date for Partitioned Tables</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ Because the <code class="ph codeph">COMPUTE STATS</code> statement can be resource-intensive to run on a partitioned table
+ as new partitions are added, Impala includes a variation of this statement that allows computing statistics
+ on a per-partition basis such that stats can be incrementally updated when new partitions are added.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ For a particular table, use either <code class="ph codeph">COMPUTE STATS</code> or
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>, but never combine the two or
+ alternate between them. If you switch from <code class="ph codeph">COMPUTE STATS</code> to
+ <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> during the lifetime of a table, or
+ vice versa, drop all statistics by running <code class="ph codeph">DROP STATS</code> before
+ making the switch.
+ </p>
+ <p class="p">
+ When you run <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> on a table for the first time,
+ the statistics are computed again from scratch regardless of whether the table already
+ has statistics. Therefore, expect a one-time resource-intensive operation
+ for scanning the entire table when running <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ for the first time on a given table.
+ </p>
+ <p class="p">
+ For a table with a huge number of partitions and many columns, the approximately 400 bytes
+ of metadata per column per partition can add up to significant memory overhead, as it must
+ be cached on the <span class="keyword cmdname">catalogd</span> host and on every <span class="keyword cmdname">impalad</span> host
+ that is eligible to be a coordinator. If this metadata for all tables combined exceeds 2 GB,
+ you might experience service downtime.
+ </p>
+ </div>
+
+ <p class="p">
+ The <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> variation computes statistics only for partitions that were
+ added or changed since the last <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, rather than the entire
+ table. It is typically used for tables where a full <code class="ph codeph">COMPUTE STATS</code>
+ operation takes too long to be practical each time a partition is added or dropped. See
+ <a class="xref" href="impala_perf_stats.html#perf_stats_incremental">impala_perf_stats.html#perf_stats_incremental</a> for full usage details.
+ </p>
+
+<pre class="pre codeblock"><code>-- Initially the table has no incremental stats, as indicated
+-- 'false' under Incremental stats.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+------------------
+| i_category | #Rows | #Files | Size | Bytes Cached | Format | Incremental stats
++-------------+-------+--------+----------+--------------+---------+------------------
+| Books | -1 | 1 | 223.74KB | NOT CACHED | PARQUET | false
+| Children | -1 | 1 | 230.05KB | NOT CACHED | PARQUET | false
+| Electronics | -1 | 1 | 232.67KB | NOT CACHED | PARQUET | false
+| Home | -1 | 1 | 232.56KB | NOT CACHED | PARQUET | false
+| Jewelry | -1 | 1 | 223.72KB | NOT CACHED | PARQUET | false
+| Men | -1 | 1 | 231.25KB | NOT CACHED | PARQUET | false
+| Music | -1 | 1 | 237.90KB | NOT CACHED | PARQUET | false
+| Shoes | -1 | 1 | 234.90KB | NOT CACHED | PARQUET | false
+| Sports | -1 | 1 | 227.97KB | NOT CACHED | PARQUET | false
+| Women | -1 | 1 | 226.27KB | NOT CACHED | PARQUET | false
+| Total | -1 | 10 | 2.25MB | 0B | |
++-------------+-------+--------+----------+--------------+---------+------------------
+
+-- After the first COMPUTE INCREMENTAL STATS,
+-- all partitions have stats. The first
+-- COMPUTE INCREMENTAL STATS scans the whole
+-- table, discarding any previous stats from
+-- a traditional COMPUTE STATS statement.
+compute incremental stats item_partitioned;
++-------------------------------------------+
+| summary |
++-------------------------------------------+
+| Updated 10 partition(s) and 21 column(s). |
++-------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true              |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 10     | 2.25MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+
+-- Add a new partition...
+alter table item_partitioned add partition (i_category='Camping');
+-- Add or replace files in HDFS outside of Impala,
+-- rendering the stats for a partition obsolete.
+!import_data_into_sports_partition.sh
+refresh item_partitioned;
+drop incremental stats item_partitioned partition (i_category='Sports');
+-- Now some partitions have incremental stats
+-- and some do not.
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Camping     | -1    | 1      | 408.02KB | NOT CACHED   | PARQUET | false             |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | -1    | 1      | 227.97KB | NOT CACHED   | PARQUET | false             |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+
+-- After another COMPUTE INCREMENTAL STATS,
+-- all partitions have incremental stats, and only the 2
+-- partitions without incremental stats were scanned.
+compute incremental stats item_partitioned;
++------------------------------------------+
+| summary                                  |
++------------------------------------------+
+| Updated 2 partition(s) and 21 column(s). |
++------------------------------------------+
+show table stats item_partitioned;
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| i_category  | #Rows | #Files | Size     | Bytes Cached | Format  | Incremental stats |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+| Books       | 1733  | 1      | 223.74KB | NOT CACHED   | PARQUET | true              |
+| Camping     | 5328  | 1      | 408.02KB | NOT CACHED   | PARQUET | true              |
+| Children    | 1786  | 1      | 230.05KB | NOT CACHED   | PARQUET | true              |
+| Electronics | 1812  | 1      | 232.67KB | NOT CACHED   | PARQUET | true              |
+| Home        | 1807  | 1      | 232.56KB | NOT CACHED   | PARQUET | true              |
+| Jewelry     | 1740  | 1      | 223.72KB | NOT CACHED   | PARQUET | true              |
+| Men         | 1811  | 1      | 231.25KB | NOT CACHED   | PARQUET | true              |
+| Music       | 1860  | 1      | 237.90KB | NOT CACHED   | PARQUET | true              |
+| Shoes       | 1835  | 1      | 234.90KB | NOT CACHED   | PARQUET | true              |
+| Sports      | 1783  | 1      | 227.97KB | NOT CACHED   | PARQUET | true              |
+| Women       | 1790  | 1      | 226.27KB | NOT CACHED   | PARQUET | true              |
+| Total       | 17957 | 11     | 2.65MB   | 0B           |         |                   |
++-------------+-------+--------+----------+--------------+---------+-------------------+
+</code></pre>
+
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_benchmarking.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_benchmarking.html b/docs/build3x/html/topics/impala_perf_benchmarking.html
new file mode 100644
index 0000000..0470e72
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_benchmarking.html
@@ -0,0 +1,27 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_benchmarks"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Benchmarking Impala Queries</title></head><body id="perf_benchmarks"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Benchmarking Impala Queries</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Because Impala, like other Hadoop components, is designed to handle large data volumes in a distributed
+ environment, conduct any performance tests using realistic data and cluster configurations. Use a multi-node
+ cluster rather than a single node; run queries against tables containing terabytes of data rather than tens
+ of gigabytes. The parallel processing techniques used by Impala are most appropriate for workloads that are
+ beyond the capacity of a single server.
+ </p>
+
+ <p class="p">
+ When you run queries returning large numbers of rows, the CPU time to pretty-print the output can be
+ substantial, giving an inaccurate measurement of the actual query time. Consider using the
+ <code class="ph codeph">-B</code> option on the <code class="ph codeph">impala-shell</code> command to turn off the pretty-printing, and
+ optionally the <code class="ph codeph">-o</code> option to store query results in a file rather than printing to the
+ screen. See <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details.
+ </p>
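+ <p class="p">
+ For example, a benchmarking run might use an invocation such as the following, where the
+ query text and output file name are placeholders:
+ </p>
+<pre class="pre codeblock"><code>$ impala-shell -B -q 'SELECT COUNT(*) FROM benchmark_table' -o /tmp/benchmark_results.txt
+</code></pre>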
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_perf_cookbook.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_perf_cookbook.html b/docs/build3x/html/topics/impala_perf_cookbook.html
new file mode 100644
index 0000000..5e7c7ec
--- /dev/null
+++ b/docs/build3x/html/topics/impala_perf_cookbook.html
@@ -0,0 +1,256 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_performance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="perf_cookbook"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Performance Guidelines and Best Practices</title></head><body id="perf_cookbook"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Performance Guidelines and Best Practices</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Here are performance guidelines and best practices that you can use during planning, experimentation, and
+ performance tuning for an Impala-enabled cluster. All of this information is also available in more
+ detail elsewhere in the Impala documentation; it is gathered together here to serve as a cookbook and
+ emphasize which performance techniques typically provide the highest return on investment.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_file_format"><h2 class="title sectiontitle">Choose the appropriate file format for the data.</h2>
+
+
+
+ <p class="p">
+ Typically, for large volumes of data (multiple gigabytes per table or partition), the Parquet file format
+ performs best because of its combination of columnar storage layout, large I/O request size, and
+ compression and encoding. See <a class="xref" href="impala_file_formats.html#file_formats">How Impala Works with Hadoop File Formats</a> for comparisons of all
+ file formats supported by Impala, and <a class="xref" href="impala_parquet.html#parquet">Using the Parquet File Format with Impala Tables</a> for details about the
+ Parquet file format.
+ </p>
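+ <p class="p">
+ For example, a large table intended for analytic queries might be declared with the Parquet
+ format from the start (the table and column names here are illustrative):
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_parquet (id BIGINT, amount DECIMAL(10,2), sale_date TIMESTAMP)
+  STORED AS PARQUET;
+</code></pre>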
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ For smaller volumes of data, a few gigabytes or less for each table or partition, you might not see
+ significant performance differences between file formats. At small data volumes, reduced I/O from an
+ efficient compressed file format can be counterbalanced by reduced opportunity for parallel execution. When
+ planning for a production deployment or conducting benchmarks, always use realistic data volumes to get a
+ true picture of performance and scalability.
+ </div>
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_small_files"><h2 class="title sectiontitle">Avoid data ingestion processes that produce many small files.</h2>
+
+
+
+ <p class="p">
+ When producing data files outside of Impala, prefer either text format or Avro, where you can build up the
+ files row by row. Once the data is in Impala, you can convert it to the more efficient Parquet format and
+ split into multiple data files using a single <code class="ph codeph">INSERT ... SELECT</code> statement. Or, if you have
+ the infrastructure to produce multi-megabyte Parquet files as part of your data preparation process, do
+ that and skip the conversion step inside Impala.
+ </p>
+
+ <p class="p">
+ Always use <code class="ph codeph">INSERT ... SELECT</code> to copy significant volumes of data from table to table
+ within Impala. Avoid <code class="ph codeph">INSERT ... VALUES</code> for any substantial volume of data or
+ performance-critical tables, because each such statement produces a separate tiny data file. See
+ <a class="xref" href="impala_insert.html#insert">INSERT Statement</a> for examples of the <code class="ph codeph">INSERT ... SELECT</code> syntax.
+ </p>
+
+ <p class="p">
+ For example, if you have thousands of partitions in a Parquet table, each with less than
+ <span class="ph">256 MB</span> of data, consider partitioning in a less granular way, such as by
+ year / month rather than year / month / day. If an inefficient data ingestion process produces thousands of
+ data files in the same table or partition, consider compacting the data by performing an <code class="ph codeph">INSERT ...
+ SELECT</code> to copy all the data to a different table; the data will be reorganized into a smaller
+ number of larger files by this process.
+ </p>
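+ <p class="p">
+ As an illustration, the following statements (with hypothetical table names) compact a table
+ made up of many small files into a new Parquet table containing fewer, larger files:
+ </p>
+<pre class="pre codeblock"><code>CREATE TABLE sales_compacted LIKE sales_many_small_files STORED AS PARQUET;
+INSERT INTO sales_compacted SELECT * FROM sales_many_small_files;
+</code></pre>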
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_partitioning"><h2 class="title sectiontitle">Choose partitioning granularity based on actual data volume.</h2>
+
+
+
+ <p class="p">
+ Partitioning is a technique that physically divides the data based on values of one or more columns, such
+ as by year, month, day, region, city, section of a web site, and so on. When you issue queries that request
+ a specific value or range of values for the partition key columns, Impala can avoid reading the irrelevant
+ data, potentially yielding a huge savings in disk I/O.
+ </p>
+
+ <p class="p">
+ When deciding which column(s) to use for partitioning, choose the right level of granularity. For example,
+ should you partition by year, month, and day, or only by year and month? Choose a partitioning strategy
+ that puts at least <span class="ph">256 MB</span> of data in each partition, to take advantage of
+ HDFS bulk I/O and Impala distributed queries.
+ </p>
+
+ <p class="p">
+ Over-partitioning can also cause query planning to take longer than necessary, as Impala prunes the
+ unnecessary partitions. Ideally, keep the number of partitions in the table under 30 thousand.
+ </p>
+
+ <p class="p">
+ When preparing data files to go in a partition directory, create several large files rather than many small
+ ones. If you receive data in the form of many small files and have no control over the input format,
+ consider using the <code class="ph codeph">INSERT ... SELECT</code> syntax to copy data from one table or partition to
+ another, which compacts the files into a relatively small number (based on the number of nodes in the
+ cluster).
+ </p>
+
+ <p class="p">
+ If you need to reduce the overall number of partitions and increase the amount of data in each partition,
+ first look for partition key columns that are rarely referenced or are referenced in non-critical queries
+ (not subject to an SLA). For example, your web site log data might be partitioned by year, month, day, and
+ hour, but if most queries roll up the results by day, perhaps you only need to partition by year, month,
+ and day.
+ </p>
+
+ <p class="p">
+ If you need to reduce the granularity even more, consider creating <span class="q">"buckets"</span>, computed values
+ corresponding to different sets of partition key values. For example, you can use the
+ <code class="ph codeph">TRUNC()</code> function with a <code class="ph codeph">TIMESTAMP</code> column to group date and time values
+ based on intervals such as week or quarter. See
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions">Impala Date and Time Functions</a> for details.
+ </p>
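+ <p class="p">
+ For example, a query could group rows into quarterly buckets with an expression like the
+ following (the table and column names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SELECT TRUNC(event_ts, 'Q') AS quarter, COUNT(*)
+  FROM events
+  GROUP BY quarter;
+</code></pre>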
+
+ <p class="p">
+ See <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a> for full details and performance considerations for
+ partitioning.
+ </p>
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_partition_keys"><h2 class="title sectiontitle">Use smallest appropriate integer types for partition key columns.</h2>
+
+
+
+ <p class="p">
+ Although it is tempting to use strings for partition key columns, since those values are turned into HDFS
+ directory names anyway, you can minimize memory usage by using numeric values for common partition key
+ fields such as <code class="ph codeph">YEAR</code>, <code class="ph codeph">MONTH</code>, and <code class="ph codeph">DAY</code>. Use the smallest
+ integer type that holds the appropriate range of values, typically <code class="ph codeph">TINYINT</code> for
+ <code class="ph codeph">MONTH</code> and <code class="ph codeph">DAY</code>, and <code class="ph codeph">SMALLINT</code> for <code class="ph codeph">YEAR</code>.
+ Use the <code class="ph codeph">EXTRACT()</code> function to pull out individual date and time fields from a
+ <code class="ph codeph">TIMESTAMP</code> value, and <code class="ph codeph">CAST()</code> the return value to the appropriate integer
+ type.
+ </p>
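+ <p class="p">
+ For example, an <code class="ph codeph">INSERT</code> into a partitioned table might derive compact
+ integer partition keys like this (the table and column names are illustrative):
+ </p>
+<pre class="pre codeblock"><code>INSERT INTO logs_partitioned PARTITION (year, month, day)
+  SELECT msg,
+         CAST(EXTRACT(YEAR FROM event_ts) AS SMALLINT) AS year,
+         CAST(EXTRACT(MONTH FROM event_ts) AS TINYINT) AS month,
+         CAST(EXTRACT(DAY FROM event_ts) AS TINYINT) AS day
+  FROM logs_raw;
+</code></pre>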
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_parquet_block_size"><h2 class="title sectiontitle">Choose an appropriate Parquet block size.</h2>
+
+
+
+ <p class="p">
+ By default, the Impala <code class="ph codeph">INSERT ... SELECT</code> statement creates Parquet files with a 256 MB
+ block size. (This default was changed in Impala 2.0. Formerly, the limit was 1 GB, but Impala made
+ conservative estimates about compression, resulting in files that were smaller than 1 GB.)
+ </p>
+
+ <p class="p">
+ Each Parquet file written by Impala is a single block, allowing the whole file to be processed as a unit by a single host.
+ As you copy Parquet files into HDFS or between HDFS filesystems, use <code class="ph codeph">hdfs dfs -pb</code> to preserve the original
+ block size.
+ </p>
+
+ <p class="p">
+ If there are only one or a few data blocks in your Parquet table, or in a partition that is the only one
+ accessed by a query, then you might experience a slowdown for a different reason: not enough data to take
+ advantage of Impala's parallel distributed queries. Each data block is processed by a single core on one of
+ the DataNodes. In a 100-node cluster of 16-core machines, you could potentially process thousands of data
+ files simultaneously. You want to find a sweet spot between <span class="q">"many tiny files"</span> and <span class="q">"single giant
+ file"</span> that balances bulk I/O and parallel processing. You can set the <code class="ph codeph">PARQUET_FILE_SIZE</code>
+ query option before doing an <code class="ph codeph">INSERT ... SELECT</code> statement to reduce the size of each
+ generated Parquet file. <span class="ph">(Specify the file size as an absolute number of bytes, or in Impala
+ 2.0 and later, in units ending with <code class="ph codeph">m</code> for megabytes or <code class="ph codeph">g</code> for
+ gigabytes.)</span> Run benchmarks with different file sizes to find the right balance point for your
+ particular data volume.
+ </p>
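+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span>, you might experiment with a smaller
+ file size before an <code class="ph codeph">INSERT ... SELECT</code> (the table names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>SET PARQUET_FILE_SIZE=128m;
+INSERT INTO parquet_table SELECT * FROM text_table;
+</code></pre>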
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_stats"><h2 class="title sectiontitle">Gather statistics for all tables used in performance-critical or high-volume join queries.</h2>
+
+
+
+ <p class="p">
+ Gather the statistics with the <code class="ph codeph">COMPUTE STATS</code> statement. See
+ <a class="xref" href="impala_perf_joins.html#perf_joins">Performance Considerations for Join Queries</a> for details.
+ </p>
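+ <p class="p">
+ For example, with a hypothetical table name:
+ </p>
+<pre class="pre codeblock"><code>COMPUTE STATS sales;
+SHOW TABLE STATS sales;
+SHOW COLUMN STATS sales;
+</code></pre>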
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_network"><h2 class="title sectiontitle">Minimize the overhead of transmitting results back to the client.</h2>
+
+
+
+ <p class="p">
+ Use techniques such as:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Aggregation. If you need to know how many rows match a condition, the total of the values
+ in some column for the matching rows, the lowest or highest matching value, and so on, call aggregate functions such as
+ <code class="ph codeph">COUNT()</code>, <code class="ph codeph">SUM()</code>, and <code class="ph codeph">MAX()</code> in the query rather than
+ sending the result set to an application and doing those computations there. Remember that the size of an
+ unaggregated result set could be huge, requiring substantial time to transmit across the network.
+ </li>
+
+ <li class="li">
+ Filtering. Use all applicable tests in the <code class="ph codeph">WHERE</code> clause of a query to eliminate rows
+ that are not relevant, rather than producing a big result set and filtering it using application logic.
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">LIMIT</code> clause. If you only need to see a few sample values from a result set, or the top
+ or bottom values from a query using <code class="ph codeph">ORDER BY</code>, include the <code class="ph codeph">LIMIT</code> clause
+ to reduce the size of the result set rather than asking for the full result set and then throwing most of
+ the rows away.
+ </li>
+
+ <li class="li">
+ Avoid overhead from pretty-printing the result set and displaying it on the screen. When you retrieve the
+ results through <span class="keyword cmdname">impala-shell</span>, use <span class="keyword cmdname">impala-shell</span> options such as
+ <code class="ph codeph">-B</code> and <code class="ph codeph">--output_delimiter</code> to produce results without special
+ formatting, and redirect output to a file rather than printing to the screen. Consider using
+ <code class="ph codeph">INSERT ... SELECT</code> to write the results directly to new files in HDFS. See
+ <a class="xref" href="impala_shell_options.html#shell_options">impala-shell Configuration Options</a> for details about the
+ <span class="keyword cmdname">impala-shell</span> command-line options.
+ </li>
+ </ul>
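+ <p class="p">
+ For example, rather than fetching an entire hypothetical <code class="ph codeph">web_logs</code> table
+ and summarizing it in the application, push the aggregation, filtering, and limiting into the query:
+ </p>
+<pre class="pre codeblock"><code>SELECT url, COUNT(*) AS hits
+  FROM web_logs
+  WHERE status_code = 404
+  GROUP BY url
+  ORDER BY hits DESC
+  LIMIT 10;
+</code></pre>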
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_explain"><h2 class="title sectiontitle">Verify that your queries are planned in an efficient logical manner.</h2>
+
+
+
+ <p class="p">
+ Examine the <code class="ph codeph">EXPLAIN</code> plan for a query before actually running it. See
+ <a class="xref" href="impala_explain.html#explain">EXPLAIN Statement</a> and <a class="xref" href="impala_explain_plan.html#perf_explain">Using the EXPLAIN Plan for Performance Tuning</a> for
+ details.
+ </p>
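+ <p class="p">
+ For example, prefix any query with the <code class="ph codeph">EXPLAIN</code> keyword to see its plan
+ without running it (the table and column names are placeholders):
+ </p>
+<pre class="pre codeblock"><code>EXPLAIN SELECT c.name, SUM(o.total)
+  FROM customers c JOIN orders o ON c.id = o.customer_id
+  GROUP BY c.name;
+</code></pre>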
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_profile"><h2 class="title sectiontitle">Verify performance characteristics of queries.</h2>
+
+
+
+ <p class="p">
+ Verify that the low-level aspects of I/O, memory usage, network bandwidth, CPU utilization, and so on are
+ within expected ranges by examining the query profile for a query after running it. See
+ <a class="xref" href="impala_explain_plan.html#perf_profile">Using the Query Profile for Performance Tuning</a> for details.
+ </p>
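+ <p class="p">
+ For example, in <span class="keyword cmdname">impala-shell</span> you can issue the
+ <code class="ph codeph">PROFILE</code> command immediately after running a query (against a hypothetical
+ table here) to see its detailed runtime report:
+ </p>
+<pre class="pre codeblock"><code>SELECT COUNT(*) FROM sales;
+PROFILE;
+</code></pre>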
+ </section>
+
+ <section class="section" id="perf_cookbook__perf_cookbook_os"><h2 class="title sectiontitle">Use appropriate operating system settings.</h2>
+
+
+
+ <p class="p">
+ See <span class="xref">the documentation for your Apache Hadoop distribution</span> for recommendations about operating system
+ settings that you can change to influence Impala performance. In particular, you might find
+ that changing the <code class="ph codeph">vm.swappiness</code> Linux kernel setting to a non-zero value improves
+ overall performance.
+ </p>
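+ <p class="p">
+ For example, as <code class="ph codeph">root</code> you might lower swappiness as follows; check your
+ distribution's guidance for the recommended value:
+ </p>
+<pre class="pre codeblock"><code># sysctl -w vm.swappiness=1
+# echo 'vm.swappiness=1' &gt;&gt; /etc/sysctl.conf
+</code></pre>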
+ </section>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_performance.html">Tuning Impala for Performance</a></div></div></nav></article></main></body></html>
[44/51] [partial] impala git commit: [DOCS] Impala doc site update
for 3.0
Posted by mi...@apache.org.
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_breakpad.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_breakpad.html b/docs/build3x/html/topics/impala_breakpad.html
new file mode 100644
index 0000000..eb59388
--- /dev/null
+++ b/docs/build3x/html/topics/impala_breakpad.html
@@ -0,0 +1,239 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_troubleshooting.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="breakpad"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Breakpad Minidumps for Impala (Impala 2.6 or higher only)</title></head><body id="breakpad"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Breakpad Minidumps for Impala (<span class="keyword">Impala 2.6</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <a class="xref" href="https://chromium.googlesource.com/breakpad/breakpad/" target="_blank">breakpad</a>
+ project is an open-source framework for crash reporting.
+ In <span class="keyword">Impala 2.6</span> and higher, Impala can use <code class="ph codeph">breakpad</code> to record stack information and
+ register values when any of the Impala-related daemons crash due to an error such as <code class="ph codeph">SIGSEGV</code>
+ or unhandled exceptions.
+ The dump files are much smaller than traditional core dump files. The dump mechanism itself uses very little
+ memory, which improves reliability if the crash occurs while the system is low on memory.
+ </p>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ Because of the internal mechanisms involving Impala memory allocation and Linux
+ signalling for out-of-memory (OOM) errors, if an Impala-related daemon experiences a
+ crash due to an OOM condition, it does <em class="ph i">not</em> generate a minidump for that error.
+ </div>
+
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_troubleshooting.html">Troubleshooting Impala</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="breakpad__breakpad_minidump_enable">
+ <h2 class="title topictitle2" id="ariaid-title2">Enabling or Disabling Minidump Generation</h2>
+ <div class="body conbody">
+ <p class="p">
+ By default, a minidump file is generated when an Impala-related daemon
+ crashes.
+ </p>
+
+ <div class="p">
+ To turn off generation of the minidump files, use one of the following
+ options:
+
+ <ul class="ul">
+ <li class="li">
+ Set the <code class="ph codeph">--enable_minidumps</code> configuration setting
+ to <code class="ph codeph">false</code>. Restart the corresponding services or
+ daemons.
+ </li>
+
+ <li class="li">
+ Set the <code class="ph codeph">--minidump_path</code> configuration setting to
+ an empty string. Restart the corresponding services or daemons.
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.7</span> and higher,
+ you can send a <code class="ph codeph">SIGUSR1</code> signal to any Impala-related daemon to write a
+ Breakpad minidump. For advanced troubleshooting, you can now produce a minidump
+ without triggering a crash.
+ </p>
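+ <p class="p">
+ For example, assuming a single <span class="keyword cmdname">impalad</span> process on the host,
+ a command like the following triggers an on-demand minidump:
+ </p>
+<pre class="pre codeblock"><code># kill -s SIGUSR1 $(pgrep -x impalad)
+</code></pre>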
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="breakpad__breakpad_minidump_location">
+ <h2 class="title topictitle2" id="ariaid-title3">Specifying the Location for Minidump Files</h2>
+ <div class="body conbody">
+ <div class="p">
+ By default, all minidump files are written to the following location
+ on the host where a crash occurs:
+
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Clusters not managed by cluster management software:
+ <span class="ph filepath"><var class="keyword varname">impala_log_dir</var>/<var class="keyword varname">daemon_name</var>/minidumps/<var class="keyword varname">daemon_name</var></span>
+ </p>
+ </li>
+ </ul>
+ The minidump files for <span class="keyword cmdname">impalad</span>, <span class="keyword cmdname">catalogd</span>,
+ and <span class="keyword cmdname">statestored</span> are each written to a separate directory.
+ </div>
+ <p class="p">
+ To specify a different location, set the
+
+ <span class="ph uicontrol">minidump_path</span>
+ configuration setting of one or more Impala-related daemons, and restart the corresponding services or daemons.
+ </p>
+ <p class="p">
+ If you specify a relative path for this setting, the value is interpreted relative to
+ the default <span class="ph uicontrol">minidump_path</span> directory.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="breakpad__breakpad_minidump_number">
+ <h2 class="title topictitle2" id="ariaid-title4">Controlling the Number of Minidump Files</h2>
+ <div class="body conbody">
+ <p class="p">
+ Like any files used for logging or troubleshooting, consider limiting the number of
+ minidump files, or removing unneeded ones, depending on the amount of free storage
+ space on the hosts in the cluster.
+ </p>
+ <p class="p">
+ Because the minidump files are only used for problem resolution, you can remove any such files that
+ are not needed to debug current issues.
+ </p>
+ <p class="p">
+ To control how many minidump files Impala keeps around at any one time,
+ set the <span class="ph uicontrol">max_minidumps</span> configuration setting
+ for one or more Impala-related daemons, and restart the corresponding services or daemons.
+ The default for this setting is 9. A zero or negative value is interpreted as
+ <span class="q">"unlimited"</span>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="breakpad__breakpad_minidump_logging">
+ <h2 class="title topictitle2" id="ariaid-title5">Detecting Crash Events</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ You can see in the Impala log files when crash events occur that generate
+ minidump files. Because each restart begins a new log file, the <span class="q">"crashed"</span> message
+ is always at or near the bottom of the log file. There might be another later message
+ if core dumps are also enabled.
+ </p>
+
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="breakpad__breakpad_demo">
+ <h2 class="title topictitle2" id="ariaid-title6">Demonstration of Breakpad Feature</h2>
+ <div class="body conbody">
+ <p class="p">
+ The following example uses the command <span class="keyword cmdname">kill -11</span> to
+ simulate a <code class="ph codeph">SIGSEGV</code> crash for an <span class="keyword cmdname">impalad</span>
+ process on a single DataNode, then examines the relevant log files and minidump file.
+ </p>
+
+ <p class="p">
+ First, as root on a worker node, kill the <span class="keyword cmdname">impalad</span> process with a
+ <code class="ph codeph">SIGSEGV</code> error. The original process ID was 23114.
+ </p>
+
+<pre class="pre codeblock"><code>
+# ps ax | grep impalad
+23114 ? Sl 0:18 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31259 pts/0 S+ 0:00 grep impalad
+#
+# kill -11 23114
+#
+# ps ax | grep impalad
+31374 ? Rl 0:04 /opt/local/parcels/&lt;parcel_version&gt;/lib/impala/sbin/impalad --flagfile=/var/run/impala/process/114-impala-IMPALAD/impala-conf/impalad_flags
+31475 pts/0 S+ 0:00 grep impalad
+
+</code></pre>
+
+ <p class="p">
+ We locate the log directory underneath <span class="ph filepath">/var/log</span>.
+ There is a <code class="ph codeph">.INFO</code>, <code class="ph codeph">.WARNING</code>, and <code class="ph codeph">.ERROR</code>
+ log file for the 23114 process ID. The minidump message is written to the
+ <code class="ph codeph">.INFO</code> file and the <code class="ph codeph">.ERROR</code> file, but not the
+ <code class="ph codeph">.WARNING</code> file. In this case, a large core file was also produced.
+ </p>
+<pre class="pre codeblock"><code>
+# cd /var/log/impalad
+# ls -la | grep 23114
+-rw------- 1 impala impala 3539079168 Jun 23 15:20 core.23114
+-rw-r--r-- 1 impala impala 99057 Jun 23 15:20 hs_err_pid23114.log
+-rw-r--r-- 1 impala impala 351 Jun 23 15:20 impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+-rw-r--r-- 1 impala impala 29101 Jun 23 15:20 impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+-rw-r--r-- 1 impala impala 228 Jun 23 14:03 impalad.worker_node_123.impala.log.WARNING.20160623-140343.23114
+
+</code></pre>
+      <p class="p">
+        The <code class="ph codeph">.INFO</code> log includes the location of the minidump file, followed by
+        a report of a core dump. With the breakpad minidump feature enabled, we could now
+        disable core dumps entirely, or retain fewer of them.
+      </p>
+<pre class="pre codeblock"><code>
+# cat impalad.worker_node_123.impala.log.INFO.20160623-140343.23114
+...
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+#
+# A fatal error has been detected by the Java Runtime Environment:
+#
+# SIGSEGV (0xb) at pc=0x00000030c0e0b68a, pid=23114, tid=139869541455968
+#
+# JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build 1.7.0_67-b01)
+# Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode linux-amd64 compressed oops)
+# Problematic frame:
+# C [libpthread.so.0+0xb68a] pthread_cond_wait+0xca
+#
+# Core dump written. Default location: /var/log/impalad/core or core.23114
+#
+# An error report file with more information is saved as:
+# /var/log/impalad/hs_err_pid23114.log
+#
+# If you would like to submit a bug report, please visit:
+# http://bugreport.sun.com/bugreport/crash.jsp
+# The crash happened outside the Java Virtual Machine in native code.
+# See problematic frame for where to report the bug.
+...
+
+# cat impalad.worker_node_123.impala.log.ERROR.20160623-140343.23114
+
+Log file created at: 2016/06/23 14:03:43
+Running on machine: worker_node_123
+Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
+E0623 14:03:43.911002 23114 logging.cc:118] stderr will be logged to this file.
+Wrote minidump to /var/log/impala-minidumps/impalad/0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+
+ <p class="p">
+ The resulting minidump file is much smaller than the corresponding core file,
+ making it much easier to supply diagnostic information to <span class="keyword">the appropriate support channel</span>.
+ </p>
+
+<pre class="pre codeblock"><code>
+# pwd
+/var/log/impalad
+# cd ../impala-minidumps/impalad
+# ls
+0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+# du -kh *
+2.4M 0980da2d-a905-01e1-25ff883a-04ee027a.dmp
+
+</code></pre>
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_buffer_pool_limit.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_buffer_pool_limit.html b/docs/build3x/html/topics/impala_buffer_pool_limit.html
new file mode 100644
index 0000000..e9406b7
--- /dev/null
+++ b/docs/build3x/html/topics/impala_buffer_pool_limit.html
@@ -0,0 +1,71 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="buffer_pool_limit"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>BUFFER_POOL_LIMIT Query Option</title></head><body id="buffer_pool_limit"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">BUFFER_POOL_LIMIT Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Defines a limit on the amount of memory that a query can allocate from the
+ internal buffer pool. The value for this limit applies to the memory on each host,
+ not the aggregate memory across the cluster. Typically not changed by users, except
+ during diagnosis of out-of-memory errors during queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> integer
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Default:</strong>
+ </p>
+  <p class="p">
+        The default setting for this option is whichever is lower: 80% of the
+        <code class="ph codeph">MEM_LIMIT</code> setting, or the <code class="ph codeph">MEM_LIMIT</code>
+        setting minus 100 MB.
+  </p>
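+  <p class="p">
+        For example (hypothetical values, shown here only to illustrate the calculation), with a
+        <code class="ph codeph">MEM_LIMIT</code> of 1024 MB the default works out as follows:
+  </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical worked example, not output from an actual cluster:
+--   MEM_LIMIT                  = 1024 MB
+--   80% of MEM_LIMIT           =  819 MB   (1024 * 0.8, rounded down)
+--   MEM_LIMIT minus 100 MB     =  924 MB
+-- Default BUFFER_POOL_LIMIT    =  819 MB   (the lower of the two values)
+</code></pre>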
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.10.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If queries encounter out-of-memory errors, consider decreasing the
+ <code class="ph codeph">BUFFER_POOL_LIMIT</code> setting to less than 80% of the
+        <code class="ph codeph">MEM_LIMIT</code> setting.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Set an absolute value.
+set buffer_pool_limit=8GB;
+
+-- Set a relative value based on the MEM_LIMIT setting.
+set buffer_pool_limit=80%;
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+ <p class="p">
+ <a class="xref" href="impala_default_spillable_buffer_size.html">DEFAULT_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_max_row_size.html">MAX_ROW_SIZE Query Option</a>,
+ <a class="xref" href="impala_min_spillable_buffer_size.html">MIN_SPILLABLE_BUFFER_SIZE Query Option</a>,
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_char.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_char.html b/docs/build3x/html/topics/impala_char.html
new file mode 100644
index 0000000..6441539
--- /dev/null
+++ b/docs/build3x/html/topics/impala_char.html
@@ -0,0 +1,305 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_datatypes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="char"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>CHAR Data Type (Impala 2.0 or higher only)</title></head><body id="char"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">CHAR Data Type (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+      A fixed-length character type, padded with trailing spaces if necessary to achieve the specified length. If
+      a value is longer than the specified length, Impala truncates the excess trailing characters.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+ <p class="p">
+ In the column definition of a <code class="ph codeph">CREATE TABLE</code> statement:
+ </p>
+
+<pre class="pre codeblock"><code><var class="keyword varname">column_name</var> CHAR(<var class="keyword varname">length</var>)</code></pre>
+
+ <p class="p">
+ The maximum length you can specify is 255.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Semantics of trailing spaces:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ When you store a <code class="ph codeph">CHAR</code> value shorter than the specified length in a table, queries return
+ the value padded with trailing spaces if necessary; the resulting value has the same length as specified in
+ the column definition.
+ </li>
+
+ <li class="li">
+ If you store a <code class="ph codeph">CHAR</code> value containing trailing spaces in a table, those trailing spaces are
+ not stored in the data file. When the value is retrieved by a query, the result could have a different
+ number of trailing spaces. That is, the value includes however many spaces are needed to pad it to the
+ specified length of the column.
+ </li>
+
+ <li class="li">
+ If you compare two <code class="ph codeph">CHAR</code> values that differ only in the number of trailing spaces, those
+ values are considered identical.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Partitioning:</strong> This type can be used for partition key columns. Because of the efficiency advantage
+ of numeric values over character-based values, if the partition key is a string representation of a number,
+ prefer to use an integer type with sufficient range (<code class="ph codeph">INT</code>, <code class="ph codeph">BIGINT</code>, and so
+ on) where practical.
+ </p>
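+      <p class="p">
+        For example, the following hypothetical sketch contrasts a string-of-a-number partition key
+        with the preferred integer equivalent:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch. Instead of partitioning on a string form of the year:
+--   create table events (id bigint, details string) partitioned by (year_code char(4));
+-- prefer an integer partition key with sufficient range:
+create table events (id bigint, details string) partitioned by (event_year int);
+</code></pre>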
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong> This data type cannot be used with HBase tables.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ This type can be read from and written to Parquet files.
+ </li>
+
+ <li class="li">
+        This type does not depend on a particular version of the Parquet file format.
+ </li>
+
+ <li class="li">
+ Parquet files generated by Impala and containing this type can be freely interchanged with other components
+ such as Hive and MapReduce.
+ </li>
+
+ <li class="li">
+ Any trailing spaces, whether implicitly or explicitly specified, are not written to the Parquet data files.
+ </li>
+
+ <li class="li">
+ Parquet data files might contain values that are longer than allowed by the
+ <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> length limit. Impala ignores any extra trailing characters when
+ it processes those values during a query.
+ </li>
+ </ul>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong>
+ </p>
+
+ <p class="p">
+ Text data files might contain values that are longer than allowed for a particular
+ <code class="ph codeph">CHAR(<var class="keyword varname">n</var>)</code> column. Any extra trailing characters are ignored when Impala
+ processes those values during a query. Text data files can also contain values that are shorter than the
+ defined length limit, and Impala pads them with trailing spaces up to the specified length. Any text data
+ files produced by Impala <code class="ph codeph">INSERT</code> statements do not include any trailing blanks for
+ <code class="ph codeph">CHAR</code> columns.
+ </p>
+
+ <p class="p"><strong class="ph b">Avro considerations:</strong></p>
+ <p class="p">
+ The Avro specification allows string values up to 2**64 bytes in length.
+ Impala queries for Avro tables use 32-bit integers to hold string lengths.
+ In <span class="keyword">Impala 2.5</span> and higher, Impala truncates <code class="ph codeph">CHAR</code>
+ and <code class="ph codeph">VARCHAR</code> values in Avro tables to (2**31)-1 bytes.
+ If a query encounters a <code class="ph codeph">STRING</code> value longer than (2**31)-1
+ bytes in an Avro table, the query fails. In earlier releases,
+ encountering such long values in an Avro table could cause a crash.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Compatibility:</strong>
+ </p>
+
+ <p class="p">
+ This type is available using <span class="keyword">Impala 2.0</span> or higher.
+ </p>
+
+ <p class="p">
+ Some other database systems make the length specification optional. For Impala, the length is required.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Internal details:</strong> Represented in memory as a byte array with the same size as the length
+ specification. Values that are shorter than the specified length are padded on the right with trailing
+ spaces.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Column statistics considerations:</strong> Because this type has a fixed size, the maximum and average size
+ fields are always filled in for column statistics, even before you run the <code class="ph codeph">COMPUTE STATS</code>
+ statement.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">UDF considerations:</strong> This type cannot be used for the argument or return type of a user-defined
+ function (UDF) or user-defined aggregate function (UDA).
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ These examples show how trailing spaces are not considered significant when comparing or processing
+ <code class="ph codeph">CHAR</code> values. <code class="ph codeph">CAST()</code> truncates any longer string to fit within the defined
+ length. If a <code class="ph codeph">CHAR</code> value is shorter than the specified length, it is padded on the right with
+ spaces until it matches the specified length. Therefore, <code class="ph codeph">LENGTH()</code> represents the length
+ including any trailing spaces, and <code class="ph codeph">CONCAT()</code> also treats the column value as if it has
+ trailing spaces.
+ </p>
+
+<pre class="pre codeblock"><code>select cast('x' as char(4)) = cast('x ' as char(4)) as "unpadded equal to padded";
++--------------------------+
+| unpadded equal to padded |
++--------------------------+
+| true |
++--------------------------+
+
+create table char_length(c char(3));
+insert into char_length values (cast('1' as char(3))), (cast('12' as char(3))), (cast('123' as char(3))), (cast('123456' as char(3)));
+select concat("[",c,"]") as c, length(c) from char_length;
++-------+-----------+
+| c | length(c) |
++-------+-----------+
+| [1 ] | 3 |
+| [12 ] | 3 |
+| [123] | 3 |
+| [123] | 3 |
++-------+-----------+
+</code></pre>
+
+ <p class="p">
+ This example shows a case where data values are known to have a specific length, where <code class="ph codeph">CHAR</code>
+ is a logical data type to use.
+
+ </p>
+
+<pre class="pre codeblock"><code>create table addresses
+ (id bigint,
+ street_name string,
+ state_abbreviation char(2),
+ country_abbreviation char(2));
+</code></pre>
+
+ <p class="p">
+ The following example shows how values written by Impala do not physically include the trailing spaces. It
+ creates a table using text format, with <code class="ph codeph">CHAR</code> values much shorter than the declared length,
+ and then prints the resulting data file to show that the delimited values are not separated by spaces. The
+ same behavior applies to binary-format Parquet data files.
+ </p>
+
+<pre class="pre codeblock"><code>create table char_in_text (a char(20), b char(30), c char(40))
+ row format delimited fields terminated by ',';
+
+insert into char_in_text values (cast('foo' as char(20)), cast('bar' as char(30)), cast('baz' as char(40))), (cast('hello' as char(20)), cast('goodbye' as char(30)), cast('aloha' as char(40)));
+
+-- Running this Linux command inside impala-shell using the ! shortcut.
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+foo,bar,baz
+hello,goodbye,aloha
+</code></pre>
+
+ <p class="p">
+ The following example further illustrates the treatment of spaces. It replaces the contents of the previous
+ table with some values including leading spaces, trailing spaces, or both. Any leading spaces are preserved
+ within the data file, but trailing spaces are discarded. Then when the values are retrieved by a query, the
+ leading spaces are retrieved verbatim while any necessary trailing spaces are supplied by Impala.
+ </p>
+
+<pre class="pre codeblock"><code>insert overwrite char_in_text values (cast('trailing ' as char(20)), cast(' leading and trailing ' as char(30)), cast(' leading' as char(40)));
+!hdfs dfs -cat 'hdfs://127.0.0.1:8020/user/hive/warehouse/impala_doc_testing.db/char_in_text/*.*';
+trailing, leading and trailing, leading
+
+select concat('[',a,']') as a, concat('[',b,']') as b, concat('[',c,']') as c from char_in_text;
++------------------------+----------------------------------+--------------------------------------------+
+| a | b | c |
++------------------------+----------------------------------+--------------------------------------------+
+| [trailing ] | [ leading and trailing ] | [ leading ] |
++------------------------+----------------------------------+--------------------------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+ <p class="p">
+ Currently, the data types <code class="ph codeph">CHAR</code>, <code class="ph codeph">VARCHAR</code>,
+ <code class="ph codeph">ARRAY</code>, <code class="ph codeph">MAP</code>, and <code class="ph codeph">STRUCT</code> cannot be used with Kudu tables.
+ </p>
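+      <p class="p">
+        For example, a hypothetical Kudu table would declare such a column as
+        <code class="ph codeph">STRING</code> rather than <code class="ph codeph">CHAR</code>:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch: use STRING in place of CHAR for a Kudu table.
+create table kudu_addresses
+  (id bigint primary key,
+   state_abbreviation string)
+  partition by hash (id) partitions 3
+  stored as kudu;
+</code></pre>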
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ Because the blank-padding behavior requires allocating the maximum length for each value in memory, for
+ scalability reasons avoid declaring <code class="ph codeph">CHAR</code> columns that are much longer than typical values in
+ that column.
+ </p>
+
+ <p class="p">
+ All data in <code class="ph codeph">CHAR</code> and <code class="ph codeph">VARCHAR</code> columns must be in a character encoding that
+ is compatible with UTF-8. If you have binary data from another database system (that is, a BLOB type), use
+ a <code class="ph codeph">STRING</code> column to hold it.
+ </p>
+
+ <p class="p">
+ When an expression compares a <code class="ph codeph">CHAR</code> with a <code class="ph codeph">STRING</code> or
+ <code class="ph codeph">VARCHAR</code>, the <code class="ph codeph">CHAR</code> value is implicitly converted to <code class="ph codeph">STRING</code>
+ first, with trailing spaces preserved.
+ </p>
+
+<pre class="pre codeblock"><code>select cast("foo " as char(5)) = 'foo' as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| false |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ This behavior differs from other popular database systems. To get the expected result of
+ <code class="ph codeph">TRUE</code>, cast the expressions on both sides to <code class="ph codeph">CHAR</code> values of the appropriate
+ length:
+ </p>
+
+<pre class="pre codeblock"><code>select cast("foo " as char(5)) = cast('foo' as char(3)) as "char equal to string";
++----------------------+
+| char equal to string |
++----------------------+
+| true |
++----------------------+
+</code></pre>
+
+ <p class="p">
+ This behavior is subject to change in future releases.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_string.html#string">STRING Data Type</a>, <a class="xref" href="impala_varchar.html#varchar">VARCHAR Data Type (Impala 2.0 or higher only)</a>,
+ <a class="xref" href="impala_literals.html#string_literals">String Literals</a>,
+ <a class="xref" href="impala_string_functions.html#string_functions">Impala String Functions</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_datatypes.html">Data Types</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_comments.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_comments.html b/docs/build3x/html/topics/impala_comments.html
new file mode 100644
index 0000000..62bd6ee
--- /dev/null
+++ b/docs/build3x/html/topics/impala_comments.html
@@ -0,0 +1,46 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="comments"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Comments</title></head><body id="comments"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Comments</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Impala supports the familiar styles of SQL comments:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ All text from a <code class="ph codeph">--</code> sequence to the end of the line is considered a comment and ignored.
+ This type of comment can occur on a single line by itself, or after all or part of a statement.
+ </li>
+
+ <li class="li">
+ All text from a <code class="ph codeph">/*</code> sequence to the next <code class="ph codeph">*/</code> sequence is considered a
+ comment and ignored. This type of comment can stretch over multiple lines. This type of comment can occur
+ on one or more lines by itself, in the middle of a statement, or before or after a statement.
+ </li>
+ </ul>
+
+ <p class="p">
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>-- This line is a comment about a table.
+create table ...;
+
+/*
+This is a multi-line comment about a query.
+*/
+select ...;
+
+select * from t /* This is an embedded comment about a query. */ where ...;
+
+select * from t -- This is a trailing comment within a multi-line command.
+where ...;
+</code></pre>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref.html">Impala SQL Language Reference</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_aggregate_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_aggregate_functions.html b/docs/build3x/html/topics/impala_aggregate_functions.html
new file mode 100644
index 0000000..9175be2
--- /dev/null
+++ b/docs/build3x/html/topics/impala_aggregate_functions.html
@@ -0,0 +1,34 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_appx_median.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_avg.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_count.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_group_concat.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_max.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_min.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_ndv.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_stddev.html"><meta name="DC.Relation" scheme="URI" conte
nt="../topics/impala_sum.html"><meta name="DC.Relation" scheme="URI" content="../topics/impala_variance.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aggregate_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Aggregate Functions</title></head><body id="aggregate_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Aggregate Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Aggregate functions are a special category with different rules. These functions calculate a return value
+ across all the items in a result set, so they require a <code class="ph codeph">FROM</code> clause in the query:
+ </p>
+
+<pre class="pre codeblock"><code>select count(product_id) from product_catalog;
+select max(height), avg(height) from census_data where age > 20;
+</code></pre>
+
+ <p class="p">
+ Aggregate functions also ignore <code class="ph codeph">NULL</code> values rather than returning a <code class="ph codeph">NULL</code>
+ result. For example, if some rows have <code class="ph codeph">NULL</code> for a particular column, those rows are
+ ignored when computing the <code class="ph codeph">AVG()</code> for that column. Likewise, specifying
+ <code class="ph codeph">COUNT(<var class="keyword varname">col_name</var>)</code> in a query counts only those rows where
+ <var class="keyword varname">col_name</var> contains a non-<code class="ph codeph">NULL</code> value.
+ </p>
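+      <p class="p">
+        For example, with a hypothetical table <code class="ph codeph">t1</code> whose column
+        <code class="ph codeph">c</code> contains the values 1, <code class="ph codeph">NULL</code>, and 3:
+      </p>
+<pre class="pre codeblock"><code>
+-- Hypothetical sketch: c contains 1, NULL, 3.
+select count(*), count(c), sum(c), avg(c) from t1;
+-- count(*) = 3  (counts every row)
+-- count(c) = 2  (the NULL row is ignored)
+-- sum(c)   = 4  (1 + 3)
+-- avg(c)   = 2  (4 / 2, not 4 / 3)
+</code></pre>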
+
+ <p class="p">
+
+ </p>
+
+ <p class="p toc"></p>
+ </div>
+<nav role="navigation" class="related-links"><ul class="ullinks"><li class="link ulchildlink"><strong><a href="../topics/impala_appx_median.html">APPX_MEDIAN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_avg.html">AVG Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_count.html">COUNT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_group_concat.html">GROUP_CONCAT Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_max.html">MAX Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_min.html">MIN Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_ndv.html">NDV Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_stddev.html">STDDEV, STDDEV_SAMP, STDDEV_POP Functions</a></strong><br></li><li cl
ass="link ulchildlink"><strong><a href="../topics/impala_sum.html">SUM Function</a></strong><br></li><li class="link ulchildlink"><strong><a href="../topics/impala_variance.html">VARIANCE, VARIANCE_SAMP, VARIANCE_POP, VAR_SAMP, VAR_POP Functions</a></strong><br></li></ul><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_aliases.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_aliases.html b/docs/build3x/html/topics/impala_aliases.html
new file mode 100644
index 0000000..95f4da8
--- /dev/null
+++ b/docs/build3x/html/topics/impala_aliases.html
@@ -0,0 +1,148 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_schema_objects.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="aliases"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Overview of Impala Aliases</title></head><body id="aliases"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Overview of Impala Aliases</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ When you write the names of tables, columns, or column expressions in a query, you can assign an alias at the
+ same time. Then you can specify the alias rather than the original name when making other references to the
+      table or column in the same statement. You typically choose aliases that are shorter or easier
+      to remember (or both) than the original names. The aliases are printed in the query header,
+      making them useful for self-documenting output.
+ </p>
+
+ <p class="p">
+ To set up an alias, add the <code class="ph codeph">AS <var class="keyword varname">alias</var></code> clause immediately after any table,
+ column, or expression name in the <code class="ph codeph">SELECT</code> list or <code class="ph codeph">FROM</code> list of a query. The
+ <code class="ph codeph">AS</code> keyword is optional; you can also specify the alias immediately after the original name.
+ </p>
+
+<pre class="pre codeblock"><code>-- Make the column headers of the result set easier to understand.
+SELECT c1 AS name, c2 AS address, c3 AS phone FROM table_with_terse_columns;
+SELECT SUM(ss_xyz_dollars_net) AS total_sales FROM table_with_cryptic_columns;
+-- The alias can be a quoted string for extra readability.
+SELECT c1 AS "Employee ID", c2 AS "Date of hire" FROM t1;
+-- The AS keyword is optional.
+SELECT c1 "Employee ID", c2 "Date of hire" FROM t1;
+
+-- The table aliases assigned in the FROM clause can be used both earlier
+-- in the query (the SELECT list) and later (the WHERE clause).
+SELECT one.name, two.address, three.phone
+ FROM census one, building_directory two, phonebook three
+WHERE one.id = two.id and two.id = three.id;
+
+-- The aliases c1 and c2 let the query handle columns with the same names from 2 joined tables.
+-- The aliases t1 and t2 let the query abbreviate references to long or cryptically named tables.
+SELECT t1.column_n AS c1, t2.column_n AS c2 FROM long_name_table AS t1, very_long_name_table2 AS t2
+ WHERE c1 = c2;
+SELECT t1.column_n c1, t2.column_n c2 FROM table1 t1, table2 t2
+ WHERE c1 = c2;
+</code></pre>
+
+ <p class="p">
+ From Impala 3.0, the alias substitution logic has changed.
+ </p>
+ <div class="p">
+ You can specify column aliases with or without the <code class="ph codeph">AS</code> keyword, and with no quotation
+ marks, single quotation marks, or double quotation marks. Some kind of quotation marks are required if the
+ column alias contains any spaces or other problematic characters. The alias text is displayed in the
+ <span class="keyword cmdname">impala-shell</span> output as all-lowercase. For example:
+<pre class="pre codeblock"><code>[localhost:21000] > select c1 First_Column from t;
+[localhost:21000] > select c1 as First_Column from t;
++--------------+
+| first_column |
++--------------+
+...
+
+[localhost:21000] > select c1 'First Column' from t;
+[localhost:21000] > select c1 as 'First Column' from t;
++--------------+
+| first column |
++--------------+
+...
+
+[localhost:21000] > select c1 "First Column" from t;
+[localhost:21000] > select c1 as "First Column" from t;
++--------------+
+| first column |
++--------------+
+...</code></pre>
+ From Impala 3.0, the alias substitution logic in the <code class="ph codeph">GROUP BY</code>, <code class="ph codeph">HAVING</code>,
+ and <code class="ph codeph">ORDER BY</code> clauses has become more consistent with standard SQL behavior, as follows.
+ Aliases are now only legal at the top level, and not in subexpressions. The following statements are
+ allowed:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ ORDER BY x;
+
+ SELECT NOT bool_col AS nb
+ FROM t
+ GROUP BY nb
+ HAVING nb;
+</code></pre>
+ And the following statements are NOT allowed:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x / 2;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ ORDER BY -x;
+
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x
+ HAVING x > 3;
+</code></pre>
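+ To apply a condition to such an expression under the new rules, repeat the
+ expression itself rather than the alias. A sketch of the standard-SQL rewrite
+ for the last statement above:
+<pre class="pre codeblock"><code>
+ SELECT int_col / 2 AS x
+ FROM t
+ GROUP BY x
+ HAVING int_col / 2 > 3;
+</code></pre>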
+ </div>
+
+ <p class="p">
+ To use an alias name that matches one of the Impala reserved keywords (listed in
+ <a class="xref" href="impala_reserved_words.html#reserved_words">Impala Reserved Words</a>), surround the identifier with either single or
+ double quotation marks, or <code class="ph codeph">``</code> characters (backticks).
+ </p>
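+
+  <p class="p">
+   For example, each of the following hypothetical queries uses the reserved word
+   <code class="ph codeph">select</code> as a column alias by quoting it:
+  </p>
+
+<pre class="pre codeblock"><code>-- Quoting lets a reserved word such as SELECT serve as an alias.
+SELECT c1 AS `select` FROM t;
+SELECT c1 AS 'select' FROM t;
+SELECT c1 AS "select" FROM t;
+</code></pre>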
+
+ <p class="p">
+ <span class="ph"> Aliases follow the same rules as identifiers when it comes to case
+ insensitivity. Aliases can be longer than identifiers (up to the maximum length of a Java string) and can
+ include additional characters such as spaces and dashes when they are quoted using backtick characters.
+ </span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+      Queries involving the complex types (<code class="ph codeph">ARRAY</code>,
+      <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>) typically make
+      extensive use of table aliases. These queries involve join clauses
+      where the complex type column is treated as a joined table.
+      To construct two-part or three-part qualified names for the
+      complex column elements in the <code class="ph codeph">FROM</code> list,
+      it is sometimes syntactically required to define a table
+      alias for the complex column where it is referenced in the join clause.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details and examples.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Alternatives:</strong>
+ </p>
+
+ <p class="p">
+ Another way to define different names for the same tables or columns is to create views. See
+ <a class="xref" href="../shared/../topics/impala_views.html#views">Overview of Impala Views</a> for details.
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_schema_objects.html">Impala Schema Objects and Object Names</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_allow_unsupported_formats.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_allow_unsupported_formats.html b/docs/build3x/html/topics/impala_allow_unsupported_formats.html
new file mode 100644
index 0000000..6481bf3
--- /dev/null
+++ b/docs/build3x/html/topics/impala_allow_unsupported_formats.html
@@ -0,0 +1,24 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="allow_unsupported_formats"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALLOW_UNSUPPORTED_FORMATS Query Option</title></head><body id="allow_unsupported_formats"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALLOW_UNSUPPORTED_FORMATS Query Option</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ An obsolete query option from early work on support for file formats. Do not use. Might be removed in the
+ future.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value interpreted as <code class="ph codeph">false</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_alter_table.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_alter_table.html b/docs/build3x/html/topics/impala_alter_table.html
new file mode 100644
index 0000000..628b779
--- /dev/null
+++ b/docs/build3x/html/topics/impala_alter_table.html
@@ -0,0 +1,1117 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_table"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER TABLE Statement</title></head><body id="alter_table"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALTER TABLE Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ The <code class="ph codeph">ALTER TABLE</code> statement changes the structure or properties of an existing Impala table.
+ </p>
+ <p class="p">
+ In Impala, this is primarily a logical operation that updates the table metadata in the metastore database that Impala
+      shares with Hive. Most <code class="ph codeph">ALTER TABLE</code> operations do not actually rewrite, move, or otherwise change the actual data
+ files. (The <code class="ph codeph">RENAME TO</code> clause is the one exception; it can cause HDFS files to be moved to different paths.)
+ When you do an <code class="ph codeph">ALTER TABLE</code> operation, you typically need to perform corresponding physical filesystem operations,
+ such as rewriting the data files to include extra fields, or converting them to a different file format.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE [<var class="keyword varname">old_db_name</var>.]<var class="keyword varname">old_table_name</var> RENAME TO [<var class="keyword varname">new_db_name</var>.]<var class="keyword varname">new_table_name</var>
+
+ALTER TABLE <var class="keyword varname">name</var> ADD COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+ALTER TABLE <var class="keyword varname">name</var> DROP [COLUMN] <var class="keyword varname">column_name</var>
+ALTER TABLE <var class="keyword varname">name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>
+
+ALTER TABLE <var class="keyword varname">name</var> REPLACE COLUMNS (<var class="keyword varname">col_spec</var>[, <var class="keyword varname">col_spec</var> ...])
+
+<span class="ph">-- Kudu tables only.
+ALTER TABLE <var class="keyword varname">name</var> ALTER [COLUMN] <var class="keyword varname">column_name</var>
+ { SET <var class="keyword varname">kudu_storage_attr</var> <var class="keyword varname">attr_value</var>
+ | DROP DEFAULT }
+
+kudu_storage_attr ::= { DEFAULT | BLOCK_SIZE | ENCODING | COMPRESSION }</span>
+
+<span class="ph">-- Non-Kudu tables only.
+ALTER TABLE <var class="keyword varname">name</var> ALTER [COLUMN] <var class="keyword varname">column_name</var>
+ SET COMMENT '<var class="keyword varname">comment_text</var>'</span>
+
+ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+ <span class="ph">[<var class="keyword varname">location_spec</var>]</span>
+ <span class="ph">[<var class="keyword varname">cache_spec</var>]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> ADD [IF NOT EXISTS] RANGE PARTITION <var class="keyword varname">kudu_partition_spec</var></span>
+
+ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] PARTITION (<var class="keyword varname">partition_spec</var>)
+ <span class="ph">[PURGE]</span>
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> DROP [IF EXISTS] RANGE PARTITION <var class="keyword varname">kudu_partition_spec</var></span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> RECOVER PARTITIONS</span>
+
+ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)]
+ SET { FILEFORMAT <var class="keyword varname">file_format</var>
+ | LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'
+ | TBLPROPERTIES (<var class="keyword varname">table_properties</var>)
+ | SERDEPROPERTIES (<var class="keyword varname">serde_properties</var>) }
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> SET COLUMN STATS <var class="keyword varname">colname</var>
+  ('<var class="keyword varname">statsKey</var>'='<var class="keyword varname">val</var>', ...)
+
+statsKey ::= numDVs | numNulls | avgSize | maxSize</span>
+
+<span class="ph">ALTER TABLE <var class="keyword varname">name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET { CACHED IN '<var class="keyword varname">pool_name</var>' <span class="ph">[WITH REPLICATION = <var class="keyword varname">integer</var>]</span> | UNCACHED }</span>
+
+<var class="keyword varname">new_name</var> ::= [<var class="keyword varname">new_database</var>.]<var class="keyword varname">new_table_name</var>
+
+<var class="keyword varname">col_spec</var> ::= <var class="keyword varname">col_name</var> <var class="keyword varname">type_name</var> <span class="ph">[<var class="keyword varname">kudu_attributes</var>]</span>
+
+<span class="ph"><var class="keyword varname">kudu_attributes</var> ::= { [NOT] NULL | ENCODING <var class="keyword varname">codec</var> | COMPRESSION <var class="keyword varname">algorithm</var> |
+ DEFAULT <var class="keyword varname">constant</var> | BLOCK_SIZE <var class="keyword varname">number</var> }</span>
+
+<var class="keyword varname">partition_spec</var> ::= <var class="keyword varname">simple_partition_spec</var> | <span class="ph"><var class="keyword varname">complex_partition_spec</var></span>
+
+<var class="keyword varname">simple_partition_spec</var> ::= <var class="keyword varname">partition_col</var>=<var class="keyword varname">constant_value</var>
+
+<span class="ph"><var class="keyword varname">complex_partition_spec</var> ::= <var class="keyword varname">comparison_expression_on_partition_col</var></span>
+
+<span class="ph"><var class="keyword varname">kudu_partition_spec</var> ::= <var class="keyword varname">constant</var> <var class="keyword varname">range_operator</var> VALUES <var class="keyword varname">range_operator</var> <var class="keyword varname">constant</var> | VALUE = <var class="keyword varname">constant</var></span>
+
+<span class="ph">cache_spec ::= CACHED IN '<var class="keyword varname">pool_name</var>' [WITH REPLICATION = <var class="keyword varname">integer</var>] | UNCACHED</span>
+
+<span class="ph">location_spec ::= LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>'</span>
+
+<var class="keyword varname">table_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">serde_properties</var> ::= '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>'[, '<var class="keyword varname">name</var>'='<var class="keyword varname">value</var>' ...]
+
+<var class="keyword varname">file_format</var> ::= { PARQUET | TEXTFILE | RCFILE | SEQUENCEFILE | AVRO }
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Complex type considerations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">ALTER TABLE</code> statement can
+ change the metadata for tables containing complex types (<code class="ph codeph">ARRAY</code>,
+ <code class="ph codeph">STRUCT</code>, and <code class="ph codeph">MAP</code>).
+ For example, you can use an <code class="ph codeph">ADD COLUMNS</code>, <code class="ph codeph">DROP COLUMN</code>, or <code class="ph codeph">CHANGE</code>
+ clause to modify the table layout for complex type columns.
+ Although Impala queries only work for complex type columns in Parquet tables, the complex type support in the
+ <code class="ph codeph">ALTER TABLE</code> statement applies to all file formats.
+ For example, you can use Impala to update metadata for a staging table in a non-Parquet file format where the
+ data is populated by Hive. Or you can use <code class="ph codeph">ALTER TABLE SET FILEFORMAT</code> to change the format
+ of an existing table to Parquet so that Impala can query it. Remember that changing the file format for a table does
+ not convert the data files within the table; you must prepare any Parquet data files containing complex types
+ outside Impala, and bring them into the table using <code class="ph codeph">LOAD DATA</code> or updating the table's
+ <code class="ph codeph">LOCATION</code> property.
+ See <a class="xref" href="impala_complex_types.html#complex_types">Complex Types (Impala 2.3 or higher only)</a> for details about using complex types.
+ </p>
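+
+    <p class="p">
+      For example, the following statement (a hypothetical sketch; the table and column
+      names are illustrative) adds an <code class="ph codeph">ARRAY</code> column to an
+      existing table:
+    </p>
+
+<pre class="pre codeblock"><code>alter table staging_table add columns (tags array&lt;string&gt;);
+</code></pre>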
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Whenever you specify partitions in an <code class="ph codeph">ALTER TABLE</code> statement, through the <code class="ph codeph">PARTITION
+ (<var class="keyword varname">partition_spec</var>)</code> clause, you must include all the partitioning columns in the
+ specification.
+ </p>
+
+ <p class="p">
+ Most of the <code class="ph codeph">ALTER TABLE</code> operations work the same for internal tables (managed by Impala) as
+ for external tables (with data files located in arbitrary locations). The exception is renaming a table; for
+ an external table, the underlying data directory is not renamed or moved.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Dropping or altering multiple partitions:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.8</span> and higher,
+ the expression for the partition clause with a <code class="ph codeph">DROP</code> or <code class="ph codeph">SET</code>
+ operation can include comparison operators such as <code class="ph codeph"><</code>, <code class="ph codeph">IN</code>,
+ or <code class="ph codeph">BETWEEN</code>, and Boolean operators such as <code class="ph codeph">AND</code>
+ and <code class="ph codeph">OR</code>.
+ </p>
+
+ <p class="p">
+ For example, you might drop a group of partitions corresponding to a particular date
+ range after the data <span class="q">"ages out"</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year < 1995);
+alter table historical_data drop partition (year = 1996 and month between 1 and 6);
+
+</code></pre>
+
+ <p class="p">
+      For tables with multiple partition key columns, you can specify multiple
+ conditions separated by commas, and the operation only applies to the partitions
+ that match all the conditions (similar to using an <code class="ph codeph">AND</code> clause):
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table historical_data drop partition (year < 1995, last_name like 'A%');
+
+</code></pre>
+
+ <p class="p">
+ This technique can also be used to change the file format of groups of partitions,
+ as part of an ETL pipeline that periodically consolidates and rewrites the underlying
+ data files in a different file format:
+ </p>
+
+<pre class="pre codeblock"><code>
+alter table fast_growing_data partition (year = 2016, month in (10,11,12)) set fileformat parquet;
+
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The extended syntax involving comparison operators and multiple partitions
+ applies to the <code class="ph codeph">SET FILEFORMAT</code>, <code class="ph codeph">SET TBLPROPERTIES</code>,
+ <code class="ph codeph">SET SERDEPROPERTIES</code>, and <code class="ph codeph">SET [UN]CACHED</code> clauses.
+ You can also use this syntax with the <code class="ph codeph">PARTITION</code> clause
+ in the <code class="ph codeph">COMPUTE INCREMENTAL STATS</code> statement, and with the
+ <code class="ph codeph">PARTITION</code> clause of the <code class="ph codeph">SHOW FILES</code> statement.
+ Some forms of <code class="ph codeph">ALTER TABLE</code> still only apply to one partition
+ at a time: the <code class="ph codeph">SET LOCATION</code> and <code class="ph codeph">ADD PARTITION</code>
+ clauses. The <code class="ph codeph">PARTITION</code> clauses in the <code class="ph codeph">LOAD DATA</code>
+ and <code class="ph codeph">INSERT</code> statements also only apply to one partition at a time.
+ </p>
+ <p class="p">
+ A DDL statement that applies to multiple partitions is considered successful
+ (resulting in no changes) even if no partitions match the conditions.
+ The results are the same as if the <code class="ph codeph">IF EXISTS</code> clause was specified.
+ </p>
+ <p class="p">
+ The performance and scalability of this technique is similar to
+ issuing a sequence of single-partition <code class="ph codeph">ALTER TABLE</code>
+ statements in quick succession. To minimize bottlenecks due to
+ communication with the metastore database, or causing other
+ DDL operations on the same table to wait, test the effects of
+ performing <code class="ph codeph">ALTER TABLE</code> statements that affect
+ large numbers of partitions.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Amazon S3 considerations:</strong>
+ </p>
+
+ <p class="p">
+ You can specify an <code class="ph codeph">s3a://</code> prefix on the <code class="ph codeph">LOCATION</code> attribute of a table or partition
+ to make Impala query data from the Amazon S3 filesystem. In <span class="keyword">Impala 2.6</span> and higher, Impala automatically
+ handles creating or removing the associated folders when you issue <code class="ph codeph">ALTER TABLE</code> statements
+ with the <code class="ph codeph">ADD PARTITION</code> or <code class="ph codeph">DROP PARTITION</code> clauses.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala DDL statements such as
+ <code class="ph codeph">CREATE DATABASE</code>, <code class="ph codeph">CREATE TABLE</code>, <code class="ph codeph">DROP DATABASE CASCADE</code>,
+ <code class="ph codeph">DROP TABLE</code>, and <code class="ph codeph">ALTER TABLE [ADD|DROP] PARTITION</code> can create or remove folders
+ as needed in the Amazon S3 system. Prior to <span class="keyword">Impala 2.6</span>, you had to create folders yourself and point
+ Impala database, tables, or partitions at them, and manually remove folders when no longer needed.
+ See <a class="xref" href="../shared/../topics/impala_s3.html#s3">Using Impala with the Amazon S3 Filesystem</a> for details about reading and writing S3 data with Impala.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS caching (CACHED IN clause):</strong>
+ </p>
+
+ <p class="p">
+ If you specify the <code class="ph codeph">CACHED IN</code> clause, any existing or future data files in the table
+ directory or the partition subdirectories are designated to be loaded into memory with the HDFS caching
+ mechanism. See <a class="xref" href="impala_perf_hdfs_caching.html#hdfs_caching">Using HDFS Caching with Impala (Impala 2.1 or higher only)</a> for details about using the HDFS
+ caching feature.
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.2</span> and higher, the optional <code class="ph codeph">WITH REPLICATION</code> clause
+ for <code class="ph codeph">CREATE TABLE</code> and <code class="ph codeph">ALTER TABLE</code> lets you specify
+ a <dfn class="term">replication factor</dfn>, the number of hosts on which to cache the same data blocks.
+      When Impala processes a cached data block and the cache replication factor is greater than 1, Impala randomly
+ selects a host that has a cached copy of that data block. This optimization avoids excessive CPU
+ usage on a single host when the same cached data block is processed multiple times.
+ Where practical, specify a value greater than or equal to the HDFS block replication factor.
+ </p>
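+
+    <p class="p">
+      For example, assuming an HDFS cache pool named <code class="ph codeph">pool1</code>
+      exists (an illustrative name), the following statement caches one partition
+      with a replication factor of 3, matching a typical HDFS block replication factor:
+    </p>
+
+<pre class="pre codeblock"><code>alter table historical_data partition (year = 1996, month = 1)
+  set cached in 'pool1' with replication = 3;
+</code></pre>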
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
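+
+    <p class="p">
+      For example, a session might enable the option before issuing the DDL statement
+      (the table and column names are illustrative only):
+    </p>
+
+<pre class="pre codeblock"><code>set SYNC_DDL=1;
+alter table t1 add columns (c2 int);
+</code></pre>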
+
+ <p class="p">
+ The following sections show examples of the use cases for various <code class="ph codeph">ALTER TABLE</code> clauses.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To rename a table (RENAME TO clause):</strong>
+ </p>
+
+
+
+ <p class="p">
+ The <code class="ph codeph">RENAME TO</code> clause lets you change the name of an existing table, and optionally which
+ database it is located in.
+ </p>
+
+ <p class="p">
+ For internal tables, this operation physically renames the directory within HDFS that contains the data files;
+ the original directory name no longer exists. By qualifying the table names with database names, you can use
+ this technique to move an internal table (and its associated data directory) from one database to another.
+ For example:
+ </p>
+
+<pre class="pre codeblock"><code>create database d1;
+create database d2;
+create database d3;
+use d1;
+create table mobile (x int);
+use d2;
+-- Move table from another database to the current one.
+alter table d1.mobile rename to mobile;
+use d1;
+-- Move table from one database to another.
+alter table d2.mobile rename to d3.mobile;</code></pre>
+
+ <p class="p">
+      For external tables, renaming the table changes only the table name within the metastore database; the
+      data directory in its original location is not renamed or moved.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To change the physical location where Impala looks for data files associated with a table or
+ partition:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> [PARTITION (<var class="keyword varname">partition_spec</var>)] SET LOCATION '<var class="keyword varname">hdfs_path_of_directory</var>';</code></pre>
+
+ <p class="p">
+ The path you specify is the full HDFS path where the data files reside, or will be created. Impala does not
+ create any additional subdirectory named after the table. Impala does not move any data files to this new
+ location or change any data files that might already exist in that directory.
+ </p>
+
+ <p class="p">
+ To set the location for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+ same partitioning columns for the table, with a constant value for each, to precisely identify the single
+ partition affected by the statement:
+ </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Redirect queries, INSERT, and LOAD DATA for one partition
+-- to a specific different directory.
+alter table p1 partition (month=1, day=1) set location '/usr/external_data/new_years_day';
+</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ If you are creating a partition for the first time and specifying its location, for maximum efficiency, use
+ a single <code class="ph codeph">ALTER TABLE</code> statement including both the <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">LOCATION</code> clauses, rather than separate statements with <code class="ph codeph">ADD PARTITION</code> and
+ <code class="ph codeph">SET LOCATION</code> clauses.
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To automatically detect new partition directories added through Hive or HDFS operations:</strong>
+ </p>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.3</span> and higher, the <code class="ph codeph">RECOVER PARTITIONS</code> clause scans
+ a partitioned table to detect if any new partition directories were added outside of Impala,
+ such as by Hive <code class="ph codeph">ALTER TABLE</code> statements or by <span class="keyword cmdname">hdfs dfs</span>
+ or <span class="keyword cmdname">hadoop fs</span> commands. The <code class="ph codeph">RECOVER PARTITIONS</code> clause
+ automatically recognizes any data files present in these new directories, the same as
+ the <code class="ph codeph">REFRESH</code> statement does.
+ </p>
+
+ <p class="p">
+ For example, here is a sequence of examples showing how you might create a partitioned table in Impala,
+ create new partitions through Hive, copy data files into the new partitions with the <span class="keyword cmdname">hdfs</span>
+ command, and have Impala recognize the new partitions and new data:
+ </p>
+
+ <p class="p">
+ In Impala, create the table, and a single partition for demonstration purposes:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+create database recover_partitions;
+use recover_partitions;
+create table t1 (s string) partitioned by (yy int, mm int);
+insert into t1 partition (yy = 2016, mm = 1) values ('Partition exists');
+show files in t1;
++---------------------------------------------------------------------+------+--------------+
+| Path | Size | Partition |
++---------------------------------------------------------------------+------+--------------+
+| /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt | 17B | yy=2016/mm=1 |
++---------------------------------------------------------------------+------+--------------+
+quit;
+
+</code></pre>
+
+ <p class="p">
+ In Hive, create some new partitions. In a real use case, you might create the
+ partitions and populate them with data as the final stages of an ETL pipeline.
+ </p>
+
+<pre class="pre codeblock"><code>
+
+hive> use recover_partitions;
+OK
+hive> alter table t1 add partition (yy = 2016, mm = 2);
+OK
+hive> alter table t1 add partition (yy = 2016, mm = 3);
+OK
+hive> quit;
+
+</code></pre>
+
+ <p class="p">
+      For demonstration purposes, copy data (a single row) into these
+      new partitions using HDFS commands:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+$ hdfs dfs -ls /user/hive/warehouse/recover_partitions.db/t1/yy=2016/
+Found 3 items
+drwxr-xr-x - impala hive 0 2016-05-09 16:06 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:14 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2
+drwxr-xr-x - jrussell hive 0 2016-05-09 16:13 /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3
+
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+ /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=2/data.txt
+$ hdfs dfs -cp /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=1/data.txt \
+ /user/hive/warehouse/recover_partitions.db/t1/yy=2016/mm=3/data.txt
+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+
+hive> select * from t1;
+OK
+Partition exists 2016 1
+Partition exists 2016 2
+Partition exists 2016 3
+hive> quit;
+
+</code></pre>
+
+ <p class="p">
+ In Impala, initially the partitions and data are not visible.
+ Running <code class="ph codeph">ALTER TABLE</code> with the <code class="ph codeph">RECOVER PARTITIONS</code>
+ clause scans the table data directory to find any new partition directories, and
+ the data files inside them:
+ </p>
+
+<pre class="pre codeblock"><code>
+
+select * from t1;
++------------------+------+----+
+| s | yy | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1 |
++------------------+------+----+
+
+alter table t1 recover partitions;
+select * from t1;
++------------------+------+----+
+| s | yy | mm |
++------------------+------+----+
+| Partition exists | 2016 | 1 |
+| Partition exists | 2016 | 3 |
+| Partition exists | 2016 | 2 |
++------------------+------+----+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To change the key-value pairs of the TBLPROPERTIES and SERDEPROPERTIES fields:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);
+ALTER TABLE <var class="keyword varname">table_name</var> SET SERDEPROPERTIES ('<var class="keyword varname">key1</var>'='<var class="keyword varname">value1</var>', '<var class="keyword varname">key2</var>'='<var class="keyword varname">value2</var>'[, ...]);</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">TBLPROPERTIES</code> clause is primarily a way to associate arbitrary user-specified data items
+ with a particular table.
+ </p>
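+
+    <p class="p">
+      For example, the following statement attaches two arbitrary user-defined
+      key-value pairs to a table (the property names shown are purely illustrative):
+    </p>
+
+<pre class="pre codeblock"><code>alter table analysis_data set tblproperties ('source'='etl_nightly', 'contact'='dept42');
+</code></pre>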
+
+ <p class="p">
+ The <code class="ph codeph">SERDEPROPERTIES</code> clause sets up metadata defining how tables are read or written, needed
+ in some cases by Hive but not used extensively by Impala. You would use this clause primarily to change the
+ delimiter in an existing text table or partition, by setting the <code class="ph codeph">'serialization.format'</code> and
+ <code class="ph codeph">'field.delim'</code> property values to the new delimiter character:
+ </p>
+
+<pre class="pre codeblock"><code>-- This table begins life as pipe-separated text format.
+create table change_to_csv (s1 string, s2 string) row format delimited fields terminated by '|';
+-- Then we change it to a CSV table.
+alter table change_to_csv set SERDEPROPERTIES ('serialization.format'=',', 'field.delim'=',');
+insert overwrite change_to_csv values ('stop','go'), ('yes','no');
+!hdfs dfs -cat 'hdfs://<var class="keyword varname">hostname</var>:8020/<var class="keyword varname">data_directory</var>/<var class="keyword varname">dbname</var>.db/change_to_csv/<var class="keyword varname">data_file</var>';
+stop,go
+yes,no</code></pre>
+
+ <p class="p">
+ Use the <code class="ph codeph">DESCRIBE FORMATTED</code> statement to see the current values of these properties for an
+ existing table. See <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a> for more details about these clauses.
+ See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">impala_perf_stats.html#perf_table_stats_manual</a> for an example of using table properties to
+ fine-tune the performance-related table statistics.
+ </p>
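+  <p class="p">
+    For example, you could check the current property values of the
+    <code class="ph codeph">change_to_csv</code> table from the earlier example.
+    (This is a sketch; the exact layout of the output varies by version.)
+  </p>
+
+<pre class="pre codeblock"><code>-- The Table Parameters section of the output shows TBLPROPERTIES values,
+-- and the Storage Desc Params section shows SERDEPROPERTIES values
+-- such as 'serialization.format' and 'field.delim'.
+describe formatted change_to_csv;</code></pre>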
+
+ <p class="p">
+ <strong class="ph b">To manually set or update table or column statistics:</strong>
+ </p>
+
+ <p class="p">
+ Although for most tables the <code class="ph codeph">COMPUTE STATS</code> or <code class="ph codeph">COMPUTE INCREMENTAL STATS</code>
+ statement is all you need to keep table and column statistics up to date for a table,
+ sometimes for a very large table or one that is updated frequently, the length of time to recompute
+ all the statistics might make it impractical to run those statements as often as needed.
+ As a workaround, you can use the <code class="ph codeph">ALTER TABLE</code> statement to set table statistics
+ at the level of the entire table or a single partition, or column statistics at the level of
+ the entire table.
+ </p>
+
+ <div class="p">
+ You can set the <code class="ph codeph">numrows</code> value for table statistics by changing the
+ <code class="ph codeph">TBLPROPERTIES</code> setting for a table or partition.
+ For example:
+<pre class="pre codeblock"><code>create table analysis_data stored as parquet as select * from raw_data;
+Inserted 1000000000 rows in 181.98s
+compute stats analysis_data;
+insert into analysis_data select * from smaller_table_we_forgot_before;
+Inserted 1000000 rows in 15.32s
+-- Now there are 1001000000 rows. We can update this single data point in the stats.
+alter table analysis_data set tblproperties('numRows'='1001000000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+<pre class="pre codeblock"><code>-- If the table originally contained 1 million rows, and we add another partition with 30 thousand rows,
+-- change the numRows property for the partition and the overall table.
+alter table partitioned_data partition(year=2009, month=4) set tblproperties ('numRows'='30000', 'STATS_GENERATED_VIA_STATS_TASK'='true');
+alter table partitioned_data set tblproperties ('numRows'='1030000', 'STATS_GENERATED_VIA_STATS_TASK'='true');</code></pre>
+ See <a class="xref" href="impala_perf_stats.html#perf_table_stats_manual">impala_perf_stats.html#perf_table_stats_manual</a> for details.
+ </div>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, you can use the <code class="ph codeph">SET COLUMN STATS</code> clause
+ to set a specific stats value for a particular column.
+ </p>
+
+ <div class="p">
+ You specify a case-insensitive symbolic name for the kind of statistics:
+ <code class="ph codeph">numDVs</code>, <code class="ph codeph">numNulls</code>, <code class="ph codeph">avgSize</code>, <code class="ph codeph">maxSize</code>.
+ The key names and values are both quoted. This operation applies to an entire table,
+ not a specific partition. For example:
+<pre class="pre codeblock"><code>
+create table t1 (x int, s string);
+insert into t1 values (1, 'one'), (2, 'two'), (2, 'deux');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | -1               | -1     | 4        | 4        |
+| s      | STRING | -1               | -1     | -1       | -1       |
++--------+--------+------------------+--------+----------+----------+
+alter table t1 set column stats x ('numDVs'='2','numNulls'='0');
+alter table t1 set column stats s ('numdvs'='3','maxsize'='4');
+show column stats t1;
++--------+--------+------------------+--------+----------+----------+
+| Column | Type   | #Distinct Values | #Nulls | Max Size | Avg Size |
++--------+--------+------------------+--------+----------+----------+
+| x      | INT    | 2                | 0      | 4        | 4        |
+| s      | STRING | 3                | -1     | 4        | -1       |
++--------+--------+------------------+--------+----------+----------+
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To reorganize columns for a table:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER TABLE <var class="keyword varname">table_name</var> ADD COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> REPLACE COLUMNS (<var class="keyword varname">column_defs</var>);
+ALTER TABLE <var class="keyword varname">table_name</var> CHANGE <var class="keyword varname">column_name</var> <var class="keyword varname">new_name</var> <var class="keyword varname">new_type</var>;
+ALTER TABLE <var class="keyword varname">table_name</var> DROP <var class="keyword varname">column_name</var>;</code></pre>
+
+ <p class="p">
+      Each <var class="keyword varname">column_def</var> in <var class="keyword varname">column_defs</var> takes the same form as in the <code class="ph codeph">CREATE TABLE</code> statement: the column
+ name, then its data type, then an optional comment. You can add multiple columns at a time. The parentheses
+ are required whether you add a single column or multiple columns. When you replace columns, all the original
+ column definitions are discarded. You might use this technique if you receive a new set of data files with
+ different data types or columns in a different order. (The data files are retained, so if the new columns are
+ incompatible with the old ones, use <code class="ph codeph">INSERT OVERWRITE</code> or <code class="ph codeph">LOAD DATA OVERWRITE</code>
+ to replace all the data before issuing any further queries.)
+ </p>
+
+ <p class="p">
+ For example, here is how you might add columns to an existing table.
+ The first <code class="ph codeph">ALTER TABLE</code> adds two new columns, and the second
+ <code class="ph codeph">ALTER TABLE</code> adds one new column.
+ A single Impala query reads both the old and new data files, containing different numbers of columns.
+ For any columns not present in a particular data file, all the column values are
+ considered to be <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table t1 (x int);
+insert into t1 values (1), (2);
+
+alter table t1 add columns (s string, t timestamp);
+insert into t1 values (3, 'three', now());
+
+alter table t1 add columns (b boolean);
+insert into t1 values (4, 'four', now(), true);
+
+select * from t1 order by x;
++---+-------+-------------------------------+------+
+| x | s     | t                             | b    |
++---+-------+-------------------------------+------+
+| 1 | NULL  | NULL                          | NULL |
+| 2 | NULL  | NULL                          | NULL |
+| 3 | three | 2016-05-11 11:19:45.054457000 | NULL |
+| 4 | four  | 2016-05-11 11:20:20.260733000 | true |
++---+-------+-------------------------------+------+
+</code></pre>
+
+ <p class="p">
+ You might use the <code class="ph codeph">CHANGE</code> clause to rename a single column, or to treat an existing column as
+ a different type than before, such as to switch between treating a column as <code class="ph codeph">STRING</code> and
+ <code class="ph codeph">TIMESTAMP</code>, or between <code class="ph codeph">INT</code> and <code class="ph codeph">BIGINT</code>. You can only drop a
+ single column at a time; to drop multiple columns, issue multiple <code class="ph codeph">ALTER TABLE</code> statements, or
+ define the new set of columns with a single <code class="ph codeph">ALTER TABLE ... REPLACE COLUMNS</code> statement.
+ </p>
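+  <p class="p">
+    For example, the following sketch renames a column while keeping the same type.
+    (The table and column names here are hypothetical.)
+  </p>
+
+<pre class="pre codeblock"><code>-- The CHANGE clause requires both the new name and the type,
+-- even when only the name is being changed.
+alter table t2 change c1 id int;</code></pre>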
+
+ <p class="p">
+ The following examples show some safe operations to drop or change columns. Dropping the final column
+      in a table lets Impala ignore the data without causing any disruption to existing data files. Changing the type
+ of a column works if existing data values can be safely converted to the new type. The type conversion
+ rules depend on the file format of the underlying table. For example, in a text table, the same value
+ can be interpreted as a <code class="ph codeph">STRING</code> or a numeric value, while in a binary format such as
+ Parquet, the rules are stricter and type conversions only work between certain sizes of integers.
+ </p>
+
+<pre class="pre codeblock"><code>
+create table optional_columns (x int, y int, z int, a1 int, a2 int);
+insert into optional_columns values (1,2,3,0,0), (2,3,4,100,100);
+
+-- When the last column in the table is dropped, Impala ignores the
+-- values that are no longer needed. (Dropping A1 but leaving A2
+-- would cause problems, as we will see in a subsequent example.)
+alter table optional_columns drop column a2;
+alter table optional_columns drop column a1;
+
+select * from optional_columns;
++---+---+---+
+| x | y | z |
++---+---+---+
+| 1 | 2 | 3 |
+| 2 | 3 | 4 |
++---+---+---+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+create table int_to_string (s string, x int);
+insert into int_to_string values ('one', 1), ('two', 2);
+
+-- What was an INT column will now be interpreted as STRING.
+-- This technique works for text tables but not other file formats.
+-- The second X represents the new name of the column, which we keep the same.
+alter table int_to_string change x x string;
+
+-- Once the type is changed, we can insert non-integer values into the X column
+-- and treat that column as a string, for example by uppercasing or concatenating.
+insert into int_to_string values ('three', 'trois');
+select s, upper(x) from int_to_string;
++-------+----------+
+| s     | upper(x) |
++-------+----------+
+| one   | 1        |
+| two   | 2        |
+| three | TROIS    |
++-------+----------+
+</code></pre>
+
+ <p class="p">
+ Remember that Impala does not actually do any conversion for the underlying data files as a result of
+ <code class="ph codeph">ALTER TABLE</code> statements. If you use <code class="ph codeph">ALTER TABLE</code> to create a table
+ layout that does not agree with the contents of the underlying files, you must replace the files
+ yourself, such as using <code class="ph codeph">LOAD DATA</code> to load a new set of data files, or
+ <code class="ph codeph">INSERT OVERWRITE</code> to copy from another table and replace the original data.
+ </p>
+
+ <p class="p">
+ The following example shows what happens if you delete the middle column from a Parquet table containing three columns.
+ The underlying data files still contain three columns of data. Because the columns are interpreted based on their positions in
+ the data file instead of the specific column names, a <code class="ph codeph">SELECT *</code> query now reads the first and second
+ columns from the data file, potentially leading to unexpected results or conversion errors.
+ For this reason, if you expect to someday drop a column, declare it as the last column in the table, where its data
+ can be ignored by queries after the column is dropped. Or, re-run your ETL process and create new data files
+ if you drop or change the type of a column in a way that causes problems with existing data files.
+ </p>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce unexpected results.
+create table p1 (s1 string, s2 string, s3 string) stored as parquet;
+
+insert into p1 values ('one', 'un', 'uno'), ('two', 'deux', 'dos'),
+ ('three', 'trois', 'tres');
+select * from p1;
++-------+-------+------+
+| s1    | s2    | s3   |
++-------+-------+------+
+| one   | un    | uno  |
+| two   | deux  | dos  |
+| three | trois | tres |
++-------+-------+------+
+
+alter table p1 drop column s2;
+-- The S3 column contains unexpected results.
+-- Because S2 and S3 have compatible types, the query reads
+-- values from the dropped S2, because the existing data files
+-- still contain those values as the second column.
+select * from p1;
++-------+-------+
+| s1    | s3    |
++-------+-------+
+| one   | un    |
+| two   | deux  |
+| three | trois |
++-------+-------+
+</code></pre>
+
+<pre class="pre codeblock"><code>
+-- Parquet table showing how dropping a column can produce conversion errors.
+create table p2 (s1 string, x int, s3 string) stored as parquet;
+
+insert into p2 values ('one', 1, 'uno'), ('two', 2, 'dos'), ('three', 3, 'tres');
+select * from p2;
++-------+---+------+
+| s1    | x | s3   |
++-------+---+------+
+| one   | 1 | uno  |
+| two   | 2 | dos  |
+| three | 3 | tres |
++-------+---+------+
+
+alter table p2 drop column x;
+select * from p2;
+WARNINGS:
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+
+File '<var class="keyword varname">hdfs_filename</var>' has an incompatible Parquet schema for column 'add_columns.p2.s3'.
+Column type: STRING, Parquet schema:
+optional int32 x [i:1 d:1 r:0]
+</code></pre>
+
+ <p class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, if an Avro table is created without column definitions in the
+ <code class="ph codeph">CREATE TABLE</code> statement, and columns are later
+ added through <code class="ph codeph">ALTER TABLE</code>, the resulting
+ table is now queryable. Missing values from the newly added
+ columns now default to <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To change the file format that Impala expects data to be in, for a table or partition:</strong>
+ </p>
+
+ <p class="p">
+ Use an <code class="ph codeph">ALTER TABLE ... SET FILEFORMAT</code> clause. You can include an optional <code class="ph codeph">PARTITION
+ (<var class="keyword varname">col1</var>=<var class="keyword varname">val1</var>, <var class="keyword varname">col2</var>=<var class="keyword varname">val2</var>,
+    ...)</code> clause so that the file format is changed for a specific partition rather than the entire table.
+ </p>
+
+ <p class="p">
+ Because this operation only changes the table metadata, you must do any conversion of existing data using
+ regular Hadoop techniques outside of Impala. Any new data created by the Impala <code class="ph codeph">INSERT</code>
+ statement will be in the new format. You cannot specify the delimiter for Text files; the data files must be
+ comma-delimited.
+
+ </p>
+
+ <p class="p">
+ To set the file format for a single partition, include the <code class="ph codeph">PARTITION</code> clause. Specify all the
+ same partitioning columns for the table, with a constant value for each, to precisely identify the single
+ partition affected by the statement:
+ </p>
+
+<pre class="pre codeblock"><code>create table p1 (s string) partitioned by (month int, day int);
+-- Each ADD PARTITION clause creates a subdirectory in HDFS.
+alter table p1 add partition (month=1, day=1);
+alter table p1 add partition (month=1, day=2);
+alter table p1 add partition (month=2, day=1);
+alter table p1 add partition (month=2, day=2);
+-- Queries and INSERT statements will read and write files
+-- in this format for this specific partition.
+alter table p1 partition (month=2, day=2) set fileformat parquet;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">To add or drop partitions for a table</strong>, the table must already be partitioned (that is, created with a
+ <code class="ph codeph">PARTITIONED BY</code> clause). The partition is a physical directory in HDFS, with a name that
+ encodes a particular column value (the <strong class="ph b">partition key</strong>). The Impala <code class="ph codeph">INSERT</code> statement
+ already creates the partition if necessary, so the <code class="ph codeph">ALTER TABLE ... ADD PARTITION</code> is
+ primarily useful for importing data by moving or copying existing data files into the HDFS directory
+ corresponding to a partition. (You can use the <code class="ph codeph">LOAD DATA</code> statement to move files into the
+ partition directory, or <code class="ph codeph">ALTER TABLE ... PARTITION (...) SET LOCATION</code> to point a partition at
+      a directory that already contains data files.)
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">DROP PARTITION</code> clause is used to remove the HDFS directory and associated data files for
+ a particular set of partition key values; for example, if you always analyze the last 3 months worth of data,
+ at the beginning of each month you might drop the oldest partition that is no longer needed. Removing
+ partitions reduces the amount of metadata associated with the table and the complexity of calculating the
+ optimal query plan, which can simplify and speed up queries on partitioned tables, particularly join queries.
+ Here is an example showing the <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses.
+ </p>
+
+ <p class="p">
+ To avoid errors while adding or dropping partitions whose existence is not certain,
+ add the optional <code class="ph codeph">IF [NOT] EXISTS</code> clause between the <code class="ph codeph">ADD</code> or
+ <code class="ph codeph">DROP</code> keyword and the <code class="ph codeph">PARTITION</code> keyword. That is, the entire
+ clause becomes <code class="ph codeph">ADD IF NOT EXISTS PARTITION</code> or <code class="ph codeph">DROP IF EXISTS PARTITION</code>.
+ The following example shows how partitions can be created automatically through <code class="ph codeph">INSERT</code>
+ statements, or manually through <code class="ph codeph">ALTER TABLE</code> statements. The <code class="ph codeph">IF [NOT] EXISTS</code>
+ clauses let the <code class="ph codeph">ALTER TABLE</code> statements succeed even if a new requested partition already
+ exists, or a partition to be dropped does not exist.
+ </p>
+
+<p class="p">
+Inserting 2 year values creates 2 partitions:
+</p>
+
+<pre class="pre codeblock"><code>
+create table partition_t (s string) partitioned by (y int);
+insert into partition_t (s,y) values ('two thousand',2000), ('nineteen ninety',1990);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Without the <code class="ph codeph">IF NOT EXISTS</code> clause, an attempt to add a new partition might fail:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add partition (y=2000);
+ERROR: AnalysisException: Partition spec already exists: (y=2000).
+</code></pre>
+
+<p class="p">
+The <code class="ph codeph">IF NOT EXISTS</code> clause makes the statement succeed whether or not there was already a
+partition with the specified key value:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t add if not exists partition (y=2000);
+alter table partition_t add if not exists partition (y=2010);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2000  | -1    | 1      | 13B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 2      | 29B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+<p class="p">
+Likewise, the <code class="ph codeph">IF EXISTS</code> clause lets <code class="ph codeph">DROP PARTITION</code> succeed whether or not the partition is already
+in the table:
+</p>
+
+<pre class="pre codeblock"><code>
+alter table partition_t drop if exists partition (y=2000);
+alter table partition_t drop if exists partition (y=1950);
+show partitions partition_t;
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| y     | #Rows | #Files | Size | Bytes Cached | Cache Replication | Format | Incremental stats |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+| 1990  | -1    | 1      | 16B  | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| 2010  | -1    | 0      | 0B   | NOT CACHED   | NOT CACHED        | TEXT   | false             |
+| Total | -1    | 1      | 16B  | 0B           |                   |        |                   |
++-------+-------+--------+------+--------------+-------------------+--------+-------------------+
+</code></pre>
+
+ <p class="p"> The optional <code class="ph codeph">PURGE</code> keyword, available in
+ <span class="keyword">Impala 2.3</span> and higher, is used with the <code class="ph codeph">DROP
+ PARTITION</code> clause to remove associated HDFS data files
+      immediately rather than going through the HDFS trashcan mechanism. Use
+      this keyword when dropping a partition if it is crucial to remove the data
+      as quickly as possible to free up space, or if there is a problem with the
+      trashcan, such as the trashcan not being configured or being in a
+      different HDFS encryption zone than the data files. </p>
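+  <p class="p">
+    For example, the following sketch (using a hypothetical partitioned table)
+    removes a partition and its data files immediately, bypassing the trashcan:
+  </p>
+
+<pre class="pre codeblock"><code>-- Requires Impala 2.3 or higher. The data files are deleted outright
+-- rather than being moved to the HDFS trashcan.
+alter table logs drop if exists partition (year=2003, month=1) purge;</code></pre>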
+
+
+
+<pre class="pre codeblock"><code>-- Create an empty table and define the partitioning scheme.
+create table part_t (x int) partitioned by (month int);
+-- Create an empty partition into which you could copy data files from some other source.
+alter table part_t add partition (month=1);
+-- After changing the underlying data, issue a REFRESH statement to make the data visible in Impala.
+refresh part_t;
+-- Later, do the same for the next month.
+alter table part_t add partition (month=2);
+
+-- Now you no longer need the older data.
+alter table part_t drop partition (month=1);
+-- If the table was partitioned by month and year, you would issue a statement like:
+-- alter table part_t drop partition (year=2003,month=1);
+-- which would require 12 ALTER TABLE statements to remove a year's worth of data.
+
+-- If the data files for subsequent months were in a different file format,
+-- you could set a different file format for the new partition after adding it.
+alter table part_t add partition (month=3);
+alter table part_t partition (month=3) set fileformat parquet;
+</code></pre>
+
+ <p class="p">
+ The value specified for a partition key can be an arbitrary constant expression, without any references to
+ columns. For example:
+ </p>
+
+<pre class="pre codeblock"><code>alter table time_data add partition (month=concat('Decem','ber'));
+alter table sales_data add partition (zipcode = cast(9021 * 10 as string));</code></pre>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ An alternative way to reorganize a table and its associated data files is to use <code class="ph codeph">CREATE
+ TABLE</code> to create a variation of the original table, then use <code class="ph codeph">INSERT</code> to copy the
+ transformed or reordered data to the new table. The advantage of <code class="ph codeph">ALTER TABLE</code> is that it
+ avoids making a duplicate copy of the data files, allowing you to reorganize huge volumes of data in a
+ space-efficient way using familiar Hadoop techniques.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">To switch a table between internal and external:</strong>
+ </p>
+
+ <div class="p">
+ You can switch a table from internal to external, or from external to internal, by using the <code class="ph codeph">ALTER
+ TABLE</code> statement:
+<pre class="pre codeblock"><code>
+-- Switch a table from internal to external.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='TRUE');
+
+-- Switch a table from external to internal.
+ALTER TABLE <var class="keyword varname">table_name</var> SET TBLPROPERTIES('EXTERNAL'='FALSE');
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong>
+ </p>
+ <p class="p">
+ Most <code class="ph codeph">ALTER TABLE</code> clauses do not actually
+ read or write any HDFS files, and so do not depend on
+ specific HDFS permissions. For example, the <code class="ph codeph">SET FILEFORMAT</code>
+      clause does not actually check the file format of existing data files or
+ convert them to the new format, and the <code class="ph codeph">SET LOCATION</code> clause
+ does not require any special permissions on the new location.
+ (Any permission-related failures would come later, when you
+ actually query or insert into the table.)
+ </p>
+
+
+ <p class="p">
+ In general, <code class="ph codeph">ALTER TABLE</code> clauses that do touch
+ HDFS files and directories require the same HDFS permissions
+ as corresponding <code class="ph codeph">CREATE</code>, <code class="ph codeph">INSERT</code>,
+ or <code class="ph codeph">SELECT</code> statements.
+ The permissions allow
+ the user ID that the <span class="keyword cmdname">impalad</span> daemon runs under,
+ typically the <code class="ph codeph">impala</code> user, to read or write
+ files or directories, or (in the case of the execute bit) descend into a directory.
+ The <code class="ph codeph">RENAME TO</code> clause requires read, write, and execute permission in the
+ source and destination database directories and in the table data directory,
+ and read and write permission for the data files within the table.
+ The <code class="ph codeph">ADD PARTITION</code> and <code class="ph codeph">DROP PARTITION</code> clauses
+ require write and execute permissions for the associated partition directory.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Kudu considerations:</strong>
+ </p>
+
+ <div class="p">
+ Because of the extra constraints and features of Kudu tables, such as the <code class="ph codeph">NOT NULL</code>
+ and <code class="ph codeph">DEFAULT</code> attributes for columns, <code class="ph codeph">ALTER TABLE</code> has specific
+ requirements related to Kudu tables:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ In an <code class="ph codeph">ADD COLUMNS</code> operation, you can specify the <code class="ph codeph">NULL</code>,
+ <code class="ph codeph">NOT NULL</code>, and <code class="ph codeph">DEFAULT <var class="keyword varname">default_value</var></code>
+ column attributes.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.9</span> and higher, you can also specify the <code class="ph codeph">ENCODING</code>,
+ <code class="ph codeph">COMPRESSION</code>, and <code class="ph codeph">BLOCK_SIZE</code> attributes when adding a column.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ If you add a column with a <code class="ph codeph">NOT NULL</code> attribute, it must also have a
+ <code class="ph codeph">DEFAULT</code> attribute, so the default value can be assigned to that
+ column for all existing rows.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">DROP COLUMN</code> clause works the same for a Kudu table as for other
+ kinds of tables.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ Although you can change the name of a column with the <code class="ph codeph">CHANGE</code> clause,
+ you cannot change the type of a column in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ You cannot change the nullability of existing columns in a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ In <span class="keyword">Impala 2.10</span>, you can change the default value, encoding,
+ compression, or block size of existing columns in a Kudu table by using the
+ <code class="ph codeph">SET</code> clause.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ You cannot use the <code class="ph codeph">REPLACE COLUMNS</code> clause with a Kudu table.
+ </p>
+ </li>
+ <li class="li">
+ <p class="p">
+ The <code class="ph codeph">RENAME TO</code> clause for a Kudu table only affects the name stored in the
+ metastore database that Impala uses to refer to the table. To change which underlying Kudu
+ table is associated with an Impala table name, you must change the <code class="ph codeph">TBLPROPERTIES</code>
+          property of the table: <code class="ph codeph">SET TBLPROPERTIES('kudu.table_name'='<var class="keyword varname">kudu_tbl_name</var>')</code>.
+ Doing so causes Kudu to change the name of the underlying Kudu table.
+ </p>
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ The following are some examples of using the <code class="ph codeph">ADD COLUMNS</code> clause for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+CREATE TABLE t1 ( x INT, PRIMARY KEY (x) )
+ PARTITION BY HASH (x) PARTITIONS 16
+  STORED AS KUDU;
+
+ALTER TABLE t1 ADD COLUMNS (y STRING ENCODING prefix_encoding);
+ALTER TABLE t1 ADD COLUMNS (z INT DEFAULT 10);
+ALTER TABLE t1 ADD COLUMNS (a STRING NOT NULL DEFAULT '', t TIMESTAMP COMPRESSION default_compression);
+</code></pre>
+
+ <p class="p">
+ The following are some examples of modifying column defaults and storage attributes for a Kudu table:
+ </p>
+
+<pre class="pre codeblock"><code>
+create table kt (x bigint primary key, s string default 'yes', t timestamp)
+ stored as kudu;
+
+-- You can change the default value for a column, which affects any rows
+-- inserted after this change is made.
+alter table kt alter column s set default 'no';
+
+-- You can remove the default value for a column, which affects any rows
+-- inserted after this change is made. If the column is nullable, any
+-- future inserts default to NULL for this column. If the column is marked
+-- NOT NULL, any future inserts must specify a value for the column.
+alter table kt alter column s drop default;
+
+insert into kt values (1, 'foo', now());
+-- Because of the DROP DEFAULT above, omitting S from the insert
+-- gives it a value of NULL.
+insert into kt (x, t) values (2, now());
+
+select * from kt;
++---+------+-------------------------------+
+| x | s    | t                             |
++---+------+-------------------------------+
+| 2 | NULL | 2017-10-02 00:03:40.652156000 |
+| 1 | foo  | 2017-10-02 00:03:04.346185000 |
++---+------+-------------------------------+
+
+-- Other storage-related attributes can also be changed for columns.
+-- These changes take effect for any newly inserted rows, or rows
+-- rearranged due to compaction after deletes or updates.
+alter table kt alter column s set encoding prefix_encoding;
+-- The COLUMN keyword is optional in the syntax.
+alter table kt alter x set block_size 2048;
+alter table kt alter column t set compression zlib;
+
+desc kt;
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+| name | type      | comment | primary_key | nullable | default_value | encoding        | compression         | block_size |
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+| x    | bigint    |         | true        | false    |               | AUTO_ENCODING   | DEFAULT_COMPRESSION | 2048       |
+| s    | string    |         | false       | true     |               | PREFIX_ENCODING | DEFAULT_COMPRESSION | 0          |
+| t    | timestamp |         | false       | true     |               | AUTO_ENCODING   | ZLIB                | 0          |
++------+-----------+---------+-------------+----------+---------------+-----------------+---------------------+------------+
+</code></pre>
+
+ <p class="p">
+ Kudu tables all use an underlying partitioning mechanism. The partition syntax is different than for non-Kudu
+ tables. You can use the <code class="ph codeph">ALTER TABLE</code> statement to add and drop <dfn class="term">range partitions</dfn>
+ from a Kudu table. Any new range must not overlap with any existing ranges. Dropping a range removes all the associated
+ rows from the table. See <a class="xref" href="impala_kudu.html#kudu_partitioning">Partitioning for Kudu Tables</a> for details.
+ </p>
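+
+    <p class="p">
+      For example, this sketch (assuming a Kudu table <code class="ph codeph">kt</code> that is range-partitioned
+      on its <code class="ph codeph">BIGINT</code> primary key column <code class="ph codeph">x</code>) adds a range covering
+      the key values 100-199, then drops that range again, discarding any rows within it:
+    </p>
+
+<pre class="pre codeblock"><code>-- Add a new range; it must not overlap any existing range.
+alter table kt add range partition 100 &lt;= values &lt; 200;
+-- Dropping the range also removes all rows whose keys fall within it.
+alter table kt drop range partition 100 &lt;= values &lt; 200;</code></pre>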
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a>,
+ <a class="xref" href="impala_create_table.html#create_table">CREATE TABLE Statement</a>, <a class="xref" href="impala_drop_table.html#drop_table">DROP TABLE Statement</a>,
+ <a class="xref" href="impala_partitioning.html#partitioning">Partitioning for Impala Tables</a>, <a class="xref" href="impala_tables.html#internal_tables">Internal Tables</a>,
+ <a class="xref" href="impala_tables.html#external_tables">External Tables</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_alter_view.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_alter_view.html b/docs/build3x/html/topics/impala_alter_view.html
new file mode 100644
index 0000000..2d96fa1
--- /dev/null
+++ b/docs/build3x/html/topics/impala_alter_view.html
@@ -0,0 +1,139 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_langref_sql.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="alter_view"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>ALTER VIEW Statement</title></head><body id="alter_view"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">ALTER VIEW Statement</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Changes the characteristics of a view. The syntax has two forms:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ The <code class="ph codeph">AS</code> clause associates the view with a different query.
+ </li>
+ <li class="li">
+ The <code class="ph codeph">RENAME TO</code> clause changes the name of the view, moves the view to
+ a different database, or both.
+ </li>
+ </ul>
+
+ <p class="p">
+ Because a view is purely a logical construct (an alias for a query) with no physical data behind it,
+ <code class="ph codeph">ALTER VIEW</code> only involves changes to metadata in the metastore database, not any data files
+ in HDFS.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> AS <var class="keyword varname">select_statement</var>
+ALTER VIEW [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var> RENAME TO [<var class="keyword varname">database_name</var>.]<var class="keyword varname">view_name</var></code></pre>
+
+ <p class="p">
+ <strong class="ph b">Statement type:</strong> DDL
+ </p>
+
+ <p class="p">
+ If you connect to different Impala nodes within an <span class="keyword cmdname">impala-shell</span> session for
+ load-balancing purposes, you can enable the <code class="ph codeph">SYNC_DDL</code> query option to make each DDL
+ statement wait before returning, until the new or changed metadata has been received by all the Impala
+ nodes. See <a class="xref" href="../shared/../topics/impala_sync_ddl.html#sync_ddl">SYNC_DDL Query Option</a> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Security considerations:</strong>
+ </p>
+ <p class="p">
+ If these statements in your environment contain sensitive literal values such as credit card numbers or tax
+ identifiers, Impala can redact this sensitive information when displaying the statements in log files and
+ other administrative contexts. See <span class="xref">the documentation for your Apache Hadoop distribution</span> for details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Cancellation:</strong> Cannot be cancelled.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HDFS permissions:</strong> This statement does not touch any HDFS files or directories,
+ therefore no HDFS permissions are required.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>create table t1 (x int, y int, s string);
+create table t2 like t1;
+create view v1 as select * from t1;
+alter view v1 as select * from t2;
+alter view v1 as select x, upper(s) s from t2;</code></pre>
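+
+  <p class="p">
+    The <code class="ph codeph">RENAME TO</code> clause can rename a view, move it to another database, or both,
+    as in this sketch (the database name <code class="ph codeph">reporting_db</code> is hypothetical):
+  </p>
+
+<pre class="pre codeblock"><code>-- Rename the view within the current database.
+alter view v1 rename to v2;
+-- Move the view to another database, renaming it at the same time.
+alter view v2 rename to reporting_db.v3;</code></pre>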
+
+
+
+ <div class="p">
+ To see the definition of a view, issue a <code class="ph codeph">DESCRIBE FORMATTED</code> statement, which shows the
+ query from the original <code class="ph codeph">CREATE VIEW</code> statement:
+<pre class="pre codeblock"><code>[localhost:21000] > create view v1 as select * from t1;
+[localhost:21000] > describe formatted v1;
+Query finished, fetching results ...
++------------------------------+------------------------------+------------+
+| name | type | comment |
++------------------------------+------------------------------+------------+
+| # col_name | data_type | comment |
+| | NULL | NULL |
+| x | int | None |
+| y | int | None |
+| s | string | None |
+| | NULL | NULL |
+| # Detailed Table Information | NULL | NULL |
+| Database: | views | NULL |
+| Owner: | doc_demo | NULL |
+| CreateTime: | Mon Jul 08 15:56:27 EDT 2013 | NULL |
+| LastAccessTime: | UNKNOWN | NULL |
+| Protect Mode: | None | NULL |
+| Retention: | 0 | NULL |
+<strong class="ph b">| Table Type: | VIRTUAL_VIEW | NULL |</strong>
+| Table Parameters: | NULL | NULL |
+| | transient_lastDdlTime | 1373313387 |
+| | NULL | NULL |
+| # Storage Information | NULL | NULL |
+| SerDe Library: | null | NULL |
+| InputFormat: | null | NULL |
+| OutputFormat: | null | NULL |
+| Compressed: | No | NULL |
+| Num Buckets: | 0 | NULL |
+| Bucket Columns: | [] | NULL |
+| Sort Columns: | [] | NULL |
+| | NULL | NULL |
+| # View Information | NULL | NULL |
+<strong class="ph b">| View Original Text: | SELECT * FROM t1 | NULL |
+| View Expanded Text: | SELECT * FROM t1 | NULL |</strong>
++------------------------------+------------------------------+------------+
+</code></pre>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_views.html#views">Overview of Impala Views</a>, <a class="xref" href="impala_create_view.html#create_view">CREATE VIEW Statement</a>,
+ <a class="xref" href="impala_drop_view.html#drop_view">DROP VIEW Statement</a>
+ </p>
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_langref_sql.html">Impala SQL Statements</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_datetime_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_datetime_functions.html b/docs/build3x/html/topics/impala_datetime_functions.html
new file mode 100644
index 0000000..61ae72a
--- /dev/null
+++ b/docs/build3x/html/topics/impala_datetime_functions.html
@@ -0,0 +1,3105 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="datetime_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Date and Time Functions</title></head><body id="datetime_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Date and Time Functions</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The underlying Impala data type for date and time data is
+ <code class="ph codeph"><a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP</a></code>, which has both a date and a
+ time portion. Functions that extract a single field, such as <code class="ph codeph">hour()</code> or
+ <code class="ph codeph">minute()</code>, typically return an integer value. Functions that format the date portion, such as
+ <code class="ph codeph">date_add()</code> or <code class="ph codeph">to_date()</code>, typically return a string value.
+ </p>
+
+ <p class="p">
+ You can also adjust a <code class="ph codeph">TIMESTAMP</code> value by adding or subtracting an <code class="ph codeph">INTERVAL</code>
+ expression. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details. <code class="ph codeph">INTERVAL</code>
+ expressions are also allowed as the second argument for the <code class="ph codeph">date_add()</code> and
+ <code class="ph codeph">date_sub()</code> functions, rather than integers.
+ </p>
+
+ <p class="p">
+ Some of these functions are affected by the setting of the
+ <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. This setting is off by default, meaning that
+ functions such as <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code>
+ consider the input values to always represent the UTC time zone.
+ This setting also applies when you <code class="ph codeph">CAST()</code> a <code class="ph codeph">BIGINT</code>
+ value to <code class="ph codeph">TIMESTAMP</code>, or a <code class="ph codeph">TIMESTAMP</code>
+ value to <code class="ph codeph">BIGINT</code>.
+ When this setting is enabled, these functions and operations convert to and from
+ values representing the local time zone.
+ See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about how
+ Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
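+
+    <p class="p">
+      For example, with the flag left at its default (off), <code class="ph codeph">from_unixtime()</code>
+      interprets its argument as seconds past the Unix epoch in UTC, and
+      <code class="ph codeph">unix_timestamp()</code> performs the reverse conversion:
+    </p>
+
+<pre class="pre codeblock"><code>select from_unixtime(0) as epoch_start,
+  unix_timestamp('1970-01-01 00:00:00') as round_trip;
++---------------------+------------+
+| epoch_start         | round_trip |
++---------------------+------------+
+| 1970-01-01 00:00:00 | 0          |
++---------------------+------------+</code></pre>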
+
+ <p class="p">
+ <strong class="ph b">Function reference:</strong>
+ </p>
+
+ <p class="p">
+      Impala supports the following date and time functions:
+ </p>
+
+
+
+ <dl class="dl">
+
+
+ <dt class="dt dlterm" id="datetime_functions__add_months">
+ <code class="ph codeph">add_months(timestamp date, int months)</code>, <code class="ph codeph">add_months(timestamp date, bigint
+ months)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Same as <code class="ph codeph"><a class="xref" href="#datetime_functions__months_add">months_add()</a></code>.
+ Available in Impala 1.4 and higher. For
+ compatibility when porting code with vendor extensions.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples demonstrate adding months to construct the same
+ day of the month in a different month; how if the current day of the month
+ does not exist in the target month, the last day of that month is substituted;
+ and how a negative argument produces a return value from a previous month.
+ </p>
+<pre class="pre codeblock"><code>
+select now(), add_months(now(), 2);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), 2) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:00.429109000 | 2016-07-31 10:47:00.429109000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), 1);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), 1) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:14.540226000 | 2016-06-30 10:47:14.540226000 |
++-------------------------------+-------------------------------+
+
+select now(), add_months(now(), -1);
++-------------------------------+-------------------------------+
+| now() | add_months(now(), -1) |
++-------------------------------+-------------------------------+
+| 2016-05-31 10:47:31.732298000 | 2016-04-30 10:47:31.732298000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__adddate">
+ <code class="ph codeph">adddate(timestamp startdate, int days)</code>, <code class="ph codeph">adddate(timestamp startdate, bigint
+      days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_add()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to add a number of days to a <code class="ph codeph">TIMESTAMP</code>.
+ The number of days can also be negative, which gives the same effect as the <code class="ph codeph">subdate()</code> function.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, adddate(now(), 30) as now_plus_30;
++-------------------------------+-------------------------------+
+| right_now | now_plus_30 |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:08.640111000 | 2016-06-19 10:23:08.640111000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, adddate(now(), -15) as now_minus_15;
++-------------------------------+-------------------------------+
+| right_now | now_minus_15 |
++-------------------------------+-------------------------------+
+| 2016-05-20 10:23:38.214064000 | 2016-05-05 10:23:38.214064000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__current_timestamp">
+ <code class="ph codeph">current_timestamp()</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Alias for the <code class="ph codeph">now()</code> function.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now(), current_timestamp();
++-------------------------------+-------------------------------+
+| now() | current_timestamp() |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:10:14.237849000 | 2016-05-19 16:10:14.237849000 |
++-------------------------------+-------------------------------+
+
+select current_timestamp() as right_now,
+ current_timestamp() + interval 3 hours as in_three_hours;
++-------------------------------+-------------------------------+
+| right_now | in_three_hours |
++-------------------------------+-------------------------------+
+| 2016-05-19 16:13:20.017117000 | 2016-05-19 19:13:20.017117000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_add">
+ <code class="ph codeph">date_add(timestamp startdate, int days)</code>, <code class="ph codeph">date_add(timestamp startdate,
+ <var class="keyword varname">interval_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value.
+
+ With an <code class="ph codeph">INTERVAL</code>
+ expression as the second argument, you can calculate a delta value using other units such as weeks,
+ years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+        The following example shows the simplest usage of adding a specified number of days
+ to a <code class="ph codeph">TIMESTAMP</code> value:
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), 7) as next_week;
++-------------------------------+-------------------------------+
+| right_now | next_week |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:03:48.687055000 | 2016-05-27 11:03:48.687055000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+ The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+ expression, instead of specifying the precise number of days.
+ The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+ a single day.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_add(now(), interval 3 weeks) as in_3_weeks;
++-------------------------------+-------------------------------+
+| right_now | in_3_weeks |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:05:39.173331000 | 2016-06-10 11:05:39.173331000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_add(now(), interval 6 hours) as in_6_hours;
++-------------------------------+-------------------------------+
+| right_now | in_6_hours |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:13:51.492536000 | 2016-05-20 17:13:51.492536000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+ Like all date/time functions that deal with months, <code class="ph codeph">date_add()</code>
+ handles nonexistent dates past the end of a month by setting the date to the
+ last day of the month. The following example shows how the nonexistent date
+ April 31st is normalized to April 30th:
+ </p>
+<pre class="pre codeblock"><code>
+select date_add(cast('2016-01-31' as timestamp), interval 3 months) as 'april_31st';
++---------------------+
+| april_31st |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_part">
+ <code class="ph codeph">date_part(string, timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Similar to
+ <a class="xref" href="impala_datetime_functions.html#datetime_functions__extract"><code class="ph codeph">EXTRACT()</code></a>,
+ with the argument order reversed. Supports the same date and time units as <code class="ph codeph">EXTRACT()</code>.
+ For compatibility with SQL code containing vendor extensions.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select date_part('year',now()) as current_year;
++--------------+
+| current_year |
++--------------+
+| 2016 |
++--------------+
+
+select date_part('hour',now()) as hour_of_day;
++-------------+
+| hour_of_day |
++-------------+
+| 11 |
++-------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_sub">
+ <code class="ph codeph">date_sub(timestamp startdate, int days)</code>, <code class="ph codeph">date_sub(timestamp startdate,
+ <var class="keyword varname">interval_expression</var>)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value.
+
+ With an
+ <code class="ph codeph">INTERVAL</code> expression as the second argument, you can calculate a delta value using other
+ units such as weeks, years, hours, seconds, and so on; see <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+ for details.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+        The following example shows the simplest usage of subtracting a specified number of days
+ from a <code class="ph codeph">TIMESTAMP</code> value:
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), 7) as last_week;
++-------------------------------+-------------------------------+
+| right_now | last_week |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:21:30.491011000 | 2016-05-13 11:21:30.491011000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show the shorthand notation of an <code class="ph codeph">INTERVAL</code>
+ expression, instead of specifying the precise number of days.
+ The <code class="ph codeph">INTERVAL</code> notation also lets you work with units smaller than
+ a single day.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, date_sub(now(), interval 3 weeks) as 3_weeks_ago;
++-------------------------------+-------------------------------+
+| right_now | 3_weeks_ago |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:05.176953000 | 2016-04-29 11:23:05.176953000 |
++-------------------------------+-------------------------------+
+
+select now() as right_now, date_sub(now(), interval 6 hours) as 6_hours_ago;
++-------------------------------+-------------------------------+
+| right_now | 6_hours_ago |
++-------------------------------+-------------------------------+
+| 2016-05-20 11:23:35.439631000 | 2016-05-20 05:23:35.439631000 |
++-------------------------------+-------------------------------+
+</code></pre>
+
+ <p class="p">
+        Like all date/time functions that deal with months, <code class="ph codeph">date_sub()</code>
+ handles nonexistent dates past the end of a month by setting the date to the
+ last day of the month. The following example shows how the nonexistent date
+ April 31st is normalized to April 30th:
+ </p>
+<pre class="pre codeblock"><code>
+select date_sub(cast('2016-05-31' as timestamp), interval 1 months) as 'april_31st';
++---------------------+
+| april_31st |
++---------------------+
+| 2016-04-30 00:00:00 |
++---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__date_trunc">
+ <code class="ph codeph">date_trunc(string unit, timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Truncates a <code class="ph codeph">TIMESTAMP</code> value to the specified precision.
+ <p class="p">
+ <strong class="ph b">Unit argument:</strong> The <code class="ph codeph">unit</code> argument value for truncating
+ <code class="ph codeph">TIMESTAMP</code> values is not case-sensitive. This argument string
+ can be one of:
+ </p>
+ <ul class="ul">
+ <li class="li">microseconds</li>
+ <li class="li">milliseconds</li>
+ <li class="li">second</li>
+ <li class="li">minute</li>
+ <li class="li">hour</li>
+ <li class="li">day</li>
+ <li class="li">week</li>
+ <li class="li">month</li>
+ <li class="li">year</li>
+ <li class="li">decade</li>
+ <li class="li">century</li>
+ <li class="li">millennium</li>
+ </ul>
+ <p class="p">
+ For example, calling <code class="ph codeph">date_trunc('hour',ts)</code> truncates
+ <code class="ph codeph">ts</code> to the beginning of the corresponding hour, with
+ all minutes, seconds, milliseconds, and so on set to zero. Calling
+ <code class="ph codeph">date_trunc('milliseconds',ts)</code> truncates
+ <code class="ph codeph">ts</code> to the beginning of the corresponding millisecond,
+ with all microseconds and nanoseconds set to zero.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The sub-second units are specified in plural form. All units representing
+ one second or more are specified in singular form.
+ </div>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.11.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Although this function is similar to calling <code class="ph codeph">TRUNC()</code>
+ with a <code class="ph codeph">TIMESTAMP</code> argument, the order of arguments
+ and the recognized units are different between <code class="ph codeph">TRUNC()</code>
+ and <code class="ph codeph">DATE_TRUNC()</code>. Therefore, these functions are not
+ interchangeable.
+ </p>
+ <p class="p">
+ This function is typically used in <code class="ph codeph">GROUP BY</code>
+ queries to aggregate results from the same hour, day, week, month, quarter, and so on.
+ You can also use this function in an <code class="ph codeph">INSERT ... SELECT</code> into a
+ partitioned table to divide <code class="ph codeph">TIMESTAMP</code> values into the correct partition.
+ </p>
+ <p class="p">
+ Because the return value is a <code class="ph codeph">TIMESTAMP</code>, if you cast the result of
+ <code class="ph codeph">DATE_TRUNC()</code> to <code class="ph codeph">STRING</code>, you will often see zeroed-out portions such as
+ <code class="ph codeph">00:00:00</code> in the time field. If you only need the individual units such as hour, day,
+ month, or year, use the <code class="ph codeph">EXTRACT()</code> function instead. If you need the individual units
+        from a truncated <code class="ph codeph">TIMESTAMP</code> value, run the <code class="ph codeph">TRUNC()</code> function on the
+ original value, then run <code class="ph codeph">EXTRACT()</code> on the result.
+ </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how to call <code class="ph codeph">DATE_TRUNC()</code> with different unit values:
+ </p>
+<pre class="pre codeblock"><code>
+select now(), date_trunc('second', now());
++-------------------------------+-----------------------------------+
+| now() | date_trunc('second', now()) |
++-------------------------------+-----------------------------------+
+| 2017-12-05 13:58:04.565403000 | 2017-12-05 13:58:04 |
++-------------------------------+-----------------------------------+
+
+select now(), date_trunc('hour', now());
++-------------------------------+---------------------------+
+| now() | date_trunc('hour', now()) |
++-------------------------------+---------------------------+
+| 2017-12-05 13:59:01.884459000 | 2017-12-05 13:00:00 |
++-------------------------------+---------------------------+
+
+select now(), date_trunc('millennium', now());
++-------------------------------+---------------------------------+
+| now() | date_trunc('millennium', now()) |
++-------------------------------+---------------------------------+
+| 2017-12-05 14:00:30.296812000 | 2000-01-01 00:00:00 |
++-------------------------------+---------------------------------+
+</code></pre>
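+
+      <p class="p">
+        As a sketch of the <code class="ph codeph">GROUP BY</code> usage described above (the table
+        <code class="ph codeph">events</code> and its <code class="ph codeph">event_ts</code> column are hypothetical),
+        the following query counts rows per calendar day, ignoring the time-of-day portion:
+      </p>
+
+<pre class="pre codeblock"><code>select date_trunc('day', event_ts) as event_day,
+  count(*) as events_per_day
+from events
+group by date_trunc('day', event_ts)
+order by event_day;</code></pre>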
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__datediff">
+ <code class="ph codeph">datediff(timestamp enddate, timestamp startdate)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of days between two <code class="ph codeph">TIMESTAMP</code> values.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If the first argument represents a later date than the second argument,
+ the return value is positive. If both arguments represent the same date,
+ the return value is zero. The time portions of the <code class="ph codeph">TIMESTAMP</code>
+        values are irrelevant. For example, 11:59 PM on one day and 12:01 AM on the next
+ day represent a <code class="ph codeph">datediff()</code> of -1 because the date/time values
+ represent different days, even though the <code class="ph codeph">TIMESTAMP</code> values differ by only 2 minutes.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how comparing a <span class="q">"late"</span> value with
+ an <span class="q">"earlier"</span> value produces a positive number. In this case,
+ the result is (365 * 5) + 1, because one of the intervening years is
+ a leap year.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now() + interval 5 years, now()) as in_5_years;
++-------------------------------+------------+
+| right_now | in_5_years |
++-------------------------------+------------+
+| 2016-05-20 13:43:55.873826000 | 1826 |
++-------------------------------+------------+
+</code></pre>
+ <p class="p">
+        The following examples show how the return value represents the number of days
+        between the associated dates, regardless of the time portion of each <code class="ph codeph">TIMESTAMP</code>.
+        For example, different times on the same day produce a <code class="ph codeph">datediff()</code> of 0,
+        regardless of which one is earlier or later. But if the arguments represent different dates,
+        <code class="ph codeph">datediff()</code> returns a non-zero integer value, regardless of the time portions
+        of the dates.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, datediff(now(), now() + interval 4 hours) as in_4_hours;
++-------------------------------+------------+
+| right_now | in_4_hours |
++-------------------------------+------------+
+| 2016-05-20 13:42:05.302747000 | 0 |
++-------------------------------+------------+
+
+select now() as right_now, datediff(now(), now() - interval 4 hours) as 4_hours_ago;
++-------------------------------+-------------+
+| right_now | 4_hours_ago |
++-------------------------------+-------------+
+| 2016-05-20 13:42:21.134958000 | 0 |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() + interval 12 hours) as in_12_hours;
++-------------------------------+-------------+
+| right_now | in_12_hours |
++-------------------------------+-------------+
+| 2016-05-20 13:42:44.765873000 | -1 |
++-------------------------------+-------------+
+
+select now() as right_now, datediff(now(), now() - interval 18 hours) as 18_hours_ago;
++-------------------------------+--------------+
+| right_now | 18_hours_ago |
++-------------------------------+--------------+
+| 2016-05-20 13:54:38.829827000 | 1 |
++-------------------------------+--------------+
+</code></pre>
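+ <p class="p">
+ The day-boundary behavior above can also be sketched outside Impala. The Python
+ <code class="ph codeph">datediff</code> helper below is an illustrative stand-in for the semantics
+ described here, not Impala's implementation:
+ </p>

```python
from datetime import datetime

def datediff(enddate: datetime, startdate: datetime) -> int:
    # Compare only the date portions, ignoring the time of day,
    # mirroring the day-boundary semantics described above.
    return (enddate.date() - startdate.date()).days

# Different times on the same day: 0.
print(datediff(datetime(2016, 5, 20, 13, 0), datetime(2016, 5, 20, 1, 0)))   # 0
# Different dates, even if only minutes apart: non-zero.
print(datediff(datetime(2016, 5, 20, 23, 59), datetime(2016, 5, 21, 0, 1)))  # -1
```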
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__day">
+ <code class="ph codeph">day(timestamp date), <span class="ph" id="datetime_functions__dayofmonth">dayofmonth(timestamp date)</span></code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+ The value represents the day of the month, so it is in the range 1-31, or lower for
+ months with fewer than 31 days.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how the day value corresponds to the day
+ of the month, resetting back to 1 at the start of each month.
+ </p>
+<pre class="pre codeblock"><code>
+select now(), day(now());
++-------------------------------+------------+
+| now() | day(now()) |
++-------------------------------+------------+
+| 2016-05-20 15:01:51.042185000 | 20 |
++-------------------------------+------------+
+
+select now() + interval 11 days, day(now() + interval 11 days);
++-------------------------------+-------------------------------+
+| now() + interval 11 days | day(now() + interval 11 days) |
++-------------------------------+-------------------------------+
+| 2016-05-31 15:05:56.843139000 | 31 |
++-------------------------------+-------------------------------+
+
+select now() + interval 12 days, day(now() + interval 12 days);
++-------------------------------+-------------------------------+
+| now() + interval 12 days | day(now() + interval 12 days) |
++-------------------------------+-------------------------------+
+| 2016-06-01 15:06:05.074236000 | 1 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how the day value is <code class="ph codeph">NULL</code>
+ for nonexistent dates or incorrectly formatted date strings.
+ </p>
+<pre class="pre codeblock"><code>
+-- 2016 is a leap year, so it has a Feb. 29.
+select day('2016-02-29');
++-------------------+
+| day('2016-02-29') |
++-------------------+
+| 29 |
++-------------------+
+
+-- 2015 is not a leap year, so Feb. 29 is nonexistent.
+select day('2015-02-29');
++-------------------+
+| day('2015-02-29') |
++-------------------+
+| NULL |
++-------------------+
+
+-- A string that does not match the expected YYYY-MM-DD format
+-- produces an invalid TIMESTAMP, causing day() to return NULL.
+select day('2016-02-028');
++--------------------+
+| day('2016-02-028') |
++--------------------+
+| NULL |
++--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayname">
+ <code class="ph codeph">dayname(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, converted to the string
+ corresponding to that day name. The range of return values is <code class="ph codeph">'Sunday'</code> to
+ <code class="ph codeph">'Saturday'</code>. Used in report-generating queries, as an alternative to calling
+ <code class="ph codeph">dayofweek()</code> and turning that numeric return value into a string using a
+ <code class="ph codeph">CASE</code> expression.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show the day name associated with
+ <code class="ph codeph">TIMESTAMP</code> values representing different days.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofweek(now()) as todays_day_of_week,
+ dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3 | Tuesday |
++-------------------------------+--------------------+-----------------+
+
+select now() + interval 1 day as tomorrow,
+ dayname(now() + interval 1 day) as tomorrows_day_name;
++-------------------------------+--------------------+
+| tomorrow | tomorrows_day_name |
++-------------------------------+--------------------+
+| 2016-06-01 10:58:53.945761000 | Wednesday |
++-------------------------------+--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayofweek">
+ <code class="ph codeph">dayofweek(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from the date portion of a <code class="ph codeph">TIMESTAMP</code>, corresponding to the day of
+ the week. The range of return values is 1 (Sunday) to 7 (Saturday).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofweek(now()) as todays_day_of_week,
+ dayname(now()) as todays_day_name;
++-------------------------------+--------------------+-----------------+
+| right_now | todays_day_of_week | todays_day_name |
++-------------------------------+--------------------+-----------------+
+| 2016-05-31 10:57:03.953670000 | 3 | Tuesday |
++-------------------------------+--------------------+-----------------+
+</code></pre>
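+ <p class="p">
+ The 1 (Sunday) through 7 (Saturday) numbering differs from some other systems.
+ As a cross-check, the hypothetical Python helper below (not part of Impala)
+ reproduces it from the standard library's Monday-based numbering:
+ </p>

```python
from datetime import date

def impala_dayofweek(d: date) -> int:
    # Impala numbers days 1 (Sunday) through 7 (Saturday), while
    # Python's isoweekday() numbers them 1 (Monday) through 7 (Sunday).
    return d.isoweekday() % 7 + 1

print(impala_dayofweek(date(2016, 5, 31)))  # 3, a Tuesday, matching the query above
```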
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__dayofyear">
+ <code class="ph codeph">dayofyear(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the day field from a <code class="ph codeph">TIMESTAMP</code> value, corresponding to the day
+ of the year. The range of return values is 1 (January 1) to 366 (December 31 of a leap year).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show return values from the
+ <code class="ph codeph">dayofyear()</code> function. The same date
+ in different years returns a different day number
+ for all dates after February 28,
+ because 2016 is a leap year while 2015 is not.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ dayofyear(now()) as today_day_of_year;
++-------------------------------+-------------------+
+| right_now | today_day_of_year |
++-------------------------------+-------------------+
+| 2016-05-31 11:05:48.314932000 | 152 |
++-------------------------------+-------------------+
+
+select now() - interval 1 year as last_year,
+ dayofyear(now() - interval 1 year) as year_ago_day_of_year;
++-------------------------------+----------------------+
+| last_year | year_ago_day_of_year |
++-------------------------------+----------------------+
+| 2015-05-31 11:07:03.733689000 | 151 |
++-------------------------------+----------------------+
+</code></pre>
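+ <p class="p">
+ The leap-year offset shown above can be double-checked outside Impala;
+ this Python sketch uses the standard library's day-of-year field:
+ </p>

```python
from datetime import date

# Day-of-year for the same calendar date differs across a leap-year
# boundary once the date is past February 28.
print(date(2016, 5, 31).timetuple().tm_yday)  # 152 (2016 is a leap year)
print(date(2015, 5, 31).timetuple().tm_yday)  # 151
```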
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__days_add">
+ <code class="ph codeph">days_add(timestamp startdate, int days)</code>, <code class="ph codeph">days_add(timestamp startdate, bigint
+ days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Adds a specified number of days to a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_add()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_add(now(), 31) as 31_days_later;
++-------------------------------+-------------------------------+
+| right_now | 31_days_later |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:12:32.216764000 | 2016-07-01 11:12:32.216764000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__days_sub">
+ <code class="ph codeph">days_sub(timestamp startdate, int days)</code>, <code class="ph codeph">days_sub(timestamp startdate, bigint
+ days)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Subtracts a specified number of days from a <code class="ph codeph">TIMESTAMP</code> value. Similar to
+ <code class="ph codeph">date_sub()</code>, but starts with an actual <code class="ph codeph">TIMESTAMP</code> value instead of a
+ string that is converted to a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, days_sub(now(), 31) as 31_days_ago;
++-------------------------------+-------------------------------+
+| right_now | 31_days_ago |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:13:42.163905000 | 2016-04-30 11:13:42.163905000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__extract">
+ <code class="ph codeph">extract(timestamp, string unit)</code>, <code class="ph codeph">extract(unit FROM timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns one of the numeric date or time fields from a
+ <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Unit argument:</strong> The <code class="ph codeph">unit</code> string can be one of
+ <code class="ph codeph">epoch</code>, <code class="ph codeph">year</code>,
+ <code class="ph codeph">quarter</code>, <code class="ph codeph">month</code>,
+ <code class="ph codeph">day</code>, <code class="ph codeph">hour</code>,
+ <code class="ph codeph">minute</code>, <code class="ph codeph">second</code>, or
+ <code class="ph codeph">millisecond</code>. This argument value is
+ case-insensitive.
+ </p>
+ <div class="p"> In Impala 2.0 and higher, you can use special syntax
+ rather than a regular function call, for compatibility with code
+ that uses the SQL-99 format with the <code class="ph codeph">FROM</code> keyword.
+ With this style, the unit names are identifiers rather than
+ <code class="ph codeph">STRING</code> literals. For example, the following calls
+ are both equivalent:
+ <pre class="pre codeblock"><code>extract(year from now());
+extract(now(), "year");
+</code></pre>
+ </div>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p"> Typically used in <code class="ph codeph">GROUP BY</code> queries to arrange
+ results by hour, day, month, and so on. You can also use this
+ function in an <code class="ph codeph">INSERT ... SELECT</code> into a partitioned
+ table to split up <code class="ph codeph">TIMESTAMP</code> values into individual
+ parts, if the partitioned table has separate partition key columns
+ representing year, month, day, and so on. If you need to divide by
+ more complex units of time, such as by week or by quarter, use the
+ <code class="ph codeph">TRUNC()</code> function instead. </p>
+ <p class="p">
+ <strong class="ph b">Return type:</strong>
+ <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <pre class="pre codeblock"><code>
+select now() as right_now,
+ extract(year from now()) as this_year,
+ extract(month from now()) as this_month;
++-------------------------------+-----------+------------+
+| right_now | this_year | this_month |
++-------------------------------+-----------+------------+
+| 2016-05-31 11:18:43.310328000 | 2016 | 5 |
++-------------------------------+-----------+------------+
+
+select now() as right_now,
+ extract(day from now()) as this_day,
+ extract(hour from now()) as this_hour;
++-------------------------------+----------+-----------+
+| right_now | this_day | this_hour |
++-------------------------------+----------+-----------+
+| 2016-05-31 11:19:24.025303000 | 31 | 11 |
++-------------------------------+----------+-----------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_timestamp">
+ <code class="ph codeph">from_timestamp(datetime timestamp, pattern string)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts a <code class="ph codeph">TIMESTAMP</code> value into a
+ string representing the same value, formatted according to the specified pattern.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The <code class="ph codeph">from_timestamp()</code> function provides a flexible way to convert <code class="ph codeph">TIMESTAMP</code>
+ values into arbitrary string formats for reporting purposes.
+ </p>
+ <p class="p">
+ Because Impala implicitly converts string values into <code class="ph codeph">TIMESTAMP</code>, you can
+ pass date/time values represented as strings (in the standard <code class="ph codeph">yyyy-MM-dd HH:mm:ss.SSS</code> format)
+ to this function. The result is a string using different separator characters, order of fields, spelled-out month
+ names, or other variation of the date/time string representation.
+ </p>
+ <p class="p">
+ The allowed tokens for the pattern string are the same as for the <code class="ph codeph">from_unixtime()</code> function.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show different ways to format a <code class="ph codeph">TIMESTAMP</code>
+ value as a string:
+ </p>
+<pre class="pre codeblock"><code>
+-- Reformat arbitrary TIMESTAMP value.
+select from_timestamp(now(), 'yyyy/MM/dd');
++-------------------------------------+
+| from_timestamp(now(), 'yyyy/mm/dd') |
++-------------------------------------+
+| 2017/10/01 |
++-------------------------------------+
+
+-- Reformat string literal representing date/time.
+select from_timestamp('1984-09-25', 'yyyy/MM/dd');
++--------------------------------------------+
+| from_timestamp('1984-09-25', 'yyyy/mm/dd') |
++--------------------------------------------+
+| 1984/09/25 |
++--------------------------------------------+
+
+-- Alternative format for reporting purposes.
+select from_timestamp('1984-09-25 16:45:30.125', 'MMM dd, yyyy HH:mm:ss.SSS');
++------------------------------------------------------------------------+
+| from_timestamp('1984-09-25 16:45:30.125', 'mmm dd, yyyy hh:mm:ss.sss') |
++------------------------------------------------------------------------+
+| Sep 25, 1984 16:45:30.125 |
++------------------------------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_unixtime">
+ <code class="ph codeph">from_unixtime(bigint unixtime[, string format])</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts the specified number of seconds past the Unix epoch (January 1, 1970 UTC)
+ into a date/time string in the local time zone.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
+ <p class="p">
+ In Impala 2.2.0 and higher, built-in functions that accept or return integers representing <code class="ph codeph">TIMESTAMP</code> values
+ use the <code class="ph codeph">BIGINT</code> type for parameters and return values, rather than <code class="ph codeph">INT</code>.
+ This change lets the date and time functions avoid an overflow error that would otherwise occur
+ on January 19th, 2038 (known as the
+ <a class="xref" href="http://en.wikipedia.org/wiki/Year_2038_problem" target="_blank"><span class="q">"Year 2038 problem"</span> or <span class="q">"Y2K38 problem"</span></a>).
+ This change affects the <code class="ph codeph">from_unixtime()</code> and <code class="ph codeph">unix_timestamp()</code> functions.
+ You might need to change application code that interacts with these functions, change the types of
+ columns that store the return values, or add <code class="ph codeph">CAST()</code> calls to SQL statements that
+ call these functions.
+ </p>
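+ <p class="p">
+ The rollover date mentioned above follows from the largest value a signed
+ 32-bit integer can hold; a quick Python check:
+ </p>

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1  # largest epoch value a signed 32-bit integer can hold
rollover = datetime.fromtimestamp(INT32_MAX, tz=timezone.utc)
print(rollover)  # 2038-01-19 03:14:07+00:00, the "Year 2038" rollover point
```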
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The format string accepts the variations allowed for the <code class="ph codeph">TIMESTAMP</code>
+ data type: date plus time, date by itself, time by itself, and optional fractional seconds for the
+ time. See <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details.
+ </p>
+ <p class="p">
+ Currently, the format string is case-sensitive, especially to distinguish <code class="ph codeph">m</code> for
+ minutes and <code class="ph codeph">M</code> for months. In Impala 1.3 and later, you can switch the order of
+ elements, use alternative separator characters, and use a different number of placeholders for each
+ unit. Adding more instances of <code class="ph codeph">y</code>, <code class="ph codeph">d</code>, <code class="ph codeph">H</code>, and so on
+ produces output strings zero-padded to the requested number of characters. The exception is
+ <code class="ph codeph">M</code> for months, where <code class="ph codeph">M</code> produces a non-padded value such as
+ <code class="ph codeph">3</code>, <code class="ph codeph">MM</code> produces a zero-padded value such as <code class="ph codeph">03</code>,
+ <code class="ph codeph">MMM</code> produces an abbreviated month name such as <code class="ph codeph">Mar</code>, and sequences of
+ 4 or more <code class="ph codeph">M</code> are not allowed. A date string including all fields could be
+ <code class="ph codeph">"yyyy-MM-dd HH:mm:ss.SSSSSS"</code>, <code class="ph codeph">"dd/MM/yyyy HH:mm:ss.SSSSSS"</code>,
+ <code class="ph codeph">"MMM dd, yyyy HH.mm.ss (SSSSSS)"</code> or other combinations of placeholders and separator
+ characters.
+ </p>
+ <p class="p">
+ The way this function deals with time zones when converting to or from <code class="ph codeph">TIMESTAMP</code>
+ values is affected by the <code class="ph codeph">--use_local_tz_for_unix_timestamp_conversions</code> startup flag for the
+ <span class="keyword cmdname">impalad</span> daemon. See <a class="xref" href="../shared/../topics/impala_timestamp.html#timestamp">TIMESTAMP Data Type</a> for details about
+ how Impala handles time zone considerations for the <code class="ph codeph">TIMESTAMP</code> data type.
+ </p>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The more flexible format strings allowed with the built-in functions do not change the rules about
+ using <code class="ph codeph">CAST()</code> to convert from a string to a <code class="ph codeph">TIMESTAMP</code> value. Strings
+ being converted through <code class="ph codeph">CAST()</code> must still have the elements in the specified order and use the specified delimiter
+ characters, as described in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>.
+ </p>
+ </div>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>select from_unixtime(1392394861,"yyyy-MM-dd HH:mm:ss.SSSS");
++-------------------------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd hh:mm:ss.ssss') |
++-------------------------------------------------------+
+| 2014-02-14 16:21:01.0000 |
++-------------------------------------------------------+
+
+select from_unixtime(1392394861,"yyyy-MM-dd");
++-----------------------------------------+
+| from_unixtime(1392394861, 'yyyy-mm-dd') |
++-----------------------------------------+
+| 2014-02-14 |
++-----------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss.SSSS");
++--------------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss.ssss') |
++--------------------------------------------+
+| 16:21:01.0000 |
++--------------------------------------------+
+
+select from_unixtime(1392394861,"HH:mm:ss");
++---------------------------------------+
+| from_unixtime(1392394861, 'hh:mm:ss') |
++---------------------------------------+
+| 16:21:01 |
++---------------------------------------+</code></pre>
+ <div class="p">
+ <code class="ph codeph">unix_timestamp()</code> and <code class="ph codeph">from_unixtime()</code> are often used in combination to
+ convert a <code class="ph codeph">TIMESTAMP</code> value into a particular string format. For example:
+<pre class="pre codeblock"><code>select from_unixtime(unix_timestamp(now() + interval 3 days),
+ 'yyyy/MM/dd HH:mm') as yyyy_mm_dd_hh_mm;
++------------------+
+| yyyy_mm_dd_hh_mm |
++------------------+
+| 2016/06/03 11:38 |
++------------------+
+</code></pre>
+ </div>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__from_utc_timestamp">
+ <code class="ph codeph">from_utc_timestamp(timestamp, string timezone)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Converts a specified UTC timestamp value into the appropriate value for a specified time
+ zone.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong> Often used to translate UTC time zone data stored in a table back to the local
+ date and time for reporting. The opposite of the <code class="ph codeph">to_utc_timestamp()</code> function.
+ </p>
+ <p class="p">
+ To determine the time zone of the server you are connected to, in <span class="keyword">Impala 2.3</span> and
+ higher you can call the <code class="ph codeph">timeofday()</code> function, which includes the time zone
+ specifier in its return value. Remember that with cloud computing, the server you interact
+ with might be in a different time zone than you are, or different sessions might connect to
+ servers in different time zones, or a cluster might include servers in more than one time zone.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ See discussion of time zones in <a class="xref" href="impala_timestamp.html#timestamp">TIMESTAMP Data Type</a>
+ for information about using this function for conversions between the local time zone and UTC.
+ </p>
+ <p class="p">
+ The following example shows how when <code class="ph codeph">TIMESTAMP</code> values representing the UTC time zone
+ are stored in a table, a query can display the equivalent local date and time for a different time zone.
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-06-02 16:25:36.116143000' as timestamp) as utc_datetime)
+ select utc_datetime as 'Date/time in Greenwich UK',
+ from_utc_timestamp(utc_datetime, 'PDT')
+ as 'Equivalent in California USA'
+ from t1;
++-------------------------------+-------------------------------+
+| date/time in greenwich uk | equivalent in california usa |
++-------------------------------+-------------------------------+
+| 2016-06-02 16:25:36.116143000 | 2016-06-02 09:25:36.116143000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ <p class="p">
+ The following example shows that for a date and time when daylight saving time
+ is in effect (<code class="ph codeph">PDT</code>), the UTC time
+ is 7 hours ahead of the local California time; when daylight saving time
+ is not in effect (<code class="ph codeph">PST</code>), the UTC time is 8 hours ahead of
+ the local California time.
+ </p>
+<pre class="pre codeblock"><code>
+select now() as local_datetime,
+ to_utc_timestamp(now(), 'PDT') as utc_datetime;
++-------------------------------+-------------------------------+
+| local_datetime | utc_datetime |
++-------------------------------+-------------------------------+
+| 2016-05-31 11:50:02.316883000 | 2016-05-31 18:50:02.316883000 |
++-------------------------------+-------------------------------+
+
+select '2016-01-05' as local_datetime,
+ to_utc_timestamp('2016-01-05', 'PST') as utc_datetime;
++----------------+---------------------+
+| local_datetime | utc_datetime |
++----------------+---------------------+
+| 2016-01-05 | 2016-01-05 08:00:00 |
++----------------+---------------------+
+</code></pre>
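+ <p class="p">
+ The UTC-to-Pacific conversion in the first query can be reproduced with
+ Python's <code class="ph codeph">zoneinfo</code> module, using the IANA zone name
+ <code class="ph codeph">America/Los_Angeles</code> rather than the <code class="ph codeph">PDT</code>
+ abbreviation:
+ </p>

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# A UTC wall-clock value converted to US Pacific time; in June,
# daylight saving time is in effect, so the offset is UTC-7.
utc_dt = datetime(2016, 6, 2, 16, 25, 36, tzinfo=timezone.utc)
local = utc_dt.astimezone(ZoneInfo("America/Los_Angeles"))
print(local.strftime("%Y-%m-%d %H:%M:%S"))  # 2016-06-02 09:25:36
```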
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hour">
+ <code class="ph codeph">hour(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the hour field from a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, hour(now()) as current_hour;
++-------------------------------+--------------+
+| right_now | current_hour |
++-------------------------------+--------------+
+| 2016-06-01 14:14:12.472846000 | 14 |
++-------------------------------+--------------+
+
+select now() + interval 12 hours as 12_hours_from_now,
+ hour(now() + interval 12 hours) as hour_in_12_hours;
++-------------------------------+-------------------+
+| 12_hours_from_now | hour_in_12_hours |
++-------------------------------+-------------------+
+| 2016-06-02 02:15:32.454750000 | 2 |
++-------------------------------+-------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hours_add">
+ <code class="ph codeph">hours_add(timestamp date, int hours)</code>, <code class="ph codeph">hours_add(timestamp date, bigint
+ hours)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of hours.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ hours_add(now(), 12) as in_12_hours;
++-------------------------------+-------------------------------+
+| right_now | in_12_hours |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:19:48.948107000 | 2016-06-02 02:19:48.948107000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__hours_sub">
+ <code class="ph codeph">hours_sub(timestamp date, int hours)</code>, <code class="ph codeph">hours_sub(timestamp date, bigint
+ hours)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of hours.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ hours_sub(now(), 18) as 18_hours_ago;
++-------------------------------+-------------------------------+
+| right_now | 18_hours_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:23:13.868150000 | 2016-05-31 20:23:13.868150000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__int_months_between">
+ <code class="ph codeph">int_months_between(timestamp newer, timestamp older)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values,
+ as an <code class="ph codeph">INT</code> representing only the full months that passed.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in business contexts, for example to determine whether
+ a specified number of months have passed or whether some end-of-month deadline was reached.
+ </p>
+ <p class="p">
+ The method of determining the number of elapsed months includes some special handling of
+ months with different numbers of days that creates edge cases for dates between the
+ 28th and 31st days of certain months. See <code class="ph codeph">months_between()</code> for details.
+ The <code class="ph codeph">int_months_between()</code> result is essentially the <code class="ph codeph">floor()</code>
+ of the <code class="ph codeph">months_between()</code> result.
+ </p>
+ <p class="p">
+ If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+ nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+ the result is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ If the first argument represents an earlier time than the second argument, the result is negative.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>/* Less than a full month = 0. */
+select int_months_between('2015-02-28', '2015-01-29');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-29') |
++------------------------------------------------+
+| 0 |
++------------------------------------------------+
+
+/* Last day of month to last day of next month = 1. */
+select int_months_between('2015-02-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-02-28', '2015-01-31') |
++------------------------------------------------+
+| 1 |
++------------------------------------------------+
+
+/* Slightly less than 2 months = 1. */
+select int_months_between('2015-03-28', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-28', '2015-01-31') |
++------------------------------------------------+
+| 1 |
++------------------------------------------------+
+
+/* 2 full months (identical days of the month) = 2. */
+select int_months_between('2015-03-31', '2015-01-31');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-31') |
++------------------------------------------------+
+| 2 |
++------------------------------------------------+
+
+/* Last day of month to last day of month-after-next = 2. */
+select int_months_between('2015-03-31', '2015-01-30');
++------------------------------------------------+
+| int_months_between('2015-03-31', '2015-01-30') |
++------------------------------------------------+
+| 2 |
++------------------------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__last_day">
+ <code class="ph codeph">last_day(timestamp t)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns a <code class="ph codeph">TIMESTAMP</code> corresponding to
+ the beginning of the last calendar day in the same month as the
+ <code class="ph codeph">TIMESTAMP</code> argument.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.9.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ If the input argument does not represent a valid Impala <code class="ph codeph">TIMESTAMP</code>
+ including both date and time portions, the function returns <code class="ph codeph">NULL</code>.
+ For example, if the input argument is a string that cannot be implicitly cast to
+ <code class="ph codeph">TIMESTAMP</code>, does not include a date portion, or is out of the
+ allowed range for Impala <code class="ph codeph">TIMESTAMP</code> values, the function returns
+ <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows how to examine the current date, and dates around the
+ end of the month, as <code class="ph codeph">TIMESTAMP</code> values with any time portion removed:
+ </p>
+<pre class="pre codeblock"><code>
+select
+ now() as right_now
+ , trunc(now(),'dd') as today
+ , last_day(now()) as last_day_of_month
+ , last_day(now()) + interval 1 day as first_of_next_month;
++-------------------------------+---------------------+---------------------+---------------------+
+| right_now | today | last_day_of_month | first_of_next_month |
++-------------------------------+---------------------+---------------------+---------------------+
+| 2017-08-15 15:07:58.823812000 | 2017-08-15 00:00:00 | 2017-08-31 00:00:00 | 2017-09-01 00:00:00 |
++-------------------------------+---------------------+---------------------+---------------------+
+</code></pre>
+ <p class="p">
+ The following example shows how to examine the current date and dates around the
+ end of the month as integers representing the day of the month:
+ </p>
+<pre class="pre codeblock"><code>
+select
+ now() as right_now
+ , dayofmonth(now()) as day
+ , extract(day from now()) as also_day
+ , dayofmonth(last_day(now())) as last_day
+ , extract(day from last_day(now())) as also_last_day;
++-------------------------------+-----+----------+----------+---------------+
+| right_now | day | also_day | last_day | also_last_day |
++-------------------------------+-----+----------+----------+---------------+
+| 2017-08-15 15:07:59.417755000 | 15 | 15 | 31 | 31 |
++-------------------------------+-----+----------+----------+---------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__microseconds_add">
+ <code class="ph codeph">microseconds_add(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_add(timestamp
+ date, bigint microseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of microseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ microseconds_add(now(), 500000) as half_a_second_from_now;
++-------------------------------+-------------------------------+
+| right_now | half_a_second_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:25:11.455051000 | 2016-06-01 14:25:11.955051000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__microseconds_sub">
+ <code class="ph codeph">microseconds_sub(timestamp date, int microseconds)</code>, <code class="ph codeph">microseconds_sub(timestamp
+ date, bigint microseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of microseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ microseconds_sub(now(), 500000) as half_a_second_ago;
++-------------------------------+-------------------------------+
+| right_now | half_a_second_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:26:16.509990000 | 2016-06-01 14:26:16.009990000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__millisecond">
+ <code class="ph codeph">millisecond(timestamp)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the millisecond portion of a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.5.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ The millisecond value is truncated, not rounded, if the <code class="ph codeph">TIMESTAMP</code>
+ value contains more than 3 significant digits to the right of the decimal point.
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+/* 252.4 milliseconds truncated to 252. */
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now() | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:25.252400000 | 252 |
++-------------------------------+--------------------+
+
+/* 761.767 milliseconds truncated to 761. */
+
+select now(), millisecond(now());
++-------------------------------+--------------------+
+| now() | millisecond(now()) |
++-------------------------------+--------------------+
+| 2016-03-14 22:30:58.761767000 | 761 |
++-------------------------------+--------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__milliseconds_add">
+ <code class="ph codeph">milliseconds_add(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_add(timestamp
+ date, bigint milliseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of milliseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ milliseconds_add(now(), 1500) as 1_point_5_seconds_from_now;
++-------------------------------+-------------------------------+
+| right_now | 1_point_5_seconds_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:30.067366000 | 2016-06-01 14:30:31.567366000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__milliseconds_sub">
+ <code class="ph codeph">milliseconds_sub(timestamp date, int milliseconds)</code>, <code class="ph codeph">milliseconds_sub(timestamp
+ date, bigint milliseconds)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of milliseconds.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now,
+ milliseconds_sub(now(), 1500) as 1_point_5_seconds_ago;
++-------------------------------+-------------------------------+
+| right_now | 1_point_5_seconds_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:30:53.467140000 | 2016-06-01 14:30:51.967140000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minute">
+ <code class="ph codeph">minute(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the minute field from a <code class="ph codeph">TIMESTAMP</code> value.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minute(now()) as current_minute;
++-------------------------------+----------------+
+| right_now | current_minute |
++-------------------------------+----------------+
+| 2016-06-01 14:34:08.051702000 | 34 |
++-------------------------------+----------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minutes_add">
+ <code class="ph codeph">minutes_add(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_add(timestamp date, bigint
+ minutes)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of minutes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_add(now(), 90) as 90_minutes_from_now;
++-------------------------------+-------------------------------+
+| right_now | 90_minutes_from_now |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:04.887095000 | 2016-06-01 16:06:04.887095000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__minutes_sub">
+ <code class="ph codeph">minutes_sub(timestamp date, int minutes)</code>, <code class="ph codeph">minutes_sub(timestamp date, bigint
+ minutes)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time minus some number of minutes.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, minutes_sub(now(), 90) as 90_minutes_ago;
++-------------------------------+-------------------------------+
+| right_now | 90_minutes_ago |
++-------------------------------+-------------------------------+
+| 2016-06-01 14:36:32.643061000 | 2016-06-01 13:06:32.643061000 |
++-------------------------------+-------------------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__month">
+ <code class="ph codeph">month(timestamp date)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the month field, represented as an integer, from the date portion of a <code class="ph codeph">TIMESTAMP</code>.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">int</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, month(now()) as current_month;
++-------------------------------+---------------+
+| right_now | current_month |
++-------------------------------+---------------+
+| 2016-06-01 14:43:37.141542000 | 6 |
++-------------------------------+---------------+
+</code></pre>
+ </dd>
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__monthname">
+ <code class="ph codeph">monthname(timestamp date)</code>
+ </dt>
+ <dd class="dd">
+ <strong class="ph b">Purpose:</strong> Returns the month field from a
+ <code class="ph codeph">TIMESTAMP</code> value, converted to the string
+ corresponding to that month name.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">string</code>
+ </p>
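+      <p class="p">
+        <strong class="ph b">Examples:</strong>
+      </p>
+      <p class="p">
+        The following hypothetical example sketches typical usage; the exact
+        string returned (for example, its capitalization) may vary by Impala version:
+      </p>
+<pre class="pre codeblock"><code>
+select now() as right_now, monthname(now()) as current_month_name;
+</code></pre>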
+ </dd>
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__months_add">
+ <code class="ph codeph">months_add(timestamp date, int months)</code>, <code class="ph codeph">months_add(timestamp date, bigint
+ months)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the specified date and time plus some number of months.
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">timestamp</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following example shows the effects of adding some number of
+ months to a <code class="ph codeph">TIMESTAMP</code> value, using both the
+ <code class="ph codeph">months_add()</code> function and its <code class="ph codeph">add_months()</code>
+ alias. These examples use <code class="ph codeph">trunc()</code> to strip off the time portion
+ and leave just the date.
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select trunc(now(), 'dd') as today)
+ select today, months_add(today,1) as next_month from t1;
++---------------------+---------------------+
+| today | next_month |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select trunc(now(), 'dd') as today)
+ select today, add_months(today,1) as next_month from t1;
++---------------------+---------------------+
+| today | next_month |
++---------------------+---------------------+
+| 2016-05-19 00:00:00 | 2016-06-19 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+ <p class="p">
+        The following examples show that when <code class="ph codeph">months_add()</code>
+        would otherwise produce a nonexistent date, because months have
+        different numbers of days, the function returns a <code class="ph codeph">TIMESTAMP</code>
+        for the last day of the relevant month. For example, adding one month
+ to January 31 produces a date of February 29th in the year 2016 (a leap year),
+ and February 28th in the year 2015 (a non-leap year).
+ </p>
+<pre class="pre codeblock"><code>
+with t1 as (select cast('2016-01-31' as timestamp) as jan_31)
+ select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31 | feb_31 |
++---------------------+---------------------+
+| 2016-01-31 00:00:00 | 2016-02-29 00:00:00 |
++---------------------+---------------------+
+
+with t1 as (select cast('2015-01-31' as timestamp) as jan_31)
+ select jan_31, months_add(jan_31,1) as feb_31 from t1;
++---------------------+---------------------+
+| jan_31 | feb_31 |
++---------------------+---------------------+
+| 2015-01-31 00:00:00 | 2015-02-28 00:00:00 |
++---------------------+---------------------+
+</code></pre>
+ </dd>
+
+
+
+
+
+ <dt class="dt dlterm" id="datetime_functions__months_between">
+ <code class="ph codeph">months_between(timestamp newer, timestamp older)</code>
+ </dt>
+
+ <dd class="dd">
+
+ <strong class="ph b">Purpose:</strong> Returns the number of months between the date portions of two <code class="ph codeph">TIMESTAMP</code> values.
+ Can include a fractional part representing extra days in addition to the full months
+ between the dates. The fractional component is computed by dividing the difference in days by 31 (regardless of the month).
+ <p class="p">
+ <strong class="ph b">Return type:</strong> <code class="ph codeph">double</code>
+ </p>
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.3.0</span>
+ </p>
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+ <p class="p">
+ Typically used in business contexts, for example to determine whether
+ a specified number of months have passed or whether some end-of-month deadline was reached.
+ </p>
+ <p class="p">
+ If the only consideration is the number of full months and any fractional value is
+ not significant, use <code class="ph codeph">int_months_between()</code> instead.
+ </p>
+ <p class="p">
+ The method of determining the number of elapsed months includes some special handling of
+ months with different numbers of days that creates edge cases for dates between the
+ 28th and 31st days of certain months.
+ </p>
+ <p class="p">
+ If either value is <code class="ph codeph">NULL</code>, which could happen for example when converting a
+ nonexistent date string such as <code class="ph codeph">'2015-02-29'</code> to a <code class="ph codeph">TIMESTAMP</code>,
+ the result is also <code class="ph codeph">NULL</code>.
+ </p>
+ <p class="p">
+ If the first argument represents an earlier time than the second argument, the result is negative.
+ </p>
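+      <p class="p">
+        As an illustration of the divide-by-31 rule stated above (not the internal
+        implementation), the fractional results in the examples below can be
+        reproduced by hand. For instance, from <code class="ph codeph">2015-01-29</code>
+        to <code class="ph codeph">2015-02-28</code> is one day short of a full month,
+        so the expected value is 1 - 1/31, matching the value
+        <code class="ph codeph">0.967741935483871</code> shown in the examples:
+      </p>
+<pre class="pre codeblock"><code>
+/* Expected fraction for months_between('2015-02-28', '2015-01-29'):
+   one full month minus one day, divided by 31. */
+select 1 - 1/31 as expected_fraction;
+</code></pre>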
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+ <p class="p">
+ The following examples show how dates that are on the same day of the month
+ are considered to be exactly N months apart, even if the months have different
+ numbers of days.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-28') |
++--------------------------------------------+
+| 1 |
++--------------------------------------------+
+
+select months_between(now(), now() + interval 1 month);
++-------------------------------------------------+
+| months_between(now(), now() + interval 1 month) |
++-------------------------------------------------+
+| -1 |
++-------------------------------------------------+
+
+select months_between(now() + interval 1 year, now());
++------------------------------------------------+
+| months_between(now() + interval 1 year, now()) |
++------------------------------------------------+
+| 12 |
++------------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how dates that are on the last day of the month
+ are considered to be exactly N months apart, even if the months have different
+ numbers of days. For example, from January 28th to February 28th is exactly one
+ month because the day of the month is identical; January 31st to February 28th
+ is exactly one month because in both cases it is the last day of the month;
+ but January 29th or 30th to February 28th is considered a fractional month.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-02-28', '2015-01-31');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-31') |
++--------------------------------------------+
+| 1 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-29');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-29') |
++--------------------------------------------+
+| 0.967741935483871 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-01-30');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-01-30') |
++--------------------------------------------+
+| 0.935483870967742 |
++--------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how dates that are not a precise number
+ of months apart result in a fractional return value.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-03-01', '2015-01-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-28') |
++--------------------------------------------+
+| 1.129032258064516 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-28');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-28') |
++--------------------------------------------+
+| 0.1290322580645161 |
++--------------------------------------------+
+
+select months_between('2015-06-02', '2015-05-29');
++--------------------------------------------+
+| months_between('2015-06-02', '2015-05-29') |
++--------------------------------------------+
+| 0.1290322580645161 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-01-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-01-25') |
++--------------------------------------------+
+| 1.225806451612903 |
++--------------------------------------------+
+
+select months_between('2015-03-01', '2015-02-25');
++--------------------------------------------+
+| months_between('2015-03-01', '2015-02-25') |
++--------------------------------------------+
+| 0.2258064516129032 |
++--------------------------------------------+
+
+select months_between('2015-02-28', '2015-02-01');
++--------------------------------------------+
+| months_between('2015-02-28', '2015-02-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
++--------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
++--------------------------------------------+
+</code></pre>
+ <p class="p">
+ The following examples show how the time portion of the <code class="ph codeph">TIMESTAMP</code>
+        values is irrelevant for calculating the month interval. Even the fractional part
+ of the result only depends on the number of full days between the argument values,
+ regardless of the time portion.
+ </p>
+<pre class="pre codeblock"><code>select months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00');
++--------------------------------------------------------------+
+| months_between('2015-05-28 23:00:00', '2015-04-28 11:45:00') |
++--------------------------------------------------------------+
+| 1 |
++--------------------------------------------------------------+
+
+select months_between('2015-03-28', '2015-03-01');
++--------------------------------------------+
+| months_between('2015-03-28', '2015-03-01') |
++--------------------------------------------+
+| 0.8709677419354839 |
<TRUNCATED>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_analytic_functions.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_analytic_functions.html b/docs/build3x/html/topics/impala_analytic_functions.html
new file mode 100644
index 0000000..607633d
--- /dev/null
+++ b/docs/build3x/html/topics/impala_analytic_functions.html
@@ -0,0 +1,1785 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_functions.html"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="analytic_functions"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Analytic Functions</title></head><body id="analytic_functions"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Analytic Functions</h1>
+
+
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+
+
+ Analytic functions (also known as window functions) are a special category of built-in functions. Like
+ aggregate functions, they examine the contents of multiple input rows to compute each output value. However,
+ rather than being limited to one result value per <code class="ph codeph">GROUP BY</code> group, they operate on
+ <dfn class="term">windows</dfn> where the input rows are ordered and grouped using flexible conditions expressed through
+ an <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+
+
+ <p class="p">
+ Some functions, such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">RANK()</code>, can only be used in this analytic
+      context. Some aggregate functions do double duty: when you call aggregate functions such as
+ <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">AVG()</code>, and so on with an
+ <code class="ph codeph">OVER()</code> clause, they produce an output value for each row, based on computations across other
+ rows in the window.
+ </p>
+
+ <p class="p">
+ Although analytic functions often compute the same value you would see from an aggregate function in a
+ <code class="ph codeph">GROUP BY</code> query, the analytic functions produce a value for each row in the result set rather
+ than a single value for each group. This flexibility lets you include additional columns in the
+ <code class="ph codeph">SELECT</code> list, offering more opportunities for organizing and filtering the result set.
+ </p>
+
+ <p class="p">
+ Analytic function calls are only allowed in the <code class="ph codeph">SELECT</code> list and in the outermost
+ <code class="ph codeph">ORDER BY</code> clause of the query. During query processing, analytic functions are evaluated
+      after other query stages such as joins, <code class="ph codeph">WHERE</code>, and <code class="ph codeph">GROUP BY</code>.
+ </p>
+
+
+
+
+
+
+
+
+
+ <p class="p">
+ The rows that are part of each partition are analyzed by computations across an ordered or unordered set of
+ rows. For example, <code class="ph codeph">COUNT()</code> and <code class="ph codeph">SUM()</code> might be applied to all the rows in
+ the partition, in which case the order of analysis does not matter. The <code class="ph codeph">ORDER BY</code> clause
+      might be used inside the <code class="ph codeph">OVER()</code> clause to define the ordering that applies to functions
+ such as <code class="ph codeph">LAG()</code> and <code class="ph codeph">FIRST_VALUE()</code>.
+ </p>
+
+
+
+
+
+ <p class="p">
+ Analytic functions are frequently used in fields such as finance and science to provide trend, outlier, and
+ bucketed analysis for large data sets. You might also see the term <span class="q">"window functions"</span> in database
+ literature, referring to the sequence of rows (the <span class="q">"window"</span>) that the function call applies to,
+ particularly when the <code class="ph codeph">OVER</code> clause includes a <code class="ph codeph">ROWS</code> or <code class="ph codeph">RANGE</code>
+ keyword.
+ </p>
+
+ <p class="p">
+ The following sections describe the analytic query clauses and the pure analytic functions provided by
+ Impala. For usage information about aggregate functions in an analytic context, see
+ <a class="xref" href="impala_aggregate_functions.html#aggregate_functions">Impala Aggregate Functions</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_functions.html">Impala Built-In Functions</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="analytic_functions__over">
+
+ <h2 class="title topictitle2" id="ariaid-title2">OVER Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The <code class="ph codeph">OVER</code> clause is required for calls to pure analytic functions such as
+ <code class="ph codeph">LEAD()</code>, <code class="ph codeph">RANK()</code>, and <code class="ph codeph">FIRST_VALUE()</code>. When you include an
+ <code class="ph codeph">OVER</code> clause with calls to aggregate functions such as <code class="ph codeph">MAX()</code>,
+ <code class="ph codeph">COUNT()</code>, or <code class="ph codeph">SUM()</code>, they operate as analytic functions.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>function(<var class="keyword varname">args</var>) OVER([<var class="keyword varname">partition_by_clause</var>] [<var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>]])
+
+partition_by_clause ::= PARTITION BY <var class="keyword varname">expr</var> [, <var class="keyword varname">expr</var> ...]
+order_by_clause ::= ORDER BY <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] [, <var class="keyword varname">expr</var> [ASC | DESC] [NULLS FIRST | NULLS LAST] ...]
+window_clause: See <a class="xref" href="#window_clause">Window Clause</a>
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">PARTITION BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause acts much like the <code class="ph codeph">GROUP BY</code> clause in the
+ outermost block of a query. It divides the rows into groups containing identical values in one or more
+ columns. These logical groups are known as <dfn class="term">partitions</dfn>. Throughout the discussion of analytic
+ functions, <span class="q">"partitions"</span> refers to the groups produced by the <code class="ph codeph">PARTITION BY</code> clause, not
+ to partitioned tables. However, note the following limitation that applies specifically to analytic function
+ calls involving partitioned tables.
+ </p>
+
+ <p class="p">
+ In queries involving both analytic functions and partitioned tables, partition pruning only occurs for columns named in the <code class="ph codeph">PARTITION BY</code>
+ clause of the analytic function call. For example, if an analytic function query has a clause such as <code class="ph codeph">WHERE year=2016</code>,
+ the way to make the query prune all other <code class="ph codeph">YEAR</code> partitions is to include <code class="ph codeph">PARTITION BY year</code> in the analytic function call;
+ for example, <code class="ph codeph">OVER (PARTITION BY year,<var class="keyword varname">other_columns</var> <var class="keyword varname">other_analytic_clauses</var>)</code>.
+
+ </p>
+
+ <p class="p">
+ The sequence of results from an analytic function <span class="q">"resets"</span> for each new partition in the result set.
+        That is, the set of preceding or following rows considered by the analytic function always comes from a
+ single partition. Any <code class="ph codeph">MAX()</code>, <code class="ph codeph">SUM()</code>, <code class="ph codeph">ROW_NUMBER()</code>, and so
+ on apply to each partition independently. Omit the <code class="ph codeph">PARTITION BY</code> clause to apply the
+ analytic operation to all the rows in the table.
+ </p>
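The partition-then-apply behavior described above can be sketched in plain Python (a minimal illustration over made-up year/amount rows, not Impala's implementation):

```python
from itertools import groupby

# Hypothetical rows: (year, amount). An analytic SUM(amount) OVER
# (PARTITION BY year) computes one independent total per year and
# annotates every row with its own partition's total.
rows = [(2015, 10), (2015, 20), (2016, 5), (2016, 15), (2016, 30)]

def analytic_sum(rows, key):
    """Mimic SUM(amount) OVER (PARTITION BY key(row)): each row keeps
    its original columns plus the total for its partition."""
    out = []
    for _, group in groupby(sorted(rows, key=key), key=key):
        part = list(group)
        total = sum(amount for _, amount in part)
        out.extend((year, amount, total) for year, amount in part)
    return out

result = analytic_sum(rows, key=lambda r: r[0])
# Every 2015 row carries the 2015 total (30); every 2016 row carries 50.
```

Omitting the `PARTITION BY` clause corresponds to treating the whole table as one partition, so every row would carry the grand total.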
+
+ <p class="p">
+ <strong class="ph b">ORDER BY clause:</strong>
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause works much like the <code class="ph codeph">ORDER BY</code> clause in the outermost
+ block of a query. It defines the order in which rows are evaluated for the entire input set, or for each
+ group produced by a <code class="ph codeph">PARTITION BY</code> clause. You can order by one or multiple expressions, and
+ for each expression optionally choose ascending or descending order and whether nulls come first or last in
+ the sort order. Because this <code class="ph codeph">ORDER BY</code> clause only defines the order in which rows are
+ evaluated, if you want the results to be output in a specific order, also include an <code class="ph codeph">ORDER
+ BY</code> clause in the outer block of the query.
+ </p>
+
+ <p class="p">
+ When the <code class="ph codeph">ORDER BY</code> clause is omitted, the analytic function applies to all items in the
+ group produced by the <code class="ph codeph">PARTITION BY</code> clause. When the <code class="ph codeph">ORDER BY</code> clause is
+ included, the analysis can apply to all or a subset of the items in the group, depending on the optional
+ window clause.
+ </p>
+
+ <p class="p">
+ The order in which the rows are analyzed is only defined for those columns specified in <code class="ph codeph">ORDER
+ BY</code> clauses.
+ </p>
+
+ <p class="p">
+ One difference between the analytic and outer uses of the <code class="ph codeph">ORDER BY</code> clause: inside the
+ <code class="ph codeph">OVER</code> clause, <code class="ph codeph">ORDER BY 1</code> or other integer value is interpreted as a
+ constant sort value (effectively a no-op) rather than referring to column 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Window clause:</strong>
+ </p>
+
+ <p class="p">
+ The window clause is only allowed in combination with an <code class="ph codeph">ORDER BY</code> clause. If the
+ <code class="ph codeph">ORDER BY</code> clause is specified but the window clause is not, the default window is
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>. See
+ <a class="xref" href="impala_analytic_functions.html#window_clause">Window Clause</a> for full details.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">HBase considerations:</strong>
+ </p>
+
+ <p class="p">
+ Because HBase tables are optimized for single-row lookups rather than full scans, analytic functions using
+ the <code class="ph codeph">OVER()</code> clause are not recommended for HBase tables. Although such queries work, their
+ performance is lower than on comparable tables using HDFS data files.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Parquet considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are very efficient for Parquet tables. The data that is examined during evaluation of
+ the <code class="ph codeph">OVER()</code> clause comes from a specified set of columns, and the values for each column
+ are arranged sequentially within each data file.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Text table considerations:</strong>
+ </p>
+
+ <p class="p">
+ Analytic functions are convenient to use with text tables for exploratory business intelligence. When the
+ volume of data is substantial, prefer to use Parquet tables for performance-critical analytic queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows how to synthesize a numeric sequence corresponding to all the rows in a table.
+ The new table has the same columns as the old one, plus an additional column <code class="ph codeph">ID</code> containing
+ the integers 1, 2, 3, and so on, corresponding to the order of a <code class="ph codeph">TIMESTAMP</code> column in the
+ original table.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>CREATE TABLE events_with_id AS
+ SELECT
+ row_number() OVER (ORDER BY date_and_time) AS id,
+ c1, c2, c3, c4
+ FROM events;
+</code></pre>
+
+ <p class="p">
+ The following example shows how to determine the number of rows containing each value for a column. Unlike
+ a corresponding <code class="ph codeph">GROUP BY</code> query, this one can analyze a single column and still return all
+ values (not just the distinct ones) from the other columns.
+ </p>
+
+
+
+<pre class="pre codeblock"><code>SELECT x, y, z,
+ count(x) OVER (PARTITION BY x) AS how_many_x
+FROM t1;
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Restrictions:</strong>
+ </p>
+
+ <p class="p">
+ You cannot directly combine the <code class="ph codeph">DISTINCT</code> operator with analytic function calls. You can
+ put the analytic function call in a <code class="ph codeph">WITH</code> clause or an inline view, and apply the
+ <code class="ph codeph">DISTINCT</code> operator to its result set.
+ </p>
+
+<pre class="pre codeblock"><code>WITH t1 AS (SELECT x, sum(x) OVER (PARTITION BY x) AS total FROM t1)
+ SELECT DISTINCT x, total FROM t1;
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="analytic_functions__window_clause">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Window Clause</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Certain analytic functions accept an optional <dfn class="term">window clause</dfn>, which makes the function analyze
+ only certain rows <span class="q">"around"</span> the current row rather than all rows in the partition. For example, you can
+ get a moving average by specifying some number of preceding and following rows, or a running count or
+ running total by specifying all rows up to the current position. This clause can result in different
+ analytic results for rows within the same partition.
+ </p>
+
+ <p class="p">
+ The window clause is supported with the <code class="ph codeph">AVG()</code>, <code class="ph codeph">COUNT()</code>,
+ <code class="ph codeph">FIRST_VALUE()</code>, <code class="ph codeph">LAST_VALUE()</code>, and <code class="ph codeph">SUM()</code> functions.
+
+ For <code class="ph codeph">MAX()</code> and <code class="ph codeph">MIN()</code>, the window clause is only allowed if the start bound is
+ <code class="ph codeph">UNBOUNDED PRECEDING</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROWS BETWEEN [ { <var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]
+RANGE BETWEEN [ {<var class="keyword varname">m</var> | UNBOUNDED } PRECEDING | CURRENT ROW] [ AND [CURRENT ROW | { UNBOUNDED | <var class="keyword varname">n</var> } FOLLOWING] ]</code></pre>
+
+ <p class="p">
+ <code class="ph codeph">ROWS BETWEEN</code> defines the size of the window in terms of the indexes of the rows in the
+ result set. The size of the window is predictable based on the clauses and the position within the result set.
+ </p>
+
+ <p class="p">
+ <code class="ph codeph">RANGE BETWEEN</code> does not currently support numeric arguments to define a variable-size
+ sliding window.
+
+ </p>
+
+
+
+ <p class="p">
+ Currently, Impala supports only some combinations of arguments to the <code class="ph codeph">RANGE</code> clause:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> (the default when <code class="ph codeph">ORDER
+ BY</code> is specified and the window clause is omitted)
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING</code>
+ </li>
+
+ <li class="li">
+ <code class="ph codeph">RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING</code>
+ </li>
+ </ul>
+
+ <p class="p">
+ When <code class="ph codeph">RANGE</code> is used, <code class="ph codeph">CURRENT ROW</code> includes not just the current row but all
+ rows that are tied with the current row based on the <code class="ph codeph">ORDER BY</code> expressions.
+ </p>
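The difference between positional `ROWS` windows and tie-aware `RANGE` windows can be sketched in plain Python (an illustration over made-up values, not Impala's implementation):

```python
# values are already in ORDER BY order; the two 2s are tied on the
# ORDER BY expression.
values = [1, 2, 2, 3]

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: strictly positional,
# so each row's running SUM() covers exactly the rows before it plus itself.
rows_running = [sum(values[:i + 1]) for i in range(len(values))]

# RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: CURRENT ROW also
# pulls in every row tied with the current row, so both 2s see the same
# window and produce the same running total.
range_running = [sum(v for v in values if v <= current) for current in values]
```

With `ROWS` the running totals are 1, 3, 5, 8; with `RANGE` the tied rows both produce 5, giving 1, 5, 5, 8.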
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show financial data for a fictional stock symbol <code class="ph codeph">JDR</code>. The closing
+ price moves up and down each day.
+ </p>
+
+<pre class="pre codeblock"><code>create table stock_ticker (stock_symbol string, closing_price decimal(8,2), closing_date timestamp);
+...load some data...
+select * from stock_ticker order by stock_symbol, closing_date
++--------------+---------------+---------------------+
+| stock_symbol | closing_price | closing_date |
++--------------+---------------+---------------------+
+| JDR | 12.86 | 2014-10-02 00:00:00 |
+| JDR | 12.89 | 2014-10-03 00:00:00 |
+| JDR | 12.94 | 2014-10-04 00:00:00 |
+| JDR | 12.55 | 2014-10-05 00:00:00 |
+| JDR | 14.03 | 2014-10-06 00:00:00 |
+| JDR | 14.75 | 2014-10-07 00:00:00 |
+| JDR | 13.98 | 2014-10-08 00:00:00 |
++--------------+---------------+---------------------+
+</code></pre>
+
+ <p class="p">
+ The queries use analytic functions with window clauses to compute moving averages of the closing price. For
+ example, <code class="ph codeph">ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING</code> produces an average of the value from a
+ 3-day span, producing a different value for each row. The first row, which has no preceding row, only gets
+ averaged with the row following it. If the table contained more than one stock symbol, the
+ <code class="ph codeph">PARTITION BY</code> clause would limit the window for the moving average to only consider the
+ prices for a single stock.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between 1 preceding and 1 following) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.87 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.89 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.79 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 13.17 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.77 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 14.25 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 14.36 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
+
+ <p class="p">
+ The clause <code class="ph codeph">ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code> produces a cumulative moving
+ average, from the earliest data up to the value for each day.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ avg(closing_price) over (partition by stock_symbol order by closing_date
+ rows between unbounded preceding and current row) as moving_average
+ from stock_ticker;
++--------------+---------------------+---------------+----------------+
+| stock_symbol | closing_date | closing_price | moving_average |
++--------------+---------------------+---------------+----------------+
+| JDR | 2014-10-02 00:00:00 | 12.86 | 12.86 |
+| JDR | 2014-10-03 00:00:00 | 12.89 | 12.87 |
+| JDR | 2014-10-04 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-10-05 00:00:00 | 12.55 | 12.81 |
+| JDR | 2014-10-06 00:00:00 | 14.03 | 13.05 |
+| JDR | 2014-10-07 00:00:00 | 14.75 | 13.33 |
+| JDR | 2014-10-08 00:00:00 | 13.98 | 13.42 |
++--------------+---------------------+---------------+----------------+
+</code></pre>
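Both moving averages above can be reproduced in plain Python. Note that the sample output appears to truncate the `DECIMAL` averages to 2 places rather than round them; the `ROUND_DOWN` quantization below mimics that observation, which is inferred from the output rather than a documented guarantee:

```python
from decimal import Decimal, ROUND_DOWN

# The JDR closing prices from the stock_ticker example, in closing_date order.
prices = [Decimal(p) for p in
          ("12.86", "12.89", "12.94", "12.55", "14.03", "14.75", "13.98")]

def trunc2(x):
    # Truncate (not round) to 2 decimal places, matching the sample output.
    return x.quantize(Decimal("0.01"), rounding=ROUND_DOWN)

# ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING: a 3-day centered window,
# clipped at the edges of the partition.
centered = [trunc2(sum(prices[max(i - 1, 0):i + 2]) /
                   len(prices[max(i - 1, 0):i + 2]))
            for i in range(len(prices))]

# ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: cumulative average.
cumulative = [trunc2(sum(prices[:i + 1]) / (i + 1))
              for i in range(len(prices))]
```

The `centered` list matches the first query's `moving_average` column and `cumulative` matches the second's.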
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title4" id="analytic_functions__avg_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title4">AVG Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_avg.html#avg">AVG Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="analytic_functions__count_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title5">COUNT Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_count.html#count">COUNT Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="analytic_functions__cume_dist">
+
+ <h2 class="title topictitle2" id="ariaid-title6">CUME_DIST Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the cumulative distribution of a value. The value for each row in the result set is greater than 0
+ and less than or equal to 1.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>CUME_DIST()
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Within each partition of the result set, the <code class="ph codeph">CUME_DIST()</code> value represents an ascending
+ sequence that ends at 1. Each value represents the proportion of rows in the partition whose values are
+ less than or equal to the value in the current row.
+ </p>
+
+ <p class="p">
+ If the sequence of input values contains ties, the <code class="ph codeph">CUME_DIST()</code> results are identical for the
+ tied values.
+ </p>
+
+ <p class="p">
+ Impala only supports the <code class="ph codeph">CUME_DIST()</code> function in an analytic context, not as a regular
+ aggregate function.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ This example uses a table with 9 rows. The <code class="ph codeph">CUME_DIST()</code>
+ function evaluates the entire table because there is no <code class="ph codeph">PARTITION BY</code> clause,
+ with the rows ordered by the weight of the animal.
+ The sequence of values shows that 1/9 of the values are less than or equal to the lightest
+ animal (mouse), 2/9 of the values are less than or equal to the second-lightest animal,
+ and so on up to the heaviest animal (elephant), where 9/9 of the rows are less than or
+ equal to its weight.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, cume_dist() over (order by kilos) from animals;
++------------+-----------------------+
+| name | cume_dist() OVER(...) |
++------------+-----------------------+
+| Elephant | 1 |
+| Giraffe | 0.8888888888888888 |
+| Polar bear | 0.7777777777777778 |
+| Horse | 0.6666666666666666 |
+| Ostrich | 0.5555555555555556 |
+| Condor | 0.4444444444444444 |
+| Housecat | 0.3333333333333333 |
+| Owl | 0.2222222222222222 |
+| Mouse | 0.1111111111111111 |
++------------+-----------------------+
+</code></pre>
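The fractions in this result follow directly from the definition: for each row, `CUME_DIST()` is the number of rows in the partition with a value less than or equal to the current row's value, divided by the partition's row count. A plain-Python sketch over the animal weights above (single partition, no ties):

```python
# Weights in kilos from the animals table above.
kilos = [4000, 1200, 0.020, 15, 500, 2.5, 145, 700, 5]

def cume_dist(values):
    """Map each value to (count of values <= it) / (total count)."""
    n = len(values)
    return {v: sum(1 for w in values if w <= v) / n for v in values}

dist = cume_dist(kilos)
# The mouse (0.020 kg) gets 1/9; the elephant (4000 kg) gets 9/9 = 1.0.
```

With ties, the `<=` comparison counts all tied rows, which is why tied input values share the same `CUME_DIST()` result.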
+
+ <p class="p">
+ Using a <code class="ph codeph">PARTITION BY</code> clause produces a separate sequence for each partition
+ group, in this case one for mammals and one for birds. Because there are 3 birds and 6 mammals,
+ the sequence illustrates how 1/3 of the <span class="q">"Bird"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest bird, 1/6 of the <span class="q">"Mammal"</span> rows have a <code class="ph codeph">kilos</code> value that is less than or equal to
+ the lightest mammal, and so on until both the heaviest bird and heaviest mammal have a <code class="ph codeph">CUME_DIST()</code>
+ value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) from animals;
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Owl | Bird | 0.3333333333333333 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 0.8333333333333334 |
+| Polar bear | Mammal | 0.6666666666666666 |
+| Horse | Mammal | 0.5 |
+| Housecat | Mammal | 0.3333333333333333 |
+| Mouse | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ We can reverse the ordering within each partition group by using an <code class="ph codeph">ORDER BY ... DESC</code>
+ clause within the <code class="ph codeph">OVER()</code> clause. Now the lightest (smallest value of <code class="ph codeph">kilos</code>)
+ animal of each kind has a <code class="ph codeph">CUME_DIST()</code> value of 1.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos desc) from animals;
++------------+--------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++------------+--------+-----------------------+
+| Owl | Bird | 1 |
+| Condor | Bird | 0.6666666666666666 |
+| Ostrich | Bird | 0.3333333333333333 |
+| Mouse | Mammal | 1 |
+| Housecat | Mammal | 0.8333333333333334 |
+| Horse | Mammal | 0.6666666666666666 |
+| Polar bear | Mammal | 0.5 |
+| Giraffe | Mammal | 0.3333333333333333 |
+| Elephant | Mammal | 0.1666666666666667 |
++------------+--------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example manufactures some rows with identical values in the <code class="ph codeph">kilos</code> column,
+ to demonstrate how the results look in case of tie values. For simplicity, it only shows the <code class="ph codeph">CUME_DIST()</code>
+ sequence for the <span class="q">"Bird"</span> rows. Now with 3 rows all with a value of 15, all of those rows have the same
+ <code class="ph codeph">CUME_DIST()</code> value. 4/5 of the rows have a value for <code class="ph codeph">kilos</code> that is less than or
+ equal to 15.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('California Condor', 'Bird', 15), ('Andean Condor', 'Bird', 15);
+
+select name, kind, cume_dist() over (order by kilos) from animals where kind = 'Bird';
++-------------------+------+-----------------------+
+| name | kind | cume_dist() OVER(...) |
++-------------------+------+-----------------------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Andean Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how to use an <code class="ph codeph">ORDER BY</code> clause in the outer block
+ to order the result set in case of ties. Here, all the <span class="q">"Bird"</span> rows are together, then in descending order
+ by the result of the <code class="ph codeph">CUME_DIST()</code> function, and all tied <code class="ph codeph">CUME_DIST()</code>
+ values are ordered by the animal name.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, cume_dist() over (partition by kind order by kilos) as ordering
+ from animals
+where
+ kind = 'Bird'
+order by kind, ordering desc, name;
++-------------------+------+----------+
+| name | kind | ordering |
++-------------------+------+----------+
+| Ostrich | Bird | 1 |
+| Andean Condor | Bird | 0.8 |
+| California Condor | Bird | 0.8 |
+| Condor | Bird | 0.8 |
+| Owl | Bird | 0.2 |
++-------------------+------+----------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="analytic_functions__dense_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title7">DENSE_RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function continues the sequence with the next higher integer.
+ Therefore, the sequence contains duplicates but no gaps when the input contains duplicates. Starts the
+ sequence over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>DENSE_RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ all the items with the 10 highest values, even if several items tied for 1st place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
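The difference among the three functions on tied values can be sketched in plain Python (an illustration over a small sorted list, not Impala's implementation):

```python
# Values already in ORDER BY order; the two 20s are tied.
values = [10, 20, 20, 30]

# ROW_NUMBER(): a strict 1..N sequence, ignoring ties.
row_number = list(range(1, len(values) + 1))          # 1, 2, 3, 4

# RANK(): tied values share a rank, and the sequence then skips ahead,
# leaving a gap after the ties.
rank = [values.index(v) + 1 for v in values]          # 1, 2, 2, 4

# DENSE_RANK(): tied values share a rank, and the sequence continues
# with the next integer, so there are duplicates but no gaps.
dense_rank = [sorted(set(values)).index(v) + 1 for v in values]  # 1, 2, 2, 3
```

The gap-free property of `DENSE_RANK()` is what makes it suitable for "top 10 distinct values" style reports.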
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">DENSE_RANK()</code> function identifies where each
+ value <span class="q">"places"</span> in the result set, producing the same result for duplicate values, but with a strict
+ sequence from 1 to the number of groups. For example, when results are ordered by the <code class="ph codeph">X</code>
+ column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code> values are tied for
+ second; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 2 | even |
+| 2 | 2 | prime |
+| 3 | 3 | prime |
+| 3 | 3 | odd |
+| 4 | 4 | even |
+| 4 | 4 | square |
+| 5 | 5 | odd |
+| 5 | 5 | prime |
+| 6 | 6 | even |
+| 6 | 6 | perfect |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | lucky |
+| 7 | 7 | odd |
+| 7 | 7 | prime |
+| 8 | 8 | even |
+| 9 | 9 | square |
+| 9 | 9 | odd |
+| 10 | 10 | round |
+| 10 | 10 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">DENSE_RANK()</code> function is affected by the
+ <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">DENSE_RANK()</code> returns the place of each value within the group, producing several
+ ascending sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+ place of each value within the group; because most values occur only 1 or 2 times,
+ <code class="ph codeph">DENSE_RANK()</code> designates most <code class="ph codeph">X</code> values as either first or second within their
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, dense_rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 2 | odd |
+| 7 | 3 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how <code class="ph codeph">DENSE_RANK()</code> produces a continuous sequence while still
+ allowing for ties. In this case, Croesus and Midas both have the second largest fortune, while Crassus has
+ the third largest. (In <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, you see a similar query with the
+ <code class="ph codeph">RANK()</code> function that shows that while Crassus has the third largest fortune, he is the
+ fourth richest person.)
+ </p>
+
+<pre class="pre codeblock"><code>select dense_rank() over (order by net_worth desc) as placement, name, net_worth from wealth order by placement, name;
++-----------+---------+---------------+
+| placement | name | net_worth |
++-----------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 3 | Crassus | 500000000.00 |
+| 4 | Scrooge | 80000000.00 |
++-----------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="analytic_functions__first_value">
+
+ <h2 class="title topictitle2" id="ariaid-title8">FIRST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the first row in the window. The return value is <code class="ph codeph">NULL</code> if
+ the input expression is <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>FIRST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example shows a table with a wide variety of country-appropriate greetings. For consistency,
+ we want to standardize on a single greeting for each country. The <code class="ph codeph">FIRST_VALUE()</code> function
+ helps to produce a mail merge report where every person from the same country is addressed with the same
+ greeting.
+ </p>
+
+<pre class="pre codeblock"><code>select name, country, greeting from mail_merge
++---------+---------+--------------+
+| name | country | greeting |
++---------+---------+--------------+
+| Pete | USA | Hello |
+| John | USA | Hi |
+| Boris | Germany | Guten tag |
+| Michael | Germany | Guten morgen |
+| Bjorn | Sweden | Hej |
+| Mats | Sweden | Tja |
++---------+---------+--------------+
+
+select country, name,
+ first_value(greeting)
+ over (partition by country order by name, greeting) as greeting
+ from mail_merge;
++---------+---------+-----------+
+| country | name | greeting |
++---------+---------+-----------+
+| Germany | Boris | Guten tag |
+| Germany | Michael | Guten tag |
+| Sweden | Bjorn | Hej |
+| Sweden | Mats | Hej |
+| USA | John | Hi |
+| USA | Pete | Hi |
++---------+---------+-----------+
+</code></pre>
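The standardizing logic of this query can be sketched in plain Python using the same mail_merge rows (an illustration of the semantics, not Impala's implementation):

```python
from itertools import groupby

# (name, country, greeting) rows from the mail_merge example above.
rows = [("Pete", "USA", "Hello"), ("John", "USA", "Hi"),
        ("Boris", "Germany", "Guten tag"), ("Michael", "Germany", "Guten morgen"),
        ("Bjorn", "Sweden", "Hej"), ("Mats", "Sweden", "Tja")]

# FIRST_VALUE(greeting) OVER (PARTITION BY country ORDER BY name, greeting):
# sort so partitions are contiguous and ordered, then give every row in a
# partition the greeting from that partition's first row.
ordered = sorted(rows, key=lambda r: (r[1], r[0], r[2]))  # country, name, greeting
standardized = []
for country, group in groupby(ordered, key=lambda r: r[1]):
    part = list(group)
    first_greeting = part[0][2]
    standardized.extend((country, name, first_greeting) for name, _, _ in part)
```

Reversing the sort on `name` changes which row is first in each partition, which is exactly why the second query above picks the other greeting for each country.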
+
+ <p class="p">
+ Changing the order in which the names are evaluated changes which greeting is applied to each group.
+ </p>
+
+<pre class="pre codeblock"><code>select country, name,
+ first_value(greeting)
+ over (partition by country order by name desc, greeting) as greeting
+ from mail_merge;
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Michael | Guten morgen |
+| Germany | Boris | Guten morgen |
+| Sweden | Mats | Tja |
+| Sweden | Bjorn | Tja |
+| USA | Pete | Hello |
+| USA | John | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#last_value">LAST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="analytic_functions__lag">
+
+ <h2 class="title topictitle2" id="ariaid-title9">LAG Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a preceding row. You specify an
+ integer offset, which designates a row position some number of rows previous to the current row. Any column
+ references in the expression argument refer to column values from that prior row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAG (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Sometimes used as an alternative to a self-join.
+ </p>
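As a minimal illustration of the look-behind behavior described above, the following Python sketch (a hypothetical helper, not Impala SQL) applies a `LAG()`-style offset to one already-ordered partition; rows before the start of the partition yield the default (`None`, standing in for SQL `NULL`):

```python
# Illustrative sketch of LAG() semantics over a single partition whose
# rows are already in ORDER BY order. The "default" argument plays the
# same role as the optional third argument of LAG() in SQL.
def lag(values, offset=1, default=None):
    return [values[i - offset] if i >= offset else default
            for i in range(len(values))]

closing = [12.86, 12.89, 12.94]   # rows ordered by closing_date
print(lag(closing))               # [None, 12.86, 12.89]
```

In the SQL function, omitting the third argument likewise yields `NULL` for rows that have no preceding row at the given offset.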
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. For each day, the
+ query prints the closing price alongside the previous day's closing price. The first row for each stock
+ symbol has no previous row, so that <code class="ph codeph">LAG()</code> value is <code class="ph codeph">NULL</code>.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ lag(closing_price,1) over (partition by stock_symbol order by closing_date) as "yesterday closing"
+ from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+-------------------+
+| stock_symbol | closing_date | closing_price | yesterday closing |
++--------------+---------------------+---------------+-------------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 12.86 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 12.89 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | 12.94 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 12.55 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 14.03 |
+| JDR | 2014-09-19 00:00:00 | 13.98 | 14.75 |
++--------------+---------------------+---------------+-------------------+
+</code></pre>
+
+ <p class="p">
+ The following example does an arithmetic operation between the current row and a value from the previous
+ row, to produce a delta value for each day. This example also demonstrates how <code class="ph codeph">ORDER BY</code>
+ works independently in the different parts of the query. The <code class="ph codeph">ORDER BY closing_date</code> in the
+ <code class="ph codeph">OVER</code> clause makes the query analyze the rows in chronological order. Then the outer query
+ block uses <code class="ph codeph">ORDER BY closing_date DESC</code> to present the results with the most recent date
+ first.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ cast(
+ closing_price - lag(closing_price,1) over
+ (partition by stock_symbol order by closing_date)
+ as decimal(8,2)
+ )
+ as "change from yesterday"
+ from stock_ticker
+ order by closing_date desc;
++--------------+---------------------+---------------+-----------------------+
+| stock_symbol | closing_date | closing_price | change from yesterday |
++--------------+---------------------+---------------+-----------------------+
+| JDR | 2014-09-19 00:00:00 | 13.98 | -0.76 |
+| JDR | 2014-09-18 00:00:00 | 14.75 | 0.72 |
+| JDR | 2014-09-17 00:00:00 | 14.03 | 1.47 |
+| JDR | 2014-09-16 00:00:00 | 12.55 | -0.38 |
+| JDR | 2014-09-15 00:00:00 | 12.94 | 0.04 |
+| JDR | 2014-09-14 00:00:00 | 12.89 | 0.03 |
+| JDR | 2014-09-13 00:00:00 | 12.86 | NULL |
++--------------+---------------------+---------------+-----------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lead">LEAD Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="analytic_functions__last_value">
+
+ <h2 class="title topictitle2" id="ariaid-title10">LAST_VALUE Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the expression value from the last row in the window. This same value is repeated for all result
+ rows for the group. The return value is <code class="ph codeph">NULL</code> if the input expression is
+ <code class="ph codeph">NULL</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LAST_VALUE(<var class="keyword varname">expr</var>) OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var> [<var class="keyword varname">window_clause</var>])</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is optional.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ If any duplicate values occur in the tuples evaluated by the <code class="ph codeph">ORDER BY</code> clause, the result
+ of this function is not deterministic. Consider adding additional <code class="ph codeph">ORDER BY</code> columns to
+ ensure consistent ordering.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">MAIL_MERGE</code> table as in the example for
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>. Because the default window when <code class="ph codeph">ORDER
+ BY</code> is used is <code class="ph codeph">BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW</code>, the query requires the
+ <code class="ph codeph">UNBOUNDED FOLLOWING</code> to look ahead to subsequent rows and find the last value for each
+ country.
+ </p>
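The effect of the frame clause can be sketched in Python (an illustrative model, not Impala SQL): with the default frame ending at the current row, the "last" value of the frame is simply the current row's own value, which is why the query above must widen the frame with `UNBOUNDED FOLLOWING`.

```python
# Illustrative sketch: LAST_VALUE() over a partition already in ORDER BY
# order, under the two frame definitions discussed above.
def last_value(values, unbounded_following=False):
    if unbounded_following:
        # Frame covers the whole partition: every row sees the final value.
        return [values[-1]] * len(values)
    # Default frame (UNBOUNDED PRECEDING .. CURRENT ROW): each row's
    # frame ends at itself, so each row sees its own value.
    return list(values)

names = ['Boris', 'Michael']                 # one country's rows, ordered
print(last_value(names))                     # ['Boris', 'Michael']
print(last_value(names, unbounded_following=True))   # ['Michael', 'Michael']
```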
+
+<pre class="pre codeblock"><code>select country, name,
+ last_value(greeting) over (
+ partition by country order by name, greeting
+ rows between unbounded preceding and unbounded following
+ ) as greeting
+ from mail_merge
++---------+---------+--------------+
+| country | name | greeting |
++---------+---------+--------------+
+| Germany | Boris | Guten morgen |
+| Germany | Michael | Guten morgen |
+| Sweden | Bjorn | Tja |
+| Sweden | Mats | Tja |
+| USA | John | Hello |
+| USA | Pete | Hello |
++---------+---------+--------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#first_value">FIRST_VALUE Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title11" id="analytic_functions__lead">
+
+ <h2 class="title topictitle2" id="ariaid-title11">LEAD Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This function returns the value of an expression using column values from a following row. You specify an
+      integer offset, which designates a row position some number of rows after the current row. Any column
+ references in the expression argument refer to column values from that later row. Typically, the table
+ contains a time sequence or numeric sequence column that clearly distinguishes the ordering of the rows.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>LEAD (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var>] [, <var class="keyword varname">default</var>])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      Sometimes used as an alternative to a self-join.
+ </p>
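The look-ahead behavior can be sketched in Python (a hypothetical helper, not Impala SQL) as the mirror image of `LAG()`: rows past the end of the partition yield the default (`None`, standing in for SQL `NULL`):

```python
# Illustrative sketch of LEAD() semantics over a single partition whose
# rows are already in ORDER BY order.
def lead(values, offset=1, default=None):
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]

closing = [12.86, 12.89, 12.94]   # rows ordered by closing_date
print(lead(closing))              # [12.89, 12.94, None]
```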
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same stock data created in <a class="xref" href="#window_clause">Window Clause</a>. The query analyzes
+ the closing price for a stock symbol, and for each day evaluates if the closing price for the following day
+ is higher or lower.
+ </p>
+
+<pre class="pre codeblock"><code>select stock_symbol, closing_date, closing_price,
+ case
+ (lead(closing_price,1)
+ over (partition by stock_symbol order by closing_date)
+ - closing_price) > 0
+ when true then "higher"
+ when false then "flat or lower"
+ end as "trending"
+from stock_ticker
+ order by closing_date;
++--------------+---------------------+---------------+---------------+
+| stock_symbol | closing_date | closing_price | trending |
++--------------+---------------------+---------------+---------------+
+| JDR | 2014-09-13 00:00:00 | 12.86 | higher |
+| JDR | 2014-09-14 00:00:00 | 12.89 | higher |
+| JDR | 2014-09-15 00:00:00 | 12.94 | flat or lower |
+| JDR | 2014-09-16 00:00:00 | 12.55 | higher |
+| JDR | 2014-09-17 00:00:00 | 14.03 | higher |
+| JDR | 2014-09-18 00:00:00 | 14.75 | flat or lower |
+| JDR | 2014-09-19 00:00:00 | 13.98 | NULL |
++--------------+---------------------+---------------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ This function is the converse of <a class="xref" href="impala_analytic_functions.html#lag">LAG Function</a>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title12" id="analytic_functions__max_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title12">MAX Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_max.html#max">MAX Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title13" id="analytic_functions__min_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title13">MIN Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_min.html#min">MIN Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title14" id="analytic_functions__ntile">
+
+ <h2 class="title topictitle2" id="ariaid-title14">NTILE Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns the <span class="q">"bucket number"</span> associated with each row, between 1 and the value of an expression. For
+ example, creating 100 buckets puts the lowest 1% of values in the first bucket, while creating 10 buckets
+ puts the lowest 10% of values in the first bucket. Each partition can have a different number of buckets.
+
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>NTILE (<var class="keyword varname">expr</var> [, <var class="keyword varname">offset</var> ...])
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ The <span class="q">"ntile"</span> name is derived from the practice of dividing result sets into fourths (quartile), tenths
+ (decile), and so on. The <code class="ph codeph">NTILE()</code> function divides the result set based on an arbitrary
+ percentile value.
+ </p>
+
+ <p class="p">
+ The number of buckets must be a positive integer.
+ </p>
+
+ <p class="p">
+      The number of items in each bucket is identical or nearly so, varying by at most 1. If the number of items
+      does not divide evenly among the buckets, the remaining N items are distributed one each to the first N
+      buckets.
+ </p>
+
+ <p class="p">
+ If the number of buckets N is greater than the number of input rows in the partition, then the first N
+ buckets each contain one item, and the remaining buckets are empty.
+ </p>
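The bucket-assignment rule in the two preceding paragraphs can be sketched in Python (an illustrative model, not Impala SQL): rows, taken in order, are split into buckets whose sizes differ by at most 1, with any remainder going one row each to the first buckets.

```python
# Illustrative sketch of how NTILE(n) assigns bucket numbers to an
# ordered partition of n_rows rows.
def ntile_buckets(n_buckets, n_rows):
    base, extra = divmod(n_rows, n_buckets)
    assignments = []
    for b in range(1, n_buckets + 1):
        # The first "extra" buckets each receive one additional row.
        size = base + (1 if b <= extra else 0)
        assignments.extend([b] * size)
    return assignments

print(ntile_buckets(4, 9))   # bucket 1 gets the extra row: [1, 1, 1, 2, 2, 3, 3, 4, 4]
print(ntile_buckets(4, 2))   # more buckets than rows; trailing buckets stay empty: [1, 2]
```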
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+      The following example divides groups of animals into 4 buckets based on their weight. The
+ <code class="ph codeph">ORDER BY ... DESC</code> clause in the <code class="ph codeph">OVER()</code> clause means that the heaviest 25%
+ are in the first group, and the lightest 25% are in the fourth group. (The <code class="ph codeph">ORDER BY</code> in the
+ outermost part of the query shows how you can order the final result set independently from the order in
+ which the rows are evaluated by the <code class="ph codeph">OVER()</code> clause.) Because there are 9 rows in the group,
+ divided into 4 buckets, the first bucket receives the extra item.
+ </p>
+
+<pre class="pre codeblock"><code>create table animals (name string, kind string, kilos decimal(9,3));
+
+insert into animals values
+ ('Elephant', 'Mammal', 4000), ('Giraffe', 'Mammal', 1200), ('Mouse', 'Mammal', 0.020),
+ ('Condor', 'Bird', 15), ('Horse', 'Mammal', 500), ('Owl', 'Bird', 2.5),
+ ('Ostrich', 'Bird', 145), ('Polar bear', 'Mammal', 700), ('Housecat', 'Mammal', 5);
+
+select name, ntile(4) over (order by kilos desc) as quarter
+ from animals
+order by quarter desc;
++------------+---------+
+| name | quarter |
++------------+---------+
+| Owl | 4 |
+| Mouse | 4 |
+| Condor | 3 |
+| Housecat | 3 |
+| Horse | 2 |
+| Ostrich | 2 |
+| Elephant | 1 |
+| Giraffe | 1 |
+| Polar bear | 1 |
++------------+---------+
+</code></pre>
+
+ <p class="p">
+      The following examples show how the <code class="ph codeph">PARTITION BY</code> clause works for the
+ <code class="ph codeph">NTILE()</code> function. Here, we divide each kind of animal (mammal or bird) into 2 buckets,
+ the heavier half and the lighter half.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kind;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Owl | Bird | 2 |
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Housecat | Mammal | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ <p class="p">
+ Again, the result set can be ordered independently
+ from the analytic evaluation. This next example lists all the animals heaviest to lightest,
+ showing that elephant and giraffe are in the <span class="q">"top half"</span> of mammals by weight, while
+ housecat and mouse are in the <span class="q">"bottom half"</span>.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, ntile(2) over (partition by kind order by kilos desc) as half
+ from animals
+order by kilos desc;
++------------+--------+------+
+| name | kind | half |
++------------+--------+------+
+| Elephant | Mammal | 1 |
+| Giraffe | Mammal | 1 |
+| Polar bear | Mammal | 1 |
+| Horse | Mammal | 2 |
+| Ostrich | Bird | 1 |
+| Condor | Bird | 1 |
+| Housecat | Mammal | 2 |
+| Owl | Bird | 2 |
+| Mouse | Mammal | 2 |
++------------+--------+------+
+</code></pre>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title15" id="analytic_functions__percent_rank">
+
+ <h2 class="title topictitle2" id="ariaid-title15">PERCENT_RANK Function (<span class="keyword">Impala 2.3</span> or higher only)</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>PERCENT_RANK (<var class="keyword varname">expr</var>)
+ OVER ([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)
+</code></pre>
+
+ <p class="p">
+ Calculates the rank, expressed as a percentage, of each row within a group of rows.
+ If <code class="ph codeph">rank</code> is the value for that same row from the <code class="ph codeph">RANK()</code> function (from 1 to the total number of rows in the partition group),
+      then the <code class="ph codeph">PERCENT_RANK()</code> value is calculated as <code class="ph codeph">(<var class="keyword varname">rank</var> - 1) / (<var class="keyword varname">rows_in_group</var> - 1)</code>.
+ If there is only a single item in the partition group, its <code class="ph codeph">PERCENT_RANK()</code> value is 0.
+ </p>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+      This function is similar to the <code class="ph codeph">RANK()</code> and <code class="ph codeph">CUME_DIST()</code> functions: it returns an ascending sequence representing the position of each
+ row within the rows of the same partition group. The actual numeric sequence is calculated differently,
+ and the handling of duplicate (tied) values is different.
+ </p>
+
+ <p class="p">
+ The return values range from 0 to 1 inclusive.
+ The first row in each partition group always has the value 0.
+ A <code class="ph codeph">NULL</code> value is considered the lowest possible value.
+ In the case of duplicate input values, all the corresponding rows in the result set
+ have an identical value: the lowest <code class="ph codeph">PERCENT_RANK()</code> value of those
+ tied rows. (In contrast to <code class="ph codeph">CUME_DIST()</code>, where all tied rows have
+ the highest <code class="ph codeph">CUME_DIST()</code> value.)
+ </p>
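The formula and tie-handling rules above can be sketched in Python (an illustrative model, not Impala SQL): tied rows share the lowest rank, and a single-row group yields 0. (`NULL` ordering is not modeled here.)

```python
# Illustrative sketch of PERCENT_RANK() = (rank - 1) / (rows_in_group - 1)
# over one partition group, where rank counts 1 plus the number of
# strictly smaller values (so ties share the lowest rank).
def percent_rank(values):
    n = len(values)
    ranks = [1 + sum(1 for w in values if w < v) for v in values]
    return [0.0 if n == 1 else (r - 1) / (n - 1) for r in ranks]

# Hypothetical weights producing the same pattern as the "Bird" rows:
# one lightest row, three tied rows, one heaviest row.
print(percent_rank([2.5, 15, 15, 15, 145]))   # [0.0, 0.25, 0.25, 0.25, 1.0]
print(percent_rank([70]))                     # single-row group: [0.0]
```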
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example uses the same <code class="ph codeph">ANIMALS</code> table as the examples for <code class="ph codeph">CUME_DIST()</code>
+ and <code class="ph codeph">NTILE()</code>, with a few additional rows to illustrate the results where some values are
+ <code class="ph codeph">NULL</code> or there is only a single row in a partition group.
+ </p>
+
+<pre class="pre codeblock"><code>insert into animals values ('Komodo dragon', 'Reptile', 70);
+insert into animals values ('Unicorn', 'Mythical', NULL);
+insert into animals values ('Fire-breathing dragon', 'Mythical', NULL);
+</code></pre>
+
+ <p class="p">
+ As with <code class="ph codeph">CUME_DIST()</code>, there is an ascending sequence for each kind of animal.
+ For example, the <span class="q">"Birds"</span> and <span class="q">"Mammals"</span> rows each have a <code class="ph codeph">PERCENT_RANK()</code> sequence
+ that ranges from 0 to 1.
+ The <span class="q">"Reptile"</span> row has a <code class="ph codeph">PERCENT_RANK()</code> of 0 because that partition group contains only a single item.
+ Both <span class="q">"Mythical"</span> animals have a <code class="ph codeph">PERCENT_RANK()</code> of 0 because
+ a <code class="ph codeph">NULL</code> is considered the lowest value within its partition group.
+ </p>
+
+<pre class="pre codeblock"><code>select name, kind, percent_rank() over (partition by kind order by kilos) from animals;
++-----------------------+----------+--------------------------+
+| name | kind | percent_rank() OVER(...) |
++-----------------------+----------+--------------------------+
+| Mouse | Mammal | 0 |
+| Housecat | Mammal | 0.2 |
+| Horse | Mammal | 0.4 |
+| Polar bear | Mammal | 0.6 |
+| Giraffe | Mammal | 0.8 |
+| Elephant | Mammal | 1 |
+| Komodo dragon | Reptile | 0 |
+| Owl | Bird | 0 |
+| California Condor | Bird | 0.25 |
+| Andean Condor | Bird | 0.25 |
+| Condor | Bird | 0.25 |
+| Ostrich | Bird | 1 |
+| Fire-breathing dragon | Mythical | 0 |
+| Unicorn | Mythical | 0 |
++-----------------------+----------+--------------------------+
+</code></pre>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title16" id="analytic_functions__rank">
+
+ <h2 class="title topictitle2" id="ariaid-title16">RANK Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. The output sequence produces duplicate integers
+ for duplicate values of the <code class="ph codeph">ORDER BY</code> expressions. After generating duplicate output values
+ for the <span class="q">"tied"</span> input values, the function increments the sequence by the number of tied values.
+ Therefore, the sequence contains both duplicates and gaps when the input contains duplicates. Starts the
+      sequence over for each group produced by the <code class="ph codeph">PARTITION BY</code> clause.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>RANK() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">PARTITION BY</code> clause is optional. The <code class="ph codeph">ORDER BY</code> clause is required. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+
+
+ <p class="p">
+ Often used for top-N and bottom-N queries. For example, it could produce a <span class="q">"top 10"</span> report including
+ several items that were tied for 10th place.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">ROW_NUMBER</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they
+ treat duplicate combinations of values.
+ </p>
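The difference in tie handling can be sketched in Python (an illustrative model, not Impala SQL), contrasting how the three functions number tied values in an ordered partition:

```python
# Illustrative sketches of the three numbering schemes over values
# already in ORDER BY order.
def rank(values):
    # Ties share a rank; the next rank skips by the number of ties.
    return [1 + sum(1 for w in values if w < v) for v in values]

def dense_rank(values):
    # Ties share a rank; no gaps afterward.
    return [1 + len({w for w in values if w < v}) for v in values]

def row_number(values):
    # Every row gets a distinct number, even for ties.
    return list(range(1, len(values) + 1))

x = [1, 1, 2, 2, 3]
print(rank(x))          # [1, 1, 3, 3, 5] -- duplicates, then a gap
print(dense_rank(x))    # [1, 1, 2, 2, 3] -- duplicates, no gaps
print(row_number(x))    # [1, 2, 3, 4, 5] -- no duplicates, no gaps
```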
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how the <code class="ph codeph">RANK()</code> function identifies where each value
+ <span class="q">"places"</span> in the result set, producing the same result for duplicate values, and skipping values in the
+ sequence to account for the number of duplicates. For example, when results are ordered by the
+ <code class="ph codeph">X</code> column, both <code class="ph codeph">1</code> values are tied for first; both <code class="ph codeph">2</code>
+ values are tied for third; and so on.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | square |
+| 1 | 1 | odd |
+| 2 | 3 | even |
+| 2 | 3 | prime |
+| 3 | 5 | prime |
+| 3 | 5 | odd |
+| 4 | 7 | even |
+| 4 | 7 | square |
+| 5 | 9 | odd |
+| 5 | 9 | prime |
+| 6 | 11 | even |
+| 6 | 11 | perfect |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | lucky |
+| 7 | 13 | odd |
+| 7 | 13 | prime |
+| 8 | 18 | even |
+| 9 | 19 | square |
+| 9 | 19 | odd |
+| 10 | 21 | round |
+| 10 | 21 | even |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+      The following examples show how the <code class="ph codeph">RANK()</code> function is affected by the
+      <code class="ph codeph">PARTITION BY</code> clause within the <code class="ph codeph">OVER()</code> clause.
+ </p>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">PROPERTY</code> column groups all the even, odd, and so on values together,
+ and <code class="ph codeph">RANK()</code> returns the place of each value within the group, producing several ascending
+ sequences.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by property order by x) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 2 | 1 | even |
+| 4 | 2 | even |
+| 6 | 3 | even |
+| 8 | 4 | even |
+| 10 | 5 | even |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 1 | 1 | odd |
+| 3 | 2 | odd |
+| 5 | 3 | odd |
+| 7 | 4 | odd |
+| 9 | 5 | odd |
+| 6 | 1 | perfect |
+| 2 | 1 | prime |
+| 3 | 2 | prime |
+| 5 | 3 | prime |
+| 7 | 4 | prime |
+| 10 | 1 | round |
+| 1 | 1 | square |
+| 4 | 2 | square |
+| 9 | 3 | square |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ Partitioning by the <code class="ph codeph">X</code> column groups all the duplicate numbers together and returns the
+      place of each value within the group; because each value occurs only once or twice,
+ <code class="ph codeph">RANK()</code> designates each <code class="ph codeph">X</code> value as either first or second within its
+ group.
+ </p>
+
+<pre class="pre codeblock"><code>select x, rank() over(partition by x order by property) as rank, property from int_t;
++----+------+----------+
+| x | rank | property |
++----+------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 1 | even |
+| 2 | 2 | prime |
+| 3 | 1 | odd |
+| 3 | 2 | prime |
+| 4 | 1 | even |
+| 4 | 2 | square |
+| 5 | 1 | odd |
+| 5 | 2 | prime |
+| 6 | 1 | even |
+| 6 | 2 | perfect |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 1 | lucky |
+| 7 | 4 | odd |
+| 7 | 5 | prime |
+| 8 | 1 | even |
+| 9 | 1 | odd |
+| 9 | 2 | square |
+| 10 | 1 | even |
+| 10 | 2 | round |
++----+------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a magazine might prepare a list of history's wealthiest people. Croesus and
+ Midas are tied for second, then Crassus is fourth.
+ </p>
+
+<pre class="pre codeblock"><code>select rank() over (order by net_worth desc) as rank, name, net_worth from wealth order by rank, name;
++------+---------+---------------+
+| rank | name | net_worth |
++------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 2 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>,
+ <a class="xref" href="impala_analytic_functions.html#row_number">ROW_NUMBER Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title17" id="analytic_functions__row_number">
+
+ <h2 class="title topictitle2" id="ariaid-title17">ROW_NUMBER Function</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Returns an ascending sequence of integers, starting with 1. Starts the sequence over for each group
+      produced by the <code class="ph codeph">PARTITION BY</code> clause. The output sequence includes different values for
+ duplicate input values. Therefore, the sequence never contains any duplicates or gaps, regardless of
+ duplicate input values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Syntax:</strong>
+ </p>
+
+<pre class="pre codeblock"><code>ROW_NUMBER() OVER([<var class="keyword varname">partition_by_clause</var>] <var class="keyword varname">order_by_clause</var>)</code></pre>
+
+ <p class="p">
+ The <code class="ph codeph">ORDER BY</code> clause is required. The <code class="ph codeph">PARTITION BY</code> clause is optional. The
+ window clause is not allowed.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Usage notes:</strong>
+ </p>
+
+ <p class="p">
+ Often used for top-N and bottom-N queries where the input values are known to be unique, or precisely N
+ rows are needed regardless of duplicate values.
+ </p>
+
+ <p class="p">
+ Because its result value is different for each row in the result set (when used without a <code class="ph codeph">PARTITION
+ BY</code> clause), <code class="ph codeph">ROW_NUMBER()</code> can be used to synthesize unique numeric ID values, for
+ example for result sets involving unique values or tuples.
+ </p>
+
+ <p class="p">
+ Similar to <code class="ph codeph">RANK</code> and <code class="ph codeph">DENSE_RANK</code>. These functions differ in how they treat
+ duplicate combinations of values.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Added in:</strong> <span class="keyword">Impala 2.0.0</span>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following example demonstrates how <code class="ph codeph">ROW_NUMBER()</code> produces a continuous numeric
+ sequence, even though some values of <code class="ph codeph">X</code> are repeated.
+ </p>
+
+<pre class="pre codeblock"><code>select x, row_number() over(order by x, property) as row_number, property from int_t;
++----+------------+----------+
+| x | row_number | property |
++----+------------+----------+
+| 1 | 1 | odd |
+| 1 | 2 | square |
+| 2 | 3 | even |
+| 2 | 4 | prime |
+| 3 | 5 | odd |
+| 3 | 6 | prime |
+| 4 | 7 | even |
+| 4 | 8 | square |
+| 5 | 9 | odd |
+| 5 | 10 | prime |
+| 6 | 11 | even |
+| 6 | 12 | perfect |
+| 7 | 13 | lucky |
+| 7 | 14 | lucky |
+| 7 | 15 | lucky |
+| 7 | 16 | odd |
+| 7 | 17 | prime |
+| 8 | 18 | even |
+| 9 | 19 | odd |
+| 9 | 20 | square |
+| 10 | 21 | even |
+| 10 | 22 | round |
++----+------------+----------+
+</code></pre>
+
+ <p class="p">
+ The following example shows how a financial institution might assign customer IDs to some of history's
+ wealthiest figures. Although two of the people have identical net worth figures, unique IDs are required
+ for this purpose. <code class="ph codeph">ROW_NUMBER()</code> produces a sequence of five different values for the five
+ input rows.
+ </p>
+
+<pre class="pre codeblock"><code>select row_number() over (order by net_worth desc) as account_id, name, net_worth
+ from wealth order by account_id, name;
++------------+---------+---------------+
+| account_id | name | net_worth |
++------------+---------+---------------+
+| 1 | Solomon | 2000000000.00 |
+| 2 | Croesus | 1000000000.00 |
+| 3 | Midas | 1000000000.00 |
+| 4 | Crassus | 500000000.00 |
+| 5 | Scrooge | 80000000.00 |
++------------+---------+---------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_analytic_functions.html#rank">RANK Function</a>, <a class="xref" href="impala_analytic_functions.html#dense_rank">DENSE_RANK Function</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title18" id="analytic_functions__sum_analytic">
+
+ <h2 class="title topictitle2" id="ariaid-title18">SUM Function - Analytic Context</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ You can include an <code class="ph codeph">OVER</code> clause with a call to this function to use it as an analytic
+ function. See <a class="xref" href="impala_sum.html#sum">SUM Function</a> for details and examples.
+ </p>
+
+ </div>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_appx_count_distinct.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_appx_count_distinct.html b/docs/build3x/html/topics/impala_appx_count_distinct.html
new file mode 100644
index 0000000..c42c2ca
--- /dev/null
+++ b/docs/build3x/html/topics/impala_appx_count_distinct.html
@@ -0,0 +1,82 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_query_options.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="appx_count_distinct"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>APPX_COUNT_DISTINCT Query Option (Impala 2.0 or higher only)</title></head><body id="appx_count_distinct"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">APPX_COUNT_DISTINCT Query Option (<span class="keyword">Impala 2.0</span> or higher only)</h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+
+ Allows multiple <code class="ph codeph">COUNT(DISTINCT)</code> operations within a single query, by internally rewriting
+ each <code class="ph codeph">COUNT(DISTINCT)</code> to use the <code class="ph codeph">NDV()</code> function. The resulting count is
+ approximate rather than precise.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Type:</strong> Boolean; recognized values are 1 and 0, or <code class="ph codeph">true</code> and <code class="ph codeph">false</code>;
+ any other value is interpreted as <code class="ph codeph">false</code>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Default:</strong> <code class="ph codeph">false</code> (shown as 0 in output of <code class="ph codeph">SET</code> statement)
+ </p>
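The value-parsing rule above (1/0 or true/false, with anything else treated as false) can be sketched in Python. This is an illustrative model of the documented rule, not Impala's actual implementation:

```python
def parse_boolean_option(value):
    """Model of the documented rule for this Boolean query option:
    "1" and "true" mean true, "0" and "false" mean false, and any
    other value is interpreted as false."""
    return str(value).strip().lower() in ("1", "true")

print(parse_boolean_option("true"))   # True
print(parse_boolean_option("0"))      # False
print(parse_boolean_option("maybe"))  # False (unrecognized values fall back to false)
```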
+
+ <p class="p">
+ <strong class="ph b">Examples:</strong>
+ </p>
+
+ <p class="p">
+ The following examples show how the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option lets you work around the
+ restriction that a query can evaluate <code class="ph codeph">COUNT(DISTINCT <var class="keyword varname">col_name</var>)</code> for only a
+ single column. By default, you can count the distinct values of one column or another, but not both in the
+ same query:
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > select count(distinct x) from int_t;
++-------------------+
+| count(distinct x) |
++-------------------+
+| 10                |
++-------------------+
+[localhost:21000] > select count(distinct property) from int_t;
++--------------------------+
+| count(distinct property) |
++--------------------------+
+| 7                        |
++--------------------------+
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
+ERROR: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters
+as count(DISTINCT x); deviating function: count(DISTINCT property)
+</code></pre>
+
+ <p class="p">
+ When you enable the <code class="ph codeph">APPX_COUNT_DISTINCT</code> query option, the query with multiple
+ <code class="ph codeph">COUNT(DISTINCT)</code> expressions succeeds. The reason this behavior requires a query option is that each
+ <code class="ph codeph">COUNT(DISTINCT)</code> is rewritten internally to use the <code class="ph codeph">NDV()</code> function instead,
+ which provides an approximate result rather than a precise count.
+ </p>
+
+<pre class="pre codeblock"><code>[localhost:21000] > set APPX_COUNT_DISTINCT=true;
+[localhost:21000] > select count(distinct x), count(distinct property) from int_t;
++-------------------+--------------------------+
+| count(distinct x) | count(distinct property) |
++-------------------+--------------------------+
+| 10                | 7                        |
++-------------------+--------------------------+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Related information:</strong>
+ </p>
+
+ <p class="p">
+ <a class="xref" href="impala_count.html#count">COUNT Function</a>,
+ <a class="xref" href="impala_distinct.html#distinct">DISTINCT Operator</a>,
+ <a class="xref" href="impala_ndv.html#ndv">NDV Function</a>
+ </p>
+
+ </div>
+<nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_query_options.html">Query Options for the SET Statement</a></div></div></nav></article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_kerberos.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_kerberos.html b/docs/build3x/html/topics/impala_kerberos.html
new file mode 100644
index 0000000..582c7da
--- /dev/null
+++ b/docs/build3x/html/topics/impala_kerberos.html
@@ -0,0 +1,342 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_authentication.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="kerberos"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Enabling Kerberos Authentication for Impala</title></head><body id="kerberos"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Enabling Kerberos Authentication for Impala</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala supports an enterprise-grade authentication system called Kerberos. Kerberos provides strong security benefits including
+ capabilities that render intercepted authentication packets unusable by an attacker. It virtually eliminates the threat of
+ impersonation by never sending a user's credentials in cleartext over the network. For more information on Kerberos, visit
+ the <a class="xref" href="https://web.mit.edu/kerberos/" target="_blank">MIT Kerberos website</a>.
+ </p>
+
+ <p class="p">
+ The rest of this topic assumes you have a working <a class="xref" href="https://web.mit.edu/kerberos/krb5-latest/doc/admin/install_kdc.html" target="_blank">Kerberos Key Distribution Center (KDC)</a>
+ set up. To enable Kerberos, you first create a Kerberos principal for each host running
+ <span class="keyword cmdname">impalad</span> or <span class="keyword cmdname">statestored</span>.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Regardless of the authentication mechanism used, Impala always creates HDFS directories and data files
+ owned by the same user (typically <code class="ph codeph">impala</code>). To implement user-level access to different
+ databases, tables, columns, partitions, and so on, use the Sentry authorization feature, as explained in
+ <a class="xref" href="../shared/../topics/impala_authorization.html#authorization">Enabling Sentry Authorization for Impala</a>.
+ </div>
+
+ <p class="p">
+ An alternative form of authentication you can use is LDAP, described in <a class="xref" href="impala_ldap.html#ldap">Enabling LDAP Authentication for Impala</a>.
+ </p>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_authentication.html">Impala Authentication</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="kerberos__kerberos_prereqs">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Requirements for Using Impala with Kerberos</h2>
+
+
+ <div class="body conbody">
+
+ <div class="p">
+ On version 5 of Red Hat Enterprise Linux and comparable distributions, some additional setup is needed for
+ the <span class="keyword cmdname">impala-shell</span> interpreter to connect to a Kerberos-enabled Impala cluster:
+<pre class="pre codeblock"><code>sudo yum install python-devel openssl-devel python-pip
+sudo pip-python install ssl</code></pre>
+ </div>
+
+ <div class="note important note_important"><span class="note__title importanttitle">Important:</span>
+ <p class="p">
+ If you plan to use Impala in your cluster, you must configure your KDC to allow tickets to be renewed,
+ and you must configure <span class="ph filepath">krb5.conf</span> to request renewable tickets. Typically, you can do
+ this by adding the <code class="ph codeph">max_renewable_life</code> setting to your realm in
+ <span class="ph filepath">kdc.conf</span>, and by adding the <span class="ph filepath">renew_lifetime</span> parameter to the
+ <span class="ph filepath">libdefaults</span> section of <span class="ph filepath">krb5.conf</span>. For more information about
+ renewable tickets, see the
+ <a class="xref" href="http://web.mit.edu/Kerberos/krb5-1.8/" target="_blank"> Kerberos
+ documentation</a>.
+ </p>
+ <p class="p">
+ Currently, you cannot use the resource management feature on a cluster that has Kerberos
+ authentication enabled.
+ </p>
+ </div>
+
+ <p class="p">
+ Start all <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons with the
+ <code class="ph codeph">--principal</code> and <code class="ph codeph">--keytab-file</code> flags set to the principal and full path
+ name of the <code class="ph codeph">keytab</code> file containing the credentials for the principal.
+ </p>
+
+ <p class="p">
+ To enable Kerberos in the Impala shell, start the <span class="keyword cmdname">impala-shell</span> command using the
+ <code class="ph codeph">-k</code> flag.
+ </p>
+
+ <p class="p">
+ To enable Impala to work with Kerberos security on your Hadoop cluster, make sure you perform the
+ installation and configuration steps in
+ <a class="xref" href="https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SecureMode.html#Authentication" target="_blank">Authentication in Hadoop</a>.
+ Note that when Kerberos security is enabled in Impala, a web browser that
+ supports Kerberos HTTP SPNEGO is required to access the Impala web console (for example, Firefox, Internet
+ Explorer, or Chrome).
+ </p>
+
+ <p class="p">
+ If the NameNode, Secondary NameNode, DataNode, JobTracker, TaskTrackers, ResourceManager, NodeManagers,
+ HttpFS, Oozie, Impala, or Impala statestore services are configured to use Kerberos HTTP SPNEGO
+ authentication, and two or more of these services are running on the same host, then all of the running
+ services must use the same HTTP principal and keytab file used for their HTTP endpoints.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title3" id="kerberos__kerberos_config">
+
+ <h2 class="title topictitle2" id="ariaid-title3">Configuring Impala to Support Kerberos Security</h2>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ Enabling Kerberos authentication for Impala involves steps that can be summarized as follows:
+ </p>
+
+ <ul class="ul">
+ <li class="li">
+ Creating service principals for Impala and the HTTP service. Principal names take the form:
+ <code class="ph codeph"><var class="keyword varname">serviceName</var>/<var class="keyword varname">fully.qualified.domain.name</var>@<var class="keyword varname">KERBEROS.REALM</var></code>.
+ <p class="p">
+ In Impala 2.0 and later, <code class="ph codeph">user()</code> returns the full Kerberos principal string, such as
+ <code class="ph codeph">user@example.com</code>, in a Kerberized environment.
+ </p>
+ </li>
+
+ <li class="li">
+ Creating, merging, and distributing keytab files for these principals.
+ </li>
+
+ <li class="li">
+ Editing <code class="ph codeph">/etc/default/impala</code>
+ to accommodate Kerberos authentication.
+ </li>
+ </ul>
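The principal naming convention above can be illustrated with a short Python sketch. The host and realm values below are placeholders, not values from a real cluster:

```python
def make_principal(service, fqdn, realm):
    """Build a Kerberos service principal of the form
    serviceName/fully.qualified.domain.name@KERBEROS.REALM,
    as described above. Realms are conventionally uppercase."""
    return "{0}/{1}@{2}".format(service, fqdn, realm.upper())

# Hypothetical example values:
print(make_principal("impala", "impala_host.example.com", "test.example.com"))
# impala/impala_host.example.com@TEST.EXAMPLE.COM
print(make_principal("HTTP", "impala_host.example.com", "test.example.com"))
# HTTP/impala_host.example.com@TEST.EXAMPLE.COM
```

Note that the `HTTP` service component must stay uppercase, as the later steps in this topic point out.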
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title4" id="kerberos_config__kerberos_setup">
+
+ <h3 class="title topictitle3" id="ariaid-title4">Enabling Kerberos for Impala</h3>
+
+ <div class="body conbody">
+
+
+
+ <ol class="ol">
+ <li class="li">
+ Create an Impala service principal, specifying the name of the OS user that the Impala daemons run
+ under, the fully qualified domain name of each node running <span class="keyword cmdname">impalad</span>, and the realm
+ name. For example:
+<pre class="pre codeblock"><code>$ kadmin
+kadmin: addprinc -requires_preauth -randkey impala/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+ </li>
+
+ <li class="li">
+ Create an HTTP service principal. For example:
+<pre class="pre codeblock"><code>kadmin: addprinc -randkey HTTP/impala_host.example.com@TEST.EXAMPLE.COM</code></pre>
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The <code class="ph codeph">HTTP</code> component of the service principal must be uppercase as shown in the
+ preceding example.
+ </div>
+ </li>
+
+ <li class="li">
+ Create <code class="ph codeph">keytab</code> files with both principals. For example:
+<pre class="pre codeblock"><code>kadmin: xst -k impala.keytab impala/impala_host.example.com
+kadmin: xst -k http.keytab HTTP/impala_host.example.com
+kadmin: quit</code></pre>
+ </li>
+
+ <li class="li">
+ Use <code class="ph codeph">ktutil</code> to read the contents of the two keytab files and then write those contents
+ to a new file. For example:
+<pre class="pre codeblock"><code>$ ktutil
+ktutil: rkt impala.keytab
+ktutil: rkt http.keytab
+ktutil: wkt impala-http.keytab
+ktutil: quit</code></pre>
+ </li>
+
+ <li class="li">
+ (Optional) Test that credentials in the merged keytab file are valid, and that the <span class="q">"renew until"</span>
+ date is in the future. For example:
+<pre class="pre codeblock"><code>$ klist -e -k -t impala-http.keytab</code></pre>
+ </li>
+
+ <li class="li">
+ Copy the <span class="ph filepath">impala-http.keytab</span> file to the Impala configuration directory. Change the
+ permissions to be only read for the file owner and change the file owner to the <code class="ph codeph">impala</code>
+ user. By default, the Impala user and group are both named <code class="ph codeph">impala</code>. For example:
+<pre class="pre codeblock"><code>$ cp impala-http.keytab /etc/impala/conf
+$ cd /etc/impala/conf
+$ chmod 400 impala-http.keytab
+$ chown impala:impala impala-http.keytab</code></pre>
+ </li>
+
+ <li class="li">
+ Add Kerberos options to the Impala defaults file, <span class="ph filepath">/etc/default/impala</span>. Add the
+ options for both the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> daemons, using the
+ <code class="ph codeph">IMPALA_SERVER_ARGS</code> and <code class="ph codeph">IMPALA_STATE_STORE_ARGS</code> variables. For
+ example, you might add:
+
+<pre class="pre codeblock"><code>-kerberos_reinit_interval=60
+-principal=impala_1/impala_host.example.com@TEST.EXAMPLE.COM
+-keytab_file=<var class="keyword varname">/path/to/impala.keytab</var></code></pre>
+ <p class="p">
+ For more information on changing the Impala defaults specified in
+ <span class="ph filepath">/etc/default/impala</span>, see
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup
+ Options</a>.
+ </p>
+ </li>
+ </ol>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ Restart <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">statestored</span> for these configuration changes to
+ take effect.
+ </div>
+ </div>
+ </article>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title5" id="kerberos__kerberos_proxy">
+
+ <h2 class="title topictitle2" id="ariaid-title5">Enabling Kerberos for Impala with a Proxy Server</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A common configuration for Impala with High Availability is to use a proxy server to submit requests to the
+ actual <span class="keyword cmdname">impalad</span> daemons on different hosts in the cluster. This configuration avoids
+ connection problems in case of machine failure, because the proxy server can route new requests through one
+ of the remaining hosts in the cluster. This configuration also helps with load balancing, because the
+ additional overhead of being the <span class="q">"coordinator node"</span> for each query is spread across multiple hosts.
+ </p>
+
+ <p class="p">
+ Although you can set up a proxy server with or without Kerberos authentication, typically users set up a
+ secure Kerberized configuration. For information about setting up a proxy server for Impala, including
+ Kerberos-specific steps, see <a class="xref" href="impala_proxy.html#proxy">Using Impala through a Proxy for High Availability</a>.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title6" id="kerberos__spnego">
+
+ <h2 class="title topictitle2" id="ariaid-title6">Using a Web Browser to Access a URL Protected by Kerberos HTTP SPNEGO</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Your web browser must support Kerberos HTTP SPNEGO; supported browsers include Chrome, Firefox, and Internet Explorer.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">To configure Firefox to access a URL protected by Kerberos HTTP SPNEGO:</strong>
+ </p>
+
+ <ol class="ol">
+ <li class="li">
+ Open the advanced settings Firefox configuration page by loading the <code class="ph codeph">about:config</code> page.
+ </li>
+
+ <li class="li">
+ Use the <strong class="ph b">Filter</strong> text box to find <code class="ph codeph">network.negotiate-auth.trusted-uris</code>.
+ </li>
+
+ <li class="li">
+ Double-click the <code class="ph codeph">network.negotiate-auth.trusted-uris</code> preference and enter the hostname
+ or the domain of the web server that is protected by Kerberos HTTP SPNEGO. Separate multiple domains and
+ hostnames with a comma.
+ </li>
+
+ <li class="li">
+ Click <strong class="ph b">OK</strong>.
+ </li>
+ </ol>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title7" id="kerberos__kerberos_delegation">
+ <h2 class="title topictitle2" id="ariaid-title7">Enabling Impala Delegation for Kerberos Users</h2>
+ <div class="body conbody">
+ <p class="p">
+ See <a class="xref" href="impala_delegation.html#delegation">Configuring Impala Delegation for Hue and BI Tools</a> for details about the delegation feature
+ that lets certain users submit queries using the credentials of other users.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title8" id="kerberos__ssl_jdbc_odbc">
+ <h2 class="title topictitle2" id="ariaid-title8">Using TLS/SSL with Business Intelligence Tools</h2>
+ <div class="body conbody">
+ <p class="p">
+ You can use Kerberos authentication, TLS/SSL encryption, or both to secure
+ connections from JDBC and ODBC applications to Impala.
+ See <a class="xref" href="impala_jdbc.html#impala_jdbc">Configuring Impala to Work with JDBC</a> and <a class="xref" href="impala_odbc.html#impala_odbc">Configuring Impala to Work with ODBC</a>
+ for details.
+ </p>
+
+ <p class="p">
+ Prior to <span class="keyword">Impala 2.5</span>, the Hive JDBC driver did not support connections that use both Kerberos authentication
+ and SSL encryption. If your cluster is running an older release that has this restriction,
+ use an alternative JDBC driver that supports
+ both of these security features.
+ </p>
+ </div>
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title9" id="kerberos__whitelisting_internal_apis">
+ <h2 class="title topictitle2" id="ariaid-title9">Enabling Access to Internal Impala APIs for Kerberos Users</h2>
+ <div class="body conbody">
+
+ <p class="p">
+ For applications that need direct access
+ to Impala APIs, without going through the HiveServer2 or Beeswax interfaces, you can
+ specify a list of Kerberos users who are allowed to call those APIs. By default, the
+ <code class="ph codeph">impala</code> and <code class="ph codeph">hdfs</code> users are the only ones authorized
+ for this kind of access.
+ Any users not explicitly authorized through the <code class="ph codeph">internal_principals_whitelist</code>
+ configuration setting are blocked from accessing the APIs. This setting applies to all the
+ Impala-related daemons, although currently it is primarily used for HDFS to control the
+ behavior of the catalog server.
+ </p>
+ </div>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title10" id="kerberos__auth_to_local">
+ <h2 class="title topictitle2" id="ariaid-title10">Mapping Kerberos Principals to Short Names for Impala</h2>
+ <div class="body conbody">
+ <div class="p">
+ In <span class="keyword">Impala 2.6</span> and higher, Impala recognizes the <code class="ph codeph">auth_to_local</code> setting,
+ specified through the HDFS configuration setting
+ <code class="ph codeph">hadoop.security.auth_to_local</code>.
+ This feature is disabled by default, to avoid an unexpected change in security-related behavior.
+ To enable it:
+ <ul class="ul">
+ <li class="li">
+ <p class="p">
+ Specify <code class="ph codeph">--load_auth_to_local_rules=true</code>
+ in the <span class="keyword cmdname">impalad</span> and <span class="keyword cmdname">catalogd</span> configuration settings.
+ </p>
+ </li>
+ </ul>
+ </div>
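As a sketch, the underlying HDFS setting lives in <span class="ph filepath">core-site.xml</span>. The rules below are a hypothetical example, not taken from this document; they map principals in one example realm to their short names and fall back to Hadoop's default handling:

```xml
<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[1:$1@$0](.*@EXAMPLE.COM)s/@.*//
    RULE:[2:$1@$0](.*@EXAMPLE.COM)s/@.*//
    DEFAULT
  </value>
</property>
```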
+ </div>
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_known_issues.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_known_issues.html b/docs/build3x/html/topics/impala_known_issues.html
new file mode 100644
index 0000000..275753b
--- /dev/null
+++ b/docs/build3x/html/topics/impala_known_issues.html
@@ -0,0 +1,1012 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="DC.Relation" scheme="URI" content="../topics/impala_release_notes.html"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="known_issues"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Known Issues and Workarounds in Impala</title></head><body id="known_issues"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1"><span class="ph">Known Issues and Workarounds in Impala</span></h1>
+
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ The following sections describe known issues and workarounds in Impala, as of the current
+ production release. This page summarizes the most serious or frequently encountered issues
+ in the current release, to help you make planning decisions about installing and
+ upgrading. Any workarounds are listed here. The bug links take you to the Impala issues
+ site, where you can see the diagnosis and whether a fix is in the pipeline.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ The online issue tracking system for Impala contains comprehensive information and is
+ updated in real time. To verify whether an issue you are experiencing has already been
+ reported, or which release an issue is fixed in, search on the
+ <a class="xref" href="https://issues.apache.org/jira/" target="_blank">issues.apache.org
+ JIRA tracker</a>.
+ </div>
+
+ <p class="p toc inpage"></p>
+
+ <p class="p">
+ For issues fixed in various Impala releases, see
+ <a class="xref" href="impala_fixed_issues.html#fixed_issues">Fixed Issues in Apache Impala</a>.
+ </p>
+
+
+
+ </div>
+
+ <nav role="navigation" class="related-links"><div class="familylinks"><div class="parentlink"><strong>Parent topic:</strong> <a class="link" href="../topics/impala_release_notes.html">Impala Release Notes</a></div></div></nav><article class="topic concept nested1" aria-labelledby="ariaid-title2" id="known_issues__known_issues_startup">
+
+ <h2 class="title topictitle2" id="ariaid-title2">Impala Known Issues: Startup</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can prevent one or more Impala-related daemons from starting properly.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title3" id="known_issues_startup__IMPALA-4978">
+
+ <h3 class="title topictitle3" id="ariaid-title3">Impala requires FQDN from hostname command on kerberized clusters</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The method Impala uses to retrieve the host name while constructing the Kerberos
+ principal is the <code class="ph codeph">gethostname()</code> system call. This function might not
+ always return the fully qualified domain name, depending on the network configuration.
+ If the daemons cannot determine the FQDN, Impala does not start on a kerberized
+ cluster.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Test if a host is affected by checking whether the output of the
+ <span class="keyword cmdname">hostname</span> command includes the FQDN. On hosts where
+ <span class="keyword cmdname">hostname</span> returns only the short name, pass the command-line flag
+ <code class="ph codeph">--hostname=<var class="keyword varname">fully_qualified_domain_name</var></code> in the
+ startup options of all Impala-related daemons.
+ </p>
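One way to check programmatically whether a host reports a fully qualified name is with Python's standard library. This is a diagnostic sketch, not part of Impala; the dot-counting heuristic is an assumption for illustration:

```python
import socket

def looks_like_fqdn(name):
    """Heuristic used here: an FQDN contains at least one dot
    separating host and domain components (e.g. host.example.com)."""
    return "." in name.strip(".")

short_name = socket.gethostname()   # analogous to the gethostname() call Impala uses
fqdn = socket.getfqdn()             # asks the resolver for the canonical name

if not looks_like_fqdn(short_name):
    print("hostname %r is not fully qualified; consider --hostname=%s" % (short_name, fqdn))
```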
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4978" target="_blank">IMPALA-4978</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_performance__ki_performance" id="known_issues__known_issues_performance">
+
+ <h2 class="title topictitle2" id="known_issues_performance__ki_performance">Impala Known Issues: Performance</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues involve the performance of operations such as queries or DDL statements.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title5" id="known_issues_performance__impala-6671">
+
+ <h3 class="title topictitle3" id="ariaid-title5">Metadata operations block read-only operations on unrelated tables</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Metadata operations that change the state of a table, like <code class="ph codeph">COMPUTE
+ STATS</code> or <code class="ph codeph">ALTER RECOVER PARTITIONS</code>, can delay the metadata
+ loading of unrelated, not-yet-loaded tables that is triggered by statements like
+ <code class="ph codeph">DESCRIBE</code> or <code class="ph codeph">SELECT</code>.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6671" target="_blank">IMPALA-6671</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title6" id="known_issues_performance__IMPALA-3316">
+
+ <h3 class="title topictitle3" id="ariaid-title6">Slow queries for Parquet tables with convert_legacy_hive_parquet_utc_timestamps=true</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The configuration setting
+ <code class="ph codeph">convert_legacy_hive_parquet_utc_timestamps=true</code> uses an underlying
+ function that can be a bottleneck on high volume, highly concurrent queries due to the
+ use of a global lock while loading time zone information. This bottleneck can cause
+ slowness when querying Parquet tables, up to 30x for scan-heavy queries. The amount of
+ slowdown depends on factors such as the number of cores and number of threads involved
+ in the query.
+ </p>
+
+ <div class="note note note_note"><span class="note__title notetitle">Note:</span>
+ <p class="p">
+ The slowdown only occurs when accessing <code class="ph codeph">TIMESTAMP</code> columns within
+ Parquet files that were generated by Hive, and therefore require the on-the-fly
+ timezone conversion processing.
+ </p>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3316" target="_blank">IMPALA-3316</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> If the <code class="ph codeph">TIMESTAMP</code> values stored in the table
+ represent dates only, with no time portion, consider storing them as strings in
+ <code class="ph codeph">yyyy-MM-dd</code> format. Impala implicitly converts such string values to
+ <code class="ph codeph">TIMESTAMP</code> in calls to date/time functions.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title7" id="known_issues_performance__ki_file_handle_cache">
+
+ <h3 class="title topictitle3" id="ariaid-title7">Interaction of File Handle Cache with HDFS Appends and Short-Circuit Reads</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If a data file used by Impala is being continuously appended or overwritten in place
+ by an HDFS mechanism, such as <span class="keyword cmdname">hdfs dfs -appendToFile</span>, interaction
+ with the file handle caching feature in <span class="keyword">Impala 2.10</span> and higher
+ could cause short-circuit reads to sometimes be disabled on some DataNodes. When a
+ mismatch is detected between the cached file handle and a data block that was
+ rewritten because of an append, short-circuit reads are turned off on the affected
+ host for a 10-minute period.
+ </p>
+
+ <p class="p">
+ The possibility of encountering such an issue is the reason why the file handle
+ caching feature is currently turned off by default. See
+ <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> for information about this feature and
+ how to enable it.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong>
+ <a class="xref" href="https://issues.apache.org/jira/browse/HDFS-12528" target="_blank">HDFS-12528</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Verify whether your ETL process is susceptible to this issue before
+ enabling the file handle caching feature. You can set the <span class="keyword cmdname">impalad</span>
+ configuration option <code class="ph codeph">unused_file_handle_timeout_sec</code> to a time period
+ that is shorter than the HDFS setting
+ <code class="ph codeph">dfs.client.read.shortcircuit.streams.cache.expiry.ms</code>. (Keep in mind
+ that the HDFS setting is in milliseconds while the Impala setting is in seconds.)
+ </p>
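Because the two settings use different units, a quick sanity check helps. This Python sketch simply encodes the workaround's seconds-versus-milliseconds comparison; the numeric values are hypothetical, not defaults taken from this document:

```python
def handle_timeout_is_safe(impala_timeout_sec, hdfs_expiry_ms):
    """Return True when the Impala file handle timeout (in seconds) is
    shorter than the HDFS short-circuit cache expiry (in milliseconds),
    as the workaround above recommends."""
    return impala_timeout_sec * 1000 < hdfs_expiry_ms

# Hypothetical values: a 6-hour Impala timeout vs a 5-minute HDFS expiry.
print(handle_timeout_is_safe(21600, 300000))  # False: the Impala timeout is too long
print(handle_timeout_is_safe(120, 300000))    # True: 120 s < 300000 ms
```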
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Fixed in HDFS 2.10 and higher. Use the new HDFS parameter
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to specify the amount of
+ time that short circuit reads are disabled on encountering an error. The default value
+ is 10 minutes (<code class="ph codeph">600</code> seconds). It is recommended that you set
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to a small value, such as
+ <code class="ph codeph">1</code> second, when using the file handle cache. Setting
+ <code class="ph codeph">dfs.domain.socket.disable.interval.seconds</code> to <code class="ph codeph">0</code> is not
+ recommended as a non-zero interval protects the system if there is a persistent
+ problem with short circuit reads.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_drivers__ki_drivers" id="known_issues__known_issues_drivers">
+
+ <h2 class="title topictitle2" id="known_issues_drivers__ki_drivers">Impala Known Issues: JDBC and ODBC Drivers</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues affect applications that use the JDBC or ODBC APIs, such as business
+ intelligence tools or custom-written applications in languages such as Java or C++.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title9" id="known_issues_drivers__IMPALA-1792">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title9">ImpalaODBC: Can not get the value in the SQLGetData(m-x th column) after the SQLBindCol(m th column)</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            If the ODBC function <code class="ph codeph">SQLGetData</code> is called on a series of columns, the
+            calls must follow the same order as the columns. For example, if data is fetched from
+            column 2 and then column 1, the <code class="ph codeph">SQLGetData</code> call for column 1 returns
+            <code class="ph codeph">NULL</code>.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1792" target="_blank">IMPALA-1792</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Fetch columns in the same order they are defined in the table.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_resources__ki_resources" id="known_issues__known_issues_resources">
+
+ <h2 class="title topictitle2" id="known_issues_resources__ki_resources">Impala Known Issues: Resources</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues involve memory or disk usage, including out-of-memory conditions, the
+ spill-to-disk feature, and resource management features.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title11" id="known_issues_resources__IMPALA-6028">
+
+ <h3 class="title topictitle3" id="ariaid-title11">Handling large rows during upgrade to <span class="keyword">Impala 2.10</span> or higher</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ After an upgrade to <span class="keyword">Impala 2.10</span> or higher, users who process
+ very large column values (long strings), or have increased the
+ <code class="ph codeph">--read_size</code> configuration setting from its default of 8 MB, might
+ encounter capacity errors for some queries that previously worked.
+ </p>
+
+        <p class="p">
+          <strong class="ph b">Resolution:</strong> After the upgrade, follow the instructions in
+          <a class="xref" href="impala_scalability.html">Scalability Considerations for Impala</a> to check whether your
+          queries are affected by these changes and to modify your configuration settings if so.
+        </p>
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6028" target="_blank">IMPALA-6028</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title12" id="known_issues_resources__IMPALA-5605">
+
+ <h3 class="title topictitle3" id="ariaid-title12">Configuration to prevent crashes caused by thread resource limits</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala could encounter a serious error due to resource usage under very high
+ concurrency. The error message is similar to:
+ </p>
+
+<pre class="pre codeblock"><code>
+F0629 08:20:02.956413 29088 llvm-codegen.cc:111] LLVM hit fatal error: Unable to allocate section memory!
+terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::thread_resource_error> >'
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-5605" target="_blank">IMPALA-5605</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> To prevent such errors, configure each host running an
+ <span class="keyword cmdname">impalad</span> daemon with the following settings:
+ </p>
+
+<pre class="pre codeblock"><code>
+echo 2000000 > /proc/sys/kernel/threads-max
+echo 2000000 > /proc/sys/kernel/pid_max
+echo 8000000 > /proc/sys/vm/max_map_count
+</code></pre>
+
+ <p class="p">
+ Add the following lines in <span class="ph filepath">/etc/security/limits.conf</span>:
+ </p>
+
+<pre class="pre codeblock"><code>
+impala soft nproc 262144
+impala hard nproc 262144
+</code></pre>
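The echo commands above take effect immediately but are lost on reboot. One way to persist them is via sysctl configuration; this sketch writes to a local example file (on a real host the target would be /etc/sysctl.conf or a file under /etc/sysctl.d/, applied with `sysctl -p` as root):

```shell
# Equivalent sysctl keys for the three /proc settings shown above.
# SYSCTL_CONF is parameterized so the snippet is safe to try without root.
SYSCTL_CONF="${SYSCTL_CONF:-./sysctl.conf.example}"
cat >> "$SYSCTL_CONF" <<'EOF'
kernel.threads-max = 2000000
kernel.pid_max = 2000000
vm.max_map_count = 8000000
EOF
```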
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title13" id="known_issues_resources__drop_table_purge_s3a">
+
+ <h3 class="title topictitle3" id="ariaid-title13"><strong class="ph b">Breakpad minidumps can be very large when the thread count is high</strong></h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            The size of the Breakpad minidump files grows linearly with the number of threads. By
+            default, each thread adds 8 KB to the minidump size. Minidump files can consume
+            significant disk space when the daemons have a high number of threads.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Add
+ <samp class="ph systemoutput">--minidump_size_limit_hint_kb=size</samp>
+ to set a soft upper limit on the size of each minidump file. If the minidump file
+ would exceed that limit, Impala reduces the amount of information for each thread from
+ 8 KB to 2 KB. (Full thread information is captured for the first 20 threads, then 2 KB
+ per thread after that.) The minidump file can still grow larger than the "hinted"
+ size. For example, if you have 10,000 threads, the minidump file can be more than 20
+ MB.
+ </p>
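The growth described above is easy to estimate: full 8 KB of detail for the first 20 threads, then 2 KB per thread once the hint is exceeded. A rough lower bound (fixed per-dump overhead is ignored):

```shell
threads=10000
# 20 threads at full 8 KB each, the remainder reduced to 2 KB each.
size_kb=$(( 20 * 8 + (threads - 20) * 2 ))
echo "estimated minidump size: ${size_kb} KB (~$(( size_kb / 1024 )) MB)"
```

For 10,000 threads this estimate comes to 20120 KB, consistent with the "more than 20 MB" figure above once fixed overhead is added.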
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong>
+ <a class="xref" href="https://issues.cloudera.org/browse/IMPALA-3509" target="_blank">IMPALA-3509</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title14" id="known_issues_resources__IMPALA-691">
+
+ <h3 class="title topictitle3" id="ariaid-title14"><strong class="ph b">Process mem limit does not account for the JVM's memory usage</strong></h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            Some memory allocated by the JVM that Impala uses internally is not counted against
+            the memory limit for the <span class="keyword cmdname">impalad</span> daemon.
+          </p>
+
+          <p class="p">
+            <strong class="ph b">Workaround:</strong> To monitor overall memory usage, use the
+            <span class="keyword cmdname">top</span> command, or add the memory figures in the
+            Impala web UI <strong class="ph b">/memz</strong> tab to the JVM memory usage shown on
+            the <strong class="ph b">/metrics</strong> tab.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Apache Issue:</strong>
+ <a class="xref" href="https://issues.cloudera.org/browse/IMPALA-691" target="_blank">IMPALA-691</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_correctness__ki_correctness" id="known_issues__known_issues_correctness">
+
+ <h2 class="title topictitle2" id="known_issues_correctness__ki_correctness">Impala Known Issues: Correctness</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can cause incorrect or unexpected results from queries. They typically only
+ arise in very specific circumstances.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title16" id="known_issues_correctness__IMPALA-3094">
+
+ <h3 class="title topictitle3" id="ariaid-title16">Incorrect result due to constant evaluation in query with outer join</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ An <code class="ph codeph">OUTER JOIN</code> query could omit some expected result rows due to a
+ constant such as <code class="ph codeph">FALSE</code> in another join clause. For example:
+ </p>
+
+<pre class="pre codeblock"><code>
+explain SELECT 1 FROM alltypestiny a1
+ INNER JOIN alltypesagg a2 ON a1.smallint_col = a2.year AND false
+ RIGHT JOIN alltypes a3 ON a1.year = a1.bigint_col;
++---------------------------------------------------------+
+| Explain String |
++---------------------------------------------------------+
+| Estimated Per-Host Requirements: Memory=1.00KB VCores=1 |
+| |
+| 00:EMPTYSET |
++---------------------------------------------------------+
+
+</code></pre>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3094" target="_blank">IMPALA-3094</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title17" id="known_issues_correctness__IMPALA-3006">
+
+ <h3 class="title topictitle3" id="ariaid-title17">Impala may use incorrect bit order with BIT_PACKED encoding</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+            Parquet <code class="ph codeph">BIT_PACKED</code> encoding as implemented by Impala is LSB first.
+            The Parquet standard specifies MSB first.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3006" target="_blank">IMPALA-3006</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High, but rare in practice because BIT_PACKED is infrequently used,
+ is not written by Impala, and is deprecated in Parquet 2.0.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title18" id="known_issues_correctness__IMPALA-3082">
+
+ <h3 class="title topictitle3" id="ariaid-title18">BST between 1972 and 1995</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The calculation of start and end times for the BST (British Summer Time) time zone
+ could be incorrect between 1972 and 1995. Between 1972 and 1995, BST began and ended
+ at 02:00 GMT on the third Sunday in March (or second Sunday when Easter fell on the
+ third) and fourth Sunday in October. For example, both function calls should return
+ 13, but actually return 12, in a query such as:
+ </p>
+
+<pre class="pre codeblock"><code>
+select
+ extract(from_utc_timestamp(cast('1970-01-01 12:00:00' as timestamp), 'Europe/London'), "hour") summer70start,
+ extract(from_utc_timestamp(cast('1970-12-31 12:00:00' as timestamp), 'Europe/London'), "hour") summer70end;
+</code></pre>
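The expected value of 13 for both calls can be cross-checked outside Impala against the system tzdata (GNU date assumed; between October 1968 and October 1971 the UK observed UTC+1 year-round, so with a correct zone database both hours print as 13):

```shell
# Convert the same two UTC timestamps to Europe/London local time
# and print the local hour for each.
h1=$(TZ=Europe/London date -d '1970-01-01 12:00:00 UTC' +%H)
h2=$(TZ=Europe/London date -d '1970-12-31 12:00:00 UTC' +%H)
echo "$h1 $h2"
```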
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-3082" target="_blank">IMPALA-3082</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title19" id="known_issues_correctness__IMPALA-2422">
+
+ <h3 class="title topictitle3" id="ariaid-title19">% escaping does not work correctly when occurs at the end in a LIKE clause</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            If the final character in the RHS argument of a <code class="ph codeph">LIKE</code> operator is an
+            escaped <code class="ph codeph">\%</code> character, it does not match a literal <code class="ph codeph">%</code> as the
+            final character of the LHS argument.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2422" target="_blank">IMPALA-2422</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title20" id="known_issues_correctness__IMPALA-2603">
+
+ <h3 class="title topictitle3" id="ariaid-title20">Crash: impala::Coordinator::ValidateCollectionSlots</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            A query could encounter a serious error if it includes multiple nested levels of
+            <code class="ph codeph">INNER JOIN</code> clauses involving subqueries.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2603" target="_blank">IMPALA-2603</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+
+
+ <article class="topic concept nested1" aria-labelledby="known_issues_interop__ki_interop" id="known_issues__known_issues_interop">
+
+ <h2 class="title topictitle2" id="known_issues_interop__ki_interop">Impala Known Issues: Interoperability</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues affect the ability to interchange data between Impala and other database
+ systems. They cover areas such as data types and file formats.
+ </p>
+
+ </div>
+
+
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title22" id="known_issues_interop__describe_formatted_avro">
+
+ <h3 class="title topictitle3" id="ariaid-title22">DESCRIBE FORMATTED gives error on Avro table</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ This issue can occur either on old Avro tables (created prior to Hive 1.1) or when
+ changing the Avro schema file by adding or removing columns. Columns added to the
+ schema file will not show up in the output of the <code class="ph codeph">DESCRIBE FORMATTED</code>
+ command. Removing columns from the schema file will trigger a
+ <code class="ph codeph">NullPointerException</code>.
+ </p>
+
+ <p class="p">
+ As a workaround, you can use the output of <code class="ph codeph">SHOW CREATE TABLE</code> to drop
+ and recreate the table. This will populate the Hive metastore database with the
+ correct column definitions.
+ </p>
+
+ <div class="note warning note_warning"><span class="note__title warningtitle">Warning:</span>
+ <div class="p">
+ Only use this for external tables, or Impala will remove the data files. In case of
+ an internal table, set it to external first:
+<pre class="pre codeblock"><code>
+ALTER TABLE table_name SET TBLPROPERTIES('EXTERNAL'='TRUE');
+</code></pre>
+ (The part in parentheses is case sensitive.) Make sure to pick the right choice
+ between internal and external when recreating the table. See
+ <a class="xref" href="impala_tables.html#tables">Overview of Impala Tables</a> for the differences between internal and
+ external tables.
+ </div>
+ </div>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title23" id="known_issues_interop__IMP-175">
+
+
+
+        <h3 class="title topictitle3" id="ariaid-title23">Deviation from Hive behavior: Out-of-range float/double values are returned as the maximum allowed value of the type (Hive returns NULL)</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            Impala behavior differs from Hive for out-of-range float and double values:
+            Impala returns the maximum allowed value of the type, while Hive returns NULL.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> None
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title24" id="known_issues_interop__flume_writeformat_text">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title24">Configuration needed for Flume to be compatible with Impala</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ For compatibility with Impala, the value for the Flume HDFS Sink
+ <code class="ph codeph">hdfs.writeFormat</code> must be set to <code class="ph codeph">Text</code>, rather than
+ its default value of <code class="ph codeph">Writable</code>. The <code class="ph codeph">hdfs.writeFormat</code>
+ setting must be changed to <code class="ph codeph">Text</code> before creating data files with
+ Flume; otherwise, those files cannot be read by either Impala or Hive.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> This information has been requested to be added to the upstream
+ Flume documentation.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title25" id="known_issues_interop__IMPALA-635">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title25">Avro Scanner fails to parse some schemas</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Querying certain Avro tables could cause a crash or return no rows, even though Impala
+ could <code class="ph codeph">DESCRIBE</code> the table.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-635" target="_blank">IMPALA-635</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Swap the order of the fields in the schema specification. For
+ example, <code class="ph codeph">["null", "string"]</code> instead of <code class="ph codeph">["string",
+ "null"]</code>.
+ </p>
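For example, a minimal Avro schema with the nullable union ordered as the workaround suggests (the record and field names here are illustrative only):

```shell
# Write a one-column schema with "null" listed first in the union.
cat > schema.avsc <<'EOF'
{"type": "record", "name": "example_rec", "fields": [
  {"name": "c1", "type": ["null", "string"]}
]}
EOF
# Count the lines containing the null-first union (should find one).
grep -c '"null", "string"' schema.avsc
```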
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Not allowing this syntax agrees with the Avro specification, so it
+ may still cause an error even when the crashing issue is resolved.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title26" id="known_issues_interop__IMPALA-1024">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title26">Impala BE cannot parse Avro schema that contains a trailing semi-colon</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If an Avro table has a schema definition with a trailing semicolon, Impala encounters
+ an error when the table is queried.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1024" target="_blank">IMPALA-1024</a>
+ </p>
+
+          <p class="p">
+            <strong class="ph b">Workaround:</strong> Remove the trailing semicolon from the Avro schema.
+          </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title27" id="known_issues_interop__IMPALA-1652">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title27">Incorrect results with basic predicate on CHAR typed column</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ When comparing a <code class="ph codeph">CHAR</code> column value to a string literal, the literal
+ value is not blank-padded and so the comparison might fail when it should match.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1652" target="_blank">IMPALA-1652</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Use the <code class="ph codeph">RPAD()</code> function to blank-pad literals
+ compared with <code class="ph codeph">CHAR</code> columns to the expected length.
+ </p>
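The padding idea behind the workaround can be illustrated outside SQL (a shell analog, not Impala syntax; the CHAR(10) width and the 'foo' literal are made up for illustration):

```shell
# A CHAR(10) column effectively stores 'foo' followed by 7 trailing spaces.
char_val='foo       '
lit='foo'
# Equivalent of RPAD('foo', 10, ' '): left-justify in a 10-character field.
padded=$(printf '%-10s' "$lit")
# The blank-padded literal now compares equal to the stored CHAR value.
[ "$padded" = "$char_val" ] && echo "match"
```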
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title28" id="known_issues__known_issues_limitations">
+
+ <h2 class="title topictitle2" id="ariaid-title28">Impala Known Issues: Limitations</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues are current limitations of Impala that require evaluation as you plan how
+ to integrate Impala into your data management workflow.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title29" id="known_issues_limitations__IMPALA-4551">
+
+ <h3 class="title topictitle3" id="ariaid-title29">Set limits on size of expression trees</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Very deeply nested expressions within queries can exceed internal Impala limits,
+ leading to excessive memory usage.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-4551" target="_blank">IMPALA-4551</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Severity:</strong> High
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Avoid queries with extremely large expression trees. Setting the
+ query option <code class="ph codeph">disable_codegen=true</code> may reduce the impact, at a cost of
+ longer query runtime.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title30" id="known_issues_limitations__IMPALA-77">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title30">Impala does not support running on clusters with federated namespaces</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Impala does not support running on clusters with federated namespaces. The
+ <code class="ph codeph">impalad</code> process will not start on a node running such a filesystem
+ based on the <code class="ph codeph">org.apache.hadoop.fs.viewfs.ViewFs</code> class.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-77" target="_blank">IMPALA-77</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Anticipated Resolution:</strong> Limitation
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Use standard HDFS on all Impala nodes.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title31" id="known_issues__known_issues_misc">
+
+ <h2 class="title topictitle2" id="ariaid-title31">Impala Known Issues: Miscellaneous</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues do not fall into one of the above categories or have not been categorized
+ yet.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title32" id="known_issues_misc__IMPALA-2005">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title32">A failed CTAS does not drop the table if the insert fails</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If a <code class="ph codeph">CREATE TABLE AS SELECT</code> operation successfully creates the target
+ table but an error occurs while querying the source table or copying the data, the new
+ table is left behind rather than being dropped.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-2005" target="_blank">IMPALA-2005</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Drop the new table manually after a failed <code class="ph codeph">CREATE TABLE AS
+ SELECT</code>.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title33" id="known_issues_misc__IMPALA-1821">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title33">Casting scenarios with invalid/inconsistent results</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ Using a <code class="ph codeph">CAST()</code> function to convert large literal values to smaller
+ types, or to convert special values such as <code class="ph codeph">NaN</code> or
+ <code class="ph codeph">Inf</code>, produces values not consistent with other database systems. This
+ could lead to unexpected results from queries.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-1821" target="_blank">IMPALA-1821</a>
+ </p>
+
+
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title34" id="known_issues_misc__IMPALA-941">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title34">Impala Parser issue when using fully qualified table names that start with a number</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ A fully qualified table name starting with a number could cause a parsing error. In a
+ name such as <code class="ph codeph">db.571_market</code>, the decimal point followed by digits is
+ interpreted as a floating-point number.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-941" target="_blank">IMPALA-941</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Surround each part of the fully qualified name with backticks
+ (<code class="ph codeph">``</code>).
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title35" id="known_issues_misc__IMPALA-532">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title35">Impala should tolerate bad locale settings</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ If the <code class="ph codeph">LC_*</code> environment variables specify an unsupported locale,
+ Impala does not start.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-532" target="_blank">IMPALA-532</a>
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Add <code class="ph codeph">LC_ALL="C"</code> to the environment settings for
+ both the Impala daemon and the Statestore daemon. See
+ <a class="xref" href="impala_config_options.html#config_options">Modifying Impala Startup Options</a> for details about modifying
+ these environment settings.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Resolution:</strong> Fixing this issue would require an upgrade to Boost 1.47 in the
+ Impala distribution.
+ </p>
+
+ </div>
+
+ </article>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title36" id="known_issues_misc__IMP-1203">
+
+
+
+ <h3 class="title topictitle3" id="ariaid-title36">Log Level 3 Not Recommended for Impala</h3>
+
+ <div class="body conbody">
+
+ <p class="p">
+ The extensive logging produced by log level 3 can cause serious performance overhead
+ and capacity issues.
+ </p>
+
+ <p class="p">
+ <strong class="ph b">Workaround:</strong> Reduce the log level to its default value of 1, that is,
+ <code class="ph codeph">GLOG_v=1</code>. See <a class="xref" href="impala_logging.html#log_levels">Setting Logging Levels</a> for
+ details about the effects of setting different logging levels.
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+ <article class="topic concept nested1" aria-labelledby="ariaid-title37" id="known_issues__known_issues_crash">
+
+ <h2 class="title topictitle2" id="ariaid-title37">Impala Known Issues: Crashes and Hangs</h2>
+
+ <div class="body conbody">
+
+ <p class="p">
+ These issues can cause Impala to quit or become unresponsive.
+ </p>
+
+ </div>
+
+ <article class="topic concept nested2" aria-labelledby="ariaid-title38" id="known_issues_crash__impala-6841">
+
+ <h3 class="title topictitle3" id="ariaid-title38">Unable to view large catalog objects in catalogd Web UI</h3>
+
+ <div class="body conbody">
+
+          <p class="p">
+            In the <code class="ph codeph">catalogd</code> Web UI, you can list metadata objects and view their
+            details. These details are accessed via a link and printed as a string formatted using
+            Thrift's <code class="ph codeph">DebugProtocol</code>. Printing large objects (&gt; 1 GB) in the Web UI
+            can crash <code class="ph codeph">catalogd</code>.
+          </p>
+
+ <p class="p">
+ <strong class="ph b">Bug:</strong> <a class="xref" href="https://issues.apache.org/jira/browse/IMPALA-6841" target="_blank">IMPALA-6841</a>
+ </p>
+
+ </div>
+
+ </article>
+
+ </article>
+
+</article></main></body></html>
http://git-wip-us.apache.org/repos/asf/impala/blob/fae51ec2/docs/build3x/html/topics/impala_reserved_words.html
----------------------------------------------------------------------
diff --git a/docs/build3x/html/topics/impala_reserved_words.html b/docs/build3x/html/topics/impala_reserved_words.html
new file mode 100644
index 0000000..3676084
--- /dev/null
+++ b/docs/build3x/html/topics/impala_reserved_words.html
@@ -0,0 +1,3853 @@
+<!DOCTYPE html
+ SYSTEM "about:legacy-compat">
+<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><meta charset="UTF-8"><meta name="copyright" content="(C) Copyright 2018"><meta name="DC.rights.owner" content="(C) Copyright 2018"><meta name="DC.Type" content="concept"><meta name="prodname" content="Impala"><meta name="prodname" content="Impala"><meta name="version" content="Impala 3.0.x"><meta name="version" content="Impala 3.0.x"><meta name="DC.Format" content="XHTML"><meta name="DC.Identifier" content="reserved_words"><link rel="stylesheet" type="text/css" href="../commonltr.css"><title>Impala Reserved Words</title></head><body id="reserved_words"><main role="main"><article role="article" aria-labelledby="ariaid-title1">
+
+ <h1 class="title topictitle1" id="ariaid-title1">Impala Reserved Words</h1>
+
+
+ <div class="body conbody">
+
+ <p class="p">
+ This topic lists
+ the reserved words in Impala.
+ </p>
+ <div class="p">
+ A reserved word is one that cannot be used directly as an identifier. If
+ you need to use it as an identifier, you must quote it with backticks.
+ For example:
+ <ul class="ul">
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE select (x INT)</code>: fails
+ </li>
+ <li class="li">
+ <code class="ph codeph">CREATE TABLE `select` (x INT)</code>: succeeds
+ </li>
+ </ul>
+ </div>
+
+ <p class="p">
+ Because different database systems have different sets of reserved words,
+ and the reserved words change from release to release, carefully consider
+ database, table, and column names to ensure maximum compatibility between
+ products and versions.
+ </p>
+
+    <p class="p">
+      Also consider whether your object names are the same as any Hive
+      keywords, and rename or quote any that conflict, because you might switch
+      between Impala and Hive when doing analytics and ETL. Consult the
+      <a class="xref" href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Keywords,Non-reservedKeywordsandReservedKeywords" target="_blank">list of Hive keywords</a>.
+    </p>
+    <p class="p">
+      To future-proof your code, avoid additional words that might become
+      reserved as Impala adds features in later releases. This kind of planning
+      can also help to avoid name conflicts if you port SQL from other systems
+      that have different sets of reserved words. The Future Keyword column in
+      the table below indicates the additional words to avoid for table,
+      column, or other object names, even though they are not currently
+      reserved by Impala.
+    </p>
+    <p class="p">
+      The following summarizes the process for deciding whether a particular
+      SQL:2016 word is reserved in Impala.
+    </p>
+ <ul class="ul">
+      <li class="li">
+        By default, Impala aims to have the same list of reserved words as
+        SQL:2016.
+      </li>
+      <li class="li">
+        At the same time, to remain compatible with earlier versions of Impala
+        and to avoid breaking existing tables and workloads, Impala built-in
+        function names, such as COUNT and AVG, are removed from the reserved
+        words list, because Impala generally does not need to reserve the names
+        of built-in functions for parsing to work.
+      </li>
+ <li class="li">
+ For those remaining SQL 2016 reserved words, if a word is likely to be
+ in-use by users of older Impala versions and if there is a low chance of
+ Impala needing to reserve that word in the future, then the word is not
+ reserved.
+ </li>
+ <li class="li">
+ Otherwise, the word is reserved in Impala.
+ </li>
+ </ul>
+
+ <p class="p toc inpage"></p>
+
+ </div>
+
+<article class="topic concept nested1" aria-labelledby="ariaid-title2" id="reserved_words__reserved_words_current">
+<h2 class="title topictitle2" id="ariaid-title2">List of Reserved Words</h2>
+<div class="body conbody">
+
+ <div class="p">
+ <table dir="ltr" class="table frame-all" id="reserved_words_current__table_lfw_pjs_cdb"><caption></caption><colgroup><col><col><col><col><col></colgroup><tbody class="tbody">
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Keyword</strong></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">SQL:2016</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">Impala 2.12 and
+ lower</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Reserved</strong><p class="p"><strong class="ph b">in</strong></p><p class="p"><strong class="ph b">Impala 3.0 and
+ higher</strong></p></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><strong class="ph b">Future Keyword</strong></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">abs</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">acos</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">add</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">aggregate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">all</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">allocate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">alter</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">analytic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">and</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">anti</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">any</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">api_version</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">are</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array_agg</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">array_max_cardinality</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">as</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asensitive</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asin</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">asymmetric</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">at</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">atan</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">atomic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">authorization</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">avg</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">avro</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">backup</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin_frame</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">begin_partition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">between</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">bigint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">binary</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">blob</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">block_size</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">boolean</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">both</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">break</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">browse</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">bulk</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">by</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cache</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cached</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">call</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">called</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cardinality</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cascade</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cascaded</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">case</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cast</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">ceil</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">ceiling</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">change</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">char</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">char_length</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">character</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">character_length</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">check</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">checkpoint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">class</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">classifier</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">clob</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">close</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">close_fn</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">clustered</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">coalesce</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">collate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">collect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">column</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">columns</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">comment</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">commit</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">compression</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">compute</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">condition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">conf</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">connect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">constraint</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">contains</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">continue</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">convert</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">copy</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">corr</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">corresponding</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cos</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cosh</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">count</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">covar_pop</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">covar_samp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">create</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cross</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cube</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cume_dist</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_catalog</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_date</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_default_transform_group</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_path</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_role</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_row</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_schema</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_time</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_timestamp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_transform_group_for_type</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">current_user</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cursor</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">cycle</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">data</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">database</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">databases</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">date</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">datetime</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">day</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dayofweek</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dbcc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deallocate</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">decfloat</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">decimal</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">declare</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">default</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">define</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">delete</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">delimited</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dense_rank</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deny</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deref</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">desc</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">describe</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">deterministic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">disconnect</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">disk</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">distinct</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">distributed</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">div</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">double</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">drop</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dump</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">dynamic</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">each</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">element</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">else</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">empty</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">encoding</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end-exec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end_frame</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">end_partition</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">equals</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">errlvl</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">escape</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">escaped</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">every</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">except</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exchange</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exec</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">execute</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exists</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exit</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">exp</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">explain</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ </tr>
+ <tr class="row">
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"><code class="ph codeph">extended</code></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1">X</td>
+ <td class="entry cellrowborder align-left colsep-1 rowsep-1"></td>
+